DataGen SQL Connector
The DataGen connector creates tables backed by in-memory data generation. This is useful when developing queries locally without access to external systems such as Kafka. Tables can include computed columns, which allow for flexible record generation.
By default, a DataGen table produces an unbounded number of rows with a random value for each column. For variable-sized types (char, varchar, string, array, map, multiset), the length can be specified. Additionally, a total number of rows can be specified, resulting in a bounded table.
Time types are always set to the local machine's current system time.
CREATE TABLE Orders (
    order_number BIGINT,
    buyer        ROW<first_name STRING, last_name STRING>
) WITH (...)

-- create a bounded mock table
-- (the table name GenOrders is illustrative)
CREATE TEMPORARY TABLE GenOrders
WITH (
    'connector' = 'datagen',
    'number-of-rows' = '10'
)
LIKE Orders (EXCLUDING ALL)
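
The computed-column support mentioned earlier can be sketched as follows. This is a minimal example under stated assumptions: the table name GenLineItems and the columns quantity, unit_price, and total are hypothetical and chosen only for illustration. The generated columns are filled by the connector, while the computed column is derived from them.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TEMPORARY TABLE GenLineItems (
    quantity   INT,
    unit_price DOUBLE,
    -- computed column: derived from the generated columns, not generated itself
    total AS quantity * unit_price
) WITH (
    'connector' = 'datagen',
    'number-of-rows' = '10',
    -- keep quantity in a small, readable range
    'fields.quantity.min' = '1',
    'fields.quantity.max' = '10'
)
```

Bounding the table with 'number-of-rows' keeps a local test query from running indefinitely.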
Option | Required | Default | Type | Description |
---|---|---|---|---|
connector | required | (none) | String | The connector to use. Must be 'datagen'. |
rows-per-second | optional | 10000 | Long | Rows per second to control the emit rate. |
number-of-rows | optional | (none) | Long | The total number of rows to emit. By default, the table is unbounded. |
fields.#.kind | optional | random | String | Generator for the field '#'. Can be 'sequence' or 'random'. |
fields.#.min | optional | (Minimum value of type) | (Type of field) | Minimum value of the random generator. Applies to numeric types. |
fields.#.max | optional | (Maximum value of type) | (Type of field) | Maximum value of the random generator. Applies to numeric types. |
fields.#.max-past | optional | 0 | Duration | Maximum offset into the past for the random timestamp generator. Applies only to timestamp types. |
fields.#.length | optional | 100 | Integer | Size or length of the collection for generating char/varchar/string/array/map/multiset types. |
fields.#.start | optional | (none) | (Type of field) | Start value of sequence generator. |
fields.#.end | optional | (none) | (Type of field) | End value of sequence generator. |
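
The per-field options above replace '#' with a column name and can be combined in one table definition. The sketch below is illustrative only: the table name GenEvents and the columns id, score, and label are hypothetical, and the option values are arbitrary.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TEMPORARY TABLE GenEvents (
    id    BIGINT,
    score INT,
    label STRING
) WITH (
    'connector' = 'datagen',
    -- throttle the emit rate to 5 rows per second
    'rows-per-second' = '5',
    -- id: sequence generator running from 1 to 100
    'fields.id.kind'  = 'sequence',
    'fields.id.start' = '1',
    'fields.id.end'   = '100',
    -- score: random generator restricted to the range 0 to 100
    'fields.score.kind' = 'random',
    'fields.score.min'  = '0',
    'fields.score.max'  = '100',
    -- label: random string of length 8
    'fields.label.length' = '8'
)
```

In recent Flink versions the source is expected to finish once a sequence is exhausted, so this table would likely be bounded even without 'number-of-rows'. This behavior is worth verifying against the Flink version in use.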