DataGen SQL Connector

    The DataGen connector creates tables backed by in-memory data generation. This is useful when developing queries locally without access to external systems such as Kafka. Tables can include computed columns, which allow for flexible record generation.

    By default, a DataGen table creates an unbounded number of rows with a random value for each column. For variable-sized types (char, varchar, string, array, map, multiset), the length can be specified. Additionally, a total number of rows can be specified, resulting in a bounded table.
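
    As a minimal sketch of the length and row-count settings described above, the table below bounds the output and sets lengths for two variable-sized columns. The table name Tickets and its columns are hypothetical, chosen only for illustration; the option keys are the ones listed in the options table of this page.

    ```sql
    -- Hypothetical table and column names, for illustration only.
    CREATE TABLE Tickets (
        ticket_id BIGINT,
        subject   STRING,
        tags      ARRAY<STRING>
    ) WITH (
        'connector' = 'datagen',
        -- bound the table: emit 100 rows in total
        'number-of-rows' = '100',
        -- length of the generated string for `subject`
        'fields.subject.length' = '20',
        -- number of elements in the generated array for `tags`
        'fields.tags.length' = '3'
    );
    ```

    Leaving out 'number-of-rows' should give the default unbounded behavior.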

    Time types are always set to the local machine's current system time.
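
    When a query needs a time column, one option is to declare it as a computed column, so that it is evaluated by an expression instead of being produced by the generator. The sketch below assumes a hypothetical Clicks table; LOCALTIMESTAMP is a built-in Flink SQL function.

    ```sql
    -- Hypothetical table and column names, for illustration only.
    CREATE TABLE Clicks (
        user_id BIGINT,
        url     STRING,
        -- computed column: evaluated per row from an expression,
        -- not generated by the connector
        click_time AS LOCALTIMESTAMP
    ) WITH (
        'connector' = 'datagen'
    );
    ```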

    CREATE TABLE Orders (
        order_number BIGINT,
        buyer        ROW<first_name STRING, last_name STRING>
    ) WITH (...)

    -- create a bounded mock table
    CREATE TEMPORARY TABLE GenOrders
    WITH (
        'connector' = 'datagen',
        'number-of-rows' = '10'
    )
    LIKE Orders (EXCLUDING ALL)

    | Option            | Required | Default                 | Type            | Description |
    |-------------------|----------|-------------------------|-----------------|-------------|
    | connector         | required | (none)                  | String          | Specifies which connector to use; here it should be 'datagen'. |
    | rows-per-second   | optional | 10000                   | Long            | Rows per second, to control the emit rate. |
    | number-of-rows    | optional | (none)                  | Long            | The total number of rows to emit. By default, the table is unbounded. |
    | fields.#.kind     | optional | random                  | String          | Generator of this '#' field. Can be 'sequence' or 'random'. |
    | fields.#.min      | optional | (Minimum value of type) | (Type of field) | Minimum value of the random generator; works for numeric types. |
    | fields.#.max      | optional | (Maximum value of type) | (Type of field) | Maximum value of the random generator; works for numeric types. |
    | fields.#.max-past | optional | 0                       | Duration        | Maximum past of the timestamp random generator; only works for timestamp types. |
    | fields.#.length   | optional | 100                     | Integer         | Size or length of the collection for generating char/varchar/string/array/map/multiset types. |
    | fields.#.start    | optional | (none)                  | (Type of field) | Start value of the sequence generator. |
    | fields.#.end      | optional | (none)                  | (Type of field) | End value of the sequence generator. |
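
    In the per-field options, '#' stands for the name of a column. The following sketch combines several of them; the SensorReadings table and its columns are hypothetical, and the values are arbitrary.

    ```sql
    -- Hypothetical table and column names, for illustration only.
    CREATE TABLE SensorReadings (
        reading_id  INT,
        temperature INT,
        sensor_name STRING
    ) WITH (
        'connector' = 'datagen',
        -- throttle the emit rate to 5 rows per second
        'rows-per-second' = '5',
        -- reading_id counts upward from 1 to 1000
        'fields.reading_id.kind' = 'sequence',
        'fields.reading_id.start' = '1',
        'fields.reading_id.end' = '1000',
        -- temperature is a random integer between -20 and 45
        'fields.temperature.kind' = 'random',
        'fields.temperature.min' = '-20',
        'fields.temperature.max' = '45',
        -- sensor_name is a random string of 8 characters
        'fields.sensor_name.length' = '8'
    );
    ```

    Because 'random' is the default kind, the 'fields.temperature.kind' line could be omitted. A sequence field is expected to need both a start and an end value, since neither has a default, and the table likely stops producing rows once the end value is reached.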