Data Types

    A data type describes the logical type of a value in the table ecosystem. It can be used to declare input and/or output types of operations.

    Flink’s data types are similar to the SQL standard’s data type terminology but also contain information about the nullability of a value for efficient handling of scalar expressions.

    Examples of data types are:

    • INT
    • INT NOT NULL
    • INTERVAL DAY TO SECOND(3)
    • ROW<myField ARRAY<BOOLEAN>, myOtherField TIMESTAMP(3)>

    A list of all pre-defined data types can be found below.

    Java/Scala

    Users of the JVM-based API work with instances of org.apache.flink.table.types.DataType within the Table API or when defining connectors, catalogs, or user-defined functions.

    A DataType instance has two responsibilities:

    • Declaration of a logical type which does not imply a concrete physical representation for transmission or storage but defines the boundaries between JVM-based/Python languages and the table ecosystem.
    • Optional: Giving hints about the physical representation of data to the planner which is useful at the edges to other APIs.

    For JVM-based languages, all pre-defined data types are available in org.apache.flink.table.api.DataTypes.

    Python

    Users of the Python API work with instances of pyflink.table.types.DataType within the Python Table API or when defining Python user-defined functions.

    A DataType instance has the following responsibility:

    • Declaration of a logical type which does not imply a concrete physical representation for transmission or storage but defines the boundaries between Python and the table ecosystem.

    For Python, those types are available in pyflink.table.types.DataTypes.

    Java

    It is recommended to add a star import to your table programs to get a fluent API:

    import static org.apache.flink.table.api.DataTypes.*;
    import org.apache.flink.table.types.DataType;

    DataType t = INTERVAL(DAY(), SECOND(3));

    Scala

    It is recommended to add a star import to your table programs to get a fluent API:

    import org.apache.flink.table.api.DataTypes._
    import org.apache.flink.table.types.DataType

    val t: DataType = INTERVAL(DAY(), SECOND(3))

    Python

    from pyflink.table.types import DataTypes

    t = DataTypes.INTERVAL(DataTypes.DAY(), DataTypes.SECOND(3))

    Physical Hints

    Physical hints are required at the edges of the table ecosystem where the SQL-based type system ends and programming-specific data types are required. Hints indicate the data format that an implementation expects.

    For example, a data source could express that it produces values for logical TIMESTAMPs using a java.sql.Timestamp class instead of using java.time.LocalDateTime which would be the default. With this information, the runtime is able to convert the produced class into its internal data format. In return, a data sink can declare the data format it consumes from the runtime.

    Here are some examples of how to declare a bridging conversion class:

    Java

    import org.apache.flink.table.api.DataTypes;
    import org.apache.flink.table.types.DataType;

    // tell the runtime to not produce or consume java.time.LocalDateTime instances
    // but java.sql.Timestamp
    DataType timestampType = DataTypes.TIMESTAMP(3).bridgedTo(java.sql.Timestamp.class);

    // tell the runtime to not produce or consume boxed integer arrays
    // but primitive int arrays
    DataType intArrayType = DataTypes.ARRAY(DataTypes.INT().notNull()).bridgedTo(int[].class);

    Scala

    import org.apache.flink.table.api.DataTypes
    import org.apache.flink.table.types.DataType

    // tell the runtime to not produce or consume java.time.LocalDateTime instances
    // but java.sql.Timestamp
    val timestampType: DataType = DataTypes.TIMESTAMP(3).bridgedTo(classOf[java.sql.Timestamp])

    // tell the runtime to not produce or consume boxed integer arrays
    // but primitive int arrays
    val intArrayType: DataType = DataTypes.ARRAY(DataTypes.INT().notNull()).bridgedTo(classOf[Array[Int]])

    Attention Please note that physical hints are usually only required if the API is extended. Users of predefined sources/sinks/functions do not need to define such hints. Hints within a table program (e.g. field.cast(TIMESTAMP(3).bridgedTo(Timestamp.class))) are ignored.

    Java/Scala

    The default planner supports the following set of SQL types:

    Python

    N/A

    This section lists all pre-defined data types.

    Java/Scala

    For the JVM-based Table API those types are also available in org.apache.flink.table.api.DataTypes.

    Python

    For the Python Table API, those types are available in pyflink.table.types.DataTypes.

    Character Strings

    CHAR

    Data type of a fixed-length character string.

    Declaration

    SQL

    CHAR
    CHAR(n)

    Java/Scala

    DataTypes.CHAR(n)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.lang.String                            X      X       Default
    byte[]                                      X      X       Assumes UTF-8 encoding.
    org.apache.flink.table.data.StringData      X      X       Internal data structure.

    Python

    Not supported.

    The type can be declared using CHAR(n) where n is the number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

    VARCHAR / STRING

    Data type of a variable-length character string.

    Declaration

    SQL

    VARCHAR
    VARCHAR(n)
    STRING

    Java/Scala

    DataTypes.VARCHAR(n)
    DataTypes.STRING()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.lang.String                            X      X       Default
    byte[]                                      X      X       Assumes UTF-8 encoding.
    org.apache.flink.table.data.StringData      X      X       Internal data structure.

    Python

    DataTypes.VARCHAR(n)
    DataTypes.STRING()

    Attention Currently, the maximum number of code points n specified in DataTypes.VARCHAR(n) must be 2,147,483,647.

    The type can be declared using VARCHAR(n) where n is the maximum number of code points. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

    STRING is a synonym for VARCHAR(2147483647).
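    CHAR(n) and VARCHAR(n) count code points, while the byte[] bridging class assumes UTF-8. For non-ASCII text the two lengths can therefore differ. The following is a minimal stdlib-only Python sketch of the documented length rules. The helpers check_length and fits_varchar are hypothetical and are not part of PyFlink.

```python
# Hypothetical helpers that mimic the documented CHAR(n)/VARCHAR(n) length
# rules in plain Python; they are not part of the PyFlink API.
MAX_LENGTH = 2_147_483_647  # STRING is a synonym for VARCHAR(2147483647)


def check_length(n: int = 1) -> int:
    """Validate a length parameter: 1 <= n <= 2,147,483,647, default 1."""
    if not 1 <= n <= MAX_LENGTH:
        raise ValueError(f"n must be between 1 and {MAX_LENGTH}, got {n}")
    return n


def fits_varchar(value: str, n: int = 1) -> bool:
    """VARCHAR(n) limits the number of code points, not bytes."""
    return len(value) <= check_length(n)  # len() of a str counts code points


text = "Gr\u00fc\u00dfe"  # "Grüße"
print(len(text))                  # 5 code points
print(len(text.encode("utf-8")))  # 7 bytes once encoded as UTF-8
print(fits_varchar(text, 5))      # True
```

    The string above fits VARCHAR(5) but needs 7 bytes when it is bridged to byte[].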

    Binary Strings

    BINARY

    Data type of a fixed-length binary string (=a sequence of bytes).

    Declaration

    SQL

    BINARY
    BINARY(n)

    Java/Scala

    DataTypes.BINARY(n)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    byte[]                                      X      X       Default

    Python

    Not supported.

    The type can be declared using BINARY(n) where n is the number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

    VARBINARY / BYTES

    Data type of a variable-length binary string (=a sequence of bytes).

    Declaration

    SQL

    VARBINARY
    VARBINARY(n)
    BYTES

    Java/Scala

    DataTypes.VARBINARY(n)
    DataTypes.BYTES()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    byte[]                                      X      X       Default

    Python

    DataTypes.VARBINARY(n)
    DataTypes.BYTES()

    Attention Currently, the maximum number of bytes n specified in DataTypes.VARBINARY(n) must be 2,147,483,647.

    The type can be declared using VARBINARY(n) where n is the maximum number of bytes. n must have a value between 1 and 2,147,483,647 (both inclusive). If no length is specified, n is equal to 1.

    BYTES is a synonym for VARBINARY(2147483647).

    Exact Numerics

    DECIMAL

    Data type of a decimal number with fixed precision and scale.

    Declaration

    SQL

    DECIMAL
    DECIMAL(p)
    DECIMAL(p, s)
    DEC
    DEC(p)
    DEC(p, s)
    NUMERIC
    NUMERIC(p)
    NUMERIC(p, s)

    Java/Scala

    DataTypes.DECIMAL(p, s)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.math.BigDecimal                        X      X       Default
    org.apache.flink.table.data.DecimalData     X      X       Internal data structure.

    Python

    DataTypes.DECIMAL(p, s)

    Attention Currently, the precision and scale specified in DataTypes.DECIMAL(p, s) must be 38 and 18, respectively.

    The type can be declared using DECIMAL(p, s) where p is the number of digits in a number (precision) and s is the number of digits to the right of the decimal point in a number (scale). p must have a value between 1 and 38 (both inclusive). s must have a value between 0 and p (both inclusive). The default value for p is 10. The default value for s is 0.

    NUMERIC(p, s) and DEC(p, s) are synonyms for this type.
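    Precision counts all digits and scale counts only the fractional ones, so a value fits DECIMAL(p, s) only if it has at most s fractional digits and at most p - s integer digits. The following plain-Python sketch checks that rule with the standard decimal module. The function fits_decimal is a hypothetical helper, not a Flink or PyFlink API. Trailing fractional zeros are ignored because they do not change the value.

```python
from decimal import Decimal


def fits_decimal(value: Decimal, p: int = 10, s: int = 0) -> bool:
    """Return True if value fits DECIMAL(p, s) without rounding (hypothetical helper)."""
    if not 1 <= p <= 38:
        raise ValueError("precision p must be between 1 and 38")
    if not 0 <= s <= p:
        raise ValueError("scale s must be between 0 and p")
    if not value.is_finite():
        return False  # NaN and infinity have no DECIMAL representation
    _, digits, exponent = value.as_tuple()
    if not any(digits):
        return True  # zero fits every DECIMAL(p, s)
    digits = list(digits)
    # Ignore trailing zeros after the decimal point, e.g. 1.50 -> 1.5.
    while exponent < 0 and digits[-1] == 0:
        digits.pop()
        exponent += 1
    fraction_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    return fraction_digits <= s and integer_digits <= p - s


print(fits_decimal(Decimal("123.45"), 5, 2))  # True
print(fits_decimal(Decimal("1234.5"), 5, 2))  # False: 4 integer digits > p - s
```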

    TINYINT

    Data type of a 1-byte signed integer with values from -128 to 127.

    Declaration

    SQL

    TINYINT

    Java/Scala

    DataTypes.TINYINT()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.lang.Byte                              X      X       Default
    byte                                        X      (X)     Output only if type is not nullable.

    Python

    DataTypes.TINYINT()

    SMALLINT

    Data type of a 2-byte signed integer with values from -32,768 to 32,767.

    Declaration

    SQL

    SMALLINT

    Java/Scala

    DataTypes.SMALLINT()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.lang.Short                             X      X       Default
    short                                       X      (X)     Output only if type is not nullable.

    Python

    DataTypes.SMALLINT()

    INT

    Data type of a 4-byte signed integer with values from -2,147,483,648 to 2,147,483,647.

    Declaration

    SQL

    INT
    INTEGER

    Java/Scala

    DataTypes.INT()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.lang.Integer                           X      X       Default
    int                                         X      (X)     Output only if type is not nullable.

    Python

    DataTypes.INT()

    INTEGER is a synonym for this type.

    BIGINT

    Data type of an 8-byte signed integer with values from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

    Declaration

    SQL

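    BIGINT

    Java/Scala

    DataTypes.BIGINT()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.lang.Long                              X      X       Default
    long                                        X      (X)     Output only if type is not nullable.

    Python

    DataTypes.BIGINT()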
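    The value ranges of the integer types follow from their byte widths under two's-complement arithmetic. A signed integer with b bits ranges from -2^(b-1) to 2^(b-1) - 1. The following short Python sketch derives the documented ranges. The names INTEGER_TYPES and signed_range are illustrative only.

```python
from typing import Dict, Tuple

# Byte widths of Flink's exact integer types as documented above.
INTEGER_TYPES: Dict[str, int] = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes: int) -> Tuple[int, int]:
    """Two's-complement range of a signed integer with the given byte width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, width in INTEGER_TYPES.items():
    low, high = signed_range(width)
    print(f"{name:<8} {low:,} to {high:,}")
```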

    Approximate Numerics

    FLOAT

    Data type of a 4-byte single-precision floating-point number.

    Compared to the SQL standard, the type does not take parameters.

    Declaration

    SQL

    FLOAT

    Java/Scala

    DataTypes.FLOAT()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.lang.Float                             X      X       Default
    float                                       X      (X)     Output only if type is not nullable.

    Python

    DataTypes.FLOAT()

    DOUBLE

    Data type of an 8-byte double-precision floating-point number.

    Declaration

    SQL

    DOUBLE
    DOUBLE PRECISION

    Java/Scala

    DataTypes.DOUBLE()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.lang.Double                            X      X       Default
    double                                      X      (X)     Output only if type is not nullable.

    Python

    DataTypes.DOUBLE()

    DOUBLE PRECISION is a synonym for this type.

    Date and Time

    DATE

    Data type of a date consisting of year-month-day with values ranging from 0000-01-01 to 9999-12-31.

    Compared to the SQL standard, the range starts at year 0000.

    Declaration

    SQL

    DATE

    Java/Scala

    DataTypes.DATE()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.time.LocalDate                         X      X       Default
    java.sql.Date                               X      X
    java.lang.Integer                           X      X       Describes the number of days since epoch.
    int                                         X      (X)     Describes the number of days since epoch. Output only if type is not nullable.

    Python

    DataTypes.DATE()
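    The java.lang.Integer and int bridging classes represent a DATE as the number of days since the epoch (1970-01-01). The following plain-Python sketch shows that arithmetic with the standard datetime module. Python's date cannot represent year 0000, so the sketch covers only part of Flink's range. The function names are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def to_epoch_days(d: date) -> int:
    """Days since 1970-01-01, the value carried by int/java.lang.Integer."""
    return (d - EPOCH).days


def from_epoch_days(days: int) -> date:
    """Inverse conversion; negative values lie before the epoch."""
    return EPOCH + timedelta(days=days)


print(to_epoch_days(date(2020, 1, 1)))  # 18262
print(from_epoch_days(-1))              # 1969-12-31
```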

    TIME

    Data type of a time without time zone consisting of hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 00:00:00.000000000 to 23:59:59.999999999.

    SQL/Java/Scala

    Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported as the semantics are closer to java.time.LocalTime. A time with time zone is not provided.

    Python

    Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported. A time with time zone is not provided.

    Declaration

    SQL

    TIME
    TIME(p)

    Java/Scala

    DataTypes.TIME(p)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.time.LocalTime                         X      X       Default
    java.sql.Time                               X      X
    java.lang.Integer                           X      X       Describes the number of milliseconds of the day.
    int                                         X      (X)     Describes the number of milliseconds of the day. Output only if type is not nullable.
    java.lang.Long                              X      X       Describes the number of nanoseconds of the day.
    long                                        X      (X)     Describes the number of nanoseconds of the day. Output only if type is not nullable.

    Python

    DataTypes.TIME(p)

    Attention Currently, the precision specified in DataTypes.TIME(p) must be 0.

    The type can be declared using TIME(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 0.
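    The integer bridging classes encode a TIME as milliseconds of the day (int/java.lang.Integer) or nanoseconds of the day (long/java.lang.Long). The following stdlib-only Python sketch computes both values. Python's time type stops at microseconds, so the last three nanosecond digits are always zero here. The sketch also simply truncates sub-millisecond digits; that is a choice of the sketch, not a statement about Flink's conversion.

```python
from datetime import time


def seconds_of_day(t: time) -> int:
    return (t.hour * 60 + t.minute) * 60 + t.second


def millis_of_day(t: time) -> int:
    """Milliseconds of the day; sub-millisecond digits are dropped here."""
    return seconds_of_day(t) * 1_000 + t.microsecond // 1_000


def nanos_of_day(t: time) -> int:
    """Nanoseconds of the day; Python's time only has microsecond precision."""
    return (seconds_of_day(t) * 1_000_000 + t.microsecond) * 1_000


t = time(23, 59, 59, 999_999)
print(millis_of_day(t))  # 86399999
print(nanos_of_day(t))   # 86399999999000
```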

    TIMESTAMP

    Data type of a timestamp without time zone consisting of year-month-day hour:minute:second[.fractional] with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 to 9999-12-31 23:59:59.999999999.

    SQL/Java/Scala

    Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported as the semantics are closer to java.time.LocalDateTime.

    A conversion from and to BIGINT (a JVM long type) is not supported as this would imply a time zone. However, this type is time zone free. For more java.time.Instant-like semantics use TIMESTAMP_LTZ.

    Python

    Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported.

    A conversion from and to BIGINT is not supported as this would imply a time zone. However, this type is time zone free. If you have such a requirement, please use TIMESTAMP_LTZ.

    Declaration

    SQL

    TIMESTAMP
    TIMESTAMP(p)
    TIMESTAMP WITHOUT TIME ZONE
    TIMESTAMP(p) WITHOUT TIME ZONE

    Java/Scala

    DataTypes.TIMESTAMP(p)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.time.LocalDateTime                     X      X       Default
    java.sql.Timestamp                          X      X
    org.apache.flink.table.data.TimestampData   X      X       Internal data structure.

    Python

    DataTypes.TIMESTAMP(p)

    Attention Currently, the precision specified in DataTypes.TIMESTAMP(p) must be 3.

    The type can be declared using TIMESTAMP(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6.

    TIMESTAMP(p) WITHOUT TIME ZONE is a synonym for this type.

    TIMESTAMP WITH TIME ZONE

    Data type of a timestamp with time zone consisting of year-month-day hour:minute:second[.fractional] zone with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 +14:59 to 9999-12-31 23:59:59.999999999 -14:59.

    SQL/Java/Scala

    Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported as the semantics are closer to java.time.OffsetDateTime.

    Python

    Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported.

    Compared to TIMESTAMP_LTZ, the time zone offset information is physically stored in every datum. It is used individually for every computation, visualization, or communication to external systems.

    Declaration

    SQL

    TIMESTAMP WITH TIME ZONE
    TIMESTAMP(p) WITH TIME ZONE

    Java/Scala

    DataTypes.TIMESTAMP_WITH_TIME_ZONE(p)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.time.OffsetDateTime                    X      X       Default
    java.time.ZonedDateTime                     X              Ignores the zone ID.

    Python

    Not supported.

    SQL/Java/Scala

    The type can be declared using TIMESTAMP(p) WITH TIME ZONE where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6.


    TIMESTAMP_LTZ

    Data type of a timestamp with local time zone consisting of year-month-day hour:minute:second[.fractional] zone with up to nanosecond precision and values ranging from 0000-01-01 00:00:00.000000000 +14:59 to 9999-12-31 23:59:59.999999999 -14:59.

    SQL/Java/Scala

    Leap seconds (23:59:60 and 23:59:61) are not supported as the semantics are closer to java.time.OffsetDateTime.

    Compared to TIMESTAMP WITH TIME ZONE, the time zone offset information is not stored physically in every datum. Instead, the type assumes java.time.Instant semantics in UTC time zone at the edges of the table ecosystem. Every datum is interpreted in the local time zone configured in the current session for computation and visualization.

    Python

    Leap seconds (23:59:60 and 23:59:61) are not supported.

    Compared to TIMESTAMP WITH TIME ZONE, the time zone offset information is not stored physically in every datum. Every datum is interpreted in the local time zone configured in the current session for computation and visualization.

    This type fills the gap between time zone free and time zone mandatory timestamp types by allowing the interpretation of UTC timestamps according to the configured session time zone.

    Declaration

    SQL

    TIMESTAMP_LTZ
    TIMESTAMP_LTZ(p)
    TIMESTAMP WITH LOCAL TIME ZONE
    TIMESTAMP(p) WITH LOCAL TIME ZONE

    Java/Scala

    DataTypes.TIMESTAMP_LTZ(p)
    DataTypes.TIMESTAMP_WITH_LOCAL_TIME_ZONE(p)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.time.Instant                           X      X       Default
    java.lang.Integer                           X      X       Describes the number of seconds since epoch.
    int                                         X      (X)     Describes the number of seconds since epoch. Output only if type is not nullable.
    java.lang.Long                              X      X       Describes the number of milliseconds since epoch.
    long                                        X      (X)     Describes the number of milliseconds since epoch. Output only if type is not nullable.
    java.sql.Timestamp                          X      X       Describes the number of milliseconds since epoch.
    org.apache.flink.table.data.TimestampData   X      X       Internal data structure.

    Python

    DataTypes.TIMESTAMP_LTZ(p)
    DataTypes.TIMESTAMP_WITH_LOCAL_TIME_ZONE(p)

    Attention Currently, the precision specified in DataTypes.TIMESTAMP_LTZ(p) must be 3.

    The type can be declared using TIMESTAMP_LTZ(p) where p is the number of digits of fractional seconds (precision). p must have a value between 0 and 9 (both inclusive). If no precision is specified, p is equal to 6.

    TIMESTAMP(p) WITH LOCAL TIME ZONE is a synonym for this type.
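    A TIMESTAMP_LTZ value is an instant (for example, epoch milliseconds in UTC) that is only interpreted in a time zone when it is computed or displayed. The following plain-Python sketch renders one stored value in several session time zones. It uses fixed UTC offsets as hypothetical stand-ins for a session time zone, which Flink configures through the table.local-time-zone option. The helper render_ltz is illustrative only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render_ltz(epoch_millis: int, session_zone: timezone) -> str:
    """Interpret an instant (epoch milliseconds, UTC) in a session time zone."""
    instant = EPOCH + timedelta(milliseconds=epoch_millis)
    local = instant.astimezone(session_zone)
    return local.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]  # keep milliseconds


# The same stored value rendered for three hypothetical session time zones.
for offset_hours in (0, 8, -5):
    zone = timezone(timedelta(hours=offset_hours))
    print(offset_hours, render_ltz(0, zone))
```

    The stored datum never changes; only its rendering follows the session time zone. TIMESTAMP WITH TIME ZONE, by contrast, would keep an offset inside every value.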

    INTERVAL YEAR TO MONTH

    Data type for a group of year-month interval types.

    The type must be parameterized to one of the following resolutions:

    • interval of years,
    • interval of years to months,
    • or interval of months.

    An interval of year-month consists of +years-months with values ranging from -9999-11 to +9999-11.

    The value representation is the same for all types of resolutions. For example, an interval of months of 50 is always represented in an interval-of-years-to-months format (with default year precision): +04-02.

    Declaration

    SQL

    INTERVAL YEAR
    INTERVAL YEAR(p)
    INTERVAL YEAR(p) TO MONTH
    INTERVAL MONTH

    Java/Scala

    DataTypes.INTERVAL(DataTypes.YEAR())
    DataTypes.INTERVAL(DataTypes.YEAR(p))
    DataTypes.INTERVAL(DataTypes.YEAR(p), DataTypes.MONTH())
    DataTypes.INTERVAL(DataTypes.MONTH())

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.time.Period                            X      X       Ignores the days part. Default
    java.lang.Integer                           X      X       Describes the number of months.
    int                                         X      (X)     Describes the number of months. Output only if type is not nullable.

    Python

    DataTypes.INTERVAL(DataTypes.YEAR())
    DataTypes.INTERVAL(DataTypes.YEAR(p))
    DataTypes.INTERVAL(DataTypes.YEAR(p), DataTypes.MONTH())
    DataTypes.INTERVAL(DataTypes.MONTH())

    The type can be declared using the above combinations where p is the number of digits of years (year precision). p must have a value between 1 and 4 (both inclusive). If no year precision is specified, p is equal to 2.
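    Every year-month interval is stored as a number of months and shown in the +years-months format. The following plain-Python sketch reproduces the documented example, where 50 months becomes +04-02 with the default year precision of 2. The helper format_year_month is hypothetical, and zero-padding years to the year precision is an assumption taken from that example.

```python
MAX_MONTHS = 9999 * 12 + 11  # +9999-11


def format_year_month(total_months: int, year_precision: int = 2) -> str:
    """Render a year-month interval as +years-months (hypothetical helper)."""
    if not 1 <= year_precision <= 4:
        raise ValueError("year precision must be between 1 and 4")
    if abs(total_months) > MAX_MONTHS:
        raise ValueError("interval out of range -9999-11 to +9999-11")
    sign = "-" if total_months < 0 else "+"
    years, months = divmod(abs(total_months), 12)
    return f"{sign}{years:0{year_precision}d}-{months:02d}"


print(format_year_month(50))   # +04-02
print(format_year_month(-14))  # -01-02
```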

    INTERVAL DAY TO SECOND

    Data type for a group of day-time interval types.

    The type must be parameterized to one of the following resolutions with up to nanosecond precision:

    • interval of days,
    • interval of days to hours,
    • interval of days to minutes,
    • interval of days to seconds,
    • interval of hours,
    • interval of hours to minutes,
    • interval of hours to seconds,
    • interval of minutes,
    • interval of minutes to seconds,
    • or interval of seconds.

    An interval of day-time consists of +days hours:minutes:seconds.fractional with values ranging from -999999 23:59:59.999999999 to +999999 23:59:59.999999999. The value representation is the same for all types of resolutions. For example, an interval of seconds of 70 is always represented in an interval-of-days-to-seconds format (with default precisions): +00 00:01:10.000000.

    Declaration

    SQL

    INTERVAL DAY
    INTERVAL DAY(p1)
    INTERVAL DAY(p1) TO HOUR
    INTERVAL DAY(p1) TO MINUTE
    INTERVAL DAY(p1) TO SECOND(p2)
    INTERVAL HOUR
    INTERVAL HOUR TO MINUTE
    INTERVAL HOUR TO SECOND(p2)
    INTERVAL MINUTE
    INTERVAL MINUTE TO SECOND(p2)
    INTERVAL SECOND
    INTERVAL SECOND(p2)

    Java/Scala

    DataTypes.INTERVAL(DataTypes.DAY())
    DataTypes.INTERVAL(DataTypes.DAY(p1))
    DataTypes.INTERVAL(DataTypes.DAY(p1), DataTypes.HOUR())
    DataTypes.INTERVAL(DataTypes.DAY(p1), DataTypes.MINUTE())
    DataTypes.INTERVAL(DataTypes.DAY(p1), DataTypes.SECOND(p2))
    DataTypes.INTERVAL(DataTypes.HOUR())
    DataTypes.INTERVAL(DataTypes.HOUR(), DataTypes.MINUTE())
    DataTypes.INTERVAL(DataTypes.HOUR(), DataTypes.SECOND(p2))
    DataTypes.INTERVAL(DataTypes.MINUTE())
    DataTypes.INTERVAL(DataTypes.MINUTE(), DataTypes.SECOND(p2))
    DataTypes.INTERVAL(DataTypes.SECOND())
    DataTypes.INTERVAL(DataTypes.SECOND(p2))

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.time.Duration                          X      X       Default
    java.lang.Long                              X      X       Describes the number of milliseconds.
    long                                        X      (X)     Describes the number of milliseconds. Output only if type is not nullable.

    Python

    DataTypes.INTERVAL(DataTypes.DAY())
    DataTypes.INTERVAL(DataTypes.DAY(p1))
    DataTypes.INTERVAL(DataTypes.DAY(p1), DataTypes.HOUR())
    DataTypes.INTERVAL(DataTypes.DAY(p1), DataTypes.MINUTE())
    DataTypes.INTERVAL(DataTypes.DAY(p1), DataTypes.SECOND(p2))
    DataTypes.INTERVAL(DataTypes.HOUR())
    DataTypes.INTERVAL(DataTypes.HOUR(), DataTypes.MINUTE())
    DataTypes.INTERVAL(DataTypes.HOUR(), DataTypes.SECOND(p2))
    DataTypes.INTERVAL(DataTypes.MINUTE())
    DataTypes.INTERVAL(DataTypes.MINUTE(), DataTypes.SECOND(p2))
    DataTypes.INTERVAL(DataTypes.SECOND())
    DataTypes.INTERVAL(DataTypes.SECOND(p2))

    The type can be declared using the above combinations where p1 is the number of digits of days (day precision) and p2 is the number of digits of fractional seconds (fractional precision). p1 must have a value between 1 and 6 (both inclusive). p2 must have a value between 0 and 9 (both inclusive). If no p1 is specified, it is equal to 2 by default. If no p2 is specified, it is equal to 6 by default.
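    The day-time format splits a duration into days, hours, minutes, seconds, and a fraction whose width is the fractional precision. The following plain-Python sketch reproduces the documented example, where 70 seconds becomes +00 00:01:10.000000 with the default precisions. The helper format_day_time is hypothetical. It truncates rather than rounds the fraction, which is an assumption of the sketch.

```python
NANOS_PER_SECOND = 1_000_000_000


def format_day_time(total_nanos: int, day_precision: int = 2,
                    fractional_precision: int = 6) -> str:
    """Render a day-time interval as +days hours:minutes:seconds.fraction."""
    if not 1 <= day_precision <= 6 or not 0 <= fractional_precision <= 9:
        raise ValueError("precision out of range")
    sign = "-" if total_nanos < 0 else "+"
    total_seconds, nanos = divmod(abs(total_nanos), NANOS_PER_SECOND)
    days, rest = divmod(total_seconds, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, seconds = divmod(rest, 60)
    text = f"{sign}{days:0{day_precision}d} {hours:02d}:{minutes:02d}:{seconds:02d}"
    if fractional_precision > 0:
        # Truncate (not round) the nanoseconds to the requested precision.
        text += "." + f"{nanos:09d}"[:fractional_precision]
    return text


print(format_day_time(70 * NANOS_PER_SECOND))  # +00 00:01:10.000000
```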

    Constructed Data Types

    ARRAY

    Data type of an array of elements with the same subtype.

    Compared to the SQL standard, the maximum cardinality of an array cannot be specified but is fixed at 2,147,483,647. Also, any valid type is supported as a subtype.

    Declaration

    SQL

    ARRAY<t>
    t ARRAY

    Java/Scala

    DataTypes.ARRAY(t)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    t[]                                         (X)    (X)     Depends on the subtype. Default
    java.util.List<t>                           X      X
    subclass of java.util.List<t>               X
    org.apache.flink.table.data.ArrayData       X      X       Internal data structure.

    Python

    DataTypes.ARRAY(t)

    The type can be declared using ARRAY<t> where t is the data type of the contained elements.

    t ARRAY is a synonym that is closer to the SQL standard. For example, INT ARRAY is equivalent to ARRAY<INT>.

    MAP

    Data type of an associative array that maps keys (including NULL) to values (including NULL). A map cannot contain duplicate keys; each key can map to at most one value.

    There is no restriction on element types; it is the responsibility of the user to ensure uniqueness.

    The map type is an extension to the SQL standard.

    Declaration

    SQL

    MAP<kt, vt>

    Java/Scala

    DataTypes.MAP(kt, vt)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.util.Map<kt, vt>                       X      X       Default
    subclass of java.util.Map<kt, vt>           X
    org.apache.flink.table.data.MapData         X      X       Internal data structure.

    Python

    DataTypes.MAP(kt, vt)

    The type can be declared using MAP<kt, vt> where kt is the data type of the key elements and vt is the data type of the value elements.

    MULTISET

    Data type of a multiset (=bag). Unlike a set, it allows for multiple instances for each of its elements with a common subtype. Each unique value (including NULL) is mapped to some multiplicity.

    There is no restriction on element types; it is the responsibility of the user to ensure uniqueness.

    Declaration

    SQL

    MULTISET<t>
    t MULTISET

    Java/Scala

    DataTypes.MULTISET(t)

    Bridging to JVM Types

    Java Type                                         Input  Output  Remarks
    java.util.Map<t, java.lang.Integer>               X      X       Assigns each value to an integer multiplicity. Default
    subclass of java.util.Map<t, java.lang.Integer>   X
    org.apache.flink.table.data.MapData               X      X       Internal data structure.

    Python

    DataTypes.MULTISET(t)

    The type can be declared using MULTISET<t> where t is the data type of the contained elements.

    t MULTISET is a synonym that is closer to the SQL standard. For example, INT MULTISET is equivalent to MULTISET<INT>.
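    The default bridging class java.util.Map<t, java.lang.Integer> stores a multiset as a map from each distinct element to its multiplicity. The following Python sketch models the same idea with collections.Counter. It is a stdlib illustration of the representation only, not a PyFlink API, and NULL is modeled as None.

```python
from collections import Counter

# A MULTISET<STRING> value as element -> multiplicity, mirroring the
# java.util.Map<t, java.lang.Integer> bridging class. NULL is modeled as None.
bag = Counter(["a", "b", "a", None, "a"])
print(dict(bag))  # {'a': 3, 'b': 1, None: 1}

# Adding an element again only increments its multiplicity.
bag["b"] += 1
print(bag["b"])   # 2
```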

    ROW

    Data type of a sequence of fields.

    A field consists of a field name, field type, and an optional description. The most specific type of a row of a table is a row type. In this case, each column of the row corresponds to the field of the row type that has the same ordinal position as the column.

    Compared to the SQL standard, an optional field description simplifies the handling of complex structures.

    A row type is similar to the STRUCT type known from other non-standard-compliant frameworks.

    Declaration

    SQL

    ROW<n0 t0, n1 t1, ...>
    ROW<n0 t0 'd0', n1 t1 'd1', ...>
    ROW(n0 t0, n1 t1, ...)
    ROW(n0 t0 'd0', n1 t1 'd1', ...)

    Java/Scala

    DataTypes.ROW(DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...)
    DataTypes.ROW(DataTypes.FIELD(n0, t0, d0), DataTypes.FIELD(n1, t1, d1), ...)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    org.apache.flink.types.Row                  X      X       Default
    org.apache.flink.table.data.RowData         X      X       Internal data structure.

    Python

    DataTypes.ROW([DataTypes.FIELD(n0, t0), DataTypes.FIELD(n1, t1), ...])
    DataTypes.ROW([DataTypes.FIELD(n0, t0, d0), DataTypes.FIELD(n1, t1, d1), ...])

    The type can be declared using ROW<n0 t0 'd0', n1 t1 'd1', ...> where n is the unique name of a field, t is the logical type of a field, and d is the description of a field.

    ROW(...) is a synonym that is closer to the SQL standard. For example, ROW(myField INT, myOtherField BOOLEAN) is equivalent to ROW<myField INT, myOtherField BOOLEAN>.
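    In a row type, field names come from the type, while the values of a table row are matched to fields by ordinal position. The following stdlib-only Python sketch uses a namedtuple as a hypothetical stand-in for ROW<myField INT, myOtherField BOOLEAN>. It illustrates the positional correspondence only and is not PyFlink's Row class.

```python
from collections import namedtuple

# Stand-in for ROW<myField INT, myOtherField BOOLEAN>: field names come from
# the row type, values are matched to fields by ordinal position.
MyRow = namedtuple("MyRow", ["myField", "myOtherField"])

columns = (42, True)        # values of one table row, in column order
row = MyRow(*columns)
print(row.myField, row[1])  # 42 True
print(MyRow._fields)        # ('myField', 'myOtherField')
```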

    User-Defined Data Types

    Java/Scala

    Attention User-defined data types are not fully supported yet. They are currently (as of Flink 1.11) only exposed as unregistered structured types in parameters and return types of functions.

    A structured type is similar to an object in an object-oriented programming language. It contains zero, one or more attributes. Each attribute consists of a name and a type.

    There are two kinds of structured types:

    • Types that are stored in a catalog and are identified by a catalog identifier (like cat.db.MyType). These correspond to the SQL standard definition of structured types.

    • Anonymously defined, unregistered types (usually reflectively extracted) that are identified by an implementation class (like com.myorg.model.MyType). Those are useful when programmatically defining a table program. They enable reusing existing JVM classes without manually defining the schema of a data type again.

    Registered Structured Types

    Currently, registered structured types are not supported. Thus, they cannot be stored in a catalog or referenced in a CREATE TABLE DDL.

    Unregistered Structured Types

    Unregistered structured types can be created from regular POJOs (Plain Old Java Objects) using automatic reflective extraction.

    The implementation class of a structured type must meet the following requirements:

    • The class must be globally accessible, which means it must be declared public, static, and not abstract.
    • The class must offer a default constructor with zero arguments or a full constructor that assigns all fields.
    • All fields of the class must be readable by either public declaration or a getter that follows common coding style such as getField(), isField(), field().
    • All fields of the class must be writable by either public declaration, fully assigning constructor, or a setter that follows common coding style such as setField(...), field(...).
    • All fields must be mapped to a data type either implicitly via reflective extraction or explicitly using the @DataTypeHint annotation.
    • Fields that are declared static or transient are ignored.

    The reflective extraction supports arbitrary nesting of fields as long as a field type does not (transitively) refer to itself.

    The declared field class (e.g. public int age;) must be contained in the list of supported JVM bridging classes defined for every data type in this document (e.g. java.lang.Integer or int for INT).

    For some classes an annotation is required in order to map the class to a data type (e.g. @DataTypeHint("DECIMAL(10, 2)") to assign a fixed precision and scale for java.math.BigDecimal).


    Declaration

    Java

    import java.math.BigDecimal;
    import org.apache.flink.table.annotation.DataTypeHint;
    import org.apache.flink.table.api.DataTypes;

    // nested in the table program class, hence public and static
    public static class User {
        // extract fields automatically
        public int age;
        public String name;
        // enrich the extraction with precision information
        public @DataTypeHint("DECIMAL(10, 2)") BigDecimal totalBalance;
        // enrich the extraction with forcing using RAW types
        public @DataTypeHint("RAW") Class<?> modelClass;
    }

    DataTypes.of(User.class);

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    class                                       X      X       Originating class or subclasses (for input) or superclasses (for output). Default
    org.apache.flink.types.Row                  X      X       Represent the structured type as a row.
    org.apache.flink.table.data.RowData         X      X       Internal data structure.

    Scala

    import org.apache.flink.table.annotation.DataTypeHint
    import org.apache.flink.table.api.DataTypes

    case class User(
        // extract fields automatically
        age: Int,
        name: String,
        // enrich the extraction with precision information
        @DataTypeHint("DECIMAL(10, 2)") totalBalance: java.math.BigDecimal,
        // enrich the extraction with forcing using a RAW type
        @DataTypeHint("RAW") modelClass: Class[_]
    )

    DataTypes.of(classOf[User])

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    class                                       X      X       Originating class or subclasses (for input) or superclasses (for output). Default
    org.apache.flink.types.Row                  X      X       Represent the structured type as a row.
    org.apache.flink.table.data.RowData         X      X       Internal data structure.

    Python

    Not supported.

    Other Data Types

    BOOLEAN

    Data type of a boolean with a (possibly) three-valued logic of TRUE, FALSE, and UNKNOWN.

    Declaration

    SQL

    BOOLEAN

    Java/Scala

    DataTypes.BOOLEAN()

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    java.lang.Boolean                           X      X       Default
    boolean                                     X      (X)     Output only if type is not nullable.

    Python

    DataTypes.BOOLEAN()

    RAW

    Data type of an arbitrary serialized type. This type is a black box within the table ecosystem and is only deserialized at the edges.

    The raw type is an extension to the SQL standard.

    Declaration

    SQL

    RAW('class', 'snapshot')

    Java/Scala

    DataTypes.RAW(class, serializer)
    DataTypes.RAW(class)

    Bridging to JVM Types

    Java Type                                   Input  Output  Remarks
    class                                       X      X       Originating class or subclasses (for input) or superclasses (for output). Default
    byte[]                                             X
    org.apache.flink.table.data.RawValueData    X      X       Internal data structure.

    Python

    Not supported.

    SQL/Java/Scala

    The type can be declared using RAW('class', 'snapshot') where class is the originating class and snapshot is the serialized TypeSerializerSnapshot in Base64 encoding. Usually, the type string is not declared directly but is generated while persisting the type.

    In the API, the RAW type can be declared either by directly supplying a Class and a TypeSerializer, or by passing only the Class and letting the framework extract the TypeSerializer from it.


    NULL

    Data type for representing untyped NULL values.

    The null type is an extension to the SQL standard. A null type has no value other than NULL; thus, it can be cast to any nullable type, similar to JVM semantics.

    This type helps in representing unknown types in API calls that use a NULL literal as well as bridging to formats such as JSON or Avro that define such a type as well.

    This type is not very useful in practice and is just mentioned here for completeness.
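The JVM semantics that the NULL type mirrors can be seen with plain Java: null casts to any reference (nullable) type but is an instance of none. This is a minimal sketch; castNull is a hypothetical helper built on Class.cast, which returns null when given a null argument.

```java
import java.util.List;

// Sketch of the JVM analogy: the single value null can be cast to any
// reference type, yet it is an instance of no type.
public class NullTypeSketch {

    /** Casts the untyped null to the given reference type; Class.cast accepts null. */
    static <T> T castNull(Class<T> target) {
        return target.cast(null);
    }

    public static void main(String[] args) {
        String s = castNull(String.class);
        Integer i = castNull(Integer.class);
        List<?> l = castNull(List.class);
        System.out.println(s == null && i == null && l == null); // true
        System.out.println(String.class.isInstance(null));        // false
    }
}
```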

    Declaration

    SQL

    NULL

    Java/Scala

    DataTypes.NULL()

    Bridging to JVM Types

    Python

    Not supported.

    Java/Scala

    At many locations in the API, Flink tries to automatically extract data types from class information using reflection to avoid repetitive manual schema work. However, extracting a data type reflectively is not always successful because logical information might be missing. Therefore, it might be necessary to add additional information close to a class or field declaration to support the extraction logic.

    The following table lists classes that can be implicitly mapped to a data type without requiring further information.

    If you intend to implement classes in Scala, it is recommended to use boxed types (e.g. java.lang.Integer) instead of Scala’s primitives. Scala’s primitives (e.g. Int or Double) are compiled to JVM primitives (e.g. int/double) and result in NOT NULL semantics as shown in the table below. Furthermore, Scala primitives that are used in generics (e.g. java.util.Map[Int, Double]) are erased during compilation and lead to class information similar to java.util.Map[java.lang.Object, java.lang.Object].

    Class                       Data Type
    java.lang.String            STRING
    java.lang.Boolean           BOOLEAN
    boolean                     BOOLEAN NOT NULL
    java.lang.Byte              TINYINT
    byte                        TINYINT NOT NULL
    java.lang.Short             SMALLINT
    short                       SMALLINT NOT NULL
    java.lang.Integer           INT
    int                         INT NOT NULL
    java.lang.Long              BIGINT
    long                        BIGINT NOT NULL
    java.lang.Float             FLOAT
    float                       FLOAT NOT NULL
    java.lang.Double            DOUBLE
    double                      DOUBLE NOT NULL
    java.sql.Date               DATE
    java.time.LocalDate         DATE
    java.sql.Time               TIME(0)
    java.time.LocalTime         TIME(9)
    java.sql.Timestamp          TIMESTAMP(9)
    java.time.LocalDateTime     TIMESTAMP(9)
    java.time.OffsetDateTime    TIMESTAMP(9) WITH TIME ZONE
    java.time.Instant           TIMESTAMP_LTZ(9)
    java.time.Duration          INTERVAL SECOND(9)
    java.time.Period            INTERVAL YEAR(4) TO MONTH
    byte[]                      BYTES
    T[]                         ARRAY<T>
    java.util.Map<K, V>         MAP<K, V>
    structured type T           anonymous structured type T
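A heavily simplified, hypothetical imitation of reflective extraction may help illustrate the nullability rules in the table: it looks up each field's JVM class in a small subset of the mapping. It is not Flink's extractor and ignores structured types, arrays, maps, and @DataTypeHint; the names ExtractionSketch, extract, and Order are made up for this example. A Scala Int field compiles to the primitive int and would therefore land on INT NOT NULL, which is the effect the Scala recommendation above warns about.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified imitation of reflective extraction. Not Flink's
// extractor: only a few rows of the class-to-type table are covered.
public class ExtractionSketch {

    private static final Map<Class<?>, String> IMPLICIT = new LinkedHashMap<>();
    static {
        IMPLICIT.put(String.class, "STRING");
        IMPLICIT.put(Integer.class, "INT");
        IMPLICIT.put(int.class, "INT NOT NULL");
        IMPLICIT.put(Double.class, "DOUBLE");
        IMPLICIT.put(double.class, "DOUBLE NOT NULL");
        IMPLICIT.put(java.time.LocalDate.class, "DATE");
    }

    // Example class: boxed fields stay nullable, primitive fields become NOT NULL.
    static class Order {
        public String id;
        public Integer quantity;        // nullable INT
        public int version;             // what a Scala Int field compiles to
        public double price;            // DOUBLE NOT NULL
        public java.time.LocalDate day; // DATE
    }

    /** Maps each declared field name to a data type string, or fails if no implicit mapping exists. */
    static Map<String, String> extract(Class<?> clazz) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Field f : clazz.getDeclaredFields()) {
            String type = IMPLICIT.get(f.getType());
            if (type == null) {
                throw new IllegalArgumentException("No implicit mapping, a hint is needed: " + f.getName());
            }
            result.put(f.getName(), type);
        }
        return result;
    }

    public static void main(String[] args) {
        extract(Order.class).forEach((name, type) -> System.out.println(name + " " + type));
    }
}
```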

    Other JVM bridging classes mentioned in this document require a @DataTypeHint annotation.

    Data type hints can parameterize or replace the default extraction logic of individual function parameters and return types, structured classes, or fields of structured classes. An implementer can choose to what extent the default extraction logic should be modified by declaring a @DataTypeHint annotation.

    The @DataTypeHint annotation provides a set of optional hint parameters. Some of those parameters are shown in the following example. More information can be found in the documentation of the annotation class.


    Java

    import org.apache.flink.table.annotation.DataTypeHint;
    import org.apache.flink.table.annotation.HintFlag;

    // AccountStatement and ComplexModel stand for user-defined classes
    class User {
        // defines an INT data type with a default conversion class `java.lang.Integer`
        public @DataTypeHint("INT") Object o;
        // defines a TIMESTAMP data type of millisecond precision with an explicit conversion class
        public @DataTypeHint(value = "TIMESTAMP(3)", bridgedTo = java.sql.Timestamp.class) Object ts;
        // forces the extraction to use a RAW type
        public @DataTypeHint("RAW") Class<?> modelClass;
        // defines that all occurrences of java.math.BigDecimal (also in nested fields) will be
        // extracted as DECIMAL(12, 2)
        public @DataTypeHint(defaultDecimalPrecision = 12, defaultDecimalScale = 2) AccountStatement stmt;
        // defines that whenever a type cannot be mapped to a data type, instead of throwing
        // an exception, it is always treated as a RAW type
        public @DataTypeHint(allowRawGlobally = HintFlag.TRUE) ComplexModel model;
    }

    Python