Raw Format
The Raw format allows reading and writing raw (byte-based) values as a single column.
Note: this format encodes null values as null of byte[] type. This can be a limitation when used with upsert-kafka, because upsert-kafka treats null values as a tombstone message (DELETE on the key). Therefore, we recommend not using the upsert-kafka connector with the raw format as the value.format if the field can have a null value.
For example, you may have raw log data in Kafka and want to read and analyze it using Flink SQL.
A table with a single STRING column that uses the raw format reads from (and can write to) the underlying Kafka topic as an anonymous string value in UTF-8 encoding.
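
As a minimal sketch of such a table definition, the DDL below could be used. The table name nginx_log, the topic name, the broker address, and the consumer group id are hypothetical placeholders, not values taken from this page. The query uses the built-in SPLIT_INDEX function only to illustrate that the whole message arrives as one string.

```sql
-- Hypothetical names: nginx_log, the topic, the broker address and the group id are placeholders.
CREATE TABLE nginx_log (
  log STRING  -- the raw format maps the entire message value to this single column
) WITH (
  'connector' = 'kafka',
  'topic' = 'nginx_log',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw'
);

-- Each Kafka message value is read as one string; split it with built-in functions or a UDF.
SELECT SPLIT_INDEX(log, ' ', 0) AS first_token
FROM nginx_log;
```

Because the format carries no schema, any further structure (fields, timestamps, and so on) has to be extracted from the string in the query itself.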
Conversely, you can also write a single column of STRING type into this Kafka topic as an anonymous string value in UTF-8 encoding.
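
A write could look like the following sketch. It assumes that a table named nginx_log (a hypothetical name) has already been declared with a single STRING column and 'format' = 'raw'; the inserted text is an arbitrary illustration.

```sql
-- Assumes a hypothetical table nginx_log with one STRING column and 'format' = 'raw'.
-- The string is written to the Kafka message value as its UTF-8 bytes, with no envelope or schema.
INSERT INTO nginx_log
VALUES ('hello raw format');
```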
The table below details the SQL types the format supports, including details of the serializer and deserializer class for encoding and decoding.
Flink SQL type | Value |
---|---|
CHAR / VARCHAR / STRING | A UTF-8 (by default) encoded text string. The encoding charset can be configured by ‘raw.charset’. |
BINARY / VARBINARY / BYTES | The sequence of bytes itself. |
BOOLEAN | A single byte indicating the boolean value: 0 means false, 1 means true. |
TINYINT | A single byte of the signed number value. |
SMALLINT | Two bytes with big-endian (by default) encoding. The endianness can be configured by ‘raw.endianness’. |
INT | Four bytes with big-endian (by default) encoding. The endianness can be configured by ‘raw.endianness’. |
BIGINT | Eight bytes with big-endian (by default) encoding. The endianness can be configured by ‘raw.endianness’. |
FLOAT | Four bytes with IEEE 754 format and big-endian (by default) encoding. The endianness can be configured by ‘raw.endianness’. |
DOUBLE | Eight bytes with IEEE 754 format and big-endian (by default) encoding. The endianness can be configured by ‘raw.endianness’. |
RAW | The sequence of bytes serialized by the underlying TypeSerializer of the RAW type. |
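
The two options mentioned in the table are set in the WITH clause. The sketch below assumes hypothetical table names, topic names, and broker address. raw.endianness accepts big-endian (the default) or little-endian, and raw.charset takes a charset name such as UTF-8 (the default).

```sql
-- Hypothetical table: a single INT column stored as four little-endian bytes.
-- With this setting the value 1 is encoded as the bytes 01 00 00 00.
CREATE TABLE sensor_reading (
  reading INT
) WITH (
  'connector' = 'kafka',
  'topic' = 'sensor_reading',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw',
  'raw.endianness' = 'little-endian'
);

-- Hypothetical table: a single STRING column decoded with a non-default charset.
CREATE TABLE legacy_text (
  line STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'legacy_text',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'raw',
  'raw.charset' = 'ISO-8859-1'
);
```

Since a raw-format table has exactly one column, only the option that matches that column's type has any effect: raw.endianness for the multi-byte numeric types and raw.charset for the character string types.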