Parquet format
This format is compatible with the new Source, which can be used in both batch and streaming execution modes. Thus, you can use this format in two ways: as a bounded read for batch mode, or as a continuous read for streaming mode.
In this example we create a DataStream containing Parquet records as Flink Rows. We project the schema to read only certain fields (“f7”, “f4” and “f99”).
Records are read in batches of 500. The first boolean parameter specifies whether timestamp columns should be interpreted as UTC. The second boolean specifies whether the projected Parquet field names should be matched case-sensitively. No watermark strategy is needed, as the records do not contain event timestamps.
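A minimal sketch of the batch read described above, assuming a `ParquetColumnarRowInputFormat` constructed with (Hadoop configuration, projected row type, batch size, UTC flag, case-sensitivity flag); the exact constructor signature can differ between Flink versions, and the field types and path are placeholders:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.DoubleType;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class ParquetBatchRead {
    public static void main(String[] args) throws Exception {
        // Types assumed for illustration; use the types of your actual columns.
        final LogicalType[] fieldTypes =
                new LogicalType[] {new DoubleType(), new IntType(), new VarCharType()};

        final ParquetColumnarRowInputFormat<FileSourceSplit> format =
                new ParquetColumnarRowInputFormat<>(
                        new org.apache.hadoop.conf.Configuration(),
                        // Projection: read only "f7", "f4", and "f99".
                        RowType.of(fieldTypes, new String[] {"f7", "f4", "f99"}),
                        500,    // batch size
                        false,  // do not interpret timestamp columns as UTC
                        true);  // match projected field names case-sensitively

        // Placeholder path; point this at your Parquet file or directory.
        final FileSource<RowData> source =
                FileSource.forBulkFileFormat(format, new Path("/path/to/data")).build();

        final StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // No watermark strategy: the records carry no event timestamps.
        final DataStream<RowData> stream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-source");

        stream.print();
        env.execute("parquet-batch-read");
    }
}
```

Because no `monitorContinuously` setting is applied to the builder, the source is bounded: it reads the files present at startup and then finishes.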
In this example we create a DataStream containing Parquet records as Flink Rows that will grow unboundedly as new files are added to the directory. We monitor the directory for new files every second. We project the schema to read only certain fields (“f7”, “f4” and “f99”).
Records are read in batches of 500. The first boolean parameter specifies whether timestamp columns should be interpreted as UTC. The second boolean specifies whether the projected Parquet field names should be matched case-sensitively. No watermark strategy is needed, as the records do not contain event timestamps.
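The continuous variant can be sketched as follows; it differs from a bounded read only in the `monitorContinuously` call on the source builder. As before, the constructor arguments, field types, and path are assumptions for illustration and may vary by Flink version:

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.file.src.FileSourceSplit;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.ParquetColumnarRowInputFormat;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.DoubleType;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.apache.flink.table.types.logical.RowType;
import org.apache.flink.table.types.logical.VarCharType;

public class ParquetStreamingRead {
    public static void main(String[] args) throws Exception {
        // Types assumed for illustration; use the types of your actual columns.
        final LogicalType[] fieldTypes =
                new LogicalType[] {new DoubleType(), new IntType(), new VarCharType()};

        final ParquetColumnarRowInputFormat<FileSourceSplit> format =
                new ParquetColumnarRowInputFormat<>(
                        new org.apache.hadoop.conf.Configuration(),
                        // Projection: read only "f7", "f4", and "f99".
                        RowType.of(fieldTypes, new String[] {"f7", "f4", "f99"}),
                        500,    // batch size
                        false,  // do not interpret timestamp columns as UTC
                        true);  // match projected field names case-sensitively

        // monitorContinuously makes the source unbounded: the directory is
        // checked for new files every second.
        final FileSource<RowData> source =
                FileSource.forBulkFileFormat(format, new Path("/path/to/dir"))
                        .monitorContinuously(Duration.ofSeconds(1L))
                        .build();

        final StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        // No watermark strategy: the records carry no event timestamps.
        final DataStream<RowData> stream =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "parquet-source");

        stream.print();
        env.execute("parquet-streaming-read");
    }
}
```

With continuous monitoring enabled the job never finishes on its own; the resulting DataStream grows as new Parquet files land in the monitored directory.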