Hybrid Source
For example, a bootstrap use case may need to read several days worth of bounded input from S3 before continuing with the latest unbounded input from Kafka. HybridSource
switches from FileSource
to KafkaSource
when the bounded file input finishes without interrupting the application.
Prior to , it was necessary to create a topology with multiple sources and define a switching mechanism in user land, which leads to operational complexity and inefficiency.
With HybridSource
the multiple sources appear as a single source in the Flink job graph and from DataStream
API perspective.
To use the connector, add the flink-connector-base
dependency to your project:
Copied to clipboard!
(Typically comes as transitive dependency with concrete sources.)
Here we cover the most basic and then a more complex scenario, following the File/Kafka example.
Fixed start position at graph construction time
Example: Read till pre-determined switch time from files and then continue reading from Kafka. Each source covers an upfront known range and therefore the contained sources can be created upfront as if they were used directly:
Dynamic start position at switch time
Example: File source reads a very large backlog, taking potentially longer than retention available for next source. Switch needs to occur at “current time - X”. This requires the start time for the next source to be set at switch time. Here we require transfer of end position from the previous file enumerator for deferred construction of KafkaSource
by implementing SourceFactory
.