Glossary

    Checkpoint Storage

    Flink Application Cluster

    A Flink Application Cluster is a dedicated Flink Cluster that only executes Flink Jobs from one Flink Application. The lifetime of the Flink Cluster is bound to the lifetime of the Flink Application.

    Flink Job Cluster

    A Flink Job Cluster is a dedicated Flink Cluster that only executes a single Flink Job. The lifetime of the Flink Cluster is bound to the lifetime of the Flink Job.

    Flink Cluster

    A distributed system consisting of (typically) one JobManager and one or more Flink TaskManager processes.

    Event

    An event is a statement about a change of the state of the domain modelled by the application. Events can be input and/or output of a stream or batch processing application. Events are special types of records.

    ExecutionGraph

    see Physical Graph

    Function

    Functions are implemented by the user and encapsulate the application logic of a Flink program. Most Functions are wrapped by a corresponding Operator.

    Instance

    The term instance is used to describe a specific instance of a specific type (usually Operator or Function) during runtime. As Apache Flink is mostly written in Java, this corresponds to the definition of Instance or Object in Java. In the context of Apache Flink, the term parallel instance is also frequently used to emphasize that multiple instances of the same Operator or Function type are running in parallel.

    Flink Application

    A Flink Application is a Java application that submits one or multiple Flink Jobs from the main() method (or by some other means). Submitting jobs is usually done by calling execute() on an execution environment.

    The jobs of an application can be submitted to a long-running Flink Session Cluster, to a dedicated Flink Application Cluster, or to a Flink Job Cluster.

    JobGraph

    see Logical Graph

    Flink JobManager

    The JobManager is the orchestrator of a Flink Cluster. It contains three distinct components: Flink Resource Manager, Flink Dispatcher and one Flink JobMaster per running Flink Job.

    Flink JobMaster

    JobMasters are one of the components running in the JobManager. A JobMaster is responsible for supervising the execution of the Tasks of a single job.

    Logical Graph

    A logical graph is a directed graph where the nodes are Operators and the edges define input/output-relationships of the operators and correspond to data streams or data sets. A logical graph is created by submitting jobs from a Flink Application.

    Logical graphs are also often referred to as dataflow graphs.
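
    The structure described above can be sketched with a plain adjacency list. This is a toy model written for illustration, not Flink's own graph classes; the class name LogicalGraphSketch, its methods, and the operator names are all hypothetical. It also assumes, for the sake of the sketch, that a source is an operator with no incoming edge and a sink is one with no outgoing edge.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a logical (dataflow) graph. Hypothetical class, not part of Flink.
class LogicalGraphSketch {
    // Adjacency list: operator name -> names of its downstream operators.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    void addOperator(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
    }

    // An edge stands for the data stream flowing from one operator to the next.
    void connect(String upstream, String downstream) {
        addOperator(upstream);
        addOperator(downstream);
        edges.get(upstream).add(downstream);
    }

    List<String> downstreamOf(String operator) {
        return new ArrayList<>(edges.getOrDefault(operator, new ArrayList<>()));
    }

    // Assumption of this sketch: sources are operators that no edge points to.
    List<String> sources() {
        List<String> result = new ArrayList<>(edges.keySet());
        for (List<String> targets : edges.values()) {
            result.removeAll(targets);
        }
        return result;
    }

    // Assumption of this sketch: sinks are operators without outgoing edges.
    List<String> sinks() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : edges.entrySet()) {
            if (entry.getValue().isEmpty()) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        LogicalGraphSketch graph = new LogicalGraphSketch();
        graph.connect("source", "map");
        graph.connect("map", "keyed-sum");
        graph.connect("keyed-sum", "sink");
        System.out.println("sources: " + graph.sources());
        System.out.println("sinks:   " + graph.sinks());
    }
}
```

    The graph only records which operator feeds which; it says nothing about parallelism, which is what distinguishes it from a physical graph.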

    Managed State

    Managed State describes application state which has been registered with the framework. For Managed State, Apache Flink will take care of persistence and rescaling, among other things.

    Operator

    Node of a Logical Graph. An Operator performs a certain operation, which is usually executed by a Function. Sources and Sinks are special Operators for data ingestion and data egress.

    Operator Chain

    An Operator Chain consists of two or more consecutive Operators without any repartitioning in between. Operators within the same Operator Chain forward records to each other directly without going through serialization or Flink’s network stack.
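
    The difference between a direct hand-over and a serialization boundary can be illustrated with plain Java functions. This is a simplified sketch, not Flink's chaining mechanism: OperatorChainSketch, chain, viaSerialization and roundTrip are hypothetical names, and Java object serialization merely stands in for a network boundary. Whether a real runtime copies records between chained operators is a configuration detail this sketch does not model.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

// Hypothetical illustration of chained vs. unchained hand-over between two functions.
class OperatorChainSketch {
    // Chained: the second function is called directly with the first one's output object.
    static <A, B, C> Function<A, C> chain(Function<A, B> first, Function<B, C> second) {
        return first.andThen(second);
    }

    // Unchained stand-in: the intermediate record is turned into bytes and read back,
    // as if it had crossed a process or network boundary.
    static <A, B extends Serializable, C> Function<A, C> viaSerialization(
            Function<A, B> first, Function<B, C> second) {
        return input -> second.apply(roundTrip(first.apply(input)));
    }

    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("record could not be serialized", e);
        }
    }

    public static void main(String[] args) {
        Function<String, StringBuilder> toBuilder = s -> new StringBuilder(s);
        Function<StringBuilder, Integer> length = StringBuilder::length;
        System.out.println(chain(toBuilder, length).apply("flink"));
        System.out.println(viaSerialization(toBuilder, length).apply("flink"));
    }
}
```

    Both variants compute the same result; the chained one avoids the byte round trip, which is the cost that chaining is meant to save.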

    Partition

    A partition is an independent subset of the overall data stream or data set. A data stream or data set is divided into partitions by assigning each record to one or more partitions. Partitions of data streams or data sets are consumed by Tasks during runtime. A transformation which changes the way a data stream or data set is partitioned is often called repartitioning.
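
    As a rough illustration of assigning records to partitions, the sketch below uses plain hash partitioning on the record itself. This is an assumption made for the example; Flink's own key assignment works differently, and the names PartitionSketch, partitionFor and partition are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical hash partitioner; not Flink's key-group assignment.
class PartitionSketch {
    // Maps a record to a partition index in [0, numPartitions).
    static int partitionFor(String record, int numPartitions) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(record.hashCode(), numPartitions);
    }

    // Splits the records into numPartitions independent subsets.
    static List<List<String>> partition(List<String> records, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String record : records) {
            partitions.get(partitionFor(record, numPartitions)).add(record);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "a", "c");
        System.out.println(partition(records, 2));
        // Repartitioning: the same records, divided differently.
        System.out.println(partition(records, 3));
    }
}
```

    Equal records always land in the same partition, so a task consuming one partition sees every occurrence of the records assigned to it.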

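    Physical Graph

    A physical graph is the result of translating a Logical Graph for execution in a distributed runtime. The nodes are Tasks and the edges indicate input/output-relationships or partitions of data streams or data sets.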

    Record

    Records are the constituent elements of a data set or data stream. Operators and Functions receive records as input and emit records as output.

    (Runtime) Execution Mode

    DataStream API programs can be executed in one of two execution modes: BATCH or STREAMING. See Execution Mode for more details.

    Flink Session Cluster

    A long-running Flink Cluster which accepts multiple Flink Jobs for execution. The lifetime of this Flink Cluster is not bound to the lifetime of any Flink Job. Formerly, a Flink Session Cluster was also known as a Flink Cluster in session mode. Compare to Flink Application Cluster.

    State Backend

    For stream processing programs, the State Backend of a Flink Job determines how its state is stored on each TaskManager (Java Heap of TaskManager or (embedded) RocksDB).

    Sub-Task

    A Sub-Task is a Task responsible for processing a partition of the data stream. The term “Sub-Task” emphasizes that there are multiple parallel Tasks for the same Operator or Operator Chain.

    Table Program

    A generic term for pipelines declared with Flink’s relational APIs (Table API or SQL).

    Task

    Node of a Physical Graph. A task is the basic unit of work, which is executed by Flink’s runtime. Tasks encapsulate exactly one parallel instance of an Operator or Operator Chain.
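
    The relationship between an operator (or operator chain) and its tasks can be sketched as a simple expansion by parallelism: one task per parallel instance. The class TaskExpansionSketch, the method expand, and the "name (i/n)" label format are illustrative assumptions of this example, not Flink's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expansion of one operator or operator chain into its parallel tasks.
class TaskExpansionSketch {
    // Returns one label per parallel instance, numbered from 1 to parallelism.
    static List<String> expand(String operatorOrChain, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be at least 1");
        }
        List<String> tasks = new ArrayList<>();
        for (int index = 1; index <= parallelism; index++) {
            tasks.add(operatorOrChain + " (" + index + "/" + parallelism + ")");
        }
        return tasks;
    }

    public static void main(String[] args) {
        for (String task : expand("map -> filter", 3)) {
            System.out.println(task);
        }
    }
}
```

    With a parallelism of 3, the single chain yields three tasks, each of which would process its own partition of the data stream.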

    Flink TaskManager

    TaskManagers are the worker processes of a Flink Cluster. Tasks are scheduled to TaskManagers for execution. They communicate with each other to exchange data between subsequent Tasks.

    Transformation

    A Transformation is applied to one or more data streams or data sets and results in one or more output data streams or data sets. A transformation might change a data stream or data set on a per-record basis, but might also only change its partitioning or perform an aggregation. While Operators and Functions are the “physical” parts of Flink’s API, Transformations are only an API concept. Specifically, most transformations are implemented by certain Operators.