DAG: Short for Directed Acyclic Graph. Tasks in a workflow are assembled into a directed acyclic graph, and topological traversal starts from the nodes with zero in-degree and continues until there are no successor nodes. An example is shown below:

    dag example
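The topological traversal described above can be sketched with Kahn's algorithm, which starts from the nodes that have no predecessors and releases each successor once all of its predecessors are done. This is a minimal illustration, not DolphinScheduler's actual implementation; the function name `topological_order` and the node names are hypothetical.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return the nodes in a topological order (Kahn's algorithm).

    nodes: list of node names; edges: list of (predecessor, successor) pairs.
    Raises ValueError if the graph has a cycle, i.e. it is not a DAG.
    """
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    # Start from every node with zero in-degree.
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # Completing a node may release its successors.
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


# Hypothetical workflow: A fans out to B and C, which both feed D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
order = topological_order(nodes, edges)
```

In this example B and C become ready at the same time, which is where a scheduler could run them in parallel.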

    Process definition: The visual DAG formed by dragging task nodes and establishing associations between them.

    Process instance: An instantiation of a process definition, generated either by a manual start or by a scheduled trigger. Each run of a process definition generates one process instance.

    Task instance: An instantiation of a task node in a process definition. It identifies the execution status of that specific task.
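The relationship between these three terms can be sketched as plain data classes. This is only an illustration of the concepts; the class names, field names, and the `SUBMITTED` state are assumptions, not DolphinScheduler's real entity model.

```python
from dataclasses import dataclass, field
from itertools import count

_instance_ids = count(1)  # hypothetical id generator for process instances


@dataclass
class ProcessDefinition:
    name: str
    task_nodes: list                            # task node names in the DAG
    edges: list = field(default_factory=list)   # (predecessor, successor) pairs


@dataclass
class TaskInstance:
    node_name: str
    state: str = "SUBMITTED"  # assumed initial state name


@dataclass
class ProcessInstance:
    definition: ProcessDefinition
    trigger: str  # "manual" or "scheduled"
    instance_id: int = field(default_factory=lambda: next(_instance_ids))
    tasks: list = field(default_factory=list)


def start(definition, trigger="manual"):
    """Each run of a definition yields a new process instance with its own task instances."""
    instance = ProcessInstance(definition, trigger)
    instance.tasks = [TaskInstance(name) for name in definition.task_nodes]
    return instance


definition = ProcessDefinition("daily_etl", ["extract", "load"], [("extract", "load")])
first_run = start(definition, trigger="manual")
second_run = start(definition, trigger="scheduled")
```

Both runs share one definition but carry separate task instances, so their execution states are tracked independently.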

    Task type: Currently supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, and DEPENDENT (dependency), with dynamic plug-in extension planned. Note: a SUB_PROCESS is itself a separate process definition that can be started and executed on its own.

    Scheduling: The system uses the Quartz distributed scheduler and supports visual generation of cron expressions.

    Dependency: The system supports not only simple DAG dependencies between predecessor and successor nodes, but also provides task dependency nodes, which support dependencies between processes.

    Priority: Priorities are supported for both process instances and task instances. If no priority is set, the default order is first-in, first-out.
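One common way to combine priorities with a first-in, first-out default is a heap keyed on the priority plus an insertion counter. The sketch below assumes that a lower number means a higher priority and that unset priorities share one default level; both are illustrative assumptions, and `PriorityFifoQueue` is a hypothetical name.

```python
import heapq
import itertools

DEFAULT_PRIORITY = 2  # assumed level used when no priority is set


class PriorityFifoQueue:
    """Priority queue that falls back to FIFO order among equal priorities."""

    def __init__(self):
        self._heap = []
        self._sequence = itertools.count()  # insertion order breaks ties

    def put(self, item, priority=None):
        if priority is None:
            priority = DEFAULT_PRIORITY
        heapq.heappush(self._heap, (priority, next(self._sequence), item))

    def get(self):
        _priority, _seq, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)


queue = PriorityFifoQueue()
queue.put("task_a")              # no priority set
queue.put("task_b")              # no priority set
queue.put("urgent", priority=0)  # lower number = higher priority (assumption)
drained = [queue.get() for _ in range(len(queue))]
```

Items submitted without a priority come out in submission order, while the explicitly prioritized item jumps ahead of them.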

    Email alert: Supports emailing SQL task query results, email alerts on process instance run results, and fault-tolerance alert notifications.

    Failure strategy: For tasks running in parallel, two strategies are provided for handling a failed task. Continue means that the other parallel tasks keep running regardless of the failure, and the process is marked as failed once they have all finished. End means that as soon as a failed task is found, the parallel tasks still running are killed and the process ends as failed.
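The difference between the two strategies can be shown with a small simulation. It is a simplified model, not the scheduler's real logic: each parallel task is described by a hypothetical finish time and outcome, and the state names `SUCCESS`, `FAILURE`, and `KILLED` are assumed for illustration.

```python
from enum import Enum


class FailureStrategy(Enum):
    CONTINUE = "continue"
    END = "end"


def run_parallel(tasks, strategy):
    """Simulate parallel tasks under a failure strategy.

    tasks: list of (name, finish_time, succeeds) tuples.
    Returns (process_state, {task_name: task_state}).
    """
    states = {}
    failed = False
    # Process tasks in the order in which they would finish.
    for name, _finish_time, succeeds in sorted(tasks, key=lambda t: t[1]):
        if failed and strategy is FailureStrategy.END:
            # END: tasks still running when a failure is found are killed.
            states[name] = "KILLED"
            continue
        states[name] = "SUCCESS" if succeeds else "FAILURE"
        failed = failed or not succeeds
    # Under either strategy, any failed task makes the process fail.
    return ("FAILURE" if failed else "SUCCESS"), states


tasks = [("a", 1, True), ("b", 2, False), ("c", 3, True)]
end_result = run_parallel(tasks, FailureStrategy.END)
continue_result = run_parallel(tasks, FailureStrategy.CONTINUE)
```

With End, task `c` is killed when `b` fails; with Continue, `c` runs to completion and the process is still reported as failed.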

    Complement: Backfills historical data. Two complement modes are supported over a date interval: parallel and serial.
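A rough way to picture the two complement modes is to expand the date interval into one run per day and then decide how those runs are grouped. The sketch below assumes daily granularity and models parallel mode as contiguous chunks that could run concurrently; the function names and the `parallelism` parameter are hypothetical, and the real splitting rules may differ.

```python
from datetime import date, timedelta


def daily_dates(start, end):
    """All dates from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def plan_complement(start, end, parallelism=1):
    """Group backfill dates into chunks.

    parallelism=1 models serial mode: one chunk, dates run one after another.
    parallelism>1 models parallel mode: contiguous chunks that could run
    concurrently, with dates inside each chunk still run in order.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    dates = daily_dates(start, end)
    if not dates:
        return []
    chunk_size = -(-len(dates) // parallelism)  # ceiling division
    return [dates[i:i + chunk_size] for i in range(0, len(dates), chunk_size)]


serial_plan = plan_complement(date(2020, 1, 1), date(2020, 1, 5))
parallel_plan = plan_complement(date(2020, 1, 1), date(2020, 1, 5), parallelism=2)
```

Here the serial plan is a single chunk of five dates, while the parallel plan splits the same interval into two chunks of three and two dates.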

    • dolphinscheduler-common: general constants and enumerations, utility classes, data structures, and base classes

    • dolphinscheduler-dao: provides database access and related operations

    • dolphinscheduler-server: the MasterServer and WorkerServer services

    • dolphinscheduler-service: service module containing the Quartz, ZooKeeper, and log client access services, for use by the server and api modules

    • dolphinscheduler-ui: front-end module

    From a scheduling perspective, this article gives a preliminary introduction to the architecture principles and implementation ideas of DolphinScheduler, a distributed workflow scheduling system for big data. To be continued.