DAG: The full name is Directed Acyclic Graph, referred to as DAG. Task tasks in the workflow are assembled in the form of a directed acyclic graph, and topological traversal is performed from nodes with zero degrees of entry until there are no subsequent nodes. Examples are as follows:
dag example
Process definition: Visualization formed by dragging task nodes and establishing task node associationsDAG
Process instance: The process instance is the instantiation of the process definition, which can be generated by manual start or scheduled scheduling. Each time the process definition runs, a process instance is generated
Task instance: The task instance is the instantiation of the task node in the process definition, which identifies the specific task execution status
Task type: Currently supports SHELL, SQL, SUB_PROCESS (sub-process), PROCEDURE, MR, SPARK, PYTHON, DEPENDENT ( depends), and plans to support dynamic plug-in expansion, note: SUB_PROCESS It is also a separate process definition that can be started and executed separately
Scheduled: System adopts quartz distributed scheduler, and supports the visual generation of cron expressions
Rely: The system not only supports DAG simple dependencies between the predecessor and successor nodes, but also provides task dependent nodes, supporting between processes
Priority: Support the priority of process instances and task instances, if the priority of process instances and task instances is not set, the default is first-in-first-out
Email alert: Support SQL task Query result email sending, process instance running result email alert and fault tolerance alert notification
Failure strategy: For tasks running in parallel, if a task fails, two failure strategy processing methods are provided. Continue refers to regardless of the status of the task running in parallel until the end of the process failure. End means that once a failed task is found, Kill will also run the parallel task at the same time, and the process fails and ends
Complement: Supplement historical data,Supports interval parallel and serial two complement methods
dolphinscheduler-common General constant enumeration, utility class, data structure or base class
dolphinscheduler-dao provides operations such as database access.
dolphinscheduler-server MasterServer and WorkerServer services
dolphinscheduler-service service module, including Quartz, Zookeeper, log client access service, easy to call server module and api module
dolphinscheduler-ui front-end module
From the perspective of scheduling, this article preliminarily introduces the architecture principles and implementation ideas of the big data distributed workflow scheduling system-DolphinScheduler. To be continued