DolphinScheduler-Master allocates tasks to workers, and by default provides three algorithms:
Weighted random (random)
Smoothing polling (roundrobin)
Linear load (lowerweight)
The default configuration is the linear load.
eg: master.host.selector = random (case-insensitive)
The configuration file is worker.properties
weight
All of the above load algorithms are weighted based on weights, which affect the outcome of the triage. You can set different weights for different machines by modifying the worker.weight value.
Preheating
With JIT optimisation in mind, we will let the worker run at low power for a period of time after startup so that it can gradually reach its optimal state, a process we call preheating. If you are interested, you can read some articles about JIT.
So the worker will gradually reach its maximum weight over time after it starts (by default ten minutes, we don’t provide a configuration item, you can change it and submit a PR if needed).
Random (weighted)
Smoothed polling (weighted)
An obvious drawback of the weighted polling algorithm. Namely, under certain specific weights, weighted polling scheduling generates an uneven sequence of instances, and this unsmoothed load may cause some instances to experience transient high loads, leading to a risk of system downtime. To address this scheduling flaw, we provide a smooth weighted polling algorithm.
Each worker is given two weights, weight (which remains constant after warm-up is complete) and current_weight (which changes dynamically), for each route. The current_weight + weight is iterated over all the workers, and the weight of all the workers is added up and counted as total_weight, then the worker with the largest current_weight is selected as the worker for this task. current_weight-total_weight.
Linear weighting (default algorithm)
The algorithm reports its own load information to the registry at regular intervals. We base our judgement on two main pieces of information
- load average (default is the number of CPU cores * 2)
- available physical memory (default is 0.3, in G)
If either of the two is lower than the configured item, then this worker will not participate in the load. (no traffic will be allocated)
You can customise the configuration by changing the following properties in worker.properties
- worker.max.cpuload.avg=-1 (worker max cpuload avg, only higher than the system cpu load average, worker server can be dispatched tasks. default value -1: the number of cpu cores * 2)