BroadcastJob
A BroadcastJob runs a Pod on every node in the cluster. Once the Pod on every node has succeeded, the BroadcastJob no longer consumes any resources. This controller is particularly useful for work that is typically needed only once over a long period of time, such as upgrading software (e.g., the kubelet) or running a validation check on every node, or for running an ad hoc inspection script across the whole cluster.
Optionally, a BroadcastJob can stay alive after all Pods on the desired nodes complete, so that a Pod is automatically launched on every new node that is added to the cluster.
Template
Template describes the Pod template used to run the job. Note that only Never or OnFailure is allowed as the Pod restart policy for a BroadcastJob.
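A minimal sketch of a BroadcastJob manifest built as a Python dict, with a check that enforces the restart-policy rule above. The apiVersion (apps.kruise.io/v1alpha1) and field layout are assumed from the OpenKruise BroadcastJob API; the helper name broadcast_job, the job name, and the busybox container are hypothetical illustrations.

```python
import json

# Pod restart policies a BroadcastJob accepts, as stated in the text above.
ALLOWED_RESTART_POLICIES = {"Never", "OnFailure"}


def broadcast_job(name, container, restart_policy="Never"):
    """Build a minimal BroadcastJob manifest as a plain dict (hypothetical helper).

    apiVersion and kind are assumed from the OpenKruise v1alpha1 API.
    """
    if restart_policy not in ALLOWED_RESTART_POLICIES:
        raise ValueError(
            f"restartPolicy must be Never or OnFailure, got {restart_policy!r}"
        )
    return {
        "apiVersion": "apps.kruise.io/v1alpha1",
        "kind": "BroadcastJob",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "containers": [container],
                    "restartPolicy": restart_policy,
                }
            }
        },
    }


job = broadcast_job(
    "hello-bcj",
    {"name": "hello", "image": "busybox", "command": ["echo", "hello"]},
)
# kubectl also accepts JSON manifests, e.g. kubectl apply -f job.json
print(json.dumps(job, indent=2))
```

Rejecting other policies up front mirrors the constraint in the text; a restart policy of Always would keep a finished container restarting, so the job could never complete.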
Parallelism
Parallelism specifies the maximum number of Pods that should run at any given time. By default, there is no limit.
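The effect of Parallelism can be pictured as a cap on how many new Pods are started in one pass. This is a simplified model, not the controller's code; the function name pods_to_launch and its arguments are hypothetical, and None stands for the default of no limit.

```python
def pods_to_launch(nodes_without_pod, active_pods, parallelism=None):
    """Number of new Pods that may be started now (simplified model).

    parallelism=None models the default of "no limit".
    """
    if parallelism is None:
        return nodes_without_pod
    # Never exceed the parallelism cap, and never go negative.
    return max(0, min(nodes_without_pod, parallelism - active_pods))


# 10 nodes still need a Pod and nothing is running yet:
print(pods_to_launch(10, 0))     # no limit -> 10
print(pods_to_launch(10, 0, 3))  # capped -> 3
print(pods_to_launch(7, 3, 3))   # cap already reached -> 0
```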
CompletionPolicy
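CompletionPolicy specifies the controller's behavior when reconciling the BroadcastJob, i.e., whether and when the job completes. Two policies are supported: Always (the default) and Never.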
Always (default)
Always policy means the job eventually completes with either a Failed or a Succeeded condition. The following parameters take effect with this policy:
ActiveDeadlineSeconds
specifies the duration in seconds, relative to the startTime, that the job may be active before the system tries to terminate it. For example, if ActiveDeadlineSeconds is set to 60, then once the BroadcastJob has been running for 60 seconds, all running Pods are deleted and the job is marked as Failed.
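The deadline rule reduces to an elapsed-time comparison against startTime. A minimal sketch, assuming a hypothetical helper deadline_exceeded and treating "running for 60 seconds" as elapsed time greater than or equal to the limit; it only models the check, not the Pod deletion itself.

```python
from datetime import datetime, timedelta, timezone


def deadline_exceeded(start_time, now, active_deadline_seconds=None):
    """True once the job has been active for activeDeadlineSeconds (simplified model)."""
    if active_deadline_seconds is None:
        return False  # no deadline configured
    return now - start_time >= timedelta(seconds=active_deadline_seconds)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
for elapsed in (59, 60):
    now = start + timedelta(seconds=elapsed)
    if deadline_exceeded(start, now, active_deadline_seconds=60):
        # Per the text: delete all running Pods and mark the job Failed.
        print(f"{elapsed}s: deadline exceeded -> delete running Pods, mark Failed")
    else:
        print(f"{elapsed}s: still within deadline")
```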
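Never
Never policy means the BroadcastJob never completes: it stays alive after all Pods on the desired nodes finish, so a Pod is automatically launched on every new node added to the cluster.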
FailurePolicy
Type
Type indicates the FailurePolicyType, which determines what happens when a failed Pod is found:
Continue: the job keeps running.
FailFast (default): the job is marked as Failed.
Pause: the job is paused.
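The three failure policy types map one-to-one onto reactions to a failed Pod. A minimal sketch of that mapping; the enum values match the type names in the text, while the class name, the on_failed_pod helper, and the reaction strings are hypothetical descriptions rather than actual BroadcastJob status values.

```python
from enum import Enum


class FailurePolicyType(str, Enum):
    CONTINUE = "Continue"
    FAIL_FAST = "FailFast"  # default
    PAUSE = "Pause"


def on_failed_pod(policy=FailurePolicyType.FAIL_FAST):
    """Describe the job's reaction when a failed Pod is found (simplified model)."""
    policy = FailurePolicyType(policy)  # accepts "Continue", "FailFast", "Pause"
    reactions = {
        FailurePolicyType.CONTINUE: "keep running",
        FailurePolicyType.FAIL_FAST: "mark job Failed",
        FailurePolicyType.PAUSE: "pause job",
    }
    return reactions[policy]


print(on_failed_pod())         # default FailFast -> mark job Failed
print(on_failed_pod("Pause"))  # -> pause job
```

Converting the argument through FailurePolicyType(...) lets callers pass either the enum member or the plain string from a manifest, and rejects unknown types with a ValueError.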
RestartLimit
RestartLimit specifies the number of retries allowed before the job is marked as Failed. Currently, the number of retries is defined as the aggregated restart count across all Pods created by the job, i.e., the sum of the restart counts of all containers in every Pod. If this sum exceeds the limit, the job is marked as Failed and all running Pods are deleted. No limit is enforced if RestartLimit is not set.
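The aggregation described above is a nested sum over Pods and their containers. A minimal sketch, assuming each Pod is modeled as a hypothetical list of per-container restart counts; total_restarts and restart_limit_exceeded are illustrative names, not part of any API.

```python
def total_restarts(pods):
    """Sum restart counts over all containers of all Pods created by the job.

    Each Pod is modeled as a list of per-container restart counts.
    """
    return sum(sum(container_restarts) for container_restarts in pods)


def restart_limit_exceeded(pods, restart_limit=None):
    if restart_limit is None:
        return False  # no limit is enforced when RestartLimit is unset
    return total_restarts(pods) > restart_limit


pods = [[1, 0], [2]]  # Pod A has two containers, Pod B has one
print(total_restarts(pods))             # 3
print(restart_limit_exceeded(pods, 3))  # False: 3 does not exceed 3
print(restart_limit_exceeded(pods, 2))  # True: job marked Failed, running Pods deleted
```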
Examples
Monitor BroadcastJob status
Assuming the cluster has only one node, run kubectl get bcj (bcj is the short name for BroadcastJob). The output has the following columns:
DESIRED: The number of desired Pods. This equals the number of matched nodes in the cluster.
ACTIVE: The number of active Pods.
SUCCEEDED: The number of succeeded Pods.
FAILED: The number of failed Pods.
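To see how the columns relate, here is a simplified model that derives them from the phases of the job's Pods. It assumes, as a labeled simplification, that ACTIVE counts Pods in the Pending or Running phase; the function name bcj_columns is hypothetical.

```python
from collections import Counter


def bcj_columns(matched_nodes, pod_phases):
    """Derive the kubectl get bcj columns from Pod phases (simplified model)."""
    counts = Counter(pod_phases)
    return {
        "DESIRED": matched_nodes,  # one Pod per matched node
        "ACTIVE": counts["Pending"] + counts["Running"],  # assumption: not yet finished
        "SUCCEEDED": counts["Succeeded"],
        "FAILED": counts["Failed"],
    }


# Single-node cluster whose only Pod has completed successfully:
print(bcj_columns(1, ["Succeeded"]))
```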
ttlSecondsAfterFinished
Run a BroadcastJob in which each Pod computes pi, with ttlSecondsAfterFinished set to 30. The job is deleted 30 seconds after it finishes.
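A sketch of what such a manifest could look like, emitted as JSON from Python. The field placement (completionPolicy.type and completionPolicy.ttlSecondsAfterFinished) is assumed from the OpenKruise v1alpha1 API; the job name and the perl-based pi container are illustrative assumptions, and deletion_time is a hypothetical helper showing the 30-second arithmetic.

```python
import json
from datetime import datetime, timedelta, timezone

TTL_SECONDS = 30

# Assumed manifest for this example; name and container are illustrative.
manifest = {
    "apiVersion": "apps.kruise.io/v1alpha1",
    "kind": "BroadcastJob",
    "metadata": {"name": "broadcastjob-ttl"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "pi",
                        "image": "perl",
                        # Prints pi to 2000 digits.
                        "command": ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"],
                    }
                ],
                "restartPolicy": "Never",
            }
        },
        "completionPolicy": {"type": "Always", "ttlSecondsAfterFinished": TTL_SECONDS},
    },
}


def deletion_time(finished_at, ttl_seconds=TTL_SECONDS):
    """When the finished job becomes eligible for deletion."""
    return finished_at + timedelta(seconds=ttl_seconds)


print(json.dumps(manifest, indent=2))
finished = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(deletion_time(finished).isoformat())
```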
activeDeadlineSeconds
Run a BroadcastJob in which each Pod sleeps for 50 seconds, with activeDeadlineSeconds set to 10 seconds. The job is marked as Failed after it has been running for 10 seconds, and the running Pods are deleted.
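A sketch of a matching manifest, again built as a Python dict and printed as JSON. The completionPolicy.activeDeadlineSeconds placement is assumed from the OpenKruise v1alpha1 API; the job name and the busybox sleep container are illustrative assumptions. The final comparison is only a rough check that ignores scheduling and image-pull time.

```python
import json

SLEEP_SECONDS = 50
ACTIVE_DEADLINE_SECONDS = 10

manifest = {
    "apiVersion": "apps.kruise.io/v1alpha1",
    "kind": "BroadcastJob",
    "metadata": {"name": "broadcastjob-active-deadline"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "sleep",
                        "image": "busybox",
                        "command": ["sleep", str(SLEEP_SECONDS)],
                    }
                ],
                "restartPolicy": "Never",
            }
        },
        "completionPolicy": {
            "type": "Always",
            "activeDeadlineSeconds": ACTIVE_DEADLINE_SECONDS,
        },
    },
}

# Each Pod needs 50s but the job may only be active for 10s,
# so the deadline is reached first and the job ends up Failed.
expected_outcome = "Failed" if SLEEP_SECONDS > ACTIVE_DEADLINE_SECONDS else "Succeeded"
print(json.dumps(manifest, indent=2))
print("expected outcome:", expected_outcome)
```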
failurePolicy
Run a BroadcastJob with the FailFast failurePolicy. The job is marked as Failed as soon as a failed Pod is found.
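A sketch of a manifest for this example. The failurePolicy.type field is assumed from the OpenKruise v1alpha1 API; the job name and the deliberately failing busybox command are hypothetical choices made so that a failed Pod actually appears.

```python
import json

manifest = {
    "apiVersion": "apps.kruise.io/v1alpha1",
    "kind": "BroadcastJob",
    "metadata": {"name": "broadcastjob-failfast"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "fail",
                        "image": "busybox",
                        # Exits non-zero so the Pod fails (illustrative).
                        "command": ["sh", "-c", "exit 1"],
                    }
                ],
                # With Never, the failed container is not restarted, so the Pod itself fails.
                "restartPolicy": "Never",
            }
        },
        "completionPolicy": {"type": "Always"},
        "failurePolicy": {"type": "FailFast"},
    },
}

print(json.dumps(manifest, indent=2))
```

Because FailFast is the default type, omitting failurePolicy should behave the same way; spelling it out just makes the intent explicit.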