Handling retriable and non-retriable pod failures with Pod failure policy
This document shows you how to use the Pod failure policy, in combination with the default Pod backoff failure policy, to improve control over the handling of container- or Pod-level failures within a Job.
Defining a Pod failure policy may help you to:
- better utilize computational resources by avoiding unnecessary Pod retries.
- avoid Job failures due to Pod disruptions (such as preemption, API-initiated eviction, or taint-based eviction).
You should already be familiar with the basic use of Jobs.
You need to have a Kubernetes cluster, and the kubectl command-line tool must be configured to communicate with your cluster. It is recommended to run this tutorial on a cluster with at least two nodes that are not acting as control plane hosts. If you do not already have a cluster, you can create one by using minikube, or you can use a Kubernetes playground.
Your Kubernetes server must be at or later than version v1.25. To check the version, enter kubectl version.
Ensure that the PodDisruptionConditions and JobPodFailurePolicy feature gates are both enabled in your cluster.
With the following example, you can learn how to use Pod failure policy to avoid unnecessary Pod restarts when a Pod failure indicates a non-retriable software bug.
First, create a Job based on the job-pod-failure-policy-failjob.yaml config by running:
kubectl create -f job-pod-failure-policy-failjob.yaml
After around 30s the entire Job should be terminated. Inspect the status of the Job by running:
kubectl get jobs -l job-name=job-pod-failure-policy-failjob -o yaml
In the Job status, you should see a Failed condition whose reason field equals PodFailurePolicy. Additionally, the message field contains more detailed information about the Job termination, such as: Container main for pod default/job-pod-failure-policy-failjob-8ckj8 failed with exit code 42 matching FailJob rule at index 0.
For comparison, if the Pod failure policy were disabled, the Job would retry the failing Pod 6 times, taking at least 2 minutes.
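The message above reflects how the policy is evaluated. Rules are checked in order, and the first rule that matches the failed Pod decides the action. A failure that matches no rule gets the default handling, which means it is counted against backoffLimit. The Python sketch below is a simplified model of that matching. It assumes Pods and rules are plain dictionaries shaped loosely like the Job API fields. The function names are hypothetical, and this is not the Job controller's implementation.

```python
from typing import Dict, List, Optional, Tuple


def rule_matches(rule: dict, exit_codes: Dict[str, int], conditions: List[dict]) -> bool:
    """Check one podFailurePolicy rule against a failed Pod (simplified model).

    exit_codes maps container name -> exit code; conditions is a list of
    {"type": ..., "status": ...} Pod conditions.
    """
    on_exit = rule.get("onExitCodes")
    if on_exit is not None:
        values = set(on_exit["values"])
        wanted_container = on_exit.get("containerName")
        for name, code in exit_codes.items():
            # Restrict the check to containerName when the rule sets it.
            if wanted_container is not None and name != wanted_container:
                continue
            # Containers that exited successfully are not checked.
            if code == 0:
                continue
            if on_exit["operator"] == "In" and code in values:
                return True
            if on_exit["operator"] == "NotIn" and code not in values:
                return True
        return False
    for pattern in rule.get("onPodConditions", []):
        wanted_status = pattern.get("status", "True")  # status defaults to "True"
        for cond in conditions:
            if cond["type"] == pattern["type"] and cond["status"] == wanted_status:
                return True
    return False


def evaluate_policy(rules: List[dict], exit_codes: Dict[str, int],
                    conditions: List[dict]) -> Tuple[str, Optional[int]]:
    """Return (action, rule index) of the first matching rule.

    Falls back to ("Count", None), the default backoffLimit handling,
    when no rule matches.
    """
    for index, rule in enumerate(rules):
        if rule_matches(rule, exit_codes, conditions):
            return rule["action"], index
    return "Count", None


# A FailJob rule for exit code 42 of container "main", as in the message above.
rules = [{"action": "FailJob",
          "onExitCodes": {"containerName": "main", "operator": "In", "values": [42]}}]
print(evaluate_policy(rules, {"main": 42}, []))  # ('FailJob', 0)
print(evaluate_policy(rules, {"main": 1}, []))   # ('Count', None)
```

Returning the rule index alongside the action mirrors the "rule at index 0" wording in the Job's message.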
Delete the Job you created:
kubectl delete jobs/job-pod-failure-policy-failjob
The cluster automatically cleans up the Pods.
Caution: Timing is important for this example, so you may want to read the steps before executing them. To trigger a Pod disruption, you must drain the node while the Pod is running on it (within 90s of the Pod being scheduled).
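Create a Job based on the following config:
/controllers/job-pod-failure-policy-ignore.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-ignore
spec:
  completions: 4
  parallelism: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        # Assumed image (not present in this listing); any image that provides bash should work.
        image: docker.io/library/bash:5
        command: ["bash"]
        args:
        - -c
        # Assumed script (not present in this listing): succeed after about 90s, matching the timing in the steps.
        - echo "Hello world! I'm going to exit with 0 (success)." && sleep 90 && exit 0
  backoffLimit: 0
  podFailurePolicy:
    rules:
    - action: Ignore
      onPodConditions:
      - type: DisruptionTarget
Create it by running: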
kubectl create -f job-pod-failure-policy-ignore.yaml
Run this command to check the nodeName the Pod is scheduled to:
nodeName=$(kubectl get pods -l job-name=job-pod-failure-policy-ignore -o jsonpath='{.items[0].spec.nodeName}')
Drain the node to evict the Pod before it completes (within 90s):
kubectl drain nodes/$nodeName --ignore-daemonsets --grace-period=0
Inspect .status.failed to check that the failure counter for the Job is not incremented:
kubectl get jobs -l job-name=job-pod-failure-policy-ignore -o yaml
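If you prefer to check the counter from a script, the sketch below reads the same field from the JSON form of the output (-o json instead of -o yaml). It treats a missing failed field as zero, because the Job status omits zero-valued counters. The helper names failed_counts and fetch_job_list are hypothetical.

```python
import json
import subprocess


def failed_counts(list_json: str) -> dict:
    """Map each Job name to its .status.failed value (0 when the field is absent)."""
    jobs = json.loads(list_json)["items"]
    return {
        job["metadata"]["name"]: job.get("status", {}).get("failed", 0)
        for job in jobs
    }


def fetch_job_list(label_selector: str) -> str:
    """Run kubectl and return the raw JSON list of Jobs matching the selector."""
    result = subprocess.run(
        ["kubectl", "get", "jobs", "-l", label_selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, failed_counts(fetch_job_list("job-name=job-pod-failure-policy-ignore")) would be expected to report 0 for this Job after the drain, because the disrupted Pod matches the Ignore rule.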
Uncordon the node:
kubectl uncordon nodes/$nodeName
The Job resumes and succeeds.
For comparison, if the Pod failure policy were disabled, the Pod disruption would terminate the entire Job (because .spec.backoffLimit is set to 0).
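The difference comes down to how each failure affects the Job's failure counter. The simplified Python model below replays Pod failures that the policy has already classified and reports whether the Job fails. The function job_outcome is hypothetical and not part of any Kubernetes library. The model ignores Pod recreation, backoff delays, and completions. It assumes the Job fails once the counter exceeds backoffLimit, so with backoffLimit: 0 a single counted failure is enough.

```python
from typing import Iterable


def job_outcome(actions: Iterable[str], backoff_limit: int) -> str:
    """Replay classified Pod failures and report the Job outcome (simplified model).

    "FailJob" fails the Job immediately, "Ignore" leaves the failure counter
    untouched, and "Count" increments it; the Job fails once the counter
    exceeds backoff_limit.
    """
    failed = 0
    for action in actions:
        if action == "FailJob":
            return "Failed (PodFailurePolicy)"
        if action == "Ignore":
            continue
        failed += 1
        if failed > backoff_limit:
            return "Failed (BackoffLimitExceeded)"
    return "NotFailed"


# With backoffLimit: 0, a disruption matched by the Ignore rule does not end the Job...
print(job_outcome(["Ignore"], backoff_limit=0))  # NotFailed
# ...but the same disruption under the default handling does.
print(job_outcome(["Count"], backoff_limit=0))   # Failed (BackoffLimitExceeded)
```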
Delete the Job you created:
kubectl delete jobs/job-pod-failure-policy-ignore
The cluster automatically cleans up the Pods.
With the following example, you can learn how to use Pod failure policy to avoid unnecessary Pod restarts based on custom Pod conditions.
Note: The example below works only in version 1.27 and later, because it relies on deleted Pods in the Pending phase being transitioned to a terminal phase.
First, create a Job based on the following config:
apiVersion: batch/v1
kind: Job
metadata:
  name: job-pod-failure-policy-config-issue
spec:
  completions: 8
  parallelism: 2
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: "non-existing-repo/non-existing-image:example"
  podFailurePolicy:
    rules:
    - action: FailJob
      onPodConditions:
      - type: ConfigIssue
Create it by running:
kubectl create -f job-pod-failure-policy-config-issue.yaml
Note that the image is misconfigured, as it does not exist.
Inspect the status of the job’s Pods by running:
kubectl get pods -l job-name=job-pod-failure-policy-config-issue -o yaml
In the output, note that the Pod remains in the Pending phase because it fails to pull the misconfigured image. In principle, this could be a transient issue and the image could eventually be pulled. However, in this case the image does not exist, so we indicate this fact with a custom condition.
Add the custom condition. First, prepare the patch by running:
cat <<EOF > patch.yaml
status:
  conditions:
  - type: ConfigIssue
    status: "True"
    reason: "NonExistingImage"
    lastTransitionTime: "$(date -u +"%Y-%m-%dT%H:%M:%SZ")"
EOF
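If you would rather generate the patch from a script, the sketch below builds the same status patch in Python and prints it as JSON. kubectl patch accepts JSON as well as YAML. The timestamp uses the same UTC layout as the date command above. The function name config_issue_patch and the script name make_patch.py are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def config_issue_patch(reason: str = "NonExistingImage",
                       now: Optional[datetime] = None) -> dict:
    """Build a Pod status patch that adds a ConfigIssue condition."""
    now = now or datetime.now(timezone.utc)
    # Same layout as: date -u +"%Y-%m-%dT%H:%M:%SZ"
    timestamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "status": {
            "conditions": [
                {
                    "type": "ConfigIssue",
                    "status": "True",
                    "reason": reason,
                    "lastTransitionTime": timestamp,
                }
            ]
        }
    }


if __name__ == "__main__":
    # Redirect stdout to a file, e.g. python3 make_patch.py > patch.json
    print(json.dumps(config_issue_patch(), indent=2))
```

You could then pass the generated file to the same kubectl patch command with --patch-file=patch.json.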
Second, select one of the pods created by the job by running:
podName=$(kubectl get pods -l job-name=job-pod-failure-policy-config-issue -o jsonpath='{.items[0].metadata.name}')
Then, apply the patch to the selected Pod by running the following command:
kubectl patch pod $podName --subresource=status --patch-file=patch.yaml
If the patch is applied successfully, you will see output like this:
pod/job-pod-failure-policy-config-issue-k6pvp patched
Delete the Pod to transition it to the Failed phase by running:
kubectl delete pods/$podName
Inspect the status of the Job by running:
kubectl get jobs -l job-name=job-pod-failure-policy-config-issue -o yaml
In the Job status, you should see a Failed condition whose reason field equals PodFailurePolicy. Additionally, the message field contains more detailed information about the Job termination, such as: Pod default/job-pod-failure-policy-config-issue-k6pvp has condition ConfigIssue matching FailJob rule at index 0.
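To check this from a script, you could request the Job as JSON (for example, kubectl get jobs -l job-name=job-pod-failure-policy-config-issue -o json) and look for the condition programmatically. The sketch below assumes that output is passed in as a string. The helper names failure_condition and failed_by_policy are hypothetical.

```python
import json
from typing import Optional, Tuple


def failure_condition(job: dict) -> Optional[Tuple[str, str]]:
    """Return (reason, message) of the Job's Failed condition, or None if absent."""
    for cond in job.get("status", {}).get("conditions", []):
        if cond.get("type") == "Failed" and cond.get("status") == "True":
            return cond.get("reason", ""), cond.get("message", "")
    return None


def failed_by_policy(list_json: str) -> dict:
    """Map each Job name to True if it failed because of its Pod failure policy."""
    result = {}
    for job in json.loads(list_json)["items"]:
        cond = failure_condition(job)
        result[job["metadata"]["name"]] = cond is not None and cond[0] == "PodFailurePolicy"
    return result
```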
Note: In a production environment, the steps of adding the custom condition and deleting the Pod should be automated by a user-provided controller.
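As a rough illustration of one decision such a controller might make, the sketch below inspects a Pod, given as a dictionary (for example, parsed from kubectl get pod -o json), and reports whether the Pod looks stuck on an image pull. ErrImagePull and ImagePullBackOff are waiting reasons the kubelet reports. Treating them as a permanent ConfigIssue is an assumption, because a pull failure can be transient. A real controller would also need to tell a missing image apart from a temporary registry outage. The function should_mark_config_issue is hypothetical.

```python
from typing import Any, Dict

# Container waiting reasons reported when an image cannot be pulled.
IMAGE_PULL_FAILURES = {"ErrImagePull", "ImagePullBackOff"}


def should_mark_config_issue(pod: Dict[str, Any]) -> bool:
    """Hypothetical controller check: is this Pending Pod stuck pulling its image?"""
    status = pod.get("status", {})
    if status.get("phase") != "Pending":
        return False
    # Skip Pods that already carry the custom condition.
    if any(c.get("type") == "ConfigIssue" for c in status.get("conditions", [])):
        return False
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting", {})
        if waiting.get("reason") in IMAGE_PULL_FAILURES:
            return True
    return False
```

When the check returns True, the controller would apply the status patch described earlier and then delete the Pod, as in the manual steps.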
Delete the Job you created:
kubectl delete jobs/job-pod-failure-policy-config-issue
The cluster automatically cleans up the Pods.