Automatic Cleanup for Finished Jobs
When your Job has finished, it’s useful to keep that Job in the API (and not immediately delete the Job) so that you can tell whether the Job succeeded or failed.
Kubernetes’ TTL-after-finished controller provides a TTL (time to live) mechanism to limit the lifetime of Job objects that have finished execution.
The TTL-after-finished controller treats a Job as eligible for cleanup TTL seconds after the Job has finished. The timer starts once the status condition of the Job changes to show that the Job is either Complete or Failed; once the TTL has expired, that Job becomes eligible for cascading removal. When the TTL-after-finished controller cleans up a Job, it deletes the Job together with its dependent objects.
Kubernetes honors object lifecycle guarantees on the Job, such as waiting for finalizers.
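The timing rule described above can be sketched in a few lines of Python. This is a simplified model and not the controller's actual code: the function names are hypothetical, and `finished_at` stands in for the time at which the Job's Complete or Failed condition was recorded.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cleanup_eligible_at(finished_at: datetime, ttl_seconds: int) -> datetime:
    """Return the instant at which a finished Job becomes eligible for cleanup."""
    return finished_at + timedelta(seconds=ttl_seconds)


def is_eligible(finished_at: Optional[datetime],
                ttl_seconds: Optional[int],
                now: datetime) -> bool:
    """Hypothetical helper: has the TTL expired for this Job?"""
    # A Job that has not finished, or that has no TTL set, is never eligible.
    if finished_at is None or ttl_seconds is None:
        return False
    return now >= cleanup_eligible_at(finished_at, ttl_seconds)


finished = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(is_eligible(finished, 100, finished + timedelta(seconds=99)))   # False
print(is_eligible(finished, 100, finished + timedelta(seconds=100)))  # True
print(is_eligible(finished, 0, finished))                             # True
```

With a TTL of 0, the Job becomes eligible as soon as it finishes; with no TTL set, it is never cleaned up by this mechanism.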
You can set the TTL seconds at any time. Here are some examples for setting the .spec.ttlSecondsAfterFinished field of a Job:
- Specify this field in the Job manifest, so that a Job can be cleaned up automatically some time after it finishes.
- Manually set this field of existing, already finished Jobs, so that they become eligible for cleanup.
- Use a mutating admission webhook to set this field dynamically after the Job has finished, and choose different TTL values based on the Job's status or labels. For this case, the webhook needs to detect changes to the .status of the Job and only set a TTL when the Job is being marked as completed.
- Write your own controller to manage the cleanup TTL for Jobs that match a particular selector.
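As a rough illustration of the first two options, the sketch below builds a Job manifest that carries the TTL field (`.spec.ttlSecondsAfterFinished`) and a merge-patch body that sets the same field on an existing Job. The Job name, container image, and command are placeholders chosen for the example, and the helper names are hypothetical.

```python
import json


def job_with_ttl(name: str, ttl_seconds: int) -> dict:
    """Build a minimal batch/v1 Job manifest with a cleanup TTL."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Seconds to keep the Job after it reaches Complete or Failed.
            "ttlSecondsAfterFinished": ttl_seconds,
            "template": {
                "spec": {
                    "containers": [{
                        "name": "main",          # placeholder container
                        "image": "busybox",      # placeholder image
                        "command": ["sh", "-c", "echo done"],
                    }],
                    "restartPolicy": "Never",
                }
            },
        },
    }


def ttl_merge_patch(ttl_seconds: int) -> str:
    """Build a JSON merge patch that sets the TTL on an existing Job."""
    return json.dumps({"spec": {"ttlSecondsAfterFinished": ttl_seconds}})


print(json.dumps(job_with_ttl("example-job", 100), indent=2))
print(ttl_merge_patch(0))
```

The printed manifest is JSON, which `kubectl apply -f -` accepts on standard input. The patch body could be supplied to `kubectl patch job example-job --type=merge -p '<patch>'` to make an already finished Job eligible for cleanup.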
Time skew
Because the TTL-after-finished controller uses timestamps stored in the Job objects to determine whether the TTL has expired, this feature is sensitive to time skew in your cluster, which may cause the control plane to clean up Job objects at the wrong time.
Clocks aren’t always correct, but the difference should be very small. Please be aware of this risk when setting a non-zero TTL.
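To make the risk concrete, the following sketch models skew as a single offset between the clock that recorded the Job's finish timestamp and the clock that later evaluates the TTL. This is an assumed, simplified model for illustration, and the function name is hypothetical.

```python
def real_retention_seconds(ttl_seconds: int, evaluator_clock_offset: float) -> float:
    """Real elapsed seconds between a Job finishing and becoming eligible.

    evaluator_clock_offset is how far the evaluating clock runs ahead (+)
    or behind (-) of the clock that recorded the finish timestamp.
    """
    # A clock that runs ahead sees the TTL expire early; retention cannot
    # be shorter than zero.
    return max(0.0, float(ttl_seconds) - evaluator_clock_offset)


for ttl, offset in [(100, 0), (100, 30), (100, -30), (10, 60)]:
    print(ttl, offset, real_retention_seconds(ttl, offset))
```

In this model, a TTL that is small relative to the skew can expire almost immediately, while a lagging clock keeps the Job around longer than requested.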