Pruning objects to reclaim resources

Cluster administrators can periodically prune older versions of objects from the cluster that are no longer required. For example, by pruning images you can delete older images and layers that are no longer in use, but are still taking up disk space.

The CLI groups prune operations under a common parent command:

This specifies:

  • The to perform the action on, such as groups, builds, deployments, or images.

  • The <options> supported to prune that object type.

Pruning groups

To prune groups records from an external provider, administrators can run the following command:

  1. $ oc adm prune groups \
  2. --sync-config=path/to/sync/config [<options>]

Procedure

  1. To see the groups that the prune command deletes, run the following command:

    1. $ oc adm prune groups --sync-config=ldap-sync-config.yaml
  2. To perform the prune operation, add the --confirm flag:

    1. $ oc adm prune groups --sync-config=ldap-sync-config.yaml --confirm

You can prune resources associated with deployments that are no longer required by the system, due to age and status.

The following command prunes replication controllers associated with DeploymentConfig objects:

  1. $ oc adm prune deployments [<options>]

To also prune replica sets associated with Deployment objects, use the —replica-sets flag. This flag is currently a Technology Preview feature.

Table 2. oc adm prune deployments flags
OptionDescription

—confirm

Indicate that pruning should occur, instead of performing a dry-run.

—keep-complete=<N>

Per the DeploymentConfig object, keep the last N replication controllers that have a status of Complete and replica count of zero. The default is 5.

—keep-failed=<N>

Per the DeploymentConfig object, keep the last N replication controllers that have a status of Failed and replica count of zero. The default is 1.

—keep-younger-than=<duration>

Do not prune any replication controller that is younger than <duration> relative to the current time. Valid units of measurement include nanoseconds (ns), microseconds (us), milliseconds (ms), seconds (s), minutes (m), and hours (h). The default is 60m.

—orphans

Prune all replication controllers that no longer have a DeploymentConfig object, has status of Complete or Failed, and has a replica count of zero.

—replica-sets=true|false

If true, replica sets are included in the pruning process. The default is false.

This flag is a Technology Preview feature.

Procedure

  1. To see what a pruning operation would delete, run the following command:

    1. $ oc adm prune deployments --orphans --keep-complete=5 --keep-failed=1 \
    2. --keep-younger-than=60m
  2. To actually perform the prune operation, add the --confirm flag:

    1. $ oc adm prune deployments --orphans --keep-complete=5 --keep-failed=1 \
    2. --keep-younger-than=60m --confirm

Pruning builds

To prune builds that are no longer required by the system due to age and status, administrators can run the following command:

  1. $ oc adm prune builds [<options>]

Procedure

  1. To see what a pruning operation would delete, run the following command:

    1. $ oc adm prune builds --orphans --keep-complete=5 --keep-failed=1 \
    2. --keep-younger-than=60m
  2. To actually perform the prune operation, add the --confirm flag:

    1. $ oc adm prune builds --orphans --keep-complete=5 --keep-failed=1 \
    2. --keep-younger-than=60m --confirm

Developers can enable automatic build pruning by modifying their build configuration.

Additional resources

Images from the OpenShift image registry that are no longer required by the system due to age, status, or exceed limits are automatically pruned. Cluster administrators can configure the Pruning Custom Resource, or suspend it.

Prerequisites

  • Cluster administrator permissions.

  • Install the oc CLI.

Procedure

  • Verify that the object named imagepruners.imageregistry.operator.openshift.io/cluster contains the following spec and status fields:
  1. spec:
  2. schedule: 0 0 * * * (1)
  3. suspend: false (2)
  4. keepTagRevisions: 3 (3)
  5. keepYoungerThanDuration: 60m (4)
  6. keepYoungerThan: 3600000000000 (5)
  7. resources: {} (6)
  8. affinity: {} (7)
  9. nodeSelector: {} (8)
  10. tolerations: [] (9)
  11. successfulJobsHistoryLimit: 3 (10)
  12. failedJobsHistoryLimit: 3 (11)
  13. status:
  14. observedGeneration: 2 (12)
  15. conditions: (13)
  16. - type: Available
  17. status: "True"
  18. lastTransitionTime: 2019-10-09T03:13:45
  19. reason: Ready
  20. message: "Periodic image pruner has been created."
  21. - type: Scheduled
  22. status: "True"
  23. lastTransitionTime: 2019-10-09T03:13:45
  24. reason: Scheduled
  25. message: "Image pruner job has been scheduled."
  26. - type: Failed
  27. staus: "False"
  28. lastTransitionTime: 2019-10-09T03:13:45
  29. reason: Succeeded
  30. message: "Most recent image pruning job succeeded."
1schedule: CronJob formatted schedule. This is an optional field, default is daily at midnight.
2suspend: If set to true, the CronJob running pruning is suspended. This is an optional field, default is false. The initial value on new clusters is false.
3: The number of revisions per tag to keep. This is an optional field, default is 3. The initial value is 3.
4keepYoungerThanDuration: Retain images younger than this duration. This is an optional field. If a value is not specified, either keepYoungerThan or the default value 60m (60 minutes) is used.
5keepYoungerThan: Deprecated. The same as keepYoungerThanDuration, but the duration is specified as an integer in nanoseconds. This is an optional field. When keepYoungerThanDuration is set, this field is ignored.
6resources: Standard pod resource requests and limits. This is an optional field.
7affinity: Standard pod affinity. This is an optional field.
8nodeSelector: Standard pod node selector. This is an optional field.
9tolerations: Standard pod tolerations. This is an optional field.
10successfulJobsHistoryLimit: The maximum number of successful jobs to retain. Must be >= 1 to ensure metrics are reported. This is an optional field, default is 3. The initial value is 3.
11failedJobsHistoryLimit: The maximum number of failed jobs to retain. Must be >= 1 to ensure metrics are reported. This is an optional field, default is 3. The initial value is 3.
12observedGeneration: The generation observed by the Operator.
13conditions: The standard condition objects with the following types:
  • Available: Indicates if the pruning job has been created. Reasons can be Ready or Error.

  • Scheduled: Indicates if the next pruning job has been scheduled. Reasons can be Scheduled, Suspended, or Error.

  • Failed: Indicates if the most recent pruning job failed.

The Image Registry Operator’s behavior for managing the pruner is orthogonal to the managementState specified on the Image Registry Operator’s ClusterOperator object. If the Image Registry Operator is not in the Managed state, the image pruner can still be configured and managed by the Pruning Custom Resource.

However, the managementState of the Image Registry Operator alters the behavior of the deployed image pruner job:

  • Managed: the —prune-registry flag for the image pruner is set to true.

  • Removed: the —prune-registry flag for the image pruner is set to false, meaning it only prunes image metatdata in etcd.

  • Unmanaged: the —prune-registry flag for the image pruner is set to false.

Manually pruning images

The pruning custom resource enables automatic image pruning for the images from the OpenShift image registry. However, administrators can manually prune images that are no longer required by the system due to age, status, or exceed limits. There are two methods to manually prune images:

Prerequisites

  • To prune images, you must first log in to the CLI as a user with an access token. The user must also have the system:image-pruner cluster role or greater (for example, cluster-admin).

  • Expose the image registry.

Procedure

To manually prune images that are no longer required by the system due to age, status, or exceed limits, use one of the following methods:

  • Run image pruning as a Job or CronJob on the cluster by creating a YAML file for the pruner service account, for example:

    Example output

    1. kind: List
    2. apiVersion: v1
    3. items:
    4. - apiVersion: v1
    5. kind: ServiceAccount
    6. metadata:
    7. name: pruner
    8. namespace: openshift-image-registry
    9. - apiVersion: rbac.authorization.k8s.io/v1
    10. kind: ClusterRoleBinding
    11. metadata:
    12. name: openshift-image-registry-pruner
    13. roleRef:
    14. apiGroup: rbac.authorization.k8s.io
    15. kind: ClusterRole
    16. name: system:image-pruner
    17. subjects:
    18. - kind: ServiceAccount
    19. name: pruner
    20. namespace: openshift-image-registry
    21. - apiVersion: batch/v1
    22. kind: CronJob
    23. metadata:
    24. name: image-pruner
    25. namespace: openshift-image-registry
    26. spec:
    27. schedule: "0 0 * * *"
    28. concurrencyPolicy: Forbid
    29. successfulJobsHistoryLimit: 1
    30. failedJobsHistoryLimit: 3
    31. jobTemplate:
    32. spec:
    33. template:
    34. spec:
    35. restartPolicy: OnFailure
    36. containers:
    37. - image: "quay.io/openshift/origin-cli:4.1"
    38. resources:
    39. requests:
    40. cpu: 1
    41. memory: 1Gi
    42. terminationMessagePolicy: FallbackToLogsOnError
    43. command:
    44. - oc
    45. args:
    46. - adm
    47. - prune
    48. - images
    49. - --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
    50. - --keep-tag-revisions=5
    51. - --keep-younger-than=96h
    52. - --confirm=true
    53. name: image-pruner
    54. serviceAccountName: pruner
  • Run the oc adm prune images [<options>] command:

    1. $ oc adm prune images [<options>]

    Pruning images removes data from the integrated registry unless --prune-registry=false is used.

    Pruning images with the --namespace flag does not remove images, only image streams. Images are non-namespaced resources. Therefore, limiting pruning to a particular namespace makes it impossible to calculate its current usage.

    By default, the integrated registry caches metadata of blobs to reduce the number of requests to storage, and to increase the request-processing speed. Pruning does not update the integrated registry cache. Images that still contain pruned layers after pruning will be broken because the pruned layers that have metadata in the cache will not be pushed. Therefore, you must redeploy the registry to clear the cache after pruning:

    1. $ oc rollout restart deployment/image-registry -n openshift-image-registry

    If the integrated registry uses a Redis cache, you must clean the database manually.

    If redeploying the registry after pruning is not an option, then you must permanently disable the cache.

    oc adm prune images operations require a route for your registry. Registry routes are not created by default.

    The Prune images CLI configuration options table describes the options you can use with the oc adm prune images <options> command.

You can apply conditions to your manually pruned images.

  • To remove any image managed by OKD, or images with the annotation openshift.io/image.managed:

    • Created at least --keep-younger-than minutes ago and are not currently referenced by any:

      • Pods created less than --keep-younger-than minutes ago

      • Image streams created less than --keep-younger-than minutes ago

      • Running pods

      • Pending pods

      • Replication controllers

      • Deployments

      • Deployment configs

      • Replica sets

      • Build configurations

      • Builds

      • --keep-tag-revisions most recent items in stream.status.tags[].items

    • That are exceeding the smallest limit defined in the same project and are not currently referenced by any:

      • Running pods

      • Pending pods

      • Replication controllers

      • Deployment configs

      • Replica sets

      • Build configurations

      • Builds

  • There is no support for pruning from external registries.

  • When an image is pruned, all references to the image are removed from all image streams that have a reference to the image in status.tags.

  • Image layers that are no longer referenced by any images are removed.

The —prune-over-size-limit flag cannot be combined with the —keep-tag-revisions flag nor the —keep-younger-than flags. Doing so returns information that this operation is not allowed.

Separating the removal of OKD image API objects and image data from the registry by using --prune-registry=false, followed by hard pruning the registry, can narrow timing windows and is safer when compared to trying to prune both through one command. However, timing windows are not completely removed.

For example, you can still create a pod referencing an image as pruning identifies that image for pruning. You should still keep track of an API object created during the pruning operations that might reference images so that you can mitigate any references to deleted content.

Re-doing the pruning without the --prune-registry option or with --prune-registry=true does not lead to pruning the associated storage in the image registry for images previously pruned by --prune-registry=false. Any images that were pruned with --prune-registry=false can only be deleted from registry storage by hard pruning the registry.

Procedure

  1. To see what a pruning operation would delete:

    1. Keeping up to three tag revisions, and keeping resources (images, image streams, and pods) younger than 60 minutes:

      1. $ oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m
      1. $ oc adm prune images --prune-over-size-limit
  2. To perform the prune operation with the options from the previous step:

    1. $ oc adm prune images --keep-tag-revisions=3 --keep-younger-than=60m --confirm
    1. $ oc adm prune images --prune-over-size-limit --confirm

The secure connection is the preferred and recommended approach. It is done over HTTPS protocol with a mandatory certificate verification. The prune command always attempts to use it if possible. If it is not possible, in some cases it can fall-back to insecure connection, which is dangerous. In this case, either certificate verification is skipped or plain HTTP protocol is used.

The fall-back to insecure connection is allowed in the following cases unless --certificate-authority is specified:

  1. The prune command is run with the --force-insecure option.

  2. The provided registry-url is prefixed with the http:// scheme.

  3. The provided registry-url is a local-link address or localhost.

  4. The configuration of the current user allows for an insecure connection. This can be caused by the user either logging in using --insecure-skip-tls-verify or choosing the insecure connection when prompted.

If the registry is secured by a certificate authority different from the one used by OKD, it must be specified using the —certificate-authority flag. Otherwise, the prune command fails with an error.

Images not being pruned

If your images keep accumulating and the prune command removes just a small portion of what you expect, ensure that you understand the image prune conditions that must apply for an image to be considered a candidate for pruning.

Ensure that images you want removed occur at higher positions in each tag history than your chosen tag revisions threshold. For example, consider an old and obsolete image named sha:abz. By running the following command in namespace N, where the image is tagged, the image is tagged three times in a single image stream named myapp:

  1. $ oc get is -n N -o go-template='{{range $isi, $is := .items}}{{range $ti, $tag := $is.status.tags}}'\
  2. '{{range $ii, $item := $tag.items}}{{if eq $item.image "'"sha:abz"\
  3. $'"}}{{$is.metadata.name}}:{{$tag.tag}} at position {{$ii}} out of {{len $tag.items}}\n'\
  4. '{{end}}{{end}}{{end}}{{end}}'

Example output

  1. myapp:v2 at position 4 out of 5
  2. myapp:v2.1 at position 2 out of 2
  3. myapp:v2.1-may-2016 at position 0 out of 1

When default options are used, the image is never pruned because it occurs at position 0 in a history of myapp:v2.1-may-2016 tag. For an image to be considered for pruning, the administrator must either:

  • Specify --keep-tag-revisions=0 with the oc adm prune images command.

    This action removes all the tags from all the namespaces with underlying images, unless they are younger or they are referenced by objects younger than the specified threshold.

  • Delete all the istags where the position is below the revision threshold, which means myapp:v2.1 and myapp:v2.1-may-2016.

  • Move the image further in the history, either by running new builds pushing to the same istag, or by tagging other image. This is not always desirable for old release tags.

Tags having a date or time of a particular image’s build in their names should be avoided, unless the image must be preserved for an undefined amount of time. Such tags tend to have just one image in their history, which prevents them from ever being pruned.

Using a secure connection against insecure registry

If you see a message similar to the following in the output of the oc adm prune images command, then your registry is not secured and the oc adm prune images client attempts to use a secure connection:

  1. error: error communicating with registry: Get https://172.30.30.30:5000/healthz: http: server gave HTTP response to HTTPS client
  • The recommended solution is to secure the registry. Otherwise, you can force the client to use an insecure connection by appending --force-insecure to the command; however, this is not recommended.
Using an insecure connection against a secured registry

If you see one of the following errors in the output of the oc adm prune images command, it means that your registry is secured using a certificate signed by a certificate authority other than the one used by oc adm prune images client for connection verification:

By default, the certificate authority data stored in the user’s configuration files is used; the same is true for communication with the master API.

Use the --certificate-authority option to provide the right certificate authority for the container image registry server.

Using the wrong certificate authority

The following error means that the certificate authority used to sign the certificate of the secured container image registry is different from the authority used by the client:

  1. error: error communicating with registry: Get https://172.30.30.30:5000/: x509: certificate signed by unknown authority

Make sure to provide the right one with the flag --certificate-authority.

As a workaround, the --force-insecure flag can be added instead. However, this is not recommended.

Additional resources

The OpenShift Container Registry can accumulate blobs that are not referenced by the OKD cluster’s etcd. The basic pruning images procedure, therefore, is unable to operate on them. These are called orphaned blobs.

Orphaned blobs can occur from the following scenarios:

  • Manually deleting an image with oc delete image <sha256:image-id> command, which only removes the image from etcd, but not from the registry’s storage.

  • Pushing to the registry initiated by daemon failures, which causes some blobs to get uploaded, but the image manifest (which is uploaded as the very last component) does not. All unique image blobs become orphans.

  • OKD refusing an image because of quota restrictions.

  • The standard image pruner deleting an image manifest, but is interrupted before it deletes the related blobs.

  • A bug in the registry pruner, which fails to remove the intended blobs, causing the image objects referencing them to be removed and the blobs becoming orphans.

Hard pruning the registry, a separate procedure from basic image pruning, allows cluster administrators to remove orphaned blobs. You should hard prune if you are running out of storage space in your OpenShift Container Registry and believe you have orphaned blobs.

This should be an infrequent operation and is necessary only when you have evidence that significant numbers of new orphans have been created. Otherwise, you can perform standard image pruning at regular intervals, for example, once a day (depending on the number of images being created).

Procedure

To hard prune orphaned blobs from the registry:

  1. Log in.

    Log in to the cluster with the CLI as kubeadmin or another privileged user that has access to the openshift-image-registry namespace.

  2. Run a basic image prune.

    Basic image pruning removes additional images that are no longer needed. The hard prune does not remove images on its own. It only removes blobs stored in the registry storage. Therefore, you should run this just before the hard prune.

  3. Switch the registry to read-only mode.

    If the registry is not running in read-only mode, any pushes happening at the same time as the prune will either:

    • fail and cause new orphans, or

    • succeed although the images cannot be pulled (because some of the referenced blobs were deleted).

    Pushes will not succeed until the registry is switched back to read-write mode. Therefore, the hard prune must be carefully scheduled.

    To switch the registry to read-only mode:

    1. In configs.imageregistry.operator.openshift.io/cluster, set spec.readOnly to true:

      1. $ oc patch configs.imageregistry.operator.openshift.io/cluster -p '{"spec":{"readOnly":true}}' --type=merge
  4. Add the system:image-pruner role.

    The service account used to run the registry instances requires additional permissions to list some resources.

    1. Get the service account name:

      1. $ service_account=$(oc get -n openshift-image-registry \
      2. -o jsonpath='{.spec.template.spec.serviceAccountName}' deploy/image-registry)
    2. Add the system:image-pruner cluster role to the service account:

      1. $ oc adm policy add-cluster-role-to-user \
      2. system:image-pruner -z \
      3. ${service_account} -n openshift-image-registry
  5. Optional: Run the pruner in dry-run mode.

    To see how many blobs would be removed, run the hard pruner in dry-run mode. No changes are actually made. The following example references an image registry pod called image-registry-3-vhndw:

    1. $ oc -n openshift-image-registry exec pod/image-registry-3-vhndw -- /bin/sh -c '/usr/bin/dockerregistry -prune=check'

    Alternatively, to get the exact paths for the prune candidates, increase the logging level:

    1. $ oc -n openshift-image-registry exec pod/image-registry-3-vhndw -- /bin/sh -c 'REGISTRY_LOG_LEVEL=info /usr/bin/dockerregistry -prune=check'

    Example output

    1. time="2017-06-22T11:50:25.066156047Z" level=info msg="start prune (dry-run mode)" distribution_version="v2.4.1+unknown" kubernetes_version=v1.6.1+$Format:%h$ openshift_version=unknown
    2. time="2017-06-22T11:50:25.092257421Z" level=info msg="Would delete blob: sha256:00043a2a5e384f6b59ab17e2c3d3a3d0a7de01b2cabeb606243e468acc663fa5" go.version=go1.7.5 instance.id=b097121c-a864-4e0c-ad6c-cc25f8fdf5a6
    3. time="2017-06-22T11:50:25.092395621Z" level=info msg="Would delete blob: sha256:0022d49612807cb348cabc562c072ef34d756adfe0100a61952cbcb87ee6578a" go.version=go1.7.5 instance.id=b097121c-a864-4e0c-ad6c-cc25f8fdf5a6
    4. time="2017-06-22T11:50:25.092492183Z" level=info msg="Would delete blob: sha256:0029dd4228961086707e53b881e25eba0564fa80033fbbb2e27847a28d16a37c" go.version=go1.7.5 instance.id=b097121c-a864-4e0c-ad6c-cc25f8fdf5a6
    5. time="2017-06-22T11:50:26.673946639Z" level=info msg="Would delete blob: sha256:ff7664dfc213d6cc60fd5c5f5bb00a7bf4a687e18e1df12d349a1d07b2cf7663" go.version=go1.7.5 instance.id=b097121c-a864-4e0c-ad6c-cc25f8fdf5a6
    6. time="2017-06-22T11:50:26.674024531Z" level=info msg="Would delete blob: sha256:ff7a933178ccd931f4b5f40f9f19a65be5eeeec207e4fad2a5bafd28afbef57e" go.version=go1.7.5 instance.id=b097121c-a864-4e0c-ad6c-cc25f8fdf5a6
    7. time="2017-06-22T11:50:26.674675469Z" level=info msg="Would delete blob: sha256:ff9b8956794b426cc80bb49a604a0b24a1553aae96b930c6919a6675db3d5e06" go.version=go1.7.5 instance.id=b097121c-a864-4e0c-ad6c-cc25f8fdf5a6
    8. ...
    9. Would delete 13374 blobs
    10. Would free up 2.835 GiB of disk space
    11. Use -prune=delete to actually delete the data
  6. Run the hard prune.

    Execute the following command inside one running instance of a image-registry pod to run the hard prune. The following example references an image registry pod called image-registry-3-vhndw:

    1. $ oc -n openshift-image-registry exec pod/image-registry-3-vhndw -- /bin/sh -c '/usr/bin/dockerregistry -prune=delete'

    Example output

    1. Deleted 13374 blobs
    2. Freed up 2.835 GiB of disk space
  7. Switch the registry back to read-write mode.

    After the prune is finished, the registry can be switched back to read-write mode. In configs.imageregistry.operator.openshift.io/cluster, set spec.readOnly to false:

    1. $ oc patch configs.imageregistry.operator.openshift.io/cluster -p '{"spec":{"readOnly":false}}' --type=merge

Pruning cron jobs

Cron jobs can perform pruning of successful jobs, but might not properly handle failed jobs. Therefore, the cluster administrator should perform regular cleanup of jobs manually. They should also restrict the access to cron jobs to a small group of trusted users and set appropriate quota to prevent the cron job from creating too many jobs and pods.

Additional resources