BroadcastJob + Advanced CronJob Help You Maintain Kubernetes Nodes

    A Kubernetes Job is well suited to one-time, temporary work such as cleaning up disks: unlike an agent process running on the host, a Job uses resources only temporarily and releases them automatically once the task completes. However, native Kubernetes Jobs have the following limitations in node operation and maintenance scenarios:

    1. The default scheduling rule is unsuitable. Multiple pods may be scheduled to the same node, so the job may run repeatedly on that node;
    2. It cannot automatically track the set of cluster nodes. When a node is added to or removed from the cluster, the job configuration must be updated manually.

    OpenKruise provides the BroadcastJob and Advanced CronJob features to solve these problems. BroadcastJob lets users schedule pods in a way similar to DaemonSet. When a user applies a BroadcastJob, it creates a pod on each worker node of the cluster by default, and these pods are cleaned up automatically when the task is completed. Furthermore, Advanced CronJob can create BroadcastJobs periodically. To help you understand these features, this article demonstrates how to use Advanced CronJob and BroadcastJob to periodically clean up unused images stored on Kubernetes nodes.

    We deployed a kind cluster on an ECS instance (the host), and all kind nodes use containerd as the container runtime. The kind cluster consists of three nodes: one master node and two worker nodes.

    Before the demonstration, let's look at the disk pressure on the ECS host, so that we can compare it with the result afterwards:

    Filesystem      Size  Used Avail Use% Mounted on
    udev            7.7G     0  7.7G   0% /dev
    tmpfs           1.6G  1.4M  1.6G   1% /run
    /dev/vda1        79G   63G   13G  84% /
    tmpfs           7.7G     0  7.7G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
    tmpfs           1.6G     0  1.6G   0% /run/user/0
    overlay          79G   63G   13G  84% /var/lib/docker/overlay2/94e3ec1c3a45a43e4ffa34c654bc3639007eb2fb5d4e9724fed056c6bb8d119f/merged
    overlay          79G   63G   13G  84% /var/lib/docker/overlay2/7718d5a17be239ade398f907f82acf2c90fb7752a90a667114a573c60757d23b/merged
    overlay          79G   63G   13G  84% /var/lib/docker/overlay2/0f78036c619c03fb37ec8029e5718bb206472971169bb2711bee06af21228763/merged
    overlay          79G   63G   13G  84% /var/lib/docker/overlay2/029e008a7c5b754e4246c8fc55bf189c83a0b8b1df50c2ecb67d1734095b935b/merged
    overlay          79G   63G   13G  84% /var/lib/docker/overlay2/899a50ca07b4e2de08d627dbb1e6f1cc9e1eb0c048a71c4905854f31bf51f056/merged
    overlay          79G   63G   13G  84% /var/lib/docker/overlay2/c72de0669810b5dcbf4b2726c0c32765fbbb1e4c21826f59533414fb474c826a/merged
    overlay          79G   63G   13G  84% /var/lib/docker/overlay2/af8c22b65e7ae64f15f0132baed91550adfe81cd4e088e2bb84e01476619340a/merged
    overlay          79G   63G   13G  84% /var/lib/docker/overlay2/454a7e90cb3c723dc6b22b0d54e60714700b4c0bcf947b29206d882c6a2c25fe/merged
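
    To make the before/after comparison less dependent on reading the table by eye, the usage figure can be extracted from the df output. The following is a minimal sketch, not part of the original setup: the helper name `use_percent` and its default mount point are illustrative assumptions.

```shell
#!/bin/sh
# use_percent [MOUNT_POINT]
# Read df-style output on stdin and print the Use% value (without the
# percent sign) of the filesystem mounted on MOUNT_POINT (default: "/").
# The helper name is hypothetical; it assumes the mount point is the
# last column and Use% is the column just before it.
use_percent() {
    mount="${1:-/}"
    awk -v m="$mount" 'NR > 1 && $NF == m {
        sub(/%/, "", $(NF - 1))
        print $(NF - 1)
    }'
}

# Usage on a live host:
#   df -P | use_percent /
```

    Recording this value before and after the cleanup gives a single number to compare.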

    Also, let’s take a look at the images on the worker1 node. We can see that this node currently holds 124 images (the 125 lines counted by wc -l include the header row):

    root@kruise:~# docker exec -it worker1 /bin/sh
    $ crictl images | wc -l
    125
    $ crictl images
    REPOSITORY                   TAG       IMAGE ID        SIZE
    docker.io/minchou/cleaner    v1        7e36ca8e9d40    68.6MB
    docker.io/minchou/rollout    v0.7.3    120dc8c670ef    57MB
    docker.io/minchou/rollout    v0.7.2    2f1f320cd94a    57MB
    docker.io/minchou/rollout    v0.7.1    c90679a2e4ff    57MB
    docker.io/minchou/rollout    v0.7.0    a81db48ec891    57MB
    docker.io/minchou/rollout    v0.6.2    af5ef616c30e    55.9MB
    docker.io/minchou/rollout    v0.6.1    71ba2e84e92e    55.9MB
    docker.io/minchou/rollout    v0.6.0    3fe9eb8f0144    55.9MB
    ...
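
    Note that `wc -l` counts every line of the listing, including the header row. A minimal sketch of a count that skips the header, assuming the listing always starts with exactly one header line as in the output above; `count_images` is a hypothetical helper name:

```shell
#!/bin/sh
# count_images
# Read an image listing on stdin and print the number of image rows,
# skipping the first (header) line and any blank lines.
count_images() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Usage inside a node:
#   crictl images | count_images
```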

    job.yaml

    apiVersion: apps.kruise.io/v1alpha1
    kind: AdvancedCronJob
    metadata:
      name: acj-test
    spec:
      schedule: "*/5 * * * *"
      startingDeadlineSeconds: 60
      template:
        broadcastJobTemplate:
          spec:
            template:
              spec:
                containers:
                  - name: node-cleaner
                    image: minchou/cleaner:v1
                    imagePullPolicy: IfNotPresent
                    env:
                      # crictl uses this variable to find the container runtime socket.
                      # Its value should be consistent with the path of the mounted
                      # container runtime socket file.
                      - name: CONTAINER_RUNTIME_ENDPOINT
                        value: unix:///var/run/containerd/containerd.sock
                    volumeMounts:
                      # Mount the container runtime socket file at this path.
                      - name: containerd
                        mountPath: /var/run/containerd
                volumes:
                  - name: containerd
                    hostPath:
                      path: /var/run/containerd
                restartPolicy: OnFailure
            completionPolicy:
              type: Always
              ttlSecondsAfterFinished: 90
            failurePolicy:
              type: Continue
              restartLimit: 3

    Similarly, if your application logs are also written directly to a host path, you can mount that path in the same way and clean it up in the same job.

    To make the behavior of Advanced CronJob easier to observe, we set its schedule period to 5 minutes, that is, the schedule field is defined as "*/5 * * * *". In a real scenario, you would more likely clean up every few days or weeks rather than every 5 minutes. The schedule field uses the standard cron format, so you can customize it as needed.
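
    A schedule in this format has five whitespace-separated fields (minute, hour, day of month, month, day of week), so a stray space inside a field changes the field count and breaks the expression. The following sketch checks only the field count, not the validity of each field; `is_five_field_cron` is a hypothetical helper name.

```shell
#!/bin/sh
# is_five_field_cron EXPR
# Succeed (exit status 0) if EXPR has exactly five whitespace-separated
# fields. This is a sanity check only: it does not validate field values.
is_five_field_cron() {
    printf '%s\n' "$1" | awk 'NR == 1 { ok = (NF == 5) } END { exit (ok ? 0 : 1) }'
}

# Usage:
#   is_five_field_cron "*/5 * * * *" && echo "field count looks right"
```

    For example, "0 3 * * 0" (03:00 every Sunday) passes the check, while "* / 5 * * *" fails because the extra spaces split it into six fields.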

    File directory structure:

    To build the image faster, we downloaded the crictl release archive (crictl-v1.23.0-linux-amd64.tar.gz) in advance and put it in the same directory as the Dockerfile.

    Note: if you use this in production, please strictly verify your script!

    cleaner.sh

    #!/bin/sh
    echo "container runtime endpoint: $CONTAINER_RUNTIME_ENDPOINT"
    # Proceed only if crictl can talk to the container runtime.
    crictl ps > /dev/null
    if [ $? -eq 0 ]
    then
        # Implement your customized script here, such as:
        # Get the images that are in use; these images cannot be deleted.
        crictl ps | awk '{if(NR>1){print $2}}' > used-images.txt
        # @@ You can choose the images you want to clean according to your requirements @@
        # ** Here, we will clean all images from my docker.io/minchou repo! **
        crictl images | grep -i "docker.io/minchou" | awk '{print $3}' > target-images.txt
        # Filter out the used images and delete the unused ones.
        # One way to do this (an assumption; adapt it to your needs): listing
        # used-images.txt twice makes every used ID repeat, so uniq -u keeps
        # only the unused target IDs.
        sort target-images.txt used-images.txt used-images.txt | uniq -u | xargs -r crictl rmi
    else
        echo "crictl does not exist"
    fi
    exit 0
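
    The filtering step named in the script comments (filter out the used images, then delete what remains) is a set difference: the target list minus the used list. The following is a minimal sketch, assuming both files hold one image ID per line; `unused_images` is a hypothetical helper name and is not part of the original script.

```shell
#!/bin/sh
# unused_images TARGET_FILE USED_FILE
# Print the IDs listed in TARGET_FILE that do not appear in USED_FILE.
# The target list is de-duplicated first (one image ID can carry several
# tags). USED_FILE is then fed in twice, so every used ID occurs at least
# twice in the sorted stream and `uniq -u` (keep non-repeated lines only)
# drops it, leaving just the unused target IDs.
unused_images() {
    sort -u "$1" | sort - "$2" "$2" | uniq -u
}

# Usage (hypothetical file names matching the script above):
#   unused_images target-images.txt used-images.txt
```

    The resulting IDs can then be passed to `crictl rmi`, for example through xargs. Review the list before deleting anything on a production node.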

    Dockerfile Sample

    FROM alpine
    COPY crictl-v1.23.0-linux-amd64.tar.gz ./
    RUN tar zxvf crictl-v1.23.0-linux-amd64.tar.gz -C /bin && rm crictl-v1.23.0-linux-amd64.tar.gz
    COPY cleaner.sh /bin/
    RUN chmod +x /bin/cleaner.sh
    # The alpine base image ships /bin/sh but not bash, and cleaner.sh targets /bin/sh.
    CMD ["sh", "/bin/cleaner.sh"]

    Build and push the image:

    $ docker build . -t minchou/cleaner:v1 && docker push minchou/cleaner:v1

    Then apply the Advanced CronJob configuration:

    We can see in the kruise log that the next execution time is 2022-03-24 08:50:00 +0000 UTC:

    $ kubectl -n kruise-system logs kruise-controller-manager-745594ff76-9nwwx --tail 1000 | grep "no upcoming scheduled times, sleeping until next now"
    I0324 08:45:08.131928 1 advancedcronjob_broadcastjob_controller.go:290] no upcoming scheduled times, sleeping until next now 2022-03-24 08:45:08.131896998 +0000 UTC m=+535162.957711312 and next run 2022-03-24 08:50:00 +0000 UTC default/acj-test
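
    If you only want the timestamp, the next run time can be cut out of that log line. This is a minimal sketch that assumes the log message keeps the wording shown above ("... and next run <time> UTC <namespace>/<name>"), which may differ between Kruise versions; `next_run` is a hypothetical helper name.

```shell
#!/bin/sh
# next_run
# Read controller log lines on stdin and print the timestamp that follows
# "and next run" on each matching line. Lines without that text are ignored.
next_run() {
    sed -n 's/.* and next run \(.* UTC\) .*/\1/p'
}

# Usage (pod name taken from the command above):
#   kubectl -n kruise-system logs kruise-controller-manager-745594ff76-9nwwx --tail 1000 | next_run
```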

    When the time is up, the Advanced CronJob creates a BroadcastJob. Let’s take a look at the log of the pod that the BroadcastJob created on the worker1 node:

    $ kubectl logs acj-test-1648111800-8t8bx
    container runtime endpoint: unix:///var/run/containerd/containerd.sock
    Deleted: docker.io/minchou/rollout:v0.2.7
    Deleted: docker.io/minchou/rollout:v0.4.1
    Deleted: docker.io/minchou/rollout:v0.7.3
    Deleted: docker.io/minchou/rollout:br-5
    Deleted: docker.io/minchou/rollout:v0.4.2
    Deleted: docker.io/minchou/kruiserollout:br-f
    Deleted: docker.io/minchou/rollout:v0.7.2
    Deleted: docker.io/minchou/rollout:v0.4.0
    Deleted: docker.io/minchou/rollout:v0.3.8
    Deleted: docker.io/minchou/rollout:v0.3.0
    Deleted: docker.io/minchou/kruiserollout:br-2
    Deleted: docker.io/minchou/rollout:br-3
    ...

    We can see that the cleaner.sh script works: the target images have been deleted. Now let’s look again at the disk pressure on the ECS host:

    root@kruise011162126109:~# df -h
    Filesystem      Size  Used Avail Use% Mounted on
    udev            7.7G     0  7.7G   0% /dev
    tmpfs           1.6G  1.4M  1.6G   1% /run
    /dev/vda1        79G   44G   32G  59% /
    tmpfs           7.7G     0  7.7G   0% /dev/shm
    tmpfs           5.0M     0  5.0M   0% /run/lock
    tmpfs           7.7G     0  7.7G   0% /sys/fs/cgroup
    tmpfs           1.6G     0  1.6G   0% /run/user/0
    overlay          79G   44G   32G  59% /var/lib/docker/overlay2/94e3ec1c3a45a43e4ffa34c654bc3639007eb2fb5d4e9724fed056c6bb8d119f/merged
    overlay          79G   44G   32G  59% /var/lib/docker/overlay2/7718d5a17be239ade398f907f82acf2c90fb7752a90a667114a573c60757d23b/merged
    overlay          79G   44G   32G  59% /var/lib/docker/overlay2/0f78036c619c03fb37ec8029e5718bb206472971169bb2711bee06af21228763/merged
    overlay          79G   44G   32G  59% /var/lib/docker/overlay2/029e008a7c5b754e4246c8fc55bf189c83a0b8b1df50c2ecb67d1734095b935b/merged
    overlay          79G   44G   32G  59% /var/lib/docker/overlay2/899a50ca07b4e2de08d627dbb1e6f1cc9e1eb0c048a71c4905854f31bf51f056/merged
    overlay          79G   44G   32G  59% /var/lib/docker/overlay2/c72de0669810b5dcbf4b2726c0c32765fbbb1e4c21826f59533414fb474c826a/merged

    The disk usage has dropped from 84% to 59% (from 63G to 44G used), which is a significant reduction. Finally, the kruise log also shows that the next execution is indeed scheduled 5 minutes later (2022-03-24 08:55:00 +0000 UTC).

    From the above demonstration, we can see that Advanced CronJob + BroadcastJob + a customized script can help you periodically clean up unused images on nodes. Of course, this is just a simple example of node operation and maintenance. If you encounter similar problems, I hope this article helps and inspires you.