Managing Resources
Note that practically all controllers consume:
- CPU: largely based on the number of reconciliations they perform, which generally depends on event activity for the resources they watch.
- Memory: largely based on the number of primary resources that exist (multiplied by some factor based on the number of operand resources they need to watch as a result), because these objects are held in informer caches.
There is also a concern that a single Pod or Container could monopolize all available resources, so cluster admins must consider the effect one Pod or Container may have on other components.
To prevent a container from consuming all the resources on a cluster or from preventing other workloads from being scheduled, many production clusters define ResourceQuota configurations.
These quotas also apply to tenant workloads managed by your Operator. Cluster administrators typically set a ResourceQuota for each tenant's namespace as part of onboarding. If a LimitRange with default values has not been created in each namespace, and your Operator creates Containers in the tenant namespace without specifying at least CPU and Memory resource requests for its Pods, the quota system may reject Pod creation. The Kubernetes documentation on resource quotas states that when quota is enabled in a namespace for compute resources such as cpu and memory, users must specify requests or limits for those values; otherwise, the quota system may reject Pod creation.
To support clusters configured this way, ensure safe operation, and avoid negatively impacting other workloads, Operators should always include reasonable memory and CPU resource requests both for their own Deployment and for the operands they deploy.
HINT Cluster admins may also be able to avoid the above scenario by creating a LimitRange that sets default values for each Pod and/or Container in a namespace when they are not specified.
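To make the interaction between ResourceQuota and LimitRange more concrete, here is a minimal Go sketch. It is a simplified model, not the real admission plugins: the Container type, the integer quantities (millicores for cpu, MiB for memory), and the functions quotaAdmits and applyLimitRangeDefaults are all hypothetical, and the error text does not reproduce what the API server returns.

```go
package main

import "fmt"

// Container is a hypothetical, simplified stand-in for a container's resource
// settings. Quantities are plain integers (millicores for "cpu", MiB for
// "memory") so the sketch needs no quantity parsing.
type Container struct {
	Name     string
	Requests map[string]int64
	Limits   map[string]int64
}

// quotaAdmits is a simplified model of the ResourceQuota rule: if the quota
// tracks a compute resource, every container must declare a request for it,
// or at least a limit (a limit alone is defaulted into the request).
func quotaAdmits(tracked []string, containers []Container) error {
	for _, c := range containers {
		for _, r := range tracked {
			_, hasReq := c.Requests[r]
			_, hasLim := c.Limits[r]
			if !hasReq && !hasLim {
				return fmt.Errorf("container %q: no request or limit for %s", c.Name, r)
			}
		}
	}
	return nil
}

// applyLimitRangeDefaults mimics, in simplified form, a LimitRange with
// defaultRequest values: requests the container left unset are filled in.
func applyLimitRangeDefaults(defaultRequest map[string]int64, c Container) Container {
	out := Container{Name: c.Name, Requests: map[string]int64{}, Limits: c.Limits}
	for r, v := range c.Requests {
		out.Requests[r] = v
	}
	for r, v := range defaultRequest {
		if _, ok := out.Requests[r]; !ok {
			out.Requests[r] = v
		}
	}
	return out
}

func main() {
	tracked := []string{"cpu", "memory"}
	// An operand container created by the Operator without any requests.
	bare := Container{Name: "operand"}

	// No LimitRange in the namespace: the quota rejects the Pod.
	fmt.Println(quotaAdmits(tracked, []Container{bare}))
	// prints: container "operand": no request or limit for cpu

	// With LimitRange defaults applied first, the same container is admitted.
	defaulted := applyLimitRangeDefaults(map[string]int64{"cpu": 100, "memory": 128}, bare)
	fmt.Println(quotaAdmits(tracked, []Container{defaulted}))
	// prints: <nil>
}
```

Setting explicit requests in the Operator, as recommended above, makes the Pod admissible whether or not the namespace has a LimitRange.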
Resource requests and limits for the Operator Deployment can be defined by modifying the resources section of the manager container in config/manager/manager.yaml.
IMPORTANT: No single configuration fits every scenario. Therefore, Operator authors MUST ensure that cluster admins and users can change the resource requests/limits of the Operator/manager and of its Operands.
However, you can benchmark your Operator by monitoring its resource usage to arrive at reasonable values for the general case. Kubebuilder and Operator SDK provide some metrics that can help with this.
NOTE If the project was scaffolded by Kubebuilder or Operator SDK, some default values for the Operator/manager (see config/manager/manager.yaml) are populated to get you started. However, you should tune them based on your own tests and the specific needs of your Operator.
If your operator is managed by OLM, administrators or users can configure your operator’s resource requests and limits via the subscription.
The following are some general recommendations for managing resources:
- MUST declare resource requests for both CPU and Memory for the Operator Pod and any Pod/Deployment managed by it.
- OPTIONALLY set resource limits for CPU and Memory for the Operator Pod and any Pod/Deployment managed by it.
- SHOULD provide mechanisms for monitoring so that cluster admins can use the resulting metrics to monitor and resize the Operator and its Operands.
  CAVEAT: If the Operator is integrated with OLM and the bundle contains a PodMonitor or a ServiceMonitor, the complete InstallPlan will fail on a cluster that does not have these CRDs/the Prometheus operator installed. In this case, you might want to declare the requirement as an OLM dependency or make it clear to the Operator's consumers.
- SHOULD allow admins to customize the requests/limits amounts defined for the Pod/Deployment created by the Operator, rather than hardcoding these values.
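The last recommendation can be sketched in Go as follows, assuming the Operator's custom resource exposes a hypothetical spec.resources field. The ResourceList and Requirements types are simplified stand-ins for the Kubernetes resource types, and the default values are placeholders, not recommendations.

```go
package main

import "fmt"

// ResourceList is a simplified stand-in for a resource list:
// resource name -> quantity string (e.g. "cpu" -> "100m").
type ResourceList map[string]string

// Requirements mirrors the requests/limits pair of a container.
type Requirements struct {
	Requests ResourceList
	Limits   ResourceList
}

// defaultOperandResources holds hypothetical fallback values that the Operator
// uses only when the custom resource does not override them.
var defaultOperandResources = Requirements{
	Requests: ResourceList{"cpu": "100m", "memory": "128Mi"},
	Limits:   ResourceList{"memory": "256Mi"},
}

// merge returns a new list with the defaults overlaid by user-supplied values,
// leaving the defaults themselves untouched.
func merge(defaults, override ResourceList) ResourceList {
	out := ResourceList{}
	for k, v := range defaults {
		out[k] = v
	}
	for k, v := range override {
		out[k] = v
	}
	return out
}

// operandResources computes the requirements for an operand container from the
// (hypothetical) spec.resources values set by an admin on the custom resource.
func operandResources(fromCR Requirements) Requirements {
	return Requirements{
		Requests: merge(defaultOperandResources.Requests, fromCR.Requests),
		Limits:   merge(defaultOperandResources.Limits, fromCR.Limits),
	}
}

func main() {
	// The admin only raises the CPU request; everything else keeps its default.
	r := operandResources(Requirements{Requests: ResourceList{"cpu": "250m"}})
	fmt.Println(r.Requests["cpu"], r.Requests["memory"], r.Limits["memory"])
	// prints: 250m 128Mi 256Mi
}
```

A real Operator would typically also validate the merged result, for example rejecting a request that exceeds its limit, before applying it to the operand Deployment.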
Resource Requests
What happens when the resource requests are not set?
- Configurations made by cluster administrators, such as ResourceQuota, might not work without LimitRanges. The LimitRanger admission controller can provide default values for resource requests and limits when they have not been defined.
- The Operator's consumers might face resource shortages on a node when resource usage increases, for example during a daily peak in request rate.
- The Operator's consumers might be unable to deploy the Operator successfully because the minimal resources it needs are not available.
- The scheduler cannot make an informed placement choice when it picks the nodes the Operator Pods will run on.
- When there is memory contention on the node, the Pod is likely to be either evicted or OOM killed.
- When there is CPU contention on the node, the Pod is likely to be starved of CPU cycles, making the Operator unresponsive.
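The placement problem can be illustrated with a simplified Go model of the scheduler's resource-fit filter, which compares requests, not actual usage, against a node's allocatable capacity. Node and fits are hypothetical names, and only CPU (in millicores) is modeled.

```go
package main

import "fmt"

// Node is a simplified view of a node: allocatable CPU (millicores) and the
// CPU already requested by the Pods placed on it.
type Node struct {
	Name           string
	AllocatableCPU int64
	RequestedCPU   int64
}

// fits is a simplified version of the scheduler's resource-fit check: it only
// compares requests against the remaining allocatable capacity.
func fits(n Node, podRequestCPU int64) bool {
	return n.RequestedCPU+podRequestCPU <= n.AllocatableCPU
}

func main() {
	busy := Node{Name: "busy", AllocatableCPU: 2000, RequestedCPU: 2000}

	// A Pod without a CPU request counts as requesting 0, so it "fits" even on
	// a node whose CPU is fully committed, where it may later be starved.
	fmt.Println(fits(busy, 0))   // true
	fmt.Println(fits(busy, 100)) // false
}
```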
Resource Limits
What happens when the resource limits are not set?
A popular practice among cluster administrators is to use ResourceQuota to limit the total amount of resources that can be requested or consumed in a single namespace. This can protect against over-consumption of resources by the operands of a faulty or misconfigured Operator. On the other hand, it also means that the Operator may be unable to create additional Pods once the quota has been reached, limiting its functionality.
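A minimal Go sketch of a namespace quota that caps total memory requests, assuming a hypothetical namespaceQuota type. It shows how the same cap that stops runaway operands also blocks new Pods once it is reached; the error message is illustrative and does not reproduce the API server's.

```go
package main

import "fmt"

// namespaceQuota is a simplified model of a ResourceQuota with a hard cap on
// the total memory (MiB) requested by all Pods in a namespace.
type namespaceQuota struct {
	HardMemMiB int64
	UsedMemMiB int64
}

// tryCreatePod admits a Pod only if the namespace total stays within the cap;
// rejected Pods do not count towards usage.
func (q *namespaceQuota) tryCreatePod(reqMemMiB int64) error {
	if q.UsedMemMiB+reqMemMiB > q.HardMemMiB {
		return fmt.Errorf("quota exceeded: %dMi used + %dMi requested > %dMi hard",
			q.UsedMemMiB, reqMemMiB, q.HardMemMiB)
	}
	q.UsedMemMiB += reqMemMiB
	return nil
}

func main() {
	q := &namespaceQuota{HardMemMiB: 512}
	// The Operator tries to create five operand Pods requesting 128Mi each.
	for i := 1; i <= 5; i++ {
		if err := q.tryCreatePod(128); err != nil {
			fmt.Printf("pod %d rejected: %v\n", i, err)
			continue
		}
		fmt.Printf("pod %d created\n", i)
	}
}
```

Pods 1 to 4 are created and pod 5 is rejected, which is the point where the Operator's functionality becomes limited.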
You might also want to check the related sections of the Kubernetes documentation.
Limits reached
What happens when the resource limits have been reached?
For Memory: the container might be terminated with the reason OOMKilled. If it is restartable, the kubelet will restart it, as with any other type of runtime failure.
You might want to check the Troubleshooting section of the Kubernetes documentation to better understand how to debug these scenarios.
Limits are specified but not requests
What happens when the resource limits are defined but not the requests?
If you specify a CPU or Memory limit for a Container but do not specify a request, Kubernetes automatically assigns a CPU or Memory request that matches the limit. As a result, you will always request the full limit and may allocate more resources than required. (NOT RECOMMENDED)
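The defaulting rule described above can be expressed as a short Go function. The function name and the integer quantities (millicores and MiB) are simplifications for illustration, not the API server's code.

```go
package main

import "fmt"

// defaultRequestsFromLimits mirrors the documented Kubernetes behaviour: for
// each resource that has a limit but no request, the request is set to the
// limit. Explicit requests are kept as they are.
func defaultRequestsFromLimits(requests, limits map[string]int64) map[string]int64 {
	out := map[string]int64{}
	for r, v := range requests {
		out[r] = v
	}
	for r, lim := range limits {
		if _, ok := out[r]; !ok {
			out[r] = lim
		}
	}
	return out
}

func main() {
	limits := map[string]int64{"cpu": 500, "memory": 256}

	// Only limits set: effective requests equal the limits, so the scheduler
	// reserves 500m CPU and 256Mi even if actual usage is far lower.
	fmt.Println(defaultRequestsFromLimits(nil, limits))
	// prints: map[cpu:500 memory:256]

	// Explicit, smaller requests keep the limits as a ceiling without over-reserving.
	fmt.Println(defaultRequestsFromLimits(map[string]int64{"cpu": 50, "memory": 64}, limits))
	// prints: map[cpu:50 memory:64]
}
```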
Values are too big
Memory and CPU requests and limits are associated with Containers, but be aware that a Pod's Memory and CPU requests and limits are the sum of the corresponding values for all the Containers in the Pod.
If your Pods request too much Memory or CPU, you will not only unnecessarily allocate and block more resources than needed; your Operator's consumers might also be able to install your Operator (via OLM, for example) but be unable to get the Pods/Deployment running when the requested amount exceeds the available capacity. In these scenarios, consumers will see that the Pod(s) failed to schedule, with event errors such as Insufficient cpu and/or Insufficient memory.
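The following Go sketch shows how container requests add up to a Pod request and why an oversized total produces an Insufficient cpu event. It assumes simplified integer quantities and ignores init containers and Pod overhead, which also influence the real calculation; containerReq, podRequest and insufficient are hypothetical names.

```go
package main

import "fmt"

// containerReq is a simplified container request (CPU in millicores, memory in MiB).
type containerReq struct {
	CPUMilli int64
	MemMiB   int64
}

// podRequest sums the requests of all regular containers in the Pod.
// Init containers and Pod overhead are ignored in this sketch.
func podRequest(cs []containerReq) containerReq {
	var total containerReq
	for _, c := range cs {
		total.CPUMilli += c.CPUMilli
		total.MemMiB += c.MemMiB
	}
	return total
}

// insufficient reports which resources exceed a node's free capacity, in the
// spirit of the scheduler's "Insufficient cpu/memory" events.
func insufficient(pod, free containerReq) []string {
	var out []string
	if pod.CPUMilli > free.CPUMilli {
		out = append(out, "Insufficient cpu")
	}
	if pod.MemMiB > free.MemMiB {
		out = append(out, "Insufficient memory")
	}
	return out
}

func main() {
	// Hypothetical Pod: a manager container plus a sidecar.
	pod := podRequest([]containerReq{{CPUMilli: 1500, MemMiB: 2048}, {CPUMilli: 1000, MemMiB: 256}})
	free := containerReq{CPUMilli: 2000, MemMiB: 4096}
	fmt.Println(pod, insufficient(pod, free))
	// prints: {2500 2304} [Insufficient cpu]
}
```

Neither container alone exceeds the node's free CPU, but their 2500m sum does, so the Pod stays Pending.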