Safely Drain a Node
Your Kubernetes server must be at or later than version 1.5. To check the version, enter kubectl version.
This task also assumes that you have met the following prerequisites:
- You do not require your applications to be highly available during the node drain, or
- You have read about the PodDisruptionBudget concept, and have configured PodDisruptionBudgets for applications that need them.
(Optional) Configure a disruption budget
To ensure that your workloads remain available during maintenance, you can configure a PodDisruptionBudget.
If availability is important for any applications that run or could run on the node(s) that you are draining, configure a PodDisruptionBudget first and then continue following this guide.
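As a minimal sketch, the following creates a PodDisruptionBudget that keeps at least two replicas of a hypothetical application available during voluntary disruptions such as a drain; the name my-app-pdb and the label app: my-app are placeholders for your own workload:
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
  namespace: default
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
EOF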
You can use kubectl drain to safely evict all of your pods from a node before you perform maintenance on the node (e.g. kernel upgrade, hardware maintenance, etc.). Safe evictions allow the pod's containers to gracefully terminate and will respect the PodDisruptionBudgets you have specified.
Note: By default kubectl drain ignores certain system pods on the node that cannot be killed; see the kubectl drain documentation for more details.
When kubectl drain returns successfully, that indicates that all of the pods (except the ones excluded as described in the previous paragraph) have been safely evicted (respecting the desired graceful termination period, and respecting the PodDisruptionBudget you have defined). It is then safe to bring down the node by powering down its physical machine or, if running on a cloud platform, deleting its virtual machine.
Next, tell Kubernetes to drain the node:
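For example, a typical invocation looks like the following, where <node name> is a placeholder for the node you are draining; if DaemonSet-managed Pods are present on the node, you will typically also need the --ignore-daemonsets flag, since those Pods cannot be evicted:
kubectl drain <node name> --ignore-daemonsets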
Once it returns (without giving an error), you can power down the node (or equivalently, if on a cloud platform, delete the virtual machine backing the node). If you leave the node in the cluster during the maintenance operation, you need to run kubectl uncordon <node name> afterwards to tell Kubernetes that it can resume scheduling new pods onto the node.
Draining multiple nodes in parallel
The kubectl drain command should only be issued to a single node at a time. However, you can run multiple kubectl drain commands for different nodes in parallel, in different terminals or in the background. Multiple drain commands running concurrently will still respect the PodDisruptionBudget you specify.
For example, if you have a StatefulSet with three replicas and have set a PodDisruptionBudget for that set specifying minAvailable: 2, kubectl drain only evicts a pod from the StatefulSet if all three replicas are ready; if you then issue multiple drain commands in parallel, Kubernetes respects the PodDisruptionBudget and ensures that only 1 (calculated as replicas - minAvailable) Pod is unavailable at any given time. Any drains that would cause the number of ready replicas to fall below the specified budget are blocked.
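As a sketch, assuming two nodes named node-1 and node-2 (placeholder names), you could start both drains in the background from a single shell and wait for them to finish; any configured PodDisruptionBudget still limits how many Pods are evicted at once:
kubectl drain node-1 --ignore-daemonsets &
kubectl drain node-2 --ignore-daemonsets &
wait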
If you prefer not to use kubectl drain (such as to avoid calling out to an external command, or to get finer control over the pod eviction process), you can also programmatically cause evictions using the eviction API.
You should first be familiar with using Kubernetes language clients to access the API.
Note: policy/v1 Eviction is available in v1.22+. Use policy/v1beta1 with prior releases.
To attempt an eviction, you POST an Eviction object to the Pod's eviction subresource. For example:
{
  "apiVersion": "policy/v1",
  "kind": "Eviction",
  "metadata": {
    "name": "quux",
    "namespace": "default"
  }
}
Note: policy/v1beta1 Eviction is deprecated in v1.22 in favor of policy/v1.
You can attempt an eviction using curl:
curl -v -H 'Content-type: application/json' https://your-cluster-api-endpoint.example/api/v1/namespaces/default/pods/quux/eviction -d @eviction.json
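If you would rather not handle API server authentication in curl yourself, one option (a sketch, assuming local port 8001 is free) is to send the same request through kubectl proxy, which forwards it to the API server using your kubeconfig credentials:
kubectl proxy --port=8001 &
curl -v -H 'Content-type: application/json' http://127.0.0.1:8001/api/v1/namespaces/default/pods/quux/eviction -d @eviction.json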
The API can respond in one of three ways:
- If the eviction is granted, then the Pod is deleted as if you sent a DELETE request to the Pod's URL and received back 200 OK.
- If the current state of affairs wouldn't allow an eviction by the rules set forth in the budget, you get back 429 Too Many Requests. This is typically used for generic rate limiting of any requests, but here we mean that this request isn't allowed right now but it may be allowed later.
- If there is some kind of misconfiguration, for example multiple PodDisruptionBudgets that refer to the same Pod, you get a 500 Internal Server Error response.
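Because 429 means the eviction may be allowed later, callers typically retry. A minimal sketch (assuming the same eviction.json and cluster endpoint used above) that retries every five seconds while the API keeps returning 429:
until [ "$(curl -s -o /dev/null -w '%{http_code}' -H 'Content-type: application/json' https://your-cluster-api-endpoint.example/api/v1/namespaces/default/pods/quux/eviction -d @eviction.json)" != "429" ]; do
  sleep 5
done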
For a given eviction request, there are two cases:
- There is no budget that matches this pod. In this case, the server always returns 200 OK.
- There is at least one budget. In this case, any of the three above responses may apply.
Stuck evictions
In some cases, an application may reach a broken state, one where unless you intervene the eviction API will never return anything other than 429 or 500.
For example: this can happen if a ReplicaSet is creating Pods for your application but the replacement Pods do not become Ready. You can also see similar symptoms if the last Pod evicted has a very long termination grace period. In this case, there are two potential solutions:
- Abort or pause the automated operation. Investigate the reason for the stuck application, and restart the automation.
- After a suitably long wait, DELETE the Pod from your cluster's control plane, instead of using the eviction API (see the sketch below).
Kubernetes does not specify what the behavior should be in this case; it is up to the application owners and cluster owners to establish an agreement on behavior in these cases.
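If you do choose to delete the Pod directly, a sketch of that (with <pod name> and <namespace> as placeholders) is a plain kubectl delete, which bypasses the PodDisruptionBudget checks that the eviction API performs:
kubectl delete pod <pod name> --namespace <namespace>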
- Follow steps to protect your application by configuring a PodDisruptionBudget.