Overview
To keep the picture simple, most of the HA details are skipped here in order to describe only the Kubelet <-> Controller Manager communication.
By default, the normal behavior looks like this:

- Kubelet updates its node status to the API server every --node-status-update-frequency. The default value is 10s.
- Kubernetes controller manager checks the Kubelet statuses every --node-monitor-period. The default value is 5s.
- If the status was updated within --node-monitor-grace-period, the Kubernetes controller manager considers the Kubelet healthy. The default value is 40s.
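Expressed as command-line flags, the defaults above correspond to the following (a minimal sketch; all unrelated flags are omitted):

```sh
# Kubelet: how often the node posts its status to the API server.
kubelet --node-status-update-frequency=10s

# Controller manager: how often statuses are checked, how stale a status may
# be before the node is considered unhealthy, and how long until its pods are
# evicted (the 5m --pod-eviction-timeout default is cited in the scenarios below).
kube-controller-manager \
  --node-monitor-period=5s \
  --node-monitor-grace-period=40s \
  --pod-eviction-timeout=5m
```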
Failure
Kubelet will try to update the status in the tryUpdateNodeStatus function. Kubelet uses Golang's http.Client() with no specified timeout, so there may be some glitches when the API Server is overloaded while the TCP connection is being established.

So, there will be nodeStatusUpdateRetry (a constant in the Kubelet source, 5) attempts within each --node-status-update-frequency interval to set the status of the node.
At the same time, the Kubernetes controller manager will try to check nodeStatusUpdateRetry times every --node-monitor-period. After --node-monitor-grace-period it will consider the node unhealthy, and will then remove its pods based on --pod-eviction-timeout.
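Putting the two timers together: once updates stop, the node is marked unhealthy after --node-monitor-grace-period, and its pods are evicted --pod-eviction-timeout later. A back-of-the-envelope sketch with the default values (all numbers in seconds):

```sh
GRACE=40              # --node-monitor-grace-period
EVICTION_TIMEOUT=300  # --pod-eviction-timeout (5m)

echo "node considered unhealthy after ~${GRACE}s"
echo "pods evicted after ~$((GRACE + EVICTION_TIMEOUT))s in total"
```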
Kube proxy has a watcher over the API. Once pods are evicted, Kube proxy will notice and will update the iptables of the node, removing the endpoints from services so that pods from the failed node are no longer accessible.
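One way to observe this is to watch a service's endpoints while a node fails; once the pods are evicted, their addresses disappear from the list (the service name below is a placeholder):

```sh
# "my-service" is a hypothetical name; substitute a real service.
kubectl get endpoints my-service --watch
```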
Recommendations for different cases
Suppose --node-status-update-frequency is set to 4s (10s is default), --node-monitor-period to 2s (5s is default), --node-monitor-grace-period to 20s (40s is default), and --pod-eviction-timeout to 30s (5m is default).
In such a scenario, pods will be evicted in 50s: the node will be considered down after 20s, and --pod-eviction-timeout expires 30s later. However, this scenario creates an overhead on etcd, as every node will try to update its status every 2 seconds.
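For reference, that fast-reaction setup as flags (a sketch showing only the flags discussed here):

```sh
kubelet --node-status-update-frequency=4s

kube-controller-manager \
  --node-monitor-period=2s \
  --node-monitor-grace-period=20s \
  --pod-eviction-timeout=30s
```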
Now let's set --node-status-update-frequency to 20s, --node-monitor-grace-period to 2m, and --pod-eviction-timeout to 1m. In that case, Kubelet will try to update the status every 20s, so there will be 6 * 5 = 30 attempts (6 update periods within the 2m grace period, times nodeStatusUpdateRetry = 5 retries each) before the Kubernetes controller manager considers the node unhealthy. After 1m more it will evict all pods. The total time before the eviction process starts will be 3m.
Such a scenario is good for medium environments, as 1000 nodes will require 3000 etcd updates per minute.
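As flags (again a sketch; --node-monitor-period stays at its 5s default):

```sh
kubelet --node-status-update-frequency=20s

kube-controller-manager \
  --node-monitor-grace-period=2m \
  --pod-eviction-timeout=1m
```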
Finally, let's set --node-status-update-frequency to 1m, --node-monitor-grace-period to 5m, and --pod-eviction-timeout to 1m. In this scenario, every kubelet will try to update the status every minute, and there will be 5 * 5 = 25 attempts (5 update periods within the 5m grace period, times 5 retries each) before the node is considered unhealthy. After 5m, the Kubernetes controller manager will set the unhealthy status, and pods will be evicted 1m after the node is marked unhealthy (6m in total).
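And the corresponding sketch for this slow-reaction setup:

```sh
kubelet --node-status-update-frequency=1m

kube-controller-manager \
  --node-monitor-grace-period=5m \
  --pod-eviction-timeout=1m
```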