For a list of requirements for your cluster, including the requirements for OS/Docker, hardware, and networking, refer to the section on node requirements.
For a full list of all the best practices that we recommend, refer to the best practices section.
- Make sure your nodes fulfill all of the node requirements, including the port requirements.
Back up etcd
- Enable etcd snapshots. Verify that snapshots are being created, and run a disaster recovery scenario to verify that the snapshots are valid.
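As an illustration (assuming an RKE-provisioned cluster), recurring snapshots can be enabled under the etcd service in cluster.yml; the interval and retention values below are placeholder examples, not recommendations from this document.

```yaml
# cluster.yml (RKE) - recurring etcd snapshots; values are placeholder examples
services:
  etcd:
    backup_config:
      interval_hours: 12   # take a snapshot every 12 hours (example value)
      retention: 6         # keep the 6 most recent snapshots (example value)
```

A one-off snapshot and a restore drill can then be exercised with `rke etcd snapshot-save` and `rke etcd snapshot-restore` against the same cluster.yml.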
Cluster Architecture
- Nodes should have one of the following role configurations (an example node layout is sketched after this list):
  - controlplane
  - etcd
  - etcd and controlplane
  - worker (the worker role should not be used or added on nodes with the etcd or controlplane role)
- Have at least three nodes with the etcd role to survive losing one node. Increase this count for higher node fault tolerance, and spread them across (availability) zones to provide even better fault tolerance.
- Assign two or more nodes the worker role for workload rescheduling upon node failure.
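To illustrate these role assignments (again assuming an RKE-provisioned cluster), the nodes section of cluster.yml could look like the sketch below; the addresses, SSH user, and zone placement are placeholders.

```yaml
# cluster.yml (RKE) - example node layout; addresses and user are placeholders
nodes:
  - address: 10.0.1.10          # etcd + controlplane node in zone a
    user: ubuntu
    role: [etcd, controlplane]
  - address: 10.0.2.10          # etcd + controlplane node in zone b
    user: ubuntu
    role: [etcd, controlplane]
  - address: 10.0.3.10          # etcd + controlplane node in zone c
    user: ubuntu
    role: [etcd, controlplane]
  - address: 10.0.1.20          # worker node (no etcd or controlplane role)
    user: ubuntu
    role: [worker]
  - address: 10.0.2.20          # second worker node for rescheduling headroom
    user: ubuntu
    role: [worker]
```

Three etcd nodes spread across zones tolerate the loss of one node, and the two workers leave room to reschedule workloads if one of them fails.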
For more information about the number of nodes for each Kubernetes role, refer to the section on recommended architecture.
Logging and Monitoring
- Configure alerts/notifiers for Kubernetes components (System Service); a sample alert rule is sketched after this list.
- Configure logging for cluster analysis and post-mortems.
- Perform load tests on your cluster to verify that its hardware can support your workloads.
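For the alerting item above, a minimal sketch of an alert on a Kubernetes component is shown below, assuming the cluster's monitoring is Prometheus-based with the Prometheus Operator installed; the namespace, job label, and thresholds are assumptions that depend on how monitoring is actually set up.

```yaml
# PrometheusRule sketch - assumes the Prometheus Operator is installed and
# that kube-scheduler metrics are scraped under the job label "kube-scheduler"
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kube-component-alerts
  namespace: monitoring            # hypothetical namespace
spec:
  groups:
    - name: kubernetes-components
      rules:
        - alert: KubeSchedulerDown
          expr: absent(up{job="kube-scheduler"} == 1)
          for: 10m
          labels:
            severity: critical
          annotations:
            summary: kube-scheduler has been unreachable for 10 minutes.
```

A notifier (email, Slack, PagerDuty, and so on) should then be wired to the alerting backend so that firing alerts actually reach an on-call person.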
Networking
- Minimize network latency. Rancher recommends minimizing latency between the etcd nodes. The default setting for heartbeat-interval is 500, and the default setting for election-timeout is 5000. These settings allow etcd to run in most networks, except really high-latency networks; a sketch of how to adjust them follows this list.
- Cluster nodes should be located within a single region. Most cloud providers provide multiple availability zones within a region, which can be used to create higher availability for your cluster. Using multiple availability zones is fine for nodes with any role. If you are using Kubernetes Cloud Provider resources, consult the documentation for any restrictions (e.g., zone storage restrictions).
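If latency between etcd nodes is genuinely higher and these values need tuning, one way to pass the flags in an RKE cluster is through extra_args on the etcd service, as sketched below; the values shown are the defaults quoted above, and appropriate settings depend on the measured round-trip time between nodes.

```yaml
# cluster.yml (RKE) - etcd timing flags; values shown are the defaults quoted above
services:
  etcd:
    extra_args:
      heartbeat-interval: 500    # milliseconds between leader heartbeats
      election-timeout: 5000     # milliseconds before followers start a new election
```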