For a list of requirements for your cluster, including the requirements for OS/Docker, hardware, and networking, refer to the section on node requirements.

For a full list of all the best practices that we recommend, refer to the best practices section.

  • Make sure your nodes fulfill all of the node requirements, including the port requirements.

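Back up etcd

  • Enable etcd snapshots. Verify that snapshots are being created, and run a disaster recovery scenario to confirm that the snapshots are valid.

Cluster Architecture
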
  • Nodes should have one of the following role configurations:
    • etcd
    • controlplane
    • etcd and controlplane
    • worker (the role should not be used or added on nodes with the etcd or controlplane role)
  • Have at least three nodes with the etcd role so the cluster can survive losing one node. Increase this count for higher node fault tolerance, and spread the etcd nodes across availability zones for even better fault tolerance.
  • Assign two or more nodes the worker role so that workloads can be rescheduled when a node fails.
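
To make the role and node-count guidance concrete, the sketch below checks a hypothetical cluster plan, written as a Python dict that maps node names to role sets, against the rules above. The plan format and the check_cluster and etcd_fault_tolerance helpers are illustrative assumptions, not a Rancher API. The fault-tolerance calculation follows etcd's majority quorum: n members need floor(n/2) + 1 of them available, so three etcd nodes tolerate one failure and five tolerate two.

```python
from collections import Counter

# Role combinations the checklist permits; frozensets make the check order-independent.
# This mirrors the checklist above and is an illustrative assumption, not a Rancher API.
ALLOWED_ROLE_SETS = {
    frozenset({"etcd"}),
    frozenset({"controlplane"}),
    frozenset({"etcd", "controlplane"}),
    frozenset({"worker"}),
}


def etcd_fault_tolerance(members: int) -> int:
    """Number of etcd members that can fail while a majority (quorum) remains."""
    if members <= 0:
        return 0
    quorum = members // 2 + 1
    return members - quorum


def check_cluster(nodes: dict[str, set[str]]) -> list[str]:
    """Return checklist violations for a hypothetical {node_name: roles} plan."""
    problems = []
    for name, roles in nodes.items():
        if frozenset(roles) not in ALLOWED_ROLE_SETS:
            problems.append(f"{name}: unsupported role combination {sorted(roles)}")

    # Count how many nodes carry each role (a node with two roles counts toward both).
    counts = Counter(role for roles in nodes.values() for role in roles)

    if etcd_fault_tolerance(counts["etcd"]) < 1:
        problems.append(
            f"only {counts['etcd']} etcd node(s); need at least 3 to survive losing one"
        )
    if counts["worker"] < 2:
        problems.append(
            f"only {counts['worker']} worker node(s); need at least 2 for rescheduling"
        )
    return problems
```

For example, a plan with two etcd-and-controlplane nodes, one etcd-only node, and two worker nodes produces an empty list. A single node carrying both etcd and worker produces three violations: an unsupported role combination, too few etcd nodes, and too few workers.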

For more information about the number of nodes for each Kubernetes role, refer to the section on recommended architecture.

Logging and Monitoring

  • Configure alerts and notifiers for Kubernetes components (System Service).
  • Configure logging for cluster analysis and post-mortems.
  • Perform load tests on your cluster to verify that its hardware can support your workloads.

Networking

  • Minimize network latency. Rancher recommends minimizing latency between the etcd nodes in particular. The default setting for heartbeat-interval is 500 ms, and the default setting for election-timeout is 5000 ms. These settings allow etcd to run in most networks, except those with very high latency.
  • Cluster nodes should be located within a single region. Most cloud providers offer multiple availability zones within a region, which you can use to give your cluster higher availability. Spreading nodes of any role across multiple availability zones is fine. If you are using Kubernetes Cloud Provider resources, consult the provider's documentation for any restrictions (e.g., zone storage restrictions).
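
As a rough way to judge whether the default etcd timing values suit your network, the sketch below compares them with a measured round-trip time (RTT) between etcd nodes. The multipliers it uses are assumptions drawn from the upstream etcd tuning guidance: a heartbeat interval of at least about 0.5x the RTT, an election timeout of at least about 10x the RTT, and an election timeout of at least 5x the heartbeat interval. The function names are hypothetical.

```python
# Rules of thumb based on the upstream etcd tuning guidance; treat the exact
# factors as assumptions and check that guide before changing production flags.
HEARTBEAT_MIN_RTT_FACTOR = 0.5      # heartbeat interval should be at least ~0.5x the RTT
ELECTION_MIN_RTT_FACTOR = 10        # election timeout should be at least ~10x the RTT
ELECTION_MIN_HEARTBEAT_FACTOR = 5   # election timeout should be at least 5x the heartbeat


def max_supported_rtt_ms(heartbeat_ms: float = 500, election_ms: float = 5000) -> float:
    """Largest member-to-member RTT (ms) these settings tolerate under the heuristics."""
    return min(
        heartbeat_ms / HEARTBEAT_MIN_RTT_FACTOR,
        election_ms / ELECTION_MIN_RTT_FACTOR,
    )


def check_etcd_timing(
    rtt_ms: float, heartbeat_ms: float = 500, election_ms: float = 5000
) -> list[str]:
    """Return warnings when the timing flags look too tight for the measured RTT."""
    problems = []
    if election_ms < ELECTION_MIN_HEARTBEAT_FACTOR * heartbeat_ms:
        problems.append(
            f"election-timeout {election_ms} ms is below "
            f"{ELECTION_MIN_HEARTBEAT_FACTOR}x heartbeat-interval {heartbeat_ms} ms"
        )
    if heartbeat_ms < HEARTBEAT_MIN_RTT_FACTOR * rtt_ms:
        problems.append(f"heartbeat-interval {heartbeat_ms} ms is short for an RTT of {rtt_ms} ms")
    if election_ms < ELECTION_MIN_RTT_FACTOR * rtt_ms:
        problems.append(f"election-timeout {election_ms} ms is short for an RTT of {rtt_ms} ms")
    return problems
```

With the defaults, max_supported_rtt_ms() returns 500.0. Under these heuristics, only RTTs above roughly 500 ms between etcd nodes would call for raising the timing values, which is consistent with the defaults working in all but very high-latency networks.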