As a critical building block for distributed systems it is crucial to perform adequate capacity planning in order to support the intended cluster workload. As a highly available and strongly consistent data store increasing the number of nodes in an etcd cluster will generally affect performance adversely. This makes sense intuitively, as more nodes means more members for the leader to coordinate state across. The most direct way to increase throughput and decrease latency of an etcd cluster is allocate more disk I/O, network I/O, CPU, and memory to cluster members. In the event it is impossible to temporarily divert incoming requests to the cluster, scaling the EC2 instances which comprise the etcd cluster members one at a time may improve performance. It is, however, best to avoid bottlenecks through capacity planning.

The etcd team has produced a hardware recommendation guide which is very useful for “ballparking” how many nodes and what instance type are necessary for a cluster.

AWS provides a service for creating groups of EC2 instances which are dynamically sized to match load on the instances. Using an Auto Scaling Group ( ) to dynamically scale an etcd cluster is not recommended for several reasons including:

  • etcd performance is generally inversely proportional to the number of members in a cluster due to the synchronous replication which provides strong consistency of data stored in etcd
  • the operational complexity of adding lifecycle hooks to properly add and remove members from an etcd cluster by modifying the

Auto Scaling Groups do provide a number of benefits besides cluster scaling which include:

  • distribution of EC2 instances across Availability Zones (AZs)
  • EC2 instance fail over across AZs
  • consolidated monitoring and life cycle control of instances within an ASG

The use of an ASG to create a self healing etcd cluster is one of the design considerations when deploying an etcd cluster to AWS.

Cluster design

  • block device provider: limited to the tradeoffs between EBS or instance storage (InstanceStore)
  • cluster topology: how many nodes should make up an etcd cluster; should these nodes be distributed over multiple AZs

The intended cluster workload should dictate the cluster design. A configuration store for microservices may require different design considerations than a distributed lock service, a secrets store, or a Kubernetes control plane. Cluster design tradeoffs include considerations such as:

  • availability
  • data durability after member failure
  • performance/throughput
  • self healing

Instance availability on AWS is ultimately determined by the Amazon EC2 Region Service Level Agreement ( ) which is the policy by which Amazon describes their precise definition of a regional outage.

In the context of an etcd cluster this means a cluster must contain a minimum of three members where EC2 instances are spread across at least two AZs in order for an etcd cluster to be considered highly available at a Regional level.

For most use cases the additional latency associated with a cluster spanning across Availability Zones will introduce a negligible performance impact.

Availability considerations apply to all components of an application; if the application which accesses the etcd cluster will only be deployed to a single Availability Zone it may not make sense to make the etcd cluster highly available across zones.

  • replication: etcd replicates all data to all members of the etcd cluster. Therefore, given more members in the cluster and more independent failure domains, the less likely that data stored in an etcd cluster will be permanently lost in the event of disaster.
  • Point in time etcd snapshotting: the etcd v3 API introduced support for snapshotting clusters. The operation is cheap enough (completing in the order of minutes) to run quite frequently and the resulting archives can be archived in a storage service like Amazon Simple Storage Service (S3).
  • Amazon Elastic Block Storage (EBS): an EBS volume is a replicated network attached block device which have stronger storage safety guarantees than InstanceStore which has a life cycle associated with the life cycle of the attached EC2 instance. The life cycle of an EBS volume is not necessarily tied to an EC2 instance and can be detached and snapshotted independently which means that a single node etcd cluster backed by an EBS volume can provide a fairly reasonable level of data durability.

The performance of an etcd cluster is roughly quantifiable through latency and throughput metrics which are primarily affected by disk and network performance. Detailed performance planning information is provided in the performance section of the etcd operations guide.

Network

AWS offers EC2 Placement Groups which allow the collocation of EC2 instances within a single Availability Zone which can be utilized in order to minimize network latency between etcd members in the cluster. It is important to remember that collocation of etcd nodes within a single AZ will provide weaker fault tolerance than distributing members across multiple AZs. may also improve network performance of individual EC2 instances.

Disk

AWS provides two basic types of block storage: EBS volumes and . As mentioned, an EBS volume is a network attached block device while instance storage is directly attached to the hypervisor of the EC2 host. EBS volumes will generally have higher latency, lower throughput, and greater performance variance than Instance Store volumes. If performance, rather than data safety, is the primary concern it is highly recommended that instance storage on the EC2 instances be utilized. Remember that the amount of available instance storage varies by EC2 instance types which may impose additional performance considerations.

Inconsistent EBS volume performance can introduce etcd cluster instability. can provide more consistent performance than general purpose SSD EBS volumes. More information about EBS volume performance is available from AWS and Datadog has shared their experience with in their engineering blog.

While using an ASG to scale the size of an etcd cluster is not recommended, an ASG can be used effectively to maintain the desired number of nodes in the event of node failure. The maintenance of a stable number of etcd nodes will provide the etcd cluster with a measure of self healing.