EndpointSlices
EndpointSlices provide a simple way to track network endpoints within a Kubernetes cluster. They offer a more scalable and extensible alternative to Endpoints.
The Endpoints API has provided a simple and straightforward way of tracking network endpoints in Kubernetes. Unfortunately as Kubernetes clusters and Services have grown to handle and send more traffic to more backend Pods, limitations of that original API became more visible. Most notably, those included challenges with scaling to larger numbers of network endpoints.
Since all network endpoints for a Service were stored in a single Endpoints resource, those resources could get quite large. That affected the performance of Kubernetes components (notably the master control plane) and resulted in significant amounts of network traffic and processing when Endpoints changed. EndpointSlices help you mitigate those issues as well as provide an extensible platform for additional features such as topological routing.
In Kubernetes, an EndpointSlice contains references to a set of network endpoints. The control plane automatically creates EndpointSlices for any Kubernetes Service that has a specified. These EndpointSlices include references to all the Pods that match the Service selector. EndpointSlices group network endpoints together by unique combinations of protocol, port number, and Service name. The name of a EndpointSlice object must be a valid DNS subdomain name.
As an example, here’s a sample EndpointSlice resource for the example
Kubernetes Service.
By default, the control plane creates and manages EndpointSlices to have no more than 100 endpoints each. You can configure this with the --max-endpoints-per-slice
flag, up to a maximum of 1000.
EndpointSlices can act as the source of truth for kube-proxy when it comes to how to route internal traffic. When enabled, they should provide a performance improvement for services with large numbers of endpoints.
EndpointSlices support three address types:
- IPv4
- IPv6
- FQDN (Fully Qualified Domain Name)
Conditions
The EndpointSlice API stores conditions about endpoints that may be useful for consumers. The three conditions are ready
, serving
, and terminating
.
Ready
Serving
FEATURE STATE: Kubernetes v1.20 [alpha]
serving
is identical to the ready
condition, except it does not account for terminating states. Consumers of the EndpointSlice API should check this condition if they care about pod readiness while the pod is also terminating.
Note: Although serving
is almost identical to ready
, it was added to prevent break the existing meaning of ready
. It may be unexpected for existing clients if ready
could be true
for terminating endpoints, since historically terminating endpoints were never included in the Endpoints or EndpointSlice API to begin with. For this reason, ready
is always false
for terminating endpoints, and a new condition serving
was added in v1.20 so that clients can track readiness for terminating pods independent of the existing semantics for ready
.
Terminating
FEATURE STATE: Kubernetes v1.20 [alpha]
Terminating
is a condition that indicates whether an endpoint is terminating. For pods, this is any pod that has a deletion timestamp set.
Each endpoint within an EndpointSlice can contain relevant topology information. The topology information includes the location of the endpoint and information about the corresponding Node and zone. These are available in the following per endpoint fields on EndpointSlices:
zone
- The zone this endpoint is in.
Note:
In the v1 API, the per endpoint was effectively removed in favor of the dedicated fields nodeName
and zone
.
Setting arbitrary topology fields on the endpoint
field of an EndpointSlice
resource has been deprecated and is not be supported in the v1 API. Instead, the v1 API supports setting individual nodeName
and zone
fields. These fields are automatically translated between API versions. For example, the value of the "topology.kubernetes.io/zone"
key in the topology
field in the v1beta1 API is accessible as the zone
field in the v1 API.
Management
To ensure that multiple entities can manage EndpointSlices without interfering with each other, Kubernetes defines the label endpointslice.kubernetes.io/managed-by
, which indicates the entity managing an EndpointSlice. The endpoint slice controller sets endpointslice-controller.k8s.io
as the value for this label on all EndpointSlices it manages. Other entities managing EndpointSlices should also set a unique value for this label.
In most use cases, EndpointSlices are owned by the Service that the endpoint slice object tracks endpoints for. This ownership is indicated by an owner reference on each EndpointSlice as well as a kubernetes.io/service-name
label that enables simple lookups of all EndpointSlices belonging to a Service.
EndpointSlice mirroring
In some cases, applications create custom Endpoints resources. To ensure that these applications do not need to concurrently write to both Endpoints and EndpointSlice resources, the cluster’s control plane mirrors most Endpoints resources to corresponding EndpointSlices.
The control plane mirrors Endpoints resources unless:
- the Endpoints resource has a
endpointslice.kubernetes.io/skip-mirror
label set totrue
. - the Endpoints resource has a
control-plane.alpha.kubernetes.io/leader
annotation. - the corresponding Service resource does not exist.
- the corresponding Service resource has a non-nil selector.
Individual Endpoints resources may translate into multiple EndpointSlices. This will occur if an Endpoints resource has multiple subsets or includes endpoints with multiple IP families (IPv4 and IPv6). A maximum of 1000 addresses per subset will be mirrored to EndpointSlices.
Each EndpointSlice has a set of ports that applies to all endpoints within the resource. When named ports are used for a Service, Pods may end up with different target port numbers for the same named port, requiring different EndpointSlices. This is similar to the logic behind how subsets are grouped with Endpoints.
The control plane tries to fill EndpointSlices as full as possible, but does not actively rebalance them. The logic is fairly straightforward:
- Iterate through existing EndpointSlices, remove endpoints that are no longer desired and update matching endpoints that have changed.
- Iterate through EndpointSlices that have been modified in the first step and fill them up with any new endpoints needed.
- If there’s still new endpoints left to add, try to fit them into a previously unchanged slice and/or create new ones.
Importantly, the third step prioritizes limiting EndpointSlice updates over a perfectly full distribution of EndpointSlices. As an example, if there are 10 new endpoints to add and 2 EndpointSlices with room for 5 more endpoints each, this approach will create a new EndpointSlice instead of filling up the 2 existing EndpointSlices. In other words, a single EndpointSlice creation is preferrable to multiple EndpointSlice updates.
With kube-proxy running on each Node and watching EndpointSlices, every change to an EndpointSlice becomes relatively expensive since it will be transmitted to every Node in the cluster. This approach is intended to limit the number of changes that need to be sent to every Node, even if it may result with multiple EndpointSlices that are not full.
Duplicate endpoints
Due to the nature of EndpointSlice changes, endpoints may be represented in more than one EndpointSlice at the same time. This naturally occurs as changes to different EndpointSlice objects can arrive at the Kubernetes client watch/cache at different times. Implementations using EndpointSlice must be able to have the endpoint appear in more than one slice. A reference implementation of how to perform endpoint deduplication can be found in the EndpointSliceCache
implementation in .