Anti-Entropy

    It is important to first understand the moving pieces involved in services and health checks: the agent and the catalog. These are described conceptually below to make anti-entropy easier to understand.

    Agent

    Each Consul agent maintains its own set of service and check registrations as well as health information. The agents are responsible for executing their own health checks and updating their local state.

    Services and checks within the context of an agent have a rich set of configuration options available. This is because the agent is responsible for generating information about its services and their health through the use of health checks.
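    To make the agent's role more concrete, here is a minimal Go sketch of the kind of local state an agent might keep. All type and field names (LocalState, ServiceState, CheckState, InSync) are hypothetical illustrations, not Consul's API. The idea shown is that each registration carries a flag recording whether the catalog already reflects it, and that a changed check result clears that flag.

```go
package main

import "fmt"

// CheckStatus is a simplified health status using Consul's status names.
type CheckStatus string

const (
	Passing  CheckStatus = "passing"
	Warning  CheckStatus = "warning"
	Critical CheckStatus = "critical"
)

// ServiceState is a hypothetical record for one locally registered service.
type ServiceState struct {
	ID     string
	Name   string
	Tags   []string
	InSync bool // true once the catalog reflects this registration
}

// CheckState is a hypothetical record for one locally executed health check.
type CheckState struct {
	ID        string
	ServiceID string
	Status    CheckStatus
	InSync    bool
}

// LocalState is the agent-side view: what this agent believes it runs.
type LocalState struct {
	Services map[string]*ServiceState
	Checks   map[string]*CheckState
}

func NewLocalState() *LocalState {
	return &LocalState{
		Services: map[string]*ServiceState{},
		Checks:   map[string]*CheckState{},
	}
}

// AddService registers a service locally and marks it as needing a sync.
func (l *LocalState) AddService(s ServiceState) {
	s.InSync = false
	l.Services[s.ID] = &s
}

// UpdateCheck records a new check result; only a status change marks the
// check as out of sync, so the next anti-entropy run pushes it.
func (l *LocalState) UpdateCheck(id string, status CheckStatus) {
	if c, ok := l.Checks[id]; ok && c.Status != status {
		c.Status = status
		c.InSync = false
	}
}

func main() {
	st := NewLocalState()
	st.AddService(ServiceState{ID: "redis-1", Name: "redis"})
	st.Checks["redis-1-tcp"] = &CheckState{ID: "redis-1-tcp", ServiceID: "redis-1", Status: Passing, InSync: true}
	st.UpdateCheck("redis-1-tcp", Critical)
	fmt.Println(st.Checks["redis-1-tcp"].Status, st.Checks["redis-1-tcp"].InSync) // critical false
}
```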

    Catalog

    Consul’s service discovery is backed by a service catalog. This catalog is formed by aggregating information submitted by the agents. The catalog maintains the high-level view of the cluster, including which services are available, which nodes run those services, health information, and more. The catalog is used to expose this information via the various interfaces Consul provides, including DNS and HTTP.
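    The aggregation step can be pictured as an index from node name to the services that node's agent reports, which is then searched to answer "which nodes run service X?". The sketch below is a hypothetical in-memory illustration of that idea only; it ignores replication, health, and every real Consul interface, and the names Catalog, Register, and NodesFor are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Catalog is a hypothetical in-memory catalog: node name -> set of service names.
type Catalog struct {
	nodes map[string]map[string]bool
}

func NewCatalog() *Catalog {
	return &Catalog{nodes: map[string]map[string]bool{}}
}

// Register records that the agent on node reports running service.
func (c *Catalog) Register(node, service string) {
	if c.nodes[node] == nil {
		c.nodes[node] = map[string]bool{}
	}
	c.nodes[node][service] = true
}

// NodesFor answers a discovery query: which nodes run the given service?
// Results are sorted so callers get a stable order.
func (c *Catalog) NodesFor(service string) []string {
	var out []string
	for node, svcs := range c.nodes {
		if svcs[service] {
			out = append(out, node)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	cat := NewCatalog()
	cat.Register("node-b", "web")
	cat.Register("node-a", "web")
	cat.Register("node-a", "redis")
	fmt.Println(cat.NodesFor("web")) // [node-a node-b]
}
```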

    The catalog is maintained only by server nodes. This is because the catalog is replicated via the Raft log to provide a consolidated and consistent view of the cluster.

    Entropy is the tendency of systems to become increasingly disordered. Consul’s anti-entropy mechanisms are designed to counter this tendency, to keep the state of the cluster ordered even through failures of its components.

    Consul has a clear separation between the global service catalog and the agent’s local state, as discussed above. The anti-entropy mechanism reconciles these two views of the world: anti-entropy is a synchronization of the local agent state and the catalog. For example, when a user registers a new service or check with the agent, the agent in turn notifies the catalog that the new service or check exists. Similarly, when a check is deleted from the agent, it is consequently removed from the catalog as well.
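    One way to picture this reconciliation is as a diff between two sets of registrations, with the agent treated as the source of truth. The following Go sketch (hypothetical names, services only, checks omitted for brevity) computes which service IDs must be registered because they are missing from or different in the catalog, and which must be deregistered because the catalog still lists them for this node but the agent no longer does. Consul's own implementation differs in detail.

```go
package main

import (
	"fmt"
	"sort"
)

// Service is a simplified registration; all fields are compared.
type Service struct {
	ID   string
	Name string
	Port int
}

// Plan lists the catalog writes needed to make the catalog match the agent.
type Plan struct {
	Register   []string // service IDs to (re)register
	Deregister []string // service IDs to remove from the catalog
}

// Reconcile diffs the agent's local services against the catalog's view of
// this node. The local state wins whenever the two disagree.
func Reconcile(local, catalog map[string]Service) Plan {
	var p Plan
	for id, svc := range local {
		if remote, ok := catalog[id]; !ok || remote != svc {
			p.Register = append(p.Register, id)
		}
	}
	for id := range catalog {
		if _, ok := local[id]; !ok {
			p.Deregister = append(p.Deregister, id)
		}
	}
	sort.Strings(p.Register)
	sort.Strings(p.Deregister)
	return p
}

func main() {
	local := map[string]Service{
		"web-1": {ID: "web-1", Name: "web", Port: 8080},
		"api-1": {ID: "api-1", Name: "api", Port: 9000},
	}
	catalog := map[string]Service{
		"web-1": {ID: "web-1", Name: "web", Port: 80}, // stale port
		"old-1": {ID: "old-1", Name: "old", Port: 7000},
	}
	fmt.Printf("%+v\n", Reconcile(local, catalog)) // {Register:[api-1 web-1] Deregister:[old-1]}
}
```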

    Anti-entropy is also used to update availability information. As agents run their health checks, the status of a check may change; when it does, the new status is synced to the catalog. Using this information, the catalog can respond intelligently to queries about its nodes and services based on their availability.
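    As an illustration of how synced check results let a query skip unavailable instances, here is a small Go sketch. The Instance type and Healthy function are hypothetical, and the filtering rule (an instance is returned only if every one of its checks is "passing") is an assumption chosen for simplicity; real query interfaces may treat a "warning" status differently.

```go
package main

import "fmt"

// Instance is a hypothetical catalog entry: a service instance on a node
// plus the latest check statuses synced from that node's agent.
type Instance struct {
	Node   string
	Checks map[string]string // check ID -> "passing" | "warning" | "critical"
}

// Healthy returns the nodes whose instances have only passing checks.
// Treating "warning" as unhealthy is a simplifying assumption of this sketch.
func Healthy(instances []Instance) []string {
	var out []string
	for _, inst := range instances {
		ok := true
		for _, status := range inst.Checks {
			if status != "passing" {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, inst.Node)
		}
	}
	return out
}

func main() {
	web := []Instance{
		{Node: "node-a", Checks: map[string]string{"http": "passing"}},
		{Node: "node-b", Checks: map[string]string{"http": "critical"}},
		{Node: "node-c", Checks: map[string]string{"http": "passing", "disk": "passing"}},
	}
	fmt.Println(Healthy(web)) // [node-a node-c]
}
```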

    In addition to running when changes to the agent occur, anti-entropy is also a long-running process which periodically wakes up to sync service and check status to the catalog. This ensures that the catalog closely matches the agent’s true state. This also allows Consul to re-populate the service catalog even in the case of complete data loss.

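    To avoid saturation, the amount of time between periodic anti-entropy runs varies with cluster size. The table below defines the relationship between cluster size and sync interval:

    Cluster Size    Periodic Sync Interval
    1 - 128         1 minute
    129 - 256       2 minutes
    257 - 512       3 minutes
    513 - 1024      4 minutes
    ...             ...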

    The intervals above are approximate. Each Consul agent will choose a randomly staggered start time within the interval window to avoid a thundering herd.

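    Anti-entropy can fail in a number of cases, including misconfiguration of the agent or its operating environment, I/O problems (full disk, filesystem permissions, etc.), and networking problems (the agent cannot communicate with a server). Because of this, the agent attempts to sync in a best-effort fashion: if a sync run fails, the error is logged and the agent keeps running, and later anti-entropy runs retry the sync so that transient failures are recovered from automatically.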

    Synchronization of service registrations can be partially modified to allow external agents to change the tags for a service. This can be useful in situations where an external monitoring service needs to be the source of truth for tag information. For example, the Redis database and its monitoring service Redis Sentinel have this kind of relationship: Redis instances are responsible for much of their own configuration, but Sentinels determine whether a Redis instance is a primary or a secondary. Using the Consul service configuration item enable_tag_override, you can instruct the Consul agent on which the Redis database is running not to update the tags during anti-entropy synchronization. For more information, see the services documentation page.
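    The effect of enable_tag_override on a sync can be sketched as a merge rule: when the setting is on and the catalog already holds the service, the tags stored in the catalog (for example, written by an external monitor) are kept, while the remaining fields still come from the agent. The Go type, field, and function names here are hypothetical stand-ins; only enable_tag_override itself is the real configuration key.

```go
package main

import "fmt"

// Service is a simplified registration. EnableTagOverride mirrors the
// enable_tag_override setting; the Go names are hypothetical.
type Service struct {
	ID                string
	Port              int
	Tags              []string
	EnableTagOverride bool
}

// desiredCatalogEntry returns what the agent would write to the catalog.
// With tag override enabled and an existing catalog entry, the catalog's
// tags are preserved; everything else comes from the local definition.
func desiredCatalogEntry(local Service, remote *Service) Service {
	out := local
	if local.EnableTagOverride && remote != nil {
		out.Tags = append([]string(nil), remote.Tags...) // copy, don't alias
	}
	return out
}

func main() {
	local := Service{ID: "redis-1", Port: 6379, Tags: []string{"secondary"}, EnableTagOverride: true}
	remote := &Service{ID: "redis-1", Port: 6379, Tags: []string{"primary"}}
	fmt.Println(desiredCatalogEntry(local, remote).Tags) // [primary]

	local.EnableTagOverride = false
	fmt.Println(desiredCatalogEntry(local, remote).Tags) // [secondary]
}
```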