Upgrading Specific Versions

Enterprise

Consul Enterprise 1.10 removed the temporary licensing capabilities from the binaries found on https://releases.hashicorp.com. Servers no longer load a license previously set through the CLI or API. Instead, the license must be present in the server’s configuration or environment before it starts. See the license configuration documentation for more information about how to configure the license.

Previously, client agents retrieved their license from the servers in the cluster within 30 minutes of starting, and the snapshot agent similarly retrieved its license from the server or client agent it was configured to use. As of Consul Enterprise 1.10, both the snapshot agent and client agents can load a license from a configuration file or from their environment, the same way server agents must. Both agents can still retrieve their license automatically, but with a few extra stipulations. First, license auto-retrieval now requires that ACLs are enabled and that the client or snapshot agent is configured with a valid ACL token. Second, client agents require that start_join or an equivalent join configuration is set and resolves to server agents. If these stipulations are not met, the client or snapshot agent shuts down immediately after it is started.
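As a minimal sketch, an agent configuration fragment that loads the license from a file could look like the following. It assumes the Consul Enterprise license_path option. The file location shown is hypothetical, so substitute the path where your deployment actually places the license.

```json
{
  "license_path": "/etc/consul.d/consul.hclic"
}
```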

For step-by-step upgrade procedures, see the Upgrading to 1.10.0 documentation. For answers to common licensing questions, refer to the licensing documentation.

Envoy xDS Protocol Upgrades

Consul versions 1.9 and earlier exposed an xDS server for use by proxies using the v2 “State of the World” protocol variant.

Consul 1.10.0 adds support for the v3 protocol variant as the preferred way of conversing with Envoy. Both protocol variants are supported in this Consul version to facilitate upgrading Consul and Envoy in a stairstep order to avoid downtime.

In a future version of Consul the v2 State of the World protocol support will be removed.

Escape Hatches

Any escape hatches that are defined will likely need to be switched from xDS v2 to xDS v3 structures. Mostly this involves migrating off deprecated (and now removed) fields and switching untyped config to typed config with @type attributes set appropriately.

xDS v3 syntax has been supported by earlier Consul versions, so this migration can be done on most earlier versions of Consul and Envoy in advance of the Consul 1.10.0 upgrade.

As an example, consider a Zipkin tracing integration defined through an escape hatch, which must be moved from its v2 form to the v3 form.
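The fragment below sketches what the v3 form of a Zipkin tracer configuration might look like when supplied through the tracing escape hatch. It assumes Envoy’s envoy.tracers.zipkin tracer with a typed envoy.config.trace.v3.ZipkinConfig payload. The collector cluster name zipkin and the collector endpoint are hypothetical, so check the field names against the Envoy v3 API reference for the Envoy version you run.

```json
{
  "http": {
    "name": "envoy.tracers.zipkin",
    "typedConfig": {
      "@type": "type.googleapis.com/envoy.config.trace.v3.ZipkinConfig",
      "collector_cluster": "zipkin",
      "collector_endpoint_version": "HTTP_JSON",
      "collector_endpoint": "/api/v2/spans",
      "shared_span_context": false
    }
  }
}
```

Compared with a v2 configuration, the main structural change is that the untyped config object is replaced by typedConfig carrying an @type attribute.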

Stairstep Upgrade Path

  1. Upgrade Envoy sidecars to the latest version of Envoy that is supported by both the currently running version of Consul and Consul 1.10.0.

  2. Perform a normal upgrade of both Consul servers and clients to 1.10.0. At this point the existing Envoy instances will continue to speak the v2 State of the World protocol to the new Consul instances without issue.

  3. Once a Consul client is upgraded, use an updated CLI binary to re-bootstrap and restart Envoy using consul connect envoy. This will ensure it switches over to the v3 Incremental xDS protocol.

    Depending on how you run Envoy, this is either one step (consul connect envoy) or two steps (consul connect envoy -bootstrap followed by running Envoy directly).

  4. (Optionally) upgrade Envoy to the latest version supported in Consul 1.10.0.

Transparent Proxy on Kubernetes

When upgrading to Consul >= 1.10.0, Consul-helm >= 0.32.0, and Consul-k8s >= 0.26.0, a Kubernetes Service must be added for every service registered to Consul. This Service should be added before performing the upgrade. This will allow services to be managed by a central component, called endpoints-controller, which will enable features like transparent proxy.
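As a rough sketch, the following Kubernetes Service manifest assumes a workload whose Pods carry the label app: web and listen on port 8080. Both values are hypothetical, and naming the Service after the Consul service is an assumption here. kubectl apply -f accepts manifests in JSON as well as YAML.

```json
{
  "apiVersion": "v1",
  "kind": "Service",
  "metadata": {
    "name": "web"
  },
  "spec": {
    "selector": {
      "app": "web"
    },
    "ports": [
      {
        "port": 80,
        "targetPort": 8080
      }
    ]
  }
}
```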

After the upgrade is performed, all Pods of a service will need to be restarted. The service will be up and health checks will continue to work without restarting the service, but a restart is required so the Pods can be re-injected with the latest container configuration.

Consul 1.9.0

Changes to Raft Protocol Support

Consul 1.8 supported Raft protocols 2 and 3. Consul 1.9.0 now only supports Raft protocol 3. Consul has defaulted to using Raft protocol 3 since version 1.0.0, so this should only impact users who have been using Consul prior to 1.0.0 and may have the raft_protocol config setting set to 2. Users in that position should upgrade to a previous release supporting both protocol versions and update their configuration to use Raft protocol 3 before continuing their upgrade to Consul 1.9.0.
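For a server that still sets raft_protocol to 2, the change to make before continuing the upgrade is a one-line fragment like the following. This is a sketch: merge it into each server’s existing configuration. Alternatively, remove the old setting entirely, since 3 has been the default since 1.0.0.

```json
{
  "raft_protocol": 3
}
```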

The enable_central_service_config configuration now defaults to true.
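Agents that relied on the previous default could, as a sketch, opt out explicitly with a fragment like this in their configuration:

```json
{
  "enable_central_service_config": false
}
```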

Changes to Intentions

Namespaced Intentions

Enterprise

The API endpoint to list intentions now accepts the same ns query parameter (or X-Consul-Namespace header) used on other API endpoints. By default, it now lists only the intentions in the specified namespace rather than all intentions across all namespaces. To achieve the same results as Consul versions prior to 1.9.0, request the wildcard namespace with the query parameter ?ns=*.

Migration

Upgrading to Consul 1.9.0 will trigger a one-time background migration of existing intentions into an equivalent set of service-intentions config entries. This process waits until all of the Consul servers in the primary datacenter are running Consul 1.9.0+.
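For orientation, suppose there is an allow intention from a hypothetical web service to a hypothetical db service. After migration, it would be represented as a service-intentions config entry roughly like the following sketch:

```json
{
  "Kind": "service-intentions",
  "Name": "db",
  "Sources": [
    {
      "Name": "web",
      "Action": "allow"
    }
  ]
}
```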

All write requests via either the Intentions API endpoints or the Config Entry API endpoints for a service-intentions kind will be blocked after the upgrade until the migration process is complete. Reads function normally throughout the migration, so authorization enforcement is unaffected.

Secondary datacenters will perform their own one-time migration operations after the primary datacenter completes its migration and all of the Consul servers in the secondary datacenter are running Consul 1.9.0+. It is safe to upgrade the datacenters in any order.

Deprecated Fields

All old ID-based Intentions API CRUD endpoints will retain all of their prior fields as long as those endpoints are exclusively used to edit intentions. Once the underlying config entry representation is edited it will transition the intention into the newer format where some fields are no longer present. Once this transition occurs those intentions can no longer be used with the ID-based endpoints unless they are re-created via the old endpoints. Fields that are being removed or changing behavior:

  • after migration is stored in the LegacyID field. After transitioning this field is cleared.

  • after migration is stored in the LegacyCreateTime field. After transitioning this field is cleared.

  • after migration is stored in the LegacyUpdateTime field. After transitioning this field is cleared.

  • after migration is stored in the LegacyMeta field. To complete the transition, this field must be cleared manually and the metadata moved up to the enclosing config entry’s field. This is not done automatically since it is potentially a lossy operation.

Consul 1.8.0

Removal of Deprecated Features

The acl_enforce_version_8 configuration has been removed; version 8 ACL support, which was already on by default, can no longer be disabled.

Consul 1.7.0

Consul 1.7.0 contains three major changes that impact upgrades: stricter JSON decoding, , and backward-incompatible Session API changes.

Session API

Consul 1.7.0 introduced a backwards-incompatible change to the Session API. Queries to view or renew sessions from agents on earlier versions will be rejected. This impacts features and products including Vault, the Enterprise snapshot agent, and locks.

The issue occurs when clients are still running 1.6.4 or earlier but servers have been upgraded to 1.7.0 or 1.7.1. For this reason, we recommend you upgrade directly to 1.7.2 when it is available as it will include a fix for this issue.

Stricter JSON Decoding

The HTTP API will now return 400 status codes with a textual error when unknown fields are present in the payload of a request. Previously, Consul would simply ignore the unknown fields. You will need to ensure that your API usage only uses supported fields which are those documented in the example payloads in the API documentation.

Consul will now return the canonical service name in response to PTR queries. For OSS users, the change is that the datacenter is now present where it was not before. For Consul Enterprise users, both the datacenter and the service’s namespace are present. For example, where a PTR record would previously have contained web.service.consul, it will now be web.service.dc1.consul in OSS or web.service.ns1.dc1.consul in Enterprise.

Telemetry: semantics of consul.rpc.query changed, see consul.rpc.queries_blocking

Consul has changed the semantics of query counts in its telemetry. consul.rpc.query now only increments on the start of a query (blocking or non-blocking), whereas before it would measure when blocking queries polled for more data. The consul.rpc.queries_blocking gauge has been added to more precisely capture the view of active blocking queries.

Vault: default http_max_conns_per_client too low to run Vault properly

Consul 1.7.0 introduced limiting of connections per client. The default value of http_max_conns_per_client was 100, but Vault could use up to 128 connections, which caused problems. If you want to use Vault with Consul 1.7.0, you should raise the value to 200. Starting with Consul 1.7.1, 200 is the new default.
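As a sketch, the setting lives in the limits block of the agent configuration. Raising it on an affected agent could look like this:

```json
{
  "limits": {
    "http_max_conns_per_client": 200
  }
}
```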

Consul 1.6.3

Vault: default http_max_conns_per_client too low to run Vault properly

Consul 1.6.3 introduced limiting of connections per client. The default value was 100, but Vault could use up to 128 connections, which caused problems. If you want to use Vault with Consul 1.6.3 through 1.7.0, you should change the value to 200. Starting with Consul 1.7.1, this is the new default.

Consul 1.6.0

Removal of Deprecated Features

Managed proxies (which have been deprecated since Consul 1.3.0) have now been removed. Before upgrading, you will need to migrate any managed proxy usage to sidecar service registrations.
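As a minimal sketch, the following sidecar service registration uses a hypothetical service named web on port 8080. The empty sidecar_service block asks Consul to register a sidecar proxy for the service with default settings.

```json
{
  "service": {
    "name": "web",
    "port": 8080,
    "connect": {
      "sidecar_service": {}
    }
  }
}
```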

Consul 1.4.0

There are two major features in Consul 1.4.0 that may impact upgrades: a new ACL system and multi-datacenter support for Connect in the Enterprise version.

Consul 1.4.0 includes a new ACL system that is designed to have a smooth upgrade path but requires care to upgrade components in the right order.

Note: As with most major version upgrades, you cannot downgrade once the upgrade to 1.4.0 is complete as it adds new state to the raft store. As always it is strongly recommended that you test the upgrade first outside of production and ensure you take backup snapshots of all datacenters before upgrading.

Primary Datacenter

The “ACL datacenter” in 1.3.x and earlier is now referred to as the “Primary datacenter”. All configuration is backwards compatible and shouldn’t need to change prior to upgrade, although it’s strongly recommended to migrate ACL configuration to the new syntax soon after upgrade. This includes moving to primary_datacenter rather than acl_datacenter and moving the acl_* options to the new syntax.
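As a sketch of the post-upgrade form, the following fragment assumes the primary datacenter is named dc1 and uses the acl configuration block introduced with the new ACL system. The policy values are examples only.

```json
{
  "primary_datacenter": "dc1",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "down_policy": "extend-cache"
  }
}
```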

Datacenters can be upgraded in any order although secondaries will remain in Legacy ACL mode until the primary datacenter is fully upgraded.

Each datacenter should follow the standard rolling upgrade process.

Legacy ACL Mode

When a 1.4.0 server first starts, it runs in “Legacy ACL mode”. In this mode, bootstrap requests and new ACL APIs will not be functional yet and will return an error. The server advertises its ability to support 1.4.0 ACLs via gossip and waits.

In the primary datacenter, the servers all wait in legacy ACL mode until they see every server in the primary datacenter advertise 1.4.0 ACL support. Once this happens, the leader will complete the transition out of “legacy ACL mode” and write this into the state so future restarts don’t need to go through the same transition.

In a secondary datacenter, the same process happens, except that servers additionally wait for all servers in the primary datacenter to advertise 1.4.0 ACL support, making it safe to upgrade datacenters in any order.

Legacy Token Accessor Migration

As soon as all servers in the primary datacenter have been upgraded to 1.4.0, the leader will begin the process of creating new accessor IDs for all existing ACL tokens.

This process completes in the background and is rate limited to ensure it doesn’t overload the leader. It upgrades tokens in batches of 128 and will not upgrade more than one batch per second, so on a cluster with 10,000 tokens, this may take several minutes.

While this is happening both old and new ACLs will work correctly with the caveat that new ACL Token APIs may not return an accessor ID for legacy tokens that are not yet migrated.

Migrating Existing ACLs

New ACL policies have slightly different syntax designed to fix some shortcomings in old ACL syntax. During and after the upgrade process, any old ACL tokens will continue to work and grant exactly the same level of access.

After upgrade, it is still possible to create “legacy” tokens using the existing API, so existing integrations that create tokens (e.g. Vault) will continue to work. However, the “legacy” tokens generated this way cannot take advantage of new policy features. It’s recommended that you complete migration of all tokens as soon as possible after upgrade, and also update any integrations to work with the new ACL Token and Policy APIs.

More complete details on how to upgrade “legacy” tokens are available in the ACL token migration documentation.

Connect Multi-datacenter

This only applies to users upgrading from an older version of Consul Enterprise to Consul Enterprise 1.4.0 (all license types).

In addition, this upgrade will only affect clusters where Connect is enabled on your servers before the migration.

Connect multi-datacenter uses the same primary/secondary approach as ACLs and will use the same primary datacenter. When a secondary datacenter server restarts with 1.4.0, it will detect that it is not the primary and begin an automatic bootstrap of multi-datacenter CA federation.

Datacenters can be upgraded in either order; secondary datacenters will not switch into multi-datacenter mode until all servers in both the secondary and primary datacenter are detected to be running at least Consul 1.4.0. Secondary datacenters monitor this periodically (every few minutes) and will automatically upgrade Connect to use a federated Certificate Authority when they do.

In general, migrating a Consul cluster from OSS to Enterprise will update the CA to be federated automatically and without impact on Connect traffic. When upgrading from Consul Enterprise 1.3.x to Consul Enterprise 1.4.0, the CA upgrade is seamless; however, depending on the size of the cluster, new connection attempts in the secondary datacenter might fail for a short window (typically seconds) while the update propagates. This happens because the 1.3.x Beta authorization endpoint validated the originating cluster in a way that was not fully forward compatible with migrating between cluster trust domains. That issue is fixed in 1.4.0 as part of General Availability.

Once migrated (typically within a few seconds), Connect will use the primary datacenter’s Certificate Authority as the root of trust for all other datacenters. CA migration or root key changes in the primary now rotate automatically and without loss of connectivity throughout all datacenters and workloads.

For more information see Connect Multi-datacenter.

Consul 1.3.0

This version added support for multiple tag filters in service discovery queries; however, it introduced a subtle bug where API calls to /catalog/service/:name?tag=<tag> would ignore the tag filter, but only during the upgrade. It only occurs when clients are still running 1.2.3 or earlier but servers have been upgraded. The /health/service/:name?tag=<tag> endpoint and DNS interface were not affected.

For this reason, we recommend you upgrade directly to 1.3.1 which includes only a fix for this issue.

Consul 1.1.0

Removal of Deprecated Features

The following previously deprecated fields and config options have been removed:

  • CheckID has been removed from config file check definitions (use id instead).
  • has been removed from config file check definitions (use args instead).
  • enableTagOverride is no longer valid in service definitions (use enable_tag_override instead).
  • The deprecated metric names (beginning with consul.consul.) have been removed, along with the enable_deprecated_names option from the metrics configuration.

New defaults for Raft Snapshot Creation

Consul 1.0.1 (and earlier versions of Consul) checked for raft snapshots every 5 seconds, and created new snapshots for every 8192 writes. These defaults cause constant disk IO in large busy clusters. Consul 1.1.0 increases these to larger values, and makes them tunable via the and raft_snapshot_threshold parameters. We recommend keeping the new defaults. However, operators can go back to the old defaults by changing their config if they prefer more frequent snapshots. See the documentation for and raft_snapshot_threshold to understand the trade-offs when tuning these.
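As a sketch, an operator who prefers the pre-1.1.0 behavior could set the old values explicitly. This assumes the interval parameter is named raft_snapshot_interval and accepts a duration string.

```json
{
  "raft_snapshot_interval": "5s",
  "raft_snapshot_threshold": 8192
}
```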

Consul 1.0.7

When requesting a specific service (/v1/health/:service or /v1/catalog/:service endpoints), the X-Consul-Index returned is now the index at which that specific service was last modified. In version 1.0.6 and earlier the X-Consul-Index returned was the index at which any service was last modified. See GH-3890 for more details.

During upgrades from 1.0.6 or lower to 1.0.7 or higher, watchers are likely to see X-Consul-Index for these endpoints decrease between blocking calls.

Consul’s watch feature and consul-template should gracefully handle this case. Other tools relying on blocking service or health queries are also likely to work; some may require a restart. It is possible external tools could break and either stop working or continually re-request data without blocking if they have assumed indexes can never decrease or be reset and/or persist index values. Please test any blocking query integrations in a controlled environment before proceeding.

Consul 1.0.1

Carefully Check and Remove Stale Servers During Rolling Upgrades

Consul 1.0 (and earlier versions of Consul when running with had an issue where performing rolling updates of Consul servers could result in an outage from old servers remaining in the cluster. Autopilot would normally remove old servers when new ones come online, but it was also waiting to promote servers to voters in pairs to maintain an odd quorum size. The pairwise promotion feature was removed so that servers become voters as soon as they are stable, allowing Autopilot to remove old servers in a safer way.

When upgrading from Consul 1.0, you may need to manually remove old servers as part of a rolling update to Consul 1.0.1.

Consul 1.0

Consul 1.0 has several important breaking changes that are documented here. Please be sure to read over all the details here before upgrading.

Raft Protocol Now Defaults to 3

The -raft-protocol default has been changed from 2 to 3, enabling all features by default.

Raft protocol version 3 requires Consul 0.8.0 or newer on all servers in order to work, so if you are upgrading a cluster that still contains older servers, you will need to set this back to 2 in order to upgrade. See Raft Protocol Version Compatibility for more details. Also, the format of peers.json used for outage recovery is different when running with the latest Raft protocol. Review the outage recovery documentation for a description of the required format.
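As a sketch, a server agent joining a cluster that still contains pre-0.8.0 servers would pin the older protocol in its configuration. The setting can be removed once every server runs 0.8.0 or newer, which returns the server to the default of 3.

```json
{
  "raft_protocol": 2
}
```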

Please note that the Raft protocol is different from Consul’s internal protocol as described on the Protocol Compatibility Promise page, and as is shown in commands like consul members and consul version. To see the version of the Raft protocol in use on each server, use the consul operator raft list-peers command.

The easiest way to upgrade servers is to have each server leave the cluster, upgrade its Consul version, and then add it back. Make sure the new server joins successfully and that the cluster is stable before rolling the upgrade forward to the next server. It’s also possible to stand up a new set of servers, and then slowly stand down each of the older servers in a similar fashion.

When using Raft protocol version 3, servers are identified by their node ID instead of their IP address when Consul makes changes to its internal Raft quorum configuration. This means that once a cluster has been upgraded with servers all running Raft protocol version 3, it will no longer allow servers running any older Raft protocol versions to be added. If running a single Consul server, restarting it in-place will result in that server not being able to elect itself as a leader. To avoid this, either set the Raft protocol back to 2, or use Manual Recovery Using peers.json to map the server to its node ID in the Raft quorum configuration.
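For Raft protocol 3, each peers.json entry pairs a server’s node ID with its address, rather than listing addresses alone. The following is a sketch with a single server. The node ID and address are hypothetical; a real file lists every server, each with its own node ID (typically found in that server’s data directory).

```json
[
  {
    "id": "adf4238a-882b-9ddc-4a9d-5b6758e4159e",
    "address": "10.1.0.1:8300",
    "non_voter": false
  }
]
```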

Config Files Require an Extension

As part of supporting the HCL format for Consul’s config files, an .hcl or .json extension is required for all config files loaded by Consul, even when a command-line argument is used to specify a file directly.

Service Definition Parameter Case Changed

All config file formats now require snake_case fields, so all CamelCased parameter names should be changed before upgrading. See the service definition documentation for details.
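For example, a service definition that previously used enableTagOverride would now use enable_tag_override. The sketch below shows this with a hypothetical redis service:

```json
{
  "service": {
    "name": "redis",
    "port": 6379,
    "enable_tag_override": true
  }
}
```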

Deprecated Options Have Been Removed

All of Consul’s previously deprecated command line flags and config options have been removed, so these will need to be mapped to their equivalents before upgrading. Here’s the complete list of removed options and their equivalents:

statsite_prefix Renamed to metrics_prefix

Since the statsite_prefix configuration option applied to all telemetry providers, statsite_prefix was renamed to metrics_prefix. Configuration files will need to be updated when upgrading to this version of Consul.
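Here is a sketch of the renamed option in a JSON config file. It assumes the option is set inside the telemetry block, and the prefix value shown is only an example.

```json
{
  "telemetry": {
    "metrics_prefix": "consul"
  }
}
```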

advertise_addrs Removed

This configuration option was removed since it was redundant with advertise_addr and advertise_addr_wan in combination with ports and also wrongly stated that you could configure both host and port.

Escaping Behavior Changed for go-discover Configs

The format for and -retry-join-wan values that use cloud auto joining has changed. Values in key=val sequences must no longer be URL encoded and can be provided as literals as long as they do not contain spaces, backslashes \ or double quotes ". If values contain these characters then use double quotes as in "some key"="some value". Special characters within a double quoted string can be escaped with a backslash \.
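As a sketch, the following config file assumes the go-discover AWS provider and a hypothetical tag value that contains a space. It shows the new quoting rules in JSON. The inner double quotes around the value are escaped for JSON, and the remaining key=val pairs are plain literals.

```json
{
  "retry_join": [
    "provider=aws tag_key=Role tag_value=\"consul server\""
  ]
}
```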

HTTP Verbs are Enforced in Many HTTP APIs

Many endpoints in the HTTP API that previously took any HTTP verb now check for specific HTTP verbs and enforce them. This may break clients relying on the old behavior. Here’s the complete list of updated endpoints and required HTTP verbs:

Unauthorized KV Requests Return 403

When ACLs are enabled, reading a key with an unauthorized token returns a 403. This previously returned a 404 response.

Config Section of Agent Self Endpoint has Changed

The /v1/agent/self endpoint’s Config section has often been in flux because it directly returned one of Consul’s internal data structures. That structure has been moved under DebugConfig, which is documented as being for debugging use and subject to change, and a small set of Config elements has been kept and documented. See the endpoint documentation for details.

Deprecated configtest Command Removed

The configtest command was deprecated and has been superseded by the validate command.

Undocumented Flags in validate Command Removed

The validate command supported the -config-file and -config-dir command line flags but did not document them. This support has been removed since the flags are not required.

Metric Names Updated

Metric names no longer start with . To help with transitioning dashboards and other metric consumers, the field enable_deprecated_names has been added to the telemetry section of the config, which will enable metrics with the old naming scheme to be sent alongside the new ones. The following prefixes were affected:

Checks Validated On Agent Startup

Consul agents now validate health check definitions in their configuration and will fail at startup if any checks are invalid. In previous versions of Consul, invalid health checks would get skipped.

Script Checks Are Now Opt-In

A new configuration option was added, and defaults to false, meaning that in order to allow an agent to run health checks that execute scripts, this will need to be configured and set to true. This provides a safer out-of-the-box configuration for Consul where operators must opt-in to allow script-based health checks.

If your cluster uses script health checks please be sure to set this to true as part of upgrading agents. If this is set to true, you should also enable ACLs to provide control over which users are allowed to register health checks that could potentially execute scripts on the agent machines.

Security Warning: Using enable_script_checks without ACLs and without allow_write_http_from is DANGEROUS. Use the enable_local_script_checks setting introduced in v0.9.4 instead. See for more information.
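As a sketch for agents that only need locally defined script checks, the following configuration enables enable_local_script_checks. This permits script checks defined in the agent’s local configuration files but not those registered through the HTTP API.

```json
{
  "enable_local_script_checks": true
}
```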

Web UI Is No Longer Released Separately

Consul releases will no longer include a web_ui.zip file with the compiled web assets. These have been built in to the Consul binary since the 0.7.x series and can be enabled with the configuration option. These built-in web assets have always been identical to the contents of the web_ui.zip file for each release. The -ui-dir option is still available for hosting customized versions of the web assets, but the vast majority of Consul users can just use the built in web assets.

Consul 0.8.0

Upgrade Current Cluster Leader Last

We identified a potential issue with Consul 0.8 that requires the current cluster leader to be upgraded last when updating multiple servers. Please see for more details.

Command-Line Interface RPC Deprecation

The RPC client interface has been removed. All CLI commands that used RPC and the -rpc-addr flag to communicate with Consul have been converted to use the HTTP API and the appropriate flags for it, and the rpc field has been removed from the port and address binding configs. You will need to remove these fields from your config files and update any scripts that passed a custom -rpc-addr to the following commands:

Version 8 ACLs Are Now Opt-Out

The acl_enforce_version_8 configuration now defaults to true to enable full version 8 ACL support by default. If you are upgrading an existing cluster with ACLs enabled, you will need to set this to false during the upgrade on both Consul agents and Consul servers. Version 8 ACLs were also changed so that must be set on agents in order to enable the agent-side enforcement of ACLs. This makes for a smoother experience in clusters where ACLs aren’t enabled at all, but where the agents would have to wait to contact a Consul server before learning that.
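As a sketch, the opt-out during the upgrade is a single configuration key set on both agents and servers. It can be removed once the upgrade is complete, which returns the cluster to the new default.

```json
{
  "acl_enforce_version_8": false
}
```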

Remote Exec Is Now Opt-In

Raft Protocol Version Compatibility

When upgrading to Consul 0.8.0 from a version lower than 0.7.0, users will need to set the -raft-protocol option to 1 in order to maintain backwards compatibility with the old servers during the upgrade. After the servers have been migrated to version 0.8.0, -raft-protocol can be moved up to 2 and the servers restarted to match the default.

The Raft protocol must be stepped up in this way; only adjacent version numbers are compatible (for example, version 1 cannot talk to version 3). Here is a table of the Raft Protocol versions supported by each Consul version:

In order to enable all features, all servers in a Consul datacenter must be running with Raft protocol version 3 or later.

Consul 0.7.1

Child Process Reaping

Child process reaping support has been removed, along with the reap configuration option. Reaping is also done via dumb-init in the , so removing it from Consul itself simplifies the code and eases future maintenance for Consul. If you are running Consul as PID 1 in a container you will need to arrange for a wrapper process to reap child processes.

DNS Resiliency Defaults

The default for has been increased from 5 seconds to a near-indefinite threshold (10 years) to allow DNS queries to continue to be served in the event of a long outage with no leader. A new telemetry counter was added at consul.dns.stale_queries to track when agents serve DNS queries that are stale by more than 5 seconds.

Consul 0.7

Consul version 0.7 is a very large release with many important changes. Changes to be aware of during an upgrade are categorized below.

Performance Timing Defaults and Tuning

Consul 0.7 now defaults the DNS configuration to allow for stale queries by defaulting allow_stale to true for better utilization of available servers. If you want to retain the previous behavior, set the following configuration:
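A sketch of that configuration, set in the dns_config block:

```json
{
  "dns_config": {
    "allow_stale": false
  }
}
```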

Consul 0.7 also introduced support for tuning Raft performance using a new performance configuration block. In addition, the default Raft timing is set to a lower-performance mode suitable for minimal Consul servers.

To continue to use the high-performance settings that were the default prior to Consul 0.7 (recommended for production servers), add the following configuration to all Consul servers when upgrading:

```json
{
  "performance": {
    "raft_multiplier": 1
  }
}
```

See the guide for more details.

The default behavior of and skip_leave_on_interrupt are now dependent on whether or not the agent is acting as a server or client:

  • For servers, leave_on_terminate defaults to “false” and skip_leave_on_interrupt defaults to “true”.

  • For clients, leave_on_terminate defaults to “true” and skip_leave_on_interrupt defaults to “false”.

These defaults are designed to be safer for servers, so that you must explicitly configure them to leave the cluster. This also results in a better experience for clients, especially in cloud environments where they may be created and destroyed often and users prefer not to wait for the 72-hour reap time for cleanup.
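As an illustration, a server that should leave the cluster on both SIGTERM and interrupt, matching the new client defaults, would need to set both options explicitly. This fragment is a sketch, not a recommendation.

```json
{
  "server": true,
  "leave_on_terminate": true,
  "skip_leave_on_interrupt": false
}
```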

Dropped Support for Protocol Version 1

Consul version 0.7 dropped support for protocol version 1, which means it is no longer compatible with versions of Consul prior to 0.3. You will need to upgrade all agents to a newer version of Consul before upgrading to Consul 0.7.

Prepared Query Changes

Consul version 0.7 adds a feature which allows prepared queries to store a in the query definition itself. This feature enables using the distance sorting features of prepared queries without explicitly providing the node to sort near in requests, but requires the agent servicing a request to send additional information about itself to the Consul servers when executing the prepared query. Agents prior to 0.7 do not send this information, which means they are unable to properly execute prepared queries configured with a Near parameter. Similarly, any server nodes prior to version 0.7 are unable to store the Near parameter, making them unable to properly serve requests for prepared queries using the feature. It is recommended that all agents be running version 0.7 prior to using this feature.
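As a sketch, the following prepared query definition sorts results relative to the agent servicing the request. It assumes Near is set inside the Service block and uses the special _agent value. The query and service names are hypothetical.

```json
{
  "Name": "web-nearest",
  "Service": {
    "Service": "web",
    "Near": "_agent"
  }
}
```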

WAN Address Translation in HTTP Endpoints

Consul version 0.7 added support for translating WAN addresses in certain . The servers and the agents need to be running version 0.7 or later in order to use this feature.

These translated addresses could break HTTP endpoint consumers that are expecting local addresses, so a new X-Consul-Translate-Addresses header was added to allow clients to detect if translation is enabled for HTTP responses. A “lan” tag was added to TaggedAddresses for clients that need the local address regardless of translation.

Outage Recovery and peers.json Changes

The peers.json file is no longer present by default and is only used when performing recovery. This file will be deleted after Consul starts and ingests the file. Consul 0.7 also uses a new, automatically-created raft/peers.info file to avoid ingesting the peers.json file on the first start after upgrading (the peers.json file is simply deleted on the first start after upgrading).

Please be sure to review the Outage Recovery tutorial before upgrading for more details.

Consul 0.6.4

Consul 0.6.4 made some substantial changes to how ACLs work with prepared queries. Existing queries will execute with no changes, but there are important differences to understand about how prepared queries are managed before you upgrade. In particular, prepared queries with no Name defined will no longer require any ACL to manage them, and prepared queries with a Name defined are now governed by a new query ACL policy that will need to be configured after the upgrade.

See the ACL rules documentation for more details about the new behavior and how it compares to previous versions of Consul.

Consul 0.6

Consul version 0.6 is a very large release with many enhancements and optimizations. Changes to be aware of during an upgrade are categorized below.

Data Store Changes

Consul changed the format used to store data on the server nodes in version 0.5.1 (see the 0.5.1 notes below for details). Previously, Consul would automatically detect data directories using the old LMDB format and convert them to the newer BoltDB format. This automatic upgrade has been removed in Consul 0.6; instead, a safeguard prevents Consul from starting if the old directory format is detected.

It is still possible to migrate from a 0.5.x version of Consul to 0.6+ using the consul-migrate CLI utility. This is the same tool that was previously embedded into Consul. See the releases page for downloadable versions of the tool.

Also, in this release Consul switched from LMDB to a fully in-memory database for the state store. Because LMDB is a disk-based backing store, it could hold more data than fits in RAM in some cases (though this is not a recommended configuration for Consul). If you have an extremely large data set that won’t fit into RAM, you may encounter issues upgrading to Consul 0.6.0 and later. Provision Consul with roughly twice as much physical memory as the data set size to allow for bursty allocations and subsequent garbage collection.

ACL Enhancements

Consul 0.6 introduces enhancements to the ACL system which may require special handling:

  • Service ACLs are enforced during service discovery (REST + DNS)

Previously, service discovery was wide open, and any client could query information about any service without providing a token. Consul now requires read-level access at a minimum when ACLs are enabled to return service information over the REST or DNS interfaces. If clients depend on an open service discovery system, then the following should be added to all ACL tokens which require it:
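A minimal sketch of such a rule, assuming the legacy ACL rule syntax used elsewhere on this page, grants read access to all services through the empty catch-all name:

```hcl
# Grant read access to every service so discovery over REST and DNS
# keeps working. The empty name "" matches all service names.
service "" {
  policy = "read"
}
```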

When the DNS interface is queried, the agent’s acl_token is used, so be sure that token has sufficient privileges to return the DNS records you expect to retrieve from it.

  • Event and keyring ACLs

Similar to service discovery, the new event and keyring ACLs will block access to these operations if the acl_default_policy is set to deny. If clients depend on open access to these, then the following should be added to all ACL tokens which require them:

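```hcl
# Allow user events of any name; the empty name "" matches every event.
event "" {
  policy = "write"
}

# Allow reading and modifying the gossip encryption keyring.
keyring = "write"
```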

Unfortunately, these are new ACLs for Consul 0.6, so they must be added after the upgrade is complete.

Prepared Queries

Prepared queries introduce a new Raft log entry type that older versions of Consul do not support. Do not use Consul’s prepared query features until all servers in a cluster have been upgraded to version 0.6.0.

Single Private IP Enforcement

Consul will refuse to start if there are multiple private IPs available, so if this is the case you will need to configure Consul’s advertise or bind addresses before upgrading.
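As a sketch, the bind and advertise addresses can be pinned in the agent configuration. This example uses the HCL configuration format accepted by current Consul releases (the same keys exist in JSON configuration files). The address 10.0.0.10 is hypothetical; replace it with the private IP the agent should use.

```hcl
# Address Consul binds to for internal cluster communication.
# 10.0.0.10 is a hypothetical private IP.
bind_addr = "10.0.0.10"

# Address advertised to the other nodes in the cluster.
advertise_addr = "10.0.0.10"
```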

New Web UI File Layout

The release .zip file for Consul’s web UI no longer contains a dist sub-folder; everything has been moved up one level. If you have any automated scripts that expect the old layout you may need to update them.

Consul 0.5.1

Consul version 0.5.1 uses a different backend store for persisting the Raft log. Because of this change, a data migration is necessary to move the log entries out of LMDB and into the newer backend, BoltDB.

Consul version 0.5.1 and later make this transition seamless: there are no special steps you need to take. When Consul starts, it checks for the presence of legacy LMDB data files and migrates them automatically if any are found, emitting a log message when the Raft data is migrated.

This automatic upgrade exists only in Consul 0.5.1 and later 0.5.x releases; it was removed starting with Consul 0.6.0. It is still possible to upgrade directly from pre-0.5.1 versions by using the consul-migrate utility, which is available on the Consul Tools page.

Consul 0.5

Consul version 0.5 adds two features that complicate the upgrade process:

  • ACL system includes service discovery and registration
  • Internal use of tombstones to fix behavior of blocking queries in certain edge cases.

Users of the ACL system need to be aware that deploying Consul 0.5 will cause service registration to be enforced. This means that if an agent attempts to register a service without proper privileges, the registration will be denied. If the acl_default_policy is “allow”, clients will continue to work without an updated policy. If the policy is “deny”, all clients will have their service registrations rejected until the policies are updated.

To avoid this situation, all the ACL policies should be updated to add something like this:

```hcl
# Enable all services to be registered
service "" {
  policy = "write"
}
```

This sets the service policy to write level for all services. The blank service name is the catch-all value. A more specific service can also be specified:
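For example, a rule limited to one service might look like the following. The name "web" is hypothetical, and this assumes the legacy rule behavior in which the name is matched as a prefix, so the rule would also cover names such as "web-api".

```hcl
# Allow registering the hypothetical service "web" (prefix match under
# the legacy ACL rules, so "web-api" would also be covered).
service "web" {
  policy = "write"
}
```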

The ACL policy can be updated while still running 0.4, and enforcement will begin with the upgrade to 0.5. Updating the policies first keeps service registration working through the upgrade and preserves cluster availability.

The second major change is a new internal command used for tombstones. The details of the change are not important; however, for it to function, the leader node replicates this new command to its followers. Consul is designed defensively: when a server receives a command it does not recognize, it panics. This is a deliberate design decision to avoid the possibility of data loss, inconsistencies, or security issues caused by future incompatibility.

In practice, this means if a Consul 0.5 node is the leader, all of its followers must also be running 0.5. There are a number of ways to do this to ensure cluster availability:

  • Add new 0.5 nodes, then remove the old servers. This will add the new nodes as followers, and once the old servers are removed, one of the 0.5 nodes will become leader.

  • Upgrade the followers first and the leader last. Use consul info to determine which nodes are followers, perform an in-place upgrade on each of them, and then upgrade the leader.

Finally, even if neither of the methods above is possible or the process fails for some reason, the failure is not fatal. The older version of the server will simply panic and stop. At that point, you can upgrade it to the new version and restart the agent. There will be no data loss, and the cluster will resume operations.