Statistics

    The cluster manager has a statistics tree rooted at cluster_manager. with the following statistics. Any : character in the stats name is replaced with _. Stats include all clusters managed by the cluster manager, including both clusters used for data plane upstreams and control plane xDS clusters.
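As a rough illustration of the name sanitization described above, the sketch below builds a fully qualified per-cluster stat name. It assumes the replaced character is the colon. The helper name `cluster_stat_name` and the cluster name `backend:8080` are hypothetical and only serve the example.

```python
def cluster_stat_name(cluster_name: str, stat: str) -> str:
    """Build a fully qualified stat name for a per-cluster statistic.

    Assumption: ':' is the character that gets replaced with '_' in
    stat names, so a cluster called "backend:8080" shows up as
    "backend_8080" in the statistics tree.
    """
    return f"cluster.{cluster_name}.{stat}".replace(":", "_")


if __name__ == "__main__":
    print(cluster_stat_name("backend:8080", "upstream_cx_total"))
```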

    Every cluster has a statistics tree rooted at cluster.<name>. with the following statistics:

    Name

    Type

    Description

    upstream_cx_total

    Counter

    Total connections

    upstream_cx_active

    Gauge

    Total active connections

    upstream_cx_http1_total

    Counter

    Total HTTP/1.1 connections

    upstream_cx_http2_total

    Counter

    Total HTTP/2 connections

    upstream_cx_http3_total

    Counter

    Total HTTP/3 connections

    upstream_cx_connect_fail

    Counter

    Total connection failures

    upstream_cx_connect_timeout

    Counter

    Total connection connect timeouts

    upstream_cx_idle_timeout

    Counter

    Total connection idle timeouts

    upstream_cx_connect_attempts_exceeded

    Counter

    Total consecutive connection failures exceeding configured connection attempts

    upstream_cx_overflow

    Counter

    Total times that the cluster’s connection circuit breaker overflowed

    upstream_cx_connect_ms

    Histogram

    Connection establishment milliseconds

    upstream_cx_length_ms

    Histogram

    Connection length milliseconds

    upstream_cx_destroy

    Counter

    Total destroyed connections

    upstream_cx_destroy_local

    Counter

    Total connections destroyed locally

    upstream_cx_destroy_remote

    Counter

    Total connections destroyed remotely

    upstream_cx_destroy_with_active_rq

    Counter

    Total connections destroyed with 1+ active request

    upstream_cx_destroy_local_with_active_rq

    Counter

    Total connections destroyed locally with 1+ active request

    upstream_cx_destroy_remote_with_active_rq

    Counter

    Total connections destroyed remotely with 1+ active request

    upstream_cx_close_notify

    Counter

    Total connections closed via HTTP/1.1 connection close header or HTTP/2 or HTTP/3 GOAWAY

    upstream_cx_rx_bytes_total

    Counter

    Total received connection bytes

    upstream_cx_rx_bytes_buffered

    Gauge

    Received connection bytes currently buffered

    upstream_cx_tx_bytes_total

    Counter

    Total sent connection bytes

    upstream_cx_tx_bytes_buffered

    Gauge

    Send connection bytes currently buffered

    upstream_cx_pool_overflow

    Counter

    Total times that the cluster’s connection pool circuit breaker overflowed

    upstream_cx_protocol_error

    Counter

    Total connection protocol errors

    upstream_cx_max_requests

    Counter

    Total connections closed due to maximum requests

    upstream_cx_none_healthy

    Counter

    Total times connection not established due to no healthy hosts

    upstream_rq_total

    Counter

    Total requests

    upstream_rq_active

    Gauge

    Total active requests

    upstream_rq_pending_total

    Counter

    Total requests pending a connection pool connection

    upstream_rq_pending_overflow

    Counter

    Total requests that overflowed connection pool or requests (mainly for HTTP/2 and above) circuit breaking and were failed

    upstream_rq_pending_failure_eject

    Counter

    Total requests that were failed due to a connection pool connection failure or remote connection termination

    upstream_rq_pending_active

    Gauge

    Total active requests pending a connection pool connection

    upstream_rq_cancelled

    Counter

    Total requests cancelled before obtaining a connection pool connection

    upstream_rq_maintenance_mode

    Counter

    Total requests that resulted in an immediate 503 due to maintenance mode

    upstream_rq_timeout

    Counter

    Total requests that timed out waiting for a response

    upstream_rq_max_duration_reached

    Counter

    Total requests closed due to max duration reached

    upstream_rq_per_try_timeout

    Counter

    Total requests that hit the per try timeout (except when request hedging is enabled)

    upstream_rq_rx_reset

    Counter

    Total requests that were reset remotely

    upstream_rq_tx_reset

    Counter

    Total requests that were reset locally

    upstream_rq_retry

    Counter

    Total request retries

    upstream_rq_retry_backoff_exponential

    Counter

    Total retries using the exponential backoff strategy

    upstream_rq_retry_backoff_ratelimited

    Counter

    Total retries using the ratelimited backoff strategy

    upstream_rq_retry_limit_exceeded

    Counter

    Total requests not retried due to exceeding the configured number of maximum retries

    upstream_rq_retry_success

    Counter

    Total request retry successes

    upstream_rq_retry_overflow

    Counter

    Total requests not retried due to circuit breaking or exceeding the retry budget

    upstream_flow_control_paused_reading_total

    Counter

    Total number of times flow control paused reading from upstream

    upstream_flow_control_resumed_reading_total

    Counter

    Total number of times flow control resumed reading from upstream

    upstream_flow_control_backed_up_total

    Counter

    Total number of times the upstream connection backed up and paused reads from downstream

    upstream_flow_control_drained_total

    Counter

    Total number of times the upstream connection drained and resumed reads from downstream

    upstream_internal_redirect_failed_total

    Counter

    Total number of times failed internal redirects resulted in redirects being passed downstream.

    upstream_internal_redirect_succeeded_total

    Counter

    Total number of times internal redirects resulted in a second upstream request.

    membership_change

    Counter

    Total cluster membership changes

    membership_healthy

    Gauge

    Current cluster healthy total (inclusive of both health checking and outlier detection)

    membership_degraded

    Gauge

    Current cluster degraded total

    membership_excluded

    Gauge

    Current cluster excluded total

    membership_total

    Gauge

    Current cluster membership total

    retry_or_shadow_abandoned

    Counter

    Total number of times shadowing or retry buffering was canceled due to buffer limits

    config_reload

    Counter

    Total API fetches that resulted in a config reload due to a different config

    update_attempt

    Counter

    Total attempted cluster membership updates by service discovery

    update_success

    Counter

    Total successful cluster membership updates by service discovery

    update_failure

    Counter

    Total failed cluster membership updates by service discovery

    update_duration

    Histogram

    Amount of time spent updating configs

    update_empty

    Counter

    Total cluster membership updates ending with empty cluster load assignment and continuing with previous config

    update_no_rebuild

    Counter

    Total successful cluster membership updates that didn’t result in any cluster load balancing structure rebuilds

    version

    Gauge

    Hash of the contents from the last successful API fetch

    max_host_weight

    Gauge

    Maximum weight of any host in the cluster

    bind_errors

    Counter

    Total errors binding the socket to the configured source address

    assignment_timeout_received

    Counter

    Total assignments received with endpoint lease information.

    assignment_stale

    Counter

    Number of times the received assignments went stale before new assignments arrived.
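The counters and gauges above are usually most useful when combined. The following is a minimal sketch, assuming the statistics are available as a plain-text dump with one `name: value` pair per line (the shape of the sample text is an assumption of this example). The helper names `parse_stats` and `connect_failure_ratio` are hypothetical, and the ratio assumes that upstream_cx_total also counts connections that later failed to establish.

```python
def parse_stats(text: str) -> dict[str, int]:
    """Parse 'name: value' lines into a dict, keeping integer values only."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'name: value' line
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            continue  # skip histograms and other non-integer values
    return stats


def connect_failure_ratio(stats: dict[str, int], cluster: str) -> float:
    """Fraction of upstream connections that failed to connect."""
    prefix = f"cluster.{cluster}."
    total = stats.get(prefix + "upstream_cx_total", 0)
    failed = stats.get(prefix + "upstream_cx_connect_fail", 0)
    return failed / total if total else 0.0


if __name__ == "__main__":
    sample = (
        "cluster.backend.upstream_cx_total: 10\n"
        "cluster.backend.upstream_cx_connect_fail: 2\n"
    )
    print(connect_failure_ratio(parse_stats(sample), "backend"))
```

Because counters only ever increase, a ratio computed this way covers the whole process lifetime; comparing two successive dumps gives a rate over an interval instead.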

    HTTP/3 protocol statistics

    HTTP/3 protocol stats are global with the following statistics:

    Name

    Type

    Description

    upstream.<tx/rx>.quic_connection_close_error_code_<error_code>

    Counter

    A collection of counters that are lazily initialized to record each QUIC connection close’s error code.

    upstream.<tx/rx>.quic_reset_stream_error_code_<error_code>

    Counter

    A collection of counters that are lazily initialized to record each QUIC stream reset error code.

    Health check statistics

    If health check is configured, the cluster has an additional statistics tree rooted at cluster.<name>.health_check. with the following statistics:

    Name

    Type

    Description

    attempt

    Counter

    Number of health checks

    success

    Counter

    Number of successful health checks

    failure

    Counter

    Number of immediately failed health checks (e.g. HTTP 503) as well as network failures

    passive_failure

    Counter

    Number of health check failures due to passive events (e.g. x-envoy-immediate-health-check-fail)

    network_failure

    Counter

    Number of health check failures due to network error

    verify_cluster

    Counter

    Number of health checks that attempted cluster name verification

    healthy

    Gauge

    Number of healthy members

    Outlier detection statistics

    If outlier detection is configured for a cluster, statistics will be rooted at cluster.<name>.outlier_detection. and contain the following:

    Name

    Type

    Description

    ejections_enforced_total

    Counter

    Number of enforced ejections due to any outlier type

    ejections_active

    Gauge

    Number of currently ejected hosts

    ejections_overflow

    Counter

    Number of ejections aborted due to the max ejection %

    ejections_enforced_consecutive_5xx

    Counter

    Number of enforced consecutive 5xx ejections

    ejections_detected_consecutive_5xx

    Counter

    Number of detected consecutive 5xx ejections (even if unenforced)

    ejections_enforced_success_rate

    Counter

    Number of enforced success rate outlier ejections. The exact meaning of this counter depends on the outlier_detection.split_external_local_origin_errors config item. Refer to the outlier detection documentation for details.

    ejections_detected_success_rate

    Counter

    Number of detected success rate outlier ejections (even if unenforced). The exact meaning of this counter depends on the outlier_detection.split_external_local_origin_errors config item. Refer to the outlier detection documentation for details.

    ejections_enforced_consecutive_gateway_failure

    Counter

    Number of enforced consecutive gateway failure ejections

    ejections_detected_consecutive_gateway_failure

    Counter

    Number of detected consecutive gateway failure ejections (even if unenforced)

    ejections_enforced_consecutive_local_origin_failure

    Counter

    Number of enforced consecutive local origin failure ejections

    ejections_detected_consecutive_local_origin_failure

    Counter

    Number of detected consecutive local origin failure ejections (even if unenforced)

    ejections_enforced_local_origin_success_rate

    Counter

    Number of enforced success rate outlier ejections for locally originated failures

    ejections_detected_local_origin_success_rate

    Counter

    Number of detected success rate outlier ejections for locally originated failures (even if unenforced)

    ejections_enforced_failure_percentage

    Counter

    Number of enforced failure percentage outlier ejections. The exact meaning of this counter depends on the outlier_detection.split_external_local_origin_errors config item. Refer to the outlier detection documentation for details.

    ejections_detected_failure_percentage

    Counter

    Number of detected failure percentage outlier ejections (even if unenforced). The exact meaning of this counter depends on the outlier_detection.split_external_local_origin_errors config item. Refer to the outlier detection documentation for details.

    ejections_enforced_failure_percentage_local_origin

    Counter

    Number of enforced failure percentage outlier ejections for locally originated failures

    ejections_detected_failure_percentage_local_origin

    Counter

    Number of detected failure percentage outlier ejections for locally originated failures (even if unenforced)

    ejections_total

    Counter

    Deprecated. Number of ejections due to any outlier type (even if unenforced)

    ejections_consecutive_5xx

    Counter

    Deprecated. Number of consecutive 5xx ejections (even if unenforced)
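Since each outlier type has both a "detected" and an "enforced" counter, their difference indicates how many detected ejections were not enforced. The sketch below is one way to compute that, assuming the stats have already been loaded into a dict keyed by full stat name. The function name `unenforced_ejections` and the dict layout are hypothetical.

```python
# Outlier types that have both ejections_detected_* and ejections_enforced_*
# counters in the table above.
OUTLIER_TYPES = [
    "consecutive_5xx",
    "success_rate",
    "consecutive_gateway_failure",
    "consecutive_local_origin_failure",
    "local_origin_success_rate",
    "failure_percentage",
    "failure_percentage_local_origin",
]


def unenforced_ejections(stats: dict[str, int], cluster: str) -> dict[str, int]:
    """Return detected-but-not-enforced ejection counts per outlier type."""
    prefix = f"cluster.{cluster}.outlier_detection."
    result = {}
    for kind in OUTLIER_TYPES:
        detected = stats.get(f"{prefix}ejections_detected_{kind}", 0)
        enforced = stats.get(f"{prefix}ejections_enforced_{kind}", 0)
        # Clamp at zero in case the two counters were sampled at slightly
        # different moments.
        result[kind] = max(detected - enforced, 0)
    return result


if __name__ == "__main__":
    sample = {
        "cluster.backend.outlier_detection.ejections_detected_consecutive_5xx": 7,
        "cluster.backend.outlier_detection.ejections_enforced_consecutive_5xx": 5,
    }
    print(unenforced_ejections(sample, "backend")["consecutive_5xx"])
```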

    Circuit breakers statistics will be rooted at cluster.<name>.circuit_breakers.<priority>. and contain the following:

    If timeout budget statistic tracking is turned on, statistics will be added to cluster.<name> and contain the following:

    Name

    Type

    Description

    upstream_rq_timeout_budget_percent_used

    Histogram

    What percentage of the global timeout was used waiting for a response

    upstream_rq_timeout_budget_per_try_percent_used

    Histogram

    What percentage of the per try timeout was used waiting for a response

    Dynamic HTTP statistics

    If HTTP is used, dynamic HTTP response code statistics are also available. These are emitted by various internal systems as well as some filters such as the router filter and rate limit filter. They are rooted at cluster.<name>. and contain the following statistics:

    Name

    Type

    Description

    upstream_rq_completed

    Counter

    Total upstream requests completed

    upstream_rq_<*xx>

    Counter

    Aggregate HTTP response codes (e.g., 2xx, 3xx, etc.)

    upstream_rq_<*>

    Counter

    Specific HTTP response codes (e.g., 201, 302, etc.)

    upstream_rq_time

    Histogram

    Request time milliseconds

    canary.upstream_rq_completed

    Counter

    Total upstream canary requests completed

    canary.upstream_rq_<*xx>

    Counter

    Upstream canary aggregate HTTP response codes

    canary.upstream_rq_<*>

    Counter

    Upstream canary specific HTTP response codes

    canary.upstream_rq_time

    Histogram

    Upstream canary request time milliseconds

    internal.upstream_rq_completed

    Counter

    Total internal origin requests completed

    internal.upstream_rq_<*xx>

    Counter

    Internal origin aggregate HTTP response codes

    internal.upstream_rq_<*>

    Counter

    Internal origin specific HTTP response codes

    internal.upstream_rq_time

    Histogram

    Internal origin request time milliseconds

    external.upstream_rq_completed

    Counter

    Total external origin requests completed

    external.upstream_rq_<*xx>

    Counter

    External origin aggregate HTTP response codes

    external.upstream_rq_<*>

    Counter

    External origin specific HTTP response codes

    external.upstream_rq_time

    Histogram

    External origin request time milliseconds
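To make the dynamic naming concrete, the sketch below lists the counters that a single response code maps onto: the completed counter, the aggregate class counter (for example 5xx), and the specific code counter (for example 503). The function name `response_code_stats` and its `origin` parameter are hypothetical; `origin` stands for one of the optional canary., internal., or external. prefixes from the table.

```python
def response_code_stats(cluster: str, code: int, origin: str = "") -> list[str]:
    """Return the dynamic HTTP counter names touched by one response code.

    `origin` may be "", "canary", "internal", or "external".
    """
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    if origin not in ("", "canary", "internal", "external"):
        raise ValueError(f"unknown origin prefix: {origin!r}")
    root = f"cluster.{cluster}."
    if origin:
        root += origin + "."
    return [
        root + "upstream_rq_completed",
        f"{root}upstream_rq_{code // 100}xx",  # aggregate class, e.g. 5xx
        f"{root}upstream_rq_{code}",           # specific code, e.g. 503
    ]


if __name__ == "__main__":
    for name in response_code_stats("backend", 503, origin="canary"):
        print(name)
```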

    TLS statistics

    If TLS is used by the cluster the following statistics are rooted at cluster.<name>.ssl.:

    Name

    Type

    Description

    connection_error

    Counter

    Total TLS connection errors not including failed certificate verifications

    handshake

    Counter

    Total successful TLS connection handshakes

    session_reused

    Counter

    Total successful TLS session resumptions

    no_certificate

    Counter

    Total successful TLS connections with no client certificate

    fail_verify_no_cert

    Counter

    Total TLS connections that failed because of missing client certificate

    fail_verify_error

    Counter

    Total TLS connections that failed CA verification

    fail_verify_san

    Counter

    Total TLS connections that failed SAN verification

    fail_verify_cert_hash

    Counter

    Total TLS connections that failed certificate pinning verification

    ocsp_staple_failed

    Counter

    Total TLS connections that failed compliance with the OCSP policy

    ocsp_staple_omitted

    Counter

    Total TLS connections that succeeded without stapling an OCSP response

    ocsp_staple_responses

    Counter

    Total TLS connections where a valid OCSP response was available (irrespective of whether the client requested stapling)

    ocsp_staple_requests

    Counter

    Total TLS connections where the client requested an OCSP staple

    ciphers.<cipher>

    Counter

    Total successful TLS connections that used cipher <cipher>

    curves.<curve>

    Counter

    Total successful TLS connections that used ECDHE curve <curve>

    sigalgs.<sigalg>

    Counter

    Total successful TLS connections that used signature algorithm <sigalg>

    versions.<version>

    Counter

    Total successful TLS connections that used protocol version <version>

    Alternate tree dynamic HTTP statistics

    If alternate tree statistics are configured, they will be present in the cluster.<name>.<alt name>. namespace. The statistics produced are the same as documented in the dynamic HTTP statistics section above.

    If the service zone is available for the local service (via --service-zone) and the upstream cluster, Envoy will track the following statistics in the cluster.<name>.zone.<from_zone>.<to_zone>. namespace.

    Name

    Type

    Description

    upstream_rq_<*xx>

    Counter

    Aggregate HTTP response codes (e.g., 2xx, 3xx, etc.)

    upstream_rq_<*>

    Counter

    Specific HTTP response codes (e.g., 201, 302, etc.)

    upstream_rq_time

    Histogram

    Request time milliseconds

    Statistics for monitoring load balancer decisions. Stats are rooted at cluster.<name>. and contain the following statistics:

    Statistics for monitoring load balancer subset decisions. Stats are rooted at cluster.<name>. and contain the following statistics:

    Name

    Type

    Description

    lb_subsets_active

    Gauge

    Number of currently available subsets

    lb_subsets_created

    Counter

    Number of subsets created

    lb_subsets_removed

    Counter

    Number of subsets removed due to no hosts

    lb_subsets_selected

    Counter

    Number of times any subset was selected for load balancing

    lb_subsets_fallback

    Counter

    Number of times the fallback policy was invoked

    lb_subsets_fallback_panic

    Counter

    Number of times the subset panic mode triggered

    lb_subsets_single_host_per_subset_duplicate

    Gauge

    Number of duplicate (unused) hosts when using single_host_per_subset

    Statistics for monitoring the size and effective distribution of hashes when using the ring hash load balancer. Stats are rooted at cluster.<name>.ring_hash_lb. and contain the following statistics:

    Name

    Type

    Description

    size

    Gauge

    Total number of host hashes on the ring

    min_hashes_per_host

    Gauge

    Minimum number of hashes for a single host

    max_hashes_per_host

    Gauge

    Maximum number of hashes for a single host
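One way to use these gauges is to compare the largest and smallest per-host hash counts as a rough indicator of how unevenly the ring is populated. This is only a sketch: the function name `ring_hash_imbalance` is hypothetical, and the stats are assumed to be available as a dict keyed by full stat name. Hosts with different weights are expected to produce a ratio above 1, so the value needs to be read against the configured weights.

```python
from typing import Optional


def ring_hash_imbalance(stats: dict[str, int], cluster: str) -> Optional[float]:
    """Ratio of max to min hashes per host, or None if it cannot be computed."""
    prefix = f"cluster.{cluster}.ring_hash_lb."
    min_hashes = stats.get(prefix + "min_hashes_per_host", 0)
    max_hashes = stats.get(prefix + "max_hashes_per_host", 0)
    if min_hashes <= 0:
        return None  # gauge missing or ring empty
    return max_hashes / min_hashes


if __name__ == "__main__":
    sample = {
        "cluster.backend.ring_hash_lb.size": 1024,
        "cluster.backend.ring_hash_lb.min_hashes_per_host": 100,
        "cluster.backend.ring_hash_lb.max_hashes_per_host": 150,
    }
    print(ring_hash_imbalance(sample, "backend"))
```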

    Maglev load balancer statistics

    Statistics for monitoring effective host weights when using the Maglev load balancer. Stats are rooted at cluster.<name>.maglev_lb. and contain the following statistics:

    Name

    Type

    Description

    min_entries_per_host

    Gauge

    Minimum number of entries for a single host

    max_entries_per_host

    Gauge

    Maximum number of entries for a single host

    If request response size statistics are tracked, statistics will be added to cluster.<name> and contain the following:

    Name

    Type

    Description

    upstream_rq_headers_size

    Histogram

    Request headers size in bytes per upstream

    upstream_rq_body_size

    Histogram

    Request body size in bytes per upstream

    upstream_rs_headers_size

    Histogram

    Response headers size in bytes per upstream

    upstream_rs_body_size

    Histogram