Estimate Vitals Storage in PostgreSQL

    Kong Gateway node statistics include metrics such as proxy latency, upstream latency, and cache hits and misses. They are stored in tables like the following:

    • stores 1 new row for every second Kong runs
    • vitals_stats_days stores 1 new row for every day Kong runs

    Kong Gateway node statistics are not associated with specific Kong Gateway entities like Workspaces, Services, or Routes. They are designed to represent the cluster's state over time, so these tables gain new rows whether Kong Gateway is routing traffic or idle.

    These tables do not grow indefinitely. Each one holds data for a fixed period:

    • vitals_stats_seconds_timestamp holds data for 1 hour (3600 rows)
    • vitals_stats_minutes holds data for 25 hours (1500 rows)
    • vitals_stats_days holds data for 2 years (730 rows)
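    Each table keeps a fixed window of data at a fixed granularity. Its steady-state size is therefore the retention window divided by the row interval: 25 hours at one row per minute is 1500 rows. Below is a minimal sketch of that arithmetic, assuming one row per period as described above. The RETENTION mapping and steady_state_rows() are illustrative helpers, not part of Kong, and 2 years is approximated as 730 days.

```python
from datetime import timedelta

# Illustrative summary of the retention rules above:
# table name -> (time covered by one row, retention window)
RETENTION = {
    "vitals_stats_seconds_timestamp": (timedelta(seconds=1), timedelta(hours=1)),
    "vitals_stats_minutes": (timedelta(minutes=1), timedelta(hours=25)),
    "vitals_stats_days": (timedelta(days=1), timedelta(days=730)),  # ~2 years
}


def steady_state_rows(granularity: timedelta, retention: timedelta) -> int:
    """Rows a table holds once Kong has run longer than the retention window."""
    # timedelta // timedelta yields an int: how many periods fit in the window.
    return retention // granularity


for table, (granularity, retention) in RETENTION.items():
    print(f"{table}: {steady_state_rows(granularity, retention)} rows")
```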

    Request response codes are stored in a second group of tables, which grows in a different way. Tables in this group share the same structure (entity_id, at, duration, status_code, count):

    • vitals_code_classes_by_workspace
    • vitals_code_classes_by_cluster
    • vitals_codes_by_route

    The entity_id column does not exist in vitals_code_classes_by_cluster, since this table doesn't store entity-specific information. In the vitals_code_classes_by_workspace table, entity_id identifies the workspace. In the vitals_codes_by_route table, the role of entity_id is played by the pair service_id and route_id.

    at is a timestamp marking the start of the period a row represents, and duration is the length of that period.
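    One way to picture this shared layout is as a record whose identifying columns are everything except count. The sketch below models a row of vitals_code_classes_by_cluster, which has no entity_id. The CodeRow class and its key() method are hypothetical helpers for illustration, not Kong's schema definition, and the duration unit (seconds) is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CodeRow:
    """Hypothetical in-memory model of one cluster-level status code row."""
    at: datetime       # start of the period this row covers
    duration: int      # assumed: length of the period in seconds (1, 60, or 86400)
    status_code: int   # status code being counted
    count: int = 0     # requests seen in this period with this status code

    def key(self) -> tuple:
        # Every column except count identifies the row; count is what gets updated.
        return (self.at, self.duration, self.status_code)


row = CodeRow(at=datetime(2021, 1, 1, 20, 21), duration=60, status_code=200, count=1)
print(row.key())
```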

    Kong Gateway node statistic tables grow only with time. Status code tables, by contrast, gain new rows only when Kong Gateway proxies traffic, and how many rows they gain depends on that traffic.

    Consider a brand-new Kong Gateway that hasn't proxied any traffic yet. Its node statistic tables already have rows, but its status code tables are empty.

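    When Kong Gateway proxies its first request at time t and it returns status code 200, each status code table gets three new rows: one for second(t), one for minute(t), and one for day(t). Each of these rows has status_code 200 and count 1.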

    Second, minute, and day timestamps are obtained by trimming t as follows:

    • minute(t) is t trimmed to minutes, for example: minute(2021-01-01 20:21:30.234) would be 2021-01-01 20:21:00.
    • is t trimmed to day, for example: day(2021-01-01 20:21:30.234) would be 2021-01-01 00:00:00.
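    The trimming rules can be expressed with Python's datetime.replace. This sketch shows the semantics only and assumes naive timestamps. It does not claim to be how Kong computes these values; in PostgreSQL, date_trunc('minute', ts) plays the same role.

```python
from datetime import datetime


def second(t: datetime) -> datetime:
    """Trim t to the start of its second."""
    return t.replace(microsecond=0)


def minute(t: datetime) -> datetime:
    """Trim t to the start of its minute."""
    return t.replace(second=0, microsecond=0)


def day(t: datetime) -> datetime:
    """Trim t to midnight of its day."""
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


t = datetime(2021, 1, 1, 20, 21, 30, 234000)
print(second(t))  # 2021-01-01 20:21:30
print(minute(t))  # 2021-01-01 20:21:00
print(day(t))     # 2021-01-01 00:00:00
```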

    Consider what happens when further requests are proxied in a few scenarios.

    If the same request is made again at the same t and also receives 200, no new rows are inserted. Instead, the count in each existing row is incremented.

    If that request instead receives a 500 status code, new rows are inserted, because rows are distinguished by status_code as well as by period.

    Now assume that at t + 5s, where minute(t) == minute(t + 5s), Kong Gateway proxies the same request and it returns 200. Since minute() and day() are the same for t and t + 5s, the existing minute and day rows are updated. Since second() differs between the two instants, a new second row is inserted in each table.
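    These scenarios amount to an upsert keyed on (at, duration, status_code). The in-memory sketch below replays them against a model of the cluster-level table, which has no entity_id. The record() helper and the Counter-based table are illustrative assumptions, not Kong's actual write path. The sketch treats the 500 request and the t + 5s request as arriving after the two 200 requests, in sequence.

```python
from collections import Counter
from datetime import datetime, timedelta

DURATIONS = (1, 60, 86400)  # assumed period lengths in seconds: second, minute, day


def trim(t: datetime, duration: int) -> datetime:
    """Trim t to the start of the second, minute, or day it falls in."""
    if duration == 1:
        return t.replace(microsecond=0)
    if duration == 60:
        return t.replace(second=0, microsecond=0)
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


def record(table: Counter, t: datetime, status_code: int) -> None:
    """Insert a row for a new (at, duration, status_code) key, else bump its count."""
    for duration in DURATIONS:
        table[(trim(t, duration), duration, status_code)] += 1


table = Counter()  # hypothetical model of the cluster table: row key -> count
t = datetime(2021, 1, 1, 20, 21, 30, 234000)

record(table, t, 200)
print(len(table))  # 3: one second, one minute, and one day row
record(table, t, 200)
print(len(table))  # 3: existing counts are incremented
record(table, t, 500)
print(len(table))  # 6: a new status code needs its own three rows
record(table, t + timedelta(seconds=5), 200)
print(len(table))  # 7: only the second row is new
```

    In the last step, the minute and day rows for 200 are updated rather than duplicated. Their counts reach 3.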

    In summary, the number of rows in the status code tables depends directly on:

    • The number of status codes observed in Kong Gateway proxied requests
    • The number of Kong Gateway entities involved in those requests
    • How steadily requests are proxied over time, since every second, minute, or day with traffic adds rows
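    These three factors combine multiplicatively. As a rough model for a single status code table (an assumption consistent with the scenario that follows, not a formula from the Kong documentation), let E be the number of entities the table is keyed on (1 for the cluster table), G the number of distinct status code groups observed, and N_s, N_m, N_d the number of retained second, minute, and day periods in which traffic was proxied:

```latex
R \le E \cdot G \cdot \left(N_s + N_m + N_d\right)
```

    Equality holds under constant traffic, where every entity sees every status code group in every retained period.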

    To estimate row counts in a concrete scenario, consider a Kong Gateway cluster with the following characteristics:

    • A constant flow of requests returning all 5 possible groups of status codes (1xx, 2xx, 3xx, 4xx and 5xx).
    • Just 1 workspace, 1 service, and 1 route

    After 24 hours of traffic, each status code table holds second, minute, and day rows for every status code group observed in the retained periods.

    This assumes that all 5 groups of status codes were observed in those 24 hours of traffic, so each per-period row count is multiplied by 5.

    Suppose this cluster is expanded to 10 workspaces with 1 route each (10 routes total), proxies traffic for 24 hours, and returns all 5 status code groups. Then vitals_code_classes_by_workspace and the route-level table would each have 252,000 rows.
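    The 252,000 figure can be reproduced by multiplying entities, status code groups, and retained periods. This sketch makes two assumptions that this section does not state explicitly: the status code tables keep second rows for 1 hour (3600 rows) and minute rows for the full 24 hours (1440 rows), mirroring the node statistic retention. estimate_rows() is a hypothetical helper. Second and minute rows alone give 10 × 5 × (3600 + 1440) = 252,000, and each calendar day the traffic touches adds another 10 × 5 = 50 day rows.

```python
def estimate_rows(entities: int, status_groups: int,
                  second_rows: int = 3600, minute_rows: int = 1440,
                  day_rows: int = 1) -> int:
    """Estimate rows in one status code table under constant traffic.

    Assumes every entity sees every status code group in every retained
    period. Defaults are hypothetical: 1 hour of second rows, 24 hours of
    minute rows, and traffic within a single calendar day.
    """
    return entities * status_groups * (second_rows + minute_rows + day_rows)


print(estimate_rows(1, 5))               # 25205: 1 workspace / 1 route
print(estimate_rows(10, 5))              # 252050: 10 workspaces or 10 routes
print(estimate_rows(10, 5, day_rows=0))  # 252000: second and minute rows only
```

    Treat the result as an upper bound. Entities or periods without traffic, or fewer observed status code groups, produce fewer rows.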