heartbeat_agg() functions
It’s also possible to combine multiple heartbeat aggregates to determine the overall health of a service. For example, the heartbeat aggregates from a primary and standby server could be combine to see if there was ever a window where both machines were down at the same time.
Hide content
This group of functions uses the two-step aggregation pattern.
Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.
Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.
The two-step aggregation pattern has several advantages:
- More efficient because multiple accessors can reuse the same aggregate
- Easier to reason about performance, because aggregation is separate from final computation
- Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
- Can perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result
To learn more, see the .
warning
This function group includes some experimental functions. Experimental functions might change or be removed in future releases. We do not recommend using them in production. Experimental functions are marked with an Experimental tag.
Aggregate
ExperimentalCreate a liveness aggregate from a set of heartbeats
Accessor
Get the down intervals from a heartbeat_agg
Get the total time dead during a heartbeat aggregate
Adjust a heartbeat aggregate with predecessor information
Get the total time dead from a heartbeat aggregate and predecessor
Get the total time live from a heartbeat aggregate and predecessor
Test if the aggregate has a heartbeat covering a given time
Get the live intervals from a heartbeat_agg
Get the total time live during a heartbeat aggregate
Rollup
ExperimentalCombine multiple heartbeat aggregates
Introduced in Toolkit v1.13.0
Hide content
`
`
agg_start TIMESTAMPTZ,
agg_duration INTERVAL,
heartbeat_liveness INTERVAL
`
) RETURNS HeartbeatAgg
`
This takes a set of heartbeat timestamps and aggregates the liveness state of the underlying system for the specified time range.
Required arguments
Returns
Column | Type | Description |
---|---|---|
heartbeat_agg | HeartbeatAgg | The liveness data for the heartbeated system over the provided interval. |
Examples
Given a table called system_health
with a ping_time
column, construct an aggregate of system liveness for 10 days starting from Jan 1, 2022. This assumes a system is unhealthy if it hasn’t been heard from in a 5 minute window:
SELECT heartbeat_agg(
ping_time,
'01-01-2022 UTC',
'10 days',
'5 min')
FROM system_health;
dead_ranges()
Introduced in Toolkit v1.13.0
Hide content
`
dead_ranges(
`
agg HEARTBEATAGG
`
) RETURNS TABLE (
`
start TIMESTAMPTZ,
end TIMESTAMPTZ
`
)
`
Given a heartbeat aggregate, this will return a set of (starttime, endtime) pairs representing when the underlying system did not have a valid heartbeat during the interval of the aggregate.
Required arguments
Name | Type | Description |
---|---|---|
agg | HeartbeatAgg | A heartbeat aggregate to get the liveness data from. |
Column | Type | Description |
---|---|---|
dead_ranges | TABLE (start TIMESTAMPTZ, end TIMESTAMPTZ) | The (start, end) pairs of when the system was down. |
Examples
Given a table called liveness
containing weekly heartbeat aggregates in column health
with timestamp column date
, we can use the following to get the intervals where the system was down during the week of Jan 9, 2022:
SELECT dead_ranges(health)
FROM liveness
WHERE date = '01-9-2022 UTC'
dead_ranges
-----------------------------------------------------
("2022-01-09 00:00:00+00","2022-01-09 00:00:30+00")
("2022-01-12 15:27:22+00","2022-01-12 15:31:17+00")
downtime()
Introduced in Toolkit v1.13.0
Hide content
`
downtime(
`
agg HEARTBEATAGG
`
) RETURNS INTERVAL
`
Given a heartbeat aggregate, this will sum all the ranges where the system did not have a recent enough heartbeat.
There may appear to be some downtime between the start of the aggregate and the first heartbeat. If there is a heartbeat aggregage covering the previous period, you can use its last heartbeat to correct for this using interpolated_downtime().
Required arguments
Name | Type | Description |
---|---|---|
agg | HeartbeatAgg | A heartbeat aggregate to get the liveness data from. |
Returns
Column | Type | Description |
---|---|---|
downtime | INTERVAL | The sum of all the dead ranges in the aggregate. |
Examples
Given a table called liveness
containing weekly heartbeat aggregates in column health
with timestamp column date
, we can use the following to get the total downtime of the system during the week of Jan 9, 2022:
SELECT downtime(health)
FROM liveness
WHERE date = '01-9-2022 UTC'
downtime
--------
00:04:25
interpolate()
Introduced in Toolkit v1.13.0
Hide content
`
`
agg HEARTBEATAGG,
pred HEARTBEATAGG
`
) RETURNS HEARTBEATAGG
`
This takes a heartbeat aggregate and the aggregate immediately preceding it. It updates the aggregate to include any live ranges that should have been carried over from the last heartbeat in the predecessor, even if there aren’t heartbeats for that range in the interval covered by this aggregate. It returns the updated aggregate, which can then be used with any of the heartbeat aggregate accessors.
Required arguments
Name | Type | Description |
---|---|---|
agg | HeartbeatAgg | A heartbeat aggregate containing liveness data for a particular interval. |
Optional arguments
Returns
Column | Type | Description |
---|---|---|
interpolate | A copy of agg which has been update to include any heartbeat intervals extending past the end of pred . |
Examples
Given a table called liveness
containing weekly heartbeat aggregates in column health
with timestamp column date
, we can use the following to get the intervals where the system was unhealthy during the week of Jan 9, 2022. This correctly excludes any ranges covered by a heartbeat at the end of the Jan 2 week:
SELECT dead_ranges(
interpolate(
health,
LAG(health) OVER (ORDER BY date)
)
)
FROM liveness
WHERE date = '01-9-2022 UTC'
dead_ranges
-----------------------------------------------------
("2022-01-12 15:27:22+00","2022-01-12 15:31:17+00")
Introduced in Toolkit v1.13.0
Hide content
`
interpolated_downtime(
`
agg HEARTBEATAGG,
pred HEARTBEATAGG
`
) RETURNS INTERVAL
`
This behaves very similarly to , but it also takes the heartbeat aggregate from the preceding interval. It checks when the last heartbeat in the predecessor was received and makes sure not to consider the heartbeat interval after that time as unhealthy, even if it extends into the current aggregate prior to the first heartbeat.
Required arguments
Name | Type | Description |
---|---|---|
agg | HeartbeatAgg | A heartbeat aggregate to get the liveness data from. |
Optional arguments
Name | Type | Description |
---|---|---|
pred | HeartbeatAgg | The heartbeat aggregate for the interval before the one being measured, if one exists. |
Returns
Column | Type | Description |
---|---|---|
interpolated_downtime | INTERVAL | The sum of all the unhealthy ranges in the aggregate, excluding those covered by the last heartbeat of the previous interval. |
Examples
Given a table called liveness
containing weekly heartbeat aggregates in column health
with timestamp column date
, you can use this command to get the total interpolated downtime of the system during the week of Jan 9, 2022:
SELECT interpolated_downtime(
health,
LAG(health) OVER (ORDER BY date)
)
FROM liveness
WHERE date = '01-9-2022 UTC'
interpolated_downtime
---------------------
00:03:55
interpolated_uptime()
Introduced in Toolkit v1.13.0
Hide content
`
interpolated_uptime(
`
agg HEARTBEATAGG,
pred HEARTBEATAGG
`
) RETURNS INTERVAL
`
This behaves very similarly to uptime(), but it also takes the heartbeat aggregate from the preceding interval. It checks when the last heartbeat in the predecessor was received and makes sure that the entire heartbeat interval after that is considered live. This addresses the issue where uptime
would consider the interval between the start of the interval and the first heartbeat as dead.
Required arguments
Name | Type | Description |
---|---|---|
agg | HeartbeatAgg | A heartbeat aggregate to get the liveness data from. |
Optional arguments
Name | Type | Description |
---|---|---|
pred | HeartbeatAgg | The heartbeat aggregate for the interval before the one being measured, if one exists. |
Examples
Given a table called liveness
containing weekly heartbeat aggregates in column health
with timestamp column date
, you can use this command to get the total interpolated uptime of the system during the week of Jan 9, 2022:
SELECT interpolated_uptime(
health,
LAG(health) OVER (ORDER BY date)
)
FROM liveness
WHERE date = '01-9-2022 UTC'
-------------------
6 days 23:56:05
live_at()
Introduced in Toolkit v1.13.0
Hide content
`
live_at(
`
agg HEARTBEATAGG,
test TIMESTAMPTZ
`
`
Given a heartbeat aggregate and a timestamp, this returns whether the aggregate has a heartbeat indicating the system was live at the given time.
Note that this returns false for any time not covered by the aggregate.
Required arguments
Name | Type | Description |
---|---|---|
agg | HeartbeatAgg | A heartbeat aggregate to get the liveness data from. |
test | TimestampTz | The time to test the liveness of. |
Returns
Column | Type | Description |
---|---|---|
live_at | bool | True if the heartbeat aggregate had a heartbeat close before the test time. |
Examples
Given a table called liveness
containing weekly heartbeat aggregates in column health
with timestamp column date
, we can use the following to see if the sytem was live at a particular time:
SELECT live_at(health, '2022-01-12 15:30:00+00')
FROM liveness
WHERE date = '01-9-2022 UTC'
live_at
---------
f
live_ranges()
Introduced in Toolkit v1.13.0
Hide content
`
live_ranges(
`
agg HEARTBEATAGG
`
) RETURNS TABLE (
`
start TIMESTAMPTZ,
end TIMESTAMPTZ
`
)
`
Given a heartbeat aggregate, this returns a set of (starttime, endtime) pairs representing when the underlying system was live during the interval of the aggregate.
Required arguments
Name | Type | Description |
---|---|---|
agg | HeartbeatAgg | A heartbeat aggregate to get the liveness data from. |
Returns
Column | Type | Description |
---|---|---|
live_ranges | TABLE (start TIMESTAMPTZ, end TIMESTAMPTZ) | The (start, end) pairs of when the system was live. |
Examples
Given a table called liveness
containing weekly heartbeat aggregates in column health
with timestamp column date
, we can use the following to get the intervals where the system was live during the week of Jan 9, 2022:
SELECT live_ranges(health)
FROM liveness
WHERE date = '01-9-2022 UTC'
live_ranges
-----------------------------------------------------
("2022-01-09 00:00:30+00","2022-01-12 15:27:22+00")
("2022-01-12 15:31:17+00","2022-01-16 00:00:00+00")
Introduced in Toolkit v1.13.0
Hide content
`
uptime(
`
agg HEARTBEATAGG
`
) RETURNS INTERVAL
`
Given a heartbeat aggregate, this sums all the ranges where the system was live and returns the total.
There may appear to be some downtime between the start of the aggregate and the first heartbeat. If there is a heartbeat aggregage covering the previous period, you can use its last heartbeat to correct for this using .
Required arguments
Name | Type | Description |
---|---|---|
agg | HeartbeatAgg | A heartbeat aggregate to get the liveness data from. |
Returns
Column | Type | Description |
---|---|---|
uptime | INTERVAL | The sum of all the live ranges in the aggregate. |
Examples
Given a table called liveness
containing weekly heartbeat aggregates in column health
with timestamp column date
, you can use this command to get the total uptime of the system during the week of Jan 9, 2022:
SELECT uptime(health)
FROM liveness
WHERE date = '01-9-2022 UTC'
uptime
-----------------
6 days 23:55:35
rollup()
Introduced in Toolkit v1.13.0
Hide content
`
`
heartbeatagg HEARTBEATAGG
`
`
This combines multiple heartbeat aggregates into one. This can be used to combine aggregates into adjacent intervals into one larger interval, such as rolling daily aggregates into a weekly or monthly aggregate.
Another use for this is to combine heartbeat aggregates for redundant systems to determine if there were any overlapping failures. For instance, a master and standby system can have their heartbeats combined to see if there were any intervals where both systems were down at the same time. The result of rolling overlapping heartbeats together like this is a heartbeat aggregate which considers a time live if any of its component aggregates were live.
Returns
Column | Type | Description |
---|---|---|
rollup | HeartbeatAgg | A heartbeat aggregate covering the interval from the earliest start time of its component aggregates to the latest end time. It combines the live ranges of all the components. |