Get faster DISTINCT queries with SkipScan

    To query your database and find the most recent value of an item, you could use a DISTINCT query. For example, you might want to find the latest stock or cryptocurrency price for each of your investments. Or you might have graphs and alarms that repeatedly query the most recent values for every device or service.

    As your tables get larger, DISTINCT queries tend to get slower. This is because PostgreSQL does not currently have a good mechanism for pulling a list of unique values from an ordered index. Even when you have an index that matches the exact order and columns for these kinds of queries, PostgreSQL scans the entire index to find all unique values. As a table grows, this operation keeps getting slower.

    SkipScan allows queries to incrementally jump from one ordered value to the next without reading all of the rows in between. Without support for this feature, the database engine has to scan the entire ordered index and then de-duplicate at the end, which is a much slower process.

    note

    SkipScan is an optimization for queries of the form SELECT DISTINCT ON column_name. Conceptually, SkipScan is a regular IndexScan that skips across an index looking for the next value that is greater than the current value.

    When you issue a query that uses SkipScan, the EXPLAIN output includes a new operator, or node, that can quickly return distinct items from a properly ordered index. With an IndexOnly scan, PostgreSQL has to scan the entire index, but SkipScan incrementally searches for each successive item in the ordered index. As it locates one item, the SkipScan node quickly restarts the search for the next item. This is a much more efficient way of finding distinct items in an ordered index.

    For benchmarking information on how SkipScan compares to regular DISTINCT queries, see our .

    Use SkipScan queries

    Your index must:

    • Contain the column as the first column.
    • Be a BTREE index.
    • Match the ORDER BY used in your query.

    Your query must:

    • Use the DISTINCT keyword on a single column.

    If the DISTINCT column is not the first column of the index, ensure any leading columns are used as constraints in your query. This means that if you are asking a question such as “retrieve a list of unique IDs in order” and “retrieve the last reading of each ID,” you need at least one index like this:

    1. -> Unique
    2. -> Merge Append
    3. Sort Key: _hyper_8_79_chunk.tags_id, _hyper_8_79_chunk."time" DESC
    4. -> Custom Scan (SkipScan) on _hyper_8_79_chunk
    5. -> Index Only Scan using _hyper_8_79_chunk_cpu_tags_id_time_idx on _hyper_8_79_chunk
    6. Index Cond: (tags_id > NULL::integer)
    7. -> Custom Scan (SkipScan) on _hyper_8_80_chunk
    8. -> Index Only Scan using _hyper_8_80_chunk_cpu_tags_id_time_idx on _hyper_8_80_chunk
    9. Index Cond: (tags_id > NULL::integer)