Secondary indexes
Data in YDB tables is always indexed by primary key. This means that fetching any record from a table with the specified values of primary key fields will always take the minimum fixed time, regardless of the total number of records in the table. A primary key index also lets you get any consecutive range of records in ascending or descending order of primary key values. The execution time of this operation will only depend on the number of records received, regardless of the total number of records in the table.
To use similar features for any table fields or combinations thereof, additional indexes called secondary indexes can be built based on them.
Transactional systems use indexes to avoid performance degradation and an increase in the query execution cost if the amount of stored data grows.
This article describes the basic operations with secondary indexes and provides links to detailed information on each operation. For information about different types of secondary indexes and their specifics, see the article in the “Concepts” section.
A secondary index is a data schema object that can be set when creating a table with the YQL CREATE TABLE statement or added to it later with the statement.
The table index add command for creating an index is supported in the YDB CLI.
Since an index contains its own data that is derived from table data, an initial index build operation is performed when creating an index in an existing data table. This may take a long time. This operation is run in the background and is non-blocking for the table. However, you can’t use a new index until its build completes.
Indexes can only be used in the order of their fields. If there are two index fields, and b
, this index can be effectively used for queries like:
where a = $var1 and b = $var2
.where a = $var1
.where a > $var1
, and other comparison operators.where a = $var1 and b > $var2
, and any other comparison operators, but the first field must be checked for equality.
where b = $var1
.- .
Considering the above specifics, it’s useless to try to index all possible combinations of table columns in advance to speed up the execution of any query. An index is always a trade-off between the speed of searching and writing data and the storage space this data takes. Indexes are created for specific selects and criteria for a search that an application will make in the database.
Using secondary indexes when making a Select
To access a table by secondary index, its name must be explicitly specified in the view
section after the table name as described in the article about the YQL SELECT statement. For example, to make a Select from the orders
table for the customer with the specified ID (id_customer
), run a query like this:
whereidx_customer
is the name of a secondary index on the orders
table with the id_customer
field specified first.
If the view
section is omitted, a full scan of the orders
table is performed for making this query.
In transactional applications, such information requests are executed using paginated output. This helps avoid an increase in costs and execution time if the number of records that meet the filtering criteria grows. The approach to making that is described using a primary key as an example is also applicable to columns included in a secondary index.
Any query in a transactional application should be checked in terms of how many I/O operations it performed in the database and how much CPU it used to run. You should also make sure that these metrics do not grow indefinitely with the growth of the DB size. After making every query, YDB returns statistics that you can use to analyze the necessary parameters.
If you use the YDB CLI, select the option to enable statistics output after executing the yql
command. All YDB SDKs also contain structures that provide statistics after making queries. If you run queries in the UI, there is also a tab with statistics next to the results tab.
Updating data using a secondary index
To update data in table1
, run the query:
To delete data by secondary index, use SELECT
with a predicate by secondary index and then call DELETE ON
.
To delete all data about series with zero views from the table, run the query:
Performance of writing data to tables with secondary indexes
Additional data structures are needed for secondary indexes to work. The support of these structures leads to an increase in the cost of table data update operations.
With synchronous index updates, a transaction is only committed after writing all the necessary data, both in the table and in synchronous indexes. As a result, it takes longer to execute transactions and makes it necessary to use distributed transactions even if adding and updating records in the only partition.
Asynchronously updated indexes can still use single-shard transactions. However, they only guarantee eventual consistency and still generate load on the database.