If is used as IndexType, the index block is a 2nd level index on index partitions, i.e., each entry points to another index block that contains one entry per data block. In this case, the format will be

As described above, the key stored for a block lies between the last key of that block and the first key of the next block. There is usually a range of keys that meet this condition, and choosing a shorter one reduces the index size. If BlockBasedTableOptions.index_shortening is set to kShortenSeparators or kShortenSeparatorsAndSuccessor, the last key of the block and the first key of the next block are passed to Comparator::FindShortestSeparator() to find a short separator key. This function is implemented in the built-in byte-wise and reverse byte-wise comparators. User-defined comparators need to implement it to take advantage of this feature.

Similarly, the index key for the last block is determined by Comparator::FindShortSuccessor(), which returns a key that is greater than or equal to the last key of the last block. Custom comparators also need to implement this function to benefit from the memory savings.
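As a rough illustration of what these two functions do under byte-wise ordering, the standalone C++ sketch below shortens a separator and a successor. The names `ShortestSeparator` and `ShortSuccessor` are hypothetical helpers, not the RocksDB API. The sketch works on plain user keys, ignores the sequence number part of the index key, and covers only the simplest shortening case, so it should not be read as the built-in comparator's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns a key `sep` with start <= sep < limit under
// byte-wise ordering, shortened when the simple rule below applies.
// Falls back to returning `start` unchanged.
std::string ShortestSeparator(const std::string& start,
                              const std::string& limit) {
  const std::size_t min_len = std::min(start.size(), limit.size());
  std::size_t diff = 0;
  while (diff < min_len && start[diff] == limit[diff]) ++diff;
  if (diff >= min_len) return start;  // one key is a prefix of the other

  const auto s = static_cast<std::uint8_t>(start[diff]);
  const auto l = static_cast<std::uint8_t>(limit[diff]);
  if (s < 0xff && s + 1 < l) {
    // Bump the first differing byte and drop everything after it.
    std::string sep = start.substr(0, diff + 1);
    sep[diff] = static_cast<char>(s + 1);
    return sep;
  }
  return start;  // no simple shortening available
}

// Hypothetical helper: returns a short key that is >= `key`.
std::string ShortSuccessor(const std::string& key) {
  for (std::size_t i = 0; i < key.size(); ++i) {
    const auto b = static_cast<std::uint8_t>(key[i]);
    if (b != 0xff) {
      // Increment the first byte that can be incremented, then truncate.
      std::string out = key.substr(0, i + 1);
      out[i] = static_cast<char>(b + 1);
      return out;
    }
  }
  return key;  // every byte is 0xff, so the key is left as-is
}
```

With these helpers, a block ending in `abcdefg` followed by a block starting with `abzzz` could be indexed under the three-byte key `abd` instead of the full seven-byte last key.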

Value kNoShortening can also be used together with to prevent reading some blocks using a special function, which is explained below.

Up to RocksDB version 5.14 (BlockBasedTableOptions::format_version=2), index and data blocks share the same format: index blocks use the same <user_key,seq> key format, but with special values, <offset,size>, that point to data blocks. Unlike data blocks, the option controlling the restart interval of index blocks is BlockBasedTableOptions.index_block_restart_interval rather than BlockBasedTableOptions.block_restart_interval. Its default value is 1, compared with 16 for data blocks, so the default is relatively costly in memory. Setting the value to 8 or 16 can usually shrink the index block size by half, but the CPU overhead might increase depending on the workload. format_version=3 and format_version=4 further reduce the index block size, at the cost of a forward-incompatible index block format.
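To see why a larger restart interval shrinks the block, the following C++ sketch estimates the bytes spent on keys for a given interval. It is a simplified size model, not RocksDB code: `EstimateKeyBytes` is a hypothetical helper, the two length fields are assumed to fit in one byte each, each restart point is assumed to cost one 4-byte offset, and the values are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length of the common prefix of two keys.
std::size_t SharedPrefixLen(const std::string& a, const std::string& b) {
  const std::size_t n = std::min(a.size(), b.size());
  std::size_t i = 0;
  while (i < n && a[i] == b[i]) ++i;
  return i;
}

// Hypothetical size model for the key part of a prefix-compressed block.
// Assumes key lengths below 128 so each length field takes one byte.
std::size_t EstimateKeyBytes(const std::vector<std::string>& keys,
                             std::size_t restart_interval) {
  assert(restart_interval > 0);
  std::size_t total = 0;
  for (std::size_t i = 0; i < keys.size(); ++i) {
    const bool restart = (i % restart_interval == 0);
    // A restart entry stores the full key; other entries store only the
    // suffix that differs from the previous key.
    const std::size_t shared =
        restart ? 0 : SharedPrefixLen(keys[i - 1], keys[i]);
    total += 2 + (keys[i].size() - shared);  // shared_size, non_shared_size, suffix
    if (restart) total += 4;                 // one restart offset
  }
  return total;
}
```

For four keys `user0001` to `user0004`, this model gives 56 bytes with a restart interval of 1 and 23 bytes with an interval of 4. The trade-off is that a lookup has to scan linearly within a restart interval after the binary search over restart points.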

  • format_version=4 (since RocksDB 5.16): Changes the format of index blocks by delta-encoding the index values, which are the block handles. This saves encoding BlockHandle::offset for the non-head index entries in each restart interval. If used, TableProperties::index_value_is_delta_encoded is set, which tells the reader how to decode the index block. The format of each key is (shared_size, non_shared_size, shared, non_shared). The format of each value, i.e., block handle, is (offset, size) whenever shared_size is 0, which includes the first entry in each restart interval. Otherwise the format is delta-size = block handle size - size of last block handle.
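The delta encoding of block handles can be sketched in C++ as follows. This is an illustrative model rather than the on-disk byte layout: the `StoredValue` struct and the `DeltaEncode`/`DeltaDecode` helpers are hypothetical, only restart heads are stored in full (a simplification of the shared_size == 0 rule), and the decoder assumes that data blocks are laid out back to back, each followed by a fixed-size trailer whose length is passed in as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct BlockHandle {
  std::uint64_t offset;
  std::uint64_t size;
};

// Hypothetical in-memory view of one stored index value.
struct StoredValue {
  bool is_full;             // true for restart heads: full (offset, size)
  BlockHandle full;         // meaningful only when is_full
  std::int64_t delta_size;  // size - previous size, when !is_full
};

std::vector<StoredValue> DeltaEncode(const std::vector<BlockHandle>& handles,
                                     std::size_t restart_interval) {
  assert(restart_interval > 0);
  std::vector<StoredValue> out;
  for (std::size_t i = 0; i < handles.size(); ++i) {
    if (i % restart_interval == 0) {
      out.push_back({true, handles[i], 0});
    } else {
      const auto delta = static_cast<std::int64_t>(handles[i].size) -
                         static_cast<std::int64_t>(handles[i - 1].size);
      out.push_back({false, BlockHandle{0, 0}, delta});
    }
  }
  return out;
}

// Assumption: block i+1 starts right after block i plus `trailer_size` bytes,
// so the offset can be rebuilt from the previous handle.
std::vector<BlockHandle> DeltaDecode(const std::vector<StoredValue>& stored,
                                     std::uint64_t trailer_size) {
  std::vector<BlockHandle> out;
  for (const StoredValue& v : stored) {
    if (v.is_full) {
      out.push_back(v.full);
      continue;
    }
    assert(!out.empty());
    const BlockHandle prev = out.back();
    BlockHandle h;
    h.offset = prev.offset + prev.size + trailer_size;
    h.size = static_cast<std::uint64_t>(
        static_cast<std::int64_t>(prev.size) + v.delta_size);
    out.push_back(h);
  }
  return out;
}
```

Because consecutive data blocks tend to have similar sizes, the stored delta is usually a small number, which is where the space saving comes from.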

The index format in format_version=4 would be as follows:

With index_type == kBinarySearchWithFirstKey, RocksDB can see the first key of a data block without reading the block from disk. Because the index already holds the first key of the block, the data block does not have to be read immediately; it is loaded only when the user calls Iterator::value(). This can avoid I/O and other overhead for some special workloads. For example, when a single entry usually occupies a whole data block, a Get() can skip data block reads for files that do not contain the key.
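The deferred read can be pictured with a small C++ sketch. The `IndexEntry` struct and `LazyBlockCursor` class are hypothetical and much simpler than the real iterator: the loader callback stands in for the disk read and returns only the value of the block's first entry.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical index entry that carries the first key of its data block.
struct IndexEntry {
  std::string first_key;
  std::uint64_t offset;
  std::uint64_t size;
};

// Hypothetical cursor positioned at the first entry of a data block.
class LazyBlockCursor {
 public:
  using Loader = std::function<std::string(const IndexEntry&)>;

  LazyBlockCursor(IndexEntry entry, Loader loader)
      : entry_(std::move(entry)), loader_(std::move(loader)) {}

  // Served from the index entry alone; no block read is needed.
  const std::string& key() const { return entry_.first_key; }

  // The first call triggers the (simulated) block read; later calls reuse it.
  const std::string& value() {
    if (!loaded_) {
      value_ = loader_(entry_);
      loaded_ = true;
    }
    return value_;
  }

 private:
  IndexEntry entry_;
  Loader loader_;
  std::string value_;
  bool loaded_ = false;
};
```

A caller that only compares keys, for example to decide that a file cannot contain the key it is looking for, never invokes the loader.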

If this option is used, each entry in the index block stores the first key of the data block after the (offset, size) part, as a length-prefixed string. For example, with format_version=5 the format is:
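A minimal C++ sketch of such an index value, assuming the offset, size, and key length are each written as a variable-length integer (7 bits per byte, least significant group first) and the handle is stored in full rather than delta-encoded. `PutVarint64` and `EncodeIndexValueWithFirstKey` are written here for illustration and are not the library's functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Appends `v` as a variable-length integer: 7 payload bits per byte,
// high bit set on every byte except the last.
void PutVarint64(std::string* dst, std::uint64_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Hypothetical encoder: (offset, size) followed by the length-prefixed
// first key of the data block.
std::string EncodeIndexValueWithFirstKey(std::uint64_t offset,
                                         std::uint64_t size,
                                         const std::string& first_key) {
  std::string out;
  PutVarint64(&out, offset);
  PutVarint64(&out, size);
  PutVarint64(&out, first_key.size());  // length prefix
  out.append(first_key);
  return out;
}
```

Under these assumptions, a block at offset 300 with size 5 and first key `abc` encodes to seven bytes: `AC 02` for the offset, `05` for the size, `03` for the key length, then the three key bytes.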

The layout is similar for other format versions and restart intervals.