RocksDB Iterator

DBIter is a wrapper around an InternalIterator (In this case a MergingIterator). DBIter‘s job is to parse InternalKeys exposed by the underlying InternalIterator and expose them as user keys.

Example:

The underlying InternalIterator exposed

But what DBIter will expose to the user is

The MergingIterator is composed of many child iterators, MergingIterator is basically a heap for Iterators. In MergingIterator we put all child Iterators in a heap and expose them as one sorted stream.

Example:

The underlying child Iterators exposed

The will keep all child Iterators in a heap and expose them as one sorted stream

This is a wrapper around MemtableRep::Iterator, Every memtable representation implements its own Iterator to expose the keys/values in the memtable as a sorted stream.

A TwoLevelIterator is composed of 2 Iterators

  • First level Iterator (first_level_iter_)
  • Second level Iterator (second_level_iter_)

first_level_iter_ is used to figure out the second_level_iter_ to use, and second_level_iter_ points to the actual data that we are reading.

Example:

RocksDB uses TwoLevelIterator to read SST files, first_level_iter_ is a BlockIter on the SST file Index block and is a BlockIter on a Data block.

Let’s look at this simplified representation of an SST file, we have 4 Data blocks and 1 Index Block

To read this file we will create a TwoLevelIterator with

  • first_level_iter_ => BlockIter over Index block

When we ask our TwoLevelIterator to Seek to KEY8 for example, it will first use first_level_iter_ (BlockIter over Index block) to figure out which block may contain this key. this will lead us to set the second_level_iter_ to be (BlockIter over data block with offset ). We will then use the second_level_iter_ to find our key & value in the data block.