Introduction

Read Path

  1. The mutable memtable is looked up. If a bloom filter is configured for memtable, the filter is probed using either the whole key or prefix. If the result is positive, the memtable rep lookup happens.
  2. If the key was not found, 0 or more immutable memtables are looked up using the same process as #1
  3. Step #3 is repeated for each level, with the only difference in L2 and higher being the fractional cascading for SST file lookup.

MultiGet Performance Optimizations

  1. When is set, filter and index blocks for an SST file are fetched from the block cache on each key lookup. On a system with many threads performing reads, this results in significant lock contention on the LRU mutex. MultiGet looks up the filter and index block in the block cache only once for a whole batch of keys overlapping an SST file key range, thus drastically reducing the LRU mutex contention.
  2. In steps 1, 2 and 3c, CPU cache misses occur due to bloom filter probes. Assuming a database with 6 levels and most keys being found in the bottommost level, with an average of 2 L0 files, we will have ~6 cache misses due to filter lookups in SST files. There may be an additional 1-2 cache misses if memtable bloom filters are configured. By batching the lookups at each stage, the filter cache line accesses can be pipelined, thus hiding the cache miss latency.
  3. In a large database, data block reads are highly likely to require IO. This introduces latency. MultiGet has the capability to issue IO requests for multiple data blocks in the same SST file in parallel, thus reducing latency. This depends on support for parallel reads in the same thread from the underlying implementation. On Linux, has the capability to do parallel IO for MultiGet() using the IO Uring interface. IO Uring is a new asynchronous IO implementation introduced in the Linux kernel starting from 5.1.