  • Block cache
  • Indexes and bloom filters
  • Blocks pinned by iterators

We will describe each of them in turn.

Block cache is where RocksDB caches uncompressed data blocks. You can configure the block cache's size by setting the block_cache property of BlockBasedTableOptions:
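As a sketch of that configuration (using the public C++ API; the 128 MB cache size is just an example, not a recommendation):

```cpp
#include <rocksdb/cache.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::Options MakeOptions() {
  rocksdb::BlockBasedTableOptions table_options;
  // Example figure only: cache up to 128 MB of uncompressed data blocks.
  table_options.block_cache = rocksdb::NewLRUCache(128 * 1024 * 1024);

  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```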

If a data block is not found in block cache, RocksDB reads it from the file using buffered IO. That means it also uses the page cache, which contains raw compressed blocks. In a way, RocksDB's cache is two-tiered: block cache and page cache. Counterintuitively, decreasing the block cache size will not increase IO: the memory saved will likely go to the page cache, so even more data will be cached. However, CPU usage might grow, because RocksDB needs to decompress the blocks it reads from the page cache.

To learn how much memory the block cache is using, you can call GetUsage() on the block cache object:
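A sketch, assuming the cache object is the one you set on BlockBasedTableOptions:

```cpp
// Number of bytes currently charged to the block cache.
size_t bytes_used = table_options.block_cache->GetUsage();
```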

In MongoRocks, you can get the size of block cache by calling

Here's how you can roughly calculate and manage the sizes of index and filter blocks:

  • For each data block, we store three pieces of information in the index: a key, an offset, and a size. Therefore, there are two ways to reduce the size of the index. If you increase the block size, the number of blocks will decrease, so the index size will also shrink linearly. By default, our block size is 4KB, although we usually run with 16-32KB in production. The second way to reduce the index size is to reduce the key size, although that might not be an option for some use cases.
  • Calculating the size of filter blocks is easy. If you configure bloom filters with 10 bits per key (the default, which gives a 1% false positive rate), the bloom filter size is number_of_keys * 10 bits. There's one trick you can play here, though. If you're certain that Get() will mostly find the key you're looking for, you can set options.optimize_filters_for_hits = true. With this option turned on, we will not build bloom filters on the last level, which contains 90% of the database. Thus, the memory usage for bloom filters will be 10X lower. You will pay one IO for each Get() that doesn't find data in the database, though.

There are two options that configure how many index and filter blocks we fit in memory:

  • If you set cache_index_and_filter_blocks to true, index and filter blocks will be stored in block cache, together with all the other data blocks. This also means they can be paged out. If your access pattern is very local (i.e. you have some very cold key ranges), this setting might make sense. However, in most cases it will hurt your performance, since you need the index and filter to access any given file. Always consider setting pin_l0_filter_and_index_blocks_in_cache as well, to minimize the performance impact.
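The sizing rules above can be turned into a back-of-the-envelope estimate. This is a sketch: the function names and the 16-byte per-entry overhead for the offset and size are illustrative assumptions, not RocksDB APIs:

```cpp
#include <cstdint>

// One index entry (key + block offset + block size) per data block.
inline uint64_t EstimateIndexBytes(uint64_t data_bytes, uint64_t block_bytes,
                                   uint64_t avg_key_bytes) {
  const uint64_t entry_bytes = avg_key_bytes + 16;  // assumed encoding overhead
  return (data_bytes / block_bytes) * entry_bytes;
}

// bits_per_key bits for every key, converted to bytes.
inline uint64_t EstimateFilterBytes(uint64_t num_keys, uint64_t bits_per_key) {
  return num_keys * bits_per_key / 8;
}
```

For example, 100 GB of data in 16KB blocks with 24-byte keys gives roughly 262 MB of index, and a billion keys at 10 bits per key gives 1.25 GB of filters (about a tenth of that if optimize_filters_for_hits skips the last level).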

To estimate the memory used by index and filter blocks that are loaded in memory (outside of block cache), query the rocksdb.estimate-table-readers-mem property:

  std::string out;
  db->GetProperty("rocksdb.estimate-table-readers-mem", &out);

In MongoRocks, just call this API from the mongo shell:

You can think of memtables as in-memory write buffers. Each new key-value pair is first written to a memtable. Memtable size is controlled by the option write_buffer_size. It's usually not a big memory consumer. However, memtable size is inversely proportional to write amplification: the more memory you give to the memtable, the lower the write amplification. If you increase your memtable size, be sure to also increase your L1 size! L1 size is controlled by the option max_bytes_for_level_base.
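A sketch of the two options together (the sizes are examples only, not recommendations):

```cpp
#include <rocksdb/options.h>

void ConfigureMemtables(rocksdb::Options& options) {
  // Example: 128 MB memtables instead of the 64 MB default.
  options.write_buffer_size = 128 * 1024 * 1024;
  // Grow L1 along with the memtable (example figure only).
  options.max_bytes_for_level_base = 512ULL * 1024 * 1024;
}
```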

To get the current memtable size, you can use:

  std::string out;
  db->GetProperty("rocksdb.cur-size-all-mem-tables", &out);

In MongoRocks, the equivalent call is

Since version 5.6, you can charge the memory used by memtables against the block cache budget. Check the Write Buffer Manager documentation for more information.
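A sketch of that setup (sizes are examples; this assumes the WriteBufferManager constructor that accepts a cache, available since that release):

```cpp
#include <memory>
#include <rocksdb/cache.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>
#include <rocksdb/write_buffer_manager.h>

rocksdb::Options MakeBudgetedOptions() {
  auto cache = rocksdb::NewLRUCache(1ULL << 30);  // 1 GB shared budget

  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_cache = cache;

  rocksdb::Options options;
  // Memtable memory (up to 512 MB here) is charged against the same cache.
  options.write_buffer_manager =
      std::make_shared<rocksdb::WriteBufferManager>(512ULL << 20, cache);
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```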

To find out how much block cache memory is pinned and cannot be evicted (for example, by blocks that open iterators are holding on to), call GetPinnedUsage() on the block cache object:

  table_options.block_cache->GetPinnedUsage();