Each key-value pair in RocksDB is associated with exactly one Column Family. If there is no Column Family specified, key-value pair is associated with Column Family "default".

Column Families provide a way to logically partition the database. Some interesting properties:

  • Atomic writes across Column Families are supported. This means you can atomically execute Write({cf1, key1, value1}, {cf2, key2, value2}).
  • Consistent view of the database across Column Families.
  • On-the-fly adding new Column Families and dropping them. Both operations are reasonably fast.

Although we needed to make drastic API changes to support Column Families, we still support the old API. You don't need to make any changes to upgrade your application to RocksDB 3.0. All key-value pairs inserted through the old API are inserted into the Column Family "default". The same is true for downgrade after an upgrade. If you never use more than one Column Family, we don't change any disk format, which means you can safely roll back to RocksDB 2.8. This is very important for our customers inside Facebook.

  1. ColumnFamilyHandle

Column Families are handled and referenced with a ColumnFamilyHandle. Think of it as a open file descriptor. You need to delete all ColumnFamilyHandles before you delete your DB pointer. One interesting thing: Even if ColumnFamilyHandle is pointing to a dropped Column Family, you can continue using it. The data is actually deleted only after you delete all outstanding ColumnFamilyHandles.

When opening a DB in a read-write mode, you need to specify all Column Families that currently exist in a DB. If that's not the case, DB::Open call will return . You specify Column Families with a vector of ColumnFamilyDescriptors. ColumnFamilyDescriptor is just a struct with a Column Family name and ColumnFamilyOptions. Open call will return a Status and also a vector of pointers to ColumnFamilyHandles, which you can then use to reference Column Families. Make sure to delete all ColumnFamilyHandles before you delete the DB pointer.

  1. DB::OpenForReadOnly(const DBOptions& db_options, const std::string& name, const std::vector<ColumnFamilyDescriptor>& column_families, std::vector<ColumnFamilyHandle*>* handles, DB** dbptr, bool error_if_log_file_exist = false)

The behavior is similar to DB::Open, except that it opens DB in read-only mode. One big difference is that when opening the DB as read-only, you don't need to specify all Column Families — you can only open a subset of Column Families.

ListColumnFamilies is a static function that returns the list of all column families currently present in the DB.

  1. DB::CreateColumnFamily(const ColumnFamilyOptions& options, const std::string& column_family_name, ColumnFamilyHandle** handle)

Drop the column family specified by ColumnFamilyHandle. Note that the actual data is not deleted until the client calls delete column_family;. You can still continue using the column family if you have outstanding ColumnFamilyHandle pointer.

    This is the new call, which enables you to create iterators on multiple Column Families that have consistent view of the database.

    WriteBatch

    To execute multiple writes atomically, you need to build a WriteBatch. All WriteBatch API calls now also take ColumnFamilyHandle* to specify the Column Family you want to write to.

    All other API calls

    All other API calls have a new argument ColumnFamilyHandle*, through which you can specify the Column Family.

    Every time a single Column Family is flushed, we create a new WAL (write-ahead log). All new writes to all Column Families go to the new WAL. However, we still can't delete the old WAL since it contains live data from other Column Families. We can delete the old WAL only when all Column Families have been flushed and all data contained in that WAL persisted in table files. This created some interesting implementation details and will create interesting tuning requirements. Make sure to tune your RocksDB such that all column families are regularly flushed. Also, take a look at Options::max_total_wal_size, which can be configured such that stale column families are automatically flushed.