Life Cycle of a WAL

Let’s use an example to illustrate the life cycle of a WAL. A RocksDB instance is created with two column families, “new_cf” and “default”. Once the DB is opened, a new WAL is created on disk to persist all writes.

Some key-value pairs are added to both column families: key1 and key3 to “new_cf”, key2 and key4 to “default”.

At this point the WAL should have recorded all writes. The WAL will stay open and keep recording future writes until its size reaches DBOptions::max_total_wal_size.

If the user decides to flush the column family “new_cf”, several things happen: 1) new_cf’s data (key1 and key3) is flushed to a new SST file; 2) a new WAL is created, and all future writes to all column families now go to the new WAL; 3) the older WAL no longer accepts new writes, but its deletion may be delayed.

At this point there are two WALs: the older WAL contains key1 through key4, and the newer WAL contains key5 and key6. Because the older WAL still contains live data for at least one column family (“default”), it cannot be deleted yet. Only when the user finally flushes the “default” column family can the older WAL be archived and purged from disk automatically.
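The rule that a WAL can only go away once no column family still depends on it can be sketched with a small toy model. This is not RocksDB code: the class WalTracker and its methods are hypothetical names invented for illustration, and the model assumes that every flush starts a new WAL and that an unneeded WAL is dropped immediately rather than archived.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model, not RocksDB code: for every WAL on disk, remember which column
// families still have unflushed data in it.
class WalTracker {
 public:
  WalTracker() { RollWal(); }  // opening the DB creates the first WAL

  // A write to `cf` is recorded in the current (newest) WAL.
  void Put(const std::string& cf) {
    assert(!cf.empty());
    unflushed_[current_].insert(cf);
  }

  // Flushing `cf` moves its data into an SST file, so no WAL is needed for
  // that column family any more. A new WAL is started, and every older WAL
  // that no column family depends on is dropped.
  void Flush(const std::string& cf) {
    for (auto& entry : unflushed_) entry.second.erase(cf);
    RollWal();
    for (auto it = unflushed_.begin(); it != unflushed_.end();) {
      if (it->first != current_ && it->second.empty()) {
        it = unflushed_.erase(it);
      } else {
        ++it;
      }
    }
  }

  // Numbers of the WALs that still exist, oldest first.
  std::vector<std::uint64_t> LiveWals() const {
    std::vector<std::uint64_t> result;
    for (const auto& entry : unflushed_) result.push_back(entry.first);
    return result;
  }

 private:
  void RollWal() {
    current_ = next_number_++;
    unflushed_[current_];  // create an empty entry for the new WAL
  }

  std::map<std::uint64_t, std::set<std::string>> unflushed_;
  std::uint64_t next_number_ = 1;
  std::uint64_t current_ = 0;
};
```

Replaying the example in this model: after the four writes and a flush of “new_cf”, WALs 1 and 2 both exist because “default” still has data in WAL 1. Only after “default” is flushed does WAL 1 disappear.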

The following configuration options relate to the WAL.

DBOptions::wal_dir

DBOptions::wal_dir sets the directory where RocksDB stores write-ahead log files, which allows WALs to be stored in a separate directory from the actual data.

DBOptions::WAL_ttl_seconds, DBOptions::WAL_size_limit_MB

These two fields affect how quickly archived WALs are deleted. Nonzero values set the time and disk space thresholds that trigger archived WAL deletion. See options.h for a detailed explanation.
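How the two thresholds might interact can be sketched as a selection function. This is a simplified toy model, not RocksDB’s implementation: the struct ArchivedWal and the function SelectWalsToPurge are hypothetical names, and the assumed semantics (TTL check first, then oldest-first deletion until the size limit is met, and no archiving at all when both values are zero) follow the comments in options.h as understood here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one archived WAL file.
struct ArchivedWal {
  std::uint64_t number;       // WAL file number; a lower number means older
  std::uint64_t size_bytes;
  std::uint64_t age_seconds;  // time since the WAL was archived
};

// Returns the numbers of the archived WALs that should be deleted.
std::vector<std::uint64_t> SelectWalsToPurge(std::vector<ArchivedWal> wals,
                                             std::uint64_t ttl_seconds,
                                             std::uint64_t size_limit_mb) {
  std::sort(wals.begin(), wals.end(),
            [](const ArchivedWal& a, const ArchivedWal& b) {
              return a.number < b.number;
            });
  std::vector<std::uint64_t> purge;

  // Assumption: with both thresholds at zero nothing is kept in the archive.
  if (ttl_seconds == 0 && size_limit_mb == 0) {
    for (const ArchivedWal& w : wals) purge.push_back(w.number);
    return purge;
  }

  // Step 1: the TTL check removes every WAL older than the threshold.
  std::vector<ArchivedWal> kept;
  for (const ArchivedWal& w : wals) {
    if (ttl_seconds != 0 && w.age_seconds > ttl_seconds) {
      purge.push_back(w.number);
    } else {
      kept.push_back(w);
    }
  }

  // Step 2: the size check removes the oldest WALs until the total fits.
  if (size_limit_mb != 0) {
    const std::uint64_t limit_bytes = size_limit_mb * 1024 * 1024;
    std::uint64_t total = 0;
    for (const ArchivedWal& w : kept) total += w.size_bytes;
    for (const ArchivedWal& w : kept) {
      if (total <= limit_bytes) break;
      purge.push_back(w.number);
      total -= w.size_bytes;
    }
  }
  assert(purge.size() <= wals.size());
  return purge;
}
```

For example, with three 4 MB archived WALs aged 100, 50, and 10 seconds, a TTL of 60 seconds and a size limit of 5 MB would select the two oldest files in this model: the first by age, the second by size.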

DBOptions::max_total_wal_size

To limit the total size of WALs, RocksDB uses DBOptions::max_total_wal_size as the trigger for column family flushes. Once the WALs exceed this size, RocksDB starts forcing flushes of column families so that the oldest WALs can be deleted. This config is useful when column families are updated at non-uniform frequencies. Without a size limit, users may need to keep very old WALs around when an infrequently updated column family hasn’t been flushed for a while.
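The effect of the size trigger can be illustrated with a toy calculation. This is a sketch under stated assumptions, not RocksDB’s actual selection logic: WalInfo and ColumnFamiliesToFlush are hypothetical names, and the model simply walks the WALs from oldest to newest, collecting the column families that pin each one, until the remaining total is within the limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical description of one WAL that is still on disk.
struct WalInfo {
  std::uint64_t number;                 // lower number means older
  std::uint64_t size_bytes;
  std::set<std::string> unflushed_cfs;  // column families pinning this WAL
};

// Returns the column families that would have to be flushed so that enough
// of the oldest WALs can be released to get back under the limit.
// `wals` must be ordered oldest first.
std::set<std::string> ColumnFamiliesToFlush(const std::vector<WalInfo>& wals,
                                            std::uint64_t max_total_wal_size) {
  assert(std::is_sorted(wals.begin(), wals.end(),
                        [](const WalInfo& a, const WalInfo& b) {
                          return a.number < b.number;
                        }));
  std::uint64_t total = 0;
  for (const WalInfo& wal : wals) total += wal.size_bytes;

  std::set<std::string> to_flush;
  // The newest WAL is still being written, so only older WALs are candidates.
  for (std::size_t i = 0;
       i + 1 < wals.size() && total > max_total_wal_size; ++i) {
    to_flush.insert(wals[i].unflushed_cfs.begin(),
                    wals[i].unflushed_cfs.end());
    total -= wals[i].size_bytes;
  }
  return to_flush;
}
```

In this model, a rarely written column family that still has data in the oldest WAL is the first one selected, which matches the non-uniform update scenario described above.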

DBOptions::avoid_flush_during_recovery

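When this option is enabled, RocksDB tries to avoid flushing memtables to SST files while replaying WALs during recovery. The existing WALs are kept, so the data can still be recovered if a crash happens before the next flush.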

DBOptions::manual_wal_flush

DBOptions::manual_wal_flush determines whether the WAL is flushed automatically after every write or only manually, in which case the user must call FlushWAL to trigger a WAL flush.

DBOptions::wal_filter

Through DBOptions::wal_filter, users can provide a filter object that is invoked while WALs are processed during recovery. Note: Not supported in ROCKSDB_LITE mode.
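The idea of a filter that is consulted for each record during replay can be sketched with a toy replay loop. Everything here is hypothetical and only illustrates the concept: FilterDecision, WalRecord, RecordFilter, and ReplayWal are invented names, and the real interface that RocksDB expects (the WalFilter class in wal_filter.h) has a different shape.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical decisions a filter can return for each record.
enum class FilterDecision { kContinue, kSkipRecord, kStopReplay };

// Hypothetical, simplified WAL record: a single key-value write.
struct WalRecord {
  std::uint64_t sequence;
  std::string column_family;
  std::string key;
  std::string value;
};

using RecordFilter = std::function<FilterDecision(const WalRecord&)>;

// Replays `wal` in order and returns the recovered state, keyed by
// "<column family>/<key>". The filter is asked about every record first.
std::map<std::string, std::string> ReplayWal(const std::vector<WalRecord>& wal,
                                             const RecordFilter& filter) {
  assert(filter != nullptr);
  std::map<std::string, std::string> recovered;
  for (const WalRecord& record : wal) {
    const FilterDecision decision = filter(record);
    if (decision == FilterDecision::kStopReplay) break;
    if (decision == FilterDecision::kSkipRecord) continue;
    recovered[record.column_family + "/" + record.key] = record.value;
  }
  return recovered;
}
```

A filter that returns kSkipRecord for every record of one column family would, in this model, recover only the other column family’s keys, while a filter that returns kStopReplay at some sequence number would cut recovery off at that point.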

WriteOptions::disableWAL

WriteOptions::disableWAL is useful when users rely on other logging or don’t care about data loss.

Transaction Log Iterator

The transaction log iterator provides a way to replicate data between RocksDB instances. Once a WAL is no longer needed because of column family flushes, it is archived instead of being deleted immediately. The goal is to let the transaction log iterator keep reading the WAL and send its contents to a replica for replay.

Related Pages

Write Ahead Log File Format