- Increase space amplification, which could lead to running out of disk space;
- Increase read amplification, significantly degrading read performance.
The idea is to slow down incoming writes to the speed that the database can handle. However, sometimes the database can be too sensitive to a temporary write burst, or underestimate what the hardware can handle, so that you may get unexpected slowness or query timeouts.
To find out whether your DB is suffering from write stalls, you can look at:
- Compaction stats found in LOG file.
Causes of Write Stalls
Stalls may be triggered for the following reasons:
- Too many memtables. When the number of memtables waiting to flush is greater than or equal to max_write_buffer_number, writes are fully stopped until a flush finishes. In addition, if max_write_buffer_number is greater than 3 and the number of memtables waiting to flush is greater than or equal to max_write_buffer_number - 1, writes are slowed down. In these cases, RocksDB writes an info message to the LOG file.
- Too many level-0 SST files. When the number of level-0 SST files reaches level0_slowdown_writes_trigger, writes are slowed down. When the number of level-0 SST files reaches level0_stop_writes_trigger, writes are fully stopped until level-0 to level-1 compaction reduces the number of level-0 files. In these cases, RocksDB writes an info message to the LOG file.
- Too many pending compaction bytes. When the estimated bytes pending for compaction reach soft_pending_compaction_bytes_limit, writes are slowed down. When the estimated bytes pending for compaction reach hard_pending_compaction_bytes_limit, writes are fully stopped until compaction catches up. In these cases, RocksDB writes an info message to the LOG file.
Whenever a stall condition is triggered, RocksDB reduces the write rate to delayed_write_rate, and may reduce it even further if estimated pending compaction bytes keep accumulating. Note that the slowdown/stop triggers and the pending compaction bytes limits are per column family, but write stalls apply to the whole DB: if one column family triggers a write stall, the whole DB is stalled.
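The trigger rules and their DB-wide effect described above can be summarized in a small model. This is only a sketch of the rules as stated in this section, not RocksDB's actual implementation: the types CfLimits, CfState and StallState and the functions Classify and DbWideState are hypothetical names, and the struct fields merely mirror the option names discussed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model types; none of these exist in RocksDB.
enum class StallState { kNormal, kDelayed, kStopped };

struct CfLimits {
    int max_write_buffer_number;
    int level0_slowdown_writes_trigger;
    int level0_stop_writes_trigger;
    std::uint64_t soft_pending_bytes;  // soft pending compaction bytes limit
    std::uint64_t hard_pending_bytes;  // hard pending compaction bytes limit
};

struct CfState {
    int memtables_waiting_to_flush;
    int level0_files;
    std::uint64_t pending_compaction_bytes;
};

// Classifies one column family according to the rules listed in this section.
StallState Classify(const CfLimits& lim, const CfState& s) {
    // Sanity check on the configuration: slowdown should not come after stop.
    assert(lim.level0_slowdown_writes_trigger <= lim.level0_stop_writes_trigger);

    // Stop conditions are checked first because they are the stricter ones.
    if (s.memtables_waiting_to_flush >= lim.max_write_buffer_number ||
        s.level0_files >= lim.level0_stop_writes_trigger ||
        s.pending_compaction_bytes >= lim.hard_pending_bytes) {
        return StallState::kStopped;
    }
    // The memtable slowdown rule only applies when max_write_buffer_number > 3.
    const bool memtable_slowdown =
        lim.max_write_buffer_number > 3 &&
        s.memtables_waiting_to_flush >= lim.max_write_buffer_number - 1;
    if (memtable_slowdown ||
        s.level0_files >= lim.level0_slowdown_writes_trigger ||
        s.pending_compaction_bytes >= lim.soft_pending_bytes) {
        return StallState::kDelayed;
    }
    return StallState::kNormal;
}

// Limits are per column family, but the stall applies to the whole DB,
// so the DB-wide state is the worst state of any column family.
StallState DbWideState(const std::vector<StallState>& per_cf) {
    StallState worst = StallState::kNormal;
    for (StallState s : per_cf) {
        worst = std::max(worst, s);
    }
    return worst;
}
```

In this model, a single column family with too many level-0 files is enough to put the whole DB into the delayed or stopped state, even if every other column family is healthy.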
If a write slowdown or stop is triggered, application threads that call Put, Merge, Delete, etc. will block. If a slowdown is in effect, each write sleeps for some time (typically 1ms) before proceeding. If writes are stopped, the thread can be blocked indefinitely. If blocking the thread is not desirable, applications can avoid it by setting no_slowdown = true in WriteOptions. Any write with this option returns immediately with Status::Incomplete() if it cannot be completed due to a slowdown or stop.
Internally, RocksDB tries to batch write requests from different threads together before writing to the WAL in order to increase performance. However, writes with no_slowdown set will not be batched with writes that don’t have it, which might result in a slight performance hit.
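An application that uses no_slowdown has to decide what to do when a write comes back incomplete. One possible approach, sketched below, is to retry with exponential backoff on the caller's own schedule. The WriteResult enum and the WriteWithBackoff helper are hypothetical and not part of RocksDB; the callable stands in for a write issued with no_slowdown = true.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-in for the outcome of a non-blocking write.
enum class WriteResult { kOk, kIncomplete };

// Calls try_write up to max_attempts times, sleeping between attempts and
// doubling the sleep each time. Returns true as soon as one attempt succeeds,
// false if every attempt came back incomplete.
bool WriteWithBackoff(const std::function<WriteResult()>& try_write,
                      int max_attempts,
                      std::chrono::milliseconds initial_backoff) {
    assert(max_attempts > 0);
    std::chrono::milliseconds backoff = initial_backoff;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_write() == WriteResult::kOk) {
            return true;
        }
        // Do not sleep after the final failed attempt.
        if (attempt + 1 < max_attempts) {
            std::this_thread::sleep_for(backoff);
            backoff *= 2;
        }
    }
    return false;
}
```

In a real application, the callable would issue the write with a WriteOptions whose no_slowdown field is true and map a status for which IsIncomplete() is true to kIncomplete. Whether to retry, drop, or queue the write elsewhere is an application-level choice; the retry loop here is just one option.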
Write Stall Mitigation
There are multiple options you can tune to mitigate write stalls. If some of your workloads can tolerate write stalls and others cannot, you can set the tolerant writes to low priority (low_pri = true in WriteOptions) to avoid stalling the latency-critical writes.
If write stalls are triggered by pending flushes or by compaction falling behind, you can try:
- Increase max_write_buffer_number so that more memtables can queue up before writes are stalled.
- Increase max_background_compactions to have more compaction threads.
- Increase write_buffer_size to have larger memtables, which reduces write amplification.
- Increase min_write_buffer_number_to_merge.
You can also set the stop/slowdown triggers and the pending compaction bytes limits to very large values to avoid hitting write stalls. If you are bulk loading data into RocksDB, also take a look at “What’s the fastest way to load data into RocksDB?” in our FAQ.