rocksdb::SstFileWriter can be used to create SST files. After creating an SstFileWriter object, you can open a file, insert rows into it, and finish it.

This is an example of how to create an SST file at /home/usr/file1.sst:

Now we have our SST file located at /home/usr/file1.sst.

  • Options passed to SstFileWriter will be used to figure out the table type, compression options, etc., that will be used to create the SST file.
  • The Comparator that is passed to the SstFileWriter must be exactly the same as the Comparator used in the DB that this file will be ingested into.
  • Rows must be inserted in strictly increasing key order, as defined by that Comparator.

You can learn more about SstFileWriter by checking include/rocksdb/sst_file_writer.h.
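Because SstFileWriter rejects keys that are out of order or repeated, rows coming from an unsorted source usually need to be sorted and de-duplicated first. A minimal sketch, assuming the DB uses the default bytewise comparator (std::string's operator< compares characters as unsigned char, which gives the same order) and assuming the last value seen for a repeated key should win. PrepareRowsForSstFile and IsStrictlyIncreasing are hypothetical helpers, not RocksDB API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Row = std::pair<std::string, std::string>;

// Returns true if every key is strictly greater than the one before it.
bool IsStrictlyIncreasing(const std::vector<Row>& rows) {
  for (std::size_t i = 1; i < rows.size(); ++i) {
    if (!(rows[i - 1].first < rows[i].first)) return false;
  }
  return true;
}

// Hypothetical helper: sorts rows by key (bytewise order) and keeps only the
// last value seen for each key, so the result can be passed to
// SstFileWriter::Put() in order.
std::vector<Row> PrepareRowsForSstFile(std::vector<Row> rows) {
  // stable_sort keeps equal keys in their original relative order, so the
  // last occurrence of a key is the most recently supplied value.
  std::stable_sort(rows.begin(), rows.end(),
                   [](const Row& a, const Row& b) { return a.first < b.first; });
  std::vector<Row> out;
  out.reserve(rows.size());
  for (auto& row : rows) {
    if (!out.empty() && out.back().first == row.first) {
      out.back().second = std::move(row.second);  // assumption: later value wins
    } else {
      out.push_back(std::move(row));
    }
  }
  assert(IsStrictlyIncreasing(out));  // postcondition required by SstFileWriter
  return out;
}
```

With a custom Comparator in the DB, the sort predicate would have to call that same comparator instead of operator<.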

Ingesting SST files is simple: all you need to do is call DB::IngestExternalFile() and pass the file paths as a vector of strings:

    // Paths of the SST files created with SstFileWriter
    std::string file_path1 = "/home/usr/file1.sst";
    std::string file_path2 = "/home/usr/file2.sst";

    IngestExternalFileOptions ifo;
    Status s = db_->IngestExternalFile({file_path1, file_path2}, ifo);
    if (!s.ok()) {
      printf("Error while adding file %s and %s, Error %s\n",
             file_path1.c_str(), file_path2.c_str(), s.ToString().c_str());
      return 1;
    }

You can learn more by checking the declarations of DB::IngestExternalFile() and DB::IngestExternalFiles(). DB::IngestExternalFiles() ingests a collection of external SST files for multiple column families following the 'all-or-nothing' property: if the function returns Status::OK, all files are ingested successfully for all column families of interest; if it returns a non-OK status, none of the files are ingested into any of the column families.

When you call DB::IngestExternalFile(), RocksDB will:

  • Copy or link the file into the DB directory
  • If the file's key range overlaps with the memtable's key range, flush the memtable
  • Assign the file to the best level possible in the LSM-tree
  • Assign the file a global sequence number

The best level is the lowest level in the LSM-tree that satisfies all of these conditions:

  • The file can fit in the level
  • The file's key range doesn't overlap with any keys in upper levels
  • The file doesn't overlap with the outputs of running compactions going to this level

Global sequence number

Files created using SstFileWriter have a special field in their metablock called the global sequence number; when this field is used, all the keys inside the file act as if they have that sequence number. When we ingest a file, we assign a sequence number to all the keys in it. Before RocksDB 5.16, RocksDB always updated this global sequence number field in the metablock of the SST file using a random write. Starting with RocksDB 5.16, users can choose whether to update this field via IngestExternalFileOptions::write_global_seqno. If this option is false during ingestion, RocksDB uses the information in the MANIFEST to deduce the global sequence number when accessing the file. This can be useful if the underlying file system does not support random writes or if users wish to minimize sync operations. If backward compatibility is a concern, set this option to true so that external SST files ingested by RocksDB 5.16 or newer can be opened by RocksDB 5.15 or older.

Starting from 5.5, IngestExternalFile() supports ingesting a list of external SST files behind the existing data. In this mode, files are always ingested into the bottommost level, and keys in the ingested files that duplicate existing keys are skipped rather than overwriting the existing data under those keys.

This is intended for back-filling historical data into the database without overwriting newer versions of existing data. This option can only be used if the DB has been running with allow_ingest_behind=true since it was created. All files are ingested at the bottommost level with seqno=0.