SST identity: If the wrong SST file is transferred into a RocksDB SST file directory, all of its block checksums will still match, but the file does not contain the data we want. This can usually be caught by a file name or file size mismatch, because two different SST files are very unlikely to share the same size, but that may not be a safe assumption to rely on. A full file checksum
SST file checksum can be used when: 1) SST files are copied to other places (e.g., backup, move, or replicate); 2) SST files are stored remotely; 3) external SST files are ingested into RocksDB; 4) an SST file is verified while the whole file is being read by the DB (e.g., during compaction).
- Where to generate: the SST file checksum is generated when an SST file is generated in RocksDB (1. memtable flush; 2. compaction), via WritableFileWriter.
- Flexibility
  - options.file_checksum_gen_factory lets upper-layer applications plug in a specific file checksum generator factory implementation. FileChecksumGenFactory creates one FileChecksumGenerator object for each SST file, and that object generates the file checksum for that file only. The object IS NOT shared, so a FileChecksumGenerator can keep intermediate state in the object while the checksum is being computed, and the implementation does not need to be thread safe.
  - The checksum value is a std::string; any other checksum value type, such as uint32, int, or uint64, can easily be converted to a string. The checksum function name is also a string.
- What should be stored
  - The checksum value itself.
  - The name of the checksum function: there are many different checksum functions, so the checksum value should be paired with its function name. Otherwise, neither RocksDB nor the application can make a meaningful checksum check.
- Where to store the checksums
  - We store the checksum function name and checksum value in vstorage as part of FileMetadata.
- Tools: ldb can dump the checksums of all SST files recorded in the MANIFEST as a map.
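The two points above about string-typed values and pairing a value with its function name can be illustrated with a small standalone sketch. It does not use any RocksDB API: `FileChecksumInfo`, `EncodeChecksum32`, `DecodeChecksum32`, and `ChecksumsMatch` are hypothetical names introduced here, and the big-endian byte order is an assumption chosen for the example rather than something RocksDB prescribes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record: a checksum value together with the name of the
// function that produced it (names are illustrative, not RocksDB types).
struct FileChecksumInfo {
  std::string func_name;  // e.g. "MyCrc32"
  std::string value;      // raw checksum bytes carried as a string
};

// Encode a 32-bit checksum as 4 big-endian bytes, so the stored string does
// not depend on the endianness of the machine that wrote it.
std::string EncodeChecksum32(uint32_t v) {
  std::string s(4, '\0');
  for (int i = 0; i < 4; ++i) {
    s[i] = static_cast<char>((v >> (24 - 8 * i)) & 0xff);
  }
  return s;
}

// Inverse of EncodeChecksum32.
uint32_t DecodeChecksum32(const std::string& s) {
  assert(s.size() == 4);
  uint32_t v = 0;
  for (unsigned char c : s) {
    v = (v << 8) | c;
  }
  return v;
}

// Two checksums are only comparable if the same function produced them.
bool ChecksumsMatch(const FileChecksumInfo& a, const FileChecksumInfo& b) {
  return a.func_name == b.func_name && a.value == b.value;
}
```

Comparing the function name before the value is what prevents, say, a CRC32 value from being accepted as a match for a value produced by a different function that happens to have the same bytes.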
To implement a customized checksum generator factory, the application first needs to implement a checksum generator.
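A minimal sketch of such a generator follows. To keep it compilable on its own, it declares a local stand-in for `FileChecksumGenerator` whose methods (`Update`, `Finalize`, `GetChecksum`, `Name`) are assumed to mirror the RocksDB interface; check the RocksDB headers for the actual declaration before deriving from it. FNV-1a is used only because it is short, it is not a recommendation for integrity checking, and `Fnv1a32ChecksumGenerator` and its name string are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Local stand-in (assumption): mirrors the shape of the generator interface.
// In a real build, derive from the class declared in RocksDB's headers.
class FileChecksumGenerator {
 public:
  virtual ~FileChecksumGenerator() = default;
  virtual void Update(const char* data, std::size_t n) = 0;
  virtual void Finalize() = 0;
  virtual std::string GetChecksum() const = 0;
  virtual const char* Name() const = 0;
};

// Example generator computing 32-bit FNV-1a over all bytes of one file.
class Fnv1a32ChecksumGenerator : public FileChecksumGenerator {
 public:
  // Called repeatedly as file data is written; the running hash is kept in
  // the object, which is safe because the object serves a single file.
  void Update(const char* data, std::size_t n) override {
    assert(!finalized_);
    for (std::size_t i = 0; i < n; ++i) {
      hash_ ^= static_cast<unsigned char>(data[i]);
      hash_ *= 16777619u;  // FNV prime
    }
  }

  // Freeze the result and convert it to the string form (4 big-endian bytes).
  void Finalize() override {
    finalized_ = true;
    checksum_.assign(4, '\0');
    for (int i = 0; i < 4; ++i) {
      checksum_[i] = static_cast<char>((hash_ >> (24 - 8 * i)) & 0xff);
    }
  }

  std::string GetChecksum() const override {
    assert(finalized_);
    return checksum_;
  }

  const char* Name() const override { return "Fnv1a32Example"; }

 private:
  uint32_t hash_ = 2166136261u;  // FNV offset basis
  bool finalized_ = false;
  std::string checksum_;
};
```

Because `Update` is incremental, feeding a file in several chunks yields the same checksum as feeding it in one call, which is the property a writer that streams data to disk relies on.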
The application also needs to implement the checksum generator factory that creates these generators.
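A factory can be sketched in the same way. The stand-in declarations below (`FileChecksumGenContext` with a `file_name` field, `FileChecksumGenerator`, and `FileChecksumGenFactory` with `CreateFileChecksumGenerator`) are assumptions that approximate the RocksDB interface so the example compiles on its own; the byte-sum generator is deliberately trivial, and every `ByteSum...` name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

// Local stand-ins (assumptions) approximating the RocksDB declarations.
struct FileChecksumGenContext {
  std::string file_name;
};

class FileChecksumGenerator {
 public:
  virtual ~FileChecksumGenerator() = default;
  virtual void Update(const char* data, std::size_t n) = 0;
  virtual void Finalize() = 0;
  virtual std::string GetChecksum() const = 0;
  virtual const char* Name() const = 0;
};

class FileChecksumGenFactory {
 public:
  virtual ~FileChecksumGenFactory() = default;
  virtual std::unique_ptr<FileChecksumGenerator> CreateFileChecksumGenerator(
      const FileChecksumGenContext& context) = 0;
  virtual const char* Name() const = 0;
};

// Trivial generator for illustration only: sums all bytes of the file.
class ByteSumChecksumGenerator : public FileChecksumGenerator {
 public:
  void Update(const char* data, std::size_t n) override {
    for (std::size_t i = 0; i < n; ++i) {
      sum_ += static_cast<unsigned char>(data[i]);
    }
  }
  void Finalize() override { checksum_ = std::to_string(sum_); }
  std::string GetChecksum() const override { return checksum_; }
  const char* Name() const override { return "ByteSumExample"; }

 private:
  uint64_t sum_ = 0;
  std::string checksum_;
};

// The factory hands out a fresh generator per file, so generators never
// share state and do not need to be thread safe.
class ByteSumChecksumGenFactory : public FileChecksumGenFactory {
 public:
  std::unique_ptr<FileChecksumGenerator> CreateFileChecksumGenerator(
      const FileChecksumGenContext& /*context*/) override {
    return std::unique_ptr<FileChecksumGenerator>(
        new ByteSumChecksumGenerator());
  }
  const char* Name() const override { return "ByteSumChecksumGenFactoryExample"; }
};
```

In a RocksDB build, an instance of such a factory would be assigned to options.file_checksum_gen_factory before opening the DB.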
At the current stage, we do not provide a dedicated public DB interface to list or get the checksum value and checksum function name. However, there are two ways a user can get the checksum:
- If the DB is running, the checksum value and checksum function name are included in the LiveFileMetadata returned by the DB's live-file metadata call. This checksum information comes from vstorage in memory.
- If the DB is not running, or if the user only has the MANIFEST file, the ldb tool can print the checksum of each file. It prints a list of SST files with checksum information in the following format: [file_number, checksum_function_name, checksum value]
We plan to work on the following:
- Take advantage of the SST file checksum in the backup engine.
- Work with some use cases to apply the full file checksum.
- Implement a WAL file checksum and store it in the MANIFEST as well.