The list of files in an LSM tree is kept in a data structure called version
. In the end of a compaction or a mem table flush, a new version
is created for the updated LSM tree. At one time, there is only one “current” version
that represents the files in the up-to-date LSM tree. New get
requests or new iterators will use the current version
through the whole read process or life cycle of iterator. All version
s used by get
or iterators need to be kept. An out-of-date version
that is not used by any get
or iterator needs to be dropped. All files not used by any other version need to be deleted. For example,
If we start with a version
with three files:
and now an iterator is created with it:
files on disk: f1, f2, f3
Now a compaction happened compact f2
, f3
and f4
into a new file f5
with a new v3
created:
v3={f1, f5} (current)
v1={f1, f2, f3} (used by iterator1)
files on disk: f1, f2, f3, f4, f5
Now v2
is neither up-to-date, nor used by anyone, so it qualifies to be removed, together with f4
. While v1
still cannot be removed for it is still needed by iterator1
:
Assuming now iterator1
is destroyed:
v1={f1, f2, f3}
files on disk: f1, f2, f3, f5
This logic is implemented using reference counts. Both of an SST file and a version
have a reference count. While we create a , we incremented the reference counts for all files. If a version
is not needed, all files’ of the version have their reference counts decremented. If a file’s reference count drops to 0, the file can be deleted.
In a similar way, each version
has a reference count. When a version
is created, it is an up-to-date one, so it has reference count 1. If the version
is not up-to-date anymore, its reference count is decremented. Anyone who needs to work on the version
has its reference count incremented by 1, and decremented by 1 when finishing using it. When a version
’s reference count is 0, it should be removed. Either a version
is up-to-date or someone is using it, its reference count is not 0, so it will be kept.
Sometimes a reader holds reference of a version
directly, like the source version
for a compaction. More often, a reader holds it indirectly through a data structure called super version
, which holds reference counts for list of mem tables and a version
— a whole view of the DB. A reader only needs to increase and decrease one reference count, while it is the super version that holds the reference count of version
. It also enables further optimization to avoid locking for the reference counting in most of the time. See the .