Update README.md
wuxb45 authored Feb 5, 2021
1 parent 0faa77a commit 9274b8d
Showing 1 changed file with 13 additions and 13 deletions.
26 changes: 13 additions & 13 deletions README.md
@@ -22,27 +22,27 @@ TODO: implement the full WAL mechanisms to provide the same log-recovery semant
 This implementation employs an optimization to minimize the REMIX building cost.

 When creating a new table file, RemixDB can create a copy of all the keys in the table.
-Specifically, it encodes all the keys (without values) in sorted order using prefix compression, which creates a *compressed keys block*.
-The compressed keys block is stored at the end of the table file.
-This feature can be freely turned on and off. There is no compatibility issue when tables with and without the compressed keys block are used together.
+Specifically, it encodes all the keys (without values) in sorted order using prefix compression, which creates a *Compressed Keys Block* (*CKB*).
+The CKB is stored at the end of the table file.
+This feature can be freely turned on and off. There is no compatibility issue when tables with and without the CKB are used together.

-When creating a new REMIX, the building process will check if all the input tables contain such an compressed keys block.
-If true, the process will build the new REMIX using these blocks. It also leverages the existing REMIX to avoid unnecessary key comparisons.
-In this way, the new REMIX will be created by reading the old REMIX and the compressed keys blocks, without accessing the key-value data blocks of the table files.
+When creating a new REMIX, the building process will check if every input table contains a CKB.
+If true, the process will build the new REMIX using these CKBs. It also leverages the existing REMIX to avoid unnecessary key comparisons.
+In this way, the new REMIX will be created by reading the old REMIX and the CKBs, without accessing the key-value data blocks of the table files.

 In a running system the old REMIX structures are usually cache-resident.
-The compressed keys blocks are only used for REMIX building, which are read into memory in batch, and discarded once the building is finished.
+The CKBs are only used for REMIX building, which are read into memory in batch, and discarded once the building is finished.

-An compressed keys block is often much smaller than the original key-value data block, unless the system manages huge keys with small values.
-Suppose the average compressed keys block size is 10% of the key-value data block,
+A CKB is often much smaller than the original key-value data block, unless the system manages huge keys with small values.
+Suppose the average CKB size is 10% of the key-value data block,
 this optimization trades 10% more write I/O and storage space usage for a 90% reduction of read I/O during REMIX building.

-`remixdb_open` opens/creates a remixdb with the optimization turned on. New sstables will have the compressed keys block.
-You should use `remixdb_open` unless you're sure its absolutely necessary to save a little bit of disk space.
-`remixdb_open_compact` opens a remixdb with the optimization turned off. New sstables will not have the compressed keys block.
+`remixdb_open` opens/creates a remixdb with the optimization turned on. Each newly created sstable will have the CKB.
+You should use `remixdb_open` unless it's absolutely necessary to save a little bit of disk space.
+`remixdb_open_compact` opens a remixdb with the optimization turned off. Each newly created sstable will not contain a CKB.
 A store created by one of these functions can be safely opened by the other function.

-TODO: compress the compressed keys block with lz4/zstd/etc.?
+TODO: compress the CKB with lz4/zstd/etc.?

 # Getting Started

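The prefix compression that the README text in this diff describes can be illustrated with a small sketch. This is not RemixDB's actual CKB encoding or on-disk layout, which the diff does not specify. The function names (`encode_ckb`, `decode_ckb`, `encoded_size`), the `(shared_prefix_len, suffix)` entry shape, and the 2-byte length fields are all assumptions made for illustration. The sketch only shows why a sorted run of keys with common prefixes shrinks, and how the size of such a block might compare with the key-value data it summarizes.

```python
# Illustrative sketch only; NOT RemixDB's real CKB format.
# Each key is stored as (number of leading bytes shared with the
# previous key, remaining suffix bytes).

def encode_ckb(sorted_keys):
    """Prefix-compress an already-sorted list of bytes keys."""
    entries = []
    prev = b""
    for key in sorted_keys:
        # Count the leading bytes this key shares with the previous key.
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        prev = key
    return entries


def decode_ckb(entries):
    """Rebuild the full keys, in order, from prefix-compressed entries."""
    keys = []
    prev = b""
    for shared, suffix in entries:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


def encoded_size(entries, len_field_bytes=2):
    """Approximate size in bytes, assuming two fixed-width length fields
    per entry (shared length and suffix length). The field width is a
    hypothetical choice, not taken from RemixDB."""
    return sum(2 * len_field_bytes + len(suffix) for _, suffix in entries)


# Hypothetical table contents: short keys with 100-byte values.
table = {
    b"user:1001:email": b"x" * 100,
    b"user:1001:name": b"x" * 100,
    b"user:1002:email": b"x" * 100,
    b"user:1002:name": b"x" * 100,
}

keys = sorted(table)
entries = encode_ckb(keys)

# The keys-only block round-trips without touching any values.
assert decode_ckb(entries) == keys

data_bytes = sum(len(k) + len(v) for k, v in table.items())
ratio = encoded_size(entries) / data_bytes
print(f"keys-only block is {ratio:.1%} of the key-value data")
```

Under these assumptions the keys-only block is a small fraction of the key-value data, which is the situation the README's 10% example describes. With long keys and short values the ratio rises, matching the caveat about huge keys with small values.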
