RemixDB: A read- and write-optimized concurrent KV store. Fast point and range queries. Extremely low write-amplification.

wuxb45/remixdb

REMIX and RemixDB

The REMIX data structure was introduced in the paper "REMIX: Efficient Range Query for LSM-trees" (FAST'21).

This repository maintains a reference implementation of the REMIX index data structure, as well as RemixDB, a thread-safe embedded key-value store built on it. It compiles on recent Linux/FreeBSD/macOS and supports x86_64 and AArch64 CPUs.

Limitations of the Current Implementation

  • KV size: The maximum key+value size is capped at 65500 bytes. This roughly corresponds to the 64KB block size limit. TODO: store every huge value in a separate file and record the file name as the value of the KV pair in RemixDB.

Optimization: Minimizing REMIX (Re-)Building Cost

This implementation employs an optimization to minimize the REMIX building cost. Compared to the implementation described in the REMIX paper, it improves throughput by roughly 2x (0.96 MOPS vs. 0.50 MOPS) in a random-write experiment. Configuration: klen=16; vlen=120; 2.02 billion KVs; 256GB valid KV data; single-threaded loading in random order; no compression.

When creating a new table file, RemixDB can create a copy of all the keys in the table. Specifically, it encodes all the keys (without values) in sorted order using prefix compression, which creates a Compressed Keys Block (CKB). The CKB is stored at the end of the table file. This feature can be freely turned on and off. There is no compatibility issue when tables with and without the CKB are used together.
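
To make the idea of prefix-compressing sorted keys concrete, here is a minimal sketch. The entry layout (one byte for the shared-prefix length, one byte for the suffix length, then the suffix bytes) and the names ckb_encode and ckb_decode_nth are hypothetical and chosen for illustration only; the actual CKB format used by RemixDB is not described here and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Number of leading bytes that a and b have in common.
size_t
shared_prefix_len(const char * a, size_t alen, const char * b, size_t blen)
{
  const size_t n = alen < blen ? alen : blen;
  size_t i = 0;
  while (i < n && a[i] == b[i])
    i++;
  return i;
}

// Encode sorted, NUL-terminated keys (each at most 255 bytes) into out.
// Hypothetical entry layout: [shared:1][suffix_len:1][suffix bytes].
// Returns the number of bytes written, or 0 on overflow or an oversized key.
size_t
ckb_encode(const char * const * keys, size_t nkeys, uint8_t * out, size_t cap)
{
  assert(keys != NULL && out != NULL);
  size_t off = 0;
  const char * prev = "";
  size_t prevlen = 0;
  for (size_t i = 0; i < nkeys; i++) {
    const size_t klen = strlen(keys[i]);
    if (klen > 255)
      return 0;
    const size_t shared = shared_prefix_len(prev, prevlen, keys[i], klen);
    const size_t suffix = klen - shared;
    if (off + 2 + suffix > cap)
      return 0;
    out[off++] = (uint8_t)shared;
    out[off++] = (uint8_t)suffix;
    memcpy(out + off, keys[i] + shared, suffix);
    off += suffix;
    prev = keys[i];
    prevlen = klen;
  }
  return off;
}

// Rebuild the idx-th key by replaying entries from the start of the block.
// Returns the key length, or -1 if idx is out of range or the input is bad.
int
ckb_decode_nth(const uint8_t * in, size_t len, size_t idx, char * key, size_t cap)
{
  size_t off = 0;
  size_t prevlen = 0;
  for (size_t i = 0; off + 2 <= len; i++) {
    const size_t shared = in[off];
    const size_t suffix = in[off + 1];
    off += 2;
    if (shared > prevlen || off + suffix > len || shared + suffix + 1 > cap)
      return -1;
    memcpy(key + shared, in + off, suffix); // the prefix is kept from the previous key
    prevlen = shared + suffix;
    key[prevlen] = '\0';
    off += suffix;
    if (i == idx)
      return (int)prevlen;
  }
  return -1;
}
```

With the four keys user:1000, user:1001, user:1002, and user:1010, this layout needs 21 bytes instead of the 36 bytes of raw key data. The saving comes from the long shared prefixes that sorted keys tend to have.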

When creating a new REMIX, the building process will check if every input table contains a CKB. If true, the process will build the new REMIX using these CKBs. It also leverages the existing REMIX to avoid unnecessary key comparisons. In this way, the new REMIX will be created by reading the old REMIX and the CKBs, without accessing the key-value data blocks of the table files.
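
The following is a deliberately simplified sketch of why keys alone are enough for this step. It assumes the keys of the existing tables are already available in sorted order (standing in for what the old REMIX provides) and merges them with the keys of one new table, recording only which input each key came from. The function merge_selectors and its output format are hypothetical and do not reproduce RemixDB's building code or its on-disk REMIX format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Merge two sorted key arrays and record the source of each output key.
// sel[i] == 0: the i-th smallest key comes from old_keys (already ordered).
// sel[i] == 1: the i-th smallest key comes from new_keys (the new table).
// Ties go to old_keys; this is an arbitrary choice made for the sketch.
// sel must have room for nold + nnew entries. Returns the entries written.
size_t
merge_selectors(const char * const * old_keys, size_t nold,
    const char * const * new_keys, size_t nnew, uint8_t * sel)
{
  assert(sel != NULL);
  size_t i = 0, j = 0, n = 0;
  while (i < nold && j < nnew) {
    if (strcmp(old_keys[i], new_keys[j]) <= 0) {
      sel[n++] = 0;
      i++;
    } else {
      sel[n++] = 1;
      j++;
    }
  }
  // One input is exhausted: the rest needs no further key comparisons.
  while (i < nold) {
    sel[n++] = 0;
    i++;
  }
  while (j < nnew) {
    sel[n++] = 1;
    j++;
  }
  return n;
}
```

The merge touches only keys, never values, which is why a compact keys-only copy of each table is sufficient input for this kind of work.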

In a running system, the old REMIX structures are usually cache-resident. The CKBs are used only for REMIX building: they are read into memory in batches and discarded once the building is finished.

A CKB is often much smaller than the original key-value data blocks, unless the system manages huge keys with small values. If the average CKB size is 10% of the average key-value data block size, this optimization trades 10% more write I/O and storage space for a 90% reduction of read I/O during REMIX building.
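
The arithmetic behind this trade-off can be written down as a small sketch. It assumes that building without CKBs reads all the key-value data blocks, that building with CKBs reads only the CKBs, and that reads of the old REMIX are ignored because it is usually cache-resident. The struct build_io and the function ckb_tradeoff are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stdint.h>

struct build_io {
  uint64_t extra_write_bytes; // CKB bytes written in addition to the data blocks
  uint64_t read_without_ckb;  // bytes read when building from the data blocks
  uint64_t read_with_ckb;     // bytes read when building from the CKBs only
};

// data_bytes: total size of the key-value data blocks of the input tables.
// ckb_percent: CKB size as a percentage of data_bytes (10 in the text above).
// The multiplication can overflow for data_bytes above roughly 1.8e17.
struct build_io
ckb_tradeoff(uint64_t data_bytes, unsigned ckb_percent)
{
  assert(ckb_percent <= 100);
  struct build_io io;
  io.extra_write_bytes = data_bytes * ckb_percent / 100;
  io.read_without_ckb = data_bytes;
  io.read_with_ckb = io.extra_write_bytes;
  return io;
}
```

For 1000 bytes of data blocks and a 10% ratio, this gives 100 extra bytes written and 100 bytes read instead of 1000, which is the 90% read reduction stated above.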

remixdb_open opens/creates a remixdb with the optimization turned on. Each newly created sstable will have a CKB. You should use remixdb_open unless it is absolutely necessary to save a little disk space. remixdb_open_compact opens a remixdb with the optimization turned off. Each newly created sstable will not contain a CKB. A store created by one of these functions can be safely opened by the other function.

TODO: compress the CKB with lz4/zstd/etc.?

Getting Started

RemixDB by default uses liburing (io_uring) and thus requires a Linux kernel >= 5.1. It also works with POSIX AIO on all the supported platforms but the performance can be negatively affected.

clang is the default compiler. It usually produces faster code than GCC. To use GCC:

$ make CCC=gcc

If jemalloc is available and you prefer to use it, use M=j with make:

$ make M=j

xdbdemo.c contains sample code that uses the remixdb_* functions. These functions present a clean programming interface without using special data types or structures.

xdbdemo

To compile and run the demo code:

$ make xdbdemo.out
$ ./xdbdemo.out

xdbtest

xdbtest is a stress test program that uses the remixdb_* functions.

Run with a 4GB block cache, 4GB MemTables, and a dataset with 32 million KVs:

$ make xdbtest.out
$ ./xdbtest.out /tmp/xdbtest 4096 25 30

If your machine has less memory, run with smaller sizes (a 256MB block cache, 256MB MemTables, and 1 million KVs):

$ ./xdbtest.out /tmp/xdbtest 256 20 30
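
The two commands above pair the values 25 and 20 with datasets of 32 million and 1 million KVs. That is consistent with the value being a base-2 exponent of the key count, but this is an inference from the numbers in this README, not something confirmed against the xdbtest source. Under that assumption, the counts work out as in this sketch (kv_count is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

// Assumption: the xdbtest argument is log2 of the number of KVs.
uint64_t
kv_count(unsigned power)
{
  assert(power < 64);
  return UINT64_C(1) << power;
}
```

Here kv_count(25) is 33554432 (32 x 1024 x 1024) and kv_count(20) is 1048576 (1024 x 1024).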

The first run of xdbtest.out should always show stale=0. If you run it again without deleting /tmp/xdbtest, it will show non-zero stale numbers at the beginning, but they will quickly drop and eventually reach zero.

xdbexit

xdbexit is a simple program testing crash-recovery. It inserts some new keys and calls remixdb_sync() to make all buffered data persist in the WAL. Then it immediately calls exit() without doing any clean-up. Run it repeatedly. In each run it should show that all the previously inserted KVs are found.
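
The pattern being tested (make data durable, then exit without any clean-up, then check on the next run that the data is still there) can be sketched with plain POSIX calls. This is not RemixDB code and does not use the remixdb_* functions: log_append_sync and log_count are hypothetical helpers that operate on a simple append-only file of 8-byte records, with fsync standing in for the role remixdb_sync() plays in xdbexit.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

// Append one 8-byte record and force it to stable storage before returning.
// Returns 0 on success, -1 on any error.
int
log_append_sync(const char * path, uint64_t value)
{
  assert(path != NULL);
  const int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
  if (fd < 0)
    return -1;
  const ssize_t w = write(fd, &value, sizeof(value));
  const int ok = (w == (ssize_t)sizeof(value)) && (fsync(fd) == 0);
  close(fd);
  return ok ? 0 : -1;
}

// Count the complete 8-byte records in the log.
// Returns -1 if the file cannot be opened.
long
log_count(const char * path)
{
  assert(path != NULL);
  const int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  long n = 0;
  uint64_t v;
  while (read(fd, &v, sizeof(v)) == (ssize_t)sizeof(v))
    n++;
  close(fd);
  return n;
}
```

A test program in this style would print log_count() at startup, call log_append_sync() a few times, and then call _exit(0) immediately. Each run should report every record appended by the earlier runs.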

libremixdb.so

To use remixdb as a shared library, run make libremixdb.so and then make install. A PKGBUILD (for Arch Linux's pacman) is included as an example packaging script.
