A simple implementation of KEVIN.
Two parts: yfs in filesystem layer and yssd in block device layer.
yfs is responsible for translating POSIX syscalls to KV commands.
yssd is responsible for indexing KV objects and transaction management.
Tested environment:
-
release: Linux 5.4.0-135-generic
-
version: #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022
-
platform: x86_64
git clone https://github.com/kabu1204/yfs.git
cd yfs
make
dd if=/dev/zero of=./yssd.data bs=1k count=10240
sudo insmod yssd/yssd.ko yssd_file="./yssd.data"
There's an example code of interacting with YSSD via ioctl()
in user space.
gcc -o t user/main.c
sudo ./t
sudo rmmod yssd
MemIndex is an in-memory rbtree, which stores the <start_key, end_key, ptr> of each data block. These entries are sorted by the start_key and timestamp. When we need to search for KV on disk, we first lookup MemIndex to get blocks that may store the key, and iterate by the time sequence.
Keys in each SSTable are sorted. The key range between SSTables can be overlapped. Each SSTable is 2MB, the reserved space for level0 is 8MB.
When there are 4 SSTables in level0, the compaction thread will merge all SSTables in level0, and appends to level1.
How space freed by GC is reused?
vlog maintains two queues: queue1(from head
to tail1
) and queue2(from end
to tail2
).
- queue1 and queue2 store persisted KVs.
- Flush will try to append new KVs to queue2 first.
- GC always happens on queue1.
- GC will also try to append valid KVs to queue2 first.
When head
catches up with tail1
, in other words, the length of queue1 becomes 0, then queue1 becomes queue2 and queue2 is reset(tail2=end
):
The concurrency control between multiple front-end access threads and back-end flush threads is quite complicated, because of the consistency between in-memory caching (MemTable in LSMTree and HashTable in value log) and persisted data. Even leveldb uses a global mutex to synchronize all user requests.
For simplicity, the basic policy:
- GET/SET/DEL/ITER will try to hold the global mutex.
- Value log manages one write deamon thread to perform flush/GC.
- LSMTree manages another write deamon thread to perform compaction.
So, in YSSD layer, there exists only one access thread. The value log or LSMTree will only need to take care of the safety between access thread and its write thread.