OpenCSD

OpenCSD is an improved version of ZCSD that achieves log-structured filesystem (LFS) integration on Zoned Namespace (ZNS) SSDs functioning as Computational Storage Devices (CSDs). Below is a diagram of the overall architecture as presented to the end user. The actual implementation differs, however, because it is emulated using technologies such as QEMU, uBPF and SPDK.

Progress Report

(provisional)

  • Week 1 -> Goal: get fuse-lfs working with libfuse
    • Add libfuse, fuse-lfs and rocksdb as dependencies
    • Create custom libfuse fork to support non-privileged installation
    • Configure CMake to install libfuse
    • Configure environment script to setup pkg-config path
    • Use Docker in Docker (dind) to build docker image for Gitlab CI pipeline
    • Investigate and document how to debug fuse filesystems
    • Determine and document RocksDB required syscalls
    • Setup persistent memory that can be shared across processes
      • Split into daemon and client modes
  • Week 2 -> Goal: get a working LFS filesystem
    • Create solid digital logbook to track discussions
  • Week 3 -> Investigate FUSE I/O calls and fadvise
    • Get a working LFS filesystem using FUSE
      • What are the requirements for these filesystems?
      • Create FUSE LFS path-to-inode function.
        • Test the path-to-inode function using unit tests.
    • Setup research questions in thesis.
    • Run filesystem benchmarks with strace
      • RocksDB DBBench
      • Filebench
    • Use fsetxattr for 'process' attributes in FUSE
      • Document how this can enable CSD functionality in regular filesystems
  • Week 4 -> FUSE LFS filesystem
    • Get a working LFS filesystem using FUSE
      • What are the requirements for these filesystems? (research question)
        • Snapshots
        • GC
    • Test the path-to-inode function using unit tests.
  • Week 5 -> FUSE LFS filesystem
  • Week 6 -> FUSE LFS filesystem
  • Week 7 -> FUSE LFS filesystem
  • Week 8 -> FUSE LFS filesystem
  • Run filesystem benchmarks with strace
    • RocksDB DBBench
    • Filebench

Logbook

Serves as a place to quickly store digital information until it can be refined and processed into the thesis.

Discussion Notes

  • To analyze the exact calls RocksDB makes during its benchmarks, tools like strace can be used.
  • Several methods exist to prototype filesystem integration for CSDs. Among these is using LD_PRELOAD to override system calls such as read(), write() and open(). In this design we choose FUSE instead, as it simplifies some of the management and opens the possibility of parallelism, while the interface between FUSE and the filesystem calls remains thin enough that the two can be correlated.
  • The filesystem can use a snapshot concurrency model with reference counts.
  • Each file can maintain a special table that associates system calls with CSD kernels. To isolate this behavior (to specific users) we can use filehandles and process IDs; these should be available for most FUSE API calls anyway. See the sketch after this list.
  • The design should reuse existing operating system interfaces as much as possible. Any new API or call should be well motivated with solid arguments. As an initial idea we can investigate reusing POSIX fadvise.
  • As requirements, our FUSE LFS needs garbage collection and snapshots; parallelism would be nice to have.
  • Crossing the kernel and userspace boundary can be achieved using ioctl should the need arise.
  • As an experiment for evaluation, we should try to run RocksDB benchmarks on top of the FUSE LFS filesystem while offloading bloom-filter computations for SST tables.
  • Run the Filebench benchmark to identify filesystem calls, as well as db_bench from RocksDB; run both under strace.
  • Filesystem design: why FUSE, and why build from scratch?
  • Is FUSE enough? Do its filesystem calls and API support what we need? (research question)
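
To make the call-to-kernel table idea concrete, here is a minimal C++ sketch of a table keyed by (PID, filehandle). All names (FsCall, CsdKernelId, KernelTable) are hypothetical illustrations, not the actual qemu-csd API.

#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <sys/types.h>

enum class FsCall { Read, Write };   // calls that could be offloaded
using CsdKernelId = uint64_t;        // handle to a registered eBPF kernel (hypothetical)

class KernelTable {
    // Key on (calling PID, filehandle) so behavior stays isolated per process.
    std::map<std::pair<pid_t, uint64_t>, std::map<FsCall, CsdKernelId>> table;

public:
    void associate(pid_t pid, uint64_t fh, FsCall call, CsdKernelId kernel) {
        table[{pid, fh}][call] = kernel;
    }

    // Which kernel, if any, should run on the CSD for this call?
    std::optional<CsdKernelId> lookup(pid_t pid, uint64_t fh, FsCall call) const {
        auto it = table.find({pid, fh});
        if (it == table.end()) return std::nullopt;
        auto jt = it->second.find(call);
        if (jt == it->second.end()) return std::nullopt;
        return jt->second;
    }

    // Deregister everything for one (pid, fh), e.g. on release or re-open.
    void release(pid_t pid, uint64_t fh) { table.erase({pid, fh}); }
};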

Correlation POSIX and FUSE

For convenience and reasoning's sake, a mapping between common POSIX I/O calls and the FUSE API calls that back them is needed. The sketch after the two lists below illustrates the correspondence.

POSIX

  • close
  • (p/w)read
  • (p/w)write
  • lseek
  • open
  • fcntl
  • readdir
  • posix_fadvise

FUSE

  • getattr
  • readdir
  • open
  • create
  • read
  • write
  • unlink
  • statfs
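
As a rough illustration of the correspondence, below is a minimal sketch against the libfuse 3 high-level API showing where the POSIX calls above surface as FUSE hooks. The stubs do nothing useful. Note that read/write receive explicit offsets (covering pread/pwrite and most lseek usage), and that posix_fadvise has no corresponding hook in this API, which matters for the fadvise idea above.

#define FUSE_USE_VERSION 35
#include <fuse3/fuse.h>
#include <cerrno>
#include <cstring>

// Backs POSIX open(); the open flags arrive via fi->flags.
static int fs_open(const char *path, struct fuse_file_info *fi) {
    return 0;
}

// Backs POSIX read()/pread(); the offset is explicit, so no lseek hook is needed.
static int fs_read(const char *path, char *buf, size_t size, off_t off,
                   struct fuse_file_info *fi) {
    return -ENOSYS;
}

// Backs POSIX write()/pwrite().
static int fs_write(const char *path, const char *buf, size_t size, off_t off,
                    struct fuse_file_info *fi) {
    return -ENOSYS;
}

// Backs POSIX readdir().
static int fs_readdir(const char *path, void *buf, fuse_fill_dir_t fill,
                      off_t off, struct fuse_file_info *fi,
                      enum fuse_readdir_flags flags) {
    return -ENOSYS;
}

int main(int argc, char *argv[]) {
    struct fuse_operations ops;
    std::memset(&ops, 0, sizeof(ops));
    ops.open = fs_open;
    ops.read = fs_read;
    ops.write = fs_write;
    ops.readdir = fs_readdir;
    // getattr, create, unlink and statfs hook up the same way.
    return fuse_main(argc, argv, &ops, nullptr);
}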

Non-persistent Conditional Extended Attributes in FUSE

Extended filesystem attributes support various namespaces with different behavior and responsibilities. Since the underlying filesystem is still tasked with storing these attributes persistently regardless of namespace, a FUSE filesystem is effectively in full control of how to proceed.

Given the already existing standard of using namespaces for permissions, roles and behavior, an additional namespace is an easy and clean extension: the process namespace. These are non-persistent extended file attributes that are only visible to the process that created them; effectively an in-memory map that lives inside the filesystem instead of in the calling process. A sketch follows the requirements below.

Requirements:

  • The calling PID must be (made) available to either the high-level or low-level FUSE API hooks (by observing the -d FUSE output, the PID is already available in some contexts, just not to the API calls).
  • A clean method to deregister all hooks is needed; this either has to happen when the file is released or when the file is reopened using a previously used PID. Using the release / releasedir calls is difficult as the calling PID is not available in this context.
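
Below is a minimal sketch of what such a process namespace could look like, assuming the high-level libfuse 3 API where fuse_get_context() can expose the caller's PID (per the first requirement, this availability has to be verified per hook). The map layout and the "process." prefix handling are illustrative assumptions, not the fuse_lfs implementation; note the sketch never deregisters entries, which is exactly the second requirement above.

#define FUSE_USE_VERSION 35
#include <fuse3/fuse.h>
#include <cerrno>
#include <cstring>
#include <map>
#include <string>
#include <tuple>

// In-memory store: (PID, path, attribute name) -> value. Never persisted.
static std::map<std::tuple<pid_t, std::string, std::string>, std::string> proc_xattrs;

static int fs_setxattr(const char *path, const char *name, const char *value,
                       size_t size, int flags) {
    if (std::strncmp(name, "process.", 8) != 0)
        return -ENOTSUP;  // other namespaces would be persisted as usual
    pid_t pid = fuse_get_context()->pid;  // PID of the calling process
    proc_xattrs[{pid, path, name}] = std::string(value, size);
    return 0;
}

static int fs_getxattr(const char *path, const char *name, char *value,
                       size_t size) {
    if (std::strncmp(name, "process.", 8) != 0)
        return -ENOTSUP;
    pid_t pid = fuse_get_context()->pid;
    auto it = proc_xattrs.find({pid, path, name});
    if (it == proc_xattrs.end())
        return -ENODATA;  // the attribute is invisible to any other PID
    if (size == 0)
        return (int)it->second.size();  // size-probe call
    if (size < it->second.size())
        return -ERANGE;
    std::memcpy(value, it->second.data(), it->second.size());
    return (int)it->second.size();
}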

RocksDB Integration

Required syscalls, determined by analysis of https://github.com/facebook/rocksdb/blob/7743f033b17bf3e0ea338bc6751b28adcc8dc559/env/io_posix.cc; a compilable sketch after the list shows the header requirements for the GNU-specific entries:

  • clearerr (stdio.h)
  • close (unistd.h)
  • fclose (stdio.h)
  • feof (stdio.h)
  • ferror (stdio.h)
  • fread_unlocked (stdio.h)
  • fseek (stdio.h)
  • fstat (sys/stat.h)
  • fstatfs (sys/statfs.h / sys/vfs.h)
  • ioctl (sys/ioctl.h)
  • major (sys/sysmacros.h)
  • open (fcntl.h)
  • posix_fadvise (fcntl.h)
  • pread (unistd.h)
  • pwrite (unistd.h)
  • readahead (fcntl.h + _GNU_SOURCE)
  • realpath (stdlib.h)
  • sync_file_range (fcntl.h + _GNU_SOURCE)
  • write (unistd.h)
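
The feature-test requirements flagged above are easy to get wrong, so here is a minimal, compilable sketch (the file name is a placeholder) showing the includes and the _GNU_SOURCE guard needed for the hint-style calls:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // must precede any include to expose readahead/sync_file_range
#endif
#include <fcntl.h>   // open, posix_fadvise, readahead, sync_file_range
#include <stdio.h>   // perror
#include <unistd.h>  // close

int main(void) {
    int fd = open("testfile", O_RDONLY);  // placeholder file name
    if (fd < 0) { perror("open"); return 1; }
    // Sequential-access hint over the whole file, the kind of hint RocksDB
    // issues through its fadvise wrappers in io_posix.cc.
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    // Ask the kernel to populate the page cache ahead of upcoming reads.
    readahead(fd, 0, 1 << 20);
    close(fd);
    return 0;
}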

Potential issues:

  • Use of ioctl
  • Use of io_uring

ZCSD

ZCSD is a full-stack prototype to execute eBPF programs as if they were running on a ZNS SSD CSD. The entire prototype can be run from userspace by utilizing existing technologies such as SPDK and uBPF. Since consumer ZNS SSDs are still unavailable, QEMU can be used to create a virtual ZNS SSD. The programming and interactive steps of the individual components are shown below, followed by a sketch of the uBPF half.
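
For a feel of the uBPF side, here is a minimal sketch of loading and executing an eBPF ELF object in userspace. The file name is a placeholder, and the ubpf_exec signature shown matches older uBPF revisions such as the one pinned by this project; newer revisions return the result through an out-parameter instead.

#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>
extern "C" {
#include "ubpf.h"
}

int main() {
    // Read a compiled eBPF ELF object into memory (placeholder path).
    std::ifstream in("bpf_program.o", std::ios::binary);
    std::vector<char> elf((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());

    struct ubpf_vm *vm = ubpf_create();
    char *errmsg = nullptr;  // allocated by uBPF on failure; freeing omitted for brevity
    if (ubpf_load_elf(vm, elf.data(), elf.size(), &errmsg) < 0) {
        std::fprintf(stderr, "load failed: %s\n", errmsg);
        return 1;
    }
    // The memory region handed to the program; on a real CSD this would be
    // data read from flash.
    uint8_t mem[64] = {0};
    uint64_t result = ubpf_exec(vm, mem, sizeof(mem));
    std::printf("BPF program returned %llu\n", (unsigned long long)result);
    ubpf_destroy(vm);
    return 0;
}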

Getting Started

To get started using OpenCSD perform the steps described in the Setup section, followed by the steps in Usage Examples.

Directory structure

  • qemu-csd - project source files
  • cmake - small cmake snippets to enable various features
  • dependencies - project dependencies
  • docs - doxygen generated source code documentation
  • documentation - project report written in LaTeX
  • playground - small toy examples or other experiments
  • presentation - midterm presentation written in LaTeX
  • python - python scripts to aid in visualization or measurements
  • scripts - Shell scripts primarily used by CMake to install project dependencies
  • tests - unit tests and possibly integration tests
  • .vscode - Launch targets and settings to debug programs running inside QEMU over SSH

Modules

Module        Task
arguments     Parse command-line arguments for the relevant components
bpf_helpers   Headers defining the functions available from within BPF
bpf_programs  BPF programs ready to run on a CSD using bpf_helpers
fuse_lfs      Log-structured filesystem in FUSE
nvme_csd      Emulates additional NVMe commands to enable BPF CSDs
nvme_zns      Interface to handle zoned I/O using abstracted backends
spdk_init     Provides SPDK initialization and handles for nvme_csd

Dependencies

This project requires quite a few dependencies; the majority are compiled by the project itself and installed into the build directory. Anything that is not automatically compiled and linked is shown below. Note, however, that these dependencies are already installed on the image used with QEMU.

Warning: Meson must be below version 0.60 due to a bug in DPDK

  • General
    • Linux 5.5 or higher
    • compiler with c++17 support
    • clang 10 or higher
    • cmake 3.18 or higher
    • python 3.x
    • mesonbuild < 0.60 (pip3 install meson==0.59)
    • pyelftools (pip3 install pyelftools)
    • ninja
    • cunit
  • Documentation
    • doxygen
    • LaTeX
  • Code Coverage
    • ctest
    • lcov
    • gcov
    • gcovr
  • Continuous Integration
    • valgrind
  • Python scripts
    • virtualenv

The following dependencies are automatically compiled. Dependencies are preferably linked statically due to the nature of this project; however, for several dependencies this is not possible for various reasons. For Boost, it is because the unit-test framework cannot (easily) be statically linked:

Dependency        System   Version
backward          ZCSD     1.6
boost             ZCSD     1.74.0
bpftool           ZCSD     5.14
bpf_load          ZCSD     5.10
dpdk              ZCSD     20.11.0
generic-ebpf      ZCSD     c9cee73
fuse-lfs          OpenCSD  526454b
libbpf            ZCSD     0.5
libfuse           OpenCSD  3.10.5
libbpf-bootstrap  ZCSD     67a29e5
linux             ZCSD     5.14
spdk              ZCSD     21.07
isa-l             ZCSD     spdk-v2.30.0
rocksdb           OpenCSD  6.25.3
qemu              ZCSD     6.1.0
uBPF              ZCSD     9eb26b4

Setup

Building tools and dependencies is done by executing the following commands from the root directory. For a more complete list of CMake options see the Configuration section. The environment file needs to be sourced with source builddir/qemu-csd/activate in every new shell; it configures essential include and binary paths required to run all the dependencies.

This first section of commands generates targets for host development. Among these is compiling QEMU and downloading an image for it. Many parts of this project can be developed on the host, but some require being developed on the guest; see the next section for on-guest development.

Navigate to the root directory of the project before executing the following instructions. These instructions compile the dependencies, including a version of QEMU, on the host.

git submodule update --init
mkdir build
cd build
cmake ..
cmake --build .
# Do not use make -j $(nproc); CMake is not able to resolve the concurrent dependency chain
cmake .. # this prevents re-compiling dependencies on every next make command
source qemu-csd/activate
# run commands and tools as you please for host based development
deactivate

From the root directory, execute the following commands for the one-time deployment into the QEMU guest. These commands assume the previous section of commands has been executed successfully. The QEMU guest automatically starts an SSH server reachable on port 7777. Both the arch and root users can be used to log in; in both cases the password is arch as well. By default the QEMU script only binds the guest ports on localhost, to reduce the security concerns of these basic passwords.

git bundle create deploy.git HEAD
cd build/qemu-csd
source activate
qemu-img create -f raw znsssd.img 16777216
# By default qemu will use 4 CPU cores and 8GB of memory
./qemu-start.sh
# Wait for QEMU VM to fully boot... (might take some time)
rsync -avz -e "ssh -p 7777" ../../deploy.git arch@localhost:~/
# Type password (arch)
ssh arch@localhost -p 7777
# Type password (arch)
git clone deploy.git qemu-csd
rm deploy.git
cd qemu-csd
git -c submodule."dependencies/qemu".update=none submodule update --init
mkdir build
cd build
cmake -DENABLE_DOCUMENTATION=off -DIS_DEPLOYED=on ..
# Do not use make -j $(nproc); CMake is not able to resolve the concurrent dependency chain
cmake --build .

Optionally, if the intent is to develop on the guest and commit code, the git remote can be updated. In that case it is also best to generate an SSH keypair; be sure to start an ssh-agent as well, as this needs to be done manually on Arch. The ssh-agent is only valid for the terminal session that started it. Optionally, it can be included in .bashrc.

git remote set-url origin [email protected]:Dantali0n/qemu-csd.git
ssh-keygen -t rsa -b 4096
eval $(ssh-agent) # must be done after each login
ssh-add ~/.ssh/NAME_OF_KEY

Additionally, any Python-based tools and graphs are generated by executing these additional commands from the root directory. Ensure the previous environment has been deactivated.

virtualenv -p python3 python
cd python
source bin/activate
pip install -r requirements.txt

Running & Debugging

Running and debugging programs is an essential part of development. Often, a high barrier to entry and clumsy development procedures severely hinder productivity. Qemu-csd comes with a variety of preconfigured scripts to reduce this initial barrier and enable quick development iterations.

Environment:

Within the build folder is a qemu-csd/activate script. It can be sourced from any shell using source qemu-csd/activate. The script configures environment variables such as LD_LIBRARY_PATH while also exposing an essential sudo alias: ld-sudo.

The environment variables ensure any linked libraries can be found for targets compiled by CMake. Additionally, ld-sudo provides a mechanism to start targets with sudo privileges while retaining these environment variables. The environment can be deactivated at any time by executing deactivate.

Usage Examples:

TODO: Generate integer data file, describe qemucsd and spdk-native applications, usage parameters, relevant code segments to write your own BPF program, relevant code segments to extend the prototype.

Debugging on host:

Several mechanisms are in place to simplify the debugging process. Firstly, vscode launch files are provided to debug applications even though they require environment configuration. Any application can be launched using the following set of commands:

source qemu-csd/activate
# For when the target does not require sudo
gdbserver localhost:2222 playground/play-boost-locale
# For when the target requires sudo privileges
ld-sudo gdbserver localhost:2222 playground/play-spdk

Note that while QEMU is running, port 2222 is used by QEMU instead. The launch targets in .vscode/launch.json can easily be modified or extended.

When gdbserver is running, simply open vscode, select the root folder of qemu-csd, navigate to the source files of interest, set breakpoints and select the launch target from the dropdown (top left). The debugging panel in vscode can be accessed quickly by pressing ctrl+shift+d.

Alternative debugging methods such as using gdb TUI or gdbgui should work but will require more manual setup.

Debugging on QEMU:

Debugging on QEMU is similar but uses different launch targets in vscode. These targets automatically log in using SSH and forward the gdbserver connection.

More native debugging sessions are also supported. Simply login to QEMU and start the gdbserver manually. On the host connect to this gdbserver and set up substitute-path.

On QEMU:

# from the root of the project folder.
cd build
source qemu-csd/activate
ld-sudo gdbserver localhost:2000 playground/play-spdk

On host:

gdb
target remote localhost:2222
set substitute-path /home/arch/qemu-csd/ /path/to/root/of/project

More detailed information about development & debugging for this project can be found in the report.

Debugging FUSE:

Debugging FUSE filesystem operations can be done by running the compiled filesystem binaries with the -f argument, which keeps the FUSE filesystem process in the foreground.

gdb ./filesystem
b ...
run -f mountpoint

CMake Configuration

This section documents all configuration parameters that the CMake project exposes and how they influence the project. For more information about the CMake project see the report generated from the documentation folder. Below, all parameters are listed along with their default value and a brief description.

Parameter             Default  Use case
ENABLE_TESTS          ON       Enables unit tests and adds tests target
ENABLE_CODECOV        OFF      Produce code coverage report with unit tests
ENABLE_DOCUMENTATION  ON       Produce code documentation using doxygen & LaTeX
ENABLE_PLAYGROUND     OFF      Enables playground targets
ENABLE_LEAK_TESTS     OFF      Add compile parameter for address sanitizer
IS_DEPLOYED           OFF      Indicate that the CMake project is deployed in QEMU

Several parameters need a more in-depth explanation, primarily IS_DEPLOYED. The CMake project is used both to compile and configure QEMU and to compile binaries that run inside QEMU. As a result, it needs to be able to identify whether it is being executed outside of QEMU or not; this is what IS_DEPLOYED facilitates. In particular, IS_DEPLOYED prevents the compilation of QEMU from source.

Licensing

This project is available under the MIT license; several limitations apply, including:

  • Source files with an alternative author or license statement other than Dantali0n and MIT respectively.
  • Images subject to copyright or usage terms, such as the VU and UvA logos.
  • CERN beamer template files by Jerome Belleman.
  • Configuration files that can't be subject to licensing, such as doxygen.cnf or .vscode/launch.json.

References

Snippets

  • SPDK -> now supports ZNS zone append
  • uNVME
  • OCSSD
  • RDMA
  • libbpf (standalone)
  • libbpf-tools (BCC)
  • Linux Kernel:
    • p2pdma
    • ioctl

Configuration and parameters for QEMU ZNS SSDs:

Usage:
      -device nvme-subsys,id=subsys0
      -device nvme,serial=foo,id=nvme0,subsys=subsys0
      -device nvme,serial=bar,id=nvme1,subsys=subsys0
      -device nvme,serial=baz,id=nvme2,subsys=subsys0
      -device nvme-ns,id=ns1,drive=<drv>,nsid=1,subsys=subsys0  # Shared
      -device nvme-ns,id=ns2,drive=<drv>,nsid=2,bus=nvme2

nvme options:
  addr=<int32>           - Slot and optional function number, example: 06.0 or 06 (default: -1)
  aer_max_queued=<uint32> -  (default: 64)
  aerl=<uint8>           -  (default: 3)
  cmb_size_mb=<uint32>   -  (default: 0)
  discard_granularity=<size> -  (default: 4294967295)
  drive=<str>            - Node name or ID of a block device to use as a backend
  failover_pair_id=<str>
  logical_block_size=<size> - A power of two between 512 B and 2 MiB (default: 0)
  max_ioqpairs=<uint32>  -  (default: 64)
  mdts=<uint8>           -  (default: 7)
  min_io_size=<size>     -  (default: 0)
  msix_qsize=<uint16>    -  (default: 65)
  multifunction=<bool>   - on/off (default: false)
  num_queues=<uint32>    -  (default: 0)
  opt_io_size=<size>     -  (default: 0)
  physical_block_size=<size> - A power of two between 512 B and 2 MiB (default: 0)
  pmrdev=<link<memory-backend>>
  rombar=<uint32>        -  (default: 1)
  romfile=<str>
  serial=<str>
  share-rw=<bool>        -  (default: false)
  smart_critical_warning=<uint8>
  subsys=<link<nvme-subsys>>
  use-intel-id=<bool>    -  (default: false)
  write-cache=<OnOffAuto> - on/off/auto (default: "auto")
  x-pcie-extcap-init=<bool> - on/off (default: true)
  x-pcie-lnksta-dllla=<bool> - on/off (default: true)
  zoned.append_size_limit=<size> -  (default: 131072)

nvme-ns options:
  bootindex=<int32>
  discard_granularity=<size> -  (default: 4294967295)
  drive=<str>            - Node name or ID of a block device to use as a backend
  logical_block_size=<size> - A power of two between 512 B and 2 MiB (default: 0)
  min_io_size=<size>     -  (default: 0)
  nsid=<uint32>          -  (default: 0)
  opt_io_size=<size>     -  (default: 0)
  physical_block_size=<size> - A power of two between 512 B and 2 MiB (default: 0)
  share-rw=<bool>        -  (default: false)
  subsys=<link<nvme-subsys>>
  uuid=<str>             - UUID (aka GUID) or "auto" for random value (default) (default: "auto")
  write-cache=<OnOffAuto> - on/off/auto (default: "auto")
  zoned.cross_read=<bool> -  (default: false)
  zoned.descr_ext_size=<uint32> -  (default: 0)
  zoned.max_active=<uint32> -  (default: 0)
  zoned.max_open=<uint32> -  (default: 0)
  zoned.zone_capacity=<size> -  (default: 0)
  zoned.zone_size=<size> -  (default: 134217728)
  zoned=<bool>           -  (default: false)

Create required images and launch QEMU with ZNS SSD:

qemu-img create -f raw znsssd.img 16777216
qemu-system-x86_64 -name qemucsd -m 4G -cpu Haswell -smp 2 -hda ./arch-qemucsd.qcow2 \
-net user,hostfwd=tcp::7777-:22,hostfwd=tcp::2222-:2000 -net nic \
-drive file=./znsssd.img,id=mynvme,format=raw,if=none \
-device nvme,serial=baz,id=nvme2,zoned.append_size_limit=131072 \
-device nvme-ns,id=ns2,drive=mynvme,nsid=2,logical_block_size=4096,\
physical_block_size=4096,zoned=true,zoned.zone_size=131072,zoned.zone_capacity=131072,\
zoned.max_open=0,zoned.max_active=0,bus=nvme2

Week 1 Friday demo scripts:

cat /sys/block/nvme0n1/queue/zoned
cat /sys/block/nvme0n1/queue/chunk_sectors
cat /sys/block/nvme0n1/queue/nr_zones
sudo blkzone report /dev/nvme0n1
sudo nvme zns id-ns /dev/nvme0n1
sudo nvme zns report-zones /dev/nvme0n1
sudo nvme zns open-zone /dev/nvme0n1 -s 0xe40
sudo nvme zns finish-zone /dev/nvme0n1 -s 0xe40
sudo nvme zns report-zones /dev/nvme0n1
sudo nvme zns reset-zone /dev/nvme0n1 -s 0xe40
