YugaByteDB is a transactional, high-performance database for building distributed cloud services, developed by YugaByte.
Jepsen is a testing framework for networked databases, developed by Kyle 'Aphyr' Kingsbury to exercise and validate the claims to consistency made by database developers or their documentation.
The tests run concurrent operations on different nodes in a YugaByteDB cluster and check that the operations preserve the consistency properties defined in each test. During the tests, various combinations of nemeses can be added to interfere with the database operations and exercise the database's consistency protocols.
Quickstart:
To run a single workload, use lein run test:
lein run test -o debian --version 1.2.10.0 --workload ycql/counter --nemesis partition
This command runs the YCQL counter test against version 1.2.10.0, with network partitions, assuming nodes run Debian Jessie.
To run a full suite of tests, with various workloads and nemeses, use lein run test-all:
lein run test-all -o debian --version 1.2.10.0 --url https://downloads.yugabyte.com/yugabyte-ce-1.2.10.0-linux.tar.gz --concurrency 4n --time-limit 300 --only-workloads-expected-to-pass
Here, we're testing a specific tarball of version 1.2.10.0, given by --url. We're running 4 clients per node, running for 300 seconds per test, and constraining our run to only those workloads we think should pass.
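As a worked example of the --concurrency value: Jepsen's CLI accepts an integer optionally followed by n, which multiplies it by the number of nodes. The sketch below shows that arithmetic. resolve_concurrency is a hypothetical helper written for illustration, not part of these tests, and the five-node cluster is an assumption.

```python
import re


def resolve_concurrency(spec: str, node_count: int) -> int:
    """Turn a --concurrency value such as "4n" or "10" into a client count.

    Hypothetical helper: mirrors the documented meaning of the "n" suffix
    (multiply by the number of nodes); it is not code from the test suite.
    """
    match = re.fullmatch(r"(\d+)(n?)", spec)
    if match is None:
        raise ValueError(f"invalid concurrency spec: {spec!r}")
    count = int(match.group(1))
    per_node = match.group(2) == "n"
    return count * node_count if per_node else count


# Assumed cluster size of 5 nodes: 4 clients per node gives 20 clients.
print(resolve_concurrency("4n", 5))
```

With a plain integer such as "10", the same helper returns 10 regardless of cluster size.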
The following workloads are available with --workload (or -w).
Workloads have format <api-name>/<test-name>, where <api-name> is either ycql or ysql.
The following tests are available for both YCQL and YSQL:
- counter - concurrent counter increments.
- set - inserts single records and concurrently reads all of them back.
- bank - concurrent transfers between rows of a shared table.
- long-fork - looks for a snapshot isolation violation due to incompatible read orders.
- single-key-acid - each group of workers performs concurrent read, write, and update-if operations on its designated row.
- multi-key-acid - concurrent reads and write batches to a table with a two-column composite key.
YCQL-specific tests:
- set-index - like set, but reads from a small pool of indices.
YSQL-specific tests:
- bank-multitable - like bank, but across different tables.
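The <api-name>/<test-name> format can be checked before launching a run. The following is a minimal sketch, assuming the test names listed above are the complete set; parse_workload and the two lookup tables are hypothetical names introduced here, not part of the test suite.

```python
# Test names taken from the lists above; treat them as an assumption
# about the current suite rather than an authoritative registry.
COMMON_TESTS = {
    "counter", "set", "bank", "long-fork",
    "single-key-acid", "multi-key-acid",
}
API_SPECIFIC_TESTS = {
    "ycql": {"set-index"},
    "ysql": {"bank-multitable"},
}


def parse_workload(name: str) -> tuple[str, str]:
    """Split "<api-name>/<test-name>" and reject unknown combinations."""
    api, sep, test = name.partition("/")
    if not sep or api not in API_SPECIFIC_TESTS:
        raise ValueError(f"expected ycql/<test> or ysql/<test>, got {name!r}")
    if test not in COMMON_TESTS | API_SPECIFIC_TESTS[api]:
        raise ValueError(f"test {test!r} is not available for {api}")
    return api, test


print(parse_workload("ycql/counter"))
```

A name such as ysql/set-index is rejected, because set-index is listed only for YCQL.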
The following nemeses are available with --nemesis. Nemeses can be combined with commas, like --nemesis partition,clock-skew:
- none - no failures
- clock-skew - jumps and strobes in clocks, up to hundreds of seconds
- partition - all kinds of network partitions
- partition-half - cuts the network into two halves, one with a majority
- partition-one - isolates a single node
- partition-ring - every node sees a majority, but no node sees the same set
- kill - kills and restarts tservers and masters
- kill-tserver - kills and restarts tservers
- kill-master - kills and restarts masters
- stop - stops and restarts tservers and masters
- stop-tserver - stops and restarts tservers
- stop-master - stops and restarts masters
- pause - pauses (with SIGSTOP) and resumes (with SIGCONT) tservers and masters
- pause-tserver - pauses tservers
- pause-master - pauses masters
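To illustrate how a comma-separated --nemesis value breaks down, here is a small sketch that splits the value and checks each name against the list above. parse_nemeses and NEMESES are hypothetical names for illustration; the real CLI does its own parsing and may accept names not shown here.

```python
# Nemesis names copied from the list above (an assumption about the
# current suite, not an authoritative registry).
NEMESES = {
    "none", "clock-skew",
    "partition", "partition-half", "partition-one", "partition-ring",
    "kill", "kill-tserver", "kill-master",
    "stop", "stop-tserver", "stop-master",
    "pause", "pause-tserver", "pause-master",
}


def parse_nemeses(value: str) -> list[str]:
    """Split a value like "partition,clock-skew" and validate each name."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in NEMESES]
    if unknown:
        raise ValueError(f"unknown nemeses: {', '.join(unknown)}")
    return names


print(parse_nemeses("partition,clock-skew"))
```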
YugaByte's original version of these tests ran on CentOS 7, and used a
pre-installed Enterprise Edition cluster. We've preserved those codepaths in
this version of the tests (see jepsen.auto), but they haven't been tested,
and likely need some additional polish to work.
Install Leiningen (https://leiningen.org):
mkdir -p ~/bin
curl https://raw.githubusercontent.com/technomancy/leiningen/stable/bin/lein -o ~/bin/lein \
&& chmod +x ~/bin/lein
Add ~/bin to $PATH.
Install the Cassaforte driver:
mkdir ~/code
git clone https://github.com/YugaByte/cassaforte ~/code/cassaforte
cd ~/code/cassaforte
git checkout driver-3.0-yb
lein install
Install gnuplot:
sudo yum install gnuplot
- Create a YugaByteDB cluster with 5 nodes and a replication factor of 3.
- Create the text file ~/code/jepsen/nodes and list all cluster nodes there, one per line, for example:
yb-test-jepsen-n1
yb-test-jepsen-n2
yb-test-jepsen-n3
yb-test-jepsen-n4
yb-test-jepsen-n5
- Set up the cluster nodes for running Jepsen tests:
~/code/jepsen/yugabyte/setup-jepsen.sh
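Before running the setup script, it can help to sanity-check the nodes file. The sketch below parses the one-host-per-line format described above and flags duplicates. parse_nodes is a hypothetical helper, and the duplicate check is an assumption about what counts as a mistake, not something the scripts are known to enforce.

```python
def parse_nodes(text: str) -> list[str]:
    """Return the host names from a nodes file, one per non-blank line."""
    nodes = [line.strip() for line in text.splitlines() if line.strip()]
    duplicates = sorted({node for node in nodes if nodes.count(node) > 1})
    if duplicates:
        raise ValueError(f"duplicate nodes: {', '.join(duplicates)}")
    return nodes


example = """\
yb-test-jepsen-n1
yb-test-jepsen-n2
yb-test-jepsen-n3
yb-test-jepsen-n4
yb-test-jepsen-n5
"""
print(len(parse_nodes(example)))
```

To check a real file, pass its contents, for example `parse_nodes(open(path).read())` with `path` pointing at the nodes file.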
These wrapper scripts were written for YugaByte's version of these tests, and may no longer work correctly. They're preserved here in case anyone would like to use them going forward. They aren't necessary to run the tests; the CLI for these tests can run all tests automatically.
All commands described below should be run in the ~/code/jepsen/yugabyte directory.
In order to display help and see available tests and nemeses:
lein run test --help
To run a test with a specific nemesis, for example start-stop-master:
lein run test --nodes-file ~/code/jepsen/nodes --nemesis start-stop-master
To run all tests one by one under each nemesis in an infinite loop:
./run-jepsen.py
This will also classify test results by category and put them into sub-directories of ~/code/jepsen/yugabyte/results-sorted:
- ok
- timed-out - the test run (including the analysis phase) took longer than the time limit defined in run-jepsen.py.
- no-history - the file with the operations history is absent.
- valid-unknown - the results checker wasn't able to determine whether the results are valid.
- invalid - the history of operations is inconsistent.
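The categories above amount to a small decision procedure. The following sketch shows one way to express it; it is not the logic in run-jepsen.py. The classify function, its parameters, and the order of the checks are assumptions made for illustration. The valid argument stands for the checker's verdict: True, False, or anything else for "could not decide".

```python
def classify(timed_out: bool, has_history: bool, valid: object) -> str:
    """Map one test run onto a results-sorted category name.

    Assumed check order: a timeout wins over everything, then a missing
    history file, then the checker's verdict.
    """
    if timed_out:
        return "timed-out"
    if not has_history:
        return "no-history"
    if valid is True:
        return "ok"
    if valid is False:
        return "invalid"
    return "valid-unknown"


print(classify(timed_out=False, has_history=True, valid="unknown"))
```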