Skip to content

Commit

Permalink
Make it easier to start using RocksDB
Browse files Browse the repository at this point in the history
Summary:
This diff is addressing multiple things with a single goal -- to make RocksDB easier to use:
* Add some functions to Options that make RocksDB easier to tune.
* Add example code for both simple RocksDB and RocksDB with Column Families.
* Rewrite our README.md

Regarding Options, I took a stab at something we talked about for a long time:
* https://www.facebook.com/groups/rocksdb.dev/permalink/563169950448190/

I added functions:
* IncreaseParallelism() -- easy, increases the thread pool and max_background_compactions
* OptimizeLevelStyleCompaction(memtable_memory_budget) -- the easiest way to optimize rocksdb for less stalls with level style compaction. This is very likely not ideal configuration. Feel free to suggest improvements. I used some of Mark's suggestions from here: facebook#54
* OptimizeUniversalStyleCompaction(memtable_memory_budget) -- optimize for universal compaction.

Test Plan: compiled rocksdb. ran examples.

Reviewers: dhruba, MarkCallaghan, haobo, sdong, yhchiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18621
  • Loading branch information
igorcanadi committed May 10, 2014
1 parent acd17fd commit 038a477
Show file tree
Hide file tree
Showing 9 changed files with 246 additions and 82 deletions.
82 changes: 0 additions & 82 deletions README

This file was deleted.

24 changes: 24 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
## RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

RocksDB is developed and maintained by Facebook Database Engineering Team.
It is built on on earlier work on LevelDB by Sanjay Ghemawat ([email protected])
and Jeff Dean ([email protected])

This code is a library that forms the core building block for a fast
key value server, especially suited for storing data on flash drives.
It has an Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
making it specially suitable for storing multiple terabytes of data in a
single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See [doc/index.html](https://github.com/facebook/rocksdb/blob/master/doc/index.html) and
[github wiki](https://github.com/facebook/rocksdb/wiki) for more explanation.

The public interface is in `include/`. Callers should not include or
rely on the details of any other header files in this package. Those
internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
2 changes: 2 additions & 0 deletions examples/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
column_families_example
simple_example
9 changes: 9 additions & 0 deletions examples/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
include ../build_config.mk

all: simple_example column_families_example

simple_example: simple_example.cc
$(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)

column_families_example: column_families_example.cc
$(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)
1 change: 1 addition & 0 deletions examples/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Compile RocksDB first by executing `make static_lib` in parent dir
72 changes: 72 additions & 0 deletions examples/column_families_example.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
// Copyright (c) 2013, Facebook, Inc. All rights reserved.
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree. An additional grant
// of patent rights can be found in the PATENTS file in the same directory.
#include <cstdio>
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"

using namespace rocksdb;

std::string kDBPath = "/tmp/rocksdb_column_families_example";

int main() {
// open DB
Options options;
options.create_if_missing = true;
DB* db;
Status s = DB::Open(options, kDBPath, &db);
assert(s.ok());

// create column family
ColumnFamilyHandle* cf;
s = db->CreateColumnFamily(ColumnFamilyOptions(), "new_cf", &cf);
assert(s.ok());

// close DB
delete cf;
delete db;

// open DB with two column families
std::vector<ColumnFamilyDescriptor> column_families;
// have to open default column familiy
column_families.push_back(ColumnFamilyDescriptor(
kDefaultColumnFamilyName, ColumnFamilyOptions()));
// open the new one, too
column_families.push_back(ColumnFamilyDescriptor(
"new_cf", ColumnFamilyOptions()));
std::vector<ColumnFamilyHandle*> handles;
s = DB::Open(DBOptions(), kDBPath, column_families, &handles, &db);
assert(s.ok());

// put and get from non-default column family
s = db->Put(WriteOptions(), handles[1], Slice("key"), Slice("value"));
assert(s.ok());
std::string value;
s = db->Get(ReadOptions(), handles[1], Slice("key"), &value);
assert(s.ok());

// atomic write
WriteBatch batch;
batch.Put(handles[0], Slice("key2"), Slice("value2"));
batch.Put(handles[1], Slice("key3"), Slice("value3"));
batch.Delete(handles[0], Slice("key"));
s = db->Write(WriteOptions(), &batch);
assert(s.ok());

// drop column family
s = db->DropColumnFamily(handles[1]);
assert(s.ok());

// close db
for (auto handle : handles) {
delete handle;
}
delete db;

return 0;
}
41 changes: 41 additions & 0 deletions examples/simple_example.cc
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
// Copyright (c) 2013, Facebook, Inc. All rights reserved.
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree. An additional grant
// of patent rights can be found in the PATENTS file in the same directory.
#include <cstdio>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"

using namespace rocksdb;

std::string kDBPath = "/tmp/rocksdb_simple_example";

int main() {
DB* db;
Options options;
// Optimize RocksDB. This is the easiest way to get RocksDB to perform well
options.IncreaseParallelism();
options.OptimizeLevelStyleCompaction();
// create the DB if it's not already present
options.create_if_missing = true;

// open DB
Status s = DB::Open(options, kDBPath, &db);
assert(s.ok());

// Put key-value
s = db->Put(WriteOptions(), "key", "value");
assert(s.ok());
std::string value;
// get value
s = db->Get(ReadOptions(), "key", &value);
assert(s.ok());
assert(value == "value");

delete db;

return 0;
}
33 changes: 33 additions & 0 deletions include/rocksdb/options.h
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,29 @@ enum UpdateStatus { // Return status For inplace update callback
struct Options;

struct ColumnFamilyOptions {
// Some functions that make it easier to optimize RocksDB

// Use this if you don't need to keep the data sorted, i.e. you'll never use
// an iterator, only Put() and Get() API calls
ColumnFamilyOptions* OptimizeForPointLookup();

// Default values for some parameters in ColumnFamilyOptions are not
// optimized for heavy workloads and big datasets, which means you might
// observe write stalls under some conditions. As a starting point for tuning
// RocksDB options, use the following two functions:
// * OptimizeLevelStyleCompaction -- optimizes level style compaction
// * OptimizeUniversalStyleCompaction -- optimizes universal style compaction
// Universal style compaction is focused on reducing Write Amplification
// Factor for big data sets, but increases Space Amplification. You can learn
// more about the different styles here:
// https://github.com/facebook/rocksdb/wiki/Rocksdb-Architecture-Guide
// Note: we might use more memory than memtable_memory_budget during high
// write rate period
ColumnFamilyOptions* OptimizeLevelStyleCompaction(
uint64_t memtable_memory_budget = 512 * 1024 * 1024);
ColumnFamilyOptions* OptimizeUniversalStyleCompaction(
uint64_t memtable_memory_budget = 512 * 1024 * 1024);

// -------------------
// Parameters that affect behavior

Expand Down Expand Up @@ -336,6 +359,7 @@ struct ColumnFamilyOptions {
// With bloomfilter and fast storage, a miss on one level
// is very cheap if the file handle is cached in table cache
// (which is true if max_open_files is large).
// Default: true
bool disable_seek_compaction;

// Puts are delayed 0-1 ms when any level has a compaction score that exceeds
Expand Down Expand Up @@ -546,6 +570,15 @@ struct ColumnFamilyOptions {
};

struct DBOptions {
// Some functions that make it easier to optimize RocksDB

// By default, RocksDB uses only one background thread for flush and
// compaction. Calling this function will set it up such that total of
// `total_threads` is used. Good value for `total_threads` is the number of
// cores. You almost definitely want to call this function if your system is
// bottlenecked by RocksDB.
DBOptions* IncreaseParallelism(int total_threads = 16);

// If true, the database will be created if it is missing.
// Default: false
bool create_if_missing;
Expand Down
64 changes: 64 additions & 0 deletions util/options.cc
Original file line number Diff line number Diff line change
Expand Up @@ -480,4 +480,68 @@ Options::PrepareForBulkLoad()
return this;
}

// Optimization functions
ColumnFamilyOptions* ColumnFamilyOptions::OptimizeForPointLookup() {
prefix_extractor.reset(NewNoopTransform());
BlockBasedTableOptions block_based_options;
block_based_options.index_type = BlockBasedTableOptions::kBinarySearch;
table_factory.reset(new BlockBasedTableFactory(block_based_options));
memtable_factory.reset(NewHashLinkListRepFactory());
return this;
}

ColumnFamilyOptions* ColumnFamilyOptions::OptimizeLevelStyleCompaction(
uint64_t memtable_memory_budget) {
write_buffer_size = memtable_memory_budget / 4;
// merge two memtables when flushing to L0
min_write_buffer_number_to_merge = 2;
// this means we'll use 50% extra memory in the worst case, but will reduce
// write stalls.
max_write_buffer_number = 6;
// start flushing L0->L1 as soon as possible. each file on level0 is
// (memtable_memory_budget / 2). This will flush level 0 when it's bigger than
// memtable_memory_budget.
level0_file_num_compaction_trigger = 2;
// doesn't really matter much, but we don't want to create too many files
target_file_size_base = memtable_memory_budget / 8;
// make Level1 size equal to Level0 size, so that L0->L1 compactions are fast
max_bytes_for_level_base = memtable_memory_budget;

// level style compaction
compaction_style = kCompactionStyleLevel;

// only compress levels >= 2
compression_per_level.resize(num_levels);
for (int i = 0; i < num_levels; ++i) {
if (i < 2) {
compression_per_level[i] = kNoCompression;
} else {
compression_per_level[i] = kSnappyCompression;
}
}
return this;
}

ColumnFamilyOptions* ColumnFamilyOptions::OptimizeUniversalStyleCompaction(
uint64_t memtable_memory_budget) {
write_buffer_size = memtable_memory_budget / 4;
// merge two memtables when flushing to L0
min_write_buffer_number_to_merge = 2;
// this means we'll use 50% extra memory in the worst case, but will reduce
// write stalls.
max_write_buffer_number = 6;
// universal style compaction
compaction_style = kCompactionStyleUniversal;
compaction_options_universal.compression_size_percent = 80;
return this;
}

DBOptions* DBOptions::IncreaseParallelism(int total_threads) {
max_background_compactions = total_threads - 1;
max_background_flushes = 1;
env->SetBackgroundThreads(total_threads, Env::LOW);
env->SetBackgroundThreads(1, Env::HIGH);
return this;
}

} // namespace rocksdb

0 comments on commit 038a477

Please sign in to comment.