Skip to content

Commit

Permalink
Added memtable garbage statistics (facebook#8411)
Browse files Browse the repository at this point in the history
Summary:
**Summary**:
2 new statistics counters are added to RocksDB: `MEMTABLE_PAYLOAD_BYTES_AT_FLUSH` and `MEMTABLE_GARBAGE_BYTES_AT_FLUSH`. The former tracks how many raw bytes of useful data are present on the memtable at flush time, whereas the latter is tracks how many of these raw bytes are considered garbage, meaning that they ended up not being imported on the SSTables resulting from the flush operations.

**Unit test**: run `make db_flush_test -j$(nproc); ./db_flush_test` to run the unit test.
This executable includes 3 tests, that test support and correct stat calculations for workloads with inserts, deletes, and DeleteRanges. The parameters are set such that the workloads are performed on a single memtable, and a single SSTable is created as a result of the flush operation. The flush operation is manually called in the test file. The tests verify that the values of these 2 statistics counters introduced in this PR  can be exactly predicted, showing that we have a full understanding of the underlying operations.

**Performance testing**:
`./db_bench -statistics -benchmarks=fillrandom -num=10000000` repeated 10 times.
Timing done using "date" function in a bash script.
_Results_:
Original Rocksdb fork: mean 66.6 sec, std 1.18 sec.
This feature branch: mean 67.4 sec, std 1.35 sec.

Pull Request resolved: facebook#8411

Reviewed By: akankshamahajan15

Differential Revision: D29150629

Pulled By: bjlemaire

fbshipit-source-id: 7b3c2e86d50c6aa34fa50fd134282eacb543a5b1
  • Loading branch information
bjlemaire authored and facebook-github-bot committed Jun 18, 2021
1 parent 5ba1b6e commit e817bc9
Show file tree
Hide file tree
Showing 11 changed files with 455 additions and 3 deletions.
1 change: 1 addition & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# Rocksdb Change Log
## Unreleased
### Behavior Changes
* Added two additional tickers, MEMTABLE_PAYLOAD_BYTES_AT_FLUSH and MEMTABLE_GARBAGE_BYTES_AT_FLUSH. These stats can be used to estimate the ratio of "garbage" (outdated) bytes in the memtable that are discarded at flush time.
* Added API comments clarifying safe usage of Disable/EnableManualCompaction and EventListener callbacks for compaction.

### Bug Fixes
Expand Down
25 changes: 24 additions & 1 deletion db/builder.cc
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,8 @@ Status BuildTable(
int job_id, const Env::IOPriority io_priority,
TableProperties* table_properties, Env::WriteLifeTimeHint write_hint,
const std::string* full_history_ts_low,
BlobFileCompletionCallback* blob_callback, uint64_t* num_input_entries) {
BlobFileCompletionCallback* blob_callback, uint64_t* num_input_entries,
uint64_t* memtable_payload_bytes, uint64_t* memtable_garbage_bytes) {
assert((tboptions.column_family_id ==
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily) ==
tboptions.column_family_name.empty());
Expand All @@ -89,9 +90,12 @@ Status BuildTable(
new CompactionRangeDelAggregator(&tboptions.internal_comparator,
snapshots));
uint64_t num_unfragmented_tombstones = 0;
uint64_t total_tombstone_payload_bytes = 0;
for (auto& range_del_iter : range_del_iters) {
num_unfragmented_tombstones +=
range_del_iter->num_unfragmented_tombstones();
total_tombstone_payload_bytes +=
range_del_iter->total_tombstone_payload_bytes();
range_del_agg->AddTombstones(std::move(range_del_iter));
}

Expand Down Expand Up @@ -254,6 +258,25 @@ Status BuildTable(
meta->marked_for_compaction = builder->NeedCompact();
assert(meta->fd.GetFileSize() > 0);
tp = builder->GetTableProperties(); // refresh now that builder is finished
if (memtable_payload_bytes != nullptr &&
memtable_garbage_bytes != nullptr) {
const CompactionIterationStats& ci_stats = c_iter.iter_stats();
uint64_t total_payload_bytes = ci_stats.total_input_raw_key_bytes +
ci_stats.total_input_raw_value_bytes +
total_tombstone_payload_bytes;
uint64_t total_payload_bytes_written =
(tp.raw_key_size + tp.raw_value_size);
// Prevent underflow, which may still happen at this point
// since we only support inserts, deletes, and deleteRanges.
if (total_payload_bytes_written <= total_payload_bytes) {
*memtable_payload_bytes = total_payload_bytes;
*memtable_garbage_bytes =
total_payload_bytes - total_payload_bytes_written;
} else {
*memtable_payload_bytes = 0;
*memtable_garbage_bytes = 0;
}
}
if (table_properties) {
*table_properties = tp;
}
Expand Down
4 changes: 3 additions & 1 deletion db/builder.h
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,8 @@ extern Status BuildTable(
Env::WriteLifeTimeHint write_hint = Env::WLTH_NOT_SET,
const std::string* full_history_ts_low = nullptr,
BlobFileCompletionCallback* blob_callback = nullptr,
uint64_t* num_input_entries = nullptr);
uint64_t* num_input_entries = nullptr,
uint64_t* memtable_payload_bytes = nullptr,
uint64_t* memtable_garbage_bytes = nullptr);

} // namespace ROCKSDB_NAMESPACE
Loading

0 comments on commit e817bc9

Please sign in to comment.