
Commit

Create a BlockCacheLookupContext to enable fine-grained block cache tracing. (facebook#5421)

Summary:
BlockCacheLookupContext only contains the caller for now; a sketch of the struct follows the list below.
We will trace block accesses at five places:
1. BlockBasedTable::GetFilter.
2. BlockBasedTable::GetUncompressedDict.
3. BlockBasedTable::MaybeReadBlockAndLoadToCache. (To trace accesses to data, index, and range deletion blocks.)
4. BlockBasedTable::Get. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
5. BlockBasedTable::MultiGet. (To trace the referenced key and whether the referenced key exists in a fetched data block.)
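
For concreteness, a minimal sketch of the context, assuming the caller enum simply enumerates the creation sites listed below (the PR's actual definition may differ in detail):

enum class BlockCacheLookupCaller : char {
  kUserGet,
  kUserMGet,
  kUserIterator,
  kPrefetch,
  kCompaction,
  kUserApproximateSize,
};

// For now the context records only who initiated the lookup; the trace
// points above suggest later extensions such as the referenced key and
// whether it was found in the fetched data block.
struct BlockCacheLookupContext {
  explicit BlockCacheLookupContext(BlockCacheLookupCaller _caller)
      : caller(_caller) {}
  const BlockCacheLookupCaller caller;
};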

We create the context at:
1. BlockBasedTable::Get. (kUserGet)
2. BlockBasedTable::MultiGet. (kUserMGet)
3. BlockBasedTable::NewIterator. (either kUserIterator or kCompaction; external SST ingestion also calls this function but does not yet have a dedicated caller — see TODO 1 below.)
4. BlockBasedTable::Open. (kPrefetch)
5. Index/Filter::CacheDependencies. (kPrefetch)
6. BlockBasedTable::ApproximateOffsetOf. (kCompaction or kUserApproximateSize).
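
Creation site 6 is the one place where the caller depends on a flag rather than on the entry point. Reusing the sketch types above, the choice could look like this (illustrative only; for_compaction is the new flag the diffs below thread through VersionSet::ApproximateSize to TableReader::ApproximateOffsetOf):

// Illustrative helper, not the PR's exact code: pick the caller for an
// ApproximateOffsetOf lookup from the new for_compaction flag.
BlockCacheLookupContext MakeApproximateOffsetContext(bool for_compaction) {
  return BlockCacheLookupContext(
      for_compaction ? BlockCacheLookupCaller::kCompaction
                     : BlockCacheLookupCaller::kUserApproximateSize);
}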

I loaded 1 million key-value pairs into the database and ran the readrandom benchmark with a single thread. I gave the block cache 10 GB to make sure all reads hit the block cache after warmup. The throughput is comparable (this PR is about 3% slower):
Throughput of this PR: 231334 ops/s.
Throughput of the master branch: 238428 ops/s.

Experiment setup:
RocksDB:    version 6.2
Date:       Mon Jun 10 10:42:51 2019
CPU:        24 * Intel Core Processor (Skylake)
CPUCache:   16384 KB
Keys:       20 bytes each
Values:     100 bytes each (100 bytes after compression)
Entries:    1000000
Prefix:    20 bytes
Keys per prefix:    0
RawSize:    114.4 MB (estimated)
FileSize:   114.4 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: NoCompression
Compression sampling rate: 0
Memtablerep: skip_list
Perf Level: 1

Load command: ./db_bench --benchmarks="fillseq" --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000

Run command: ./db_bench --benchmarks="readrandom,stats" --use_existing_db --threads=1 --duration=120 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --statistics --cache_index_and_filter_blocks --cache_size=10737418240 --disable_auto_compactions=1 --disable_wal=1 --compression_type=none --min_level_to_compress=-1 --compression_ratio=1 --num=1000000

TODOs:
1. Add a dedicated caller for external SST file ingestion and differentiate the iterator callers.
2. Integrate tracer to trace block cache accesses.
Pull Request resolved: facebook#5421

Differential Revision: D15704258

Pulled By: HaoyuHuang

fbshipit-source-id: 4aa8a55f8cb1576ffb367bfa3186a91d8f06d93a
HaoyuHuang authored and facebook-github-bot committed Jun 10, 2019
1 parent 63ace8e commit 5efa0d6
Showing 22 changed files with 634 additions and 335 deletions.
3 changes: 2 additions & 1 deletion db/compaction/compaction_job.cc
@@ -520,7 +520,8 @@ void CompactionJob::GenSubcompactionBoundaries() {
     // to the index block and may incur I/O cost in the process. Unlock db
     // mutex to reduce contention
     db_mutex_->Unlock();
-    uint64_t size = versions_->ApproximateSize(v, a, b, start_lvl, out_lvl + 1);
+    uint64_t size = versions_->ApproximateSize(v, a, b, start_lvl, out_lvl + 1,
+                                               /*for_compaction*/ true);
     db_mutex_->Lock();
     ranges.emplace_back(a, b, size);
     sum += size;
4 changes: 3 additions & 1 deletion db/db_impl/db_impl.cc
@@ -2717,7 +2717,9 @@ void DBImpl::GetApproximateSizes(ColumnFamilyHandle* column_family,
     InternalKey k2(range[i].limit, kMaxSequenceNumber, kValueTypeForSeek);
     sizes[i] = 0;
     if (include_flags & DB::SizeApproximationFlags::INCLUDE_FILES) {
-      sizes[i] += versions_->ApproximateSize(v, k1.Encode(), k2.Encode());
+      sizes[i] += versions_->ApproximateSize(
+          v, k1.Encode(), k2.Encode(), /*start_level=*/0, /*end_level=*/-1,
+          /*for_compaction=*/false);
     }
     if (include_flags & DB::SizeApproximationFlags::INCLUDE_MEMTABLES) {
       sizes[i] += sv->mem->ApproximateStats(k1.Encode(), k2.Encode()).size;
21 changes: 12 additions & 9 deletions db/version_set.cc
@@ -4827,7 +4827,7 @@ Status VersionSet::WriteSnapshot(log::Writer* log) {
 // maintain state of where they first appear in the files.
 uint64_t VersionSet::ApproximateSize(Version* v, const Slice& start,
                                      const Slice& end, int start_level,
-                                     int end_level) {
+                                     int end_level, bool for_compaction) {
   // pre-condition
   assert(v->cfd_->internal_comparator().Compare(start, end) <= 0);
 
@@ -4848,7 +4848,7 @@ uint64_t VersionSet::ApproximateSize(Version* v, const Slice& start,
 
     if (!level) {
       // level 0 data is sorted order, handle the use case explicitly
-      size += ApproximateSizeLevel0(v, files_brief, start, end);
+      size += ApproximateSizeLevel0(v, files_brief, start, end, for_compaction);
       continue;
     }
@@ -4865,7 +4865,7 @@ uint64_t VersionSet::ApproximateSize(Version* v, const Slice& start,
     // inferred from the sorted order
     for (uint64_t i = idx_start; i < files_brief.num_files; i++) {
       uint64_t val;
-      val = ApproximateSize(v, files_brief.files[i], end);
+      val = ApproximateSize(v, files_brief.files[i], end, for_compaction);
       if (!val) {
         // the files after this will not have the range
         break;
@@ -4876,7 +4876,7 @@ uint64_t VersionSet::ApproximateSize(Version* v, const Slice& start,
       if (i == idx_start) {
         // subtract the bytes needed to be scanned to get to the starting
        // key
-        val = ApproximateSize(v, files_brief.files[i], start);
+        val = ApproximateSize(v, files_brief.files[i], start, for_compaction);
        assert(size >= val);
        size -= val;
      }
@@ -4889,21 +4889,24 @@ uint64_t VersionSet::ApproximateSize(Version* v, const Slice& start,
 uint64_t VersionSet::ApproximateSizeLevel0(Version* v,
                                            const LevelFilesBrief& files_brief,
                                            const Slice& key_start,
-                                           const Slice& key_end) {
+                                           const Slice& key_end,
+                                           bool for_compaction) {
   // level 0 files are not in sorted order, we need to iterate through
   // the list to compute the total bytes that require scanning
   uint64_t size = 0;
   for (size_t i = 0; i < files_brief.num_files; i++) {
-    const uint64_t start = ApproximateSize(v, files_brief.files[i], key_start);
-    const uint64_t end = ApproximateSize(v, files_brief.files[i], key_end);
+    const uint64_t start =
+        ApproximateSize(v, files_brief.files[i], key_start, for_compaction);
+    const uint64_t end =
+        ApproximateSize(v, files_brief.files[i], key_end, for_compaction);
     assert(end >= start);
     size += end - start;
   }
   return size;
 }
 
 uint64_t VersionSet::ApproximateSize(Version* v, const FdWithKeyRange& f,
-                                     const Slice& key) {
+                                     const Slice& key, bool for_compaction) {
   // pre-condition
   assert(v);
 
@@ -4923,7 +4926,7 @@ uint64_t VersionSet::ApproximateSize(Version* v, const FdWithKeyRange& f,
         *f.file_metadata, nullptr /* range_del_agg */,
         v->GetMutableCFOptions().prefix_extractor.get(), &table_reader_ptr);
     if (table_reader_ptr != nullptr) {
-      result = table_reader_ptr->ApproximateOffsetOf(key);
+      result = table_reader_ptr->ApproximateOffsetOf(key, for_compaction);
     }
     delete iter;
   }
7 changes: 4 additions & 3 deletions db/version_set.h
@@ -982,7 +982,7 @@ class VersionSet {
   // in levels [start_level, end_level). If end_level == 0 it will search
   // through all non-empty levels
   uint64_t ApproximateSize(Version* v, const Slice& start, const Slice& end,
-                           int start_level = 0, int end_level = -1);
+                           int start_level, int end_level, bool for_compaction);
 
   // Return the size of the current manifest file
   uint64_t manifest_file_size() const { return manifest_file_size_; }
@@ -1032,10 +1032,11 @@ class VersionSet {
 
   // ApproximateSize helper
   uint64_t ApproximateSizeLevel0(Version* v, const LevelFilesBrief& files_brief,
-                                 const Slice& start, const Slice& end);
+                                 const Slice& start, const Slice& end,
+                                 bool for_compaction);
 
   uint64_t ApproximateSize(Version* v, const FdWithKeyRange& f,
-                           const Slice& key);
+                           const Slice& key, bool for_compaction);
 
   // Save current contents to *log
   Status WriteSnapshot(log::Writer* log);
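
With the default arguments removed, every caller now spells out the level range and the new flag. For example, mirroring the two call sites changed above (variable names are stand-ins taken from those diffs):

// User-initiated size estimate over all non-empty levels:
uint64_t user_size = versions_->ApproximateSize(
    v, k1.Encode(), k2.Encode(), /*start_level=*/0, /*end_level=*/-1,
    /*for_compaction=*/false);
// Compaction-initiated estimate over the levels being compacted:
uint64_t compaction_size = versions_->ApproximateSize(
    v, a, b, start_lvl, out_lvl + 1, /*for_compaction*/ true);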
6 changes: 4 additions & 2 deletions table/block_based/block_based_filter_block.cc
@@ -187,7 +187,8 @@ BlockBasedFilterBlockReader::BlockBasedFilterBlockReader(
 bool BlockBasedFilterBlockReader::KeyMayMatch(
     const Slice& key, const SliceTransform* /* prefix_extractor */,
     uint64_t block_offset, const bool /*no_io*/,
-    const Slice* const /*const_ikey_ptr*/) {
+    const Slice* const /*const_ikey_ptr*/,
+    BlockCacheLookupContext* /*context*/) {
   assert(block_offset != kNotValid);
   if (!whole_key_filtering_) {
     return true;
@@ -198,7 +199,8 @@ bool BlockBasedFilterBlockReader::KeyMayMatch(
 bool BlockBasedFilterBlockReader::PrefixMayMatch(
     const Slice& prefix, const SliceTransform* /* prefix_extractor */,
     uint64_t block_offset, const bool /*no_io*/,
-    const Slice* const /*const_ikey_ptr*/) {
+    const Slice* const /*const_ikey_ptr*/,
+    BlockCacheLookupContext* /*context*/) {
   assert(block_offset != kNotValid);
   return MayMatch(prefix, block_offset);
 }
23 changes: 12 additions & 11 deletions table/block_based/block_based_filter_block.h
@@ -82,17 +82,18 @@ class BlockBasedFilterBlockReader : public FilterBlockReader {
                               const BlockBasedTableOptions& table_opt,
                               bool whole_key_filtering,
                               BlockContents&& contents, Statistics* statistics);
-  virtual bool IsBlockBased() override { return true; }
-
-  virtual bool KeyMayMatch(
-      const Slice& key, const SliceTransform* prefix_extractor,
-      uint64_t block_offset = kNotValid, const bool no_io = false,
-      const Slice* const const_ikey_ptr = nullptr) override;
-  virtual bool PrefixMayMatch(
-      const Slice& prefix, const SliceTransform* prefix_extractor,
-      uint64_t block_offset = kNotValid, const bool no_io = false,
-      const Slice* const const_ikey_ptr = nullptr) override;
-  virtual size_t ApproximateMemoryUsage() const override;
+  bool IsBlockBased() override { return true; }
+
+  bool KeyMayMatch(const Slice& key, const SliceTransform* prefix_extractor,
+                   uint64_t block_offset, const bool no_io,
+                   const Slice* const const_ikey_ptr,
+                   BlockCacheLookupContext* context) override;
+  bool PrefixMayMatch(const Slice& prefix,
+                      const SliceTransform* prefix_extractor,
+                      uint64_t block_offset, const bool no_io,
+                      const Slice* const const_ikey_ptr,
+                      BlockCacheLookupContext* context) override;
+  size_t ApproximateMemoryUsage() const override;
 
   // convert this object to a human readable form
   std::string ToString() const override;
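
On the caller side, every filter probe now also passes the lookup context. A sketch of a probe (names and argument values are illustrative; note the block-based reader above asserts that block_offset is not kNotValid):

BlockCacheLookupContext lookup_context(BlockCacheLookupCaller::kUserGet);
const bool may_match = filter->KeyMayMatch(
    user_key, prefix_extractor, block_offset, /*no_io=*/false,
    /*const_ikey_ptr=*/nullptr, &lookup_context);
if (!may_match) {
  // The key is definitely not in this table; skip reading the data block.
}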