Skip to content

Commit

Permalink
Use user-provided ReadOptions for metadata block reads more often (fa…
Browse files Browse the repository at this point in the history
…cebook#11208)

Summary:
This is mostly taken from facebook#10427 with my own comments addressed. This PR plumbs the user’s `ReadOptions` down to `GetOrReadIndexBlock()`, `GetOrReadFilterBlock()`, and `GetFilterPartitionBlock()`. Now those functions no longer have to make up a `ReadOptions` with incomplete information.

I also let `PartitionIndexReader::NewIterator()` pass through its caller's `ReadOptions::verify_checksums`, which was inexplicably dropped previously.

Fixes facebook#10463

Pull Request resolved: facebook#11208

Test Plan:
Functional:
- Measured `-verify_checksum=false` applies to metadata blocks read outside of table open
  - setup command: `TEST_TMPDIR=/tmp/100M-DB/ ./db_bench -benchmarks=filluniquerandom,waitforcompaction -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -compression_type=none -num=1638400 -key_size=8 -value_size=56`
  - run command: `TEST_TMPDIR=/tmp/100M-DB/ ./db_bench -benchmarks=readrandom -use_existing_db=true -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -compression_type=none -num=1638400 -key_size=8 -value_size=56 -duration=10 -threads=32 -cache_size=131072 -statistics=true -verify_checksum=false -open_files=20 -cache_index_and_filter_blocks=true`
  - before: `rocksdb.block.checksum.compute.count COUNT : 384353`
  - after: `rocksdb.block.checksum.compute.count COUNT : 22`

Performance:
- Setup command (tmpfs, 128MB logical data size, cache indexes/filters without pinning so index/filter lookups go through table reader): `TEST_TMPDIR=/dev/shm/128M-DB/ ./db_bench -benchmarks=filluniquerandom,waitforcompaction -write_buffer_size=131072 -target_file_size_base=131072 -max_bytes_for_level_base=524288 -compression_type=none -num=4194304 -key_size=8 -value_size=24 -bloom_bits=8 -whole_key_filtering=1`
- Measured point lookup performance. Database is fully cached to emphasize any new callstack overheads
  - Command: `TEST_TMPDIR=/dev/shm/128M-DB/ ./db_bench -benchmarks=readrandom[-W1][-X20] -use_existing_db=true -cache_index_and_filter_blocks=true -disable_auto_compactions=true -num=4194304 -key_size=8 -value_size=24 -bloom_bits=8 -whole_key_filtering=1 -duration=10 -cache_size=1048576000`
  - Before: `readrandom [AVG    20 runs] : 274848 (± 3717) ops/sec;    8.4 (± 0.1) MB/sec`
  - After: `readrandom [AVG    20 runs] : 277904 (± 4474) ops/sec;    8.5 (± 0.1) MB/sec`

Reviewed By: hx235

Differential Revision: D43145366

Pulled By: ajkr

fbshipit-source-id: 75ec062ece86a82cd788783de9de2c72df57f994
  • Loading branch information
ajkr authored and facebook-github-bot committed Apr 4, 2023
1 parent 03ccb1c commit b457386
Show file tree
Hide file tree
Showing 19 changed files with 133 additions and 175 deletions.
1 change: 1 addition & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
### Behavior changes
* Changed default block cache size from an 8MB to 32MB LRUCache, which increases the default number of cache shards from 16 to 64. This change is intended to minimize cache mutex contention under stress conditions. See https://github.com/facebook/rocksdb/wiki/Block-Cache for more information.
* For level compaction with `level_compaction_dynamic_level_bytes=true`, RocksDB now trivially moves levels down to fill LSM starting from bottommost level during DB open. See more in comments for option `level_compaction_dynamic_level_bytes`.
* User-provided `ReadOptions` take effect for more reads of non-`CacheEntryRole::kDataBlock` blocks.

### New Features
* Add experimental `PerfContext` counters `iter_{next|prev|seek}_count` for db iterator, each counting the times of corresponding API being called.
Expand Down
5 changes: 2 additions & 3 deletions table/block_based/binary_search_index_reader.cc
Original file line number Diff line number Diff line change
Expand Up @@ -46,9 +46,8 @@ InternalIteratorBase<IndexValue>* BinarySearchIndexReader::NewIterator(
const BlockBasedTable::Rep* rep = table()->get_rep();
const bool no_io = (read_options.read_tier == kBlockCacheTier);
CachableEntry<Block> index_block;
const Status s =
GetOrReadIndexBlock(no_io, read_options.rate_limiter_priority,
get_context, lookup_context, &index_block);
const Status s = GetOrReadIndexBlock(no_io, get_context, lookup_context,
&index_block, read_options);
if (!s.ok()) {
if (iter != nullptr) {
iter->Invalidate(s);
Expand Down
26 changes: 12 additions & 14 deletions table/block_based/block_based_table_reader.cc
Original file line number Diff line number Diff line change
Expand Up @@ -1781,7 +1781,7 @@ bool BlockBasedTable::PrefixRangeMayMatch(
read_options.iterate_upper_bound, user_key_without_ts, prefix_extractor,
rep_->internal_comparator.user_comparator(), const_ikey_ptr,
&filter_checked, need_upper_bound_check, no_io, lookup_context,
read_options.rate_limiter_priority);
read_options);
}

if (filter_checked) {
Expand Down Expand Up @@ -1855,7 +1855,7 @@ bool BlockBasedTable::FullFilterKeyMayMatch(
FilterBlockReader* filter, const Slice& internal_key, const bool no_io,
const SliceTransform* prefix_extractor, GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) const {
const ReadOptions& read_options) const {
if (filter == nullptr) {
return true;
}
Expand All @@ -1865,15 +1865,13 @@ bool BlockBasedTable::FullFilterKeyMayMatch(
size_t ts_sz = rep_->internal_comparator.user_comparator()->timestamp_size();
Slice user_key_without_ts = StripTimestampFromUserKey(user_key, ts_sz);
if (rep_->whole_key_filtering) {
may_match =
filter->KeyMayMatch(user_key_without_ts, no_io, const_ikey_ptr,
get_context, lookup_context, rate_limiter_priority);
may_match = filter->KeyMayMatch(user_key_without_ts, no_io, const_ikey_ptr,
get_context, lookup_context, read_options);
} else if (!PrefixExtractorChanged(prefix_extractor) &&
prefix_extractor->InDomain(user_key_without_ts) &&
!filter->PrefixMayMatch(
prefix_extractor->Transform(user_key_without_ts), no_io,
const_ikey_ptr, get_context, lookup_context,
rate_limiter_priority)) {
const_ikey_ptr, get_context, lookup_context, read_options)) {
// FIXME ^^^: there should be no reason for Get() to depend on current
// prefix_extractor at all. It should always use table_prefix_extractor.
may_match = false;
Expand All @@ -1889,14 +1887,14 @@ void BlockBasedTable::FullFilterKeysMayMatch(
FilterBlockReader* filter, MultiGetRange* range, const bool no_io,
const SliceTransform* prefix_extractor,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) const {
const ReadOptions& read_options) const {
if (filter == nullptr) {
return;
}
uint64_t before_keys = range->KeysLeft();
assert(before_keys > 0); // Caller should ensure
if (rep_->whole_key_filtering) {
filter->KeysMayMatch(range, no_io, lookup_context, rate_limiter_priority);
filter->KeysMayMatch(range, no_io, lookup_context, read_options);
uint64_t after_keys = range->KeysLeft();
if (after_keys) {
RecordTick(rep_->ioptions.stats, BLOOM_FILTER_FULL_POSITIVE, after_keys);
Expand All @@ -1913,7 +1911,7 @@ void BlockBasedTable::FullFilterKeysMayMatch(
// FIXME ^^^: there should be no reason for MultiGet() to depend on current
// prefix_extractor at all. It should always use table_prefix_extractor.
filter->PrefixesMayMatch(range, prefix_extractor, false, lookup_context,
rate_limiter_priority);
read_options);
RecordTick(rep_->ioptions.stats, BLOOM_FILTER_PREFIX_CHECKED, before_keys);
uint64_t after_keys = range->KeysLeft();
uint64_t filtered_keys = before_keys - after_keys;
Expand Down Expand Up @@ -2010,9 +2008,9 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key,
read_options.snapshot != nullptr;
}
TEST_SYNC_POINT("BlockBasedTable::Get:BeforeFilterMatch");
const bool may_match = FullFilterKeyMayMatch(
filter, key, no_io, prefix_extractor, get_context, &lookup_context,
read_options.rate_limiter_priority);
const bool may_match =
FullFilterKeyMayMatch(filter, key, no_io, prefix_extractor, get_context,
&lookup_context, read_options);
TEST_SYNC_POINT("BlockBasedTable::Get:AfterFilterMatch");
if (!may_match) {
RecordTick(rep_->ioptions.stats, BLOOM_FILTER_USEFUL);
Expand Down Expand Up @@ -2182,7 +2180,7 @@ Status BlockBasedTable::MultiGetFilter(const ReadOptions& read_options,
TableReaderCaller::kUserMultiGet, tracing_mget_id,
/*_get_from_user_specified_snapshot=*/read_options.snapshot != nullptr};
FullFilterKeysMayMatch(filter, mget_range, no_io, prefix_extractor,
&lookup_context, read_options.rate_limiter_priority);
&lookup_context, read_options);

return Status::OK();
}
Expand Down
4 changes: 2 additions & 2 deletions table/block_based/block_based_table_reader.h
Original file line number Diff line number Diff line change
Expand Up @@ -429,13 +429,13 @@ class BlockBasedTable : public TableReader {
const SliceTransform* prefix_extractor,
GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) const;
const ReadOptions& read_options) const;

void FullFilterKeysMayMatch(FilterBlockReader* filter, MultiGetRange* range,
const bool no_io,
const SliceTransform* prefix_extractor,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) const;
const ReadOptions& read_options) const;

// If force_direct_prefetch is true, always prefetching to RocksDB
// buffer, rather than calling RandomAccessFile::Prefetch().
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -335,7 +335,7 @@ DEFINE_SYNC_AND_ASYNC(void, BlockBasedTable::MultiGet)
TableReaderCaller::kUserMultiGet, tracing_mget_id,
/*_get_from_user_specified_snapshot=*/read_options.snapshot != nullptr};
FullFilterKeysMayMatch(filter, &sst_file_range, no_io, prefix_extractor,
&lookup_context, read_options.rate_limiter_priority);
&lookup_context, read_options);

if (!sst_file_range.empty()) {
IndexBlockIter iiter_on_stack;
Expand Down
15 changes: 7 additions & 8 deletions table/block_based/filter_block.h
Original file line number Diff line number Diff line change
Expand Up @@ -113,17 +113,17 @@ class FilterBlockReader {
const Slice* const const_ikey_ptr,
GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) = 0;
const ReadOptions& read_options) = 0;

virtual void KeysMayMatch(MultiGetRange* range, const bool no_io,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) {
const ReadOptions& read_options) {
for (auto iter = range->begin(); iter != range->end(); ++iter) {
const Slice ukey_without_ts = iter->ukey_without_ts;
const Slice ikey = iter->ikey;
GetContext* const get_context = iter->get_context;
if (!KeyMayMatch(ukey_without_ts, no_io, &ikey, get_context,
lookup_context, rate_limiter_priority)) {
lookup_context, read_options)) {
range->SkipKey(iter);
}
}
Expand All @@ -136,21 +136,20 @@ class FilterBlockReader {
const Slice* const const_ikey_ptr,
GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) = 0;
const ReadOptions& read_options) = 0;

virtual void PrefixesMayMatch(MultiGetRange* range,
const SliceTransform* prefix_extractor,
const bool no_io,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) {
const ReadOptions& read_options) {
for (auto iter = range->begin(); iter != range->end(); ++iter) {
const Slice ukey_without_ts = iter->ukey_without_ts;
const Slice ikey = iter->ikey;
GetContext* const get_context = iter->get_context;
if (prefix_extractor->InDomain(ukey_without_ts) &&
!PrefixMayMatch(prefix_extractor->Transform(ukey_without_ts), no_io,
&ikey, get_context, lookup_context,
rate_limiter_priority)) {
&ikey, get_context, lookup_context, read_options)) {
range->SkipKey(iter);
}
}
Expand All @@ -176,7 +175,7 @@ class FilterBlockReader {
bool* filter_checked, bool need_upper_bound_check,
bool no_io,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) = 0;
const ReadOptions& read_options) = 0;
};

} // namespace ROCKSDB_NAMESPACE
14 changes: 6 additions & 8 deletions table/block_based/filter_block_reader_common.cc
Original file line number Diff line number Diff line change
Expand Up @@ -70,21 +70,20 @@ Status FilterBlockReaderCommon<TBlocklike>::GetOrReadFilterBlock(
bool no_io, GetContext* get_context,
BlockCacheLookupContext* lookup_context,
CachableEntry<TBlocklike>* filter_block,
Env::IOPriority rate_limiter_priority) const {
const ReadOptions& read_options) const {
assert(filter_block);

if (!filter_block_.IsEmpty()) {
filter_block->SetUnownedValue(filter_block_.GetValue());
return Status::OK();
}

ReadOptions read_options;
read_options.rate_limiter_priority = rate_limiter_priority;
ReadOptions ro = read_options;
if (no_io) {
read_options.read_tier = kBlockCacheTier;
ro.read_tier = kBlockCacheTier;
}

return ReadFilterBlock(table_, nullptr /* prefetch_buffer */, read_options,
return ReadFilterBlock(table_, nullptr /* prefetch_buffer */, ro,
cache_filter_blocks(), get_context, lookup_context,
filter_block);
}
Expand All @@ -104,8 +103,7 @@ bool FilterBlockReaderCommon<TBlocklike>::RangeMayExist(
const SliceTransform* prefix_extractor, const Comparator* comparator,
const Slice* const const_ikey_ptr, bool* filter_checked,
bool need_upper_bound_check, bool no_io,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) {
BlockCacheLookupContext* lookup_context, const ReadOptions& read_options) {
if (!prefix_extractor || !prefix_extractor->InDomain(user_key_without_ts)) {
*filter_checked = false;
return true;
Expand All @@ -119,7 +117,7 @@ bool FilterBlockReaderCommon<TBlocklike>::RangeMayExist(
*filter_checked = true;
return PrefixMayMatch(prefix, no_io, const_ikey_ptr,
/* get_context */ nullptr, lookup_context,
rate_limiter_priority);
read_options);
}
}

Expand Down
4 changes: 2 additions & 2 deletions table/block_based/filter_block_reader_common.h
Original file line number Diff line number Diff line change
Expand Up @@ -40,7 +40,7 @@ class FilterBlockReaderCommon : public FilterBlockReader {
const Slice* const const_ikey_ptr, bool* filter_checked,
bool need_upper_bound_check, bool no_io,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) override;
const ReadOptions& read_options) override;

protected:
static Status ReadFilterBlock(const BlockBasedTable* table,
Expand All @@ -58,7 +58,7 @@ class FilterBlockReaderCommon : public FilterBlockReader {
Status GetOrReadFilterBlock(bool no_io, GetContext* get_context,
BlockCacheLookupContext* lookup_context,
CachableEntry<TBlocklike>* filter_block,
Env::IOPriority rate_limiter_priority) const;
const ReadOptions& read_options) const;

size_t ApproximateFilterBlockMemoryUsage() const;

Expand Down
41 changes: 18 additions & 23 deletions table/block_based/full_filter_block.cc
Original file line number Diff line number Diff line change
Expand Up @@ -127,12 +127,11 @@ bool FullFilterBlockReader::KeyMayMatch(const Slice& key, const bool no_io,
const Slice* const /*const_ikey_ptr*/,
GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) {
const ReadOptions& read_options) {
if (!whole_key_filtering()) {
return true;
}
return MayMatch(key, no_io, get_context, lookup_context,
rate_limiter_priority);
return MayMatch(key, no_io, get_context, lookup_context, read_options);
}

std::unique_ptr<FilterBlockReader> FullFilterBlockReader::Create(
Expand Down Expand Up @@ -165,20 +164,18 @@ std::unique_ptr<FilterBlockReader> FullFilterBlockReader::Create(
bool FullFilterBlockReader::PrefixMayMatch(
const Slice& prefix, const bool no_io,
const Slice* const /*const_ikey_ptr*/, GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) {
return MayMatch(prefix, no_io, get_context, lookup_context,
rate_limiter_priority);
BlockCacheLookupContext* lookup_context, const ReadOptions& read_options) {
return MayMatch(prefix, no_io, get_context, lookup_context, read_options);
}

bool FullFilterBlockReader::MayMatch(
const Slice& entry, bool no_io, GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) const {
bool FullFilterBlockReader::MayMatch(const Slice& entry, bool no_io,
GetContext* get_context,
BlockCacheLookupContext* lookup_context,
const ReadOptions& read_options) const {
CachableEntry<ParsedFullFilterBlock> filter_block;

const Status s = GetOrReadFilterBlock(no_io, get_context, lookup_context,
&filter_block, rate_limiter_priority);
&filter_block, read_options);
if (!s.ok()) {
IGNORE_STATUS_IF_ERROR(s);
return true;
Expand All @@ -203,33 +200,31 @@ bool FullFilterBlockReader::MayMatch(

void FullFilterBlockReader::KeysMayMatch(
MultiGetRange* range, const bool no_io,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) {
BlockCacheLookupContext* lookup_context, const ReadOptions& read_options) {
if (!whole_key_filtering()) {
// Simply return. Don't skip any key - consider all keys as likely to be
// present
return;
}
MayMatch(range, no_io, nullptr, lookup_context, rate_limiter_priority);
MayMatch(range, no_io, nullptr, lookup_context, read_options);
}

void FullFilterBlockReader::PrefixesMayMatch(
MultiGetRange* range, const SliceTransform* prefix_extractor,
const bool no_io, BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) {
MayMatch(range, no_io, prefix_extractor, lookup_context,
rate_limiter_priority);
const ReadOptions& read_options) {
MayMatch(range, no_io, prefix_extractor, lookup_context, read_options);
}

void FullFilterBlockReader::MayMatch(
MultiGetRange* range, bool no_io, const SliceTransform* prefix_extractor,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) const {
void FullFilterBlockReader::MayMatch(MultiGetRange* range, bool no_io,
const SliceTransform* prefix_extractor,
BlockCacheLookupContext* lookup_context,
const ReadOptions& read_options) const {
CachableEntry<ParsedFullFilterBlock> filter_block;

const Status s =
GetOrReadFilterBlock(no_io, range->begin()->get_context, lookup_context,
&filter_block, rate_limiter_priority);
&filter_block, read_options);
if (!s.ok()) {
IGNORE_STATUS_IF_ERROR(s);
return;
Expand Down
16 changes: 8 additions & 8 deletions table/block_based/full_filter_block.h
Original file line number Diff line number Diff line change
Expand Up @@ -108,40 +108,40 @@ class FullFilterBlockReader
bool KeyMayMatch(const Slice& key, const bool no_io,
const Slice* const const_ikey_ptr, GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) override;
const ReadOptions& read_options) override;

bool PrefixMayMatch(const Slice& prefix, const bool no_io,
const Slice* const const_ikey_ptr,
GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) override;
const ReadOptions& read_options) override;

void KeysMayMatch(MultiGetRange* range, const bool no_io,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) override;
const ReadOptions& read_options) override;
// Used in partitioned filter code
void KeysMayMatch2(MultiGetRange* range,
const SliceTransform* /*prefix_extractor*/,
const bool no_io, BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) {
KeysMayMatch(range, no_io, lookup_context, rate_limiter_priority);
const ReadOptions& read_options) {
KeysMayMatch(range, no_io, lookup_context, read_options);
}

void PrefixesMayMatch(MultiGetRange* range,
const SliceTransform* prefix_extractor,
const bool no_io,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) override;
const ReadOptions& read_options) override;
size_t ApproximateMemoryUsage() const override;

private:
bool MayMatch(const Slice& entry, bool no_io, GetContext* get_context,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) const;
const ReadOptions& read_options) const;
void MayMatch(MultiGetRange* range, bool no_io,
const SliceTransform* prefix_extractor,
BlockCacheLookupContext* lookup_context,
Env::IOPriority rate_limiter_priority) const;
const ReadOptions& read_options) const;
};

} // namespace ROCKSDB_NAMESPACE
Loading

0 comments on commit b457386

Please sign in to comment.