Checksum for each SST file and stores in MANIFEST (facebook#6216)
Summary:
In the current code base, RocksDB generates a checksum for each block and verifies it when the block is used. This PR adds SST file checksums. After an SST file is generated by Flush or Compaction, RocksDB computes a checksum over the file and stores the checksum value and the checksum method name in the vs_info and in the MANIFEST as part of the FileMetadata.

Added enable_sst_file_checksum to Options to enable or disable the file checksum. Added sst_file_checksum to Options so that users can plug in their own SST file checksum method by overriding the SstFileChecksum class. The checksum information includes a uint32_t checksum value and a checksum name (string). A new tool is added to LDB so that users can dump a list of file checksum information from the MANIFEST. If the user enables the file checksum but does not provide an sst_file_checksum instance, RocksDB uses the default crc32checksum implemented in table/sst_file_checksum_crc32c.h.
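
To make the pluggable-checksum idea concrete, here is a minimal, self-contained sketch. It is not the interface added by this PR: `FileChecksumFuncSketch`, `Crc32cSketch`, `FileChecksumInfo`, and `ChecksumWholeBuffer` are hypothetical names chosen for illustration, and the CRC-32C routine is a plain bitwise implementation rather than RocksDB's optimized one. It only shows the shape described above: a user-overridable function that yields a uint32_t value plus a name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical interface, loosely modeled on the description in the summary.
class FileChecksumFuncSketch {
 public:
  virtual ~FileChecksumFuncSketch() = default;
  // Fold `n` bytes at `data` into a running checksum and return the new value.
  virtual uint32_t Extend(uint32_t running, const char* data,
                          size_t n) const = 0;
  // Name stored next to the value so a reader knows which method produced it.
  virtual const char* Name() const = 0;
};

// Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
class Crc32cSketch : public FileChecksumFuncSketch {
 public:
  uint32_t Extend(uint32_t running, const char* data,
                  size_t n) const override {
    assert(data != nullptr || n == 0);
    uint32_t crc = ~running;  // undo the final inversion of the previous call
    for (size_t i = 0; i < n; ++i) {
      crc ^= static_cast<unsigned char>(data[i]);
      for (int bit = 0; bit < 8; ++bit) {
        // Shift out one bit; XOR in the polynomial if that bit was set.
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
      }
    }
    return ~crc;
  }
  const char* Name() const override { return "Crc32cSketch"; }
};

// The pair of values the summary says is recorded per file.
struct FileChecksumInfo {
  uint32_t value;
  std::string func_name;
};

// Checksums a whole in-memory buffer that stands in for an SST file.
FileChecksumInfo ChecksumWholeBuffer(const FileChecksumFuncSketch& func,
                                     const std::string& contents) {
  return {func.Extend(0, contents.data(), contents.size()), func.Name()};
}
```

Because `Extend` takes the previous result as its seed, a caller can feed the file in chunks and get the same value as a single pass over the whole buffer.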
Pull Request resolved: facebook#6216

Test Plan: Added test cases in table_test and ldb_cmd_test to verify that the checksum is correct at different levels. Passes make asan_check.

Differential Revision: D19171461

Pulled By: zhichao-cao

fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
zhichao-cao authored and facebook-github-bot committed Feb 10, 2020
1 parent 594e815 commit 4369f2c
Showing 49 changed files with 1,355 additions and 71 deletions.
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -665,6 +665,7 @@ set(SOURCES
util/random.cc
util/rate_limiter.cc
util/slice.cc
util/file_checksum_helper.cc
util/status.cc
util/string_util.cc
util/thread_local.cc
3 changes: 3 additions & 0 deletions HISTORY.md
@@ -18,6 +18,9 @@
* The BlobDB garbage collector now emits the statistics `BLOB_DB_GC_NUM_FILES` (number of blob files obsoleted during GC), `BLOB_DB_GC_NUM_NEW_FILES` (number of new blob files generated during GC), `BLOB_DB_GC_FAILURES` (number of failed GC passes), `BLOB_DB_GC_NUM_KEYS_RELOCATED` (number of blobs relocated during GC), and `BLOB_DB_GC_BYTES_RELOCATED` (total size of blobs relocated during GC). On the other hand, the following statistics, which are not relevant for the new GC implementation, are now deprecated: `BLOB_DB_GC_NUM_KEYS_OVERWRITTEN`, `BLOB_DB_GC_NUM_KEYS_EXPIRED`, `BLOB_DB_GC_BYTES_OVERWRITTEN`, `BLOB_DB_GC_BYTES_EXPIRED`, and `BLOB_DB_GC_MICROS`.
* Disable recycle_log_file_num when inconsistent recovery modes are requested: kPointInTimeRecovery and kAbsoluteConsistency

### New Features
* Added a checksum for each SST file generated by Flush or Compaction. Added sst_file_checksum_func to Options so that users can plug in their own SST file checksum function by overriding the FileChecksumFunc class. If the user does not set sst_file_checksum_func, SST file checksum calculation is not enabled. The checksum information includes a uint32_t checksum value and a checksum function name (string). The checksum information is stored in FileMetadata in the version store and is also logged to the MANIFEST. A new tool is added to LDB so that users can dump a list of file checksum information from the MANIFEST (stored in an unordered_map).

## 6.7.0 (01/21/2020)
### Public API Change
* Added a rocksdb::FileSystem class in include/rocksdb/file_system.h to encapsulate file creation/read/write operations, and an option DBOptions::file_system to allow a user to pass in an instance of rocksdb::FileSystem. If it's a non-null value, this will take precedence over DBOptions::env for file operations. A new API rocksdb::FileSystem::Default() returns a platform default object. The DBOptions::env option and Env::Default() API will continue to be used for threading and other OS related functions, and where DBOptions::file_system is not specified, for file operations. For storage developers who are accustomed to rocksdb::Env, the interface in rocksdb::FileSystem is new and will probably undergo some changes as more storage systems are ported to it from rocksdb::Env. As of now, no env other than Posix has been ported to the new interface.
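
The New Features entry in this HISTORY.md hunk says the per-file checksum information can be listed from the MANIFEST and is held in an unordered_map. The sketch below illustrates one way such a listing could be assembled by replaying file additions and deletions. Everything in it is an assumption made for illustration: `EditSketch`, `FileChecksumRecord`, `CollectLiveFileChecksums`, and the `kSketch...` sentinel values are hypothetical, and the real MANIFEST record format and LDB command are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sentinels for files whose checksum was never computed
// (for example, when no checksum function is configured).
const uint32_t kSketchUnknownChecksum = 0;
const std::string kSketchUnknownFuncName = "Unknown";

// Checksum value plus the name of the function that produced it.
struct FileChecksumRecord {
  uint32_t checksum = kSketchUnknownChecksum;
  std::string func_name = kSketchUnknownFuncName;
};

// Simplified stand-in for one logged edit: files added and files deleted.
struct EditSketch {
  std::vector<std::pair<uint64_t, FileChecksumRecord>> added_files;
  std::vector<uint64_t> deleted_files;
};

// Replays the edits in order and returns file number -> checksum record
// for the files that are still live at the end.
std::unordered_map<uint64_t, FileChecksumRecord> CollectLiveFileChecksums(
    const std::vector<EditSketch>& edits) {
  std::unordered_map<uint64_t, FileChecksumRecord> live;
  for (const EditSketch& edit : edits) {
    // Apply deletions first so a file that is moved between levels
    // (deleted and re-added in the same edit) stays in the map.
    for (uint64_t number : edit.deleted_files) {
      live.erase(number);
    }
    for (const auto& added : edit.added_files) {
      // Every record carries a name, even if it is only the sentinel.
      assert(!added.second.func_name.empty());
      live[added.first] = added.second;
    }
  }
  return live;
}
```

A file added without checksum information simply keeps the sentinel record, which matches the way the diff passes unknown-checksum constants for ingested and imported files.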
1 change: 1 addition & 0 deletions TARGETS
@@ -283,6 +283,7 @@ cpp_library(
"util/concurrent_task_limiter_impl.cc",
"util/crc32c.cc",
"util/dynamic_bloom.cc",
"util/file_checksum_helper.cc",
"util/hash.cc",
"util/murmurhash.cc",
"util/random.cc",
10 changes: 7 additions & 3 deletions db/builder.cc
@@ -131,9 +131,10 @@ Status BuildTable(
file->SetIOPriority(io_priority);
file->SetWriteLifeTimeHint(write_hint);

file_writer.reset(
new WritableFileWriter(std::move(file), fname, file_options, env,
ioptions.statistics, ioptions.listeners));
file_writer.reset(new WritableFileWriter(
std::move(file), fname, file_options, env, ioptions.statistics,
ioptions.listeners, ioptions.sst_file_checksum_func));

builder = NewTableBuilder(
ioptions, mutable_cf_options, internal_comparator,
int_tbl_prop_collector_factories, column_family_id,
@@ -199,6 +200,9 @@ Status BuildTable(
if (table_properties) {
*table_properties = tp;
}
// Add the checksum information to file metadata.
meta->file_checksum = builder->GetFileChecksum();
meta->file_checksum_func_name = builder->GetFileChecksumFuncName();
}
delete builder;
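
One way to read the two db/builder.cc hunks together: the writer is handed the checksum function when it is constructed, and the finished checksum is copied into the file metadata once the table has been built. The sketch below shows that pattern in isolation. It is an illustration under assumptions, not RocksDB code: `ChecksummingWriterSketch`, `ExtendFn`, and `ByteSumExtend` are hypothetical, the writer keeps bytes in memory instead of on disk, and the byte-sum checksum is chosen only because it is easy to verify by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>

// Hypothetical checksum callback: folds one chunk into the running value.
using ExtendFn = std::function<uint32_t(uint32_t, const char*, size_t)>;

// Toy stand-in for a file writer that updates a checksum on every append.
class ChecksummingWriterSketch {
 public:
  // An empty callback means file checksums are disabled.
  explicit ChecksummingWriterSketch(ExtendFn extend)
      : extend_(std::move(extend)) {}

  void Append(const std::string& chunk) {
    assert(!finished_);  // no writes after the file is closed
    contents_ += chunk;
    if (extend_) {
      checksum_ = extend_(checksum_, chunk.data(), chunk.size());
    }
  }

  void Finish() { finished_ = true; }

  bool checksum_enabled() const { return static_cast<bool>(extend_); }

  // Only meaningful once the whole file has been written.
  uint32_t file_checksum() const {
    assert(finished_);
    return checksum_;
  }

  const std::string& contents() const { return contents_; }

 private:
  ExtendFn extend_;
  uint32_t checksum_ = 0;
  bool finished_ = false;
  std::string contents_;
};

// Illustrative checksum only: adds up the byte values of the chunk.
uint32_t ByteSumExtend(uint32_t running, const char* data, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    running += static_cast<unsigned char>(data[i]);
  }
  return running;
}
```

Computing the checksum inside the write path avoids a second pass over the file after it is closed, at the cost of threading the checksum function through every place a writer is created.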

8 changes: 7 additions & 1 deletion db/compaction/compaction_job.cc
@@ -1296,6 +1296,11 @@ Status CompactionJob::FinishCompactionOutputFile(
}
const uint64_t current_bytes = sub_compact->builder->FileSize();
if (s.ok()) {
// Add the checksum information to file metadata.
meta->file_checksum = sub_compact->builder->GetFileChecksum();
meta->file_checksum_func_name =
sub_compact->builder->GetFileChecksumFuncName();

meta->fd.file_size = current_bytes;
}
sub_compact->current_output()->finished = true;
@@ -1508,7 +1513,8 @@ Status CompactionJob::OpenCompactionOutputFile(
sub_compact->compaction->immutable_cf_options()->listeners;
sub_compact->outfile.reset(
new WritableFileWriter(std::move(writable_file), fname, file_options_,
env_, db_options_.statistics.get(), listeners));
env_, db_options_.statistics.get(), listeners,
db_options_.sst_file_checksum_func.get()));

// If the Column family flag is to only optimize filters for hits,
// we can skip creating filters if this is the bottommost_level where
3 changes: 2 additions & 1 deletion db/compaction/compaction_job_test.cc
@@ -187,7 +187,8 @@ class CompactionJobTest : public testing::Test {
VersionEdit edit;
edit.AddFile(level, file_number, 0, 10, smallest_key, largest_key,
smallest_seqno, largest_seqno, false, oldest_blob_file_number,
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime,
kUnknownFileChecksum, kUnknownFileChecksumFuncName);

mutex_.Lock();
versions_->LogAndApply(versions_->GetColumnFamilySet()->GetDefault(),
3 changes: 2 additions & 1 deletion db/compaction/compaction_picker_test.cc
@@ -95,7 +95,8 @@ class CompactionPickerTest : public testing::Test {
InternalKey(smallest, smallest_seq, kTypeValue),
InternalKey(largest, largest_seq, kTypeValue), smallest_seq,
largest_seq, /* marked_for_compact */ false, kInvalidBlobFileNumber,
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime,
kUnknownFileChecksum, kUnknownFileChecksumFuncName);
f->compensated_file_size =
(compensated_file_size != 0) ? compensated_file_size : file_size;
vstorage_->AddFile(level, f);
20 changes: 10 additions & 10 deletions db/comparator_db_test.cc
@@ -21,7 +21,7 @@ using std::unique_ptr;
namespace rocksdb {
namespace {

static const Comparator* comparator;
static const Comparator* kTestComparator = nullptr;

class KVIter : public Iterator {
public:
@@ -74,7 +74,7 @@ void AssertItersEqual(Iterator* iter1, Iterator* iter2) {
void DoRandomIteraratorTest(DB* db, std::vector<std::string> source_strings,
Random* rnd, int num_writes, int num_iter_ops,
int num_trigger_flush) {
stl_wrappers::KVMap map((stl_wrappers::LessOfComparator(comparator)));
stl_wrappers::KVMap map((stl_wrappers::LessOfComparator(kTestComparator)));

for (int i = 0; i < num_writes; i++) {
if (num_trigger_flush > 0 && i != 0 && i % num_trigger_flush == 0) {
@@ -263,7 +263,7 @@ class ComparatorDBTest

public:
ComparatorDBTest() : env_(Env::Default()), db_(nullptr) {
comparator = BytewiseComparator();
kTestComparator = BytewiseComparator();
dbname_ = test::PerThreadDBPath("comparator_db_test");
BlockBasedTableOptions toptions;
toptions.format_version = GetParam();
@@ -275,7 +275,7 @@ class ComparatorDBTest
~ComparatorDBTest() override {
delete db_;
EXPECT_OK(DestroyDB(dbname_, last_options_));
comparator = BytewiseComparator();
kTestComparator = BytewiseComparator();
}

DB* GetDB() { return db_; }
@@ -286,7 +286,7 @@ class ComparatorDBTest
} else {
comparator_guard.reset();
}
comparator = cmp;
kTestComparator = cmp;
last_options_.comparator = cmp;
}

@@ -334,7 +334,7 @@ TEST_P(ComparatorDBTest, SimpleSuffixReverseComparator) {

for (int rnd_seed = 301; rnd_seed < 316; rnd_seed++) {
Options* opt = GetOptions();
opt->comparator = comparator;
opt->comparator = kTestComparator;
DestroyAndReopen();
Random rnd(rnd_seed);

@@ -360,7 +360,7 @@ TEST_P(ComparatorDBTest, Uint64Comparator) {

for (int rnd_seed = 301; rnd_seed < 316; rnd_seed++) {
Options* opt = GetOptions();
opt->comparator = comparator;
opt->comparator = kTestComparator;
DestroyAndReopen();
Random rnd(rnd_seed);
Random64 rnd64(rnd_seed);
@@ -384,7 +384,7 @@ TEST_P(ComparatorDBTest, DoubleComparator) {

for (int rnd_seed = 301; rnd_seed < 316; rnd_seed++) {
Options* opt = GetOptions();
opt->comparator = comparator;
opt->comparator = kTestComparator;
DestroyAndReopen();
Random rnd(rnd_seed);

@@ -409,7 +409,7 @@ TEST_P(ComparatorDBTest, HashComparator) {

for (int rnd_seed = 301; rnd_seed < 316; rnd_seed++) {
Options* opt = GetOptions();
opt->comparator = comparator;
opt->comparator = kTestComparator;
DestroyAndReopen();
Random rnd(rnd_seed);

@@ -428,7 +428,7 @@ TEST_P(ComparatorDBTest, TwoStrComparator) {

for (int rnd_seed = 301; rnd_seed < 316; rnd_seed++) {
Options* opt = GetOptions();
opt->comparator = comparator;
opt->comparator = kTestComparator;
DestroyAndReopen();
Random rnd(rnd_seed);

6 changes: 4 additions & 2 deletions db/db_impl/db_impl_compaction_flush.cc
@@ -1258,7 +1258,8 @@ Status DBImpl::ReFitLevel(ColumnFamilyData* cfd, int level, int target_level) {
f->fd.GetFileSize(), f->smallest, f->largest,
f->fd.smallest_seqno, f->fd.largest_seqno,
f->marked_for_compaction, f->oldest_blob_file_number,
f->oldest_ancester_time, f->file_creation_time);
f->oldest_ancester_time, f->file_creation_time,
f->file_checksum, f->file_checksum_func_name);
}
ROCKS_LOG_DEBUG(immutable_db_options_.info_log,
"[%s] Apply version edit:\n%s", cfd->GetName().c_str(),
@@ -2669,7 +2670,8 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
f->largest, f->fd.smallest_seqno,
f->fd.largest_seqno, f->marked_for_compaction,
f->oldest_blob_file_number, f->oldest_ancester_time,
f->file_creation_time);
f->file_creation_time, f->file_checksum,
f->file_checksum_func_name);

ROCKS_LOG_BUFFER(
log_buffer,
3 changes: 2 additions & 1 deletion db/db_impl/db_impl_experimental.cc
@@ -129,7 +129,8 @@ Status DBImpl::PromoteL0(ColumnFamilyHandle* column_family, int target_level) {
f->fd.GetFileSize(), f->smallest, f->largest,
f->fd.smallest_seqno, f->fd.largest_seqno,
f->marked_for_compaction, f->oldest_blob_file_number,
f->oldest_ancester_time, f->file_creation_time);
f->oldest_ancester_time, f->file_creation_time,
f->file_checksum, f->file_checksum_func_name);
}

status = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
3 changes: 2 additions & 1 deletion db/db_impl/db_impl_open.cc
@@ -1259,7 +1259,8 @@ Status DBImpl::WriteLevel0TableForRecovery(int job_id, ColumnFamilyData* cfd,
meta.fd.GetFileSize(), meta.smallest, meta.largest,
meta.fd.smallest_seqno, meta.fd.largest_seqno,
meta.marked_for_compaction, meta.oldest_blob_file_number,
meta.oldest_ancester_time, meta.file_creation_time);
meta.oldest_ancester_time, meta.file_creation_time,
meta.file_checksum, meta.file_checksum_func_name);
}

InternalStats::CompactionStats stats(CompactionReason::kFlush, 1);
10 changes: 5 additions & 5 deletions db/external_sst_file_ingestion_job.cc
@@ -255,11 +255,11 @@ Status ExternalSstFileIngestionJob::Run() {
static_cast<uint64_t>(temp_current_time);
}

edit_.AddFile(f.picked_level, f.fd.GetNumber(), f.fd.GetPathId(),
f.fd.GetFileSize(), f.smallest_internal_key,
f.largest_internal_key, f.assigned_seqno, f.assigned_seqno,
false, kInvalidBlobFileNumber, oldest_ancester_time,
current_time);
edit_.AddFile(
f.picked_level, f.fd.GetNumber(), f.fd.GetPathId(), f.fd.GetFileSize(),
f.smallest_internal_key, f.largest_internal_key, f.assigned_seqno,
f.assigned_seqno, false, kInvalidBlobFileNumber, oldest_ancester_time,
current_time, kUnknownFileChecksum, kUnknownFileChecksumFuncName);
}
return status;
}
3 changes: 2 additions & 1 deletion db/flush_job.cc
@@ -416,7 +416,8 @@ Status FlushJob::WriteLevel0Table() {
meta_.fd.GetFileSize(), meta_.smallest, meta_.largest,
meta_.fd.smallest_seqno, meta_.fd.largest_seqno,
meta_.marked_for_compaction, meta_.oldest_blob_file_number,
meta_.oldest_ancester_time, meta_.file_creation_time);
meta_.oldest_ancester_time, meta_.file_creation_time,
meta_.file_checksum, meta_.file_checksum_func_name);
}
#ifndef ROCKSDB_LITE
// Piggyback FlushJobInfo on the first flushed memtable.
3 changes: 2 additions & 1 deletion db/import_column_family_job.cc
@@ -153,7 +153,8 @@ Status ImportColumnFamilyJob::Run() {
f.fd.GetFileSize(), f.smallest_internal_key,
f.largest_internal_key, file_metadata.smallest_seqno,
file_metadata.largest_seqno, false, kInvalidBlobFileNumber,
oldest_ancester_time, current_time);
oldest_ancester_time, current_time, kUnknownFileChecksum,
kUnknownFileChecksumFuncName);

// If incoming sequence number is higher, update local sequence number.
if (file_metadata.largest_seqno > versions_->LastSequence()) {
3 changes: 2 additions & 1 deletion db/repair.cc
@@ -586,7 +586,8 @@ class Repairer {
table->meta.largest, table->meta.fd.smallest_seqno,
table->meta.fd.largest_seqno, table->meta.marked_for_compaction,
table->meta.oldest_blob_file_number,
table->meta.oldest_ancester_time, table->meta.file_creation_time);
table->meta.oldest_ancester_time, table->meta.file_creation_time,
table->meta.file_checksum, table->meta.file_checksum_func_name);
}
assert(next_file_number_ > 0);
vset_.MarkFileNumberUsed(next_file_number_ - 1);