Fix BackupEngine
Summary:
In D28521 we removed GarbageCollect() from BackupEngine's constructor. The reason was that opening BackupEngine on HDFS was very slow, and in most cases there was no garbage anyway. Instead, we allowed the user to call GarbageCollect() manually when they detect garbage files in their backup directory.

Unfortunately, this left us vulnerable to an interesting issue. Let's say we start a backup and copy files {1, 3}, but the backup fails. On another host, we restore the DB from a backup and a new backup generates {1, 3, 5}. Since {1, 3} are already there, we do not overwrite them. However, these files might come from a different database, so their contents might be different. See internal task t6781803 for more info.

Now, when we're copying files and discover a file already present at the destination, we check:
1. If the file is not referenced by any backup, we overwrite it.
2. If the file is referenced by other backups AND the checksums don't match, we fail the backup. This should only happen if the user is backing up two different databases into a single backup directory.
3. If the file is referenced by other backups AND the checksums match, it's all good. We skip the copy and move on to the next file.

Test Plan: Added new test to backupable_db_test. The test fails before this patch.

Reviewers: sdong, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37599
igorcanadi committed May 8, 2015
1 parent 962f8ba commit 50eab9c
Showing 2 changed files with 51 additions and 9 deletions.
36 changes: 27 additions & 9 deletions utilities/backupable/backupable_db.cc
@@ -634,8 +634,9 @@ Status BackupEngineImpl::CreateNewBackup(DB* db, bool flush_before_backup) {
Log(options_.info_log, "Backup failed -- %s", s.ToString().c_str());
Log(options_.info_log, "Backup Statistics %s\n",
backup_statistics_.ToString().c_str());
backups_.erase(new_backup_id);
(void) GarbageCollect();
// delete files that we might have already written
DeleteBackup(new_backup_id);
GarbageCollect();
return s;
}

@@ -1013,21 +1014,33 @@ Status BackupEngineImpl::BackupFile(BackupID backup_id, BackupMeta* backup,

// if it's shared, we also need to check if it exists -- if it does,
// no need to copy it again
bool need_to_copy = true;
if (shared && backup_env_->FileExists(dst_path)) {
need_to_copy = false;
if (shared_checksum) {
Log(options_.info_log,
"%s already present, with checksum %u and size %" PRIu64,
src_fname.c_str(), checksum_value, size);
} else if (backuped_file_infos_.find(dst_relative) ==
backuped_file_infos_.end()) {
// file already exists, but it's not referenced by any backup. overwrite
// the file
Log(options_.info_log,
"%s already present, but not referenced by any backup. We will "
"overwrite the file.",
src_fname.c_str());
need_to_copy = true;
backup_env_->DeleteFile(dst_path);
} else {
backup_env_->GetFileSize(dst_path, &size); // Ignore error
// the file is present and referenced by a backup
db_env_->GetFileSize(src_dir + src_fname, &size); // Ignore error
Log(options_.info_log, "%s already present, calculate checksum",
src_fname.c_str());
s = CalculateChecksum(src_dir + src_fname,
db_env_,
size_limit,
s = CalculateChecksum(src_dir + src_fname, db_env_, size_limit,
&checksum_value);
}
} else {
}
if (need_to_copy) {
Log(options_.info_log, "Copying %s to %s", src_fname.c_str(),
dst_path_tmp.c_str());
s = CopyFile(src_dir + src_fname,
@@ -1117,14 +1130,17 @@ Status BackupEngineImpl::GarbageCollect() {
&shared_children);
for (auto& child : shared_children) {
std::string rel_fname = GetSharedFileRel(child);
auto child_itr = backuped_file_infos_.find(rel_fname);
// if it's not refcounted, delete it
if (backuped_file_infos_.find(rel_fname) == backuped_file_infos_.end()) {
if (child_itr == backuped_file_infos_.end() ||
child_itr->second->refs == 0) {
// this might be a directory, but DeleteFile will just fail in that
// case, so we're good
Status s = backup_env_->DeleteFile(GetAbsolutePath(rel_fname));
if (s.ok()) {
Log(options_.info_log, "Deleted %s", rel_fname.c_str());
}
backuped_file_infos_.erase(rel_fname);
}
}

@@ -1178,7 +1194,9 @@ Status BackupEngineImpl::BackupMeta::AddFile(
}
} else {
if (itr->second->checksum_value != file_info->checksum_value) {
return Status::Corruption("Checksum mismatch for existing backup file");
return Status::Corruption(
"Checksum mismatch for existing backup file. Delete old backups and "
"try again.");
}
++itr->second->refs; // increase refcount if already present
}
24 changes: 24 additions & 0 deletions utilities/backupable/backupable_db_test.cc
@@ -1023,6 +1023,30 @@ TEST_F(BackupableDBTest, ReadOnlyBackupEngine) {
delete db;
}

TEST_F(BackupableDBTest, GarbageCollectionBeforeBackup) {
DestroyDB(dbname_, Options());
OpenBackupableDB(true);

env_->CreateDirIfMissing(backupdir_ + "/shared");
std::string file_five = backupdir_ + "/shared/000005.sst";
std::string file_five_contents = "I'm not really a sst file";
// this depends on the fact that 000005.sst is the first file created by the DB
ASSERT_OK(file_manager_->WriteToFile(file_five, file_five_contents));

FillDB(db_.get(), 0, 100);
// backup overwrites file 000005.sst
ASSERT_TRUE(db_->CreateNewBackup(true).ok());

std::string new_file_five_contents;
ASSERT_OK(ReadFileToString(env_, file_five, &new_file_five_contents));
// file 000005.sst was overwritten
ASSERT_TRUE(new_file_five_contents != file_five_contents);

CloseBackupableDB();

AssertBackupConsistency(0, 0, 100);
}

} // anon namespace

} // namespace rocksdb
