Commit beb86ad

anand76 authored and facebook-github-bot committed Jan 26, 2022
Fix race condition in SstFileManagerImpl error recovery code (facebook#9435)
Summary:
There is a race in SstFileManagerImpl between the ClearError() function and CancelErrorRecovery(). The race can cause ClearError() to deref the file system pointer after it has been freed. This is likely to occur during process shutdown, when the order of destruction of the DB/Env/FileSystem and SstFileManagerImpl is not deterministic.

Pull Request resolved: facebook#9435

Test Plan:
Reproduce the crash in a TSAN build by introducing sleeps in the code, and verify with the fix.

Reviewed By: siying

Differential Revision: D33774696

Pulled By: anand1976

fbshipit-source-id: 643d3da31b8d2ee6d9b6db5d33327e0053ce3b83
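The ordering described in the summary can be replayed deterministically in a small single-threaded model. This is only a sketch under stated assumptions: `FakeFileSystem`, `RecoveryModel`, `PollOnce`, and `ReplayShutdown` are hypothetical names invented for illustration and are not RocksDB types or functions. Destruction of the file system is modelled by a flag rather than a real `delete`, so the model counts stale accesses instead of crashing.

```cpp
#include <cassert>
#include <list>
#include <mutex>

// Hypothetical stand-in for the file system object; not a RocksDB type.
struct FakeFileSystem {
  bool alive = true;       // set to false to model "already destroyed"
  int accesses = 0;        // every call to GetFreeSpace()
  int stale_accesses = 0;  // calls made after the object was "destroyed"
  void GetFreeSpace() {
    ++accesses;
    if (!alive) ++stale_accesses;
  }
};

// Hypothetical model of a manager that polls the file system while at least
// one error handler is waiting for recovery.
class RecoveryModel {
 public:
  explicit RecoveryModel(FakeFileSystem* fs) : fs_(fs) {}

  void StartRecovery(int handler_id) {
    std::lock_guard<std::mutex> l(mu_);
    handlers_.push_back(handler_id);
  }

  void CancelRecovery(int handler_id) {
    std::lock_guard<std::mutex> l(mu_);
    handlers_.remove(handler_id);
  }

  void Close() {
    std::lock_guard<std::mutex> l(mu_);
    closing_ = true;
  }

  // One iteration of the polling loop. Returns true if it touched the FS.
  // guard_empty_list == false models the check before the fix,
  // guard_empty_list == true models the check after the fix.
  bool PollOnce(bool guard_empty_list) {
    std::lock_guard<std::mutex> l(mu_);
    if (closing_) return false;
    if (guard_empty_list && handlers_.empty()) return false;
    fs_->GetFreeSpace();
    return true;
  }

 private:
  std::mutex mu_;
  std::list<int> handlers_;
  bool closing_ = false;
  FakeFileSystem* fs_;  // not owned; may outlive or predecease the model
};

// Replays the problematic shutdown ordering and returns the number of
// accesses made to the file system after it was marked as destroyed.
int ReplayShutdown(bool guard_empty_list) {
  FakeFileSystem fs;
  RecoveryModel model(&fs);
  model.StartRecovery(1);
  model.PollOnce(guard_empty_list);  // FS still alive: a legitimate access
  model.CancelRecovery(1);           // the DB withdraws its handler
  fs.alive = false;                  // the owner now treats the FS as gone
  model.PollOnce(guard_empty_list);  // the background loop wakes up again
  return fs.stale_accesses;
}
```

In this model, `ReplayShutdown(false)` reports one stale access and `ReplayShutdown(true)` reports none, which is the difference the empty-list check is meant to make. A real reproduction would need two threads and a sanitizer build, as the test plan notes.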
1 parent 8822562 commit beb86ad

2 files changed: +4 -2 lines changed

‎HISTORY.md

+1
@@ -31,6 +31,7 @@ Note: The next release will be major release 7.0. See https://github.com/faceboo
 * Fix a bug that FlushMemTable may return ok even flush not succeed.
 * Fixed a bug of Sync() and Fsync() not using `fcntl(F_FULLFSYNC)` on OS X and iOS.
 * Fixed a significant performance regression in version 6.26 when a prefix extractor is used on the read path (Seek, Get, MultiGet). (Excessive time was spent in SliceTransform::AsString().)
+* Fixed a race condition in SstFileManagerImpl error recovery code that can cause a crash during process shutdown.
 
 ### New Features
 * Added RocksJava support for MacOS universal binary (ARM+x86)

‎file/sst_file_manager_impl.cc

+3 -2
@@ -256,7 +256,7 @@ void SstFileManagerImpl::ClearError() {
   while (true) {
     MutexLock l(&mu_);
 
-    if (closing_) {
+    if (error_handler_list_.empty() || closing_) {
       return;
     }
 
@@ -297,7 +297,8 @@ void SstFileManagerImpl::ClearError() {
 
     // Someone could have called CancelErrorRecovery() and the list could have
     // become empty, so check again here
-    if (s.ok() && !error_handler_list_.empty()) {
+    if (s.ok()) {
+      assert(!error_handler_list_.empty());
       auto error_handler = error_handler_list_.front();
       // Since we will release the mutex, set cur_instance_ to signal to the
       // shutdown thread, if it calls // CancelErrorRecovery() the meantime,
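The second hunk turns a runtime emptiness check into an `assert`. A plausible reading, which the excerpt suggests but does not show, is that `mu_` stays held between the guard at the top of the loop and the later use of `front()`, so no other thread can empty the list in between. The sketch below illustrates that pattern in isolation; `HandlerQueue`, `PickNext`, and the `space_ok` parameter are hypothetical names for illustration and not part of RocksDB.

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <string>

// Hypothetical queue of handlers waiting for recovery (illustrative only).
class HandlerQueue {
 public:
  void Add(const std::string& name) {
    std::lock_guard<std::mutex> l(mu_);
    handlers_.push_back(name);
  }

  void Cancel(const std::string& name) {
    std::lock_guard<std::mutex> l(mu_);
    handlers_.remove(name);
  }

  // Writes the next handler to *out and returns true, or returns false if
  // there is nothing to do. space_ok stands in for the status of whatever
  // checks run between the guard and the use of the list.
  bool PickNext(bool space_ok, std::string* out) {
    std::lock_guard<std::mutex> l(mu_);
    // The only runtime guard: bail out before doing any other work.
    if (handlers_.empty() || closing_) {
      return false;
    }
    // ... intermediate checks would run here, still under mu_ ...
    if (space_ok) {
      // mu_ has been held since the guard, so the list cannot have been
      // emptied by another thread. The invariant is documented, not re-tested.
      assert(!handlers_.empty());
      *out = handlers_.front();
      return true;
    }
    return false;
  }

 private:
  std::mutex mu_;
  std::list<std::string> handlers_;
  bool closing_ = false;
};
```

If the lock were released anywhere between the guard and the `assert`, the invariant would no longer hold and the runtime check would have to stay.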
