Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CancelAllBackgroundWork: Flush will not wait for stall conditions to clear #832

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

Yuval-Ariel
Copy link
Contributor

@Yuval-Ariel Yuval-Ariel commented Feb 11, 2024

When closing the db, CancelAllBackgroundWork is called and a flush is initiated when there are unpersisted writes (without wal - has_unpersisted_data_ is true).
This flush currently waits for some stall conditions to clear (in WaitUntilFlushWouldNotStallWrites).
The wait is controlled by FlushOptions.allow_write_stall and is false by default.

Such behaviour does not seem logical since the user simply wants to close the db and they should not be forced to wait for stall conditions.

In cases such as db_bench, this leads to some obscurity by sometimes hiding the compaction debt that was accumulated during the test.
This hiding effect is caused when the test ends in a stall condition and it has unpersisted data. As mentioned above, new compactions will be executed since CancelAllBackgroundWork is waiting until the flush ends which in turn waits for the stall conditions to clear.
A different test which has a lower compaction debt will close in much shorter time and the benefit of having a lower compaction debt will be overlooked

Solve this by allowing the flush to run despite the stall conditions and thus only currently running bg operations will be executed and not new ones.

Results:

fillup cmd:
db_bench --compression_type=None -db=/data/ -num=60000000 -value_size=1000 -key_size=16 -report_interval_seconds=1 -stats_dump_period_sec=120 -num_column_families=1 -write_buffer_size=67108864 -histogram --delayed_write_rate=536870912 -max_write_buffer_number=4 -db_write_buffer_size=0 -max_background_compactions=8 -cache_size=8388608 -max_background_flushes=4 -bloom_bits=10 -benchmark_read_rate_limit=0 -benchmark_write_rate_limit=0 -report_file=fillrandom.csv --disable_wal=true --benchmarks=fillrandom,levelstats,memstats,stats -max_write_buffer_number=8

It takes 3 secs for Shutdown (" Shutdown: canceling all background work" in the LOG) in this branch and almost 40 secs in main.

lsm state when CancelAllBackgroundWork is called in fillup:

main:
files[33 62 39 420 71 0 0]

this branch:
files[33 22 28 428 86 0 0]

lsm state at start of next test:

main:
image

this branch:
image

As can be seen, the main branch has done much compaction between the two tests which is unaccounted for in the performance measurements.

@Yuval-Ariel Yuval-Ariel self-assigned this Feb 11, 2024
@Yuval-Ariel Yuval-Ariel force-pushed the dont-wait-on-flush-when-cancelallbgwork branch from fb89759 to 07276cb Compare February 11, 2024 15:48
@udi-speedb
Copy link
Contributor

@Yuval-Ariel -

  1. What about cases where there is a compaction debt but it is not big enough to trigger a stall condition? Wouldn't we close the DB while still having a compaction debt, only smaller?
  2. CancelAllBackgroundWork() is exposed in the C interface, the JAVA interface and is called directly from multiple locations, in addition to the Close() API.
    Although it makes sense to have db_bench tests complete all compactions before closing the DB and ending the benchmark, I feel it is problematic to change that internally as you have (unconditionally, without user control).
    I suggest adding a new boolean parameter with a default value of false (to retain the current behavior by default). When the new parameter is true, Close() would wait for all of the compactions to complete before returning control to the user.

@Yuval-Ariel
Copy link
Contributor Author

@udi-speedb

  1. The goal here is to prevent compactions running at db close when the flush is waiting for stall conditions. So that close is as fast as possible and the user can still ask to wait for bg work to finish by passing true to CancelAllBackgroundWork.
  2. i've opened CancelAllBackgroundWork: Flush of unpersisted data waits for stall conditions to clear and delays db close facebook/rocksdb#12346

@Yuval-Ariel Yuval-Ariel force-pushed the dont-wait-on-flush-when-cancelallbgwork branch from 07276cb to 67b7b61 Compare February 20, 2024 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants