CancelAllBackgroundWork: Flush will not wait for stall conditions to clear #832

Yuval-Ariel · 2024-02-11T13:00:29Z

When closing the db, CancelAllBackgroundWork is called and a flush is initiated when there are unpersisted writes (without wal - has_unpersisted_data_ is true).
This flush currently waits for some stall conditions to clear (in WaitUntilFlushWouldNotStallWrites).
The wait is controlled by FlushOptions.allow_write_stall and is false by default.

Such behaviour does not seem logical since the user simply wants to close the db and they should not be forced to wait for stall conditions.

In cases such as db_bench, this leads to some obscurity by sometimes hiding the compaction debt that was accumulated during the test.
This hiding effect is caused when the test ends in a stall condition and it has unpersisted data. As mentioned above, new compactions will be executed since CancelAllBackgroundWork is waiting until the flush ends which in turn waits for the stall conditions to clear.
A different test which has a lower compaction debt will close in much shorter time and the benefit of having a lower compaction debt will be overlooked

Solve this by allowing the flush to run despite the stall conditions and thus only currently running bg operations will be executed and not new ones.

Results:

fillup cmd:
db_bench --compression_type=None -db=/data/ -num=60000000 -value_size=1000 -key_size=16 -report_interval_seconds=1 -stats_dump_period_sec=120 -num_column_families=1 -write_buffer_size=67108864 -histogram --delayed_write_rate=536870912 -max_write_buffer_number=4 -db_write_buffer_size=0 -max_background_compactions=8 -cache_size=8388608 -max_background_flushes=4 -bloom_bits=10 -benchmark_read_rate_limit=0 -benchmark_write_rate_limit=0 -report_file=fillrandom.csv --disable_wal=true --benchmarks=fillrandom,levelstats,memstats,stats -max_write_buffer_number=8

It takes 3 secs for Shutdown (" Shutdown: canceling all background work" in the LOG) in this branch and almost 40 secs in main.

lsm state when CancelAllBackgroundWork is called in fillup:

main:
files[33 62 39 420 71 0 0]

this branch:
files[33 22 28 428 86 0 0]

lsm state at start of next test:

main:

this branch:

As can be seen, the main branch has done much compaction between the two tests which is unaccounted for in the performance measurements.

udi-speedb · 2024-02-12T07:11:20Z

@Yuval-Ariel -

What about cases where there is a compaction debt but it is not big enough to trigger a stall condition? Wouldn't we close the DB while still having a compaction debt, only smaller?
CancelAllBackgroundWork() is exposed in the C interface, the JAVA interface and is called directly from multiple locations, in addition to the Close() API.
Although it makes sense to have db_bench tests complete all compactions before closing the DB and ending the benchmark, I feel it is problematic to change that internally as you have (unconditionally, without user control).
I suggest adding a new boolean parameter with a default value of false (to retain the current behavior by default). When the new parameter is true, Close() would wait for all of the compactions to complete before returning control to the user.

Yuval-Ariel · 2024-02-13T10:28:44Z

@udi-speedb

The goal here is to prevent compactions running at db close when the flush is waiting for stall conditions. So that close is as fast as possible and the user can still ask to wait for bg work to finish by passing true to CancelAllBackgroundWork.
i've opened CancelAllBackgroundWork: Flush of unpersisted data waits for stall conditions to clear and delays db close facebook/rocksdb#12346

…clear

Yuval-Ariel self-assigned this Feb 11, 2024

Yuval-Ariel force-pushed the dont-wait-on-flush-when-cancelallbgwork branch from fb89759 to 07276cb Compare February 11, 2024 15:48

Yuval-Ariel requested a review from udi-speedb February 11, 2024 15:49

Yuval-Ariel added 2 commits February 20, 2024 14:45

CancelAllBackgroundWork: Flush does not wait for stall conditions to …

29e91cb

…clear

update history

67b7b61

Yuval-Ariel force-pushed the dont-wait-on-flush-when-cancelallbgwork branch from 07276cb to 67b7b61 Compare February 20, 2024 12:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CancelAllBackgroundWork: Flush will not wait for stall conditions to clear #832

CancelAllBackgroundWork: Flush will not wait for stall conditions to clear #832

Yuval-Ariel commented Feb 11, 2024 •

edited

Loading

udi-speedb commented Feb 12, 2024

Yuval-Ariel commented Feb 13, 2024

CancelAllBackgroundWork: Flush will not wait for stall conditions to clear #832

Are you sure you want to change the base?

CancelAllBackgroundWork: Flush will not wait for stall conditions to clear #832

Conversation

Yuval-Ariel commented Feb 11, 2024 • edited Loading

Results:

lsm state when CancelAllBackgroundWork is called in fillup:

lsm state at start of next test:

udi-speedb commented Feb 12, 2024

Yuval-Ariel commented Feb 13, 2024

Yuval-Ariel commented Feb 11, 2024 •

edited

Loading