row_cache: Always touch the partition on entry · mixellent/scylladb@6c168ee

Commit

row_cache: Always touch the partition on entry

This fixes a potential cause for reactor stalls during memory
reclamation. Applies only to schemas without clustering columns.

Every partition in cache has a dummy row at the end of the clustering
range (last dummy). That dummy must be evicted last, because MVCC
logic needs it to be there at all times. If LRU picks it for eviction
and it's not the last row, eviction does nothing and moves
on. Eventually, all other rows in this partition will be evicted too
and then the partition will go away with it.

Mutation reader updates the position of rows in the LRU (aka touching)
as it walks over them. However, it was not always touching the last
dummy row. If the partition was fully cached, and schema had no
clustering key, it would exit early before reaching the last dummy
row, here:

    inline
    void cache_flat_mutation_reader::move_to_next_entry() {
	clogger.trace("csm {}: move_to_next_entry(), curr={}", fmt::ptr(this), _next_row.position());
	if (no_clustering_row_between(*_schema, _next_row.position(), _upper_bound)) {
	    move_to_next_range();

That's because no_clustering_row_between() is always true for any key
in tables with no clustering columns, and the reader advances to
end-of-stream without advancing _next_row to the last dummy. This is
expected and desired, it means that the query range ends at the
current row and there is no need to move further. We would not take
this exit for tables with a non-singular clustering key domain and
open-ended query range, since there would be a key gap before the last
dummy row.

Refs scylladb#2972.

The effect of leaving the last dummy row not touched will be that such
scans will segregate rows in the LRU, bring all regular rows to the
front, and dummy rows at the tail. When eviction reaches the band of
dummy rows, it will have to walk over it, because evicting them
releases no memory. This can cause a reactor stall.

An easy fix for the scenario would be to always touch the dummy entry
when entering the partition. It's unlikely that the read will not
proceed to the regular rows. It would be best to avoid linking such
dummies in the LRU, but that's a much more complex change.

Discovered in perf_row_cache_update, test_small_partitions(). I saw
200ms stalls with -m8G.

Refs scylladb#8541.

Tests:

   - row_cache_test (release)
   - perf_simple_query [no change]

Message-Id: <[email protected]>

Loading branch information

tgrabiec authored and avikivity committed Apr 28, 2021

1 parent 96ef204 commit 6c168ee

cache_flat_mutation_reader.hh

-Original file line number
+Diff line change
@@ Expand Up / @@ -245,9 +245,9 @@ void cache_flat_mutation_reader::touch_partition() { @@
     inline
     future<> cache_flat_mutation_reader::fill_buffer(db::timeout_clock::time_point timeout) {
         if (_state == state::before_static_row) {
+            touch_partition();
             auto after_static_row = [this, timeout] {
                 if (_ck_ranges_curr == _ck_ranges_end) {
-                    touch_partition();
                     finish_reader();
                     return make_ready_future<>();
                 }
@@ Expand Down @@

0 comments on commit `6c168ee`

Please sign in to comment.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit

There are no files selected for viewing

0 comments on commit `6c168ee`

Commit

There are no files selected for viewing

0 comments on commit 6c168ee

0 comments on commit `6c168ee`