Skip to content

Commit

Permalink
mm: correct calculation of wb's bg_thresh in cgroup domain
Browse files Browse the repository at this point in the history
wb_calc_thresh() is calculating wb's share of bg_thresh in the global
domain.  However in case of cgroup writeback this is not the right
thing to do.  Consider the following domain hierarchy:

                global domain (> 20G)
                /                 \
          cgroup1 (10G)     cgroup2 (10G)
                |                 |
bdi            wb1               wb2

and assume wb1 and wb2 have the same bandwidth and the background
threshold is set at 10%.  The bg_thresh of cgroup1 and cgroup2 is going
to be 1G.  Now because wb_calc_thresh(mdtc->wb, mdtc->bg_thresh)
calculates per-wb threshold in the global domain as (wb bandwidth) /
(domain bandwidth) it returns bg_thresh for wb1 as 0.5G although it has
nobody to compete against in cgroup1.

Fix the problem by calculating wb's share of bg_thresh in the cgroup
domain.

Test as following:
/* make it easier to observe the issue */
echo 300000 > /proc/sys/vm/dirty_expire_centisecs
echo 100 > /proc/sys/vm/dirty_writeback_centisecs

/* run fio in wb1 */
cd /sys/fs/cgroup
echo "+memory +io" > cgroup.subtree_control
mkdir group1
cd group1
echo 10G > memory.high
echo 10G > memory.max
echo $$ > cgroup.procs
mkfs.ext4 -F /dev/vdb
mount /dev/vdb /bdi1/
fio -name test -filename=/bdi1/file -size=600M -ioengine=libaio -bs=4K \
-iodepth=1 -rw=write -direct=0 --time_based -runtime=600 -invalidate=0

/* run fio in wb2 with a new shell */
cd /sys/fs/cgroup
mkdir group2
cd group2
echo 10G > memory.high
echo 10G > memory.max
echo $$ > cgroup.procs
mkfs.ext4 -F /dev/vdc
mount /dev/vdc /bdi2/
fio -name test -filename=/bdi2/file -size=600M -ioengine=libaio -bs=4K \
-iodepth=1 -rw=write -direct=0 --time_based -runtime=600 -invalidate=0

Before fix, the wrttien pages of wb1 and wb2 reported from
toos/writeback/wb_monitor.py keep growing. After fix, rare written pages
are accumulated.
There is no obvious change in fio result.

[[email protected]: changelog rewording]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 74d3694 ("writeback: Fix performance regression in wb_over_bg_thresh()")
Signed-off-by: Kemeng Shi <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Howard Cochran <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
  • Loading branch information
Kemeng Shi authored and akpm00 committed May 6, 2024
1 parent 13fc441 commit fabd2e4
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion mm/page-writeback.c
Original file line number Diff line number Diff line change
Expand Up @@ -2137,7 +2137,7 @@ bool wb_over_bg_thresh(struct bdi_writeback *wb)
if (mdtc->dirty > mdtc->bg_thresh)
return true;

thresh = wb_calc_thresh(mdtc->wb, mdtc->bg_thresh);
thresh = __wb_calc_thresh(mdtc, mdtc->bg_thresh);
if (thresh < 2 * wb_stat_error())
reclaimable = wb_stat_sum(wb, WB_RECLAIMABLE);
else
Expand Down

0 comments on commit fabd2e4

Please sign in to comment.