Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
mm: vmscan: Reduce throttling due to a failure to make progress
Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar problems due to reclaim throttling for excessive lengths of time. In Alexey's case, a memory hog that should go OOM quickly stalls for several minutes before stalling. In Mike and Darrick's cases, a small memcg environment stalled excessively even though the system had enough memory overall. Commit 69392a4 ("mm/vmscan: throttle reclaim when no progress is being made") introduced the problem although commit a19594c ("mm/vmscan: increase the timeout if page reclaim is not making progress") made it worse. Systems at or near an OOM state that cannot be recovered must reach OOM quickly and memcg should kill tasks if a memcg is near OOM. To address this, only stall for the first zone in the zonelist, reduce the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if the scan control nr_reclaimed is 0, kswapd is still active and there were excessive pages pending for writeback. If kswapd has stopped reclaiming due to excessive failures, do not stall at all so that OOM triggers relatively quickly. Similarly, if an LRU is simply congested, only lightly throttle similar to NOPROGRESS. Alexey's original case was the most straight forward for i in {1..3}; do tail /dev/zero; done On vanilla 5.16-rc1, this test stalled heavily, after the patch the test completes in a few seconds similar to 5.15. Alexey's second test case added watching a youtube video while tail runs 10 times. On 5.15, playback only jitters slightly, 5.16-rc1 stalls a lot with lots of frames missing and numerous audio glitches. With this patch applies, the video plays similarly to 5.15. [[email protected]: Fix W=1 build warning] Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Link: https://linux-regtracking.leemhuis.info/regzbot/regression/[email protected]/ Reported-and-tested-by: Alexey Avramov <[email protected]> Reported-and-tested-by: Mike Galbraith <[email protected]> Reported-and-tested-by: Darrick J. Wong <[email protected]> Reported-by: kernel test robot <[email protected]> Acked-by: Hugh Dickins <[email protected]> Tracked-by: Thorsten Leemhuis <[email protected]> Fixes: 69392a4 ("mm/vmscan: throttle reclaim when no progress is being made") Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
- Loading branch information