Linux: Make zfs_prune() fair on NUMA systems
Previous code evicted nr_to_scan items from each NUMA node.  This
not only multiplied the eviction by the number of nodes, but could
exhaust the smaller ones, evicting inodes used by the active workload
and requiring their immediate recreation.  This patch spreads the
requested eviction between all NUMA nodes proportionally to their
evictable counts, which should be closer to expected LRU logic.
See kernel's super_cache_scan() as a similar logic example.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16397
amotin authored Aug 8, 2024
1 parent 5b9f3b7 commit 3ae05e3
Showing 1 changed file with 13 additions and 5 deletions.
18 changes: 13 additions & 5 deletions module/os/linux/zfs/zfs_vfsops.c
@@ -1264,14 +1264,22 @@ zfs_prune(struct super_block *sb, unsigned long nr_to_scan, int *objects)
 	defined(SHRINK_CONTROL_HAS_NID) && \
 	defined(SHRINKER_NUMA_AWARE)
 	if (shrinker->flags & SHRINKER_NUMA_AWARE) {
+		long tc = 1;
+		for_each_online_node(sc.nid) {
+			long c = shrinker->count_objects(shrinker, &sc);
+			if (c == 0 || c == SHRINK_EMPTY)
+				continue;
+			tc += c;
+		}
 		*objects = 0;
 		for_each_online_node(sc.nid) {
+			long c = shrinker->count_objects(shrinker, &sc);
+			if (c == 0 || c == SHRINK_EMPTY)
+				continue;
+			if (c > tc)
+				tc = c;
+			sc.nr_to_scan = mult_frac(nr_to_scan, c, tc) + 1;
 			*objects += (*shrinker->scan_objects)(shrinker, &sc);
-			/*
-			 * reset sc.nr_to_scan, modified by
-			 * scan_objects == super_cache_scan
-			 */
-			sc.nr_to_scan = nr_to_scan;
 		}
 	} else {
 		*objects = (*shrinker->scan_objects)(shrinker, &sc);