Skip to content

Commit

Permalink
mm: do not sleep in balance_pgdat if there's no i/o congestion
Browse files Browse the repository at this point in the history
On a 4GB RAM machine, where Normal zone is much smaller than DMA32 zone,
the Normal zone gets fragmented in time.  This requires relatively more
pressure in balance_pgdat to get the zone above the required watermark.
Unfortunately, the congestion_wait() call in there slows it down for a
completely wrong reason, expecting that there's a lot of
writeback/swapout, even when there's none (much more common).  After a
few days, when fragmentation progresses, this flawed logic translates to
a very high CPU iowait times, even though there's no I/O congestion at
all.  If THP is enabled, the problem occurs sooner, but I was able to
see it even on !THP kernels, just by giving it a bit more time to occur.

The proper way to deal with this is to not wait, unless there's
congestion.  Thanks to Mel Gorman, we already have the function that
perfectly fits the job.  The patch was tested on a machine which nicely
revealed the problem after only 1 day of uptime, and it's been working
great.

Signed-off-by: Zlatko Calusic <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
Zlatko Calusic authored and torvalds committed Dec 20, 2012
1 parent f01af9f commit cda73a1
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions mm/vmscan.c
Original file line number Diff line number Diff line change
Expand Up @@ -2570,7 +2570,7 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, long remaining,
static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
int *classzone_idx)
{
int all_zones_ok;
struct zone *unbalanced_zone;
unsigned long balanced;
int i;
int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */
Expand Down Expand Up @@ -2604,7 +2604,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
unsigned long lru_pages = 0;
int has_under_min_watermark_zone = 0;

all_zones_ok = 1;
unbalanced_zone = NULL;
balanced = 0;

/*
Expand Down Expand Up @@ -2743,7 +2743,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
}

if (!zone_balanced(zone, testorder, 0, end_zone)) {
all_zones_ok = 0;
unbalanced_zone = zone;
/*
* We are still under min water mark. This
* means that we have a GFP_ATOMIC allocation
Expand Down Expand Up @@ -2776,7 +2776,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
pfmemalloc_watermark_ok(pgdat))
wake_up(&pgdat->pfmemalloc_wait);

if (all_zones_ok || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))
if (!unbalanced_zone || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))
break; /* kswapd: all done */
/*
* OK, kswapd is getting into trouble. Take a nap, then take
Expand All @@ -2786,7 +2786,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
if (has_under_min_watermark_zone)
count_vm_event(KSWAPD_SKIP_CONGESTION_WAIT);
else
congestion_wait(BLK_RW_ASYNC, HZ/10);
wait_iff_congested(unbalanced_zone, BLK_RW_ASYNC, HZ/10);
}

/*
Expand All @@ -2805,7 +2805,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order,
* high-order: Balanced zones must make up at least 25% of the node
* for the node to be balanced
*/
if (!(all_zones_ok || (order && pgdat_balanced(pgdat, balanced, *classzone_idx)))) {
if (unbalanced_zone && (!order || !pgdat_balanced(pgdat, balanced, *classzone_idx))) {
cond_resched();

try_to_freeze();
Expand Down

0 comments on commit cda73a1

Please sign in to comment.