Skip to content

Commit

Permalink
mm: compaction: add a tunable that decides when memory should be comp…
Browse files Browse the repository at this point in the history
…acted and when it should be reclaimed

The kernel applies some heuristics when deciding if memory should be
compacted or reclaimed to satisfy a high-order allocation.  One of these
is based on the fragmentation.  If the index is below 500, memory will not
be compacted.  This choice is arbitrary and not based on data.  To help
optimise the system and set a sensible default for this value, this patch
adds a sysctl extfrag_threshold.  The kernel will only compact memory if
the fragmentation index is above the extfrag_threshold.

[[email protected]: Fix build errors when proc fs is not configured]
Signed-off-by: Mel Gorman <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: KOSAKI Motohiro <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
gormanm authored and torvalds committed May 25, 2010
1 parent 56de726 commit 5e77190
Show file tree
Hide file tree
Showing 4 changed files with 44 additions and 1 deletion.
15 changes: 15 additions & 0 deletions Documentation/sysctl/vm.txt
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ Currently, these files are in /proc/sys/vm:
- dirty_ratio
- dirty_writeback_centisecs
- drop_caches
- extfrag_threshold
- hugepages_treat_as_movable
- hugetlb_shm_group
- laptop_mode
Expand Down Expand Up @@ -149,6 +150,20 @@ user should run `sync' first.

==============================================================

extfrag_threshold

This parameter affects whether the kernel will compact memory or direct
reclaim to satisfy a high-order allocation. /proc/extfrag_index shows what
the fragmentation index for each order is in each zone in the system. Values
tending towards 0 imply allocations would fail due to lack of memory,
values towards 1000 imply failures are due to fragmentation and -1 implies
that the allocation will succeed as long as watermarks are met.

The kernel will not compact memory in a zone if the
fragmentation index is <= extfrag_threshold. The default value is 500.

==============================================================

hugepages_treat_as_movable

This parameter is only useful when kernelcore= is specified at boot time to
Expand Down
3 changes: 3 additions & 0 deletions include/linux/compaction.h
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,9 @@
extern int sysctl_compact_memory;
extern int sysctl_compaction_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos);
extern int sysctl_extfrag_threshold;
extern int sysctl_extfrag_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos);

extern int fragmentation_index(struct zone *zone, unsigned int order);
extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
Expand Down
15 changes: 15 additions & 0 deletions kernel/sysctl.c
Original file line number Diff line number Diff line change
Expand Up @@ -263,6 +263,11 @@ static int min_sched_shares_ratelimit = 100000; /* 100 usec */
static int max_sched_shares_ratelimit = NSEC_PER_SEC; /* 1 second */
#endif

#ifdef CONFIG_COMPACTION
static int min_extfrag_threshold;
static int max_extfrag_threshold = 1000;
#endif

static struct ctl_table kern_table[] = {
{
.procname = "sched_child_runs_first",
Expand Down Expand Up @@ -1130,6 +1135,16 @@ static struct ctl_table vm_table[] = {
.mode = 0200,
.proc_handler = sysctl_compaction_handler,
},
{
.procname = "extfrag_threshold",
.data = &sysctl_extfrag_threshold,
.maxlen = sizeof(int),
.mode = 0644,
.proc_handler = sysctl_extfrag_handler,
.extra1 = &min_extfrag_threshold,
.extra2 = &max_extfrag_threshold,
},

#endif /* CONFIG_COMPACTION */
{
.procname = "min_free_kbytes",
Expand Down
12 changes: 11 additions & 1 deletion mm/compaction.c
Original file line number Diff line number Diff line change
Expand Up @@ -433,6 +433,8 @@ static unsigned long compact_zone_order(struct zone *zone,
return compact_zone(zone, &cc);
}

int sysctl_extfrag_threshold = 500;

/**
* try_to_compact_pages - Direct compact to satisfy a high-order allocation
* @zonelist: The zonelist used for the current allocation
Expand Down Expand Up @@ -491,7 +493,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
* Only compact if a failure would be due to fragmentation.
*/
fragindex = fragmentation_index(zone, order);
if (fragindex >= 0 && fragindex <= 500)
if (fragindex >= 0 && fragindex <= sysctl_extfrag_threshold)
continue;

if (fragindex == -1 && zone_watermark_ok(zone, order, watermark, 0, 0)) {
Expand Down Expand Up @@ -572,6 +574,14 @@ int sysctl_compaction_handler(struct ctl_table *table, int write,
return 0;
}

int sysctl_extfrag_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos)
{
proc_dointvec_minmax(table, write, buffer, length, ppos);

return 0;
}

#if defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
ssize_t sysfs_compact_node(struct sys_device *dev,
struct sysdev_attribute *attr,
Expand Down

0 comments on commit 5e77190

Please sign in to comment.