memblock: introduce a for_each_reserved_mem_region iterator
Struct page initialisation had been identified as one of the reasons why
large machines take a long time to boot. Patches were posted a long time ago
to defer initialisation until they were first used.  This was rejected on
the grounds it should not be necessary to hurt the fast paths. This series
reuses much of the work from that time but defers the initialisation of
memory to kswapd so that one thread per node initialises memory local to
that node.

After applying the series and setting the appropriate Kconfig variable, I
see this in the boot log on a 64G machine:

[    7.383764] kswapd 0 initialised deferred memory in 188ms
[    7.404253] kswapd 1 initialised deferred memory in 208ms
[    7.411044] kswapd 3 initialised deferred memory in 216ms
[    7.411551] kswapd 2 initialised deferred memory in 216ms

On a 1TB machine, I see:

[    8.406511] kswapd 3 initialised deferred memory in 1116ms
[    8.428518] kswapd 1 initialised deferred memory in 1140ms
[    8.435977] kswapd 0 initialised deferred memory in 1148ms
[    8.437416] kswapd 2 initialised deferred memory in 1148ms

Once booted, the machine appears to work as normal. Boot times were measured
from the time shutdown was called until ssh was available again.  In the
64G case, the boot time savings are negligible. On the 1TB machine, the
savings were 16 seconds.

Nate Zimmer said:

: On an older 8 TB box with lots and lots of cpus, the boot time, as
: measured from grub to login prompt, improved from 1484 seconds to
: exactly 1000 seconds.

Waiman Long said:

: I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system.  From
: grub menu to ssh login, the bootup time was 453s before the patch and 265s
: after the patch - a saving of 188s (42%).

Daniel Blueman said:

: On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're seeing
: stock 4.0 boot in 7136s.  This drops to 2159s, or a 70% reduction with
: this patchset.  Non-temporal PMD init (https://lkml.org/lkml/2015/4/23/350)
: drops this to 1045s.

This patch (of 13):

As part of initializing struct pages in 2MiB chunks, we noticed that at
the end of free_all_bootmem(), nothing had forced the reserved/allocated
4KiB pages to be initialized.

This helper will be used to extend initialization to those reserved pages.
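To make the 4KiB-page point concrete: the iterator added by this patch reports each reserved region as an inclusive range, with the end set to the last byte (base + size - 1). Such a range touches every page frame from start >> 12 through end >> 12. The sketch below assumes 4KiB pages (a shift of 12), and its names (demo_range_to_pfns, demo_pfn_count, struct demo_pfn_span) are hypothetical; it illustrates the arithmetic only and is not code from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4KiB pages, matching the page size named in the text. */
#define DEMO_PAGE_SHIFT 12

/* Hypothetical helper type: inclusive bounds of a page-frame-number span. */
struct demo_pfn_span {
	uint64_t first;
	uint64_t last;
};

/*
 * Map an inclusive physical range [start, end] to the inclusive span of
 * page frame numbers it touches. A partially covered page at either edge
 * is still counted, on the assumption that its struct page needs
 * initializing too.
 */
static struct demo_pfn_span demo_range_to_pfns(uint64_t start, uint64_t end)
{
	struct demo_pfn_span s;

	assert(start <= end); /* an inclusive range always holds at least one byte */
	s.first = start >> DEMO_PAGE_SHIFT;
	s.last = end >> DEMO_PAGE_SHIFT;
	return s;
}

/* Number of page frames (and hence struct pages) in the span. */
static uint64_t demo_pfn_count(struct demo_pfn_span s)
{
	return s.last - s.first + 1;
}
```

For example, a region with base 0x1800 and size 0x1001 has an inclusive end of 0x2800, so it touches page frames 1 and 2 even though it is only slightly longer than one page.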

Signed-off-by: Robin Holt <[email protected]>
Signed-off-by: Nate Zimmer <[email protected]>
Signed-off-by: Mel Gorman <[email protected]>
Tested-by: Nate Zimmer <[email protected]>
Tested-by: Waiman Long <[email protected]>
Tested-by: Daniel J Blueman <[email protected]>
Acked-by: Pekka Enberg <[email protected]>
Cc: Robin Holt <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Scott Norton <[email protected]>
Cc: "Luck, Tony" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Robin Holt authored and torvalds committed Jul 1, 2015
1 parent 6aaf0da commit 8e7a7f8
Showing 2 changed files with 50 additions and 0 deletions.
18 changes: 18 additions & 0 deletions include/linux/memblock.h
@@ -101,6 +101,9 @@ void __next_mem_range_rev(u64 *idx, int nid, ulong flags,
 			  struct memblock_type *type_b, phys_addr_t *out_start,
 			  phys_addr_t *out_end, int *out_nid);
 
+void __next_reserved_mem_region(u64 *idx, phys_addr_t *out_start,
+				phys_addr_t *out_end);
+
 /**
  * for_each_mem_range - iterate through memblock areas from type_a and not
  * included in type_b. Or just type_a if type_b is NULL.
@@ -142,6 +145,21 @@ void __next_mem_range_rev(u64 *idx, int nid, ulong flags,
 	     __next_mem_range_rev(&i, nid, flags, type_a, type_b, \
 				  p_start, p_end, p_nid))
 
+/**
+ * for_each_reserved_mem_region - iterate over all reserved memblock areas
+ * @i: u64 used as loop variable
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over reserved areas of memblock. Available as soon as memblock
+ * is initialized.
+ */
+#define for_each_reserved_mem_region(i, p_start, p_end) \
+	for (i = 0UL, \
+	     __next_reserved_mem_region(&i, p_start, p_end); \
+	     i != (u64)ULLONG_MAX; \
+	     __next_reserved_mem_region(&i, p_start, p_end))
+
 #ifdef CONFIG_MOVABLE_NODE
 static inline bool memblock_is_hotpluggable(struct memblock_region *m)
 {
32 changes: 32 additions & 0 deletions mm/memblock.c
@@ -819,6 +819,38 @@ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
 }
 
 
+/**
+ * __next_reserved_mem_region - next function for for_each_reserved_mem_region()
+ * @idx: pointer to u64 loop variable
+ * @out_start: ptr to phys_addr_t for start address of the region, can be %NULL
+ * @out_end: ptr to phys_addr_t for end address of the region, can be %NULL
+ *
+ * Iterate over all reserved memory regions.
+ */
+void __init_memblock __next_reserved_mem_region(u64 *idx,
+					   phys_addr_t *out_start,
+					   phys_addr_t *out_end)
+{
+	struct memblock_type *rsv = &memblock.reserved;
+
+	if (*idx >= 0 && *idx < rsv->cnt) {
+		struct memblock_region *r = &rsv->regions[*idx];
+		phys_addr_t base = r->base;
+		phys_addr_t size = r->size;
+
+		if (out_start)
+			*out_start = base;
+		if (out_end)
+			*out_end = base + size - 1;
+
+		*idx += 1;
+		return;
+	}
+
+	/* signal end of iteration */
+	*idx = ULLONG_MAX;
+}
+
 /**
  * __next__mem_range - next function for for_each_free_mem_range() etc.
  * @idx: pointer to u64 loop variable
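A small userspace model of the iterator protocol may make the loop mechanics easier to follow. This is a sketch under stated assumptions: the demo_* types and functions are hypothetical, simplified stand-ins for struct memblock_type, struct memblock_region, __next_reserved_mem_region() and for_each_reserved_mem_region() (no flags, nid, or __init_memblock annotation), and the region table is made up. It keeps the contract shown in the patch: the index is a cursor, each call fills the optional output pointers and advances it, the end is inclusive, and ULLONG_MAX marks the end of iteration. The `*idx >= 0` test is left out because it is always true for an unsigned u64.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types (assumption). */
typedef uint64_t u64;
typedef uint64_t phys_addr_t;

struct demo_region {
	phys_addr_t base;
	phys_addr_t size;
};

struct demo_type {
	unsigned long cnt;
	struct demo_region *regions;
};

/* Hypothetical reserved regions used only for this demo. */
static struct demo_region demo_reserved_regions[] = {
	{ 0x1000, 0x1000 },     /* covers [0x1000, 0x1fff] */
	{ 0x100000, 0x200000 }, /* covers [0x100000, 0x2fffff] */
};

static struct demo_type demo_reserved = {
	.cnt = sizeof(demo_reserved_regions) / sizeof(demo_reserved_regions[0]),
	.regions = demo_reserved_regions,
};

/* Same shape as __next_reserved_mem_region(): fill outputs, advance cursor. */
static void demo_next_reserved(u64 *idx, phys_addr_t *out_start,
			       phys_addr_t *out_end)
{
	if (*idx < demo_reserved.cnt) {
		struct demo_region *r = &demo_reserved.regions[*idx];

		/* base + size - 1 would wrap around for an empty region */
		assert(r->size > 0);
		if (out_start)
			*out_start = r->base;
		if (out_end)
			*out_end = r->base + r->size - 1; /* inclusive end */
		*idx += 1;
		return;
	}
	*idx = ULLONG_MAX; /* sentinel: iteration finished */
}

/* Same loop shape as for_each_reserved_mem_region(). */
#define demo_for_each_reserved(i, p_start, p_end)		\
	for (i = 0, demo_next_reserved(&i, p_start, p_end);	\
	     i != (u64)ULLONG_MAX;				\
	     demo_next_reserved(&i, p_start, p_end))

/* Sum the bytes covered by all regions; end is inclusive, hence the + 1. */
static u64 demo_total_reserved_bytes(void)
{
	u64 i, total = 0;
	phys_addr_t start, end;

	demo_for_each_reserved(i, &start, &end)
		total += end - start + 1;
	return total;
}

/* Count regions without asking for addresses: both outputs may be NULL. */
static unsigned long demo_count_reserved(void)
{
	u64 i;
	unsigned long n = 0;

	demo_for_each_reserved(i, NULL, NULL)
		n++;
	return n;
}
```

Because the next function advances the cursor before the loop body runs, the loop variable inside the body is already one past the region being visited. With the two sample regions, demo_count_reserved() returns 2 and demo_total_reserved_bytes() returns 0x201000 (0x1000 + 0x200000).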
