Skip to content

Commit

Permalink
memory-hotplug.rst: document the "auto-movable" online policy
Browse files Browse the repository at this point in the history
Commit e83a437 ("mm/memory_hotplug: introduce "auto-movable" online
policy") introduced a new memory online policy to automatically select a
zone for memory blocks to be onlined.  It added a way to set the active
online policy and tunables for the auto-movable online policy.

Follow-up commits tweaked the "auto-movable" policy to also consider
memory device details when selecting zones for memory blocks to be
onlined.

Let's document the new toggles and how the two online policies we have
work.

[[email protected]: updates]
  Link: https://lkml.kernel.org/r/[email protected]

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Acked-by: Mike Rapoport <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
  • Loading branch information
davidhildenbrand authored and torvalds committed Nov 6, 2021
1 parent a8db400 commit 9e122cc
Showing 1 changed file with 121 additions and 20 deletions.
141 changes: 121 additions & 20 deletions Documentation/admin-guide/mm/memory-hotplug.rst
Original file line number Diff line number Diff line change
@@ -165,9 +165,8 @@ Or alternatively::

% echo 1 > /sys/devices/system/memory/memoryXXX/online

The kernel will select the target zone automatically, usually defaulting to
``ZONE_NORMAL`` unless ``movable_node`` has been specified on the kernel
command line or if the memory block would intersect the ZONE_MOVABLE already.
The kernel will select the target zone automatically, depending on the
configured ``online_policy``.

One can explicitly request to associate an offline memory block with
ZONE_MOVABLE by::
@@ -198,6 +197,9 @@ Auto-onlining can be enabled by writing ``online``, ``online_kernel`` or

% echo online > /sys/devices/system/memory/auto_online_blocks

Similarly to manual onlining, with ``online`` the kernel will select the
target zone automatically, depending on the configured ``online_policy``.

Modifying the auto-online behavior will only affect all subsequently added
memory blocks only.

@@ -393,11 +395,16 @@ command line parameters are relevant:
======================== =======================================================
``memhp_default_state`` configure auto-onlining by essentially setting
``/sys/devices/system/memory/auto_online_blocks``.
``movable_node`` configure automatic zone selection in the kernel. When
set, the kernel will default to ZONE_MOVABLE, unless
other zones can be kept contiguous.
``movable_node`` configure automatic zone selection in the kernel when
using the ``contig-zones`` online policy. When
set, the kernel will default to ZONE_MOVABLE when
onlining a memory block, unless other zones can be kept
contiguous.
======================== =======================================================

See Documentation/admin-guide/kernel-parameters.txt for a more generic
description of these command line parameters.

Module Parameters
------------------

@@ -414,20 +421,114 @@ and they can be observed (and some even modified at runtime) via::

The following module parameters are currently defined:

======================== =======================================================
``memmap_on_memory`` read-write: Allocate memory for the memmap from the
added memory block itself. Even if enabled, actual
support depends on various other system properties and
should only be regarded as a hint whether the behavior
would be desired.

While allocating the memmap from the memory block
itself makes memory hotplug less likely to fail and
keeps the memmap on the same NUMA node in any case, it
can fragment physical memory in a way that huge pages
in bigger granularity cannot be formed on hotplugged
memory.
======================== =======================================================
================================ ===============================================
``memmap_on_memory`` read-write: Allocate memory for the memmap from
the added memory block itself. Even if enabled,
actual support depends on various other system
properties and should only be regarded as a
hint whether the behavior would be desired.

While allocating the memmap from the memory
block itself makes memory hotplug less likely
to fail and keeps the memmap on the same NUMA
node in any case, it can fragment physical
memory in a way that huge pages in bigger
granularity cannot be formed on hotplugged
memory.
``online_policy`` read-write: Set the basic policy used for
automatic zone selection when onlining memory
blocks without specifying a target zone.
``contig-zones`` has been the kernel default
before this parameter was added. After an
online policy was configured and memory was
online, the policy should not be changed
anymore.

When set to ``contig-zones``, the kernel will
try keeping zones contiguous. If a memory block
intersects multiple zones or no zone, the
behavior depends on the ``movable_node`` kernel
command line parameter: default to ZONE_MOVABLE
if set, default to the applicable kernel zone
(usually ZONE_NORMAL) if not set.

When set to ``auto-movable``, the kernel will
try onlining memory blocks to ZONE_MOVABLE if
possible according to the configuration and
memory device details. With this policy, one
can avoid zone imbalances when eventually
hotplugging a lot of memory later and still
wanting to be able to hotunplug as much as
possible reliably, very desirable in
virtualized environments. This policy ignores
the ``movable_node`` kernel command line
parameter and isn't really applicable in
environments that require it (e.g., bare metal
with hotunpluggable nodes) where hotplugged
memory might be exposed via the
firmware-provided memory map early during boot
to the system instead of getting detected,
added and onlined later during boot (such as
done by virtio-mem or by some hypervisors
implementing emulated DIMMs). As one example, a
hotplugged DIMM will be onlined either
completely to ZONE_MOVABLE or completely to
ZONE_NORMAL, not a mixture.
As another example, as many memory blocks
belonging to a virtio-mem device will be
onlined to ZONE_MOVABLE as possible,
special-casing units of memory blocks that can
only get hotunplugged together. *This policy
does not protect from setups that are
problematic with ZONE_MOVABLE and does not
change the zone of memory blocks dynamically
after they were onlined.*
``auto_movable_ratio`` read-write: Set the maximum MOVABLE:KERNEL
memory ratio in % for the ``auto-movable``
online policy. Whether the ratio applies only
for the system across all NUMA nodes or also
per NUMA nodes depends on the
``auto_movable_numa_aware`` configuration.

All accounting is based on present memory pages
in the zones combined with accounting per
memory device. Memory dedicated to the CMA
allocator is accounted as MOVABLE, although
residing on one of the kernel zones. The
possible ratio depends on the actual workload.
The kernel default is "301" %, for example,
allowing for hotplugging 24 GiB to a 8 GiB VM
and automatically onlining all hotplugged
memory to ZONE_MOVABLE in many setups. The
additional 1% deals with some pages being not
present, for example, because of some firmware
allocations.

Note that ZONE_NORMAL memory provided by one
memory device does not allow for more
ZONE_MOVABLE memory for a different memory
device. As one example, onlining memory of a
hotplugged DIMM to ZONE_NORMAL will not allow
for another hotplugged DIMM to get onlined to
ZONE_MOVABLE automatically. In contrast, memory
hotplugged by a virtio-mem device that got
onlined to ZONE_NORMAL will allow for more
ZONE_MOVABLE memory within *the same*
virtio-mem device.
``auto_movable_numa_aware`` read-write: Configure whether the
``auto_movable_ratio`` in the ``auto-movable``
online policy also applies per NUMA
node in addition to the whole system across all
NUMA nodes. The kernel default is "Y".

Disabling NUMA awareness can be helpful when
dealing with NUMA nodes that should be
completely hotunpluggable, onlining the memory
completely to ZONE_MOVABLE automatically if
possible.

Parameter availability depends on CONFIG_NUMA.
================================ ===============================================

ZONE_MOVABLE
============

0 comments on commit 9e122cc

Please sign in to comment.