vswitchd: Only lock pages that are faulted in.
The main purpose of locking the memory is to ensure that OVS can keep
doing what it did before in case of increased memory pressure, e.g.,
during VM ingest / migration.  Fulfilling this requirement can be
achieved without locking all the allocated memory, but only the pages
already accessed in the past (faulted in).  Processing of the new
traffic involves new memory allocations.  Latency on these operations
can't be guaranteed by the locking.  The main difference would be
the pre-faulting of the stack memory.  However, in order to revalidate
or process upcalls on the same traffic, the same amount of stack is
likely needed, so all the necessary memory will already be faulted in.

Switch 'mlockall' to MCL_ONFAULT to avoid consuming unnecessarily
large amounts of RAM on systems with high core counts.  For example,
in a densely populated OVN cluster this saves about 650 MB of RAM per
node on a system with 64 cores.  This equates to 320 GB of allocated
but unused RAM in a 500 node cluster.

This also makes OVS better suited by default for small systems with
a limited amount of memory.

The MCL_ONFAULT flag was introduced in Linux kernel 4.4 and wasn't
available at the time of '--mlockall' introduction, but we can use it
now.  Fall back to the old way of locking if we're running on
an older kernel.
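
The fallback described above can be sketched as a small standalone helper. This is a minimal illustration for Linux, not the code from the patch: the `lock_process_memory()` name and the `lock_result` enum are hypothetical, and the `MCL_ONFAULT` fallback value of 4 mirrors the define used in the diff further down.

```c
#include <sys/mman.h>

/* MCL_ONFAULT appeared in Linux 4.4; older headers may not define it.
 * The value 4 mirrors the fallback define used in this commit. */
#ifndef MCL_ONFAULT
#define MCL_ONFAULT 4
#endif

/* Hypothetical result type, introduced only for this sketch. */
enum lock_result {
    LOCK_FAILED = -1,   /* Neither mlockall() variant succeeded. */
    LOCK_ON_FAULT = 0,  /* Pages are locked as they are faulted in. */
    LOCK_ALL = 1        /* Old behavior: all mappings pre-faulted and locked. */
};

/* Tries on-fault locking first, then falls back to locking everything. */
static enum lock_result
lock_process_memory(void)
{
    if (!mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) {
        return LOCK_ON_FAULT;
    }
    if (!mlockall(MCL_CURRENT | MCL_FUTURE)) {
        return LOCK_ALL;
    }
    return LOCK_FAILED;
}
```

A caller would typically log `errno` on `LOCK_FAILED` and record the `LOCK_ALL` case, since only that case pre-faults every mapping.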

Only locking the faulted in pages also makes locking compatible with
vhost post-copy live migration by default, because we'll no longer
pre-fault all the guest's memory.  Post-copy relies on userfaultfd
to work on shared huge pages, which is only available in 4.11+ kernels.
So, technically, it should not be possible for MCL_ONFAULT to fail and
the call without it to succeed.  But keep the check for now, just in
case.
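
The resulting compatibility rule can be sketched as follows: post-copy is refused only when the fallback path locked all memory. The names below are hypothetical stand-ins for illustration; the patch itself uses `set_all_memory_locked()` and `memory_all_locked()` in lib/util.c.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the flag kept in lib/util.c: true only when
 * mlockall() succeeded without MCL_ONFAULT, i.e. everything is pre-faulted. */
static bool all_memory_locked = false;

/* Called only on the fallback path, after mlockall() without MCL_ONFAULT. */
static void
mark_all_memory_locked(void)
{
    all_memory_locked = true;
}

/* Returns the effective post-copy setting for the requested one:
 * post-copy stays enabled unless all memory was locked up front. */
static bool
effective_postcopy(bool requested)
{
    return requested && !all_memory_locked;
}
```

With on-fault locking the flag is never set, so a requested post-copy setting passes through unchanged.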

Acked-by: Simon Horman <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Signed-off-by: Ilya Maximets <[email protected]>
igsilya committed Jun 28, 2024
1 parent 639fcf2 commit 56e3159
Showing 8 changed files with 35 additions and 22 deletions.
5 changes: 3 additions & 2 deletions Documentation/ref/ovs-ctl.8.rst
@@ -170,8 +170,9 @@ The following options are less important:
 * ``--no-mlockall``

 By default ``ovs-ctl`` passes ``--mlockall`` to ``ovs-vswitchd``,
-requesting that it lock all of its virtual memory, preventing it
-from being paged to disk. This option suppresses that behavior.
+requesting that it lock all of its virtual memory on page fault (on
+allocation, when running on Linux kernel 4.4 and older), preventing
+it from being paged to disk. This option suppresses that behavior.

 * ``--no-self-confinement``

6 changes: 4 additions & 2 deletions Documentation/topics/dpdk/vhost-user.rst
@@ -340,8 +340,10 @@ The default value is ``false``.
 fixes (like userfaulfd leak) was released in 3.0.1.

 DPDK Post-copy feature requires avoiding to populate the guest memory
-(application must not call mlock* syscall). So enabling mlockall is
-incompatible with post-copy feature.
+(application must not call mlock* syscall without MCL_ONFAULT).
+So enabling mlockall is incompatible with post-copy feature in OVS 3.3 and
+older. Newer versions of OVS only lock memory pages that are faulted in,
+so both features can be used at the same time.

 Note that during migration of vhost-user device, PMD threads hang for the
 time of faulted pages download from source host. Transferring 1GB hugepage
2 changes: 2 additions & 0 deletions NEWS
@@ -1,5 +1,7 @@
 Post-v3.3.0
 --------------------
+- Option '--mlockall' now only locks memory pages on fault, if possible.
+This also makes it compatible with vHost Post-copy Live Migration.
 - Userspace datapath:
 * Conntrack now supports 'random' flag for selecting ports in a range
 while natting and 'persistent' flag for selection of the IP address
2 changes: 1 addition & 1 deletion lib/netdev-dpdk.c
@@ -6719,7 +6719,7 @@ parse_vhost_config(const struct smap *ovs_other_config)

     vhost_postcopy_enabled = smap_get_bool(ovs_other_config,
                                            "vhost-postcopy-support", false);
-    if (vhost_postcopy_enabled && memory_locked()) {
+    if (vhost_postcopy_enabled && memory_all_locked()) {
         VLOG_WARN("vhost-postcopy-support and mlockall are not compatible.");
         vhost_postcopy_enabled = false;
     }
12 changes: 6 additions & 6 deletions lib/util.c
@@ -67,8 +67,8 @@ DEFINE_PER_THREAD_MALLOCED_DATA(char *, subprogram_name);
 /* --version option output. */
 static char *program_version;

-/* 'true' if mlockall() succeeded. */
-static bool is_memory_locked = false;
+/* 'true' if mlockall() succeeded, but doesn't support ONFAULT. */
+static bool is_all_memory_locked = false;

 /* Buffer used by ovs_strerror() and ovs_format_message(). */
 DEFINE_STATIC_PER_THREAD_DATA(struct { char s[128]; },
@@ -102,15 +102,15 @@ ovs_assert_failure(const char *where, const char *function,
 }

 void
-set_memory_locked(void)
+set_all_memory_locked(void)
 {
-    is_memory_locked = true;
+    is_all_memory_locked = true;
 }

 bool
-memory_locked(void)
+memory_all_locked(void)
 {
-    return is_memory_locked;
+    return is_all_memory_locked;
 }

 void
4 changes: 2 additions & 2 deletions lib/util.h
@@ -156,8 +156,8 @@ void ctl_timeout_setup(unsigned int secs);

 void ovs_print_version(uint8_t min_ofp, uint8_t max_ofp);

-void set_memory_locked(void);
-bool memory_locked(void);
+void set_all_memory_locked(void);
+bool memory_all_locked(void);

 OVS_NO_RETURN void out_of_memory(void);

9 changes: 5 additions & 4 deletions vswitchd/ovs-vswitchd.8.in
@@ -68,10 +68,11 @@ load the Open vSwitch kernel module.
 .PP
 .SH OPTIONS
 .IP "\fB\-\-mlockall\fR"
-Causes \fBovs\-vswitchd\fR to call the \fBmlockall()\fR function, to
-attempt to lock all of its process memory into physical RAM,
-preventing the kernel from paging any of its memory to disk. This
-helps to avoid networking interruptions due to system memory pressure.
+Causes \fBovs\-vswitchd\fR to call the \fBmlockall()\fR function, to attempt to
+lock all of its process memory into physical RAM on page faults (on allocation,
+when running on Linux kernel 4.4 or older), preventing the kernel from paging
+any of its memory to disk. This helps to avoid networking interruptions due to
+system memory pressure.
 .IP
 Some systems do not support \fBmlockall()\fR at all, and other systems
 only allow privileged users, such as the superuser, to use it.
17 changes: 12 additions & 5 deletions vswitchd/ovs-vswitchd.c
@@ -56,7 +56,8 @@

 VLOG_DEFINE_THIS_MODULE(vswitchd);

-/* --mlockall: If set, locks all process memory into physical RAM, preventing
+/* --mlockall: If set, locks all present process memory pages into physical
+ * RAM and all the new pages the moment they are faulted in, preventing
  * the kernel from paging any of its memory to disk. */
 static bool want_mlockall;

@@ -96,10 +97,16 @@ main(int argc, char *argv[])

     if (want_mlockall) {
 #ifdef HAVE_MLOCKALL
-        if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
-            VLOG_ERR("mlockall failed: %s", ovs_strerror(errno));
-        } else {
-            set_memory_locked();
+/* MCL_ONFAULT introduced in Linux kernel 4.4. */
+#ifndef MCL_ONFAULT
+#define MCL_ONFAULT 4
+#endif
+        if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) {
+            if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
+                VLOG_ERR("mlockall failed: %s", ovs_strerror(errno));
+            } else {
+                set_all_memory_locked();
+            }
         }
 #else
         VLOG_ERR("mlockall not supported on this system");
