Skip to content

Commit

Permalink
dpdk: Support both shared and per port mempools.
Browse files Browse the repository at this point in the history
This commit re-introduces the concept of shared mempools as the default
memory model for DPDK devices. Per port mempools are still available but
must be enabled explicitly by a user.

OVS previously used a shared mempool model for ports with the same MTU
and socket configuration. This was replaced by a per port mempool model
to address issues flagged by users such as:

https://mail.openvswitch.org/pipermail/ovs-discuss/2016-September/042560.html

However the per port model potentially requires an increase in memory
resource requirements to support the same number of ports and configuration
as the shared port model.

This is considered a blocking factor for current deployments of OVS
when upgrading to future OVS releases as a user may have to redimension
memory for the same deployment configuration. This may not be possible for
users.

This commit resolves the issue by re-introducing shared mempools as
the default memory behaviour in OVS DPDK but also refactors the memory
configuration code to allow for per port mempools.

This patch adds a new global config option, per-port-memory, that
controls the enablement of per port mempools for DPDK devices.

    ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true

This value defaults to false; to enable per port memory support,
this field should be set to true when setting other global parameters
on init (such as "dpdk-socket-mem", for example). Changing the value at
runtime is not supported, and requires restarting the vswitch
daemon.

The mempool sweep functionality is also replaced with the
sweep functionality from OVS 2.9 found in commits

c77f692 (netdev-dpdk: Free mempool only when no in-use mbufs.)
a7fb0a4 (netdev-dpdk: Add mempool reuse/free debug.)

A new document to discuss the specifics of the memory models and example
memory requirement calculations is also added.

Signed-off-by: Ian Stokes <[email protected]>
Acked-by: Kevin Traynor <[email protected]>
Acked-by: Tiago Lam <[email protected]>
Tested-by: Tiago Lam <[email protected]>
  • Loading branch information
istokes committed Jul 6, 2018
1 parent c3c722d commit 43307ad
Show file tree
Hide file tree
Showing 10 changed files with 457 additions and 98 deletions.
1 change: 1 addition & 0 deletions Documentation/automake.mk
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ DOC_SOURCE = \
Documentation/topics/dpdk/index.rst \
Documentation/topics/dpdk/bridge.rst \
Documentation/topics/dpdk/jumbo-frames.rst \
Documentation/topics/dpdk/memory.rst \
Documentation/topics/dpdk/pdump.rst \
Documentation/topics/dpdk/phy.rst \
Documentation/topics/dpdk/pmd.rst \
Expand Down
6 changes: 6 additions & 0 deletions Documentation/intro/install/dpdk.rst
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,12 @@ Mount the hugepages, if not already mounted by default::

$ mount -t hugetlbfs none /dev/hugepages``

.. note::

The amount of hugepage memory required can be affected by various
aspects of the datapath and device configuration. Refer to
:doc:`/topics/dpdk/memory` for more details.

.. _dpdk-vfio:

Setup DPDK devices using VFIO
Expand Down
1 change: 1 addition & 0 deletions Documentation/topics/dpdk/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -40,3 +40,4 @@ The DPDK Datapath
/topics/dpdk/qos
/topics/dpdk/pdump
/topics/dpdk/jumbo-frames
/topics/dpdk/memory
216 changes: 216 additions & 0 deletions Documentation/topics/dpdk/memory.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,216 @@
..
Copyright (c) 2018 Intel Corporation
Licensed under the Apache License, Version 2.0 (the "License"); you may
not use this file except in compliance with the License. You may obtain
a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations
under the License.

Convention for heading levels in Open vSwitch documentation:

======= Heading 0 (reserved for the title in a document)
------- Heading 1
~~~~~~~ Heading 2
+++++++ Heading 3
''''''' Heading 4

Avoid deeper levels because they do not render well.

=========================
DPDK Device Memory Models
=========================

DPDK device memory can be allocated in one of two ways in OVS DPDK,
**shared memory** or **per port memory**. The specifics of both are
detailed below.

Shared Memory
-------------

By default OVS DPDK uses a shared memory model. This means that multiple
ports can share the same mempool. For example when a port is added it will
have a given MTU and socket ID associated with it. If a mempool has been
created previously for an existing port that has the same MTU and socket ID,
that mempool is used for both ports. If there is no existing mempool
supporting these parameters then a new mempool is created.

Per Port Memory
---------------

In the per port memory model, mempools are created per device and are not
shared. The benefit of this is a more transparent memory model where mempools
will not be exhausted by other DPDK devices. However this comes at a potential
increase in cost for memory dimensioning for a given deployment. Users should
be aware of the memory requirements for their deployment before using this
model and allocate the required hugepage memory.

Per port mempool support may be enabled via a global config value,
```per-port-memory```. Setting this to true enables the per port memory
model for all DPDK devices in OVS::

$ ovs-vsctl set Open_vSwitch . other_config:per-port-memory=true

.. important::

This value should be set before setting dpdk-init=true. If set after
dpdk-init=true then the daemon must be restarted to use per-port-memory.

Calculating Memory Requirements
-------------------------------

The amount of memory required for a given mempool can be calculated by the
**number mbufs in the mempool \* mbuf size**.

Users should be aware of the following:

* The **number of mbufs** per mempool will differ between memory models.

* The **size of each mbuf** will be affected by the requested **MTU** size.

.. important::

An mbuf size in bytes is always larger than the requested MTU size due to
alignment and rounding needed in OVS DPDK.

Below are a number of examples of memory requirement calculations for both
shared and per port memory models.

Shared Memory Calculations
~~~~~~~~~~~~~~~~~~~~~~~~~~

In the shared memory model the number of mbufs requested is directly
affected by the requested MTU size as described in the table below.

+--------------------+-------------+
| MTU Size | Num MBUFS |
+====================+=============+
| 1500 or greater | 262144 |
+--------------------+-------------+
| Less than 1500 | 16384 |
+------------+-------+-------------+

.. Important::

If a deployment does not have enough memory to provide 262144 mbufs then
the requested amount is halved up until 16384.

Example 1
+++++++++
::

MTU = 1500 Bytes
Number of mbufs = 262144
Mbuf size = 3008 Bytes
Memory required = 262144 * 3008 = 788 MB

Example 2
+++++++++
::

MTU = 1800 Bytes
Number of mbufs = 262144
Mbuf size = 3008 Bytes
Memory required = 262144 * 3008 = 788 MB

.. note::

Assuming the same socket is in use for example 1 and 2 the same mempool
would be shared.

Example 3
+++++++++
::

MTU = 6000 Bytes
Number of mbufs = 262144
Mbuf size = 8128 Bytes
Memory required = 262144 * 8128 = 2130 MB

Example 4
+++++++++
::

MTU = 9000 Bytes
Number of mbufs = 262144
Mbuf size = 10176 Bytes
Memory required = 262144 * 10176 = 2667 MB

Per Port Memory Calculations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The number of mbufs requested in the per port model is more complicated and
accounts for multiple dynamic factors in the datapath and device
configuration.

A rough estimation of the number of mbufs required for a port is:
::

packets required to fill the device rxqs +
packets that could be stuck on other ports txqs +
packets on the pmd threads +
additional corner case memory.

The algorithm in OVS used to calculate this is as follows:
::

requested number of rxqs * requested rxq size +
requested number of txqs * requested txq size +
min(RTE_MAX_LCORE, requested number of rxqs) * netdev_max_burst +
MIN_NB_MBUF.

where:

* **requested number of rxqs**: Number of requested receive queues for a
device.
* **requested rxq size**: The number of descriptors requested for a rx queue.
* **requested number of txqs**: Number of requested transmit queues for a
device. Calculated as the number of PMDs configured +1.
* **requested txq size**: the number of descriptors requested for a tx queue.
* **min(RTE_MAX_LCORE, requested number of rxqs)**: Compare the maximum
number of lcores supported by DPDK to the number of requested receive
queues for the device and use the variable of lesser value.
* **NETDEV_MAX_BURST**: Maximum number of of packets in a burst, defined as
32.
* **MIN_NB_MBUF**: Additional memory for corner case, defined as 16384.

For all examples below assume the following values:

* requested_rxq_size = 2048
* requested_txq_size = 2048
* RTE_MAX_LCORE = 128
* netdev_max_burst = 32
* MIN_NB_MBUF = 16384

Example 1: (1 rxq, 1 PMD, 1500 MTU)
+++++++++++++++++++++++++++++++++++
::

MTU = 1500
Number of mbufs = (1 * 2048) + (2 * 2048) + (1 * 32) + (16384) = 22560
Mbuf size = 3008 Bytes
Memory required = 22560 * 3008 = 67 MB

Example 2: (1 rxq, 2 PMD, 6000 MTU)
+++++++++++++++++++++++++++++++++++
::

MTU = 6000
Number of mbufs = (1 * 2048) + (3 * 2048) + (1 * 32) + (16384) = 24608
Mbuf size = 8128 Bytes
Memory required = 24608 * 8128 = 200 MB

Example 3: (2 rxq, 2 PMD, 9000 MTU)
+++++++++++++++++++++++++++++++++++
::

MTU = 9000
Number of mbufs = (2 * 2048) + (3 * 2048) + (1 * 32) + (16384) = 26656
Mbuf size = 10176 Bytes
Memory required = 26656 * 10176 = 271 MB
1 change: 1 addition & 0 deletions NEWS
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ Post-v2.9.0
* Add LSC interrupt support for DPDK physical devices.
* Allow init to fail and record DPDK status/version in OVS database.
* Add experimental flow hardware offload support
* Support both shared and per port mempools for DPDK devices.
- Userspace datapath:
* Commands ovs-appctl dpif-netdev/pmd-*-show can now work on a single PMD
* Detailed PMD performance metrics available with new command
Expand Down
6 changes: 6 additions & 0 deletions lib/dpdk-stub.c
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,12 @@ dpdk_vhost_iommu_enabled(void)
return false;
}

bool
dpdk_per_port_memory(void)
{
return false;
}

void
print_dpdk_version(void)
{
Expand Down
12 changes: 12 additions & 0 deletions lib/dpdk.c
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,7 @@ static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */
static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU support */
static bool dpdk_initialized = false; /* Indicates successful initialization
* of DPDK. */
static bool per_port_memory = false; /* Status of per port memory support */

static int
process_vhost_flags(char *flag, const char *default_val, int size,
Expand Down Expand Up @@ -384,6 +385,11 @@ dpdk_init__(const struct smap *ovs_other_config)
VLOG_INFO("IOMMU support for vhost-user-client %s.",
vhost_iommu_enabled ? "enabled" : "disabled");

per_port_memory = smap_get_bool(ovs_other_config,
"per-port-memory", false);
VLOG_INFO("Per port memory for DPDK devices %s.",
per_port_memory ? "enabled" : "disabled");

argv = grow_argv(&argv, 0, 1);
argc = 1;
argv[0] = xstrdup(ovs_get_program_name());
Expand Down Expand Up @@ -541,6 +547,12 @@ dpdk_vhost_iommu_enabled(void)
return vhost_iommu_enabled;
}

bool
dpdk_per_port_memory(void)
{
return per_port_memory;
}

void
dpdk_set_lcore_id(unsigned cpu)
{
Expand Down
1 change: 1 addition & 0 deletions lib/dpdk.h
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ void dpdk_init(const struct smap *ovs_other_config);
void dpdk_set_lcore_id(unsigned cpu);
const char *dpdk_get_vhost_sock_dir(void);
bool dpdk_vhost_iommu_enabled(void);
bool dpdk_per_port_memory(void);
void print_dpdk_version(void);
void dpdk_status(const struct ovsrec_open_vswitch *);
#endif /* dpdk.h */
Loading

0 comments on commit 43307ad

Please sign in to comment.