netdev-dpdk: add dpdk vhost-user ports
This patch adds support for a new port type to the userspace
datapath called dpdkvhostuser.

A new dpdkvhostuser port will create a unix domain socket which
when provided to QEMU is used to facilitate communication between
the virtio-net device on the VM and the OVS port on the host.

vhost-cuse ('dpdkvhost') ports are still available as 'dpdkvhostcuse'
ports and will be enabled if vhost-cuse support is detected in the
DPDK build specified during compilation of the switch. Otherwise,
vhost-user ports are enabled.

Signed-off-by: Ciara Loftus <[email protected]>
Acked-by: Flavio Leitner <[email protected]>
Signed-off-by: Pravin B Shelar <[email protected]>
cloftus authored and Pravin B Shelar committed Jun 15, 2015
1 parent 1c38055 commit 7d1ced0
Showing 5 changed files with 310 additions and 70 deletions.
194 changes: 160 additions & 34 deletions INSTALL.DPDK.md
@@ -16,7 +16,9 @@ OVS needs a system with 1GB hugepages support.
Building and Installing:
------------------------

Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
Required: DPDK 2.0
Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
on Debian/Ubuntu)

1. Configure build & install DPDK:
1. Set `$DPDK_DIR`
@@ -32,12 +34,9 @@ Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
`CONFIG_RTE_BUILD_COMBINE_LIBS=y`
Update `config/common_linuxapp` so that DPDK is built with vhost
libraries; currently, OVS only supports vhost-cuse, so DPDK vhost-user
libraries should be explicitly turned off (they are enabled by default
in DPDK 2.0).
libraries.
`CONFIG_RTE_LIBRTE_VHOST=y`
`CONFIG_RTE_LIBRTE_VHOST_USER=n`
Then run `make install` to build and install the library.
For default install without IVSHMEM:
@@ -316,40 +315,164 @@ the vswitchd.
DPDK vhost:
-----------
vhost-cuse is only supported at present i.e. not using the standard QEMU
vhost-user interface. It is intended that vhost-user support will be added
in future releases when supported in DPDK and that vhost-cuse will eventually
be deprecated. See [DPDK Docs] for more info on vhost.
DPDK 2.0 supports two types of vhost:
Prerequisites:
1. Insert the Cuse module:
1. vhost-user
2. vhost-cuse
`modprobe cuse`
OVS enables whichever type of vhost is enabled in the specified DPDK build.
By default, vhost-user is enabled in DPDK.
Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports
will be enabled in OVS.
Please note that support for vhost-cuse is intended to be deprecated in OVS
in a future release.
2. Build and insert the `eventfd_link` module:
DPDK vhost-user:
----------------
`cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
`make`
`insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
The following sections describe the use of vhost-user 'dpdkvhostuser' ports
with OVS.
Following the steps above to create a bridge, you can now add DPDK vhost
as a port to the vswitch.
DPDK vhost-user Prerequisites:
-------------------------
`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
1. DPDK 2.0 with vhost support enabled as documented in the "Building and
Installing" section
Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
2. QEMU version v2.1.0+
`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if providing
your VM with memory greater than 1GB due to potential issues with memory
mapping larger areas.
However, please note that when attaching userspace devices to QEMU, the
name provided during the add-port operation must match the ifname parameter
on the QEMU command line.
Adding DPDK vhost-user ports to the Switch:
--------------------------------------
Following the steps above to create a bridge, you can now add DPDK vhost-user
as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can
have arbitrary names.
DPDK vhost VM configuration:
----------------------------
- For vhost-user, the name of the port type is `dpdkvhostuser`
vhost ports use a Linux* character device to communicate with QEMU.
```
ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
type=dpdkvhostuser
```
This action creates a socket located at
`/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
to your VM on the QEMU command line. More instructions on this can be
found in the next section, "DPDK vhost-user VM configuration".
Note: If you wish for the vhost-user sockets to be created in a
directory other than `/usr/local/var/run/openvswitch`, you may specify
another location on the ovs-vswitchd command line like so:
`./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
DPDK vhost-user VM configuration:
---------------------------------
Follow the steps below to attach vhost-user port(s) to a VM.
1. Configure sockets.
Pass the following parameters to QEMU to attach a vhost-user device:
```
-chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
-netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
```
...where vhost-user-1 is the name of the vhost-user port added
to the switch.
Repeat the above parameters for multiple devices, changing the
chardev path and id as necessary. Note that a separate and different
chardev path needs to be specified for each vhost-user device. For
example, if you have a second vhost-user port named 'vhost-user-2', you
append an additional set of parameters to your QEMU command line:
```
-chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
-netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
-device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
```
2. Configure huge pages.
QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
a virtio-net device's virtual rings and packet buffers by mapping the VM's
physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
memory into their process address space, pass the following parameters
to QEMU:
```
-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
share=on
-numa node,memdev=mem -mem-prealloc
```
DPDK vhost-cuse:
----------------
The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports
with OVS.
DPDK vhost-cuse Prerequisites:
-------------------------
1. DPDK 2.0 with vhost support enabled as documented in the "Building and
Installing" section
As an additional step, you must enable vhost-cuse in DPDK by setting the
following additional flag in `config/common_linuxapp`:
`CONFIG_RTE_LIBRTE_VHOST_USER=n`
Following this, rebuild DPDK as per the instructions in the "Building and
Installing" section. Finally, rebuild OVS as per step 3 in the "Building
and Installing" section - OVS will detect that DPDK has vhost-cuse libraries
compiled and in turn will enable support for it in the switch and disable
vhost-user support.
2. Insert the Cuse module:
`modprobe cuse`
3. Build and insert the `eventfd_link` module:
```
cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
make
insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
```
4. QEMU version v2.1.0+
vhost-cuse will work with QEMU v2.1.0 and above; however, it is recommended to
use v2.2.0 if providing your VM with more than 1GB of memory, due to potential
issues with memory mapping larger areas.
Note: QEMU v1.6.2 will also work, with slightly different command line parameters,
which are specified later in this document.
Adding DPDK vhost-cuse ports to the Switch:
--------------------------------------
Following the steps above to create a bridge, you can now add DPDK vhost-cuse
as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports can have
arbitrary names.
- For vhost-cuse, the name of the port type is `dpdkvhostcuse`
```
ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
type=dpdkvhostcuse
```
When attaching vhost-cuse ports to QEMU, the name provided during the
add-port operation must match the ifname parameter on the QEMU command
line. More instructions on this can be found in the next section.
DPDK vhost-cuse VM configuration:
---------------------------------
vhost-cuse ports use a Linux* character device to communicate with QEMU.
By default it is set to `/dev/vhost-net`. It is possible to reuse this
standard device for DPDK vhost, which makes setup a little simpler but it
is better practice to specify an alternative character device in order to
@@ -415,16 +538,19 @@ DPDK vhost VM configuration:
QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
virtio-net device's virtual rings and packet buffers mapping the VM's
physical memory on hugetlbfs. To enable vhost-ports to map the VM's
memory into their process address space, pass the following paramters
memory into their process address space, pass the following parameters
to QEMU:
`-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
share=on -numa node,memdev=mem -mem-prealloc`
Note: For use with an earlier QEMU version such as v1.6.2, use the
following to configure hugepages instead:
DPDK vhost VM configuration with QEMU wrapper:
----------------------------------------------
`-mem-path /dev/hugepages -mem-prealloc`
DPDK vhost-cuse VM configuration with QEMU wrapper:
---------------------------------------------------
The QEMU wrapper script automatically detects and calls QEMU with the
necessary parameters. It performs the following actions:
@@ -450,8 +576,8 @@ qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
netdev=net1,mac=00:00:00:00:00:01
```
DPDK vhost VM configuration with libvirt:
-----------------------------------------
DPDK vhost-cuse VM configuration with libvirt:
----------------------------------------------
If you are using libvirt, you must enable libvirt to access the character
device by adding it to controllers cgroup for libvirtd using the following
@@ -525,7 +651,7 @@ Now you may launch your VM using virt-manager, or like so:
`virsh create my_vhost_vm.xml`
DPDK vhost VM configuration with libvirt and QEMU wrapper:
DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
----------------------------------------------------------
To use the qemu-wrapper script in conjunction with libvirt, follow the
@@ -553,7 +679,7 @@ steps in the previous section before proceeding with the following steps:
the correct emulator location and set any additional options. If you are
using a alternative character device name, please set "us_vhost_path" to the
location of that device. The script will automatically detect and insert
the correct "vhostfd" value in the QEMU command line arguements.
the correct "vhostfd" value in the QEMU command line arguments.
5. Use virt-manager to launch the VM
3 changes: 3 additions & 0 deletions acinclude.m4
@@ -220,6 +220,9 @@ AC_DEFUN([OVS_CHECK_DPDK], [
DPDK_vswitchd_LDFLAGS=-Wl,--whole-archive,$DPDK_LIB,--no-whole-archive
AC_SUBST([DPDK_vswitchd_LDFLAGS])
AC_DEFINE([DPDK_NETDEV], [1], [System uses the DPDK module.])
OVS_GREP_IFELSE([$RTE_SDK/include/rte_config.h], [define RTE_LIBRTE_VHOST_USER 1],
[], [AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse support enabled, vhost-user disabled.])])
else
RTE_SDK=
fi
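The effect of the `VHOST_CUSE` define added above can be illustrated with a minimal sketch. The helper names `vhost_port_type` and `is_vhost_type_enabled` are hypothetical and do not appear in the patch; only the macro name and the two port type strings come from the commit.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): returns the vhost port type
 * that a given build would register. VHOST_CUSE is the macro that configure
 * defines when rte_config.h lacks "define RTE_LIBRTE_VHOST_USER 1". */
static const char *
vhost_port_type(void)
{
#ifdef VHOST_CUSE
    return "dpdkvhostcuse";
#else
    return "dpdkvhostuser";
#endif
}

/* Hypothetical helper: returns 1 if 'type' is the vhost type enabled in
 * this build, 0 otherwise. */
static int
is_vhost_type_enabled(const char *type)
{
    assert(type != NULL);
    return strcmp(type, vhost_port_type()) == 0;
}
```

Compiled without `-DVHOST_CUSE`, the sketch reports `dpdkvhostuser`, which mirrors the commit message: vhost-user is the default unless vhost-cuse is detected in the DPDK build.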
161 changes: 129 additions & 32 deletions lib/netdev-dpdk.c
@@ -16,7 +16,6 @@

#include <config.h>

#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <stdlib.h>
@@ -26,8 +25,12 @@
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

#include "dirs.h"
#include "dp-packet.h"
#include "dpif-netdev.h"
#include "list.h"
@@ -90,8 +93,8 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
#define NIC_PORT_RX_Q_SIZE 2048 /* Size of Physical NIC RX Queue, Max (n+32<=4096)*/
#define NIC_PORT_TX_Q_SIZE 2048 /* Size of Physical NIC TX Queue, Max (n+32<=4096)*/

/* Character device cuse_dev_name. */
static char *cuse_dev_name = NULL;
char *cuse_dev_name = NULL; /* Character device cuse_dev_name. */
char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */

/*
* Maximum amount of time in micro seconds to try and enqueue to vhost.
@@ -126,7 +129,7 @@ enum { DRAIN_TSC = 200000ULL };

enum dpdk_dev_type {
DPDK_DEV_ETH = 0,
DPDK_DEV_VHOST = 1
DPDK_DEV_VHOST = 1,
};

static int rte_eal_init_ret = ENODEV;
@@ -221,6 +224,9 @@ struct netdev_dpdk {
/* virtio-net structure for vhost device */
OVSRCU_TYPE(struct virtio_net *) virtio_dev;

/* Identifier used to distinguish vhost devices from each other */
char vhost_id[PATH_MAX];

/* In dpdk_list. */
struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex);
};
@@ -594,21 +600,51 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
}

static int
netdev_dpdk_vhost_construct(struct netdev *netdev_)
vhost_construct_helper(struct netdev *netdev_)
{
struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
int err;

if (rte_eal_init_ret) {
return rte_eal_init_ret;
}

rte_spinlock_init(&netdev->vhost_tx_lock);
return netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST);
}

static int
netdev_dpdk_vhost_cuse_construct(struct netdev *netdev_)
{
struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
int err;

ovs_mutex_lock(&dpdk_mutex);
err = netdev_dpdk_init(netdev_, -1, DPDK_DEV_VHOST);
strncpy(netdev->vhost_id, netdev->up.name, sizeof(netdev->vhost_id));
err = vhost_construct_helper(netdev_);
ovs_mutex_unlock(&dpdk_mutex);
return err;
}

rte_spinlock_init(&netdev->vhost_tx_lock);
static int
netdev_dpdk_vhost_user_construct(struct netdev *netdev_)
{
struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
int err;

ovs_mutex_lock(&dpdk_mutex);
/* Take the name of the vhost-user port and append it to the location where
* the socket is to be created, then register the socket.
*/
snprintf(netdev->vhost_id, sizeof(netdev->vhost_id), "%s/%s",
vhost_sock_dir, netdev_->name);
err = rte_vhost_driver_register(netdev->vhost_id);
if (err) {
VLOG_ERR("vhost-user socket device setup failure for socket %s\n",
netdev->vhost_id);
}
VLOG_INFO("Socket %s created for vhost-user port %s\n", netdev->vhost_id, netdev_->name);
err = vhost_construct_helper(netdev_);
ovs_mutex_unlock(&dpdk_mutex);
return err;
}
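The socket path construction in `netdev_dpdk_vhost_user_construct` can be sketched in isolation as follows. The helper `build_vhost_sock_path` is hypothetical; the patch itself calls `snprintf` directly and does not check for truncation, so the return-value check here is an assumption about what a caller might want, not the patch's behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of the patch): writes "<dir>/<name>" into
 * 'buf'. Returns 0 on success, or -1 if formatting failed or the result
 * would not fit in 'size' bytes including the terminating NUL. */
static int
build_vhost_sock_path(char *buf, size_t size, const char *dir,
                      const char *name)
{
    int n;

    assert(buf != NULL && dir != NULL && name != NULL);
    n = snprintf(buf, size, "%s/%s", dir, name);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

A caller could pass a `PATH_MAX`-sized buffer, as `vhost_id` is declared in the patch, and skip socket registration when the helper returns -1.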

@@ -1607,7 +1643,7 @@ new_device(struct virtio_net *dev)
ovs_mutex_lock(&dpdk_mutex);
/* Add device to the vhost port with the same name as that passed down. */
LIST_FOR_EACH(netdev, list_node, &dpdk_list) {
if (strncmp(dev->ifname, netdev->up.name, IFNAMSIZ) == 0) {
if (strncmp(dev->ifname, netdev->vhost_id, IF_NAME_SZ) == 0) {
ovs_mutex_lock(&netdev->mutex);
ovsrcu_set(&netdev->virtio_dev, dev);
ovs_mutex_unlock(&netdev->mutex);
@@ -1687,7 +1723,7 @@ static const struct virtio_net_device_ops virtio_net_device_ops =
};

static void *
start_cuse_session_loop(void *dummy OVS_UNUSED)
start_vhost_loop(void *dummy OVS_UNUSED)
{
pthread_detach(pthread_self());
/* Put the cuse thread into quiescent state. */
@@ -1698,10 +1734,17 @@ start_cuse_session_loop(void *dummy OVS_UNUSED)

static int
dpdk_vhost_class_init(void)
{
rte_vhost_driver_callback_register(&virtio_net_device_ops);
ovs_thread_create("vhost_thread", start_vhost_loop, NULL);
return 0;
}

static int
dpdk_vhost_cuse_class_init(void)
{
int err = -1;

rte_vhost_driver_callback_register(&virtio_net_device_ops);

/* Register CUSE device to handle IOCTLs.
* Unless otherwise specified on the vswitchd command line, cuse_dev_name
@@ -1714,7 +1757,14 @@ dpdk_vhost_class_init(void)
return -1;
}

ovs_thread_create("cuse_thread", start_cuse_session_loop, NULL);
dpdk_vhost_class_init();
return 0;
}

static int
dpdk_vhost_user_class_init(void)
{
dpdk_vhost_class_init();
return 0;
}

@@ -1923,6 +1973,33 @@ netdev_dpdk_ring_construct(struct netdev *netdev)
NULL, /* rxq_drain */ \
}

static int
process_vhost_flags(char *flag, char *default_val, int size,
char **argv, char **new_val)
{
int changed = 0;

/* Depending on which version of vhost is in use, process the vhost-specific
* flag if it is provided on the vswitchd command line, otherwise resort to
* a default value.
*
* For vhost-user: Process "-vhost_sock_dir" to set the custom location of
* the vhost-user socket(s).
* For vhost-cuse: Process "-cuse_dev_name" to set the custom name of the
* vhost-cuse character device.
*/
if (!strcmp(argv[1], flag) && (strlen(argv[2]) <= size)) {
changed = 1;
*new_val = strdup(argv[2]);
VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
} else {
VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
*new_val = default_val;
}

return changed;
}
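The flag handling above reads `argv[1]` and `argv[2]` without consulting `argc`. A minimal sketch of the same decision with an explicit bounds check might look like this; `pick_vhost_flag` and its signature are hypothetical, and unlike the patch it returns a pointer into `argv` rather than a `strdup` copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of the flag check (not part of the patch).
 * If argv[1] equals 'flag' and argv[2] exists and is at most 'max_len'
 * characters long, sets *changed to 1 and returns argv[2]. Otherwise sets
 * *changed to 0 and returns 'default_val'. */
static const char *
pick_vhost_flag(int argc, char **argv, const char *flag,
                const char *default_val, size_t max_len, int *changed)
{
    assert(argv != NULL && flag != NULL && changed != NULL);
    *changed = 0;
    if (argc > 2 && !strcmp(argv[1], flag) && strlen(argv[2]) <= max_len) {
        *changed = 1;
        return argv[2];
    }
    return default_val;
}
```

As in `dpdk_init`, the sketch assumes `argv[0]` is `--dpdk` and the optional vhost flag comes immediately after it.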

int
dpdk_init(int argc, char **argv)
{
@@ -1937,27 +2014,29 @@ dpdk_init(int argc, char **argv)
argc--;
argv++;

/* If the cuse_dev_name parameter has been provided, set 'cuse_dev_name' to
* this string if it meets the correct criteria. Otherwise, set it to the
* default (vhost-net).
*/
if (!strcmp(argv[1], "--cuse_dev_name") &&
(strlen(argv[2]) <= NAME_MAX)) {

cuse_dev_name = strdup(argv[2]);
#ifdef VHOST_CUSE
if (process_vhost_flags("-cuse_dev_name", strdup("vhost-net"),
PATH_MAX, argv, &cuse_dev_name)) {
#else
if (process_vhost_flags("-vhost_sock_dir", strdup(ovs_rundir()),
NAME_MAX, argv, &vhost_sock_dir)) {
struct stat s;
int err;

/* Remove the cuse_dev_name configuration parameters from the argument
err = stat(vhost_sock_dir, &s);
if (err) {
VLOG_ERR("vHostUser socket DIR '%s' does not exist.",
vhost_sock_dir);
return err;
}
#endif
/* Remove the vhost flag configuration parameters from the argument
* list, so that the correct elements are passed to the DPDK
* initialization function
*/
argc -= 2;
argv += 2; /* Increment by two to bypass the cuse_dev_name arguments */
argv += 2; /* Increment by two to bypass the vhost flag arguments */
base = 2;

VLOG_ERR("User-provided cuse_dev_name in use: /dev/%s", cuse_dev_name);
} else {
cuse_dev_name = "vhost-net";
VLOG_INFO("No cuse_dev_name provided - defaulting to /dev/vhost-net");
}

/* Keep the program name argument as this is needed for call to
@@ -2012,11 +2091,25 @@ static const struct netdev_class dpdk_ring_class =
netdev_dpdk_get_status,
netdev_dpdk_rxq_recv);

static const struct netdev_class dpdk_vhost_class =
static const struct netdev_class dpdk_vhost_cuse_class =
NETDEV_DPDK_CLASS(
"dpdkvhost",
dpdk_vhost_class_init,
netdev_dpdk_vhost_construct,
"dpdkvhostcuse",
dpdk_vhost_cuse_class_init,
netdev_dpdk_vhost_cuse_construct,
netdev_dpdk_vhost_destruct,
netdev_dpdk_vhost_set_multiq,
netdev_dpdk_vhost_send,
netdev_dpdk_vhost_get_carrier,
netdev_dpdk_vhost_get_stats,
NULL,
NULL,
netdev_dpdk_vhost_rxq_recv);

const struct netdev_class dpdk_vhost_user_class =
NETDEV_DPDK_CLASS(
"dpdkvhostuser",
dpdk_vhost_user_class_init,
netdev_dpdk_vhost_user_construct,
netdev_dpdk_vhost_destruct,
netdev_dpdk_vhost_set_multiq,
netdev_dpdk_vhost_send,
@@ -2039,7 +2132,11 @@ netdev_dpdk_register(void)
dpdk_common_init();
netdev_register_provider(&dpdk_class);
netdev_register_provider(&dpdk_ring_class);
netdev_register_provider(&dpdk_vhost_class);
#ifdef VHOST_CUSE
netdev_register_provider(&dpdk_vhost_cuse_class);
#else
netdev_register_provider(&dpdk_vhost_user_class);
#endif
ovsthread_once_done(&once);
}
}
3 changes: 2 additions & 1 deletion lib/netdev.c
@@ -111,7 +111,8 @@ netdev_is_pmd(const struct netdev *netdev)
{
return (!strcmp(netdev->netdev_class->type, "dpdk") ||
!strcmp(netdev->netdev_class->type, "dpdkr") ||
!strcmp(netdev->netdev_class->type, "dpdkvhost"));
!strcmp(netdev->netdev_class->type, "dpdkvhostcuse") ||
!strcmp(netdev->netdev_class->type, "dpdkvhostuser"));
}
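With four type names now compared in `netdev_is_pmd`, the same test could also be written against a table. The following is only an illustrative sketch, not what the patch does; `pmd_types` and `type_is_pmd` are hypothetical names, and the strings are the ones from the hunk above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Port types treated as poll-mode drivers after this patch. */
static const char *const pmd_types[] = {
    "dpdk", "dpdkr", "dpdkvhostcuse", "dpdkvhostuser",
};

/* Hypothetical helper (not part of the patch): returns 1 if 'type' names
 * a PMD port type, 0 otherwise. */
static int
type_is_pmd(const char *type)
{
    size_t i;

    assert(type != NULL);
    for (i = 0; i < sizeof pmd_types / sizeof pmd_types[0]; i++) {
        if (!strcmp(type, pmd_types[i])) {
            return 1;
        }
    }
    return 0;
}
```

Note that the old type name `dpdkvhost` is no longer matched, which is consistent with the rename to `dpdkvhostcuse` in this commit.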

static void
19 changes: 16 additions & 3 deletions vswitchd/ovs-vswitchd.c
@@ -72,6 +72,10 @@ main(int argc, char *argv[])

set_program_name(argv[0]);
retval = dpdk_init(argc,argv);
if (retval < 0) {
return retval;
}

argc -= retval;
argv += retval;

@@ -252,9 +256,18 @@ usage(void)
daemon_usage();
vlog_usage();
printf("\nDPDK options:\n"
" --dpdk options Initialize DPDK datapath.\n"
" --cuse_dev_name BASENAME override default character device name\n"
" for use with userspace vHost.\n");
" --dpdk [VHOST] [DPDK] Initialize DPDK datapath.\n"
" where DPDK are options for initializing DPDK lib and VHOST is\n"
#ifdef VHOST_CUSE
" option to override default character device name\n"
" for use with userspace vHost.\n"
" -cuse_dev_name NAME\n"
#else
" option to override default directory where vhost-user\n"
" sockets are created.\n"
" -vhost_sock_dir DIR\n"
#endif
);
printf("\nOther options:\n"
" --unixctl=SOCKET override default control socket name\n"
" -h, --help display this help message\n"
