Merge remote-tracking branch 'origin/master' into ovn4
Justin Pettit committed Jun 19, 2015
2 parents d7c3b1d + 5262eea commit 2d34dbd
Showing 42 changed files with 910 additions and 386 deletions.
1 change: 1 addition & 0 deletions AUTHORS
@@ -167,6 +167,7 @@ SUGYO Kazushi [email protected]
Tadaaki Nagao [email protected]
Terry Wilson [email protected]
Tetsuo NAKAGAWA [email protected]
Thadeu Lima de Souza Cascardo [email protected]
Thomas F. Herbert [email protected]
Thomas Goirand [email protected]
Thomas Graf [email protected]
53 changes: 51 additions & 2 deletions DESIGN.md
@@ -280,11 +280,60 @@ OpenFlow 1.4
------------

OpenFlow 1.4 adds the "importance" field to flow_mods, but it does not
explicitly specify which kinds of flow_mods set the importance.For
explicitly specify which kinds of flow_mods set the importance. For
consistency, Open vSwitch uses the same rule for importance as for
idle_timeout and hard_timeout, that is, only an "ADD" flow_mod sets
the importance. (This issue has been filed with the ONF as EXT-496.)


OpenFlow 1.4 Bundles
====================

Open vSwitch makes all flow table modifications atomically, i.e., any
datapath packet only sees flow table configurations either before or
after any change made by any flow_mod. For example, if a controller
removes all flows with a single OpenFlow "flow_mod", no packet sees an
intermediate version of the OpenFlow pipeline where only some of the
flows have been deleted.

It should be noted that Open vSwitch caches datapath flows, and that
the cached flows are NOT flushed immediately when a flow table
changes. Instead, the datapath flows are revalidated against the new
flow table as soon as possible, and usually within one second of the
modification. This design amortizes the cost of datapath cache
flushing across multiple flow table changes, and has a significant
performance effect during simultaneous heavy flow table churn and high
traffic load. This means that different cached datapath flows may
have been computed based on different flow table configurations, but
each of the datapath flows is guaranteed to have been computed over a
coherent view of the flow tables, as described above.

With OpenFlow 1.4 bundles this atomicity can be extended across an
arbitrary set of flow_mods. Bundles are supported for flow_mod and
port_mod messages only. For flow_mods, both 'atomic' and 'ordered'
bundle flags are trivially supported, as all bundled messages are
executed in the order they were added and all flow table modifications
are now atomic to the datapath. Port mods may not appear in atomic
bundles, as port status modifications are not atomic.

To support bundles, ovs-ofctl has a '--bundle' option that makes the
flow mod commands ('add-flow', 'add-flows', 'mod-flows', 'del-flows',
and 'replace-flows') use an OpenFlow 1.4 bundle to apply the
modifications as a single atomic transaction. If any of the flow mods
in a transaction fail, none of them are executed. All flow mods in a
bundle appear to datapath lookups simultaneously.

Furthermore, ovs-ofctl 'add-flow' and 'add-flows' commands now accept
arbitrary flow mods as an input by allowing the flow specification to
start with an explicit 'add', 'modify', 'modify_strict', 'delete', or
'delete_strict' keyword. A missing keyword is treated as 'add', so
this is fully backwards compatible. With the new '--bundle' option
all the flow mods are executed as a single atomic transaction using an
OpenFlow 1.4 bundle. Without the '--bundle' option the flow mods are
executed in order up to the first failing flow_mod, and in case of an
error the earlier successful flow_mods are not rolled back.
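
The difference between the two modes described above can be illustrated with a small toy model. This is only a sketch: `FlowTable`, `apply_in_order`, and `apply_as_bundle` are hypothetical names, not Open vSwitch code, and a "failing" flow mod is simplified to one with an unknown command keyword.

```python
# Toy model only: these classes and functions are hypothetical, not OVS code.
import copy


class FlowModError(Exception):
    """Raised when a single flow mod cannot be applied."""


class FlowTable:
    def __init__(self):
        self.flows = {}  # match string -> actions string

    def apply(self, mod):
        """Apply one (command, match, actions) flow mod."""
        command, match, actions = mod
        if command == "add":
            self.flows[match] = actions
        elif command == "modify_strict":
            if match in self.flows:
                self.flows[match] = actions
        elif command == "delete_strict":
            self.flows.pop(match, None)
        else:
            # Simplification: an unknown keyword stands in for any failure.
            raise FlowModError("unsupported command: %s" % command)


def apply_in_order(table, mods):
    """Without --bundle: stop at the first failure, keep earlier changes."""
    for mod in mods:
        table.apply(mod)


def apply_as_bundle(table, mods):
    """With --bundle: stage every mod on a copy, commit only if all succeed."""
    staged = copy.deepcopy(table)
    for mod in mods:
        staged.apply(mod)
    table.flows = staged.flows  # commit: the new version appears at once


mods = [
    ("add", "in_port=1", "output:2"),
    ("bogus", "in_port=2", "output:3"),  # this one fails
    ("add", "in_port=3", "output:4"),
]

plain = FlowTable()
try:
    apply_in_order(plain, mods)
except FlowModError:
    pass
print(plain.flows)    # {'in_port=1': 'output:2'}

bundled = FlowTable()
try:
    apply_as_bundle(bundled, mods)
except FlowModError:
    pass
print(bundled.flows)  # {}
```

Staging on a copy and swapping it in is just one convenient way to get all-or-nothing behavior in a toy; it says nothing about how Open vSwitch implements bundles internally.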


OFPT_PACKET_IN
==============

@@ -844,7 +893,7 @@ not know the MAC address of the local port that is sending the traffic
or the MAC address of the remote in the guest VM.

With a few notable exceptions below, in-band should work in most
network setups. The following are considered "supported' in the
network setups. The following are considered "supported" in the
current implementation:

- Locally Connected. The switch and remote are on the same
194 changes: 160 additions & 34 deletions INSTALL.DPDK.md
@@ -16,7 +16,9 @@ OVS needs a system with 1GB hugepages support.
Building and Installing:
------------------------

Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
Required: DPDK 2.0
Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
on Debian/Ubuntu)

1. Configure build & install DPDK:
1. Set `$DPDK_DIR`
@@ -32,12 +34,9 @@ Required DPDK 2.0, `fuse`, `fuse-devel` (`libfuse-dev` on Debian/Ubuntu)
`CONFIG_RTE_BUILD_COMBINE_LIBS=y`
Update `config/common_linuxapp` so that DPDK is built with vhost
libraries; currently, OVS only supports vhost-cuse, so DPDK vhost-user
libraries should be explicitly turned off (they are enabled by default
in DPDK 2.0).
libraries.
`CONFIG_RTE_LIBRTE_VHOST=y`
`CONFIG_RTE_LIBRTE_VHOST_USER=n`
Then run `make install` to build and install the library.
For default install without IVSHMEM:
@@ -316,40 +315,164 @@ the vswitchd.
DPDK vhost:
-----------
vhost-cuse is only supported at present i.e. not using the standard QEMU
vhost-user interface. It is intended that vhost-user support will be added
in future releases when supported in DPDK and that vhost-cuse will eventually
be deprecated. See [DPDK Docs] for more info on vhost.
DPDK 2.0 supports two types of vhost:
Prerequisites:
1. Insert the Cuse module:
1. vhost-user
2. vhost-cuse
`modprobe cuse`
The type of vhost enabled in the specified DPDK build is the type that will
be enabled in OVS. By default, vhost-user is enabled in DPDK. Therefore,
unless vhost-cuse has been enabled in DPDK, vhost-user ports will be
enabled in OVS.
Please note that support for vhost-cuse is intended to be deprecated in OVS
in a future release.
2. Build and insert the `eventfd_link` module:
DPDK vhost-user:
----------------
`cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
`make`
`insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
The following sections describe the use of vhost-user 'dpdkvhostuser' ports
with OVS.
Following the steps above to create a bridge, you can now add DPDK vhost
as a port to the vswitch.
DPDK vhost-user Prerequisites:
-------------------------
`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
1. DPDK 2.0 with vhost support enabled as documented in the "Building and
Installing" section
Unlike DPDK ring ports, DPDK vhost ports can have arbitrary names:
2. QEMU version v2.1.0+
`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if providing
your VM with memory greater than 1GB due to potential issues with memory
mapping larger areas.
However, please note that when attaching userspace devices to QEMU, the
name provided during the add-port operation must match the ifname parameter
on the QEMU command line.
Adding DPDK vhost-user ports to the Switch:
--------------------------------------
Following the steps above to create a bridge, you can now add DPDK vhost-user
as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can
have arbitrary names.
DPDK vhost VM configuration:
----------------------------
- For vhost-user, the name of the port type is `dpdkvhostuser`
vhost ports use a Linux* character device to communicate with QEMU.
```
ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
type=dpdkvhostuser
```
This action creates a socket located at
`/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
to your VM on the QEMU command line. More instructions on this can be
found in the next section, "DPDK vhost-user VM configuration".
Note: If you wish for the vhost-user sockets to be created in a
directory other than `/usr/local/var/run/openvswitch`, you may specify
another location on the ovs-vswitchd command line like so:
`./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
DPDK vhost-user VM configuration:
---------------------------------
Follow the steps below to attach vhost-user port(s) to a VM.
1. Configure sockets.
Pass the following parameters to QEMU to attach a vhost-user device:
```
-chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
-netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
```
...where vhost-user-1 is the name of the vhost-user port added
to the switch.
Repeat the above parameters for multiple devices, changing the
chardev path and id as necessary. Note that a separate and different
chardev path needs to be specified for each vhost-user device. For
example, if you have a second vhost-user port named 'vhost-user-2', you
append an additional set of parameters to your QEMU command line:
```
-chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
-netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
-device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
```
2. Configure huge pages.
QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports access
a virtio-net device's virtual rings and packet buffers by mapping the VM's
physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
memory into their process address space, pass the following parameters
to QEMU:
```
-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
share=on
-numa node,memdev=mem -mem-prealloc
```
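
The per-port QEMU parameters in step 1 follow a regular pattern, so they can be generated rather than typed by hand. The sketch below is an illustration only: `vhost_user_args` is a hypothetical helper, and continuing the `charN`/`mynetN` ids and the `00:00:00:00:00:NN` MAC addresses beyond the two ports shown above is an assumption.

```python
# Hypothetical helper; only the argument strings themselves come from the
# text above. The numbering scheme for more than two ports is an assumption.
SOCK_DIR = "/usr/local/var/run/openvswitch"  # default socket directory


def vhost_user_args(port_names, sock_dir=SOCK_DIR):
    """Build the QEMU arguments for a list of vhost-user port names."""
    args = []
    for index, name in enumerate(port_names, start=1):
        args += [
            # One chardev per port, pointing at that port's own socket.
            "-chardev",
            "socket,id=char%d,path=%s/%s" % (index, sock_dir, name),
            "-netdev",
            "type=vhost-user,id=mynet%d,chardev=char%d,vhostforce"
            % (index, index),
            # Last MAC octet is the port index in hex (assumes < 256 ports).
            "-device",
            "virtio-net-pci,mac=00:00:00:00:00:%02x,netdev=mynet%d"
            % (index, index),
        ]
    return args


if __name__ == "__main__":
    print(" ".join(vhost_user_args(["vhost-user-1", "vhost-user-2"])))
```

Each name passed in must match the port name used when the port was added to the switch, and a different `sock_dir` is needed if ovs-vswitchd was started with a non-default socket directory.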
DPDK vhost-cuse:
----------------
The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports
with OVS.
DPDK vhost-cuse Prerequisites:
-------------------------
1. DPDK 2.0 with vhost support enabled as documented in the "Building and
Installing" section
As an additional step, you must enable vhost-cuse in DPDK by setting the
following additional flag in `config/common_linuxapp`:
`CONFIG_RTE_LIBRTE_VHOST_USER=n`
Following this, rebuild DPDK as per the instructions in the "Building and
Installing" section. Finally, rebuild OVS as per step 3 in the "Building
and Installing" section - OVS will detect that DPDK has vhost-cuse libraries
compiled and in turn will enable support for it in the switch and disable
vhost-user support.
2. Insert the Cuse module:
`modprobe cuse`
3. Build and insert the `eventfd_link` module:
```
cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
make
insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
```
4. QEMU version v2.1.0+
vhost-cuse will work with QEMU v2.1.0 and above; however, it is recommended to
use v2.2.0 if providing your VM with memory greater than 1GB due to potential
issues with memory mapping larger areas.
Note: QEMU v1.6.2 will also work, with slightly different command line parameters,
which are specified later in this document.
Adding DPDK vhost-cuse ports to the Switch:
--------------------------------------
Following the steps above to create a bridge, you can now add DPDK vhost-cuse
as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports can have
arbitrary names.
- For vhost-cuse, the name of the port type is `dpdkvhostcuse`
```
ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
type=dpdkvhostcuse
```
When attaching vhost-cuse ports to QEMU, the name provided during the
add-port operation must match the ifname parameter on the QEMU command
line. More instructions on this can be found in the next section.
DPDK vhost-cuse VM configuration:
---------------------------------
vhost-cuse ports use a Linux* character device to communicate with QEMU.
By default it is set to `/dev/vhost-net`. It is possible to reuse this
standard device for DPDK vhost, which makes setup a little simpler but it
is better practice to specify an alternative character device in order to
@@ -415,16 +538,19 @@ DPDK vhost VM configuration:
QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
virtio-net device's virtual rings and packet buffers mapping the VM's
physical memory on hugetlbfs. To enable vhost-ports to map the VM's
memory into their process address space, pass the following paramters
memory into their process address space, pass the following parameters
to QEMU:
`-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
share=on -numa node,memdev=mem -mem-prealloc`
Note: For use with an earlier QEMU version such as v1.6.2, use the
following to configure hugepages instead:
DPDK vhost VM configuration with QEMU wrapper:
----------------------------------------------
`-mem-path /dev/hugepages -mem-prealloc`
DPDK vhost-cuse VM configuration with QEMU wrapper:
---------------------------------------------------
The QEMU wrapper script automatically detects and calls QEMU with the
necessary parameters. It performs the following actions:
@@ -450,8 +576,8 @@ qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
netdev=net1,mac=00:00:00:00:00:01
```
DPDK vhost VM configuration with libvirt:
-----------------------------------------
DPDK vhost-cuse VM configuration with libvirt:
----------------------------------------------
If you are using libvirt, you must enable libvirt to access the character
device by adding it to controllers cgroup for libvirtd using the following
@@ -525,7 +651,7 @@ Now you may launch your VM using virt-manager, or like so:
`virsh create my_vhost_vm.xml`
DPDK vhost VM configuration with libvirt and QEMU wrapper:
DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
----------------------------------------------------------
To use the qemu-wrapper script in conjunction with libvirt, follow the
@@ -553,7 +679,7 @@ steps in the previous section before proceeding with the following steps:
the correct emulator location and set any additional options. If you are
using an alternative character device name, please set "us_vhost_path" to the
location of that device. The script will automatically detect and insert
the correct "vhostfd" value in the QEMU command line arguements.
the correct "vhostfd" value in the QEMU command line arguments.
5. Use virt-manager to launch the VM
19 changes: 12 additions & 7 deletions NEWS
@@ -1,4 +1,8 @@
Post-v2.3.0
Post-v2.4.0
---------------------


v2.4.0 - xx xxx xxxx
---------------------
- Flow table modifications are now atomic, meaning that each packet
now sees a coherent version of the OpenFlow pipeline. For
Expand Down Expand Up @@ -32,11 +36,12 @@ Post-v2.3.0
commands are now redundant and will be removed in a future
release. See ovs-vswitchd(8) for details.
- OpenFlow:
* OpenFlow 1.4 bundles are now supported, but for flow mod
messages only. Both 'atomic' and 'ordered' bundle flags are
trivially supported, as all bundled messages are executed in
the order they were added and all flow table modifications are
now atomic to the datapath.
* OpenFlow 1.4 bundles are now supported for flow mods and port
mods. For flow mods, both 'atomic' and 'ordered' bundle flags
are trivially supported, as all bundled messages are executed
in the order they were added and all flow table modifications
are now atomic to the datapath. Port mods may not appear in
atomic bundles, as port status modifications are not atomic.
* IPv6 flow label and neighbor discovery fields are now modifiable.
* OpenFlow 1.5 extended registers are now supported.
* The OpenFlow 1.5 actset_output field is now supported.
Expand Down Expand Up @@ -86,7 +91,7 @@ Post-v2.3.0
with Docker, the wrapper script will be retired.
- Added support for DPDK Tunneling. VXLAN, GRE, and Geneve are supported
protocols. This is a generic tunneling mechanism for userspace datapath.
- Support for multicast snooping (IGMPv1 and IGMPv2)
- Support for multicast snooping (IGMPv1, IGMPv2 and IGMPv3)
- Support for Linux kernels up to 4.0.x
- The documentation now uses the term 'destination' to mean one of syslog,
console or file for vlog logging instead of the previously used term
6 changes: 6 additions & 0 deletions OPENFLOW-1.1+.md
@@ -148,6 +148,12 @@ parallel in OVS.
Transactional modification. OpenFlow 1.4 requires support for
flow_mods and port_mods in a bundle if bundles are supported.
(Not related to OVS's 'ofbundle' stuff.)
Implemented as an OpenFlow 1.4 feature. Only flow_mods and
port_mods are supported in a bundle. If the bundle includes port
mods, it may not specify the OFPBF_ATOMIC flag. Nevertheless,
port mods and flow mods in a bundle are always applied in order
and consecutive flow mods between port mods are made available to
lookups atomically.
[EXT-230]
[optional for OF1.4+]
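
The acceptance rule stated above can be written out as a small check. This is a sketch only: `check_bundle` and the message type strings are hypothetical and do not reflect how Open vSwitch represents bundles; the flag values are those of the OpenFlow 1.4 `ofp_bundle_flags` enum.

```python
# Flag values from the OpenFlow 1.4 ofp_bundle_flags enum.
OFPBF_ATOMIC = 1 << 0
OFPBF_ORDERED = 1 << 1

# Message kinds accepted inside a bundle, per the text above.
SUPPORTED = {"flow_mod", "port_mod"}


def check_bundle(flags, message_types):
    """Return None if the bundle is acceptable, else a reason string.

    Hypothetical helper: message_types is a list of strings such as
    "flow_mod" or "port_mod", in the order the messages were added.
    """
    for msg_type in message_types:
        if msg_type not in SUPPORTED:
            return "unsupported message type: %s" % msg_type
    if flags & OFPBF_ATOMIC and "port_mod" in message_types:
        return "port_mod not allowed in an atomic bundle"
    return None


print(check_bundle(OFPBF_ATOMIC | OFPBF_ORDERED, ["flow_mod", "flow_mod"]))
print(check_bundle(OFPBF_ATOMIC, ["flow_mod", "port_mod"]))
print(check_bundle(OFPBF_ORDERED, ["flow_mod", "port_mod", "flow_mod"]))
```

The first and third calls print `None`; the second prints the reason the atomic bundle with a port mod is rejected.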
