Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "More exhaustive description of primary updates in this release:

   - Lots of driver fixes and misc fixes across the board.

   - I had to base on a net-next tree because the IPoIB Accelerator
     patches needed it.

     Unfortunately, it was known to Mellanox that there would need to be
     an IPoIB accelerator patch to the net tree (which left some
     functions turned off by an #ifdef construct to avoid warnings about
     defined but unused functions), then one to the RDMA tree, then a
     fixup that went back and re-enabled the functions in the net tree
     and enabled their use in the rdma tree.

     Also, a sparse fix was sent to the net tree after I did my pull,
     and the fixup patch conflicts quite directly with that sparse fix,
     so I'm going to submit the fixup patch towards the end of the merge
     window by itself and based upon your master branch at the time.

   - Two separate rounds of hfi1 fixes, one that got dropped from last
     release because it came in just a day or two before the end of the
     merge window and then the one from this release cycle.

     Of note is that I now have a third series that just landed from
     Intel yesterday. It is not included in this pull request, but I may
     submit it by the end of the week. I'll talk to Intel about
     improving the timing of their submissions for my workflow.

   - Changes to our idr usage in the RDMA subsystem that will tie into
     our cgroup management and also into the upcoming changes for the
     RDMA kernel<->userspace API.

   - Addition of support for a netdev to be tied to an RDMA device at
     the core level.

   - Addition of the VNIC driver from Intel.

     While IPoIB provides IP over InfiniBand (and *only* IP, no lower
     layer protocol headers are allowed or supported), the VNIC driver
     presents a virtual Ethernet device with support for things like
     varying Ethertypes, VLANs, priorities and other features of
     Ethernet.

     The virtual devices are centrally managed by the OPA fabric
     manager, making this (for the time being) a strictly OPA specific
     feature.

   - Improvements to the On-Demand Paging support in the RDMA subsystem.

   - Addition of three significant OPA changes.

     While we added OPA support some time ago (via the hfi1 driver), the
     RDMA subsystem has so far glossed over the areas where OPA and
     InfiniBand differ.

     With this release we are starting to add support for the OPA
     extensions into the RDMA core in the following areas: extended port
     information for OPA is now supported, extended Address Handle
     attributes for OPA are now supported, and extended SA Queries to
     get OPA specific subnet information are now supported.

  Concise summary from the tag:
   - idr usage and locking changes
   - build fix for hns
   - ipoib debug path record file fix
   - hfi1 updates
   - core RDMA netdev addition
   - Intel VNIC driver addition
   - Enhanced accelerators for IPoIB addition
   - Debug cleanups in cxgb3/4
   - Trivial cleanups from SF Markus Elfring
   - Misc rxe fixes from Mellanox
   - Misc ipoib fixes from Mellanox
   - Lots of mlx4/mlx5 changes from Mellanox
   - Misc fixes across the RDMA subsystem
   - ODP paging fixes and improvements
   - qedr updates
   - hfi1 updates
   - OPA port info patches
   - OPA AH patches
   - OPA SA Query patches"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (191 commits)
  infiniband: avoid dereferencing uninitialized dst on error path
  IB/SA: Add OPA addr header
  IB/mlx5: Add port_xmit_wait to counter registers read
  IB/ocrdma: fix out of bounds access to local buffer
  IB/mlx4: Fix incorrect order of formal and actual parameters
  IB/mlx4: Change flush logic so it adheres to the variable name
  mlx5: Fix mlx5_ib_map_mr_sg mr length
  IB/rxe: Don't clamp residual length to mtu
  IB/SA: Add support to query OPA path records
  IB/SA: Add OPA path record type
  IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields
  IB/SA: Introduce path record specific types
  IB/SA: Rename ib_sa_path_rec to sa_path_rec
  IB/CM: Add braces when using sizeof
  IB/core: Define 'opa' rdma_ah_attr type
  IB/core: Define 'ib' and 'roce' rdma_ah_attr types
  IB/core: Use rdma_ah_attr accessor functions
  IB/core: Add accessor functions for rdma_ah_attr fields
  IB/PVRDMA: Rename ib_ah_attr related functions
  IB/mthca: Rename to_ib_ah_attr to to_rdma_ah_attr
  ...
torvalds committed May 3, 2017
2 parents 16a12fa + 24b43c9 commit 1684096
Showing 242 changed files with 14,599 additions and 5,505 deletions.
153 changes: 153 additions & 0 deletions Documentation/infiniband/opa_vnic.txt
@@ -0,0 +1,153 @@
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature
supports Ethernet functionality over Omni-Path fabric by encapsulating
the Ethernet packets between HFI nodes.

Architecture
=============
The exchange of Omni-Path encapsulated Ethernet packets
involves one or more virtual Ethernet switches overlaid on the Omni-Path
fabric topology. A subset of HFI nodes on the Omni-Path fabric are
permitted to exchange encapsulated Ethernet packets across a particular
virtual Ethernet switch. The virtual Ethernet switches are logical
abstractions achieved by configuring the HFI nodes on the fabric for
header generation and processing. In the simplest configuration all HFI
nodes across the fabric exchange encapsulated Ethernet packets over a
single virtual Ethernet switch. A virtual Ethernet switch is effectively
an independent Ethernet network. The configuration is performed by an
Ethernet Manager (EM) which is part of the trusted Fabric Manager (FM)
application. HFI nodes can have multiple VNICs, each connected to a
different virtual Ethernet switch. The diagram below presents a case
of two virtual Ethernet switches with two HFI nodes.

                         +-------------------+
                         |      Subnet/      |
                         |     Ethernet      |
                         |      Manager      |
                         +-------------------+
                            /          /
                          /           /
                        /            /
                      /             /
+-----------------------------+  +------------------------------+
|  Virtual Ethernet Switch    |  |  Virtual Ethernet Switch     |
|  +---------+    +---------+ |  | +---------+    +---------+   |
|  | VPORT   |    | VPORT   | |  | | VPORT   |    | VPORT   |   |
+--+---------+----+---------+-+  +-+---------+----+---------+---+
         |                 \        /                 |
         |                   \    /                   |
         |                     \/                     |
         |                    /  \                    |
         |                  /      \                  |
     +-----------+------------+  +-----------+------------+
     |   VNIC    |    VNIC    |  |    VNIC   |    VNIC    |
     +-----------+------------+  +-----------+------------+
     |          HFI           |  |          HFI           |
     +------------------------+  +------------------------+


The Omni-Path encapsulated Ethernet packet format is as described below.

Bits          Field
------------------------------------
Quad Word 0:
0-19          SLID (lower 20 bits)
20-30         Length (in Quad Words)
31            BECN bit
32-51         DLID (lower 20 bits)
52-56         SC (Service Class)
57-59         RC (Routing Control)
60            FECN bit
61-62         L2 (=10, 16B format)
63            LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7           L4 type (=0x78 ETHERNET)
8-11          SLID[23:20]
12-15         DLID[23:20]
16-31         PKEY
32-47         Entropy
48-63         Reserved

Quad Word 2:
0-15          Reserved
16-31         L4 header
32-63         Ethernet Packet

Quad Words 3 to N-1:
0-63          Ethernet packet (pad extended)

Quad Word N (last):
0-23          Ethernet packet (pad extended)
24-55         ICRC
56-61         Tail
62-63         LT (=01, Link Transfer Tail Flit)

The Ethernet packet is padded on the transmit side to ensure that the VNIC
OPA packet is quad word aligned. The 'Tail' field contains the number of
bytes padded. On the receive side the 'Tail' field is read and the padding
is removed (along with the ICRC, Tail and OPA header) before the packet is
passed up the network stack.
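
As an illustration of the padding rule above, here is a minimal C sketch of
the alignment arithmetic. It is not taken from the driver: the helper names
are hypothetical, and which bytes count toward the length being aligned
(headers, ICRC, the tail byte) is left to the caller and is an assumption
of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define QW_BYTES 8u	/* one quad word is 8 bytes */

/* Hypothetical helper: pad bytes needed to reach the next quad-word boundary. */
static size_t qw_pad_len(size_t len)
{
	return (QW_BYTES - (len % QW_BYTES)) % QW_BYTES;
}

/* Hypothetical helper: length after padding, always a multiple of QW_BYTES. */
static size_t qw_padded_len(size_t len)
{
	size_t total = len + qw_pad_len(len);

	assert(total % QW_BYTES == 0);
	return total;
}

/*
 * Hypothetical helper for the receive side: given the padded length and
 * the pad count carried in the 'Tail' field, recover the original length.
 */
static size_t qw_unpadded_len(size_t padded_len, size_t tail)
{
	assert(tail <= padded_len);
	return padded_len - tail;
}
```

With these helpers a 61-byte buffer needs 3 pad bytes to reach 64 bytes, and
a receiver that reads 3 from 'Tail' gets the 61-byte length back.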

The L4 header field contains the virtual Ethernet switch id the VNIC port
belongs to. On the receive side, this field is used to de-multiplex the
received VNIC packets to different VNIC ports.
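
The field layout described above can be sketched as a small C decoder. This
is only an illustration: the struct and function names are hypothetical, and
the sketch assumes that bit 0 in the table is the least significant bit of
each 64-bit quad word held in host order. The real driver may lay the header
out differently in memory.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits lo..hi (inclusive) of a quad word; bit 0 is assumed to be the LSB. */
static uint64_t qw_bits(uint64_t qw, unsigned int lo, unsigned int hi)
{
	unsigned int width;
	uint64_t mask;

	assert(lo <= hi && hi < 64);
	width = hi - lo + 1;
	mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
	return (qw >> lo) & mask;
}

/* Hypothetical container for the fields this sketch decodes. */
struct vnic_hdr_fields {
	uint32_t slid;		/* 24 bits: QW0[0-19] plus QW1[8-11] */
	uint32_t dlid;		/* 24 bits: QW0[32-51] plus QW1[12-15] */
	uint16_t len_qw;	/* packet length in quad words */
	uint8_t sc;		/* service class */
	uint8_t l4_type;	/* 0x78 for Ethernet */
	uint16_t pkey;
	uint16_t vesw_id;	/* L4 header: virtual Ethernet switch id */
};

static struct vnic_hdr_fields vnic_decode(uint64_t qw0, uint64_t qw1,
					  uint64_t qw2)
{
	struct vnic_hdr_fields f;

	f.slid = (uint32_t)(qw_bits(qw0, 0, 19) | (qw_bits(qw1, 8, 11) << 20));
	f.dlid = (uint32_t)(qw_bits(qw0, 32, 51) | (qw_bits(qw1, 12, 15) << 20));
	f.len_qw = (uint16_t)qw_bits(qw0, 20, 30);
	f.sc = (uint8_t)qw_bits(qw0, 52, 56);
	f.l4_type = (uint8_t)qw_bits(qw1, 0, 7);
	f.pkey = (uint16_t)qw_bits(qw1, 16, 31);
	f.vesw_id = (uint16_t)qw_bits(qw2, 16, 31);
	return f;
}
```

A receiver could then use vesw_id as the key for selecting the VNIC port, as
the paragraph above describes; the lookup itself is not shown here.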

Driver Design
==============
Intel OPA VNIC software design is presented in the below diagram.
OPA VNIC functionality has a HW dependent component and a HW
independent component.

Support has been added for an IB device to allocate and free RDMA
netdev devices. The RDMA netdev supports interfacing with the network
stack, thus creating standard network interfaces. OPA_VNIC is an RDMA
netdev device type.

The HW dependent VNIC functionality is part of the HFI1 driver. It
implements the verbs to allocate and free the OPA_VNIC RDMA netdev.
It involves HW resource allocation/management for VNIC functionality.
It interfaces with the network stack and implements the required
net_device_ops functions. It expects Omni-Path encapsulated Ethernet
packets in the transmit path and provides HW access to them. It strips
the Omni-Path header from the received packets before passing them up
the network stack. It also implements the RDMA netdev control operations.

The OPA VNIC module implements the HW independent VNIC functionality.
It consists of two parts. The VNIC Ethernet Management Agent (VEMA)
registers itself with IB core as an IB client and interfaces with the
IB MAD stack. It exchanges the management information with the Ethernet
Manager (EM) and the VNIC netdev. The VNIC netdev part allocates and frees
the OPA_VNIC RDMA netdev devices. It overrides the net_device_ops functions
set by the HW dependent VNIC driver where required to accommodate any control
operation. It also handles the encapsulation of Ethernet packets with an
Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via the VEMA
MAD interface. It also passes any control information to the HW dependent
driver by invoking the RDMA netdev control operations.

+-------------------+ +----------------------+
|                   | |       Linux          |
|     IB MAD        | |      Network         |
|                   | |       Stack          |
+-------------------+ +----------------------+
        |               |          |
        |               |          |
+----------------------------+     |
|                            |     |
|      OPA VNIC Module       |     |
|  (OPA VNIC RDMA Netdev     |     |
|     & EMA functions)       |     |
|                            |     |
+----------------------------+     |
            |                      |
            |                      |
   +------------------+            |
   |     IB core      |            |
   +------------------+            |
            |                      |
            |                      |
+--------------------------------------------+
|                                            |
|      HFI1 Driver with VNIC support         |
|                                            |
+--------------------------------------------+
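
The layering described in this section, where the HW independent module
overrides an operation installed by the HW dependent driver and then calls
down into it, can be modelled in plain userspace C. This is a toy sketch
only: every name below (toy_netdev, toy_pkt, TOY_HDR_LEN and so on) is
hypothetical and none of it is the kernel's net_device_ops or RDMA netdev
API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_HDR_LEN 4	/* placeholder size, not the real Omni-Path header */

struct toy_pkt {
	unsigned char data[64];
	size_t len;
};

struct toy_netdev;

struct toy_netdev_ops {
	int (*xmit)(struct toy_netdev *dev, struct toy_pkt *pkt);
};

struct toy_netdev {
	struct toy_netdev_ops ops;	/* ops the "network stack" calls */
	struct toy_netdev_ops hw_ops;	/* ops saved from the HW dependent part */
};

static size_t hw_last_len;	/* length of the last packet the "HW" saw */

/* HW dependent part: expects an already encapsulated packet. */
static int hw_xmit(struct toy_netdev *dev, struct toy_pkt *pkt)
{
	(void)dev;
	hw_last_len = pkt->len;
	return 0;
}

/* HW dependent part: allocates the device and installs its own ops. */
static void hw_alloc_netdev(struct toy_netdev *dev)
{
	dev->ops.xmit = hw_xmit;
}

/* HW independent part: prepend a (zeroed) header, then hand off to the HW ops. */
static int vnic_xmit(struct toy_netdev *dev, struct toy_pkt *pkt)
{
	if (pkt->len + TOY_HDR_LEN > sizeof(pkt->data))
		return -1;
	memmove(pkt->data + TOY_HDR_LEN, pkt->data, pkt->len);
	memset(pkt->data, 0, TOY_HDR_LEN);
	pkt->len += TOY_HDR_LEN;
	return dev->hw_ops.xmit(dev, pkt);
}

/* HW independent part: save the HW ops, then override xmit. */
static void vnic_setup(struct toy_netdev *dev)
{
	hw_alloc_netdev(dev);
	dev->hw_ops = dev->ops;
	dev->ops.xmit = vnic_xmit;
}
```

Saving the original ops before overriding them keeps the encapsulation step
in one place while the hardware path stays unaware of where the header came
from.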
16 changes: 12 additions & 4 deletions MAINTAINERS
@@ -5911,6 +5911,13 @@ F: drivers/block/cciss*
F: include/linux/cciss_ioctl.h
F: include/uapi/linux/cciss_ioctl.h

OPA-VNIC DRIVER
M: Dennis Dalessandro <[email protected]>
M: Niranjana Vishwanathapura <[email protected]>
L: [email protected]
S: Supported
F: drivers/infiniband/ulp/opa_vnic

HFI1 DRIVER
M: Mike Marciniszyn <[email protected]>
M: Dennis Dalessandro <[email protected]>
@@ -6519,6 +6526,7 @@ W: http://www.openfabrics.org/
Q: http://patchwork.kernel.org/project/linux-rdma/list/
T: git git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git
S: Supported
F: Documentation/devicetree/bindings/infiniband/
F: Documentation/infiniband/
F: drivers/infiniband/
F: include/uapi/linux/if_infiniband.h
@@ -11489,11 +11497,11 @@ S: Supported
 F: drivers/net/ethernet/emulex/benet/

 EMULEX ONECONNECT ROCE DRIVER
-M: Selvin Xavier <selvin.xavier@avagotech.com>
-M: Devesh Sharma <devesh.sharma@avagotech.com>
+M: Selvin Xavier <selvin.xavier@broadcom.com>
+M: Devesh Sharma <devesh.sharma@broadcom.com>
 L: [email protected]
-W: http://www.emulex.com
-S: Supported
+W: http://www.broadcom.com
+S: Odd Fixes
 F: drivers/infiniband/hw/ocrdma/
 F: include/uapi/rdma/ocrdma-abi.h

1 change: 1 addition & 0 deletions drivers/infiniband/Kconfig
@@ -85,6 +85,7 @@ source "drivers/infiniband/ulp/srpt/Kconfig"
source "drivers/infiniband/ulp/iser/Kconfig"
source "drivers/infiniband/ulp/isert/Kconfig"

source "drivers/infiniband/ulp/opa_vnic/Kconfig"
source "drivers/infiniband/sw/rdmavt/Kconfig"
source "drivers/infiniband/sw/rxe/Kconfig"

3 changes: 2 additions & 1 deletion drivers/infiniband/core/Makefile
@@ -29,4 +29,5 @@ ib_umad-y := user_mad.o

 ib_ucm-y := ucm.o

-ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o
+ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
+		rdma_core.o uverbs_std_types.o
6 changes: 3 additions & 3 deletions drivers/infiniband/core/addr.c
@@ -444,9 +444,9 @@ static int addr6_resolve(struct sockaddr_in6 *src_in,
 	fl6.saddr = src_in->sin6_addr;
 	fl6.flowi6_oif = addr->bound_dev_if;

-	dst = ip6_route_output(addr->net, NULL, &fl6);
-	if ((ret = dst->error))
-		goto put;
+	ret = ipv6_stub->ipv6_dst_lookup(addr->net, NULL, &dst, &fl6);
+	if (ret < 0)
+		return ret;

 	rt = (struct rt6_info *)dst;
 	if (ipv6_addr_any(&fl6.saddr)) {
4 changes: 2 additions & 2 deletions drivers/infiniband/core/agent.c
@@ -137,13 +137,13 @@ void agent_send_response(const struct ib_mad_hdr *mad_hdr, const struct ib_grh *
 err2:
 	ib_free_send_mad(send_buf);
 err1:
-	ib_destroy_ah(ah);
+	rdma_destroy_ah(ah);
 }

 static void agent_send_handler(struct ib_mad_agent *mad_agent,
 			       struct ib_mad_send_wc *mad_send_wc)
 {
-	ib_destroy_ah(mad_send_wc->send_buf->ah);
+	rdma_destroy_ah(mad_send_wc->send_buf->ah);
 	ib_free_send_mad(mad_send_wc->send_buf);
 }
