docs: infiniband: convert docs to ReST and rename to *.rst
The InfiniBand docs are plain text with no markups. So, all we needed to
do was to add the title markups and some markup sequences in order to
properly parse tables, lists and literal blocks.

In its new index.rst, let's add an :orphan: tag while it is not linked from
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
mchehab authored and jgunthorpe committed Jun 25, 2019
1 parent b417c08 commit 97162a1
Showing 10 changed files with 175 additions and 104 deletions.
@@ -1,50 +1,54 @@
INFINIBAND MIDLAYER LOCKING
===========================
InfiniBand Midlayer Locking
===========================

This guide is an attempt to make explicit the locking assumptions
made by the InfiniBand midlayer. It describes the requirements on
both low-level drivers that sit below the midlayer and upper level
protocols that use the midlayer.

Sleeping and interrupt context
==============================

With the following exceptions, a low-level driver implementation of
all of the methods in struct ib_device may sleep. The exceptions
are any methods from the list:

create_ah
modify_ah
query_ah
destroy_ah
post_send
post_recv
poll_cq
req_notify_cq
map_phys_fmr
- create_ah
- modify_ah
- query_ah
- destroy_ah
- post_send
- post_recv
- poll_cq
- req_notify_cq
- map_phys_fmr

which may not sleep and must be callable from any context.

The corresponding functions exported to upper level protocol
consumers:

ib_create_ah
ib_modify_ah
ib_query_ah
ib_destroy_ah
ib_post_send
ib_post_recv
ib_req_notify_cq
ib_map_phys_fmr
- ib_create_ah
- ib_modify_ah
- ib_query_ah
- ib_destroy_ah
- ib_post_send
- ib_post_recv
- ib_req_notify_cq
- ib_map_phys_fmr

are therefore safe to call from any context.

In addition, the function

ib_dispatch_event
- ib_dispatch_event

used by low-level drivers to dispatch asynchronous events through
the midlayer is also safe to call from any context.

Reentrancy
----------

All of the methods in struct ib_device exported by a low-level
driver must be fully reentrant. The low-level driver is required to
@@ -62,6 +66,7 @@ Reentrancy
information between different calls of ib_poll_cq() is not defined.

Callbacks
---------

A low-level driver must not perform a callback directly from the
same callchain as an ib_device method call. For example, it is not
@@ -74,25 +79,26 @@ Callbacks
completion event handlers for the same CQ are not called
simultaneously. The driver must guarantee that only one CQ event
handler for a given CQ is running at a time. In other words, the
following situation is not allowed:
following situation is not allowed::

CPU1 CPU2
CPU1 CPU2

low-level driver ->
consumer CQ event callback:
/* ... */
ib_req_notify_cq(cq, ...);
low-level driver ->
/* ... */ consumer CQ event callback:
/* ... */
return from CQ event handler
low-level driver ->
consumer CQ event callback:
/* ... */
ib_req_notify_cq(cq, ...);
low-level driver ->
/* ... */ consumer CQ event callback:
/* ... */
return from CQ event handler

The context in which completion event and asynchronous event
callbacks run is not defined. Depending on the low-level driver, it
may be process context, softirq context, or interrupt context.
Upper level protocol consumers may not sleep in a callback.

Hot-plug
--------

A low-level driver announces that a device is ready for use by
consumers when it calls ib_register_device(), all initialization
23 changes: 23 additions & 0 deletions Documentation/infiniband/index.rst
@@ -0,0 +1,23 @@
:orphan:

==========
InfiniBand
==========

.. toctree::
:maxdepth: 1

core_locking
ipoib
opa_vnic
sysfs
tag_matching
user_mad
user_verbs

.. only:: subproject and html

Indices
=======

* :ref:`genindex`
@@ -1,4 +1,6 @@
IP OVER INFINIBAND
==================
IP over InfiniBand
==================

The ib_ipoib driver is an implementation of the IP over InfiniBand
protocol as specified by RFC 4391 and 4392, issued by the IETF ipoib
@@ -8,16 +10,17 @@ IP OVER INFINIBAND
masqueraded to the kernel as ethernet interfaces).

Partitions and P_Keys
=====================

When the IPoIB driver is loaded, it creates one interface for each
port using the P_Key at index 0. To create an interface with a
different P_Key, write the desired P_Key into the main interface's
/sys/class/net/<intf name>/create_child file. For example:
/sys/class/net/<intf name>/create_child file. For example::

echo 0x8001 > /sys/class/net/ib0/create_child

This will create an interface named ib0.8001 with P_Key 0x8001. To
remove a subinterface, use the "delete_child" file:
remove a subinterface, use the "delete_child" file::

echo 0x8001 > /sys/class/net/ib0/delete_child

@@ -28,6 +31,7 @@ Partitions and P_Keys
rtnl_link_ops, where children created using either way behave the same.

Datagram vs Connected modes
===========================

The IPoIB driver supports two modes of operation: datagram and
connected. The mode is set and read through an interface's
@@ -51,6 +55,7 @@ Datagram vs Connected modes
networking stack to use the smaller UD MTU for these neighbours.

Stateless offloads
==================

If the IB HW supports IPoIB stateless offloads, IPoIB advertises
TCP/IP checksum and/or Large Send (LSO) offloading capability to the
@@ -60,9 +65,10 @@ Stateless offloads
on/off using ethtool calls. Currently LRO is supported only for
checksum offload capable devices.

Stateless offloads are supported only in datagram mode.
Stateless offloads are supported only in datagram mode.

Interrupt moderation
====================

If the underlying IB device supports CQ event moderation, one can
use ethtool to set interrupt mitigation parameters and thus reduce
@@ -71,6 +77,7 @@ Interrupt moderation
moderation is supported.

Debugging Information
=====================

By compiling the IPoIB driver with CONFIG_INFINIBAND_IPOIB_DEBUG set
to 'y', tracing messages are compiled into the driver. They are
@@ -79,7 +86,7 @@ Debugging Information
runtime through files in /sys/module/ib_ipoib/.

CONFIG_INFINIBAND_IPOIB_DEBUG also enables files in the debugfs
virtual filesystem. By mounting this filesystem, for example with
virtual filesystem. By mounting this filesystem, for example with::

mount -t debugfs none /sys/kernel/debug

@@ -96,10 +103,13 @@ Debugging Information
performance, because it adds tests to the fast path.

References
==========

Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
http://ietf.org/rfc/rfc4391.txt
http://ietf.org/rfc/rfc4391.txt

IP over InfiniBand (IPoIB) Architecture (RFC 4392)
http://ietf.org/rfc/rfc4392.txt
http://ietf.org/rfc/rfc4392.txt

IP over InfiniBand: Connected Mode (RFC 4755)
http://ietf.org/rfc/rfc4755.txt
@@ -1,3 +1,7 @@
=================================================================
Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC)
=================================================================

Intel Omni-Path (OPA) Virtual Network Interface Controller (VNIC) feature
supports Ethernet functionality over Omni-Path fabric by encapsulating
the Ethernet packets between HFI nodes.
@@ -17,70 +21,72 @@ an independent Ethernet network. The configuration is performed by an
Ethernet Manager (EM) which is part of the trusted Fabric Manager (FM)
application. HFI nodes can have multiple VNICs each connected to a
different virtual Ethernet switch. The below diagram presents a case
of two virtual Ethernet switches with two HFI nodes.

+-------------------+
| Subnet/ |
| Ethernet |
| Manager |
+-------------------+
/ /
/ /
/ /
/ /
+-----------------------------+ +------------------------------+
| Virtual Ethernet Switch | | Virtual Ethernet Switch |
| +---------+ +---------+ | | +---------+ +---------+ |
| | VPORT | | VPORT | | | | VPORT | | VPORT | |
+--+---------+----+---------+-+ +-+---------+----+---------+---+
| \ / |
| \ / |
| \/ |
| / \ |
| / \ |
+-----------+------------+ +-----------+------------+
| VNIC | VNIC | | VNIC | VNIC |
+-----------+------------+ +-----------+------------+
| HFI | | HFI |
+------------------------+ +------------------------+
of two virtual Ethernet switches with two HFI nodes::

+-------------------+
| Subnet/ |
| Ethernet |
| Manager |
+-------------------+
/ /
/ /
/ /
/ /
+-----------------------------+ +------------------------------+
| Virtual Ethernet Switch | | Virtual Ethernet Switch |
| +---------+ +---------+ | | +---------+ +---------+ |
| | VPORT | | VPORT | | | | VPORT | | VPORT | |
+--+---------+----+---------+-+ +-+---------+----+---------+---+
| \ / |
| \ / |
| \/ |
| / \ |
| / \ |
+-----------+------------+ +-----------+------------+
| VNIC | VNIC | | VNIC | VNIC |
+-----------+------------+ +-----------+------------+
| HFI | | HFI |
+------------------------+ +------------------------+


The Omni-Path encapsulated Ethernet packet format is as described below.

Bits Field
------------------------------------
==================== ================================
Bits Field
==================== ================================
Quad Word 0:
0-19 SLID (lower 20 bits)
20-30 Length (in Quad Words)
31 BECN bit
32-51 DLID (lower 20 bits)
52-56 SC (Service Class)
57-59 RC (Routing Control)
60 FECN bit
61-62 L2 (=10, 16B format)
63 LT (=1, Link Transfer Head Flit)
0-19 SLID (lower 20 bits)
20-30 Length (in Quad Words)
31 BECN bit
32-51 DLID (lower 20 bits)
52-56 SC (Service Class)
57-59 RC (Routing Control)
60 FECN bit
61-62 L2 (=10, 16B format)
63 LT (=1, Link Transfer Head Flit)

Quad Word 1:
0-7 L4 type (=0x78 ETHERNET)
8-11 SLID[23:20]
12-15 DLID[23:20]
16-31 PKEY
32-47 Entropy
48-63 Reserved
0-7 L4 type (=0x78 ETHERNET)
8-11 SLID[23:20]
12-15 DLID[23:20]
16-31 PKEY
32-47 Entropy
48-63 Reserved

Quad Word 2:
0-15 Reserved
16-31 L4 header
32-63 Ethernet Packet
0-15 Reserved
16-31 L4 header
32-63 Ethernet Packet

Quad Words 3 to N-1:
0-63 Ethernet packet (pad extended)
0-63 Ethernet packet (pad extended)

Quad Word N (last):
0-23 Ethernet packet (pad extended)
24-55 ICRC
56-61 Tail
62-63 LT (=01, Link Transfer Tail Flit)
0-23 Ethernet packet (pad extended)
24-55 ICRC
56-61 Tail
62-63 LT (=01, Link Transfer Tail Flit)
==================== ================================

Ethernet packet is padded on the transmit side to ensure that the VNIC OPA
packet is quad word aligned. The 'Tail' field contains the number of bytes
@@ -123,7 +129,7 @@ operation. It also handles the encapsulation of Ethernet packets with an
Omni-Path header in the transmit path. For each VNIC interface, the
information required for encapsulation is configured by the EM via VEMA MAD
interface. It also passes any control information to the HW dependent driver
by invoking the RDMA netdev control operations.
by invoking the RDMA netdev control operations::

+-------------------+ +----------------------+
| | | Linux |
@@ -1,4 +1,6 @@
SYSFS FILES
===========
Sysfs files
===========

The sysfs interface has moved to
Documentation/ABI/stable/sysfs-class-infiniband.