Skip to content

Commit

Permalink
Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kerne…
Browse files Browse the repository at this point in the history
…l/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - BPF:
      - add syscall program type and libbpf support for generating
        instructions and bindings for in-kernel BPF loaders (BPF loaders
        for BPF), this is a stepping stone for signed BPF programs
      - infrastructure to migrate TCP child sockets from one listener to
        another in the same reuseport group/map to improve flexibility
        of service hand-off/restart
      - add broadcast support to XDP redirect

   - allow bypass of the lockless qdisc to improving performance (for
     pktgen: +23% with one thread, +44% with 2 threads)

   - add a simpler version of "DO_ONCE()" which does not require jump
     labels, intended for slow-path usage

   - virtio/vsock: introduce SOCK_SEQPACKET support

   - add getsocketopt to retrieve netns cookie

   - ip: treat lowest address of a IPv4 subnet as ordinary unicast
     address allowing reclaiming of precious IPv4 addresses

   - ipv6: use prandom_u32() for ID generation

   - ip: add support for more flexible field selection for hashing
     across multi-path routes (w/ offload to mlxsw)

   - icmp: add support for extended RFC 8335 PROBE (ping)

   - seg6: add support for SRv6 End.DT46 behavior

   - mptcp:
      - DSS checksum support (RFC 8684) to detect middlebox meddling
      - support Connection-time 'C' flag
      - time stamping support

   - sctp: packetization Layer Path MTU Discovery (RFC 8899)

   - xfrm: speed up state addition with seq set

   - WiFi:
      - hidden AP discovery on 6 GHz and other HE 6 GHz improvements
      - aggregation handling improvements for some drivers
      - minstrel improvements for no-ack frames
      - deferred rate control for TXQs to improve reaction times
      - switch from round robin to virtual time-based airtime scheduler

   - add trace points:
      - tcp checksum errors
      - openvswitch - action execution, upcalls
      - socket errors via sk_error_report

  Device APIs:

   - devlink: add rate API for hierarchical control of max egress rate
     of virtual devices (VFs, SFs etc.)

   - don't require RCU read lock to be held around BPF hooks in NAPI
     context

   - page_pool: generic buffer recycling

  New hardware/drivers:

   - mobile:
      - iosm: PCIe Driver for Intel M.2 Modem
      - support for Qualcomm MSM8998 (ipa)

   - WiFi: Qualcomm QCN9074 and WCN6855 PCI devices

   - sparx5: Microchip SparX-5 family of Enterprise Ethernet switches

   - Mellanox BlueField Gigabit Ethernet (control NIC of the DPU)

   - NXP SJA1110 Automotive Ethernet 10-port switch

   - Qualcomm QCA8327 switch support (qca8k)

   - Mikrotik 10/25G NIC (atl1c)

  Driver changes:

   - ACPI support for some MDIO, MAC and PHY devices from Marvell and
     NXP (our first foray into MAC/PHY description via ACPI)

   - HW timestamping (PTP) support: bnxt_en, ice, sja1105, hns3, tja11xx

   - Mellanox/Nvidia NIC (mlx5)
      - NIC VF offload of L2 bridging
      - support IRQ distribution to Sub-functions

   - Marvell (prestera):
      - add flower and match all
      - devlink trap
      - link aggregation

   - Netronome (nfp): connection tracking offload

   - Intel 1GE (igc): add AF_XDP support

   - Marvell DPU (octeontx2): ingress ratelimit offload

   - Google vNIC (gve): new ring/descriptor format support

   - Qualcomm mobile (rmnet & ipa): inline checksum offload support

   - MediaTek WiFi (mt76)
      - mt7915 MSI support
      - mt7915 Tx status reporting
      - mt7915 thermal sensors support
      - mt7921 decapsulation offload
      - mt7921 enable runtime pm and deep sleep

   - Realtek WiFi (rtw88)
      - beacon filter support
      - Tx antenna path diversity support
      - firmware crash information via devcoredump

   - Qualcomm WiFi (wcn36xx)
      - Wake-on-WLAN support with magic packets and GTK rekeying

   - Micrel PHY (ksz886x/ksz8081): add cable test support"

* tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2168 commits)
  tcp: change ICSK_CA_PRIV_SIZE definition
  tcp_yeah: check struct yeah size at compile time
  gve: DQO: Fix off by one in gve_rx_dqo()
  stmmac: intel: set PCI_D3hot in suspend
  stmmac: intel: Enable PHY WOL option in EHL
  net: stmmac: option to enable PHY WOL with PMT enabled
  net: say "local" instead of "static" addresses in ndo_dflt_fdb_{add,del}
  net: use netdev_info in ndo_dflt_fdb_{add,del}
  ptp: Set lookup cookie when creating a PTP PPS source.
  net: sock: add trace for socket errors
  net: sock: introduce sk_error_report
  net: dsa: replay the local bridge FDB entries pointing to the bridge dev too
  net: dsa: ensure during dsa_fdb_offload_notify that dev_hold and dev_put are on the same dev
  net: dsa: include fdb entries pointing to bridge in the host fdb list
  net: dsa: include bridge addresses which are local in the host fdb list
  net: dsa: sync static FDB entries on foreign interfaces to hardware
  net: dsa: install the host MDB and FDB entries in the master's RX filter
  net: dsa: reference count the FDB addresses at the cross-chip notifier level
  net: dsa: introduce a separate cross-chip notifier type for host FDBs
  net: dsa: reference count the MDB entries at the cross-chip notifier level
  ...
  • Loading branch information
torvalds committed Jun 30, 2021
2 parents a6eaf38 + b6df007 commit dbe69e4
Show file tree
Hide file tree
Showing 1,908 changed files with 109,791 additions and 28,910 deletions.
78 changes: 78 additions & 0 deletions Documentation/ABI/testing/sysfs-devices-platform-soc-ipa
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
What: /sys/devices/platform/soc@X/XXXXXXX.ipa/
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
The /sys/devices/platform/soc@X/XXXXXXX.ipa/ directory
contains read-only attributes exposing information about
an IPA device. The X values could vary, but are typically
"soc@0/1e40000.ipa".

What: .../XXXXXXX.ipa/version
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/version file contains the IPA hardware
version, as a period-separated set of two or three integers
(e.g., "3.5.1" or "4.2").

What: .../XXXXXXX.ipa/feature/
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/feature/ directory contains a set of
attributes describing features implemented by the IPA
hardware.

What: .../XXXXXXX.ipa/feature/rx_offload
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/feature/rx_offload file contains a
string indicating the type of receive checksum offload
that is supported by the hardware. The possible values
are "MAPv4" or "MAPv5".

What: .../XXXXXXX.ipa/feature/tx_offload
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/feature/tx_offload file contains a
string indicating the type of transmit checksum offload
that is supported by the hardware. The possible values
are "MAPv4" or "MAPv5".

What: .../XXXXXXX.ipa/modem/
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/modem/ directory contains a set of
attributes describing properties of the modem execution
environment reachable by the IPA hardware.

What: .../XXXXXXX.ipa/modem/rx_endpoint_id
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/feature/rx_endpoint_id file contains
the AP endpoint ID that receives packets originating from
the modem execution environment. The "rx" is from the
perspective of the AP; this endpoint is considered an "IPA
producer". An endpoint ID is a small unsigned integer.

What: .../XXXXXXX.ipa/modem/tx_endpoint_id
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/feature/tx_endpoint_id file contains
the AP endpoint ID used to transmit packets destined for
the modem execution environment. The "tx" is from the
perspective of the AP; this endpoint is considered an "IPA
consumer". An endpoint ID is a small unsigned integer.
55 changes: 34 additions & 21 deletions Documentation/RCU/checklist.rst
Original file line number Diff line number Diff line change
Expand Up @@ -211,27 +211,40 @@ over a rather long period of time, but improvements are always welcome!
of the system, especially to real-time workloads running on
the rest of the system.

7. As of v4.20, a given kernel implements only one RCU flavor,
which is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
If the updater uses call_rcu() or synchronize_rcu(),
then the corresponding readers may use rcu_read_lock() and
rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
or any pair of primitives that disables and re-enables preemption,
for example, rcu_read_lock_sched() and rcu_read_unlock_sched().
If the updater uses synchronize_srcu() or call_srcu(),
then the corresponding readers must use srcu_read_lock() and
srcu_read_unlock(), and with the same srcu_struct. The rules for
the expedited primitives are the same as for their non-expedited
counterparts. Mixing things up will result in confusion and
broken kernels, and has even resulted in an exploitable security
issue.

One exception to this rule: rcu_read_lock() and rcu_read_unlock()
may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
in cases where local bottom halves are already known to be
disabled, for example, in irq or softirq context. Commenting
such cases is a must, of course! And the jury is still out on
whether the increased speed is worth it.
7. As of v4.20, a given kernel implements only one RCU flavor, which
is RCU-sched for PREEMPTION=n and RCU-preempt for PREEMPTION=y.
If the updater uses call_rcu() or synchronize_rcu(), then
the corresponding readers may use: (1) rcu_read_lock() and
rcu_read_unlock(), (2) any pair of primitives that disables
and re-enables softirq, for example, rcu_read_lock_bh() and
rcu_read_unlock_bh(), or (3) any pair of primitives that disables
and re-enables preemption, for example, rcu_read_lock_sched() and
rcu_read_unlock_sched(). If the updater uses synchronize_srcu()
or call_srcu(), then the corresponding readers must use
srcu_read_lock() and srcu_read_unlock(), and with the same
srcu_struct. The rules for the expedited RCU grace-period-wait
primitives are the same as for their non-expedited counterparts.

If the updater uses call_rcu_tasks() or synchronize_rcu_tasks(),
then the readers must refrain from executing voluntary
context switches, that is, from blocking. If the updater uses
call_rcu_tasks_trace() or synchronize_rcu_tasks_trace(), then
the corresponding readers must use rcu_read_lock_trace() and
rcu_read_unlock_trace(). If an updater uses call_rcu_tasks_rude()
or synchronize_rcu_tasks_rude(), then the corresponding readers
must use anything that disables interrupts.

Mixing things up will result in confusion and broken kernels, and
has even resulted in an exploitable security issue. Therefore,
when using non-obvious pairs of primitives, commenting is
of course a must. One example of non-obvious pairing is
the XDP feature in networking, which calls BPF programs from
network-driver NAPI (softirq) context. BPF relies heavily on RCU
protection for its data structures, but because the BPF program
invocation happens entirely within a single local_bh_disable()
section in a NAPI poll cycle, this usage is safe. The reason
that this usage is safe is that readers can use anything that
disables BH when updaters use call_rcu() or synchronize_rcu().

8. Although synchronize_rcu() is slower than is call_rcu(), it
usually results in simpler code. So, unless update performance is
Expand Down
14 changes: 14 additions & 0 deletions Documentation/bpf/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,19 @@ BPF instruction-set.
The Cilium project also maintains a `BPF and XDP Reference Guide`_
that goes into great technical depth about the BPF Architecture.

libbpf
======

Libbpf is a userspace library for loading and interacting with bpf programs.

.. toctree::
:maxdepth: 1

libbpf/libbpf
libbpf/libbpf_api
libbpf/libbpf_build
libbpf/libbpf_naming_convention

BPF Type Format (BTF)
=====================

Expand Down Expand Up @@ -84,6 +97,7 @@ Other
:maxdepth: 1

ringbuf
llvm_reloc

.. Links:
.. _networking-filter: ../networking/filter.rst
Expand Down
14 changes: 14 additions & 0 deletions Documentation/bpf/libbpf/libbpf.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
libbpf
======

This is documentation for libbpf, a userspace library for loading and
interacting with bpf programs.

All general BPF questions, including kernel functionality, libbpf APIs and
their application, should be sent to [email protected] mailing list.
You can `subscribe <http://vger.kernel.org/vger-lists.html#bpf>`_ to the
mailing list search its `archive <https://lore.kernel.org/bpf/>`_.
Please search the archive before asking new questions. It very well might
be that this was already addressed or answered before.
27 changes: 27 additions & 0 deletions Documentation/bpf/libbpf/libbpf_api.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
API
===

This documentation is autogenerated from header files in libbpf, tools/lib/bpf

.. kernel-doc:: tools/lib/bpf/libbpf.h
:internal:

.. kernel-doc:: tools/lib/bpf/bpf.h
:internal:

.. kernel-doc:: tools/lib/bpf/btf.h
:internal:

.. kernel-doc:: tools/lib/bpf/xsk.h
:internal:

.. kernel-doc:: tools/lib/bpf/bpf_tracing.h
:internal:

.. kernel-doc:: tools/lib/bpf/bpf_core_read.h
:internal:

.. kernel-doc:: tools/lib/bpf/bpf_endian.h
:internal:
37 changes: 37 additions & 0 deletions Documentation/bpf/libbpf/libbpf_build.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
Building libbpf
===============

libelf and zlib are internal dependencies of libbpf and thus are required to link
against and must be installed on the system for applications to work.
pkg-config is used by default to find libelf, and the program called
can be overridden with PKG_CONFIG.

If using pkg-config at build time is not desired, it can be disabled by
setting NO_PKG_CONFIG=1 when calling make.

To build both static libbpf.a and shared libbpf.so:

.. code-block:: bash
$ cd src
$ make
To build only static libbpf.a library in directory build/ and install them
together with libbpf headers in a staging directory root/:

.. code-block:: bash
$ cd src
$ mkdir build root
$ BUILD_STATIC_ONLY=y OBJDIR=build DESTDIR=root make install
To build both static libbpf.a and shared libbpf.so against a custom libelf
dependency installed in /build/root/ and install them together with libbpf
headers in a build directory /build/root/:

.. code-block:: bash
$ cd src
$ PKG_CONFIG_PATH=/build/root/lib64/pkgconfig DESTDIR=/build/root make
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
libbpf API naming convention
============================
API naming convention
=====================

libbpf API provides access to a few logically separated groups of
functions and types. Every group has its own naming convention
Expand All @@ -10,14 +10,14 @@ new function or type is added to keep libbpf API clean and consistent.

All types and functions provided by libbpf API should have one of the
following prefixes: ``bpf_``, ``btf_``, ``libbpf_``, ``xsk_``,
``perf_buffer_``.
``btf_dump_``, ``ring_buffer_``, ``perf_buffer_``.

System call wrappers
--------------------

System call wrappers are simple wrappers for commands supported by
sys_bpf system call. These wrappers should go to ``bpf.h`` header file
and map one-on-one to corresponding commands.
and map one to one to corresponding commands.

For example ``bpf_map_lookup_elem`` wraps ``BPF_MAP_LOOKUP_ELEM``
command of sys_bpf, ``bpf_prog_attach`` wraps ``BPF_PROG_ATTACH``, etc.
Expand Down Expand Up @@ -49,10 +49,6 @@ object, ``bpf_object``, double underscore and ``open`` that defines the
purpose of the function to open ELF file and create ``bpf_object`` from
it.

Another example: ``bpf_program__load`` is named for corresponding
object, ``bpf_program``, that is separated from other part of the name
by double underscore.

All objects and corresponding functions other than BTF related should go
to ``libbpf.h``. BTF types and functions should go to ``btf.h``.

Expand All @@ -72,11 +68,7 @@ of both low-level ring access functions and high-level configuration
functions. These can be mixed and matched. Note that these functions
are not reentrant for performance reasons.

Please take a look at Documentation/networking/af_xdp.rst in the Linux
kernel source tree on how to use XDP sockets and for some common
mistakes in case you do not get any traffic up to user space.

libbpf ABI
ABI
==========

libbpf can be both linked statically or used as DSO. To avoid possible
Expand Down Expand Up @@ -116,7 +108,8 @@ This bump in ABI version is at most once per kernel development cycle.

For example, if current state of ``libbpf.map`` is:

.. code-block::
.. code-block:: c
LIBBPF_0.0.1 {
global:
bpf_func_a;
Expand All @@ -128,7 +121,8 @@ For example, if current state of ``libbpf.map`` is:
, and a new symbol ``bpf_func_c`` is being introduced, then
``libbpf.map`` should be changed like this:

.. code-block::
.. code-block:: c
LIBBPF_0.0.1 {
global:
bpf_func_a;
Expand All @@ -148,7 +142,7 @@ Format of version script and ways to handle ABI changes, including
incompatible ones, described in details in [1].

Stand-alone build
=================
-------------------

Under https://github.com/libbpf/libbpf there is a (semi-)automated
mirror of the mainline's version of libbpf for a stand-alone build.
Expand All @@ -157,12 +151,12 @@ However, all changes to libbpf's code base must be upstreamed through
the mainline kernel tree.

License
=======
-------------------

libbpf is dual-licensed under LGPL 2.1 and BSD 2-Clause.

Links
=====
-------------------

[1] https://www.akkadia.org/drepper/dsohowto.pdf
(Chapter 3. Maintaining APIs and ABIs).
Loading

0 comments on commit dbe69e4

Please sign in to comment.