Merge tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking changes from Paolo Abeni:
 "Core:

   - Refactor the forward memory allocation to better cope with memory
     pressure with many open sockets, moving from a per socket cache to
     a per-CPU one

   - Replace rwlocks with RCU for better fairness in ping, raw sockets
     and IP multicast router.

   - Network-side support for io_uring zero-copy send.

   - A few skb drop reason improvements, including generating the source
     file with the string mapping (codegen) instead of using macro magic.

   - Rename reference tracking helpers to a more consistent netdev_*
     schema.

   - Adapt u64_stats_t type to address load/store tearing issues.

   - Refine debug helper usage to reduce the log noise caused by bots.

  BPF:

   - Improve socket map performance, avoiding skb cloning on read
     operation.

   - Add support for 64 bits enum, to match types exposed by kernel.

   - Introduce support for sleepable uprobe programs.

   - Introduce support for enum textual representation in libbpf.

   - New helpers to implement synproxy with eBPF/XDP.

   - Improve loop performance, inlining indirect calls when possible.

   - Removed all the deprecated libbpf APIs.

   - Implement new eBPF-based LSM flavor.

   - Add type match support, which allows accurate queries of the types
     used by eBPF.

   - A few TCP congestion control framework usability improvements.

   - Add new infrastructure to manipulate CT entries via eBPF programs.

   - Allow for livepatch (KLP) and BPF trampolines to attach to the same
     kernel function.

  Protocols:

   - Introduce per network namespace lookup tables for unix sockets,
     increasing scalability and reducing contention.

   - Preparation work for Wi-Fi 7 Multi-Link Operation (MLO) support.

   - Add support to forcibly close TIME_WAIT TCP sockets via user-space
     tools.

   - Significant performance improvement for the TLS 1.3 receive path,
     both for zero-copy and not-zero-copy.

   - Support for changing the initial MPTCP subflow priority/backup
     status

   - Introduce virtually contiguous buffers for sockets over RDMA, to
     cope better with memory pressure.

   - Extend CAN ethtool support with timestamping capabilities

   - Refactor CAN build infrastructure to allow building only the needed
     features.

  Driver API:

   - Remove devlink mutex to allow parallel commands on multiple links.

   - Add support for pause stats in distributed switch.

   - Implement devlink helpers to query and flash line cards.

   - New helper for phy mode to register conversion.

  New hardware / drivers:

   - Ethernet DSA driver for the rockchip mt7531 on BPI-R2 Pro.

   - Ethernet DSA driver for the Renesas RZ/N1 A5PSW switch.

   - Ethernet DSA driver for the Microchip LAN937x switch.

   - Ethernet PHY driver for the Aquantia AQR113C EPHY.

   - CAN driver for the OBD-II ELM327 interface.

   - CAN driver for RZ/N1 SJA1000 CAN controller.

   - Bluetooth: Infineon CYW55572 Wi-Fi plus Bluetooth combo device.

  Drivers:

   - Intel Ethernet NICs:
      - i40e: add support for vlan pruning
      - i40e: add support for XDP fragmented packets
      - ice: improved vlan offload support
      - ice: add support for PPPoE offload

   - Mellanox Ethernet (mlx5)
      - refactor packet steering offload for performance and scalability
      - extend support for TC offload
      - refactor devlink code to clean-up the locking schema
      - support stacked vlans for bridge offloads
      - use TLS objects pool to improve connection rate

   - Netronome Ethernet NICs (nfp):
      - extend support for IPv6 fields mangling offload
      - add support for vepa mode in HW bridge
      - better support for virtio data path acceleration (VDPA)
      - enable TSO by default

   - Microsoft vNIC driver (mana)
      - add support for XDP redirect

   - Other Ethernet drivers:
      - bonding: add per-port priority support
      - microchip lan743x: extend phy support
      - Fungible funeth: support UDP segmentation offload and XDP xmit
      - Solarflare EF100: add support for virtual function representors
      - MediaTek SoC: add XDP support

   - Mellanox Ethernet/IB switch (mlxsw):
      - dropped support for unreleased H/W (XM router).
      - improved stats accuracy
      - unified bridge model conversion improving scalability (parts 1-6)
      - support for PTP in Spectrum-2 asics

   - Broadcom PHYs
      - add PTP support for BCM54210E
      - add support for the BCM53128 internal PHY

   - Marvell Ethernet switches (prestera):
      - implement support for multicast forwarding offload

   - Embedded Ethernet switches:
      - refactor OcteonTx MAC filter for better scalability
      - improve TC H/W offload for the Felix driver
      - refactor the Microchip ksz8 and ksz9477 drivers to share the
        probe code (parts 1, 2), add support for phylink mac
        configuration

   - Other WiFi:
      - Microchip wilc1000: disable WEP support and enable WPA3
      - Atheros ath10k: encapsulation offload support

  Old code removal:

   - Neterion vxge ethernet driver: untouched for more than 10 years"

* tag 'net-next-6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1890 commits)
  doc: sfp-phylink: Fix a broken reference
  wireguard: selftests: support UML
  wireguard: allowedips: don't corrupt stack when detecting overflow
  wireguard: selftests: update config fragments
  wireguard: ratelimiter: use hrtimer in selftest
  net/mlx5e: xsk: Discard unaligned XSK frames on striding RQ
  net: usb: ax88179_178a: Bind only to vendor-specific interface
  selftests: net: fix IOAM test skip return code
  net: usb: make USB_RTL8153_ECM non user configurable
  net: marvell: prestera: remove reduntant code
  octeontx2-pf: Reduce minimum mtu size to 60
  net: devlink: Fix missing mutex_unlock() call
  net/tls: Remove redundant workqueue flush before destroy
  net: txgbe: Fix an error handling path in txgbe_probe()
  net: dsa: Fix spelling mistakes and cleanup code
  Documentation: devlink: add add devlink-selftests to the table of contents
  dccp: put dccp_qpolicy_full() and dccp_qpolicy_push() in the same lock
  net: ionic: fix error check for vlan flags in ionic_set_nic_features()
  net: ice: fix error NETIF_F_HW_VLAN_CTAG_FILTER check in ice_vsi_sync_fltr()
  nfp: flower: add support for tunnel offload without key ID
  ...
torvalds committed Aug 3, 2022
2 parents 526942b + 7c6327c commit f86d1fb
Showing 1,753 changed files with 93,690 additions and 64,652 deletions.
62 changes: 49 additions & 13 deletions Documentation/ABI/testing/sysfs-devices-platform-soc-ipa
@@ -46,33 +46,69 @@ Description:
that is supported by the hardware. The possible values
are "MAPv4" or "MAPv5".

What: .../XXXXXXX.ipa/endpoint_id/
Date: July 2022
KernelVersion: v5.19
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/endpoint_id/ directory contains
attributes that define IDs associated with IPA
endpoints. The "rx" or "tx" in an endpoint name is
from the perspective of the AP. An endpoint ID is a
small unsigned integer.

What: .../XXXXXXX.ipa/endpoint_id/modem_rx
Date: July 2022
KernelVersion: v5.19
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/endpoint_id/modem_rx file contains
the ID of the AP endpoint on which packets originating
from the embedded modem are received.

What: .../XXXXXXX.ipa/endpoint_id/modem_tx
Date: July 2022
KernelVersion: v5.19
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/endpoint_id/modem_tx file contains
the ID of the AP endpoint on which packets destined
for the embedded modem are sent.

What: .../XXXXXXX.ipa/endpoint_id/monitor_rx
Date: July 2022
KernelVersion: v5.19
Contact: Alex Elder <[email protected]>
Description:
The .../XXXXXXX.ipa/endpoint_id/monitor_rx file contains
the ID of the AP endpoint on which IPA "monitor" data is
received. The monitor endpoint supplies replicas of
packets that enter the IPA hardware for processing.
Each replicated packet is preceded by a fixed-size "ODL"
header (see .../XXXXXXX.ipa/feature/monitor, above).
Large packets are truncated, to reduce the bandwidth
required to provide the monitor function.

What: .../XXXXXXX.ipa/modem/
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
-The .../XXXXXXX.ipa/modem/ directory contains a set of
-attributes describing properties of the modem execution
-environment reachable by the IPA hardware.
+The .../XXXXXXX.ipa/modem/ directory contains attributes
+describing properties of the modem embedded in the SoC.

What: .../XXXXXXX.ipa/modem/rx_endpoint_id
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
-The .../XXXXXXX.ipa/feature/rx_endpoint_id file contains
-the AP endpoint ID that receives packets originating from
-the modem execution environment. The "rx" is from the
-perspective of the AP; this endpoint is considered an "IPA
-producer". An endpoint ID is a small unsigned integer.
+The .../XXXXXXX.ipa/modem/rx_endpoint_id file duplicates
+the value found in .../XXXXXXX.ipa/endpoint_id/modem_rx.

What: .../XXXXXXX.ipa/modem/tx_endpoint_id
Date: June 2021
KernelVersion: v5.14
Contact: Alex Elder <[email protected]>
Description:
-The .../XXXXXXX.ipa/feature/tx_endpoint_id file contains
-the AP endpoint ID used to transmit packets destined for
-the modem execution environment. The "tx" is from the
-perspective of the AP; this endpoint is considered an "IPA
-consumer". An endpoint ID is a small unsigned integer.
+The .../XXXXXXX.ipa/modem/tx_endpoint_id file duplicates
+the value found in .../XXXXXXX.ipa/endpoint_id/modem_tx.
12 changes: 12 additions & 0 deletions Documentation/admin-guide/sysctl/net.rst
@@ -391,6 +391,18 @@ GRO has decided not to coalesce, it is placed on a per-NAPI list. This
list is then passed to the stack when the number of segments reaches the
gro_normal_batch limit.

high_order_alloc_disable
------------------------

By default the allocator for page frags tries to use high order pages (order-3
on x86). While the default behavior gives good results in most cases, some users
might have hit a contention in page allocations/freeing. This was especially
true on older kernels (< 5.14) when high-order pages were not stored on per-cpu
lists. This allows to opt-in for order-0 allocation instead but is now mostly of
historical importance.

Default: 0

2. /proc/sys/net/unix - Parameters for Unix domain sockets
----------------------------------------------------------

49 changes: 42 additions & 7 deletions Documentation/bpf/btf.rst
@@ -74,7 +74,7 @@ sequentially and type id is assigned to each recognized type starting from id
#define BTF_KIND_ARRAY 3 /* Array */
#define BTF_KIND_STRUCT 4 /* Struct */
#define BTF_KIND_UNION 5 /* Union */
-#define BTF_KIND_ENUM 6 /* Enumeration */
+#define BTF_KIND_ENUM 6 /* Enumeration up to 32-bit values */
#define BTF_KIND_FWD 7 /* Forward */
#define BTF_KIND_TYPEDEF 8 /* Typedef */
#define BTF_KIND_VOLATILE 9 /* Volatile */
@@ -87,6 +87,7 @@ sequentially and type id is assigned to each recognized type starting from id
#define BTF_KIND_FLOAT 16 /* Floating point */
#define BTF_KIND_DECL_TAG 17 /* Decl Tag */
#define BTF_KIND_TYPE_TAG 18 /* Type Tag */
#define BTF_KIND_ENUM64 19 /* Enumeration up to 64-bit values */

Note that the type section encodes debug info, not just pure types.
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
@@ -101,10 +102,10 @@ Each type contains the following common data::
* bits 24-28: kind (e.g. int, ptr, array...etc)
* bits 29-30: unused
* bit 31: kind_flag, currently used by
-* struct, union and fwd
+* struct, union, fwd, enum and enum64.
*/
__u32 info;
-/* "size" is used by INT, ENUM, STRUCT and UNION.
+/* "size" is used by INT, ENUM, STRUCT, UNION and ENUM64.
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
@@ -281,10 +282,10 @@ modes exist:

``struct btf_type`` encoding requirement:
* ``name_off``: 0 or offset to a valid C identifier
-* ``info.kind_flag``: 0
+* ``info.kind_flag``: 0 for unsigned, 1 for signed
* ``info.kind``: BTF_KIND_ENUM
* ``info.vlen``: number of enum values
-* ``size``: 4
+* ``size``: 1/2/4/8

``btf_type`` is followed by ``info.vlen`` number of ``struct btf_enum``.::

@@ -297,6 +298,10 @@ The ``btf_enum`` encoding:
* ``name_off``: offset to a valid C identifier
* ``val``: any value

If the original enum value is signed and the size is less than 4,
that value will be sign extended into 4 bytes. If the size is 8,
the value will be truncated into 4 bytes.

2.2.7 BTF_KIND_FWD
~~~~~~~~~~~~~~~~~~

@@ -364,7 +369,8 @@ No additional type data follow ``btf_type``.
* ``name_off``: offset to a valid C identifier
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_FUNC
-* ``info.vlen``: 0
+* ``info.vlen``: linkage information (BTF_FUNC_STATIC, BTF_FUNC_GLOBAL
+or BTF_FUNC_EXTERN)
* ``type``: a BTF_KIND_FUNC_PROTO type

No additional type data follow ``btf_type``.
@@ -375,6 +381,9 @@ type. The BTF_KIND_FUNC may in turn be referenced by a func_info in the
:ref:`BTF_Ext_Section` (ELF) or in the arguments to :ref:`BPF_Prog_Load`
(ABI).

Currently, only linkage values of BTF_FUNC_STATIC and BTF_FUNC_GLOBAL are
supported in the kernel.

2.2.13 BTF_KIND_FUNC_PROTO
~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -493,7 +502,7 @@ the attribute is applied to a ``struct``/``union`` member or
a ``func`` argument, and ``btf_decl_tag.component_idx`` should be a
valid index (starting from 0) pointing to a member or an argument.

-2.2.17 BTF_KIND_TYPE_TAG
+2.2.18 BTF_KIND_TYPE_TAG
~~~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
@@ -516,6 +525,32 @@ type_tag, then zero or more const/volatile/restrict/typedef
and finally the base type. The base type is one of
int, ptr, array, struct, union, enum, func_proto and float types.

2.2.19 BTF_KIND_ENUM64
~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
* ``name_off``: 0 or offset to a valid C identifier
* ``info.kind_flag``: 0 for unsigned, 1 for signed
* ``info.kind``: BTF_KIND_ENUM64
* ``info.vlen``: number of enum values
* ``size``: 1/2/4/8

``btf_type`` is followed by ``info.vlen`` number of ``struct btf_enum64``.::

struct btf_enum64 {
__u32 name_off;
__u32 val_lo32;
__u32 val_hi32;
};

The ``btf_enum64`` encoding:
* ``name_off``: offset to a valid C identifier
* ``val_lo32``: lower 32-bit value for a 64-bit value
* ``val_hi32``: high 32-bit value for a 64-bit value

If the original enum value is signed and the size is less than 8,
that value will be sign extended into 8 bytes.

3. BTF Kernel API
=================

1 change: 1 addition & 0 deletions Documentation/bpf/index.rst
@@ -19,6 +19,7 @@ that goes into great technical depth about the BPF Architecture.
faq
syscall_api
helpers
kfuncs
programs
maps
bpf_prog_run
4 changes: 2 additions & 2 deletions Documentation/bpf/instruction-set.rst
@@ -127,7 +127,7 @@ BPF_XOR | BPF_K | BPF_ALU64 means::
Byte swap instructions
----------------------

-The byte swap instructions use an instruction class of ``BFP_ALU`` and a 4-bit
+The byte swap instructions use an instruction class of ``BPF_ALU`` and a 4-bit
code field of ``BPF_END``.

The byte swap instructions operate on the destination register
@@ -351,7 +351,7 @@ These instructions have seven implicit operands:
* Register R0 is an implicit output which contains the data fetched from
the packet.
* Registers R1-R5 are scratch registers that are clobbered after a call to
-``BPF_ABS | BPF_LD`` or ``BPF_IND`` | BPF_LD instructions.
+``BPF_ABS | BPF_LD`` or ``BPF_IND | BPF_LD`` instructions.

These instructions have an implicit program exit condition as well. When an
eBPF program is trying to access the data beyond the packet boundary, the