Skip to content

Commit

Permalink
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kerne…
Browse files Browse the repository at this point in the history
…l/git/netdev/net-next

Pull networking updates from Jakub Kicinski:

 - Add redirect_neigh() BPF packet redirect helper, allowing to limit
   stack traversal in common container configs and improving TCP
   back-pressure.

   Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.

 - Expand netlink policy support and improve policy export to user
   space. (Ge)netlink core performs request validation according to
   declared policies. Expand the expressiveness of those policies
   (min/max length and bitmasks). Allow dumping policies for particular
   commands. This is used for feature discovery by user space (instead
   of kernel version parsing or trial and error).

 - Support IGMPv3/MLDv2 multicast listener discovery protocols in
   bridge.

 - Allow more than 255 IPv4 multicast interfaces.

 - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
   packets of TCPv6.

 - In Multi-patch TCP (MPTCP) support concurrent transmission of data on
   multiple subflows in a load balancing scenario. Enhance advertising
   addresses via the RM_ADDR/ADD_ADDR options.

 - Support SMC-Dv2 version of SMC, which enables multi-subnet
   deployments.

 - Allow more calls to same peer in RxRPC.

 - Support two new Controller Area Network (CAN) protocols - CAN-FD and
   ISO 15765-2:2016.

 - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
   kernel problem.

 - Add TC actions for implementing MPLS L2 VPNs.

 - Improve nexthop code - e.g. handle various corner cases when nexthop
   objects are removed from groups better, skip unnecessary
   notifications and make it easier to offload nexthops into HW by
   converting to a blocking notifier.

 - Support adding and consuming TCP header options by BPF programs,
   opening the doors for easy experimental and deployment-specific TCP
   option use.

 - Reorganize TCP congestion control (CC) initialization to simplify
   life of TCP CC implemented in BPF.

 - Add support for shipping BPF programs with the kernel and loading
   them early on boot via the User Mode Driver mechanism, hence reusing
   all the user space infra we have.

 - Support sleepable BPF programs, initially targeting LSM and tracing.

 - Add bpf_d_path() helper for returning full path for given 'struct
   path'.

 - Make bpf_tail_call compatible with bpf-to-bpf calls.

 - Allow BPF programs to call map_update_elem on sockmaps.

 - Add BPF Type Format (BTF) support for type and enum discovery, as
   well as support for using BTF within the kernel itself (current use
   is for pretty printing structures).

 - Support listing and getting information about bpf_links via the bpf
   syscall.

 - Enhance kernel interfaces around NIC firmware update. Allow
   specifying overwrite mask to control if settings etc. are reset
   during update; report expected max time operation may take to users;
   support firmware activation without machine reboot incl. limits of
   how much impact reset may have (e.g. dropping link or not).

 - Extend ethtool configuration interface to report IEEE-standard
   counters, to limit the need for per-vendor logic in user space.

 - Adopt or extend devlink use for debug, monitoring, fw update in many
   drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
   dpaa2-eth).

 - In mlxsw expose critical and emergency SFP module temperature alarms.
   Refactor port buffer handling to make the defaults more suitable and
   support setting these values explicitly via the DCBNL interface.

 - Add XDP support for Intel's igb driver.

 - Support offloading TC flower classification and filtering rules to
   mscc_ocelot switches.

 - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
   fixed interval period pulse generator and one-step timestamping in
   dpaa-eth.

 - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
   offload.

 - Add Lynx PHY/PCS MDIO module, and convert various drivers which have
   this HW to use it. Convert mvpp2 to split PCS.

 - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
   7-port Mediatek MT7531 IP.

 - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
   and wcn3680 support in wcn36xx.

 - Improve performance for packets which don't require much offloads on
   recent Mellanox NICs by 20% by making multiple packets share a
   descriptor entry.

 - Move chelsio inline crypto drivers (for TLS and IPsec) from the
   crypto subtree to drivers/net. Move MDIO drivers out of the phy
   directory.

 - Clean up a lot of W=1 warnings, reportedly the actively developed
   subsections of networking drivers should now build W=1 warning free.

 - Make sure drivers don't use in_interrupt() to dynamically adapt their
   code. Convert tasklets to use new tasklet_setup API (sadly this
   conversion is not yet complete).

* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
  Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
  net, sockmap: Don't call bpf_prog_put() on NULL pointer
  bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
  bpf, sockmap: Add locking annotations to iterator
  netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
  net: fix pos incrementment in ipv6_route_seq_next
  net/smc: fix invalid return code in smcd_new_buf_create()
  net/smc: fix valid DMBE buffer sizes
  net/smc: fix use-after-free of delayed events
  bpfilter: Fix build error with CONFIG_BPFILTER_UMH
  cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
  net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
  bpf: Fix register equivalence tracking.
  rxrpc: Fix loss of final ack on shutdown
  rxrpc: Fix bundle counting for exclusive connections
  netfilter: restore NF_INET_NUMHOOKS
  ibmveth: Identify ingress large send packets.
  ibmveth: Switch order of ibmveth_helper calls.
  cxgb4: handle 4-tuple PEDIT to NAT mode translation
  selftests: Add VRF route leaking tests
  ...
  • Loading branch information
torvalds committed Oct 16, 2020
2 parents 840e5bb + 105faa8 commit 9ff9b0d
Show file tree
Hide file tree
Showing 2,301 changed files with 130,475 additions and 51,396 deletions.
4 changes: 4 additions & 0 deletions CREDITS
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,10 @@ N: Krishna Balasubramanian
E: [email protected]
D: Wrote SYS V IPC (part of standard kernel since 0.99.10)

B: Robert Baldyga
E: [email protected]
D: Samsung S3FWRN5 NCI NFC Controller

N: Chris Ball
E: [email protected]
D: Former maintainer of the MMC/SD/SDIO subsystem.
Expand Down
5 changes: 5 additions & 0 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1349,6 +1349,11 @@
Format: <interval>,<probability>,<space>,<times>
See also Documentation/fault-injection/.

fb_tunnels= [NET]
Format: { initns | none }
See Documentation/admin-guide/sysctl/net.rst for
fb_tunnels_only_for_init_ns

floppy= [HW]
See Documentation/admin-guide/blockdev/floppy.rst.

Expand Down
20 changes: 14 additions & 6 deletions Documentation/admin-guide/sysctl/net.rst
Original file line number Diff line number Diff line change
Expand Up @@ -300,7 +300,6 @@ Note:
0: 0 1 2 3 4 5 6 7
RSS hash key:
84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89

netdev_tstamp_prequeue
----------------------

Expand All @@ -321,11 +320,20 @@ fb_tunnels_only_for_init_net
----------------------------

Controls if fallback tunnels (like tunl0, gre0, gretap0, erspan0,
sit0, ip6tnl0, ip6gre0) are automatically created when a new
network namespace is created, if corresponding tunnel is present
in initial network namespace.
If set to 1, these devices are not automatically created, and
user space is responsible for creating them if needed.
sit0, ip6tnl0, ip6gre0) are automatically created. There are 3 possibilities
(a) value = 0; respective fallback tunnels are created when module is
loaded in every net namespaces (backward compatible behavior).
(b) value = 1; [kcmd value: initns] respective fallback tunnels are
created only in init net namespace and every other net namespace will
not have them.
(c) value = 2; [kcmd value: none] fallback tunnels are not created
when a module is loaded in any of the net namespace. Setting value to
"2" is pointless after boot if these modules are built-in, so there is
a kernel command-line option that can change this default. Please refer to
Documentation/admin-guide/kernel-parameters.txt for additional details.

Not creating fallback tunnels gives control to userspace to create
whatever is needed only and avoid creating devices which are redundant.

Default : 0 (for compatibility reasons)

Expand Down
23 changes: 14 additions & 9 deletions Documentation/bpf/bpf_devel_QA.rst
Original file line number Diff line number Diff line change
Expand Up @@ -60,13 +60,13 @@ Q: Where can I find patches currently under discussion for BPF subsystem?
A: All patches that are Cc'ed to netdev are queued for review under netdev
patchwork project:

http://patchwork.ozlabs.org/project/netdev/list/
https://patchwork.kernel.org/project/netdevbpf/list/

Those patches which target BPF, are assigned to a 'bpf' delegate for
further processing from BPF maintainers. The current queue with
patches under review can be found at:

https://patchwork.ozlabs.org/project/netdev/list/?delegate=77147
https://patchwork.kernel.org/project/netdevbpf/list/?delegate=121173

Once the patches have been reviewed by the BPF community as a whole
and approved by the BPF maintainers, their status in patchwork will be
Expand Down Expand Up @@ -149,7 +149,7 @@ In case the patch or patch series has to be reworked and sent out
again in a second or later revision, it is also required to add a
version number (``v2``, ``v3``, ...) into the subject prefix::

git format-patch --subject-prefix='PATCH net-next v2' start..finish
git format-patch --subject-prefix='PATCH bpf-next v2' start..finish

When changes have been requested to the patch series, always send the
whole patch series again with the feedback incorporated (never send
Expand Down Expand Up @@ -479,17 +479,18 @@ LLVM's static compiler lists the supported targets through

$ llc --version
LLVM (http://llvm.org/):
LLVM version 6.0.0svn
LLVM version 10.0.0
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake

Registered Targets:
bpf - BPF (host endian)
bpfeb - BPF (big endian)
bpfel - BPF (little endian)
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
aarch64 - AArch64 (little endian)
bpf - BPF (host endian)
bpfeb - BPF (big endian)
bpfel - BPF (little endian)
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64

For developers in order to utilize the latest features added to LLVM's
BPF back end, it is advisable to run the latest LLVM releases. Support
Expand Down Expand Up @@ -517,6 +518,10 @@ from the git repositories::
The built binaries can then be found in the build/bin/ directory, where
you can point the PATH variable to.

Set ``-DLLVM_TARGETS_TO_BUILD`` equal to the target you wish to build, you
will find a full list of targets within the llvm-project/llvm/lib/Target
directory.

Q: Reporting LLVM BPF issues
----------------------------
Q: Should I notify BPF kernel maintainers about issues in LLVM's BPF code
Expand Down
25 changes: 25 additions & 0 deletions Documentation/bpf/btf.rst
Original file line number Diff line number Diff line change
Expand Up @@ -724,6 +724,31 @@ want to define unused entry in BTF_ID_LIST, like::
BTF_ID_UNUSED
BTF_ID(struct, task_struct)

The ``BTF_SET_START/END`` macros pair defines sorted list of BTF ID values
and their count, with following syntax::

BTF_SET_START(set)
BTF_ID(type1, name1)
BTF_ID(type2, name2)
BTF_SET_END(set)

resulting in following layout in .BTF_ids section::

__BTF_ID__set__set:
.zero 4
__BTF_ID__type1__name1__3:
.zero 4
__BTF_ID__type2__name2__4:
.zero 4

The ``struct btf_id_set set;`` variable is defined to access the list.

The ``typeX`` name can be one of following::

struct, union, typedef, func

and is used as a filter when resolving the BTF ID value.

All the BTF ID lists and sets are compiled in the .BTF_ids section and
resolved during the linking phase of kernel build by ``resolve_btfids`` tool.

Expand Down
1 change: 1 addition & 0 deletions Documentation/bpf/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ Program types
prog_cgroup_sysctl
prog_flow_dissector
bpf_lsm
prog_sk_lookup


Map types
Expand Down
98 changes: 98 additions & 0 deletions Documentation/bpf/prog_sk_lookup.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
.. SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
=====================
BPF sk_lookup program
=====================

BPF sk_lookup program type (``BPF_PROG_TYPE_SK_LOOKUP``) introduces programmability
into the socket lookup performed by the transport layer when a packet is to be
delivered locally.

When invoked BPF sk_lookup program can select a socket that will receive the
incoming packet by calling the ``bpf_sk_assign()`` BPF helper function.

Hooks for a common attach point (``BPF_SK_LOOKUP``) exist for both TCP and UDP.

Motivation
==========

BPF sk_lookup program type was introduced to address setup scenarios where
binding sockets to an address with ``bind()`` socket call is impractical, such
as:

1. receiving connections on a range of IP addresses, e.g. 192.0.2.0/24, when
binding to a wildcard address ``INADRR_ANY`` is not possible due to a port
conflict,
2. receiving connections on all or a wide range of ports, i.e. an L7 proxy use
case.

Such setups would require creating and ``bind()``'ing one socket to each of the
IP address/port in the range, leading to resource consumption and potential
latency spikes during socket lookup.

Attachment
==========

BPF sk_lookup program can be attached to a network namespace with
``bpf(BPF_LINK_CREATE, ...)`` syscall using the ``BPF_SK_LOOKUP`` attach type and a
netns FD as attachment ``target_fd``.

Multiple programs can be attached to one network namespace. Programs will be
invoked in the same order as they were attached.

Hooks
=====

The attached BPF sk_lookup programs run whenever the transport layer needs to
find a listening (TCP) or an unconnected (UDP) socket for an incoming packet.

Incoming traffic to established (TCP) and connected (UDP) sockets is delivered
as usual without triggering the BPF sk_lookup hook.

The attached BPF programs must return with either ``SK_PASS`` or ``SK_DROP``
verdict code. As for other BPF program types that are network filters,
``SK_PASS`` signifies that the socket lookup should continue on to regular
hashtable-based lookup, while ``SK_DROP`` causes the transport layer to drop the
packet.

A BPF sk_lookup program can also select a socket to receive the packet by
calling ``bpf_sk_assign()`` BPF helper. Typically, the program looks up a socket
in a map holding sockets, such as ``SOCKMAP`` or ``SOCKHASH``, and passes a
``struct bpf_sock *`` to ``bpf_sk_assign()`` helper to record the
selection. Selecting a socket only takes effect if the program has terminated
with ``SK_PASS`` code.

When multiple programs are attached, the end result is determined from return
codes of all the programs according to the following rules:

1. If any program returned ``SK_PASS`` and selected a valid socket, the socket
is used as the result of the socket lookup.
2. If more than one program returned ``SK_PASS`` and selected a socket, the last
selection takes effect.
3. If any program returned ``SK_DROP``, and no program returned ``SK_PASS`` and
selected a socket, socket lookup fails.
4. If all programs returned ``SK_PASS`` and none of them selected a socket,
socket lookup continues on.

API
===

In its context, an instance of ``struct bpf_sk_lookup``, BPF sk_lookup program
receives information about the packet that triggered the socket lookup. Namely:

* IP version (``AF_INET`` or ``AF_INET6``),
* L4 protocol identifier (``IPPROTO_TCP`` or ``IPPROTO_UDP``),
* source and destination IP address,
* source and destination L4 port,
* the socket that has been selected with ``bpf_sk_assign()``.

Refer to ``struct bpf_sk_lookup`` declaration in ``linux/bpf.h`` user API
header, and `bpf-helpers(7)
<https://man7.org/linux/man-pages/man7/bpf-helpers.7.html>`_ man-page section
for ``bpf_sk_assign()`` for details.

Example
=======

See ``tools/testing/selftests/bpf/prog_tests/sk_lookup.c`` for the reference
implementation.
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,13 @@ Optional properties:
- reset-names: If the "reset" property is specified, this property should have
the value "switch" to denote the switch reset line.

- clocks: when provided, the first phandle is to the switch's main clock and
is valid for both BCM7445 and BCM7278. The second phandle is only applicable
to BCM7445 and is to support dividing the switch core clock.

- clock-names: when provided, the first phandle must be "sw_switch", and the
second must be named "sw_switch_mdiv".

Port subnodes:

Optional properties:
Expand Down
5 changes: 5 additions & 0 deletions Documentation/devicetree/bindings/net/brcm,systemport.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,11 @@ Optional properties:
- systemport,num-tier1-arb: number of tier 1 arbiters, an integer
- systemport,num-txq: number of HW transmit queues, an integer
- systemport,num-rxq: number of HW receive queues, an integer
- clocks: When provided, must be two phandles to the functional clocks nodes of
the SYSTEMPORT block. The first phandle is the main SYSTEMPORT clock used
during normal operation, while the second phandle is the Wake-on-LAN clock.
- clock-names: When provided, names of the functional clock phandles, first
name should be "sw_sysport" and second should be "sw_sysportwol".

Example:
ethernet@f04a0000 {
Expand Down
10 changes: 7 additions & 3 deletions Documentation/devicetree/bindings/net/can/fsl-flexcan.txt
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,12 @@ Required properties:

- compatible : Should be "fsl,<processor>-flexcan"

where <processor> is imx8qm, imx6q, imx28, imx53, imx35, imx25, p1010,
vf610, ls1021ar2, lx2160ar1, ls1028ar1.

The ls1028ar1 must be followed by lx2160ar1, e.g.
- "fsl,ls1028ar1-flexcan", "fsl,lx2160ar1-flexcan"

An implementation should also claim any of the following compatibles
that it is fully backwards compatible with:

Expand All @@ -25,12 +31,10 @@ Optional properties:
endian.

- fsl,stop-mode: register bits of stop mode control, the format is
<&gpr req_gpr req_bit ack_gpr ack_bit>.
<&gpr req_gpr req_bit>.
gpr is the phandle to general purpose register node.
req_gpr is the gpr register offset of CAN stop request.
req_bit is the bit offset of CAN stop request.
ack_gpr is the gpr register offset of CAN stop acknowledge.
ack_bit is the bit offset of CAN stop acknowledge.

- fsl,clk-source: Select the clock source to the CAN Protocol Engine (PE).
It's SoC Implementation dependent. Refer to RM for detailed
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,14 +12,19 @@ Required properties:
Optional properties:
- vdd-supply: Regulator that powers the CAN controller.
- xceiver-supply: Regulator that powers the CAN transceiver.
- gpio-controller: Indicates this device is a GPIO controller.
- #gpio-cells: Should be two. The first cell is the pin number and
the second cell is used to specify the gpio polarity.

Example:
can0: can@1 {
compatible = "microchip,mcp2515";
reg = <1>;
clocks = <&clk24m>;
interrupt-parent = <&gpio4>;
interrupts = <13 0x2>;
interrupts = <13 IRQ_TYPE_LEVEL_LOW>;
vdd-supply = <&reg5v0>;
xceiver-supply = <&reg5v0>;
gpio-controller;
#gpio-cells = <2>;
};
Loading

0 comments on commit 9ff9b0d

Please sign in to comment.