Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-07-03

The following pull-request contains BPF updates for your *net-next* tree.

There is a minor merge conflict in mlx5 due to 8960b38 ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but resolution seems not that bad ... getting current
bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:

** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
  static int mlx5e_open_cq(struct mlx5e_channel *c,
                           struct dim_cq_moder moder,
                           struct mlx5e_cq_param *param,
                           struct mlx5e_cq *cq)
  =======
  int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
                    struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
  >>>>>>> e5a3e25

Resolution is to take the second chunk and rename net_dim_cq_moder into
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...

  drivers/net/ethernet/mellanox/mlx5/core/en.h +977

... and in mlx5e_open_xsk() ...

  drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64

... needs the same rename from net_dim_cq_moder into dim_cq_moder.

** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:

  <<<<<<< HEAD
  int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
  struct dim_cq_moder icocq_moder = {0, 0};
  struct net_device *netdev = priv->netdev;
  struct mlx5e_channel *c;
  unsigned int irq;
  =======
  struct net_dim_cq_moder icocq_moder = {0, 0};
  >>>>>>> e5a3e25

Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
as well.

Let me know if you run into any issues. Anyway, the main changes are:

1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.

2) Addition of two new per-cgroup BPF hooks for getsockopt and
   setsockopt along with a new sockopt program type which allows more
   fine-grained pass/reject settings for containers. Also add a sock_ops
   callback that can be selectively enabled on a per-socket basis and is
   executed for every RTT to help tracking TCP statistics, both features
   from Stanislav.

3) Follow-up fix from loops in precision tracking which was not
   propagating precision marks and as a result verifier assumed that some
   branches were not taken and therefore wrongly removed as dead code,
   from Alexei.

4) Fix BPF cgroup release synchronization race which could lead to a
   double-free if a leaf's cgroup_bpf object is released and a new BPF
   program is attached to the one of ancestor cgroups in parallel, from
   Roman.

5) Support for bulking XDP_TX on veth devices which improves performance
   in some cases by around 9%, from Toshiaki.

6) Allow for lookups into BPF devmap and improve feedback when calling
   into bpf_redirect_map() as lookup is now performed right away in the
   helper itself, from Toke.

7) Add support for fq's Earliest Departure Time to the Host Bandwidth
   Manager (HBM) sample BPF program, from Lawrence.

8) Various cleanups and minor fixes all over the place from many others.
====================

Signed-off-by: David S. Miller <[email protected]>
Showing 98 changed files with 6,197 additions and 841 deletions.

@@ -42,6 +42,7 @@ Program types
.. toctree::
   :maxdepth: 1

   prog_cgroup_sockopt
   prog_cgroup_sysctl
   prog_flow_dissector

@@ -0,0 +1,93 @@
.. SPDX-License-Identifier: GPL-2.0

============================
BPF_PROG_TYPE_CGROUP_SOCKOPT
============================

The ``BPF_PROG_TYPE_CGROUP_SOCKOPT`` program type can be attached to two
cgroup hooks:

* ``BPF_CGROUP_GETSOCKOPT`` - called every time a process executes the
  ``getsockopt`` system call.
* ``BPF_CGROUP_SETSOCKOPT`` - called every time a process executes the
  ``setsockopt`` system call.

The context (``struct bpf_sockopt``) has the associated socket (``sk``) and
all input arguments: ``level``, ``optname``, ``optval`` and ``optlen``.

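For orientation, the userspace side of the two hooks can be sketched as below. This is an ordinary POSIX program, not BPF code: ``roundtrip_reuseaddr`` is a hypothetical helper name, and the choice of ``SO_REUSEADDR`` is only an illustration. When the calling process lives in a cgroup with sockopt programs attached, each of the two syscalls would pass through the corresponding hook.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Set SO_REUSEADDR on a fresh TCP socket and read it back.
 * Returns the value reported by getsockopt(), or -1 on any failure.
 * (Hypothetical helper, for illustration only.)
 */
int roundtrip_reuseaddr(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    /* A BPF_CGROUP_SETSOCKOPT program, if attached, runs before the
     * kernel handles this call. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    int value = 0;
    socklen_t len = sizeof(value);
    /* A BPF_CGROUP_GETSOCKOPT program, if attached, runs after the
     * kernel has produced its result for this call. */
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0) {
        close(fd);
        return -1;
    }

    close(fd);
    return value;
}
```

With no sockopt program attached, the helper simply reports the option as enabled (a non-zero value).
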
BPF_CGROUP_SETSOCKOPT
=====================

``BPF_CGROUP_SETSOCKOPT`` is triggered *before* the kernel handling of
sockopt and it has writable context: it can modify the supplied arguments
before passing them down to the kernel. This hook has access to the cgroup
and socket local storage.

If a BPF program sets ``optlen`` to -1, control will be returned
to userspace after all other BPF programs in the cgroup
chain finish (i.e. kernel ``setsockopt`` handling will *not* be executed).

Note that ``optlen`` cannot be increased beyond the user-supplied
value. It can only be decreased or set to -1. Any other value will
trigger ``EFAULT``.

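The ``optlen`` rule above can be modelled in plain C roughly as follows. This is a userspace sketch of the documented behaviour, not the kernel's code; the function name and the two verdict constants are hypothetical, and ``user_optlen`` is assumed to be non-negative.

```c
#include <errno.h>

/* Hypothetical verdicts for this sketch only. */
enum {
    SOCKOPT_RUN_KERNEL    = 0,  /* go on to the kernel setsockopt handler */
    SOCKOPT_BYPASS_KERNEL = 1,  /* return to userspace, skip the handler  */
};

/*
 * Classify the optlen left in the context after the BPF programs ran.
 * user_optlen: the length the caller passed to setsockopt().
 * prog_optlen: the length found in the context afterwards.
 */
int check_setsockopt_optlen(int user_optlen, int prog_optlen)
{
    if (prog_optlen == -1)
        return SOCKOPT_BYPASS_KERNEL;
    /* Growing the length, or any negative value other than -1, is invalid. */
    if (prog_optlen < 0 || prog_optlen > user_optlen)
        return -EFAULT;
    return SOCKOPT_RUN_KERNEL;
}
```

For example, with a 16-byte user buffer, lengths 0 through 16 proceed to the kernel handler, -1 bypasses it, and 32 or -2 yield ``-EFAULT``.
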
Return Type
-----------

* ``0`` - reject the syscall, ``EPERM`` will be returned to userspace.
* ``1`` - success, continue with the next BPF program in the cgroup chain.

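To make the two return values concrete, here is a sketch of the decision logic a ``setsockopt`` program might contain, written as an ordinary C function so it can be compiled anywhere. ``struct sockopt_ctx`` is a simplified stand-in for ``struct bpf_sockopt``, and both the function name and the policy (blocking ``IP_TOS`` changes) are assumptions for illustration. A real program would take ``struct bpf_sockopt *`` and be built for the BPF target.

```c
#include <netinet/in.h>   /* IPPROTO_IP, IP_TOS */
#include <sys/socket.h>   /* SOL_SOCKET, SO_REUSEADDR */

/* Simplified stand-in for the struct bpf_sockopt fields used here. */
struct sockopt_ctx {
    int level;
    int optname;
    int optlen;
};

/* Hypothetical policy: forbid changing IP_TOS, allow everything else. */
int deny_ip_tos(struct sockopt_ctx *ctx)
{
    if (ctx->level == IPPROTO_IP && ctx->optname == IP_TOS)
        return 0;   /* reject: the caller sees EPERM */
    return 1;       /* success: the next program in the chain runs */
}
```
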
BPF_CGROUP_GETSOCKOPT
=====================

``BPF_CGROUP_GETSOCKOPT`` is triggered *after* the kernel handling of
sockopt. The BPF hook can observe ``optval``, ``optlen`` and ``retval``
if it's interested in whatever the kernel has returned. The BPF hook can
override the values above, adjust ``optlen`` and reset ``retval`` to 0. If
``optlen`` has been increased above the initial ``getsockopt`` value (i.e.
the userspace buffer is too small), ``EFAULT`` is returned.

This hook has access to the cgroup and socket local storage.

Note that the only acceptable values to set ``retval`` to are 0 and the
original value that the kernel returned. Any other value will trigger
``EFAULT``.

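The two ``getsockopt`` constraints described above (``optlen`` must not grow, ``retval`` must be 0 or the kernel's original value) can be sketched as a single check. This is a plain C model of the documented rules with a hypothetical function name, not the kernel implementation; negative ``optlen`` values are not modelled.

```c
#include <errno.h>

/*
 * Validate the context after the getsockopt BPF programs have run.
 * initial_optlen: buffer length userspace passed to getsockopt().
 * kernel_retval:  what the kernel handler returned before the hook.
 * prog_optlen, prog_retval: values found in the context afterwards.
 * Returns 0 if the result may be copied to userspace, -EFAULT otherwise.
 */
int check_getsockopt_ctx(int initial_optlen, int kernel_retval,
                         int prog_optlen, int prog_retval)
{
    if (prog_optlen > initial_optlen)
        return -EFAULT;     /* would overflow the userspace buffer */
    if (prog_retval != 0 && prog_retval != kernel_retval)
        return -EFAULT;     /* only 0 or the original value is allowed */
    return 0;
}
```
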
Return Type
-----------

* ``0`` - reject the syscall, ``EPERM`` will be returned to userspace.
* ``1`` - success: copy ``optval`` and ``optlen`` to userspace, return
  ``retval`` from the syscall (note that this can be overwritten by
  the BPF program from the parent cgroup).

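As an illustration of overriding a result, the sketch below shows the logic a ``getsockopt`` program might use to answer for an option level the kernel does not know. It is ordinary C, not loadable BPF: ``struct sockopt_ctx`` is a simplified stand-in for ``struct bpf_sockopt`` (the ``optval_end`` bound is part of that stand-in), and ``EXAMPLE_LEVEL``, the function name and the returned byte are all made up for the example.

```c
#include <errno.h>
#include <stdint.h>

#define EXAMPLE_LEVEL 0x7fff0001   /* hypothetical custom option level */

/* Simplified stand-in for struct bpf_sockopt. */
struct sockopt_ctx {
    int level;
    int optname;
    int optlen;
    int retval;
    uint8_t *optval;      /* start of the option buffer */
    uint8_t *optval_end;  /* one past the end of the buffer */
};

int answer_custom_level(struct sockopt_ctx *ctx)
{
    if (ctx->level != EXAMPLE_LEVEL)
        return 1;               /* not ours: keep the kernel's result */

    if (ctx->optval + 1 > ctx->optval_end)
        return 0;               /* no room for one byte: reject (EPERM) */

    ctx->optval[0] = 0x55;      /* override the returned value */
    ctx->optlen = 1;            /* shrink to the single byte written */
    ctx->retval = 0;            /* reset the kernel's error to success */
    return 1;                   /* copy optval/optlen out, return retval */
}
```

The program only ever shrinks ``optlen`` and sets ``retval`` to 0, so it stays within the constraints listed for this hook.
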
Cgroup Inheritance
==================

Suppose there is the following cgroup hierarchy where each cgroup
has ``BPF_CGROUP_GETSOCKOPT`` attached with the
``BPF_F_ALLOW_MULTI`` flag::

  A (root, parent)
   \
    B (child)

When the application calls the ``getsockopt`` syscall from cgroup B,
the programs are executed from the bottom up: B, A. The first program
(B) sees the result of the kernel's ``getsockopt``. It can optionally
adjust ``optval``, ``optlen`` and reset ``retval`` to 0. After that,
control will be passed to the second (A) program, which will see the
same context as B including any potential modifications.

The same applies to ``BPF_CGROUP_SETSOCKOPT``: if the program is attached to
A and B, the trigger order is B, then A. If B makes any changes
to the input arguments (``level``, ``optname``, ``optval``, ``optlen``),
then the next program in the chain (A) will see those changes,
*not* the original input ``setsockopt`` arguments. The potentially
modified values will then be passed down to the kernel.

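The bottom-up ordering and the shared context can be modelled with a small userspace sketch. Everything here is hypothetical (``struct sockopt_ctx``, ``run_chain``, ``prog_a``, ``prog_b``); the programs are plain function pointers ordered leaf first. The loop stops at the first reject, which follows the wording of the return-value lists; whether the kernel still runs the remaining programs after a reject is not modelled here.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for the shared program context. */
struct sockopt_ctx {
    int level;
    int optname;
    int optlen;
    int retval;
};

typedef int (*sockopt_prog)(struct sockopt_ctx *ctx);

/* progs[] is ordered leaf cgroup first (B), then its ancestors (A). */
int run_chain(const sockopt_prog *progs, size_t n, struct sockopt_ctx *ctx)
{
    for (size_t i = 0; i < n; i++) {
        if (progs[i](ctx) == 0)
            return -EPERM;      /* a program rejected the syscall */
    }
    return 0;
}

/* B (child): trims optlen to at most 4 bytes. */
int prog_b(struct sockopt_ctx *ctx)
{
    if (ctx->optlen > 4)
        ctx->optlen = 4;
    return 1;
}

/* A (parent): accepts only what B would have produced. */
int prog_a(struct sockopt_ctx *ctx)
{
    return ctx->optlen <= 4;
}
```

Running B then A on a 16-byte request succeeds because A sees the length B already trimmed; running A alone on the same request is rejected.
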
Example
=======

See ``tools/testing/selftests/bpf/progs/sockopt_sk.c`` for an example
of a BPF program that handles socket options.