Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
The following pull-request contains BPF updates for your *net-next* tree.

There is a small merge conflict in libbpf (Cc Andrii so he's in the loop
as well):

        for (i = 1; i <= btf__get_nr_types(btf); i++) {
                t = (struct btf_type *)btf__type_by_id(btf, i);

                if (!has_datasec && btf_is_var(t)) {
                        /* replace VAR with INT */
                        t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
  <<<<<<< HEAD
                        /*
                         * using size = 1 is the safest choice, 4 will be too
                         * big and cause kernel BTF validation failure if
                         * original variable took less than 4 bytes
                         */
                        t->size = 1;
                        *(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
                } else if (!has_datasec && kind == BTF_KIND_DATASEC) {
  =======
                        t->size = sizeof(int);
                        *(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
                } else if (!has_datasec && btf_is_datasec(t)) {
  >>>>>>> 72ef80b
                        /* replace DATASEC with STRUCT */

The conflict is between the two commits 1d4126c ("libbpf: sanitize VAR to
conservative 1-byte INT") and b03bc68 ("libbpf: convert libbpf code to
use new btf helpers"), so the resolution needs to keep the sanitization
fixup while also using the new btf_is_datasec() helper and the whitespace
cleanup. The result looks like the following:

  [...]
                if (!has_datasec && btf_is_var(t)) {
                        /* replace VAR with INT */
                        t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
                        /*
                         * using size = 1 is the safest choice, 4 will be too
                         * big and cause kernel BTF validation failure if
                         * original variable took less than 4 bytes
                         */
                        t->size = 1;
                        *(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
                } else if (!has_datasec && btf_is_datasec(t)) {
                        /* replace DATASEC with STRUCT */
  [...]
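For illustration, the three stores in the resolved VAR branch can be replayed in user space. This is a sketch under stated assumptions, not libbpf code: `struct btf_type_with_extra` is a hypothetical stand-in for a `struct btf_type` followed by the single 32-bit word that both a VAR and an INT carry, and the encoding macros are local copies written to match the ones used above. `int_bits_fit_size()` is a simplified rule (an INT's bit count must fit in its byte size), not the kernel's full BTF validator.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the BTF kind numbers and encoding macros used above
 * (assumption: copied so the sketch builds without kernel headers). */
#define BTF_KIND_INT 1
#define BTF_KIND_VAR 14
#define BTF_MAX_VLEN 0xffff
#define BTF_INFO_ENC(kind, kind_flag, vlen) \
	((!!(kind_flag) << 31) | ((kind) << 24) | ((vlen) & BTF_MAX_VLEN))
#define BTF_INT_ENC(encoding, bits_offset, nr_bits) \
	((encoding) << 24 | (bits_offset) << 16 | (nr_bits))
#define BTF_INFO_KIND(info) (((info) >> 24) & 0x0f)
#define BTF_INT_BITS(val) ((val) & 0x000000ff)

/* Hypothetical stand-in for struct btf_type plus its trailing word. */
struct btf_type_with_extra {
	uint32_t name_off;
	uint32_t info;
	uint32_t size;  /* size/type union in the real struct btf_type */
	uint32_t extra; /* linkage for a VAR, INT encoding for an INT */
};

/* Replay the resolved sanitization: a VAR becomes a 1-byte, 8-bit INT. */
static void sanitize_var_to_int(struct btf_type_with_extra *t)
{
	assert(BTF_INFO_KIND(t->info) == BTF_KIND_VAR);
	t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
	t->size = 1;
	t->extra = BTF_INT_ENC(0, 0, 8);
}

/* Simplified consistency rule: the INT's bits must fit in its size. */
static int int_bits_fit_size(const struct btf_type_with_extra *t)
{
	uint32_t nr_bits = BTF_INT_BITS(t->extra);

	return (nr_bits + 7) / 8 <= t->size;
}
```

Note that size = 1 and nr_bits = 8 agree with each other, whereas the HEAD-less side of the conflict paired sizeof(int) with 32 bits.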

The main changes are:

1) Addition of the core parts of the compile once - run everywhere (co-re)
   effort, that is, relocation of field offsets in libbpf as well as exposure
   of the kernel's own BTF via sysfs and its loading through libbpf, from Andrii.

   More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2
   and http://vger.kernel.org/lpc-bpf2018.html#session-2

2) Enable passing input flags to the BPF flow dissector to customize parsing
   and allow it to stop early, similar to the C-based one, from Stanislav.

3) Add a BPF helper function that allows generating SYN cookies from XDP and
   tc BPF, from Petar.

4) Add a hash-based devmap map type (BPF_MAP_TYPE_DEVMAP_HASH) for more
   flexibility in device lookup for redirects, from Toke.

5) Improvements to the XDP forwarding sample code, which now uses the recently
   enabled devmap lookups, from Jesper.

6) Add support for reporting the effective cgroup progs in bpftool, from Jakub
   and Takshak.

7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter.

8) Fix AF_XDP umem pages mapping for 32-bit architectures, from Ivan.

9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei.

10) Add the perf event output helper for other skb-based program types as
    well, from Allan.

11) Fix a co-re related compilation error in selftests, from Yonghong.
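The idea behind the field offset relocation in item 1 can be shown with a toy model: a program carries the offset of a field as laid out when it was compiled, and the loader rewrites it to the offset of the same-named field in the running kernel's layout. Everything below is hypothetical (`struct task_v1`, `struct task_v2`, `struct field_desc`, `relocate_offset()`); libbpf's actual co-re logic works on BTF type information and handles far more cases than a flat name lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical per-field type information: a name and a byte offset. */
struct field_desc {
	const char *name;
	size_t offset;
};

/* Layout the program was compiled against ("local")... */
struct task_v1 { int pid; int tgid; };
/* ...and a hypothetical running kernel that inserted a field in front. */
struct task_v2 { long flags; int pid; int tgid; };

static const struct field_desc local_fields[] = {
	{ "pid", offsetof(struct task_v1, pid) },
	{ "tgid", offsetof(struct task_v1, tgid) },
};

static const struct field_desc target_fields[] = {
	{ "flags", offsetof(struct task_v2, flags) },
	{ "pid", offsetof(struct task_v2, pid) },
	{ "tgid", offsetof(struct task_v2, tgid) },
};

/* Map an offset baked into the program to the same-named field's offset
 * in the target layout; -1 means the relocation cannot be resolved. */
static long relocate_offset(size_t local_off)
{
	for (size_t i = 0; i < ARRAY_LEN(local_fields); i++) {
		if (local_fields[i].offset != local_off)
			continue;
		for (size_t j = 0; j < ARRAY_LEN(target_fields); j++)
			if (strcmp(local_fields[i].name, target_fields[j].name) == 0)
				return (long)target_fields[j].offset;
		return -1; /* field does not exist in the target layout */
	}
	return -1; /* offset does not correspond to a known local field */
}
```

The point of matching by name is that the same compiled program keeps working when a field moves, which is what exposing the kernel's own BTF makes possible.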
====================

Signed-off-by: Jakub Kicinski <[email protected]>
Jakub Kicinski committed Aug 13, 2019
2 parents a9a9676 + 72ef80b commit 708852d
Showing 121 changed files with 5,229 additions and 920 deletions.
17 changes: 17 additions & 0 deletions Documentation/ABI/testing/sysfs-kernel-btf
@@ -0,0 +1,17 @@
+What:           /sys/kernel/btf
+Date:           Aug 2019
+KernelVersion:  5.5
+Contact:        [email protected]
+Description:
+                Contains BTF type information and related data for kernel and
+                kernel modules.
+
+What:           /sys/kernel/btf/vmlinux
+Date:           Aug 2019
+KernelVersion:  5.5
+Contact:        [email protected]
+Description:
+                Read-only binary attribute exposing kernel's own BTF type
+                information with description of all internal kernel types. See
+                Documentation/bpf/btf.rst for detailed description of format
+                itself.
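Because /sys/kernel/btf/vmlinux is a raw binary attribute, a consumer would typically read the whole file and sanity-check the header before parsing types. The sketch below assumes the blob starts with the header layout from `include/uapi/linux/btf.h` (magic `0xeB9F`, version 1, section offsets relative to the end of the header) in host byte order; `struct btf_header_copy` and `btf_blob_looks_valid()` are hypothetical local names, and real loaders such as libbpf perform more thorough checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BTF_MAGIC 0xeB9F
#define BTF_VERSION 1

/* Local mirror of struct btf_header (24 bytes, no padding). */
struct btf_header_copy {
	uint16_t magic;
	uint8_t version;
	uint8_t flags;
	uint32_t hdr_len;
	uint32_t type_off; /* offsets are relative to the end of the header */
	uint32_t type_len;
	uint32_t str_off;
	uint32_t str_len;
};

/* Return 0 if buf starts with a plausible host-endian BTF header whose
 * type and string sections fit inside len bytes, otherwise -1. */
static int btf_blob_looks_valid(const void *buf, size_t len)
{
	struct btf_header_copy h;
	uint64_t data_len;

	assert(buf != NULL);
	if (len < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));
	if (h.magic != BTF_MAGIC || h.version != BTF_VERSION)
		return -1;
	if (h.hdr_len < sizeof(h) || h.hdr_len > len)
		return -1;
	data_len = len - h.hdr_len;
	if ((uint64_t)h.type_off + h.type_len > data_len ||
	    (uint64_t)h.str_off + h.str_len > data_len)
		return -1;
	return 0;
}
```

In practice the buffer would come from reading the file with `fopen()`/`fread()` on a kernel built with `CONFIG_DEBUG_INFO_BTF` and `CONFIG_SYSFS`. A magic that reads back byte-swapped (`0x9FEB`) would indicate BTF of the opposite endianness.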
18 changes: 18 additions & 0 deletions Documentation/bpf/prog_flow_dissector.rst
@@ -26,6 +26,7 @@ The inputs are:
 * ``nhoff`` - initial offset of the networking header
 * ``thoff`` - initial offset of the transport header, initialized to nhoff
 * ``n_proto`` - L3 protocol type, parsed out of L2 header
+* ``flags`` - optional flags

 Flow dissector BPF program should fill out the rest of the ``struct
 bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be
@@ -101,6 +102,23 @@ can be called for both cases and would have to be written carefully to
 handle both cases.


+Flags
+=====
+
+``flow_keys->flags`` might contain optional input flags that work as follows:
+
+* ``BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG`` - tells BPF flow dissector to
+  continue parsing first fragment; the default expected behavior is that
+  flow dissector returns as soon as it finds out that the packet is fragmented;
+  used by ``eth_get_headlen`` to estimate length of all headers for GRO.
+* ``BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL`` - tells BPF flow dissector to
+  stop parsing as soon as it reaches IPv6 flow label; used by
+  ``___skb_get_hash`` and ``__skb_get_hash_symmetric`` to get flow hash.
+* ``BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP`` - tells BPF flow dissector to stop
+  parsing as soon as it reaches encapsulated headers; used by routing
+  infrastructure.
+
+
 Reference Implementation
 ========================

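A user-space sketch of the early-return decisions these flags drive is shown below. It only models when a dissector would stop; the helper names are hypothetical, and the rule that the flow-label stop applies only to a non-zero label is an assumption modelled on the in-kernel C dissector, not something stated in the documentation above. The flag values are the ones this merge adds to `include/uapi/linux/bpf.h`.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as added to include/uapi/linux/bpf.h in this merge. */
#define BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG     (1U << 0)
#define BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL (1U << 1)
#define BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP      (1U << 2)

/* Fragmented packet: by default stop as soon as fragmentation is seen;
 * keep going only for the first fragment when PARSE_1ST_FRAG is set
 * (later fragments carry no transport header to parse). */
static int stop_at_fragment(uint32_t flags, int is_first_fragment)
{
	return !(is_first_fragment &&
		 (flags & BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG));
}

/* IPv6: stop after reading the flow label if the caller asked for it.
 * Assumption: a zero label is not useful as hash input, so parsing goes on. */
static int stop_at_flow_label(uint32_t flags, uint32_t flow_label)
{
	return (flags & BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL) && flow_label;
}

/* Tunnels (e.g. IP-in-IP): stop before the encapsulated headers. */
static int stop_at_encap(uint32_t flags)
{
	return (flags & BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP) != 0;
}

/* The IPv6 flow label is the low 20 bits of the first 32-bit word. */
static uint32_t ipv6_flow_label(const uint8_t hdr[4])
{
	assert((hdr[0] >> 4) == 6); /* version nibble must be 6 */
	uint32_t word = (uint32_t)hdr[0] << 24 | (uint32_t)hdr[1] << 16 |
			(uint32_t)hdr[2] << 8 | hdr[3];
	return word & 0x000FFFFF;
}
```

Keeping each flag in its own predicate mirrors the documentation, where each flag is requested by a different caller (GRO header length estimation, flow hashing, routing).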
11 changes: 4 additions & 7 deletions include/linux/bpf.h
@@ -713,15 +713,14 @@ struct xdp_buff;
 struct sk_buff;

 struct bpf_dtab_netdev *__dev_map_lookup_elem(struct bpf_map *map, u32 key);
-void __dev_map_insert_ctx(struct bpf_map *map, u32 index);
+struct bpf_dtab_netdev *__dev_map_hash_lookup_elem(struct bpf_map *map, u32 key);
 void __dev_map_flush(struct bpf_map *map);
 int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
                     struct net_device *dev_rx);
 int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
                              struct bpf_prog *xdp_prog);

 struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key);
-void __cpu_map_insert_ctx(struct bpf_map *map, u32 index);
 void __cpu_map_flush(struct bpf_map *map);
 int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
                     struct net_device *dev_rx);
@@ -801,8 +800,10 @@ static inline struct net_device *__dev_map_lookup_elem(struct bpf_map *map,
         return NULL;
 }

-static inline void __dev_map_insert_ctx(struct bpf_map *map, u32 index)
+static inline struct net_device *__dev_map_hash_lookup_elem(struct bpf_map *map,
+                                                            u32 key)
 {
+        return NULL;
 }

 static inline void __dev_map_flush(struct bpf_map *map)
@@ -834,10 +835,6 @@ struct bpf_cpu_map_entry *__cpu_map_lookup_elem(struct bpf_map *map, u32 key)
         return NULL;
 }

-static inline void __cpu_map_insert_ctx(struct bpf_map *map, u32 index)
-{
-}
-
 static inline void __cpu_map_flush(struct bpf_map *map)
 {
 }
1 change: 1 addition & 0 deletions include/linux/bpf_types.h
@@ -62,6 +62,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY_OF_MAPS, array_of_maps_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops)
 #ifdef CONFIG_NET
 BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP, dev_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_DEVMAP_HASH, dev_map_hash_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SK_STORAGE, sk_storage_map_ops)
 #if defined(CONFIG_BPF_STREAM_PARSER)
 BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops)
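`BPF_MAP_TYPE_DEVMAP_HASH` keys entries by an arbitrary 32-bit value instead of an array index, so sparse keys such as large ifindexes do not need a correspondingly large array. The following is a hypothetical miniature of a hash-keyed device map, assuming a power-of-two bucket count with the bucket chosen by `key & (n_buckets - 1)` and chaining within a bucket; the names, fixed sizes, and static entry pool are assumptions of this sketch, not the kernel's `dev_map_hash_ops` implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BUCKETS 8   /* must be a power of two for the mask to work */
#define MAX_ENTRIES 8

struct dev_entry {
	uint32_t key;
	int ifindex;
	struct dev_entry *next; /* chaining within a bucket */
};

struct dev_hash_map {
	struct dev_entry *buckets[N_BUCKETS];
	struct dev_entry pool[MAX_ENTRIES]; /* preallocated entries */
	size_t used;
};

static struct dev_entry **bucket_for(struct dev_hash_map *m, uint32_t key)
{
	return &m->buckets[key & (N_BUCKETS - 1)];
}

static struct dev_entry *dev_hash_lookup(struct dev_hash_map *m, uint32_t key)
{
	for (struct dev_entry *e = *bucket_for(m, key); e; e = e->next)
		if (e->key == key)
			return e;
	return NULL;
}

/* Insert or update; returns 0 on success, -1 if the map is full. */
static int dev_hash_update(struct dev_hash_map *m, uint32_t key, int ifindex)
{
	struct dev_entry *e = dev_hash_lookup(m, key);

	assert(ifindex > 0); /* ifindexes are positive */
	if (e) {
		e->ifindex = ifindex;
		return 0;
	}
	if (m->used == MAX_ENTRIES)
		return -1;
	e = &m->pool[m->used++];
	e->key = key;
	e->ifindex = ifindex;
	e->next = *bucket_for(m, key);
	*bucket_for(m, key) = e;
	return 0;
}
```

With an array devmap, a key of 4000 would require at least 4001 slots; here it occupies one entry in a bucket shared with other keys.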
2 changes: 1 addition & 1 deletion include/linux/skbuff.h
@@ -1271,7 +1271,7 @@ static inline int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)

 struct bpf_flow_dissector;
 bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx,
-                      __be16 proto, int nhoff, int hlen);
+                      __be16 proto, int nhoff, int hlen, unsigned int flags);

 bool __skb_flow_dissect(const struct net *net,
                         const struct sk_buff *skb,
10 changes: 10 additions & 0 deletions include/net/tcp.h
@@ -414,6 +414,16 @@ void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
                        int estab, struct tcp_fastopen_cookie *foc);
 const u8 *tcp_parse_md5sig_option(const struct tcphdr *th);

+/*
+ * BPF SKB-less helpers
+ */
+u16 tcp_v4_get_syncookie(struct sock *sk, struct iphdr *iph,
+                         struct tcphdr *th, u32 *cookie);
+u16 tcp_v6_get_syncookie(struct sock *sk, struct ipv6hdr *iph,
+                         struct tcphdr *th, u32 *cookie);
+u16 tcp_get_syncookie_mss(struct request_sock_ops *rsk_ops,
+                          const struct tcp_request_sock_ops *af_ops,
+                          struct sock *sk, struct tcphdr *th);
 /*
  * TCP v4 functions exported for the inet6 API
  */
3 changes: 2 additions & 1 deletion include/trace/events/xdp.h
@@ -175,7 +175,8 @@ struct _bpf_dtab_netdev {
 #endif /* __DEVMAP_OBJ_TYPE */

 #define devmap_ifindex(fwd, map)                                \
-        ((map->map_type == BPF_MAP_TYPE_DEVMAP) ?               \
+        ((map->map_type == BPF_MAP_TYPE_DEVMAP ||               \
+          map->map_type == BPF_MAP_TYPE_DEVMAP_HASH) ?          \
           ((struct _bpf_dtab_netdev *)fwd)->dev->ifindex : 0)

 #define _trace_xdp_redirect_map(dev, xdp, fwd, map, idx)        \
37 changes: 36 additions & 1 deletion include/uapi/linux/bpf.h
@@ -134,6 +134,7 @@ enum bpf_map_type {
         BPF_MAP_TYPE_QUEUE,
         BPF_MAP_TYPE_STACK,
         BPF_MAP_TYPE_SK_STORAGE,
+        BPF_MAP_TYPE_DEVMAP_HASH,
 };

 /* Note that tracing related programs such as
@@ -2713,6 +2714,33 @@ union bpf_attr {
  *              **-EPERM** if no permission to send the *sig*.
  *
  *              **-EAGAIN** if bpf program can try again.
+ *
+ * s64 bpf_tcp_gen_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *      Description
+ *              Try to issue a SYN cookie for the packet with corresponding
+ *              IP/TCP headers, *iph* and *th*, on the listening socket in *sk*.
+ *
+ *              *iph* points to the start of the IPv4 or IPv6 header, while
+ *              *iph_len* contains **sizeof**\ (**struct iphdr**) or
+ *              **sizeof**\ (**struct ip6hdr**).
+ *
+ *              *th* points to the start of the TCP header, while *th_len*
+ *              contains the length of the TCP header.
+ *
+ *      Return
+ *              On success, lower 32 bits hold the generated SYN cookie in
+ *              followed by 16 bits which hold the MSS value for that cookie,
+ *              and the top 16 bits are unused.
+ *
+ *              On failure, the returned value is one of the following:
+ *
+ *              **-EINVAL** SYN cookie cannot be issued due to error
+ *
+ *              **-ENOENT** SYN cookie should not be issued (no SYN flood)
+ *
+ *              **-EOPNOTSUPP** kernel configuration does not enable SYN cookies
+ *
+ *              **-EPROTONOSUPPORT** IP packet version is not 4 or 6
  */
 #define __BPF_FUNC_MAPPER(FN)           \
         FN(unspec),                     \
@@ -2824,7 +2852,8 @@ union bpf_attr {
         FN(strtoul),                    \
         FN(sk_storage_get),             \
         FN(sk_storage_delete),          \
-        FN(send_signal),
+        FN(send_signal),                \
+        FN(tcp_gen_syncookie),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3507,6 +3536,10 @@ enum bpf_task_fd_type {
         BPF_FD_TYPE_URETPROBE,          /* filename + offset */
 };

+#define BPF_FLOW_DISSECTOR_F_PARSE_1ST_FRAG             (1U << 0)
+#define BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL         (1U << 1)
+#define BPF_FLOW_DISSECTOR_F_STOP_AT_ENCAP              (1U << 2)
+
 struct bpf_flow_keys {
         __u16   nhoff;
         __u16   thoff;
@@ -3528,6 +3561,8 @@ struct bpf_flow_keys {
                         __u32   ipv6_dst[4];    /* in6_addr; network order */
                 };
         };
+        __u32   flags;
+        __be32  flow_label;
 };

 struct bpf_func_info {
3 changes: 3 additions & 0 deletions kernel/bpf/Makefile
@@ -22,3 +22,6 @@ obj-$(CONFIG_CGROUP_BPF) += cgroup.o
 ifeq ($(CONFIG_INET),y)
 obj-$(CONFIG_BPF_SYSCALL) += reuseport_array.o
 endif
+ifeq ($(CONFIG_SYSFS),y)
+obj-$(CONFIG_DEBUG_INFO_BTF) += sysfs_btf.o
+endif
17 changes: 13 additions & 4 deletions kernel/bpf/cgroup.c
@@ -964,7 +964,6 @@ static int sockopt_alloc_buf(struct bpf_sockopt_kern *ctx, int max_optlen)
                 return -ENOMEM;

         ctx->optval_end = ctx->optval + max_optlen;
-        ctx->optlen = max_optlen;

         return 0;
 }
@@ -984,7 +983,7 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
                 .level = *level,
                 .optname = *optname,
         };
-        int ret;
+        int ret, max_optlen;

         /* Opportunistic check to see whether we have any BPF program
          * attached to the hook so we don't waste time allocating
@@ -994,10 +993,18 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
             __cgroup_bpf_prog_array_is_empty(cgrp, BPF_CGROUP_SETSOCKOPT))
                 return 0;

-        ret = sockopt_alloc_buf(&ctx, *optlen);
+        /* Allocate a bit more than the initial user buffer for
+         * BPF program. The canonical use case is overriding
+         * TCP_CONGESTION(nv) to TCP_CONGESTION(cubic).
+         */
+        max_optlen = max_t(int, 16, *optlen);
+
+        ret = sockopt_alloc_buf(&ctx, max_optlen);
         if (ret)
                 return ret;

+        ctx.optlen = *optlen;
+
         if (copy_from_user(ctx.optval, optval, *optlen) != 0) {
                 ret = -EFAULT;
                 goto out;
@@ -1016,7 +1023,7 @@ int __cgroup_bpf_run_filter_setsockopt(struct sock *sk, int *level,
         if (ctx.optlen == -1) {
                 /* optlen set to -1, bypass kernel */
                 ret = 1;
-        } else if (ctx.optlen > *optlen || ctx.optlen < -1) {
+        } else if (ctx.optlen > max_optlen || ctx.optlen < -1) {
                 /* optlen is out of bounds */
                 ret = -EFAULT;
         } else {
@@ -1063,6 +1070,8 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
         if (ret)
                 return ret;

+        ctx.optlen = max_optlen;
+
         if (!retval) {
                 /* If kernel getsockopt finished successfully,
                  * copy whatever was returned to the user back
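The cgroup change above sizes the buffer given to a setsockopt BPF program as `max_t(int, 16, *optlen)` and then bounds the program's returned `optlen` by that size rather than by the caller's original length. A user-space sketch of both rules follows; `sockopt_buf_len()`, `check_optlen()` and `enum optlen_verdict` are hypothetical names, and the real function also copies data to and from user memory, which is omitted here.

```c
#include <assert.h>

/* Outcomes of the post-program optlen check, named for this sketch. */
enum optlen_verdict {
	OPTLEN_BYPASS_KERNEL, /* program set optlen to -1 */
	OPTLEN_USE,           /* pass the (possibly rewritten) value on */
	OPTLEN_EFAULT,        /* out of bounds */
};

/* Buffer size handed to the BPF program: at least 16 bytes, so a short
 * value such as "nv" can be rewritten to a longer one like "cubic". */
static int sockopt_buf_len(int user_optlen)
{
	return user_optlen > 16 ? user_optlen : 16;
}

/* Bounds check after the program ran, limited by max_optlen rather
 * than by the caller's original optlen. */
static enum optlen_verdict check_optlen(int prog_optlen, int max_optlen)
{
	assert(max_optlen >= 16);
	if (prog_optlen == -1)
		return OPTLEN_BYPASS_KERNEL;
	if (prog_optlen > max_optlen || prog_optlen < -1)
		return OPTLEN_EFAULT;
	return OPTLEN_USE;
}
```

With the old check (`ctx.optlen > *optlen`), rewriting a 2-byte "nv" into the 5-byte "cubic" would have been rejected as out of bounds; with the 16-byte minimum it is accepted.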
