Skip to content

Commit

Permalink
Merge branch 'sk_buff-add-extension-infrastructure'
Browse files Browse the repository at this point in the history
Florian Westphal says:

====================
sk_buff: add extension infrastructure

TL;DR:
 - objdiff shows no change if CONFIG_XFRM=n && BR_NETFILTER=n
 - small size reduction when one or both options are set
 - no changes in ipsec performance

 Changes since v1:
 - Allocate entire extension space from a kmem_cache.
 - Avoid atomic_dec_and_test operation on skb_ext_put() for refcnt == 1 case.
   (similar to kfree_skbmem() fclone_ref use).

This adds an optional extension infrastructure, with ispec (xfrm) and
bridge netfilter as first users.

The third (future) user is Multipath TCP which is still out-of-tree.
MPTCP needs to map logical mptcp sequence numbers to the tcp sequence
numbers used by individual subflows.

This DSS mapping is read/written from tcp option space on receive and
written to tcp option space on transmitted tcp packets that are part of
and MPTCP connection.

Extending skb_shared_info or adding a private data field to skb fclones
doesn't work for incoming skb, so a different DSS propagation method would
be required for the receive side.

mptcp has same requirements as secpath/bridge netfilter:

1. extension memory is released when the sk_buff is free'd.
2. data is shared after cloning an skb (clone inherits extension)
3. adding extension to an skb will COW the extension buffer if needed.

Two new members are added to sk_buff:
1. 'active_extensions' byte (filling a hole), telling which extensions
   are available for this skb.
   This has two purposes.
   a) avoids the need to initialize the pointer.
   b) allows to "delete" an extension by clearing its bit
   value in ->active_extensions.

   While it would be possible to store the active_extensions byte
   in the extension struct instead of sk_buff, there is one problem
   with this:
    When an extension has to be disabled, we can always clear the
    bit in skb->active_extensions.  But in case it would be stored in the
    extension buffer itself, we might have to COW it first, if
    we are dealing with a cloned skb.  On kmalloc failure we would
    be unable to turn an extension off.
2. extension pointer, located at the end of the sk_buff.
   If the active_extensions byte is 0, the pointer is undefined,
   it is not initialized on skb allocation.

This adds extra code to skb clone and free paths (to deal with
refcount/free of extension area) but this replaces similar code that
manages skb->nf_bridge and skb->sp structs in the followup patches of
the series.

It is possible to add support for extensions that are not preseved on
clones/copies:

1. define a bitmask of all extensions that need copy/cow on clone
2. change __skb_ext_copy() to check
   ->active_extensions & SKB_EXT_PRESERVE_ON_CLONE
3. set clone->active_extensions to 0 if test is false.

This isn't done here because all extensions that get added here
need the copy/cow semantics.

Last patch converts skb->sp, secpath information gets stored as
new SKB_EXT_SEC_PATH, so the 'sp' pointer is removed from skbuff.

Extra code added to skb clone and free paths (to deal with refcount/free
of extension area) replaces the existing code that does the same for
skb->nf_bridge and skb->secpath.

I don't see any other in-tree users that could benefit from this
infrastructure, it doesn't make sense to add an extension just for the sake
of a single flag bit (like skb->nf_trace).

Adding a new extension is a good fit if all of the following are true:

1. Data is related to the skb/packet aggregate
2. Data should be freed when the skb is free'd
3. Data is not going to be relevant/needed in normal case (udp, tcp,
   forwarding workloads, ...)
4. There are no fancy action(s) needed on clone/free, such as callbacks
   into kernel modules.
====================

Signed-off-by: David S. Miller <[email protected]>
  • Loading branch information
davem330 committed Dec 19, 2018
2 parents 8239d57 + 4165079 commit 4a54877
Show file tree
Hide file tree
Showing 39 changed files with 564 additions and 286 deletions.
7 changes: 4 additions & 3 deletions Documentation/networking/xfrm_device.txt
Original file line number Diff line number Diff line change
Expand Up @@ -111,9 +111,10 @@ the stack in xfrm_input().
xfrm_state_hold(xs);

store the state information into the skb
skb->sp = secpath_dup(skb->sp);
skb->sp->xvec[skb->sp->len++] = xs;
skb->sp->olen++;
sp = secpath_set(skb);
if (!sp) return;
sp->xvec[sp->len++] = xs;
sp->olen++;

indicate the success and/or error status of the offload
xo = xfrm_offload(skb);
Expand Down
4 changes: 3 additions & 1 deletion drivers/crypto/chelsio/chcr_ipsec.c
Original file line number Diff line number Diff line change
Expand Up @@ -570,6 +570,7 @@ int chcr_ipsec_xmit(struct sk_buff *skb, struct net_device *dev)
struct sge_eth_txq *q;
struct port_info *pi;
dma_addr_t addr[MAX_SKB_FRAGS + 1];
struct sec_path *sp;
bool immediate = false;

if (!x->xso.offload_handle)
Expand All @@ -578,7 +579,8 @@ int chcr_ipsec_xmit(struct sk_buff *skb, struct net_device *dev)
sa_entry = (struct ipsec_sa_entry *)x->xso.offload_handle;
kctx_len = sa_entry->kctx_len;

if (skb->sp->len != 1) {
sp = skb_sec_path(skb);
if (sp->len != 1) {
out_free: dev_kfree_skb_any(skb);
return NETDEV_TX_OK;
}
Expand Down
15 changes: 9 additions & 6 deletions drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c
Original file line number Diff line number Diff line change
Expand Up @@ -1065,11 +1065,13 @@ int ixgbe_ipsec_tx(struct ixgbe_ring *tx_ring,
struct ixgbe_adapter *adapter = netdev_priv(tx_ring->netdev);
struct ixgbe_ipsec *ipsec = adapter->ipsec;
struct xfrm_state *xs;
struct sec_path *sp;
struct tx_sa *tsa;

if (unlikely(!first->skb->sp->len)) {
sp = skb_sec_path(first->skb);
if (unlikely(!sp->len)) {
netdev_err(tx_ring->netdev, "%s: no xfrm state len = %d\n",
__func__, first->skb->sp->len);
__func__, sp->len);
return 0;
}

Expand Down Expand Up @@ -1159,6 +1161,7 @@ void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring,
struct xfrm_state *xs = NULL;
struct ipv6hdr *ip6 = NULL;
struct iphdr *ip4 = NULL;
struct sec_path *sp;
void *daddr;
__be32 spi;
u8 *c_hdr;
Expand Down Expand Up @@ -1198,12 +1201,12 @@ void ixgbe_ipsec_rx(struct ixgbe_ring *rx_ring,
if (unlikely(!xs))
return;

skb->sp = secpath_dup(skb->sp);
if (unlikely(!skb->sp))
sp = secpath_set(skb);
if (unlikely(!sp))
return;

skb->sp->xvec[skb->sp->len++] = xs;
skb->sp->olen++;
sp->xvec[sp->len++] = xs;
sp->olen++;
xo = xfrm_offload(skb);
xo->flags = CRYPTO_DONE;
xo->status = CRYPTO_SUCCESS;
Expand Down
5 changes: 3 additions & 2 deletions drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -8695,7 +8695,8 @@ netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
#endif /* IXGBE_FCOE */

#ifdef CONFIG_IXGBE_IPSEC
if (skb->sp && !ixgbe_ipsec_tx(tx_ring, first, &ipsec_tx))
if (secpath_exists(skb) &&
!ixgbe_ipsec_tx(tx_ring, first, &ipsec_tx))
goto out_drop;
#endif
tso = ixgbe_tso(tx_ring, first, &hdr_len, &ipsec_tx);
Expand Down Expand Up @@ -10192,7 +10193,7 @@ ixgbe_features_check(struct sk_buff *skb, struct net_device *dev,
*/
if (skb->encapsulation && !(features & NETIF_F_TSO_MANGLEID)) {
#ifdef CONFIG_IXGBE_IPSEC
if (!skb->sp)
if (!secpath_exists(skb))
#endif
features &= ~NETIF_F_TSO;
}
Expand Down
15 changes: 9 additions & 6 deletions drivers/net/ethernet/intel/ixgbevf/ipsec.c
Original file line number Diff line number Diff line change
Expand Up @@ -450,12 +450,14 @@ int ixgbevf_ipsec_tx(struct ixgbevf_ring *tx_ring,
struct ixgbevf_adapter *adapter = netdev_priv(tx_ring->netdev);
struct ixgbevf_ipsec *ipsec = adapter->ipsec;
struct xfrm_state *xs;
struct sec_path *sp;
struct tx_sa *tsa;
u16 sa_idx;

if (unlikely(!first->skb->sp->len)) {
sp = skb_sec_path(first->skb);
if (unlikely(!sp->len)) {
netdev_err(tx_ring->netdev, "%s: no xfrm state len = %d\n",
__func__, first->skb->sp->len);
__func__, sp->len);
return 0;
}

Expand Down Expand Up @@ -546,6 +548,7 @@ void ixgbevf_ipsec_rx(struct ixgbevf_ring *rx_ring,
struct xfrm_state *xs = NULL;
struct ipv6hdr *ip6 = NULL;
struct iphdr *ip4 = NULL;
struct sec_path *sp;
void *daddr;
__be32 spi;
u8 *c_hdr;
Expand Down Expand Up @@ -585,12 +588,12 @@ void ixgbevf_ipsec_rx(struct ixgbevf_ring *rx_ring,
if (unlikely(!xs))
return;

skb->sp = secpath_dup(skb->sp);
if (unlikely(!skb->sp))
sp = secpath_set(skb);
if (unlikely(!sp))
return;

skb->sp->xvec[skb->sp->len++] = xs;
skb->sp->olen++;
sp->xvec[sp->len++] = xs;
sp->olen++;
xo = xfrm_offload(skb);
xo->flags = CRYPTO_DONE;
xo->status = CRYPTO_SUCCESS;
Expand Down
2 changes: 1 addition & 1 deletion drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -4157,7 +4157,7 @@ static int ixgbevf_xmit_frame_ring(struct sk_buff *skb,
first->protocol = vlan_get_protocol(skb);

#ifdef CONFIG_IXGBEVF_IPSEC
if (skb->sp && !ixgbevf_ipsec_tx(tx_ring, first, &ipsec_tx))
if (secpath_exists(skb) && !ixgbevf_ipsec_tx(tx_ring, first, &ipsec_tx))
goto out_drop;
#endif
tso = ixgbevf_tso(tx_ring, first, &hdr_len, &ipsec_tx);
Expand Down
19 changes: 12 additions & 7 deletions drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_rxtx.c
Original file line number Diff line number Diff line change
Expand Up @@ -254,11 +254,13 @@ struct sk_buff *mlx5e_ipsec_handle_tx_skb(struct net_device *netdev,
struct mlx5e_ipsec_metadata *mdata;
struct mlx5e_ipsec_sa_entry *sa_entry;
struct xfrm_state *x;
struct sec_path *sp;

if (!xo)
return skb;

if (unlikely(skb->sp->len != 1)) {
sp = skb_sec_path(skb);
if (unlikely(sp->len != 1)) {
atomic64_inc(&priv->ipsec->sw_stats.ipsec_tx_drop_bundle);
goto drop;
}
Expand Down Expand Up @@ -305,10 +307,11 @@ mlx5e_ipsec_build_sp(struct net_device *netdev, struct sk_buff *skb,
struct mlx5e_priv *priv = netdev_priv(netdev);
struct xfrm_offload *xo;
struct xfrm_state *xs;
struct sec_path *sp;
u32 sa_handle;

skb->sp = secpath_dup(skb->sp);
if (unlikely(!skb->sp)) {
sp = secpath_set(skb);
if (unlikely(!sp)) {
atomic64_inc(&priv->ipsec->sw_stats.ipsec_rx_drop_sp_alloc);
return NULL;
}
Expand All @@ -320,8 +323,9 @@ mlx5e_ipsec_build_sp(struct net_device *netdev, struct sk_buff *skb,
return NULL;
}

skb->sp->xvec[skb->sp->len++] = xs;
skb->sp->olen++;
sp = skb_sec_path(skb);
sp->xvec[sp->len++] = xs;
sp->olen++;

xo = xfrm_offload(skb);
xo->flags = CRYPTO_DONE;
Expand Down Expand Up @@ -372,10 +376,11 @@ struct sk_buff *mlx5e_ipsec_handle_rx_skb(struct net_device *netdev,
bool mlx5e_ipsec_feature_check(struct sk_buff *skb, struct net_device *netdev,
netdev_features_t features)
{
struct sec_path *sp = skb_sec_path(skb);
struct xfrm_state *x;

if (skb->sp && skb->sp->len) {
x = skb->sp->xvec[0];
if (sp && sp->len) {
x = sp->xvec[0];
if (x && x->xso.offload_handle)
return true;
}
Expand Down
7 changes: 4 additions & 3 deletions drivers/net/netdevsim/ipsec.c
Original file line number Diff line number Diff line change
Expand Up @@ -227,18 +227,19 @@ static const struct xfrmdev_ops nsim_xfrmdev_ops = {

bool nsim_ipsec_tx(struct netdevsim *ns, struct sk_buff *skb)
{
struct sec_path *sp = skb_sec_path(skb);
struct nsim_ipsec *ipsec = &ns->ipsec;
struct xfrm_state *xs;
struct nsim_sa *tsa;
u32 sa_idx;

/* do we even need to check this packet? */
if (!skb->sp)
if (!sp)
return true;

if (unlikely(!skb->sp->len)) {
if (unlikely(!sp->len)) {
netdev_err(ns->netdev, "no xfrm state len = %d\n",
skb->sp->len);
sp->len);
return false;
}

Expand Down
33 changes: 24 additions & 9 deletions include/linux/netfilter_bridge.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,43 +17,58 @@ static inline void br_drop_fake_rtable(struct sk_buff *skb)
skb_dst_drop(skb);
}

static inline struct nf_bridge_info *
nf_bridge_info_get(const struct sk_buff *skb)
{
return skb_ext_find(skb, SKB_EXT_BRIDGE_NF);
}

static inline bool nf_bridge_info_exists(const struct sk_buff *skb)
{
return skb_ext_exist(skb, SKB_EXT_BRIDGE_NF);
}

static inline int nf_bridge_get_physinif(const struct sk_buff *skb)
{
struct nf_bridge_info *nf_bridge;
const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);

if (skb->nf_bridge == NULL)
if (!nf_bridge)
return 0;

nf_bridge = skb->nf_bridge;
return nf_bridge->physindev ? nf_bridge->physindev->ifindex : 0;
}

static inline int nf_bridge_get_physoutif(const struct sk_buff *skb)
{
struct nf_bridge_info *nf_bridge;
const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);

if (skb->nf_bridge == NULL)
if (!nf_bridge)
return 0;

nf_bridge = skb->nf_bridge;
return nf_bridge->physoutdev ? nf_bridge->physoutdev->ifindex : 0;
}

static inline struct net_device *
nf_bridge_get_physindev(const struct sk_buff *skb)
{
return skb->nf_bridge ? skb->nf_bridge->physindev : NULL;
const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);

return nf_bridge ? nf_bridge->physindev : NULL;
}

static inline struct net_device *
nf_bridge_get_physoutdev(const struct sk_buff *skb)
{
return skb->nf_bridge ? skb->nf_bridge->physoutdev : NULL;
const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);

return nf_bridge ? nf_bridge->physoutdev : NULL;
}

static inline bool nf_bridge_in_prerouting(const struct sk_buff *skb)
{
return skb->nf_bridge && skb->nf_bridge->in_prerouting;
const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);

return nf_bridge && nf_bridge->in_prerouting;
}
#else
#define br_drop_fake_rtable(skb) do { } while (0)
Expand Down
Loading

0 comments on commit 4a54877

Please sign in to comment.