Skip to content

Commit

Permalink
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Browse files Browse the repository at this point in the history
Pull networking fixes from David Miller:
 "Several fixes here. Basically split down the line between newly
  introduced regressions and long existing problems:

   1) Double free in tipc_enable_bearer(), from Cong Wang.

   2) Many fixes to nf_conncount, from Florian Westphal.

   3) op->get_regs_len() can throw an error, check it, from Yunsheng
      Lin.

   4) Need to use GFP_ATOMIC in *_add_hash_mac_address() of fsl/fman
      driver, from Scott Wood.

   5) Inifnite loop in fib_empty_table(), from Yue Haibing.

   6) Use after free in ax25_fillin_cb(), from Cong Wang.

   7) Fix socket locking in nr_find_socket(), also from Cong Wang.

   8) Fix WoL wakeup enable in r8169, from Heiner Kallweit.

   9) On 32-bit sock->sk_stamp is not thread-safe, from Deepa Dinamani.

  10) Fix ptr_ring wrap during queue swap, from Cong Wang.

  11) Missing shutdown callback in hinic driver, from Xue Chaojing.

  12) Need to return NULL on error from ip6_neigh_lookup(), from Stefano
      Brivio.

  13) BPF out of bounds speculation fixes from Daniel Borkmann"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (57 commits)
  ipv6: Consider sk_bound_dev_if when binding a socket to an address
  ipv6: Fix dump of specific table with strict checking
  bpf: add various test cases to selftests
  bpf: prevent out of bounds speculation on pointer arithmetic
  bpf: fix check_map_access smin_value test when pointer contains offset
  bpf: restrict unknown scalars of mixed signed bounds for unprivileged
  bpf: restrict stack pointer arithmetic for unprivileged
  bpf: restrict map value pointer arithmetic for unprivileged
  bpf: enable access to ax register also from verifier rewrite
  bpf: move tmp variable into ax register in interpreter
  bpf: move {prev_,}insn_idx into verifier env
  isdn: fix kernel-infoleak in capi_unlocked_ioctl
  ipv6: route: Fix return value of ip6_neigh_lookup() on neigh_create() error
  net/hamradio/6pack: use mod_timer() to rearm timers
  net-next/hinic:add shutdown callback
  net: hns3: call hns3_nic_net_open() while doing HNAE3_UP_CLIENT
  ip: validate header length on virtual device xmit
  tap: call skb_probe_transport_header after setting skb->dev
  ptr_ring: wrap back ->producer in __ptr_ring_swap_queue()
  net: rds: remove unnecessary NULL check
  ...
  • Loading branch information
torvalds committed Jan 3, 2019
2 parents 645ff1e + c5ee066 commit 43d86ee
Show file tree
Hide file tree
Showing 60 changed files with 2,079 additions and 404 deletions.
240 changes: 239 additions & 1 deletion Documentation/networking/snmp_counter.rst
Original file line number Diff line number Diff line change
Expand Up @@ -571,7 +571,97 @@ duplicate packet is received.

* TcpExtTCPDSACKOfoRecv
The TCP stack receives a DSACK, which indicate an out of order
duplciate packet is received.
duplicate packet is received.

TCP out of order
===============
* TcpExtTCPOFOQueue
The TCP layer receives an out of order packet and has enough memory
to queue it.

* TcpExtTCPOFODrop
The TCP layer receives an out of order packet but doesn't have enough
memory, so drops it. Such packets won't be counted into
TcpExtTCPOFOQueue.

* TcpExtTCPOFOMerge
The received out of order packet has an overlay with the previous
packet. the overlay part will be dropped. All of TcpExtTCPOFOMerge
packets will also be counted into TcpExtTCPOFOQueue.

TCP PAWS
=======
PAWS (Protection Against Wrapped Sequence numbers) is an algorithm
which is used to drop old packets. It depends on the TCP
timestamps. For detail information, please refer the `timestamp wiki`_
and the `RFC of PAWS`_.

.. _RFC of PAWS: https://tools.ietf.org/html/rfc1323#page-17
.. _timestamp wiki: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#TCP_timestamps

* TcpExtPAWSActive
Packets are dropped by PAWS in Syn-Sent status.

* TcpExtPAWSEstab
Packets are dropped by PAWS in any status other than Syn-Sent.

TCP ACK skip
===========
In some scenarios, kernel would avoid sending duplicate ACKs too
frequently. Please find more details in the tcp_invalid_ratelimit
section of the `sysctl document`_. When kernel decides to skip an ACK
due to tcp_invalid_ratelimit, kernel would update one of below
counters to indicate the ACK is skipped in which scenario. The ACK
would only be skipped if the received packet is either a SYN packet or
it has no data.

.. _sysctl document: https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

* TcpExtTCPACKSkippedSynRecv
The ACK is skipped in Syn-Recv status. The Syn-Recv status means the
TCP stack receives a SYN and replies SYN+ACK. Now the TCP stack is
waiting for an ACK. Generally, the TCP stack doesn't need to send ACK
in the Syn-Recv status. But in several scenarios, the TCP stack need
to send an ACK. E.g., the TCP stack receives the same SYN packet
repeately, the received packet does not pass the PAWS check, or the
received packet sequence number is out of window. In these scenarios,
the TCP stack needs to send ACK. If the ACk sending frequency is higher than
tcp_invalid_ratelimit allows, the TCP stack will skip sending ACK and
increase TcpExtTCPACKSkippedSynRecv.


* TcpExtTCPACKSkippedPAWS
The ACK is skipped due to PAWS (Protect Against Wrapped Sequence
numbers) check fails. If the PAWS check fails in Syn-Recv, Fin-Wait-2
or Time-Wait statuses, the skipped ACK would be counted to
TcpExtTCPACKSkippedSynRecv, TcpExtTCPACKSkippedFinWait2 or
TcpExtTCPACKSkippedTimeWait. In all other statuses, the skipped ACK
would be counted to TcpExtTCPACKSkippedPAWS.

* TcpExtTCPACKSkippedSeq
The sequence number is out of window and the timestamp passes the PAWS
check and the TCP status is not Syn-Recv, Fin-Wait-2, and Time-Wait.

* TcpExtTCPACKSkippedFinWait2
The ACK is skipped in Fin-Wait-2 status, the reason would be either
PAWS check fails or the received sequence number is out of window.

* TcpExtTCPACKSkippedTimeWait
Tha ACK is skipped in Time-Wait status, the reason would be either
PAWS check failed or the received sequence number is out of window.

* TcpExtTCPACKSkippedChallenge
The ACK is skipped if the ACK is a challenge ACK. The RFC 5961 defines
3 kind of challenge ACK, please refer `RFC 5961 section 3.2`_,
`RFC 5961 section 4.2`_ and `RFC 5961 section 5.2`_. Besides these
three scenarios, In some TCP status, the linux TCP stack would also
send challenge ACKs if the ACK number is before the first
unacknowledged number (more strict than `RFC 5961 section 5.2`_).

.. _RFC 5961 section 3.2: https://tools.ietf.org/html/rfc5961#page-7
.. _RFC 5961 section 4.2: https://tools.ietf.org/html/rfc5961#page-9
.. _RFC 5961 section 5.2: https://tools.ietf.org/html/rfc5961#page-11


examples
=======
Expand Down Expand Up @@ -1188,3 +1278,151 @@ Run nstat on server B::
We have deleted the default route on server B. Server B couldn't find
a route for the 8.8.8.8 IP address, so server B increased
IpOutNoRoutes.

TcpExtTCPACKSkippedSynRecv
------------------------
In this test, we send 3 same SYN packets from client to server. The
first SYN will let server create a socket, set it to Syn-Recv status,
and reply a SYN/ACK. The second SYN will let server reply the SYN/ACK
again, and record the reply time (the duplicate ACK reply time). The
third SYN will let server check the previous duplicate ACK reply time,
and decide to skip the duplicate ACK, then increase the
TcpExtTCPACKSkippedSynRecv counter.

Run tcpdump to capture a SYN packet::

nstatuser@nstat-a:~$ sudo tcpdump -c 1 -w /tmp/syn.pcap port 9000
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

Open another terminal, run nc command::

nstatuser@nstat-a:~$ nc nstat-b 9000

As the nstat-b didn't listen on port 9000, it should reply a RST, and
the nc command exited immediately. It was enough for the tcpdump
command to capture a SYN packet. A linux server might use hardware
offload for the TCP checksum, so the checksum in the /tmp/syn.pcap
might be not correct. We call tcprewrite to fix it::

nstatuser@nstat-a:~$ tcprewrite --infile=/tmp/syn.pcap --outfile=/tmp/syn_fixcsum.pcap --fixcsum

On nstat-b, we run nc to listen on port 9000::

nstatuser@nstat-b:~$ nc -lkv 9000
Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, we blocked the packet from port 9000, or nstat-a would send
RST to nstat-b::

nstatuser@nstat-a:~$ sudo iptables -A INPUT -p tcp --sport 9000 -j DROP

Send 3 SYN repeatly to nstat-b::

nstatuser@nstat-a:~$ for i in {1..3}; do sudo tcpreplay -i ens3 /tmp/syn_fixcsum.pcap; done

Check snmp cunter on nstat-b::

nstatuser@nstat-b:~$ nstat | grep -i skip
TcpExtTCPACKSkippedSynRecv 1 0.0

As we expected, TcpExtTCPACKSkippedSynRecv is 1.

TcpExtTCPACKSkippedPAWS
----------------------
To trigger PAWS, we could send an old SYN.

On nstat-b, let nc listen on port 9000::

nstatuser@nstat-b:~$ nc -lkv 9000
Listening on [0.0.0.0] (family 0, port 9000)

On nstat-a, run tcpdump to capture a SYN::

nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/paws_pre.pcap -c 1 port 9000
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-a, run nc as a client to connect nstat-b::

nstatuser@nstat-a:~$ nc -v nstat-b 9000
Connection to nstat-b 9000 port [tcp/*] succeeded!

Now the tcpdump has captured the SYN and exit. We should fix the
checksum::

nstatuser@nstat-a:~$ tcprewrite --infile /tmp/paws_pre.pcap --outfile /tmp/paws.pcap --fixcsum

Send the SYN packet twice::

nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/paws.pcap; done

On nstat-b, check the snmp counter::

nstatuser@nstat-b:~$ nstat | grep -i skip
TcpExtTCPACKSkippedPAWS 1 0.0

We sent two SYN via tcpreplay, both of them would let PAWS check
failed, the nstat-b replied an ACK for the first SYN, skipped the ACK
for the second SYN, and updated TcpExtTCPACKSkippedPAWS.

TcpExtTCPACKSkippedSeq
--------------------
To trigger TcpExtTCPACKSkippedSeq, we send packets which have valid
timestamp (to pass PAWS check) but the sequence number is out of
window. The linux TCP stack would avoid to skip if the packet has
data, so we need a pure ACK packet. To generate such a packet, we
could create two sockets: one on port 9000, another on port 9001. Then
we capture an ACK on port 9001, change the source/destination port
numbers to match the port 9000 socket. Then we could trigger
TcpExtTCPACKSkippedSeq via this packet.

On nstat-b, open two terminals, run two nc commands to listen on both
port 9000 and port 9001::

nstatuser@nstat-b:~$ nc -lkv 9000
Listening on [0.0.0.0] (family 0, port 9000)

nstatuser@nstat-b:~$ nc -lkv 9001
Listening on [0.0.0.0] (family 0, port 9001)

On nstat-a, run two nc clients::

nstatuser@nstat-a:~$ nc -v nstat-b 9000
Connection to nstat-b 9000 port [tcp/*] succeeded!

nstatuser@nstat-a:~$ nc -v nstat-b 9001
Connection to nstat-b 9001 port [tcp/*] succeeded!

On nstat-a, run tcpdump to capture an ACK::

nstatuser@nstat-a:~$ sudo tcpdump -w /tmp/seq_pre.pcap -c 1 dst port 9001
tcpdump: listening on ens3, link-type EN10MB (Ethernet), capture size 262144 bytes

On nstat-b, send a packet via the port 9001 socket. E.g. we sent a
string 'foo' in our example::

nstatuser@nstat-b:~$ nc -lkv 9001
Listening on [0.0.0.0] (family 0, port 9001)
Connection from nstat-a 42132 received!
foo

On nstat-a, the tcpdump should have caputred the ACK. We should check
the source port numbers of the two nc clients::

nstatuser@nstat-a:~$ ss -ta '( dport = :9000 || dport = :9001 )' | tee
State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 192.168.122.250:50208 192.168.122.251:9000
ESTAB 0 0 192.168.122.250:42132 192.168.122.251:9001

Run tcprewrite, change port 9001 to port 9000, chagne port 42132 to
port 50208::

nstatuser@nstat-a:~$ tcprewrite --infile /tmp/seq_pre.pcap --outfile /tmp/seq.pcap -r 9001:9000 -r 42132:50208 --fixcsum

Now the /tmp/seq.pcap is the packet we need. Send it to nstat-b::

nstatuser@nstat-a:~$ for i in {1..2}; do sudo tcpreplay -i ens3 /tmp/seq.pcap; done

Check TcpExtTCPACKSkippedSeq on nstat-b::

nstatuser@nstat-b:~$ nstat | grep -i skip
TcpExtTCPACKSkippedSeq 1 0.0
4 changes: 2 additions & 2 deletions drivers/isdn/capi/kcapi.c
Original file line number Diff line number Diff line change
Expand Up @@ -852,15 +852,15 @@ u16 capi20_get_manufacturer(u32 contr, u8 *buf)
u16 ret;

if (contr == 0) {
strlcpy(buf, capi_manufakturer, CAPI_MANUFACTURER_LEN);
strncpy(buf, capi_manufakturer, CAPI_MANUFACTURER_LEN);
return CAPI_NOERROR;
}

mutex_lock(&capi_controller_lock);

ctr = get_capi_ctr_by_nr(contr);
if (ctr && ctr->state == CAPI_CTR_RUNNING) {
strlcpy(buf, ctr->manu, CAPI_MANUFACTURER_LEN);
strncpy(buf, ctr->manu, CAPI_MANUFACTURER_LEN);
ret = CAPI_NOERROR;
} else
ret = CAPI_REGNOTINSTALLED;
Expand Down
2 changes: 2 additions & 0 deletions drivers/isdn/hisax/hfc_pci.c
Original file line number Diff line number Diff line change
Expand Up @@ -1169,11 +1169,13 @@ HFCPCI_l1hw(struct PStack *st, int pr, void *arg)
if (cs->debug & L1_DEB_LAPD)
debugl1(cs, "-> PH_REQUEST_PULL");
#endif
spin_lock_irqsave(&cs->lock, flags);
if (!cs->tx_skb) {
test_and_clear_bit(FLG_L1_PULL_REQ, &st->l1.Flags);
st->l1.l1l2(st, PH_PULL | CONFIRM, NULL);
} else
test_and_set_bit(FLG_L1_PULL_REQ, &st->l1.Flags);
spin_unlock_irqrestore(&cs->lock, flags);
break;
case (HW_RESET | REQUEST):
spin_lock_irqsave(&cs->lock, flags);
Expand Down
7 changes: 3 additions & 4 deletions drivers/net/dsa/bcm_sf2.c
Original file line number Diff line number Diff line change
Expand Up @@ -303,11 +303,10 @@ static int bcm_sf2_sw_mdio_write(struct mii_bus *bus, int addr, int regnum,
* send them to our master MDIO bus controller
*/
if (addr == BRCM_PSEUDO_PHY_ADDR && priv->indir_phy_mask & BIT(addr))
bcm_sf2_sw_indir_rw(priv, 0, addr, regnum, val);
return bcm_sf2_sw_indir_rw(priv, 0, addr, regnum, val);
else
mdiobus_write_nested(priv->master_mii_bus, addr, regnum, val);

return 0;
return mdiobus_write_nested(priv->master_mii_bus, addr,
regnum, val);
}

static irqreturn_t bcm_sf2_switch_0_isr(int irq, void *dev_id)
Expand Down
4 changes: 3 additions & 1 deletion drivers/net/ethernet/atheros/atl1e/atl1e_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -473,7 +473,9 @@ static void atl1e_mdio_write(struct net_device *netdev, int phy_id,
{
struct atl1e_adapter *adapter = netdev_priv(netdev);

atl1e_write_phy_reg(&adapter->hw, reg_num & MDIO_REG_ADDR_MASK, val);
if (atl1e_write_phy_reg(&adapter->hw,
reg_num & MDIO_REG_ADDR_MASK, val))
netdev_err(netdev, "write phy register failed\n");
}

static int atl1e_mii_ioctl(struct net_device *netdev,
Expand Down
4 changes: 4 additions & 0 deletions drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
Original file line number Diff line number Diff line change
Expand Up @@ -1229,6 +1229,10 @@ int cudbg_collect_hw_sched(struct cudbg_init *pdbg_init,

rc = cudbg_get_buff(pdbg_init, dbg_buff, sizeof(struct cudbg_hw_sched),
&temp_buff);

if (rc)
return rc;

hw_sched_buff = (struct cudbg_hw_sched *)temp_buff.data;
hw_sched_buff->map = t4_read_reg(padap, TP_TX_MOD_QUEUE_REQ_MAP_A);
hw_sched_buff->mode = TIMERMODE_G(t4_read_reg(padap, TP_MOD_CONFIG_A));
Expand Down
2 changes: 1 addition & 1 deletion drivers/net/ethernet/freescale/fman/fman_memac.c
Original file line number Diff line number Diff line change
Expand Up @@ -928,7 +928,7 @@ int memac_add_hash_mac_address(struct fman_mac *memac, enet_addr_t *eth_addr)
hash = get_mac_addr_hash_code(addr) & HASH_CTRL_ADDR_MASK;

/* Create element to be added to the driver hash table */
hash_entry = kmalloc(sizeof(*hash_entry), GFP_KERNEL);
hash_entry = kmalloc(sizeof(*hash_entry), GFP_ATOMIC);
if (!hash_entry)
return -ENOMEM;
hash_entry->addr = addr;
Expand Down
2 changes: 1 addition & 1 deletion drivers/net/ethernet/freescale/fman/fman_tgec.c
Original file line number Diff line number Diff line change
Expand Up @@ -553,7 +553,7 @@ int tgec_add_hash_mac_address(struct fman_mac *tgec, enet_addr_t *eth_addr)
hash = (crc >> TGEC_HASH_MCAST_SHIFT) & TGEC_HASH_ADR_MSK;

/* Create element to be added to the driver hash table */
hash_entry = kmalloc(sizeof(*hash_entry), GFP_KERNEL);
hash_entry = kmalloc(sizeof(*hash_entry), GFP_ATOMIC);
if (!hash_entry)
return -ENOMEM;
hash_entry->addr = addr;
Expand Down
7 changes: 4 additions & 3 deletions drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
Original file line number Diff line number Diff line change
Expand Up @@ -3995,17 +3995,18 @@ static int hns3_reset_notify_up_enet(struct hnae3_handle *handle)
struct hns3_nic_priv *priv = netdev_priv(kinfo->netdev);
int ret = 0;

clear_bit(HNS3_NIC_STATE_RESETTING, &priv->state);

if (netif_running(kinfo->netdev)) {
ret = hns3_nic_net_up(kinfo->netdev);
ret = hns3_nic_net_open(kinfo->netdev);
if (ret) {
set_bit(HNS3_NIC_STATE_RESETTING, &priv->state);
netdev_err(kinfo->netdev,
"hns net up fail, ret=%d!\n", ret);
return ret;
}
}

clear_bit(HNS3_NIC_STATE_RESETTING, &priv->state);

return ret;
}

Expand Down
6 changes: 6 additions & 0 deletions drivers/net/ethernet/huawei/hinic/hinic_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -1106,6 +1106,11 @@ static void hinic_remove(struct pci_dev *pdev)
dev_info(&pdev->dev, "HiNIC driver - removed\n");
}

static void hinic_shutdown(struct pci_dev *pdev)
{
pci_disable_device(pdev);
}

static const struct pci_device_id hinic_pci_table[] = {
{ PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_QUAD_PORT_25GE), 0},
{ PCI_VDEVICE(HUAWEI, HINIC_DEV_ID_DUAL_PORT_25GE), 0},
Expand All @@ -1119,6 +1124,7 @@ static struct pci_driver hinic_driver = {
.id_table = hinic_pci_table,
.probe = hinic_probe,
.remove = hinic_remove,
.shutdown = hinic_shutdown,
};

module_pci_driver(hinic_driver);
6 changes: 5 additions & 1 deletion drivers/net/ethernet/ibm/ibmveth.c
Original file line number Diff line number Diff line change
Expand Up @@ -1171,11 +1171,15 @@ static netdev_tx_t ibmveth_start_xmit(struct sk_buff *skb,

map_failed_frags:
last = i+1;
for (i = 0; i < last; i++)
for (i = 1; i < last; i++)
dma_unmap_page(&adapter->vdev->dev, descs[i].fields.address,
descs[i].fields.flags_len & IBMVETH_BUF_LEN_MASK,
DMA_TO_DEVICE);

dma_unmap_single(&adapter->vdev->dev,
descs[0].fields.address,
descs[0].fields.flags_len & IBMVETH_BUF_LEN_MASK,
DMA_TO_DEVICE);
map_failed:
if (!firmware_has_feature(FW_FEATURE_CMO))
netdev_err(netdev, "tx: unable to map xmit buffer\n");
Expand Down
Loading

0 comments on commit 43d86ee

Please sign in to comment.