Skip to content

Commit

Permalink
tcp: fix SO_RCVLOWAT hangs with fat skbs
Browse files Browse the repository at this point in the history
We autotune rcvbuf whenever SO_RCVLOWAT is set to account for 100%
overhead in tcp_set_rcvlowat()

This works well when skb->len/skb->truesize ratio is bigger than 0.5

But if we receive packets with small MSS, we can end up in a situation
where not enough bytes are available in the receive queue to satisfy
RCVLOWAT setting.
As our sk_rcvbuf limit is hit, we send zero windows in ACK packets,
preventing remote peer from sending more data.

Even autotuning does not help, because it only triggers at the time
user process drains the queue. If no EPOLLIN is generated, this
can not happen.

Note poll() has a similar issue, after commit
c700448 ("tcp: Respect SO_RCVLOWAT in tcp_poll().")

Fixes: 03f45c8 ("tcp: avoid extra wakeups for SO_RCVLOWAT users")
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
  • Loading branch information
Eric Dumazet authored and davem330 committed May 12, 2020
1 parent 92db978 commit 24adbc1
Show file tree
Hide file tree
Showing 3 changed files with 26 additions and 4 deletions.
13 changes: 13 additions & 0 deletions include/net/tcp.h
Original file line number Diff line number Diff line change
Expand Up @@ -1420,6 +1420,19 @@ static inline int tcp_full_space(const struct sock *sk)
return tcp_win_from_space(sk, READ_ONCE(sk->sk_rcvbuf));
}

/* We provision sk_rcvbuf around 200% of sk_rcvlowat.
* If 87.5 % (7/8) of the space has been consumed, we want to override
* SO_RCVLOWAT constraint, since we are receiving skbs with too small
* len/truesize ratio.
*/
static inline bool tcp_rmem_pressure(const struct sock *sk)
{
int rcvbuf = READ_ONCE(sk->sk_rcvbuf);
int threshold = rcvbuf - (rcvbuf >> 3);

return atomic_read(&sk->sk_rmem_alloc) > threshold;
}

extern void tcp_openreq_init_rwin(struct request_sock *req,
const struct sock *sk_listener,
const struct dst_entry *dst);
Expand Down
14 changes: 11 additions & 3 deletions net/ipv4/tcp.c
Original file line number Diff line number Diff line change
Expand Up @@ -476,9 +476,17 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
static inline bool tcp_stream_is_readable(const struct tcp_sock *tp,
int target, struct sock *sk)
{
return (READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq) >= target) ||
(sk->sk_prot->stream_memory_read ?
sk->sk_prot->stream_memory_read(sk) : false);
int avail = READ_ONCE(tp->rcv_nxt) - READ_ONCE(tp->copied_seq);

if (avail > 0) {
if (avail >= target)
return true;
if (tcp_rmem_pressure(sk))
return true;
}
if (sk->sk_prot->stream_memory_read)
return sk->sk_prot->stream_memory_read(sk);
return false;
}

/*
Expand Down
3 changes: 2 additions & 1 deletion net/ipv4/tcp_input.c
Original file line number Diff line number Diff line change
Expand Up @@ -4757,7 +4757,8 @@ void tcp_data_ready(struct sock *sk)
const struct tcp_sock *tp = tcp_sk(sk);
int avail = tp->rcv_nxt - tp->copied_seq;

if (avail < sk->sk_rcvlowat && !sock_flag(sk, SOCK_DONE))
if (avail < sk->sk_rcvlowat && !tcp_rmem_pressure(sk) &&
!sock_flag(sk, SOCK_DONE))
return;

sk->sk_data_ready(sk);
Expand Down

0 comments on commit 24adbc1

Please sign in to comment.