
Commit f017c1f

edumazet authored and kuba-moo committed
tcp: use skb->len instead of skb->truesize in tcp_can_ingest()
Some applications are stuck in the 20th century and still use small SO_RCVBUF values. After the blamed commit, we can drop packets, especially when using an LRO/hw-gro enabled NIC and small MSS (1500) values.

LRO/hw-gro NICs pack multiple segments into pages, allowing tp->scaling_ratio to be set to a high value. Whenever the receive queue gets full, we can receive a small packet filling RWIN, but with a high skb->truesize, because most NICs use a 4K page plus sk_buff metadata even when receiving less than 1500 bytes of payload.

Even if we refine how tp->scaling_ratio is estimated, we could have an issue at the start of the flow, because the first round of packets (IW10) will be sent based on the initial tp->scaling_ratio (1/2).

Relax tcp_can_ingest() to use skb->len instead of skb->truesize, allowing the peer to use the final RWIN, assuming a 'perfect' scaling_ratio of 1.

Fixes: 1d2fbaa ("tcp: stronger sk_rcvbuf checks")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250927092827.2707901-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1 parent d210ee5 commit f017c1f

1 file changed

Lines changed: 13 additions & 2 deletions

net/ipv4/tcp_input.c
@@ -5086,12 +5086,23 @@ static int tcp_prune_queue(struct sock *sk, const struct sk_buff *in_skb);
 
 /* Check if this incoming skb can be added to socket receive queues
  * while satisfying sk->sk_rcvbuf limit.
+ *
+ * In theory we should use skb->truesize, but this can cause problems
+ * when applications use too small SO_RCVBUF values.
+ * When LRO / hw gro is used, the socket might have a high tp->scaling_ratio,
+ * allowing RWIN to be close to available space.
+ * Whenever the receive queue gets full, we can receive a small packet
+ * filling RWIN, but with a high skb->truesize, because most NIC use 4K page
+ * plus sk_buff metadata even when receiving less than 1500 bytes of payload.
+ *
+ * Note that we use skb->len to decide to accept or drop this packet,
+ * but sk->sk_rmem_alloc is the sum of all skb->truesize.
  */
 static bool tcp_can_ingest(const struct sock *sk, const struct sk_buff *skb)
 {
-	unsigned int new_mem = atomic_read(&sk->sk_rmem_alloc) + skb->truesize;
+	unsigned int rmem = atomic_read(&sk->sk_rmem_alloc);
 
-	return new_mem <= sk->sk_rcvbuf;
+	return rmem + skb->len <= sk->sk_rcvbuf;
 }
 
 static int tcp_try_rmem_schedule(struct sock *sk, const struct sk_buff *skb,