Skip to content

Commit ac1499f

Browse files
nashuiliangkuba-moo
authored andcommitted
ipv4: route: Prevent rt_bind_exception() from rebinding stale fnhe
The sit driver's packet transmission path calls: sit_tunnel_xmit() -> update_or_create_fnhe(), which lead to fnhe_remove_oldest() being called to delete entries exceeding FNHE_RECLAIM_DEPTH+random. The race window is between fnhe_remove_oldest() selecting fnheX for deletion and the subsequent kfree_rcu(). During this time, the concurrent path's __mkroute_output() -> find_exception() can fetch the soon-to-be-deleted fnheX, and rt_bind_exception() then binds it with a new dst using a dst_hold(). When the original fnheX is freed via RCU, the dst reference remains permanently leaked. CPU 0 CPU 1 __mkroute_output() find_exception() [fnheX] update_or_create_fnhe() fnhe_remove_oldest() [fnheX] rt_bind_exception() [bind dst] RCU callback [fnheX freed, dst leak] This issue manifests as a device reference count leak and a warning in dmesg when unregistering the net device: unregister_netdevice: waiting for sitX to become free. Usage count = N Ido Schimmel provided the simple test validation method [1]. The fix clears 'oldest->fnhe_daddr' before calling fnhe_flush_routes(). Since rt_bind_exception() checks this field, setting it to zero prevents the stale fnhe from being reused and bound to a new dst just before it is freed. [1] ip netns add ns1 ip -n ns1 link set dev lo up ip -n ns1 address add 192.0.2.1/32 dev lo ip -n ns1 link add name dummy1 up type dummy ip -n ns1 route add 192.0.2.2/32 dev dummy1 ip -n ns1 link add name gretap1 up arp off type gretap \ local 192.0.2.1 remote 192.0.2.2 ip -n ns1 route add 198.51.0.0/16 dev gretap1 taskset -c 0 ip netns exec ns1 mausezahn gretap1 \ -A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q & taskset -c 2 ip netns exec ns1 mausezahn gretap1 \ -A 198.51.100.1 -B 198.51.0.0/16 -t udp -p 1000 -c 0 -q & sleep 10 ip netns pids ns1 | xargs kill ip netns del ns1 Cc: stable@vger.kernel.org Fixes: 67d6d68 ("ipv4: make exception cache less predictible") Signed-off-by: Chuang Wang <nashuiliang@gmail.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251111064328.24440-1-nashuiliang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
1 parent 0345552 commit ac1499f

1 file changed

Lines changed: 5 additions & 0 deletions

File tree

net/ipv4/route.c

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -607,6 +607,11 @@ static void fnhe_remove_oldest(struct fnhe_hash_bucket *hash)
607607
oldest_p = fnhe_p;
608608
}
609609
}
610+
611+
/* Clear oldest->fnhe_daddr to prevent this fnhe from being
612+
* rebound with new dsts in rt_bind_exception().
613+
*/
614+
oldest->fnhe_daddr = 0;
610615
fnhe_flush_routes(oldest);
611616
*oldest_p = oldest->fnhe_next;
612617
kfree_rcu(oldest, rcu);

0 commit comments

Comments
 (0)