Commit 3d8dc43
bpf: Switch to bpf mem allocator for LPM trie
Multiple syzbot warnings have been reported. These warnings are mainly
about the lock order between trie->lock and kmalloc()'s internal lock.
See report [1] as an example:
======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc7-syzkaller-00003-g4376e966ecb7 #0 Not tainted
------------------------------------------------------
syz.3.2069/15008 is trying to acquire lock:
ffff88801544e6d8 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node ...
but task is already holding lock:
ffff88802dcc89f8 (&trie->lock){-.-.}-{2:2}, at: trie_update_elem ...
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&trie->lock){-.-.}-{2:2}:
__raw_spin_lock_irqsave
_raw_spin_lock_irqsave+0x3a/0x60
trie_delete_elem+0xb0/0x820
___bpf_prog_run+0x3e51/0xabd0
__bpf_prog_run32+0xc1/0x100
bpf_dispatcher_nop_func
......
bpf_trace_run2+0x231/0x590
__bpf_trace_contention_end+0xca/0x110
trace_contention_end.constprop.0+0xea/0x170
__pv_queued_spin_lock_slowpath+0x28e/0xcc0
pv_queued_spin_lock_slowpath
queued_spin_lock_slowpath
queued_spin_lock
do_raw_spin_lock+0x210/0x2c0
__raw_spin_lock_irqsave
_raw_spin_lock_irqsave+0x42/0x60
__put_partials+0xc3/0x170
qlink_free
qlist_free_all+0x4e/0x140
kasan_quarantine_reduce+0x192/0x1e0
__kasan_slab_alloc+0x69/0x90
kasan_slab_alloc
slab_post_alloc_hook
slab_alloc_node
kmem_cache_alloc_node_noprof+0x153/0x310
__alloc_skb+0x2b1/0x380
......
-> #0 (&n->list_lock){-.-.}-{2:2}:
check_prev_add
check_prevs_add
validate_chain
__lock_acquire+0x2478/0x3b30
lock_acquire
lock_acquire+0x1b1/0x560
__raw_spin_lock_irqsave
_raw_spin_lock_irqsave+0x3a/0x60
get_partial_node.part.0+0x20/0x350
get_partial_node
get_partial
___slab_alloc+0x65b/0x1870
__slab_alloc.constprop.0+0x56/0xb0
__slab_alloc_node
slab_alloc_node
__do_kmalloc_node
__kmalloc_node_noprof+0x35c/0x440
kmalloc_node_noprof
bpf_map_kmalloc_node+0x98/0x4a0
lpm_trie_node_alloc
trie_update_elem+0x1ef/0xe00
bpf_map_update_value+0x2c1/0x6c0
map_update_elem+0x623/0x910
__sys_bpf+0x90c/0x49a0
...
other info that might help us debug this:
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&trie->lock);
                               lock(&n->list_lock);
                               lock(&trie->lock);
  lock(&n->list_lock);
*** DEADLOCK ***
[1]: https://syzkaller.appspot.com/bug?extid=9045c0a3d5a7f1b119f7
A bpf program attached to trace_contention_end() triggers after
acquiring &n->list_lock. The program invokes trie_delete_elem(), which
then acquires trie->lock. However, it is possible that another
process is invoking trie_update_elem(). trie_update_elem() will acquire
trie->lock first, then invoke kmalloc_node(). kmalloc_node() may invoke
get_partial_node() and try to acquire &n->list_lock (not necessarily the
same lock object). Therefore, lockdep warns about the circular locking
dependency.
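The cycle described above can be reproduced with a toy lock-order recorder. This is only an illustrative userspace sketch, not lockdep's implementation: the names lock_class, dep, reachable() and record_acquire() are hypothetical, and the two classes stand in for trie->lock and n->list_lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for this sketch (not kernel identifiers). */
enum lock_class { TRIE_LOCK, LIST_LOCK, NR_CLASSES };

/* dep[a][b] is true once some context acquired b while holding a. */
bool dep[NR_CLASSES][NR_CLASSES];

/* Depth-first search: is 'to' reachable from 'from' via recorded edges? */
bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired 'taken' while holding 'held'". Returns true if the new
 * edge closes a cycle, i.e. 'held' was already reachable from 'taken'.
 */
bool record_acquire(enum lock_class held, enum lock_class taken)
{
	bool seen[NR_CLASSES] = { false };
	bool cycle;

	assert(held != taken);
	cycle = reachable(taken, held, seen);
	dep[held][taken] = true;
	return cycle;
}
```

Recording the tracepoint path first (LIST_LOCK held, then TRIE_LOCK) adds one edge without complaint. Recording the update path afterwards (TRIE_LOCK held, then LIST_LOCK) returns true, which corresponds to the point where the warning fires in the report.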
Invoking kmalloc() before acquiring trie->lock could fix the warning.
However, since BPF programs can be invoked from any context (e.g.,
through kprobe/tracepoint/fentry), there may still be lock ordering
problems for internal locks in kmalloc() or trie->lock itself.
To eliminate these potential lock ordering problems with kmalloc()'s
internal locks, replace kmalloc()/kfree()/kfree_rcu() with the equivalent
BPF memory allocator APIs, which can be invoked in any context. The lock
ordering problems with trie->lock (e.g., reentrance) will be handled
separately.
Three aspects of this change require explanation:
1. Intermediate and leaf nodes are allocated from the same allocator.
Since the value size of LPM trie is usually small, using a single
allocator reduces the memory overhead of the BPF memory allocator.
2. Leaf nodes are allocated before disabling IRQs. This handles cases
where leaf_size is large (e.g., > 4KB - 8) and updates require
intermediate node allocation. If leaf nodes were allocated in an
IRQ-disabled region, the free objects in the BPF memory allocator would
not be refilled in time and the intermediate node allocation may fail.
3. Paired migrate_{disable|enable}() calls for node alloc and free. The
BPF memory allocator uses per-CPU structures internally, so these paired
calls are necessary to guarantee correctness.
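Point 2 can be illustrated with a toy userspace model. It is a sketch under stated assumptions, not the BPF memory allocator: struct toy_cache and every toy_*() function are hypothetical, the cache capacity of 4 and the single pre-filled object are chosen only to make the effect visible, and the irqs_disabled flag merely stands in for an IRQ-disabled region in which the deferred refill cannot run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CAPACITY 4

/* Toy stand-in for one per-CPU object cache; all names are hypothetical. */
struct toy_cache {
	void *free_objs[TOY_CAPACITY];
	int nr_free;
	bool refill_pending;	/* a deferred refill has been requested */
	bool irqs_disabled;	/* models an IRQ-disabled region */
	size_t obj_size;
};

void toy_init(struct toy_cache *c, size_t obj_size, int prefill)
{
	assert(prefill <= TOY_CAPACITY);
	c->nr_free = 0;
	c->refill_pending = false;
	c->irqs_disabled = false;
	c->obj_size = obj_size;
	while (c->nr_free < prefill)
		c->free_objs[c->nr_free++] = malloc(obj_size);
}

void toy_destroy(struct toy_cache *c)
{
	while (c->nr_free > 0)
		free(c->free_objs[--c->nr_free]);
}

/* Deferred refill: does nothing while "IRQs" are disabled. */
void toy_run_deferred(struct toy_cache *c)
{
	if (c->irqs_disabled || !c->refill_pending)
		return;
	while (c->nr_free < TOY_CAPACITY) {
		void *obj = malloc(c->obj_size);

		if (!obj)
			break;
		c->free_objs[c->nr_free++] = obj;
	}
	c->refill_pending = false;
}

/* Pop an object without taking a lock; request a refill once empty. */
void *toy_alloc(struct toy_cache *c)
{
	void *obj = NULL;

	if (c->nr_free > 0)
		obj = c->free_objs[--c->nr_free];
	if (c->nr_free == 0)
		c->refill_pending = true;
	toy_run_deferred(c);
	return obj;
}

/* Allocate a leaf and an intermediate node; return how many succeeded. */
int toy_update(struct toy_cache *c, bool leaf_before_irq_off)
{
	void *leaf = NULL, *im;

	if (leaf_before_irq_off)
		leaf = toy_alloc(c);	/* refill can still run here */
	c->irqs_disabled = true;
	if (!leaf_before_irq_off)
		leaf = toy_alloc(c);	/* refill is deferred */
	im = toy_alloc(c);
	c->irqs_disabled = false;
	toy_run_deferred(c);
	free(leaf);
	free(im);
	return (leaf != NULL) + (im != NULL);
}
```

In this model, starting from a cache that holds one object, toy_update(&c, false) obtains only the leaf because the refill cannot run before the intermediate node is requested, while toy_update(&c, true) obtains both objects because the refill runs between the two allocations.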
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20241206110622.1161752-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
1 parent 27abc7b commit 3d8dc43
1 file changed
Lines changed: 48 additions & 23 deletions