Commit 35d2b7f
bpf, sockmap: Fix preempt_rt splat when using raw_spin_lock_t
Sockmap and sockhash maps are a collection of psocks that are
objects representing a socket plus a set of metadata needed
to manage the BPF programs associated with the socket. These
maps use the stab->lock to protect from concurrent operations
on the maps, e.g. trying to insert to objects into the array
at the same time in the same slot. Additionally, a sockhash map
has a bucket lock to protect iteration and insert/delete into
the hash entry.
Each psock has a psock->link which is a linked list of all the
maps that a psock is attached to. This allows a psock (socket)
to be included in multiple sockmap and sockhash maps. This
linked list is protected the psock->link_lock.
They _must_ be nested correctly to avoid deadlock:
lock(stab->lock)
: do BPF map operations and psock insert/delete
lock(psock->link_lock)
: add map to psock linked list of maps
unlock(psock->link_lock)
unlock(stab->lock)
For non PREEMPT_RT kernels both raw_spin_lock_t and spin_lock_t
are guaranteed to not sleep. But, with PREEMPT_RT kernels the
spin_lock_t variants may sleep. In the current code we have
many patterns like this:
rcu_critical_section:
raw_spin_lock(stab->lock)
spin_lock(psock->link_lock) <- may sleep ouch
spin_unlock(psock->link_lock)
raw_spin_unlock(stab->lock)
rcu_critical_section
Nesting spin_lock() inside a raw_spin_lock() violates locking
rules for PREEMPT_RT kernels. And additionally we do alloc(GFP_ATOMICS)
inside the stab->lock, but those might sleep on PREEMPT_RT kernels.
The result is splats like this:
./test_progs -t sockmap_basic
[ 33.344330] bpf_testmod: loading out-of-tree module taints kernel.
[ 33.441933]
[ 33.442089] =============================
[ 33.442421] [ BUG: Invalid wait context ]
[ 33.442763] 6.5.0-rc5-01731-gec0ded2e0282 #4958 Tainted: G O
[ 33.443320] -----------------------------
[ 33.443624] test_progs/2073 is trying to lock:
[ 33.443960] ffff888102a1c290 (&psock->link_lock){....}-{3:3}, at: sock_map_update_common+0x2c2/0x3d0
[ 33.444636] other info that might help us debug this:
[ 33.444991] context-{5:5}
[ 33.445183] 3 locks held by test_progs/2073:
[ 33.445498] #0: ffff88811a208d30 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_map_update_elem_sys+0xff/0x330
[ 33.446159] #1: ffffffff842539e0 (rcu_read_lock){....}-{1:3}, at: sock_map_update_elem_sys+0xf5/0x330
[ 33.446809] #2: ffff88810d687240 (&stab->lock){+...}-{2:2}, at: sock_map_update_common+0x177/0x3d0
[ 33.447445] stack backtrace:
[ 33.447655] CPU: 10 PID
To fix observe we can't readily remove the allocations (for that
we would need to use/create something similar to bpf_map_alloc). So
convert raw_spin_lock_t to spin_lock_t. We note that sock_map_update
that would trigger the allocate and potential sleep is only allowed
through sys_bpf ops and via sock_ops which precludes hw interrupts
and low level atomic sections in RT preempt kernel. On non RT
preempt kernel there are no changes here and spin locks sections
and alloc(GFP_ATOMIC) are still not sleepable.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230830053517.166611-1-john.fastabend@gmail.com1 parent be4033d commit 35d2b7f
1 file changed
Lines changed: 18 additions & 18 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
18 | 18 | | |
19 | 19 | | |
20 | 20 | | |
21 | | - | |
| 21 | + | |
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
| |||
44 | 44 | | |
45 | 45 | | |
46 | 46 | | |
47 | | - | |
| 47 | + | |
48 | 48 | | |
49 | 49 | | |
50 | 50 | | |
| |||
411 | 411 | | |
412 | 412 | | |
413 | 413 | | |
414 | | - | |
| 414 | + | |
415 | 415 | | |
416 | 416 | | |
417 | 417 | | |
| |||
421 | 421 | | |
422 | 422 | | |
423 | 423 | | |
424 | | - | |
| 424 | + | |
425 | 425 | | |
426 | 426 | | |
427 | 427 | | |
| |||
487 | 487 | | |
488 | 488 | | |
489 | 489 | | |
490 | | - | |
| 490 | + | |
491 | 491 | | |
492 | 492 | | |
493 | 493 | | |
| |||
501 | 501 | | |
502 | 502 | | |
503 | 503 | | |
504 | | - | |
| 504 | + | |
505 | 505 | | |
506 | 506 | | |
507 | | - | |
| 507 | + | |
508 | 508 | | |
509 | 509 | | |
510 | 510 | | |
| |||
835 | 835 | | |
836 | 836 | | |
837 | 837 | | |
838 | | - | |
| 838 | + | |
839 | 839 | | |
840 | 840 | | |
841 | 841 | | |
| |||
910 | 910 | | |
911 | 911 | | |
912 | 912 | | |
913 | | - | |
| 913 | + | |
914 | 914 | | |
915 | 915 | | |
916 | 916 | | |
917 | 917 | | |
918 | 918 | | |
919 | 919 | | |
920 | 920 | | |
921 | | - | |
| 921 | + | |
922 | 922 | | |
923 | 923 | | |
924 | 924 | | |
| |||
932 | 932 | | |
933 | 933 | | |
934 | 934 | | |
935 | | - | |
| 935 | + | |
936 | 936 | | |
937 | 937 | | |
938 | 938 | | |
939 | 939 | | |
940 | 940 | | |
941 | 941 | | |
942 | 942 | | |
943 | | - | |
| 943 | + | |
944 | 944 | | |
945 | 945 | | |
946 | 946 | | |
| |||
1000 | 1000 | | |
1001 | 1001 | | |
1002 | 1002 | | |
1003 | | - | |
| 1003 | + | |
1004 | 1004 | | |
1005 | 1005 | | |
1006 | 1006 | | |
| |||
1026 | 1026 | | |
1027 | 1027 | | |
1028 | 1028 | | |
1029 | | - | |
| 1029 | + | |
1030 | 1030 | | |
1031 | 1031 | | |
1032 | | - | |
| 1032 | + | |
1033 | 1033 | | |
1034 | 1034 | | |
1035 | 1035 | | |
| |||
1115 | 1115 | | |
1116 | 1116 | | |
1117 | 1117 | | |
1118 | | - | |
| 1118 | + | |
1119 | 1119 | | |
1120 | 1120 | | |
1121 | 1121 | | |
| |||
1147 | 1147 | | |
1148 | 1148 | | |
1149 | 1149 | | |
1150 | | - | |
| 1150 | + | |
1151 | 1151 | | |
1152 | 1152 | | |
1153 | 1153 | | |
1154 | | - | |
| 1154 | + | |
1155 | 1155 | | |
1156 | 1156 | | |
1157 | 1157 | | |
| |||
0 commit comments