Commit 869c63d
bpf: Fix race in cpumap on PREEMPT_RT
On PREEMPT_RT kernels, the per-CPU xdp_bulk_queue (bq) can be accessed
concurrently by multiple preemptible tasks on the same CPU.
The original code assumes bq_enqueue() and __cpu_map_flush() run
atomically with respect to each other on the same CPU, relying on
local_bh_disable() to prevent preemption. However, on PREEMPT_RT,
local_bh_disable() only calls migrate_disable() (when
PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable
preemption, which allows CFS scheduling to preempt a task during
bq_flush_to_queue(), enabling another task on the same CPU to enter
bq_enqueue() and operate on the same per-CPU bq concurrently.
This leads to several races:
1. Double __list_del_clearprev(): after bq->count is reset in
bq_flush_to_queue(), a preempting task can call bq_enqueue() ->
bq_flush_to_queue() on the same bq when bq->count reaches
CPU_MAP_BULK_SIZE. Both tasks then call __list_del_clearprev()
on the same bq->flush_node, the second call dereferences the
prev pointer that was already set to NULL by the first.
2. bq->count and bq->q[] races: concurrent bq_enqueue() can corrupt
the packet queue while bq_flush_to_queue() is processing it.
The race between task A (__cpu_map_flush -> bq_flush_to_queue) and
task B (bq_enqueue -> bq_flush_to_queue) on the same CPU:
Task A (xdp_do_flush) Task B (cpu_map_enqueue)
---------------------- ------------------------
bq_flush_to_queue(bq)
spin_lock(&q->producer_lock)
/* flush bq->q[] to ptr_ring */
bq->count = 0
spin_unlock(&q->producer_lock)
bq_enqueue(rcpu, xdpf)
<-- CFS preempts Task A --> bq->q[bq->count++] = xdpf
/* ... more enqueues until full ... */
bq_flush_to_queue(bq)
spin_lock(&q->producer_lock)
/* flush to ptr_ring */
spin_unlock(&q->producer_lock)
__list_del_clearprev(flush_node)
/* sets flush_node.prev = NULL */
<-- Task A resumes -->
__list_del_clearprev(flush_node)
flush_node.prev->next = ...
/* prev is NULL -> kernel oops */
Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it
in bq_enqueue() and __cpu_map_flush(). These paths already run under
local_bh_disable(), so use local_lock_nested_bh() which on non-RT is
a pure annotation with no overhead, and on PREEMPT_RT provides a
per-CPU sleeping lock that serializes access to the bq.
To reproduce, insert an mdelay(100) between bq->count = 0 and
__list_del_clearprev() in bq_flush_to_queue(), then run reproducer
provided by syzkaller.
Fixes: 3253cb4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@google.com/T/
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20260225121459.183121-2-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>1 parent 5263e30 commit 869c63d
1 file changed
Lines changed: 15 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
29 | 29 | | |
30 | 30 | | |
31 | 31 | | |
| 32 | + | |
32 | 33 | | |
33 | 34 | | |
34 | 35 | | |
| |||
52 | 53 | | |
53 | 54 | | |
54 | 55 | | |
| 56 | + | |
55 | 57 | | |
56 | 58 | | |
57 | 59 | | |
| |||
451 | 453 | | |
452 | 454 | | |
453 | 455 | | |
| 456 | + | |
454 | 457 | | |
455 | 458 | | |
456 | 459 | | |
| |||
722 | 725 | | |
723 | 726 | | |
724 | 727 | | |
| 728 | + | |
| 729 | + | |
725 | 730 | | |
726 | 731 | | |
727 | 732 | | |
| |||
749 | 754 | | |
750 | 755 | | |
751 | 756 | | |
752 | | - | |
| 757 | + | |
| 758 | + | |
753 | 759 | | |
754 | 760 | | |
755 | 761 | | |
756 | | - | |
| 762 | + | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
757 | 766 | | |
758 | 767 | | |
759 | 768 | | |
| |||
774 | 783 | | |
775 | 784 | | |
776 | 785 | | |
| 786 | + | |
| 787 | + | |
777 | 788 | | |
778 | 789 | | |
779 | 790 | | |
| |||
810 | 821 | | |
811 | 822 | | |
812 | 823 | | |
| 824 | + | |
813 | 825 | | |
| 826 | + | |
814 | 827 | | |
815 | 828 | | |
816 | 829 | | |
| |||
0 commit comments