Commit 1872e75
bpf: Fix race in devmap on PREEMPT_RT
On PREEMPT_RT kernels, the per-CPU xdp_dev_bulk_queue (bq) can be
accessed concurrently by multiple preemptible tasks on the same CPU.
The original code assumes bq_enqueue() and __dev_flush() run atomically
with respect to each other on the same CPU, relying on
local_bh_disable() to prevent preemption. However, on PREEMPT_RT,
local_bh_disable() only calls migrate_disable() (when
PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable
preemption, which allows CFS scheduling to preempt a task during
bq_xmit_all(), enabling another task on the same CPU to enter
bq_enqueue() and operate on the same per-CPU bq concurrently.
This leads to several races:
1. Double-free / use-after-free on bq->q[]: bq_xmit_all() snapshots
cnt = bq->count, then iterates bq->q[0..cnt-1] to transmit frames.
If preempted after the snapshot, a second task can call bq_enqueue()
-> bq_xmit_all() on the same bq, transmitting (and freeing) the
same frames. When the first task resumes, it operates on stale
pointers in bq->q[], causing use-after-free.
2. bq->count and bq->q[] corruption: concurrent bq_enqueue() modifying
bq->count and bq->q[] while bq_xmit_all() is reading them.
3. dev_rx/xdp_prog teardown race: __dev_flush() clears bq->dev_rx and
bq->xdp_prog after bq_xmit_all(). If preempted between
bq_xmit_all() return and bq->dev_rx = NULL, a preempting
bq_enqueue() sees dev_rx still set (non-NULL), skips adding bq to
the flush_list, and enqueues a frame. When __dev_flush() resumes,
it clears dev_rx and removes bq from the flush_list, orphaning the
newly enqueued frame.
4. __list_del_clearprev() on flush_node: similar to the cpumap race,
both tasks can call __list_del_clearprev() on the same flush_node,
the second dereferences the prev pointer already set to NULL.
The race between task A (__dev_flush -> bq_xmit_all) and task B
(bq_enqueue -> bq_xmit_all) on the same CPU:
Task A (xdp_do_flush) Task B (ndo_xdp_xmit redirect)
---------------------- --------------------------------
__dev_flush(flush_list)
bq_xmit_all(bq)
cnt = bq->count /* e.g. 16 */
/* start iterating bq->q[] */
<-- CFS preempts Task A -->
bq_enqueue(dev, xdpf)
bq->count == DEV_MAP_BULK_SIZE
bq_xmit_all(bq, 0)
cnt = bq->count /* same 16! */
ndo_xdp_xmit(bq->q[])
/* frames freed by driver */
bq->count = 0
<-- Task A resumes -->
ndo_xdp_xmit(bq->q[])
/* use-after-free: frames already freed! */
Fix this by adding a local_lock_t to xdp_dev_bulk_queue and acquiring
it in bq_enqueue() and __dev_flush(). These paths already run under
local_bh_disable(), so use local_lock_nested_bh() which on non-RT is
a pure annotation with no overhead, and on PREEMPT_RT provides a
per-CPU sleeping lock that serializes access to the bq.
Fixes: 3253cb4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20260225121459.183121-3-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>1 parent 869c63d commit 1872e75
1 file changed
Lines changed: 21 additions & 4 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
45 | 45 | | |
46 | 46 | | |
47 | 47 | | |
| 48 | + | |
48 | 49 | | |
49 | 50 | | |
50 | 51 | | |
| |||
60 | 61 | | |
61 | 62 | | |
62 | 63 | | |
| 64 | + | |
63 | 65 | | |
64 | 66 | | |
65 | 67 | | |
| |||
381 | 383 | | |
382 | 384 | | |
383 | 385 | | |
| 386 | + | |
| 387 | + | |
384 | 388 | | |
385 | 389 | | |
386 | 390 | | |
| |||
425 | 429 | | |
426 | 430 | | |
427 | 431 | | |
| 432 | + | |
428 | 433 | | |
429 | 434 | | |
430 | 435 | | |
431 | 436 | | |
| 437 | + | |
432 | 438 | | |
433 | 439 | | |
434 | 440 | | |
| |||
451 | 457 | | |
452 | 458 | | |
453 | 459 | | |
454 | | - | |
| 460 | + | |
| 461 | + | |
455 | 462 | | |
456 | 463 | | |
457 | 464 | | |
458 | 465 | | |
459 | | - | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
460 | 470 | | |
461 | 471 | | |
462 | 472 | | |
| |||
477 | 487 | | |
478 | 488 | | |
479 | 489 | | |
| 490 | + | |
| 491 | + | |
480 | 492 | | |
481 | 493 | | |
482 | 494 | | |
| |||
1127 | 1139 | | |
1128 | 1140 | | |
1129 | 1141 | | |
1130 | | - | |
1131 | | - | |
| 1142 | + | |
| 1143 | + | |
| 1144 | + | |
| 1145 | + | |
| 1146 | + | |
| 1147 | + | |
| 1148 | + | |
1132 | 1149 | | |
1133 | 1150 | | |
1134 | 1151 | | |
| |||
0 commit comments