Commit 6e951a9

Author: Alexei Starovoitov

Merge branch 'bpf-improve-linked-register-tracking'
Puranjay Mohan says:

====================
bpf: Improve linked register tracking

V3: https://lore.kernel.org/all/20260203222643.994713-1-puranjay@kernel.org/

Changes in v3->v4:
- Add a call to reg_bounds_sync() in sync_linked_regs() to sync bounds
  after the alu op.
- Add a __sink(path[0]); in the C selftest so the compiler doesn't say
  "error: variable 'path' set but not used".

V2: https://lore.kernel.org/all/20260113152529.3217648-1-puranjay@kernel.org/

Changes in v2->v3:
- Added another selftest showing a real usage pattern.
- Rebased on bpf-next/master.

v1: https://lore.kernel.org/bpf/20260107203941.1063754-1-puranjay@kernel.org/

Changes in v1->v2:
- Add support for alu32 operations in linked register tracking (Alexei).
- Squash the selftest fix with the first patch (Eduard).
- Add more selftests to detect edge cases.

This series extends the BPF verifier's linked register tracking to
handle negative offsets, BPF_SUB operations, and alu32 operations,
enabling better bounds propagation for common arithmetic patterns.
The verifier previously only tracked positive constant deltas between
linked registers using BPF_ADD.
This meant patterns using negative offsets or subtraction couldn't
benefit from bounds propagation:

	void alu32_negative_offset(void)
	{
		volatile char path[5];
		volatile int offset = bpf_get_prandom_u32();
		int off = offset;

		if (off >= 5 && off < 10)
			path[off - 5] = '.';
	}

This gets compiled to:

	0000000000000478 <alu32_negative_offset>:
	143: call 0x7
	144: *(u32 *)(r10 - 0xc) = w0
	145: w1 = *(u32 *)(r10 - 0xc)
	146: w2 = w1                    // w2 and w1 share the same id
	147: w2 += -0x5                 // verifier knows w1 = w2 + 5
	148: if w2 > 0x4 goto +0x5 <L0> // in fall-through: verifier knows
	                                //   w2 ∈ [0, 4] => w1 ∈ [5, 9]
	149: r2 = r10
	150: r2 += -0x5                 // r2 = fp - 5
	151: r2 += r1                   // r2 = fp - 5 + r1 (∈ [5, 9])
	                                //   => r2 ∈ [fp, fp + 4]
	152: w1 = 0x2e
	153: *(u8 *)(r2 - 0x5) = w1     // r2 ∈ [fp, fp + 4]
	                                //   => r2 - 5 ∈ [fp - 5, fp - 1]
	<L0>:
	154: exit

After the changes, the verifier can link 32-bit scalars and also
supports negative offsets for linking:

	146: w2 = w1
	147: w2 += -0x5

This allows the verifier to correctly propagate bounds. Without the
changes in this patchset, the verifier would reject this program with:

	invalid unbounded variable-offset write to stack R2

This program has been added as a selftest in the second patch.
Veristat comparison on programs from sched_ext, selftests, and some
Meta internal programs:

Scx progs:

File               Program           Verdict (A)  Verdict (B)  Verdict (DIFF)  Insns (A)  Insns (B)  Insns (DIFF)
-----------------  ----------------  -----------  -----------  --------------  ---------  ---------  -------------
scx_layered.bpf.o  layered_runnable  success      success      MATCH                5674       6077  +403 (+7.10%)

FB progs:

File          Program           Verdict (A)  Verdict (B)  Verdict (DIFF)  Insns (A)  Insns (B)  Insns (DIFF)
------------  ----------------  -----------  -----------  --------------  ---------  ---------  ------------------
bpf232.bpf.o  layered_dump      success      success      MATCH                1151       1218  +67 (+5.82%)
bpf257.bpf.o  layered_runnable  success      success      MATCH                5743       6143  +400 (+6.97%)
bpf252.bpf.o  layered_runnable  success      success      MATCH                5677       6075  +398 (+7.01%)
bpf227.bpf.o  layered_dump      success      success      MATCH                 915        982  +67 (+7.32%)
bpf239.bpf.o  layered_runnable  success      success      MATCH                5459       5861  +402 (+7.36%)
bpf246.bpf.o  layered_runnable  success      success      MATCH                5562       6008  +446 (+8.02%)
bpf229.bpf.o  layered_runnable  success      success      MATCH                2559       3011  +452 (+17.66%)
bpf231.bpf.o  layered_runnable  success      success      MATCH                2559       3011  +452 (+17.66%)
bpf234.bpf.o  layered_runnable  success      success      MATCH                2549       3001  +452 (+17.73%)
bpf019.bpf.o  do_sendmsg        success      success      MATCH              124823     153523  +28700 (+22.99%)
bpf019.bpf.o  do_parse          success      success      MATCH              124809     153509  +28700 (+23.00%)
bpf227.bpf.o  layered_runnable  success      success      MATCH                1915       2356  +441 (+23.03%)
bpf228.bpf.o  layered_runnable  success      success      MATCH                1700       2152  +452 (+26.59%)
bpf232.bpf.o  layered_runnable  success      success      MATCH                1499       1951  +452 (+30.15%)
bpf312.bpf.o  mount_exit        success      success      MATCH               19253      62883  +43630 (+226.61%)
bpf312.bpf.o  umount_exit       success      success      MATCH               19253      62883  +43630 (+226.61%)
bpf311.bpf.o  mount_exit       success      success      MATCH               19226      62863  +43637 (+226.97%)
bpf311.bpf.o  umount_exit       success      success      MATCH               19226      62863  +43637 (+226.97%)

The above four programs have specific patterns that make
the verifier explore a lot more states:

	for (; depth < MAX_DIR_DEPTH; depth++) {
		const unsigned char *name = BPF_CORE_READ(dentry, d_name.name);

		if (offset >= MAX_PATH_LEN - MAX_DIR_LEN)
			return depth;

		int len = bpf_probe_read_kernel_str(&path[offset], MAX_DIR_LEN, name);

		offset += len;
		if (len == MAX_DIR_LEN) {
			if (offset - 2 < MAX_PATH_LEN)		// <---- (a)
				path[offset - 2] = '.';
			if (offset - 3 < MAX_PATH_LEN)		// <---- (b)
				path[offset - 3] = '.';
			if (offset - 4 < MAX_PATH_LEN)		// <---- (c)
				path[offset - 4] = '.';
		}
	}

When at some depth == N the false branches of conditions (a), (b) and
(c) are scheduled for verification, the constraints for offset at
depth == N+1 are:

1. offset >= MAX_PATH_LEN + 2
2. offset >= MAX_PATH_LEN + 3
3. offset >= MAX_PATH_LEN + 4 (visited before the others)

And after offset += len they become:

1. offset >= MAX_PATH_LEN - 4093
2. offset >= MAX_PATH_LEN - 4092
3. offset >= MAX_PATH_LEN - 4091 (visited before the others)

Because of the DFS state-exploration logic, the states above are
visited in the order 3, 2, 1; 3 is not a subset of 2 and 1 is not a
subset of 2, so the pruning logic does not kick in. Previously this
was not a problem, because the range for offset was not propagated
through the statements (a), (b), (c). As the root cause of this
regression is understood, it is not a blocker for this change.
Selftest progs:

File                                Program      Verdict (A)  Verdict (B)  Verdict (DIFF)  Insns (A)  Insns (B)  Insns (DIFF)
----------------------------------  -----------  -----------  -----------  --------------  ---------  ---------  --------------
linked_list_peek.bpf.o              list_peek    success      success      MATCH                 152         88  -64 (-42.11%)
verifier_iterating_callbacks.bpf.o  cond_break2  success      success      MATCH                 110         88  -22 (-20.00%)

These are the added selftests that failed earlier but are passing now:

File                           Program                   Verdict (A)  Verdict (B)  Verdict (DIFF)  Insns (A)  Insns (B)  Insns (DIFF)
-----------------------------  ------------------------  -----------  -----------  --------------  ---------  ---------  -------------
verifier_linked_scalars.bpf.o  alu32_negative_offset     failure      success      MISMATCH               11         13  +2 (+18.18%)
verifier_linked_scalars.bpf.o  scalars_alu32_big_offset  failure      success      MISMATCH                7         10  +3 (+42.86%)
verifier_linked_scalars.bpf.o  scalars_neg_alu32_add     failure      success      MISMATCH                7         10  +3 (+42.86%)
verifier_linked_scalars.bpf.o  scalars_neg_alu32_sub     failure      success      MISMATCH                7         10  +3 (+42.86%)
verifier_linked_scalars.bpf.o  scalars_neg               failure      success      MISMATCH                7         10  +3 (+42.86%)
verifier_linked_scalars.bpf.o  scalars_neg_sub           failure      success      MISMATCH                7         10  +3 (+42.86%)
verifier_linked_scalars.bpf.o  scalars_sub_neg_imm       failure      success      MISMATCH                7         10  +3 (+42.86%)
iters.bpf.o                    iter_obfuscate_counter    success      success      MATCH                  83        119  +36 (+43.37%)
bpf_cubic.bpf.o                bpf_cubic_acked           success      success      MATCH                 243        430  +187 (+76.95%)
====================

Link: https://patch.msgid.link/20260204151741.2678118-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 parents b282131 + 47fcf4d commit 6e951a9

4 files changed

Lines changed: 346 additions & 15 deletions


include/linux/bpf_verifier.h

Lines changed: 5 additions & 1 deletion
@@ -147,8 +147,12 @@ struct bpf_reg_state {
 	 * registers. Example:
 	 * r1 = r2; both will have r1->id == r2->id == N
 	 * r1 += 10; r1->id == N | BPF_ADD_CONST and r1->off == 10
+	 * r3 = r2; both will have r3->id == r2->id == N
+	 * w3 += 10; r3->id == N | BPF_ADD_CONST32 and r3->off == 10
 	 */
-#define BPF_ADD_CONST (1U << 31)
+#define BPF_ADD_CONST64 (1U << 31)
+#define BPF_ADD_CONST32 (1U << 30)
+#define BPF_ADD_CONST (BPF_ADD_CONST64 | BPF_ADD_CONST32)
 	u32 id;
 	/* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned
 	 * from a pointer-cast helper, bpf_sk_fullsock() and

kernel/bpf/verifier.c

Lines changed: 39 additions & 11 deletions
@@ -16209,6 +16209,13 @@ static int adjust_reg_min_max_vals(struct bpf_verifier_env *env,
 		verbose(env, "verifier internal error: no src_reg\n");
 		return -EFAULT;
 	}
+	/*
+	 * For alu32 linked register tracking, we need to check dst_reg's
+	 * umax_value before the ALU operation. After adjust_scalar_min_max_vals(),
+	 * alu32 ops will have zero-extended the result, making umax_value <= U32_MAX.
+	 */
+	u64 dst_umax = dst_reg->umax_value;
+
 	err = adjust_scalar_min_max_vals(env, insn, dst_reg, *src_reg);
 	if (err)
 		return err;
@@ -16218,26 +16225,44 @@
 	 * r1 += 0x1
 	 * if r2 < 1000 goto ...
 	 * use r1 in memory access
-	 * So for 64-bit alu remember constant delta between r2 and r1 and
-	 * update r1 after 'if' condition.
+	 * So remember constant delta between r2 and r1 and update r1 after
+	 * 'if' condition.
 	 */
 	if (env->bpf_capable &&
-	    BPF_OP(insn->code) == BPF_ADD && !alu32 &&
-	    dst_reg->id && is_reg_const(src_reg, false)) {
-		u64 val = reg_const_value(src_reg, false);
+	    (BPF_OP(insn->code) == BPF_ADD || BPF_OP(insn->code) == BPF_SUB) &&
+	    dst_reg->id && is_reg_const(src_reg, alu32)) {
+		u64 val = reg_const_value(src_reg, alu32);
+		s32 off;
+
+		if (!alu32 && ((s64)val < S32_MIN || (s64)val > S32_MAX))
+			goto clear_id;
+
+		if (alu32 && (dst_umax > U32_MAX))
+			goto clear_id;
 
-		if ((dst_reg->id & BPF_ADD_CONST) ||
-		    /* prevent overflow in sync_linked_regs() later */
-		    val > (u32)S32_MAX) {
+		off = (s32)val;
+
+		if (BPF_OP(insn->code) == BPF_SUB) {
+			/* Negating S32_MIN would overflow */
+			if (off == S32_MIN)
+				goto clear_id;
+			off = -off;
+		}
+
+		if (dst_reg->id & BPF_ADD_CONST) {
 			/*
 			 * If the register already went through rX += val
 			 * we cannot accumulate another val into rx->off.
 			 */
+clear_id:
 			dst_reg->off = 0;
 			dst_reg->id = 0;
 		} else {
-			dst_reg->id |= BPF_ADD_CONST;
-			dst_reg->off = val;
+			if (alu32)
+				dst_reg->id |= BPF_ADD_CONST32;
+			else
+				dst_reg->id |= BPF_ADD_CONST64;
+			dst_reg->off = off;
 		}
 	} else {
 		/*
@@ -17334,7 +17359,7 @@ static void sync_linked_regs(struct bpf_verifier_env *env, struct bpf_verifier_s
 		u32 saved_id = reg->id;
 
 		fake_reg.type = SCALAR_VALUE;
-		__mark_reg_known(&fake_reg, (s32)reg->off - (s32)known_reg->off);
+		__mark_reg_known(&fake_reg, (s64)reg->off - (s64)known_reg->off);
 
 		/* reg = known_reg; reg += delta */
 		copy_register_state(reg, known_reg);
@@ -17349,6 +17374,9 @@
 		scalar32_min_max_add(reg, &fake_reg);
 		scalar_min_max_add(reg, &fake_reg);
 		reg->var_off = tnum_add(reg->var_off, fake_reg.var_off);
+		if (known_reg->id & BPF_ADD_CONST32)
+			zext_32_to_64(reg);
+		reg_bounds_sync(reg);
 	}
 	if (e->is_reg)
 		mark_reg_scratched(env, e->regno);

tools/testing/selftests/bpf/progs/verifier_bounds.c

Lines changed: 1 addition & 1 deletion
@@ -1477,7 +1477,7 @@ __naked void sub64_full_overflow(void)
 SEC("socket")
 __description("64-bit subtraction, partial overflow, result in unbounded reg")
 __success __log_level(2)
-__msg("3: (1f) r3 -= r2 {{.*}} R3=scalar()")
+__msg("3: (1f) r3 -= r2 {{.*}} R3=scalar(id=1-1)")
 __retval(0)
 __naked void sub64_partial_overflow(void)
 {
