
Commit ec0ded2
Committed by: Alexei Starovoitov

Merge branch 'bpf-refcount-followups-3-bpf_mem_free_rcu-refcounted-nodes'
Dave Marchevsky says:

====================
BPF Refcount followups 3: bpf_mem_free_rcu refcounted nodes

This series is the third of three (or more) followups to address issues in the
bpf_refcount shared ownership implementation discovered by Kumar. It addresses
the use-after-free scenario described in [0]. The first followup series ([1])
also attempted to address the same use-after-free, but only got rid of the
splat without addressing the underlying issue. After this series the
underlying issue is fixed and bpf_refcount_acquire can be re-enabled.

The main fix here is the migration of bpf_obj_drop to use bpf_mem_free_rcu. To
understand why this fixes the issue, consider the example interleaving
provided by Kumar in [0]:

CPU 0                                   CPU 1
n = bpf_obj_new
lock(lock1)
bpf_rbtree_add(rbtree1, n)
m = bpf_rbtree_acquire(n)
unlock(lock1)

kptr_xchg(map, m) // move to map
// at this point, refcount = 2
                                        m = kptr_xchg(map, NULL)
                                        lock(lock2)
lock(lock1)                             bpf_rbtree_add(rbtree2, m)
p = bpf_rbtree_first(rbtree1)           if (!RB_EMPTY_NODE) bpf_obj_drop_impl(m) // A
bpf_rbtree_remove(rbtree1, p)
unlock(lock1)
bpf_obj_drop(p) // B
                                        bpf_refcount_acquire(m) // use-after-free
...

Before this series, bpf_obj_drop returns memory to the allocator using
bpf_mem_free. At this point (B in the example) there might be some non-owning
references to that memory which the verifier believes are valid, but where the
underlying memory was reused for some other allocation. Commit 7793fc3 ("bpf:
Make bpf_refcount_acquire fallible for non-owning refs") attempted to fix this
by doing refcount_inc_not_zero instead of refcount_inc in refcount_acquire,
under the assumption that preventing an erroneous increment-on-0 would be
sufficient. This isn't true, though: refcount_inc_not_zero must *check*
whether the refcount is zero, and the memory it's checking could have been
reused, so the check may read - and increment - random reused bytes.

If we wait to reuse this memory until all non-owning refs that could point to
it are gone, this scenario cannot happen. Migrating bpf_obj_drop to use
bpf_mem_free_rcu for refcounted nodes accomplishes this. For such nodes, the
validity of their underlying memory is now tied to an RCU critical section.
This matches MEM_RCU trustedness expectations, so the series takes the
opportunity to mark this trustedness state more explicitly. The functional
effects of the trustedness changes are rather small, largely because local
kptrs have separate verifier handling - with implicit trustedness assumptions -
from arbitrary kptrs. Regardless, let's take the opportunity to move towards a
world where trustedness is handled more explicitly.

Changelog:

v1 -> v2: https://lore.kernel.org/bpf/20230801203630.3581291-1-davemarchevsky@fb.com/

Patch 1 ("bpf: Ensure kptr_struct_meta is non-NULL for collection insert and refcount_acquire")
  * Spent some time experimenting with a better approach as per convo w/
    Yonghong on v1's patch. It started getting too complex, so it is left
    unchanged for now. Yonghong was fine with this approach being shipped.

Patch 2 ("bpf: Consider non-owning refs trusted")
  * Add Yonghong ack

Patch 3 ("bpf: Use bpf_mem_free_rcu when bpf_obj_dropping refcounted nodes")
  * Add Yonghong ack

Patch 4 ("bpf: Reenable bpf_refcount_acquire")
  * Add Yonghong ack

Patch 5 ("bpf: Consider non-owning refs to refcounted nodes RCU protected")
  * Undo a nonfunctional whitespace change that shouldn't have been included
    (Yonghong)
  * Better logging message when complaining about rcu_read_{lock,unlock} in
    rbtree cb (Alexei)
  * Don't invalidate_non_owning_refs when processing bpf_rcu_read_unlock
    (Yonghong, Alexei)

Patch 6 ("[RFC] bpf: Allow bpf_spin_{lock,unlock} in sleepable prog's RCU CS")
  * preempt_{disable,enable} in __bpf_spin_{lock,unlock} (Alexei)
  * Due to this we can consider a spin_lock CS an RCU-sched read-side CS (per
    RCU/Design/Requirements/Requirements.rst). Modify in_rcu_cs accordingly.
  * No need to check for !in_rcu_cs before allowing bpf_spin_{lock,unlock}
    (Alexei)
  * RFC tag removed and patch renamed to "bpf: Allow bpf_spin_{lock,unlock}
    in sleepable progs"

Patch 7 ("selftests/bpf: Add tests for rbtree API interaction in sleepable progs")
  * Remove "no explicit bpf_rcu_read_lock" failure test, add similar success
    test (Alexei)

Summary of patch contents, with sub-bullets being leading questions and
comments I think are worth reviewer attention:

* Patches 1 and 2 are mostly documentation - and enforcement, in patch 1's
  case - of existing semantics / expectations.

* Patch 3 changes bpf_obj_drop behavior for refcounted nodes such that their
  underlying memory is not reused until an RCU grace period elapses.
  * Perhaps it makes sense to move to mem_free_rcu for _all_ non-owning refs
    in the future, not just refcounted ones. This might allow custom
    non-owning ref lifetime + invalidation logic to be entirely subsumed by
    MEM_RCU handling. IMO this needs a bit more thought and should be tackled
    outside of a fix series, so it's not attempted here.

* Patch 4 re-enables bpf_refcount_acquire, as the changes in patch 3 fix the
  remaining use-after-free.
  * One might expect this patch to be last in the series, or last before the
    selftest changes. Patches 5 and 6 don't change verification or runtime
    behavior for existing BPF progs, though.

* Patch 5 brings the verifier's understanding of refcounted node trustedness
  in line with patch 4's changes.

* Patch 6 allows some bpf_spin_{lock,unlock} calls in sleepable progs. It was
  originally marked RFC because bpf_spin_{lock,unlock} haven't been usable in
  sleepable progs since before the introduction of bpf linked list and
  rbtree, so this feels more like a new feature that may not belong in a
  fixes series. A sketch of the pattern the series re-enables follows this
  summary.

* Patch 7 adds tests.

[0]: https://lore.kernel.org/bpf/atfviesiidev4hu53hzravmtlau3wdodm2vqs7rd7tnwft34e3@xktodqeqevir/
[1]: https://lore.kernel.org/bpf/20230602022647.1571784-1-davemarchevsky@fb.com/
====================

Link: https://lore.kernel.org/r/20230821193311.3290257-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 parents 9e3b47a + 312aa5b commit ec0ded2

7 files changed: 165 additions & 14 deletions

include/linux/bpf.h

Lines changed: 2 additions & 1 deletion
@@ -653,7 +653,8 @@ enum bpf_type_flag {
 	MEM_RCU			= BIT(13 + BPF_BASE_TYPE_BITS),
 
 	/* Used to tag PTR_TO_BTF_ID | MEM_ALLOC references which are non-owning.
-	 * Currently only valid for linked-list and rbtree nodes.
+	 * Currently only valid for linked-list and rbtree nodes. If the nodes
+	 * have a bpf_refcount_field, they must be tagged MEM_RCU as well.
 	 */
 	NON_OWN_REF		= BIT(14 + BPF_BASE_TYPE_BITS),

include/linux/bpf_verifier.h

Lines changed: 1 addition & 1 deletion
@@ -745,7 +745,7 @@ static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
 	}
 }
 
-#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED)
+#define BPF_REG_TRUSTED_MODIFIERS (MEM_ALLOC | PTR_TRUSTED | NON_OWN_REF)
 
 static inline bool bpf_type_has_unsafe_modifiers(u32 type)
 {
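[Editor's note] For context, this macro feeds the bpf_type_has_unsafe_modifiers()
helper visible at the end of the hunk: any modifier flag outside
BPF_REG_TRUSTED_MODIFIERS marks a register untrusted, so adding NON_OWN_REF to
the mask is what makes non-owning refs count as trusted (patch 2). The
existing helper body, quoted here for reference, is:

    /* From include/linux/bpf_verifier.h; type_flag() masks off the base
     * type, leaving only modifier flags such as MEM_ALLOC or NON_OWN_REF.
     */
    static inline bool bpf_type_has_unsafe_modifiers(u32 type)
    {
            return type_flag(type) & ~BPF_REG_TRUSTED_MODIFIERS;
    }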

kernel/bpf/helpers.c

Lines changed: 7 additions & 1 deletion
@@ -286,6 +286,7 @@ static inline void __bpf_spin_lock(struct bpf_spin_lock *lock)
 	compiletime_assert(u.val == 0, "__ARCH_SPIN_LOCK_UNLOCKED not 0");
 	BUILD_BUG_ON(sizeof(*l) != sizeof(__u32));
 	BUILD_BUG_ON(sizeof(*lock) != sizeof(__u32));
+	preempt_disable();
 	arch_spin_lock(l);
 }
 
@@ -294,6 +295,7 @@ static inline void __bpf_spin_unlock(struct bpf_spin_lock *lock)
 	arch_spinlock_t *l = (void *)lock;
 
 	arch_spin_unlock(l);
+	preempt_enable();
 }
 
 #else
@@ -1913,7 +1915,11 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec)
 
 	if (rec)
 		bpf_obj_free_fields(rec, p);
-	bpf_mem_free(&bpf_global_ma, p);
+
+	if (rec && rec->refcount_off >= 0)
+		bpf_mem_free_rcu(&bpf_global_ma, p);
+	else
+		bpf_mem_free(&bpf_global_ma, p);
 }
 
 __bpf_kfunc void bpf_obj_drop_impl(void *p__alloc, void *meta__ign)
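[Editor's note] bpf_mem_free_rcu defers reuse of the freed object until an RCU
grace period has elapsed. The effect is analogous to the classic call_rcu()
freeing pattern, sketched below with plain kernel APIs; this is an
illustration of the semantics, not the BPF allocator's actual implementation:

    #include <linux/rcupdate.h>
    #include <linux/refcount.h>
    #include <linux/slab.h>

    struct obj {
            struct rcu_head rcu;
            refcount_t refcount;
            /* payload ... */
    };

    static void obj_free_rcu(struct rcu_head *head)
    {
            kfree(container_of(head, struct obj, rcu));
    }

    static void obj_put(struct obj *o)
    {
            if (refcount_dec_and_test(&o->refcount))
                    /* Memory cannot be reused until every pre-existing RCU
                     * reader - anyone still holding a stale reference inside
                     * an RCU read-side critical section - has finished.
                     */
                    call_rcu(&o->rcu, obj_free_rcu);
    }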

kernel/bpf/verifier.c

Lines changed: 30 additions & 11 deletions
@@ -5064,7 +5064,9 @@ static int map_kptr_match_type(struct bpf_verifier_env *env,
  */
 static bool in_rcu_cs(struct bpf_verifier_env *env)
 {
-	return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable;
+	return env->cur_state->active_rcu_lock ||
+	       env->cur_state->active_lock.ptr ||
+	       !env->prog->aux->sleepable;
 }
 
 /* Once GCC supports btf_type_tag the following mechanism will be replaced with tag check */
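[Editor's note] Why does holding a BPF spin lock (active_lock.ptr) now count as
being in an RCU critical section? Because __bpf_spin_lock() disables
preemption (see the helpers.c hunk above), and a preempt-disabled region is an
RCU-sched read-side critical section per
Documentation/RCU/Design/Requirements/Requirements.rst. A hypothetical
kernel-side sketch of that equivalence, not part of this series:

    #include <linux/preempt.h>
    #include <linux/rcupdate.h>

    static void spin_cs_is_rcu_sched_reader(void)
    {
            preempt_disable();      /* what __bpf_spin_lock() now does */
            /* Never fires: preemption disabled implies an RCU-sched reader. */
            WARN_ON_ONCE(!rcu_read_lock_sched_held());
            preempt_enable();       /* what __bpf_spin_unlock() now does */
    }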
@@ -8007,6 +8009,7 @@ int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	case PTR_TO_BTF_ID | PTR_TRUSTED:
 	case PTR_TO_BTF_ID | MEM_RCU:
 	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF:
+	case PTR_TO_BTF_ID | MEM_ALLOC | NON_OWN_REF | MEM_RCU:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * its fixed offset must be 0. In the other cases, fixed offset
 		 * can be non-zero. This was already checked above. So pass
@@ -10473,6 +10476,7 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env,
 static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state *reg)
 {
 	struct bpf_verifier_state *state = env->cur_state;
+	struct btf_record *rec = reg_btf_record(reg);
 
 	if (!state->active_lock.ptr) {
 		verbose(env, "verifier internal error: ref_set_non_owning w/o active lock\n");
@@ -10485,6 +10489,9 @@ static int ref_set_non_owning(struct bpf_verifier_env *env, struct bpf_reg_state
 	}
 
 	reg->type |= NON_OWN_REF;
+	if (rec->refcount_off >= 0)
+		reg->type |= MEM_RCU;
+
 	return 0;
 }
 
@@ -11217,10 +11224,7 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
 				verbose(env, "arg#%d doesn't point to a type with bpf_refcount field\n", i);
 				return -EINVAL;
 			}
-			if (rec->refcount_off >= 0) {
-				verbose(env, "bpf_refcount_acquire calls are disabled for now\n");
-				return -EINVAL;
-			}
+
 			meta->arg_btf = reg->btf;
 			meta->arg_btf_id = reg->btf_id;
 			break;
@@ -11325,6 +11329,11 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		struct bpf_func_state *state;
 		struct bpf_reg_state *reg;
 
+		if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
+			verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
+			return -EACCES;
+		}
+
 		if (rcu_lock) {
 			verbose(env, "nested rcu read lock (kernel function %s)\n", func_name);
 			return -EINVAL;
@@ -16687,7 +16696,8 @@ static int do_check(struct bpf_verifier_env *env)
 					return -EINVAL;
 				}
 
-				if (env->cur_state->active_rcu_lock) {
+				if (env->cur_state->active_rcu_lock &&
+				    !in_rbtree_lock_required_cb(env)) {
 					verbose(env, "bpf_rcu_read_unlock is missing\n");
 					return -EINVAL;
 				}
@@ -16967,11 +16977,6 @@ static int check_map_prog_compatibility(struct bpf_verifier_env *env,
 			verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
 			return -EINVAL;
 		}
-
-		if (prog->aux->sleepable) {
-			verbose(env, "sleepable progs cannot use bpf_spin_lock yet\n");
-			return -EINVAL;
-		}
 	}
 
 	if (btf_record_has_field(map->record, BPF_TIMER)) {
@@ -18276,13 +18281,21 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_2, (long)kptr_struct_meta) };
 
+		if (desc->func_id == special_kfunc_list[KF_bpf_refcount_acquire_impl] &&
+		    !kptr_struct_meta) {
+			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		insn_buf[0] = addr[0];
 		insn_buf[1] = addr[1];
 		insn_buf[2] = *insn;
 		*cnt = 3;
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_list_push_back_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_list_push_front_impl] ||
 		   desc->func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) {
+		struct btf_struct_meta *kptr_struct_meta = env->insn_aux_data[insn_idx].kptr_struct_meta;
 		int struct_meta_reg = BPF_REG_3;
 		int node_offset_reg = BPF_REG_4;
 
@@ -18292,6 +18305,12 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
 			node_offset_reg = BPF_REG_5;
 		}
 
+		if (!kptr_struct_meta) {
+			verbose(env, "verifier internal error: kptr_struct_meta expected at insn_idx %d\n",
+				insn_idx);
+			return -EFAULT;
+		}
+
 		__fixup_collection_insert_kfunc(&env->insn_aux_data[insn_idx], struct_meta_reg,
 						node_offset_reg, insn, insn_buf, cnt);
 	} else if (desc->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx] ||

tools/testing/selftests/bpf/prog_tests/refcounted_kptr.c

Lines changed: 26 additions & 0 deletions
@@ -9,12 +9,38 @@
 
 void test_refcounted_kptr(void)
 {
+	RUN_TESTS(refcounted_kptr);
 }
 
 void test_refcounted_kptr_fail(void)
 {
+	RUN_TESTS(refcounted_kptr_fail);
 }
 
 void test_refcounted_kptr_wrong_owner(void)
 {
+	LIBBPF_OPTS(bpf_test_run_opts, opts,
+		    .data_in = &pkt_v4,
+		    .data_size_in = sizeof(pkt_v4),
+		    .repeat = 1,
+	);
+	struct refcounted_kptr *skel;
+	int ret;
+
+	skel = refcounted_kptr__open_and_load();
+	if (!ASSERT_OK_PTR(skel, "refcounted_kptr__open_and_load"))
+		return;
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_wrong_owner_remove_fail_a1), &opts);
+	ASSERT_OK(ret, "rbtree_wrong_owner_remove_fail_a1");
+	ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_a1 retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_wrong_owner_remove_fail_b), &opts);
+	ASSERT_OK(ret, "rbtree_wrong_owner_remove_fail_b");
+	ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_b retval");
+
+	ret = bpf_prog_test_run_opts(bpf_program__fd(skel->progs.rbtree_wrong_owner_remove_fail_a2), &opts);
+	ASSERT_OK(ret, "rbtree_wrong_owner_remove_fail_a2");
+	ASSERT_OK(opts.retval, "rbtree_wrong_owner_remove_fail_a2 retval");
+	refcounted_kptr__destroy(skel);
 }

tools/testing/selftests/bpf/progs/refcounted_kptr.c

Lines changed: 71 additions & 0 deletions
@@ -8,6 +8,9 @@
 #include "bpf_misc.h"
 #include "bpf_experimental.h"
 
+extern void bpf_rcu_read_lock(void) __ksym;
+extern void bpf_rcu_read_unlock(void) __ksym;
+
 struct node_data {
 	long key;
 	long list_data;
@@ -497,4 +500,72 @@ long rbtree_wrong_owner_remove_fail_a2(void *ctx)
 	return 0;
 }
 
+SEC("?fentry.s/bpf_testmod_test_read")
+__success
+int BPF_PROG(rbtree_sleepable_rcu,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	struct bpf_rb_node *rb;
+	struct node_data *n, *m = NULL;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 0;
+
+	bpf_rcu_read_lock();
+	bpf_spin_lock(&lock);
+	bpf_rbtree_add(&root, &n->r, less);
+	rb = bpf_rbtree_first(&root);
+	if (!rb)
+		goto err_out;
+
+	rb = bpf_rbtree_remove(&root, rb);
+	if (!rb)
+		goto err_out;
+
+	m = container_of(rb, struct node_data, r);
+
+err_out:
+	bpf_spin_unlock(&lock);
+	bpf_rcu_read_unlock();
+	if (m)
+		bpf_obj_drop(m);
+	return 0;
+}
+
+SEC("?fentry.s/bpf_testmod_test_read")
+__success
+int BPF_PROG(rbtree_sleepable_rcu_no_explicit_rcu_lock,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	struct bpf_rb_node *rb;
+	struct node_data *n, *m = NULL;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 0;
+
+	/* No explicit bpf_rcu_read_lock */
+	bpf_spin_lock(&lock);
+	bpf_rbtree_add(&root, &n->r, less);
+	rb = bpf_rbtree_first(&root);
+	if (!rb)
+		goto err_out;
+
+	rb = bpf_rbtree_remove(&root, rb);
+	if (!rb)
+		goto err_out;
+
+	m = container_of(rb, struct node_data, r);
+
+err_out:
+	bpf_spin_unlock(&lock);
+	/* No explicit bpf_rcu_read_unlock */
+	if (m)
+		bpf_obj_drop(m);
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";

tools/testing/selftests/bpf/progs/refcounted_kptr_fail.c

Lines changed: 28 additions & 0 deletions
@@ -13,6 +13,9 @@ struct node_acquire {
 	struct bpf_refcount refcount;
 };
 
+extern void bpf_rcu_read_lock(void) __ksym;
+extern void bpf_rcu_read_unlock(void) __ksym;
+
 #define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))
 private(A) struct bpf_spin_lock glock;
 private(A) struct bpf_rb_root groot __contains(node_acquire, node);
@@ -71,4 +74,29 @@ long rbtree_refcounted_node_ref_escapes_owning_input(void *ctx)
 	return 0;
 }
 
+SEC("?fentry.s/bpf_testmod_test_read")
+__failure __msg("function calls are not allowed while holding a lock")
+int BPF_PROG(rbtree_fail_sleepable_lock_across_rcu,
+	     struct file *file, struct kobject *kobj,
+	     struct bin_attribute *bin_attr, char *buf, loff_t off, size_t len)
+{
+	struct node_acquire *n;
+
+	n = bpf_obj_new(typeof(*n));
+	if (!n)
+		return 0;
+
+	/* spin_{lock,unlock} are in different RCU CS */
+	bpf_rcu_read_lock();
+	bpf_spin_lock(&glock);
+	bpf_rbtree_add(&groot, &n->node, less);
+	bpf_rcu_read_unlock();
+
+	bpf_rcu_read_lock();
+	bpf_spin_unlock(&glock);
+	bpf_rcu_read_unlock();
+
+	return 0;
+}
+
 char _license[] SEC("license") = "GPL";
