
Commit 5263e30

Author: Alexei Starovoitov (committed)
Merge branch 'close-race-in-freeing-special-fields-and-map-value'
Kumar Kartikeya Dwivedi says:

====================
Close race in freeing special fields and map value

There exists a race across various map types where the freeing of special
fields (tw, timer, wq, kptr, etc.) can be done eagerly when a logical delete
operation is done on a map value, such that a program which continues to have
access to such a map value can recreate the fields and cause them to leak.
This set contains fixes for that case. It is a continuation of Mykyta's
previous attempt in [0], but applies to all fields. A test is included which
reproduces the bug reliably in the absence of the fixes.

Local Storage Benchmarks
------------------------

Evaluation Setup:
Benchmarked on a dual-socket Intel Xeon Gold 6348 (Ice Lake) @ 2.60GHz
(56 cores / 112 threads), with the CPU governor set to performance. Bench was
pinned to a single NUMA node throughout the test.

Benchmark comes from [1] using the following command:
./bench -p 1 local-storage-create --storage-type <socket,task> --batch-size <16,32,64>

Before the test, 10 runs of all cases ([socket|task] x 3 batch sizes x
7 iterations per batch size) are done to warm up and prime the machine.
Then, 3 runs of all cases are done (with and without the patch, across
reboots). For each comparison, we have 21 samples, i.e. per batch size
(e.g. socket 16) of a given local storage, we have 3 runs x 7 iterations.
The statistics (mean, median, stddev) and t-test are done for each scenario
(local storage and batch size pair) individually (21 samples for either
case). All values are for local storage creations in thousand creations /
sec (k/s).

            Baseline (without patch)       With patch                      Delta
Case        Median    Mean    Std. Dev.    Median    Mean    Std. Dev.    Median        %
---------------------------------------------------------------------------------------------------
socket 16   432.026   431.941     1.047    431.347   431.953     1.635    -0.679   -0.16%
socket 32   432.641   432.818     1.535    432.488   432.302     1.508    -0.153   -0.04%
socket 64   431.504   431.996     1.337    429.145   430.326     2.469    -2.359   -0.55%
task 16      38.816    39.382     1.456     39.657    39.337     1.831    +0.841   +2.17%
task 32      38.815    39.644     2.690     38.721    39.122     1.636    -0.094   -0.24%
task 64      37.562    38.080     1.701     39.554    38.563     1.689    +1.992   +5.30%

The cases for socket are within the range of noise, and improvements in task
local storage are due to high variance (CV ~4%-6% across batch sizes). The
only statistically significant case worth mentioning is socket with batch
size 64, with p-value from t-test < 0.05, but the absolute difference is
small (~2k/s). TL;DR: there doesn't appear to be any significant regression
or improvement.

[0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com
[1]: https://lore.kernel.org/bpf/20260205222916.1788211-1-ameryhung@gmail.com

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20260227052031.3988575-1-memxor@gmail.com
 * Add syzbot Tested-by.
 * Add Amery's Reviewed-by.
 * Fix missing rcu_dereference_check() in __bpf_selem_free_rcu. (BPF CI Bot)
 * Remove migrate_disable() in bpf_selem_free_rcu. (Alexei)

v1 -> v2
v1: https://lore.kernel.org/bpf/20260225185121.2057388-1-memxor@gmail.com
 * Add Paul's Reviewed-by.
 * Fix use-after-free in accessing bpf_mem_alloc embedded in map. (syzbot CI)
 * Add benchmark numbers for local storage.
 * Add extra test case for per-cpu hashmap coverage with up to 16 refcount leaks.
 * Target bpf tree.
====================

Link: https://patch.msgid.link/20260227224806.646888-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 parents 6881af2 + 2939d7b, commit 5263e30

15 files changed: 603 additions & 57 deletions


include/linux/bpf.h

Lines changed: 2 additions & 2 deletions
@@ -124,7 +124,7 @@ struct bpf_map_ops {
 	u32 (*map_fd_sys_lookup_elem)(void *ptr);
 	void (*map_seq_show_elem)(struct bpf_map *map, void *key,
 				  struct seq_file *m);
-	int (*map_check_btf)(const struct bpf_map *map,
+	int (*map_check_btf)(struct bpf_map *map,
 			     const struct btf *btf,
 			     const struct btf_type *key_type,
 			     const struct btf_type *value_type);
@@ -656,7 +656,7 @@ static inline bool bpf_map_support_seq_show(const struct bpf_map *map)
 	       map->ops->map_seq_show_elem;
 }

-int map_check_no_btf(const struct bpf_map *map,
+int map_check_no_btf(struct bpf_map *map,
 		     const struct btf *btf,
 		     const struct btf_type *key_type,
 		     const struct btf_type *value_type);

include/linux/bpf_local_storage.h

Lines changed: 1 addition & 1 deletion
@@ -176,7 +176,7 @@ u32 bpf_local_storage_destroy(struct bpf_local_storage *local_storage);
 void bpf_local_storage_map_free(struct bpf_map *map,
 				struct bpf_local_storage_cache *cache);

-int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+int bpf_local_storage_map_check_btf(struct bpf_map *map,
 				    const struct btf *btf,
 				    const struct btf_type *key_type,
 				    const struct btf_type *value_type);

include/linux/bpf_mem_alloc.h

Lines changed: 6 additions & 0 deletions
@@ -14,6 +14,8 @@ struct bpf_mem_alloc {
 	struct obj_cgroup *objcg;
 	bool percpu;
 	struct work_struct work;
+	void (*dtor_ctx_free)(void *ctx);
+	void *dtor_ctx;
 };

 /* 'size != 0' is for bpf_mem_alloc which manages fixed-size objects.
@@ -32,6 +34,10 @@ int bpf_mem_alloc_percpu_init(struct bpf_mem_alloc *ma, struct obj_cgroup *objcg
 /* The percpu allocation with a specific unit size. */
 int bpf_mem_alloc_percpu_unit_init(struct bpf_mem_alloc *ma, int size);
 void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma);
+void bpf_mem_alloc_set_dtor(struct bpf_mem_alloc *ma,
+			    void (*dtor)(void *obj, void *ctx),
+			    void (*dtor_ctx_free)(void *ctx),
+			    void *ctx);

 /* Check the allocation size for kmalloc equivalent allocator */
 int bpf_mem_alloc_check_size(bool percpu, size_t size);

kernel/bpf/arena.c

Lines changed: 1 addition & 1 deletion
@@ -303,7 +303,7 @@ static long arena_map_update_elem(struct bpf_map *map, void *key,
 	return -EOPNOTSUPP;
 }

-static int arena_map_check_btf(const struct bpf_map *map, const struct btf *btf,
+static int arena_map_check_btf(struct bpf_map *map, const struct btf *btf,
 			       const struct btf_type *key_type, const struct btf_type *value_type)
 {
 	return 0;

kernel/bpf/arraymap.c

Lines changed: 1 addition & 1 deletion
@@ -548,7 +548,7 @@ static void percpu_array_map_seq_show_elem(struct bpf_map *map, void *key,
 	rcu_read_unlock();
 }

-static int array_map_check_btf(const struct bpf_map *map,
+static int array_map_check_btf(struct bpf_map *map,
 			       const struct btf *btf,
 			       const struct btf_type *key_type,
 			       const struct btf_type *value_type)

kernel/bpf/bloom_filter.c

Lines changed: 1 addition & 1 deletion
@@ -180,7 +180,7 @@ static long bloom_map_update_elem(struct bpf_map *map, void *key,
 	return -EINVAL;
 }

-static int bloom_map_check_btf(const struct bpf_map *map,
+static int bloom_map_check_btf(struct bpf_map *map,
 			       const struct btf *btf,
 			       const struct btf_type *key_type,
 			       const struct btf_type *value_type)

kernel/bpf/bpf_insn_array.c

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ static long insn_array_delete_elem(struct bpf_map *map, void *key)
 	return -EINVAL;
 }

-static int insn_array_check_btf(const struct bpf_map *map,
+static int insn_array_check_btf(struct bpf_map *map,
 				const struct btf *btf,
 				const struct btf_type *key_type,
 				const struct btf_type *value_type)

kernel/bpf/bpf_local_storage.c

Lines changed: 39 additions & 36 deletions
@@ -107,14 +107,12 @@ static void __bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
 {
 	struct bpf_local_storage *local_storage;

-	/* If RCU Tasks Trace grace period implies RCU grace period, do
-	 * kfree(), else do kfree_rcu().
+	/*
+	 * RCU Tasks Trace grace period implies RCU grace period, do
+	 * kfree() directly.
 	 */
 	local_storage = container_of(rcu, struct bpf_local_storage, rcu);
-	if (rcu_trace_implies_rcu_gp())
-		kfree(local_storage);
-	else
-		kfree_rcu(local_storage, rcu);
+	kfree(local_storage);
 }

 /* Handle use_kmalloc_nolock == false */
@@ -138,10 +136,11 @@ static void bpf_local_storage_free_rcu(struct rcu_head *rcu)

 static void bpf_local_storage_free_trace_rcu(struct rcu_head *rcu)
 {
-	if (rcu_trace_implies_rcu_gp())
-		bpf_local_storage_free_rcu(rcu);
-	else
-		call_rcu(rcu, bpf_local_storage_free_rcu);
+	/*
+	 * RCU Tasks Trace grace period implies RCU grace period, do
+	 * kfree() directly.
+	 */
+	bpf_local_storage_free_rcu(rcu);
 }

 static void bpf_local_storage_free(struct bpf_local_storage *local_storage,
@@ -164,24 +163,37 @@ static void bpf_local_storage_free(struct bpf_local_storage *local_storage,
 			   bpf_local_storage_free_trace_rcu);
 }

-/* rcu tasks trace callback for use_kmalloc_nolock == false */
-static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
+/* rcu callback for use_kmalloc_nolock == false */
+static void __bpf_selem_free_rcu(struct rcu_head *rcu)
 {
 	struct bpf_local_storage_elem *selem;
+	struct bpf_local_storage_map *smap;

 	selem = container_of(rcu, struct bpf_local_storage_elem, rcu);
-	if (rcu_trace_implies_rcu_gp())
-		kfree(selem);
-	else
-		kfree_rcu(selem, rcu);
+	/* bpf_selem_unlink_nofail may have already cleared smap and freed fields. */
+	smap = rcu_dereference_check(SDATA(selem)->smap, 1);
+
+	if (smap)
+		bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
+	kfree(selem);
+}
+
+/* rcu tasks trace callback for use_kmalloc_nolock == false */
+static void __bpf_selem_free_trace_rcu(struct rcu_head *rcu)
+{
+	/*
+	 * RCU Tasks Trace grace period implies RCU grace period, do
+	 * kfree() directly.
+	 */
+	__bpf_selem_free_rcu(rcu);
 }

 /* Handle use_kmalloc_nolock == false */
 static void __bpf_selem_free(struct bpf_local_storage_elem *selem,
 			     bool vanilla_rcu)
 {
 	if (vanilla_rcu)
-		kfree_rcu(selem, rcu);
+		call_rcu(&selem->rcu, __bpf_selem_free_rcu);
 	else
 		call_rcu_tasks_trace(&selem->rcu, __bpf_selem_free_trace_rcu);
 }
@@ -195,37 +207,29 @@ static void bpf_selem_free_rcu(struct rcu_head *rcu)
 	/* The bpf_local_storage_map_free will wait for rcu_barrier */
 	smap = rcu_dereference_check(SDATA(selem)->smap, 1);

-	if (smap) {
-		migrate_disable();
+	if (smap)
 		bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
-		migrate_enable();
-	}
 	kfree_nolock(selem);
 }

 static void bpf_selem_free_trace_rcu(struct rcu_head *rcu)
 {
-	if (rcu_trace_implies_rcu_gp())
-		bpf_selem_free_rcu(rcu);
-	else
-		call_rcu(rcu, bpf_selem_free_rcu);
+	/*
+	 * RCU Tasks Trace grace period implies RCU grace period, do
+	 * kfree() directly.
+	 */
+	bpf_selem_free_rcu(rcu);
 }

 void bpf_selem_free(struct bpf_local_storage_elem *selem,
 		    bool reuse_now)
 {
-	struct bpf_local_storage_map *smap;
-
-	smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
-
 	if (!selem->use_kmalloc_nolock) {
 		/*
 		 * No uptr will be unpin even when reuse_now == false since uptr
 		 * is only supported in task local storage, where
 		 * smap->use_kmalloc_nolock == true.
 		 */
-		if (smap)
-			bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
 		__bpf_selem_free(selem, reuse_now);
 		return;
 	}
@@ -797,7 +801,7 @@ int bpf_local_storage_map_alloc_check(union bpf_attr *attr)
 	return 0;
 }

-int bpf_local_storage_map_check_btf(const struct bpf_map *map,
+int bpf_local_storage_map_check_btf(struct bpf_map *map,
 				    const struct btf *btf,
 				    const struct btf_type *key_type,
 				    const struct btf_type *value_type)
@@ -958,10 +962,9 @@ void bpf_local_storage_map_free(struct bpf_map *map,
 	 */
 	synchronize_rcu();

-	if (smap->use_kmalloc_nolock) {
-		rcu_barrier_tasks_trace();
-		rcu_barrier();
-	}
+	/* smap remains in use regardless of kmalloc_nolock, so wait unconditionally. */
+	rcu_barrier_tasks_trace();
+	rcu_barrier();
 	kvfree(smap->buckets);
 	bpf_map_area_free(smap);
 }

kernel/bpf/hashtab.c

Lines changed: 86 additions & 0 deletions
@@ -125,6 +125,11 @@ struct htab_elem {
 	char key[] __aligned(8);
 };

+struct htab_btf_record {
+	struct btf_record *record;
+	u32 key_size;
+};
+
 static inline bool htab_is_prealloc(const struct bpf_htab *htab)
 {
 	return !(htab->map.map_flags & BPF_F_NO_PREALLOC);
@@ -457,6 +462,83 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 	return 0;
 }

+static void htab_mem_dtor(void *obj, void *ctx)
+{
+	struct htab_btf_record *hrec = ctx;
+	struct htab_elem *elem = obj;
+	void *map_value;
+
+	if (IS_ERR_OR_NULL(hrec->record))
+		return;
+
+	map_value = htab_elem_value(elem, hrec->key_size);
+	bpf_obj_free_fields(hrec->record, map_value);
+}
+
+static void htab_pcpu_mem_dtor(void *obj, void *ctx)
+{
+	void __percpu *pptr = *(void __percpu **)obj;
+	struct htab_btf_record *hrec = ctx;
+	int cpu;
+
+	if (IS_ERR_OR_NULL(hrec->record))
+		return;
+
+	for_each_possible_cpu(cpu)
+		bpf_obj_free_fields(hrec->record, per_cpu_ptr(pptr, cpu));
+}
+
+static void htab_dtor_ctx_free(void *ctx)
+{
+	struct htab_btf_record *hrec = ctx;
+
+	btf_record_free(hrec->record);
+	kfree(ctx);
+}
+
+static int htab_set_dtor(struct bpf_htab *htab, void (*dtor)(void *, void *))
+{
+	u32 key_size = htab->map.key_size;
+	struct bpf_mem_alloc *ma;
+	struct htab_btf_record *hrec;
+	int err;
+
+	/* No need for dtors. */
+	if (IS_ERR_OR_NULL(htab->map.record))
+		return 0;
+
+	hrec = kzalloc(sizeof(*hrec), GFP_KERNEL);
+	if (!hrec)
+		return -ENOMEM;
+	hrec->key_size = key_size;
+	hrec->record = btf_record_dup(htab->map.record);
+	if (IS_ERR(hrec->record)) {
+		err = PTR_ERR(hrec->record);
+		kfree(hrec);
+		return err;
+	}
+	ma = htab_is_percpu(htab) ? &htab->pcpu_ma : &htab->ma;
+	bpf_mem_alloc_set_dtor(ma, dtor, htab_dtor_ctx_free, hrec);
+	return 0;
+}
+
+static int htab_map_check_btf(struct bpf_map *map, const struct btf *btf,
+			      const struct btf_type *key_type, const struct btf_type *value_type)
+{
+	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
+
+	if (htab_is_prealloc(htab))
+		return 0;
+	/*
+	 * We must set the dtor using this callback, as map's BTF record is not
+	 * populated in htab_map_alloc(), so it will always appear as NULL.
+	 */
+	if (htab_is_percpu(htab))
+		return htab_set_dtor(htab, htab_pcpu_mem_dtor);
+	else
+		return htab_set_dtor(htab, htab_mem_dtor);
+}
+
 static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
 {
 	bool percpu = (attr->map_type == BPF_MAP_TYPE_PERCPU_HASH ||
@@ -2281,6 +2363,7 @@ const struct bpf_map_ops htab_map_ops = {
 	.map_seq_show_elem = htab_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
+	.map_check_btf = htab_map_check_btf,
 	.map_mem_usage = htab_map_mem_usage,
 	BATCH_OPS(htab),
 	.map_btf_id = &htab_map_btf_ids[0],
@@ -2303,6 +2386,7 @@ const struct bpf_map_ops htab_lru_map_ops = {
 	.map_seq_show_elem = htab_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
+	.map_check_btf = htab_map_check_btf,
 	.map_mem_usage = htab_map_mem_usage,
 	BATCH_OPS(htab_lru),
 	.map_btf_id = &htab_map_btf_ids[0],
@@ -2482,6 +2566,7 @@ const struct bpf_map_ops htab_percpu_map_ops = {
 	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
+	.map_check_btf = htab_map_check_btf,
 	.map_mem_usage = htab_map_mem_usage,
 	BATCH_OPS(htab_percpu),
 	.map_btf_id = &htab_map_btf_ids[0],
@@ -2502,6 +2587,7 @@ const struct bpf_map_ops htab_lru_percpu_map_ops = {
 	.map_seq_show_elem = htab_percpu_map_seq_show_elem,
 	.map_set_for_each_callback_args = map_set_for_each_callback_args,
 	.map_for_each_callback = bpf_for_each_hash_elem,
+	.map_check_btf = htab_map_check_btf,
 	.map_mem_usage = htab_map_mem_usage,
 	BATCH_OPS(htab_lru_percpu),
 	.map_btf_id = &htab_map_btf_ids[0],

kernel/bpf/local_storage.c

Lines changed: 1 addition & 1 deletion
@@ -364,7 +364,7 @@ static long cgroup_storage_delete_elem(struct bpf_map *map, void *key)
 	return -EINVAL;
 }

-static int cgroup_storage_check_btf(const struct bpf_map *map,
+static int cgroup_storage_check_btf(struct bpf_map *map,
 				    const struct btf *btf,
 				    const struct btf_type *key_type,
 				    const struct btf_type *value_type)
