Commit f17b474

Merge tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Pull bpf updates from Alexei Starovoitov:

 - Support associating BPF program with struct_ops (Amery Hung)

 - Switch BPF local storage to rqspinlock and remove recursion detection
   counters which were causing false positives (Amery Hung)

 - Fix live registers marking for indirect jumps (Anton Protopopov)

 - Introduce execution context detection BPF helpers (Changwoo Min)

 - Improve verifier precision for 32bit sign extension pattern
   (Cupertino Miranda)

 - Optimize BTF type lookup by sorting vmlinux BTF and doing binary
   search (Donglin Peng)

 - Allow states pruning for misc/invalid slots in iterator loops
   (Eduard Zingerman)

 - In preparation for ASAN support in BPF arenas teach libbpf to move
   global BPF variables to the end of the region and enable arena kfuncs
   while holding locks (Emil Tsalapatis)

 - Introduce support for implicit arguments in kfuncs and migrate a
   number of them to new API. This is a prerequisite for cgroup
   sub-schedulers in sched-ext (Ihor Solodrai)

 - Fix incorrect copied_seq calculation in sockmap (Jiayuan Chen)

 - Fix ORC stack unwind from kprobe_multi (Jiri Olsa)

 - Speed up fentry attach by using single ftrace direct ops in BPF
   trampolines (Jiri Olsa)

 - Require frozen map for calculating map hash (KP Singh)

 - Fix lock entry creation in TAS fallback in rqspinlock
   (Kumar Kartikeya Dwivedi)

 - Allow user space to select cpu in lookup/update operations on per-cpu
   array and hash maps (Leon Hwang)

 - Make kfuncs return trusted pointers by default (Matt Bobrowski)

 - Introduce "fsession" support where single BPF program is executed
   upon entry and exit from traced kernel function (Menglong Dong)

 - Allow bpf_timer and bpf_wq use in all programs types (Mykyta Yatsenko,
   Andrii Nakryiko, Kumar Kartikeya Dwivedi, Alexei Starovoitov)

 - Make KF_TRUSTED_ARGS the default for all kfuncs and clean up their
   definition across the tree (Puranjay Mohan)

 - Allow BPF arena calls from non-sleepable context (Puranjay Mohan)

 - Improve register id comparison logic in the verifier and extend
   linked registers with negative offsets (Puranjay Mohan)

 - In preparation for BPF-OOM introduce kfuncs to access memcg events
   (Roman Gushchin)

 - Use CFI compatible destructor kfunc type (Sami Tolvanen)

 - Add bitwise tracking for BPF_END in the verifier (Tianci Cao)

 - Add range tracking for BPF_DIV and BPF_MOD in the verifier
   (Yazhou Tang)

 - Make BPF selftests work with 64k page size (Yonghong Song)

* tag 'bpf-next-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (268 commits)
  selftests/bpf: Fix outdated test on storage->smap
  selftests/bpf: Choose another percpu variable in bpf for btf_dump test
  selftests/bpf: Remove test_task_storage_map_stress_lookup
  selftests/bpf: Update task_local_storage/task_storage_nodeadlock test
  selftests/bpf: Update task_local_storage/recursion test
  selftests/bpf: Update sk_storage_omem_uncharge test
  bpf: Switch to bpf_selem_unlink_nofail in bpf_local_storage_{map_free, destroy}
  bpf: Support lockless unlink when freeing map or local storage
  bpf: Prepare for bpf_selem_unlink_nofail()
  bpf: Remove unused percpu counter from bpf_local_storage_map_free
  bpf: Remove cgroup local storage percpu counter
  bpf: Remove task local storage percpu counter
  bpf: Change local_storage->lock and b->lock to rqspinlock
  bpf: Convert bpf_selem_unlink to failable
  bpf: Convert bpf_selem_link_map to failable
  bpf: Convert bpf_selem_unlink_map to failable
  bpf: Select bpf_local_storage_map_bucket based on bpf_local_storage
  selftests/xsk: fix number of Tx frags in invalid packet
  selftests/xsk: properly handle batch ending in the middle of a packet
  bpf: Prevent reentrance into call_rcu_tasks_trace()
  ...
2 parents a7423e6 + db975de commit f17b474

248 files changed

Lines changed: 13312 additions & 2953 deletions

Documentation/bpf/bpf_prog_run.rst

Lines changed: 2 additions & 1 deletion
@@ -34,11 +34,12 @@ following types:
3434
- ``BPF_PROG_TYPE_LWT_IN``
3535
- ``BPF_PROG_TYPE_LWT_OUT``
3636
- ``BPF_PROG_TYPE_LWT_XMIT``
37-
- ``BPF_PROG_TYPE_LWT_SEG6LOCAL``
3837
- ``BPF_PROG_TYPE_FLOW_DISSECTOR``
3938
- ``BPF_PROG_TYPE_STRUCT_OPS``
4039
- ``BPF_PROG_TYPE_RAW_TRACEPOINT``
4140
- ``BPF_PROG_TYPE_SYSCALL``
41+
- ``BPF_PROG_TYPE_TRACING``
42+
- ``BPF_PROG_TYPE_NETFILTER``
4243

4344
When using the ``BPF_PROG_RUN`` command, userspace supplies an input context
4445
object and (for program types operating on network packets) a buffer containing

Documentation/bpf/kfuncs.rst

Lines changed: 141 additions & 119 deletions
@@ -50,15 +50,78 @@ A wrapper kfunc is often needed when we need to annotate parameters of the
5050
kfunc. Otherwise one may directly make the kfunc visible to the BPF program by
5151
registering it with the BPF subsystem. See :ref:`BPF_kfunc_nodef`.
5252

53-
2.2 Annotating kfunc parameters
53+
2.2 kfunc Parameters
54+
--------------------
55+
56+
All kfuncs now require trusted arguments by default. This means that all
57+
pointer arguments must be valid, and all pointers to BTF objects must be
58+
passed in their unmodified form (at a zero offset, and without having been
59+
obtained from walking another pointer, with exceptions described below).
60+
61+
There are two types of pointers to kernel objects which are considered "trusted":
62+
63+
1. Pointers which are passed as tracepoint or struct_ops callback arguments.
64+
2. Pointers which were returned from a KF_ACQUIRE kfunc.
65+
66+
Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
67+
kfuncs, and may have a non-zero offset.
68+
69+
The definition of "valid" pointers is subject to change at any time, and has
70+
absolutely no ABI stability guarantees.
71+
72+
As mentioned above, a nested pointer obtained from walking a trusted pointer is
73+
no longer trusted, with one exception. If a struct type has a field that is
74+
guaranteed to be valid (trusted or rcu, as in KF_RCU description below) as long
75+
as its parent pointer is valid, the following macros can be used to express
76+
that to the verifier:
77+
78+
* ``BTF_TYPE_SAFE_TRUSTED``
79+
* ``BTF_TYPE_SAFE_RCU``
80+
* ``BTF_TYPE_SAFE_RCU_OR_NULL``
81+
82+
For example,
83+
84+
.. code-block:: c
85+
86+
BTF_TYPE_SAFE_TRUSTED(struct socket) {
87+
struct sock *sk;
88+
};
89+
90+
or
91+
92+
.. code-block:: c
93+
94+
BTF_TYPE_SAFE_RCU(struct task_struct) {
95+
const cpumask_t *cpus_ptr;
96+
struct css_set __rcu *cgroups;
97+
struct task_struct __rcu *real_parent;
98+
struct task_struct *group_leader;
99+
};
100+
101+
In other words, you must:
102+
103+
1. Wrap the valid pointer type in a ``BTF_TYPE_SAFE_*`` macro.
104+
105+
2. Specify the type and name of the valid nested field. This field must match
106+
the field in the original type definition exactly.
107+
108+
A new type declared by a ``BTF_TYPE_SAFE_*`` macro also needs to be emitted so
109+
that it appears in BTF. For example, ``BTF_TYPE_SAFE_TRUSTED(struct socket)``
110+
is emitted in the ``type_is_trusted()`` function as follows:
111+
112+
.. code-block:: c
113+
114+
BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct socket));
115+
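The trusted-argument rules in section 2.2 can be read as a small decision procedure. The following is a toy userspace model of that reading, not kernel or verifier code: every name in it (``demo_origin``, ``demo_ptr``, ``demo_kfunc_arg_ok``) is hypothetical, and it assumes that a field listed through ``BTF_TYPE_SAFE_TRUSTED`` stays acceptable after a pointer walk. It also ignores the separate requirement that every pointer be valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this only models the documented rules. */
enum demo_origin {
	DEMO_CALLBACK_ARG,      /* tracepoint or struct_ops callback argument */
	DEMO_KF_ACQUIRE_RET,    /* returned from a KF_ACQUIRE kfunc */
	DEMO_WALKED,            /* loaded from a field of another pointer */
	DEMO_WALKED_SAFE_FIELD, /* field listed via BTF_TYPE_SAFE_TRUSTED() */
};

struct demo_ptr {
	bool is_btf_object;     /* pointer to a BTF object, or plain memory */
	enum demo_origin origin;
	int offset;             /* offset applied to the original pointer */
};

/* Returns true if the modelled rules would accept this kfunc argument. */
static bool demo_kfunc_arg_ok(const struct demo_ptr *p)
{
	assert(p != NULL);

	/* Non-BTF pointers may be passed with a non-zero offset. */
	if (!p->is_btf_object)
		return true;

	/* BTF object pointers must be passed unmodified, at offset zero. */
	if (p->offset != 0)
		return false;

	/* A plain pointer walk drops trust; the other origins keep it. */
	return p->origin != DEMO_WALKED;
}
```

Under this model, a callback argument at offset zero is accepted, the same pointer at offset 8 is rejected, and a pointer loaded from an unlisted field is rejected even at offset zero.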
116+
2.3 Annotating kfunc parameters
54117
-------------------------------
55118

56119
Similar to BPF helpers, there is sometimes a need for additional context required
57120
by the verifier to make the usage of kernel functions safer and more useful.
58121
Hence, we can annotate a parameter by suffixing the name of the argument of the
59122
kfunc with a __tag, where tag may be one of the supported annotations.
60123

61-
2.2.1 __sz Annotation
124+
2.3.1 __sz Annotation
62125
---------------------
63126

64127
This annotation is used to indicate a memory and size pair in the argument list.
@@ -74,7 +137,7 @@ argument as its size. By default, without __sz annotation, the size of the type
74137
of the pointer is used. Without __sz annotation, a kfunc cannot accept a void
75138
pointer.
76139

77-
2.2.2 __k Annotation
140+
2.3.2 __k Annotation
78141
--------------------
79142

80143
This annotation is only understood for scalar arguments, where it indicates that
@@ -98,7 +161,7 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
98161
size parameter, and the value of the constant matters for program safety, __k
99162
suffix should be used.
100163

101-
2.2.3 __uninit Annotation
164+
2.3.3 __uninit Annotation
102165
-------------------------
103166

104167
This annotation is used to indicate that the argument will be treated as
@@ -115,27 +178,36 @@ Here, the dynptr will be treated as an uninitialized dynptr. Without this
115178
annotation, the verifier will reject the program if the dynptr passed in is
116179
not initialized.
117180

118-
2.2.4 __opt Annotation
119-
-------------------------
181+
2.3.4 __nullable Annotation
182+
---------------------------
120183

121-
This annotation is used to indicate that the buffer associated with an __sz or __szk
122-
argument may be null. If the function is passed a nullptr in place of the buffer,
123-
the verifier will not check that length is appropriate for the buffer. The kfunc is
124-
responsible for checking if this buffer is null before using it.
184+
This annotation is used to indicate that the pointer argument may be NULL.
185+
The verifier will allow passing NULL for such arguments.
125186

126187
An example is given below::
127188

128-
__bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
189+
__bpf_kfunc void bpf_task_release(struct task_struct *task__nullable)
129190
{
130191
...
131192
}
132193

133-
Here, the buffer may be null. If buffer is not null, it at least of size buffer_szk.
134-
Either way, the returned buffer is either NULL, or of size buffer_szk. Without this
135-
annotation, the verifier will reject the program if a null pointer is passed in with
136-
a nonzero size.
194+
Here, the task pointer may be NULL. The kfunc is responsible for checking if
195+
the pointer is NULL before dereferencing it.
196+
197+
The __nullable annotation can be combined with other annotations. For example,
198+
when used with __sz or __szk annotations for memory and size pairs, the
199+
verifier will skip size validation when a NULL pointer is passed, but will
200+
still process the size argument to extract constant size information when
201+
needed::
202+
203+
__bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__nullable,
204+
u32 buffer__szk)
205+
206+
Here, the buffer may be NULL. If the buffer is not NULL, it must be at least
207+
buffer__szk bytes in size. The kfunc is responsible for checking if the buffer
208+
is NULL before using it.
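The kfunc-side half of this contract, checking for NULL before touching the buffer, might look like the userspace sketch below. It is an illustration only: ``demo_copy_label`` is a hypothetical function, the ``__nullable`` and ``__szk`` suffixes merely mirror the naming convention (nothing here is checked by a verifier), and the "report the required size on NULL" behaviour is an assumption made for the example, not something ``bpf_dynptr_slice`` does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for a kfunc taking a nullable
 * memory/size pair. Returns the number of bytes needed (including the
 * terminating NUL), or -1 if a non-NULL buffer is too small.
 */
static int demo_copy_label(const char *src, char *buffer__nullable,
			   unsigned int buffer__szk)
{
	size_t need;

	assert(src != NULL);    /* not annotated, so NULL is never expected */
	need = strlen(src) + 1;

	/* NULL buffer: the size is not validated and nothing is written. */
	if (!buffer__nullable)
		return (int)need;

	/* Non-NULL buffer: it must hold at least buffer__szk usable bytes. */
	if (buffer__szk < need)
		return -1;

	memcpy(buffer__nullable, src, need);
	return (int)need;
}
```

The point of the sketch is the ordering: the NULL test comes before any use of the buffer or its size, because a NULL argument arrives without any size guarantee.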
137209

138-
2.2.5 __str Annotation
210+
2.3.5 __str Annotation
139211
----------------------------
140212
This annotation is used to indicate that the argument is a constant string.
141213

@@ -160,34 +232,17 @@ Or::
160232
...
161233
}
162234

163-
2.2.6 __prog Annotation
164-
---------------------------
165-
This annotation is used to indicate that the argument needs to be fixed up to
166-
the bpf_prog_aux of the caller BPF program. Any value passed into this argument
167-
is ignored, and rewritten by the verifier.
168-
169-
An example is given below::
170-
171-
__bpf_kfunc int bpf_wq_set_callback_impl(struct bpf_wq *wq,
172-
int (callback_fn)(void *map, int *key, void *value),
173-
unsigned int flags,
174-
void *aux__prog)
175-
{
176-
struct bpf_prog_aux *aux = aux__prog;
177-
...
178-
}
179-
180235
.. _BPF_kfunc_nodef:
181236

182-
2.3 Using an existing kernel function
237+
2.4 Using an existing kernel function
183238
-------------------------------------
184239

185240
When an existing function in the kernel is fit for consumption by BPF programs,
186241
it can be directly registered with the BPF subsystem. However, care must still
187242
be taken to review the context in which it will be invoked by the BPF program
188243
and whether it is safe to do so.
189244

190-
2.4 Annotating kfuncs
245+
2.5 Annotating kfuncs
191246
---------------------
192247

193248
In addition to kfuncs' arguments, verifier may need more information about the
@@ -216,7 +271,7 @@ protected. An example is given below::
216271
...
217272
}
218273

219-
2.4.1 KF_ACQUIRE flag
274+
2.5.1 KF_ACQUIRE flag
220275
---------------------
221276

222277
The KF_ACQUIRE flag is used to indicate that the kfunc returns a pointer to a
@@ -226,7 +281,7 @@ referenced kptr (by invoking bpf_kptr_xchg). If not, the verifier fails the
226281
loading of the BPF program until no lingering references remain in all possible
227282
explored states of the program.
228283

229-
2.4.2 KF_RET_NULL flag
284+
2.5.2 KF_RET_NULL flag
230285
----------------------
231286

232287
The KF_RET_NULL flag is used to indicate that the pointer returned by the kfunc
@@ -235,87 +290,21 @@ returned from the kfunc before making use of it (dereferencing or passing to
235290
another helper). This flag is often used in pairing with KF_ACQUIRE flag, but
236291
both are orthogonal to each other.
237292

238-
2.4.3 KF_RELEASE flag
293+
2.5.3 KF_RELEASE flag
239294
---------------------
240295

241296
The KF_RELEASE flag is used to indicate that the kfunc releases the pointer
242297
passed in to it. There can be only one referenced pointer that can be passed
243298
in. All copies of the pointer being released are invalidated as a result of
244-
invoking kfunc with this flag. KF_RELEASE kfuncs automatically receive the
245-
protection afforded by the KF_TRUSTED_ARGS flag described below.
246-
247-
2.4.4 KF_TRUSTED_ARGS flag
248-
--------------------------
299+
invoking kfunc with this flag.
249300

250-
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
251-
indicates that the all pointer arguments are valid, and that all pointers to
252-
BTF objects have been passed in their unmodified form (that is, at a zero
253-
offset, and without having been obtained from walking another pointer, with one
254-
exception described below).
255-
256-
There are two types of pointers to kernel objects which are considered "valid":
257-
258-
1. Pointers which are passed as tracepoint or struct_ops callback arguments.
259-
2. Pointers which were returned from a KF_ACQUIRE kfunc.
260-
261-
Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
262-
KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
263-
264-
The definition of "valid" pointers is subject to change at any time, and has
265-
absolutely no ABI stability guarantees.
266-
267-
As mentioned above, a nested pointer obtained from walking a trusted pointer is
268-
no longer trusted, with one exception. If a struct type has a field that is
269-
guaranteed to be valid (trusted or rcu, as in KF_RCU description below) as long
270-
as its parent pointer is valid, the following macros can be used to express
271-
that to the verifier:
272-
273-
* ``BTF_TYPE_SAFE_TRUSTED``
274-
* ``BTF_TYPE_SAFE_RCU``
275-
* ``BTF_TYPE_SAFE_RCU_OR_NULL``
276-
277-
For example,
278-
279-
.. code-block:: c
280-
281-
BTF_TYPE_SAFE_TRUSTED(struct socket) {
282-
struct sock *sk;
283-
};
284-
285-
or
286-
287-
.. code-block:: c
288-
289-
BTF_TYPE_SAFE_RCU(struct task_struct) {
290-
const cpumask_t *cpus_ptr;
291-
struct css_set __rcu *cgroups;
292-
struct task_struct __rcu *real_parent;
293-
struct task_struct *group_leader;
294-
};
295-
296-
In other words, you must:
297-
298-
1. Wrap the valid pointer type in a ``BTF_TYPE_SAFE_*`` macro.
299-
300-
2. Specify the type and name of the valid nested field. This field must match
301-
the field in the original type definition exactly.
302-
303-
A new type declared by a ``BTF_TYPE_SAFE_*`` macro also needs to be emitted so
304-
that it appears in BTF. For example, ``BTF_TYPE_SAFE_TRUSTED(struct socket)``
305-
is emitted in the ``type_is_trusted()`` function as follows:
306-
307-
.. code-block:: c
308-
309-
BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED(struct socket));
310-
311-
312-
2.4.5 KF_SLEEPABLE flag
301+
2.5.4 KF_SLEEPABLE flag
313302
-----------------------
314303

315304
The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
316305
be called by sleepable BPF programs (BPF_F_SLEEPABLE).
317306

318-
2.4.6 KF_DESTRUCTIVE flag
307+
2.5.5 KF_DESTRUCTIVE flag
319308
--------------------------
320309

321310
The KF_DESTRUCTIVE flag is used to indicate functions calling which is
@@ -324,18 +313,19 @@ rebooting or panicking. Due to this additional restrictions apply to these
324313
calls. At the moment they only require CAP_SYS_BOOT capability, but more can be
325314
added later.
326315

327-
2.4.7 KF_RCU flag
316+
2.5.6 KF_RCU flag
328317
-----------------
329318

330-
The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
331-
KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees
332-
that the objects are valid and there is no use-after-free. The pointers are not
333-
NULL, but the object's refcount could have reached zero. The kfuncs need to
334-
consider doing refcnt != 0 check, especially when returning a KF_ACQUIRE
335-
pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should very likely
336-
also be KF_RET_NULL.
319+
The KF_RCU flag allows kfuncs to opt out of the default trusted args
320+
requirement and accept RCU pointers with weaker guarantees. The kfuncs marked
321+
with KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier
322+
guarantees that the objects are valid and there is no use-after-free. The
323+
pointers are not NULL, but the object's refcount could have reached zero. The
324+
kfuncs need to consider doing refcnt != 0 check, especially when returning a
325+
KF_ACQUIRE pointer. Note as well that a KF_ACQUIRE kfunc that is KF_RCU should
326+
very likely also be KF_RET_NULL.
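The ``refcnt != 0`` check described above is the usual "increment unless zero" pattern. Below is a minimal userspace sketch of it using C11 atomics, assuming a hypothetical ``struct demo_obj``; in-kernel code would more likely use an existing primitive such as ``refcount_inc_not_zero()`` rather than open-coding the loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical refcounted object; not a kernel type. */
struct demo_obj {
	atomic_int refcnt;
};

/*
 * Take a reference only if the object is still live. Returns the object
 * on success, or NULL if the refcount had already dropped to zero. This
 * mirrors why a kfunc that is both KF_ACQUIRE and KF_RCU usually also
 * wants KF_RET_NULL.
 */
static struct demo_obj *demo_obj_acquire(struct demo_obj *obj)
{
	int old;

	assert(obj != NULL);    /* the pointer itself is never NULL */
	old = atomic_load(&obj->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&obj->refcnt, &old, old + 1))
			return obj;
	}
	return NULL;
}
```

A plain unconditional increment would be wrong here, since it could resurrect an object whose last reference was already dropped and whose teardown is in progress.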
337327

338-
2.4.8 KF_RCU_PROTECTED flag
328+
2.5.7 KF_RCU_PROTECTED flag
339329
---------------------------
340330

341331
The KF_RCU_PROTECTED flag is used to indicate that the kfunc must be invoked in
@@ -354,7 +344,7 @@ RCU protection but do not take RCU protected arguments.
354344

355345
.. _KF_deprecated_flag:
356346

357-
2.4.9 KF_DEPRECATED flag
347+
2.5.8 KF_DEPRECATED flag
358348
------------------------
359349

360350
The KF_DEPRECATED flag is used for kfuncs which are scheduled to be
@@ -374,7 +364,39 @@ encouraged to make their use-cases known as early as possible, and participate
374364
in upstream discussions regarding whether to keep, change, deprecate, or remove
375365
those kfuncs if and when such discussions occur.
376366

377-
2.5 Registering the kfuncs
367+
2.5.9 KF_IMPLICIT_ARGS flag
368+
------------------------------------
369+
370+
The KF_IMPLICIT_ARGS flag is used to indicate that the BPF signature
371+
of the kfunc is different from its kernel signature, and the values
372+
for implicit arguments are provided at load time by the verifier.
373+
374+
Only arguments of specific types are implicit.
375+
Currently only ``struct bpf_prog_aux *`` type is supported.
376+
377+
A kfunc with KF_IMPLICIT_ARGS flag therefore has two types in BTF: one
378+
function matching the kernel declaration (with _impl suffix in the
379+
name by convention), and another matching the intended BPF API.
380+
381+
The verifier only allows calls to the non-_impl version of a kfunc, which
382+
uses a signature without the implicit arguments.
383+
384+
Example declaration:
385+
386+
.. code-block:: c
387+
388+
__bpf_kfunc int bpf_task_work_schedule_signal(struct task_struct *task, struct bpf_task_work *tw,
389+
void *map__map, bpf_task_work_callback_t callback,
390+
struct bpf_prog_aux *aux) { ... }
391+
392+
Example usage in BPF program:
393+
394+
.. code-block:: c
395+
396+
/* note that the last argument is omitted */
397+
bpf_task_work_schedule_signal(task, &work->tw, &arrmap, task_work_callback);
398+
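The split between the kernel signature and the BPF-visible signature can be pictured with a toy userspace model. This is only a sketch under stated assumptions: all names (``demo_prog_aux``, ``demo_kfunc_kernel``, ``demo_call``, ``demo_fixup_at_load``, ``demo_run``) are hypothetical, and the "fixup" function is a stand-in for whatever the verifier actually does at load time, which this model does not attempt to reproduce.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_prog_aux. */
struct demo_prog_aux {
	int prog_id;
};

/* Kernel-side signature: the trailing aux parameter is implicit. */
static int demo_kfunc_kernel(int value, struct demo_prog_aux *aux)
{
	assert(aux != NULL);    /* always filled in before the call runs */
	return value + aux->prog_id;
}

/* A recorded call site: the program itself supplies only 'value'. */
struct demo_call {
	int value;
	struct demo_prog_aux *aux;  /* NULL until the load-time fixup */
};

/* Stand-in for the load-time step that provides the implicit value. */
static void demo_fixup_at_load(struct demo_call *call,
			       struct demo_prog_aux *caller_aux)
{
	call->aux = caller_aux;
}

/* Executes the call with the implicit argument already in place. */
static int demo_run(const struct demo_call *call)
{
	return demo_kfunc_kernel(call->value, call->aux);
}
```

In this model the caller never names ``aux``; it is attached once, before execution, which is the property the flag description relies on.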
399+
2.6 Registering the kfuncs
378400
--------------------------
379401

380402
Once the kfunc is prepared for use, the final step to making it visible is
@@ -397,7 +419,7 @@ type. An example is shown below::
397419
}
398420
late_initcall(init_subsystem);
399421

400-
2.6 Specifying no-cast aliases with ___init
422+
2.7 Specifying no-cast aliases with ___init
401423
--------------------------------------------
402424

403425
The verifier will always enforce that the BTF type of a pointer passed to a
