
Commit 1bc1910

Merge tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:

 - New user_events interface. User space can register an event with the
   kernel describing the format of the event. Then it will receive a
   byte in a page mapping that it can check against. A privileged task
   can then enable that event like any other event, which will change
   the mapped byte to true, telling the user space application to start
   writing the event to the tracing buffer.

 - Add new "ftrace_boot_snapshot" kernel command line parameter. When
   set, the tracing buffer will be saved in the snapshot buffer at boot
   up when the kernel hands things over to user space. This will keep
   the traces that happened at boot up available even if user space
   boot up has tracing as well.

 - Have TRACE_EVENT_ENUM() also update trace event field type
   descriptions. Thus if a static array defines its size with an enum,
   the user space trace event parsers can still know how to parse that
   array.

 - Add new TRACE_CUSTOM_EVENT() macro. This acts the same as the
   TRACE_EVENT() macro, but will attach to an existing tracepoint. This
   will make one tracepoint be able to trace different content and not
   be stuck at only what the original TRACE_EVENT() macro exports.

 - Fixes to tracing error logging.

 - Better saving of cmdlines to PIDs when tracing (use the wakeup
   events for mapping).

* tag 'trace-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (30 commits)
  tracing: Have type enum modifications copy the strings
  user_events: Add trace event call as root for low permission cases
  tracing/user_events: Use alloc_pages instead of kzalloc() for register pages
  tracing: Add snapshot at end of kernel boot up
  tracing: Have TRACE_DEFINE_ENUM affect trace event types as well
  tracing: Fix strncpy warning in trace_events_synth.c
  user_events: Prevent dyn_event delete racing with ioctl add/delete
  tracing: Add TRACE_CUSTOM_EVENT() macro
  tracing: Move the defines to create TRACE_EVENTS into their own files
  tracing: Add sample code for custom trace events
  tracing: Allow custom events to be added to the tracefs directory
  tracing: Fix last_cmd_set() string management in histogram code
  user_events: Fix potential uninitialized pointer while parsing field
  tracing: Fix allocation of last_cmd in last_cmd_set()
  user_events: Add documentation file
  user_events: Add sample code for typical usage
  user_events: Add self-test for validator boundaries
  user_events: Add self-test for perf_event integration
  user_events: Add self-test for dynamic_events integration
  user_events: Add self-test for ftrace integration
  ...

2 parents 20f463f + 795301d commit 1bc1910

39 files changed

Lines changed: 4169 additions & 570 deletions

Documentation/admin-guide/kernel-parameters.txt

Lines changed: 8 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1465,6 +1465,14 @@
14651465
as early as possible in order to facilitate early
14661466
boot debugging.
14671467

1468+
ftrace_boot_snapshot
1469+
[FTRACE] On boot up, a snapshot will be taken of the
1470+
ftrace ring buffer that can be read at:
1471+
/sys/kernel/tracing/snapshot.
1472+
This is useful if you need tracing information from kernel
1473+
boot up that is likely to be overridden by user space
1474+
start up functionality.
1475+
14681476
ftrace_dump_on_oops[=orig_cpu]
14691477
[FTRACE] will dump the trace buffers on oops.
14701478
If no parameter is passed, ftrace will dump
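
A minimal sketch of how a user-space program might read the boot snapshot once the system is up. It assumes tracefs is mounted at the path quoted in the parameter description above and that the caller has permission to read it. `copy_trace_file` and `SNAPSHOT_PATH` are hypothetical names used only for illustration.

```c
#include <stdio.h>

/* Path quoted in the parameter description; tracefs may be mounted elsewhere. */
#define SNAPSHOT_PATH "/sys/kernel/tracing/snapshot"

/*
 * Copy the file at `path` to `out`.
 * Returns the number of bytes copied, or -1 if the file cannot be
 * opened or the output stream rejects a write.
 */
long copy_trace_file(const char *path, FILE *out)
{
	char buf[4096];
	size_t n;
	long total = 0;
	FILE *in = fopen(path, "r");

	if (!in)
		return -1;

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			fclose(in);
			return -1;
		}
		total += (long)n;
	}

	fclose(in);
	return total;
}
```

A caller would typically run `copy_trace_file(SNAPSHOT_PATH, stdout)` and treat a return value of -1 as "tracefs not mounted here or not readable".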

Documentation/trace/index.rst

Lines changed: 1 addition & 0 deletions
@@ -30,3 +30,4 @@ Linux Tracing Technologies
3030
stm
3131
sys-t
3232
coresight/index
33+
user_events

Documentation/trace/user_events.rst

Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
1+
=========================================
2+
user_events: User-based Event Tracing
3+
=========================================
4+
5+
:Author: Beau Belgrave
6+
7+
Overview
8+
--------
9+
User based trace events allow user processes to create events and trace data
10+
that can be viewed via existing tools, such as ftrace, perf and eBPF.
11+
To enable this feature, build your kernel with CONFIG_USER_EVENTS=y.
12+
13+
Programs can view status of the events via
14+
/sys/kernel/debug/tracing/user_events_status and can both register and write
15+
data out via /sys/kernel/debug/tracing/user_events_data.
16+
17+
Programs can also use /sys/kernel/debug/tracing/dynamic_events to register and
18+
delete user based events via the u: prefix. The format of the command to
19+
dynamic_events is the same as the ioctl with the u: prefix applied.
20+
21+
Typically programs will register a set of events that they wish to expose to
22+
tools that can read trace_events (such as ftrace and perf). The registration
23+
process gives back two ints to the program for each event. The first int is the
24+
status index. This index describes which byte in the
25+
/sys/kernel/debug/tracing/user_events_status file represents this event. The
26+
second int is the write index. This index describes the data when a write() or
27+
writev() is called on the /sys/kernel/debug/tracing/user_events_data file.
28+
29+
The structures referenced in this document are contained within the
30+
/include/uapi/linux/user_events.h file in the source tree.
31+
32+
**NOTE:** *Both user_events_status and user_events_data are under the tracefs
33+
filesystem and may be mounted at different paths than above.*
34+
35+
Registering
36+
-----------
37+
Registering within a user process is done via ioctl() out to the
38+
/sys/kernel/debug/tracing/user_events_data file. The command to issue is
39+
DIAG_IOCSREG.
40+
41+
This command takes a struct user_reg as an argument::
42+
43+
struct user_reg {
44+
u32 size;
45+
u64 name_args;
46+
u32 status_index;
47+
u32 write_index;
48+
};
49+
50+
The struct user_reg requires two inputs, the first is the size of the structure
51+
to ensure forward and backward compatibility. The second is the command string
52+
to issue for registering. Upon success two outputs are set, the status index
53+
and the write index.
54+
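
A minimal sketch of preparing the two inputs described above. The struct below mirrors the field order quoted in the document using fixed-width user-space types; this is an assumption for illustration, and a real program would include the uapi header instead. `user_reg_sketch` and `user_reg_prepare` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/*
 * Illustrative mirror of the struct user_reg layout quoted in the
 * document. Assumption: real code uses the definition from the uapi
 * header rather than this copy.
 */
struct user_reg_sketch {
	uint32_t size;         /* input: size of this structure */
	uint64_t name_args;    /* input: pointer to the command string */
	uint32_t status_index; /* output: set by the kernel on success */
	uint32_t write_index;  /* output: set by the kernel on success */
};

/* Fill in the two inputs and zero the two outputs. */
void user_reg_prepare(struct user_reg_sketch *reg, const char *command)
{
	memset(reg, 0, sizeof(*reg));
	reg->size = sizeof(*reg);
	reg->name_args = (uint64_t)(uintptr_t)command;
}
```

After this step the structure would be passed to ioctl() on the user_events_data file with the register command named in the document, and the two output fields read back on success.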
55+
User based events show up under tracefs like any other event under the
56+
subsystem named "user_events". This means tools that wish to attach to the
57+
events need to use /sys/kernel/debug/tracing/events/user_events/[name]/enable
58+
or perf record -e user_events:[name] when attaching/recording.
59+
60+
**NOTE:** *The write_index returned is only valid for the FD that was used*
61+
62+
Command Format
63+
^^^^^^^^^^^^^^
64+
The command string format is as follows::
65+
66+
name[:FLAG1[,FLAG2...]] [Field1[;Field2...]]
67+
68+
Supported Flags
69+
^^^^^^^^^^^^^^^
70+
**BPF_ITER** - EBPF programs attached to this event will get the raw iovec
71+
struct instead of any data copies for max performance.
72+
73+
Field Format
74+
^^^^^^^^^^^^
75+
::
76+
77+
type name [size]
78+
79+
Basic types are supported (__data_loc, u32, u64, int, char, char[20], etc).
80+
User programs are encouraged to use clearly sized types like u32.
81+
82+
**NOTE:** *Long is not supported since size can vary between user and kernel.*
83+
84+
The size is only valid for types that start with a struct prefix.
85+
This allows user programs to describe custom structs out to tools, if required.
86+
87+
For example, a struct in C that looks like this::
88+
89+
struct mytype {
90+
char data[20];
91+
};
92+
93+
Would be represented by the following field::
94+
95+
struct mytype myname 20
96+
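
A minimal sketch of assembling a command string in the format shown above (event name, a space, then fields separated by semicolons). Flags are left out to keep it short. `build_event_command` is a hypothetical helper, not part of the interface.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Join an event name and its field descriptions into
 * "name Field1;Field2...". Returns the length of the resulting
 * string, or -1 if `buf` is too small to hold it.
 */
int build_event_command(char *buf, size_t size, const char *name,
			const char *const *fields, size_t nfields)
{
	size_t used, i;
	int n = snprintf(buf, size, "%s", name);

	if (n < 0 || (size_t)n >= size)
		return -1;
	used = (size_t)n;

	for (i = 0; i < nfields; i++) {
		/* A space precedes the first field, ';' the rest. */
		n = snprintf(buf + used, size - used, "%s%s",
			     i == 0 ? " " : ";", fields[i]);
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}

	return (int)used;
}
```

With the fields `"u32 count"` and `"struct mytype myname 20"` and the name `test`, this produces a single string suitable for the name_args input of the register ioctl().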
97+
Deleting
98+
-----------
99+
Deleting an event from within a user process is done via ioctl() out to the
100+
/sys/kernel/debug/tracing/user_events_data file. The command to issue is
101+
DIAG_IOCSDEL.
102+
103+
This command only requires a single string specifying the event to delete by
104+
its name. Delete will only succeed if there are no references left to the
105+
event (in both user and kernel space). Because of this, user programs should
106+
request deletes on a different file than the one used for registration.
107+
108+
Status
109+
------
110+
When tools attach/record user based events the status of the event is updated
111+
in realtime. This allows user programs to only incur the cost of the write() or
112+
writev() calls when something is actively attached to the event.
113+
114+
User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
115+
check the status for each event that is registered. The byte to check in the
116+
file is given back after the register ioctl() via user_reg.status_index.
117+
Currently the size of user_events_status is a single page, however, custom
118+
kernel configurations can change this size to allow more user based events. In
119+
all cases the size of the file is a multiple of a page size.
120+
121+
For example, if the register ioctl() gives back a status_index of 3 you would
122+
check byte 3 of the returned mmap data to see if anything is attached to that
123+
event.
124+
125+
Administrators can easily check the status of all registered events by reading
126+
the user_events_status file directly via a terminal. The output is as follows::
127+
128+
Byte:Name [# Comments]
129+
...
130+
131+
Active: ActiveCount
132+
Busy: BusyCount
133+
Max: MaxCount
134+
135+
For example, on a system that has a single event the output looks like this::
136+
137+
1:test
138+
139+
Active: 1
140+
Busy: 0
141+
Max: 4096
142+
143+
If a user enables the user event via ftrace, the output would change to this::
144+
145+
1:test # Used by ftrace
146+
147+
Active: 1
148+
Busy: 1
149+
Max: 4096
150+
151+
**NOTE:** *A status index of 0 will never be returned. This allows user
152+
programs to have an index that can be used on error cases.*
153+
154+
Status Bits
155+
^^^^^^^^^^^
156+
The byte being checked will be non-zero if anything is attached. Programs can
157+
check specific bits in the byte to see what mechanism has been attached.
158+
159+
The following values are defined to aid in checking what has been attached:
160+
161+
**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0).
162+
163+
**EVENT_STATUS_PERF** - Bit set if perf/eBPF has been attached (Bit 1).
164+
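
A minimal sketch of the status check described above, assuming the status page has already been mapped with mmap() and is passed in as a byte pointer. The bit positions come from the text (Bit 0 for ftrace, Bit 1 for perf/eBPF); the `SKETCH_` names and helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as described in the document. */
#define SKETCH_STATUS_FTRACE (1u << 0)
#define SKETCH_STATUS_PERF   (1u << 1)

/* True if anything at all is attached to the event. */
bool event_is_enabled(const volatile uint8_t *status, uint32_t status_index)
{
	return status[status_index] != 0;
}

/* True if the mechanism(s) selected by `mask` are attached. */
bool event_used_by(const volatile uint8_t *status, uint32_t status_index,
		   unsigned int mask)
{
	return (status[status_index] & mask) != 0;
}
```

A program would normally call `event_is_enabled()` on its hot path and skip the write() or writev() entirely when it returns false.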
165+
Writing Data
166+
------------
167+
After registering an event the same fd that was used to register can be used
168+
to write an entry for that event. The write_index returned must be at the start
169+
of the data, then the remaining data is treated as the payload of the event.
170+
171+
For example, if write_index returned was 1 and I wanted to write out an int
172+
payload of the event, then the data would have to be 8 bytes (2 ints) in size,
173+
with the first 4 bytes being equal to 1 and the last 4 bytes being equal to the
174+
value I want as the payload.
175+
176+
In memory this would look like this::
177+
178+
int index;
179+
int payload;
180+
181+
User programs might have well known structs that they wish to use to emit out
182+
as payloads. In those cases writev() can be used, with the first vector being
183+
the index and the following vector(s) being the actual event payload.
184+
185+
For example, if I have a struct like this::
186+
187+
struct payload {
188+
int src;
189+
int dst;
190+
int flags;
191+
};
192+
193+
It's advised for user programs to do the following::
194+
195+
struct iovec io[2];
196+
struct payload e;
197+
198+
io[0].iov_base = &write_index;
199+
io[0].iov_len = sizeof(write_index);
200+
io[1].iov_base = &e;
201+
io[1].iov_len = sizeof(e);
202+
203+
writev(fd, (const struct iovec*)io, 2);
204+
205+
**NOTE:** *The write_index is not emitted out into the trace being recorded.*
206+
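
A minimal sketch that wraps the writev() pattern above in a function, so the index and payload are always emitted together in one call. `emit_event` is a hypothetical helper; `fd` is assumed to be the descriptor that was used to register the event.

```c
#include <sys/uio.h>
#include <unistd.h>

/* Same payload layout as the example in the document. */
struct payload {
	int src;
	int dst;
	int flags;
};

/*
 * Emit one event: the first vector carries the write index, the
 * second carries the payload. Returns the writev() result.
 */
ssize_t emit_event(int fd, int write_index, const struct payload *e)
{
	struct iovec io[2];

	io[0].iov_base = &write_index;
	io[0].iov_len = sizeof(write_index);
	io[1].iov_base = (void *)e; /* writev() only reads from it */
	io[1].iov_len = sizeof(*e);

	return writev(fd, io, 2);
}
```

Because both vectors go out in a single writev() call, the index always sits directly in front of its payload, which is what the data format above requires.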
207+
EBPF
208+
----
209+
EBPF programs that attach to a user-based event tracepoint are given a pointer
210+
to a struct user_bpf_context. The bpf context contains the data type (which can
211+
be a user or kernel buffer, or can be a pointer to the iovec) and the data
212+
length that was emitted (minus the write_index).
213+
214+
Example Code
215+
------------
216+
See sample code in samples/user_events.

include/linux/ftrace.h

Lines changed: 10 additions & 1 deletion
@@ -30,6 +30,12 @@
3030
#define ARCH_SUPPORTS_FTRACE_OPS 0
3131
#endif
3232

33+
#ifdef CONFIG_TRACING
34+
extern void ftrace_boot_snapshot(void);
35+
#else
36+
static inline void ftrace_boot_snapshot(void) { }
37+
#endif
38+
3339
#ifdef CONFIG_FUNCTION_TRACER
3440
struct ftrace_ops;
3541
struct ftrace_regs;
@@ -215,7 +221,10 @@ struct ftrace_ops_hash {
215221
void ftrace_free_init_mem(void);
216222
void ftrace_free_mem(struct module *mod, void *start, void *end);
217223
#else
218-
static inline void ftrace_free_init_mem(void) { }
224+
static inline void ftrace_free_init_mem(void)
225+
{
226+
ftrace_boot_snapshot();
227+
}
219228
static inline void ftrace_free_mem(struct module *mod, void *start, void *end) { }
220229
#endif
221230

include/linux/trace_events.h

Lines changed: 23 additions & 1 deletion
@@ -315,6 +315,7 @@ enum {
315315
TRACE_EVENT_FL_KPROBE_BIT,
316316
TRACE_EVENT_FL_UPROBE_BIT,
317317
TRACE_EVENT_FL_EPROBE_BIT,
318+
TRACE_EVENT_FL_CUSTOM_BIT,
318319
};
319320

320321
/*
@@ -328,6 +329,9 @@ enum {
328329
* KPROBE - Event is a kprobe
329330
* UPROBE - Event is a uprobe
330331
* EPROBE - Event is an event probe
332+
* CUSTOM - Event is a custom event (to be attached to an existing tracepoint)
333+
* This is set when the custom event has not been attached
334+
* to a tracepoint yet, then it is cleared when it is.
331335
*/
332336
enum {
333337
TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT),
@@ -339,6 +343,7 @@ enum {
339343
TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT),
340344
TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT),
341345
TRACE_EVENT_FL_EPROBE = (1 << TRACE_EVENT_FL_EPROBE_BIT),
346+
TRACE_EVENT_FL_CUSTOM = (1 << TRACE_EVENT_FL_CUSTOM_BIT),
342347
};
343348

344349
#define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE)
@@ -440,7 +445,9 @@ static inline bool bpf_prog_array_valid(struct trace_event_call *call)
440445
static inline const char *
441446
trace_event_name(struct trace_event_call *call)
442447
{
443-
if (call->flags & TRACE_EVENT_FL_TRACEPOINT)
448+
if (call->flags & TRACE_EVENT_FL_CUSTOM)
449+
return call->name;
450+
else if (call->flags & TRACE_EVENT_FL_TRACEPOINT)
444451
return call->tp ? call->tp->name : NULL;
445452
else
446453
return call->name;
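
A minimal user-space model of the lookup order in the hunk above: the CUSTOM flag is tested before the TRACEPOINT flag, so an event that carries CUSTOM reports its own name instead of the tracepoint's. The struct layout and flag values here are hypothetical stand-ins, not the kernel definitions.

```c
#include <stddef.h>

/* Hypothetical flag values; the kernel derives its own from the *_BIT enum. */
#define SKETCH_FL_TRACEPOINT (1 << 0)
#define SKETCH_FL_CUSTOM     (1 << 1)

struct sketch_tracepoint {
	const char *name;
};

/* Simplified stand-in for the event call; fields kept separate for clarity. */
struct sketch_event_call {
	int flags;
	const char *name;
	struct sketch_tracepoint *tp;
};

/* Same branch order as the patched trace_event_name(). */
const char *sketch_event_name(const struct sketch_event_call *call)
{
	if (call->flags & SKETCH_FL_CUSTOM)
		return call->name;
	else if (call->flags & SKETCH_FL_TRACEPOINT)
		return call->tp ? call->tp->name : NULL;
	else
		return call->name;
}
```

In this model an event with only the TRACEPOINT flag and no tracepoint pointer yields NULL, matching the existing behaviour that the hunk leaves unchanged.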
@@ -903,3 +910,18 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type,
903910
#endif
904911

905912
#endif /* _LINUX_TRACE_EVENT_H */
913+
914+
/*
915+
* Note: we keep the TRACE_CUSTOM_EVENT outside the include file ifdef protection.
916+
* This is due to the way trace custom events work. If a file includes two
917+
* trace event headers under one "CREATE_CUSTOM_TRACE_EVENTS" the first include
918+
* will override the TRACE_CUSTOM_EVENT and break the second include.
919+
*/
920+
921+
#ifndef TRACE_CUSTOM_EVENT
922+
923+
#define DECLARE_CUSTOM_EVENT_CLASS(name, proto, args, tstruct, assign, print)
924+
#define DEFINE_CUSTOM_EVENT(template, name, proto, args)
925+
#define TRACE_CUSTOM_EVENT(name, proto, args, struct, assign, print)
926+
927+
#endif /* ifndef TRACE_CUSTOM_EVENT (see note above) */

include/trace/define_custom_trace.h

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
/* SPDX-License-Identifier: GPL-2.0 */
2+
/*
3+
* Trace files that want to automate creation of all tracepoints defined
4+
* in their file should include this file. The following are macros that the
5+
* trace file may define:
6+
*
7+
* TRACE_SYSTEM defines the system the tracepoint is for
8+
*
9+
* TRACE_INCLUDE_FILE if the file name is something other than TRACE_SYSTEM.h
10+
* This macro may be defined to tell define_trace.h what file to include.
11+
* Note, leave off the ".h".
12+
*
13+
* TRACE_INCLUDE_PATH if the path is something other than core kernel include/trace
14+
* then this macro can define the path to use. Note, the path is relative to
15+
* define_trace.h, not the file including it. Full path names for out of tree
16+
* modules must be used.
17+
*/
18+
19+
#ifdef CREATE_CUSTOM_TRACE_EVENTS
20+
21+
/* Prevent recursion */
22+
#undef CREATE_CUSTOM_TRACE_EVENTS
23+
24+
#include <linux/stringify.h>
25+
26+
#undef TRACE_CUSTOM_EVENT
27+
#define TRACE_CUSTOM_EVENT(name, proto, args, tstruct, assign, print)
28+
29+
#undef DEFINE_CUSTOM_EVENT
30+
#define DEFINE_CUSTOM_EVENT(template, name, proto, args)
31+
32+
#undef TRACE_INCLUDE
33+
#undef __TRACE_INCLUDE
34+
35+
#ifndef TRACE_INCLUDE_FILE
36+
# define TRACE_INCLUDE_FILE TRACE_SYSTEM
37+
# define UNDEF_TRACE_INCLUDE_FILE
38+
#endif
39+
40+
#ifndef TRACE_INCLUDE_PATH
41+
# define __TRACE_INCLUDE(system) <trace/events/system.h>
42+
# define UNDEF_TRACE_INCLUDE_PATH
43+
#else
44+
# define __TRACE_INCLUDE(system) __stringify(TRACE_INCLUDE_PATH/system.h)
45+
#endif
46+
47+
# define TRACE_INCLUDE(system) __TRACE_INCLUDE(system)
48+
49+
/* Let the trace headers be reread */
50+
#define TRACE_CUSTOM_MULTI_READ
51+
52+
#include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
53+
54+
#ifdef TRACEPOINTS_ENABLED
55+
#include <trace/trace_custom_events.h>
56+
#endif
57+
58+
#undef TRACE_CUSTOM_EVENT
59+
#undef DECLARE_CUSTOM_EVENT_CLASS
60+
#undef DEFINE_CUSTOM_EVENT
61+
#undef TRACE_CUSTOM_MULTI_READ
62+
63+
/* Only undef what we defined in this file */
64+
#ifdef UNDEF_TRACE_INCLUDE_FILE
65+
# undef TRACE_INCLUDE_FILE
66+
# undef UNDEF_TRACE_INCLUDE_FILE
67+
#endif
68+
69+
#ifdef UNDEF_TRACE_INCLUDE_PATH
70+
# undef TRACE_INCLUDE_PATH
71+
# undef UNDEF_TRACE_INCLUDE_PATH
72+
#endif
73+
74+
/* We may be processing more files */
75+
#define CREATE_CUSTOM_TRACE_POINTS
76+
77+
#endif /* CREATE_CUSTOM_TRACE_POINTS */
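
The header above relies on reading the same trace header more than once with the event macros redefined between reads. As a loose single-file analogy only (not the kernel mechanism), the sketch below expands one hypothetical event list twice with a different per-entry macro each time. All names are made up for illustration.

```c
/* Hypothetical event list standing in for a trace header that is read twice. */
#define SKETCH_EVENTS(EVENT) \
	EVENT(sched_switch)  \
	EVENT(sched_wakeup)

/* First pass: each entry becomes an enum constant. */
#define AS_ENUM(name) SKETCH_EV_##name,
enum sketch_event {
	SKETCH_EVENTS(AS_ENUM)
	SKETCH_EV_COUNT
};
#undef AS_ENUM

/* Second pass: the same list, re-expanded into a table of names. */
#define AS_NAME(name) #name,
const char *const sketch_event_names[] = {
	SKETCH_EVENTS(AS_NAME)
};
#undef AS_NAME
```

The `#undef` after each pass plays the same role as the `#undef` lines in the header: it keeps one expansion from leaking into the next.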
