@@ -20,11 +20,10 @@ dynamic_events is the same as the ioctl with the u: prefix applied.
 
 Typically programs will register a set of events that they wish to expose to
 tools that can read trace_events (such as ftrace and perf). The registration
-process gives back two ints to the program for each event. The first int is
-the status bit. This describes which bit in little-endian format in the
-/sys/kernel/tracing/user_events_status file represents this event. The
-second int is the write index which describes the data when a write() or
-writev() is called on the /sys/kernel/tracing/user_events_data file.
+process tells the kernel which address and bit to reflect if any tool has
+enabled the event and data should be written. The registration will give back
+a write index which describes the data when a write() or writev() is called
+on the /sys/kernel/tracing/user_events_data file.

 The structures referenced in this document are contained within the
 /include/uapi/linux/user_events.h file in the source tree.
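The enable-bit check described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from the kernel tree: my_event_enabled, event_is_enabled(), emit_event() and trace_point() are hypothetical names, the value is assumed to be a naturally aligned 32-bit integer registered with enable_size 4, and the relaxed load relies on the GCC/Clang __atomic_load_n() builtin. No kernel is involved here, so the bit has to be set by hand to stand in for a tool attaching.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-event value whose address would be registered as
 * enable_addr (naturally aligned, enable_size 4). The kernel sets or
 * clears the registered bit; the program only reads it.
 */
static uint32_t my_event_enabled;

/* Hypothetical stand-in for formatting data and calling write()/writev(). */
static int emit_count;

static void emit_event(void)
{
	emit_count++;
}

/* Return 1 if the registered bit is set (some tool is attached), else 0. */
static int event_is_enabled(const uint32_t *enable_addr, unsigned int enable_bit)
{
	uint32_t value;

	assert(enable_bit < 32);	/* a 32-bit value only has bits 0..31 */

	/* Relaxed atomic load (GCC/Clang builtin): the kernel may update the value at any time. */
	value = __atomic_load_n(enable_addr, __ATOMIC_RELAXED);

	return (int)((value >> enable_bit) & 1u);
}

/* Hot path: only pay for the write when something is attached. */
static void trace_point(unsigned int enable_bit)
{
	if (event_is_enabled(&my_event_enabled, enable_bit))
		emit_event();
}
```

Calling trace_point() while the bit is clear does nothing; once the bit is set (by the kernel in real use), the hypothetical emit_event() runs. Keeping the check to a single load and mask means the program only pays for write() or writev() when something is attached.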
@@ -41,23 +40,64 @@ DIAG_IOCSREG.
 This command takes a packed struct user_reg as an argument::

   struct user_reg {
-        u32 size;
-        u64 name_args;
-        u32 status_bit;
-        u32 write_index;
-  };
+        /* Input: Size of the user_reg structure being used */
+        __u32 size;
+
+        /* Input: Bit in enable address to use */
+        __u8 enable_bit;
+
+        /* Input: Enable size in bytes at address */
+        __u8 enable_size;
+
+        /* Input: Flags for future use, set to 0 */
+        __u16 flags;
+
+        /* Input: Address to update when enabled */
+        __u64 enable_addr;
+
+        /* Input: Pointer to string with event name, description and flags */
+        __u64 name_args;
+
+        /* Output: Index of the event to use when writing data */
+        __u32 write_index;
+  } __attribute__((__packed__));
+
+The struct user_reg requires all the above inputs to be set appropriately.
+
++ size: This must be set to sizeof(struct user_reg).

-The struct user_reg requires two inputs, the first is the size of the structure
-to ensure forward and backward compatibility. The second is the command string
-to issue for registering. Upon success two outputs are set, the status bit
-and the write index.
++ enable_bit: The bit to reflect the event status at the address specified by
+  enable_addr.
+
++ enable_size: The size of the value specified by enable_addr.
+  This must be 4 (32-bit) or 8 (64-bit). 64-bit values are only allowed on
+  64-bit kernels; 32-bit values can be used on all kernels.
+
++ flags: The flags to use, if any. For the initial version this must be 0.
+  Callers should first attempt to use flags and retry without flags to ensure
+  support for older kernel versions. If a flag is not supported, -EINVAL is
+  returned.
+
++ enable_addr: The address of the value to use to reflect event status. This
+  must be naturally aligned and write-accessible within the user program.
+
++ name_args: The name and arguments to describe the event; see Command Format
+  for details.
+
+Upon successful registration, the following is set:
+
++ write_index: The index to use for this file descriptor that represents this
+  event when writing out data. The index is unique to this instance of the file
+  descriptor that was used for the registration. See Writing Data for details.

 User based events show up under tracefs like any other event under the
 subsystem named "user_events". This means tools that wish to attach to the
 events need to use /sys/kernel/tracing/events/user_events/[name]/enable
 or perf record -e user_events:[name] when attaching/recording.

-**NOTE: ** *The write_index returned is only valid for the FD that was used *
+**NOTE:** The event subsystem name by default is "user_events". Callers should
+not assume it will always be "user_events". Operators reserve the right in the
+future to change the subsystem name per-process to accommodate event isolation.

 Command Format
 ^^^^^^^^^^^^^^
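The user_reg field rules above can be sketched in C as follows. This is an illustration only: struct demo_user_reg is a local mirror of the documented packed layout so the example builds without <linux/user_events.h>, demo_fill_reg() and demo_register_with_fallback() are hypothetical helpers, and demo_old_kernel_register() merely simulates a kernel that rejects every flag, standing in for a real ioctl(fd, DIAG_IOCSREG, &reg). The enable_bit range check is an assumption, not a rule stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the documented packed struct user_reg (sketch only). */
struct demo_user_reg {
	uint32_t size;
	uint8_t  enable_bit;
	uint8_t  enable_size;
	uint16_t flags;
	uint64_t enable_addr;
	uint64_t name_args;
	uint32_t write_index;
} __attribute__((__packed__));

/* 4 + 1 + 1 + 2 + 8 + 8 + 4 bytes: packing leaves no padding. */
_Static_assert(sizeof(struct demo_user_reg) == 28, "unexpected user_reg size");

/*
 * Hypothetical helper: fill every input as documented, or return -EINVAL.
 * Whether enable_size 8 is accepted is still up to the kernel (64-bit only).
 */
static int demo_fill_reg(struct demo_user_reg *reg, void *enable_addr,
			 uint8_t enable_size, uint8_t enable_bit,
			 const char *name_args)
{
	if (enable_size != 4 && enable_size != 8)
		return -EINVAL;			/* only 32-bit or 64-bit values */
	if (enable_bit >= enable_size * 8)
		return -EINVAL;			/* assumed: bit must lie inside the value */
	if ((uintptr_t)enable_addr % enable_size != 0)
		return -EINVAL;			/* must be naturally aligned */

	memset(reg, 0, sizeof(*reg));
	reg->size = sizeof(*reg);
	reg->enable_bit = enable_bit;
	reg->enable_size = enable_size;
	reg->flags = 0;				/* no flags in the initial version */
	reg->enable_addr = (uint64_t)(uintptr_t)enable_addr;
	reg->name_args = (uint64_t)(uintptr_t)name_args;
	return 0;
}

/* Stand-in for ioctl(fd, DIAG_IOCSREG, reg): returns 0 or a negative errno. */
typedef int (*demo_register_fn)(struct demo_user_reg *reg);

/* Simulates an older kernel that supports no flags (illustration only). */
static int demo_old_kernel_register(struct demo_user_reg *reg)
{
	if (reg->flags != 0)
		return -EINVAL;
	reg->write_index = 1;			/* arbitrary index for the simulation */
	return 0;
}

/* Try the wanted flags first; on -EINVAL retry once with flags set to 0. */
static int demo_register_with_fallback(demo_register_fn do_register,
				       struct demo_user_reg *reg,
				       uint16_t wanted_flags)
{
	int ret;

	assert(do_register != NULL);

	reg->flags = wanted_flags;
	ret = do_register(reg);
	if (ret == -EINVAL && wanted_flags != 0) {
		reg->flags = 0;
		ret = do_register(reg);
	}
	return ret;
}
```

A real callback would wrap ioctl(), which reports failure by returning -1 and setting errno, so it would need to return -errno for the comparison against -EINVAL to work. Since -EINVAL can also signal other invalid inputs, the retry only helps when the flags were the problem.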
@@ -94,7 +134,7 @@ Would be represented by the following field::
   struct mytype myname 20

 Deleting
------------
+--------
 Deleting an event from within a user process is done via ioctl() out to the
 /sys/kernel/tracing/user_events_data file. The command to issue is
 DIAG_IOCSDEL.
@@ -104,92 +144,79 @@ its name. Delete will only succeed if there are no references left to the
 event (in both user and kernel space). User programs should use a separate file
 to request deletes than the one used for registration due to this.

-Status
-------
-When tools attach/record user based events the status of the event is updated
-in realtime. This allows user programs to only incur the cost of the write() or
-writev() calls when something is actively attached to the event.
-
-User programs call mmap() on /sys/kernel/tracing/user_events_status to
-check the status for each event that is registered. The bit to check in the
-file is given back after the register ioctl() via user_reg.status_bit. The bit
-is always in little-endian format. Programs can check if the bit is set either
-using a byte-wise index with a mask or a long-wise index with a little-endian
-mask.
+Unregistering
+-------------
+If a process no longer wants updates for an event it has registered, it can
+unregister via ioctl() out to the /sys/kernel/tracing/user_events_data file.
+The command to issue is DIAG_IOCSUNREG. This is different from deleting, which
+actually removes the event from the system. Unregistering simply tells the
+kernel that your process is no longer interested in updates to the event.

-Currently the size of user_events_status is a single page, however, custom
-kernel configurations can change this size to allow more user based events. In
-all cases the size of the file is a multiple of a page size.
+This command takes a packed struct user_unreg as an argument::

-For example, if the register ioctl() gives back a status_bit of 3 you would
-check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8
-(1 << (3 % 8)) to see if anything is attached to that event.
+  struct user_unreg {
+        /* Input: Size of the user_unreg structure being used */
+        __u32 size;

-A byte-wise index check is performed as follows::
+        /* Input: Bit to unregister */
+        __u8 disable_bit;

-  int index, mask;
-  char *status_page;
+        /* Input: Reserved, set to 0 */
+        __u8 __reserved;

-  index = status_bit / 8;
-  mask = 1 << (status_bit % 8);
-
-  ...
+        /* Input: Reserved, set to 0 */
+        __u16 __reserved2;

-  if (status_page[index] & mask) {
-          /* Enabled */
-  }
+        /* Input: Address to unregister */
+        __u64 disable_addr;
+  } __attribute__((__packed__));

-A long-wise index check is performed as follows::
+The struct user_unreg requires all the above inputs to be set appropriately.

-  #include <asm/bitsperlong.h>
-  #include <endian.h>
++ size: This must be set to sizeof(struct user_unreg).

-  #if __BITS_PER_LONG == 64
-  #define endian_swap(x) htole64(x)
-  #else
-  #define endian_swap(x) htole32(x)
-  #endif
++ disable_bit: This must be set to the bit to disable (same bit that was
+  previously registered via enable_bit).

-  long index, mask, *status_page;
++ disable_addr: This must be set to the address to disable (same address that was
+  previously registered via enable_addr).

-  index = status_bit / __BITS_PER_LONG;
-  mask = 1L << (status_bit % __BITS_PER_LONG);
-  mask = endian_swap(mask);
+**NOTE:** Events are automatically unregistered when execve() is invoked. During
+fork() the registered events are retained and must be unregistered manually in
+each process if desired.

-  ...
+Status
+------
+When tools attach/record user based events the status of the event is updated
+in realtime. This allows user programs to only incur the cost of the write() or
+writev() calls when something is actively attached to the event.

-  if (status_page[index] & mask) {
-          /* Enabled */
-  }
+The kernel will update the specified bit that was registered for the event as
+tools attach/detach from the event. User programs simply check if the bit is set
+to see if something is attached or not.

 Administrators can easily check the status of all registered events by reading
 the user_events_status file directly via a terminal. The output is as follows::

-  Byte: Name [# Comments]
+  Name [# Comments]
   ...

   Active: ActiveCount
   Busy: BusyCount
-  Max: MaxCount

 For example, on a system that has a single event the output looks like this::

-  1: test
+  test

   Active: 1
   Busy: 0
-  Max: 32768

 If a user enables the user event via ftrace, the output would change to this::

-  1: test # Used by ftrace
+  test # Used by ftrace

   Active: 1
   Busy: 1
-  Max: 32768
-
-**NOTE: ** *A status bit of 0 will never be returned. This allows user programs
-to have a bit that can be used on error cases. *

 Writing Data
 ------------
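A matching unregister request can be derived from the values used at registration. This is again a hedged sketch: both structs are local mirrors of the documented packed layouts (the reserved members are named reserved and reserved2 here instead of __reserved and __reserved2), and demo_unreg_from_reg() is a hypothetical helper. The filled struct would then be passed as the argument to ioctl(fd, DIAG_IOCSUNREG, &unreg).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the documented packed layouts (sketch only). */
struct demo_user_reg {
	uint32_t size;
	uint8_t  enable_bit;
	uint8_t  enable_size;
	uint16_t flags;
	uint64_t enable_addr;
	uint64_t name_args;
	uint32_t write_index;
} __attribute__((__packed__));

struct demo_user_unreg {
	uint32_t size;
	uint8_t  disable_bit;
	uint8_t  reserved;	/* __reserved in the uapi struct; must be 0 */
	uint16_t reserved2;	/* __reserved2 in the uapi struct; must be 0 */
	uint64_t disable_addr;
} __attribute__((__packed__));

/* 4 + 1 + 1 + 2 + 8 bytes: packing leaves no padding. */
_Static_assert(sizeof(struct demo_user_unreg) == 16, "unexpected user_unreg size");

/* Hypothetical helper: reuse the bit and address from the registration. */
static void demo_unreg_from_reg(struct demo_user_unreg *unreg,
				const struct demo_user_reg *reg)
{
	assert(unreg != NULL && reg != NULL);

	memset(unreg, 0, sizeof(*unreg));	/* zeroes both reserved fields */
	unreg->size = sizeof(*unreg);
	unreg->disable_bit = reg->enable_bit;
	unreg->disable_addr = reg->enable_addr;
}
```

Because execve() unregisters events automatically while fork() keeps them, a child process that wants to stop updates would build and issue this request itself.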
@@ -217,7 +244,7 @@ For example, if I have a struct like this::
         int src;
         int dst;
         int flags;
-  };
+  } __attribute__((__packed__));

 It's advised for user programs to do the following::
