
Commit 69c5079

Merge tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:

 - Extend tracing option mask to 64 bits

   The trace options were defined by a 32-bit variable, which limits the
   tracing instances to a total of 32 different options. As that limit
   has been hit, and more options are being added, increase the option
   mask to a 64-bit number, doubling the number of options available. As
   this is required for the kprobe topic branches as well as the tracing
   topic branch, a separate branch was created and merged into both.

 - Make trace_user_fault_read() available for the rest of tracing

   The function trace_user_fault_read() is used by the trace_marker file
   read to allow reading user space quickly and without locking or
   allocations. Make it available so that the system call trace events
   can use it too.

 - Have system call trace events read user space values

   Now that the system call trace event callbacks are called in a
   faultable context, take advantage of this and read the user space
   buffers for various system calls. For example, show the path name of
   the openat system call instead of just showing the pointer to that
   path name in user space. Also show the contents of the buffer of the
   write system call. Several system call trace events are updated to
   make tracing into a lightweight strace tool for all applications in
   the system.

 - Update perf system call tracing to do the same

 - Add a config option and a syscall_user_buf_size file to control the
   size of the buffer

   Limit the amount of data that can be read from user space. The
   default size is 63 bytes but that can be expanded to 165 bytes.

 - Allow the persistent ring buffer to print system calls normally

   The persistent ring buffer prints trace events by their type and
   ignores the print_fmt, because the print_fmt may change from kernel
   to kernel. As the system call output is fixed by the system call ABI
   itself, there's no reason for that restriction here. This makes
   reading the system call events in the persistent ring buffer much
   nicer and easier to understand.

 - Add option to show text offset in the function profiler

   The function profiler that counts the number of times a function is
   hit currently lists all functions by name and offset. But this
   becomes ambiguous when there are several functions with the same
   name. Add a tracing option that changes the output to '_text+offset'
   instead. Now a user space tool can use this information to map the
   '_text+offset' to the unique function it is counting.

 - Report bad dynamic event commands

   If a bad command is passed to the dynamic_events file, report it
   properly in the error log.

 - Clean up tracer options

   Clean up the tracer option code a bit by removing some useless code
   and using switch statements instead of a series of if statements.

 - Have tracer options be instance specific

   Tracers can have their own options (function tracer, irqsoff tracer,
   function graph tracer, etc). But now that the same tracer can be
   enabled in multiple trace instances, their options are still global:
   the API is per instance, yet changing one instance affects the
   others. This isn't even consistent, as the options take effect
   differently depending on when a tracer was started in an instance.
   Make the options for instances only affect the instance they are
   changed under.

 - Optimize pid_list lock contention

   Whenever the pid_list is read, it takes a spin lock, and this happens
   at every sched switch. Taking the lock at sched switch can be avoided
   by using a seqlock counter instead.

 - Clean up the trace trigger structures

   The trigger code uses two different structures to implement a single
   trigger. This was due to trying to reuse code for the two different
   types of triggers (always-on triggers and count-limited triggers).
   But by adding a single field to one structure, the other structure
   could be absorbed into the first, making the code easier to
   understand.

 - Create a bulk garbage collector for trace triggers

   If user space has triggers for several hundred events and then
   removes them, it can take several seconds to complete. This is
   because each removal calls tracepoint_synchronize_unregister(), which
   can take hundreds of milliseconds. Instead, create a helper thread
   that does the clean up. When a trigger is removed, create the kthread
   if it isn't already created, and then add the trigger to a llist. The
   kthread takes the items off the llist, calls
   tracepoint_synchronize_unregister(), and then frees the items it took
   off. It then checks whether there are more items to free before
   sleeping. This lets user space remove all these triggers in less than
   a second.

 - Allow function tracing of some of the tracing infrastructure code

   Because the tracing code can cause recursion issues if it is traced
   by the function tracer, the entire tracing directory disables
   function tracing. But not all of tracing causes issues if it is
   traced, namely the event tracing code. Add a config option that
   enables some of the tracing code to be traced, to help in debugging
   it. Note, when this is enabled, it does add noise to general function
   tracing, especially if events are enabled as well (which is a common
   case).

 - Add boot-time backup instance for the persistent buffer

   The persistent ring buffer is used mostly for kernel crash analysis
   in the field. One issue is that if there's a crash, the data in the
   persistent ring buffer must be read before tracing can begin using
   it, which slows down the boot process. Once tracing starts in the
   persistent ring buffer, the old data must be freed, as the addresses
   no longer match and old events can't share the buffer with new
   events. Create a way to make a backup buffer that copies the
   persistent ring buffer at boot up. Then after a crash, the always-on
   tracer can begin immediately, as can the normal boot process, while
   the crash analysis tooling uses the backup buffer. After the backup
   buffer is finished being read, it can be removed.

 - Enable function graph args and return address options at the same
   time

   Currently, when reading of arguments in the function graph tracer is
   enabled, the option to record the parent function in the entry event
   cannot be enabled. Update the code so that it can.

 - Add new struct_offset() helper macro

   Add a new macro that takes a pointer to a structure and the name of
   one of its members, and returns the offset of that member. This
   allows the ring buffer code to simplify the following:

   From: size = struct_size(entry, buf, cnt - sizeof(entry->id));
   To:   size = struct_offset(entry, id) + cnt;

   There should be other simplifications that this macro can help out
   with as well.

* tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (42 commits)
  overflow: Introduce struct_offset() to get offset of member
  function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously
  tracing: Add boot-time backup of persistent ring buffer
  ftrace: Allow tracing of some of the tracing code
  tracing: Use strim() in trigger_process_regex() instead of skip_spaces()
  tracing: Add bulk garbage collection of freeing event_trigger_data
  tracing: Remove unneeded event_mutex lock in event_trigger_regex_release()
  tracing: Merge struct event_trigger_ops into struct event_command
  tracing: Remove get_trigger_ops() and add count_func() from trigger ops
  tracing: Show the tracer options in boot-time created instance
  ftrace: Avoid redundant initialization in register_ftrace_direct
  tracing: Remove unused variable in tracing_trace_options_show()
  fgraph: Make fgraph_no_sleep_time signed
  tracing: Convert function graph set_flags() to use a switch() statement
  tracing: Have function graph tracer option sleep-time be per instance
  tracing: Move graph-time out of function graph options
  tracing: Have function graph tracer option funcgraph-irqs be per instance
  trace/pid_list: optimize pid_list->lock contention
  tracing: Have function graph tracer define options per instance
  tracing: Have function tracer define options per instance
  ...
2 parents 36492b7 + f6ed9c5 commit 69c5079

32 files changed: 2301 additions & 902 deletions

Documentation/trace/ftrace.rst

Lines changed: 8 additions & 0 deletions
@@ -366,6 +366,14 @@ of ftrace. Here is a list of some of the key files:
 	for each function. The displayed address is the patch-site address
 	and can differ from /proc/kallsyms address.
 
+  syscall_user_buf_size:
+
+	Some system call trace events will record the data from a user
+	space address that one of the parameters point to. The amount of
+	data per event is limited. This file holds the max number of bytes
+	that will be recorded into the ring buffer to hold this data.
+	The max value is currently 165.
+
   dyn_ftrace_total_info:
 
 	This file is for debugging purposes. The number of functions that

include/linux/ftrace.h

Lines changed: 2 additions & 5 deletions
@@ -1167,17 +1167,14 @@ static inline void ftrace_init(void) { }
  */
 struct ftrace_graph_ent {
 	unsigned long func; /* Current function */
-	int depth;
+	unsigned long depth;
 } __packed;
 
 /*
  * Structure that defines an entry function trace with retaddr.
- * It's already packed but the attribute "packed" is needed
- * to remove extra padding at the end.
  */
 struct fgraph_retaddr_ent {
-	unsigned long func; /* Current function */
-	int depth;
+	struct ftrace_graph_ent ent;
 	unsigned long retaddr; /* Return address */
 } __packed;

include/linux/overflow.h

Lines changed: 12 additions & 0 deletions
@@ -458,6 +458,18 @@ static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend)
 #define struct_size_t(type, member, count) \
 	struct_size((type *)NULL, member, count)
 
+/**
+ * struct_offset() - Calculate the offset of a member within a struct
+ * @p: Pointer to the struct
+ * @member: Name of the member to get the offset of
+ *
+ * Calculates the offset of a particular @member of the structure pointed
+ * to by @p.
+ *
+ * Return: number of bytes to the location of @member.
+ */
+#define struct_offset(p, member) (offsetof(typeof(*(p)), member))
+
 /**
  * __DEFINE_FLEX() - helper macro for DEFINE_FLEX() family.
  * Enables caller macro to pass arbitrary trailing expressions

include/linux/seq_buf.h

Lines changed: 17 additions & 0 deletions
@@ -149,6 +149,23 @@ static inline void seq_buf_commit(struct seq_buf *s, int num)
 	}
 }
 
+/**
+ * seq_buf_pop - pop off the last written character
+ * @s: the seq_buf handle
+ *
+ * Removes the last written character to the seq_buf @s.
+ *
+ * Returns the last character or -1 if it is empty.
+ */
+static inline int seq_buf_pop(struct seq_buf *s)
+{
+	if (!s->len)
+		return -1;
+
+	s->len--;
+	return (unsigned int)s->buffer[s->len];
+}
+
 extern __printf(2, 3)
 int seq_buf_printf(struct seq_buf *s, const char *fmt, ...);
 extern __printf(2, 0)

include/linux/trace_seq.h

Lines changed: 13 additions & 0 deletions
@@ -80,6 +80,19 @@ static inline bool trace_seq_has_overflowed(struct trace_seq *s)
 	return s->full || seq_buf_has_overflowed(&s->seq);
 }
 
+/**
+ * trace_seq_pop - pop off the last written character
+ * @s: trace sequence descriptor
+ *
+ * Removes the last written character to the trace_seq @s.
+ *
+ * Returns the last character or -1 if it is empty.
+ */
+static inline int trace_seq_pop(struct trace_seq *s)
+{
+	return seq_buf_pop(&s->seq);
+}
+
 /*
  * Currently only defined when tracing is enabled.
  */

include/trace/syscall.h

Lines changed: 7 additions & 1 deletion
@@ -16,6 +16,9 @@
  * @name: name of the syscall
  * @syscall_nr: number of the syscall
  * @nb_args: number of parameters it takes
+ * @user_arg_is_str: set if the arg for @user_arg_size is a string
+ * @user_arg_size: holds @arg that has size of the user space to read
+ * @user_mask: mask of @args that will read user space
  * @types: list of types as strings
  * @args: list of args as strings (args[i] matches types[i])
  * @enter_fields: list of fields for syscall_enter trace event
@@ -25,7 +28,10 @@
 struct syscall_metadata {
 	const char *name;
 	int syscall_nr;
-	int nb_args;
+	u8 nb_args:7;
+	u8 user_arg_is_str:1;
+	s8 user_arg_size;
+	short user_mask;
 	const char **types;
 	const char **args;
 	struct list_head enter_fields;

kernel/trace/Kconfig

Lines changed: 28 additions & 0 deletions
@@ -342,6 +342,20 @@ config DYNAMIC_FTRACE_WITH_JMP
 	depends on DYNAMIC_FTRACE_WITH_DIRECT_CALLS
 	depends on HAVE_DYNAMIC_FTRACE_WITH_JMP
 
+config FUNCTION_SELF_TRACING
+	bool "Function trace tracing code"
+	depends on FUNCTION_TRACER
+	help
+	  Normally all the tracing code is set to notrace, where the function
+	  tracer will ignore all the tracing functions. Sometimes it is useful
+	  for debugging to trace some of the tracing infrastructure itself.
+	  Enable this to allow some of the tracing infrastructure to be traced
+	  by the function tracer. Note, this will likely add noise to function
+	  tracing if events and other tracing features are enabled along with
+	  function tracing.
+
+	  If unsure, say N.
+
 config FPROBE
 	bool "Kernel Function Probe (fprobe)"
 	depends on HAVE_FUNCTION_GRAPH_FREGS && HAVE_FTRACE_GRAPH_FUNC
@@ -587,6 +601,20 @@ config FTRACE_SYSCALLS
 	help
 	  Basic tracer to catch the syscall entry and exit events.
 
+config TRACE_SYSCALL_BUF_SIZE_DEFAULT
+	int "System call user read max size"
+	range 0 165
+	default 63
+	depends on FTRACE_SYSCALLS
+	help
+	  Some system call trace events will record the data from a user
+	  space address that one of the parameters point to. The amount of
+	  data per event is limited. That limit is set by this config and
+	  this config also affects how much user space data perf can read.
+
+	  For a tracing instance, this size may be changed by writing into
+	  its syscall_user_buf_size file.
+
 config TRACER_SNAPSHOT
 	bool "Create a snapshot trace buffer"
 	select TRACER_MAX_TRACE

kernel/trace/Makefile

Lines changed: 17 additions & 0 deletions
@@ -16,6 +16,23 @@ obj-y += trace_selftest_dynamic.o
 endif
 endif
 
+# Allow some files to be function traced
+ifdef CONFIG_FUNCTION_SELF_TRACING
+CFLAGS_trace_output.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_seq.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_stat.o = $(CC_FLAGS_FTRACE)
+CFLAGS_tracing_map.o = $(CC_FLAGS_FTRACE)
+CFLAGS_synth_event_gen_test.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_syscalls.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_filter.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_trigger.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_synth.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_hist.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_events_user.o = $(CC_FLAGS_FTRACE)
+CFLAGS_trace_dynevent.o = $(CC_FLAGS_FTRACE)
+endif
+
 ifdef CONFIG_FTRACE_STARTUP_TEST
 CFLAGS_trace_kprobe_selftest.o = $(CC_FLAGS_FTRACE)
 obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe_selftest.o

kernel/trace/blktrace.c

Lines changed: 3 additions & 3 deletions
@@ -1738,7 +1738,7 @@ static enum print_line_t print_one_line(struct trace_iterator *iter,
 
 	t = te_blk_io_trace(iter->ent);
 	what = (t->action & ((1 << BLK_TC_SHIFT) - 1)) & ~__BLK_TA_CGROUP;
-	long_act = !!(tr->trace_flags & TRACE_ITER_VERBOSE);
+	long_act = !!(tr->trace_flags & TRACE_ITER(VERBOSE));
 	log_action = classic ? &blk_log_action_classic : &blk_log_action;
 	has_cg = t->action & __BLK_TA_CGROUP;
 
@@ -1803,9 +1803,9 @@ blk_tracer_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set)
 	/* don't output context-info for blk_classic output */
 	if (bit == TRACE_BLK_OPT_CLASSIC) {
 		if (set)
-			tr->trace_flags &= ~TRACE_ITER_CONTEXT_INFO;
+			tr->trace_flags &= ~TRACE_ITER(CONTEXT_INFO);
 		else
-			tr->trace_flags |= TRACE_ITER_CONTEXT_INFO;
+			tr->trace_flags |= TRACE_ITER(CONTEXT_INFO);
 	}
 	return 0;
 }

kernel/trace/fgraph.c

Lines changed: 1 addition & 9 deletions
@@ -498,9 +498,6 @@ void *fgraph_retrieve_parent_data(int idx, int *size_bytes, int depth)
 	return get_data_type_data(current, offset);
 }
 
-/* Both enabled by default (can be cleared by function_graph tracer flags */
-bool fgraph_sleep_time = true;
-
 #ifdef CONFIG_DYNAMIC_FTRACE
 /*
  * archs can override this function if they must do something
@@ -1023,11 +1020,6 @@ void fgraph_init_ops(struct ftrace_ops *dst_ops,
 #endif
 }
 
-void ftrace_graph_sleep_time_control(bool enable)
-{
-	fgraph_sleep_time = enable;
-}
-
 /*
  * Simply points to ftrace_stub, but with the proper protocol.
  * Defined by the linker script in linux/vmlinux.lds.h
@@ -1098,7 +1090,7 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
 	 * Does the user want to count the time a function was asleep.
 	 * If so, do not update the time stamps.
 	 */
-	if (fgraph_sleep_time)
+	if (!fgraph_no_sleep_time)
 		return;
 
 	timestamp = trace_clock_local();
