Skip to content

Commit 4175529

Browse files
committed
Merge tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Madhavan Srinivasan: - powerpc support for BPF arena and arena atomics - Patches to switch to msi parent domain (per-device MSI domains) - Add a lock contention tracepoint in the queued spinlock slowpath - Fixes for underflow in pseries/powernv msi and pci paths - Switch from legacy-of-mm-gpiochip dependency to platform driver - Fixes for handling TLB misses - Introduce support for powerpc papr-hvpipe - Add vpa-dtl PMU driver for pseries platform - Misc fixes and cleanups Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence, Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao, Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters, Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao Bagalkote. * tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits) powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct genirq/msi: Remove msi_post_free() powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf powerpc/time: Expose boot_tb via accessor powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure powerpc/fprobe: fix updated fprobe for function-graph tracer powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL powerpc64/modules: replace stub allocation sentinel with an explicit counter powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs powerpc/ftrace: ensure ftrace record ops are always set for NOPs powerpc/603: Really copy kernel PGD entries into all PGDIRs powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler powerpc/pseries: HVPIPE changes to support migration powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS powerpc/pseries: Enable HVPIPE event message interrupt ...
2 parents 9cc220a + ef10405 commit 4175529

150 files changed

Lines changed: 2668 additions & 776 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
What: /sys/bus/event_source/devices/vpa_dtl/format
2+
Date: February 2025
3+
Contact: Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
4+
Description: Read-only. Attribute group to describe the magic bits
5+
that go into perf_event_attr.config for a particular pmu.
6+
(See ABI/testing/sysfs-bus-event_source-devices-format).
7+
8+
Each attribute under this group defines a bit range of the
9+
perf_event_attr.config. Supported attribute are listed
10+
below::
11+
12+
event = "config:0-7" - event ID
13+
14+
For example::
15+
16+
dtl_cede = "event=0x1"
17+
18+
What: /sys/bus/event_source/devices/vpa_dtl/events
19+
Date: February 2025
20+
Contact: Linux on PowerPC Developer List <linuxppc-dev at lists.ozlabs.org>
21+
Description: (RO) Attribute group to describe performance monitoring events
22+
for the Virtual Processor Dispatch Trace Log. Each attribute in
23+
this group describes a single performance monitoring event
24+
supported by vpa_dtl pmu. The name of the file is the name of
25+
the event (See ABI/testing/sysfs-bus-event_source-devices-events).

Documentation/arch/powerpc/index.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,7 @@ powerpc
3737
vas-api
3838
vcpudispatch_stats
3939
vmemmap_dedup
40+
vpa-dtl
4041

4142
features
4243

Lines changed: 156 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,156 @@
1+
.. SPDX-License-Identifier: GPL-2.0
2+
.. _vpa-dtl:
3+
4+
===================================
5+
DTL (Dispatch Trace Log)
6+
===================================
7+
8+
Athira Rajeev, 19 April 2025
9+
10+
.. contents::
11+
:depth: 3
12+
13+
14+
Basic overview
15+
==============
16+
17+
The pseries Shared Processor Logical Partition(SPLPAR) machines can
18+
retrieve a log of dispatch and preempt events from the hypervisor
19+
using data from Disptach Trace Log(DTL) buffer. With this information,
20+
user can retrieve when and why each dispatch & preempt has occurred.
21+
The vpa-dtl PMU exposes the Virtual Processor Area(VPA) DTL counters
22+
via perf.
23+
24+
Infrastructure used
25+
===================
26+
27+
The VPA DTL PMU counters do not interrupt on overflow or generate any
28+
PMI interrupts. Therefore, hrtimer is used to poll the DTL data. The timer
29+
nterval can be provided by user via sample_period field in nano seconds.
30+
vpa dtl pmu has one hrtimer added per vpa-dtl pmu thread. DTL (Dispatch
31+
Trace Log) contains information about dispatch/preempt, enqueue time etc.
32+
We directly copy the DTL buffer data as part of auxiliary buffer and it
33+
will be processed later. This will avoid time taken to create samples
34+
in the kernel space. The PMU driver collecting Dispatch Trace Log (DTL)
35+
entries makes use of AUX support in perf infrastructure. On the tools side,
36+
this data is made available as PERF_RECORD_AUXTRACE records.
37+
38+
To correlate each DTL entry with other events across CPU's, an auxtrace_queue
39+
is created for each CPU. Each auxtrace queue has a array/list of auxtrace buffers.
40+
All auxtrace queues is maintained in auxtrace heap. The queues are sorted
41+
based on timestamp. When the different PERF_RECORD_XX records are processed,
42+
compare the timestamp of perf record with timestamp of top element in the
43+
auxtrace heap so that DTL events can be co-related with other events
44+
Process the auxtrace queue if the timestamp of element from heap is
45+
lower than timestamp from entry in perf record. Sometimes it could happen that
46+
one buffer is only partially processed. if the timestamp of occurrence of
47+
another event is more than currently processed element in the queue, it will
48+
move on to next perf record. So keep track of position of buffer to continue
49+
processing next time. Update the timestamp of the auxtrace heap with the timestamp
50+
of last processed entry from the auxtrace buffer.
51+
52+
This infrastructure ensures dispatch trace log entries can be correlated
53+
and presented along with other events like sched.
54+
55+
vpa-dtl PMU example usage
56+
=========================
57+
58+
.. code-block:: sh
59+
60+
# ls /sys/devices/vpa_dtl/
61+
events format perf_event_mux_interval_ms power subsystem type uevent
62+
63+
64+
To capture the DTL data using perf record:
65+
.. code-block:: sh
66+
67+
# ./perf record -a -e sched:\*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
68+
69+
The result can be interpreted using perf record. Snippet of perf report -D
70+
71+
.. code-block:: sh
72+
73+
# ./perf report -D
74+
75+
There are different PERF_RECORD_XX records. In that records corresponding to
76+
auxtrace buffers includes:
77+
78+
1. PERF_RECORD_AUX
79+
Conveys that new data is available in AUX area
80+
81+
2. PERF_RECORD_AUXTRACE_INFO
82+
Describes offset and size of auxtrace data in the buffers
83+
84+
3. PERF_RECORD_AUXTRACE
85+
This is the record that defines the auxtrace data which here in case of
86+
vpa-dtl pmu is dispatch trace log data.
87+
88+
Snippet from perf report -D showing the PERF_RECORD_AUXTRACE dump
89+
90+
.. code-block:: sh
91+
92+
0 0 0x39b10 [0x30]: PERF_RECORD_AUXTRACE size: 0x690 offset: 0 ref: 0 idx: 0 tid: -1 cpu: 0
93+
.
94+
. ... VPA DTL PMU data: size 1680 bytes, entries is 35
95+
. 00000000: boot_tb: 21349649546353231, tb_freq: 512000000
96+
. 00000030: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:7064, ready_to_enqueue_time:187, waiting_to_ready_time:6611773
97+
. 00000060: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:146, ready_to_enqueue_time:0, waiting_to_ready_time:15359437
98+
. 00000090: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4868, ready_to_enqueue_time:232, waiting_to_ready_time:5100709
99+
. 000000c0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:179, ready_to_enqueue_time:0, waiting_to_ready_time:30714243
100+
. 000000f0: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:197, ready_to_enqueue_time:0, waiting_to_ready_time:15350648
101+
. 00000120: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:213, ready_to_enqueue_time:0, waiting_to_ready_time:15353446
102+
. 00000150: dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:212, ready_to_enqueue_time:0, waiting_to_ready_time:15355126
103+
. 00000180: dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:6368, ready_to_enqueue_time:164, waiting_to_ready_time:5104665
104+
105+
Above is representation of dtl entry of below format:
106+
107+
struct dtl_entry {
108+
u8 dispatch_reason;
109+
u8 preempt_reason;
110+
u16 processor_id;
111+
u32 enqueue_to_dispatch_time;
112+
u32 ready_to_enqueue_time;
113+
u32 waiting_to_ready_time;
114+
u64 timebase;
115+
u64 fault_addr;
116+
u64 srr0;
117+
u64 srr1;
118+
119+
};
120+
121+
First two fields represent the dispatch reason and preempt reason. The post
122+
processing of PERF_RECORD_AUXTRACE records will translate to meaningful data
123+
for user to consume.
124+
125+
Visualize the dispatch trace log entries with perf report
126+
=========================================================
127+
128+
.. code-block:: sh
129+
130+
# ./perf record -a -e sched:*,vpa_dtl/dtl_all/ -c 1000000000 sleep 1
131+
[ perf record: Woken up 1 times to write data ]
132+
[ perf record: Captured and wrote 0.300 MB perf.data ]
133+
134+
# ./perf report
135+
# Samples: 321 of event 'vpa-dtl'
136+
# Event count (approx.): 321
137+
#
138+
# Children Self Command Shared Object Symbol
139+
# ........ ........ ....... ................. ..............................
140+
#
141+
100.00% 100.00% swapper [kernel.kallsyms] [k] plpar_hcall_norets_notrace
142+
143+
Visualize the dispatch trace log entries with perf script
144+
=========================================================
145+
146+
.. code-block:: sh
147+
148+
# ./perf script
149+
migration/9 67 [009] 105373.359903: sched:sched_waking: comm=perf pid=13418 prio=120 target_cpu=009
150+
migration/9 67 [009] 105373.359904: sched:sched_migrate_task: comm=perf pid=13418 prio=120 orig_cpu=9 dest_cpu=10
151+
migration/9 67 [009] 105373.359907: sched:sched_stat_runtime: comm=migration/9 pid=67 runtime=4050 [ns]
152+
migration/9 67 [009] 105373.359908: sched:sched_switch: prev_comm=migration/9 prev_pid=67 prev_prio=0 prev_state=S ==> next_comm=swapper/9 next_pid=0 next_prio=120
153+
:256 256 [016] 105373.359913: vpa-dtl: timebase: 21403600706628832 dispatch_reason:decrementer interrupt, preempt_reason:H_CEDE, enqueue_to_dispatch_time:4854, ready_to_enqueue_time:139, waiting_to_ready_time:511842115 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
154+
:256 256 [017] 105373.360012: vpa-dtl: timebase: 21403600706679454 dispatch_reason:priv doorbell, preempt_reason:H_CEDE, enqueue_to_dispatch_time:236, ready_to_enqueue_time:0, waiting_to_ready_time:133864583 c0000000000fcd28 plpar_hcall_norets_notrace+0x18 ([kernel.kallsyms])
155+
perf 13418 [010] 105373.360048: sched:sched_stat_runtime: comm=perf pid=13418 runtime=139748 [ns]
156+
perf 13418 [010] 105373.360052: sched:sched_waking: comm=migration/10 pid=72 prio=0 target_cpu=010

Documentation/userspace-api/ioctl/ioctl-number.rst

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -374,6 +374,8 @@ Code Seq# Include File Comments
374374
<mailto:linuxppc-dev@lists.ozlabs.org>
375375
0xB2 08 arch/powerpc/include/uapi/asm/papr-physical-attestation.h powerpc/pseries Physical Attestation API
376376
<mailto:linuxppc-dev@lists.ozlabs.org>
377+
0xB2 09 arch/powerpc/include/uapi/asm/papr-hvpipe.h powerpc/pseries HVPIPE API
378+
<mailto:linuxppc-dev@lists.ozlabs.org>
377379
0xB3 00 linux/mmc/ioctl.h
378380
0xB4 00-0F linux/gpio.h <mailto:linux-gpio@vger.kernel.org>
379381
0xB5 00-0F uapi/linux/rpmsg.h <mailto:linux-remoteproc@vger.kernel.org>

arch/powerpc/Kconfig

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -243,12 +243,14 @@ config PPC
243243
select HAVE_EFFICIENT_UNALIGNED_ACCESS
244244
select HAVE_GUP_FAST
245245
select HAVE_FTRACE_GRAPH_FUNC
246+
select HAVE_FTRACE_REGS_HAVING_PT_REGS
246247
select HAVE_FUNCTION_ARG_ACCESS_API
247248
select HAVE_FUNCTION_DESCRIPTORS if PPC64_ELF_ABI_V1
248249
select HAVE_FUNCTION_ERROR_INJECTION
250+
select HAVE_FUNCTION_GRAPH_FREGS
249251
select HAVE_FUNCTION_GRAPH_TRACER
250252
select HAVE_FUNCTION_TRACER if !COMPILE_TEST && (PPC64 || (PPC32 && CC_IS_GCC))
251-
select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200 # plugin support on gcc <= 5.1 is buggy on PPC
253+
select HAVE_GCC_PLUGINS
252254
select HAVE_GENERIC_VDSO
253255
select HAVE_HARDLOCKUP_DETECTOR_ARCH if PPC_BOOK3S_64 && SMP
254256
select HAVE_HARDLOCKUP_DETECTOR_PERF if PERF_EVENTS && HAVE_PERF_EVENTS_NMI

arch/powerpc/Makefile

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -58,7 +58,7 @@ ifeq ($(CONFIG_PPC64)$(CONFIG_LD_IS_BFD),yy)
5858
# There is a corresponding test in arch/powerpc/lib/Makefile
5959
KBUILD_LDFLAGS_MODULE += --save-restore-funcs
6060
else
61-
KBUILD_LDFLAGS_MODULE += arch/powerpc/lib/crtsavres.o
61+
KBUILD_LDFLAGS_MODULE += $(objtree)/arch/powerpc/lib/crtsavres.o
6262
endif
6363

6464
ifdef CONFIG_CPU_LITTLE_ENDIAN

arch/powerpc/boot/page.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@
55
* Copyright (C) 2001 PPC64 Team, IBM Corp
66
*/
77

8-
#ifdef __ASSEMBLY__
8+
#ifdef __ASSEMBLER__
99
#define ASM_CONST(x) x
1010
#else
1111
#define __ASM_CONST(x) x##UL

arch/powerpc/boot/wrapper

Lines changed: 1 addition & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -226,11 +226,7 @@ ld_is_lld()
226226

227227
# Do not include PT_INTERP segment when linking pie. Non-pie linking
228228
# just ignores this option.
229-
LD_VERSION=$(${CROSS}ld --version | ld_version)
230-
LD_NO_DL_MIN_VERSION=$(echo 2.26 | ld_version)
231-
if [ "$LD_VERSION" -ge "$LD_NO_DL_MIN_VERSION" ] ; then
232-
nodl="--no-dynamic-linker"
233-
fi
229+
nodl="--no-dynamic-linker"
234230

235231
# suppress some warnings in recent ld versions
236232
nowarn="-z noexecstack"

arch/powerpc/include/asm/asm-const.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
#ifndef _ASM_POWERPC_ASM_CONST_H
22
#define _ASM_POWERPC_ASM_CONST_H
33

4-
#ifdef __ASSEMBLY__
4+
#ifdef __ASSEMBLER__
55
# define stringify_in_c(...) __VA_ARGS__
66
# define ASM_CONST(x) x
77
#else

arch/powerpc/include/asm/barrier.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@
77

88
#include <asm/asm-const.h>
99

10-
#ifndef __ASSEMBLY__
10+
#ifndef __ASSEMBLER__
1111
#include <asm/ppc-opcode.h>
1212
#endif
1313

0 commit comments

Comments
 (0)