Commit f2d64a2

Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (29 commits)
  perf/dwc_pcie: Fix use of uninitialized variable
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix of minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  perf/arm-cmn: Fix CMN S3 DTM offset
  perf: arm_spe: Prevent overflow in PERF_IDX2OFF()
  coresight: trbe: Prevent overflow in PERF_IDX2OFF()
  MAINTAINERS: Remove myself from HiSilicon PMU maintainers
  drivers/perf: hisi: Add support for HiSilicon MN PMU driver
  drivers/perf: hisi: Add support for HiSilicon NoC PMU
  perf: arm_pmuv3: Factor out PMCCNTR_EL0 use conditions
  arm64/boot: Enable EL2 requirements for SPE_FEAT_FDS
  arm64/boot: Factor out a macro to check SPE version
  ...
2 parents 77dfca7 + 2084660 commit f2d64a2

26 files changed

Lines changed: 2399 additions & 174 deletions

Documentation/admin-guide/perf/dwc_pcie_pmu.rst

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16,8 +16,8 @@ provides the following two features:
1616

1717
- one 64-bit counter for Time Based Analysis (RX/TX data throughput and
1818
time spent in each low-power LTSSM state) and
19-
- one 32-bit counter for Event Counting (error and non-error events for
20-
a specified lane)
19+
- one 32-bit counter per event for Event Counting (error and non-error
20+
events for a specified lane)
2121

2222
Note: There is no interrupt for counter overflow.
2323

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
1+
.. SPDX-License-Identifier: GPL-2.0-only
2+
3+
================================================
4+
Fujitsu Uncore Performance Monitoring Unit (PMU)
5+
================================================
6+
7+
This driver supports the Uncore MAC PMUs and the Uncore PCI PMUs found
8+
in Fujitsu chips.
9+
Each MAC PMU on these chips is exposed as an uncore perf PMU with device name
10+
mac_iod<iod>_mac<mac>_ch<ch>.
11+
And each PCI PMU on these chips is exposed as an uncore perf PMU with device name
12+
pci_iod<iod>_pci<pci>.
13+
14+
The driver provides a description of its available events and configuration
15+
options in sysfs, see /sys/bus/event_source/devices/mac_iod<iod>_mac<mac>_ch<ch>/
16+
and /sys/bus/event_source/devices/pci_iod<iod>_pci<pci>/.
17+
This driver exports:
18+
- formats, used by perf user space and other tools to configure events
19+
- events, used by perf user space and other tools to create events
20+
symbolically, e.g.:
21+
perf stat -a -e mac_iod0_mac0_ch0/event=0x21/ ls
22+
perf stat -a -e pci_iod0_pci0/event=0x24/ ls
23+
- cpumask, used by perf user space and other tools to know on which CPUs
24+
to open the events
25+
26+
This driver supports the following events for MAC:
27+
- cycles
28+
This event counts MAC cycles at MAC frequency.
29+
- read-count
30+
This event counts the number of read requests to MAC.
31+
- read-count-request
32+
This event counts the number of read requests including retry to MAC.
33+
- read-count-return
34+
This event counts the number of responses to read requests to MAC.
35+
- read-count-request-pftgt
36+
This event counts the number of read requests including retry with PFTGT
37+
flag.
38+
- read-count-request-normal
39+
This event counts the number of read requests including retry without PFTGT
40+
flag.
41+
- read-count-return-pftgt-hit
42+
This event counts the number of responses to read requests which hit the
43+
PFTGT buffer.
44+
- read-count-return-pftgt-miss
45+
This event counts the number of responses to read requests which miss the
46+
PFTGT buffer.
47+
- read-wait
48+
This event counts outstanding read requests issued by DDR memory controller
49+
per cycle.
50+
- write-count
51+
This event counts the number of write requests to MAC (including zero write,
52+
full write, partial write, write cancel).
53+
- write-count-write
54+
This event counts the number of full write requests to MAC (not including
55+
zero write).
56+
- write-count-pwrite
57+
This event counts the number of partial write requests to MAC.
58+
- memory-read-count
59+
This event counts the number of read requests from MAC to memory.
60+
- memory-write-count
61+
This event counts the number of full write requests from MAC to memory.
62+
- memory-pwrite-count
63+
This event counts the number of partial write requests from MAC to memory.
64+
- ea-mac
65+
This event counts energy consumption of MAC.
66+
- ea-memory
67+
This event counts energy consumption of memory.
68+
- ea-memory-mac-write
69+
This event counts the number of write requests from MAC to memory.
70+
- ea-ha
71+
This event counts energy consumption of HA.
72+
73+
'ea' is the abbreviation for 'Energy Analyzer'.
74+
75+
Examples for use with perf::
76+
77+
perf stat -e mac_iod0_mac0_ch0/ea-mac/ ls
78+
79+
And, this driver supports the following events for PCI:
80+
- pci-port0-cycles
81+
This event counts PCI cycles at PCI frequency in port0.
82+
- pci-port0-read-count
83+
This event counts read transactions for data transfer in port0.
84+
- pci-port0-read-count-bus
85+
This event counts read transactions for bus usage in port0.
86+
- pci-port0-write-count
87+
This event counts write transactions for data transfer in port0.
88+
- pci-port0-write-count-bus
89+
This event counts write transactions for bus usage in port0.
90+
- pci-port1-cycles
91+
This event counts PCI cycles at PCI frequency in port1.
92+
- pci-port1-read-count
93+
This event counts read transactions for data transfer in port1.
94+
- pci-port1-read-count-bus
95+
This event counts read transactions for bus usage in port1.
96+
- pci-port1-write-count
97+
This event counts write transactions for data transfer in port1.
98+
- pci-port1-write-count-bus
99+
This event counts write transactions for bus usage in port1.
100+
- ea-pci
101+
This event counts energy consumption of PCI.
102+
103+
'ea' is the abbreviation for 'Energy Analyzer'.
104+
105+
Examples for use with perf::
106+
107+
perf stat -e pci_iod0_pci0/ea-pci/ ls
108+
109+
Given that these are uncore PMUs the driver does not support sampling, therefore
110+
"perf record" will not work. Per-task perf sessions are not supported.

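As an illustration of the naming scheme in the file above, the following sketch builds a PMU device name, its sysfs directory, and the event string passed to `perf stat -e`. The helper names are hypothetical and not part of the driver or of perf, and the sysfs root is assumed to be the usual `/sys/bus/event_source/devices` location.

```python
# Hypothetical helpers for illustration; not part of the driver or perf.
SYSFS_ROOT = "/sys/bus/event_source/devices"  # assumed standard perf sysfs root


def mac_pmu_name(iod: int, mac: int, ch: int) -> str:
    """Device name of a MAC PMU: mac_iod<iod>_mac<mac>_ch<ch>."""
    return f"mac_iod{iod}_mac{mac}_ch{ch}"


def pci_pmu_name(iod: int, pci: int) -> str:
    """Device name of a PCI PMU: pci_iod<iod>_pci<pci>."""
    return f"pci_iod{iod}_pci{pci}"


def pmu_sysfs_dir(pmu: str) -> str:
    """Directory holding the PMU's format, events and cpumask entries."""
    return f"{SYSFS_ROOT}/{pmu}/"


def event_spec(pmu: str, event) -> str:
    """Event string for `perf stat -e`.

    `event` is either a symbolic name (str) such as "ea-mac",
    or a raw event code (int) such as 0x21.
    """
    if isinstance(event, int):
        return f"{pmu}/event={event:#x}/"
    return f"{pmu}/{event}/"


def perf_stat_argv(pmu: str, event, workload: list) -> list:
    """Argument vector for a system-wide count (sampling is not supported)."""
    return ["perf", "stat", "-a", "-e", event_spec(pmu, event), *workload]
```

For example, `event_spec(mac_pmu_name(0, 0, 0), 0x21)` yields the same event string as the first `perf stat` example in the documentation.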
Documentation/admin-guide/perf/hisi-pmu.rst

Lines changed: 47 additions & 2 deletions
@@ -18,9 +18,10 @@ HiSilicon SoC uncore PMU driver
1818
Each device PMU has separate registers for event counting, control and
1919
interrupt, and the PMU driver shall register perf PMU drivers like L3C,
2020
HHA and DDRC etc. The available events and configuration options shall
21-
be described in the sysfs, see:
21+
be described in the sysfs, see::
22+
23+
/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>
2224

23-
/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}/hha{Y}/ddrc{Y}>.
2425
The "perf list" command shall list the available events from sysfs.
2526

2627
Each L3C, HHA and DDRC is registered as a separate PMU with perf. The PMU
@@ -112,6 +113,50 @@ uring channel. It is 2 bits. Some important codes are as follows:
112113
- 2'b00: default value, count the events which sent to the both uring and
113114
uring_ext channel;
114115

116+
6. ch: NoC PMU supports filtering the event counts of certain transaction
117+
channel with this option. The current supported channels are as follows:
118+
119+
- 3'b010: Request channel
120+
- 3'b100: Snoop channel
121+
- 3'b110: Response channel
122+
- 3'b111: Data channel
123+
124+
7. tt_en: NoC PMU supports counting only transactions that have tracetag set
125+
if this option is set. See the 2nd list for more information about tracetag.
126+
127+
For HiSilicon uncore PMU v3 whose identifier is 0x40, some uncore PMUs are
128+
further divided into parts for finer granularity of tracing, each part has its
129+
own dedicated PMU, and all such PMUs together cover the monitoring job of events
130+
on particular uncore device. Such PMUs are described in sysfs with name format
131+
slightly changed::
132+
133+
/sys/bus/event_source/devices/hisi_sccl{X}_<l3c{Y}_{Z}/ddrc{Y}_{Z}/noc{Y}_{Z}>
134+
135+
Z is the sub-id, indicating different PMUs for part of hardware device.
136+
137+
Usage of most PMUs with different sub-ids are identical. Specially, L3C PMU
138+
provides ``ext`` option to allow exploration of even finer granular statistics
139+
of L3C PMU. L3C PMU driver uses that as hint of termination when delivering
140+
perf command to hardware:
141+
142+
- ext=0: Default, could be used with event names.
143+
- ext=1 and ext=2: Must be used with event codes, event names are not supported.
144+
145+
An example of perf command could be::
146+
147+
$# perf stat -a -e hisi_sccl0_l3c1_0/rd_spipe/ sleep 5
148+
149+
or::
150+
151+
$# perf stat -a -e hisi_sccl0_l3c1_0/event=0x1,ext=1/ sleep 5
152+
153+
As above, ``hisi_sccl0_l3c1_0`` locates PMU of Super CPU Cluster 0, L3 cache 1
154+
pipe0.
155+
156+
First command locates the first part of L3C since ``ext=0`` is implied by
157+
default. Second command issues the counting on another part of L3C with the
158+
event ``0x1``.
159+
115160
Users could configure IDs to count data come from specific CCL/ICL, by setting
116161
srcid_cmd & srcid_msk, and data destined for specific CCL/ICL by setting
117162
tgtid_cmd & tgtid_msk. A set bit in srcid_msk/tgtid_msk means the PMU will not
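The v3 naming and the ``ext`` rule added in this hunk can be sketched as follows. This is only an illustration: the function names are hypothetical, the ``ext`` range check assumes that the three listed values are the only valid ones, and the ``ch=`` term assumes the NoC filter is passed like any other perf format field.

```python
# Hypothetical helpers for illustration; not part of the kernel or perf.

# NoC "ch" filter encodings, copied from the list in the hunk above.
NOC_CHANNELS = {
    "request": 0b010,
    "snoop": 0b100,
    "response": 0b110,
    "data": 0b111,
}


def hisi_v3_pmu_name(sccl: int, device: str, index: int, sub_id: int) -> str:
    """Name such as hisi_sccl0_l3c1_0; device is 'l3c', 'ddrc' or 'noc'."""
    return f"hisi_sccl{sccl}_{device}{index}_{sub_id}"


def l3c_event_spec(pmu: str, event, ext: int = 0) -> str:
    """Event string for an L3C PMU.

    ext=0 may be used with event names; ext=1 and ext=2 need an event code.
    """
    if ext not in (0, 1, 2):  # assumption: only the documented values are valid
        raise ValueError("ext must be 0, 1 or 2")
    if isinstance(event, str):
        if ext != 0:
            raise ValueError("event names are only supported with ext=0")
        return f"{pmu}/{event}/"
    spec = f"{pmu}/event={event:#x}"
    if ext:
        spec += f",ext={ext}"
    return spec + "/"


def noc_event_spec(pmu: str, event: int, channel: str) -> str:
    """Event string for a NoC PMU restricted to one transaction channel."""
    return f"{pmu}/event={event:#x},ch={NOC_CHANNELS[channel]:#x}/"
```

With these helpers, `l3c_event_spec(hisi_v3_pmu_name(0, "l3c", 1, 0), 0x1, ext=1)` reproduces the event string of the second `perf stat` example in the hunk.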

Documentation/admin-guide/perf/index.rst

Lines changed: 1 addition & 0 deletions
@@ -29,3 +29,4 @@ Performance monitor support
2929
cxl
3030
ampere_cspmu
3131
mrvl-pem-pmu
32+
fujitsu_uncore_pmu

Documentation/arch/arm64/booting.rst

Lines changed: 11 additions & 0 deletions
@@ -466,6 +466,17 @@ Before jumping into the kernel, the following conditions must be met:
466466
- HDFGWTR2_EL2.nPMICFILTR_EL0 (bit 3) must be initialised to 0b1.
467467
- HDFGWTR2_EL2.nPMUACR_EL1 (bit 4) must be initialised to 0b1.
468468

469+
For CPUs with SPE data source filtering (FEAT_SPE_FDS):
470+
471+
- If EL3 is present:
472+
473+
- MDCR_EL3.EnPMS3 (bit 42) must be initialised to 0b1.
474+
475+
- If the kernel is entered at EL1 and EL2 is present:
476+
477+
- HDFGRTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
478+
- HDFGWTR2_EL2.nPMSDSFR_EL1 (bit 19) must be initialised to 0b1.
479+
469480
For CPUs with Memory Copy and Memory Set instructions (FEAT_MOPS):
470481

471482
- If the kernel is entered at EL1 and EL2 is present:
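The FEAT_SPE_FDS conditions added above amount to three bit tests. A minimal sketch of such a check, using the bit positions stated in the hunk, could look like this; the function and its parameters are hypothetical and only model the rule, they do not read real system registers.

```python
# Bit positions are taken from the booting.rst hunk above.
MDCR_EL3_EnPMS3 = 1 << 42
HDFGRTR2_EL2_nPMSDSFR_EL1 = 1 << 19
HDFGWTR2_EL2_nPMSDSFR_EL1 = 1 << 19


def spe_fds_boot_violations(*, el3_present: bool, el2_present: bool,
                            entered_at_el1: bool, mdcr_el3: int = 0,
                            hdfgrtr2_el2: int = 0, hdfgwtr2_el2: int = 0) -> list:
    """Return the unmet FEAT_SPE_FDS boot conditions (empty if all hold).

    Hypothetical model: register values are passed in as plain integers.
    """
    problems = []
    if el3_present and not (mdcr_el3 & MDCR_EL3_EnPMS3):
        problems.append("MDCR_EL3.EnPMS3")
    if entered_at_el1 and el2_present:
        if not (hdfgrtr2_el2 & HDFGRTR2_EL2_nPMSDSFR_EL1):
            problems.append("HDFGRTR2_EL2.nPMSDSFR_EL1")
        if not (hdfgwtr2_el2 & HDFGWTR2_EL2_nPMSDSFR_EL1):
            problems.append("HDFGWTR2_EL2.nPMSDSFR_EL1")
    return problems
```

The EL2 fine-grained trap bits are only checked when the kernel is entered at EL1, since a kernel entered at EL2 programs them itself.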

Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ properties:
3333
- items:
3434
- enum:
3535
- fsl,imx91-ddr-pmu
36+
- fsl,imx94-ddr-pmu
3637
- fsl,imx95-ddr-pmu
3738
- const: fsl,imx93-ddr-pmu
3839

MAINTAINERS

Lines changed: 3 additions & 1 deletion
@@ -9744,11 +9744,14 @@ F: drivers/video/fbdev/imxfb.c
97449744

97459745
FREESCALE IMX DDR PMU DRIVER
97469746
M: Frank Li <Frank.li@nxp.com>
9747+
M: Xu Yang <xu.yang_2@nxp.com>
97479748
L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
97489749
S: Maintained
97499750
F: Documentation/admin-guide/perf/imx-ddr.rst
97509751
F: Documentation/devicetree/bindings/perf/fsl-imx-ddr.yaml
97519752
F: drivers/perf/fsl_imx8_ddr_perf.c
9753+
F: drivers/perf/fsl_imx9_ddr_perf.c
9754+
F: tools/perf/pmu-events/arch/arm64/freescale/
97529755

97539756
FREESCALE IMX I2C DRIVER
97549757
M: Oleksij Rempel <o.rempel@pengutronix.de>
@@ -11059,7 +11062,6 @@ F: Documentation/devicetree/bindings/net/hisilicon*.txt
1105911062
F: drivers/net/ethernet/hisilicon/
1106011063

1106111064
HISILICON PMU DRIVER
11062-
M: Yicong Yang <yangyicong@hisilicon.com>
1106311065
M: Jonathan Cameron <jonathan.cameron@huawei.com>
1106411066
S: Supported
1106511067
W: http://www.hisilicon.com

arch/arm64/include/asm/el2_setup.h

Lines changed: 22 additions & 6 deletions
@@ -91,6 +91,14 @@
9191
msr cntvoff_el2, xzr // Clear virtual offset
9292
.endm
9393

94+
/* Branch to skip_label if SPE version is less than given version */
95+
.macro __spe_vers_imp skip_label, version, tmp
96+
mrs \tmp, id_aa64dfr0_el1
97+
ubfx \tmp, \tmp, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
98+
cmp \tmp, \version
99+
b.lt \skip_label
100+
.endm
101+
94102
.macro __init_el2_debug
95103
mrs x1, id_aa64dfr0_el1
96104
ubfx x0, x1, #ID_AA64DFR0_EL1_PMUVer_SHIFT, #4
@@ -103,8 +111,7 @@
103111
csel x2, xzr, x0, eq // all PMU counters from EL1
104112

105113
/* Statistical profiling */
106-
ubfx x0, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
107-
cbz x0, .Lskip_spe_\@ // Skip if SPE not present
114+
__spe_vers_imp .Lskip_spe_\@, ID_AA64DFR0_EL1_PMSVer_IMP, x0 // Skip if SPE not present
108115

109116
mrs_s x0, SYS_PMBIDR_EL1 // If SPE available at EL2,
110117
and x0, x0, #(1 << PMBIDR_EL1_P_SHIFT)
@@ -263,10 +270,8 @@
263270

264271
mov x0, xzr
265272
mov x2, xzr
266-
mrs x1, id_aa64dfr0_el1
267-
ubfx x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
268-
cmp x1, #3
269-
b.lt .Lskip_spe_fgt_\@
273+
/* If SPEv1p2 is implemented, */
274+
__spe_vers_imp .Lskip_spe_fgt_\@, #ID_AA64DFR0_EL1_PMSVer_V1P2, x1
270275
/* Disable PMSNEVFR_EL1 read and write traps */
271276
orr x0, x0, #HDFGRTR_EL2_nPMSNEVFR_EL1_MASK
272277
orr x2, x2, #HDFGWTR_EL2_nPMSNEVFR_EL1_MASK
@@ -387,6 +392,17 @@
387392
orr x0, x0, #HDFGRTR2_EL2_nPMICFILTR_EL0
388393
orr x0, x0, #HDFGRTR2_EL2_nPMUACR_EL1
389394
.Lskip_pmuv3p9_\@:
395+
/* If SPE is implemented, */
396+
__spe_vers_imp .Lskip_spefds_\@, ID_AA64DFR0_EL1_PMSVer_IMP, x1
397+
/* we can read PMSIDR and */
398+
mrs_s x1, SYS_PMSIDR_EL1
399+
and x1, x1, #PMSIDR_EL1_FDS
400+
/* if FEAT_SPE_FDS is implemented, */
401+
cbz x1, .Lskip_spefds_\@
402+
/* disable traps of PMSDSFR to EL2. */
403+
orr x0, x0, #HDFGRTR2_EL2_nPMSDSFR_EL1
404+
405+
.Lskip_spefds_\@:
390406
msr_s SYS_HDFGRTR2_EL2, x0
391407
msr_s SYS_HDFGWTR2_EL2, x0
392408
msr_s SYS_HFGRTR2_EL2, xzr
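The control flow of the new `__spe_vers_imp` macro and of the FEAT_SPE_FDS block can be modelled in a few lines. This is a sketch under stated assumptions: the PMSVer field position (bits [35:32]) and the value of `ID_AA64DFR0_EL1_PMSVer_IMP` (1) are assumed rather than shown in the diff, the V1P2 value of 3 is taken from the removed `cmp x1, #3`, and the FDS flag is passed in as a boolean instead of being read from PMSIDR_EL1.

```python
PMSVER_SHIFT = 32   # assumption: ID_AA64DFR0_EL1.PMSVer occupies bits [35:32]
PMSVER_IMP = 1      # assumption: lowest value meaning "SPE implemented"
PMSVER_V1P2 = 3     # matches the literal in the removed `cmp x1, #3`
HDFGRTR2_EL2_nPMSDSFR_EL1 = 1 << 19  # bit 19, per the booting.rst hunk


def pmsver(id_aa64dfr0_el1: int) -> int:
    """Model of `ubfx tmp, tmp, #PMSVer_SHIFT, #4`: extract a 4-bit field."""
    return (id_aa64dfr0_el1 >> PMSVER_SHIFT) & 0xF


def spe_vers_imp_skips(id_aa64dfr0_el1: int, version: int) -> bool:
    """Model of `cmp tmp, version; b.lt skip_label`: True if the branch is taken."""
    return pmsver(id_aa64dfr0_el1) < version


def fds_trap_bits(x0: int, id_aa64dfr0_el1: int, pmsidr_fds_set: bool) -> int:
    """Model of the block between .Lskip_pmuv3p9 and .Lskip_spefds.

    Returns the value later written to both HDFGRTR2_EL2 and HDFGWTR2_EL2.
    """
    if spe_vers_imp_skips(id_aa64dfr0_el1, PMSVER_IMP):
        return x0  # SPE not implemented: PMSIDR_EL1 must not be read
    if not pmsidr_fds_set:
        return x0  # FEAT_SPE_FDS not implemented
    return x0 | HDFGRTR2_EL2_nPMSDSFR_EL1  # disable PMSDSFR_EL1 traps to EL2
```

Because the extracted field is at most 15, the signed `b.lt` comparison in the macro behaves the same as the unsigned `<` used here.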

arch/arm64/include/asm/sysreg.h

Lines changed: 0 additions & 9 deletions
@@ -344,15 +344,6 @@
344344
#define SYS_PAR_EL1_ATTR GENMASK_ULL(63, 56)
345345
#define SYS_PAR_EL1_F0_RES0 (GENMASK_ULL(6, 1) | GENMASK_ULL(55, 52))
346346

347-
/*** Statistical Profiling Extension ***/
348-
#define PMSEVFR_EL1_RES0_IMP \
349-
(GENMASK_ULL(47, 32) | GENMASK_ULL(23, 16) | GENMASK_ULL(11, 8) |\
350-
BIT_ULL(6) | BIT_ULL(4) | BIT_ULL(2) | BIT_ULL(0))
351-
#define PMSEVFR_EL1_RES0_V1P1 \
352-
(PMSEVFR_EL1_RES0_IMP & ~(BIT_ULL(18) | BIT_ULL(17) | BIT_ULL(11)))
353-
#define PMSEVFR_EL1_RES0_V1P2 \
354-
(PMSEVFR_EL1_RES0_V1P1 & ~BIT_ULL(6))
355-
356347
/* Buffer error reporting */
357348
#define PMBSR_EL1_FAULT_FSC_SHIFT PMBSR_EL1_MSS_SHIFT
358349
#define PMBSR_EL1_FAULT_FSC_MASK PMBSR_EL1_MSS_MASK

arch/arm64/tools/sysreg

Lines changed: 11 additions & 2 deletions
@@ -2994,11 +2994,20 @@ Field 0 RND
29942994
EndSysreg
29952995

29962996
Sysreg PMSFCR_EL1 3 0 9 9 4
2997-
Res0 63:19
2997+
Res0 63:53
2998+
Field 52 SIMDm
2999+
Field 51 FPm
3000+
Field 50 STm
3001+
Field 49 LDm
3002+
Field 48 Bm
3003+
Res0 47:21
3004+
Field 20 SIMD
3005+
Field 19 FP
29983006
Field 18 ST
29993007
Field 17 LD
30003008
Field 16 B
3001-
Res0 15:4
3009+
Res0 15:5
3010+
Field 4 FDS
30023011
Field 3 FnE
30033012
Field 2 FL
30043013
Field 1 FT
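As a consistency check on the PMSFCR_EL1 layout above, the sketch below rebuilds the field and Res0 masks from the hunk and derives which bits the change newly defines. The helper `genmask` is a local stand-in for the kernel's `GENMASK_ULL`, and bit 0 is left out because its definition lies outside the visible hunk.

```python
def genmask(hi: int, lo: int) -> int:
    """Local stand-in for GENMASK_ULL(hi, lo): bits hi..lo set, inclusive."""
    return ((1 << (hi - lo + 1)) - 1) << lo


# Single-bit fields after the change, as listed in the hunk (bit 0 not shown).
NEW_FIELDS = {
    "SIMDm": 52, "FPm": 51, "STm": 50, "LDm": 49, "Bm": 48,
    "SIMD": 20, "FP": 19, "ST": 18, "LD": 17, "B": 16,
    "FDS": 4, "FnE": 3, "FL": 2, "FT": 1,
}

OLD_RES0 = genmask(63, 19) | genmask(15, 4)
NEW_RES0 = genmask(63, 53) | genmask(47, 21) | genmask(15, 5)

field_mask = 0
for bit in NEW_FIELDS.values():
    field_mask |= 1 << bit

# Bits that were reserved before and carry a field now.
newly_defined = OLD_RES0 & ~NEW_RES0
newly_defined_names = sorted(
    name for name, bit in NEW_FIELDS.items() if newly_defined & (1 << bit)
)
```

The new fields and the new Res0 ranges do not overlap, and together they cover every bit from 63 down to 1.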
