Commit 9b9e211

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:

 - KCSAN enabled for arm64.
 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.
 - Some more SVE clean-ups and refactoring in preparation for SME support (scalable matrix extensions).
 - BTI clean-ups (SYM_FUNC macros etc.)
 - arm64 atomics clean-up and codegen improvements.
 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and FEAT_RPRES (increased precision of reciprocal estimate and reciprocal square root estimate).
 - Use SHA3 instructions to speed up XOR.
 - arm64 unwind code refactoring/unification.
 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP == 1 (potentially set by a hypervisor; user-space already does this).
 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU, Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.
 - Other fixes and clean-ups; highlights: fix the handling of erratum 1418040, correct the calculation of the nomap region boundaries, introduce io_stop_wc() mapped to the new DGH instruction (data gathering hint).

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits)
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64: Enable KCSAN
  kselftest/arm64: Add pidbench for floating point syscall cases
  arm64/fp: Add comments documenting the usage of state restore functions
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME
  ...
2 parents a7ac314 + 945409a commit 9b9e211

86 files changed

Lines changed: 4184 additions & 1070 deletions

Lines changed: 106 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,106 @@
1+
================================================
2+
HiSilicon PCIe Performance Monitoring Unit (PMU)
3+
================================================
4+
5+
On Hip09, the HiSilicon PCIe Performance Monitoring Unit (PMU) can monitor
6+
bandwidth, latency, bus utilization and buffer occupancy data of PCIe.
7+
8+
Each PCIe Core has a PMU to monitor multiple Root Ports of this PCIe Core and
9+
all Endpoints downstream of these Root Ports.
10+
11+
12+
HiSilicon PCIe PMU driver
13+
=========================
14+
15+
The PCIe PMU driver registers a perf PMU with the name of its sicl-id and PCIe
16+
Core id.::
17+
18+
/sys/bus/event_source/devices/hisi_pcie<sicl>_<core>
19+
20+
The PMU driver provides descriptions of available events and filter options in sysfs,
21+
see /sys/bus/event_source/devices/hisi_pcie<sicl>_<core>.
22+
23+
The "format" directory describes all formats of the config (events) and config1
24+
(filter options) fields of the perf_event_attr structure. The "events" directory
25+
describes all documented events shown in perf list.
26+
27+
The "identifier" sysfs file allows users to identify the version of the
28+
PMU hardware device.
29+
30+
The "bus" sysfs file allows users to get the bus number of Root Ports
31+
monitored by PMU.
32+
33+
Example usage of perf::
34+
35+
$# perf list
36+
hisi_pcie0_0/rx_mwr_latency/ [kernel PMU event]
37+
hisi_pcie0_0/rx_mwr_cnt/ [kernel PMU event]
38+
------------------------------------------
39+
40+
$# perf stat -e hisi_pcie0_0/rx_mwr_latency/
41+
$# perf stat -e hisi_pcie0_0/rx_mwr_cnt/
42+
$# perf stat -g -e hisi_pcie0_0/rx_mwr_latency/ -e hisi_pcie0_0/rx_mwr_cnt/
43+
44+
The current driver does not support sampling, so "perf record" is unsupported.
45+
Attaching to a task is also unsupported for the PCIe PMU.
46+
47+
Filter options
48+
--------------
49+
50+
1. Target filter
51+
The PMU can only monitor the performance of traffic downstream of the target
52+
Root Ports or downstream of the target Endpoint. The PCIe PMU driver supports
53+
the "port" and "bdf" interfaces for users, and these two interfaces cannot be
54+
used at the same time.
55+
56+
-port
57+
The "port" filter can be used in all PCIe PMU events; the target Root Port is
58+
selected by configuring the 16-bit bitmap "port". Multiple ports can be selected
59+
for AP-layer events, and only one port can be selected for TL/DL-layer events.
60+
61+
For example, if target Root Port is 0000:00:00.0 (x8 lanes), bit0 of bitmap
62+
should be set, port=0x1; if target Root Port is 0000:00:04.0 (x4 lanes),
63+
bit8 is set, port=0x100; if these two Root Ports are both monitored, port=0x101.
64+
65+
Example usage of perf::
66+
67+
$# perf stat -e hisi_pcie0_0/rx_mwr_latency,port=0x1/ sleep 5
68+
69+
-bdf
70+
71+
The "bdf" filter can only be used in bandwidth events; the target Endpoint is
72+
selected by writing its BDF to "bdf". The counter only counts the bandwidth of
73+
messages requested by the target Endpoint.
74+
75+
For example, "bdf=0x3900" means BDF of target Endpoint is 0000:39:00.0.
76+
77+
Example usage of perf::
78+
79+
$# perf stat -e hisi_pcie0_0/rx_mrd_flux,bdf=0x3900/ sleep 5
80+
81+
2. Trigger filter
82+
Event statistics start the first time the TLP length is greater/smaller
83+
than the trigger condition. You can set the trigger condition by writing "trig_len",
84+
and set the trigger mode by writing "trig_mode". This filter can only be used
85+
in bandwidth events.
86+
87+
For example, "trig_len=4" means trigger condition is 2^4 DW, "trig_mode=0"
88+
means statistics start when TLP length > trigger condition, "trig_mode=1"
89+
means start when TLP length < condition.
90+
91+
Example usage of perf::
92+
93+
$# perf stat -e hisi_pcie0_0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5
94+
95+
3. Threshold filter
96+
The counter counts when the TLP length is within the specified range. You can set the
97+
threshold by writing "thr_len", and set the threshold mode by writing
98+
"thr_mode". This filter can only be used in bandwidth events.
99+
100+
For example, "thr_len=4" means threshold is 2^4 DW, "thr_mode=0" means
101+
counter counts when TLP length >= threshold, and "thr_mode=1" means counts
102+
when TLP length < threshold.
103+
104+
Example usage of perf::
105+
106+
$# perf stat -e hisi_pcie0_0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5
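
Reviewer note: a minimal sketch of the arithmetic behind the filter values documented above, for readers composing their own event strings. The helper names are hypothetical and belong neither to the driver nor to perf. The "bdf" packing assumes the conventional PCI layout (bus in bits 15:8, device in bits 7:3, function in bits 2:0), which is consistent with the 0x3900 example but is not spelled out in the document.

```python
def port_bitmap(*bits):
    """Build the 16-bit "port" bitmap from Root Port bit positions."""
    mask = 0
    for bit in bits:
        if not 0 <= bit < 16:
            raise ValueError(f"port bit {bit} is outside the 16-bit bitmap")
        mask |= 1 << bit
    return mask


def bdf_value(bus, device, function):
    """Pack bus/device/function, assuming the conventional PCI BDF layout."""
    return (bus << 8) | (device << 3) | function


def length_in_dw(exponent):
    """trig_len/thr_len hold an exponent: the length is 2^exponent DW."""
    return 1 << exponent


print(hex(port_bitmap(0, 8)))       # 0x101
print(hex(bdf_value(0x39, 0, 0)))   # 0x3900
print(length_in_dw(4))              # 16
```

The printed values correspond to the documented examples: port=0x101 for the two Root Ports, bdf=0x3900 for Endpoint 0000:39:00.0, and a 16 DW condition for trig_len=4 or thr_len=4.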

Documentation/admin-guide/sysctl/kernel.rst

Lines changed: 11 additions & 0 deletions
@@ -905,6 +905,17 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
905905
The default value is 8.
906906

907907

908+
perf_user_access (arm64 only)
909+
=================================
910+
911+
Controls user space access for reading perf event counters. When set to 1,
912+
user space can read performance monitor counter registers directly.
913+
914+
The default value is 0 (access disabled).
915+
916+
See Documentation/arm64/perf.rst for more information.
917+
918+
908919
pid_max
909920
=======
910921
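
Reviewer note: a small sketch of how a tool might check the new sysctl before attempting direct counter reads. The function name is hypothetical, and treating a missing file as "disabled" is an assumption of this sketch (the file is only expected on arm64 kernels that carry this change).

```python
from pathlib import Path

SYSCTL = Path("/proc/sys/kernel/perf_user_access")


def user_access_enabled(path=SYSCTL):
    """Return True only if the sysctl file exists and contains 1."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        # Missing or unreadable file: treated as disabled in this sketch.
        return False


print(user_access_enabled())
```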

Documentation/arm64/cpu-feature-registers.rst

Lines changed: 17 additions & 0 deletions
@@ -275,6 +275,23 @@ infrastructure:
275275
| SVEVer | [3-0] | y |
276276
+------------------------------+---------+---------+
277277

278+
8) ID_AA64MMFR1_EL1 - Memory model feature register 1
279+
280+
+------------------------------+---------+---------+
281+
| Name | bits | visible |
282+
+------------------------------+---------+---------+
283+
| AFP | [47-44] | y |
284+
+------------------------------+---------+---------+
285+
286+
9) ID_AA64ISAR2_EL1 - Instruction set attribute register 2
287+
288+
+------------------------------+---------+---------+
289+
| Name | bits | visible |
290+
+------------------------------+---------+---------+
291+
| RPRES | [7-4] | y |
292+
+------------------------------+---------+---------+
293+
294+
278295
Appendix I: Example
279296
-------------------
280297
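
Reviewer note: the bit ranges in the two new tables ([47-44] for AFP, [7-4] for RPRES) translate into a shift and a mask. The sketch below shows only that arithmetic; the helper name and the raw register values are made up for illustration, and obtaining the real register contents is outside its scope.

```python
def extract_field(reg_value, msb, lsb):
    """Return the unsigned field reg_value[msb:lsb]."""
    width = msb - lsb + 1
    return (reg_value >> lsb) & ((1 << width) - 1)


# Hypothetical raw register values, chosen only to exercise the arithmetic.
mmfr1 = 0x1 << 44   # AFP field set to 0b0001
isar2 = 0x1 << 4    # RPRES field set to 0b0001

print(extract_field(mmfr1, 47, 44))  # 1
print(extract_field(isar2, 7, 4))    # 1
```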

Documentation/arm64/elf_hwcaps.rst

Lines changed: 8 additions & 0 deletions
@@ -251,6 +251,14 @@ HWCAP2_ECV
251251

252252
Functionality implied by ID_AA64MMFR0_EL1.ECV == 0b0001.
253253

254+
HWCAP2_AFP
255+
256+
Functionality implied by ID_AA64MMFR1_EL1.AFP == 0b0001.
257+
258+
HWCAP2_RPRES
259+
260+
Functionality implied by ID_AA64ISAR2_EL1.RPRES == 0b0001.
261+
254262
4. Unused AT_HWCAP bits
255263
-----------------------
256264
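
Reviewer note: a sketch of how user space could test for the two new hwcaps by scanning an auxiliary vector image. The bit positions used for HWCAP2_AFP and HWCAP2_RPRES are assumptions that should be checked against the uapi hwcap header of the target kernel, and the helper names are hypothetical. The parser assumes a 64-bit auxv made of (key, value) pairs.

```python
import struct

AT_HWCAP2 = 26          # auxv key for the second hwcap word
HWCAP2_AFP = 1 << 20    # assumed bit position, verify against the uapi header
HWCAP2_RPRES = 1 << 21  # assumed bit position, verify against the uapi header


def hwcap2_from_auxv(data):
    """Scan a 64-bit auxv image (key/value pairs) for AT_HWCAP2."""
    for key, value in struct.iter_unpack("=QQ", data):
        if key == AT_HWCAP2:
            return value
    return 0


def describe(hwcap2):
    """Report which of the two new hwcaps are set."""
    return {
        "afp": bool(hwcap2 & HWCAP2_AFP),
        "rpres": bool(hwcap2 & HWCAP2_RPRES),
    }


# Synthetic auxv image: AT_HWCAP2 with only the AFP bit set, then AT_NULL.
sample = struct.pack("=QQQQ", AT_HWCAP2, HWCAP2_AFP, 0, 0)
print(describe(hwcap2_from_auxv(sample)))  # {'afp': True, 'rpres': False}
```

On an arm64 Linux system the same function could be fed the contents of /proc/self/auxv; C programs would normally call getauxval(AT_HWCAP2) instead.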

Documentation/arm64/perf.rst

Lines changed: 77 additions & 1 deletion
@@ -2,7 +2,10 @@
22
33
.. _perf_index:
44

5-
=====================
5+
====
6+
Perf
7+
====
8+
69
Perf Event Attributes
710
=====================
811

@@ -88,3 +91,76 @@ exclude_host. However when using !exclude_hv there is a small blackout
8891
window at the guest entry/exit where host events are not captured.
8992

9093
On VHE systems there are no blackout windows.
94+
95+
Perf Userspace PMU Hardware Counter Access
96+
==========================================
97+
98+
Overview
99+
--------
100+
The perf userspace tool relies on the PMU to monitor events. It offers an
101+
abstraction layer over the hardware counters since the underlying
102+
implementation is cpu-dependent.
103+
Arm64 allows userspace tools to have access to the registers storing the
104+
hardware counters' values directly.
105+
106+
This targets specifically self-monitoring tasks in order to reduce the overhead
107+
by directly accessing the registers without having to go through the kernel.
108+
109+
How-to
110+
------
111+
The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu
112+
registers is enabled and that the userspace has access to the relevant
113+
information in order to use them.
114+
115+
In order to have access to the hardware counters, the global sysctl
116+
kernel/perf_user_access must first be enabled:
117+
118+
.. code-block:: sh
119+
120+
echo 1 > /proc/sys/kernel/perf_user_access
121+
122+
It is necessary to open the event using the perf tool interface with config1:1
123+
attr bit set: the sys_perf_event_open syscall returns a fd which can
124+
subsequently be used with the mmap syscall in order to retrieve a page of memory
125+
containing information about the event. The PMU driver uses this page to expose
126+
to the user the hardware counter's index and other necessary data. Using this
127+
index enables the user to access the PMU registers using the `mrs` instruction.
128+
Access to the PMU registers is only valid while the sequence lock is unchanged.
129+
In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is
130+
changed.
131+
132+
The userspace access is supported in libperf using the perf_evsel__mmap()
133+
and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for
134+
an example.
135+
136+
About heterogeneous systems
137+
---------------------------
138+
On heterogeneous systems such as big.LITTLE, userspace PMU counter access can
139+
only be enabled when the tasks are pinned to a homogeneous subset of cores and
140+
the corresponding PMU instance is opened by specifying the 'type' attribute.
141+
The use of generic event types is not supported in this case.
142+
143+
Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It
144+
can be run using the perf tool to check that the access to the registers works
145+
correctly from userspace:
146+
147+
.. code-block:: sh
148+
149+
perf test -v user
150+
151+
About chained events and counter sizes
152+
--------------------------------------
153+
The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1)
154+
counter along with userspace access. The sys_perf_event_open syscall will fail
155+
if a 64-bit counter is requested and the hardware doesn't support 64-bit
156+
counters. Chained events are not supported in conjunction with userspace counter
157+
access. If a 32-bit counter is requested on hardware with 64-bit counters, then
158+
userspace must treat the upper 32-bits read from the counter as UNKNOWN. The
159+
'pmc_width' field in the user page will indicate the valid width of the counter
160+
and should be used to mask the upper bits as needed.
161+
162+
.. Links
163+
.. _tools/perf/arch/arm64/tests/user-events.c:
164+
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
165+
.. _tools/lib/perf/tests/test-evsel.c:
166+
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
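
Reviewer note: the new perf.rst text describes two rules for user-space readers: a read is only valid if the sequence lock did not change around it, and the raw value must be masked to 'pmc_width' bits. The sketch below models both with plain callables in place of the mmap'd user page and the `mrs` read, so it runs anywhere; the function names are hypothetical, and real code should follow the libperf example referenced in the document.

```python
def read_stable(read_lock, read_counter):
    """Retry until the sequence lock is unchanged across the counter read."""
    while True:
        seq = read_lock()
        value = read_counter()
        if read_lock() == seq:
            return value


def mask_counter(raw, pmc_width):
    """Keep only the low pmc_width bits; the upper bits are UNKNOWN."""
    return raw & ((1 << pmc_width) - 1)


# Simulated reads: the lock changes during the first attempt, so it is retried.
locks = iter([1, 2, 2, 2])
raws = iter([0xAAAA_0000_0000_0001, 0xBBBB_0000_0000_0010])
raw = read_stable(lambda: next(locks), lambda: next(raws))
print(hex(mask_counter(raw, 32)))  # 0x10
```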

Documentation/arm64/sve.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -255,7 +255,7 @@ prctl(PR_SVE_GET_VL)
255255
vector length change (which would only normally be the case between a
256256
fork() or vfork() and the corresponding execve() in typical use).
257257

258-
To extract the vector length from the result, and it with
258+
To extract the vector length from the result, bitwise and it with
259259
PR_SVE_VL_LEN_MASK.
260260

261261
Return value: a nonnegative value on success, or a negative value on error:
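
Reviewer note: the reworded sentence asks callers to bitwise-AND the result with PR_SVE_VL_LEN_MASK. A sketch of that decoding follows; the numeric constants are assumptions that should be checked against the uapi prctl header, and the function name is hypothetical.

```python
PR_SVE_VL_LEN_MASK = 0xFFFF   # assumed to match the uapi header
PR_SVE_VL_INHERIT = 1 << 17   # assumed to match the uapi header


def decode_sve_get_vl(result):
    """Split a PR_SVE_GET_VL result into (vector length, inherit flag)."""
    if result < 0:
        raise OSError(f"prctl(PR_SVE_GET_VL) failed: {result}")
    return result & PR_SVE_VL_LEN_MASK, bool(result & PR_SVE_VL_INHERIT)


print(decode_sve_get_vl(PR_SVE_VL_INHERIT | 32))  # (32, True)
```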

Documentation/arm64/tagged-address-abi.rst

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ how the user addresses are used by the kernel:
4949

5050
- ``brk()``, ``mmap()`` and the ``new_address`` argument to
5151
``mremap()`` as these have the potential to alias with existing
52-
user addresses.
52+
user addresses.
5353

5454
NOTE: This behaviour changed in v5.6 and so some earlier kernels may
5555
incorrectly accept valid tagged pointers for the ``brk()``,

Documentation/devicetree/bindings/perf/arm,cmn.yaml

Lines changed: 16 additions & 5 deletions
@@ -12,12 +12,14 @@ maintainers:
1212

1313
properties:
1414
compatible:
15-
const: arm,cmn-600
15+
enum:
16+
- arm,cmn-600
17+
- arm,ci-700
1618

1719
reg:
1820
items:
1921
- description: Physical address of the base (PERIPHBASE) and
20-
size (up to 64MB) of the configuration address space.
22+
size of the configuration address space.
2123

2224
interrupts:
2325
minItems: 1
@@ -31,14 +33,23 @@ properties:
3133

3234
arm,root-node:
3335
$ref: /schemas/types.yaml#/definitions/uint32
34-
description: Offset from PERIPHBASE of the configuration
35-
discovery node (see TRM definition of ROOTNODEBASE).
36+
description: Offset from PERIPHBASE of CMN-600's configuration
37+
discovery node (see TRM definition of ROOTNODEBASE). Not
38+
relevant for newer CMN/CI products.
3639

3740
required:
3841
- compatible
3942
- reg
4043
- interrupts
41-
- arm,root-node
44+
45+
if:
46+
properties:
47+
compatible:
48+
contains:
49+
const: arm,cmn-600
50+
then:
51+
required:
52+
- arm,root-node
4253

4354
additionalProperties: false
4455
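
Reviewer note: the binding change makes arm,root-node required only when the compatible contains arm,cmn-600. The sketch below restates that rule on a node modelled as a plain dict; it is not how dt-schema validates bindings, and the helper name and the property values are hypothetical.

```python
BASE_REQUIRED = ("compatible", "reg", "interrupts")


def missing_properties(node):
    """List required properties absent from a node given as a dict."""
    compatible = node.get("compatible", [])
    if isinstance(compatible, str):
        compatible = [compatible]
    required = list(BASE_REQUIRED)
    if "arm,cmn-600" in compatible:
        # Conditional requirement introduced by the if/then clause.
        required.append("arm,root-node")
    return [name for name in required if name not in node]


# Made-up property values, used only to exercise the rule.
ci700 = {"compatible": "arm,ci-700", "reg": [0x50000000, 0x4000000], "interrupts": [46]}
cmn600 = dict(ci700, compatible="arm,cmn-600")

print(missing_properties(ci700))   # []
print(missing_properties(cmn600))  # ['arm,root-node']
```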

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
1+
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
2+
%YAML 1.2
3+
---
4+
$id: http://devicetree.org/schemas/perf/arm,smmu-v3-pmcg.yaml#
5+
$schema: http://devicetree.org/meta-schemas/core.yaml#
6+
7+
title: Arm SMMUv3 Performance Monitor Counter Group
8+
9+
maintainers:
10+
- Will Deacon <will@kernel.org>
11+
- Robin Murphy <robin.murphy@arm.com>
12+
13+
description: |
14+
An SMMUv3 may have several Performance Monitor Counter Groups (PMCGs).
15+
They are standalone performance monitoring units that support both
16+
architected and IMPLEMENTATION DEFINED event counters.
17+
18+
properties:
19+
$nodename:
20+
pattern: "^pmu@[0-9a-f]*"
21+
compatible:
22+
oneOf:
23+
- items:
24+
- const: arm,mmu-600-pmcg
25+
- const: arm,smmu-v3-pmcg
26+
- const: arm,smmu-v3-pmcg
27+
28+
reg:
29+
items:
30+
- description: Register page 0
31+
- description: Register page 1, if SMMU_PMCG_CFGR.RELOC_CTRS = 1
32+
minItems: 1
33+
34+
interrupts:
35+
maxItems: 1
36+
37+
msi-parent: true
38+
39+
required:
40+
- compatible
41+
- reg
42+
43+
anyOf:
44+
- required:
45+
- interrupts
46+
- required:
47+
- msi-parent
48+
49+
additionalProperties: false
50+
51+
examples:
52+
- |
53+
#include <dt-bindings/interrupt-controller/arm-gic.h>
54+
#include <dt-bindings/interrupt-controller/irq.h>
55+
56+
pmu@2b420000 {
57+
compatible = "arm,smmu-v3-pmcg";
58+
reg = <0x2b420000 0x1000>,
59+
<0x2b430000 0x1000>;
60+
interrupts = <GIC_SPI 80 IRQ_TYPE_EDGE_RISING>;
61+
msi-parent = <&its 0xff0000>;
62+
};
63+
64+
pmu@2b440000 {
65+
compatible = "arm,smmu-v3-pmcg";
66+
reg = <0x2b440000 0x1000>,
67+
<0x2b450000 0x1000>;
68+
interrupts = <GIC_SPI 81 IRQ_TYPE_EDGE_RISING>;
69+
msi-parent = <&its 0xff0000>;
70+
};
