Commit eb04e72

Merge patch series "RISC-V Hardware Probing User Interface"
Evan Green <evan@rivosinc.com> says:

There's been a bunch of off-list discussions about this, including at Plumbers. The original plan was to do something involving providing an ISA string to userspace, but ISA strings just aren't sufficient for a stable ABI any more: in order to parse an ISA string users need the version of the specifications that the string is written to, the version of each extension (sometimes at a finer granularity than the RISC-V releases/versions encode), and the expected use case for the ISA string (ie, is it a U-mode or M-mode string). That's a lot of complexity to try and keep ABI compatible and it's probably going to continue to grow, as even if there's no more complexity in the specifications we'll have to deal with the various ISA string parsing oddities that end up all over userspace.

Instead this patch set takes a very different approach and provides a set of key/value pairs that encode various bits about the system. The big advantage here is that we can clearly define what these mean so we can ensure ABI stability, but it also allows us to encode information that's unlikely to ever appear in an ISA string (see the misaligned access performance, for example). The resulting interface looks a lot like what arm64 and x86 do, and will hopefully fit well into something like ACPI in the future.

The actual user interface is a syscall, with a vDSO function in front of it. The vDSO function can answer some queries without a syscall at all, and falls back to the syscall for cases it doesn't have answers to. Currently we prepopulate it with an array of answers for all keys and a CPU set of "all CPUs". This can be adjusted as necessary to provide fast answers to the most common queries.

An example series in glibc exposing this syscall and using it in an ifunc selector for memcpy can be found at [1].

I was asked about the performance delta between this and something like sysfs. I created a small test program and ran it on a Nezha D1 Allwinner board. Doing each operation 100000 times and dividing, these operations take the following amount of time:

 - open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
 - access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
 - riscv_hwprobe() vDSO and syscall: 0.0094us
 - riscv_hwprobe() vDSO with no syscall: 0.0091us

These numbers get farther apart if we query multiple keys, as sysfs will scale linearly with the number of keys, where the dedicated syscall stays the same. To frame these numbers, I also did a tight fork/exec/wait loop, which I measured as 4.8ms. So doing 4 open/read/close operations is a delta of about 0.3%, versus a single vDSO call is a delta of essentially zero.

[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050

* b4-shazam-merge:
  RISC-V: Add hwprobe vDSO function and data
  selftests: Test the new RISC-V hwprobe interface
  RISC-V: hwprobe: Support probing of misaligned access performance
  RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
  RISC-V: Add a syscall for HW probing
  RISC-V: Move struct riscv_cpuinfo to new header

Link: https://lore.kernel.org/r/20230407231103.2622178-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
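
The percentages quoted at the end of the message follow directly from the listed timings. As a rough check, the arithmetic can be sketched like this; the constants are the measurements from the message, while `overhead_percent` and `print_overheads` are names made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Measurements quoted in the commit message, in microseconds. */
#define SYSFS_OPEN_READ_CLOSE_US 3.8
#define HWPROBE_VDSO_US          0.0091
#define FORK_EXEC_WAIT_US        4800.0 /* 4.8 ms */

/*
 * Hypothetical helper: cost of `count` probe operations expressed as a
 * percentage of one fork/exec/wait cycle.
 */
double overhead_percent(double per_call_us, int count)
{
    assert(count >= 0);
    return 100.0 * per_call_us * count / FORK_EXEC_WAIT_US;
}

/* Print the two cases the message compares. */
void print_overheads(void)
{
    printf("4 sysfs reads: %.2f%%\n",
           overhead_percent(SYSFS_OPEN_READ_CLOSE_US, 4));
    printf("1 vDSO call:   %.5f%%\n",
           overhead_percent(HWPROBE_VDSO_US, 1));
}
```

Four sysfs reads cost 15.2us against a 4800us process launch, which is where the "about 0.3%" figure comes from; a single vDSO call is several orders of magnitude below that.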
2 parents 6a24915 + aa5af0a commit eb04e72

28 files changed

Lines changed: 712 additions & 14 deletions

Documentation/riscv/hwprobe.rst

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+RISC-V Hardware Probing Interface
+---------------------------------
+
+The RISC-V hardware probing interface is based around a single syscall, which
+is defined in <asm/hwprobe.h>::
+
+    struct riscv_hwprobe {
+        __s64 key;
+        __u64 value;
+    };
+
+    long sys_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
+                           size_t cpu_count, cpu_set_t *cpus,
+                           unsigned int flags);
+
+The arguments are split into three groups: an array of key-value pairs, a CPU
+set, and some flags. The key-value pairs are supplied with a count. Userspace
+must prepopulate the key field for each element, and the kernel will fill in the
+value if the key is recognized. If a key is unknown to the kernel, its key field
+will be cleared to -1, and its value set to 0. The CPU set is defined by
+CPU_SET(3). For value-like keys (e.g. vendor/arch/impl), the returned value will
+only be valid if all CPUs in the given set have the same value. Otherwise -1
+will be returned. For boolean-like keys, the value returned will be a logical
+AND of the values for the specified CPUs. Usermode can supply NULL for cpus and
+0 for cpu_count as a shortcut for all online CPUs. There are currently no flags;
+this value must be zero for future compatibility.
+
+On success 0 is returned, on failure a negative error code is returned.
+
+The following keys are defined:
+
+* :c:macro:`RISCV_HWPROBE_KEY_MVENDORID`: Contains the value of ``mvendorid``,
+  as defined by the RISC-V privileged architecture specification.
+
+* :c:macro:`RISCV_HWPROBE_KEY_MARCHID`: Contains the value of ``marchid``, as
+  defined by the RISC-V privileged architecture specification.
+
+* :c:macro:`RISCV_HWPROBE_KEY_MIMPID`: Contains the value of ``mimpid``, as
+  defined by the RISC-V privileged architecture specification.
+
+* :c:macro:`RISCV_HWPROBE_KEY_BASE_BEHAVIOR`: A bitmask containing the base
+  user-visible behavior that this kernel supports. The following base user ABIs
+  are defined:
+
+  * :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`: Support for rv32ima or
+    rv64ima, as defined by version 2.2 of the user ISA and version 1.10 of the
+    privileged ISA, with the following known exceptions (more exceptions may be
+    added, but only if it can be demonstrated that the user ABI is not broken):
+
+    * The ``fence.i`` instruction cannot be directly executed by userspace
+      programs (it may still be executed in userspace via a
+      kernel-controlled mechanism such as the vDSO).
+
+* :c:macro:`RISCV_HWPROBE_KEY_IMA_EXT_0`: A bitmask containing the extensions
+  that are compatible with the :c:macro:`RISCV_HWPROBE_BASE_BEHAVIOR_IMA`
+  base system behavior.
+
+  * :c:macro:`RISCV_HWPROBE_IMA_FD`: The F and D extensions are supported, as
+    defined by commit cd20cee ("FMIN/FMAX now implement
+    minimumNumber/maximumNumber, not minNum/maxNum") of the RISC-V ISA manual.
+
+  * :c:macro:`RISCV_HWPROBE_IMA_C`: The C extension is supported, as defined
+    by version 2.2 of the RISC-V ISA manual.
+
+* :c:macro:`RISCV_HWPROBE_KEY_CPUPERF_0`: A bitmask that contains performance
+  information about the selected set of processors.
+
+  * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNKNOWN`: The performance of misaligned
+    accesses is unknown.
+
+  * :c:macro:`RISCV_HWPROBE_MISALIGNED_EMULATED`: Misaligned accesses are
+    emulated via software, either in or below the kernel. These accesses are
+    always extremely slow.
+
+  * :c:macro:`RISCV_HWPROBE_MISALIGNED_SLOW`: Misaligned accesses are supported
+    in hardware, but are slower than the corresponding aligned access
+    sequences.
+
+  * :c:macro:`RISCV_HWPROBE_MISALIGNED_FAST`: Misaligned accesses are supported
+    in hardware and are faster than the corresponding aligned access
+    sequences.
+
+  * :c:macro:`RISCV_HWPROBE_MISALIGNED_UNSUPPORTED`: Misaligned accesses are
+    not supported at all and will generate a misaligned address fault.
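
The key/value contract documented in hwprobe.rst above can be illustrated from the caller's side. The following is a minimal sketch, not code from this series: `mock_riscv_hwprobe` is a made-up stand-in for the syscall so the example runs on any host, the numeric key and bit values are assumed to match the series' uapi header (which is not shown in this section), and returning `-22` (`-EINVAL`) for non-zero flags is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pair layout from hwprobe.rst; stdint types stand in for __s64/__u64. */
struct riscv_hwprobe {
    int64_t key;
    uint64_t value;
};

/* Assumed to match the uapi header of this series (not shown above). */
#define RISCV_HWPROBE_KEY_IMA_EXT_0   4
#define RISCV_HWPROBE_IMA_FD          (1ULL << 0)
#define RISCV_HWPROBE_IMA_C           (1ULL << 1)
#define RISCV_HWPROBE_KEY_CPUPERF_0   5
#define RISCV_HWPROBE_MISALIGNED_FAST (3ULL << 0)
#define RISCV_HWPROBE_MISALIGNED_MASK (7ULL << 0)

/*
 * Hypothetical stand-in for the syscall. It pretends that every CPU has
 * F, D and C and fast misaligned accesses, and applies the documented
 * rule for unknown keys: key becomes -1, value becomes 0.
 */
long mock_riscv_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
                        size_t cpu_count, const void *cpus,
                        unsigned int flags)
{
    assert(pairs != NULL || pair_count == 0);
    (void)cpu_count;
    (void)cpus;

    if (flags != 0)
        return -22; /* assumed -EINVAL: no flags are defined yet */

    for (size_t i = 0; i < pair_count; i++) {
        switch (pairs[i].key) {
        case RISCV_HWPROBE_KEY_IMA_EXT_0:
            pairs[i].value = RISCV_HWPROBE_IMA_FD | RISCV_HWPROBE_IMA_C;
            break;
        case RISCV_HWPROBE_KEY_CPUPERF_0:
            pairs[i].value = RISCV_HWPROBE_MISALIGNED_FAST;
            break;
        default:
            pairs[i].key = -1;
            pairs[i].value = 0;
            break;
        }
    }
    return 0;
}

/* Returns 1 if the kernel recognized the key and reported fast access. */
int misaligned_is_fast(const struct riscv_hwprobe *pair)
{
    if (pair->key != RISCV_HWPROBE_KEY_CPUPERF_0)
        return 0;
    return (pair->value & RISCV_HWPROBE_MISALIGNED_MASK) ==
           RISCV_HWPROBE_MISALIGNED_FAST;
}
```

A caller would fill in only the `key` fields, pass `NULL` and `0` for the CPU set to mean all online CPUs, and then check each returned `key` before trusting its `value`.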

Documentation/riscv/index.rst

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ RISC-V architecture
 
     boot-image-header
     vm-layout
+    hwprobe
     patch-acceptance
     uabi
 

arch/riscv/Kconfig

Lines changed: 1 addition & 0 deletions
@@ -33,6 +33,7 @@ config RISCV
 	select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
+	select ARCH_HAS_VDSO_DATA
 	select ARCH_OPTIONAL_KERNEL_RWX if ARCH_HAS_STRICT_KERNEL_RWX
 	select ARCH_OPTIONAL_KERNEL_RWX_DEFAULT
 	select ARCH_STACKWALK

arch/riscv/errata/thead/errata.c

Lines changed: 10 additions & 0 deletions
@@ -11,7 +11,9 @@
 #include <linux/uaccess.h>
 #include <asm/alternative.h>
 #include <asm/cacheflush.h>
+#include <asm/cpufeature.h>
 #include <asm/errata_list.h>
+#include <asm/hwprobe.h>
 #include <asm/patch.h>
 #include <asm/vendorid_list.h>
 
@@ -115,3 +117,11 @@ void __init_or_module thead_errata_patch_func(struct alt_entry *begin, struct al
 	if (stage == RISCV_ALTERNATIVES_EARLY_BOOT)
 		local_flush_icache_all();
 }
+
+void __init_or_module thead_feature_probe_func(unsigned int cpu,
+					       unsigned long archid,
+					       unsigned long impid)
+{
+	if ((archid == 0) && (impid == 0))
+		per_cpu(misaligned_access_speed, cpu) = RISCV_HWPROBE_MISALIGNED_FAST;
+}

arch/riscv/include/asm/alternative.h

Lines changed: 5 additions & 0 deletions
@@ -30,6 +30,7 @@
 #define ALT_OLD_PTR(a) __ALT_PTR(a, old_offset)
 #define ALT_ALT_PTR(a) __ALT_PTR(a, alt_offset)
 
+void __init probe_vendor_features(unsigned int cpu);
 void __init apply_boot_alternatives(void);
 void __init apply_early_boot_alternatives(void);
 void apply_module_alternatives(void *start, size_t length);
@@ -52,11 +53,15 @@ void thead_errata_patch_func(struct alt_entry *begin, struct alt_entry *end,
 			     unsigned long archid, unsigned long impid,
 			     unsigned int stage);
 
+void thead_feature_probe_func(unsigned int cpu, unsigned long archid,
+			      unsigned long impid);
+
 void riscv_cpufeature_patch_func(struct alt_entry *begin, struct alt_entry *end,
 				 unsigned int stage);
 
 #else /* CONFIG_RISCV_ALTERNATIVE */
 
+static inline void probe_vendor_features(unsigned int cpu) { }
 static inline void apply_boot_alternatives(void) { }
 static inline void apply_early_boot_alternatives(void) { }
 static inline void apply_module_alternatives(void *start, size_t length) { }
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright 2022-2023 Rivos, Inc
+ */
+
+#ifndef _ASM_CPUFEATURE_H
+#define _ASM_CPUFEATURE_H
+
+/*
+ * These are probed via a device_initcall(), via either the SBI or directly
+ * from the corresponding CSRs.
+ */
+struct riscv_cpuinfo {
+	unsigned long mvendorid;
+	unsigned long marchid;
+	unsigned long mimpid;
+};
+
+DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo);
+
+DECLARE_PER_CPU(long, misaligned_access_speed);
+
+#endif
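
The per-CPU `riscv_cpuinfo` and `misaligned_access_speed` data declared above is what a hwprobe query has to combine when it covers more than one CPU. The documented rule is that value-like keys report a value only if every selected CPU agrees (otherwise -1), while boolean-like keys report the logical AND. A minimal userspace sketch of that rule follows; the struct name, both function names, and the treatment of an empty set are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as struct riscv_cpuinfo in the header above. */
struct cpuinfo_sketch {
    unsigned long mvendorid;
    unsigned long marchid;
    unsigned long mimpid;
};

/*
 * Value-like key: report marchid only if all selected CPUs share it,
 * otherwise report -1 (all bits set in the unsigned 64-bit value).
 */
uint64_t common_marchid(const struct cpuinfo_sketch *cpus, size_t n)
{
    assert(cpus != NULL && n > 0);

    uint64_t first = cpus[0].marchid;
    for (size_t i = 1; i < n; i++) {
        if (cpus[i].marchid != first)
            return (uint64_t)-1;
    }
    return first;
}

/*
 * Boolean-like key: a bit is reported only if every selected CPU has it.
 * An empty set yields 0 here, which is a choice made for this sketch.
 */
uint64_t and_of_bits(const uint64_t *per_cpu_bits, size_t n)
{
    if (n == 0)
        return 0;

    uint64_t acc = ~0ULL;
    for (size_t i = 0; i < n; i++)
        acc &= per_cpu_bits[i];
    return acc;
}
```

With two CPUs whose extension masks are `0x3` and `0x1`, only bit 0 survives the AND, so userspace can rely on a reported bit being usable on whichever CPU in the set it is scheduled on.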

arch/riscv/include/asm/hwprobe.h

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright 2023 Rivos, Inc
+ */
+
+#ifndef _ASM_HWPROBE_H
+#define _ASM_HWPROBE_H
+
+#include <uapi/asm/hwprobe.h>
+
+#define RISCV_HWPROBE_MAX_KEY 5
+
+#endif

arch/riscv/include/asm/syscall.h

Lines changed: 4 additions & 0 deletions
@@ -10,6 +10,7 @@
 #ifndef _ASM_RISCV_SYSCALL_H
 #define _ASM_RISCV_SYSCALL_H
 
+#include <asm/hwprobe.h>
 #include <uapi/linux/audit.h>
 #include <linux/sched.h>
 #include <linux/err.h>
@@ -96,4 +97,7 @@ static inline bool arch_syscall_is_vdso_sigreturn(struct pt_regs *regs)
 }
 
 asmlinkage long sys_riscv_flush_icache(uintptr_t, uintptr_t, uintptr_t);
+
+asmlinkage long sys_riscv_hwprobe(struct riscv_hwprobe *, size_t, size_t,
+				  unsigned long *, unsigned int);
 #endif /* _ASM_RISCV_SYSCALL_H */

arch/riscv/include/asm/vdso/data.h

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __RISCV_ASM_VDSO_DATA_H
+#define __RISCV_ASM_VDSO_DATA_H
+
+#include <linux/types.h>
+#include <vdso/datapage.h>
+#include <asm/hwprobe.h>
+
+struct arch_vdso_data {
+	/* Stash static answers to the hwprobe queries when all CPUs are selected. */
+	__u64 all_cpu_hwprobe_values[RISCV_HWPROBE_MAX_KEY + 1];
+
+	/* Boolean indicating all CPUs have the same static hwprobe values. */
+	__u8 homogeneous_cpus;
+};
+
+#endif /* __RISCV_ASM_VDSO_DATA_H */
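
The two fields of `struct arch_vdso_data` above suggest how a vDSO front end could answer queries without entering the kernel: use the cached array when the request covers all CPUs (or when every CPU is identical), and fall back to the syscall otherwise. The following is a sketch under those assumptions, not the vDSO function added by this series; `vdso_hwprobe_sketch`, `hwprobe_syscall_stub`, and `slow_path_calls` are hypothetical names, and the stub only counts calls instead of issuing a real syscall.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RISCV_HWPROBE_MAX_KEY 5

/* Pair layout with stdint types standing in for __s64/__u64. */
struct riscv_hwprobe {
    int64_t key;
    uint64_t value;
};

/* Same two fields as struct arch_vdso_data in the header above. */
struct arch_vdso_data_sketch {
    uint64_t all_cpu_hwprobe_values[RISCV_HWPROBE_MAX_KEY + 1];
    uint8_t homogeneous_cpus;
};

/* Hypothetical counter that makes the slow path observable. */
int slow_path_calls;

/* Hypothetical stand-in for the real syscall; it only records the call. */
long hwprobe_syscall_stub(struct riscv_hwprobe *pairs, size_t pair_count,
                          size_t cpu_count, const unsigned long *cpus,
                          unsigned int flags)
{
    (void)pairs; (void)pair_count; (void)cpu_count; (void)cpus; (void)flags;
    slow_path_calls++;
    return 0;
}

long vdso_hwprobe_sketch(const struct arch_vdso_data_sketch *avd,
                         struct riscv_hwprobe *pairs, size_t pair_count,
                         size_t cpu_count, const unsigned long *cpus,
                         unsigned int flags)
{
    bool all_cpus = (cpu_count == 0 && cpus == NULL);

    assert(avd != NULL);

    /*
     * The cache holds answers for "all CPUs". Those answers also hold
     * for any subset when every CPU is identical. Everything else,
     * including any non-zero flags, is left to the kernel.
     */
    if (flags != 0 || (!all_cpus && !avd->homogeneous_cpus))
        return hwprobe_syscall_stub(pairs, pair_count, cpu_count, cpus, flags);

    for (size_t i = 0; i < pair_count; i++) {
        if (pairs[i].key >= 0 && pairs[i].key <= RISCV_HWPROBE_MAX_KEY) {
            pairs[i].value = avd->all_cpu_hwprobe_values[pairs[i].key];
        } else {
            /* Unknown key: same convention as the syscall. */
            pairs[i].key = -1;
            pairs[i].value = 0;
        }
    }
    return 0;
}
```

Sizing the array as `RISCV_HWPROBE_MAX_KEY + 1` lets the key be used directly as an index, so the fast path is a bounds check and a load per pair.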

arch/riscv/include/asm/vdso/gettimeofday.h

Lines changed: 8 additions & 0 deletions
@@ -9,6 +9,12 @@
 #include <asm/csr.h>
 #include <uapi/linux/time.h>
 
+/*
+ * 32-bit land is lacking generic time vsyscalls as well as the legacy 32-bit
+ * time syscalls like gettimeofday. Skip these definitions on 32-bit.
+ */
+#ifdef CONFIG_GENERIC_TIME_VSYSCALL
+
 #define VDSO_HAS_CLOCK_GETRES 1
 
 static __always_inline
@@ -60,6 +66,8 @@ int clock_getres_fallback(clockid_t _clkid, struct __kernel_timespec *_ts)
 	return ret;
 }
 
+#endif /* CONFIG_GENERIC_TIME_VSYSCALL */
+
 static __always_inline u64 __arch_get_hw_counter(s32 clock_mode,
 						 const struct vdso_data *vd)
 {
