
Commit 4d84667

Merge tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull performance event updates from Ingo Molnar:
 "x86 PMU driver updates:

   - Add support for the core PMU for Intel Diamond Rapids (DMR) CPUs
     (Dapeng Mi)

     Compared to previous iterations of the Intel PMU code there have
     been a lot of changes, which center around three main areas:

       - Introduce the OFF-MODULE RESPONSE (OMR) facility to replace
         the Off-Core Response (OCR) facility

       - New PEBS data source encoding layout

       - Support the new "RDPMC user disable" feature

   - Likewise, a large series adds uncore PMU support for Intel Diamond
     Rapids (DMR) CPUs (Zide Chen). This centers around four main
     areas:

       - DMR may have two Integrated I/O and Memory Hub (IMH) dies,
         separate from the compute tile (CBB) dies. Each CBB and each
         IMH die has its own discovery domain.

       - Unlike prior CPUs that retrieve the global discovery table
         portal exclusively via PCI or MSR, DMR uses PCI for IMH PMON
         discovery and MSR for CBB PMON discovery.

       - DMR introduces several new PMON types: SCA, HAMVF, D2D_ULA,
         UBR, PCIE4, CRS, CPC, ITC, OTC, CMS, and PCIE6.

       - IIO free-running counters in DMR are MMIO-based, unlike SPR.

   - Also add missing PMON units for Intel Panther Lake, and support
     Nova Lake (NVL), which largely maps to Panther Lake (Zide Chen)

   - KVM integration: add support for mediated vPMUs (Kan Liang and
     Sean Christopherson, with fixes and cleanups by Peter Zijlstra,
     Sandipan Das and Mingwei Zhang)

   - Add Intel cstate driver support for Wildcat Lake (WCL) CPUs,
     which are a low-power variant of Panther Lake (Zide Chen)

   - Add core, cstate and MSR PMU support for the Airmont NP Intel CPU
     (aka MaxLinear Lightning Mountain), which maps to the existing
     Airmont code (Martin Schiller)

  Performance enhancements:

   - Speed up kexec shutdown by avoiding unnecessary cross CPU calls
     (Jan H. Schönherr)

   - Fix slow perf_event_task_exit() with LBR callstacks (Namhyung Kim)

  User-space stack unwinding support:

   - Various cleanups and refactorings in preparation to generalize the
     unwinding code for other architectures (Jens Remus)

  Uprobes updates:

   - Transition from kmap_atomic to kmap_local_page (Keke Ming)

   - Fix incorrect lockdep condition in filter_chain() (Breno Leitao)

   - Fix XOL allocation failure for 32-bit tasks (Oleg Nesterov)

  Misc fixes and cleanups:

   - s390: Remove kvm_types.h from Kbuild (Randy Dunlap)

   - x86/intel/uncore: Convert comma to semicolon (Chen Ni)

   - x86/uncore: Clean up const mismatch (Greg Kroah-Hartman)

   - x86/ibs: Fix typo in dc_l2tlb_miss comment (Xiang-Bin Shi)"

* tag 'perf-core-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
  s390: remove kvm_types.h from Kbuild
  uprobes: Fix incorrect lockdep condition in filter_chain()
  x86/ibs: Fix typo in dc_l2tlb_miss comment
  x86/uprobes: Fix XOL allocation failure for 32-bit tasks
  perf/x86/intel/uncore: Convert comma to semicolon
  perf/x86/intel: Add support for rdpmc user disable feature
  perf/x86: Use macros to replace magic numbers in attr_rdpmc
  perf/x86/intel: Add core PMU support for Novalake
  perf/x86/intel: Add support for PEBS memory auxiliary info field in NVL
  perf/x86/intel: Add core PMU support for DMR
  perf/x86/intel: Add support for PEBS memory auxiliary info field in DMR
  perf/x86/intel: Support the 4 new OMR MSRs introduced in DMR and NVL
  perf/core: Fix slow perf_event_task_exit() with LBR callstacks
  perf/core: Speed up kexec shutdown by avoiding unnecessary cross CPU calls
  uprobes: use kmap_local_page() for temporary page mappings
  arm/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  mips/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  arm64/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  riscv/uprobes: use kmap_local_page() in arch_uprobe_copy_ixol()
  perf/x86/intel/uncore: Add Nova Lake support
  ...
2 parents a9aabb3 + 7db06e3

45 files changed

Lines changed: 2154 additions & 522 deletions

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+What:		/sys/bus/event_source/devices/cpu.../rdpmc
+Date:		November 2011
+KernelVersion:	3.10
+Contact:	Linux kernel mailing list linux-kernel@vger.kernel.org
+Description:	The /sys/bus/event_source/devices/cpu.../rdpmc attribute
+		is used to show and manage whether the rdpmc instruction
+		can be executed in user space. This attribute supports 3
+		values:
+		- rdpmc = 0
+		  user space rdpmc is globally disabled for all PMU
+		  counters.
+		- rdpmc = 1
+		  user space rdpmc is globally enabled only in the time
+		  window where the event is mmap'ed. If the mmap region
+		  is unmapped, user space rdpmc is disabled again.
+		- rdpmc = 2
+		  user space rdpmc is globally enabled for all PMU
+		  counters.
+
+		On Intel platforms supporting the counter-level user
+		space rdpmc disable feature (CPUID.23H.EBX[2] = 1), the
+		meaning of the 3 values is extended to:
+		- rdpmc = 0
+		  global user space rdpmc and counter-level user space
+		  rdpmc of all counters are both disabled.
+		- rdpmc = 1
+		  no change in the behavior of global user space rdpmc;
+		  counter-level rdpmc of system-wide events is disabled,
+		  but counter-level rdpmc of non-system-wide events is
+		  enabled.
+		- rdpmc = 2
+		  global user space rdpmc and counter-level user space
+		  rdpmc of all counters are both enabled unconditionally.
+
+		The default value of rdpmc is 1.
+
+		Please note:
+		- global user space rdpmc behavior changes immediately
+		  when the rdpmc value changes, but counter-level user
+		  space rdpmc behavior does not take effect until the
+		  event is reactivated or recreated.
+		- The rdpmc attribute is global, even on x86 hybrid
+		  platforms. For example, changing cpu_core/rdpmc will
+		  also change cpu_atom/rdpmc.

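For context, a minimal user-space sketch of the read protocol this attribute gates (illustrative only, not part of this commit): with rdpmc = 1 the counter may only be read directly while the event's ring buffer is mmap'ed and the mmap page advertises cap_user_rdpmc. The helper name and the surrounding perf_event_open()/mmap() setup are assumptions; the field usage follows the perf_event_mmap_page layout documented in perf_event_open(2).

#include <linux/perf_event.h>	/* struct perf_event_mmap_page */
#include <stdint.h>
#include <x86intrin.h>		/* __rdpmc() */

/* Hypothetical helper: read one counter directly via RDPMC. When user
 * space rdpmc is not permitted (cap_user_rdpmc == 0 or index == 0) the
 * caller should fall back to read() on the event fd instead. */
static uint64_t read_counter(volatile struct perf_event_mmap_page *pc)
{
	uint32_t seq, idx;
	uint64_t count;

	do {
		seq = pc->lock;			/* seqlock: retry if it changes */
		__sync_synchronize();
		idx = pc->index;		/* HW counter index + 1, 0 if unusable */
		count = pc->offset;		/* kernel-maintained base count */
		if (pc->cap_user_rdpmc && idx)
			count += __rdpmc(idx - 1);
		__sync_synchronize();
	} while (pc->lock != seq);

	return count;
}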
arch/arm/probes/uprobes/core.c

Lines changed: 2 additions & 2 deletions
@@ -113,7 +113,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
 void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
			   void *src, unsigned long len)
 {
-	void *xol_page_kaddr = kmap_atomic(page);
+	void *xol_page_kaddr = kmap_local_page(page);
	void *dst = xol_page_kaddr + (vaddr & ~PAGE_MASK);

	preempt_disable();
@@ -126,7 +126,7 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,

	preempt_enable();

-	kunmap_atomic(xol_page_kaddr);
+	kunmap_local(xol_page_kaddr);
 }

arch/arm64/kernel/probes/uprobes.c

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@
 void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
			   void *src, unsigned long len)
 {
-	void *xol_page_kaddr = kmap_atomic(page);
+	void *xol_page_kaddr = kmap_local_page(page);
	void *dst = xol_page_kaddr + (vaddr & ~PAGE_MASK);

	/*
@@ -32,7 +32,7 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
	sync_icache_aliases((unsigned long)dst, (unsigned long)dst + len);

 done:
-	kunmap_atomic(xol_page_kaddr);
+	kunmap_local(xol_page_kaddr);
 }

 unsigned long uprobe_get_swbp_addr(struct pt_regs *regs)

arch/mips/kernel/uprobes.c

Lines changed: 2 additions & 2 deletions
@@ -214,11 +214,11 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
	unsigned long kaddr, kstart;

	/* Initialize the slot */
-	kaddr = (unsigned long)kmap_atomic(page);
+	kaddr = (unsigned long)kmap_local_page(page);
	kstart = kaddr + (vaddr & ~PAGE_MASK);
	memcpy((void *)kstart, src, len);
	flush_icache_range(kstart, kstart + len);
-	kunmap_atomic((void *)kaddr);
+	kunmap_local((void *)kaddr);
 }

 /**

arch/riscv/kernel/probes/uprobes.c

Lines changed: 2 additions & 2 deletions
@@ -165,7 +165,7 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
			   void *src, unsigned long len)
 {
	/* Initialize the slot */
-	void *kaddr = kmap_atomic(page);
+	void *kaddr = kmap_local_page(page);
	void *dst = kaddr + (vaddr & ~PAGE_MASK);
	unsigned long start = (unsigned long)dst;

@@ -178,5 +178,5 @@ void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
	}

	flush_icache_range(start, start + len);
-	kunmap_atomic(kaddr);
+	kunmap_local(kaddr);
 }

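All four conversions above share the same shape. The sketch below is illustrative only (the helper name is hypothetical; nothing here is added by the commit): kmap_local_page() provides a short-lived, thread-local mapping that must be released with kunmap_local(), and unlike kmap_atomic() it does not implicitly disable preemption or page faults, which is why e.g. the arm variant keeps its explicit preempt_disable()/preempt_enable() pair.

#include <linux/highmem.h>	/* kmap_local_page(), kunmap_local() */
#include <linux/mm.h>
#include <linux/string.h>

/* Hypothetical helper: copy out-of-line (XOL) probe instructions into the
 * XOL page through a temporary, thread-local kernel mapping. */
static void copy_to_xol_page(struct page *page, unsigned long vaddr,
			     void *src, unsigned long len)
{
	void *kaddr = kmap_local_page(page);		/* map the page locally */
	void *dst = kaddr + (vaddr & ~PAGE_MASK);	/* offset within the page */

	memcpy(dst, src, len);
	kunmap_local(kaddr);				/* release the mapping */
}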
arch/s390/include/asm/Kbuild

Lines changed: 0 additions & 1 deletion
@@ -5,6 +5,5 @@ generated-y += syscall_table.h
 generated-y += unistd_nr.h

 generic-y += asm-offsets.h
-generic-y += kvm_types.h
 generic-y += mcs_spinlock.h
 generic-y += mmzone.h

arch/x86/entry/entry_fred.c

Lines changed: 1 addition & 0 deletions
@@ -114,6 +114,7 @@ static idtentry_t sysvec_table[NR_SYSTEM_VECTORS] __ro_after_init = {

	SYSVEC(IRQ_WORK_VECTOR,			irq_work),

+	SYSVEC(PERF_GUEST_MEDIATED_PMI_VECTOR,	perf_guest_mediated_pmi_handler),
	SYSVEC(POSTED_INTR_VECTOR,		kvm_posted_intr_ipi),
	SYSVEC(POSTED_INTR_WAKEUP_VECTOR,	kvm_posted_intr_wakeup_ipi),
	SYSVEC(POSTED_INTR_NESTED_VECTOR,	kvm_posted_intr_nested_ipi),

arch/x86/events/amd/core.c

Lines changed: 2 additions & 0 deletions
@@ -1439,6 +1439,8 @@ static int __init amd_core_pmu_init(void)

	amd_pmu_global_cntr_mask = x86_pmu.cntr_mask64;

+	x86_get_pmu(smp_processor_id())->capabilities |= PERF_PMU_CAP_MEDIATED_VPMU;
+
	/* Update PMC handling functions */
	x86_pmu.enable_all = amd_pmu_v2_enable_all;
	x86_pmu.disable_all = amd_pmu_v2_disable_all;
arch/x86/events/core.c

Lines changed: 61 additions & 5 deletions
@@ -30,6 +30,7 @@
 #include <linux/device.h>
 #include <linux/nospec.h>
 #include <linux/static_call.h>
+#include <linux/kvm_types.h>

 #include <asm/apic.h>
 #include <asm/stacktrace.h>
@@ -56,6 +57,8 @@ DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = {
	.pmu = &pmu,
 };

+static DEFINE_PER_CPU(bool, guest_lvtpc_loaded);
+
 DEFINE_STATIC_KEY_FALSE(rdpmc_never_available_key);
 DEFINE_STATIC_KEY_FALSE(rdpmc_always_available_key);
 DEFINE_STATIC_KEY_FALSE(perf_is_hybrid);
@@ -1760,13 +1763,43 @@ void perf_events_lapic_init(void)
	apic_write(APIC_LVTPC, APIC_DM_NMI);
 }

+#ifdef CONFIG_PERF_GUEST_MEDIATED_PMU
+void perf_load_guest_lvtpc(u32 guest_lvtpc)
+{
+	u32 masked = guest_lvtpc & APIC_LVT_MASKED;
+
+	apic_write(APIC_LVTPC,
+		   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
+	this_cpu_write(guest_lvtpc_loaded, true);
+}
+EXPORT_SYMBOL_FOR_KVM(perf_load_guest_lvtpc);
+
+void perf_put_guest_lvtpc(void)
+{
+	this_cpu_write(guest_lvtpc_loaded, false);
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+}
+EXPORT_SYMBOL_FOR_KVM(perf_put_guest_lvtpc);
+#endif /* CONFIG_PERF_GUEST_MEDIATED_PMU */
+
 static int
 perf_event_nmi_handler(unsigned int cmd, struct pt_regs *regs)
 {
	u64 start_clock;
	u64 finish_clock;
	int ret;

+	/*
+	 * Ignore all NMIs when the CPU's LVTPC is configured to route PMIs to
+	 * PERF_GUEST_MEDIATED_PMI_VECTOR, i.e. when an NMI can't be due to a
+	 * PMI. Attempting to handle a PMI while the guest's context is loaded
+	 * will generate false positives and clobber guest state. Note, the
+	 * LVTPC is switched to/from the dedicated mediated PMI IRQ vector
+	 * while host events are quiesced.
+	 */
+	if (this_cpu_read(guest_lvtpc_loaded))
+		return NMI_DONE;
+
	/*
	 * All PMUs/events that share this PMI handler should make sure to
	 * increment active_events for their events.
@@ -2130,7 +2163,8 @@ static int __init init_hw_perf_events(void)

	pr_cont("%s PMU driver.\n", x86_pmu.name);

-	x86_pmu.attr_rdpmc = 1; /* enable userspace RDPMC usage by default */
+	/* enable userspace RDPMC usage by default */
+	x86_pmu.attr_rdpmc = X86_USER_RDPMC_CONDITIONAL_ENABLE;

	for (quirk = x86_pmu.quirks; quirk; quirk = quirk->next)
		quirk->func();
@@ -2582,6 +2616,27 @@ static ssize_t get_attr_rdpmc(struct device *cdev,
	return snprintf(buf, 40, "%d\n", x86_pmu.attr_rdpmc);
 }

+/*
+ * Behaviors of the rdpmc value:
+ * - rdpmc = 0
+ *   global user space rdpmc and counter level's user space rdpmc of all
+ *   counters are both disabled.
+ * - rdpmc = 1
+ *   global user space rdpmc is enabled in the mmap enabled time window and
+ *   counter level's user space rdpmc is enabled only for non system-wide
+ *   events. Counter level's user space rdpmc of system-wide events is
+ *   still disabled by default. This won't introduce a counter data leak
+ *   for non system-wide events since their count data is cleared on
+ *   context switch.
+ * - rdpmc = 2
+ *   global user space rdpmc and counter level's user space rdpmc of all
+ *   counters are enabled unconditionally.
+ *
+ * Assuming the rdpmc value won't be changed frequently, don't dynamically
+ * reschedule events to make a new rdpmc value take effect on active perf
+ * events immediately; the new rdpmc value only impacts newly activated
+ * perf events. This makes the code simpler and cleaner.
+ */
 static ssize_t set_attr_rdpmc(struct device *cdev,
			      struct device_attribute *attr,
			      const char *buf, size_t count)
@@ -2610,12 +2665,12 @@ static ssize_t set_attr_rdpmc(struct device *cdev,
	 */
	if (val == 0)
		static_branch_inc(&rdpmc_never_available_key);
-	else if (x86_pmu.attr_rdpmc == 0)
+	else if (x86_pmu.attr_rdpmc == X86_USER_RDPMC_NEVER_ENABLE)
		static_branch_dec(&rdpmc_never_available_key);

	if (val == 2)
		static_branch_inc(&rdpmc_always_available_key);
-	else if (x86_pmu.attr_rdpmc == 2)
+	else if (x86_pmu.attr_rdpmc == X86_USER_RDPMC_ALWAYS_ENABLE)
		static_branch_dec(&rdpmc_always_available_key);

	on_each_cpu(cr4_update_pce, NULL, 1);
@@ -3073,11 +3128,12 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
	cap->version = x86_pmu.version;
	cap->num_counters_gp = x86_pmu_num_counters(NULL);
	cap->num_counters_fixed = x86_pmu_num_counters_fixed(NULL);
-	cap->bit_width_gp = x86_pmu.cntval_bits;
-	cap->bit_width_fixed = x86_pmu.cntval_bits;
+	cap->bit_width_gp = cap->num_counters_gp ? x86_pmu.cntval_bits : 0;
+	cap->bit_width_fixed = cap->num_counters_fixed ? x86_pmu.cntval_bits : 0;
	cap->events_mask = (unsigned int)x86_pmu.events_maskl;
	cap->events_mask_len = x86_pmu.events_mask_len;
	cap->pebs_ept = x86_pmu.pebs_ept;
+	cap->mediated = !!(pmu.capabilities & PERF_PMU_CAP_MEDIATED_VPMU);
 }
 EXPORT_SYMBOL_FOR_KVM(perf_get_x86_pmu_capability);

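To make the LVTPC handshake in arch/x86/events/core.c concrete, here is a hedged sketch of the intended call pairing on the host side (the function names are hypothetical; the actual KVM call sites land with the rest of the mediated-vPMU series): perf_load_guest_lvtpc() is called only after host events are quiesced and before the guest PMU context is loaded, and perf_put_guest_lvtpc() restores NMI-based PMI delivery afterwards, which is what lets perf_event_nmi_handler() ignore NMIs in between.

/* Hypothetical host-side pairing, not code from this commit. */
static void mediated_vpmu_load(u32 guest_lvtpc)
{
	/* Host perf events must already be quiesced at this point. */
	perf_load_guest_lvtpc(guest_lvtpc);	/* route PMIs to the mediated vector */
	/* ... load guest counter state and enter the guest ... */
}

static void mediated_vpmu_put(void)
{
	/* ... save guest counter state after exiting the guest ... */
	perf_put_guest_lvtpc();			/* restore NMI-based PMI delivery */
}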