
Commit 3fa805c

leitao authored and akpm00 committed
vmcoreinfo: track and log recoverable hardware errors
Introduce a generic infrastructure for tracking recoverable hardware errors (HW errors that are visible to the OS but do not cause a panic) and record them for vmcore consumption. This aids post-mortem crash analysis tools by preserving a count and the timestamp of the last occurrence of such errors.

Correctable errors, on the other hand, are handled transparently by the hardware, so the OS typically remains unaware of them. They are less relevant for crash dumps and are therefore NOT tracked by this infrastructure.

Add centralized logging for sources of recoverable hardware errors, categorized by the subsystem that reported them.

hwerr_data is write-only at kernel runtime and is meant to be read from the vmcore with tools like crash/drgn. For example, this is how it looks when opening the crash dump in drgn:

    >>> prog['hwerr_data']
    (struct hwerr_info[1]){
        {
            .count = (int)844,
            .timestamp = (time64_t)1752852018,
        },
        ...

This helps fleet operators quickly triage whether a crash may have been influenced by recoverable hardware errors (which make the kernel execute uncommon code paths), especially when recoverable errors occurred shortly before a panic, such as the bug fixed by commit ee62ce7 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool").

This is not intended to replace full hardware diagnostics, but it provides a fast way to correlate hardware events with kernel panics.

Rare machine check exceptions, such as those indicated by mce_flags.p5 or mce_flags.winchip, are not accounted for in this method, as they fall outside the intended usage scope for this feature's user base.
[leitao@debian.org: add hw-recoverable-errors to toctree]
Link: https://lkml.kernel.org/r/20251127-vmcoreinfo_fix-v1-1-26f5b1c43da9@debian.org
Link: https://lkml.kernel.org/r/20251010-vmcore_hw_error-v5-1-636ede3efe44@debian.org
Signed-off-by: Breno Leitao <leitao@debian.org>
Suggested-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com> [APEI]
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Bob Moore <robert.moore@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morse <james.morse@arm.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: "Oliver O'Halloran" <oohall@gmail.com>
Cc: Omar Sandoval <osandov@osandov.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
1 parent 7b71205 commit 3fa805c

8 files changed

Lines changed: 137 additions & 0 deletions


Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================================================
+Recoverable Hardware Error Tracking in vmcoreinfo
+=================================================
+
+Overview
+--------
+
+This feature provides a generic infrastructure within the Linux kernel to track
+and log recoverable hardware errors: errors that are visible to the OS but do
+not cause an immediate panic. They may still affect system health, mainly
+because they make the kernel execute uncommon code paths.
+
+By recording counts and timestamps of recoverable errors into the vmcoreinfo
+crash dump notes, this infrastructure aids post-mortem crash analysis tools in
+correlating hardware events with kernel failures. This enables faster triage
+and better understanding of root causes, especially in large-scale cloud
+environments where hardware issues are common.
+
+Benefits
+--------
+
+- Facilitates correlation of recoverable hardware errors with kernel panics or
+  unusual code paths that lead to system crashes.
+- Gives operators and cloud providers quick insight, improving reliability
+  and reducing troubleshooting time.
+- Complements existing full hardware diagnostics without replacing them.
+
+Data Exposure and Consumption
+-----------------------------
+
+- The tracked error data consists of per-error-type counts and the timestamp of
+  the last occurrence.
+- This data is stored in the ``hwerr_data`` array, indexed by
+  ``enum hwerr_error_type``.
+- It is exposed via vmcoreinfo crash dump notes and can be read using tools
+  like ``crash``, ``drgn``, or other kernel crash analysis utilities.
+- The data can only be read from crash dumps; there is no runtime interface.
+- Errors are grouped by source area: CPU, memory, PCI, CXL, and a catch-all
+  "others" category.
+
+Typical usage example (in the drgn REPL):
+
+.. code-block:: python
+
+    >>> prog['hwerr_data']
+    (struct hwerr_info[HWERR_RECOV_MAX]){
+      {
+        .count = (int)844,
+        .timestamp = (time64_t)1752852018,
+      },
+      ...
+    }
+
+Enabling
+--------
+
+- This feature is enabled when CONFIG_VMCORE_INFO is set.
+

Documentation/driver-api/index.rst

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@ Subsystem-specific APIs
    gpio/index
    hsi
    hte/index
+   hw-recoverable-errors
    i2c
    iio/index
    infiniband

arch/x86/kernel/cpu/mce/core.c

Lines changed: 4 additions & 0 deletions
@@ -45,6 +45,7 @@
 #include <linux/task_work.h>
 #include <linux/hardirq.h>
 #include <linux/kexec.h>
+#include <linux/vmcore_info.h>
 
 #include <asm/fred.h>
 #include <asm/cpu_device_id.h>
@@ -1700,6 +1701,9 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	}
 
 out:
+	/* Given it didn't panic, mark it as recoverable */
+	hwerr_log_error_type(HWERR_RECOV_OTHERS);
+
 	instrumentation_end();
 
 clear:

drivers/acpi/apei/ghes.c

Lines changed: 36 additions & 0 deletions
@@ -43,6 +43,7 @@
 #include <linux/uuid.h>
 #include <linux/ras.h>
 #include <linux/task_work.h>
+#include <linux/vmcore_info.h>
 
 #include <acpi/actbl1.h>
 #include <acpi/ghes.h>
@@ -867,6 +868,40 @@ int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_cper_kfifo_get, "CXL");
 
+static void ghes_log_hwerr(int sev, guid_t *sec_type)
+{
+	if (sev != CPER_SEV_RECOVERABLE)
+		return;
+
+	if (guid_equal(sec_type, &CPER_SEC_PROC_ARM) ||
+	    guid_equal(sec_type, &CPER_SEC_PROC_GENERIC) ||
+	    guid_equal(sec_type, &CPER_SEC_PROC_IA)) {
+		hwerr_log_error_type(HWERR_RECOV_CPU);
+		return;
+	}
+
+	if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR) ||
+	    guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID) ||
+	    guid_equal(sec_type, &CPER_SEC_CXL_DRAM_GUID) ||
+	    guid_equal(sec_type, &CPER_SEC_CXL_MEM_MODULE_GUID)) {
+		hwerr_log_error_type(HWERR_RECOV_CXL);
+		return;
+	}
+
+	if (guid_equal(sec_type, &CPER_SEC_PCIE) ||
+	    guid_equal(sec_type, &CPER_SEC_PCI_X_BUS)) {
+		hwerr_log_error_type(HWERR_RECOV_PCI);
+		return;
+	}
+
+	if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
+		hwerr_log_error_type(HWERR_RECOV_MEMORY);
+		return;
+	}
+
+	hwerr_log_error_type(HWERR_RECOV_OTHERS);
+}
+
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
@@ -888,6 +923,7 @@ static void ghes_do_proc(struct ghes *ghes,
 		if (gdata->validation_bits & CPER_SEC_VALID_FRU_TEXT)
 			fru_text = gdata->fru_text;
 
+		ghes_log_hwerr(sev, sec_type);
 		if (guid_equal(sec_type, &CPER_SEC_PLATFORM_MEM)) {
 			struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);

drivers/pci/pcie/aer.c

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,7 @@
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include <linux/slab.h>
+#include <linux/vmcore_info.h>
 #include <acpi/apei.h>
 #include <acpi/ghes.h>
 #include <ras/ras_event.h>
@@ -765,6 +766,7 @@ static void pci_dev_aer_stats_incr(struct pci_dev *pdev,
 		break;
 	case AER_NONFATAL:
 		aer_info->dev_total_nonfatal_errs++;
+		hwerr_log_error_type(HWERR_RECOV_PCI);
 		counter = &aer_info->dev_nonfatal_errs[0];
 		max = AER_MAX_TYPEOF_UNCOR_ERRS;
 		break;

include/linux/vmcore_info.h

Lines changed: 8 additions & 0 deletions
@@ -5,6 +5,7 @@
 #include <linux/linkage.h>
 #include <linux/elfcore.h>
 #include <linux/elf.h>
+#include <uapi/linux/vmcore.h>
 
 #define CRASH_CORE_NOTE_HEAD_BYTES ALIGN(sizeof(struct elf_note), 4)
 #define CRASH_CORE_NOTE_NAME_BYTES ALIGN(sizeof(NN_PRSTATUS), 4)
@@ -77,4 +78,11 @@ extern u32 *vmcoreinfo_note;
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len);
 void final_note(Elf_Word *buf);
+
+#ifdef CONFIG_VMCORE_INFO
+void hwerr_log_error_type(enum hwerr_error_type src);
+#else
+static inline void hwerr_log_error_type(enum hwerr_error_type src) {};
+#endif
+
 #endif /* LINUX_VMCORE_INFO_H */

include/uapi/linux/vmcore.h

Lines changed: 9 additions & 0 deletions
@@ -15,4 +15,13 @@ struct vmcoredd_header {
 	__u8 dump_name[VMCOREDD_MAX_NAME_BYTES]; /* Device dump's name */
 };
 
+enum hwerr_error_type {
+	HWERR_RECOV_CPU,
+	HWERR_RECOV_MEMORY,
+	HWERR_RECOV_PCI,
+	HWERR_RECOV_CXL,
+	HWERR_RECOV_OTHERS,
+	HWERR_RECOV_MAX,
+};
+
 #endif /* _UAPI_VMCORE_H */

kernel/vmcore_info.c

Lines changed: 17 additions & 0 deletions
@@ -31,6 +31,13 @@ u32 *vmcoreinfo_note;
 /* trusted vmcoreinfo, e.g. we can make a copy in the crash memory */
 static unsigned char *vmcoreinfo_data_safecopy;
 
+struct hwerr_info {
+	atomic_t count;
+	time64_t timestamp;
+};
+
+static struct hwerr_info hwerr_data[HWERR_RECOV_MAX];
+
 Elf_Word *append_elf_note(Elf_Word *buf, char *name, unsigned int type,
 			  void *data, size_t data_len)
 {
@@ -118,6 +125,16 @@ phys_addr_t __weak paddr_vmcoreinfo_note(void)
 }
 EXPORT_SYMBOL(paddr_vmcoreinfo_note);
 
+void hwerr_log_error_type(enum hwerr_error_type src)
+{
+	if (src < 0 || src >= HWERR_RECOV_MAX)
+		return;
+
+	atomic_inc(&hwerr_data[src].count);
+	WRITE_ONCE(hwerr_data[src].timestamp, ktime_get_real_seconds());
+}
+EXPORT_SYMBOL_GPL(hwerr_log_error_type);
+
 static int __init crash_save_vmcoreinfo_init(void)
 {
 	vmcoreinfo_data = (unsigned char *)get_zeroed_page(GFP_KERNEL);
