Commit 03f76dd

Merge tag 'edac_updates_for_v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add support for new AMD family 0x1a models to amd64_edac

 - Add an EDAC driver for the AMD VersalNET memory controller which
   reports hw errors from different IP blocks in the fabric using an
   IPC-type transport

 - Drop the silly static number of memory controllers in the Intel EDAC
   drivers (skx, i10nm) in favor of a flexible array so that the former
   doesn't need to be increased with every new generation which adds
   more memory controllers; along with a proper refactoring

 - Add support for two Alder Lake-S SoCs to ie31200_edac

 - Add an EDAC driver for ARM Cortex A72 cores, and specifically for
   reporting L1 and L2 cache errors

 - Last but not least, the usual fixes, cleanups and improvements all
   over the subsystem

* tag 'edac_updates_for_v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: (23 commits)
  EDAC/versalnet: Return the correct error in mc_probe()
  EDAC/mc_sysfs: Increase legacy channel support to 16
  EDAC/amd64: Add support for AMD family 1Ah-based newer models
  EDAC: Add a driver for the AMD Versal NET DDR controller
  dt-bindings: memory-controllers: Add support for Versal NET EDAC
  RAS: Export log_non_standard_event() to drivers
  cdx: Export Symbols for MCDI RPC and Initialization
  cdx: Split mcdi.h and reorganize headers
  EDAC/skx_common: Use topology_physical_package_id() instead of open coding
  EDAC: Fix wrong executable file modes for C source files
  EDAC/altera: Use dev_fwnode()
  EDAC/skx_common: Remove unused *NUM*_IMC macros
  EDAC/i10nm: Reallocate skx_dev list if preconfigured cnt != runtime cnt
  EDAC/skx_common: Remove redundant upper bound check for res->imc
  EDAC/skx_common: Make skx_dev->imc[] a flexible array
  EDAC/skx_common: Swap memory controller index mapping
  EDAC/skx_common: Move mc_mapping to be a field inside struct skx_imc
  EDAC/{skx_common,skx}: Use configuration data, not global macros
  EDAC/i10nm: Skip DIMM enumeration on a disabled memory controller
  EDAC/ie31200: Add two more Intel Alder Lake-S SoCs for EDAC support
  ...
2 parents 88b4893 + 69ed025 commit 03f76dd

29 files changed

Lines changed: 1553 additions & 111 deletions
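
The merge message describes replacing a fixed-size array of memory controllers in the Intel drivers (skx, i10nm) with a flexible array. The skx_common changes themselves are not part of this excerpt, so what follows is only a minimal userspace C sketch of the general flexible-array-member pattern. The names demo_imc, demo_dev and demo_dev_alloc are hypothetical and do not reproduce the kernel's struct skx_dev or struct skx_imc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the kernel's struct skx_imc / struct skx_dev. */
struct demo_imc {
	int mc;			/* memory controller index */
	int num_channels;	/* left at zero by the allocator below */
};

struct demo_dev {
	int num_imc;		/* element count, known only at runtime */
	struct demo_imc imc[];	/* flexible array member, must be last */
};

/* Allocate the header plus num_imc trailing elements as one zeroed block. */
static struct demo_dev *demo_dev_alloc(int num_imc)
{
	struct demo_dev *d;

	assert(num_imc > 0);
	d = calloc(1, sizeof(*d) + (size_t)num_imc * sizeof(d->imc[0]));
	if (!d)
		return NULL;

	d->num_imc = num_imc;
	for (int i = 0; i < num_imc; i++)
		d->imc[i].mc = i;

	return d;
}
```

With this layout, a new hardware generation that has more memory controllers only changes the count passed to the allocator, not a compile-time constant. Kernel code would normally size such an allocation with kzalloc() and the struct_size() helper rather than with open-coded arithmetic and calloc(); the sketch uses plain libc so that it builds outside the kernel.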

Documentation/devicetree/bindings/arm/cpus.yaml

Lines changed: 17 additions & 0 deletions
@@ -353,6 +353,12 @@ properties:
     $ref: /schemas/types.yaml#/definitions/phandle
     description: Link to Mediatek Cache Coherent Interconnect
 
+  edac-enabled:
+    $ref: /schemas/types.yaml#/definitions/flag
+    description:
+      A72 CPUs support Error Detection And Correction (EDAC) on their L1 and
+      L2 caches. This flag marks this function as usable.
+
   qcom,saw:
     $ref: /schemas/types.yaml#/definitions/phandle
     description:
@@ -399,6 +405,17 @@ properties:
 allOf:
   - $ref: /schemas/cpu.yaml#
   - $ref: /schemas/opp/opp-v1.yaml#
+  - if:
+      not:
+        properties:
+          compatible:
+            contains:
+              const: arm,cortex-a72
+    then:
+      # Allow edac-enabled only for Cortex A72
+      properties:
+        edac-enabled: false
+
   - if:
       # If the enable-method property contains one of those values
       properties:

Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/memory-controllers/xlnx,versal-net-ddrmc5.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xilinx Versal NET Memory Controller
+
+maintainers:
+  - Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
+
+description:
+  The integrated DDR Memory Controllers (DDRMCs) support both DDR5 and LPDDR5
+  compact and extended memory interfaces. Versal NET DDR memory controller
+  has an optional ECC support which correct single bit ECC errors and detect
+  double bit ECC errors. It also has support for reporting other errors like
+  MMCM (Mixed-Mode Clock Manager) errors and General software errors.
+
+properties:
+  compatible:
+    const: xlnx,versal-net-ddrmc5
+
+  amd,rproc:
+    $ref: /schemas/types.yaml#/definitions/phandle
+    description:
+      phandle to the remoteproc_r5 rproc node using which APU interacts
+      with remote processor. APU primarily communicates with the RPU for
+      accessing the DDRMC address space and getting error notification.
+
+required:
+  - compatible
+  - amd,rproc
+
+additionalProperties: false
+
+examples:
+  - |
+    memory-controller {
+        compatible = "xlnx,versal-net-ddrmc5";
+        amd,rproc = <&remoteproc_r5>;
+    };

MAINTAINERS

Lines changed: 14 additions & 3 deletions
@@ -8745,16 +8745,20 @@ F: drivers/edac/thunderx_edac*
 EDAC-CORE
 M:	Borislav Petkov <bp@alien8.de>
 M:	Tony Luck <tony.luck@intel.com>
-R:	James Morse <james.morse@arm.com>
-R:	Mauro Carvalho Chehab <mchehab@kernel.org>
-R:	Robert Richter <rric@kernel.org>
 L:	linux-edac@vger.kernel.org
 S:	Supported
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next
 F:	Documentation/driver-api/edac.rst
 F:	drivers/edac/
 F:	include/linux/edac.h
 
+EDAC-A72
+M:	Vijay Balakrishna <vijayb@linux.microsoft.com>
+M:	Tyler Hicks <code@tyhicks.com>
+L:	linux-edac@vger.kernel.org
+S:	Supported
+F:	drivers/edac/a72_edac.c
+
 EDAC-DMC520
 M:	Lei Wang <lewan@microsoft.com>
 L:	linux-edac@vger.kernel.org
@@ -27675,6 +27679,13 @@ S: Maintained
 F:	Documentation/devicetree/bindings/memory-controllers/xlnx,versal-ddrmc-edac.yaml
 F:	drivers/edac/versal_edac.c
 
+XILINX VERSALNET EDAC DRIVER
+M:	Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
+S:	Maintained
+F:	Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml
+F:	drivers/edac/versalnet_edac.c
+F:	include/linux/cdx/edac_cdx_pcol.h
+
 XILINX WATCHDOG DRIVER
 M:	Srinivas Neeli <srinivas.neeli@amd.com>
 R:	Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>

drivers/cdx/controller/cdx_controller.c

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@
 #include "cdx_controller.h"
 #include "../cdx.h"
 #include "mcdi_functions.h"
-#include "mcdi.h"
+#include "mcdid.h"
 
 static unsigned int cdx_mcdi_rpc_timeout(struct cdx_mcdi *cdx, unsigned int cmd)
 {

drivers/cdx/controller/cdx_rpmsg.c

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 #include "../cdx.h"
 #include "cdx_controller.h"
 #include "mcdi_functions.h"
-#include "mcdi.h"
+#include "mcdid.h"
 
 static struct rpmsg_device_id cdx_rpmsg_id_table[] = {
 	{ .name = "mcdi_ipc" },

drivers/cdx/controller/mcdi.c

Lines changed: 41 additions & 2 deletions
@@ -23,9 +23,10 @@
 #include <linux/log2.h>
 #include <linux/net_tstamp.h>
 #include <linux/wait.h>
+#include <linux/cdx/bitfield.h>
 
-#include "bitfield.h"
-#include "mcdi.h"
+#include <linux/cdx/mcdi.h>
+#include "mcdid.h"
 
 static void cdx_mcdi_cancel_cmd(struct cdx_mcdi *cdx, struct cdx_mcdi_cmd *cmd);
 static void cdx_mcdi_wait_for_cleanup(struct cdx_mcdi *cdx);
@@ -99,6 +100,19 @@ static unsigned long cdx_mcdi_rpc_timeout(struct cdx_mcdi *cdx, unsigned int cmd
 	return cdx->mcdi_ops->mcdi_rpc_timeout(cdx, cmd);
 }
 
+/**
+ * cdx_mcdi_init - Initialize MCDI (Management Controller Driver Interface) state
+ * @cdx: Handle to the CDX MCDI structure
+ *
+ * This function allocates and initializes internal MCDI structures and resources
+ * for the CDX device, including the workqueue, locking primitives, and command
+ * tracking mechanisms. It sets the initial operating mode and prepares the device
+ * for MCDI operations.
+ *
+ * Return:
+ * * 0 - on success
+ * * -ENOMEM - if memory allocation or workqueue creation fails
+ */
 int cdx_mcdi_init(struct cdx_mcdi *cdx)
 {
 	struct cdx_mcdi_iface *mcdi;
@@ -128,7 +142,16 @@ int cdx_mcdi_init(struct cdx_mcdi *cdx)
 fail:
 	return rc;
 }
+EXPORT_SYMBOL_GPL(cdx_mcdi_init);
 
+/**
+ * cdx_mcdi_finish - Cleanup MCDI (Management Controller Driver Interface) state
+ * @cdx: Handle to the CDX MCDI structure
+ *
+ * This function is responsible for cleaning up the MCDI (Management Controller Driver Interface)
+ * resources associated with a cdx_mcdi structure. Also destroys the mcdi workqueue.
+ *
+ */
 void cdx_mcdi_finish(struct cdx_mcdi *cdx)
 {
 	struct cdx_mcdi_iface *mcdi;
@@ -143,6 +166,7 @@ void cdx_mcdi_finish(struct cdx_mcdi *cdx)
 	kfree(cdx->mcdi);
 	cdx->mcdi = NULL;
 }
+EXPORT_SYMBOL_GPL(cdx_mcdi_finish);
 
 static bool cdx_mcdi_flushed(struct cdx_mcdi_iface *mcdi, bool ignore_cleanups)
 {
@@ -553,6 +577,19 @@ static void cdx_mcdi_start_or_queue(struct cdx_mcdi_iface *mcdi,
 			cdx_mcdi_cmd_start_or_queue(mcdi, cmd);
 }
 
+/**
+ * cdx_mcdi_process_cmd - Process an incoming MCDI response
+ * @cdx: Handle to the CDX MCDI structure
+ * @outbuf: Pointer to the response buffer received from the management controller
+ * @len: Length of the response buffer in bytes
+ *
+ * This function handles a response from the management controller. It locates the
+ * corresponding command using the sequence number embedded in the header,
+ * completes the command if it is still pending, and initiates any necessary cleanup.
+ *
+ * The function assumes that the response buffer is well-formed and at least one
+ * dword in size.
+ */
 void cdx_mcdi_process_cmd(struct cdx_mcdi *cdx, struct cdx_dword *outbuf, int len)
 {
 	struct cdx_mcdi_iface *mcdi;
@@ -590,6 +627,7 @@ void cdx_mcdi_process_cmd(struct cdx_mcdi *cdx, struct cdx_dword *outbuf, int le
 
 	cdx_mcdi_process_cleanup_list(mcdi->cdx, &cleanup_list);
 }
+EXPORT_SYMBOL_GPL(cdx_mcdi_process_cmd);
 
 static void cdx_mcdi_cmd_work(struct work_struct *context)
 {
@@ -757,6 +795,7 @@ int cdx_mcdi_rpc(struct cdx_mcdi *cdx, unsigned int cmd,
 	return cdx_mcdi_rpc_sync(cdx, cmd, inbuf, inlen, outbuf, outlen,
 				 outlen_actual, false);
 }
+EXPORT_SYMBOL_GPL(cdx_mcdi_rpc);
 
 /**
  * cdx_mcdi_rpc_async - Schedule an MCDI command to run asynchronously

drivers/cdx/controller/mcdi_functions.c

Lines changed: 0 additions & 1 deletion
@@ -5,7 +5,6 @@
 
 #include <linux/module.h>
 
-#include "mcdi.h"
 #include "mcdi_functions.h"
 
 int cdx_mcdi_get_num_buses(struct cdx_mcdi *cdx)

drivers/cdx/controller/mcdi_functions.h

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,8 @@
 #ifndef CDX_MCDI_FUNCTIONS_H
 #define CDX_MCDI_FUNCTIONS_H
 
-#include "mcdi.h"
+#include <linux/cdx/mcdi.h>
+#include "mcdid.h"
 #include "../cdx.h"
 
 /**

drivers/cdx/controller/mcdid.h

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
/* SPDX-License-Identifier: GPL-2.0
2+
*
3+
* Copyright 2008-2013 Solarflare Communications Inc.
4+
* Copyright (C) 2022-2025, Advanced Micro Devices, Inc.
5+
*/
6+
7+
#ifndef CDX_MCDID_H
8+
#define CDX_MCDID_H
9+
10+
#include <linux/mutex.h>
11+
#include <linux/kref.h>
12+
#include <linux/rpmsg.h>
13+
14+
#include "mc_cdx_pcol.h"
15+
16+
#ifdef DEBUG
17+
#define CDX_WARN_ON_ONCE_PARANOID(x) WARN_ON_ONCE(x)
18+
#define CDX_WARN_ON_PARANOID(x) WARN_ON(x)
19+
#else
20+
#define CDX_WARN_ON_ONCE_PARANOID(x) do {} while (0)
21+
#define CDX_WARN_ON_PARANOID(x) do {} while (0)
22+
#endif
23+
24+
#define MCDI_BUF_LEN (8 + MCDI_CTL_SDU_LEN_MAX)
25+
26+
static inline struct cdx_mcdi_iface *cdx_mcdi_if(struct cdx_mcdi *cdx)
27+
{
28+
return cdx->mcdi ? &cdx->mcdi->iface : NULL;
29+
}
30+
31+
int cdx_mcdi_rpc_async(struct cdx_mcdi *cdx, unsigned int cmd,
32+
const struct cdx_dword *inbuf, size_t inlen,
33+
cdx_mcdi_async_completer *complete,
34+
unsigned long cookie);
35+
int cdx_mcdi_wait_for_quiescence(struct cdx_mcdi *cdx,
36+
unsigned int timeout_jiffies);
37+
38+
/*
39+
* We expect that 16- and 32-bit fields in MCDI requests and responses
40+
* are appropriately aligned, but 64-bit fields are only
41+
* 32-bit-aligned.
42+
*/
43+
#define MCDI_BYTE(_buf, _field) \
44+
((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 1), \
45+
*MCDI_PTR(_buf, _field))
46+
#define MCDI_WORD(_buf, _field) \
47+
((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 2), \
48+
le16_to_cpu(*(__force const __le16 *)MCDI_PTR(_buf, _field)))
49+
#define MCDI_POPULATE_DWORD_1(_buf, _field, _name1, _value1) \
50+
CDX_POPULATE_DWORD_1(*_MCDI_DWORD(_buf, _field), \
51+
MC_CMD_ ## _name1, _value1)
52+
#define MCDI_SET_QWORD(_buf, _field, _value) \
53+
do { \
54+
CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[0], \
55+
CDX_DWORD, (u32)(_value)); \
56+
CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[1], \
57+
CDX_DWORD, (u64)(_value) >> 32); \
58+
} while (0)
59+
#define MCDI_QWORD(_buf, _field) \
60+
(CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[0], CDX_DWORD) | \
61+
(u64)CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[1], CDX_DWORD) << 32)
62+
63+
#endif /* CDX_MCDID_H */
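
The comment in mcdid.h notes that 64-bit MCDI fields are only 32-bit aligned, which is why MCDI_SET_QWORD and MCDI_QWORD work on two consecutive dwords instead of doing a single 64-bit access. The following is a minimal userspace sketch of that split, assuming plain host-order uint32_t values in place of struct cdx_dword. The real macros go through CDX_POPULATE_DWORD_1 and CDX_DWORD_FIELD, which are expected to handle the little-endian wire format as well; that conversion is deliberately left out here. The names demo_set_qword and demo_get_qword are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value as two 32-bit halves: dwords[0] low, dwords[1] high. */
static void demo_set_qword(uint32_t dwords[2], uint64_t value)
{
	assert(dwords != NULL);
	dwords[0] = (uint32_t)value;		/* low 32 bits */
	dwords[1] = (uint32_t)(value >> 32);	/* high 32 bits */
}

/* Reassemble the 64-bit value; widen before shifting to keep the high half. */
static uint64_t demo_get_qword(const uint32_t dwords[2])
{
	assert(dwords != NULL);
	return (uint64_t)dwords[0] | ((uint64_t)dwords[1] << 32);
}
```

Because each half is accessed as its own 32-bit quantity, the buffer only needs 4-byte alignment, which matches the alignment guarantee stated in the header comment.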

drivers/edac/Kconfig

Lines changed: 16 additions & 0 deletions
@@ -576,4 +576,20 @@ config EDAC_LOONGSON
 	  errors (CE) only. Loongson-3A5000/3C5000/3D5000/3A6000/3C6000
 	  are compatible.
 
+config EDAC_CORTEX_A72
+	tristate "ARM Cortex A72"
+	depends on ARM64
+	help
+	  Support for L1/L2 cache error detection for ARM Cortex A72 processor.
+	  The detected and reported errors are from reading CPU/L2 memory error
+	  syndrome registers.
+
+config EDAC_VERSALNET
+	tristate "AMD VersalNET DDR Controller"
+	depends on CDX_CONTROLLER && ARCH_ZYNQMP
+	help
+	  Support for single bit error correction, double bit error detection
+	  and other system errors from various IP subsystems like RPU, NOCs,
+	  HNICX, PL on the AMD Versal NET DDR memory controller.
+
 endif # EDAC
