Commit e2349c5

Merge remote-tracking branches 'ras/edac-amd-atl', 'ras/edac-drivers' and 'ras/edac-misc' into edac-updates
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
3 parents 69acbdb + ef1b6d9 + 24e3848 commit e2349c5

16 files changed

Lines changed: 750 additions & 641 deletions

Documentation/admin-guide/RAS/main.rst

Lines changed: 3 additions & 139 deletions
@@ -406,24 +406,8 @@ index of the MC::
 |->mc2
 ....
 
-Under each ``mcX`` directory each ``csrowX`` is again represented by a
-``csrowX``, where ``X`` is the csrow index::
-
-.../mc/mc0/
-|
-|->csrow0
-|->csrow2
-|->csrow3
-....
-
-Notice that there is no csrow1, which indicates that csrow0 is composed
-of a single ranked DIMMs. This should also apply in both Channels, in
-order to have dual-channel mode be operational. Since both csrow2 and
-csrow3 are populated, this indicates a dual ranked set of DIMMs for
-channels 0 and 1.
-
-Within each of the ``mcX`` and ``csrowX`` directories are several EDAC
-control and attribute files.
+Within each of the ``mcX`` directory are several EDAC control and
+attribute files.
 
 ``mcX`` directories
 -------------------
@@ -569,134 +553,14 @@ this ``X`` memory module:
 - Unbuffered-DDR
 
 .. [#f5] On some systems, the memory controller doesn't have any logic
-to identify the memory module. On such systems, the directory is called ``rankX`` and works on a similar way as the ``csrowX`` directories.
+to identify the memory module. On such systems, the directory is called ``rankX``.
 On modern Intel memory controllers, the memory controller identifies the
 memory modules directly. On such systems, the directory is called ``dimmX``.
 
 .. [#f6] There are also some ``power`` directories and ``subsystem``
 symlinks inside the sysfs mapping that are automatically created by
 the sysfs subsystem. Currently, they serve no purpose.
 
-``csrowX`` directories
-----------------------
-
-When CONFIG_EDAC_LEGACY_SYSFS is enabled, sysfs will contain the ``csrowX``
-directories. As this API doesn't work properly for Rambus, FB-DIMMs and
-modern Intel Memory Controllers, this is being deprecated in favor of
-``dimmX`` directories.
-
-In the ``csrowX`` directories are EDAC control and attribute files for
-this ``X`` instance of csrow:
-
-
-- ``ue_count`` - Total Uncorrectable Errors count attribute file
-
-This attribute file displays the total count of uncorrectable
-errors that have occurred on this csrow. If panic_on_ue is set
-this counter will not have a chance to increment, since EDAC
-will panic the system.
-
-
-- ``ce_count`` - Total Correctable Errors count attribute file
-
-This attribute file displays the total count of correctable
-errors that have occurred on this csrow. This count is very
-important to examine. CEs provide early indications that a
-DIMM is beginning to fail. This count field should be
-monitored for non-zero values and report such information
-to the system administrator.
-
-
-- ``size_mb`` - Total memory managed by this csrow attribute file
-
-This attribute file displays, in count of megabytes, the memory
-that this csrow contains.
-
-
-- ``mem_type`` - Memory Type attribute file
-
-This attribute file will display what type of memory is currently
-on this csrow. Normally, either buffered or unbuffered memory.
-Examples:
-
-- Registered-DDR
-- Unbuffered-DDR
-
-
-- ``edac_mode`` - EDAC Mode of operation attribute file
-
-This attribute file will display what type of Error detection
-and correction is being utilized.
-
-
-- ``dev_type`` - Device type attribute file
-
-This attribute file will display what type of DRAM device is
-being utilized on this DIMM.
-Examples:
-
-- x1
-- x2
-- x4
-- x8
-
-
-- ``ch0_ce_count`` - Channel 0 CE Count attribute file
-
-This attribute file will display the count of CEs on this
-DIMM located in channel 0.
-
-
-- ``ch0_ue_count`` - Channel 0 UE Count attribute file
-
-This attribute file will display the count of UEs on this
-DIMM located in channel 0.
-
-
-- ``ch0_dimm_label`` - Channel 0 DIMM Label control file
-
-
-This control file allows this DIMM to have a label assigned
-to it. With this label in the module, when errors occur
-the output can provide the DIMM label in the system log.
-This becomes vital for panic events to isolate the
-cause of the UE event.
-
-DIMM Labels must be assigned after booting, with information
-that correctly identifies the physical slot with its
-silk screen label. This information is currently very
-motherboard specific and determination of this information
-must occur in userland at this time.
-
-
-- ``ch1_ce_count`` - Channel 1 CE Count attribute file
-
-
-This attribute file will display the count of CEs on this
-DIMM located in channel 1.
-
-
-- ``ch1_ue_count`` - Channel 1 UE Count attribute file
-
-
-This attribute file will display the count of UEs on this
-DIMM located in channel 0.
-
-
-- ``ch1_dimm_label`` - Channel 1 DIMM Label control file
-
-This control file allows this DIMM to have a label assigned
-to it. With this label in the module, when errors occur
-the output can provide the DIMM label in the system log.
-This becomes vital for panic events to isolate the
-cause of the UE event.
-
-DIMM Labels must be assigned after booting, with information
-that correctly identifies the physical slot with its
-silk screen label. This information is currently very
-motherboard specific and determination of this information
-must occur in userland at this time.
-
 
 System Logging
 --------------

arch/loongarch/configs/loongson3_defconfig

Lines changed: 0 additions & 1 deletion
@@ -917,7 +917,6 @@ CONFIG_MMC=y
 CONFIG_MMC_LOONGSON2=m
 CONFIG_INFINIBAND=m
 CONFIG_EDAC=y
-# CONFIG_EDAC_LEGACY_SYSFS is not set
 CONFIG_EDAC_LOONGSON=y
 CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_EFI=y

drivers/edac/Kconfig

Lines changed: 12 additions & 8 deletions
@@ -23,14 +23,6 @@ menuconfig EDAC
 
 if EDAC
 
-config EDAC_LEGACY_SYSFS
-	bool "EDAC legacy sysfs"
-	default y
-	help
-	  Enable the compatibility sysfs nodes.
-	  Use 'Y' if your edac utilities aren't ported to work with the newer
-	  structures.
-
 config EDAC_DEBUG
 	bool "Debugging"
 	select DEBUG_FS
@@ -291,6 +283,18 @@ config EDAC_I10NM
 	  system has non-volatile DIMMs you should also manually
 	  select CONFIG_ACPI_NFIT.
 
+config EDAC_IMH
+	tristate "Intel Integrated Memory/IO Hub MC"
+	depends on X86_64 && X86_MCE_INTEL && ACPI
+	depends on ACPI_NFIT || !ACPI_NFIT # if ACPI_NFIT=m, EDAC_IMH can't be y
+	select DMI
+	select ACPI_ADXL
+	help
+	  Support for error detection and correction the Intel
+	  Integrated Memory/IO Hub Memory Controller. This MC IP is
+	  first used on the Diamond Rapids servers but may appear on
+	  others in the future.
+
 config EDAC_PND2
 	tristate "Intel Pondicherry2"
 	depends on PCI && X86_64 && X86_MCE_INTEL

drivers/edac/Makefile

Lines changed: 3 additions & 0 deletions
@@ -65,6 +65,9 @@ obj-$(CONFIG_EDAC_SKX) += skx_edac.o skx_edac_common.o
 i10nm_edac-y := i10nm_base.o
 obj-$(CONFIG_EDAC_I10NM) += i10nm_edac.o skx_edac_common.o
 
+imh_edac-y := imh_base.o
+obj-$(CONFIG_EDAC_IMH) += imh_edac.o skx_edac_common.o
+
 obj-$(CONFIG_EDAC_HIGHBANK_MC) += highbank_mc_edac.o
 obj-$(CONFIG_EDAC_HIGHBANK_L2) += highbank_l2_edac.o
 
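
Note for userspace tooling: this merge drops CONFIG_EDAC_LEGACY_SYSFS and the ``csrowX`` directories, so per-module error counters would have to come from the ``dimmX`` (or ``rankX``) directories under each ``mcX``. The following is a minimal sketch rather than code from this commit. It assumes the usual EDAC sysfs root ``/sys/devices/system/edac/mc`` and the ``dimm_ce_count`` / ``dimm_ue_count`` attribute names, neither of which appears in this diff; ``read_count`` and ``collect_error_counts`` are hypothetical helper names.

```python
from pathlib import Path

# Assumption: the usual EDAC sysfs root; pass another root when testing.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_count(path):
    """Return the integer stored in a sysfs counter file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def collect_error_counts(root=EDAC_MC_ROOT):
    """Map 'mcX/dimmY' (or 'mcX/rankY') to its CE/UE counters."""
    counts = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        # The per-module directory is named dimmX or rankX depending on
        # whether the memory controller can identify the module itself.
        modules = sorted(mc.glob("dimm[0-9]*")) + sorted(mc.glob("rank[0-9]*"))
        for mod in modules:
            counts[f"{mc.name}/{mod.name}"] = {
                # Assumed attribute names; a missing file yields None.
                "ce": read_count(mod / "dimm_ce_count"),
                "ue": read_count(mod / "dimm_ue_count"),
            }
    return counts


if __name__ == "__main__":
    for name, c in collect_error_counts().items():
        print(f"{name}: ce={c['ce']} ue={c['ue']}")
```

On a machine without an EDAC driver loaded the function simply returns an empty mapping, so a monitoring script built on it can treat "no data" and "no errors" separately.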
