diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-01-08 16:03:00 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-01-08 16:03:00 -0800 |
commit | 3edbe8afb617a736ae0dcc877311bdb112a00123 (patch) | |
tree | 7f9abbb39c12245bea8014310a32b4367cb025c4 /Documentation | |
parent | bef91c28f28fe8a36b91e9a39f60054ae1874280 (diff) | |
parent | 1f68ce2a027250aeeb1756391110cdc4dc97c797 (diff) |
Merge tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS updates from Borislav Petkov:
- Convert the hw error storm handling into a finer-grained, per-bank
solution which allows for more timely detection and reporting of
errors
- Start a documentation section which will hold down relevant RAS
features description and how they should be used
- Add new AMD error bank types
- Slim down and remove error type descriptions from the kernel side of
error decoding to rasdaemon which can be used from now on to decode
hw errors on AMD
- Mark pages containing uncorrectable errors as poison so that kdump
can avoid them and thus not cause another panic
- The usual cleanups and fixlets
* tag 'ras_core_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Handle Intel threshold interrupt storms
x86/mce: Add per-bank CMCI storm mitigation
x86/mce: Remove old CMCI storm mitigation code
Documentation: Begin a RAS section
x86/MCE/AMD: Add new MA_LLC, USR_DP, and USR_CP bank types
EDAC/mce_amd: Remove SMCA Extended Error code descriptions
x86/mce/amd, EDAC/mce_amd: Move long names to decoder module
x86/mce/inject: Clear test status value
x86/mce: Remove redundant check from mce_device_create()
x86/mce: Mark fatal MCE's page as poison to avoid panic in the kdump kernel
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/RAS/ras.rst | 26 | ||||
-rw-r--r-- | Documentation/index.rst | 1 |
2 files changed, 27 insertions, 0 deletions
diff --git a/Documentation/RAS/ras.rst b/Documentation/RAS/ras.rst new file mode 100644 index 000000000000..2556b397cd27 --- /dev/null +++ b/Documentation/RAS/ras.rst @@ -0,0 +1,26 @@ +.. SPDX-License-Identifier: GPL-2.0 + +Reliability, Availability and Serviceability features +===================================================== + +This documents different aspects of the RAS functionality present in the +kernel. + +Error decoding +--------------- + +* x86 + +Error decoding on AMD systems should be done using the rasdaemon tool: +https://github.com/mchehab/rasdaemon/ + +While the daemon is running, it would automatically log and decode +errors. If not, one can still decode such errors by supplying the +hardware information from the error:: + + $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca + +Also, the user can pass particular family and model to decode the error +string:: + + $ rasdaemon -p --status <STATUS> --ipid <IPID> --smca --family <CPU Family> --model <CPU Model> --bank <BANK_NUM> diff --git a/Documentation/index.rst b/Documentation/index.rst index 9dfdc826618c..36e61783437c 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -113,6 +113,7 @@ to ReStructured Text format, or are simply too old. :maxdepth: 1 staging/index + RAS/ras Translations |