diff options
author | Mauro Carvalho Chehab <mchehab+huawei@kernel.org> | 2021-09-30 11:44:51 +0200 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2021-10-05 16:24:15 +0200 |
commit | edfc8730ba45eac3cca20dba3799d6ae6c584b56 (patch) | |
tree | 1eef084e6fd0b2a3484e6459140d07c806ab0012 /Documentation/x86 | |
parent | df2205de92975bdaabb115dfef5c513b26b36263 (diff) |
ABI: sysfs-mce: add a new ABI file
Reduce the gap of missing ABIs for Intel servers with MCE
by adding a new ABI file.
The contents of this file comes from:
Documentation/x86/x86_64/machinecheck.rst
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/801a26985e32589eb78ba4b728d3e19fdea18f04.1632994837.git.mchehab+huawei@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- | Documentation/x86/x86_64/machinecheck.rst | 56 |
1 files changed, 2 insertions, 54 deletions
diff --git a/Documentation/x86/x86_64/machinecheck.rst b/Documentation/x86/x86_64/machinecheck.rst index b402e04bee60..cea12ee97200 100644 --- a/Documentation/x86/x86_64/machinecheck.rst +++ b/Documentation/x86/x86_64/machinecheck.rst @@ -21,60 +21,8 @@ from /dev/mcelog. Normally mcelog should be run regularly from a cronjob. Each CPU has a directory in /sys/devices/system/machinecheck/machinecheckN (N = CPU number). -The directory contains some configurable entries: - -bankNctl - (N bank number) - - 64bit Hex bitmask enabling/disabling specific subevents for bank N - When a bit in the bitmask is zero then the respective - subevent will not be reported. - By default all events are enabled. - Note that BIOS maintain another mask to disable specific events - per bank. This is not visible here - -The following entries appear for each CPU, but they are truly shared -between all CPUs. - -check_interval - How often to poll for corrected machine check errors, in seconds - (Note output is hexadecimal). Default 5 minutes. When the poller - finds MCEs it triggers an exponential speedup (poll more often) on - the polling interval. When the poller stops finding MCEs, it - triggers an exponential backoff (poll less often) on the polling - interval. The check_interval variable is both the initial and - maximum polling interval. 0 means no polling for corrected machine - check errors (but some corrected errors might be still reported - in other ways) - -tolerant - Tolerance level. When a machine check exception occurs for a non - corrected machine check the kernel can take different actions. - Since machine check exceptions can happen any time it is sometimes - risky for the kernel to kill a process because it defies - normal kernel locking rules. The tolerance level configures - how hard the kernel tries to recover even at some risk of - deadlock. Higher tolerant values trade potentially better uptime - with the risk of a crash or even corruption (for tolerant >= 3). - - 0: always panic on uncorrected errors, log corrected errors - 1: panic or SIGBUS on uncorrected errors, log corrected errors - 2: SIGBUS or log uncorrected errors, log corrected errors - 3: never panic or SIGBUS, log all errors (for testing only) - - Default: 1 - - Note this only makes a difference if the CPU allows recovery - from a machine check exception. Current x86 CPUs generally do not. - -trigger - Program to run when a machine check event is detected. - This is an alternative to running mcelog regularly from cron - and allows to detect events faster. -monarch_timeout - How long to wait for the other CPUs to machine check too on a - exception. 0 to disable waiting for other CPUs. - Unit: us +The directory contains some configurable entries. See +Documentation/ABI/testing/sysfs-mce for more details. TBD document entries for AMD threshold interrupt configuration |