summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
AgeCommit message (Collapse)Author
2023-11-29drm/amdgpu: Update EEPROM I2C address for smu v13_0_0Candice Li
Check smu v13_0_0 SKU type to select EEPROM I2C address. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-10-05drm/amdgpu: Fix complex macros errorSrinivasan Shanmugam
Fixes the below: ERROR: Macros with complex values should be enclosed in parentheses WARNING: macros should not use a trailing semicolon +#define amdgpu_inc_vram_lost(adev) atomic_inc(&((adev)->vram_lost_counter)); Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26drm/amdgpu: change if condition for bad channel bitmap updateTao Zhou
The amdgpu_ras_eeprom_control.bad_channel_bitmap is u32 type, but the channel index could be larger than 32. For the ASICs whose channel number is more than 32, the amdgpu_dpm_send_hbm_bad_channel_flag interface is not supported, so we simply bypass channel bitmap update under this condition. v2: replace sizeof with BITS_PER_TYPE, we should check bit number instead of byte number. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20drm/amdgpu: Use function for IP version checkLijo Lazar
Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-30drm/amdgpu: Only support RAS EEPROM on dGPU platformCandice Li
RAS EEPROM device is only supported on dGPU platform for smu v13_0_6. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-15drm/amdgpu: Add I2C EEPROM support on smu v13_0_6Candice Li
Support I2C EEPROM on smu v13_0_6. v2: Move IP_VERSION(13, 0, 6) ahead of IP_VERSION(13, 0, 10). Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-21drm/amd: Avoid reading the VBIOS part number twiceMario Limonciello
The VBIOS part number is read both in amdgpu_atom_parse() as well as in atom_get_vbios_pn() and stored twice in the `struct atom_context` structure. Remove the first unnecessary read and move the `pr_info` line from that read into the second. v2: squash in unused variable removal Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: Fix kdoc warningSrinivasan Shanmugam
Fixes the following gcc with W=1: drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c:76: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * EEPROM Table structure v1 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c:98: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * EEPROM Table structrue v2.1 Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Set EEPROM ras infoStanley.Yang
Set EEPROM ras info: rma status, health percent and bad page threshold. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Calculate EEPROM table ras info bytes sumStanley.Yang
It's more reasonable to check EEPROM table ras info bytes. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add support EEPROM table v2.1Stanley.Yang
Add ras info to EEPROM table, app can analyse device ECC status without GPU driver through EEPROM table ras info. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add RAS table v2.1 macro definitionStanley.Yang
Add RAS EEPROM table version 2.1 macro definition. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Rename ras table versionStanley.Yang
Rename RAS_TABLE_VER to RAS_TABLE_VER_V1, move RAS_TABLE_VER_V1 from amdgpu_ras_eeprom.c to amdgpu_ras_eeprom.h. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-13drm/amdgpu: simplify amdgpu_ras_eeprom.cAlex Deucher
All chips that support RAS also support IP discovery, so use the IP versions rather than a mix of IP versions and asic types. Checking the validity of the atom_ctx pointer is not required as the vbios is already fetched at this point. v2: add comments to id asic types based on feedback from Luben Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com>
2023-03-31drm/amdgpu: Return from switch early for EEPROM I2C addressLuben Tuikov
As soon as control->i2c_address is set, return; remove the "break;" from the switch--it is unnecessary. This mimics what happens when for some cases in the switch, we call helper functions with "return <helper function>". Remove final function "return true;" to indicate that the switch is final and terminal, and that there should be no code after the switch. Cc: Candice Li <candice.li@amd.com> Cc: Kent Russell <kent.russell@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-31drm/amdgpu: Remove second moot switch to set EEPROM I2C addressLuben Tuikov
Remove second switch since it already has its own function and case in the first switch. This also avoids requalifying the EEPROM I2C address for VEGA20, SIENNA CICHLID, and ALDEBARAN, as those have been set by the first switch and shouldn't match SMU v13.0.x. Cc: Candice Li <candice.li@amd.com> Cc: Kent Russell <kent.russell@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Fixes: 158225294683 ("drm/amdgpu: Add EEPROM I2C address for smu v13_0_0") Fixes: c9bdc6c3cf39 ("drm/amdgpu: Add EEPROM I2C address support for ip discovery") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23drm/amdgpu: add bad_page_threshold check in ras_eeprom_check_errTao Zhou
bad_page_threshold controls page retirement behavior and it should be also checked. v2: simplify the condition of bad page handling path. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23drm/amdgpu: change default behavior of bad_page_threshold parameterTao Zhou
Ignore ras umc bad page threshold by default, GPU initialization won't be stopped in this mode. v2: refine the description of bad_page_threshold. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-17drm/amdgpu: Add support for RAS table at 0x40000Luben Tuikov
Add support for RAS table at I2C EEPROM address of 0x40000, since on some ASICs it is not at 0, but at 0x40000. Cc: Alex Deucher <Alexander.Deucher@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Tested-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09drm/amdgpu: Decouple RAS EEPROM addresses from chipsLuben Tuikov
Abstract RAS I2C EEPROM addresses from chip names, and set their macro definition names to the address they set, not the chip they attach to. Since most chips either use I2C EEPROM address 0 or 40000h for the RAS table start offset, this leaves us with only two macro definitions as opposed to five, and removes the redundancy of four. Cc: Candice Li <candice.li@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09drm/amdgpu: Remove redundant I2C EEPROM addressLuben Tuikov
Remove redundant EEPROM_I2C_MADDR_54H address, since we already have it represented (ARCTURUS), and since we don't include the I2C device type identifier in EEPROM memory addresses, i.e. that high up in the device abstraction--we only use EEPROM memory addresses, as memory is continuously represented by EEPROM device(s) on the I2C bus. Add a comment describing what these memory addresses are, how they come about and how they're usually extracted from the device address byte. Cc: Candice Li <candice.li@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Fixes: c9bdc6c3cf39df ("drm/amdgpu: Add EEPROM I2C address support for ip discovery") Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27drm/amdgpu: Add EEPROM I2C address support for ip discoveryCandice Li
1. Update EEPROM_I2C_MADDR_SMU_13_0_0 to EEPROM_I2C_MADDR_54H 2. Add EEPROM I2C address support for smu v13_0_0 and v13_0_10. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-27drm/amdgpu: Update ras eeprom support for smu v13_0_0 and v13_0_10Candice Li
Enable RAS EEPROM support for smu v13_0_0 and v13_0_10. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdgpu: Add EEPROM I2C address for smu v13_0_0Candice Li
Set correct EEPROM I2C address for smu v13_0_0. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15drm/amdgpu: message smu to update bad channel infoStanley.Yang
It should notice SMU to update bad channel info when detected uncorrectable error in UMC block Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-25Merge tag 'drm-misc-next-2022-02-23' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.18: UAPI Changes: Cross-subsystem Changes: - Split out panel-lvds and lvds dt bindings . - Put yes/no on/off disabled/enabled strings in linux/string_helpers.h and use it in drivers and tomoyo. - Clarify dma_fence_chain and dma_fence_array should never include eachother. - Flatten chains in syncobj's. - Don't double add in fbdev/defio when page is already enlisted. - Don't sort deferred-I/O pages by default in fbdev. Core Changes: - Fix missing pm_runtime_put_sync in bridge. - Set modifier support to only linear fb modifier if drivers don't advertise support. - As a result, we remove allow_fb_modifiers. - Add missing clear for EDID Deep Color Modes in drm_reset_display_info. - Assorted documentation updates. - Warn once in drm_clflush if there is no arch support. - Add missing select for dp helper in drm_panel_edp. - Assorted small fixes. - Improve fb-helper's clipping handling. - Don't dump shmem mmaps in a core dump. - Add accounting to ttm resource manager, and use it in amdgpu. - Allow querying the detected eDP panel through debugfs. - Add helpers for xrgb8888 to 8 and 1 bits gray. - Improve drm's buddy allocator. - Add selftests for the buddy allocator. Driver Changes: - Add support for nomodeset to a lot of drm drivers. - Use drm_module_*_driver in a lot of drm drivers. - Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb, nouveau, bridge/dw-hdmi, panfrost, lima, ingenic, sprd, bridge/anx7625, ti-sn65dsi86. - Add bridge/it6505. - Create DP and DVI-I connectors in ast. - Assorted nouveau backlight fixes. - Rework amdgpu reset handling. - Add dt bindings for ingenic,jz4780-dw-hdmi. - Support reading edid through aux channel in ingenic. - Add a drm driver for Solomon SSD130x OLED displays. - Add simple support for sharp LQ140M1JW46. - Add more panels to nt35560. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/686ec871-e77f-c230-22e5-9e3bb80f064a@linux.intel.com
2022-02-11drm/amdgpu: Reset OOB table error count infoStanley.Yang
The OOB table error count info should be reset after reset eeprom table Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amdgpu: Move reset sem into reset_domainAndrey Grodzovsky
We want single instance of reset sem across all reset clients because in case of XGMI we should stop access cross device MMIO because any of them could be in a reset in the moment. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74117.html
2022-01-27drm/amd: Expose the FRU SMU I2C busLuben Tuikov
Expose both SMU I2C buses. Some boards use the same bus for both the RAS and FRU EEPROMs and others use different buses. This enables the additional I2C bus and sets the right buses to use for RAS and FRU EEPROM access. Cc: Roy Sun <Roy.Sun@amd.com> Co-developed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28drm/amdgpu: Add kernel parameter support for ignoring bad page thresholdKent Russell
When a GPU hits the bad_page_threshold, it will not be initialized by the amdgpu driver. This means that the table cannot be cleared, nor can information gathering be performed (getting serial number, BDF, etc). If the bad_page_threshold kernel parameter is set to -2, continue to initialize the GPU, while printing a warning to dmesg that this action has been done v2: squash in Luben's fix to restore RAS info reporting Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mukul Joshi <Mukul.Joshi@amd.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28drm/amdgpu: Warn when bad pages approaches 90% thresholdKent Russell
dmesg doesn't warn when the number of bad pages approaches the threshold for page retirement. WARN when the number of bad pages is at 90% or greater for easier checks and planning, instead of waiting until the GPU is full of bad pages. Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mukul Joshi <Mukul.Joshi@amd.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-20drm/amdgpu: Clarify error when hitting bad page thresholdKent Russell
Change the error message when the bad_page_threshold is reached, explicitly stating that the GPU will not be initialized. Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Mukul Joshi <Mukul.Joshi@amd.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-09-16drm/amdgpu: Drop inline from amdgpu_ras_eeprom_max_record_countMichel Dänzer
This was unusual; normally, inline functions are declared static as well, and defined in a header file if used by multiple compilation units. The latter would be more involved in this case, so just drop the inline declaration for now. Fixes compile failure building for ppc64le on RHEL 8: In file included from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h:32, from ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:33: ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c: In function ‘amdgpu_ras_recovery_init’: ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.h:90:17: error: inlining failed in call to ‘always_inline’ ‘amdgpu_ras_eeprom_max_record_count’: function body not available 90 | inline uint32_t amdgpu_ras_eeprom_max_record_count(void); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:1985:34: note: called from here 1985 | max_eeprom_records_len = amdgpu_ras_eeprom_max_record_count(); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: c84d46707ebb "drm/amdgpu: validate bad page threshold in ras(v3)" Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-30drm/amdgpu: Process any VBIOS RAS EEPROM addressLuben Tuikov
We can now process any RAS EEPROM address from VBIOS. Generalize so as to compute the top three bits of the 19-bit EEPROM address, from any byte returned as the "i2c address" from VBIOS. Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-05drm/amdgpu: set RAS EEPROM address from VBIOSJohn Clements
update to latest atombios fw table Signed-off-by: John Clements <john.clements@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-08drm/amdgpu: return -EFAULT if copy_to_user() failsDan Carpenter
If copy_to_user() fails then this should return -EFAULT instead of -EINVAL. Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-08drm/amdgpu: unlock on error in amdgpu_ras_debugfs_table_read()Dan Carpenter
This error path needs to unlock before returning. While we're at it, the correct error code from copy_to_user() failure is -EFAULT, not -EINVAL. Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-08drm/amdgpu: fix a signedness bug in __verify_ras_table_checksum()Dan Carpenter
If amdgpu_eeprom_read() returns a negative error code then the error handling checks: if (res < buf_size) { The problem is that "buf_size" is a u32 so negative values are type promoted to a high positive values and the condition is false. Fix this by changing the type of "buf_size" to int. Fixes: 63d4c081a556a1 ("drm/amdgpu: Optimize EEPROM RAS table I/O") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: fix 64 bit divide in eeprom codeAlex Deucher
pos is 64 bits. Fixes: c65b0805e77919 ("drm/amdgpu: RAS EEPROM table is now in debugfs") Cc: luben.tuikov@amd.com Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: RAS EEPROM table is now in debugfsLuben Tuikov
Add "ras_eeprom_size" file in debugfs, which reports the maximum size allocated to the RAS table in EEROM, as the number of bytes and the number of records it could store. For instance, $cat /sys/kernel/debug/dri/0/ras/ras_eeprom_size 262144 bytes or 10921 records $_ Add "ras_eeprom_table" file in debugfs, which dumps the RAS table stored EEPROM, in a formatted way. For instance, $cat ras_eeprom_table Signature Version FirstOffs Size Checksum 0x414D4452 0x00010000 0x00000014 0x000000EC 0x000000DA Index Offset ErrType Bank/CU TimeStamp Offs/Addr MemChl MCUMCID RetiredPage 0 0x00014 ue 0x00 0x00000000607608DC 0x000000000000 0x00 0x00 0x000000000000 1 0x0002C ue 0x00 0x00000000607608DC 0x000000001000 0x00 0x00 0x000000000001 2 0x00044 ue 0x00 0x00000000607608DC 0x000000002000 0x00 0x00 0x000000000002 3 0x0005C ue 0x00 0x00000000607608DC 0x000000003000 0x00 0x00 0x000000000003 4 0x00074 ue 0x00 0x00000000607608DC 0x000000004000 0x00 0x00 0x000000000004 5 0x0008C ue 0x00 0x00000000607608DC 0x000000005000 0x00 0x00 0x000000000005 6 0x000A4 ue 0x00 0x00000000607608DC 0x000000006000 0x00 0x00 0x000000000006 7 0x000BC ue 0x00 0x00000000607608DC 0x000000007000 0x00 0x00 0x000000000007 8 0x000D4 ue 0x00 0x00000000607608DD 0x000000008000 0x00 0x00 0x000000000008 $_ Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Xinhui Pan <xinhui.pan@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: Optimize EEPROM RAS table I/OLuben Tuikov
Split functionality between read and write, which simplifies the code and exposes areas of optimization and more or less complexity, and take advantage of that. Read and write the table in one go; use a separate stage to decode or encode the data, as opposed to on the fly, which keeps the I2C bus busy. Use a single read/write to read/write the table or at most two if the number of records we're reading/writing wraps around. Check the check-sum of a table in EEPROM on init. Update the checksum at the same time as when updating the table header signature, when the threshold was increased on boot. Take advantage of arithmetic modulo 256, that is, use a byte!, to greatly simplify checksum arithmetic. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: Get rid of test functionLuben Tuikov
The code is now tested from userspace. Remove already macroed out test function. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: Some renamesLuben Tuikov
Qualify with "ras_". Use kernel's own--don't redefine your own. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: Nerf buffLuben Tuikov
buff --> buf. Essentially buffer abbreviates to buf, remove 1/2 of it, or just the iron part, as opposed to just the Er, Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: Use explicit cardinality for clarityLuben Tuikov
RAS_MAX_RECORD_NUM may mean the maximum record number, as in the maximum house number on your street, or it may mean the maximum number of records, as in the count of records, which is also a number. To make this distinction whether the number is ordinal (index) or cardinal (count), rename this macro to RAS_MAX_RECORD_COUNT. This makes it easy to understand what it refers to, especially when we compute quantities such as, how many records do we have left in the table, especially when there are so many other numbers, quantities and numerical macros around. Also rename the long, amdgpu_ras_eeprom_get_record_max_length() to the more succinct and clear, amdgpu_ras_eeprom_max_record_count(). When computing the threshold, which also deals with counts, i.e. "how many", use cardinal "max_eeprom_records_count", than the quantitative "max_eeprom_records_len". Simplify the logic here and there, as well. Cc: Guchun Chen <guchun.chen@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: Simplify RAS EEPROM checksum calculationsLuben Tuikov
Rename update_table_header() to write_table_header() as this function is actually writing it to EEPROM. Use kernel types; use u8 to carry around the checksum, in order to take advantage of arithmetic modulo 8-bits (256). Tidy up to 80 columns. When updating the checksum, just recalculate the whole thing. Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: Fix amdgpu_ras_eeprom_init()Luben Tuikov
No need to account for the 2 bytes of EEPROM address--this is now well abstracted away by the fixes the the lower layers. Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: Return result fix in RASLuben Tuikov
The low level EEPROM write method, doesn't return 1, but the number of bytes written. Thus do not compare to 1, instead, compare to greater than 0 for success. Other cleanup: if the lower layers returned -errno, then return that, as opposed to overwriting the error code with one-fits-all -EINVAL. For instance, some return -EAGAIN. Cc: Jean Delvare <jdelvare@suse.de> Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Lijo Lazar <Lijo.Lazar@amd.com> Cc: Stanley Yang <Stanley.Yang@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: EEPROM: add explicit read and writeLuben Tuikov
Add explicit amdgpu_eeprom_read() and amdgpu_eeprom_write() for clarity. Cc: Jean Delvare <jdelvare@suse.de> Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Lijo Lazar <Lijo.Lazar@amd.com> Cc: Stanley Yang <Stanley.Yang@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-01drm/amdgpu: RAS xfer to read/writeLuben Tuikov
Wrap amdgpu_ras_eeprom_xfer(..., bool write), into amdgpu_ras_eeprom_read() and amdgpu_ras_eeprom_write(), as that makes reading and understanding the code clearer. Cc: Jean Delvare <jdelvare@suse.de> Cc: Alexander Deucher <Alexander.Deucher@amd.com> Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com> Cc: Lijo Lazar <Lijo.Lazar@amd.com> Cc: Stanley Yang <Stanley.Yang@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>