summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
AgeCommit message (Collapse)Author
2019-08-12drm/amdgpu: support mmhub ras in amdgpu rasTao Zhou
call mmhub ras query/inject in amdgpu ras Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-12drm/amdgpu: add sub block parameter in ras inject commandTao Zhou
ras sub block index could be passed from shell command Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-06drm/amdgpu: update ras sysfs feature infoTao Zhou
remove confused ras error type info Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: replace AMDGPU_RAS_UE with AMDGPU_RAS_SUCCESSTao Zhou
ce can also trigger interrupt, and even both ce and ue error can be found in one ras query, distinguishing between ce and ue in interrupt handler is uncessary. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Suggested-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: support ce interrupt in ras moduleTao Zhou
correctable error can also trigger interrupt in some ras blocks Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-02drm/amdgpu: add error address query for umc rasTao Zhou
umc error address query can get ce/ue error address and clear error status Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: support gfx ras error injection and err_cnt queryDennis Li
check gfx error count in both ras querry function and ras interrupt handler. gfx ras is still disabled by default due to known stability issue found in gpu reset. Signed-off-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: remove ras_reserve_vram in ras injectionTao Zhou
error injection address is not in gpu address space Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add check for ras error typeTao Zhou
only ue and ce errors are supported Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: allow ras interrupt callback to return error dataTao Zhou
add error data as parameter for ras interrupt cb and process it Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add support for recording ras error addressTao Zhou
more than one error address may be recorded in one query Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: switch to amdgpu_umc structureTao Zhou
create new amdgpu_umc structure to for more umc settings in future and switch to the new structure Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: add ras error count after each query (v2)Tao Zhou
v1: increase ras ce/ue error count v2: log the number of correctable and uncorrectable errors Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: querry umc error countHawking Zhang
check umc error count in both ras querry function and ras interrupt handler Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-31drm/amdgpu: move some ras data structure to amdgpu_ras.hHawking Zhang
These are common structures that can be included by IP specific source files Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <dennis.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: drop ras self testHawking Zhang
this function is not needed any more. error injection is the only way to validate ras but it can't be executed in amdgpu_ras_init, where gpu is even not initialized Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: only allow error injection to UMC IP blockHawking Zhang
error injection to other IP blocks (except UMC) will be enabled until RAS feature stablize on those IP blocks Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: do not create ras debugfs/sysfs node for ASICs that don't have ↵Hawking Zhang
ras ability driver shouldn't init any ras debugfs/sysfs node for ASICs that don't have ras hardware ability Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-25Merge branch 'drm-next' into drm-next-5.3Alex Deucher
Backmerge drm-next and fix up conflicts due to drmP.h removal. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-20drm/amdgpu: Do error injection even vram reserve failsxinhui pan
As long as the address is mapped with vram, we can do an error injection. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-14Merge tag 'drm-misc-next-2019-06-14' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.3: UAPI Changes: Cross-subsystem Changes: - Add code to signal all dma-fences when freed with pending signals. - Annotate reservation object access in CONFIG_DEBUG_MUTEXES Core Changes: - Assorted documentation fixes. - Use irqsave/restore spinlock to add crc entry. - Move code around to drm_client, for internal modeset clients. - Make drm_crtc.h and drm_debugfs.h self-contained. - Remove drm_fb_helper_connector. - Add bootsplash to todo. - Fix lock ordering in pan_display_legacy. - Support pinning buffers to current location in gem-vram. - Remove the now unused locking functions from gem-vram. - Remove the now unused kmap-object argument from vram helpers. - Stop checking return value of debugfs_create. - Add atomic encoder enable/disable helpers. - pass drm_atomic_state to atomic connector check. - Add atomic support for bridge enable/disable. - Add self refresh helpers to core. Driver Changes: - Add extra delay to make MTP SDM845 work. - Small fixes to virtio, vkms, sii902x, sii9234, ast, mcde, analogix, rockchip. - Add zpos and ?BGR8888 support to meson. - More removals of drm_os_linux and drmP headers for amd, radeon, sti, r128, r128, savage, sis. - Allow synopsis to unwedge the i2c hdmi bus. - Add orientation quirks for GPD panels. - Edid cleanups and fixing handling for edid < 1.2. - Add runtime pm to stm. - Handle s/r in dw-hdmi. - Add hooks for power on/off to dsi for stm. - Remove virtio dirty tracking code, done in drm core. - Rework BO handling in ast and mgag200. Tiny conflict in drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c, needed #include <linux/slab.h> to make it compile. Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/0e01de30-9797-853c-732f-4a5bd6e61445@linux.intel.com
2019-06-13amdgpu: no need to check return value of debugfs_create functionsGreg Kroah-Hartman
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: xinhui pan <xinhui.pan@amd.com> Cc: Evan Quan <evan.quan@amd.com> Cc: Feifei Xu <Feifei.Xu@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-10drm/amd: drop use of drmP.h in amdgpu.hSam Ravnborg
Delete the unused drmP.h from amdgpu.h. Fix fallout in various files. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-5-sam@ravnborg.org
2019-05-31drm/amdgpu: ras injection use gpu addressxinhui pan
injection need a valid gpu address. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24drm/amd/doc: Add RAS documentation to guideTom St Denis
Acked-by: Slava Abramov <slava.abramov@amd.com> Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24drm/amdgpu: use div64_ul for 32-bit compatibility v1Slava Abramov
v1: replace casting to unsigned long with div64_ul Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Slava Abramov <slava.abramov@amd.com> Tested-by: Slava Abramov <slava.abramov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24drm/amdgpu: ras support suspend/resumexinhui pan
add ras suspend function. rename ras_post_init to amdgpu_ras_resume. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24drm/amdgpu: add badpages sysfs interafcexinhui pan
add badpages node. it will output badpages list in format gpu pfn : gpu page size : flags example 0x00000000 : 0x00001000 : R 0x00000001 : 0x00001000 : R 0x00000002 : 0x00001000 : R 0x00000003 : 0x00001000 : R 0x00000004 : 0x00001000 : R 0x00000005 : 0x00001000 : R 0x00000006 : 0x00001000 : R 0x00000007 : 0x00001000 : P 0x00000008 : 0x00001000 : P 0x00000009 : 0x00001000 : P flags can be one of below characters R: reserved. P: pending for reserve. F: failed to reserve for some reasons. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24drm/amdgpu: handle ras resetxinhui pan
add another flag to allow IP do a gpu reset after device init. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24drm/amdgpu: Issue ras TA disable/enable cmd forcely on bootxinhui pan
Check ras TA error code and return EAGAIN. Issue ras enable/disable cmd without checking currect state. Looks like ras TA will handle current state == target state case. Now driver might need do a reset to satisfy ras TA. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-10drm/amdgpu: Introduce another ras enable functionxinhui pan
Many parts of the whole SW stack can program the ras enablement state during the boot. Now we handle that case by adding one function which check the ras flags and choose different code path. Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-04-10drm/amdgpu: Make default ras error type to nonexinhui pan
Unless IP has implemented its own ras, use ERROR_NONE as the default type. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-27drm/amdgpu: remove per obj debugfs writexinhui pan
there is ras_control node which can do its job. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-27drm/amdgpu: Fix amdgpu ras to ta enums conversionxinhui pan
Add helpes to transalte the two enums. And it will catch bugs easily. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-27drm/amdgpu: use macro instead of enum for flagsxinhui pan
better to use macro. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-27drm/amdgpu: Fix some sanity checkxinhui pan
ras context might be NULL, so move con->h_data after check !con also fix sizeof wrong type while at it. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: fix semicolon.cocci warningskbuild test robot
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:405:2-3: Unneeded semicolon drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:435:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci CC: xinhui pan <xinhui.pan@amd.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add new ras workflow control flagsxinhui pan
add ras post init function. Do some initialization after all IP have finished their late init. Add new member flags which will control the ras work flow. For now, vbios enable ras for us on boot. That might change in the future. So there should be a flag from vbios to tell us if ras is enabled or not on boot. Looks like there is no such info now. Other bits of the flags are reserved to control other parts of ras. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: let ras initialization a little noticeablexinhui pan
add drm info output if ras initialized successfully. add ras atomfirmware sanity check. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Fix lockdep warning more gracelyxinhui pan
lockdep need a static key. Previously we set ignore bit to avoid the warning. Now call sysfs_attr_init to initialize the static key. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-and-Tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Fix ras debugfs data parsexinhui pan
Unzero char is accepted by sscanf, so when data is structure but unexpectedly return error invalid; Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add new member hw_supportedxinhui pan
Currently, it is not clear how ras is supported. Both software and hardware can set the supported. That is confusing. Fix it by adding new member hw_supported. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Fix warning when lockdep is enabledxinhui pan
Set ignore bit to satisfy locpdep. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: lookup vbios table to check ecc capabilityxinhui pan
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add human readable debugfs control support (v2)xinhui pan
Currently, the debugfs control node can't parse bash-like commands. Now add such support for any tester that uses scripts. v2: squash in fixes for input validation Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add debugfs ctrl nodexinhui pan
allow userspace enable/disable ras Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add amdgpu_ras.c to support ras (v2)xinhui pan
add obj management. add feature control. add debugfs infrastructure. add sysfs infrastructure. add IH infrastructure. add recovery infrastructure. It is a framework. Other IPs need call amdgpu_ras_xxx function instead of psp_ras_xxx functions. v2: squash in warning fixes Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>