summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2019-03-19drm/amdgpu: enable IH ring 1&2 for Vega20 as wellChristian König
That doesn't seem to have any negative effects. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: enable IH doorbell for ring 1&2 on VegaChristian König
The doorbells should already be reserved, just enable them. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: change Vega IH ring 1 configChristian König
Disable overflow and enable full drain. This makes fault handling on ring 1 much more reliable since we don't generate back pressure any more. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Only clear dumb buffers if ring is enabledNicholas Kazlauskas
The buffers should be cleared when possible but we also don't want buffer creation to fail in the rare case where the ring isn't ready during the call. This could happen during some suspend/resume sequences. Cc: Christian König <ckoenig.leichtzumerken@gmail.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Clear VRAM for DRM dumb_create buffersNicholas Kazlauskas
The dumb_create API isn't intended for high performance rendering and it's more useful for userspace (ie. IGT) to have them precleared. The bonus here is that we also won't needlessly leak whatever was previously in VRAM, but it also probably wasn't sensitive if it was going through this API. Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: fix semicolon.cocci warningskbuild test robot
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:405:2-3: Unneeded semicolon drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:435:2-3: Unneeded semicolon Remove unneeded semicolon. Generated by: scripts/coccinelle/misc/semicolon.cocci CC: xinhui pan <xinhui.pan@amd.com> Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add new ras workflow control flagsxinhui pan
add ras post init function. Do some initialization after all IP have finished their late init. Add new member flags which will control the ras work flow. For now, vbios enable ras for us on boot. That might change in the future. So there should be a flag from vbios to tell us if ras is enabled or not on boot. Looks like there is no such info now. Other bits of the flags are reserved to control other parts of ras. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: let ras initialization a little noticeablexinhui pan
add drm info output if ras initialized successfully. add ras atomfirmware sanity check. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Fix lockdep warning more gracelyxinhui pan
lockdep need a static key. Previously we set ignore bit to avoid the warning. Now call sysfs_attr_init to initialize the static key. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-and-Tested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Fix ras debugfs data parsexinhui pan
Unzero char is accepted by sscanf, so when data is structure but unexpectedly return error invalid; Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add new member hw_supportedxinhui pan
Currently, it is not clear how ras is supported. Both software and hardware can set the supported. That is confusing. Fix it by adding new member hw_supported. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Fix warning when lockdep is enabledxinhui pan
Set ignore bit to satisfy locpdep. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Fix NULL pointer when ta is missingxinhui pan
Ta is optional, so check if ta firmware is loaded or not. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: fix ras parameter descriptionsEvan Quan
The descriptions of modinfo wrongly show two parameters for each feature(see below). This patch can fix this incorrect outputs. parm: amdgpu_ras_enable:Enable RAS features on the GPU (0 = disable, 1 = enable, -1 = auto (default)) parm: ras_enable:int parm: amdgpu_ras_mask:Mask of RAS features to enable (default 0xffffffff), only valid when ras_enable == 1 parm: ras_mask:uint Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: export both supported and enabled ras featuresxinhui pan
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: lookup vbios table to check ecc capabilityxinhui pan
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: query sram ecc/ecc availability from atombiosHawking Zhang
query sram ecc capability via amdgpu_atomfirmware_ecc_default_enabled query ecc availability via amdgpu_atomfirmware_sram_ecc_supported Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add atomfirmware helper function to query sram ecc capsHawking Zhang
sram ecc capability could be get from firmware_capability field in firmwareinfo table Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add atomfirmware helper function to query ecc statusHawking Zhang
ecc default status (enabled or disabled) could be get from umc_config field in umc_info table Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: handle ras resumexinhui pan
Suspend will put irq, so resume need get irq back. And in the same time, skip other ras initialization. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdkfd: add RAS ECC event support (v3)Eric Huang
RAS ECC event will combine with GPU reset event, due to ECC interrupts are caused by uncorrectable error that triggers GPU reset. v2: Fix misleading-indentation warning v3: fix build with CONFIG_HSA_AMD disabled Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add human readable debugfs control support (v2)xinhui pan
Currently, the debugfs control node can't parse bash-like commands. Now add such support for any tester that uses scripts. v2: squash in fixes for input validation Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: skip gpu reset when ras error occuredxinhui pan
gpu reset is not stable on vega20 A1. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add ioctl query for enabled ras features (v2)xinhui pan
Add a query for userspace to check which RAS features are enabled. v2: squash in warning fix Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Add a new flag to AMDGPU_CTX_OP_QUERY_STATE2xinhui pan
Add AMDGPU_CTX_QUERY2_FLAGS_RAS_CE/UE which indicate if any error happened between previous query and this query. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: enable ras on gmc9xinhui pan
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: enable ras on gfx9 (v2)Feifei Xu
Register ecc interrupts and ecc interrupt handler on gfx9. Add ras support on gfx9 v2: squash in warning fix Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: enable ras on sdma4xinhui pan
register IH, enable ras features on sdma. create sysfs debugfs file for sdma. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: reserve bad pages during recoveryxinhui pan
Mark vram pages with errors as bad and prevent the driver from using them. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add debugfs ctrl nodexinhui pan
allow userspace enable/disable ras Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add amdgpu_ras.c to support ras (v2)xinhui pan
add obj management. add feature control. add debugfs infrastructure. add sysfs infrastructure. add IH infrastructure. add recovery infrastructure. It is a framework. Other IPs need call amdgpu_ras_xxx function instead of psp_ras_xxx functions. v2: squash in warning fixes Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add psp cmd submit timeoutxinhui pan
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add psp v11 ras callbackxinhui pan
Add trigger_error and cure_posion. Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add psp ras subsystem infrastructure (v2)xinhui pan
Add ras fw loading, init, terminate. Add ras cmd submit helper. Add ras feature enable/disable common function. v2: squash in unused variable warning fix Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add psp ras callback func and macroxinhui pan
Define the driver side interface for ras ta. Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add ta_ras_if.hxinhui pan
Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add module parameters for rasxinhui pan
Allow RAS feature enable/disable via boot parameter. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: export ta fw infoxinhui pan
Output the ta fw, aka xgmi/ras, via debugfs. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: add ta ras fw info (v2)xinhui pan
Add ras fw part, xgmi and ras fw are combined together in ta binary. Reading the data from the info is not implemented yet. v2: squash in "drm/amdgpu: fix NULL pointer when ta is missing" Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Cosmetic change for calling func amdgpu_gmc_vram_locationOak Zeng
Use function parameter mc as the second parameter of amdgpu_gmc_vram_location, so codes look more consistent. Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Move IB pool init and fini v2Andrey Grodzovsky
Problem: Using SDMA for TLB invalidation in certain ASICs exposed a problem of IB pool not being ready while SDMA already up on Init and already shutt down while SDMA still running on Fini. This caused IB allocation failure. Temproary fix was commited into a bringup branch but this is the generic fix. Fix: Init IB pool rigth after GMC is ready but before SDMA is ready. Do th opposite for Fini. v2: Remove restriction on SDMA early init and move amdgpu_ib_pool_fini Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amd/powerplay: apply Vega20 BACO workaroundEvan Quan
Applied vdci flush workaround for Vega20 BACO. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: XGMI pstate switch initial supportshaoyunl
Driver vote low to high pstate switch whenever there is an outstanding XGMI mapping request. Driver vote high to low pstate when all the outstanding XGMI mapping is terminated. Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Enable XGMI mapping for peer deviceshaoyunl
Adjust vram base offset for XGMI mapping when update the PT entry so the address will fall into correct XGMI aperture for peer device Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Add sysfs entries for xgmi hive v2.Andrey Grodzovsky
For each device a file xgmi_device_id is created. On the first device a subdirectory named xgmi_hive_info is created, It contains a file named hive_id and symlinks named node 1-4 linking to each device in the hive. v2: Return error codes instead of '-1' and few misspellings. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: allow huge invalid mappings on GMC8Christian König
Only GMC9 supports true huge pages, but we can still free invalid mappings on GMC8. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: drop the huge page flagChristian König
Not needed any more since we now free PDs/PTs on demand. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: free PDs/PTs on demandChristian König
When something is unmapped we now free the affected PDs/PTs again. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: allocate VM PDs/PTs on demandChristian König
Let's start to allocate VM PDs/PTs on demand instead of pre-allocating them during mapping. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: let amdgpu_vm_clear_bo figure out ats status v2Christian König
Instead of providing it from outside figure out the ats status in the function itself from the data structures. v2: simplify finding the right level v3: partially revert changes from v2, more cleanup and split code into more functions. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>