From a17f574ab4a2d3dcbd9a49e3c1710fb0cbe8a901 Mon Sep 17 00:00:00 2001 From: Yifan Zhang <yifan1.zhang@amd.com> Date: Thu, 26 Oct 2023 21:23:06 +0800 Subject: drm/amdgpu: remove amdgpu_mes_self_test in gpu recover MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit gpu tlb flush is skipped if reset sem is held, it makes mes_self_test fail since it involves add_hw_queue/remove_hw_queue which needs tlb flush functional. Remove mes_self_test in gpu recover sequence. This patch is to fix the recover failure in gfx11. [ 1831.768292] [drm] ring sdma_32769.3.3 was added [ 1831.768313] [drm] ring gfx_32769.1.1 ib test pass [ 1831.768337] [drm] ring compute_32769.2.2 ib test pass [ 1831.768399] amdgpu 0000:c2:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process pid 0 thread pid 0) [ 1831.768434] amdgpu 0000:c2:00.0: amdgpu: in page starting at address 0x0000aec200000000 from client 10 [ 1831.768456] amdgpu 0000:c2:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800A30 [ 1831.768473] amdgpu 0000:c2:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5) [ 1831.768489] amdgpu 0000:c2:00.0: amdgpu: MORE_FAULTS: 0x0 [ 1831.768501] amdgpu 0000:c2:00.0: amdgpu: WALKER_ERROR: 0x0 [ 1831.768513] amdgpu 0000:c2:00.0: amdgpu: PERMISSION_FAULTS: 0x3 [ 1831.768521] amdgpu 0000:c2:00.0: amdgpu: MAPPING_ERROR: 0x0 [ 1831.768529] amdgpu 0000:c2:00.0: amdgpu: RW: 0x0 [ 1831.931229] amdgpu 0000:c2:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma_32769.3.3 test failed (-110) [ 1832.062917] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3 [ 1832.063107] [drm:amdgpu_mes_remove_hw_queue [amdgpu]] *ERROR* failed to remove hardware queue, queue id = 3 Fixes: e2e3788850b9 ("drm/amdgpu: rework lock handling for flush_tlb v2") Reported-by: Li Ma <li.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index d5f78179b2b6..bd940cb9322e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5566,10 +5566,6 @@ skip_hw_reset: drm_sched_start(&ring->sched, true); } - if (adev->enable_mes && - amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(11, 0, 3)) - amdgpu_mes_self_test(tmp_adev); - if (!drm_drv_uses_atomic_modeset(adev_to_drm(tmp_adev)) && !job_signaled) drm_helper_resume_force_mode(adev_to_drm(tmp_adev)); -- cgit v1.2.3-70-g09d2