drm/amd/amdgpu: consider kernel job always not guilty

[Why]
Currently every timed-out job is considered guilty. In the SRIOV multi-VF use case, the VF FLR happens first and the job timeout is detected afterwards. Several jobs can time out within a very small time slice. If the timeout of an innocent SDMA job is detected before that of the real bad job, the innocent SDMA job is marked guilty. This leads to a page fault after the job is resubmitted.

[How]
If the job is a kernel job, always consider it not guilty.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jingwen Chen <Jingwen.Chen2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
author: Jingwen Chen <Jingwen.Chen2@amd.com> 2021-07-20 18:35:35 +0800
committer: Alex Deucher <alexander.deucher@amd.com> 2021-07-23 10:08:00 -0400
commit: ff99849b00fef595ae46681ce0c2217a9f834332 (patch)
tree: 5837b66f83264ba3923b2ed0a6ba73302b8dd4b7 /drivers
parent: 410e302ea53f095f5d94dc14efefe8191bde901b (diff)
1 files changed, 3 insertions, 3 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f8bc670fe3f9..5d2453cc880c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4468,7 +4468,7 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device *adev,
 		amdgpu_fence_driver_force_completion(ring);
 	}
 
-	if(job)
+	if (job && job->vm)
 		drm_sched_increase_karma(&job->base);
 
 	r = amdgpu_reset_prepare_hwcontext(adev, reset_context);
@@ -4932,7 +4932,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
 				job ? job->base.id : -1, hive->hive_id);
 			amdgpu_put_xgmi_hive(hive);
-			if (job)
+			if (job && job->vm)
 				drm_sched_increase_karma(&job->base);
 			return 0;
 		}
@@ -4956,7 +4956,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 					job ? job->base.id : -1);
 
 		/* even we skipped this reset, still need to set the job to guilty */
-		if (job)
+		if (job && job->vm)
 			drm_sched_increase_karma(&job->base);
 		goto skip_recovery;
 	}
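
The guard repeated in the three hunks above can be modelled in isolation. The following is a minimal, standalone sketch and not driver code: the `sketch_*` types and functions are hypothetical stand-ins, the `karma` field only imitates what `drm_sched_increase_karma()` tracks, and it assumes (as the patch's `job->vm` check implies) that a kernel job is one with no VM attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver types; only the fields that
 * matter for the check are modelled here. */
struct sketch_vm {
	int id;
};

struct sketch_job {
	struct sketch_vm *vm;	/* NULL models a kernel job (assumption) */
	int karma;		/* imitates the scheduler's karma counter */
};

/* Same shape as the patched condition: a missing job or a job
 * without a VM is never blamed. */
static bool sketch_job_can_be_guilty(const struct sketch_job *job)
{
	return job && job->vm;
}

/* Imitates the guarded call sites: bump karma only for jobs that
 * pass the check above. */
static void sketch_mark_guilty(struct sketch_job *job)
{
	if (sketch_job_can_be_guilty(job))
		job->karma++;
}
```

With this model, calling `sketch_mark_guilty()` on a job whose `vm` is set raises its karma, while a job with `vm == NULL` or a NULL job pointer is left untouched, which is the behaviour the [How] section describes.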