Age | Commit message (Collapse) | Author | |
---|---|---|---|
2024-09-06 | drm/sched: add optional errno to drm_sched_start() | Christian König | |
The current implementation of drm_sched_start uses a hardcoded -ECANCELED to dispose of a job when the parent/hw fence is NULL. This results in drm_sched_job_done being called with -ECANCELED for each job with a NULL parent in the pending list, making it difficult to distinguish between recovery methods, whether a queue reset or a full GPU reset was used. To improve this, we first try a soft recovery for timeout jobs and use the error code -ENODATA. If soft recovery fails, we proceed with a queue reset, where the error code remains -ENODATA for the job. Finally, for a full GPU reset, we use error codes -ECANCELED or -ETIME. This patch adds an error code parameter to drm_sched_start, allowing us to differentiate between queue reset and GPU reset failures. This enables user mode and test applications to validate the expected correctness of the requested operation. After a successful queue reset, the only way to continue normal operation is to call drm_sched_job_done with the specific error code -ENODATA. v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain and amdgpu_device_unlock_reset_domain to allow user mode to track the queue reset status and distinguish between queue reset and GPU reset. v2: Christian suggested using the error codes -ENODATA for queue reset and -ECANCELED or -ETIME for GPU reset, returned to amdgpu_cs_wait_ioctl. v3: To meet the requirements, we introduce a new function drm_sched_start_ex with an additional parameter to set dma_fence_set_error, allowing us to handle the specific error codes appropriately and dispose of bad jobs with the selected error code depending on whether it was a queue reset or GPU reset. v4: Alex suggested using a new name, drm_sched_start_with_recovery_error, which more accurately describes the function's purpose. Additionally, it was recommended to add documentation details about the new method. v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex) v6 (chk): rebase on upstream changes, cleanup the commit message, drop the new function again and update all callers, apply the errno also to scheduler fences with hw fences v7 (chk): rebased Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com | |||
2024-07-25 | drm/scheduler: remove full_recover from drm_sched_start | Christian König | |
This was basically just another one of amdgpus hacks. The parameter allowed to restart the scheduler without turning fence signaling on again. That this is absolutely not a good idea should be obvious by now since the fences will then just sit there and never signal. While at it cleanup the code a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com | |||
2023-11-27 | drm/sched: Fix compilation issues with DRM priority rename | Luben Tuikov | |
Fix compilation issues with DRM scheduler priority rename MIN to LOW. Signed-off-by: Luben Tuikov <ltuikov89@gmail.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311252109.WgbJsSkG-lkp@intel.com/ Cc: Danilo Krummrich <dakr@redhat.com> Cc: Frank Binns <frank.binns@imgtec.com> Cc: Donald Robson <donald.robson@imgtec.com> Cc: Matt Coster <matt.coster@imgtec.com> Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org> Fixes: fe375c74806dbd ("drm/sched: Rename priority MIN to LOW") Fixes: 38f922a563aac3 ("drm/sched: Reverse run-queue priority enumeration") Fixes: 5f03a507b29e44 ("drm/nouveau: implement 1:1 scheduler - entity relationship") Link: https://patchwork.freedesktop.org/patch/msgid/20231125192246.87268-2-ltuikov89@gmail.com Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/7429262c-6dea-4dcc-bf7e-54d2277dabf1@amd.com | |||
2023-11-23 | drm/imagination: Implement job submission and scheduling | Sarah Walker | |
Implement job submission ioctl. Job scheduling is implemented using drm_sched. Jobs are submitted in a stream format. This is intended to allow the UAPI data format to be independent of the actual FWIF structures in use, which vary depending on the GPU in use. The stream formats are documented at: https://gitlab.freedesktop.org/mesa/mesa/-/blob/f8d2b42ae65c2f16f36a43e0ae39d288431e4263/src/imagination/csbgen/rogue_kmd_stream.xml Changes since v8: - Updated for upstreamed DRM scheduler changes - Removed workaround code for the pending_list previously being updated after run_job() returned - Fixed null deref in pvr_queue_cleanup_fw_context() for bad stream ptr given to create_context ioctl - Corrected license identifiers Changes since v7: - Updated for v8 "DRM scheduler changes for XE" patchset Changes since v6: - Fix fence handling in pvr_sync_signal_array_add() - Add handling for SUBMIT_JOB_FRAG_CMD_DISABLE_PIXELMERGE flag - Fix missing dma_resv locking in job submit path Changes since v5: - Fix leak in job creation error path Changes since v4: - Use a regular workqueue for job scheduling Changes since v3: - Support partial render jobs - Add job timeout handler - Split sync handling out of job code - Use drm_dev_{enter,exit} Changes since v2: - Use drm_sched for job scheduling Co-developed-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Co-developed-by: Donald Robson <donald.robson@imgtec.com> Signed-off-by: Donald Robson <donald.robson@imgtec.com> Signed-off-by: Sarah Walker <sarah.walker@imgtec.com> Link: https://lore.kernel.org/r/c98dab7a5f5fb891fbed7e4990d19b5d13964365.1700668843.git.donald.robson@imgtec.com Signed-off-by: Maxime Ripard <mripard@kernel.org> |