path: drivers/gpu/drm/scheduler/sched_entity.c
2023-10-26  drm/sched: Convert the GPU scheduler to variable number of run-queues  (Luben Tuikov)
The GPU scheduler now has a variable number of run-queues, which are set up at drm_sched_init() time. This way, each driver announces how many run-queues it requires (supports) for each GPU scheduler it creates. Note that run-queues correspond to scheduler "priorities", thus if the number of run-queues is set to 1 at drm_sched_init(), then that scheduler supports a single run-queue, i.e. a single "priority". If a driver further sets a single entity per run-queue, then this creates a 1-to-1 correspondence between a scheduler and a scheduled entity. Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Danilo Krummrich <dakr@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Emma Anholt <emma@anholt.net> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Cc: nouveau@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
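A minimal sketch of what driver init can look like after this change. The parameter order follows the 6.7-era drm_sched_init() API and may differ on other kernel versions; my_sched_ops, ring and dev are driver-side placeholders, not names from this commit:

  /* Hedged sketch: one scheduler per hardware ring, with a single
   * run-queue (a single priority). If each ring is then fed by exactly
   * one entity, scheduler and entity correspond 1:1. */
  ret = drm_sched_init(&ring->sched, &my_sched_ops,
                       1,                       /* num_rqs: one run-queue */
                       32,                      /* hw_submission: ring depth */
                       0,                       /* hang_limit */
                       msecs_to_jiffies(2000),  /* timeout */
                       NULL,                    /* timeout workqueue */
                       NULL,                    /* score: none shared */
                       ring->name, dev->dev);
  if (ret)
          return ret;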
2023-07-11  Merge v6.5-rc1 into drm-misc-fixes  (Maxime Ripard)
Boris needs 6.5-rc1 in drm-misc-fixes to prevent a conflict. Signed-off-by: Maxime Ripard <mripard@kernel.org>
2023-06-22  drm/sched: Make sure we wait for all dependencies in kill_jobs_cb()  (Boris Brezillon)
The drm_sched_entity_kill_jobs_cb() logic omits the last fence popped from the dependency array, the one that was waited upon before drm_sched_entity_kill() was called (the drm_sched_entity::dependency field), so we are basically waiting for all dependencies except one.

In theory, this wait shouldn't be needed because resources should have their users registered to the dma_resv object, thus guaranteeing that future jobs wanting to access these resources wait on all the previous users (depending on the access type, of course). But we want to keep these explicit waits in the kill-entity path just in case.

Let's make sure we keep all dependencies in the array in drm_sched_job_dependency(), so we can iterate over the array and wait in drm_sched_entity_kill_jobs_cb(). We also make sure we wait on drm_sched_fence::finished if we were originally asked to wait on drm_sched_fence::scheduled. In that case, we assume the intent was to delegate the wait to the firmware/GPU or rely on the pipelining done at the entity/scheduler level, but when killing jobs, we really want to wait for completion, not just scheduling.

v2: - Don't evict deps in drm_sched_job_dependency()
v3: - Always wait for drm_sched_fence::finished fences in drm_sched_entity_kill_jobs_cb() when we see a sched_fence
v4: - Fix commit message - Fix a use-after-free bug
v5: - Flag deps on which we should only wait for the scheduled event at insertion time
v6: - Back to v4 implementation - Add Christian's R-b

Cc: Frank Binns <frank.binns@imgtec.com> Cc: Sarah Walker <sarah.walker@imgtec.com> Cc: Donald Robson <donald.robson@imgtec.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Suggested-by: "Christian König" <christian.koenig@amd.com> Reviewed-by: "Christian König" <christian.koenig@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230619071921.3465992-1-boris.brezillon@collabora.com
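A hedged sketch of the scheduled-to-finished upgrade described above, using the upstream helper to_drm_sched_fence(); the surrounding iteration over the dependency array is omitted:

  /* Sketch: when killing jobs, a dependency that is a scheduler fence's
   * "scheduled" fence is swapped for its "finished" fence, so we wait for
   * completion rather than for the job merely reaching the hardware. */
  static struct dma_fence *kill_upgrade_dep(struct dma_fence *f)
  {
          struct drm_sched_fence *s_fence = to_drm_sched_fence(f);

          if (s_fence && f == &s_fence->scheduled) {
                  /* The finished fence lives in the same drm_sched_fence,
                   * which our reference on the scheduled fence keeps
                   * alive until we have taken a new one. */
                  f = dma_fence_get(&s_fence->finished);
                  dma_fence_put(&s_fence->scheduled);
          }
          return f;
  }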
2023-06-15  drm/scheduler: avoid infinite loop if entity's dependency is a scheduled error fence  (ZhenGuo Yin)
[Why]
drm_sched_entity_add_dependency_cb() ignores the scheduled fence and returns false. If an entity's dependency is a scheduler error fence and drm_sched_stop() is called due to a TDR, drm_sched_entity_pop_job() will wait for the dependency forever.

[How]
Do not wait on or ignore the scheduled error fence; add the drm_sched_entity_wakeup() callback for a dependency whose scheduled fence carries an error.

Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-19  drm/sched: Rename to drm_sched_wakeup_if_can_queue()  (Luben Tuikov)
Rename drm_sched_wakeup() to drm_sched_wakeup_if_can_queue(), since the former name is misleading: the function wakes up the GPU scheduler _only if_ more jobs can be queued to the underlying hardware. This distinction is important to make, since the wake condition in the GPU scheduler thread is also true in other cases, e.g. when there are jobs to be cleaned. For instance, a user might want to wake up the scheduler only because there are more jobs to clean, in which case whether more jobs can be queued is irrelevant.

v2: Separate "canqueue" to "can_queue". (Alex D.)

Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://lore.kernel.org/r/20230517233550.377847-2-luben.tuikov@amd.com Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
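A hedged sketch of the conditional wakeup this rename describes, assuming the kthread-era scheduler fields hw_rq_count, hw_submission_limit and the wake_up_worker wait queue:

  /* Sketch: only kick the scheduler thread when the hardware ring can
   * actually accept another job. */
  static bool drm_sched_can_queue(struct drm_gpu_scheduler *sched)
  {
          return atomic_read(&sched->hw_rq_count) <
                 sched->hw_submission_limit;
  }

  void drm_sched_wakeup_if_can_queue(struct drm_gpu_scheduler *sched)
  {
          if (drm_sched_can_queue(sched))
                  wake_up_interruptible(&sched->wake_up_worker);
  }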
2023-04-24  drm/scheduler: add drm_sched_entity_error and use rcu for last_scheduled  (Christian König)
Switch to RCU handling for the last scheduled job and add a function that returns its error code. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230420115752.31470-2-christian.koenig@amd.com
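A sketch of the RCU read side this enables, close to what the new error query can look like; the point is that the fence may be freed concurrently, so it is only dereferenced under rcu_read_lock():

  /* Sketch: query the error of the entity's last scheduled fence under
   * RCU protection. */
  int drm_sched_entity_error(struct drm_sched_entity *entity)
  {
          struct dma_fence *fence;
          int r = 0;

          rcu_read_lock();
          fence = rcu_dereference(entity->last_scheduled);
          if (fence)
                  r = fence->error;
          rcu_read_unlock();

          return r;
  }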
2023-04-24  drm/scheduler: properly forward fence errors  (Christian König)
When a hw fence is signaled with an error, properly forward that error to the finished fence. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230420115752.31470-1-christian.koenig@amd.com
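A hedged sketch of the forwarding: the hardware result is recorded on the finished fence with dma_fence_set_error() before it is signaled, so consumers observe the error:

  /* Sketch: propagate the hw fence's result into the scheduler's
   * finished fence before signaling it. */
  void drm_sched_fence_finished(struct drm_sched_fence *fence, int result)
  {
          if (result)
                  dma_fence_set_error(&fence->finished, result);
          dma_fence_signal(&fence->finished);
  }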
2023-04-18  drm/scheduler: set entity to NULL in drm_sched_entity_pop_job()  (Danilo Krummrich)
It has already happened a few times that patches slipped through which implemented access to an entity through a job that had already been removed from the entity's queue. Since jobs and entities might have different lifecycles, this can potentially cause UAF bugs. In order to make it obvious that a job's entity pointer shouldn't be accessed after drm_sched_entity_pop_job() was called successfully, set the job's entity pointer to NULL once the job is removed from the entity's queue. Moreover, debugging a NULL pointer dereference is far easier than debugging memory corrupted through a UAF. Signed-off-by: Danilo Krummrich <dakr@redhat.com> Link: https://lore.kernel.org/r/20230418100453.4433-1-dakr@redhat.com Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
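A hedged sketch of the pop path after this change; to_drm_sched_job() stands in for the scheduler's internal container_of wrapper, and the entity's readiness checks are omitted:

  /* Sketch: the entity back-pointer is severed the moment the job leaves
   * the entity's queue, so any later entity access through the job is a
   * loud NULL dereference instead of a silent use-after-free. */
  struct drm_sched_job *pop_job_sketch(struct drm_sched_entity *entity)
  {
          struct drm_sched_job *sched_job;

          sched_job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue));
          if (!sched_job)
                  return NULL;

          sched_job->entity = NULL;  /* job no longer belongs to the entity */
          return sched_job;
  }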
2023-04-06drm/scheduler: Fix UAF race in drm_sched_entity_push_job()Asahi Lina
After a job is pushed into the queue, it is owned by the scheduler core and may be freed at any time, so we can't write nor read the submit timestamp after that point. Fixes oopses observed with the drm/asahi driver, found with kASAN. Signed-off-by: Asahi Lina <lina@asahilina.net> Link: https://lore.kernel.org/r/20230406-scheduler-uaf-2-v1-1-972531cf0a81@asahilina.net Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
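A hedged sketch of the fixed ordering, with the entity checks and locking simplified: the timestamp is written and a local copy taken before spsc_queue_push(); after the push only the local copy is used:

  /* Sketch: never touch the job after it has been pushed; the scheduler
   * core may free it at any time from that point on. */
  void push_job_sketch(struct drm_sched_job *sched_job)
  {
          struct drm_sched_entity *entity = sched_job->entity;
          ktime_t submit_ts;
          bool first;

          /* Set and save the FIFO timestamp while we still own the job. */
          sched_job->submit_ts = submit_ts = ktime_get();
          first = spsc_queue_push(&entity->job_queue, &sched_job->queue_node);

          /* From here on, use only the locally saved copy. */
          if (first)
                  drm_sched_rq_update_fifo(entity, submit_ts);
  }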
2023-01-02  drm/scheduler: Fix lockup in drm_sched_entity_kill()  (Dmitry Osipenko)
drm_sched_entity_kill() is invoked twice by drm_sched_entity_destroy() while a userspace process is exiting or being killed: first when the sched entity is flushed, and a second time when the entity is released. This causes a lockup within wait_for_completion(entity_idle) due to how the completion API works: calling wait_for_completion() more times than complete() was invoked is an error condition that causes a lockup, because a completion internally uses a counter of complete/wait calls. complete_all() must be used instead in such cases. This fixes a lockup of the Panfrost driver that is reproducible by killing any application in the middle of a 3D drawing operation. Fixes: 2fdb8a8f07c2 ("drm/scheduler: rework entity flush, kill and fini") Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Reviewed-by: Christian König <christian.koenig@amd.com> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck Link: https://patchwork.freedesktop.org/patch/msgid/20221123001303.533968-1-dmitry.osipenko@collabora.com
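A self-contained illustration of the completion-counter behavior the fix relies on; this is the plain kernel completion API, not scheduler-specific code:

  /* Illustration: complete() increments an internal counter that each
   * wait_for_completion() consumes, so two waits against one complete()
   * block forever. complete_all() releases every current and future
   * waiter until the completion is reinitialized. */
  static struct completion entity_idle;

  static void idle_once(void)
  {
          init_completion(&entity_idle);
          complete(&entity_idle);
          wait_for_completion(&entity_idle);      /* ok: consumes the count */
          /* a second wait_for_completion() here would block forever */
  }

  static void idle_for_all(void)
  {
          init_completion(&entity_idle);
          complete_all(&entity_idle);
          wait_for_completion(&entity_idle);      /* ok */
          wait_for_completion(&entity_idle);      /* also ok: stays "done" */
  }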
2022-11-05  Merge drm/drm-next into drm-misc-next  (Thomas Zimmermann)
Backmerging drm/drm-next to get the latest changes in the xlnx driver. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2022-11-03  drm/scheduler: rename dependency callback into prepare_job  (Christian König)
This now matches much better what this is doing. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-14-christian.koenig@amd.com
2022-11-03  drm/scheduler: rework entity flush, kill and fini  (Christian König)
This was buggy because when we also had to wait for entities which were killed, we would just deadlock. Instead, move all the dependency handling into the callbacks so that it all happens asynchronously. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014084641.128280-13-christian.koenig@amd.com
2022-10-25  Merge tag 'drm-misc-next-2022-10-20' of git://anongit.freedesktop.org/drm/drm-misc into drm-next  (Dave Airlie)
drm-misc-next for 6.2:

UAPI Changes:
- Documentation for page-flip flags

Cross-subsystem Changes:
- dma-buf: Add unlocked variant of vmapping and attachment-mapping functions

Core Changes:
- atomic-helpers: CRTC primary plane test fixes
- connector: TV API consistency improvements, cmdline parsing improvements
- crtc-helpers: Introduce drm_crtc_helper_atomic_check() helper
- edid: Fixes for HFVSDB parsing
- fourcc: Addition of the Vivante tiled modifier
- makefile: Sort and reorganize the objects files
- mode_config: Remove fb_base from drm_mode_config_funcs
- sched: Add a module parameter to change the scheduling policy, refcounting fix for fences
- tests: Sort the Kunit tests in the Makefile, improvements to the DP-MST tests
- ttm: Remove unnecessary drm_mm_clean() call

Driver Changes:
- New driver: ofdrm
- Move all drivers to a common dma-buf locking convention
- bridge:
  - adv7533: Remove dynamic lane switching
  - it6505: Runtime PM support
  - ps8640: Handle AUX defer messages
  - tc358775: Drop soft-reset over I2C
- ast: Atomic Gamma LUT Support, Convert to SHMEM, various improvements
- lcdif: Support for YUV planes
- mgag200: Fix PLL Setup on some revisions
- udl: Modesetting improvements, hot-unplug support
- vc4: Fix support for PAL-M

Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20221020072405.g3o4hxuk75gmeumw@houat
2022-10-19  drm/sched: add DRM_SCHED_FENCE_DONT_PIPELINE flag  (Christian König)
Setting this flag on a scheduler fence prevents pipelining of jobs depending on this fence. In other words, we always insert a full CPU round trip before dependent jobs are pushed to the pipeline. Signed-off-by: Christian König <christian.koenig@amd.com> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2113#note_1579296 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221014081553.114899-1-christian.koenig@amd.com
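A hedged sketch of the gate this flag adds in the dependency path, simplified to the relevant test:

  /* Sketch: decide whether a dependency from the same scheduler may be
   * "pipelined", i.e. waited on only up to its scheduled fence. The
   * DRM_SCHED_FENCE_DONT_PIPELINE bit forces a full CPU round trip by
   * making dependents wait on the finished fence instead. */
  static bool can_pipeline_dep(struct drm_gpu_scheduler *sched,
                               struct dma_fence *fence)
  {
          struct drm_sched_fence *s_fence = to_drm_sched_fence(fence);

          return s_fence && s_fence->sched == sched &&
                 !test_bit(DRM_SCHED_FENCE_DONT_PIPELINE, &fence->flags);
  }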
2022-10-05  drm/scheduler: fix fence ref counting  (Christian König)
We leaked dependency fences when processes were being killed. In addition to that, grab a reference to the last scheduled fence. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220929180151.139751-1-christian.koenig@amd.com
2022-09-30  drm/sched: Add FIFO sched policy to run queue  (Andrey Grodzovsky)
When many entities compete for the same run queue on the same scheduler, we observe unusually long wait times and some jobs get starved. This has been observed with GPUVis. The issue is due to the Round Robin policy used by schedulers to pick the next entity's job queue for execution. Under stress, with many entities and long job queues within entities, some jobs can be stuck in their entity's queue for a very long time before being popped and executed, while for other entities with smaller job queues a job might execute earlier even though it arrived later than the job in the long queue.

Fix: Add a FIFO selection policy for entities in the run queue: choose the next entity on the run queue in such an order that if a job on one entity arrived earlier than a job on another entity, the first job starts executing earlier, regardless of the length of the entity's job queue.

v2: Switch to an rb-tree structure for entities, keyed on the timestamp (TS) of the oldest job waiting in the entity's job queue. This improves next-entity extraction to O(1); entity TS update is O(log N), where N is the number of entities in the run-queue. Drop the default option in the module control parameter.
v3: Various cosmetic fixes and minor refactoring of the FIFO update function. (Luben)
v4: Switch drm_sched_rq_select_entity_fifo to in-order search. (Luben)
v5: Fix up the drm_sched_rq_select_entity_fifo loop. (Luben)
v6: Add missing drm_sched_rq_remove_fifo_locked.
v7: Fix TS sampling bug and more cosmetic stuff. (Luben)
v8: Fix module parameter string. (Luben)

Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Direct Rendering Infrastructure - Development <dri-devel@lists.freedesktop.org> Cc: AMD Graphics <amd-gfx@lists.freedesktop.org> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Tested-by: Yunxiang Li (Teddy) <Yunxiang.Li@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220930041258.1050247-1-luben.tuikov@amd.com
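A hedged sketch of the rb-tree-based FIFO selection, following the upstream field names rb_tree_root and rb_tree_node; locking and the ready checks are simplified:

  /* Sketch: pick the entity whose oldest pending job has the smallest
   * submission timestamp. Entities sit in an rb-tree ordered by that
   * timestamp, so rb_first_cached() yields the oldest candidate. */
  static struct drm_sched_entity *
  select_entity_fifo_sketch(struct drm_sched_rq *rq)
  {
          struct rb_node *rb;

          spin_lock(&rq->lock);
          for (rb = rb_first_cached(&rq->rb_tree_root); rb; rb = rb_next(rb)) {
                  struct drm_sched_entity *entity;

                  entity = rb_entry(rb, struct drm_sched_entity, rb_tree_node);
                  if (drm_sched_entity_is_ready(entity))
                          break;
          }
          spin_unlock(&rq->lock);

          return rb ? rb_entry(rb, struct drm_sched_entity, rb_tree_node) : NULL;
  }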
2022-07-15  drm/scheduler: Don't kill jobs in interrupt context  (Dmitry Osipenko)
Interrupt context can't sleep. Drivers like Panfrost and MSM take a mutex when a job is released, and thus that code can sleep. This results in a "BUG: scheduling while atomic" if the locks are contended while a job is freed. There is no good reason for releasing the scheduler's jobs in IRQ context, hence use normal context to fix the trouble. Cc: stable@vger.kernel.org Fixes: 542cff7893a3 ("drm/sched: Avoid lockdep spalt on killing a processes") Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220411221536.283312-1-dmitry.osipenko@collabora.com
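A hedged sketch of the resulting split: the dma_fence callback, which may run in IRQ context, only queues work, while the actual cleanup runs from the system workqueue in process context where sleeping is allowed. Helper names and the exact drm_sched_fence_finished() signature vary across kernel versions:

  /* Sketch: process-context half; may take mutexes in free_job(). */
  static void kill_jobs_work_sketch(struct work_struct *wrk)
  {
          struct drm_sched_job *job = container_of(wrk, typeof(*job), work);

          drm_sched_fence_finished(job->s_fence);
          job->sched->ops->free_job(job);         /* may sleep */
  }

  /* Sketch: IRQ-context half; only hands off to the workqueue. */
  static void kill_jobs_cb_sketch(struct dma_fence *f, struct dma_fence_cb *cb)
  {
          struct drm_sched_job *job = container_of(cb, typeof(*job), finish_cb);

          INIT_WORK(&job->work, kill_jobs_work_sketch);
          schedule_work(&job->work);
  }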
2021-11-01  drm/sched: Avoid lockdep spalt on killing a processes  (Andrey Grodzovsky)
Problem: Signaling one sched fence from within another sched fence's signal callback generates a lockdep splat, because both fences share the same lockdep class on their fence->lock.

Fix: Fix the stack below by rescheduling the signaling and killing of leftover jobs to irq work when an entity is killed.

[11176.741181] dump_stack+0x10/0x12
[11176.741186] __lock_acquire.cold+0x208/0x2df
[11176.741197] lock_acquire+0xc6/0x2d0
[11176.741204] ? dma_fence_signal+0x28/0x80
[11176.741212] _raw_spin_lock_irqsave+0x4d/0x70
[11176.741219] ? dma_fence_signal+0x28/0x80
[11176.741225] dma_fence_signal+0x28/0x80
[11176.741230] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741240] drm_sched_entity_kill_jobs_cb+0x1c/0x50 [gpu_sched]
[11176.741248] dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741254] dma_fence_signal+0x3b/0x80
[11176.741260] drm_sched_fence_finished+0x12/0x20 [gpu_sched]
[11176.741268] drm_sched_job_done.isra.0+0x7f/0x1a0 [gpu_sched]
[11176.741277] drm_sched_job_done_cb+0x12/0x20 [gpu_sched]
[11176.741284] dma_fence_signal_timestamp_locked+0xac/0x1a0
[11176.741290] dma_fence_signal+0x3b/0x80
[11176.741296] amdgpu_fence_process+0xd1/0x140 [amdgpu]
[11176.741504] sdma_v4_0_process_trap_irq+0x8c/0xb0 [amdgpu]
[11176.741731] amdgpu_irq_dispatch+0xce/0x250 [amdgpu]
[11176.741954] amdgpu_ih_process+0x81/0x100 [amdgpu]
[11176.742174] amdgpu_irq_handler+0x26/0xa0 [amdgpu]
[11176.742393] __handle_irq_event_percpu+0x4f/0x2c0
[11176.742402] handle_irq_event_percpu+0x33/0x80
[11176.742408] handle_irq_event+0x39/0x60
[11176.742414] handle_edge_irq+0x93/0x1d0
[11176.742419] __common_interrupt+0x50/0xe0
[11176.742426] common_interrupt+0x80/0x90

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Suggested-by: Christian König <christian.koenig@amd.com> Tested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/dri-devel/msg321250.html
2021-08-30  drm/sched: improve docs around drm_sched_entity  (Daniel Vetter)
I found a few too many things that are tricky and not documented, so I started typing. I found a few more things that looked broken while typing; see the various FIXMEs in drm_sched_entity. Also some of the usual cleanups:
- actually include the sched_entity.c declarations, which was lost in the move here: 620e762f9a98 ("drm/scheduler: move entity handling into separate file")
- Ditch the kerneldoc for internal functions, keep the comments where they describe more than what the function name already implies.
- Switch drm_sched_entity to inline docs.

Acked-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: "Christian König" <christian.koenig@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Emma Anholt <emma@anholt.net> Cc: Lee Jones <lee.jones@linaro.org> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-7-daniel.vetter@ffwll.ch
2021-08-30  drm/sched: drop entity parameter from drm_sched_push_job  (Daniel Vetter)
Originally a job was only bound to the queue when we pushed it, but now that's done in drm_sched_job_init, making the parameter entirely redundant. Remove it. The same applies to the context parameter in lima_sched_context_queue_task; simplify that too.

v2: Rebase on top of msm adopting drm/sched

Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Steven Price <steven.price@arm.com> (v1) Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v1) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Emma Anholt <emma@anholt.net> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Chen Li <chenli@uniontech.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: Melissa Wen <mwen@igalia.com> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-6-daniel.vetter@ffwll.ch
2021-08-30  drm/sched: Add dependency tracking  (Daniel Vetter)
Instead of just a callback, we can glue in the gem helpers that panfrost, v3d and lima currently use. There's really not that many ways to skin this cat.

v2/3: Rebased.
v4: Repaint this shed. The functions are now called _add_dependency() and _add_implicit_dependency().

Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v3) Reviewed-by: Steven Price <steven.price@arm.com> (v1) Acked-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Nirmoy Das <nirmoy.aiemd@gmail.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-5-daniel.vetter@ffwll.ch
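A hedged sketch of how a driver submit path can use these helpers; current kernels export them as drm_sched_job_add_dependency() and drm_sched_job_add_implicit_dependencies(), and the error handling here is simplified:

  /* Sketch: record an explicit in-fence and the implicit (dma_resv)
   * fences of a GEM buffer as scheduler-tracked dependencies, instead of
   * open-coding a dependency callback in the driver. */
  static int add_job_deps_sketch(struct drm_sched_job *job,
                                 struct dma_fence *in_fence,
                                 struct drm_gem_object *bo, bool write)
  {
          int ret;

          if (in_fence) {
                  /* Note: the fence reference is consumed by the helper. */
                  ret = drm_sched_job_add_dependency(job, in_fence);
                  if (ret)
                          return ret;
          }

          /* Pull the buffer's current users from its dma_resv object. */
          return drm_sched_job_add_implicit_dependencies(job, bo, write);
  }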
2021-08-30  drm/sched: Barriers are needed for entity->last_scheduled  (Daniel Vetter)
It might be good enough on x86 with just READ_ONCE, but the write side should then at least be WRITE_ONCE, because x86 has total store order. It's definitely not enough on arm. Fix this properly, which means:
- explain the need for the barrier in both places
- point at the other side in each comment

Also pull out the !sched_list case as the first check, so that the code flow is clearer. While at it, sprinkle some comments around because it was very non-obvious to me what's actually going on here and why. Note that we really need full barriers here; at first I thought store-release and load-acquire on ->last_scheduled would be enough, but we actually require ordering between that and the queue state.

v2: Put smp_rmb() in the right place and fix up the comment (Andrey)

Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Steven Price <steven.price@arm.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210805104705.862416-4-daniel.vetter@ffwll.ch
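An illustration of the barrier pairing the commit describes, with a simplified, hypothetical publisher/consumer; the struct and field names are not the scheduler's:

  /* Illustration only: the writer publishes the fence before the state
   * that licenses the reader to look at it; the reader orders its loads
   * the opposite way. Each barrier comment points at its counterpart, as
   * the commit asks for. */
  struct pub {
          struct dma_fence *last_scheduled;
          bool idle;
  };

  static void writer(struct pub *p, struct dma_fence *fence)
  {
          p->last_scheduled = fence;
          smp_wmb();      /* pairs with smp_rmb() in reader() */
          p->idle = true;
  }

  static bool reader(struct pub *p, struct dma_fence **out)
  {
          if (!p->idle)
                  return false;
          smp_rmb();      /* pairs with smp_wmb() in writer() */
          *out = p->last_scheduled;
          return true;
  }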
2021-08-30  drm/sched: Split drm_sched_job_init  (Daniel Vetter)
This is a very confusingly named function, because not only does it init an object, it also arms it and provides a point of no return for pushing a job into the scheduler. It would be nice if that were a bit clearer in the interface. But the real reason is that I want to push the dependency tracking helpers into the scheduler code, and that means drm_sched_job_init must be called a lot earlier, without arming the job.

v2: - don't change .gitignore (Steven) - don't forget v3d (Emma)
v3: Emma noticed that I leak the memory allocated in drm_sched_job_init if we bail out before the point of no return in subsequent driver patches. To be able to fix this, change drm_sched_job_cleanup() so it can handle being called both before and after drm_sched_job_arm(). Also improve the kerneldoc for this.
v4: - Fix the drm_sched_job_cleanup logic; I inverted the booleans, as usual (Melissa) - Christian pointed out that drm_sched_entity_select_rq() also needs to be moved into drm_sched_job_arm, which made me realize that the job->id definitely needs to be moved too. Shuffle things to fit between job_init and job_arm.
v5: Reshuffle the split between init/arm once more; amdgpu abuses drm_sched.ready to signal gpu reset failures. Also document this somewhat. (Christian)
v6: Rebase on top of the msm drm/sched support. Note that the drm_sched_job_init() call is completely misplaced, and hence also the split-out drm_sched_entity_push_job(). I've put in a FIXME which the next patch will address.
v7: Drop the FIXME in msm; after discussions with Rob I agree it shouldn't be a problem where it is now.

Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Melissa Wen <mwen@igalia.com> Cc: Melissa Wen <melissa.srw@gmail.com> Acked-by: Emma Anholt <emma@anholt.net> Acked-by: Steven Price <steven.price@arm.com> (v2) Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (v5) Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Cc: Lucas Stach <l.stach@pengutronix.de> Cc: Russell King <linux+etnaviv@armlinux.org.uk> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Qiang Yu <yuq825@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Steven Price <steven.price@arm.com> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Adam Borowski <kilobyte@angband.pl> Cc: Nick Terrell <terrelln@fb.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Dave Airlie <airlied@redhat.com> Cc: Nirmoy Das <nirmoy.das@amd.com> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Lee Jones <lee.jones@linaro.org> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Chen Li <chenli@uniontech.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: "Marek Olšák" <marek.olsak@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Sonny Jiang <sonny.jiang@amd.com> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Tian Tao <tiantao6@hisilicon.com> Cc: etnaviv@lists.freedesktop.org Cc: lima@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Emma Anholt <emma@anholt.net> Cc: Rob Clark <robdclark@gmail.com> Cc: Sean Paul <sean@poorly.run> Cc: linux-arm-msm@vger.kernel.org Cc: freedreno@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20210817084917.3555822-1-daniel.vetter@ffwll.ch
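A hedged sketch of the submit flow this split enables; signatures follow the post-split API (before the later credits parameter), my_job and add_dependencies_and_validate() are hypothetical driver-side names:

  /* Sketch: init early, arm at the point of no return, then push. Before
   * drm_sched_job_arm() the job can still be unwound with
   * drm_sched_job_cleanup(); after arming, pushing is irreversible. */
  static int submit_sketch(struct my_job *job, struct drm_sched_entity *entity)
  {
          int ret;

          ret = drm_sched_job_init(&job->base, entity, NULL /* owner */);
          if (ret)
                  return ret;

          ret = add_dependencies_and_validate(job);   /* hypothetical */
          if (ret) {
                  drm_sched_job_cleanup(&job->base);  /* safe: not armed */
                  return ret;
          }

          drm_sched_job_arm(&job->base);              /* point of no return */
          drm_sched_entity_push_job(&job->base);
          return 0;
  }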
2021-05-19  drm/sched: Avoid data corruptions  (Andrey Grodzovsky)
Wait for all dependencies of a job to complete before killing it to avoid data corruptions. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210519141407.88444-1-andrey.grodzovsky@amd.com
2021-05-19  drm/scheduler: Fix hang when sched_entity released  (Andrey Grodzovsky)
Problem: If the scheduler is already stopped by the time the sched_entity is released and the entity's job_queue is not empty, I encountered a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle never becomes true.

Fix: In drm_sched_fini, detach all sched_entities from the scheduler's run queues. This satisfies drm_sched_entity_is_idle. Also wake up all the processes stuck flushing a sched_entity, since the scheduler main thread which would normally wake them up is stopped by now.

v2: Reverse the order of drm_sched_rq_remove_entity and marking s_entity as stopped, to prevent reinsertion back into the rq due to a race.
v3: Drop drm_sched_rq_remove_entity; only set entity->stopped and check for it in drm_sched_entity_is_idle.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210512142648.666476-14-andrey.grodzovsky@amd.com
2021-04-22  drm/scheduler/sched_entity: Fix some function name disparity  (Lee Jones)
Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/scheduler/sched_entity.c:204: warning: expecting prototype for drm_sched_entity_kill_jobs(). Prototype was for drm_sched_entity_kill_jobs_cb() instead
drivers/gpu/drm/scheduler/sched_entity.c:262: warning: expecting prototype for drm_sched_entity_cleanup(). Prototype was for drm_sched_entity_fini() instead
drivers/gpu/drm/scheduler/sched_entity.c:305: warning: expecting prototype for drm_sched_entity_fini(). Prototype was for drm_sched_entity_destroy() instead

Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20210416143725.2769053-25-lee.jones@linaro.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com>
2021-03-08  drm/sched: select new rq even if there is only one v3  (Christian König)
This is necessary when changing the priority of an entity.

v2: test the sched_list instead of num_sched.
v3: set the sched_list to NULL when there is only one entry.

Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210305125155.2312-1-christian.koenig@amd.com
2021-02-05  drm/scheduler: provide scheduler score externally  (Christian König)
Allow multiple schedulers to share the load balancing score. This is useful when one engine has different hw rings. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210204144405.2737-1-christian.koenig@amd.com
2020-11-13  gpu: drm: scheduler: sched_entity: Demote non-conformant kernel-doc headers  (Lee Jones)
Fixes the following W=1 kernel build warning(s):

drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'f' not described in 'drm_sched_entity_clear_dep'
drivers/gpu/drm/scheduler/sched_entity.c:316: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_clear_dep'
drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'f' not described in 'drm_sched_entity_wakeup'
drivers/gpu/drm/scheduler/sched_entity.c:330: warning: Function parameter or member 'cb' not described in 'drm_sched_entity_wakeup'

Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Nirmoy Das <nirmoy.aiemd@gmail.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-10-05  drm/sched: Avoid infinite waits in the drm_sched_entity_destroy() path  (Boris Brezillon)
If we don't initialize the entity to idle, and the entity is never scheduled before being destroyed, we end up with an infinite wait in the destroy path.

v2: - Add Steven's R-b

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/393486/
2020-06-26  drm/scheduler: improve job distribution with multiple queues  (Nirmoy Das)
This patch uses a score to select a new drm scheduler for better load balancing between multiple drm schedulers, instead of num_jobs. Below are test results after running amdgpu_test ~10 times.

Before this patch:

sched_name   number of times it got scheduled
==========   ================================
sdma0        1463
sdma1         198
comp_1.0.1    280

After this patch:

sched_name   number of times it got scheduled
==========   ================================
sdma0         925
sdma1         928
comp_1.0.1    177
comp_1.1.1     44
comp_1.2.1     43
comp_1.3.1     44

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/373000/ Signed-off-by: Christian König <christian.koenig@amd.com>
2020-03-16  drm/sched: implement and export drm_sched_pick_best  (Nirmoy Das)
Remove drm_sched_entity_get_free_sched() and use its logic, picking the least loaded drm scheduler from a drm scheduler list, to implement drm_sched_pick_best(). This patch also exports drm_sched_pick_best() so that it can be utilized by other drm drivers. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
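A hedged sketch of how a driver can use the exported helper alongside the list-based entity init; the surrounding driver structure is illustrative:

  /* Sketch: either query the least loaded scheduler directly, or hand the
   * whole list to the entity and let the core load-balance with the same
   * logic that drm_sched_pick_best() now exports. */
  static int bind_entity_sketch(struct drm_sched_entity *entity,
                                struct drm_gpu_scheduler **scheds,
                                unsigned int count)
  {
          struct drm_gpu_scheduler *best;

          best = drm_sched_pick_best(scheds, count);  /* e.g. for debugging */
          (void)best;

          return drm_sched_entity_init(entity, DRM_SCHED_PRIORITY_NORMAL,
                                       scheds, count, NULL /* guilty */);
  }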
2020-03-16  Revert "drm/scheduler: improve job distribution with multiple queues"  (changzhu)
This patch needs to be reverted to avoid an amdgpu_test compute hang problem on Picasso. This reverts commit 56822db194232c089601728d68ed078dccb97f8b. Signed-off-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09  drm/scheduler: implement a function to modify sched list  (Nirmoy Das)
Implement drm_sched_entity_modify_sched(), which replaces an existing sched_list with a different one. This is going to be helpful when userspace changes the priority of a ctx/entity; the driver can then switch to the corresponding HW scheduler list for that priority. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-27  drm/amdgpu: fix doc by clarifying sched_list definition  (Nirmoy Das)
Expand the sched_list definition for better understanding. Also fix a typo: atleast -> at least. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16  drm/scheduler: fix documentation by replacing rq_list with sched_list  (Nirmoy Das)
This also replaces old artifacts with the correct one in the drm_sched_entity_init() declaration. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-16  drm/scheduler: improve job distribution with multiple queues  (Nirmoy Das)
This patch uses score-based logic to select a new rq for better load balancing between multiple rq/scheds, instead of num_jobs. Below are test results after running amdgpu_test from mesa drm.

Before this patch:

sched_name   number of times it got scheduled
==========   ================================
sdma0         314
sdma1          32
comp_1.0.0     56
comp_1.0.1      0
comp_1.1.0      0
comp_1.1.1      0
comp_1.2.0      0
comp_1.2.1      0
comp_1.3.0      0
comp_1.3.1      0

After this patch:

sched_name   number of times it got scheduled
==========   ================================
sdma0         216
sdma1         185
comp_1.0.0     39
comp_1.0.1      9
comp_1.1.0     12
comp_1.1.1      0
comp_1.2.0     12
comp_1.2.1      0
comp_1.3.0     12
comp_1.3.1      0

Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18  drm/scheduler: do not keep a copy of sched list  (Nirmoy Das)
An entity should not keep its own copy of the sched list and maintain it itself. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18  drm/scheduler: rework entity creation  (Nirmoy Das)
The entity currently keeps a copy of the run_queue list and modifies it in drm_sched_entity_set_priority(). Entities shouldn't modify the run_queue list. Use a drm_gpu_scheduler list instead of a drm_sched_rq list in the drm_sched_entity struct. This way we can select a run queue based on the entity/ctx's priority for a drm scheduler. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-07  drm/sched: Use completion to wait for sched->thread idle v2.  (Andrey Grodzovsky)
Removes the thread park/unpark hack from drm_sched_entity_fini and thereby fixes reactivation of the scheduler thread while the thread is supposed to be stopped.

v2: Per-sched-entity completion.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-15  drm/scheduler: use job count instead of peek  (Christian König)
The spsc_queue_peek function accesses queue->head, which belongs to the consumer thread and shouldn't be accessed by the producer. This fixes a rare race condition when destroying entities. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Monk.liu@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
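A hedged sketch of the producer-side check this commit switches to: test for pending jobs via the atomic job count rather than peeking at the consumer-owned head pointer:

  /* Sketch: safe from the producer side, because spsc_queue_count() reads
   * an atomic counter instead of the consumer-owned queue->head. */
  static bool entity_has_jobs_sketch(struct drm_sched_entity *entity)
  {
          return spsc_queue_count(&entity->job_queue) > 0;
  }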
2019-07-15  drm/scheduler: drop use of drmP.h  (Sam Ravnborg)
Drop use of the deprecated drmP.h header file. Fix fallout. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Emil Velikov <emil.velikov@collabora.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Huang Rui <ray.huang@amd.com> Cc: Eric Anholt <eric@anholt.net> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Sharat Masetty <smasetty@codeaurora.org> Cc: Nathan Chancellor <natechancellor@gmail.com> Cc: Nayan Deshmukh <nayan26deshmukh@gmail.com> Cc: Sam Ravnborg <sam@ravnborg.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190630061922.7254-26-sam@ravnborg.org
2019-02-22  Merge branch 'drm-next-5.1' of git://people.freedesktop.org/~agd5f/linux into drm-next  (Dave Airlie)
Fixes for 5.1:

amdgpu:
- Fix missing fw declaration after dropping old CI DPM code
- Fix debugfs access to registers beyond the MMIO bar size
- Fix context priority handling
- Add missing license on some new files
- Various cleanups and bug fixes

radeon:
- Fix missing break in CS parser for evergreen
- Various cleanups and bug fixes

sched:
- Fix entities with 0 run queues

Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190221214134.3308-1-alexander.deucher@amd.com
2019-02-15  drm/sched: Fix entities with 0 rqs.  (Bas Nieuwenhuizen)
Some blocks in amdgpu can have 0 rqs. Job creation already fails with -ENOENT when entity->rq is NULL, so jobs cannot be pushed. Without an rq there is no scheduler to pop jobs, and rq selection already does the right thing with a list of length 0. So the operations we need to fix are:
- Creation: do not set rq to rq_list[0] if the list can have length 0.
- Do not flush any jobs when there is no rq.
- On entity destruction, handle the rq = NULL case.
- On set_priority, do not try to change the rq if it is NULL.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-02-08  drm/sched: Always trace the dependencies we wait on, to fix a race.  (Eric Anholt)
The entity->dependency can go away completely once we've called drm_sched_entity_add_dependency_cb() (if the cb is called before we get around to tracing). The tracepoint is more useful if we trace every dependency instead of just the ones that get callbacks installed, anyway, so just do that. Fixes an easy-to-produce OOPS when tracing the scheduler on V3D with "perf record -a -e gpu_scheduler:.\* glxgears" and DEBUG_SLAB enabled. Signed-off-by: Eric Anholt <eric@anholt.net> Reviewed-by: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-05  drm/scheduler: Add drm_sched_job_cleanup  (Sharat Masetty)
This patch adds a new API to clean up the scheduler job resources. This is primarily needed in cases where the job was created but was not queued to the scheduler queue. Additionally, with this change the layer which creates the scheduler job also gets to free up the job's resources; this entails moving the dma_fence_put(finished_fence) into the drivers' ops free handler routines. Signed-off-by: Sharat Masetty <smasetty@codeaurora.org> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
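A hedged sketch of the second half of this change, the driver-side free hook taking over the final fence reference drop; my_job is a hypothetical driver wrapper around drm_sched_job:

  /* Sketch: with this change the driver's free_job hook owns the
   * dma_fence_put() on the finished fence that the scheduler core used
   * to perform. */
  static void my_free_job_sketch(struct drm_sched_job *job)
  {
          struct my_job *mjob = container_of(job, struct my_job, base);

          dma_fence_put(&job->s_fence->finished);  /* moved here from core */
          kfree(mjob);
  }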