summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
AgeCommit message (Collapse)Author
2020-09-03drm/amdgpu: disable gpu-sched load balance for uvdNirmoy Das
On hardware with multiple uvd instances, dependent uvd jobs may get scheduled to different uvd instances. Because uvd_enc jobs retain hw context, dependent jobs should always run on the same uvd instance. This patch disables GPU scheduler's load balancer for a context that binds jobs from the same context to a uvd instance. v2: Squash in uvd_enc fix Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-24drm/amdgpu: drm_device to amdgpu_device by inline-f (v2)Luben Tuikov
Get the amdgpu_device from the DRM device by use of an inline function, drm_to_adev(). The inline function resolves a pointer to struct drm_device to a pointer to struct amdgpu_device. v2: Use a typed visible static inline function instead of an invisible macro. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18drm/scheduler: Remove priority macro INVALID (v2)Luben Tuikov
Remove DRM_SCHED_PRIORITY_INVALID. We no longer carry around an invalid priority and cut it off at the source. Backwards compatibility behaviour of AMDGPU CTX IOCTL passing in garbage for context priority from user space and then mapping that to DRM_SCHED_PRIORITY_NORMAL is preserved. v2: Revert "res" --> "r" and "prio" --> "priority". Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-18drm/scheduler: Scheduler priority fixes (v2)Luben Tuikov
Remove DRM_SCHED_PRIORITY_LOW, as it was used in only one place. Rename and separate by a line DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT as it represents a (total) count of said priorities and it is used as such in loops throughout the code. (0-based indexing is the the count number.) Remove redundant word HIGH in priority names, and rename *KERNEL* to *HIGH*, as it really means that, high. v2: Add back KERNEL and remove SW and HW, in lieu of a single HIGH between NORMAL and KERNEL. Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-08-14drm/amdgpu: revert "fix system hang issue during GPU reset"Christian König
The whole approach wasn't thought through till the end. We already had a reset lock like this in the past and it caused the same problems like this one. Completely revert the patch for now and add individual trylock protection to the hardware access functions as necessary. This reverts commit df9c8d1aa278c435c30a69b8f2418b4a52fcb929. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-07-27drm/amdgpu: fix system hang issue during GPU resetDennis Li
when GPU hang, driver has multi-paths to enter amdgpu_device_gpu_recover, the atomic adev->in_gpu_reset and hive->in_reset are used to avoid re-entering GPU recovery. During GPU reset and resume, it is unsafe that other threads access GPU, which maybe cause GPU reset failed. Therefore the new rw_semaphore adev->reset_sem is introduced, which protect GPU from being accessed by external threads during recovery. v2: 1. add rwlock for some ioctls, debugfs and file-close function. 2. change to use dqm->is_resetting and dqm_lock for protection in kfd driver. 3. remove try_lock and change adev->in_gpu_reset as atomic, to avoid re-enter GPU recovery for the same GPU hang. v3: 1. change back to use adev->reset_sem to protect kfd callback functions, because dqm_lock couldn't protect all codes, for example: free_mqd must be called outside of dqm_lock; [ 1230.176199] Hardware name: Supermicro SYS-7049GP-TRT/X11DPG-QT, BIOS 3.1 05/23/2019 [ 1230.177221] Call Trace: [ 1230.178249] dump_stack+0x98/0xd5 [ 1230.179443] amdgpu_virt_kiq_reg_write_reg_wait+0x181/0x190 [amdgpu] [ 1230.180673] gmc_v9_0_flush_gpu_tlb+0xcc/0x310 [amdgpu] [ 1230.181882] amdgpu_gart_unbind+0xa9/0xe0 [amdgpu] [ 1230.183098] amdgpu_ttm_backend_unbind+0x46/0x180 [amdgpu] [ 1230.184239] ? ttm_bo_put+0x171/0x5f0 [ttm] [ 1230.185394] ttm_tt_unbind+0x21/0x40 [ttm] [ 1230.186558] ttm_tt_destroy.part.12+0x12/0x60 [ttm] [ 1230.187707] ttm_tt_destroy+0x13/0x20 [ttm] [ 1230.188832] ttm_bo_cleanup_memtype_use+0x36/0x80 [ttm] [ 1230.189979] ttm_bo_put+0x1be/0x5f0 [ttm] [ 1230.191230] amdgpu_bo_unref+0x1e/0x30 [amdgpu] [ 1230.192522] amdgpu_amdkfd_free_gtt_mem+0xaf/0x140 [amdgpu] [ 1230.193833] free_mqd+0x25/0x40 [amdgpu] [ 1230.195143] destroy_queue_cpsch+0x1a7/0x270 [amdgpu] [ 1230.196475] pqm_destroy_queue+0x105/0x260 [amdgpu] [ 1230.197819] kfd_ioctl_destroy_queue+0x37/0x70 [amdgpu] [ 1230.199154] kfd_ioctl+0x277/0x500 [amdgpu] [ 1230.200458] ? kfd_ioctl_get_clock_counters+0x60/0x60 [amdgpu] [ 1230.201656] ? tomoyo_file_ioctl+0x19/0x20 [ 1230.202831] ksys_ioctl+0x98/0xb0 [ 1230.204004] __x64_sys_ioctl+0x1a/0x20 [ 1230.205174] do_syscall_64+0x5f/0x250 [ 1230.206339] entry_SYSCALL_64_after_hwframe+0x49/0xbe 2. remove try_lock and introduce atomic hive->in_reset, to avoid re-enter GPU recovery. v4: 1. remove an unnecessary whitespace change in kfd_chardev.c 2. remove comment codes in amdgpu_device.c 3. add more detailed comment in commit message 4. define a wrap function amdgpu_in_reset v5: 1. Fix some style issues. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com> Suggested-by: Luben Tukov <luben.tuikov@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09drm/amdgpu: rework sched_list generationNirmoy Das
Generate HW IP's sched_list in amdgpu_ring_init() instead of amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(), ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary. This patch also stores sched_list for all HW IPs in one big array in struct amdgpu_device which makes amdgpu_ctx_init_entity() much more leaner. v2: fix a coding style issue do not use drm hw_ip const to populate amdgpu_ring_type enum v3: remove ctx reference and move sched array and num_sched to a struct use num_scheds to detect uninitialized scheduler list v4: use array_index_nospec for user space controlled variables fix possible checkpatch.pl warnings Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16drm/amdgpu: disable gpu_sched load balancer for vcn jobsNirmoy Das
VCN HW doesn't support dynamic load balance on multiple instances for a context. This patch initializes VNC entities with only one drm_gpu_scheduler picked by drm_sched_pick_best(). Picking a drm_gpu_scheduler using drm_sched_pick_best() ensures that we do load balance among multiple contexts but not among multiple jobs in a context. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-16drm/amdgpu: fix switch-case indentationNirmoy Das
Fix switch-case indentation in amdgpu_ctx_init_entity() Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-10drm/amdgpu: do not set nil entry in compute_prio_schedNirmoy Das
If there are no high priority compute queues available then set normal priority sched array to compute_prio_sched[AMDGPU_GFX_PIPE_PRIO_HIGH] Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09drm/amdgpu: change hw sched list on ctx priority overrideNirmoy Das
Switch to appropriate sched list for an entity on priority override. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-03-09drm/amdgpu: set compute queue priority at mqd_initNirmoy Das
We were changing compute ring priority while rings were being used before every job submission which is not recommended. This patch sets compute queue priority at mqd initialization for gfx8, gfx9 and gfx10. Policy: make queue 0 of each pipe as high priority compute queue High/normal priority compute sched lists are generated from set of high/normal priority compute queues. At context creation, entity of compute queue get a sched list from high or normal priority depending on ctx->priority Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-30drm/amdgpu: allocate entities on demandNirmoy Das
Currently we pre-allocate entities and fences for all the HW IPs on context creation and some of which are might never be used. This patch tries to resolve entity/fences wastage by creating entity only when needed. v2: allocate memory for entity and fences together Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-01-22drm/amdgpu: individualize fence allocation per entityNirmoy Das
Allocate fences for each entity and remove ctx->fences reference as fences should be bound to amdgpu_ctx_entity instead amdgpu_ctx. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-23drm/amdgpu: fix ctx init failure for asics without gfx ringLe Ma
This workaround does not affect other asics because amdgpu only need expose one gfx sched to user for now. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18amd/amdgpu: add sched array to IPs with multiple run-queuesNirmoy Das
This sched array can be passed on to entity creation routine instead of manually creating such sched array on every context creation. v2: squash in missing break fix Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-18drm/scheduler: rework entity creationNirmoy Das
Entity currently keeps a copy of run_queue list and modify it in drm_sched_entity_set_priority(). Entities shouldn't modify run_queue list. Use drm_gpu_scheduler list instead of drm_sched_rq list in drm_sched_entity struct. In this way we can select a runqueue based on entity/ctx's priority for a drm scheduler. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-12-11drm/amdgpu: fix JPEG instance checking when ctx initLeo Liu
Use proper structure. Fixes: 0388aee766376ed ("drm/amdgpu: use the JPEG structure for general driver support") Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-19drm/amdgpu: use the JPEG structure for general driver supportLeo Liu
JPEG1.0 will be functional along with VCN1.0 Signed-off-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-07Revert "drm/amdgpu: dont schedule jobs while in reset"Andrey Grodzovsky
This reverts commit 89b3d86403f1025f6b430d8f9ffc590efbadce62. We will do a proper fix in next patch. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-06drm/amdgpu: dont schedule jobs while in resetShirish S
[Why] doing kthread_park()/unpark() from drm_sched_entity_fini while GPU reset is in progress defeats all the purpose of drm_sched_stop->kthread_park. If drm_sched_entity_fini->kthread_unpark() happens AFTER drm_sched_stop->kthread_park nothing prevents from another (third) thread to keep submitting job to HW which will be picked up by the unparked scheduler thread and try to submit to HW but fail because the HW ring is deactivated. [How] grab the reset lock before calling drm_sched_entity_fini() Signed-off-by: Shirish S <shirish.s@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-23drm/amdgpu: correct ras error count typeGuchun Chen
Use unsigned long type for the same ras count variable. This will avoid overflow on 64 bit system. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-22drm/amdgpu: fix dma_fence_wait without referenceChristian König
We need to grab a reference to the fence we wait for. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-15drm/amdgpu: use exiting amdgpu_ctx_total_num_entities functionKevin Wang
simplify driver code. Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-08-15drm/amdgpu: fix typo error amdgput -> amdgpuKevin Wang
fix typo error: change function name from "amdgput_ctx_total_num_entities" to "amdgpu_ctx_total_num_entities". Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu:add all VCN rings into schedule request queueJames Zhu
Add all VCN instances' decode/encode/jpeg decode rings into drm_sched_rq list. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-07-18drm/amdgpu: modify amdgpu_vcn to support multiple instancesJames Zhu
Arcturus has dual-VCN. Need Restruct amdgpu_device::vcn to support multiple vcns. There are no any logical changes here Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-10drm/amd: drop use of drmP.h in amdgpu/amdgpu*Sam Ravnborg
Drop use of drmP.h in all files named amdgpu* in drm/amd/amdgpu/ Fix fallout. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-10-sam@ravnborg.org
2019-03-19drm/amdgpu: wait for VM to become idle during flushChristian König
Make sure that not only the entities are flush, but that we also wait for the HW to finish all processing. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: remove non-sense NULL ptr checkChristian König
It's a bug having a dead pointer in the IDR, silently returning is the worst we can do. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-03-19drm/amdgpu: Add a new flag to AMDGPU_CTX_OP_QUERY_STATE2xinhui pan
Add AMDGPU_CTX_QUERY2_FLAGS_RAS_CE/UE which indicate if any error happened between previous query and this query. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-02-15drm/amdgpu: Only add rqs for initialized rings.Bas Nieuwenhuizen
I don't see another way to figure out if a ring is initialized if the hardware block might not be initialized. Entities have been fixed up to handle num_rqs = 0. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-12-10drm/amdgpu: Limit vm max ctx number to 4096Rex Zhu
driver need to reserve resource for each ctx for some hw features. so add this limitation. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Rex Zhu <Rex.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-11-30drm/amdgpu: add VCN JPEG support amdgpu_ctx_num_entitiesAlex Deucher
Looks like it was missed when setting support was added. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: amdgpu_ctx_add_fence can't failChristian König
No more waiting for a fence done here. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: rework ctx entity creationChristian König
Use a fixed number of entities for each hardware IP. The number of compute entities is reduced to four, SDMA keeps it two entities and all other engines just expose one entity. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: use entity instead of ring for CSChristian König
Further demangle ring from entity handling. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: remove the queue managerChristian König
Not needed any more since that is now done by the scheduler. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: use scheduler load balancing for compute CSChristian König
Start to use the scheduler load balancing for userspace compute command submissions. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/amdgpu: use scheduler load balancing for SDMA CSChristian König
Start to use the scheduler load balancing for userspace SDMA command submissions. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Chunming Zhou <david1.zhou@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-08-27drm/scheduler: fix setting the priorty for entities (v2)Christian König
Since we now deal with multiple rq we need to update all of them, not just the current one. v2: Trivial: Removed unused variable (Alex) Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-25drm/scheduler: modify API to avoid redundancyNayan Deshmukh
entity has a scheduler field and we don't need the sched argument in any of the functions where entity is provided. Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-13drm/scheduler: modify args of drm_sched_entity_initNayan Deshmukh
replace run queue by a list of run queues and remove the sched arg as that is part of run queue itself Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Eric Anholt <eric@anholt.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-05drm/amdgpu: Rename entity cleanup finctions.Andrey Grodzovsky
Everything in the flush code path (i.e. waiting for SW queue to become empty) names with *_flush() and everything in the release code path names *_fini() Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-07-05drm/scheduler: Rename cleanup functions v2.Andrey Grodzovsky
Everything in the flush code path (i.e. waiting for SW queue to become empty) names with *_flush() and everything in the release code path names *_fini() This patch also effect the amdgpu and etnaviv drivers which use those functions. v2: Also pplay the change to vd3. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-06-15drm/amdgpu: move amdgpu_ctx_mgr_entity_fini to f_ops flush hook (V4)Andrey Grodzovsky
With this we can now terminate jobs enqueue into SW queue the moment the task is being killed instead of waiting for last user of drm file to release it. Also stop checking for kref_read(&ctx->refcount) == 1 when calling drm_sched_entity_do_release since other task might still hold a reference to this entity but we don't care since KILL means terminate job submission regardless of what other tasks are doing. v2: Use returned remaining timeout as parameter for the next call. Rebase. v3: Switch to working with jiffies. Streamline remainder TO usage. Rebase. v4: Rebase. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-05-18drm/amdgpu: Skip drm_sched_entity related ops for KIQ ring.Andrey Grodzovsky
Following change 75fbed2 we never initialize or use the GPU scheduler for KIQ and hence we need to skip KIQ ring when iterating amdgpu_ctx's scheduler entites. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-05-15drm/scheduler: remove unused parameterNayan Deshmukh
this patch also effect the amdgpu and etnaviv drivers which use the function drm_sched_entity_init Signed-off-by: Nayan Deshmukh <nayan26deshmukh@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Acked-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-05-15drm/amdgpu: Switch to interruptable wait to recover from ring hang.Andrey Grodzovsky
v2: Use dma_fence_wait instead of dma_fence_wait_timeout(...,MAX_SCHEDULE_TIMEOUT) Avoid printing error message for ERESTARTSYS Originally-by: David Panariti <David.Panariti@amd.com> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2018-05-15drm/gpu-sched: fix force APP kill hang(v4)Emily Deng
issue: there are VMC page fault occurred if force APP kill during 3dmark test, the cause is in entity_fini we manually signal all those jobs in entity's queue which confuse the sync/dep mechanism: 1)page fault occurred in sdma's clear job which operate on shadow buffer, and shadow buffer's Gart table is cleaned by ttm_bo_release since the fence in its reservation was fake signaled by entity_fini() under the case of SIGKILL received. 2)page fault occurred in gfx' job because during the lifetime of gfx job we manually fake signal all jobs from its entity in entity_fini(), thus the unmapping/clear PTE job depend on those result fence is satisfied and sdma start clearing the PTE and lead to GFX page fault. fix: 1)should at least wait all jobs already scheduled complete in entity_fini() if SIGKILL is the case. 2)if a fence signaled and try to clear some entity's dependency, should set this entity guilty to prevent its job really run since the dependency is fake signaled. v2: splitting drm_sched_entity_fini() into two functions: 1)The first one is does the waiting, removes the entity from the runqueue and returns an error when the process was killed. 2)The second one then goes over the entity, install it as completion signal for the remaining jobs and signals all jobs with an error code. v3: 1)Replace the fini1 and fini2 with better name 2)Call the first part before the VM teardown in amdgpu_driver_postclose_kms() and the second part after the VM teardown 3)Keep the original function drm_sched_entity_fini to refine the code. v4: 1)Rename entity->finished to entity->last_scheduled; 2)Rename drm_sched_entity_fini_job_cb() to drm_sched_entity_kill_jobs_cb(); 3)Pass NULL to drm_sched_entity_fini_job_cb() if -ENOENT; 4)Replace the type of entity->fini_status with "int"; 5)Remove the check about entity->finished. Signed-off-by: Monk Liu <Monk.Liu@amd.com> Signed-off-by: Emily Deng <Emily.Deng@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>