Age | Commit message (Collapse) | Author |
|
wait memory room until enough before writing mes packets
to avoid ring buffer overflow.
v2: squash in sched_hw_submission fix
Fixes: de3246254156 ("drm/amdgpu: cleanup MES11 command submission")
Fixes: fffe347e1478 ("drm/amdgpu: cleanup MES12 command submission")
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For some value of optimisation we can replace the division with an
bitwise and. And it even shrinks the code. Before:
6c9: 53 push %rbx
6ca: 4c 8b 47 08 mov 0x8(%rdi),%r8
6ce: 31 d2 xor %edx,%edx
6d0: 48 89 fb mov %rdi,%rbx
6d3: 8b 87 c8 05 00 00 mov 0x5c8(%rdi),%eax
6d9: 41 8b 48 04 mov 0x4(%r8),%ecx
6dd: f7 d0 not %eax
6df: 21 c8 and %ecx,%eax
6e1: 83 c1 01 add $0x1,%ecx
6e4: 83 c0 01 add $0x1,%eax
6e7: f7 f1 div %ecx
6e9: 89 d6 mov %edx,%esi
6eb: 41 ff 90 88 00 00 00 call *0x88(%r8)
After:
6c9: 53 push %rbx
6ca: 48 8b 57 08 mov 0x8(%rdi),%rdx
6ce: 48 89 fb mov %rdi,%rbx
6d1: 8b 87 c8 05 00 00 mov 0x5c8(%rdi),%eax
6d7: 8b 72 04 mov 0x4(%rdx),%esi
6da: f7 d0 not %eax
6dc: 21 f0 and %esi,%eax
6de: 83 c0 01 add $0x1,%eax
6e1: 21 c6 and %eax,%esi
6e3: ff 92 88 00 00 00 call *0x88(%rdx)
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Do not make a function call for zero size NOP as it
does not add anything in the ring and is unnecessary
function call.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Check the ring type value to fix the out-of-bounds
write warning
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Clear overflowed array index read warning by cast operation.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
An errant disk backup on my desktop got into debugfs and triggered the
following deadlock scenario in the amdgpu debugfs files. The machine
also hard-resets immediately after those lines are printed (although I
wasn't able to reproduce that part when reading by hand):
[ 1318.016074][ T1082] ======================================================
[ 1318.016607][ T1082] WARNING: possible circular locking dependency detected
[ 1318.017107][ T1082] 6.8.0-rc7-00015-ge0c8221b72c0 #17 Not tainted
[ 1318.017598][ T1082] ------------------------------------------------------
[ 1318.018096][ T1082] tar/1082 is trying to acquire lock:
[ 1318.018585][ T1082] ffff98c44175d6a0 (&mm->mmap_lock){++++}-{3:3}, at: __might_fault+0x40/0x80
[ 1318.019084][ T1082]
[ 1318.019084][ T1082] but task is already holding lock:
[ 1318.020052][ T1082] ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.020607][ T1082]
[ 1318.020607][ T1082] which lock already depends on the new lock.
[ 1318.020607][ T1082]
[ 1318.022081][ T1082]
[ 1318.022081][ T1082] the existing dependency chain (in reverse order) is:
[ 1318.023083][ T1082]
[ 1318.023083][ T1082] -> #2 (reservation_ww_class_mutex){+.+.}-{3:3}:
[ 1318.024114][ T1082] __ww_mutex_lock.constprop.0+0xe0/0x12f0
[ 1318.024639][ T1082] ww_mutex_lock+0x32/0x90
[ 1318.025161][ T1082] dma_resv_lockdep+0x18a/0x330
[ 1318.025683][ T1082] do_one_initcall+0x6a/0x350
[ 1318.026210][ T1082] kernel_init_freeable+0x1a3/0x310
[ 1318.026728][ T1082] kernel_init+0x15/0x1a0
[ 1318.027242][ T1082] ret_from_fork+0x2c/0x40
[ 1318.027759][ T1082] ret_from_fork_asm+0x11/0x20
[ 1318.028281][ T1082]
[ 1318.028281][ T1082] -> #1 (reservation_ww_class_acquire){+.+.}-{0:0}:
[ 1318.029297][ T1082] dma_resv_lockdep+0x16c/0x330
[ 1318.029790][ T1082] do_one_initcall+0x6a/0x350
[ 1318.030263][ T1082] kernel_init_freeable+0x1a3/0x310
[ 1318.030722][ T1082] kernel_init+0x15/0x1a0
[ 1318.031168][ T1082] ret_from_fork+0x2c/0x40
[ 1318.031598][ T1082] ret_from_fork_asm+0x11/0x20
[ 1318.032011][ T1082]
[ 1318.032011][ T1082] -> #0 (&mm->mmap_lock){++++}-{3:3}:
[ 1318.032778][ T1082] __lock_acquire+0x14bf/0x2680
[ 1318.033141][ T1082] lock_acquire+0xcd/0x2c0
[ 1318.033487][ T1082] __might_fault+0x58/0x80
[ 1318.033814][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu]
[ 1318.034181][ T1082] full_proxy_read+0x55/0x80
[ 1318.034487][ T1082] vfs_read+0xa7/0x360
[ 1318.034788][ T1082] ksys_read+0x70/0xf0
[ 1318.035085][ T1082] do_syscall_64+0x94/0x180
[ 1318.035375][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.035664][ T1082]
[ 1318.035664][ T1082] other info that might help us debug this:
[ 1318.035664][ T1082]
[ 1318.036487][ T1082] Chain exists of:
[ 1318.036487][ T1082] &mm->mmap_lock --> reservation_ww_class_acquire --> reservation_ww_class_mutex
[ 1318.036487][ T1082]
[ 1318.037310][ T1082] Possible unsafe locking scenario:
[ 1318.037310][ T1082]
[ 1318.037838][ T1082] CPU0 CPU1
[ 1318.038101][ T1082] ---- ----
[ 1318.038350][ T1082] lock(reservation_ww_class_mutex);
[ 1318.038590][ T1082] lock(reservation_ww_class_acquire);
[ 1318.038839][ T1082] lock(reservation_ww_class_mutex);
[ 1318.039083][ T1082] rlock(&mm->mmap_lock);
[ 1318.039328][ T1082]
[ 1318.039328][ T1082] *** DEADLOCK ***
[ 1318.039328][ T1082]
[ 1318.040029][ T1082] 1 lock held by tar/1082:
[ 1318.040259][ T1082] #0: ffff98c4c13f55f8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: amdgpu_debugfs_mqd_read+0x6a/0x250 [amdgpu]
[ 1318.040560][ T1082]
[ 1318.040560][ T1082] stack backtrace:
[ 1318.041053][ T1082] CPU: 22 PID: 1082 Comm: tar Not tainted 6.8.0-rc7-00015-ge0c8221b72c0 #17 3316c85d50e282c5643b075d1f01a4f6365e39c2
[ 1318.041329][ T1082] Hardware name: Gigabyte Technology Co., Ltd. B650 AORUS PRO AX/B650 AORUS PRO AX, BIOS F20 12/14/2023
[ 1318.041614][ T1082] Call Trace:
[ 1318.041895][ T1082] <TASK>
[ 1318.042175][ T1082] dump_stack_lvl+0x4a/0x80
[ 1318.042460][ T1082] check_noncircular+0x145/0x160
[ 1318.042743][ T1082] __lock_acquire+0x14bf/0x2680
[ 1318.043022][ T1082] lock_acquire+0xcd/0x2c0
[ 1318.043301][ T1082] ? __might_fault+0x40/0x80
[ 1318.043580][ T1082] ? __might_fault+0x40/0x80
[ 1318.043856][ T1082] __might_fault+0x58/0x80
[ 1318.044131][ T1082] ? __might_fault+0x40/0x80
[ 1318.044408][ T1082] amdgpu_debugfs_mqd_read+0x103/0x250 [amdgpu 8fe2afaa910cbd7654c8cab23563a94d6caebaab]
[ 1318.044749][ T1082] full_proxy_read+0x55/0x80
[ 1318.045042][ T1082] vfs_read+0xa7/0x360
[ 1318.045333][ T1082] ksys_read+0x70/0xf0
[ 1318.045623][ T1082] do_syscall_64+0x94/0x180
[ 1318.045913][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.046201][ T1082] ? lockdep_hardirqs_on+0x7d/0x100
[ 1318.046487][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.046773][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047057][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047337][ T1082] ? do_syscall_64+0xa0/0x180
[ 1318.047611][ T1082] entry_SYSCALL_64_after_hwframe+0x46/0x4e
[ 1318.047887][ T1082] RIP: 0033:0x7f480b70a39d
[ 1318.048162][ T1082] Code: 91 ba 0d 00 f7 d8 64 89 02 b8 ff ff ff ff eb b2 e8 18 a3 01 00 0f 1f 84 00 00 00 00 00 80 3d a9 3c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
[ 1318.048769][ T1082] RSP: 002b:00007ffde77f5c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 1318.049083][ T1082] RAX: ffffffffffffffda RBX: 0000000000000800 RCX: 00007f480b70a39d
[ 1318.049392][ T1082] RDX: 0000000000000800 RSI: 000055c9f2120c00 RDI: 0000000000000008
[ 1318.049703][ T1082] RBP: 0000000000000800 R08: 000055c9f2120a94 R09: 0000000000000007
[ 1318.050011][ T1082] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c9f2120c00
[ 1318.050324][ T1082] R13: 0000000000000008 R14: 0000000000000008 R15: 0000000000000800
[ 1318.050638][ T1082] </TASK>
amdgpu_debugfs_mqd_read() holds a reservation when it calls
put_user(), which may fault and acquire the mmap_sem. This violates
the established locking order.
Bounce the mqd data through a kernel buffer to get put_user() out of
the illegal section.
Fixes: 445d85e3c1df ("drm/amdgpu: add debugfs interface for reading MQDs")
Cc: stable@vger.kernel.org # v6.5+
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix the warning info below during mode1 reset.
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000006] ? show_regs+0x6e/0x80
[ +0.000011] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000005] ? __warn+0x91/0x150
[ +0.000009] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000006] ? report_bug+0x19d/0x1b0
[ +0.000013] ? handle_bug+0x46/0x80
[ +0.000012] ? exc_invalid_op+0x1d/0x80
[ +0.000011] ? asm_exc_invalid_op+0x1f/0x30
[ +0.000014] ? __flush_work.isra.0+0x2e8/0x390
[ +0.000007] ? __flush_work.isra.0+0x208/0x390
[ +0.000007] ? _prb_read_valid+0x216/0x290
[ +0.000008] __cancel_work_timer+0x11d/0x1a0
[ +0.000007] ? try_to_grab_pending+0xe8/0x190
[ +0.000012] cancel_work_sync+0x14/0x20
[ +0.000008] amddrm_sched_stop+0x3c/0x1d0 [amd_sched]
[ +0.000032] amdgpu_device_gpu_recover+0x29a/0xe90 [amdgpu]
This warning info was printed after applying the patch
"drm/sched: Convert drm scheduler to use a work queue rather than kthread".
The root cause is that amdgpu driver tries to use the uninitialized
work_struct in the struct drm_gpu_scheduler
v2:
- Rename the function to amdgpu_ring_sched_ready and move it to
amdgpu_ring.c (Alex)
v3:
- Fix a few more checks based on Vitaly's patch (Alex)
v4:
- squash in fix noticed by Bert in
https://gitlab.freedesktop.org/drm/amd/-/issues/3139
Fixes: 11b3b9f461c5 ("drm/sched: Check scheduler ready before calling timeout handling")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This improves latency if the GPU is already busy with other work.
This is useful for VR compositors that submit highly latency-sensitive
compositing work on high-priority compute queues while the GPU is busy
rendering the next frame.
Userspace merge request:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462
v2: bump driver version (Alex)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Create a module option to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes
easier to force device resets for testing and debugging purposes.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Set the fence error code before trying to soft-recover it.
It gets overwritten when a hard recovery is required.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This allows us to insert some error codes into the bottom of the pipeline
on an engine.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When the preempted IB frame resubmitted to cp, we need to modify the frame
data including:
1. set PRE_RESUME 1 in CONTEXT_CONTROL.
2. use meta data(DE and CE) read from CSA in WRITE_DATA.
Add functions to save the location the first time IBs emitted and callback
to patch the package when resubmission happens.
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix below warnings reported by checkpatch:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: static const char * array should probably be static const char * const
WARNING: space prohibited between function name and open parenthesis '('
WARNING: braces {} are not necessary for single statement blocks
WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Provide a debugfs interface to access the MQD. Useful for
debugging issues with the CP and MES hardware scheduler.
v2: fix missing unreserve/unmap when pos >= size (Alex)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The fences associated with mes queue have to be freed
up during amdgpu_ring_fini.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
And ensure each ring supports that many submissions. This makes
sure that we don't get surprises after the submission has been
scheduled where the ring allocation actually gets rejected.
My calculations on the existing limits:
COMPUTE v10: 128
COMPUTE v11: 128
COMPUTE v6: 157
COMPUTE v7: 133
COMPUTE v8: 130
COMPUTE v9: 125
GFX v10: 208
GFX v11: 213
GFX v6: 154 (doubling this in the previous patch)
GFX v7: 226
GFX v8: 213
GFX v9: 208
GFX v9 (SW): 208
SDMA CIK: 87
SDMA SI: 97
SDMA v2.4: 74
SDMA v3.0: 74
SDMA v4.0: 72
SDMA v5.0: 51
SDMA v6.0: 52
UVD ENC v6.0: 98
UVD ENC v7.0: 92
UVD v3.1: 124
UVD v4.2: 124
UVD v5.0: 83
UVD v6.0 (VM): 55
UVD v7.0: 51
VCE v2.0: 126
VCE v3.0 (VM): 98
VCE v4.0: 93
VCN DEC v1.0: 49
VCN DEC v2.0: 51
VCN DEC v3.0: 51
VCN ENC v1.0: 58
VCN ENC v2.0: 93
VCN ENC v3.0: 93
VCN ENC v4.0: 93
VCN JPEG v1.0: 17
VCN JPEG v2.0: 16
VCN JPEG v2.5: 17
VCN JPEG v3.0: 17
VCN JPEG v4.0: 17
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2498
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Trigger Mid-Command Buffer Preemption according to the priority of the software
rings and the hw fence signalling condition.
The muxer saves the locations of the indirect buffer frames from the software
ring together with the fence sequence number in its fifo queue, and pops out
those records when the fences are signalled. The locations are used to resubmit
packages in preemption scenarios by coping the chunks from the software ring.
v2: Update comment style.
v3: Fix conflict caused by previous modifications.
v4: Remove unnecessary prints.
v5: Fix corner cases for resubmission cases.
v6: Refactor functions for resubmission, calling fence_process in irq handler.
v7: Solve conflict for removing amdgpu_sw_ring.c.
v8: Add time threshold to judge if preemption request is needed.
v9: Correct comment spelling. Set fence emit timestamp before rsu assignment.
Cc: Christian Koenig <Christian.Koenig@amd.com>
Cc: Luben Tuikov <Luben.Tuikov@amd.com>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@amd.com>
Cc: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Jiadong.Zhu <Jiadong.Zhu@amd.com>
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This reverts commit 5bd8d53f6fa53eab5433698d1362dae2aa53c1cc.
This commit breaks the reset logic for aldebaran, revert it for now.
Will move the mask inside the reset handler.
Fixes: 5bd8d53f6fa53e ("drm/amdgpu: add debugfs amdgpu_reset_level")
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Introduce amdgpu_reset_level debugfs in order to help debug and
test specific type of reset. Also helps blocking unwanted type of
resets.
By default, mode2 reset will not be enabled
v2: make this debugfs in adev and use debugfs_create_u32
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix alignment problems reported by zuul for the
commit b07d1d73b09e ("drm/amd/amdgpu: Enable high priority gfx queue")
Fixes: b07d1d73b09e ("drm/amd/amdgpu: Enable high priority gfx queue")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Starting from SIENNA CICHLID asic supports two gfx pipes, enabling
two graphics queues, 1 on each pipe, pipe0 queue0 would be the normal
piority queue and pipe1 queue0 would be the high priority queue
Only one queue per pipe is visble to SPI, SPI looks at the priority
value assigned to CP_GFX_HQD_QUEUE_PRIORITY from each of the queue's
HQD/MQD.
Create contexts applying AMDGPU_CTX_PRIORITY_HIGH which submits job
to the high priority queue on GFX pipe1. There would be starvation
of LP workload if HP workload is always available.
v2:
- remove unnecessary check(Nirmoy)
- make pipe1 hardware support a separate patch(Nirmoy)
- remove duplicate code(Shashank)
- add CSA support for second gfx pipe(Alex)
v3(Christian):
- fix incorrect indentation
- merge COMPUTE and GFX switch cases as both calls the same function.
v4:
- rebase w/ latest code base
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Iniailize/finalize the ring for mes queue which submits the command
stream to the mes-managed hardware queue.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add the helper function to initialize mqd from ring configuration.
v2: use if/else pair instead of ?/: pair
v3: use simpler way to judge hqd_active
v4: fix parameters to amdgpu_gfx_is_high_priority_compute_queue
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This way we don't need to check for NULL any more.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Before we initialize schedulers we must know which reset
domain are we in - for single device there iis a single
domain per device and so single wq per device. For XGMI
the reset domain spans the entire XGMI hive and so the
reset wq is per hive.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://www.spinics.net/lists/amd-gfx/msg74112.html
|
|
Use debugfs_create_file_size API for creating ring debugfs, and as its a
NULL returning API, change the return type for amdgpu_debugfs_ring_init
API as well. Also cleanup surrounding code.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
debugfs APIs returns encoded error so use
IS_ERR for checking return value.
v2: return PTR_ERR(ent)
References: https://gitlab.freedesktop.org/drm/amd/-/issues/1686
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-By: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:169: warning: Function parameter or member 'sched_score' not described in 'amdgpu_ring_init'
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Allow separate ring to share the same scheduler score.
No functional change.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch consist of below related changes:
1 Rename ring->priority to ring->hw_prio.
2 Assign correct hardware ring priority.
3 Remove ring->priority_mutex as ring priority remains unchanged
after initialization.
4 Remove unused ring->num_jobs.
v3: remove ring->num_jobs.
v2: remove ring->priority_mutex.
Fixes: 33abcb1f5a17 ("drm/amdgpu: set compute queue priority at mqd_init")
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:168: warning: Function parameter or member 'max_dw' not described in 'amdgpu_ring_init'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:168: warning: Excess function parameter 'max_ndw' description in 'amdgpu_ring_init'
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes the following W=1 kernel build warning(s):
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:63: warning: Excess function parameter 'adev' description in 'amdgpu_ring_alloc'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:122: warning: Excess function parameter 'adev' description in 'amdgpu_ring_commit'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:167: warning: Function parameter or member 'max_dw' not described in 'amdgpu_ring_init'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:167: warning: Function parameter or member 'irq_src' not described in 'amdgpu_ring_init'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:167: warning: Function parameter or member 'irq_type' not described in 'amdgpu_ring_init'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:167: warning: Function parameter or member 'hw_prio' not described in 'amdgpu_ring_init'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:167: warning: Excess function parameter 'max_ndw' description in 'amdgpu_ring_init'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:167: warning: Excess function parameter 'nop' description in 'amdgpu_ring_init'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:285: warning: Excess function parameter 'adev' description in 'amdgpu_ring_fini'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:325: warning: Function parameter or member 'ring' not described in 'amdgpu_ring_emit_reg_write_reg_wait_helper'
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c:325: warning: Excess function parameter 'adev' description in 'amdgpu_ring_emit_reg_write_reg_wait_helper'
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When declaring pointer data, the "*" symbol should be used adjacent to
the data name as per the coding standards. This resolves following
issues reported by checkpatch script:
ERROR: "foo * bar" should be "foo *bar"
ERROR: "foo * bar" should be "foo *bar"
ERROR: "foo* bar" should be "foo *bar"
ERROR: "(foo*)" should be "(foo *)"
Signed-off-by: Deepak R Varma <mh12gx2825@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a static inline adev_to_drm() to obtain
the DRM device pointer from an amdgpu_device pointer.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove DRM_SCHED_PRIORITY_LOW, as it was used
in only one place.
Rename and separate by a line
DRM_SCHED_PRIORITY_MAX to DRM_SCHED_PRIORITY_COUNT
as it represents a (total) count of said
priorities and it is used as such in loops
throughout the code. (0-based indexing is the
the count number.)
Remove redundant word HIGH in priority names,
and rename *KERNEL* to *HIGH*, as it really
means that, high.
v2: Add back KERNEL and remove SW and HW,
in lieu of a single HIGH between NORMAL and KERNEL.
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Set up a GPU scheduler based on the ring flag rather
than the ring type.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This allows IPs to flag whether a specific ring requires
a GPU scheduler or not. E.g., sometimes instances of an
IP are asymmetric and have different capabilities.
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Generate HW IP's sched_list in amdgpu_ring_init() instead of
amdgpu_ctx.c. This makes amdgpu_ctx_init_compute_sched(),
ring.has_high_prio and amdgpu_ctx_init_sched() unnecessary.
This patch also stores sched_list for all HW IPs in one big
array in struct amdgpu_device which makes amdgpu_ctx_init_entity()
much more leaner.
v2:
fix a coding style issue
do not use drm hw_ip const to populate amdgpu_ring_type enum
v3:
remove ctx reference and move sched array and num_sched to a struct
use num_scheds to detect uninitialized scheduler list
v4:
use array_index_nospec for user space controlled variables
fix possible checkpatch.pl warnings
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
AMDGPU statically sets priority for compute queues
at initialization so remove all the functions
responsible for changing compute queue priority dynamically.
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
drm_minor_unregister will invoke drm_debugfs_cleanup
to clean all the child node under primary minor node.
We don't need to invoke amdgpu_debugfs_fini and
amdgpu_debugfs_regs_cleanup to clean agian.
Otherwise, it will raise the NULL pointer like below.
[ 45.046029] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 45.047256] PGD 0 P4D 0
[ 45.047713] Oops: 0002 [#1] SMP PTI
[ 45.048198] CPU: 0 PID: 2796 Comm: modprobe Tainted: G W OE 4.18.0-15-generic #16~18.04.1-Ubuntu
[ 45.049538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 45.050651] RIP: 0010:down_write+0x1f/0x40
[ 45.051194] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 ce d9 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 53 1c ff ff 65 48 8b 04 25 00 5c 01
[ 45.053702] RSP: 0018:ffffad8f4133fd40 EFLAGS: 00010246
[ 45.054384] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: ffffa011327dd814
[ 45.055349] RDX: ffffffff00000001 RSI: 0000000000000001 RDI: 00000000000000a8
[ 45.056346] RBP: ffffad8f4133fd48 R08: 0000000000000000 R09: ffffffffc0690a00
[ 45.057326] R10: ffffad8f4133fd58 R11: 0000000000000001 R12: ffffa0113cff0300
[ 45.058266] R13: ffffa0113c0a0000 R14: ffffffffc0c02a10 R15: ffffa0113e5c7860
[ 45.059221] FS: 00007f60d46f9540(0000) GS:ffffa0113fc00000(0000) knlGS:0000000000000000
[ 45.060809] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.061826] CR2: 00000000000000a8 CR3: 0000000136250004 CR4: 00000000003606f0
[ 45.062913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 45.064404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 45.065897] Call Trace:
[ 45.066426] debugfs_remove+0x36/0xa0
[ 45.067131] amdgpu_debugfs_ring_fini+0x15/0x20 [amdgpu]
[ 45.068019] amdgpu_debugfs_fini+0x2c/0x50 [amdgpu]
[ 45.068756] amdgpu_pci_remove+0x49/0x70 [amdgpu]
[ 45.069439] pci_device_remove+0x3e/0xc0
[ 45.070037] device_release_driver_internal+0x18a/0x260
[ 45.070842] driver_detach+0x3f/0x80
[ 45.071325] bus_remove_driver+0x59/0xd0
[ 45.071850] driver_unregister+0x2c/0x40
[ 45.072377] pci_unregister_driver+0x22/0xa0
[ 45.073043] amdgpu_exit+0x15/0x57c [amdgpu]
[ 45.073683] __x64_sys_delete_module+0x146/0x280
[ 45.074369] do_syscall_64+0x5a/0x120
[ 45.074916] entry_SYSCALL_64_after_hwframe+0x44/0xa9
v2: remove all debugfs cleanup/fini code at amdgpu
v3: squash in unused variable removal
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In order to remove the load and unload drm callbacks,
we need to reorder the init sequence to move all the drm
debugfs file handling. Do this for rings.
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
cleanup amdgpu_ring_fini to check the prerequisites before changing ring->sched.ready
Signed-off-by: Nirmoy Das <nirmoy.das@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Backmerge drm-next and fix up conflicts due to drmP.h removal.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The trailing fence for ring is used to track the
completion of preemption.
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It's incorrect to do soft reset for SRIOV, when GFX
hang the WREG would stuck there becuase it goes KIQ way.
the GPU reset counter is incorrect: always increase twice
for each timedout
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Drop use of drmP.h in all files named amdgpu*
in drm/amd/amdgpu/
Fix fallout.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20190609220757.10862-10-sam@ravnborg.org
|
|
To aid recoverable page faults.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
amdgpu_ring_soft_recovery would have Call-Trace,
when s_fence->parent was NULL inside amdgpu_job_timedout.
Check fence first, as drm_sched_hw_job_reset did.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move all error messages from IP specific code into the common helper.
This way we now uses the ring name in the messages instead of the index
and note which device is affected as well.
Also cleanup error handling in the IP specific code and consequently use
ETIMEDOUT when the ring test timed out.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Start using drm_gpu_scheduler.ready isntead.
v3:
Add helper function to run ring test and set
sched.ready flag status accordingly, clean explicit
sched.ready sets from the IP specific files.
v4: Add kerneldoc and rebase.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|