Age | Commit message | Author |
|
When the GPU times out, amdgpu notifies an interested party
of an opportunity to dump info before the actual GPU reset.
A usermode app opens the 'autodump' node under debugfs
and poll()s it for readable/writable. When a GPU reset is due,
amdgpu notifies the usermode app through wait_queue_head and gives
it 10 minutes to dump info.
After the usermode app has done its work, it closes the 'autodump' node.
On node closure, amdgpu learns that the dump is done through
the completion that is triggered in release().
There is no write or read callback because the necessary info can be
obtained through dmesg and umr. Messages back and forth between
the usermode app and amdgpu are unnecessary.
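The usermode side of this handshake can be sketched as follows. This is a minimal illustration, not the tool the author used: the helper names autodump_wait and autodump_session, the dump callback, and the open mode are all assumptions, and the node path is left as a parameter because it depends on where debugfs is mounted and on the DRM minor.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until the descriptor reports readable or
 * writable. Returns 1 when ready, 0 on timeout, -1 on error.
 */
int autodump_wait(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN | POLLOUT };
	int ret;

	/* Retry if a signal interrupts the wait. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret <= 0)
		return ret;

	return (pfd.revents & (POLLIN | POLLOUT)) ? 1 : -1;
}

/*
 * Hypothetical session: open the node, wait for the reset notification,
 * run the caller's dump routine, then close the node. Per the commit
 * message, closing the node is what tells amdgpu the dump is done.
 * Returns the dump routine's result, 0 on timeout, -1 on error.
 */
int autodump_session(const char *node_path, int timeout_ms, int (*dump)(void))
{
	int fd = open(node_path, O_RDONLY); /* open mode is an assumption */
	int rc;

	if (fd < 0)
		return -1;

	rc = autodump_wait(fd, timeout_ms);
	if (rc == 1)
		rc = dump ? dump() : 0; /* e.g. collect dmesg and umr output */

	close(fd); /* release() on the kernel side completes the wait */
	return rc;
}
```

A caller would pass the path of the autodump node and a negative timeout to block indefinitely; the dump routine has to finish within the 10 minute window mentioned above.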
v2: (1) changed 'registered' to 'app_listening'
(2) add a mutex in open() to prevent race condition
v3 (chk): grab the reset lock to avoid race in autodump_open,
rename debugfs file to amdgpu_autodump,
provide autodump_read as well,
style and code cleanups
v4: add 'bool app_listening' to differentiate situations, so that
the node can be reopened; also, there is no need to wait for
completion when no app is waiting for a dump.
v5: change 'bool app_listening' to 'enum amdgpu_autodump_state'
add 'app_state_mutex' for race conditions:
(1) Only 1 user can open this file node
(2) wait_dump() can only take effect after poll() executed.
(3) eliminated the race condition between release() and
wait_dump()
v6: removed 'enum amdgpu_autodump_state' and 'app_state_mutex'
removed state checking in amdgpu_debugfs_wait_dump
Improve on top of version 3 so that the node can be reopened.
v7: move reinit_completion into open() so that only one user
can open it.
v8: remove complete_all() from amdgpu_debugfs_wait_dump().
Signed-off-by: Jiange Zhao <Jiange.Zhao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
drm_minor_unregister invokes drm_debugfs_cleanup
to clean up all the child nodes under the primary minor node.
We don't need to invoke amdgpu_debugfs_fini and
amdgpu_debugfs_regs_cleanup to clean them up again.
Otherwise, a NULL pointer dereference like the one below is triggered.
[ 45.046029] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
[ 45.047256] PGD 0 P4D 0
[ 45.047713] Oops: 0002 [#1] SMP PTI
[ 45.048198] CPU: 0 PID: 2796 Comm: modprobe Tainted: G W OE 4.18.0-15-generic #16~18.04.1-Ubuntu
[ 45.049538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 45.050651] RIP: 0010:down_write+0x1f/0x40
[ 45.051194] Code: 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb e8 ce d9 ff ff 48 ba 01 00 00 00 ff ff ff ff 48 89 d8 <f0> 48 0f c1 10 85 d2 74 05 e8 53 1c ff ff 65 48 8b 04 25 00 5c 01
[ 45.053702] RSP: 0018:ffffad8f4133fd40 EFLAGS: 00010246
[ 45.054384] RAX: 00000000000000a8 RBX: 00000000000000a8 RCX: ffffa011327dd814
[ 45.055349] RDX: ffffffff00000001 RSI: 0000000000000001 RDI: 00000000000000a8
[ 45.056346] RBP: ffffad8f4133fd48 R08: 0000000000000000 R09: ffffffffc0690a00
[ 45.057326] R10: ffffad8f4133fd58 R11: 0000000000000001 R12: ffffa0113cff0300
[ 45.058266] R13: ffffa0113c0a0000 R14: ffffffffc0c02a10 R15: ffffa0113e5c7860
[ 45.059221] FS: 00007f60d46f9540(0000) GS:ffffa0113fc00000(0000) knlGS:0000000000000000
[ 45.060809] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 45.061826] CR2: 00000000000000a8 CR3: 0000000136250004 CR4: 00000000003606f0
[ 45.062913] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 45.064404] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 45.065897] Call Trace:
[ 45.066426] debugfs_remove+0x36/0xa0
[ 45.067131] amdgpu_debugfs_ring_fini+0x15/0x20 [amdgpu]
[ 45.068019] amdgpu_debugfs_fini+0x2c/0x50 [amdgpu]
[ 45.068756] amdgpu_pci_remove+0x49/0x70 [amdgpu]
[ 45.069439] pci_device_remove+0x3e/0xc0
[ 45.070037] device_release_driver_internal+0x18a/0x260
[ 45.070842] driver_detach+0x3f/0x80
[ 45.071325] bus_remove_driver+0x59/0xd0
[ 45.071850] driver_unregister+0x2c/0x40
[ 45.072377] pci_unregister_driver+0x22/0xa0
[ 45.073043] amdgpu_exit+0x15/0x57c [amdgpu]
[ 45.073683] __x64_sys_delete_module+0x146/0x280
[ 45.074369] do_syscall_64+0x5a/0x120
[ 45.074916] entry_SYSCALL_64_after_hwframe+0x44/0xa9
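In the dump above, RDI (the first argument to down_write) and CR2 (the faulting address) are both 0xa8, which is consistent with a lock embedded in a structure being taken through a NULL structure pointer. A minimal sketch of that arithmetic follows; struct fake_inode and both helpers are hypothetical, and the padding is chosen only to reproduce the offset, it is not the kernel's real layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a kernel object. NOT the real layout:
 * the padding only places `lock` at offset 0xa8.
 */
struct fake_inode {
	char pad[0xa8];
	long lock;
};

/* Address computed when code takes &obj->member with obj == base. */
uintptr_t member_address(uintptr_t base, size_t offset)
{
	return base + offset;
}

/* Address that would fault if the object pointer were NULL. */
uintptr_t null_fault_address(void)
{
	return member_address(0, offsetof(struct fake_inode, lock));
}
```

With a NULL base the computed address is just the member offset, so a small non-zero fault address such as 0xa8 usually points at a stale or NULL object pointer rather than at a NULL lock pointer.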
v2: remove all debugfs cleanup/fini code at amdgpu
v3: squash in unused variable removal
Signed-off-by: Yintian Tao <yttao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
to amdgpu_debugfs_fini. It will be used for other things in
the future.
Tested-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The MCBP unit test is used to test the functionality of MCBP.
It emulates sending a preemption request and resubmitting the
unfinished jobs.
v2: squash in fixes (Alex)
v3: squash in memory leak fix (Jack)
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
amdgpu_device.c was getting pretty cluttered.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|