Age | Commit message (Collapse) | Author |
|
As we discussed before[1], soft recovery should be
forwarded to userspace, or we can get into a really
bad state where apps will keep submitting hanging
command buffers cascading us to a hard reset.
1: https://lore.kernel.org/all/bf23d5ed-9a6b-43e7-84ee-8cbfd0d60f18@froggi.es/
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
- amdgpu_ras_error_statistic_ue_count()
- amdgpu_ras_error_statistic_ce_count()
- amdgpu_ras_error_statistic_de_count()
The parameter 'err_addr' is no longer used since following patch.
Fixes: a7e8467fbeee ("drm/amdgpu: Remove unused code")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adding Manual GDB golden setting for gc v12
revision 0 ASIC.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In the convenience of calling it globally.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adding NOP packets one by one in the ring
does not use the CP efficiently.
Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adding NOP packets one by one in the ring
does not use the CP efficiently.
Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For some value of optimisation we can replace the division with an
bitwise and. And it even shrinks the code. Before:
6c9: 53 push %rbx
6ca: 4c 8b 47 08 mov 0x8(%rdi),%r8
6ce: 31 d2 xor %edx,%edx
6d0: 48 89 fb mov %rdi,%rbx
6d3: 8b 87 c8 05 00 00 mov 0x5c8(%rdi),%eax
6d9: 41 8b 48 04 mov 0x4(%r8),%ecx
6dd: f7 d0 not %eax
6df: 21 c8 and %ecx,%eax
6e1: 83 c1 01 add $0x1,%ecx
6e4: 83 c0 01 add $0x1,%eax
6e7: f7 f1 div %ecx
6e9: 89 d6 mov %edx,%esi
6eb: 41 ff 90 88 00 00 00 call *0x88(%r8)
After:
6c9: 53 push %rbx
6ca: 48 8b 57 08 mov 0x8(%rdi),%rdx
6ce: 48 89 fb mov %rdi,%rbx
6d1: 8b 87 c8 05 00 00 mov 0x5c8(%rdi),%eax
6d7: 8b 72 04 mov 0x4(%rdx),%esi
6da: f7 d0 not %eax
6dc: 21 f0 and %esi,%eax
6de: 83 c0 01 add $0x1,%eax
6e1: 21 c6 and %eax,%esi
6e3: ff 92 88 00 00 00 call *0x88(%rdx)
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Data abort exception and unknown errors are supported.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
MMHUB v4.1.0 only support fixed cache mode, so
only use legacy invalidation accordingly.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Frank Min <Frank.Min@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adding NOP packets one by one in the ring
does not use the CP efficiently.
Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
avoid using SDMA if it is unavailable.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adding NOP packets one by one in the ring
does not use the CP efficiently.
Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.
Reviewed-by: Christian König <christian.koenig@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Do not make a function call for zero size NOP as it
does not add anything in the ring and is unnecessary
function call.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Support per-queue reset for GFX9. The recommendation is for the driver
to target reset the HW queue via a SPI MMIO register write.
Since this requires pipe and HW queue info and MEC FW is limited to
doorbell reports of hung queues after an unmap failure, scan the HW
queue slots defined by SET_RESOURCES first to identify the user queue
candidates to reset.
Only signal reset events to processes that have had a queue reset.
If queue reset fails, fall back to GPU reset.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adding NOP packets one by one in the ring
does not use the CP efficiently.
Solution:
Use CP optimization while adding NOP packet's so PFP
can discard NOP packets based on information of count
from the Header instead of fetching all NOP packets
one by one.
Cc: Christian König <christian.koenig@amd.com>
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Tvrtko Ursulin <tursulin@igalia.com>
Cc: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
debugfs register list for dump is cleaned as it have
some issues related to proper power state of the IP
before register read.
Since the above mentioned is removed we no longer want
this to be dumped part of the devcoredump and hence
removed.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
There are some problem with existing amdgpu_reset_dump_register_list
debugfs node. It is supposed to read a list of registers but there
could be cases when the IP is not in correct power state. Register
read in such cases could lead to more problems.
We are taking care of all such power states in devcoredump and
dumping the registers of need for debugging. So cleaning this code
and we dont need this functionality via debugfs anymore.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This reverts commit 9d8c094ddab05db88d183ba82e23be807848cad8.
It was merged without meeting userspace requirements.
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240802145946.48073-2-hamza.mahfooz@amd.com
|
|
Backmerging to get a late RC of v6.10 before moving into v6.11.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
MES firmware requires larger log buffer for gfx12. Allocate
proper buffer respectively for gfx11 and gfx12.
Signed-off-by: Michael Chen <michael.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 739d0f3e1f36738d4cd84166784a8f7a58d69612)
|
|
Otherwise we won't get correct access to the IB.
v2: keep setting AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS to avoid problems in
the VRAM backend.
Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3501
Fixes: e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit fbfb5f0342253d92c4e446588c428a9d90c3f610)
|
|
Instead of manually passing around 'struct edid *' and its size,
use 'struct drm_edid', which encapsulates a validated combination of
both.
As the drm_edid_ can handle NULL gracefully, the explicit checks can be
dropped.
Also save a few characters by transforming '&array[0]' to the equivalent
'array' and using 'max_t(int, ...)' instead of manual casts.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Problem:
IP dump right now is done post suspend of all
IP's which for some IP's could change power
state and software state too which we do not want
to reflect in the dump as it might not be same at
the time of hang.
Solution:
IP should be dumped as close to the HW state when
the GPU was in hung state without trying to reinitialize
any resource.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
MES firmware requires larger log buffer for gfx12. Allocate
proper buffer respectively for gfx11 and gfx12.
Signed-off-by: Michael Chen <michael.chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
VCN dump is dependent on power state of the ip. Dump is
valid if VCN was powered up at the time of ip dump.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
ISP I2C bus device can't be enumerated via ACPI mechanism
since it shares the memory map with the AMDGPU.
So use the MFD mechanism for registering the ISP I2C device
and add the required resources.
Signed-off-by: Venkata Narendra Kumar Gutta <vengutta@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Otherwise we won't get correct access to the IB.
v2: keep setting AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS to avoid problems in
the VRAM backend.
Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3501
Fixes: e362b7c8f8c7 ("drm/amdgpu: Modify the contiguous flags behaviour")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Tested-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add support for logging the registers in devcoredump
buffer for vcn_v3_0.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add support of vcn ip dump in the devcoredump
for vcn_v3_0.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add macro definition which calculate offset of the
register with index override.
This is useful in case when there is an array of
registers which is common for all instances.
To read registers in that case it is easy to define
registers once and the index value is manually passed
to calculate proper offset of register for each instance.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add pointer to the vcn ip dump in the vcn global structure
to be accessible for all vcn version via global adev.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The comment in the vbios structure says:
// = 128 means EDID length is 128 bytes, otherwise the EDID length = ucFakeEDIDLength*128
This fake edid struct has not been used in a long time, so I'm
not sure if there were actually any boards out there with a non-128 byte
EDID, but align the code with the comment.
Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
Reported-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lists.freedesktop.org/archives/amd-gfx/2024-June/109964.html
Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This was basically just another one of amdgpus hacks. The parameter
allowed to restart the scheduler without turning fence signaling on
again.
That this is absolutely not a good idea should be obvious by now since
the fences will then just sit there and never signal.
While at it cleanup the code a bit.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240722083816.99685-1-christian.koenig@amd.com
|
|
[Why]
Page table of compute VM in the VRAM will lost after gpu reset.
VRAM won't be restored since compute VM has no shadows.
[How]
Use higher 32-bit of vm->generation to record a vram_lost_counter.
Reset the VM state machine when vm->genertaion is not equal to
the new generation token.
v2: Check vm->generation instead of calling drm_sched_entity_error
in amdgpu_vm_validate.
v3: Use new generation token instead of vram_lost_counter for check.
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit 47c0388b0589cb481c294dcb857d25a214c46eb3)
|
|
To prevent below probe failure, add a check for models with VCN
IP v4.0.6 where VCN1 may be harvested.
v2:
Apply the same check to VCN IP v4.0 and v5.0.
[ 54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[ 54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43
c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02
00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00
[ 54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286
[ 54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX:
0000000000000000
[ 54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI:
ffff99a82f680000
[ 54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09:
0000000000000000
[ 54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12:
0000000000000000
[ 54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15:
0000000000000001
[ 54.075400] FS: 00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000)
knlGS:0000000000000000
[ 54.075988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4:
0000000000750ef0
[ 54.076927] PKRU: 55555554
[ 54.077132] Call Trace:
[ 54.077319] <TASK>
[ 54.077484] ? show_regs+0x69/0x80
[ 54.077747] ? __die+0x28/0x70
[ 54.077979] ? page_fault_oops+0x180/0x4b0
[ 54.078286] ? do_user_addr_fault+0x2d2/0x680
[ 54.078610] ? exc_page_fault+0x84/0x190
[ 54.078910] ? asm_exc_page_fault+0x2b/0x30
[ 54.079224] ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[ 54.079941] ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu]
[ 54.080617] vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu]
[ 54.081316] amdgpu_device_ip_set_powergating_state+0x64/0xc0
[amdgpu]
[ 54.082057] amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu]
[ 54.082727] amdgpu_ring_alloc+0x44/0x70 [amdgpu]
[ 54.083351] amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu]
[ 54.084054] amdgpu_ring_test_helper+0x22/0x90 [amdgpu]
[ 54.084698] vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu]
[ 54.085307] amdgpu_device_init+0x1f96/0x2780 [amdgpu]
[ 54.085951] amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu]
[ 54.086591] amdgpu_pci_probe+0x19f/0x550 [amdgpu]
[ 54.087215] local_pci_probe+0x48/0xa0
[ 54.087509] pci_device_probe+0xc9/0x250
[ 54.087812] really_probe+0x1a4/0x3f0
[ 54.088101] __driver_probe_device+0x7d/0x170
[ 54.088443] driver_probe_device+0x24/0xa0
[ 54.088765] __driver_attach+0xdd/0x1d0
[ 54.089068] ? __pfx___driver_attach+0x10/0x10
[ 54.089417] bus_for_each_dev+0x8e/0xe0
[ 54.089718] driver_attach+0x22/0x30
[ 54.090000] bus_add_driver+0x120/0x220
[ 54.090303] driver_register+0x62/0x120
[ 54.090606] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[ 54.091255] __pci_register_driver+0x62/0x70
[ 54.091593] amdgpu_init+0x67/0xff0 [amdgpu]
[ 54.092190] do_one_initcall+0x5f/0x330
[ 54.092495] do_init_module+0x68/0x240
[ 54.092794] load_module+0x201c/0x2110
[ 54.093093] init_module_from_file+0x97/0xd0
[ 54.093428] ? init_module_from_file+0x97/0xd0
[ 54.093777] idempotent_init_module+0x11c/0x2a0
[ 54.094134] __x64_sys_finit_module+0x64/0xc0
[ 54.094476] do_syscall_64+0x58/0x120
[ 54.094767] entry_SYSCALL_64_after_hwframe+0x6e/0x76
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit 0b071245ddd98539d4f7493bdd188417fcf2d629)
|
|
The eeprom table is empty before initializing,
set eeprom table version first before initializing.
Changed from V1:
Reuse amdgpu_ras_set_eeprom_table_version function
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 015b8a2fdf39a4c288ff24e7b715b8d9198e56dc)
|
|
The ras command shared memory is allocated from
VRAM and the response status of the command
buffer will not be zero due to gpu being in
fatal error state after ras UE error injection.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8284951a6e79c6806c675e5f68a4cd425dd56bc4)
|
|
For VCN/JPEG 4.0.3, use only the local addressing scheme.
- Mask bit higher than AID0 range
v2
remain the case for mmhub use master XCC
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit caaf576292f8ccef5cdc0ac16e77b87dbf6e17ab)
|
|
VCN 4.0.3 does not HDP flush with RRMT enabled. Instead, mmsch
will do the HDP flush.
This change is necessary for VCN v4.0.3, no need for backward compatibility
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 49cfaebe48e97500a68d5322a8194736b0a2c3cf)
|
|
JPEG v4.0.3 doesn't support HDP flush when RRMT is enabled. Instead,
mmsch fw will do the flush.
This change is necessary for JPEG v4.0.3, no need for backward compatibility
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jane Jian <Jane.Jian@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 585e3fdb36f59c5cfed0ae06c852dc1df22b1d60)
|
|
Return 0 to avoid returning an uninitialized variable r.
Cc: stable@vger.kernel.org
Fixes: 230dd6bb6117 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10")
Signed-off-by: Ma Ke <make24@iscas.ac.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6472de66c0aa18d50a4b5ca85f8272e88a737676)
|
|
If PCIe supports atomics, configure register to prevent DF from
breaking atomics in separate load/store operations.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 666f14cab21b17ccc1bdfe1e82458aa429b3b7e0)
|
|
We seem to have a case where SDMA will sometimes miss a doorbell
if GFX is entering the powergating state when the doorbell comes in.
To workaround this, we can update the wptr via MMIO, however,
this is only safe because we disallow gfxoff in begin_ring() for
SDMA 5.2 and then allow it again in end_ring().
Enable this workaround while we are root causing the issue with
the HW team.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/3440
Tested-by: Friedrich Vock <friedrich.vock@gmx.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit f2ac52634963fc38e4935e11077b6f7854e5d700)
|
|
[Why]
Page table of compute VM in the VRAM will lost after gpu reset.
VRAM won't be restored since compute VM has no shadows.
[How]
Use higher 32-bit of vm->generation to record a vram_lost_counter.
Reset the VM state machine when vm->genertaion is not equal to
the new generation token.
v2: Check vm->generation instead of calling drm_sched_entity_error
in amdgpu_vm_validate.
v3: Use new generation token instead of vram_lost_counter for check.
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
To prevent below probe failure, add a check for models with VCN
IP v4.0.6 where VCN1 may be harvested.
v2:
Apply the same check to VCN IP v4.0 and v5.0.
[ 54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[ 54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43
c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01 f2 48 89 b2 50 f6 02
00 <89> 01 48 8b 82 50 f6 02 00 48 8d 48 04 48 89 8a 50 f6 02 00 c7 00
[ 54.072408] RSP: 0018:ffffb17985f736f8 EFLAGS: 00010286
[ 54.072793] RAX: 00000000000000d6 RBX: ffff99a82f680000 RCX:
0000000000000000
[ 54.073315] RDX: ffff99a82f680000 RSI: 0000000000000004 RDI:
ffff99a82f680000
[ 54.073835] RBP: ffffb17985f73730 R08: 0000000000000001 R09:
0000000000000000
[ 54.074353] R10: 0000000000000008 R11: ffffb17983c05000 R12:
0000000000000000
[ 54.074879] R13: 0000000000000000 R14: ffff99a82f680000 R15:
0000000000000001
[ 54.075400] FS: 00007f8d9c79a000(0000) GS:ffff99ab2f140000(0000)
knlGS:0000000000000000
[ 54.075988] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 54.076408] CR2: 0000000000000000 CR3: 0000000140c3a000 CR4:
0000000000750ef0
[ 54.076927] PKRU: 55555554
[ 54.077132] Call Trace:
[ 54.077319] <TASK>
[ 54.077484] ? show_regs+0x69/0x80
[ 54.077747] ? __die+0x28/0x70
[ 54.077979] ? page_fault_oops+0x180/0x4b0
[ 54.078286] ? do_user_addr_fault+0x2d2/0x680
[ 54.078610] ? exc_page_fault+0x84/0x190
[ 54.078910] ? asm_exc_page_fault+0x2b/0x30
[ 54.079224] ? vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu]
[ 54.079941] ? vcn_v4_0_5_start_dpg_mode+0xe6/0x36b0 [amdgpu]
[ 54.080617] vcn_v4_0_5_set_powergating_state+0x82/0x19b0 [amdgpu]
[ 54.081316] amdgpu_device_ip_set_powergating_state+0x64/0xc0
[amdgpu]
[ 54.082057] amdgpu_vcn_ring_begin_use+0x6f/0x1d0 [amdgpu]
[ 54.082727] amdgpu_ring_alloc+0x44/0x70 [amdgpu]
[ 54.083351] amdgpu_vcn_dec_sw_ring_test_ring+0x40/0x110 [amdgpu]
[ 54.084054] amdgpu_ring_test_helper+0x22/0x90 [amdgpu]
[ 54.084698] vcn_v4_0_5_hw_init+0x87/0xc0 [amdgpu]
[ 54.085307] amdgpu_device_init+0x1f96/0x2780 [amdgpu]
[ 54.085951] amdgpu_driver_load_kms+0x1e/0xc0 [amdgpu]
[ 54.086591] amdgpu_pci_probe+0x19f/0x550 [amdgpu]
[ 54.087215] local_pci_probe+0x48/0xa0
[ 54.087509] pci_device_probe+0xc9/0x250
[ 54.087812] really_probe+0x1a4/0x3f0
[ 54.088101] __driver_probe_device+0x7d/0x170
[ 54.088443] driver_probe_device+0x24/0xa0
[ 54.088765] __driver_attach+0xdd/0x1d0
[ 54.089068] ? __pfx___driver_attach+0x10/0x10
[ 54.089417] bus_for_each_dev+0x8e/0xe0
[ 54.089718] driver_attach+0x22/0x30
[ 54.090000] bus_add_driver+0x120/0x220
[ 54.090303] driver_register+0x62/0x120
[ 54.090606] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[ 54.091255] __pci_register_driver+0x62/0x70
[ 54.091593] amdgpu_init+0x67/0xff0 [amdgpu]
[ 54.092190] do_one_initcall+0x5f/0x330
[ 54.092495] do_init_module+0x68/0x240
[ 54.092794] load_module+0x201c/0x2110
[ 54.093093] init_module_from_file+0x97/0xd0
[ 54.093428] ? init_module_from_file+0x97/0xd0
[ 54.093777] idempotent_init_module+0x11c/0x2a0
[ 54.094134] __x64_sys_finit_module+0x64/0xc0
[ 54.094476] do_syscall_64+0x58/0x120
[ 54.094767] entry_SYSCALL_64_after_hwframe+0x6e/0x76
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
avoid kfd init crash in that case.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For the bad opcode case, it will cause CP/ME hang.
The firmware will prevent the ME side from hanging by raising a bad opcode interrupt.
And the driver needs to perform a vmid reset when receiving the interrupt.
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For the bad opcode case, it will cause CP/ME hang.
The firmware will prevent the ME side from hanging by raising a bad opcode interrupt.
And the driver needs to perform a vmid reset when receiving the interrupt.
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For the bad opcode case, it will cause CP/ME hang.
The firmware will prevent the ME side from hanging by raising a bad opcode interrupt.
And the driver needs to perform a vmid reset when receiving the interrupt.
v2: update irq naming (drop priv) (Alex)
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For the bad opcode case, it will cause CP/ME hang.
The firmware will prevent the ME side from hanging by raising a bad opcode interrupt.
And the driver needs to perform a vmid reset when receiving the interrupt.
v2: update irq naming (drop priv) (Alex)
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|