summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2024-11-12drm/amd: Fix initialization mistake for NBIO 7.7.0Vijendar Mukunda
There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME events while in the D0 state. Co-developed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 447a54a0f79c9a409ceaa17804bdd2e0206397b9) Cc: stable@vger.kernel.org
2024-11-12drm/amdgpu: enable GTT fallback handling for dGPUs onlyChristian König
That is just a waste of time on APUs. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3704 Fixes: 216c1282dde3 ("drm/amdgpu: use GTT only as fallback for VRAM|GTT") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit e8fc090d322346e5ce4c4cfe03a8100e31f61c3c) Cc: stable@vger.kernel.org
2024-11-11drm/amdgpu/mes12: correct kiq unmap latencyJack Xiao
Correct kiq unmap queue timeout value. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit cfe98204a06329b6b7fce1b828b7d620473181ff) Cc: stable@vger.kernel.org # 6.11.x
2024-11-11drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()Christian König
The coherency flags can only be determined when the BO is locked and that in turn is only guaranteed when the mapping is validated. Fix the check, move the resource check into the function and add an assert that the BO is locked. Signed-off-by: Christian König <christian.koenig@amd.com> Fixes: d1a372af1c3d ("drm/amdgpu: Set MTYPE in PTE based on BO flags") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1b4ca8546f5b5c482717bedb8e031227b1541539) Cc: stable@vger.kernel.org
2024-11-11drm/amdgpu: Fix video caps for H264 and HEVC encode maximum sizeDavid Rosca
H264 supports 4096x4096 starting from Polaris. HEVC also supports 4096x4096, with VCN 3 and newer 8192x4352 is supported. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 69e9a9e65b1ea542d07e3fdd4222b46e9f5a3a29) Cc: stable@vger.kernel.org
2024-11-05drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()Alex Deucher
Avoid a possible buffer overflow if size is larger than 4K. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f5d873f5825b40d886d03bd2aede91d4cf002434) Cc: stable@vger.kernel.org
2024-11-05drm/amdgpu: Adjust debugfs eviction and IB access permissionsAlex Deucher
Users should not be able to run these. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7ba9395430f611cfc101b1c2687732baafa239d5) Cc: stable@vger.kernel.org
2024-11-05drm/amdgpu: Adjust debugfs register access permissionsAlex Deucher
Regular users shouldn't have read access. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c0cfd2e652553d607b910be47d0cc5a7f3a78641) Cc: stable@vger.kernel.org
2024-11-05drm/amdgpu: Fix DPX valid mode check on GC 9.4.3Lijo Lazar
For DPX mode, the number of memory partitions supported should be less than or equal to 2. Fixes: 1589c82a1085 ("drm/amdgpu: Check memory ranges for valid xcp mode") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 990c4f580742de7bb78fa57420ffd182fc3ab4cd) Cc: stable@vger.kernel.org
2024-11-04drm/amdgpu: prevent NULL pointer dereference if ATIF is not supportedAntonio Quartulli
acpi_evaluate_object() may return AE_NOT_FOUND (failure), which would result in dereferencing buffer.pointer (obj) while being NULL. Although this case may be unrealistic for the current code, it is still better to protect against possible bugs. Bail out also when status is AE_NOT_FOUND. This fixes 1 FORWARD_NULL issue reported by Coverity Report: CID 1600951: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Fixes: c9b7c809b89f ("drm/amd: Guard against bad data for ATIF ACPI method") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91c9e221fe2553edf2db71627d8453f083de87a1) Cc: stable@vger.kernel.org
2024-10-22drm/amdgpu: fix random data corruption for sdma 7Frank Min
There is random data corruption caused by const fill, this is caused by write compression mode not correctly configured. So correct compression mode for const fill. Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 75400f8d6e36afc88d59db8a1f3e4b7d90d836ad) Cc: stable@vger.kernel.org # 6.11.x
2024-10-22drm/amd: Guard against bad data for ATIF ACPI methodMario Limonciello
If a BIOS provides bad data in response to an ATIF method call this causes a NULL pointer dereference in the caller. ``` ? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1)) ? __die (arch/x86/kernel/dumpstack.c:423 arch/x86/kernel/dumpstack.c:434) ? page_fault_oops (arch/x86/mm/fault.c:544 (discriminator 2) arch/x86/mm/fault.c:705 (discriminator 2)) ? do_user_addr_fault (arch/x86/mm/fault.c:440 (discriminator 1) arch/x86/mm/fault.c:1232 (discriminator 1)) ? acpi_ut_update_object_reference (drivers/acpi/acpica/utdelete.c:642) ? exc_page_fault (arch/x86/mm/fault.c:1542) ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623) ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:387 (discriminator 2)) amdgpu ? amdgpu_atif_query_backlight_caps.constprop.0 (drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c:386 (discriminator 1)) amdgpu ``` It has been encountered on at least one system, so guard for it. Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c9b7c809b89f24e9372a4e7f02d64c950b07fdee) Cc: stable@vger.kernel.org
2024-10-15drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ringSrinivasan Shanmugam
This patch addresses a double unlock issue in the amdgpu_mes_add_ring function. The mutex was being unlocked twice under certain error conditions, which could lead to undefined behavior. The fix ensures that the mutex is unlocked only once before jumping to the clean_up_memory label. The unlock operation is moved to just before the goto statement within the conditional block that checks the return value of amdgpu_ring_init. This prevents the second unlock attempt after the clean_up_memory label, which is no longer necessary as the mutex is already unlocked by this point in the code flow. This change resolves the potential double unlock and maintains the correct mutex handling throughout the function. Fixes below: Commit d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue submission"), leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c:1240 amdgpu_mes_add_ring() warn: double unlock '&adev->mes.mutex_hidden' (orig line 1213) drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 1143 int amdgpu_mes_add_ring(struct amdgpu_device *adev, int gang_id, 1144 int queue_type, int idx, 1145 struct amdgpu_mes_ctx_data *ctx_data, 1146 struct amdgpu_ring **out) 1147 { 1148 struct amdgpu_ring *ring; 1149 struct amdgpu_mes_gang *gang; 1150 struct amdgpu_mes_queue_properties qprops = {0}; 1151 int r, queue_id, pasid; 1152 1153 /* 1154 * Avoid taking any other locks under MES lock to avoid circular 1155 * lock dependencies. 1156 */ 1157 amdgpu_mes_lock(&adev->mes); 1158 gang = idr_find(&adev->mes.gang_id_idr, gang_id); 1159 if (!gang) { 1160 DRM_ERROR("gang id %d doesn't exist\n", gang_id); 1161 amdgpu_mes_unlock(&adev->mes); 1162 return -EINVAL; 1163 } 1164 pasid = gang->process->pasid; 1165 1166 ring = kzalloc(sizeof(struct amdgpu_ring), GFP_KERNEL); 1167 if (!ring) { 1168 amdgpu_mes_unlock(&adev->mes); 1169 return -ENOMEM; 1170 } 1171 1172 ring->ring_obj = NULL; 1173 ring->use_doorbell = true; 1174 ring->is_mes_queue = true; 1175 ring->mes_ctx = ctx_data; 1176 ring->idx = idx; 1177 ring->no_scheduler = true; 1178 1179 if (queue_type == AMDGPU_RING_TYPE_COMPUTE) { 1180 int offset = offsetof(struct amdgpu_mes_ctx_meta_data, 1181 compute[ring->idx].mec_hpd); 1182 ring->eop_gpu_addr = 1183 amdgpu_mes_ctx_get_offs_gpu_addr(ring, offset); 1184 } 1185 1186 switch (queue_type) { 1187 case AMDGPU_RING_TYPE_GFX: 1188 ring->funcs = adev->gfx.gfx_ring[0].funcs; 1189 ring->me = adev->gfx.gfx_ring[0].me; 1190 ring->pipe = adev->gfx.gfx_ring[0].pipe; 1191 break; 1192 case AMDGPU_RING_TYPE_COMPUTE: 1193 ring->funcs = adev->gfx.compute_ring[0].funcs; 1194 ring->me = adev->gfx.compute_ring[0].me; 1195 ring->pipe = adev->gfx.compute_ring[0].pipe; 1196 break; 1197 case AMDGPU_RING_TYPE_SDMA: 1198 ring->funcs = adev->sdma.instance[0].ring.funcs; 1199 break; 1200 default: 1201 BUG(); 1202 } 1203 1204 r = amdgpu_ring_init(adev, ring, 1024, NULL, 0, 1205 AMDGPU_RING_PRIO_DEFAULT, NULL); 1206 if (r) 1207 goto clean_up_memory; 1208 1209 amdgpu_mes_ring_to_queue_props(adev, ring, &qprops); 1210 1211 dma_fence_wait(gang->process->vm->last_update, false); 1212 dma_fence_wait(ctx_data->meta_data_va->last_pt_update, false); 1213 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1214 1215 r = amdgpu_mes_add_hw_queue(adev, gang_id, &qprops, &queue_id); 1216 if (r) 1217 goto clean_up_ring; ^^^^^^^^^^^^^^^^^^ 1218 1219 ring->hw_queue_id = queue_id; 1220 ring->doorbell_index = qprops.doorbell_off; 1221 1222 if (queue_type == AMDGPU_RING_TYPE_GFX) 1223 sprintf(ring->name, "gfx_%d.%d.%d", pasid, gang_id, queue_id); 1224 else if (queue_type == AMDGPU_RING_TYPE_COMPUTE) 1225 sprintf(ring->name, "compute_%d.%d.%d", pasid, gang_id, 1226 queue_id); 1227 else if (queue_type == AMDGPU_RING_TYPE_SDMA) 1228 sprintf(ring->name, "sdma_%d.%d.%d", pasid, gang_id, 1229 queue_id); 1230 else 1231 BUG(); 1232 1233 *out = ring; 1234 return 0; 1235 1236 clean_up_ring: 1237 amdgpu_ring_fini(ring); 1238 clean_up_memory: 1239 kfree(ring); --> 1240 amdgpu_mes_unlock(&adev->mes); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 1241 return r; 1242 } Fixes: d0c423b64765 ("drm/amdgpu/mes: use ring for kernel queue submission") Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Suggested-by: Jack Xiao <Jack.Xiao@amd.com> Reported by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bfaf1883605fd0c0dbabacd67ed49708470d5ea4)
2024-10-15drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipesMichael Chen
With Unified MES enabled in gfx12, need separate event log buffer for the 2 MES pipes to avoid data overwrite. Signed-off-by: Michael Chen <michael.chen@amd.com> Reviewed-by: Jack Xiao <Jack.Xiao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 144df260f3daab42c4611021f929b3342de516e5) Cc: stable@vger.kernel.org # 6.11.x
2024-10-15drm/amdgpu: prevent BO_HANDLES error from being overwrittenMohammed Anees
Before this patch, if multiple BO_HANDLES chunks were submitted, the error -EINVAL would be correctly set but could be overwritten by the return value from amdgpu_cs_p1_bo_handles(). This patch ensures that if there are multiple BO_HANDLES, we stop. Fixes: fec5f8e8c6bc ("drm/amdgpu: disallow multiple BO_HANDLES chunks in one submit") Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 40f2cd98828f454bdc5006ad3d94330a5ea164b7) Cc: stable@vger.kernel.org
2024-10-15drm/amdgpu: enable enforce_isolation sysfs node on VFsAlex Deucher
It should be enabled on both bare metal and VFs. Fixes: e189be9b2e38 ("drm/amdgpu: Add enforce_isolation sysfs attribute") Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> (cherry picked from commit dc8847b054fd6679866ed4ee861e069e54c10799)
2024-10-09Merge tag 'amd-drm-fixes-6.12-2024-10-08' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.12-2024-10-08: amdgpu: - Fix invalid UBSAN warnings - Fix artifacts in MPO transitions - Hibernation fix amdkfd: - Fix an eviction fence leak radeon: - Add late register for connectors - Always set GEM function pointers Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241008142831.3739244-1-alexander.deucher@amd.com
2024-10-07drm/amdkfd: Fix an eviction fence leakLang Yu
Only creating a new reference for each process instead of each VM. Fixes: 9a1c1339abf9 ("drm/amdkfd: Run restore_workers on freezable WQs") Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Lang Yu <lang.yu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 5fa436289483ae56427b0896c31f72361223c758) Cc: stable@vger.kernel.org
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-09-28Merge tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular fixes for the week to end the merge window, i915 and xe have a few each, amdgpu makes up most of it with a bunch of SR-IOV related fixes amongst others. i915: - Fix BMG support to UHBR13.5 - Two PSR fixes - Fix colorimetry detection for DP xe: - Fix macro for checking minimum GuC version - Fix CCS offset calculation for some BMG SKUs - Fix locking on memory usage reporting via fdinfo and BO destroy - Fix GPU page fault handler on a closed VM - Fix overflow in oa batch buffer amdgpu: - MES 12 fix - KFD fence sync fix - SR-IOV fixes - VCN 4.0.6 fix - SDMA 7.x fix - Bump driver version to note cleared VRAM support - SWSMU fix - CU occupancy logic fix - SDMA queue fix" * tag 'drm-next-2024-09-28' of https://gitlab.freedesktop.org/drm/kernel: (79 commits) drm/amd/pm: update workload mask after the setting drm/amdgpu: bump driver version for cleared VRAM drm/amdgpu: fix vbios fetching for SR-IOV drm/amdgpu: fix PTE copy corruption for sdma 7 drm/amdkfd: Add SDMA queue quantum support for GFX12 drm/amdgpu/vcn: enable AV1 on both instances drm/amdkfd: Fix CU occupancy for GFX 9.4.3 drm/amdkfd: Update logic for CU occupancy calculations drm/amdgpu: skip coredump after job timeout in SRIOV drm/amdgpu: sync to KFD fences before clearing PTEs drm/amdgpu/mes12: set enable_level_process_quantum_check drm/i915/dp: Fix colorimetry detection drm/amdgpu/mes12: reduce timeout drm/amdgpu/mes11: reduce timeout drm/amdgpu: use GEM references instead of TTMs v2 drm/amd/display: Allow backlight to go below `AMDGPU_DM_DEFAULT_MIN_BACKLIGHT` drm/amd/display: Fix kdoc entry for 'tps' in 'dc_process_dmub_dpia_set_tps_notification' drm/amdgpu: update golden regs for gfx12 drm/amdgpu: clean up vbios fetching code drm/amd/display: handle nulled pipe context in DCE110's set_drr() ...
2024-09-26drm/amdgpu: bump driver version for cleared VRAMAlex Deucher
Driver now clears VRAM on allocation. Bump the driver version so mesa knows when it will get cleared vram by default. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-09-26drm/amdgpu: fix vbios fetching for SR-IOVAlex Deucher
SR-IOV fetches the vbios from VRAM in some cases. Re-enable the VRAM path for dGPUs and rename the function to make it clear that it is not IGP specific. Fixes: 042658d17a54 ("drm/amdgpu: clean up vbios fetching code") Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Tested-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26drm/amdgpu: fix PTE copy corruption for sdma 7Frank Min
Without setting dcc bit, there is ramdon PTE copy corruption on sdma 7. so add this bit and update the packet format accordingly. Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-09-25drm/amdgpu/vcn: enable AV1 on both instancesSaleemkhan Jamadar
v1 - remove cs parse code (Christian) On VCN v4_0_6 AV1 is supported on both the instances. Remove cs IB parse code since explict handling of AV1 schedule is not required. Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-09-25drm/amdkfd: Fix CU occupancy for GFX 9.4.3Mukul Joshi
Make CU occupancy calculations work on GFX 9.4.3 by updating the logic to handle multiple XCCs correctly. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-25drm/amdkfd: Update logic for CU occupancy calculationsMukul Joshi
Currently, the code uses the IH_VMID_X_LUT register to map a queue's vmid to the corresponding PASID. This logic is racy since CP can update the VMID-PASID mapping anytime especially when there are more processes than number of vmids. Update the logic to calculate CU occupancy by matching doorbell offset of the queue with valid wave counts against the process's queues. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-25drm/amdgpu: skip coredump after job timeout in SRIOVZhenGuo Yin
VF FLR will be triggered by host driver before job timeout, hence the error status of GPU get cleared. Performing a coredump here is unnecessary. Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-25drm/amdgpu: sync to KFD fences before clearing PTEsChristian König
This patch tries to solve the basic problem we also need to sync to the KFD fences of the BO because otherwise it can be that we clear PTEs while the KFD queues are still running. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-25drm/amdgpu/mes12: set enable_level_process_quantum_checkJack Xiao
enable_level_process_quantum_check is requried to enable process quantum based scheduling. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-09-23Merge tag 'pull-stable-struct_fd' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull 'struct fd' updates from Al Viro: "Just the 'struct fd' layout change, with conversion to accessor helpers" * tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: add struct fd constructors, get rid of __to_fd() struct fd: representation change introduce fd_file(), convert all accessors to it.
2024-09-19Merge tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm updates from Dave Airlie: "This adds a couple of patches outside the drm core, all should be acked appropriately, the string and pstore ones are the main ones that come to mind. Otherwise it's the usual drivers, xe is getting enabled by default on some new hardware, we've changed the device number handling to allow more devices, and we added some optional rust code to create QR codes in the panic handler, an idea first suggested I think 10 years ago :-) string: - add mem_is_zero() core: - support more device numbers - use XArray for minor ids - add backlight constants - Split dma fence array creation into alloc and arm fbdev: - remove usage of old fbdev hooks kms: - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support dma-buf: - docs cleanup buddy: - Add start address support for trim function printk: - pass description to kmsg_dump scheduler: - Remove full_recover from drm_sched_start ttm: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory panic: - add display QR code (in rust) displayport: - mst: GUID improvements bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable - lt9611uxc: require DRM_BRIDGE_ATTACH_NO_CONNECTOR - anx7625: simplify OF array handling - dw-hdmi: simplify clock handling - lontium-lt8912b: fix mode validation - nwl-dsi: fix mode vsync/hsync polarity xe: - Enable LunarLake and Battlemage support - Introducing Xe2 ccs modifiers for integrated and discrete graphics - rename xe perf to xe observation - use wb caching on DGFX for system memory - add fence timeouts - Lunar Lake graphics/media/display workarounds - Battlemage workarounds - Battlemage GSC support - GSC and HuC fw updates for LL/BM - use dma_fence_chain_free - refactor hw engine lookup and mmio access - enable priority mem read for Xe2 - Add first GuC BMG fw - fix dma-resv lock - Fix DGFX display suspend/resume - Use xe_managed for kernel BOs - Use reserved copy engine for user binds on faulting devices - Allow mixing dma-fence jobs and long-running faulting jobs - fix media TLB invalidation - fix rpm in TTM swapout path - track resources and VF state by PF i915: - Type-C programming fix for MTL+ - FBC cleanup - Calc vblank delay more accurately - On DP MST, Enable LT fallback for UHBR<->non-UHBR rates - Fix DP LTTPR detection - limit relocations to INT_MAX - fix long hangs in buddy allocator on DG2/A380 amdgpu: - Per-queue reset support - SDMA devcoredump support - DCN 4.0.1 updates - GFX12/VCN4/JPEG4 updates - Convert vbios embedded EDID to drm_edid - GFX9.3/9.4 devcoredump support - process isolation framework for GFX 9.4.3/4 - take IOMMU mappings into account for P2P DMA amdkfd: - CRIU fixes - HMM fix - Enable process isolation support for GFX 9.4.3/4 - Allow users to target recommended SDMA engines - KFD support for targetting queues on recommended SDMA engines radeon: - remove .load and drm_dev_alloc - Fix vbios embedded EDID size handling - Convert vbios embedded EDID to drm_edid - Use GEM references instead of TTM - r100 cp init cleanup - Fix potential overflows in evergreen CS offset tracking msm: - DPU: - implement DP/PHY mapping on SC8180X - Enable writeback on SM8150, SC8180X, SM6125, SM6350 - DP: - Enable widebus on all relevant chipsets - MSM8998 HDMI support - GPU: - A642L speedbin support - A615/A306/A621 support - A7xx devcoredump support ast: - astdp: Support AST2600 with VGA - Clean up HPD - Fix timeout loop for DP link training - reorganize output code by type (VGA, DP, etc) - convert to struct drm_edid - fix BMC handling for all outputs exynos: - drop stale MAINTAINERS pattern - constify struct loongson: - use GEM refcount over TTM mgag200: - Improve BMC handling - Support VBLANK intterupts - transparently support BMC outputs nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's gm12u320: - convert to struct drm_edid gma500: - update i2c terms lcdif: - pixel clock fix host1x: - fix syncpoint IRQ during resume - use iommu_paging_domain_alloc() imx: - ipuv3: convert to struct drm_edid omapdrm: - improve error handling - use common helper for_each_endpoint_of_node() panel: - add support for BOE TV101WUM-LL2 plus DT bindings - novatek-nt35950: improve error handling - nv3051d: improve error handling - panel-edp: - add support for BOE NE140WUM-N6G - revert support for SDC ATNA45AF01 - visionox-vtdr6130: - improve error handling - use devm_regulator_bulk_get_const() - boe-th101mb31ig002: - Support for starry-er88577 MIPI-DSI panel plus DT - Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: - Support Melfas lmfbx101117480 MIPI-DSI panel plus DT - Refactor for code sharing - panel-edp: fix name for HKC MB116AN01 - jd9365da: fix "exit sleep" commands - jdi-fhd-r63452: simplify error handling with DSI multi-style helpers - mantix-mlaf057we51: simplify error handling with DSI multi-style helpers - simple: - support Innolux G070ACE-LH3 plus DT bindings - support On Tat Industrial Company KD50G21-40NT-A1 plus DT bindings - st7701: - decouple DSI and DRM code - add SPI support - support Anbernic RG28XX plus DT bindings mediatek: - support alpha blending - remove cl in struct cmdq_pkt - ovl adaptor fix - add power domain binding for mediatek DPI controller renesas: - rz-du: add support for RZ/G2UL plus DT bindings rockchip: - Improve DP sink-capability reporting - dw_hdmi: Support 4k@60Hz - vop: - Support RGB display on Rockchip RK3066 - Support 4096px width sti: - convert to struct drm_edid stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: - Fix transparency after disabling plane - Remove unused interrupt tegra: - gr3d: improve PM domain handling - convert to struct drm_edid - Call drm_atomic_helper_shutdown() vc4: - fix PM during detect - replace DRM_ERROR() with drm_error() - v3d: simplify clock retrieval v3d: - Clean up perfmon virtio: - add DRM capset" * tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel: (1326 commits) drm/xe: Fix missing conversion to xe_display_pm_runtime_resume drm/xe/xe2hpg: Add Wa_15016589081 drm/xe: Don't keep stale pointer to bo->ggtt_node drm/xe: fix missing 'xe_vm_put' drm/xe: fix build warning with CONFIG_PM=n drm/xe: Suppress missing outer rpm protection warning drm/xe: prevent potential UAF in pf_provision_vf_ggtt() drm/amd/display: Add all planes on CRTC to state for overlay cursor drm/i915/bios: fix printk format width drm/i915/display: Fix BMG CCS modifiers drm/amdgpu: get rid of bogus includes of fdtable.h drm/amdkfd: CRIU fixes drm/amdgpu: fix a race in kfd_mem_export_dmabuf() drm: new helper: drm_gem_prime_handle_to_dmabuf() drm/amdgpu/atomfirmware: Silence UBSAN warning drm/amdgpu: Fix kdoc entry in 'amdgpu_vm_cpu_prepare' drm/amd/amdgpu: apply command submission parser for JPEG v1 drm/amd/amdgpu: apply command submission parser for JPEG v2+ drm/amd/pm: fix the pp_dpm_pcie issue on smu v14.0.2/3 drm/amd/pm: update the features set on smu v14.0.2/3 ...
2024-09-18drm/amdgpu/mes12: reduce timeoutAlex Deucher
The firmware timeout is 2s. Reduce the driver timeout to 2.1 seconds to avoid back pressure on queue submissions. Fixes: 94b51a3d01ed ("drm/amdgpu/mes12: increase mes submission timeout") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-09-18drm/amdgpu/mes11: reduce timeoutAlex Deucher
The firmware timeout is 2s. Reduce the driver timeout to 2.1 seconds to avoid back pressure on queue submissions. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3627 Fixes: f7c161a4c250 ("drm/amdgpu: increase mes submission timeout") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2024-09-18drm/amdgpu: use GEM references instead of TTMs v2Christian König
Instead of a TTM reference grab a GEM reference whenever necessary. v2: fix typo in amdgpu_bo_unref pointed out by Vitaly, initialize the GEM funcs for kernel allocations as well. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: update golden regs for gfx12Frank Min
update golden regs for gfx12 Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-09-18drm/amdgpu: clean up vbios fetching codeAlex Deucher
After splitting the logic between APU and dGPU, clean up some of the APU and dGPU specific logic that no longer applied. Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu/bios: split vbios fetching between APU and dGPUAlex Deucher
We need some different logic for dGPUs and the APU path can be simplified because there are some methods which are never used on APUs. This also fixes a regression on some older APUs causing the driver to fetch the unpatched ROM image rather than the patched image. Fixes: 9c081c11c621 ("drm/amdgpu: Reorder to read EFI exported ROM first") Reviewed-by: George Zhang <George.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: remove amdgpu_pin_restricted()Christian König
We haven't used the functionality to pin BOs in a certain range at all while the driver existed. Just nuke it. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: explicitely set the AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS flagChristian König
Instead of having that in the amdgpu_bo_pin() function applied for all pinned BOs. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: Fix XCP instance mask calculationLijo Lazar
Fix instance mask calculation for VCN IP. There are cases where VCN instance could be shared across partitions. Fix here so that other blocks don't need to check for any shared instances based on partition mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: Fix get each xcp macroAsad Kamal
Fix get each xcp macro to loop over each partition correctly Fixes: 4bdca2057933 ("drm/amdgpu: Add utility functions for xcp") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: load sos binary properly on the basis of pmfw versionLe Ma
To be compatible with legacy IFWI, driver needs to carry legacy tOS and query pmfw version to load them accordingly. Add psp_firmware_header_v2_1 to handle the combined sos binary. Double the sos count limit for the case of aux sos fw packed. v2: pass the correct fw_bin_desc to parse_sos_bin_descriptor Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: add psp funcs callback to check if aux fw is neededLe Ma
Query pmfw version to determine if aux sos fw needs to be loaded in psp v13.0. v2: refine callback to check if aux_fw loading is needed instead of getting pmfw version barely v3: return the comparison directly Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: nuke the VM PD/PT shadow handlingChristian König
This was only used as workaround for recovering the page tables after VRAM was lost and is no longer necessary after the function amdgpu_vm_bo_reset_state_machine() started to do the same. Compute never used shadows either, so the only proplematic case left is SVM and that is most likely not recoverable in any way when VRAM is lost. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu/gfx9.4.3: Explicitly halt MEC before initAlex Deucher
Need to make sure it's halted as we don't know what state the GPU may have been left in previously. Tested-by: Amber Lin <Amber.Lin@amd.com> Acked-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu/gfx9.4.3: set additional bits on MEC haltAlex Deucher
Need to set the pipe reset and cache invalidation bits on halt otherwise we can get stale state if the CP firmware changes (e.g., on module unload and reload). Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Amber Lin <Amber.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: Fix selfring initialization sequence on soc24David Belanger
Move enable_doorbell_selfring_aperture from common_hw_init to common_late_init in soc24, otherwise selfring aperture is initialized with an incorrect doorbell aperture base. Port changes from this commit from soc21 to soc24: commit 1c312e816c40 ("drm/amdgpu: Enable doorbell selfring after resize FB BAR") Signed-off-by: David Belanger <david.belanger@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-09-18drm/amdgpu/mes12: switch SET_SHADER_DEBUGGER pkt to mes schq pipeJack Xiao
The SET_SHADER_DEBUGGER packet must work with the added hardware queue, switch the packet submitting to mes schq pipe. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.11.x
2024-09-18drm/amdgpu: Fix a typoAndrew Kreimer
Fix a typo in comments. Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-18drm/amdgpu: fix typo in the commentYan Zhen
Correctly spelled comments make it easier for the reader to understand the code. Replace 'udpate' with 'update' in the comment & replace 'recieved' with 'received' in the comment & replace 'dsiable' with 'disable' in the comment & replace 'Initiailize' with 'Initialize' in the comment & replace 'disble' with 'disable' in the comment & replace 'Disbale' with 'Disable' in the comment & replace 'enogh' with 'enough' in the comment & replace 'availabe' with 'available' in the comment. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Yan Zhen <yanzhen@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>