summaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)Author
2024-11-07drm/sched: Improve teardown documentationPhilipp Stanner
If jobs are still enqueued in struct drm_gpu_scheduler.pending_list when drm_sched_fini() gets called, those jobs will be leaked since that function stops both job-submission and (automatic) job-cleanup. It is, thus, up to the driver to take care of preventing leaks. The related function drm_sched_wqueue_stop() also prevents automatic job cleanup. Those pitfals are not reflected in the documentation, currently. Explicitly inform about the leak problem in the docstring of drm_sched_fini(). Additionally, detail the purpose of drm_sched_wqueue_{start,stop} and hint at the consequences for automatic cleanup. Signed-off-by: Philipp Stanner <pstanner@redhat.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105143137.71893-2-pstanner@redhat.com
2024-11-07drm/i915/request: Remove unnecessary modification of hrtimer:: FunctionNam Cao
When a request is created, the hrtimer is not initialized and only its 'function' field is set to NULL. The hrtimer is only initialized when the request is enqueued. The point of setting 'function' to NULL is that, it can be used to check whether hrtimer_try_to_cancel() should be called while retiring the request. This "trick" is unnecessary, because hrtimer_try_to_cancel() already does its own check whether the timer is armed. If the timer is not armed, hrtimer_try_to_cancel() returns 0. Fully initialize the timer when the request is created, which allows to make the hrtimer::function field private once all users of hrtimer_init() are converted to hrtimer_setup(), which requires a valid callback function to be set. Because hrtimer_try_to_cancel() returns 0 if the timer is not armed, the logic to check whether to call i915_request_put() remains equivalent. Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/50f865045aa672a9730343ad131543da332b1d8d.1730386209.git.namcao@linutronix.de
2024-11-06drm/panthor: Fix OPP refcnt leaks in devfreq initialisationAdrián Larumbe
Rearrange lookup of recommended OPP for the Mali GPU device and its refcnt decremental to make sure no OPP object leaks happen in the error path. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Fixes: fac9b22df4b1 ("drm/panthor: Add the devfreq logical block") Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105205458.1318989-2-adrian.larumbe@collabora.com
2024-11-06drm/panfrost: Add missing OPP table refcnt decrementalAdrián Larumbe
Commit f11b0417eec2 ("drm/panfrost: Add fdinfo support GPU load metrics") retrieves the OPP for the maximum device clock frequency, but forgets to keep the reference count balanced by putting the returned OPP object. This eventually leads to an OPP core warning when removing the device. Fix it by putting OPP objects as many times as they're retrieved. Also remove an unnecessary whitespace. Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com> Fixes: f11b0417eec2 ("drm/panfrost: Add fdinfo support GPU load metrics") Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105205458.1318989-1-adrian.larumbe@collabora.com
2024-11-05drm: replace strcpy() with strscpy()Yafang Shao
To prevent errors from occurring when the src string is longer than the dst string in strcpy(), we should use strscpy() instead. This approach also facilitates future extensions to the task comm. Link: https://lkml.kernel.org/r/20241007144911.27693-8-laoar.shao@gmail.com Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Suggested-by: Justin Stitt <justinstitt@google.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Justin Stitt <justinstitt@google.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@gmail.com> Cc: Alejandro Colomar <alx@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Jan Kara <jack@suse.cz> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matus Jokay <matus.jokay@stuba.sk> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Quentin Monnet <qmo@kernel.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Simon Horman <horms@kernel.org> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05drm/xe: Stop accumulating LRC timestamp on job_freeLucas De Marchi
The exec queue timestamp is only really useful when it's being queried through the fdinfo. There's no need to update it so often, on every job_free. Tracing a simple app like vkcube running shows an update rate of ~ 120Hz. In case of discrete, the BO is on vram, creating a lot of pcie transactions. The update on job_free() is used to cover a gap: if exec queue is created and destroyed rapidly, before a new query, the timestamp still needs to be accumulated and accounted for in the xef. Initial implementation in commit 6109f24f87d7 ("drm/xe: Add helper to accumulate exec queue runtime") couldn't do it on the exec_queue_fini since the xef could be gone at that point. However since commit ce8c161cbad4 ("drm/xe: Add ref counting for xe_file") the xef is refcounted and the exec queue always holds a reference, making this safe now. Improve the fix in commit 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") by reducing the frequency in which the update is needed. Fixes: 2149ded63079 ("drm/xe: Fix use after free when client stats are captured") Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241104143815.2112272-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 83db047d9425d9a649f01573797558eff0f632e1) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-05drm/xe/pf: Fix potential GGTT allocation leakMichal Wajdeczko
In unlikely event that we fail during sending the new VF GGTT configuration to the GuC, we will free only the GGTT node data struct but will miss to release the actual GGTT allocation. This will later lead to list corruption, GGTT space leak and finally risking crash when unloading the driver: [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-EIO) [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT [ ] list_add corruption. next->prev should be prev (ffff88813cfcd628), but was 0000000000000000. (next=ffff88813cfe2028). [ ] RIP: 0010:__list_add_valid_or_report+0x6b/0xb0 [ ] Call Trace: [ ] drm_mm_insert_node_in_range+0x2c0/0x4e0 [ ] xe_ggtt_node_insert+0x46/0x70 [xe] [ ] pf_provision_vf_ggtt+0x7f5/0xa70 [xe] [ ] xe_gt_sriov_pf_config_set_ggtt+0x5e/0x770 [xe] [ ] ggtt_set+0x4b/0x70 [xe] [ ] simple_attr_write_xsigned.constprop.0.isra.0+0xb0/0x110 [ ] ... [drm] GT0: PF: Failed to provision VF1 with 1073741824 (1.00 GiB) GGTT (-ENOSPC) [ ] ... [drm] GT0: PF: VF1 provisioning remains at 0 (0 B) GGTT [ ] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b7b: 0000 [#1] PREEMPT SMP NOPTI [ ] RIP: 0010:drm_mm_remove_node+0x1b7/0x390 [ ] Call Trace: [ ] <TASK> [ ] ? die_addr+0x2e/0x80 [ ] ? exc_general_protection+0x1a1/0x3e0 [ ] ? asm_exc_general_protection+0x22/0x30 [ ] ? drm_mm_remove_node+0x1b7/0x390 [ ] ggtt_node_remove+0xa5/0xf0 [xe] [ ] xe_ggtt_node_remove+0x35/0x70 [xe] [ ] xe_ttm_bo_destroy+0x123/0x220 [xe] [ ] intel_user_framebuffer_destroy+0x44/0x70 [xe] [ ] intel_plane_destroy_state+0x3b/0xc0 [xe] [ ] drm_atomic_state_default_clear+0x1cd/0x2f0 [ ] intel_atomic_state_clear+0x9/0x20 [xe] [ ] __drm_atomic_state_free+0x1d/0xb0 Fix that by using pf_release_ggtt() on the error path, which now works regardless if the node has GGTT allocation or not. Fixes: 34e804220f69 ("drm/xe: Make xe_ggtt_node struct independent") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241104144901.1903-1-michal.wajdeczko@intel.com (cherry picked from commit 43b1dd2b550f0861ce80fbfffd5881b1b26272b1) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-05drm/xe: Drop VM dma-resv lock on xe_sync_in_fence_get failure in exec IOCTLMatthew Brost
Upon failure all locks need to be dropped before returning to the user. Fixes: 58480c1c912f ("drm/xe: Skip VMAs pin when requesting signal to the last XE_EXEC") Cc: <stable@vger.kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105043524.4062774-3-matthew.brost@intel.com (cherry picked from commit 7d1a4258e602ffdce529f56686925034c1b3b095) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-05drm/xe: Fix possible exec queue leak in exec IOCTLMatthew Brost
In a couple of places after an exec queue is looked up the exec IOCTL returns on input errors without dropping the exec queue ref. Fix this ensuring the exec queue ref is dropped on input error. Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: <stable@vger.kernel.org> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241105043524.4062774-2-matthew.brost@intel.com (cherry picked from commit 07064a200b40ac2195cb6b7b779897d9377e5e6f) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-11-06Merge tag 'drm-msm-next-2024-11-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-next Late updates for v6.13 MDSS: - cleanup UBWC registers handling DP: - Mass-rename the symbols DPU: - SSPP handling cleanup - Move kerneldoc comments from headers to source files - Misc small fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGuGL6k3CKXZ0Qv-FTQ589+_PWNtid6i7MmVJLopBm2sYg@mail.gmail.com
2024-11-06Merge tag 'drm-intel-next-2024-11-04' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/i915/kernel into drm-next drm/i915 feature pull #2 for v6.13: Features and functionality: - Pantherlake (PTL) Xe3 LPD display enabling for xe driver (Clint, Suraj, Dnyaneshwar, Matt, Gustavo, Radhakrishna, Chaitanya, Haridhar, Juha-Pekka, Ravi) - Enable dbuf overlap detection on Lunarlake and later (Stanislav, Vinod) - Allow fastset for HDR infoframe changes (Chaitanya) - Write DP source OUI also for non-eDP sinks (Imre) Refactoring and cleanups: - Independent platform identification for display (Jani) - Display tracepoint fixes and cleanups (Gustavo) - Share PCI ID headers between i915 and xe drivers (Jani) - Use x100 version for full version and release checks (Jani) - Conversions to struct intel_display (Jani, Ville) - Reuse DP DPCD and AUX macros in gvt instead of duplication (Jani) - Use string choice helpers (R Sundar, Sai Teja) - Remove unused underrun detection irq code (Sai Teja) - Color management debug improvements and other cleanups (Ville) - Refactor panel fitter code to a separate file (Ville) - Use try_cmpxchg() instead of open-coding (Uros Bizjak) Fixes: - PSR and Panel Replay fixes and workarounds (Jouni) - Fix panel power during connector detection (Imre) - Fix connector detection and modeset races (Imre) - Fix C20 PHY TX MISC configuration (Gustavo) - Improve panel fitter validity checks (Ville) - Fix eDP short HPD interrupt handling while runtime suspended (Imre) - Propagate DP MST DSC BW overhead/slice calculation errors (Imre) - Stop hotplug polling for eDP connectors (Imre) - Workaround panels reporting bad link status after PSR enable (Jouni) - Panel Replay VRR VSC SDP related workaround and refactor (Animesh, Mitul) - Fix memory leak on eDP init error path (Shuicheng) - Fix GVT KVMGT Kconfig dependencies (Arnd Bergmann) - Fix irq function documentation build warning (Rodrigo) - Add platform check to power management fuse bit read (Clint) - Revert kstrdup_const() and kfree_const() usage for clarity (Christophe JAILLET) - Workaround horizontal odd panning issues in display versions 20 and 30 (Nemesa) - Fix xe drive HDCP GSC firmware check (Suraj) Merges: - Backmerge drm-next to get some KVM changes (Rodrigo) - Fix a build failure originating from previous backmerge (Jani) Signed-off-by: Dave Airlie <airlied@redhat.com> # Conflicts: # drivers/gpu/drm/i915/display/intel_dp_mst.c From: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/87h68ni0wd.fsf@intel.com
2024-11-06Merge tag 'mediatek-drm-next-6.13' of ↵Dave Airlie
https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-next Mediatek DRM Next for Linux 6.13 1. Add support for OF graphs 2. Fix child node refcount handling and use scoped Signed-off-by: Dave Airlie <airlied@redhat.com> From: Chun-Kuang Hu <chunkuang.hu@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241104124103.8041-1-chunkuang.hu@kernel.org
2024-11-05drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()Alex Deucher
Avoid a possible buffer overflow if size is larger than 4K. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit f5d873f5825b40d886d03bd2aede91d4cf002434) Cc: stable@vger.kernel.org
2024-11-05drm/amdgpu: Adjust debugfs eviction and IB access permissionsAlex Deucher
Users should not be able to run these. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7ba9395430f611cfc101b1c2687732baafa239d5) Cc: stable@vger.kernel.org
2024-11-05drm/amdgpu: Adjust debugfs register access permissionsAlex Deucher
Regular users shouldn't have read access. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c0cfd2e652553d607b910be47d0cc5a7f3a78641) Cc: stable@vger.kernel.org
2024-11-05drm/amdgpu: Fix DPX valid mode check on GC 9.4.3Lijo Lazar
For DPX mode, the number of memory partitions supported should be less than or equal to 2. Fixes: 1589c82a1085 ("drm/amdgpu: Check memory ranges for valid xcp mode") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 990c4f580742de7bb78fa57420ffd182fc3ab4cd) Cc: stable@vger.kernel.org
2024-11-05drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()Alex Deucher
Avoid a possible buffer overflow if size is larger than 4K. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amdgpu: Adjust debugfs eviction and IB access permissionsAlex Deucher
Users should not be able to run these. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amdgpu: Adjust debugfs register access permissionsAlex Deucher
Regular users shouldn't have read access. Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amdgpu: stop syncing PRT map operationsChristian König
Requested by both Bas and Friedrich. Mapping PTEs as PRT doesn't need to sync for anything. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amdgpu: set the right AMDGPU sg segment limitationPrike Liang
The driver needs to set the correct max_segment_size; otherwise debug_dma_map_sg() will complain about the over-mapping of the AMDGPU sg length as following: WARNING: CPU: 6 PID: 1964 at kernel/dma/debug.c:1178 debug_dma_map_sg+0x2dc/0x370 [ 364.049444] Modules linked in: veth amdgpu(OE) amdxcp drm_exec gpu_sched drm_buddy drm_ttm_helper ttm(OE) drm_suballoc_helper drm_display_helper drm_kms_helper i2c_algo_bit rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat xt_addrtype iptable_filter br_netfilter nvme_fabrics overlay nfnetlink_cttimeout nfnetlink openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc amd_atl intel_rapl_msr intel_rapl_common sunrpc sch_fq_codel snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg edac_mce_amd binfmt_misc snd_hda_codec snd_pci_acp6x snd_hda_core snd_acp_config snd_hwdep snd_soc_acpi kvm_amd snd_pcm kvm snd_seq_midi snd_seq_midi_event crct10dif_pclmul ghash_clmulni_intel sha512_ssse3 snd_rawmidi sha256_ssse3 sha1_ssse3 aesni_intel snd_seq nls_iso8859_1 crypto_simd snd_seq_device cryptd snd_timer rapl input_leds snd [ 364.049532] ipmi_devintf wmi_bmof ccp serio_raw k10temp sp5100_tco soundcore ipmi_msghandler cm32181 industrialio mac_hid msr parport_pc ppdev lp parport drm efi_pstore ip_tables x_tables pci_stub crc32_pclmul nvme ahci libahci i2c_piix4 r8169 nvme_core i2c_designware_pci realtek i2c_ccgx_ucsi video wmi hid_generic cdc_ether usbnet usbhid hid r8152 mii [ 364.049576] CPU: 6 PID: 1964 Comm: rocminfo Tainted: G OE 6.10.0-custom #492 [ 364.049579] Hardware name: AMD Majolica-RN/Majolica-RN, BIOS RMJ1009A 06/13/2021 [ 364.049582] RIP: 0010:debug_dma_map_sg+0x2dc/0x370 [ 364.049585] Code: 89 4d b8 e8 36 b1 86 00 8b 4d b8 48 8b 55 b0 44 8b 45 a8 4c 8b 4d a0 48 89 c6 48 c7 c7 00 4b 74 bc 4c 89 4d b8 e8 b4 73 f3 ff <0f> 0b 4c 8b 4d b8 8b 15 c8 2c b8 01 85 d2 0f 85 ee fd ff ff 8b 05 [ 364.049588] RSP: 0018:ffff9ca600b57ac0 EFLAGS: 00010286 [ 364.049590] RAX: 0000000000000000 RBX: ffff88b7c132b0c8 RCX: 0000000000000027 [ 364.049592] RDX: ffff88bb0f521688 RSI: 0000000000000001 RDI: ffff88bb0f521680 [ 364.049594] RBP: ffff9ca600b57b20 R08: 000000000000006f R09: ffff9ca600b57930 [ 364.049596] R10: ffff9ca600b57928 R11: ffffffffbcb46328 R12: 0000000000000000 [ 364.049597] R13: 0000000000000001 R14: ffff88b7c19c0700 R15: ffff88b7c9059800 [ 364.049599] FS: 00007fb2d3516e80(0000) GS:ffff88bb0f500000(0000) knlGS:0000000000000000 [ 364.049601] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 364.049603] CR2: 000055610bd03598 CR3: 00000001049f6000 CR4: 0000000000350ef0 [ 364.049605] Call Trace: [ 364.049607] <TASK> [ 364.049609] ? show_regs+0x6d/0x80 [ 364.049614] ? __warn+0x8c/0x140 [ 364.049618] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049621] ? report_bug+0x193/0x1a0 [ 364.049627] ? handle_bug+0x46/0x80 [ 364.049631] ? exc_invalid_op+0x1d/0x80 [ 364.049635] ? asm_exc_invalid_op+0x1f/0x30 [ 364.049642] ? debug_dma_map_sg+0x2dc/0x370 [ 364.049647] __dma_map_sg_attrs+0x90/0xe0 [ 364.049651] dma_map_sgtable+0x25/0x40 [ 364.049654] amdgpu_bo_move+0x59a/0x850 [amdgpu] [ 364.049935] ? srso_return_thunk+0x5/0x5f [ 364.049939] ? amdgpu_ttm_tt_populate+0x5d/0xc0 [amdgpu] [ 364.050095] ttm_bo_handle_move_mem+0xc3/0x180 [ttm] [ 364.050103] ttm_bo_validate+0xc1/0x160 [ttm] [ 364.050108] ? amdgpu_ttm_tt_get_user_pages+0xe5/0x1b0 [amdgpu] [ 364.050263] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0xa12/0xc90 [amdgpu] [ 364.050473] kfd_ioctl_alloc_memory_of_gpu+0x16b/0x3b0 [amdgpu] [ 364.050680] kfd_ioctl+0x3c2/0x530 [amdgpu] [ 364.050866] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu] [ 364.051054] ? srso_return_thunk+0x5/0x5f [ 364.051057] ? tomoyo_file_ioctl+0x20/0x30 [ 364.051063] __x64_sys_ioctl+0x9c/0xd0 [ 364.051068] x64_sys_call+0x1219/0x20d0 [ 364.051073] do_syscall_64+0x51/0x120 [ 364.051077] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 364.051081] RIP: 0033:0x7fb2d2f1a94f Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amdgpu: Fix DPX valid mode check on GC 9.4.3Lijo Lazar
For DPX mode, the number of memory partitions supported should be less than or equal to 2. Fixes: 1589c82a1085 ("drm/amdgpu: Check memory ranges for valid xcp mode") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amdgpu/gfx11: Add cleaner shader for GFX11.0.3Srinivasan Shanmugam
This commit adds the cleaner shader microcode for GFX11.0.3 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_11_0_3_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amd/pm: add zero RPM stop temperature OD setting support for SMU13Wolfgang Müller
Together with the feature to enable or disable zero RPM in the last commit, it also makes sense to expose the OD setting determining under which temperature the fan should stop if zero RPM is enabled. Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Wolfgang Müller <wolf@oriole.systems> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amdgpu/mes: fetch fw version from firmware headerAlex Deucher
We need this prior to the firmware being loaded so fetch from the header. v2: fetch directly from the firmware v3: store both fw versions Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amd/pm: add zero RPM OD setting support for SMU13Wolfgang Müller
Whilst we have support for setting fan curves there is no support for disabling the zero RPM feature. Since the relevant bits are already present in the OverDriveTable, hook them up to a sysctl setting so users can influence this behaviour. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3489 Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Wolfgang Müller <wolf@oriole.systems> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/ci: remove update-xfails.pyVignesh Raman
We can remove the xfails/update-xfails.py script as it is not used in CI jobs. Once ci-collate [1] is tested for drm-ci, we can use this tool directly to update fails and flakes. [1] https://gitlab.freedesktop.org/gfx-ci/ci-collate/ Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com> Reviewed-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Helen Koike <helen.koike@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241030091732.665428-1-vignesh.raman@collabora.com
2024-11-05drm: display: Set fwnode for aux bus devicesSaravana Kannan
fwnode needs to be set for a device for fw_devlink to be able to track/enforce its dependencies correctly. Without this, you'll see error messages like this when the supplier has probed and tries to make sure all its fwnode consumers are linked to it using device links: mediatek-drm-dp 1c500000.edp-tx: Failed to create device link (0x180) with backlight-lcd0 Reported-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Closes: https://lore.kernel.org/all/7b995947-4540-4b17-872e-e107adca4598@notapiano/ Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com> Signed-off-by: Saravana Kannan <saravanak@google.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Thierry Reding <treding@nvidia.com> Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20241024061347.1771063-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05sysfs: treewide: constify attribute callback of bin_is_visible()Thomas Weißschuh
The is_bin_visible() callbacks should not modify the struct bin_attribute passed as argument. Enforce this by marking the argument as const. As there are not many callback implementers perform this change throughout the tree at once. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Jason Gunthorpe <jgg@nvidia.com> Acked-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Krzysztof Wilczyński <kw@linux.com> Link: https://lore.kernel.org/r/20241103-sysfs-const-bin_attr-v2-5-71110628844c@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05drm: use ATOMIC64_INIT() for atomic64_tJonathan Gray
use ATOMIC64_INIT() not ATOMIC_INIT() for atomic64_t Fixes: 3f09a0cd4ea3 ("drm: Add common fdinfo helper") Signed-off-by: Jonathan Gray <jsg@jsg.id.au> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240111023045.50013-1-jsg@jsg.id.au Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2024-11-05drm/bridge: Add ITE IT6263 LVDS to HDMI converterLiu Ying
Add basic HDMI video output support. Currently, only RGB888 output pixel format is supported. At the LVDS input side, the driver supports single LVDS link and dual LVDS links with "jeida-24" LVDS mapping. Product link: https://www.ite.com.tw/en/product/cate1/IT6263 Signed-off-by: Liu Ying <victor.liu@nxp.com> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Acked-by: Maxime Ripard <mripard@kernel.org> Tested-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241104032806.611890-11-victor.liu@nxp.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2024-11-05drm: of: Add drm_of_lvds_get_dual_link_pixel_order_sink()Liu Ying
drm_of_lvds_get_dual_link_pixel_order() gets LVDS dual-link source pixel order. Similar to it, add it's counterpart function drm_of_lvds_get_dual_link_pixel_order_sink() to get LVDS dual-link sink pixel order. Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Signed-off-by: Liu Ying <victor.liu@nxp.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241104032806.611890-7-victor.liu@nxp.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2024-11-05drm: of: Get MEDIA_BUS_FMT_RGB101010_1X7X5_{JEIDA, SPWG} LVDS data mappingsLiu Ying
Add MEDIA_BUS_FMT_RGB101010_1X7X5_{JEIDA,SPWG} support in drm_of_lvds_get_data_mapping() function implementation so that function callers may get the two LVDS data mappings. Signed-off-by: Liu Ying <victor.liu@nxp.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://patchwork.freedesktop.org/patch/msgid/20241104032806.611890-6-victor.liu@nxp.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
2024-11-05Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann
Backmerging to get the latest fixes from v6.12-rc6. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2024-11-05Merge tag 'exynos-drm-next-for-v6.13' of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-next New feature - Add Decon driver support for Exynos7870 SoC . This patch adds driver data and support for Exynos7870 SoC in the Exynos7 Decon driver Bug fixups for exynos7_drm_decon.c module - Properly clear channels during bind . This patch implements shadow protection/unprotection to clear DECON channels properly, preventing kernel panic - Fix ideal_clk by converting it to HZ . This patch corrects the clkdiv values by converting ideal_clk to Hz for consistency - Fix uninitialized crtc reference in functions . This patch modifies functions to accept a pointer to the decon_context struct to avoid uninitialized references Cleanups - Remove unused prototype for crtc . This patch removes unused prototypes exynos_drm_crtc_wait_pending_update exynos_drm_crtc_finish_update - And just typo fixup Signed-off-by: Dave Airlie <airlied@redhat.com> From: Inki Dae <inki.dae@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241104031341.36549-1-inki.dae@samsung.com
2024-11-05Merge tag 'drm-xe-next-2024-10-31' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next UAPI Changes: - Define and parse OA sync properties (Ashutosh) Driver Changes: - Add caller info to xe_gt_reset_async (Nirmoy) - A large forcewake rework / cleanup (Himal) - A g2h response timeout fix (Badal) - A PTL workaround (Vinay) - Handle unreliable MMIO reads during forcewake (Shuicheng) - Ufence user-space access fixes (Nirmoy) - Annotate flexible arrays (Matthew Brost) - Enable GuC lite restore (Fei) - Prevent GuC register capture on VF (Zhanjun) - Show VFs VRAM / LMEM provisioning summary over debugfs (Michal) - Parallel queues fix on GT reset (Nirmoy) - Move reference grabbing to a job's dma-fence (Matt Brost) - Mark a number of local workqueues WQ_MEM_RECLAIM (Matt Brost) - OA synchronization support (Ashutosh) - Capture all available bits of GuC timestamp to GuC log (John) - Increase readability of guc_info debugfs (John) - Add a mmio barrier before GGTT invalidate (Matt Brost) - Don't short-circuit TDR on jobs not started (Matt Brost) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZyNvA_vZZYR-1eWE@fedora
2024-11-05Merge tag 'drm-msm-next-2024-10-28' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/msm into drm-next Updates for v6.13 Core: - Switch to aperture_remove_all_conflicting_devices() - Simplify msm_disp_state_dump_regs() DPU: - Add SA8775P support - Add (disabled by default) MSM8917, MSM8937, MSM8953 and MSM8996 support - Enable support for larger framebuffers (required for X.Org working with several outputs) - Dropped LM_3, LM_4 (MSM8998, SDM845) - Fixed DSPP_3 routing on SDM845 DP: - Add SA8775P support HDMI: - Mark two arrays as const in MSM8998 HDMI PHY driver GPU: - a7xx preemption support - Adreno A663 support - Typos fixes, etc - Fix excessive stack usage in a6xx GMU Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGt7k8zDHsg2Uzx9apzyQMut8XdLXMQSRNn7WArdPUV5Qw@mail.gmail.com
2024-11-04drm/amd/pm: correct the workload settingKenneth Feng
Correct the workload setting in order not to mix the setting with the end user. Update the workload mask accordingly. v2: changes as below: 1. the end user can not erase the workload from driver except default workload. 2. always shows the real highest priority workoad to the end user. 3. the real workload mask is combined with driver workload mask and end user workload mask. v3: apply this to the other ASICs as well. v4: simplify the code v5: refine the code based on the review comments. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8cc438be5d49b8326b2fcade0bdb7e6a97df9e0b) Cc: stable@vger.kernel.org # 6.11.x
2024-11-04drm/amd/pm: always pick the pptable from IFWIKenneth Feng
always pick the pptable from IFWI on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 136ce12bd5907388cb4e9aa63ee5c9c8c441640b) Cc: stable@vger.kernel.org # 6.11.x
2024-11-04drm/amdgpu: prevent NULL pointer dereference if ATIF is not supportedAntonio Quartulli
acpi_evaluate_object() may return AE_NOT_FOUND (failure), which would result in dereferencing buffer.pointer (obj) while being NULL. Although this case may be unrealistic for the current code, it is still better to protect against possible bugs. Bail out also when status is AE_NOT_FOUND. This fixes 1 FORWARD_NULL issue reported by Coverity Report: CID 1600951: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Fixes: c9b7c809b89f ("drm/amd: Guard against bad data for ATIF ACPI method") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 91c9e221fe2553edf2db71627d8453f083de87a1) Cc: stable@vger.kernel.org
2024-11-04drm/amd/display: parse umc_info or vram_info based on ASICAurabindo Pillai
An upstream bug report suggests that there are production dGPUs that are older than DCN401 but still have a umc_info in VBIOS tables with the same version as expected for a DCN401 product. Hence, reading this tables should be guarded with a version check. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3678 Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2551b4a321a68134360b860113dd460133e856e5) Fixes: 00c391102abc ("drm/amd/display: Add misc DC changes for DCN401") Cc: stable@vger.kernel.org # 6.11.x
2024-11-04drm/amd/display: Fix brightness level not retained over rebootTom Chung
[Why] During boot up and resume the DC layer will reset the panel brightness to fix a flicker issue. It will cause the dm->actual_brightness is not the current panel brightness level. (the dm->brightness is the correct panel level) [How] Set the backlight level after do the set mode. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization") Reported-by: Mark Herbert <mark.herbert42@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3655 Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 7875afafba84817b791be6d2282b836695146060) Cc: stable@vger.kernel.org
2024-11-04drm/amd/pm: correct the workload settingKenneth Feng
Correct the workload setting in order not to mix the setting with the end user. Update the workload mask accordingly. v2: changes as below: 1. the end user can not erase the workload from driver except default workload. 2. always shows the real highest priority workoad to the end user. 3. the real workload mask is combined with driver workload mask and end user workload mask. v3: apply this to the other ASICs as well. v4: simplify the code v5: refine the code based on the review comments. Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: Add compatible NPS mode infoLijo Lazar
Populate the compatible NPS modes also for providing partition configuration details through sysfs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: Skip IP coredump for RAS errorsLijo Lazar
For RAS errors, source of error is known. Skip the core dump of IP states. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: Group gfx sysfs functionsLijo Lazar
Make amdgpu_gfx_sysfs_init/fini functions as common entry points for all gfx related sysfs nodes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: Add nps_mode in RAS init_flagCandice Li
Add nps_mode in RAS init_flag. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: add amdgpu_sdma_sched_mask debugfsJesse Zhang
Userspace wants to run jobs on a specific sdma ring for verification purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are at least two or more cores in the sdma ip. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfsJesse Zhang
compute/gfx may have multiple rings on some hardware. In some cases, userspace wants to run jobs on a specific ring for validation purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are at least two or more cores in the gfx/compute ip. Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: Fix dummy_read_page overlapping mappingsPrike Liang
Use the dma_map_page_attrs() with DMA_ATTR_SKIP_CPU_SYNC attribute setting to handle the dummy page overlapping mappings. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>