summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd
AgeCommit message (Collapse)Author
2024-11-04drm/amdgpu: Fix dummy_read_page overlapping mappingsPrike Liang
Use the dma_map_page_attrs() with DMA_ATTR_SKIP_CPU_SYNC attribute setting to handle the dummy page overlapping mappings. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: skip amdgpu_device_cache_pci_state under sriovVictor Zhao
Under sriov, host driver will save and restore vf pci cfg space during reset. And during device init, under sriov, pci_restore_state happens after fullaccess released, and it can have race condition with mmio protection enable from host side leading to missing interrupts. So skip amdgpu_device_cache_pci_state for sriov. Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdkfd: Use dynamic allocation for CU occupancy array in ↵Srinivasan Shanmugam
'kfd_get_cu_occupancy()' The `kfd_get_cu_occupancy` function previously declared a large `cu_occupancy` array as a local variable, which could lead to stack overflows due to excessive stack usage. This commit replaces the static array allocation with dynamic memory allocation using `kcalloc`, thereby reducing the stack size. This change avoids the risk of stack overflows in kernel space, in scenarios where `AMDGPU_MAX_QUEUES` is large. The allocated memory is freed using `kfree` before the function returns to prevent memory leaks. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c: In function ‘kfd_get_cu_occupancy’: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:322:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=] 322 | } | ^ Fixes: 6ae9e1aba97e ("drm/amdkfd: Update logic for CU occupancy calculations") Cc: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/pm: always pick the pptable from IFWIKenneth Feng
always pick the pptable from IFWI on smu v14.0.2/3 Signed-off-by: Kenneth Feng <kenneth.feng@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: prevent NULL pointer dereference if ATIF is not supportedAntonio Quartulli
acpi_evaluate_object() may return AE_NOT_FOUND (failure), which would result in dereferencing buffer.pointer (obj) while being NULL. Although this case may be unrealistic for the current code, it is still better to protect against possible bugs. Bail out also when status is AE_NOT_FOUND. This fixes 1 FORWARD_NULL issue reported by Coverity Report: CID 1600951: Null pointer dereferences (FORWARD_NULL) Signed-off-by: Antonio Quartulli <antonio@mandelbit.com> Fixes: c9b7c809b89f ("drm/amd: Guard against bad data for ATIF ACPI method") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241031152848.4716-1-antonio@mandelbit.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: 3.2.308Aric Cyr
This version brings along following fixes: - Prune Invalid Modes for HDMI Output - SPL Cleanup - Fix brightness level not retained over reboot - Remove inaccessible registers from DMU diagnostics Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Prune Invalid Modes For HDMI OutputFangzhi Zuo
[Why] 1. HDMI does not have 6 bpc support. Having 6 bpc pass validation does not comply with spec. 2. Validate 420 only for native HDMI, but not apply to pcon use case. 3. Current mode validation log is not readable. [how] 1. Cap 8 bpc for dp-hdmi converter. 2. Validate yuv420 for pcon use case as well, if rgb/yuv444 8bpc cannot fit into pcon bw limitation of the link from the converter to HDMI sink. 3. Add readable pixel_format and color_depth into debug log. Reviewed-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Implement new backlight_level_params structureKaitlyn Tse
[Why] Implement the new backlight_level_params structure as part of the VBAC framework, the information in this structure is needed to be passed down to the DMCUB to identify the backlight control type, to adjust the backlight of the panel and to perform any required conversions from PWM to nits or vice versa. [How] Modified existing functions to include the new backlight_level_params structure. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Kaitlyn Tse <Kaitlyn.Tse@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: [FW Promotion] Release 0.0.241.0Taimur Hassan
- Add DPCS health check - Update USB4 PHY SSC - Fix FAMS2 SubVP Close to VBlank changes - Create VESA Aux-based backlight control path - Fix PSR1 CRC error during CTS test Acked-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Add a missing DCN401 reg definitionAurabindo Pillai
Add a mising reg field to the autogenerated header for future use Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/pm: add inst to dpm_set_vcn_enableBoyuan Zhang
Add an instance parameter to the existing function dpm_set_vcn_enable() for future implementation. Re-write all pptable functions accordingly. v2: Remove duplicated dpm_set_vcn_enable() functions in v1. Instead, adding instance parameter to existing functions. Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Don't write DP_MSTM_CTRL after LTWayne Lin
[Why] Observe after suspend/resme, we can't light up mst monitors under specific mst hub. The reason is that driver still writes DPCD DP_MSTM_CTRL after LT. It's forbidden even we write the same value for that dpcd register. [How] We already resume the mst branch device dpcd settings during resume_mst_branch_status(). Leverage drm_dp_mst_topology_queue_probe() to only probe the topology, not calling drm_dp_mst_topology_mgr_resume() which will set DP_MSTM_CTRL as well. Reviewed-by: Jerry Zuo <jerry.zuo@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Minimize wait for pending updatesIlya Bakoulin
[Why/How] Move the wait for pending updates past prepare_bandwidth if the previous update was not a full update to reduce the average time it takes to complete a full update. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: parse umc_info or vram_info based on ASICAurabindo Pillai
An upstream bug report suggests that there are production dGPUs that are older than DCN401 but still have a umc_info in VBIOS tables with the same version as expected for a DCN401 product. Hence, reading this tables should be guarded with a version check. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3678 Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Remove otg w/a toggling on HPO interfacesAusef Yousof
[why&how] Adjust otg w/a disable condition to include HPO explicitly rather than assuming it is implicitly used through DP2. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ausef Yousof <Ausef.Yousof@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Remove hw w/a toggle if on DP2/HPOAusef Yousof
[why&how] Applying a hw w/a only relevant to DIG FIFO causing corruption using HPO, do not apply the w/a if on DP2/HPO Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ausef Yousof <Ausef.Yousof@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: SPL cleanupSamson Tam
[Why & How] Move from pointer to callback to reference callback directly Missed renaming fixpt functions with spl prefix Reviewed-by: Navid Assadian <Navid.Assadian@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Fix brightness level not retained over rebootTom Chung
[Why] During boot up and resume the DC layer will reset the panel brightness to fix a flicker issue. It will cause the dm->actual_brightness is not the current panel brightness level. (the dm->brightness is the correct panel level) [How] Set the backlight level after do the set mode. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization") Reported-by: Mark Herbert <mark.herbert42@gmail.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3655 Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: fix asserts in SPL during bootupSamson Tam
[Why] During mode validation, there maybe modes that fail max_downscale_src_width check and scaling_quality taps are 0. This will cause an assert to trigger in spl_set_filters_data() because taps are 0. [How] Move taps calculation for non-adaptive scaling mode to separate function and call it if max_downscale_src_width fails. This will populate taps if scaling_quality taps are 0. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Remove inaccessible registers from DMU diagnosticsNicholas Kazlauskas
[Why] SEC_CNTL isn't readable by x86 and can block Z8 entry if read. [How] Remove the read. Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: fix rxstatus_msg_sz type narrowingDominik Kaszewski
[Why] Code reading rxstatus message size was incorrectly assigning it to uint8_t, despite the value being 10 bits long (lower byte plus lowest 2 bits from upper byte). This caused the highest 2 bits to be ignored, potentially missing invalid values. [How] Change all local variables holding rxstatus message size from uint8_t to uint16_t, as in mod_hdcp_message_hdcp2::rx_id_list_size. Replaced untyped HDCP_2_2_HMID_RXSTATUS_MSG_SZ_HI macro with function hdcp_2_2_hmid_rxstatus_msg_sz(const uint8_t[2]) to encapsulate entire calculation and return a typed result. Removed spaces mixed with tabs to fix indentation on modified lines. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Dominik Kaszewski <dominik.kaszewski@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04Revert "drm/amd/display: Block UHBR Based On USB-C PD Cable ID"Ausef Yousof
This reverts commit 4f01a68751194d05280d659a65758c09e4af04d6. [why & how] The offending commit caused a lighting issue for Samsung Odyssey G9 monitors when connecting via USB-C. The commit was intended to block certain UHBR rates. Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ausef Yousof <Ausef.Yousof@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: avoid divided by zeroCharlene Liu
[why] insert divided by zero protection Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Do Not Fallback To SW Cursor If HW Cursor RequiredAustin Zheng
[Why/How] Tearing can occur if there is a flip immediate plane and SW cursor. check_subvp_sw_cursor_fallback_req falls back to SW cursor if the stream has the potential to use subVP. Check for fallback not needed if HW cursor is required. e.g. Fullscreen gaming Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: use string choice helpersR Sundar
Use string choice helpers for better readability. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Closes: https://lore.kernel.org/r/202410161814.I6p2Nnux-lkp@intel.com/ Signed-off-by: R Sundar <prosunofficial@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: fix comment about amdgpu.abmlevel defaultsjeffbai@aosc.io
Since 040fdcde288a ("drm/amdgpu: respect the abmlevel module parameter value if it is set"), the default value for amdgpu.abmlevel was set to -1, or auto. However, the comment explaining the default value was not updated to reflect the change (-1, or auto; not -1, or disabled). Clarify that the default value (-1) means auto. Fixes: 040fdcde288a ("drm/amdgpu: respect the abmlevel module parameter value if it is set") Reported-by: Ruikai Liu <rickliu2000@outlook.com> Signed-off-by: Mingcong Bai <jeffbai@aosc.io> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: Expose special on chip memory pools in fdinfoTvrtko Ursulin
In the past these specialized on chip memory pools were reported as system memory (aka 'cpu') which was not correct and misleading. That has since been removed so lets make them visible as their own respective memory regions. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: Stop reporting special chip memory pools as CPU memory in fdinfoTvrtko Ursulin
So far these specialized on chip memory pools were reported as system memory (aka 'cpu') which is not correct and misleading. Lets remove that and consider later making them visible as their own thing. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Suggested-by: Christian König <christian.koenig@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Alex Deucher <alexdeucher@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: stop tracking visible memory statsYunxiang Li
Since on modern systems all of vram can be made visible anyways, to simplify the new implementation, drops tracking how much memory is visible for now. If this is really needed we can add it back on top of the new implementation, or just report all the BOs as visible. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: make drm-memory-* report resident memoryYunxiang Li
The old behavior reports the resident memory usage for this key and the documentation say so as well. However this was accidentally changed to include buffers that was evicted. Fixes: 04bdba46542c ("drm/amdgpu: Use drm_print_memory_stats helper from fdinfo") Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: Fix the memory allocation issue in amdgpu_discovery_get_nps_info()Li Huafei
Fix two issues with memory allocation in amdgpu_discovery_get_nps_info() for mem_ranges: - Add a check for allocation failure to avoid dereferencing a null pointer. - As suggested by Christophe, use kvcalloc() for memory allocation, which checks for multiplication overflow. Additionally, assign the output parameters nps_type and range_cnt after the kvcalloc() call to prevent modifying the output parameters in case of an error return. Fixes: b194d21b9bcc ("drm/amdgpu: Use NPS ranges from discovery table") Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: add ring reset messagesAlex Deucher
Add messages to make it clear when a per ring reset happens. This is helpful for debugging and aligns with other reset methods. v2: add ring name in success/fail messages (Lijo) Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: fix fairness in enforce isolation handlingAlex Deucher
Make sure KFD gets a turn when serializing access to the GC IP. Currently non-KFD jobs can starve KFD if they submit often enough. This patch prevents that by stalling non-KFD if its time period has elapsed. v2: fix units v3: check enablement properly Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Remove last parts of timing_traceDr. David Alan Gilbert
Commit c2c2ce1e9623 ("drm/amd/display: Optimize passive update planes.") removed the last caller of context_timing_trace. Remove it. With that gone, no one is now looking at the 'timing_trace' flag, remove it and all the places that set it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Remove unused cm3_helper_translate_curve_to_degamma_hw_formatDr. David Alan Gilbert
cm3_helper_translate_curve_to_degamma_hw_format() since it was added in 2020's commit 03f54d7d3448 ("drm/amd/display: Add DCN3 DPP") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amd/display: Remove unused regamma functionsDr. David Alan Gilbert
calculate_user_regamma_coeff() and calculate_user_regamma_ramp() were added in 2018 in commit 55a01d4023ce ("drm/amd/display: Add user_regamma to color module") but never used. Remove them and their helpers. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdkfd: add an interface to query whether is KFD is activeAlex Deucher
Add an interface to query whether KFD has any active queues. v2: fix build issues Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04Backmerge v6.12-rc6 of ↵Dave Airlie
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next Backmerge Linus tree for some drm-fixes needed for msm and xe merges. Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-11-03fdget(), trivial conversionsAl Viro
fdget() is the first thing done in scope, all matching fdput() are immediately followed by leaving the scope. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-01Merge tag 'drm-misc-next-2024-10-31' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.13: All of the previous pull request, with MORE! Core Changes: - Update documentation for scheduler start/stop and job init. - Add dedede and sm8350-hdk hardware to ci runs. Driver Changes: - Small fixes and cleanups to panfrost, omap, nouveau, ivpu, zynqmp, v3d, panthor docs, and leadtek-ltk050h3146w. - Crashdump support for qaic. - Support DP compliance in zynqmp. - Add Samsung S6E88A0-AMS427AP24 panel. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/deeef745-f3fb-4e85-a9d0-e8d38d43c1cf@linux.intel.com
2024-10-29Merge tag 'amd-drm-next-6.13-2024-10-25' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.13-2024-10-25: amdgpu: - SDMA queue reset support - SMU 13.0.6 updates - Add debugfs interface to help limit jpeg queue scheduling for testing - JPEG 4.0.3 updates - Initial runtime repartitioning support - GFX9 fixes - Misc code cleanups - Rework IP structures to better handle multiple instances of an IP - DML updates - DSC fixes - HDR fixes - Brightness control updates - Runtime pm cleanup - DMCUB fixes - DCN 3.5 updates - Struct drm_edid cleanup - Fetch EDID from _DDC if available - Ring noop optimizations - MES logging fixes - 3DLUT fixes - DCN 4.x fixes - SMU 13.x fixes - Fixes for set_soft_freq_range() - ACPI fixes - SMU 14.x updates - PSR-SU fixes - fdinfo cleanup - DCN documentation updates amdkfd: - Misc code cleanups - Increase event FIFO size - Copy wave state fixes for SDMA radeon: - Fix possible overflow in packet3 check - Late init connector fix - Always set GEM function pointer Documentation: - Update drm-memory documentation From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241025132336.2416913-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2024-10-28drm/amdgpu/smu13: fix profile reportingAlex Deucher
The following 3 commits landed in parallel: commit d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting") commit 7a1613e47e65 ("drm/amdgpu/smu13: always apply the powersave optimization") commit 7c210ca5a2d7 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D") While everything is set correctly, this caused the profile to be reported incorrectly because both the powersave and fullscreen3d bits were set in the mask and when the driver prints the profile, it looks for the first bit set. Fixes: d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit ecfe9b237687a55d596fff0650ccc8cc455edd3f) Cc: stable@vger.kernel.org
2024-10-28drm/amd/pm: Vangogh: Fix kernel memory out of bounds writeTvrtko Ursulin
KASAN reports that the GPU metrics table allocated in vangogh_tables_init() is not large enough for the memset done in smu_cmn_init_soft_gpu_metrics(). Condensed report follows: [ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu] [ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067 ... [ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544 [ 33.861816] Tainted: [W]=WARN [ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 33.861822] Call Trace: [ 33.861826] <TASK> [ 33.861829] dump_stack_lvl+0x66/0x90 [ 33.861838] print_report+0xce/0x620 [ 33.861853] kasan_report+0xda/0x110 [ 33.862794] kasan_check_range+0xfd/0x1a0 [ 33.862799] __asan_memset+0x23/0x40 [ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.867135] dev_attr_show+0x43/0xc0 [ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0 [ 33.867155] seq_read_iter+0x3f8/0x1140 [ 33.867173] vfs_read+0x76c/0xc50 [ 33.867198] ksys_read+0xfb/0x1d0 [ 33.867214] do_syscall_64+0x90/0x160 ... [ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s: [ 33.867358] kasan_save_stack+0x33/0x50 [ 33.867364] kasan_save_track+0x17/0x60 [ 33.867367] __kasan_kmalloc+0x87/0x90 [ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu] [ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu] [ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu] [ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu] [ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu] [ 33.869608] local_pci_probe+0xda/0x180 [ 33.869614] pci_device_probe+0x43f/0x6b0 Empirically we can confirm that the former allocates 152 bytes for the table, while the latter memsets the 168 large block. Root cause appears that when GPU metrics tables for v2_4 parts were added it was not considered to enlarge the table to fit. The fix in this patch is rather "brute force" and perhaps later should be done in a smarter way, by extracting and consolidating the part version to size logic to a common helper, instead of brute forcing the largest possible allocation. Nevertheless, for now this works and fixes the out of bounds write. v2: * Drop impossible v3_0 case. (Mario) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Evan Quan <evan.quan@amd.com> Cc: Wenyou Yang <WenYou.Yang@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703) Cc: stable@vger.kernel.org # v6.6+
2024-10-28Revert "drm/amd/display: update DML2 policy ↵Ovidiu Bunea
EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable@vger.kernel.org
2024-10-28drm/amdgpu/smu13: fix profile reportingAlex Deucher
The following 3 commits landed in parallel: commit d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting") commit 7a1613e47e65 ("drm/amdgpu/smu13: always apply the powersave optimization") commit 7c210ca5a2d7 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D") While everything is set correctly, this caused the profile to be reported incorrectly because both the powersave and fullscreen3d bits were set in the mask and when the driver prints the profile, it looks for the first bit set. Fixes: d7d2688bf4ea ("drm/amd/pm: update workload mask after the setting") Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28drm/amdkfd: flag per-queue reset support for gfx9Jonathan Kim
Flag KFD support for per-queue reset on GFX9 devices. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28drm/amdgpu: optimize ACA log printYang Wang
- skip to print CE ACA log. - optimize ACA log print for MCA. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28drm/amdgpu: add generic func to check if ta fw is applicableLe Ma
Separated xgmi ta is required for specific APU, and driver needs parse the ta binary properly with aux xgmi ta packed. v2: make the check function more generic (Lijo) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28drm/amdgpu: clean up the suspend_completePrike Liang
To check the status of S3 suspend completion, use the PM core pm_suspend_global_flags bit(1) to detect S3 abort events. Therefore, clean up the AMDGPU driver's private flag suspend_complete. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-28drm/amdgpu: correct the S3 abort check conditionPrike Liang
In the normal S3 entry, the TOS cycle counter is not reset during BIOS execution the _S3 method, so it doesn't determine whether the _S3 method is executed exactly. Howerver, the PM core performs the S3 suspend will set the PM_SUSPEND_FLAG_FW_RESUME bit if all the devices suspend successfully. Therefore, drivers can check the pm_suspend_global_flags bit(1) to detect the S3 suspend abort event. Fixes: 6704dbf71928 ("drm/amdgpu: update suspend status for aborting from deeper suspend") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>