summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
AgeCommit message (Collapse)Author
2023-06-09drm/amdgpu: support partition drm devicesJames Zhu
Support partition drm devices on GC_HWIP IP_VERSION(9, 4, 3). This is a temporary solution and will be superceded. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-and-tested-by: Philip Yang<Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/bu: update mtype_local parameter settingsGraham Sider
Update mtype_local module parameter to use MTYPE_RW by default. 0: MTYPE_RW (default) 1: MTYPE_NC 2: MTYPE_CC Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/bu: add mtype_local as a module parameterDavid Francis
Selects the MTYPE to be used for local memory, (0 = MTYPE_CC (default), 1 = MTYPE_NC, 2 = MTYPE_RW) v2: squash in build fix (Alex) Reviewed-by: Graham Sider <Graham.Sider@amd.com> Signed-off-by: David Francis <David.Francis@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu/bu: Add use_mtype_cc_wa module paramGraham Sider
By default, set use_mtype_cc_wa to 1 to set PTE coherence flag MTYPE_CC instead of MTYPE_RW by default. This is required for the time being to mitigate a bug causing XCCs to hit stale data due to TCC marking fully dirty lines as exclusive. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Reviewed-by: Joseph Greathouse <Joseph.Greathouse@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add auto mode for compute partitionLijo Lazar
When auto mode is specified, driver will choose the right compute partition mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Reviewed-by: Philip Yang <philip.yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: Add parsing of acpi xcc objectsLijo Lazar
Add parsing of ACPI xcc objects and fill in relevant info from them by invoking the DSM methods. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-and-tested-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: override partition mode through module parameterShiwu Zhang
Add a module parameter to override the partition mode. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add new flag to AMDGPU_CTX_QUERY2Pierre-Eric Pelloux-Prayer
OpenGL EXT_robustness extension expects the driver to stop reporting GUILTY_CONTEXT_RESET when the reset has completed and the GPU is ready to accept submission again. This commit adds a AMDGPU_CTX_QUERY2_FLAGS_RESET_IN_PROGRESS flag, that let the UMD know that the reset is still not finished. Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-24drm/amdgpu: bump driver version number for CP GFX shadowAlex Deucher
So UMDs can determine whether the kernel supports this. Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21986 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-21drm/amd/amdgpu: Fix style errors in amdgpu_drv.c & amdgpu_device.cSrinivasan Shanmugam
Fix following checkpatch style errors in amdgpu_drv.c & amdgpu_device.c ERROR: exactly one space required after that #ifdef ERROR: spaces required around that '+=' (ctx:WxV) ERROR: space required before the open brace '{' ERROR: spaces required around that '||' (ctx:VxE) ERROR: space prohibited before that close parenthesis ')' ERROR: space required before the open parenthesis '(' ERROR: space required before the open brace '{' ERROR: code indent should use tabs where possible Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-11drm/amd/amdgpu: Drop the hang limit parameterSrinivasan Shanmugam
The driver doesn't resubmit jobs on hangs any more, hence drop the hang limit parameter - amdgpu_job_hang_limit, wherever it is used. Suggested-by: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Kent Russell <kent.russell@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-31drm/amd/amdgpu: Remove initialisation of globals to 0 or NULLSrinivasan Shanmugam
Global variables do not need to be initialized to 0 or NULL and checkpatch flags this error in drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c: ERROR: do not initialise globals to NULL +char *amdgpu_disable_cu = NULL; +char *amdgpu_virtual_display = NULL; Fix this checkpatch error. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-31drm/amd/amdgpu: Fix error do not initialise globals to 0Srinivasan Shanmugam
Global variables do not need to be initialized to 0 and checkpatch flags this error in drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c: ERROR: do not initialise globals to 0 +int amdgpu_no_queue_eviction_on_vm_fault = 0; Fix this checkpatch error. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Hamza Mahfooz <Hamza.Mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-22drm/amdgpu: skip ASIC reset for APUs when go to S4Tim Huang
For GC IP v11.0.4/11, PSP TMR need to be reserved for ASIC mode2 reset. But for S4, when psp suspend, it will destroy the TMR that fails the ASIC reset. [ 96.006101] amdgpu 0000:62:00.0: amdgpu: MODE2 reset [ 100.409717] amdgpu 0000:62:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000011 SMN_C2PMSG_82:0x00000002 [ 100.411593] amdgpu 0000:62:00.0: amdgpu: Mode2 reset failed! [ 100.412470] amdgpu 0000:62:00.0: PM: pci_pm_freeze(): amdgpu_pmops_freeze+0x0/0x50 [amdgpu] returns -62 [ 100.414020] amdgpu 0000:62:00.0: PM: dpm_run_callback(): pci_pm_freeze+0x0/0xd0 returns -62 [ 100.415311] amdgpu 0000:62:00.0: PM: pci_pm_freeze+0x0/0xd0 returned -62 after 4623202 usecs [ 100.416608] amdgpu 0000:62:00.0: PM: failed to freeze async: error -62 We can skip the reset on APUs, assuming we can resume them properly. Verified on some GFX11, GFX10 and old GFX9 APUs. Signed-off-by: Tim Huang <tim.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23drm/amdgpu: change default behavior of bad_page_threshold parameterTao Zhou
Ignore ras umc bad page threshold by default, GPU initialization won't be stopped in this mode. v2: refine the description of bad_page_threshold. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23drm/amdgpu: add more fields into device info, caches sizes, etc.Marek Olšák
AMDGPU_IDS_FLAGS_CONFORMANT_TRUNC_COORD: important for conformance on gfx11 Other fields are exposed from IP discovery. enabled_rb_pipes_mask_hi is added for future chips, currently 0. Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403 Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23drm/amd: Don't allow s0ix on APUs older than RavenMario Limonciello
APUs before Raven didn't support s0ix. As we just relieved some of the safety checks for s0ix to improve power consumption on APUs that support it but that are missing BIOS support a new blind spot was introduced that a user could "try" to run s0ix. Plug this hole so that if users try to run s0ix on anything older than Raven it will just skip suspend of the GPU. Fixes: cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support") Suggested-by: Alexander Deucher <Alexander.Deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-09drm/amdgpu: add S/G display parameterAlex Deucher
Some users have reported flickerng with S/G display. We've tried extensively to reproduce and debug the issue on a wide variety of platform configurations (DRAM bandwidth, etc.) and a variety of monitors, but so far have not been able to. We disabled S/G display on a number of platforms to address this but that leads to failure to pin framebuffers errors and blank displays when there is memory pressure or no displays at all on systems with limited carveout (e.g., Chromebooks). Add a option to disable this as a debugging option as a way for users to disable this, depending on their use case, and for us to help debug this further. v2: fix typo Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-31Revert "drm/amdgpu: TA unload messages are not actually sent to psp when ↵Vitaly Prosyak
amdgpu is uninstalled" This reverts commit fac53471d0ea9693d314aa2df08d62b2e7e3a0f8. The following change: move the drm_dev_unplug call after amdgpu_driver_unload_kms in amdgpu_pci_remove. The reason is the following: amdgpu_pci_remove calls drm_dev_unregister and it should be called first to ensure userspace can't access the device instance anymore. If we call drm_dev_unplug after amdgpu_driver_unload_kms then we observe IGT PCI software unplug test failure (kernel hung) for all ASICs. This is how this regression was found. After this revert, the following commands do work not, but it would be fixed in the next commit: - sudo modprobe -r amdgpu - sudo modprobe amdgpu Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Reviewed-by Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-25Merge tag 'amd-drm-next-6.3-2023-01-20' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.3-2023-01-20: amdgpu: - Secure display fixes - Fix scaling - Misc code cleanups - Display BW alloc logic updates - DCN 3.2 fixes - Fix power reporting on certain firmwares for CZN/RN - SR-IOV fixes - Link training cleanup and code rework - HDCP fixes - Reserved VMID fix - Documentation updates - Colorspace fixes - RAS updates - GC11.0 fixes - VCN instance harvesting fixes - DCN 3.1.4/5 workarounds for S/G displays - Add PCIe info to the INFO IOCTL amdkfd: - XNACK fix UAPI: - Add PCIe gen/lanes info to the amdgpu INFO IOCTL Nesa ultimately plans to use this to make decisions about buffer placement optimizations Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20790 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230120234523.7610-1-alexander.deucher@amd.com
2023-01-24Merge tag 'drm-misc-next-2023-01-19' of ↵Daniel Vetter
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for $kernel-version: UAPI Changes: Cross-subsystem Changes: Core Changes: * Cleanup unneeded include statements wrt <linux/fb.h>, <drm/drm_fb_helper.h> and <drm/drm_crtc_helper.h> * Remove unused helper DRM_DEBUG_KMS_RATELIMITED() * fbdev: Remove obsolete aperture field from struct fb_device, plus driver cleanups; Remove unused flag FBINFO_MISC_FIRMWARE * MIPI-DSI: Fix brightness, plus rsp. driver updates * scheduler: Deprecate drm_sched_resubmit_jobs() * ttm: Fix MIPS build; Remove ttm_bo_wait(); Documentation fixes Driver Changes: * Remove obsolete drivers for userspace modesetting i810, mga, r128, savage, sis, tdfx, via * bridge: Support CDNS DSI J721E, plus DT bindings; lt9611: Various fixes and improvements; sil902x: Various fixes; Fixes * nouveau: Removed support for legacy ioctls; Replace zero-size array; Cleanups * panel: Fixes * radeon: Use new DRM logging helpers Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/Y8kDk5YX7Yz3eRhM@linux-uq9g
2023-01-19drm/amdgpu: return the PCIe gen and lanes from the INFO ioctlMarek Olšák
For computing PCIe bandwidth in userspace and troubleshooting PCIe bandwidth issues. Note that this intentionally fills holes and padding in drm_amdgpu_info_device. Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20790 Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-13drm/amdgpu: Do not include <linux/fb.h>Thomas Zimmermann
Remove unnecessary include statements for <linux/fb.h>. No functional changes. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20230111130206.29974-5-tzimmermann@suse.de
2023-01-09drm/amd: Delay removal of the firmware framebufferMario Limonciello
Removing the firmware framebuffer from the driver means that even if the driver doesn't support the IP blocks in a GPU it will no longer be functional after the driver fails to initialize. This change will ensure that unsupported IP blocks at least cause the driver to work with the EFI framebuffer. Cc: stable@vger.kernel.org Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-03Revert "drm/amd/display: Enable Freesync Video Mode by default"Michel Dänzer
This reverts commit de05abe6b9d0fe08f65d744f7f75a4cba4df27ad. The bug referenced below was bisected to this commit. There has been no activity toward fixing it in 3 months, so let's revert for now. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2162 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-03drm/amdgpu: allow zero as vram limitChristian König
This allows testing the driver without any VRAM. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-15drm/amdgpu: bump minor version number for DEV_INFO and SENSOR IOCTLs updateEvan Quan
Update AMDGPU_INFO_DEV_INFO IOCTL for minimum engine and memory clock. And update AMDGPU_INFO_SENSOR IOCTL for PEAK_PSTATE engine and memory clock. User applications can better utilize these IOCTLs to get needed informations. Increase the minor version number to indicate that the new flags are available. Proposed mesa patch: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/278 Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-12-09drm/amdgpu: handle polaris10/11 overlap asics (v2)Alex Deucher
Some special polaris 10 chips overlap with the polaris11 DID range. Handle this properly in the driver. v2: use local flags for other function calls. Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2022-12-01drm/amdgpu: improve GART and GTT documentationPeter Maucher
Document difference between amdgpu.gartsize and amdgpu.gttsize module parameters, as initially explained by Alex Deucher here: https://lists.freedesktop.org/archives/dri-devel/2022-October/375358.html v2: minor cleanups (Alex) Signed-off-by: Peter Maucher <bellosilicio@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29drm/amdgpu: use dev_dbg to print messages in runtime cycleGuchun Chen
Runtime PM can happen pretty frequently, as these printings may be annoyed, switch to dev_dbg. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29drm/amdgpu: add printing to indicate rpm completenessGuchun Chen
Add an explicit printing to tell when finishing rpm execution in amdgpu. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-23drm/amdgpu: update documentation of parameter amdgpu_gtt_sizeZhenGuo Yin
Fixes: f7ba887f606b ("drm/amdgpu: Adjust logic around GTT size (v3)") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-22Merge tag 'amd-drm-next-6.2-2022-11-18' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.2-2022-11-18: amdgpu: - SR-IOV fixes - Clean up DC checks - DCN 3.2.x fixes - DCN 3.1.x fixes - Don't enable degamma on asics which don't support it - IP discovery fixes - BACO fixes - Fix vbios allocation handling when vkms is enabled - Drop buggy tdr advanced mode GPU reset handling - Fix the build when DCN is not set in kconfig - MST DSC fixes - Userptr fixes - FRU and RAS EEPROM fixes - VCN 4.x RAS support - Aldrebaran CU occupancy reporting fix - PSP ring cleanup amdkfd: - Memory limit fix - Enable cooperative launch on gfx 10.3 amd-drm-next-6.2-2022-11-11: amdgpu: - SMU 13.x updates - GPUVM TLB race fix - DCN 3.1.4 updates - DCN 3.2.x updates - PSR fixes - Kerneldoc fix - Vega10 fan fix - GPUVM locking fixes in error pathes - BACO fix for Beige Goby - EEPROM I2C address cleanup - GFXOFF fix - Fix DC memory leak in error pathes - Flexible array updates - Mtype fix for GPUVM PTEs - Move Kconfig into amdgpu directory - SR-IOV updates - Fix possible memory leak in CS IOCTL error path amdkfd: - Fix possible memory overrun - CRIU fixes radeon: - ACPI ref count fix - HDA audio notifier support - Move Kconfig into radeon directory UAPI: - Add new GEM_CREATE flags to help to transition more KFD functionality to the DRM UAPI. These are used internally in the driver to align location based memory coherency requirements from memory allocated in the KFD with how we manage GPUVM PTEs. They are currently blocked in the GEM_CREATE IOCTL as we don't have a user right now. They are just used internally in the kernel driver for now for existing KFD memory allocations. So a change to the UAPI header, but no functional change in the UAPI. From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221118170807.6505-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
2022-11-16Merge tag 'drm-misc-next-2022-11-10-1' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for 6.2: UAPI Changes: Cross-subsystem Changes: Core Changes: - atomic-helper: Add begin_fb_access and end_fb_access hooks - fb-helper: Rework to move fb emulation into helpers - scheduler: rework entity flush, kill and fini - ttm: Optimize pool allocations Driver Changes: - amdgpu: scheduler rework - hdlcd: Switch to DRM-managed resources - ingenic: Fix registration error path - lcdif: FIFO threshold tuning - meson: Fix return type of cvbs' mode_valid - ofdrm: multiple fixes (kconfig, types, endianness) - sun4i: A100 and D1 support - panel: - New Panel: Jadard JD9365DA-H3 Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20221110083612.g63eaocoaa554soh@houat
2022-11-15drm/amdgpu: revert "implement tdr advanced mode"Christian König
This reverts commit e6c6338f393b74ac0b303d567bb918b44ae7ad75. This feature basically re-submits one job after another to figure out which one was the one causing a hang. This is obviously incompatible with gang-submit which requires that multiple jobs run at the same time. It's also absolutely not helpful to crash the hardware multiple times if a clean recovery is desired. For testing and debugging environments we should rather disable recovery alltogether to be able to inspect the state with a hw debugger. Additional to that the sw implementation is clearly buggy and causes reference count issues for the hardware fence. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15drm/amdgpu: remove the DID of Vangogh from pciidlistPerry Yuan
change the vangogh family to use IP discovery path to initialize IP list, this needs to remove the DID from the PCI ID list to allow the IP discovery path to set all the IP versions correctly. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-15drm/amdgpu: clarify DC checksAlex Deucher
There are several places where we don't want to check if a particular asic could support DC, but rather, if DC is enabled. Set a flag if DC is enabled and check for that rather than if a device supports DC or not. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-05drm/fb-helper: Move generic fbdev emulation into separate source fileThomas Zimmermann
Move the generic fbdev implementation into its own source and header file. Adapt drivers. No functional changes, but some of the internal helpers have been renamed to fit into the drm_fbdev_ naming scheme. v3: * rename drm_fbdev.{c,h} to drm_fbdev_generic.{c,h} * rebase onto vmwgfx changes * rebase onto xlnx changes * fix include statements in amdgpu Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221103151446.2638-22-tzimmermann@suse.de
2022-11-01drm/amdgpu: Disable GPU reset on SRIOV before remove pci.Gavin Wan
The recent change brought a bug on SRIOV envrionment. It caused unloading amdgpu failed on Guest VM. The reason is that the VF FLR was requested while unloading amdgpu driver, but the VF FLR of SRIOV sequence is wrong while removing PCI device. For SRIOV, the guest driver should not trigger the whole XGMI hive to do the reset. Host driver control how the device been reset. Fixes: f5c7e7797060 ("drm/amdgpu: Adjust removal control flow for smu v13_0_2") Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Gavin Wan <Gavin.Wan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-10-07Merge tag 'driver-core-6.1-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core and debug printk changes for 6.1-rc1. Included in here is: - dynamic debug updates for the core and the drm subsystem. The drm changes have all been acked by the relevant maintainers - kernfs fixes for syzbot reported problems - kernfs refactors and updates for cgroup requirements - magic number cleanups and removals from the kernel tree (they were not being used and they really did not actually do anything) - other tiny cleanups All of these have been in linux-next for a while with no reported issues" * tag 'driver-core-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (74 commits) docs: filesystems: sysfs: Make text and code for ->show() consistent Documentation: NBD_REQUEST_MAGIC isn't a magic number a.out: restore CMAGIC device property: Add const qualifier to device_get_match_data() parameter drm_print: add _ddebug descriptor to drm_*dbg prototypes drm_print: prefer bare printk KERN_DEBUG on generic fn drm_print: optimize drm_debug_enabled for jump-label drm-print: add drm_dbg_driver to improve namespace symmetry drm-print.h: include dyndbg header drm_print: wrap drm_*_dbg in dyndbg descriptor factory macro drm_print: interpose drm_*dbg with forwarding macros drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers. drm_print: condense enum drm_debug_category debugfs: use DEFINE_SHOW_ATTRIBUTE to define debugfs_regset32_fops driver core: use IS_ERR_OR_NULL() helper in device_create_groups_vargs() Documentation: ENI155_MAGIC isn't a magic number Documentation: NBD_REPLY_MAGIC isn't a magic number nbd: remove define-only NBD_MAGIC, previously magic number Documentation: FW_HEADER_MAGIC isn't a magic number Documentation: EEPROM_MAGIC_VALUE isn't a magic number ...
2022-09-24drm: POC drm on dyndbg - use in core, 2 helpers, 3 drivers.Jim Cromie
Use DECLARE_DYNDBG_CLASSMAP across DRM: - in .c files, since macro defines/initializes a record - in drivers, $mod_{drv,drm,param}.c ie where param setup is done, since a classmap is param related - in drm/drm_print.c since existing __drm_debug param is defined there, and we ifdef it, and provide an elaborated alternative. - in drm_*_helper modules: dp/drm_dp - 1st item in makefile target drivers/gpu/drm/drm_crtc_helper.c - random pick iirc. Since these modules all use identical CLASSMAP declarations (ie: names and .class_id's) they will all respond together to "class DRM_UT_*" query-commands: :#> echo class DRM_UT_KMS +p > /proc/dynamic_debug/control NOTES: This changes __drm_debug from int to ulong, so BIT() is usable on it. DRM's enum drm_debug_category values need to sync with the index of their respective class-names here. Then .class_id == category, and dyndbg's class FOO mechanisms will enable drm_dbg(DRM_UT_KMS, ...). Though DRM needs consistent categories across all modules, thats not generally needed; modules X and Y could define FOO differently (ie a different NAME => class_id mapping), changes are made according to each module's private class-map. No callsites are actually selected by this patch, since none are class'd yet. Signed-off-by: Jim Cromie <jim.cromie@gmail.com> Link: https://lore.kernel.org/r/20220912052852.1123868-3-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-09-20drm/amdgpu: bump minor for gang submitChristian König
Since that has now landed bump the minor to let userspace know about it. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19drm/amdgpu: Fixed psp fence and memory issues when removing amdgpu deviceYiPeng Chai
V3: Fixed psp fence and memory issues for the asic using smu v13_0_2 when removing amdgpu device. [Why]: 1. psp_suspend->psp_free_shared_bufs-> psp_ta_free_shared_buf-> amdgpu_bo_free_kernel-> ...->amdgpu_bo_release_notify-> amdgpu_fill_buffer psp will free vram memory used by psp when psp_suspend is called. But for the asic using smu v13_0_2, because psp_suspend is called before adev->shutdown is set to true when removing the first hive device, amdgpu fill_buffer will be called, which will cause fence issues when evicting all vram resources in amdgpu vram mgr_fini. 2. Since psp_hw_fini is not called after calling psp_suspend and psp_suspend only calls psp_ring_stop, the psp ring memory will not be released when amdgpu device is removed. [How]: 1. Set shutdown to true before calling amdgpu_device_gpu_recover, then amdgpu_fill_buffer will not be called when psp_suspend is called. 2. Free psp ring memory in psp_sw_fini. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19drm/amdgpu: Adjust removal control flow for smu v13_0_2YiPeng Chai
Adjust removal control flow for smu v13_0_2: During amdgpu uninstallation, when removing the first device, the kernel needs to first send a mode1reset message to all gpu devices. Otherwise, smu initialization will fail the next time amdgpu is installed. V2: 1. Update commit comments. 2. Remove the global variable amdgpu_device_remove_cnt and add a variable to the structure amdgpu_hive_info. 3. Use hive to detect the first removed device instead of a global variable. V3: 1. Update commit comments. 2. Split a patch into multiple patches. 3. The current patch does: a. Add a work mode of AMDGPU_RESET_FOR_DEVICE_REMOVE into the existing gpu recover path, which make all devices in hive list only have HW reset but no resume (except the base IP). b. Call AMDGPU_RESET_FOR_DEVICE_REMOVE and AMDGPU_NEED_FULL_RESET mode of amdgpu_device_gpu_recover in amdgpu_pci_remove when removing the first device in hive list. c. When removing the first device, the IP blocks keyword function call sequence is as follows: .suspend->mode1reset->.resume(basic ip)->.hw_fini->.early_fini->.sw_fini. ^ | |-<----------<---------<----| The first three sequences are because of a call to amdgpu_device_gpu_recover. The three sequences will be executed in a loop until all devices in the hive list are iterated. The sequences starting from .hw_fini only apply to the first device. Since .suspend has been called before, except the resumed phase1 basic ip blocks, all other ip blocks .hw_fini of current device will do nothing. d. When removing other devices, the calling sequences is the same as legacy: .hw_fini -> .early_fini -> .sw_fini. Since .suspend has been called when removing the first device, except the resumed phase1 basic ip blocks, all of other ip blocks .hw_fini of current device will do nothing. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-07drm/amdgpu: TA unload messages are not actually sent to psp when amdgpu is ↵YiPeng Chai
uninstalled V1: The psp_cmd_submit_buf function is called by psp_hw_fini to send TA unload messages to psp to terminate ras, asd and tmr. But when amdgpu is uninstalled, drm_dev_unplug is called earlier than psp_hw_fini in amdgpu_pci_remove, the calling order as follows: static void amdgpu_pci_remove(struct pci_dev *pdev) { drm_dev_unplug ...... amdgpu_driver_unload_kms->amdgpu_device_fini_hw->... ->.hw_fini->psp_hw_fini->... ->psp_ta_unload->psp_cmd_submit_buf ...... } The program will return when calling drm_dev_enter in psp_cmd_submit_buf. So the call to drm_dev_enter in psp_cmd_submit_buf should be removed, so that the TA unload messages can be sent to the psp when amdgpu is uninstalled. V2: 1. Restore psp_cmd_submit_buf to its original code. 2. Move drm_dev_unplug call after amdgpu_driver_unload_kms in amdgpu_pci_remove. 3. Since amdgpu_device_fini_hw is called by amdgpu_driver_unload_kms, remove the unplug check to release device mmio resource in amdgpu_device_fini_hw before calling drm_dev_unplug. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-29drm/amdgpu: add missing pci_disable_device() in amdgpu_pmops_runtime_resume()Yang Yingliang
Add missing pci_disable_device() if amdgpu_device_resume() fails. Fixes: 8e4d5d43cc6c ("drm/amdgpu: Handling of amdgpu_device_resume return value for graceful teardown") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25drm/amd/display: Add visualconfirm module parameterLeo Li
[Why] Being able to configure visual confirm at boot or in cmdline is helpful when debugging. [How] Add a module parameter to configure DC visual confirm, which works the same way as the equivalent debugfs entry. Signed-off-by: Leo Li <sunpeng.li@amd.com> Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25drm/amdgpu: bump driver version for IP discovery info in HW INFOAlex Deucher
So userspace knows when it is available. Proposed mesa patch: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17411/diffs?commit_id=c8a63590dfd0d64e6e6a634dcfed993f135dd075 Reviewed-by: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-25drm/amdgpu: Fix comment typoJason Wang
The double `to' is duplicated in the comment, remove one. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-07-18drm/amdgpu: drop runpm from amdgpu_device structureGuchun Chen
It's redundant, as now switching to rpm_mode to indicate runtime power management mode. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>