summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)Author
2022-01-09Revert "drm/amdgpu: stop scheduler when calling hw_fini (v2)"Len Brown
This reverts commit f7d6779df642720e22bffd449e683bb8690bd3bf. This bisected regression has impacted suspend-resume stability since 5.15-rc1. It regressed -stable via 5.14.10. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215315 Fixes: f7d6779df64 ("drm/amdgpu: stop scheduler when calling hw_fini (v2)") Cc: Guchun Chen <guchun.chen@amd.com> Cc: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: <stable@vger.kernel.org> # 5.14+ Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31drm/amdgpu: disable runpm if we are the primary adapterAlex Deucher
If we are the primary adapter (i.e., the one used by the firwmare framebuffer), disable runtime pm. This fixes a regression caused by commit 55285e21f045 which results in the displays waking up shortly after they go to sleep due to the device coming out of runtime suspend and sending a hotplug uevent. v2: squash in reworked fix from Evan Fixes: 55285e21f045 ("fbdev/efifb: Release PCI device's runtime PM ref during FB destroy") Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215203 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1840 Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-31Merge branch 'drm-misc-fixes' of ssh://git.freedesktop.org/git/drm/drm-misc ↵Dave Airlie
into drm-fixes This merges two fixes that haven't been sent to me yet, but I wanted to get in. One amdgpu fix, but one nouveau regression fixer. Signed-off-by: Dave Airlie <airlied@redhat.com>
2021-12-28drm/amdgpu: no DC support for headless chipsAlex Deucher
Chips with no display hardware should return false for DC support. v2: drop Arcturus and Aldebaran Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reported-by: Tareque Md.Hanif <tarequemd.hanif@yahoo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-27drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable ↵Evan Quan
platform By setting mp1_state as PP_MP1_STATE_UNLOAD, MP1 will do some proper cleanups and put itself into a state ready for PNP. That can workaround some random resuming failure observed on BOCO capable platforms. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-27drm/amdgpu: always reset the asic in suspend (v2)Alex Deucher
If the platform suspend happens to fail and the power rail is not turned off, the GPU will be in an unknown state on resume, so reset the asic so that it will be in a known good state on resume even if the platform suspend failed. v2: handle s0ix Acked-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-22drm/amdgpu: fix runpm documentationAlex Deucher
It's not only supported by HG/PX laptops. It's supported by all dGPUs which supports BOCO/BACO functionality (runtime D3). BOCO - Bus Off, Chip Off. The entire chip is powered off. This is controlled by ACPI. BACO - Bus Active, Chip Off. The chip still shows up on the PCI bus, but the device itself is powered down. v2: fix missed HG/PX reference Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-17drm/amdgpu: add support for IP discovery gc_info table v2Alex Deucher
Used on gfx9 based systems. Fixes incorrect CU counts reported in the kernel log. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1833 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-12-17drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly ↵chen gong
enabled Play a video on the raven (or PCO, raven2) platform, and then do the S3 test. When resume, the following error will be reported: amdgpu 0000:02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vcn_v1_0> failed -110 amdgpu 0000:02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 [why] When playing the video: The power state flag of the vcn block is set to POWER_STATE_ON. When doing suspend: There is no change to the power state flag of the vcn block, it is still POWER_STATE_ON. When doing resume: Need to open the power gate of the vcn block and set the power state flag of the VCN block to POWER_STATE_ON. But at this time, the power state flag of the vcn block is already POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm: avoid duplicate powergate/ungate setting" patch will return the amdgpu_dpm_set_powergating_by_smu function directly. As a result, the gate of the power was not opened, causing the subsequent ring test to fail. [how] In the suspend function of the vcn block, explicitly change the power state flag of the vcn block to POWER_STATE_OFF. BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1828 Signed-off-by: chen gong <curry.gong@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-12-17drm/amdgpu: introduce new amdgpu_fence object to indicate the job embedded fenceHuang Rui
The job embedded fence donesn't initialize the flags at dma_fence_init(). Then we will go a wrong way in amdgpu_fence_get_timeline_name callback and trigger a null pointer panic once we enabled the trace event here. So introduce new amdgpu_fence object to indicate the job embedded fence. [ 156.131790] BUG: kernel NULL pointer dereference, address: 00000000000002a0 [ 156.131804] #PF: supervisor read access in kernel mode [ 156.131811] #PF: error_code(0x0000) - not-present page [ 156.131817] PGD 0 P4D 0 [ 156.131824] Oops: 0000 [#1] PREEMPT SMP PTI [ 156.131832] CPU: 6 PID: 1404 Comm: sdma0 Tainted: G OE 5.16.0-rc1-custom #1 [ 156.131842] Hardware name: Gigabyte Technology Co., Ltd. Z170XP-SLI/Z170XP-SLI-CF, BIOS F20 11/04/2016 [ 156.131848] RIP: 0010:strlen+0x0/0x20 [ 156.131859] Code: 89 c0 c3 0f 1f 80 00 00 00 00 48 01 fe eb 0f 0f b6 07 38 d0 74 10 48 83 c7 01 84 c0 74 05 48 39 f7 75 ec 31 c0 c3 48 89 f8 c3 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31 [ 156.131872] RSP: 0018:ffff9bd0018dbcf8 EFLAGS: 00010206 [ 156.131880] RAX: 00000000000002a0 RBX: ffff8d0305ef01b0 RCX: 000000000000000b [ 156.131888] RDX: ffff8d03772ab924 RSI: ffff8d0305ef01b0 RDI: 00000000000002a0 [ 156.131895] RBP: ffff9bd0018dbd60 R08: ffff8d03002094d0 R09: 0000000000000000 [ 156.131901] R10: 000000000000005e R11: 0000000000000065 R12: ffff8d03002094d0 [ 156.131907] R13: 000000000000001f R14: 0000000000070018 R15: 0000000000000007 [ 156.131914] FS: 0000000000000000(0000) GS:ffff8d062ed80000(0000) knlGS:0000000000000000 [ 156.131923] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 156.131929] CR2: 00000000000002a0 CR3: 000000001120a005 CR4: 00000000003706e0 [ 156.131937] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 156.131942] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 156.131949] Call Trace: [ 156.131953] <TASK> [ 156.131957] ? trace_event_raw_event_dma_fence+0xcc/0x200 [ 156.131973] ? ring_buffer_unlock_commit+0x23/0x130 [ 156.131982] dma_fence_init+0x92/0xb0 [ 156.131993] amdgpu_fence_emit+0x10d/0x2b0 [amdgpu] [ 156.132302] amdgpu_ib_schedule+0x2f9/0x580 [amdgpu] [ 156.132586] amdgpu_job_run+0xed/0x220 [amdgpu] v2: fix mismatch warning between the prototype and function name (Ray, kernel test robot) Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-17drm/amdgpu: fix dropped backing store handling in amdgpu_dma_buf_move_notifyChristian König
bo->tbo.resource can now be NULL. Signed-off-by: Christian König <christian.koenig@amd.com> Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1811 Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211210083927.1754-1-christian.koenig@amd.com
2021-12-14drm/amdgpu: correct the wrong cached state for GMC on PICASSOEvan Quan
Pair the operations did in GMC ->hw_init and ->hw_fini. That can help to maintain correct cached state for GMC and avoid unintention gate operation dropping due to wrong cached state. BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1828 Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14drm/amdgpu: don't override default ECO_BITs settingHawking Zhang
Leave this bit as hardware default setting Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-12-14drm/amdgpu: correct register access for RLC_JUMP_TABLE_RESTORELe Ma
should count on GC IP base address Signed-off-by: Le Ma <le.ma@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-12-01drm/amdgpu: adjust the kfd reset sequence in reset sriov functionshaoyunl
This change revert previous commits: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") This change moves the amdgpu_amdkfd_pre_reset to an earlier place in amdgpu_device_reset_sriov, presumably to address the sequence issue that the first patch was originally meant to fix. Some register access(GRBM_GFX_CNTL) only be allowed on full access mode. Move kfd_pre_reset and kfd_post_reset back inside reset_sriov function. Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") Fixes: 271fd38ce56d ("drm/amdgpu: move kfd post_reset out of reset_sriov function") Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdkfd: fix double free mem structurePhilip Yang
drm_gem_object_put calls release_notify callback to free the mem structure and unreserve_mem_limit, move it down after the last access of mem and make it conditional call. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu: Don't halt RLC on GFX suspendLijo Lazar
On aldebaran, RLC also controls GFXCLK. Skip halting RLC during GFX IP suspend and keep it running till PMFW disables all DPMs. [ 578.019986] amdgpu 0000:23:00.0: amdgpu: GPU reset begin! [ 583.245566] amdgpu 0000:23:00.0: amdgpu: Failed to disable smu features. [ 583.245621] amdgpu 0000:23:00.0: amdgpu: Fail to disable dpm features! [ 583.245639] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62 [ 583.248504] [drm] free PSP TMR buffer Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu: fix the missed handling for SDMA2 and SDMA3Guchun Chen
There is no base reg offset or ip_version set for SDMA2 and SDMA3 on SIENNA_CICHLID, so add them. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Kevin Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu: check atomic flag to differeniate with legacy pathFlora Cui
since vkms support atomic KMS interface Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <aleander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu: cancel the correct hrtimer on exitFlora Cui
Signed-off-by: Flora Cui <flora.cui@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-01drm/amdgpu/sriov/vcn: add new vcn ip revision check case for SIENNA_CICHLIDJane Jian
[WHY] for sriov odd# vf will modify vcn0 engine ip revision(due to multimedia bandwidth feature), which will be mismatched with original vcn0 revision [HOW] add new version check for vcn0 disabled revision(3, 0, 192), typically modified under sriov mode Signed-off-by: Jane Jian <Jane.Jian@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amd/display: update bios scratch when setting backlightAlex Deucher
Update the bios scratch register when updating the backlight level. Some platforms apparently read this scratch register and do additional operations in their hotkey handlers. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1518 Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: Skip ASPM programming on aldebaranLijo Lazar
There is no need for additional programming, keep the default settings. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: fix byteorder error in amdgpu discoveryYang Wang
fix some byteorder issues about amdgpu discovery. This will result in running errors on the big end system. (e.g:MIPS) Signed-off-by: Yang Wang <KevinYang.Wang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: enable Navi retry fault wptr overflowPhilip Yang
If xnack is on, VM retry fault interrupt send to IH ring1, and ring1 will be full quickly. IH cannot receive other interrupts, this causes deadlock if migrating buffer using sdma and waiting for sdma done while handling retry fault. Remove VMC from IH storm client, enable ring1 write pointer overflow, then IH will drop retry fault interrupts and be able to receive other interrupts while driver is handling retry fault. IH ring1 write pointer doesn't writeback to memory by IH, and ring1 write pointer recorded by self-irq is not updated, so always read the latest ring1 write pointer from register. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: enable Navi 48-bit IH timestamp counterPhilip Yang
By default this timestamp is 32 bit counter. It gets overflowed in around 10 minutes. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: IH process reset count when restartPhilip Yang
Otherwise when IH process restart, count is zero, the loop will not exit to wake_up_all after processing AMDGPU_IH_MAX_NUM_IVS interrupts. Cc: stable@vger.kernel.org Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu/gfx9: switch to golden tsc registers for renoir+Alex Deucher
Renoir and newer gfx9 APUs have new TSC register that is not part of the gfxoff tile, so it can be read without needing to disable gfx off. Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu/gfx10: add wraparound gpu counter check for APUs as wellAlex Deucher
Apply the same check we do for dGPUs for APUs as well. Acked-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: move kfd post_reset out of reset_sriov functionshaoyunl
Fixes: 9f4f2c1a3524 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov") For sriov XGMI configuration, the host driver will handle the hive reset, so in guest side, the reset_sriov only be called once on one device. This will make kfd post_reset unblanced with kfd pre_reset since kfd pre_reset already been moved out of reset_sriov function. Move kfd post_reset out of reset_sriov function to make them balance . Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: Fix double free of dmabufxinhui pan
amdgpu_amdkfd_gpuvm_free_memory_of_gpu drop dmabuf reference increased in amdgpu_gem_prime_export. amdgpu_bo_destroy drop dmabuf reference increased in amdgpu_gem_prime_import. So remove this extra dma_buf_put to avoid double free. Signed-off-by: xinhui pan <xinhui.pan@amd.com> Tested-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24drm/amdgpu: Fix MMIO HDP flush on SRIOVFelix Kuehling
Disable HDP register remapping on SRIOV and set rmmio_remap.reg_offset to the fixed address of the VF register for hdp_v*_flush_hdp. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Tested-by: Bokun Zhang <bokun.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amd/amdgpu: fix potential memleakBernard Zhao
In function amdgpu_get_xgmi_hive, when kobject_init_and_add failed There is a potential memleak if not call kobject_put. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Bernard Zhao <bernard@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17drm/amdgpu: fix set scaling mode Full/Full aspect/Center not works on vga ↵hongao
and dvi connectors amdgpu_connector_vga_get_modes missed function amdgpu_get_native_mode which assign amdgpu_encoder->native_mode with *preferred_mode result in amdgpu_encoder->native_mode.clock always be 0. That will cause amdgpu_connector_set_property returned early on: if ((rmx_type != DRM_MODE_SCALE_NONE) && (amdgpu_encoder->native_mode.clock == 0)) when we try to set scaling mode Full/Full aspect/Center. Add the missing function to amdgpu_connector_vga_get_mode can fix this. It also works on dvi connectors because amdgpu_connector_dvi_helper_funcs.get_mode use the same method. Signed-off-by: hongao <hongao@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-11-17drm/amd/pm: avoid duplicate powergate/ungate settingEvan Quan
Just bail out if the target IP block is already in the desired powergate/ungate state. This can avoid some duplicate settings which sometimes may cause unexpected issues. Link: https://lore.kernel.org/all/YV81vidWQLWvATMM@zn.tnic/ Bug: https://bugzilla.kernel.org/show_bug.cgi?id=214921 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215025 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1789 Fixes: bf756fb833cb ("drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend") Signed-off-by: Evan Quan <evan.quan@amd.com> Tested-by: Borislav Petkov <bp@suse.de> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-11-17drm/amdgpu: add error print when failing to add IP block(v2)Guchun Chen
Driver initialization is driven by IP version from IP discovery table. So add error print when failing to add ip block during driver initialization, this will be more friendly to user to know which IP version is not correct. [ 40.467361] [drm] host supports REQ_INIT_DATA handshake [ 40.474076] [drm] add ip block number 0 <nv_common> [ 40.474090] [drm] add ip block number 1 <gmc_v10_0> [ 40.474101] [drm] add ip block number 2 <psp> [ 40.474103] [drm] add ip block number 3 <navi10_ih> [ 40.474114] [drm] add ip block number 4 <smu> [ 40.474119] [drm] add ip block number 5 <amdgpu_vkms> [ 40.474134] [drm] add ip block number 6 <gfx_v10_0> [ 40.474143] [drm] add ip block number 7 <sdma_v5_2> [ 40.474147] amdgpu 0000:00:08.0: amdgpu: Fatal error during GPU init [ 40.474545] amdgpu 0000:00:08.0: amdgpu: amdgpu: finishing device. v2: use dev_err to multi-GPU system Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-12Merge tag 'drm-next-2021-11-12' of git://anongit.freedesktop.org/drm/drmLinus Torvalds
Pull more drm updates from Dave Airlie: "I missed a drm-misc-next pull for the main pull last week. It wasn't that major and isn't the bulk of this at all. This has a bunch of fixes all over, a lot for amdgpu and i915. bridge: - HPD improvments for lt9611uxc - eDP aux-bus support for ps8640 - LVDS data-mapping selection support ttm: - remove huge page functionality (needs reworking) - fix a race condition during BO eviction panels: - add some new panels fbdev: - fix double-free - remove unused scrolling acceleration - CONFIG_FB dep improvements locking: - improve contended locking logging - naming collision fix dma-buf: - add dma_resv_for_each_fence iterator - fix fence refcounting bug - name locking fixesA prime: - fix object references during mmap nouveau: - various code style changes - refcount fix - device removal fixes - protect client list with a mutex - fix CE0 address calculation i915: - DP rates related fixes - Revert disabling dual eDP that was causing state readout problems - put the cdclk vtables in const data - Fix DVO port type for older platforms - Fix blankscreen by turning DP++ TMDS output buffers on encoder->shutdown - CCS FBs related fixes - Fix recursive lock in GuC submission - Revert guc_id from i915_request tracepoint - Build fix around dmabuf amdgpu: - GPU reset fix - Aldebaran fix - Yellow Carp fixes - DCN2.1 DMCUB fix - IOMMU regression fix for Picasso - DSC display fixes - BPC display calculation fixes - Other misc display fixes - Don't allow partial copy from user for DC debugfs - SRIOV fixes - GFX9 CSB pin count fix - Various IP version check fixes - DP 2.0 fixes - Limit DCN1 MPO fix to DCN1 amdkfd: - SVM fixes - Fix gfx version for renoir - Reset fixes udl: - timeout fix imx: - circular locking fix virtio: - NULL ptr deref fix" * tag 'drm-next-2021-11-12' of git://anongit.freedesktop.org/drm/drm: (126 commits) drm/ttm: Double check mem_type of BO while eviction drm/amdgpu: add missed support for UVD IP_VERSION(3, 0, 64) drm/amdgpu: drop jpeg IP initialization in SRIOV case drm/amd/display: reject both non-zero src_x and src_y only for DCN1x drm/amd/display: Add callbacks for DMUB HPD IRQ notifications drm/amd/display: Don't lock connection_mutex for DMUB HPD drm/amd/display: Add comment where CONFIG_DRM_AMD_DC_DCN macro ends drm/amdkfd: Fix retry fault drain race conditions drm/amdkfd: lower the VAs base offset to 8KB drm/amd/display: fix exit from amdgpu_dm_atomic_check() abruptly drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov drm/amdgpu: fix uvd crash on Polaris12 during driver unloading drm/i915/adlp/fb: Prevent the mapping of redundant trailing padding NULL pages drm/i915/fb: Fix rounding error in subsampled plane size calculation drm/i915/hdmi: Turn DP++ TMDS output buffers back on in encoder->shutdown() drm/locking: fix __stack_depot_* name conflict drm/virtio: Fix NULL dereference error in virtio_gpu_poll drm/amdgpu: fix SI handling in amdgpu_device_asic_has_dc_support() drm/amdgpu: Fix dangling kfd_bo pointer for shared BOs drm/amd/amdkfd: Don't sent command to HWS on kfd reset ...
2021-11-11Merge tag 'amd-drm-fixes-5.16-2021-11-10' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-fixes-5.16-2021-11-10: amdgpu: - Don't allow partial copy from user for DC debugfs - SRIOV fixes - GFX9 CSB pin count fix - Various IP version check fixes - DP 2.0 fixes - Limit DCN1 MPO fix to DCN1 amdkfd: - SVM fixes - Reset fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211110222536.7527-1-alexander.deucher@amd.com
2021-11-11Merge tag 'drm-misc-next-fixes-2021-11-10' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next Removed the TTM Huge Page functionnality to address a crash, a timeout fix for udl, CONFIG_FB dependency improvements, a fix for a circular locking depency in imx, a NULL pointer dereference fix for virtio, and a naming collision fix for drm/locking. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20211110082114.vfpkpnecwdfg27lk@gilmour
2021-11-10drm/amdgpu: add missed support for UVD IP_VERSION(3, 0, 64)Guchun Chen
Fixes: 96b8dd4423e74d ("drm/amdgpu/amdgpu_vcn: convert to IP version checking") Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-10drm/amdgpu: drop jpeg IP initialization in SRIOV caseGuchun Chen
Fixes: b05b9c591f9ed6 ("drm/amdgpu: clean up set IP function") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-09drm/amd/amdgpu: fix the kfd pre_reset sequence in sriovshaoyunl
The KFD pre_reset should be called before reset been executed, it will hold the lock to prevent other rocm process to sent the packlage to hiq during host execute the real reset on the HW Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-09drm/amdgpu: fix uvd crash on Polaris12 during driver unloadingEvan Quan
There was a change(below) target for such issue: d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload") But the fix for VI ASICs was missing there. This is a supplement for that. Fixes: d82e2c249c8f ("drm/amdgpu: Fix crash on device remove/driver unload") Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amdgpu: fix SI handling in amdgpu_device_asic_has_dc_support()Alex Deucher
Properly handle SI DC support when CONFIG_DRM_AMD_DC_SI is not set. Fixes: f7f12b25823c0d ("drm/amdgpu: default to true in amdgpu_device_asic_has_dc_support") Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amdgpu: Fix dangling kfd_bo pointer for shared BOsFelix Kuehling
If a kfd_bo was shared (e.g. a dmabuf export), the original kfd_bo may be freed when the amdgpu_bo still lives on. Free the kfd_bo struct in the release_notify callback then the amdgpu_bo is freed. Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-By: Ramesh Errabolu <Ramesh.Errabolu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amdgpu: correctly toggle gfx on/off around RLC_SPM_* register accessEvan Quan
As part of the ib padding process, accessing the RLC_SPM_* register may trigger gfx hang. Since gfxoff may be already kicked during the whole period. To address that, we manually toggle gfx on/off around the RLC_SPM_* register access. This can resolve the gfx hang issue observed on running Talos with RDP launched in parallel. Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amdgpu: correct xgmi ras error count resetTao Zhou
The error count reset for xgmi3x16 pcs is missed. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amd/amdgpu: Fix csb.bo pin_count leak on gfx 9YuBiao Wang
[Why] csb bo is not unpinned in gfx 9. It will lead to pin_count leak on driver unload. [How] Call bo_free_kernel corresponding to bo_create_kernel in gfx_rlc_init_csb. This will also unify the code path with other gfx versions. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amd/amdgpu: Avoid writing GMC registers under sriov in gmc9YuBiao Wang
[Why] For Vega10, disabling gart of gfxhub could mess up KIQ and PSP under sriov mode, and lead to DMAR on host side. [How] Skip writing GMC registers under sriov. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amdgpu: Make sure to reserve BOs before adding or removingKent Russell
BOs need to be reserved before they are added or removed, so ensure that they are reserved during kfd_mem_attach and kfd_mem_detach Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>