pm24.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2020-04-22	drm/amdgpu: cleanup coding style in amdkfd a bit	Bernard Zhao
	Make the code a bit more readable by using a common error handling pattern. Signed-off-by: Bernard Zhao <bernard@vivo.com> Reviewed-by: Christian König <christian.koenig@amd.com>. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: clean up unused variable about ring lru	Kevin Wang
	clean up unused variable: 1. ring_lru_list 2. ring_lru_list_lock related-commit: drm/amdgpu: remove ring lru handling Signed-off-by: Kevin Wang <kevin1.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: replace DRM prefix with PCI device info for gfx/mmhub	Dennis Li
	Prefix RAS message printing in gfx/mmhub with PCI device info, which assists the debug in multiple GPU case. Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: disble vblank when unloading sriov driver	Jiawei
	disble vblank in dce_vitual_crtc_commit(), which is skipped under sriov before Reviewed-by: Emily Deng <Emily.Deng@amd.com> Signed-off-by: Jiawei <Jiawei.Gu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: Print CU information by default during initialization	Yong Zhao
	This is convenient for multiple teams to obtain the information. Also, add device info by using dev_info(). Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: Adjust the SDMA doorbell info printing	Yong Zhao
	Turn off the printing by default because it is not very useful, while adding more details. Signed-off-by: Yong Zhao <Yong.Zhao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: fix race between pstate and remote buffer map	Jonathan Kim
	Vega20 arbitrates pstate at hive level and not device level. Last peer to remote buffer unmap could drop P-State while another process is still remote buffer mapped. With this fix, P-States still needs to be disabled for now as SMU bug was discovered on synchronous P2P transfers. This should be fixed in the next FW update. Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	Revert "drm/amdgpu: Disable gfx off if VCN is busy"	James Zhu
	This reverts commit 3fded222f4bf7f4c56ef4854872a39a4de08f7a8 This is work around for vcn1 only. Currently vcn1 has separate begin_use and idle work handle. Signed-off-by: James Zhu <James.Zhu@amd.com> Tested-by: changzhu <Changfeng.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU	Guchun Chen
	When running ras uncorrectable error injection and triggering GPU reset on sGPU, below issue is observed. It's caused by the list uninitialized when accessing. [ 80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750 [ 80.047300] #PF: supervisor write access in kernel mode [ 80.047351] #PF: error_code(0x0003) - permissions violation [ 80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061 [ 80.047477] Oops: 0003 [#1] SMP PTI [ 80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G OE 5.4.0-rc7-guchchen #1 [ 80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018 [ 80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu] Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: Disable FRU read on Arcturus	Kent Russell
	Update the list with supported Arcturus chips, but disable for now until final list is confirmed. Ideally we can poll atombios for FRU support, instead of maintaining this list of chips, but this will enable serial number reading for supported ASICs for the time-being. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu/gmc: Fix spelling mistake.	Rajneesh Bhardwaj
	Fixes a minor typo in the file. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"	Kent Russell
	This reverts commit c12b84d6e0d70f1185e6daddfd12afb671791b6e. The original patch causes a RAS event and subsequent kernel hard-hang when running the KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and Arcturus dmesg output at hang time: [drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected! amdgpu 0000:67:00.0: GPU reset begin! Evicting PASID 0x8000 queues Started evicting pasid 0x8000 qcm fence wait loop timeout expired The cp might be in an unrecoverable state due to an unsuccessful queues preemption Failed to evict process queues Failed to suspend process 0x8000 Finished evicting pasid 0x8000 Started restoring pasid 0x8000 Finished restoring pasid 0x8000 [drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT amdgpu: [powerplay] Failed to send message 0x26, response 0x0 amdgpu: [powerplay] Failed to set soft min gfxclk ! amdgpu: [powerplay] Failed to upload DPM Bootup Levels! amdgpu: [powerplay] Failed to send message 0x7, response 0x0 amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features! amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features! amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM! [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] ERROR suspend of IP block <powerplay> failed -5 Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu/gfx9: add gfxoff quirk	Alex Deucher
	Fix screen corruption with firefox. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=207171 Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: set mp1 state before reload	John Clements
	Set MP1 state to prepare for unload before reloading SMU FW Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: update psp fw loading sequence	John Clements
	Added dedicated function to check if particular fw should be skipped from loading. Added dedicated function for SMU FW loading via PSP Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-22	drm/amdgpu: fix the hw hang during perform system reboot and reset	Prike Liang
	The system reboot failed as some IP blocks enter power gate before perform hw resource destory. Meanwhile use unify interface to set device CGPG to ungate state can simplify the amdgpu poweroff or reset ungate guard. Fixes: 487eca11a321ef ("drm/amdgpu: fix gfx hang during suspend with video playback (v2)") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Tested-by: Mengbing Wang <Mengbing.Wang@amd.com> Tested-by: Paul Menzel <pmenzel@molgen.mpg.de> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu: remove dead code in si_dpm.c	Jason Yan
	This code is dead, let's remove it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amd/amdgpu: remove hardcoded module name in prints	Aurabindo Pillai
	Let format prefixes take care of printing the module name through pr_fmt and dev_fmt definitions. Signed-off-by: Aurabindo Pillai <mail@aurabindo.in> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amd/amdgpu: add print prefix for dev_* variants	Aurabindo Pillai
	Define dev_fmt macro for informative print messages Signed-off-by: Aurabindo Pillai <mail@aurabindo.in> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amd/amdgpu: add prefix for pr_* prints	Aurabindo Pillai
	amdgpu uses lots of pr_* calls for printing error messages. With this prefix, errors shall be more obvious to the end use regarding its origin, and may help debugging. Prefix format: [xxx.xxxxx] amdgpu: ... Signed-off-by: Aurabindo Pillai <mail@aurabindo.in> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu/ring: simplify scheduler setup logic	Alex Deucher
	Set up a GPU scheduler based on the ring flag rather than the ring type. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu/kiq: add no_scheduler flag to KIQ	Alex Deucher
	We don't want a GPU scheduler for this ring. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu/ring: add no_scheduler flag	Alex Deucher
	This allows IPs to flag whether a specific ring requires a GPU scheduler or not. E.g., sometimes instances of an IP are asymmetric and have different capabilities. Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu: fix wrong vram lost counter increment V2	Evan Quan
	Vram lost counter is wrongly increased by two during baco reset. V2: assumed vram lost for mode1 reset on all ASICs Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu: replace DRM prefix with PCI device info for GFX RAS	Guchun Chen
	Prefix RAS message printing in GFX IP with PCI device info, which assists the debug in multiple GPU case. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu: resume kiq access debugfs	Yintian Tao
	If there is no GPU hang, user still can access debugfs through kiq. Signed-off-by: Yintian Tao <yttao@amd.com> Reviewed-by: Monk Liu <Monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu: refine ras related message print	Guchun Chen
	Prefix ras related kernel message logging with PCI device info by replacing DRM_INFO/WARN/ERROR with dev_info/warn/err. This can clearly tell user about GPU device information where ras is. And add some other ras message printing to make it more clear and friendly as well. Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu: add uncorrectable error count print in UMC ecc irq cb	Guchun Chen
	Uncorrectable error count printing is missed when issuing UMC UE injection. When going to the error count log function in GPU recover work thread, there is no chance to get correct error count value by last error injection and print, because the error status register is automatically cleared after reading in UMC ecc irq callback. So add such message printing in UMC ecc irq cb to be consistent with other RAS error interrupt cases. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-13	drm/amdgpu: restrict debugfs register access under SR-IOV	Yintian Tao
	Under bare metal, there is no more else to take care of the GPU register access through MMIO. Under Virtualization, to access GPU register is implemented through KIQ during run-time due to world-switch. Therefore, under SR-IOV user can only access debugfs to r/w GPU registers when meets all three conditions below. - amdgpu_gpu_recovery=0 - TDR happened - in_gpu_reset=0 v2: merge amdgpu_virt_can_access_debugfs() into amdgpu_virt_enable_access_debugfs() v3: drop ret variable in amdgpu_virt_enable_access_debugfs() and directly return result Signed-off-by: Yintian Tao <yttao@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: increased atom cmd timeout	John Clements
	added macro to define timeout Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	amdgpu_kms: Remove unnecessary condition check	Aurabindo Pillai
	Execution will only reach here if the asserted condition is true. Hence there is no need for the additional check. Signed-off-by: Aurabindo Pillai <mail@aurabindo.in> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: unify fw_write_wait for new gfx9 asics	Aaron Liu
	Make the fw_write_wait default case true since presumably all new gfx9 asics will have updated firmware. That is using unique WAIT_REG_MEM packet with opration=1. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Yuxian Dai <Yuxian.Dai@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: support access regs outside of mmio bar	Hawking Zhang
	add indirect access support to registers outside of mmio bar. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: retire AMDGPU_REGS_KIQ flag	Hawking Zhang
	all the register access through kiq is redirected to amdgpu_kiq_rreg/amdgpu_kiq_wreg Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: retire RREG32_IDX/WREG32_IDX	Hawking Zhang
	those are not needed anymore Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: retire indirect mmio reg support from cgs	Hawking Zhang
	not needed anymore Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: replace indirect mmio access in non-dc code path	Hawking Zhang
	all the mmCUR_CONTROL instances are in mmr range and can be accessd directly by using RREG32/WREG32 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: remove inproper workaround for vega10	Hawking Zhang
	the workaround is not needed for soc15 ASICs except for vega10. it is even not needed with latest vega10 vbios. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: fix gfx hang during suspend with video playback (v2)	Prike Liang
	The system will be hang up during S3 suspend because of SMU is pending for GC not respose the register CP_HQD_ACTIVE access request.This issue root cause of accessing the GC register under enter GFX CGGPG and can be fixed by disable GFX CGPG before perform suspend. v2: Use disable the GFX CGPG instead of RLC safe mode guard. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Tested-by: Mengbing Wang <Mengbing.Wang@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: Re-enable FRU check for most models v5	Kent Russell
	There is at least 1 VG20 DID that does not have an FRU, and trying to read that will cause a hang. For now, explicitly support reading the FRU for Arcturus and for the WKS VG20 DIDs, and skip for everything else. This re-enables serial number reporting for server cards v2: Add ASIC check v3: Don't default to true for pre-VG20 v4: Use DID instead of parsing the VBIOS v5: Sqaush in overflow warning fix Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu/sriov add amdgpu_amdkfd_pre_reset in gpu reset	Jack Zhang
	[PATCH 2/2] kfd_pre_reset will free mem_objs allocated by kfd_gtt_sa_allocate Without this change, sriov tdr code path will never free those allocated memories and get memory leak. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdkfd Avoid destroy hqd when GPU is on reset	Jack Zhang
	This reverts commit 5161bba4311f in order to split it into two different patches, and this will make it easier to understand. [PATCH 1/2] porting to gfx10 from commit 1b0bfcff463f390c40 ("drm/amdgpu: Avoid destroy hqd when GPU is on reset") Originally, MEC is touched without GPU initialized first. Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com> Reviewed-by: Monk Liu <monk.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: update RAS related dmesg print	John Clements
	prefix RAS error related dmesg print with pci device info Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: resolve mGPU RAS query instability	John Clements
	upon receiving uncorrectable error, query every GPU node for ras errors Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amd/amdgpu: Correct gfx10's CG sequence	Chengming Gui
	Incorrect CG sequence will cause gfx timedout, if we keep switching power profile mode (enter profile mod such as PEAK will disable CG, exit profile mode EXIT will enable CG) when run Vulkan test case(case used for test: vkexample). Signed-off-by: Chengming Gui <Jack.Gui@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: add SPM golden settings for Navi12	Tianci.Yin
	Add RLC_SPM golden settings Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tianci.Yin <tianci.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: add SPM golden settings for Navi14	Tianci.Yin
	Add RLC_SPM golden settings Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tianci.Yin <tianci.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: add SPM golden settings for Navi10(v2)	Tianci.Yin
	Add RLC_SPM golden settings Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Tianci.Yin <tianci.yin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu: Print UTCL2 client ID on a gpuvm fault	Oak Zeng
	UTCL2 client ID is useful information to get which UTCL2 client caused the gpuvm fault. Print it out for debug purpose Signed-off-by: Oak Zeng <Oak.Zeng@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian Konig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2020-04-09	drm/amdgpu/vcn: add shared memory restore after wake up from sleep.	James Zhu
	VCN shared memory needs restore after wake up during S3 test. v2: Allocate shared memory saved_bo at sw_init and free it in sw_fini. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>