Age | Commit message (Collapse) | Author |
|
Fix the amdgpu runpm dereference usage count.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix a memory overflow issue in the gfx IB test
for some ASICs. At least 20 bytes are needed for
the IB test packet.
v2: correct code indentation errors. (Christian)
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
enable init_registers callback func for nbio v7.11.
Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Setting register to force ordering to prevent read/write or write/read
hazards for un-cached modes.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In nbio v7_9, host driver should not issu gpu reset
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The smu needs to get the rlc power down message to sync the rlc state
with smu, the rlc state updating message need to be sent at while smu
begin suspend sequence , otherwise SMU will crash while RLC state is not
notified by driver, and rlc state probally changed after that
notification, so it needs to notify rlc state to smu at the end of the
suspend sequence in amdgpu_device_suspend() that can make sure the rlc
state is correctly set to SMU.
[ 101.000590] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 101.000598] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 110.838026] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 110.838035] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
[ 110.838039] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[ 110.838040] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
[ 110.884394] PM: suspend of devices aborted after 21213.620 msecs
[ 110.884402] PM: start suspend of devices aborted after 21213.882 msecs
[ 110.884405] PM: Some devices failed to suspend, or early wake event detected
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Not needed anymore.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
GC IP 9.4.2 and up support TA reporting of the number
of xGMI links between peers.
Tested-by: Vignesh Chander <vignesh.chander@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Document what each amdgpu driver reset method does.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
'amdgpu_vm_flush_compute_tlb'
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c:1373: warning: Function parameter or member 'xcc_mask' not described in 'amdgpu_vm_flush_compute_tlb'
Cc: Felix Kuehling <Felix.Kuehling@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add trace for amdgpu runpm separate funcs usage and this will
help debugging on the case of runpm usage missed to dereference.
In the normal case the runpm usage count referred by one kind
of functionality pairwise and usage should be changed from 1 to 0,
otherwise there will be an issue in the amdgpu runpm usage
dereference.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
By catting the xgmi_port_num sysfs node, it prints out the info in the
format of <src node id>:<src port num> -> <dst node id>:<dst port num>
for one xgmi link.
For example, in case of 4 sockets fully and evenly connected setup, it
would be like as below for the first node in the hive.
01:02 -> 02:03
01:03 -> 02:02
01:07 -> 03:04
01:04 -> 03:07
01:06 -> 04:05
01:05 -> 04:06
Based on the fact that there is two xgmi links between each socket pair,
"01:02 -> 02:03" means that the current socket in question use the port 2
to connect with port 3 of the second node in the hive and so on.
v2: print out the src/dst node id for each xgmi link (lijo)
v3: replace the current_node++ with +1 to align with dst node (le)
and use the dev_err instead of pr_err (lijo)
v4: fix checkpatch warning (alex)
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The PCIe speed capabilities advertised by a USB4 or TBT3 link are
limited to PCIe gen 1 per the USB4 spec. In reality the speed will
change dynamically based on fabric conditions and other traffic.
DPM is disabled when dGPUs are connected directly to Intel hosts
since the PCIe root port isn't able to handle dynamic speed
switching.
As this limitation is specifically for PCIe root ports in the SoC,
don't apply it when connected to an eGPU enclosure connected to an
Intel host.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2885
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When bandwidth limits are looked up using pcie_bandwidth_available()
virtual links such as USB4 are analyzed which might not represent the
real speed. Furthermore devices may change speeds autonomously which
may introduce conditional variation to the results reported in the
status registers.
Instead look at the capabilities of first PCI device outside of
dGPU to decide upper limits that the dGPU will work at.
For eGPU this effectively means that it will use the speed of the link
partner.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925#note_2145860
Link: https://www.usb.org/document-library/usb4r-specification-v20
USB4 V2 with Errata and ECN through June 2023
Section 11.2.1
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes the below:
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
WARNING: Missing a blank line after declarations
WARNING: Too many leading tabs - consider code refactoring
+ if (list_connector->connector_type != DRM_MODE_CONNECTOR_VGA) {
WARNING: Too many leading tabs - consider code refactoring
+ if (!amdgpu_display_hpd_sense(adev, amdgpu_connector->hpd.hpd)) {
Cc: Guchun Chen <guchun.chen@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
GCC 14 introduces a new -Walloc-size included in -Wextra which errors out
on various files in drivers/gpu/drm/amd/amdgpu like:
```
amdgpu_amdkfd_gfx_v8.c:241:15: error: allocation of insufficient size ‘4’ for type ‘uint32_t[2]’ {aka ‘unsigned int[2]'} with size ‘8’ [-Werror=alloc-size]
```
This is because each HQD_N_REGS is actually a uint32_t[2]. Move the * 2 to
the size argument so GCC sees we're allocating enough.
Originally did 'sizeof(uint32_t) * 2' for the size but a friend suggested
'sizeof(**dump)' better communicates the intent.
Link: https://lore.kernel.org/all/87wmuwo7i3.fsf@gentoo.org/
Signed-off-by: Sam James <sam@gentoo.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This will make it possible for amdgpu GEM ioctls to flush TLBs on compute
VMs.
This removes VMID-based TLB flushing and always uses PASID-based
flushing. This still works because it scans the VMID-PASID mapping
registers to find the right VMID. It's only slightly less efficient. This
is not a production use case.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When restoring after an eviction, use amdgpu_vm_handle_moved to update
BO VA mappings in KFD VMs that are not managed through the KFD API. This
should allow using the render node API to create more flexible memory
mappings in KFD VMs.
v2: rebase on drm_exec changes (Alex)
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Developed a new driver which allocates a 64bit memory on
each request in sequence order. At the moment, user queue
fence memory is the main consumer of this seq64 driver.
v2: Worked on review comments from Christian for the following
modifications
- Move driver name from "semaphore" to "seq64"
- Remove unnecessary PT/PD mapping
- Move enable_mes check into init/fini functions.
v3: Worked on review comments from Christian
- drop enable_mes check
- use DECLARE_BITMAP for bit array
- added kerneldoc for seq64
v4: Worked on review comments from Christian
- Rename amdgpu_seq64_get name with amdgpu_seq64_alloc
v5: Worked on review comments from Christian
- Fix seq64 lockdep warning
- move fpriv->seq64_va check into amdgpu_seq64_unmap()
- make the function amdgpu_seq64_unmap() return as void.
- reserve the buffers as not interruptible.
v6: port to drm_exec (Alex)
v7: disable for now (Arun)
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We've had misc reports of random IOMMU page faults when
this is used. It's just a rarely used optimization anyway, so
let's just disable it. It can still be toggled via the
module parameter for testing.
v2: leave it configurable via module parameter
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We've had misc reports of random IOMMU page faults when
this is used. It's just a rarely used optimization anyway, so
let's just disable it. It can still be toggled via the
module parameter for testing.
v2: leave it configurable via module parameter
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We've had misc reports of random IOMMU page faults when
this is used. It's just a rarely used optimization anyway, so
let's just disable it. It can still be toggled via the
module parameter for testing.
v2: leave it configurable via module parameter
Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11")
Reviewed-by: Yang Wang <kevinyang.wang@amd.com> (v1)
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add a module parameter to control the AGP aperture. The AGP
aperture is an aperture in the GPU's internal address space
which provides direct non-paged access to the platform address
space. This access is non-snooped so only uncached memory
can be accessed.
Add a knob so that we can toggle this for debugging.
Fixes: 67318cb84341 ("drm/amdgpu/gmc11: set gart placement GC11")
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Should be && rather than ||.
Fixes: b2e1cbe6281f ("drm/amdgpu/gmc11: disable AGP on GC 11.5")
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com> # PHX & Navi33
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The port num info is firstly introduced with 20.00.01.13 xgmi ta and
make them as part of topology info.
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
fix ras err_data null pointer issue in amdgpu_ras.c
Fixes: 8cc0f5669eb6 ("drm/amdgpu: Support multiple error query modes")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The variable "chunk_ptr" should be a pointer pointing
to a struct drm_amdgpu_cs_chunk instead of to a pointer
of that.
Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
uvd_entity_init()'
Fixes the following:
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c:237: warning: Function parameter or member 'ring' not described in 'amdgpu_vce_entity_init'
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c:405: warning: Function parameter or member 'ring' not described in 'amdgpu_uvd_entity_init'
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The valid num_mem_partitions is required during ttm pool fini,
thus move the cleanup at the end of the function.
Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
MC_VM_AGP_* registers should not be programmed by guest driver.
v2: move early return outside of loop
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Samir Dhume <samir.dhume@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Pull drm fixes from Daniel Vetter:
"Dave's VPN to the big machine died, so it's on me to do fixes pr this
and next week while everyone else is at plumbers.
- big pile of amd fixes, but mostly for hw support newly added in 6.7
- i915 fixes, mostly minor things
- qxl memory leak fix
- vc4 uaf fix in mock helpers
- syncobj fix for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE"
* tag 'drm-next-2023-11-10' of git://anongit.freedesktop.org/drm/drm: (78 commits)
drm/amdgpu: fix error handling in amdgpu_vm_init
drm/amdgpu: Fix possible null pointer dereference
drm/amdgpu: move UVD and VCE sched entity init after sched init
drm/amdgpu: move kfd_resume before the ip late init
drm/amd: Explicitly check for GFXOFF to be enabled for s0ix
drm/amdgpu: Change WREG32_RLC to WREG32_SOC15_RLC where inst != 0 (v2)
drm/amdgpu: Use correct KIQ MEC engine for gfx9.4.3 (v5)
drm/amdgpu: add smu v13.0.6 pcs xgmi ras error query support
drm/amdgpu: fix software pci_unplug on some chips
drm/amd/display: remove duplicated argument
drm/amdgpu: correct mca debugfs dump reg list
drm/amdgpu: correct acclerator check architecutre dump
drm/amdgpu: add pcs xgmi v6.4.0 ras support
drm/amdgpu: Change extended-scope MTYPE on GC 9.4.3
drm/amdgpu: disable smu v13.0.6 mca debug mode by default
drm/amdgpu: Support multiple error query modes
drm/amdgpu: refine smu v13.0.6 mca dump driver
drm/amdgpu: Do not program PF-only regs in hdp_v4_0.c under SRIOV (v2)
drm/amdgpu: Skip PCTL0_MMHUB_DEEPSLEEP_IB write in jpegv4.0.3 under SRIOV
drm: amd: Resolve Sphinx unexpected indentation warning
...
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.7-2023-11-10:
amdgpu:
- SR-IOV fixes
- DMCUB fixes
- DCN3.5 fixes
- DP2 fixes
- SubVP fixes
- SMU14 fixes
- SDMA4.x fixes
- Suspend/resume fixes
- AGP regression fix
- UAF fixes for some error cases
- SMU 13.0.6 fixes
- Documentation fixes
- RAS fixes
- Hotplug fixes
- Scheduling entity ordering fix
- GPUVM fixes
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110190703.4741-1-alexander.deucher@amd.com
|
|
When clearing the root PD fails we need to properly release it again.
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
mem = bo->tbo.resource may be NULL in amdgpu_vm_bo_update.
Fixes: 180253782038 ("drm/ttm: stop allocating dummy resources during BO creation")
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
We need kernel scheduling entities to deal with handle clean up
if apps are not cleaned up properly. With commit 56e449603f0ac5
("drm/sched: Convert the GPU scheduler to variable number of run-queues")
the scheduler entities have to be created after scheduler init, so
change the ordering to fix this.
v2: Leave logic in UVD and VCE code
Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: ltuikov89@gmail.com
|
|
The kfd_resume needs to touch GC registers to enable the interrupts,
it needs to be done before GFXOFF is enabled to ensure that the GFX is
not off and GC registers can be touched. So move kfd_resume before the
amdgpu_device_ip_late_init which enables the CGPG/GFXOFF.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
If a user has disabled GFXOFF this may cause problems for the suspend
sequence. Ensure that it is enabled in amdgpu_acpi_is_s0ix_active().
The system won't reach the deepest state but it also won't hang.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-next
drm-misc-fixes for v6.7-rc1:
qxl:
- qxl memory leak fix.
syncobj:
- Fix waiting for DRM_SYNCOBJ_WAIT_FLAGS_WAIT_AVAILABLE
vc4:
- Fix UAF in mock helpers
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[sima: Stitch together both changelogs from Maarten. Also because of
branch history this contains a few more bugfixes which are already in
v6.6, but I didn't feel like this justifies some backmerge since there
wasn't any real conflict.]
Link: https://patchwork.freedesktop.org/patch/msgid/bc8598ee-d427-4616-8ebd-64107ab9a2d8@linux.intel.com
|
|
W/RREG32_RLC is hardedcoded to use instance 0. W/RREG32_SOC15_RLC
should be used instead when inst != 0.
v2: rebase
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0.
Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC
and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter.
Using amdgpu_sriov_runtime to determine whether to access via kiq or
RLC is sufficient for now.
v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call
v4: avoid using amdgpu_sriov_w/rreg
v3: use W/RREG32_XCC to handle non-kiq case
v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters
of amdgpu_device_wreg/rreg
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
add pcs xgmi ras error query support for smu v13.0.6.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When software 'pci unplug' using IGT is executed we got a sysfs directory
entry is NULL for differant ras blocks like hdp, umc, etc.
Before call 'sysfs_remove_file_from_group' and 'sysfs_remove_group'
check that 'sd' is not NULL.
[ +0.000001] RIP: 0010:sysfs_remove_group+0x83/0x90
[ +0.000002] Code: 31 c0 31 d2 31 f6 31 ff e9 9a a8 b4 00 4c 89 e7 e8 f2 a2 ff ff eb c2 49 8b 55 00 48 8b 33 48 c7 c7 80 65 94 82 e8 cd 82 bb ff <0f> 0b eb cc 66 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90
[ +0.000001] RSP: 0018:ffffc90002067c90 EFLAGS: 00010246
[ +0.000002] RAX: 0000000000000000 RBX: ffffffff824ea180 RCX: 0000000000000000
[ +0.000001] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ +0.000001] RBP: ffffc90002067ca8 R08: 0000000000000000 R09: 0000000000000000
[ +0.000001] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ +0.000001] R13: ffff88810a395f48 R14: ffff888101aab0d0 R15: 0000000000000000
[ +0.000001] FS: 00007f5ddaa43a00(0000) GS:ffff88841e800000(0000) knlGS:0000000000000000
[ +0.000002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ +0.000001] CR2: 00007f8ffa61ba50 CR3: 0000000106432000 CR4: 0000000000350ef0
[ +0.000001] Call Trace:
[ +0.000001] <TASK>
[ +0.000001] ? show_regs+0x72/0x90
[ +0.000002] ? sysfs_remove_group+0x83/0x90
[ +0.000002] ? __warn+0x8d/0x160
[ +0.000001] ? sysfs_remove_group+0x83/0x90
[ +0.000001] ? report_bug+0x1bb/0x1d0
[ +0.000003] ? handle_bug+0x46/0x90
[ +0.000001] ? exc_invalid_op+0x19/0x80
[ +0.000002] ? asm_exc_invalid_op+0x1b/0x20
[ +0.000003] ? sysfs_remove_group+0x83/0x90
[ +0.000001] dpm_sysfs_remove+0x61/0x70
[ +0.000002] device_del+0xa3/0x3d0
[ +0.000002] ? ktime_get_mono_fast_ns+0x46/0xb0
[ +0.000002] device_unregister+0x18/0x70
[ +0.000001] i2c_del_adapter+0x26d/0x330
[ +0.000002] arcturus_i2c_control_fini+0x25/0x50 [amdgpu]
[ +0.000236] smu_sw_fini+0x38/0x260 [amdgpu]
[ +0.000241] amdgpu_device_fini_sw+0x116/0x670 [amdgpu]
[ +0.000186] ? mutex_lock+0x13/0x50
[ +0.000003] amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
[ +0.000192] drm_minor_release+0x4f/0x80 [drm]
[ +0.000025] drm_release+0xfe/0x150 [drm]
[ +0.000027] __fput+0x9f/0x290
[ +0.000002] ____fput+0xe/0x20
[ +0.000002] task_work_run+0x61/0xa0
[ +0.000002] exit_to_user_mode_prepare+0x150/0x170
[ +0.000002] syscall_exit_to_user_mode+0x2a/0x50
Cc: Hawking Zhang <hawking.zhang@amd.com>
Cc: Luben Tuikov <luben.tuikov@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian Koenig <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
avoid driver to touch invalid mca reg.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
So driver doesn't touch invalid aca entries.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
add pcs xgmi v6.4.0 ras support
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Change local memory type to MTYPE_UC on revision id 0
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Direct error query mode and firmware error query mode
are supported for now.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
refine smu mca driver to support query ras error from pmfw path.
- correct gfx smu bank hwid (from mp5 to smu bank)
- retire unused callback function in amdgpu_mca_smu_funcs{}
- add new mca_bank_set{} structure to collect mca bank
- move enum mca_reg_idx into amdgpu_mca.h header
- add mca status register field decode macro
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The following regs can only be programmed by the PF:
HDP_MISC_CNTL
HDP_NONSURFACE_BASE
HDP_NONSURFACE_BASE_HI
v2: update commit message
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Samir Dhume <samir.dhume@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
PCTL0_MMHUB_DEEPSLEEP_IB is blocked for VF access
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Samir Dhume <samir.dhume@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|