summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
AgeCommit message (Collapse)Author
2023-06-09drm/amdgpu: introduce vmhub definition for multi-partition cases (v3)Hawking Zhang
v1: Each partition has its own gfxhub or mmhub. adjust the num of MAX_VMHUBS and the GFXHUB/MMHUB layout (Le) v2: re-design the AMDGPU_GFXHUB/AMDGPU_MMHUB layout (Le) v3: apply the gfxhub/mmhub layout to new IPs (Hawking) v4: fix up gmc11 (Alex) v5: rebase (Alex) Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: release correct lock in amdgpu_gfx_enable_kgq()Dan Carpenter
This function was releasing the incorrect lock on the error path. Reported-by: kernel test robot <lkp@intel.com> Fixes: 1156e1a60f02 ("drm/amdgpu: add [en/dis]able_kgq() functions") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcsHoratio Zhang
The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still use amdgpu_irq_put to disable this interrupt, which caused the call trace in this function. [ 102.873958] Call Trace: [ 102.873959] <TASK> [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [ 102.874019] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.874072] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.874122] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.874172] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.874223] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.874321] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.874375] process_one_work+0x21f/0x3f0 [ 102.874377] worker_thread+0x200/0x3e0 [ 102.874378] ? process_one_work+0x3f0/0x3f0 [ 102.874379] kthread+0xfd/0x130 [ 102.874380] ? kthread_complete_and_exit+0x20/0x20 [ 102.874381] ret_from_fork+0x22/0x30 v2: - Handle umc and gfx ras cases in separated patch - Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11 v3: - Improve the subject and code comments - Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init v4: - Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state - Check cp_ecc_error_irq.funcs rather than ip version for a more sustainable life v5: - Simplify judgment conditions Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: unlock the correct lock in amdgpu_gfx_enable_kcq()Dan Carpenter
We changed which lock we are supposed to take but this error path was accidentally over looked so it still drops the old lock. Fixes: def799c6596d ("drm/amdgpu: add multi-xcc support to amdgpu_gfx interfaces (v4)") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: put MQDs in VRAMAlex Deucher
Reduces preemption latency. Only enable this for gfx10 and 11 for now to avoid changing behavior on gfx 8 and 9. v2: move MES MQDs into VRAM as well (YuBiao) v3: enable on gfx10, 11 only (Alex) v4: minor style changes, document why gfx10/11 only (Alex) Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add [en/dis]able_kgq() functionsAlex Deucher
To replace the IP specific variants which are largely duplicate. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: check correct allocated mqd_backup object after allocGuchun Chen
Instead of the default one, check the right mqd_backup object. Signed-off-by: Guchun Chen <guchun.chen@amd.com> Cc: Le Ma <le.ma@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: fix a build warning by a typo in amdgpu_gfx.cGuchun Chen
This should be a typo when intruducing multi-xx support. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Guchun Chen <guchun.chen@amd.com> Cc: Le Ma <le.ma@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-24drm/amdgpu: track MQD size for gfx and computeAlex Deucher
It varies by generation and we need to know the size to expose this via debugfs. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-21drm/amdgpu: allocate doorbell index for multi-die caseLe Ma
Allocate different doorbell index for kiq/kcq rings on each die Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-20drm/amdgpu: add master/slave check in init phaseLe Ma
Skip KCQ setup on slave xcc as there's no use case. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu: add multi-xcc support to amdgpu_gfx interfaces (v4)Le Ma
v1: Modify kiq_init/fini, mqd_sw_init/fini and enable/disable_kcq to adapt to multi-die case. Pass 0 as default to all asics with single xcc (Le) v2: squash commits to avoid breaking the build (Le) v3: unify naming style (Le) v4: apply the changes to gc v11_0 (Hawking) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu: move queue_bitmap to an independent structure (v3)Le Ma
To allocate independent queue_bitmap for each XCD, then the old bitmap policy can be continued to use with a clear logic. Use mec_bitmap[0] as default for all non-GC 9.4.3 IPs. v2: squash commits to avoid breaking the build v3: unify naming style Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-18drm/amdgpu: convert gfx.kiq to array type (v3)Le Ma
v1: more kiq instances are a available in SOC (Le) v2: squash commits to avoid breaking the build (Le) v3: make the conversion for gfx/mec v11_0 (Hawking) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-04-14drm/amdgpu: move vmhub out of amdgpu_ring_funcs (v4)Le Ma
It looks better to place this field in ring structure. Also drop the repeated ring funcs definitions if there's no difference except for vmhub field. v2: rename the field to vm_hub like others (Le) v3: apply the changes to new ip blocks (Hawking) v4: fix vcn sw ring (Alex) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-13drm/amdgpu: Correct gfx ras_late_init callbackHawking Zhang
Use default gfx ras_late_init callback for gfx ras block. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-18drm/amdgpu: allow multipipe policy on ASICs with one MECLang Yu
Always enable multipipe policy on ASICs with GC VERSION > 9.0.0 instead of MEC number > 1. This will allow multipipe policy on ASICs with one MEC, e.g., gfx11 APUs. Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17drm/amdgpu: Perform gpu reset after gfx finishes processing ras poison ↵YiPeng Chai
consumption on gfx_v11_0_3 Perform gpu reset after gfx finishes processing ras poison consumption on gfx_v11_0_3. V2: Move gfx poison consumption handler from hw_ops to ip function level. V3: Adjust the calling position of amdgpu_gfx_poison_consumation_handler. V4: Since gfx v11_0_3 does not have .hw_ops instance, the .hw_ops null pointer check in amdgpu_ras_interrupt_poison_consumption_handler needs to be adjusted. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17drm/amdgpu: Add gfx ras function on gfx v11_0_3YiPeng Chai
Add gfx ras function on gfx v11_0_3. V2: 1. Add separate source files for gfx v11_0_3. 2. Create a common function to initialize gfx ras block. V3: 1. Rename amdgpu_gfx_ras_block_init to amdgpu_gfx_ras_sw_init. 2. Adjust the calling position of amdgpu_gfx_ras_sw_init. 3. Remove gfx_v11_0_3_ras_ops. V4: Revert changes in amdgpu_ras_interrupt_poison_consumption_handler. V5: 1. Remove invalid include file in gfx_v11_0_3.c. 2. Reduce the number of parameters of amdgpu_gfx_ras_sw_init. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-03drm/amdgpu: use VRAM|GTT for a bunch of kernel allocationsChristian König
Technically all of those can use GTT as well, no need to force things into VRAM. Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09drm/amdgpu: complete gfxoff allow signal during suspend without delayHarsh Jain
change guarantees that gfxoff is allowed before moving further in s2idle sequence to add more reliablity about gfxoff in amdgpu IP's suspend flow Signed-off-by: Harsh Jain <harsh.jain@amd.com> Reviewed-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29drm/amdgpu: fix compiler warning for amdgpu_gfx_cp_init_microcodeLikun Gao
Change the type of parameter on amdgpu_gfx_cp_init_microcode to fix compiler warning. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-29drm/amdgpu: add function to init CP microcodeLikun Gao
Add an common function to init CP related microcode. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amd: Add detailed GFXOFF stats to debugfsAndré Almeida
Add debugfs interface to log GFXOFF statistics: - Read amdgpu_gfxoff_count to get the total GFXOFF entry count at the time of query since system power-up - Write 1 to amdgpu_gfxoff_residency to start logging, and 0 to stop. Read it to get average GFXOFF residency % multiplied by 100 during the last logging interval. Both features are designed to be keep the values persistent between suspends. Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-16drm/amdgpu: reduce reset timeVictor Zhao
In multi container use case, reset time is important, so skip ring tests and cp halt wait during ip suspending for reset as they are going to fail and cost more time on reset v2: add a hang flag to indicate the reset comes from a job timeout, skip ring test and cp halt wait in this case v3: move hang flag to adev Signed-off-by: Victor Zhao <Victor.Zhao@amd.com> Acked-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-30drm/amdgpu: enable mes to access registers v2Jack Xiao
Enable mes to access registers. v2: squash mes sched ring enablement flag Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08drm/amd/amdgpu: Fix alignment issueArunpravin Paneer Selvam
Fix alignment problems reported by zuul for the commit b07d1d73b09e ("drm/amd/amdgpu: Enable high priority gfx queue") Fixes: b07d1d73b09e ("drm/amd/amdgpu: Enable high priority gfx queue") Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-06drm/amd/amdgpu: Enable high priority gfx queueArunpravin Paneer Selvam
Starting from SIENNA CICHLID asic supports two gfx pipes, enabling two graphics queues, 1 on each pipe, pipe0 queue0 would be the normal piority queue and pipe1 queue0 would be the high priority queue Only one queue per pipe is visble to SPI, SPI looks at the priority value assigned to CP_GFX_HQD_QUEUE_PRIORITY from each of the queue's HQD/MQD. Create contexts applying AMDGPU_CTX_PRIORITY_HIGH which submits job to the high priority queue on GFX pipe1. There would be starvation of LP workload if HP workload is always available. v2: - remove unnecessary check(Nirmoy) - make pipe1 hardware support a separate patch(Nirmoy) - remove duplicate code(Shashank) - add CSA support for second gfx pipe(Alex) v3(Christian): - fix incorrect indentation - merge COMPUTE and GFX switch cases as both calls the same function. v4: - rebase w/ latest code base Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-01drm/amdgpu: Resolve RAS GFX error count issue after cold boot on ArcturusCandice Li
Adjust the sequence for ras late init and separate ras reset error status from query status. v2: squash in fix from Candice Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-06drm/amdgpu: nuke dynamic gfx scratch reg allocationChristian König
It's over a decade ago that this was actually used for more than ring and IB tests. Just use the static register directly where needed and nuke the now useless infrastructure. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Lang Yu <Lang.Yu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04drm/amdgpu: add mes unmap legacy queue routineJack Xiao
For mes kiq has been taken over by mes sched, drv can't directly use mes kiq to unmap queues. drv has to use mes sched api to unmap legacy queue. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04drm/amdgpu: kiq takes charge of all queuesJack Xiao
To make kgq/kcq and mes queue co-exist, kiq needs take charge of all queues. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-25drm/amdgpu: fix off by one in amdgpu_gfx_kiq_acquire()Dan Carpenter
This post-op should be a pre-op so that we do not pass -1 as the bit number to test_bit(). The current code will loop downwards from 63 to -1. After changing to a pre-op, it loops from 63 to 0. Fixes: 71c37505e7ea ("drm/amdgpu/gfx: move more common KIQ code to amdgpu_gfx.c") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in gfx ras ↵yipechai
block Remove redundant calls of amdgpu_ras_block_late_fini in gfx ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Optimize xxx_ras_fini function of each ras blockyipechai
1. Move the variables of ras block instance members from specific xxx_ras_fini to general ras_fini call. 2. Function calls inside the modules only use parameters passed from xxx_ras_fini instead of ras block instance members. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Modify .ras_fini function pointer parameteryipechai
Modify .ras_fini function pointer parameter so that we can remove redundant intermediate calls in some ras blocks. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17drm/amdgpu: Optimize xxx_ras_late_init function of each ras blockyipechai
1. Move calling ras block instance members from module internal function to the top calling xxx_ras_late_init. 2. Module internal function calls can only use parameter variables of xxx_ras_late_init instead of ras block instance members. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17drm/amdgpu: Modify .ras_late_init function pointer parameteryipechai
Modify .ras_late_init function pointer parameter so that it can remove redundant intermediate calls in some ras blocks. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-14drm/amdgpu: Optimize amdgpu_gfx_ras_late_init/amdgpu_gfx_ras_fini function codeyipechai
Optimize amdgpu_gfx_ras_late_init/amdgpu_gfx_ras_fini function code. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amdgpu: Modify gfx block to fit for the unified ras block data and opsyipechai
1.Modify gfx block to fit for the unified ras block data and ops. 2.Change amdgpu_gfx_ras_funcs to amdgpu_gfx_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of gfx ras variable so that gfx ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register gfx ras block into amdgpu device ras block link list. 5.Remove the redundant code about gfx in amdgpu_ras.c after using the unified ras block. 6.Fill unified ras block .name .block .ras_late_init and .ras_fini for all of gfx versions. If .ras_late_init and .ras_fini had been defined by the selected gfx version, the defined functions will take effect; if not defined, default fill with amdgpu_gfx_ras_late_init and amdgpu_gfx_ras_fini. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amd/pm: do not expose implementation details to other blocks out of powerEvan Quan
Those implementation details(whether swsmu supported, some ppt_funcs supported, accessing internal statistics ...)should be kept internally. It's not a good practice and even error prone to expose implementation details. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-05drm/amdgpu: During s0ix don't wait to signal GFXOFFLijo Lazar
In the rare event when GFX IP suspend coincides with a s0ix entry, don't schedule a delayed work, instead signal PMFW immediately to allow GFXOFF entry. GFXOFF is a prerequisite for s0ix entry. PMFW needs to be signaled about GFXOFF status before amd-pmc module passes OS HINT to PMFW telling that everything is ready for a safe s0ix entry. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1712 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2021-08-20drm/amdgpu: Cancel delayed work when GFXOFF is disabledMichel Dänzer
schedule_delayed_work does not push back the work if it was already scheduled before, so amdgpu_device_delay_enable_gfx_off ran ~100 ms after the first time GFXOFF was disabled and re-enabled, even if GFXOFF was disabled and re-enabled again during those 100 ms. This resulted in frame drops / stutter with the upcoming mutter 41 release on Navi 14, due to constantly enabling GFXOFF in the HW and disabling it again (for getting the GPU clock counter). To fix this, call cancel_delayed_work_sync when the disable count transitions from 0 to 1, and only schedule the delayed work on the reverse transition, not if the disable count was already 0. This makes sure the delayed work doesn't run at unexpected times, and allows it to be lock-free. v2: * Use cancel_delayed_work_sync & mutex_trylock instead of mod_delayed_work. v3: * Make amdgpu_device_delay_enable_gfx_off lock-free (Christian König) v4: * Fix race condition between amdgpu_gfx_off_ctrl incrementing adev->gfx.gfx_off_req_count and amdgpu_device_delay_enable_gfx_off checking for it to be 0 (Evan Quan) Cc: stable@vger.kernel.org Reviewed-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> # v3 Acked-by: Christian König <christian.koenig@amd.com> # v3 Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-16drm/amd/amdgpu: remove unnecessary RAS context fieldCandice Li
Delete ras_if->name in the RAS ctx structure and remove related lines. Signed-off-by: Candice Li <candice.li@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-05-19drm/amdgpu: Conditionally reset RAS counters on bootJohn Clements
Only clear RAS error counters if perestent EDC harvesting is not supported Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: split gfx callbacks into ras and non-ras onesHawking Zhang
gfx ras is only available in cerntain ip generations. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Dennis Li <Dennis.Li@amd.com> Reviewed-by: John Clements <John.Clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amd/pm: unify the interface for gfx state settingEvan Quan
No need to have special handling for swSMU supported ASICs. Signed-off-by: Evan Quan <evan.quan@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: add the sched_score to amdgpu_ring_initChristian König
Allow separate ring to share the same scheduler score. No functional change. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-and-Tested-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: wrap kiq ring ops with kiq spinlockNirmoy Das
KIQ ring is being operated by kfd as well as amdgpu. KFD is using kiq lock, we should the same from amdgpu side as well. Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-04-09drm/amdgpu: add codes to capture invalid hardware access when recoveryDennis Li
When recovery thread has begun GPU reset, there should be not other threads to access hardware, otherwise system randomly hang. v2 (chk): rewritten from scratch, use trylock and lockdep instead of hand wiring the logic. v3: add in_irq check v4: change to check in_task Signed-off-by: Dennis Li <Dennis.Li@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>