summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
AgeCommit message (Collapse)Author
2024-11-08drm/amdgpu/gfx11: Enable cleaner shader for GFX11.0.0/11.0.2 GPUsSrinivasan Shanmugam
Enable the cleaner shader for GFX11.0.0/11.0.2 GPUs to provide data isolation between GPU workloads. The cleaner shader is responsible for clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps prevent data leakage and ensures accurate computation results. This update extends cleaner shader support to GFX11.0.0/11.0.2 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-08drm/amdgpu: Add sysfs interface for gc reset maskJesse.zhang@amd.com
Add two sysfs interfaces for gfx and compute: gfx_reset_mask compute_reset_mask These interfaces are read-only and show the resets supported by the IP. For example, full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add a generic helper which takes the ring as parameter and print the strings in the order they are applied (Christian) check amdgpu_gpu_recovery before creating sysfs file itself, and initialize supported_reset_types in IP version files (Lijo) v4: Fixing uninitialized variables (Tim) Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-05drm/amdgpu/gfx11: Add cleaner shader for GFX11.0.3Srinivasan Shanmugam
This commit adds the cleaner shader microcode for GFX11.0.3 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Clearing these resources is important for ensuring data isolation between different workloads running on the GPU. Without the cleaner shader, residual data from a previous workload could potentially be accessed by a subsequent workload, leading to data leaks and incorrect computation results. The cleaner shader microcode is represented as an array of 32-bit words (`gfx_11_0_3_cleaner_shader_hex`). This array is the binary representation of the cleaner shader code, which is written in a low-level GPU instruction set. When the cleaner shader feature is enabled, the AMDGPU driver loads this array into a specific location in the GPU memory. The GPU then reads this memory location to fetch and execute the cleaner shader instructions. The cleaner shader is executed automatically by the GPU at the end of each workload, before the next workload starts. This ensures that all GPU resources are in a clean state before the start of each workload. This addition is part of the cleaner shader feature implementation. The cleaner shader feature helps resource utilization by cleaning up GPU resources after they are used. It also enhances security and reliability by preventing data leaks between workloads. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-04drm/amdgpu: Group gfx sysfs functionsLijo Lazar
Make amdgpu_gfx_sysfs_init/fini functions as common entry points for all gfx related sysfs nodes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15drm/amdgpu: optimize fn gfx_v11_ring_insert_nopSunil Khatri
Optimize gfx_v11_ring_insert_nop() to call optimized version of amdgpu_ring_insert_nop instead of calling amdgpu_ring_write for number of nop times. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-08drm/amdgpu: fix typosAndrew Kreimer
Fix typos in comments: "wether -> whether". Signed-off-by: Andrew Kreimer <algonell@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07drm/amdgpu/gfx11: Apply Isolation Enforcement to GFX & Compute ringsSrinivasan Shanmugam
This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v11_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its use, respectively. `amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring begins its use. This function cancels any scheduled `enforce_isolation_work` and, if necessary, signals the Kernel Fusion Driver (KFD) to stop the runqueue. `amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends its use. This function schedules `enforce_isolation_work` to be run after a delay. These functions are part of the Enforce Isolation Handler, which enforces shader isolation on AMD GPUs to prevent data leakage between different processes. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07drm/amdgpu/gfx11: Implement cleaner shader support for GFX11 hardwareSrinivasan Shanmugam
The patch modifies the gfx_v11_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v11_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer. This patch adds support for the PACKET3_RUN_CLEANER_SHADER packet in the gfx_v11_0 module. This packet is used to emit the cleaner shader, which is used to clear GPU memory before it's reused, helping to prevent data leakage between different processes. Finally, the patch updates the ring function structures to include the new gfx_v11_0_ring_emit_cleaner_shader function. This allows the cleaner shader to be emitted as part of the ring's operations. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07drm/amdgpu: update the handle ptr in hw_finiSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of hw_fini. Also update the ip_block ptr where ever needed as there were cyclic dependency of hw_fini on suspend and some followed clean up. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07drm/amdgpu: update the handle ptr in hw_initSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of hw_init. Also update the ip_block ptr where ever needed as there were cyclic dependency of hw_init on resume. v2: squash in isp fix Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07drm/amdgpu: update the handle ptr in resumeSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of resume. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07drm/amdgpu: update the handle ptr in suspendSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of suspend. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07drm/amdgpu: update the handle ptr in wait_for_idleSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of wait_for_idle. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01drm/amdgpu: update the handle ptr in post_soft_resetSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of post_soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01drm/amdgpu: update the handle ptr in soft_resetSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01drm/amdgpu: update the handle ptr in check_soft_resetSunil Khatri
Update the *handle to amdgpu_ip_block ptr for all functions pointers of check_soft_reset. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01drm/amdgpu: update the handle ptr in sw_finiSunil Khatri
update the *handle to amdgpu_ip_block ptr for all functions pointers of sw_fini. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01drm/amdgpu: update the handle ptr in sw_initSunil Khatri
update the *handle to amdgpu_ip_block ptr for all functions pointers of sw_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01drm/amdgpu: update the handle ptr in late_initSunil Khatri
Update the ptr handle to amdgpu_ip_block ptr in all the functions of late_init function ptr. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01drm/amdgpu: update the handle ptr in early_initSunil Khatri
update the handle ptr to amdgpu_ip_block ptr for all functions pointers on early_init. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01drm/amdgpu: update the handle ptr in print_ip_stateSunil Khatri
Update the ptr handle to amdgpu_ip_block ptr in all the functions affected. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01drm/amdgpu: update the handle ptr in dump_ip_stateSunil Khatri
Update the ptr handle to amdgpu_ip_block ptr in all the functions. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-26drm/amdgpu: Fix typo "acccess" and improve the comment style hereWangYuli
There are some spelling mistakes of 'acccess' in comments which should be instead of 'access'. And the comment style should be like this: /* * Text * Text */ Suggested-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/all/f75fbe30-528e-404f-97e4-854d27d7a401@amd.com/ Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://lore.kernel.org/all/0c768bf6-bc19-43de-a30b-ff5e3ddfd0b3@suse.de/ Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: WangYuli <wangyuli@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02drm/amdgpu/gfx11: use rlc safe mode for soft recoveryAlex Deucher
Protect the MMIO access with safe mode. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02drm/amdgpu/gfx11: use proper rlc safe mode helpersAlex Deucher
Rather than open coding it for the queue reset. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02drm/amdgpu/gfx11: per queue reset only on bare metalAlex Deucher
It's not supported under SR-IOV at the moment. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02drm/amdgpu/mes: modify mes api for mmio queue resetJiadong Zhu
Add me/pipe/queue parameters for queue reset input. v2: fix build (Alex) Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02drm/amdgpu/gfx11: wait for reset done before remapJiadong Zhu
There is a racing condition that cp firmware modifies MQD in reset sequence after driver updates it for remapping. We have to wait till CP_HQD_ACTIVE becoming false then remap the queue. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02drm/amdgpu/gfx11: rename gfx_v11_0_gfx_init_queue()Alex Deucher
Rename to gfx_v11_0_kgq_init_queue() to better align with the other naming in the file. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02drm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2)Prike Liang
Since the MES FW resets kernel compute queue always failed, this may caused by the KIQ failed to process unmap KCQ. So, before MES FW work properly that will fallback to driver executes dequeue and resets SPI directly. Besides, rework the ring reset function and make the busy ring type reset in each function respectively. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-09-02drm/amdgpu/gfx11: add ring reset callbacksAlex Deucher
Add ring reset callbacks for gfx and compute. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29drm/amdgpu/gfx11: return early in preempt_ib()Alex Deucher
When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-20drm/amd/gfx11: move the gfx mutex into the callerAlex Deucher
Otherwise we can fail to drop the software mutex when we fail to take the hardware mutex. Fixes: 76acba7b7f12 ("drm/amdgpu/gfx11: add a mutex for the gfx semaphore") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx11: export gfx_v11_0_request_gfx_index_mutex()Alex Deucher
It will be used by the queue reset code. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx11: add a mutex for the gfx semaphoreAlex Deucher
This will be used in more places in the future so add a mutex. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16drm/amdgpu/gfx11: enter safe mode before touching CP_INT_CNTLAlex Deucher
Need to enter safe mode before touching GC MMIO. Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-13drm/amdgpu: fix ptr check warning in gfx11 ip_dumpSunil Khatri
Change condition, if (ptr == NULL) to if (!ptr) for a better format and fix the warning. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-06drm/amdgpu: optimize the padding for gfx11Sunil Khatri
Adding NOP packets one by one in the ring does not use the CP efficiently. Solution: Use CP optimization while adding NOP packet's so PFP can discard NOP packets based on information of count from the Header instead of fetching all NOP packets one by one. Reviewed-by: Christian König <christian.koenig@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Cc: Tvrtko Ursulin <tursulin@igalia.com> Cc: Marek Olšák <marek.olsak@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23drm/amdgpu/gfx11: Enable bad opcode interruptJesse Zhang
For the bad opcode case, it will cause CP/ME hang. The firmware will prevent the ME side from hanging by raising a bad opcode interrupt. And the driver needs to perform a vmid reset when receiving the interrupt. v2: update irq naming (drop priv) (Alex) Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23drm/amdgpu/gfx11: properly handle error ints on all pipesAlex Deucher
Need to handle the interrupt enables for all pipes. v2: fix indexing (Jessie) Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-23drm/amdgpu/gfx11: enable wave kill for compute queuesAlex Deucher
It should work the same for compute as well as gfx. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-10drm/amdgpu: select compute ME engines dynamicallySunil Khatri
GFX ME right now is one but this could change in future SOC's. Use no of ME for GFX as start point for ME for compute for GFX11. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-08drm/amd/pm: avoid to load smu firmware for APUsTim Huang
Certain call paths still load the SMU firmware for APUs, which needs to be skipped. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02drm/amdgpu: fix out of bounds access in gfx11 during ip dumpSunil Khatri
During ip dump in gfx11 the index variable is reused but is not reinitialized to 0 and this causes the index calculation to be wrong and access out of bound access. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02drm/amdgpu: add firmware for GC IP v11.5.2Tim Huang
This patch is to add firmware for GC 11.5.2. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-02drm/amdgpu: initialize GC IP v11.5.2Tim Huang
Initialize GC 11.5.2 and set gfx hw configuration. Signed-off-by: Tim Huang <Tim.Huang@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-07-01drm/amdgpu/gfx11: remove superfluous cache flagsMarek Olšák
If any INV flags are needed, they should be executed via ACQUIRE_MEM before INDIRECT_BUFFER. Signed-off-by: Marek Olšák <marek.olsak@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27drm/amdgpu: Fix smatch static checker warningHawking Zhang
adev->gfx.imu.funcs could be NULL Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-27drm/amdgpu: refine gfx11 firmware loadingYang Wang
refine gfx11 firmware loading Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-06-14drm/amdgpu: call flush_gpu_tlb directly in gfxhub enableYunxiang Li
Here since we are in reset and takes the reset_domain write side lock already. We can't use the flush tlb helper which tries to take the read side. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>