Age | Commit message (Collapse) | Author |
|
Reset sequence indicates that hardware already ran into a bad state.
Avoid sending unmap queue request to reset KCQ. This will also cover RAS
error scenarios which need a reset to recover, hence remove the check.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In current logic, it calls ring_alloc followed by a ring_test. ring_test
in turn will call another ring_alloc. This is illegal usage as a
ring_alloc is expected to be closed properly with a ring_commit. Change
to commit the map/unmap queue packet first followed by a ring_test. Add a
comment about the usage of ring_test.
Also, reorder the current pre-condition checks of job hang or kiq ring
scheduler not ready. Without them being met, it is not useful to attempt
ring or memory allocations.
Fixes tag refers to the original patch which introduced this issue which
then got carried over into newer code.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 6c10b5cc4eaa ("drm/amdgpu: Remove duplicate code in gfx_v8_0.c")
|
|
Add two sysfs interfaces for gfx and compute:
gfx_reset_mask
compute_reset_mask
These interfaces are read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.
V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
and print the strings in the order they are applied (Christian)
check amdgpu_gpu_recovery before creating sysfs file itself,
and initialize supported_reset_types in IP version files (Lijo)
v4: Fixing uninitialized variables (Tim)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Make amdgpu_gfx_sysfs_init/fini functions as common entry points for all
gfx related sysfs nodes.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
compute/gfx may have multiple rings on some hardware.
In some cases, userspace wants to run jobs on a specific ring for validation purposes.
This debugfs entry helps to disable or enable submitting jobs to a specific ring.
This entry is populated only if there are at least two or more cores in the gfx/compute ip.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Make sure KFD gets a turn when serializing access to
the GC IP. Currently non-KFD jobs can starve KFD if they
submit often enough. This patch prevents that by stalling
non-KFD if its time period has elapsed.
v2: fix units
v3: check enablement properly
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Zero-initialize mqd backup memory, otherwise the check for
'already-backed-up' could go wrong.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable sysfs node for current compute partition mode on VFs also.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com>
Tested-by: Vignesh Chander <Vignesh.Chander@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It should be enabled on both bare metal and VFs.
Fixes: e189be9b2e38 ("drm/amdgpu: Add enforce_isolation sysfs attribute")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Cc: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
|
|
amdgpu_gfx_bit_to_me_queue has been unused since it was added in
commit 7470bfcf2014 ("drm/amdgpu: add helper function for gfx queue/bitmap
transition")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add helper to get supported/available partition config modes
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
cleaner shader
This commit replaces the use of amdgpu_job_submit_direct which submits
the job to the ring directly, with drm_sched_entity in the cleaner
shader job submission process. The change allows the GPU scheduler to
manage the cleaner shader job.
- The job is then submitted to the GPU using the
drm_sched_entity_push_job function, which allows the GPU scheduler to
manage the job.
This change improves the reliability of the cleaner shader job
submission process by leveraging the capabilities of the GPU scheduler.
Fixes: d361ad5d2fc0 ("drm/amdgpu: Add sysfs interface for running cleaner shader")
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For mes11 old firmware has issue to map legacy queue,
add a flag to switch mes to map legacy queue.
Fixes: f9d8c5c7855d ("drm/amdgpu/gfx: enable mes to map legacy queue support")
Reported-by: Andrew Worsley <amworsley@gmail.com>
Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This commit introduces the Enforce Isolation Handler designed to enforce
shader isolation on AMD GPUs, which helps to prevent data leakage
between different processes.
The handler counts the number of emitted fences for each GFX and compute
ring. If there are any fences, it schedules the `enforce_isolation_work`
to be run after a delay of `GFX_SLICE_PERIOD`. If there are no fences,
it signals the Kernel Fusion Driver (KFD) to resume the runqueue.
The function is synchronized using the `enforce_isolation_mutex`.
This commit also introduces a reference count mechanism
(kfd_sch_req_count) to keep track of the number of requests to enable
the KFD scheduler. When a request to enable the KFD scheduler is made,
the reference count is decremented. When the reference count reaches
zero, a delayed work is scheduled to enforce isolation after a delay of
GFX_SLICE_PERIOD.
When a request to disable the KFD scheduler is made, the function first
checks if the reference count is zero. If it is, it cancels the delayed
work for enforcing isolation and checks if the KFD scheduler is active.
If the KFD scheduler is active, it sends a request to stop the KFD
scheduler and sets the KFD scheduler state to inactive. Then, it
increments the reference count.
The function is synchronized using the kfd_sch_mutex to ensure that the
KFD scheduler state and reference count are updated atomically.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This patch adds a new sysfs interface for running the cleaner shader on
AMD GPUs. The cleaner shader is used to clear GPU memory before it's
reused, which can help prevent data leakage between different processes.
The new sysfs file is write-only and is named `run_cleaner_shader`.
Write the number of the partition to this file to trigger the cleaner shader
on that partition. There is only one partition on GPUs which do not
support partitioning.
Changes made in this patch:
- Added `amdgpu_set_run_cleaner_shader` function to handle writes to the
`run_cleaner_shader` sysfs file.
- Added `run_cleaner_shader` to the list of device attributes in
`amdgpu_device_attrs`.
- Updated `default_attr_update` to handle `run_cleaner_shader`.
- Added `AMDGPU_DEVICE_ATTR_WO` macro to create write-only device
attributes.
v2: fix error handling (Alex)
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
|
|
This commit adds a new sysfs attribute 'enforce_isolation' to control
the 'enforce_isolation' setting per GPU. The attribute can be read and
written, and accepts values 0 (disabled) and 1 (enabled).
When 'enforce_isolation' is enabled, reserved VMIDs are allocated for
each ring. When it's disabled, the reserved VMIDs are freed.
The set function locks a mutex before changing the 'enforce_isolation'
flag and the VMIDs, and unlocks it afterwards. This ensures that these
operations are atomic and prevents race conditions and other concurrency
issues.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The cleaner shader is used by the CP firmware to clean LDS and GPRs
between processes on the CUs.
This adds an internal API for GFX IP code to allocate and initialize the
cleaner shader.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
|
|
Use mes pipe to unmap kcq and kgq.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add multiple mes ring instances in mes structure to support
multiple mes pipes.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This resolves the unchecded return value warning reported by Coverity.
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Reduce redundant code and user doesn't need to pay attention to RAS
details.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This commit fixes a type mismatch in the amdgpu_gfx_kiq_init_ring
function triggered by the snprintf function expecting unsigned char
arguments due to the '%hhu' format specifier, but receiving int and u32
arguments.
The issue occurred because the arguments xcc_id, ring->me, ring->pipe,
and ring->queue were of type int and u32, not unsigned char. This led to
a type mismatch when these arguments were passed to snprintf.
To resolve this, the snprintf line was modified to cast these arguments
to unsigned char. This ensures that the arguments are of the correct
type for the '%hhu' format specifier and resolves the warning.
Fixes the below:
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:4: warning: format
>> specifies type 'unsigned char' but the argument has type 'int'
>> [-Wformat]
xcc_id, ring->me, ring->pipe, ring->queue);
^~~~~~
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:12: warning: format
>> specifies type 'unsigned char' but the argument has type 'u32' (aka
>> 'unsigned int') [-Wformat]
xcc_id, ring->me, ring->pipe, ring->queue);
^~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:22: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat]
xcc_id, ring->me, ring->pipe, ring->queue);
^~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:333:34: warning: format specifies type 'unsigned char' but the argument has type 'u32' (aka 'unsigned int') [-Wformat]
xcc_id, ring->me, ring->pipe, ring->queue);
^~~~~~~~~~~
4 warnings generated.
Fixes: 0ea554455542 ("drm/amdgpu: Fix snprintf usage in amdgpu_gfx_kiq_init_ring")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405250446.XeaWe66u-lkp@intel.com/
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This commit fixes a format truncation issue arosed by the snprintf
function potentially writing more characters into the ring->name buffer
than it can hold, in the amdgpu_gfx_kiq_init_ring function
The issue occurred because the '%d' format specifier could write between
1 and 10 bytes into a region of size between 0 and 8, depending on the
values of xcc_id, ring->me, ring->pipe, and ring->queue. The snprintf
function could output between 12 and 41 bytes into a destination of size
16, leading to potential truncation.
To resolve this, the snprintf line was modified to use the '%hhu' format
specifier for xcc_id, ring->me, ring->pipe, and ring->queue. The '%hhu'
specifier is used for unsigned char variables and ensures that these
values are printed as unsigned decimal integers.
Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_kiq_init_ring’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:61: warning: ‘%d’ directive output may be truncated writing between 1 and 10 bytes into a region of size between 0 and 8 [-Wformat-truncation=]
332 | snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d",
| ^~
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:50: note: directive argument in the range [0, 2147483647]
332 | snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d",
| ^~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:332:9: note: ‘snprintf’ output between 12 and 41 bytes into a destination of size 16
332 | snprintf(ring->name, sizeof(ring->name), "kiq_%d.%d.%d.%d",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
333 | xcc_id, ring->me, ring->pipe, ring->queue);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 345a36c4f1ba ("drm/amdgpu: prefer snprintf over sprintf")
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adjust mes12 initialization sequence to fix mapping
legacy queue.
v2: use dev_err.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable mes to map legacy queue support.
v2: drop unused gfx_v12_0_kiq_enable_kgq() (Alex)
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add gfx v9_4_4 ip block support
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable mes to map legacy queue support.
v2: kiq_set_resources is required.
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Clear warning that uses uninitialized value fw_size.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix the KGQ fallback function name, as this will
help differentiate the failure in the KCQ enablement.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.9-2024-03-01:
amdgpu:
- GC 11.5.1 updates
- Misc display cleanups
- NBIO 7.9 updates
- Backlight fixes
- DMUB fixes
- MPO fixes
- atomfirmware table updates
- SR-IOV fixes
- VCN 4.x updates
- use RMW accessors for pci config registers
- PSR fixes
- Suspend/resume fixes
- RAS fixes
- ABM fixes
- Misc code cleanups
- SI DPM fix
- Revert freesync video
amdkfd:
- Misc cleanups
- Error handling fixes
radeon:
- use RMW accessors for pci config registers
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240301204857.13960-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Drop redundant parameters in function amdgpu_gfx_kiq_init_ring
to simplify the code
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.9-2024-02-19:
amdgpu:
- ATHUB 4.1 support
- EEPROM support updates
- RAS updates
- LSDMA 7.0 support
- JPEG DPG support
- IH 7.0 support
- HDP 7.0 support
- VCN 5.0 support
- Misc display fixes
- Retimer fixes
- DCN 3.5 fixes
- VCN 4.x fixes
- PSR fixes
- PSP 14.0 support
- VA_RESERVED cleanup
- SMU 13.0.6 updates
- NBIO 7.11 updates
- SDMA 6.1 updates
- MMHUB 3.3 updates
- Suspend/resume fixes
- DMUB updates
amdkfd:
- Trap handler enhancements
- Fix cache size reporting
- Relocate the trap handler
radeon:
- fix typo in print statement
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219214810.4911-1-alexander.deucher@amd.com
|
|
commit ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring
callbacks") caused GFXOFF control to be used more heavily and the
codepath that was removed from commit 0dee72639533 ("drm/amd: flush any
delayed gfxoff on suspend entry") now can be exercised at suspend again.
Users report that by using GNOME to suspend the lockscreen trigger will
cause SDMA traffic and the system can deadlock.
This reverts commit 0dee726395333fea833eaaf838bc80962df886c8.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: ab4750332dbe ("drm/amdgpu/sdma5.2: add begin/end_use ring callbacks")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.9-2024-02-09:
amdgpu:
- Validate DMABuf imports in compute VMs
- Add RAS ACA framework
- PSP 13 fixes
- Misc code cleanups
- Replay fixes
- Atom interpretor PS, WS bounds checking
- DML2 fixes
- Audio fixes
- DCN 3.5 Z state fixes
- Remove deprecated ida_simple usage
- UBSAN fixes
- RAS fixes
- Enable seq64 infrastructure
- DC color block enablement
- Documentation updates
- DC documentation updates
- DMCUB updates
- S3 fixes
- VCN 4.0.5 fixes
- DP MST fixes
- SR-IOV fixes
amdkfd:
- Validate DMABuf imports in compute VMs
- SVM fixes
- Trap handler updates
radeon:
- Atom interpretor PS, WS bounds checking
- Misc code cleanups
UAPI:
- Bump KFD version so UMDs know that the fixes that enable the management of
VA mappings in compute VMs using the GEM_VA ioctl for DMABufs exported from KFD are present
- Add INFO query for input power. This matches the existing INFO query for average
power. Used in gaming HUDs, etc.
Example userspace: https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240209221459.5453-1-alexander.deucher@amd.com
|
|
This will trade the W=1 warning -Wformat-overflow to
-Wformat-truncation. This lets us enable -Wformat-overflow subsystem
wide.
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Pan, Xinhui <Xinhui.Pan@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/fea7a52924f98b1ac24f4a7e6ba21d7754422430.1704908087.git.jani.nikula@intel.com
|
|
Fixes the below:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:645 amdgpu_gfx_enable_kcq() warn: inconsistent indenting
Cc: Le Ma <Le.Ma@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Le Ma <Le.Ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0.
Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC
and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter.
Using amdgpu_sriov_runtime to determine whether to access via kiq or
RLC is sufficient for now.
v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call
v4: avoid using amdgpu_sriov_w/rreg
v3: use W/RREG32_XCC to handle non-kiq case
v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters
of amdgpu_device_wreg/rreg
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Issues were reported with commit 1cfb4d612127
("drm/amdgpu: put MQDs in VRAM") on an ADLINK Ampere
Altra Developer Platform (AVA developer platform).
Various ARM systems seem to have problems related
to PCIe and MMIO access. In this case, I'm not sure
if this is specific to the ADLINK platform or ARM
in general. Seems to be some coherency issue with
VRAM. For now, just don't put MQDs in VRAM on ARM.
Link: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100453.html
Fixes: 1cfb4d612127 ("drm/amdgpu: put MQDs in VRAM")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: alexey.klimov@linaro.org
|
|
This is workaround, kiq ring test failed in suspend stage when do ras
recovery.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Use an inline function for version check. Gives more flexibility to
handle any format changes.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
DCN 3.1.4 is reported to hang on s2idle entry if graphics activity
is happening during entry. This is because GFXOFF was scheduled as
delayed but RLC gets disabled in s2idle entry sequence which will
hang GFX IP if not already in GFXOFF.
To help this problem, flush any delayed work for GFXOFF early in
s2idle entry sequence to ensure that it's off when RLC is changed.
commit 4b31b92b143f ("drm/amdgpu: complete gfxoff allow signal during
suspend without delay") modified power gating flow so that if called
in s0ix that it ensured that GFXOFF wasn't put in work queue but
instead processed immediately.
This is dead code due to commit 10cb67eb8a1b ("drm/amdgpu: skip
CG/PG for gfx during S0ix") because GFXOFF will now not be explicitly
called as part of the suspend entry code. Remove that dead code.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Tim Huang <tim.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Return -ENOMEM, when there is no sufficient dynamically allocated memory
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Return -ENOMEM, when there is no sufficient dynamically allocated memory
to create MQD backup for ring
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kcq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:497:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
497 | int j;
| ^
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_disable_kgq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:528:6: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
528 | int j;
| ^
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c: In function ‘amdgpu_gfx_enable_kgq’:
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c:630:12: warning: variable ‘j’ set but not used [-Wunused-but-set-variable]
630 | int r, i, j;
|
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This is introduced by the code merge and will let the
adev->gfx.kiq[0].ring struct being overrided
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
1. Need flush HDP for MQD putting in vram
2. Zero out mes MQD
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The common function can help reduce redundant code.
v2: remove xcp operation, only need to do RAS operations for all
instances.
v3: remove check for GFX RAS support, will be checked in higher level.
add amdgpu prefix for the function name.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Keep a helper function to get description of compute partition mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
GMC block handles memory related information, it makes more sense to
keep memory partition functions in gmc block.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It's not required to take lock on all cases while querying partition
mode. Querying partition mode during KFD init process doesn't need to
take a lock. Init process after a switch will already be happening under
lock. Control the behaviour by adding flags to xcp_query_partition_mode.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|