Age | Commit message (Collapse) | Author |
|
Cacheline size is not available in IP discovery for gc943,gc944.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
|
|
Host drivers can create partial hives per guest by disabling xgmi sharing
between certain peers in the main hive.
Typically, these partial hives are fully connected per guest session.
In the event that the host makes a mistake by adding a non-shared node
to a guest session, have the KFD reflect sharing disabled by severing
the IO link.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Tested-by: James Yao <yiqing.yao@amd.com>
Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Populate cache line size info in topology based on information from IP
discovery table.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Sreekant Somasekharan <Sreekant.Somasekharan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable KFD for GC 11.5.2.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
kfd_create_vcrat_image_gpu itself checks the avail_size at the start.
So the value of avail_size is at least VCRAT_SIZE_FOR_GPU(16384),
minus struct crat_header(40UL) and struct crat_subtype_compute(40UL) it cannot be less than 0.
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Added GFX12 support to a few switch statements.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add gfx v9_4_4 ip block support
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable KFD for GC 11.5.1.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The KFD topology includes cache line size, but we have not been
filling that information out unless we are parsing a CRAT table.
Fill in this information for the devices where we have cache
information structs, and pipe this information to the topology
sysfs files.
v2: squash in fix from Joe (Alex)
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
GFX 9.4.3 uses a new version of the GC info table which
contains the cache info. This patch adds a new function
to populate the cache info from IP discovery for GFX 9.4.3.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
I think this was an abstraction back from when
kfd supported both radeon and amdgpu. Since we just
support amdgpu now, there is no more need for this and
we can use the amdgpu structures directly.
This also avoids having the kfd_cu_info structures on
the stack when inlining which can blow up the stack.
Cc: Arnd Bergmann <arnd@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Merge all developer debug options available as separated module
parameters in one, making it obvious that are for developers.
Drop the obsolete module options in favor of the new ones.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Currently, we store CU info only for a single XCC assuming
that it is the same for all XCCs. However, that may not be
true. As a result, store CU info for all XCCs. This info is
later used for CU masking.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Enable KFD for GC 11.5.0.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Now that we use the dGPU path for all APUs, drop the
IOMMUv2 support.
v2: drop the now unused queue manager functions for gfx7/8 APUs
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We are dropping the IOMMUv2 path, so no need to enable this.
It's often buggy on consumer platforms anyway.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Remove DUMMY_VRAM_SIZE as it is not needed and can result
in reporting incorrect memory size.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
It's pointless on laptops to look for the SRAT table as these are not
NUMA. Check the number of possible nodes is > 1 to decide whether to
look for SRAT.
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
We need to track memory usage on a per partition basis. To do
that, store the local memory information in KFD node instead
of kfd device.
v2: squash in fix ("amdkfd: Use mem_id to access mem_partition info")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[For 1P NPS1 mode driver bringup]
Changes required to initialize the amdgpu driver with frontdoor firmware
loading and discovery=2 with the native mode SBIOS that enables CPU GPU
unified interleaved memory.
sudo modprobe amdgpu discovery=2
Once PSP TMR region is reported via the ACPI interface, the dependency
on the ip_discovery.bin will be removed.
Choice of where to allocate driver table is given to each IP version. In
general, both GTT and VRAM domains will be considered. If one of the
tables has a strict restriction for VRAM domain, then only VRAM domain
is considered.
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
(lijo: Modified the handling for SMU Tables)
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
GFXIP 9.4.3 could be in APU or carveout mode but we cannot use the
xgmi.connected_to_cpu flag to identify the iolinks type. Use appropriate
APU or Carveout mode based condition to report xgmi connection in kfd
topology.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The PSP TA will only provide xGMI topology info for links between GPU
sockets so links between partitions from different sockets will be
hardcoded as 3 xGMI hops with 1 hops weighted as xGMI and 2 hops
weighted with a new intra-socket weight to indicate the longest
possible distance.
If the link between a partition and the CPU is non-PCIe, then assume
the CPU (CCDs) is located within the same socket as the partition
and represent the link as an intra-socket weighted single hop XGMI link
with memory bandwidth.
Links between partitions within a single socket will be abstracted as
single hop xGMI links weighted with the new intra-socket weight and
will have memory bandwidth.
Finally, use the unused function bits in the location ID to represent the
coordinates of the compute partition within its socket.
A follow on patch will resolve the requirement for GPU socket xGMI
link representation sometime later.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Introduce a new structure, kfd_node, which will now represent
a compute node. kfd_node is carved out of kfd_dev structure.
kfd_dev struct now will become the parent of kfd_node, and will
store common resources such as doorbells, GTT sub-alloctor etc.
kfd_node struct will store all resources specific to a compute
node, such as device queue manager, interrupt handling etc.
This is the first step in adding compute partition support in KFD.
v2: introduce kfd_node struct to gc v11 (Hawking)
v3: make reference to kfd_dev struct through kfd_node (Morris)
v4: use kfd_node instead for kfd isr/mqd functions (Morris)
v5: rebase (Alex)
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Morris Zhang <Shiwu.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add initial KFD support
Convert a few structures to IP version checking (Hawking)
Signed-off-by: Elena Sakhnovitch <elena.sakhnovitch@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add initial support for GC 11.0.4 in KFD compute driver.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix the memory overrun issue caused by wrong array size.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1527133 ("Memory - corruptions")
Fixes: c0cc999f3c32e6 ("drm/amdkfd: Fix the warning of array-index-out-of-bounds")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For some GPUs with more CUs, the original sibling_map[32]
in struct crat_subtype_cache is not enough
to save the cache information when create the VCRAT table,
so skip filling the struct crat_subtype_cache info instead
fill struct kfd_cache_properties directly to fix this problem.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
kfd_topology_device->cache_count is not used by
other fucntions, so remove it.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Cleanup kfd_dev struct by removing ddev and pdev as both
drm_device and pci_dev can be fetched from amdgpu_device.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This dummy cache info will enable kfd base function support.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
correct the cache information for gfx1036
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update the gfx1037 L1/L2 cache setting.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Delete the redundant word 'to'.
Signed-off-by: wangjianli <wangjianli@cdjrlc.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Added missing cases for GFX 11.0.3 code in a few switch statements.
Signed-off-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The field is redundant and does not serve any functional role
Signed-off-by: Ramesh Errabolu <Ramesh.Errabolu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Loading amdgpu on GC 10.3.7 shows an ERR level message:
`kfd kfd: amdgpu: GC IP 0a0307 not supported in kfd`
Add these targets to match yellow carp structures.
Reported-by: David Chang <david.chang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Jesse(Jie) Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 5.18.x
|
|
Add initial support for GC 11.0.1 in KFD compute driver.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Changes are inherited from GC 11.0.0.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add initial support for soc21 in KFD compute
driver (Mukul)
- Add new definition for soc21 device.
- Add new file for amdgpu-kfd interface for GFX11 family.
- Add new file for queue management, interrupt handling,
mqd management for GFX11 family in KFD driver.
- Related changes/updates for soc21 device in
KFD driver.
- Repurpose last 2 entries of SDMA MQD for driver use.
v2: Add an optional argument into update queue operation (Mukul)
v3: Switch to ip version check, replace kgd_dev with
amdgpu_device (Hawking)
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Rather than using hardcoded tables, we can use the gfx and
gmc config pulled from the IP discovery table to generate the
cache configuration.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[ 168.544078] ======================================================
[ 168.550309] WARNING: possible circular locking dependency detected
[ 168.556523] 5.16.0-kfd-fkuehlin #148 Tainted: G E
[ 168.562558] ------------------------------------------------------
[ 168.568764] kfdtest/3479 is trying to acquire lock:
[ 168.573672] ffffffffc0927a70 (&topology_lock){++++}-{3:3}, at:
kfd_topology_device_by_id+0x16/0x60 [amdgpu] [ 168.583663]
but task is already holding lock:
[ 168.589529] ffff97d303dee668 (&mm->mmap_lock#2){++++}-{3:3}, at:
vm_mmap_pgoff+0xa9/0x180 [ 168.597755]
which lock already depends on the new lock.
[ 168.605970]
the existing dependency chain (in reverse order) is:
[ 168.613487]
-> #3 (&mm->mmap_lock#2){++++}-{3:3}:
[ 168.619700] lock_acquire+0xca/0x2e0
[ 168.623814] down_read+0x3e/0x140
[ 168.627676] do_user_addr_fault+0x40d/0x690
[ 168.632399] exc_page_fault+0x6f/0x270
[ 168.636692] asm_exc_page_fault+0x1e/0x30
[ 168.641249] filldir64+0xc8/0x1e0
[ 168.645115] call_filldir+0x7c/0x110
[ 168.649238] ext4_readdir+0x58e/0x940
[ 168.653442] iterate_dir+0x16a/0x1b0
[ 168.657558] __x64_sys_getdents64+0x83/0x140
[ 168.662375] do_syscall_64+0x35/0x80
[ 168.666492] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 168.672095]
-> #2 (&type->i_mutex_dir_key#6){++++}-{3:3}:
[ 168.679008] lock_acquire+0xca/0x2e0
[ 168.683122] down_read+0x3e/0x140
[ 168.686982] path_openat+0x5b2/0xa50
[ 168.691095] do_file_open_root+0xfc/0x190
[ 168.695652] file_open_root+0xd8/0x1b0
[ 168.702010] kernel_read_file_from_path_initns+0xc4/0x140
[ 168.709542] _request_firmware+0x2e9/0x5e0
[ 168.715741] request_firmware+0x32/0x50
[ 168.721667] amdgpu_cgs_get_firmware_info+0x370/0xdd0 [amdgpu]
[ 168.730060] smu7_upload_smu_firmware_image+0x53/0x190 [amdgpu]
[ 168.738414] fiji_start_smu+0xcf/0x4e0 [amdgpu]
[ 168.745539] pp_dpm_load_fw+0x21/0x30 [amdgpu]
[ 168.752503] amdgpu_pm_load_smu_firmware+0x4b/0x80 [amdgpu]
[ 168.760698] amdgpu_device_fw_loading+0xb8/0x140 [amdgpu]
[ 168.768412] amdgpu_device_init.cold+0xdf6/0x1716 [amdgpu]
[ 168.776285] amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[ 168.784034] amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[ 168.791161] local_pci_probe+0x40/0x80
[ 168.797027] work_for_cpu_fn+0x10/0x20
[ 168.802839] process_one_work+0x273/0x5b0
[ 168.808903] worker_thread+0x20f/0x3d0
[ 168.814700] kthread+0x176/0x1a0
[ 168.819968] ret_from_fork+0x1f/0x30
[ 168.825563]
-> #1 (&adev->pm.mutex){+.+.}-{3:3}:
[ 168.834721] lock_acquire+0xca/0x2e0
[ 168.840364] __mutex_lock+0xa2/0x930
[ 168.846020] amdgpu_dpm_get_mclk+0x37/0x60 [amdgpu]
[ 168.853257] amdgpu_amdkfd_get_local_mem_info+0xba/0xe0 [amdgpu]
[ 168.861547] kfd_create_vcrat_image_gpu+0x1b1/0xbb0 [amdgpu]
[ 168.869478] kfd_create_crat_image_virtual+0x447/0x510 [amdgpu]
[ 168.877884] kfd_topology_add_device+0x5c8/0x6f0 [amdgpu]
[ 168.885556] kgd2kfd_device_init.cold+0x385/0x4c5 [amdgpu]
[ 168.893347] amdgpu_amdkfd_device_init+0x138/0x180 [amdgpu]
[ 168.901177] amdgpu_device_init.cold+0x141b/0x1716 [amdgpu]
[ 168.909025] amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
[ 168.916458] amdgpu_pci_probe+0x19b/0x3a0 [amdgpu]
[ 168.923442] local_pci_probe+0x40/0x80
[ 168.929249] work_for_cpu_fn+0x10/0x20
[ 168.935008] process_one_work+0x273/0x5b0
[ 168.940944] worker_thread+0x20f/0x3d0
[ 168.946623] kthread+0x176/0x1a0
[ 168.951765] ret_from_fork+0x1f/0x30
[ 168.957277]
-> #0 (&topology_lock){++++}-{3:3}:
[ 168.965993] check_prev_add+0x8f/0xbf0
[ 168.971613] __lock_acquire+0x1299/0x1ca0
[ 168.977485] lock_acquire+0xca/0x2e0
[ 168.982877] down_read+0x3e/0x140
[ 168.987975] kfd_topology_device_by_id+0x16/0x60 [amdgpu]
[ 168.995583] kfd_device_by_id+0xa/0x20 [amdgpu]
[ 169.002180] kfd_mmap+0x95/0x200 [amdgpu]
[ 169.008293] mmap_region+0x337/0x5a0
[ 169.013679] do_mmap+0x3aa/0x540
[ 169.018678] vm_mmap_pgoff+0xdc/0x180
[ 169.024095] ksys_mmap_pgoff+0x186/0x1f0
[ 169.029734] do_syscall_64+0x35/0x80
[ 169.035005] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 169.041754]
other info that might help us debug this:
[ 169.053276] Chain exists of:
&topology_lock --> &type->i_mutex_dir_key#6 --> &mm->mmap_lock#2
[ 169.068389] Possible unsafe locking scenario:
[ 169.076661] CPU0 CPU1
[ 169.082383] ---- ----
[ 169.088087] lock(&mm->mmap_lock#2);
[ 169.092922] lock(&type->i_mutex_dir_key#6);
[ 169.100975] lock(&mm->mmap_lock#2);
[ 169.108320] lock(&topology_lock);
[ 169.112957]
*** DEADLOCK ***
This commit fixes the deadlock warning by ensuring pm.mutex is not
held while holding the topology lock. For this, kfd_local_mem_info
is moved into the KFD dev struct and filled during device init.
This cached value can then be used instead of querying the value
again and again.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Currently, the IO-links to the device being removed from topology,
are not cleared. As a result, there would be dangling links left in
the KFD topology. This patch aims to fix the following:
1. Cleanup all IO links to the device being removed.
2. Ensure that node numbering in sysfs and nodes proximity domain
values are consistent after the device is removed:
a. Adding a device and removing a GPU device are made mutually
exclusive.
b. The global proximity domain counter is no longer required to be
an atomic counter. A normal 32-bit counter can be used instead.
3. Update generation_count to let user-mode know that topology has
changed due to device removal.
CC: Shuotao Xu <shuotaoxu@microsoft.com>
Reviewed-by: Shuotao Xu <shuotaoxu@microsoft.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
A bunch of errors and warnings are leftover KFD over the years, attempt
to fix the errors and most warnings reported by checkpatch tool. Still a
few warnings remain which may be false positives so ignore them for now.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Update the SPDX License header for all the KFD files.
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add basic support for GC 10.1.4,
it uses same IP blocks with GC 10.1.3
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
As the possible failure of the allocation, kmemdup() may return NULL
pointer.
Therefore, it should be better to check the 'props2' in order to prevent
the dereference of NULL pointer.
Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
asic_family was a duplicate of asic_type, both of type amd_asic_type.
Replace all instances of device_info->asic_family with adev->asic_type
and remove asic_family from device_info.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Switch to IP version checking instead of asic_type on various KFD
version checks.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Converts KFD switch statements to use IP version checking instead
of asic_type.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|