summaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
AgeCommit message (Collapse)Author
2024-04-09drm/amdgpu: update check condition for XGMI ACA UETao Zhou
Check more possible ext error codes. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20drm/amdgpu: retire unused aca_bank_report data structureYang Wang
retire unused aca_bank_report data structure. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20drm/amdgpu: refine aca error cache for xgmi v6.4.0Yang Wang
refine aca error cache for xgmi v6.4.0 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20drm/amdgpu: add new aca_smu_type supportYang Wang
Add new types to distinguish between ACA error type and smu mca type. e.g.: the ACA_ERROR_TYPE_DEFERRED is not matched any smu mca valid bank channel, so add new type 'aca_smu_type' to distinguish aca error type and smu mca type. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15drm/amdgpu: replace MCA macro with ACA for XGMIYang Wang
use new ACA macro to instead of MCA Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-15drm/amdgpu: add xgmi v6.4.0 ACA supportYang Wang
add xgmi v6.4.0 ACA driver support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-19drm/amdgpu: MCA supports recording umc address informationYiPeng Chai
MCA supports recording umc address information. V2: Move err_addr variable from struct ras_err_node to struct ras_err_info. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-12-13drm/amdgpu: xgmi_fill_topology_infoVignesh Chander
1. Use the mirrored topology info to fill links for VF. The new solution is required to simplify and optimize host driver logic. Only use the new solution for VFs that support full duplex and extended_peer_link_info otherwise the info would be incomplete. 2. avoid calling extended_link_info on VF as its not supported Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-by: Zhigang Luo <zhigang.luo@amd.com> Reviewed-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-17drm/amdgpu: expose the connected port num info through sysfsShiwu Zhang
By catting the xgmi_port_num sysfs node, it prints out the info in the format of <src node id>:<src port num> -> <dst node id>:<dst port num> for one xgmi link. For example, in case of 4 sockets fully and evenly connected setup, it would be like as below for the first node in the hive. 01:02 -> 02:03 01:03 -> 02:02 01:07 -> 03:04 01:04 -> 03:07 01:06 -> 04:05 01:05 -> 04:06 Based on the fact that there is two xgmi links between each socket pair, "01:02 -> 02:03" means that the current socket in question use the port 2 to connect with port 3 of the second node in the hive and so on. v2: print out the src/dst node id for each xgmi link (lijo) v3: replace the current_node++ with +1 to align with dst node (le) and use the dev_err instead of pr_err (lijo) v4: fix checkpatch warning (alex) Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09drm/amdgpu: add pcs xgmi v6.4.0 ras supportYang Wang
add pcs xgmi v6.4.0 ras support Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-09drm/amdgpu: Don't warn for unsupported set_xgmi_plpd_modeTao Zhou
set_xgmi_plpd_mode may be unsupported and this isn't error, no need to print warning for it. v2: add ret2 to save the status of psp_ras_trigger_error. Suggested-by: lijo.lazar@amd.com Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-07drm/amdgpu: add RAS reset/query operations for XGMI v6_4Tao Zhou
Reset/query RAS error status and count. v2: use XGMI IP version instead of WAFL version. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-20drm/amdgpu: replace reset_error_count with amdgpu_ras_reset_error_countTao Zhou
Simplify the code. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-28drm/amd/pm: deprecate allow_xgmi_power_down interfaceLe Ma
Replace with set_plpd_mode uniformly for places to use. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-26drm/amdgpu:Expose physical id of device in XGMI hiveMangesh Gadre
This identifies the physical ordering of devices in the hive v2: fix compilation issue Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-20drm/amdgpu: Use function for IP version checkLijo Lazar
Use an inline function for version check. Gives more flexibility to handle any format changes. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25drm/amdgpu: Add -ENOMEM error handling when there is no memorySrinivasan Shanmugam
Return -ENOMEM, when there is no sufficient dynamically allocated memory Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15drm/amdgpu: expose num_hops and num_links xgmi info through dev attrShiwu Zhang
Add these two dev attrs for xgmi info details which is helpful for developers checking the xgmi topology by catting the sys file directly. Take 4 cards with xgmi connection as an example, get the num_hops for each device or node through xmig_hive_info dir like, cat /sys/bus/pci/devices/0000:41:00.0/xgmi_hive_info/node1/num_hops will return "00 41 41 41" where "00" stands for the hops to node1 itself and "41" is the hops in hex format to every other node in the same hive. There are node1/node2/node3/node4 representing 4 cards in the hive. The same for num_links dev attr. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Acked-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09drm/amdgpu: add instance mask for RAS injectTao Zhou
User can specify injected instances by the mask. For backward compatibility, the mask value is incorporated into sub block index without interface change of RAS TA. User uses logical mask and driver should convert it to physical value before sending it to RAS TA. v2: update parameter name. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-31drm/amdgpu: Correct xgmi_wafl block nameHawking Zhang
Fix backward compatibility issue to stay with the old name of xgmi_wafl node. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-15drm/amdgpu: Rework xgmi_wafl_pcs ras sw_initHawking Zhang
To align with other IP blocks. Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Stanley Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-02-23drm/amdgpu: make kobj_type structures constantThomas Weißschuh
Since commit ee6d3dd4ed48 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17drm/amdgpu: correct query xgmi3x16 pcs error statusStanley.Yang
There is xgmi3x16 pcs error status for aldebaran, driver should check xgmi3x16 pcs error status field instead of gopx16 pcs error status field. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17drm/amdgpu: support check xgmi/walf error mask bit for aldebaranStanley.Yang
The pcs error count should be determined by PCS ERROR status and PCS ERROR MASK registers, only PCS ERROR status register can not refect error counts accurately. Changed from V1: remove clean noncorrectable mask registers optimize query pcs error status Changed from V2: remove check mask_value bits correct set value corresponding bit Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-29drm/amdgpu: Fix potential double free and null pointer dereferenceLiang He
In amdgpu_get_xgmi_hive(), we should not call kfree() after kobject_put() as the PUT will call kfree(). In amdgpu_device_ip_init(), we need to check the returned *hive* which can be NULL before we dereference it. Signed-off-by: Liang He <windhl@126.com> Reviewed-by: Luben Tuikov <luben.tuikov@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-19drm/amd/display: clean up some inconsistent indentingsYang Li
clean up some inconsistent indentings Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2178 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-13drm/amdgpu: Use per device reset_domain for XGMI on sriov configurationshaoyunl
For SRIOV configuration, host driver control the reset method(either FLR or heavier chain reset). The host will notify the guest individually with FLR message if individual GPU within the hive need to be reset. So for guest side, no need to use hive->reset_domain to replace the original per device reset_domain Signed-off-by: shaoyunl <shaoyun.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-25drm/amdgpu: skip set_topology_info for VFVignesh Chander
Skip set_topology_info as xgmi TA will now block it and host needs to program it. Signed-off-by: Vignesh Chander <Vignesh.Chander@amd.com> Reviewed-By : Shaoyun Liu <Shaoyun.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-08-22drm/amdgpu: Move psp_xgmi_terminate call from amdgpu_xgmi_remove_device to ↵YiPeng Chai
psp_hw_fini V1: The amdgpu_xgmi_remove_device function will send unload command to psp through psp ring to terminate xgmi, but psp ring has been destroyed in psp_hw_fini. V2: 1. Change the commit title. 2. Restore amdgpu_xgmi_remove_device to its original calling location. Move psp_xgmi_terminate call from amdgpu_xgmi_remove_device to psp_hw_fini. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15drm/amdgpu: drop xmgi23 error query/reset supportHawking Zhang
xgmi_ras is only initialized when host to GPU interface is PCIE. in such case, xgmi23 is disabled and protected by security firmware. Host access will results to security violation Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant .ras_fini initialization in some ras blocksyipechai
1. Define amdgpu_ras_block_late_fini_default in amdgpu_ras.c as .ras_fini common function, which is called when .ras_fini of ras block isn't initialized. 2. Remove the code of using amdgpu_ras_block_late_fini to initialize .ras_fini in ras blocks. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Remove redundant calls of amdgpu_ras_block_late_fini in xgmi ras ↵yipechai
block Remove redundant calls of amdgpu_ras_block_late_fini in xgmi ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Optimize xxx_ras_fini function of each ras blockyipechai
1. Move the variables of ras block instance members from specific xxx_ras_fini to general ras_fini call. 2. Function calls inside the modules only use parameters passed from xxx_ras_fini instead of ras block instance members. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-02drm/amdgpu: Modify .ras_fini function pointer parameteryipechai
Modify .ras_fini function pointer parameter so that we can remove redundant intermediate calls in some ras blocks. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-25Merge tag 'drm-misc-next-2022-02-23' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-next drm-misc-next for v5.18: UAPI Changes: Cross-subsystem Changes: - Split out panel-lvds and lvds dt bindings . - Put yes/no on/off disabled/enabled strings in linux/string_helpers.h and use it in drivers and tomoyo. - Clarify dma_fence_chain and dma_fence_array should never include eachother. - Flatten chains in syncobj's. - Don't double add in fbdev/defio when page is already enlisted. - Don't sort deferred-I/O pages by default in fbdev. Core Changes: - Fix missing pm_runtime_put_sync in bridge. - Set modifier support to only linear fb modifier if drivers don't advertise support. - As a result, we remove allow_fb_modifiers. - Add missing clear for EDID Deep Color Modes in drm_reset_display_info. - Assorted documentation updates. - Warn once in drm_clflush if there is no arch support. - Add missing select for dp helper in drm_panel_edp. - Assorted small fixes. - Improve fb-helper's clipping handling. - Don't dump shmem mmaps in a core dump. - Add accounting to ttm resource manager, and use it in amdgpu. - Allow querying the detected eDP panel through debugfs. - Add helpers for xrgb8888 to 8 and 1 bits gray. - Improve drm's buddy allocator. - Add selftests for the buddy allocator. Driver Changes: - Add support for nomodeset to a lot of drm drivers. - Use drm_module_*_driver in a lot of drm drivers. - Assorted small fixes to bridge/lt9611, v3d, vc4, vmwgfx, mxsfb, nouveau, bridge/dw-hdmi, panfrost, lima, ingenic, sprd, bridge/anx7625, ti-sn65dsi86. - Add bridge/it6505. - Create DP and DVI-I connectors in ast. - Assorted nouveau backlight fixes. - Rework amdgpu reset handling. - Add dt bindings for ingenic,jz4780-dw-hdmi. - Support reading edid through aux channel in ingenic. - Add a drm driver for Solomon SSD130x OLED displays. - Add simple support for sharp LQ140M1JW46. - Add more panels to nt35560. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/686ec871-e77f-c230-22e5-9e3bb80f064a@linux.intel.com
2022-02-17drm/amdgpu: Optimize xxx_ras_late_init function of each ras blockyipechai
1. Move calling ras block instance members from module internal function to the top calling xxx_ras_late_init. 2. Module internal function calls can only use parameter variables of xxx_ras_late_init instead of ras block instance members. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-17drm/amdgpu: Modify .ras_late_init function pointer parameteryipechai
Modify .ras_late_init function pointer parameter so that it can remove redundant intermediate calls in some ras blocks. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-14drm/amdgpu: Optimize amdgpu_xgmi_ras_late_init/amdgpu_xgmi_ras_fini function ↵yipechai
code Optimize amdgpu_xgmi_ras_late_init/amdgpu_xgmi_ras_fini function code. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-14drm/amdgpu: Optimize xxx_ras_late_init/xxx_ras_late_fini for each ras blockyipechai
1. Define amdgpu_ras_block_late_init to create sysfs nodes and interrupt handles. 2. Define amdgpu_ras_block_late_fini to remove sysfs nodes and interrupt handles. 3. Replace ras block variable members in struct amdgpu_ras_block_object with struct ras_common_if, which can make it easy to associate each ras block instance with each ras block functional interface. 4. Add .ras_cb to struct amdgpu_ras_block_object. 5. Change each ras block to fit for the changement of struct amdgpu_ras_block_object. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09drm/amdgpu: Rework reset domain to be refcounted.Andrey Grodzovsky
The reset domain contains register access semaphor now and so needs to be present as long as each device in a hive needs it and so it cannot be binded to XGMI hive life cycle. Adress this by making reset domain refcounted and pointed by each member of the hive and the hive itself. v4: Fix crash on boot witrh XGMI hive by adding type to reset_domain. XGMI will only create a new reset_domain if prevoius was of single device type meaning it's first boot. Otherwsie it will take a refocunt to exsiting reset_domain from the amdgou device. Add a wrapper around reset_domain->refcount get/put and a wrapper around send to reset wq (Lijo) Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74121.html
2022-02-09drm/amdgpu: Drop hive->in_resetAndrey Grodzovsky
Since we serialize all resets no need to protect from concurrent resets. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74115.html
2022-02-09drm/amdgpu: Introduce reset domainAndrey Grodzovsky
Defined a reset_domain struct such that all the entities that go through reset together will be serialized one against another. Do it for both single device and XGMI hive cases. Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Suggested-by: Christian König <ckoenig.leichtzumerken@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://www.spinics.net/lists/amd-gfx/msg74111.html
2022-01-18drm/amdgpu: Fix the code style warnings in hdp xgmi mca and umcyipechai
drm/amdgpu: Fix the code style warnings in hdp xgmi mca and umc: 1. WARNING: missing space after struct definition. 2. WARNING: please, no space before tabs. 3. WARNING: line length of xxx exceeds 100 columns. 4. ERROR: "foo* bar" should be "foo *bar". 5. ERROR: space required before the open parenthesis '('. 6. ERROR: space prohibited after that open parenthesis '('. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amdgpu: Adjust error inject function code style in amdgpu_ras.cyipechai
1. Move xgmi special error inject function from amdgpu_ras.c to xgmi block. 2. Support to use psp_ras_trigger_error as default error inject function in amdgpu_ras.c. If .ras_error_inject isn't defined in ras block, default error inject function will take effect. v2: squash in warning fix (Alex) Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-14drm/amdgpu: Modify xgmi block to fit for the unified ras block data and opsyipechai
1.Modify gmc block to fit for the unified ras block data and ops. 2.Change amdgpu_xgmi_ras_funcs to amdgpu_xgmi_ras, and the corresponding variable name remove _funcs suffix. 3.Remove the const flag of gmc ras variable so that gmc ras block can be able to be inserted into amdgpu device ras block link list. 4.Invoke amdgpu_ras_register_ras_block function to register gmc ras block into amdgpu device ras block link list. 5.Remove the redundant code about gmc in amdgpu_ras.c after using the unified ras block. Signed-off-by: yipechai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: John Clements <john.clements@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-11drm/amdgpu: use default_groups in kobj_typeGreg Kroah-Hartman
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the amdgpu sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jonathan Kim <jonathan.kim@amd.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: shaoyunl <shaoyun.liu@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13drm/amdgpu: check df_funcs and its callback pointersHawking Zhang
in case they are not avaiable in early phase Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-22drm/amd/amdgpu: fix potential memleakBernard Zhao
In function amdgpu_get_xgmi_hive, when kobject_init_and_add failed There is a potential memleak if not call kobject_put. Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Bernard Zhao <bernard@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-05drm/amdgpu: correct xgmi ras error count resetTao Zhou
The error count reset for xgmi3x16 pcs is missed. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-26drm/amdgpu: Add support for RAS XGMI err queryJohn Clements
Update XGMI RAS to support error query on aldebaran Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: John Clements <john.clements@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>