Merge tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm

Pull drm updates from Dave Airlie: "Highlights are support for privacy screens found in new laptops, a bunch of nomodeset refactoring, and i915 enables ADL-P systems by default, while starting to add RPL-S support. vmwgfx adds GEM and support for OpenGL 4.3 features in userspace. Lots of internal refactorings around dma reservations, and lots of driver refactoring as well. Summary: core: - add privacy screen support - move nomodeset option into drm subsystem - clean up nomodeset handling in drivers - make drm_irq.c legacy - fix stack_depot name conflicts - remove DMA_BUF_SET_NAME ioctl restrictions - sysfs: send hotplug event - replace several DRM_* logging macros with drm_* - move hashtable to legacy code - add error return from gem_create_object - cma-helper: improve interfaces, drop CONFIG_DRM_KMS_CMA_HELPER - kernel.h related include cleanups - support XRGB2101010 source buffers ttm: - don't include drm hashtable - stop pruning fences after wait - documentation updates dma-buf: - add dma_resv selftest - add debugfs helpers - remove dma_resv_get_excl_unlocked - documentation - make fences mandatory in dma_resv_add_excl_fence dp: - add link training delay helpers gem: - link shmem/cma helpers into separate modules - use dma_resv iteratior - import dma-buf namespace into gem helper modules scheduler: - fence grab fix - lockdep fixes bridge: - switch to managed MIPI DSI helpers - register and attach during probe fixes - convert to YAML in several places. panel: - add bunch of new panesl simpledrm: - support FB_DAMAGE_CLIPS - support virtual screen sizes - add Apple M1 support amdgpu: - enable seamless boot for DCN 3.01 - runtime PM fixes - use drm_kms_helper_connector_hotplug_event - get all fences at once - use generic drm fb helpers - PSR/DPCD/LTTPR/DSC/PM/RAS/OLED/SRIOV fixes - add smart trace buffer (STB) for supported GPUs - display debugfs entries - new SMU debug option - Documentation update amdkfd: - IP discovery enumeration refactor - interface between driver fixes - SVM fixes - kfd uapi header to define some sysfs bitfields. i915: - support VESA panel backlights - enable ADL-P by default - add eDP privacy screen support - add Raptor Lake S (RPL-S) support - DG2 page table support - lots of GuC/HuC fw refactoring - refactored i915->gt interfaces - CD clock squashing support - enable 10-bit gamma support - update ADL-P DMC fw to v2.14 - enable runtime PM autosuspend by default - ADL-P DSI support - per-lane DP drive settings for ICL+ - add support for pipe C/D DMC firmware - Atomic gamma LUT updates - remove CCS FB stride restrictions on ADL-P - VRR platform support for display 11 - add support for display audio codec keepalive - lots of display refactoring - fix runtime PM handling during PXP suspend - improved eviction performance with async TTM moves - async VMA unbinding improvements - VMA locking refactoring - improved error capture robustness - use per device iommu checks - drop bits stealing from i915_sw_fence function ptr - remove dma_resv_prune - add IC cache invalidation on DG2 nouveau: - crc fixes - validate LUTs in atomic check - set HDMI AVI RGB quant to full tegra: - buffer objects reworks for dma-buf compat - NVDEC driver uAPI support - power management improvements etnaviv: - IOMMU enabled system support - fix > 4GB command buffer mapping - close a DoS vector - fix spurious GPU resets ast: - fix i2c initialization rcar-du: - DSI output support exynos: - replace legacy gpio interface - implement generic GEM object mmap msm: - dpu plane state cleanup in prep for multirect - dpu debugfs cleanups - dp support for sc7280 - a506 support - removal of struct_mutex - remove old eDP sub-driver anx7625: - support MIPI DSI input - support HDMI audio - fix reading EDID lvds: - fix bridge DT bindings megachips: - probe both bridges before registering dw-hdmi: - allow interlace on bridge ps8640: - enable runtime PM - support aux-bus tx358768: - enable reference clock - add pulse mode support ti-sn65dsi86: - use regmap bulk write - add PWM support etnaviv: - get all fences at once gma500: - gem object cleanups kmb: - enable fb console radeon: - use dma_resv_wait_timeout rockchip: - add DSP hold timeout - suspend/resume fixes - PLL clock fixes - implement mmap in GEM object functions - use generic fbdev emulation sun4i: - use CMA helpers without vmap support vc4: - fix HDMI-CEC hang with display is off - power on HDMI controller while disabling - support 4K@60Hz modes - support 10-bit YUV 4:2:0 output vmwgfx: - fix leak on probe errors - fail probing on broken hosts - new placement for MOB page tables - hide internal BOs from userspace - implement GEM support - implement GL 4.3 support virtio: - overflow fixes xen: - implement mmap as GEM object function omapdrm: - fix scatterlist export - support virtual planes mediatek: - MT8192 support - CMDQ refinement" * tag 'drm-next-2022-01-07' of git://anongit.freedesktop.org/drm/drm: (1241 commits) drm/amdgpu: no DC support for headless chips drm/amd/display: fix dereference before NULL check drm/amdgpu: always reset the asic in suspend (v2) drm/amdgpu: put SMU into proper state on runpm suspending for BOCO capable platform drm/amd/display: Fix the uninitialized variable in enable_stream_features() drm/amdgpu: fix runpm documentation amdgpu/pm: Make sysfs pm attributes as read-only for VFs drm/amdgpu: save error count in RAS poison handler drm/amdgpu: drop redundant semicolon drm/amd/display: get and restore link res map drm/amd/display: support dynamic HPO DP link encoder allocation drm/amd/display: access hpo dp link encoder only through link resource drm/amd/display: populate link res in both detection and validation drm/amd/display: define link res and make it accessible to all link interfaces drm/amd/display: 3.2.167 drm/amd/display: [FW Promotion] Release 0.0.98 drm/amd/display: Undo ODM combine drm/amd/display: Add reg defs for DCN303 drm/amd/display: Changed pipe split policy to allow for multi-display pipe split drm/amd/display: Set optimize_pwr_state for DCN31 ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2022-01-10 12:58:46 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2022-01-10 12:58:46 -0800
commit: 8d0749b4f83bf4768ceae45ee6a79e6e7eddfc2a (patch)
tree: 069cc92e93982e0b921c09e71df6f7b68b4cbfa2 /drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
parent: bf4eebf8cfa2cd50e20b7321dfb3effdcdc6e909 (diff)
parent: cb6846fbb83b574c85c2a80211b402a6347b60b1 (diff)
1 files changed, 142 insertions, 43 deletions
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 694c3726e0f4..a8b08a72b71b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -30,6 +30,7 @@
 #include <linux/module.h>
 #include <linux/console.h>
 #include <linux/slab.h>
+#include <linux/iommu.h>
 
 #include <drm/drm_atomic_helper.h>
 #include <drm/drm_probe_helper.h>
@@ -331,7 +332,7 @@ void amdgpu_device_mm_access(struct amdgpu_device *adev, loff_t pos,
 }
 
 /**
- * amdgpu_device_vram_access - access vram by vram aperature
+ * amdgpu_device_aper_access - access vram by vram aperature
  *
  * @adev: amdgpu_device pointer
  * @pos: offset of the buffer in vram
@@ -550,11 +551,11 @@ void amdgpu_device_wreg(struct amdgpu_device *adev,
 	trace_amdgpu_device_wreg(adev->pdev->device, reg, v);
 }
 
-/*
+/**
  * amdgpu_mm_wreg_mmio_rlc -  write register either with mmio or with RLC path if in range
  *
  * this function is invoked only the debugfs register access
- * */
+ */
 void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev,
 			     uint32_t reg, uint32_t v)
 {
@@ -1100,7 +1101,7 @@ static void amdgpu_device_wb_fini(struct amdgpu_device *adev)
 }
 
 /**
- * amdgpu_device_wb_init- Init Writeback driver info and allocate memory
+ * amdgpu_device_wb_init - Init Writeback driver info and allocate memory
  *
  * @adev: amdgpu_device pointer
  *
@@ -2316,6 +2317,10 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 
 		/* need to do gmc hw init early so we can allocate gpu mem */
 		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
+			/* Try to reserve bad pages early */
+			if (amdgpu_sriov_vf(adev))
+				amdgpu_virt_exchange_data(adev);
+
 			r = amdgpu_device_vram_scratch_init(adev);
 			if (r) {
 				DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
@@ -2347,7 +2352,7 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	}
 
 	if (amdgpu_sriov_vf(adev))
-		amdgpu_virt_init_data_exchange(adev);
+		amdgpu_virt_exchange_data(adev);
 
 	r = amdgpu_ib_pool_init(adev);
 	if (r) {
@@ -2614,11 +2619,10 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
 	if (r)
 		DRM_ERROR("enable mgpu fan boost failed (%d).\n", r);
 
-	/* For XGMI + passthrough configuration on arcturus, enable light SBR */
-	if (adev->asic_type == CHIP_ARCTURUS &&
-	    amdgpu_passthrough(adev) &&
-	    adev->gmc.xgmi.num_physical_nodes > 1)
-		smu_set_light_sbr(&adev->smu, true);
+	/* For passthrough configuration on arcturus and aldebaran, enable special handling SBR */
+	if (amdgpu_passthrough(adev) && ((adev->asic_type == CHIP_ARCTURUS && adev->gmc.xgmi.num_physical_nodes > 1)||
+			       adev->asic_type == CHIP_ALDEBARAN ))
+		smu_handle_passthrough_sbr(&adev->smu, true);
 
 	if (adev->gmc.xgmi.num_physical_nodes > 1) {
 		mutex_lock(&mgpu_info.mutex);
@@ -2657,6 +2661,36 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
 	return 0;
 }
 
+/**
+ * amdgpu_device_smu_fini_early - smu hw_fini wrapper
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * For ASICs need to disable SMC first
+ */
+static void amdgpu_device_smu_fini_early(struct amdgpu_device *adev)
+{
+	int i, r;
+
+	if (adev->ip_versions[GC_HWIP][0] > IP_VERSION(9, 0, 0))
+		return;
+
+	for (i = 0; i < adev->num_ip_blocks; i++) {
+		if (!adev->ip_blocks[i].status.hw)
+			continue;
+		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_SMC) {
+			r = adev->ip_blocks[i].version->funcs->hw_fini((void *)adev);
+			/* XXX handle errors */
+			if (r) {
+				DRM_DEBUG("hw_fini of IP block <%s> failed %d\n",
+					  adev->ip_blocks[i].version->funcs->name, r);
+			}
+			adev->ip_blocks[i].status.hw = false;
+			break;
+		}
+	}
+}
+
 static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
 {
 	int i, r;
@@ -2677,21 +2711,8 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
 	amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
 	amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
 
-	/* need to disable SMC first */
-	for (i = 0; i < adev->num_ip_blocks; i++) {
-		if (!adev->ip_blocks[i].status.hw)
-			continue;
-		if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_SMC) {
-			r = adev->ip_blocks[i].version->funcs->hw_fini((void *)adev);
-			/* XXX handle errors */
-			if (r) {
-				DRM_DEBUG("hw_fini of IP block <%s> failed %d\n",
-					  adev->ip_blocks[i].version->funcs->name, r);
-			}
-			adev->ip_blocks[i].status.hw = false;
-			break;
-		}
-	}
+	/* Workaroud for ASICs need to disable SMC first */
+	amdgpu_device_smu_fini_early(adev);
 
 	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
 		if (!adev->ip_blocks[i].status.hw)
@@ -2733,8 +2754,6 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 	if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
 		amdgpu_virt_release_ras_err_handler_data(adev);
 
-	amdgpu_ras_pre_fini(adev);
-
 	if (adev->gmc.xgmi.num_physical_nodes > 1)
 		amdgpu_xgmi_remove_device(adev);
 
@@ -3373,6 +3392,22 @@ static int amdgpu_device_get_job_timeout_settings(struct amdgpu_device *adev)
 	return ret;
 }
 
+/**
+ * amdgpu_device_check_iommu_direct_map - check if RAM direct mapped to GPU
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * RAM direct mapped to GPU if IOMMU is not enabled or is pass through mode
+ */
+static void amdgpu_device_check_iommu_direct_map(struct amdgpu_device *adev)
+{
+	struct iommu_domain *domain;
+
+	domain = iommu_get_domain_for_dev(adev->dev);
+	if (!domain || domain->type == IOMMU_DOMAIN_IDENTITY)
+		adev->ram_is_direct_mapped = true;
+}
+
 static const struct attribute *amdgpu_dev_attributes[] = {
 	&dev_attr_product_name.attr,
 	&dev_attr_product_number.attr,
@@ -3547,6 +3582,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	if (r)
 		return r;
 
+	/* Need to get xgmi info early to decide the reset behavior*/
+	if (adev->gmc.xgmi.supported) {
+		r = adev->gfxhub.funcs->get_xgmi_info(adev);
+		if (r)
+			return r;
+	}
+
 	/* enable PCIE atomic ops */
 	if (amdgpu_sriov_vf(adev))
 		adev->have_atomics_support = ((struct amd_sriov_msg_pf2vf_info *)
@@ -3693,8 +3735,6 @@ fence_driver_init:
 	/* Get a log2 for easy divisions. */
 	adev->mm_stats.log2_max_MBps = ilog2(max(1u, max_MBps));
 
-	amdgpu_fbdev_init(adev);
-
 	r = amdgpu_pm_sysfs_init(adev);
 	if (r) {
 		adev->pm_sysfs_en = false;
@@ -3778,6 +3818,8 @@ fence_driver_init:
 		queue_delayed_work(system_wq, &mgpu_info.delayed_reset_work,
 				   msecs_to_jiffies(AMDGPU_RESUME_MS));
 
+	amdgpu_device_check_iommu_direct_map(adev);
+
 	return 0;
 
 release_ras_con:
@@ -3811,7 +3853,7 @@ static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
 }
 
 /**
- * amdgpu_device_fini - tear down the driver
+ * amdgpu_device_fini_hw - tear down the driver
  *
  * @adev: amdgpu_device pointer
  *
@@ -3852,17 +3894,21 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 		amdgpu_ucode_sysfs_fini(adev);
 	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
 
-	amdgpu_fbdev_fini(adev);
+	/* disable ras feature must before hw fini */
+	amdgpu_ras_pre_fini(adev);
 
 	amdgpu_device_ip_fini_early(adev);
 
 	amdgpu_irq_fini_hw(adev);
 
-	ttm_device_clear_dma_mappings(&adev->mman.bdev);
+	if (adev->mman.initialized)
+		ttm_device_clear_dma_mappings(&adev->mman.bdev);
 
 	amdgpu_gart_dummy_page_fini(adev);
 
-	amdgpu_device_unmap_mmio(adev);
+	if (drm_dev_is_unplugged(adev_to_drm(adev)))
+		amdgpu_device_unmap_mmio(adev);
+
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
@@ -3948,7 +3994,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
 	drm_kms_helper_poll_disable(dev);
 
 	if (fbcon)
-		amdgpu_fbdev_set_suspend(adev, 1);
+		drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
 
 	cancel_delayed_work_sync(&adev->delayed_init_work);
 
@@ -4025,7 +4071,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool fbcon)
 	flush_delayed_work(&adev->delayed_init_work);
 
 	if (fbcon)
-		amdgpu_fbdev_set_suspend(adev, 0);
+		drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, false);
 
 	drm_kms_helper_poll_enable(dev);
 
@@ -4294,6 +4340,9 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
 				     bool from_hypervisor)
 {
 	int r;
+	struct amdgpu_hive_info *hive = NULL;
+
+	amdgpu_amdkfd_pre_reset(adev);
 
 	amdgpu_amdkfd_pre_reset(adev);
 
@@ -4322,9 +4371,19 @@ static int amdgpu_device_reset_sriov(struct amdgpu_device *adev,
 	if (r)
 		goto error;
 
-	amdgpu_irq_gpu_reset_resume_helper(adev);
-	r = amdgpu_ib_ring_tests(adev);
-	amdgpu_amdkfd_post_reset(adev);
+	hive = amdgpu_get_xgmi_hive(adev);
+	/* Update PSP FW topology after reset */
+	if (hive && adev->gmc.xgmi.num_physical_nodes > 1)
+		r = amdgpu_xgmi_update_topology(hive, adev);
+
+	if (hive)
+		amdgpu_put_xgmi_hive(hive);
+
+	if (!r) {
+		amdgpu_irq_gpu_reset_resume_helper(adev);
+		r = amdgpu_ib_ring_tests(adev);
+		amdgpu_amdkfd_post_reset(adev);
+	}
 
 error:
 	if (!r && adev->virt.gim_feature & AMDGIM_FEATURE_GIM_FLR_VRAMLOST) {
@@ -4650,7 +4709,7 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 				if (r)
 					goto out;
 
-				amdgpu_fbdev_set_suspend(tmp_adev, 0);
+				drm_fb_helper_set_suspend_unlocked(adev_to_drm(tmp_adev)->fb_helper, false);
 
 				/*
 				 * The GPU enters bad state once faulty pages
@@ -4749,7 +4808,7 @@ static int amdgpu_device_lock_hive_adev(struct amdgpu_device *adev, struct amdgp
 {
 	struct amdgpu_device *tmp_adev = NULL;
 
-	if (adev->gmc.xgmi.num_physical_nodes > 1) {
+	if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1)) {
 		if (!hive) {
 			dev_err(adev->dev, "Hive is NULL while device has multiple xgmi nodes");
 			return -ENODEV;
@@ -4961,7 +5020,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	 * We always reset all schedulers for device and all devices for XGMI
 	 * hive so that should take care of them too.
 	 */
-	hive = amdgpu_get_xgmi_hive(adev);
+	if (!amdgpu_sriov_vf(adev))
+		hive = amdgpu_get_xgmi_hive(adev);
 	if (hive) {
 		if (atomic_cmpxchg(&hive->in_reset, 0, 1) != 0) {
 			DRM_INFO("Bailing on TDR for s_job:%llx, hive: %llx as another already in progress",
@@ -5002,7 +5062,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 	 * to put adev in the 1st position.
 	 */
 	INIT_LIST_HEAD(&device_list);
-	if (adev->gmc.xgmi.num_physical_nodes > 1) {
+	if (!amdgpu_sriov_vf(adev) && (adev->gmc.xgmi.num_physical_nodes > 1)) {
 		list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head)
 			list_add_tail(&tmp_adev->reset_list, &device_list);
 		if (!list_is_first(&adev->reset_list, &device_list))
@@ -5041,7 +5101,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
 		 */
 		amdgpu_unregister_gpu_instance(tmp_adev);
 
-		amdgpu_fbdev_set_suspend(tmp_adev, 1);
+		drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
 
 		/* disable ras on ALL IPs */
 		if (!need_emergency_restart &&
@@ -5636,3 +5696,42 @@ void amdgpu_device_invalidate_hdp(struct amdgpu_device *adev,
 
 	amdgpu_asic_invalidate_hdp(adev, ring);
 }
+
+/**
+ * amdgpu_device_halt() - bring hardware to some kind of halt state
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Bring hardware to some kind of halt state so that no one can touch it
+ * any more. It will help to maintain error context when error occurred.
+ * Compare to a simple hang, the system will keep stable at least for SSH
+ * access. Then it should be trivial to inspect the hardware state and
+ * see what's going on. Implemented as following:
+ *
+ * 1. drm_dev_unplug() makes device inaccessible to user space(IOCTLs, etc),
+ *    clears all CPU mappings to device, disallows remappings through page faults
+ * 2. amdgpu_irq_disable_all() disables all interrupts
+ * 3. amdgpu_fence_driver_hw_fini() signals all HW fences
+ * 4. set adev->no_hw_access to avoid potential crashes after setp 5
+ * 5. amdgpu_device_unmap_mmio() clears all MMIO mappings
+ * 6. pci_disable_device() and pci_wait_for_pending_transaction()
+ *    flush any in flight DMA operations
+ */
+void amdgpu_device_halt(struct amdgpu_device *adev)
+{
+	struct pci_dev *pdev = adev->pdev;
+	struct drm_device *ddev = adev_to_drm(adev);
+
+	drm_dev_unplug(ddev);
+
+	amdgpu_irq_disable_all(adev);
+
+	amdgpu_fence_driver_hw_fini(adev);
+
+	adev->no_hw_access = true;
+
+	amdgpu_device_unmap_mmio(adev);
+
+	pci_disable_device(pdev);
+	pci_wait_for_pending_transaction(pdev);
+}
author	Linus Torvalds <torvalds@linux-foundation.org>	2022-01-10 12:58:46 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2022-01-10 12:58:46 -0800
commit	8d0749b4f83bf4768ceae45ee6a79e6e7eddfc2a (patch)
tree	069cc92e93982e0b921c09e71df6f7b68b4cbfa2 /drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
parent	bf4eebf8cfa2cd50e20b7321dfb3effdcdc6e909 (diff)
parent	cb6846fbb83b574c85c2a80211b402a6347b60b1 (diff)