Age | Commit message (Collapse) | Author |
|
amdgpu pr conconflicts due to patches cherry-picked to -fixes, I might
as well catch up with a backmerge and handle them all. Plus both misc
and intel maintainers asked for a backmerge anyway.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup, when the iommu is enabled:
kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
Fix this by using the non-coherent allocator instead, I think there
might be a better answer to this, but it involve ripping up some of
APIs using sg lists.
Cc: stable@vger.kernel.org
Fixes: 2541626cfb79 ("drm/nouveau/acr: use common falcon HS FW code for ACR FWs")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240815201923.632803-1-airlied@gmail.com
|
|
There's no good reason the ioremap() that results from nvif_object_map()
should fail, so add a check that the map succeeded, and remove the rd/wr
methods from display channel objects.
As this was the last user of rd/wr methods, the nvif plumbing is removed
at the same time.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-27-bskeggs@nvidia.com
|
|
The previous commit ensures the device is always mapped, so these
are unneeded.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-26-bskeggs@nvidia.com
|
|
Does nothing. Remove it.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-23-bskeggs@nvidia.com
|
|
This was once used by userspace tools (with nvkm built as a library),
but is now unused.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-22-bskeggs@nvidia.com
|
|
This is not, and has never, been used for anything. Remove it.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-21-bskeggs@nvidia.com
|
|
This was once used by userspace tools (with nvkm built as a library), as
a way to select a "default device".
The DRM code doesn't need this at all as clients only have access to a
single device already, so inherit the value from its parent.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-20-bskeggs@nvidia.com
|
|
These were a cludge used to prevent userspace's nvif ioctl from
accessing objects created by the kernel for the same client.
That interface was removed in a previous patch, so these are no
longer useful for anything.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-16-bskeggs@nvidia.com
|
|
Has been unused for a while now.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-14-bskeggs@nvidia.com
|
|
This reverts commit 52a6947bf576b97ff8e14bb0a31c5eaf2d0d96e2.
This causes loading failures in
[ 0.367379] nouveau 0000:01:00.0: NVIDIA GP104 (134000a1)
[ 0.474499] nouveau 0000:01:00.0: bios: version 86.04.50.80.13
[ 0.474620] nouveau 0000:01:00.0: pmu: firmware unavailable
[ 0.474977] nouveau 0000:01:00.0: fb: 8192 MiB GDDR5
[ 0.484371] nouveau 0000:01:00.0: sec2(acr): mbox 00000001 00000000
[ 0.484377] nouveau 0000:01:00.0: sec2(acr):load: boot failed: -5
[ 0.484379] nouveau 0000:01:00.0: acr: init failed, -5
[ 0.484466] nouveau 0000:01:00.0: init failed with -5
[ 0.484468] nouveau: DRM-master:00000000:00000080: init failed with -5
[ 0.484470] nouveau 0000:01:00.0: DRM-master: Device allocation failed: -5
[ 0.485078] nouveau 0000:01:00.0: probe with driver nouveau failed with error -50
I tried tracking it down but ran out of time this week, will revisit next week.
Reported-by: Dan Moulding <dan@danm.net>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a
BUG() on startup:
kernel BUG at include/linux/scatterlist.h:187!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30
Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019
RIP: 0010:sg_init_one+0x85/0xa0
Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54
24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b
0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00
RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000
RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508
R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018
FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0
Call Trace:
<TASK>
? die+0x36/0x90
? do_trap+0xdd/0x100
? sg_init_one+0x85/0xa0
? do_error_trap+0x65/0x80
? sg_init_one+0x85/0xa0
? exc_invalid_op+0x50/0x70
? sg_init_one+0x85/0xa0
? asm_exc_invalid_op+0x1a/0x20
? sg_init_one+0x85/0xa0
nvkm_firmware_ctor+0x14a/0x250 [nouveau]
nvkm_falcon_fw_ctor+0x42/0x70 [nouveau]
ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau]
r535_gsp_oneinit+0xb3/0x15f0 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? nvkm_udevice_new+0x95/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
? srso_return_thunk+0x5/0x5f
? ktime_get+0x47/0xb0
? srso_return_thunk+0x5/0x5f
nvkm_subdev_oneinit_+0x4f/0x120 [nouveau]
nvkm_subdev_init_+0x39/0x140 [nouveau]
? srso_return_thunk+0x5/0x5f
nvkm_subdev_init+0x44/0x90 [nouveau]
nvkm_device_init+0x166/0x2e0 [nouveau]
nvkm_udevice_init+0x47/0x70 [nouveau]
nvkm_object_init+0x41/0x1c0 [nouveau]
nvkm_ioctl_new+0x16a/0x290 [nouveau]
? __pfx_nvkm_client_child_new+0x10/0x10 [nouveau]
? __pfx_nvkm_udevice_new+0x10/0x10 [nouveau]
nvkm_ioctl+0x126/0x290 [nouveau]
nvif_object_ctor+0x112/0x190 [nouveau]
nvif_device_ctor+0x23/0x60 [nouveau]
nouveau_cli_init+0x164/0x640 [nouveau]
nouveau_drm_device_init+0x97/0x9e0 [nouveau]
? srso_return_thunk+0x5/0x5f
? pci_update_current_state+0x72/0xb0
? srso_return_thunk+0x5/0x5f
nouveau_drm_probe+0x12c/0x280 [nouveau]
? srso_return_thunk+0x5/0x5f
local_pci_probe+0x45/0xa0
pci_device_probe+0xc7/0x270
really_probe+0xe6/0x3a0
__driver_probe_device+0x87/0x160
driver_probe_device+0x1f/0xc0
__driver_attach+0xec/0x1f0
? __pfx___driver_attach+0x10/0x10
bus_for_each_dev+0x88/0xd0
bus_add_driver+0x116/0x220
driver_register+0x59/0x100
? __pfx_nouveau_drm_init+0x10/0x10 [nouveau]
do_one_initcall+0x5b/0x320
do_init_module+0x60/0x250
init_module_from_file+0x86/0xc0
idempotent_init_module+0x120/0x2b0
__x64_sys_finit_module+0x5e/0xb0
do_syscall_64+0x83/0x160
? srso_return_thunk+0x5/0x5f
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7feeb5cc20cd
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 1b cd 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffcf220b2c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055fdd2916aa0 RCX: 00007feeb5cc20cd
RDX: 0000000000000000 RSI: 000055fdd29161e0 RDI: 0000000000000035
RBP: 00007ffcf220b380 R08: 00007feeb5d8fb20 R09: 00007ffcf220b310
R10: 000055fdd2909dc0 R11: 0000000000000246 R12: 000055fdd29161e0
R13: 0000000000020000 R14: 000055fdd29203e0 R15: 000055fdd2909d80
</TASK>
We hit this when trying to initialize firmware of type
NVKM_FIRMWARE_IMG_DMA because we allocate our memory with
dma_alloc_coherent, and DMA allocations can't be turned back into memory
pages - which a scatterlist needs in order to map them.
So, fix this by allocating the memory with vmalloc instead().
V2:
* Fixup explanation as the prior one was bogus
Signed-off-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20240429182318.189668-1-lyude@redhat.com
|
|
It appears the client object tree has no locking unless I've missed
something else. Fix races around adding/removing client objects,
mostly vram bar mappings.
4562.099306] general protection fault, probably for non-canonical address 0x6677ed422bceb80c: 0000 [#1] PREEMPT SMP PTI
[ 4562.099314] CPU: 2 PID: 23171 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
[ 4562.099324] Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
[ 4562.099330] RIP: 0010:nvkm_object_search+0x1d/0x70 [nouveau]
[ 4562.099503] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 48 89 f8 48 85 f6 74 39 48 8b 87 a0 00 00 00 48 85 c0 74 12 <48> 8b 48 f8 48 39 ce 73 15 48 8b 40 10 48 85 c0 75 ee 48 c7 c0 fe
[ 4562.099506] RSP: 0000:ffffa94cc420bbf8 EFLAGS: 00010206
[ 4562.099512] RAX: 6677ed422bceb814 RBX: ffff98108791f400 RCX: ffff9810f26b8f58
[ 4562.099517] RDX: 0000000000000000 RSI: ffff9810f26b9158 RDI: ffff98108791f400
[ 4562.099519] RBP: ffff9810f26b9158 R08: 0000000000000000 R09: 0000000000000000
[ 4562.099521] R10: ffffa94cc420bc48 R11: 0000000000000001 R12: ffff9810f02a7cc0
[ 4562.099526] R13: 0000000000000000 R14: 00000000000000ff R15: 0000000000000007
[ 4562.099528] FS: 00007f629c5017c0(0000) GS:ffff98142c700000(0000) knlGS:0000000000000000
[ 4562.099534] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4562.099536] CR2: 00007f629a882000 CR3: 000000017019e004 CR4: 00000000003706f0
[ 4562.099541] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4562.099542] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4562.099544] Call Trace:
[ 4562.099555] <TASK>
[ 4562.099573] ? die_addr+0x36/0x90
[ 4562.099583] ? exc_general_protection+0x246/0x4a0
[ 4562.099593] ? asm_exc_general_protection+0x26/0x30
[ 4562.099600] ? nvkm_object_search+0x1d/0x70 [nouveau]
[ 4562.099730] nvkm_ioctl+0xa1/0x250 [nouveau]
[ 4562.099861] nvif_object_map_handle+0xc8/0x180 [nouveau]
[ 4562.099986] nouveau_ttm_io_mem_reserve+0x122/0x270 [nouveau]
[ 4562.100156] ? dma_resv_test_signaled+0x26/0xb0
[ 4562.100163] ttm_bo_vm_fault_reserved+0x97/0x3c0 [ttm]
[ 4562.100182] ? __mutex_unlock_slowpath+0x2a/0x270
[ 4562.100189] nouveau_ttm_fault+0x69/0xb0 [nouveau]
[ 4562.100356] __do_fault+0x32/0x150
[ 4562.100362] do_fault+0x7c/0x560
[ 4562.100369] __handle_mm_fault+0x800/0xc10
[ 4562.100382] handle_mm_fault+0x17c/0x3e0
[ 4562.100388] do_user_addr_fault+0x208/0x860
[ 4562.100395] exc_page_fault+0x7f/0x200
[ 4562.100402] asm_exc_page_fault+0x26/0x30
[ 4562.100412] RIP: 0033:0x9b9870
[ 4562.100419] Code: 85 a8 f7 ff ff 8b 8d 80 f7 ff ff 89 08 e9 18 f2 ff ff 0f 1f 84 00 00 00 00 00 44 89 32 e9 90 fa ff ff 0f 1f 84 00 00 00 00 00 <44> 89 32 e9 f8 f1 ff ff 0f 1f 84 00 00 00 00 00 66 44 89 32 e9 e7
[ 4562.100422] RSP: 002b:00007fff9ba2dc70 EFLAGS: 00010246
[ 4562.100426] RAX: 0000000000000004 RBX: 000000000dd65e10 RCX: 000000fff0000000
[ 4562.100428] RDX: 00007f629a882000 RSI: 00007f629a882000 RDI: 0000000000000066
[ 4562.100432] RBP: 00007fff9ba2e570 R08: 0000000000000000 R09: 0000000123ddf000
[ 4562.100434] R10: 0000000000000001 R11: 0000000000000246 R12: 000000007fffffff
[ 4562.100436] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 4562.100446] </TASK>
[ 4562.100448] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables libcrc32c nfnetlink cmac bnep sunrpc iwlmvm intel_rapl_msr intel_rapl_common snd_sof_pci_intel_cnl x86_pkg_temp_thermal intel_powerclamp snd_sof_intel_hda_common mac80211 coretemp snd_soc_acpi_intel_match kvm_intel snd_soc_acpi snd_soc_hdac_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof kvm snd_sof_utils snd_soc_core snd_hda_codec_realtek libarc4 snd_hda_codec_generic snd_compress snd_hda_ext_core vfat fat snd_hda_intel snd_intel_dspcfg irqbypass iwlwifi snd_hda_codec snd_hwdep snd_hda_core btusb btrtl mei_hdcp iTCO_wdt rapl mei_pxp btintel snd_seq iTCO_vendor_support btbcm snd_seq_device intel_cstate bluetooth snd_pcm cfg80211 intel_wmi_thunderbolt wmi_bmof intel_uncore snd_timer mei_me snd ecdh_generic i2c_i801
[ 4562.100541] ecc mei i2c_smbus soundcore rfkill intel_pch_thermal acpi_pad zram nouveau drm_ttm_helper ttm gpu_sched i2c_algo_bit drm_gpuvm drm_exec mxm_wmi drm_display_helper drm_kms_helper drm crct10dif_pclmul crc32_pclmul nvme e1000e crc32c_intel nvme_core ghash_clmulni_intel video wmi pinctrl_cannonlake ip6_tables ip_tables fuse
[ 4562.100616] ---[ end trace 0000000000000000 ]---
Signed-off-by: Dave Airlie <airlied@redhat.com>
Cc: stable@vger.kernel.org
|
|
This allows it to break the following circular locking dependency.
Aug 10 07:01:29 dg1test kernel: ======================================================
Aug 10 07:01:29 dg1test kernel: WARNING: possible circular locking dependency detected
Aug 10 07:01:29 dg1test kernel: 6.4.0-rc7+ #10 Not tainted
Aug 10 07:01:29 dg1test kernel: ------------------------------------------------------
Aug 10 07:01:29 dg1test kernel: wireplumber/2236 is trying to acquire lock:
Aug 10 07:01:29 dg1test kernel: ffff8fca5320da18 (&fctx->lock){-...}-{2:2}, at: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel:
but task is already holding lock:
Aug 10 07:01:29 dg1test kernel: ffff8fca41208610 (&event->list_lock#2){-...}-{2:2}, at: nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:
which lock already depends on the new lock.
Aug 10 07:01:29 dg1test kernel:
the existing dependency chain (in reverse order) is:
Aug 10 07:01:29 dg1test kernel:
-> #3 (&event->list_lock#2){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel: ga100_fifo_nonstall_intr+0x24/0x30 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel: __handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel: handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel: handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel: __common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel: common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel: asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel:
-> #2 (&device->intr.lock){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: nvkm_inth_allow+0x2c/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_state+0x181/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_allow+0x63/0xd0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_uevent_mthd+0x4d/0x70 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_ioctl+0x10b/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvif_object_mthd+0xa8/0x1f0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvif_event_allow+0x2a/0xa0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_fence_enable_signaling+0x78/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel: __dma_fence_enable_signaling+0x5e/0x100
Aug 10 07:01:29 dg1test kernel: dma_fence_add_callback+0x4b/0xd0
Aug 10 07:01:29 dg1test kernel: nouveau_cli_work_queue+0xae/0x110 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_gem_object_close+0x1d1/0x2a0 [nouveau]
Aug 10 07:01:29 dg1test kernel: drm_gem_handle_delete+0x70/0xe0 [drm]
Aug 10 07:01:29 dg1test kernel: drm_ioctl_kernel+0xa5/0x150 [drm]
Aug 10 07:01:29 dg1test kernel: drm_ioctl+0x256/0x490 [drm]
Aug 10 07:01:29 dg1test kernel: nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
Aug 10 07:01:29 dg1test kernel: __x64_sys_ioctl+0x91/0xd0
Aug 10 07:01:29 dg1test kernel: do_syscall_64+0x3c/0x90
Aug 10 07:01:29 dg1test kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
Aug 10 07:01:29 dg1test kernel:
-> #1 (&event->refs_lock#4){....}-{2:2}:
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_state+0x37/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy_allow+0x63/0xd0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_uevent_mthd+0x4d/0x70 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_ioctl+0x10b/0x250 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvif_object_mthd+0xa8/0x1f0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvif_event_allow+0x2a/0xa0 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_fence_enable_signaling+0x78/0x80 [nouveau]
Aug 10 07:01:29 dg1test kernel: __dma_fence_enable_signaling+0x5e/0x100
Aug 10 07:01:29 dg1test kernel: dma_fence_add_callback+0x4b/0xd0
Aug 10 07:01:29 dg1test kernel: nouveau_cli_work_queue+0xae/0x110 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_gem_object_close+0x1d1/0x2a0 [nouveau]
Aug 10 07:01:29 dg1test kernel: drm_gem_handle_delete+0x70/0xe0 [drm]
Aug 10 07:01:29 dg1test kernel: drm_ioctl_kernel+0xa5/0x150 [drm]
Aug 10 07:01:29 dg1test kernel: drm_ioctl+0x256/0x490 [drm]
Aug 10 07:01:29 dg1test kernel: nouveau_drm_ioctl+0x5a/0xb0 [nouveau]
Aug 10 07:01:29 dg1test kernel: __x64_sys_ioctl+0x91/0xd0
Aug 10 07:01:29 dg1test kernel: do_syscall_64+0x3c/0x90
Aug 10 07:01:29 dg1test kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
Aug 10 07:01:29 dg1test kernel:
-> #0 (&fctx->lock){-...}-{2:2}:
Aug 10 07:01:29 dg1test kernel: __lock_acquire+0x14e3/0x2240
Aug 10 07:01:29 dg1test kernel: lock_acquire+0xc8/0x2a0
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_client_event+0xf/0x20 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy+0x9b/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel: ga100_fifo_nonstall_intr+0x24/0x30 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel: __handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel: handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel: handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel: __common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel: common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel: asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel:
other info that might help us debug this:
Aug 10 07:01:29 dg1test kernel: Chain exists of:
&fctx->lock --> &device->intr.lock --> &event->list_lock#2
Aug 10 07:01:29 dg1test kernel: Possible unsafe locking scenario:
Aug 10 07:01:29 dg1test kernel: CPU0 CPU1
Aug 10 07:01:29 dg1test kernel: ---- ----
Aug 10 07:01:29 dg1test kernel: lock(&event->list_lock#2);
Aug 10 07:01:29 dg1test kernel: lock(&device->intr.lock);
Aug 10 07:01:29 dg1test kernel: lock(&event->list_lock#2);
Aug 10 07:01:29 dg1test kernel: lock(&fctx->lock);
Aug 10 07:01:29 dg1test kernel:
*** DEADLOCK ***
Aug 10 07:01:29 dg1test kernel: 2 locks held by wireplumber/2236:
Aug 10 07:01:29 dg1test kernel: #0: ffff8fca53177bf8 (&device->intr.lock){-...}-{2:2}, at: nvkm_intr+0x29/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel: #1: ffff8fca41208610 (&event->list_lock#2){-...}-{2:2}, at: nvkm_event_ntfy+0x50/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel:
stack backtrace:
Aug 10 07:01:29 dg1test kernel: CPU: 6 PID: 2236 Comm: wireplumber Not tainted 6.4.0-rc7+ #10
Aug 10 07:01:29 dg1test kernel: Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
Aug 10 07:01:29 dg1test kernel: Call Trace:
Aug 10 07:01:29 dg1test kernel: <TASK>
Aug 10 07:01:29 dg1test kernel: dump_stack_lvl+0x5b/0x90
Aug 10 07:01:29 dg1test kernel: check_noncircular+0xe2/0x110
Aug 10 07:01:29 dg1test kernel: __lock_acquire+0x14e3/0x2240
Aug 10 07:01:29 dg1test kernel: lock_acquire+0xc8/0x2a0
Aug 10 07:01:29 dg1test kernel: ? nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel: ? lock_acquire+0xc8/0x2a0
Aug 10 07:01:29 dg1test kernel: _raw_spin_lock_irqsave+0x4b/0x70
Aug 10 07:01:29 dg1test kernel: ? nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel: nouveau_fence_wait_uevent_handler+0x2b/0x100 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_client_event+0xf/0x20 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_event_ntfy+0x9b/0xf0 [nouveau]
Aug 10 07:01:29 dg1test kernel: ga100_fifo_nonstall_intr+0x24/0x30 [nouveau]
Aug 10 07:01:29 dg1test kernel: nvkm_intr+0x12c/0x240 [nouveau]
Aug 10 07:01:29 dg1test kernel: __handle_irq_event_percpu+0x88/0x240
Aug 10 07:01:29 dg1test kernel: handle_irq_event+0x38/0x80
Aug 10 07:01:29 dg1test kernel: handle_edge_irq+0xa3/0x240
Aug 10 07:01:29 dg1test kernel: __common_interrupt+0x72/0x160
Aug 10 07:01:29 dg1test kernel: common_interrupt+0x60/0xe0
Aug 10 07:01:29 dg1test kernel: asm_common_interrupt+0x26/0x40
Aug 10 07:01:29 dg1test kernel: RIP: 0033:0x7fb66174d700
Aug 10 07:01:29 dg1test kernel: Code: c1 e2 05 29 ca 8d 0c 10 0f be 07 84 c0 75 eb 89 c8 c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa e9 d7 0f fc ff 0f 1f 80 00 00 00 00 <f3> 0f 1e fa e9 c7 0f fc>
Aug 10 07:01:29 dg1test kernel: RSP: 002b:00007ffdd3c48438 EFLAGS: 00000206
Aug 10 07:01:29 dg1test kernel: RAX: 000055bb758763c0 RBX: 000055bb758752c0 RCX: 00000000000028b0
Aug 10 07:01:29 dg1test kernel: RDX: 000055bb758752c0 RSI: 000055bb75887490 RDI: 000055bb75862950
Aug 10 07:01:29 dg1test kernel: RBP: 00007ffdd3c48490 R08: 000055bb75873b10 R09: 0000000000000001
Aug 10 07:01:29 dg1test kernel: R10: 0000000000000004 R11: 000055bb7587f000 R12: 000055bb75887490
Aug 10 07:01:29 dg1test kernel: R13: 000055bb757f6280 R14: 000055bb758875c0 R15: 000055bb757f6280
Aug 10 07:01:29 dg1test kernel: </TASK>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Tested-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Danilo Krummrich <dakr@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231107053255.2257079-1-airlied@gmail.com
|
|
- preparation for GSP-RM, which has massive FW images
- based on a patch by Dave Airlie
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230918202149.4343-32-skeggsb@gmail.com
|
|
Will initially be used to tag some large grctx allocations which don't
need to be saved, to speedup suspend/resume.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Acked-by: Danilo Krummrich <me@dakr.org>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230919220442.202488-3-lyude@redhat.com
|
|
`strncpy` is deprecated for use on NUL-terminated destination strings [1].
We should prefer more robust and less ambiguous string interfaces.
A suitable replacement is `strscpy` [2] due to the fact that it guarantees
NUL-termination on the destination buffer without unnecessarily NUL-padding.
There is likely no bug in the current implementation due to the safeguard:
| cname[sizeof(cname) - 1] = '\0';
... however we can provide simpler and easier to understand code using
the newer (and recommended) `strscpy` api.
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230914-strncpy-drivers-gpu-drm-nouveau-nvkm-core-firmware-c-v1-1-3aeae46c032f@google.com
|
|
This can be completely normal in some situations (ie. non-stall intrs
when nothing is waiting on them).
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230525003106.3853741-2-skeggsb@gmail.com
|
|
Turns out, we're currently tearing down the disp core channel *before*
the satellite channels (wndw, etc) during suspend.
This makes RM return NV_ERR_NOT_SUPPORTED on attempting to reallocate
the core channel on resume for some reason, but we probably shouldn't
be doing it on HW either.
Tear down children in the reverse of allocation order instead.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230525003106.3853741-1-skeggsb@gmail.com
|
|
Missed some Tegra-specific quirks when reworking ACR to support Ampere.
Fixes: 2541626cfb79 ("drm/nouveau/acr: use common falcon HS FW code for ACR FWs")
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Tested-by: Diogo Ivo <diogo.ivo@tecnico.ulisboa.pt>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230130223715.1831509-3-bskeggs@redhat.com
|
|
Will be used to improve gr reset on GF100 and newer.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Adds context binding and support for FWs with a bootloader to the code
that was added to load VPR scrubber HS binaries, and ports ACR over to
using all of it.
- gv100 split from gp108 to handle FW exit status differences
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Adds the start of common interfaces to load and boot the HS binaries
provided by NVIDIA that enable the usage of GR.
ACR already handles most of this, but it's very much tied into ACR's
init process, and there's other code that could benefit from reusing
a lot of this stuff too (ie. VBIOS DEVINIT/PreOS, VPR scrubber).
The VPR scrubber code is fairly independent, and a good first target.
- adds better debug output to fw loading process, to ease bring-up/debug
v2:
- whitespace, 0->false
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Channel groups have somewhat more complicated requirements than what we
currently support. An engine context is shared between all channels in
a channel group, VEID/subctx support (later) brings per-VEID components,
and we need to track an individual channel's engine context pointers.
This commit adds the structures and refcounting to support the above,
wrapping the prior implementation for the moment.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Previously only available from Kepler onwards.
- also fixes the info() queries causing fifo init()/fini() unnecessarily
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
This wasn't really needed before; the main place this could race is with
channel recovery, but (through potentially fragile means) shouldn't have
been possible.
However, a number of upcoming patches benefit from having better control
over subdev init, necessitating some improvements here.
- allows subdev/engine oneinit() without init() (host/fifo patches)
- merges engine use locking/tracking into subdev, and extends it to fix
some issues that will arise with future usage patterns (acr patches)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
- new-style handlers can now be used here too
- decent clean-up
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
It's quite a lot of tedious and error-prone work to switch over all the
subdevs at once, so allow an nvkm_intr to request new-style handlers to
be created that wrap the existing interfaces.
This will allow a more gradual transition.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Turing adds a second top-level interrupt tree in HW, in addition to the
trees available via NV_PMC. Most of the interrupts we care about are
exposed in both trees, but not all of them, and we have some rather
nasty hacks to route the fault buffer interrupts.
Ampere removes the NV_PMC trees entirely.
Here we add some infrastructure to be able to handle all of this more
cleanly, as well as providing more explicit control over handlers.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Unifies the handling between PCI-based and Tegra GPUs, and makes more
explicit/obvious where device interrupts can be expected.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
The vblank and nonstall events have some annoying interactions with DRM
locking, and aren't able to do certain things as a result.
However, other uses of event notifications don't have such requirements,
and upcoming patches take advantage of this for various improvements.
Having separate classes for each nvkm_event's spinlocks allows lockdep
to distinguish between them and avoid false-positives.
v2: __always_inline + comment
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
This replaces the twisty, confusing, relationship between nvkm_event and
nvkm_notify with something much simpler, and less racey. It also places
events in the object tree hierarchy, which will allow a heap of the code
tracking events across allocation/teardown/suspend to be removed.
This commit just adds the new interfaces, and passes the owning subdev to
the event constructor to enable debug-tracing in the new code.
v2:
- use ?: (lyude)
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Fixes resume from hibernate failing on (at least) TU102, where cursor
channel init failed due to being performed before the core channel.
Not solid idea why suspend-to-ram worked, but, presumably HW being in
an entirely clean state has something to do with it.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Fix the following sparse warning:
drivers/gpu/drm/nouveau/nvkm/core/client.c:64:1: warning: symbol 'nvkm_uclient_sclass' was not declared. Should it be static?
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Link: https://gitlab.freedesktop.org/drm/nouveau/-/merge_requests/10
|
|
No longer required now that userspace can't touch anything that might
need it, and should fix DRM MM operations racing with each other, and
the random hangs/crashes that come with that.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|
|
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
|