summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-11ALSA: hda/cs8409: Support new Dell Dolphin VariantsStefan Binding
Add 4 new Dell Dolphin Systems, same configuration as older systems. Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com> Link: https://lore.kernel.org/r/20230811123044.1045651-1-sbinding@opensource.cirrus.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2023-08-11nvme: core: don't hold rcu read lock in nvme_ns_chr_uring_cmd_iopollMing Lei
Now nvme_ns_chr_uring_cmd_iopoll() has switched to request based io polling, and the associated NS is guaranteed to be live in case of io polling, so request is guaranteed to be valid because blk-mq uses pre-allocated request pool. Remove the rcu read lock in nvme_ns_chr_uring_cmd_iopoll(), which isn't needed any more after switching to request based io polling. Fix "BUG: sleeping function called from invalid context" because set_page_dirty_lock() from blk_rq_unmap_user() may sleep. Fixes: 585079b6e425 ("nvme: wire up async polling for io passthrough commands") Reported-by: Guangwu Zhang <guazhang@redhat.com> Cc: Kanchan Joshi <joshi.k@samsung.com> Cc: Anuj Gupta <anuj20.g@samsung.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Guangwu Zhang <guazhang@redhat.com> Link: https://lore.kernel.org/r/20230809020440.174682-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11net: pcs: Add missing put_device call in miic_createXiang Yang
The reference of pdev->dev is taken by of_find_device_by_node, so it should be released when not need anymore. Fixes: 7dc54d3b8d91 ("net: pcs: add Renesas MII converter driver") Signed-off-by: Xiang Yang <xiangyang3@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-11virtio-net: set queues after driver_okJason Wang
Commit 25266128fe16 ("virtio-net: fix race between set queues and probe") tries to fix the race between set queues and probe by calling _virtnet_set_queues() before DRIVER_OK is set. This violates virtio spec. Fixing this by setting queues after virtio_device_ready(). Note that rtnl needs to be held for userspace requests to change the number of queues. So we are serialized in this way. Fixes: 25266128fe16 ("virtio-net: fix race between set queues and probe") Reported-by: Dragos Tatulea <dtatulea@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-11Merge tag 'amd-drm-fixes-6.5-2023-08-09' of ↵Dave Airlie
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.5-2023-08-09: amdgpu: - S/G display workaround for platforms with >= 64G of memory - S0i3 fix - SMU 13.0.0 fixes - Disable SMU 13.x OD features temporarily while the interface is reworked to enable additional functionality - Fix cursor gamma issues on DCN3+ - SMU 13.0.6 fixes - Fix possible UAF in CS IOCTL - Polaris display regression fix - Only enable CP GFX shadowing on SR-IOV amdkfd: - Raven/Picasso KFD regression fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230809182827.8135-1-alexander.deucher@amd.com
2023-08-11Merge tag 'drm-misc-fixes-2023-08-10' of ↵Dave Airlie
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Multiple fixes for nouveau around memory safety and DisplayPort, one fix to reduce the log level of rockchip, a power state fix for the it6505 bridge, a timing fix for the lt9611 bridge, a cache maintenance fix for ivpu and one to reset vma->vm_ops on mmap for shmem-helper. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/fwed6gzdtkse5ocrgd37elhyw7qirfptsvfp5mqqverdzifhxj@4da3vesxcqp2
2023-08-10cifs: fix potential oops in cifs_oplock_breakSteve French
With deferred close we can have closes that race with lease breaks, and so with the current checks for whether to send the lease response, oplock_response(), this can mean that an unmount (kill_sb) can occur just before we were checking if the tcon->ses is valid. See below: [Fri Aug 4 04:12:50 2023] RIP: 0010:cifs_oplock_break+0x1f7/0x5b0 [cifs] [Fri Aug 4 04:12:50 2023] Code: 7d a8 48 8b 7d c0 c0 e9 02 48 89 45 b8 41 89 cf e8 3e f5 ff ff 4c 89 f7 41 83 e7 01 e8 82 b3 03 f2 49 8b 45 50 48 85 c0 74 5e <48> 83 78 60 00 74 57 45 84 ff 75 52 48 8b 43 98 48 83 eb 68 48 39 [Fri Aug 4 04:12:50 2023] RSP: 0018:ffffb30607ddbdf8 EFLAGS: 00010206 [Fri Aug 4 04:12:50 2023] RAX: 632d223d32612022 RBX: ffff97136944b1e0 RCX: 0000000080100009 [Fri Aug 4 04:12:50 2023] RDX: 0000000000000001 RSI: 0000000080100009 RDI: ffff97136944b188 [Fri Aug 4 04:12:50 2023] RBP: ffffb30607ddbe58 R08: 0000000000000001 R09: ffffffffc08e0900 [Fri Aug 4 04:12:50 2023] R10: 0000000000000001 R11: 000000000000000f R12: ffff97136944b138 [Fri Aug 4 04:12:50 2023] R13: ffff97149147c000 R14: ffff97136944b188 R15: 0000000000000000 [Fri Aug 4 04:12:50 2023] FS: 0000000000000000(0000) GS:ffff9714f7c00000(0000) knlGS:0000000000000000 [Fri Aug 4 04:12:50 2023] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Fri Aug 4 04:12:50 2023] CR2: 00007fd8de9c7590 CR3: 000000011228e000 CR4: 0000000000350ef0 [Fri Aug 4 04:12:50 2023] Call Trace: [Fri Aug 4 04:12:50 2023] <TASK> [Fri Aug 4 04:12:50 2023] process_one_work+0x225/0x3d0 [Fri Aug 4 04:12:50 2023] worker_thread+0x4d/0x3e0 [Fri Aug 4 04:12:50 2023] ? process_one_work+0x3d0/0x3d0 [Fri Aug 4 04:12:50 2023] kthread+0x12a/0x150 [Fri Aug 4 04:12:50 2023] ? set_kthread_struct+0x50/0x50 [Fri Aug 4 04:12:50 2023] ret_from_fork+0x22/0x30 [Fri Aug 4 04:12:50 2023] </TASK> To fix this change the ordering of the checks before sending the oplock_response to first check if the openFileList is empty. Fixes: da787d5b7498 ("SMB3: Do not send lease break acknowledgment if all file handles have been closed") Suggested-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Bharath SM <bharathsm@microsoft.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2023-08-10Merge branch 'x86/bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipJakub Kicinski
Cross merge x86 fixes to fix clang linking errors: ld.lld: error: ./arch/x86/kernel/vmlinux.lds:221: at least one side of the expression must be absolute These will hopefully be downstream by the time we ship the next batch of fixes. * 'x86/bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Move gds_ucode_mitigated() declaration to header x86/speculation: Add cpu_show_gds() prototype driver core: cpu: Make cpu_show_not_affected() static x86/srso: Fix build breakage with the LLVM linker Documentation/srso: Document IBPB aspect and fix formatting driver core: cpu: Unify redundant silly stubs Documentation/hw-vuln: Unify filename specification in index Link: https://lore.kernel.org/all/CAHk-=wj_b+FGTnevQSBAtCWuhCk=0oQ_THvthBW2hzqpOTLFmg@mail.gmail.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10parisc: perf: Make cpu_device variable staticHelge Deller
Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-10parisc: ftrace: Add declaration for ftrace_function_trampoline()Helge Deller
Make sparse happy by adding declaration for ftrace_function_trampoline(). Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-10parisc: boot: Nuke some sparse warnings in decompressorHelge Deller
Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-10parisc: processor: Include asm/smp.h for init_per_cpu()Helge Deller
Fix sparse warning that init_per_cpu() isn't declared. Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-10parisc: unaligned: Include linux/sysctl.h for unaligned_enabledHelge Deller
Fix sparse warning that unaligned_enabled wasn't declared. Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-10parisc: Move proc_mckinley_root and proc_runway_root to sba_iommuHelge Deller
Clean up the procfs root entries for gsc, runway, and mckinley busses. Signed-off-by: Helge Deller <deller@gmx.de>
2023-08-10virtio-mem: check if the config changed before fake offlining memoryDavid Hildenbrand
If we repeatedly fail to fake offline memory to unplug it, we won't be sending any unplug requests to the device. However, we only check if the config changed when sending such (un)plug requests. We could end up trying for a long time to unplug memory, even though the config changed already and we're not supposed to unplug memory anymore. For example, the hypervisor might detect a low-memory situation while unplugging memory and decide to replug some memory. Continuing trying to unplug memory in that case can be problematic. So let's check on a more regular basis. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-5-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10virtio-mem: keep retrying on offline_and_remove_memory() errors in Sub Block ↵David Hildenbrand
Mode (SBM) In case offline_and_remove_memory() fails in SBM, we leave a completely unplugged Linux memory block stick around until we try plugging memory again. We won't try removing that memory block again. offline_and_remove_memory() may, for example, fail if we're racing with another alloc_contig_range() user, if allocating temporary memory fails, or if some memory notifier rejected the offlining request. Let's handle that case better, by simple retrying to offline and remove such memory. Tested using CONFIG_MEMORY_NOTIFIER_ERROR_INJECT. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-4-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10virtio-mem: convert most offline_and_remove_memory() errors to -EBUSYDavid Hildenbrand
Just like we do with alloc_contig_range(), let's convert all unknown errors to -EBUSY, but WARN so we can look into the issue. For example, offline_pages() could fail with -EINTR, which would be unexpected in our case. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-3-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10virtio-mem: remove unsafe unplug in Big Block Mode (BBM)David Hildenbrand
When "unsafe unplug" is enabled, we don't fake-offline all memory ahead of actual memory offlining using alloc_contig_range(). Instead, we rely on offline_pages() to also perform actual page migration, which might fail or take a very long time. In that case, it's possible to easily run into endless loops that cannot be aborted anymore (as offlining is triggered by a workqueue then): For example, a single (accidentally) permanently unmovable page in ZONE_MOVABLE results in an endless loop. For ZONE_NORMAL, races between isolating the pageblock (and checking for unmovable pages) and concurrent page allocation are possible and similarly result in endless loops. The idea of the unsafe unplug mode was to make it possible to more reliably unplug large memory blocks. However, (a) we really should be tackling that differently, by extending the alloc_contig_range()-based mechanism; and (b) this mode is not the default and as far as I know, it's unused either way. So let's simply get rid of it. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20230713145551.2824980-2-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10pds_vdpa: fix up debugfs feature bit printingShannon Nelson
Make clearer in debugfs output the difference between the hw feature bits, the features supported through the driver, and the features that have been negotiated. Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-6-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10pds_vdpa: alloc irq vectors on DRIVER_OKAllen Hubbe
We were allocating irq vectors at the time the aux dev was probed, but that is before the PCI VF is assigned to a separate iommu domain by vhost_vdpa. Because vhost_vdpa later changes the iommu domain the interrupts do not work. Instead, we can allocate the irq vectors later when we see DRIVER_OK and know that the reassignment of the PCI VF to an iommu domain has already happened. Fixes: 151cc834f3dd ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230711042437.69381-5-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10pds_vdpa: clean and reset vqs entriesShannon Nelson
Make sure that we initialize the vqs[] entries the same way both for initial setup and after a vq reset. Fixes: 151cc834f3dd ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-4-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10pds_vdpa: always allow offering VIRTIO_NET_F_MACShannon Nelson
Our driver sets a mac if the HW is 00:..:00 so we need to be sure to advertise VIRTIO_NET_F_MAC even if the HW doesn't. We also need to be sure that virtio_net sees the VIRTIO_NET_F_MAC and doesn't rewrite the mac address that a user may have set with the vdpa utility. After reading the hw_feature bits, add the VIRTIO_NET_F_MAC to the driver's supported_features and use that for reporting what is available. If the HW is not advertising it, be sure to strip the VIRTIO_NET_F_MAC before finishing the feature negotiation. If the user specifies a device_features bitpattern in the vdpa utility without the VIRTIO_NET_F_MAC set, then don't set the mac. Fixes: 151cc834f3dd ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-3-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10pds_vdpa: reset to vdpa specified macAllen Hubbe
When the vdpa device is reset, also reinitialize it with the mac address that was assigned when the device was added. Fixes: 151cc834f3dd ("pds_vdpa: add support for vdpa and vdpamgmt interfaces") Signed-off-by: Allen Hubbe <allen.hubbe@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230711042437.69381-2-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10virtio-net: Zero max_tx_vq field for VIRTIO_NET_CTRL_MQ_HASH_CONFIG caseHawkins Jiawei
Kernel uses `struct virtio_net_ctrl_rss` to save command-specific-data for both the VIRTIO_NET_CTRL_MQ_HASH_CONFIG and VIRTIO_NET_CTRL_MQ_RSS_CONFIG commands. According to the VirtIO standard, "Field reserved MUST contain zeroes. It is defined to make the structure to match the layout of virtio_net_rss_config structure, defined in 5.1.6.5.7.". Yet for the VIRTIO_NET_CTRL_MQ_HASH_CONFIG command case, the `max_tx_vq` field in struct virtio_net_ctrl_rss, which corresponds to the `reserved` field in struct virtio_net_hash_config, is not zeroed, thereby violating the VirtIO standard. This patch solves this problem by zeroing this field in virtnet_init_default_rss(). Cc: Andrew Melnychenko <andrew@daynix.com> Cc: stable@vger.kernel.org Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20230810110405.25558-1-yin31149@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10Merge tag 'net-6.5-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter, wireless and bpf. Still trending up in size but the good news is that the "current" regressions are resolved, AFAIK. We're getting weirdly many fixes for Wake-on-LAN and suspend/resume handling on embedded this week (most not merged yet), not sure why. But those are all for older bugs. Current release - regressions: - tls: set MSG_SPLICE_PAGES consistently when handing encrypted data over to TCP Current release - new code bugs: - eth: mlx5: correct IDs on VFs internal to the device (IPU) Previous releases - regressions: - phy: at803x: fix WoL support / reporting on AR8032 - bonding: fix incorrect deletion of ETH_P_8021AD protocol VID from slaves, leading to BUG_ON() - tun: prevent tun_build_skb() from exceeding the packet size limit - wifi: rtw89: fix 8852AE disconnection caused by RX full flags - eth/PCI: enetc: fix probing after 6fffbc7ae137 ("PCI: Honor firmware's device disabled status"), keep PCI devices around even if they are disabled / not going to be probed to be able to apply quirks on them - eth: prestera: fix handling IPv4 routes with nexthop IDs Previous releases - always broken: - netfilter: re-work garbage collection to avoid races between user-facing API and timeouts - tunnels: fix generating ipv4 PMTU error on non-linear skbs - nexthop: fix infinite nexthop bucket dump when using maximum nexthop ID - wifi: nl80211: fix integer overflow in nl80211_parse_mbssid_elems() Misc: - unix: use consistent error code in SO_PEERPIDFD - ipv6: adjust ndisc_is_useropt() to include PREFIX_INFO, in prep for upcoming IETF RFC" * tag 'net-6.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits) net: hns3: fix strscpy causing content truncation issue net: tls: set MSG_SPLICE_PAGES consistently ibmvnic: Ensure login failure recovery is safe from other resets ibmvnic: Do partial reset on login failure ibmvnic: Handle DMA unmapping of login buffs in release functions ibmvnic: Unmap DMA login rsp buffer on send login fail ibmvnic: Enforce stronger sanity checks on login response net: mana: Fix MANA VF unload when hardware is unresponsive netfilter: nf_tables: remove busy mark and gc batch API netfilter: nft_set_hash: mark set element as dead when deleting from packet path netfilter: nf_tables: adapt set backend to use GC transaction API netfilter: nf_tables: GC transaction API to avoid race with control plane selftests/bpf: Add sockmap test for redirecting partial skb data selftests/bpf: fix a CI failure caused by vsock sockmap test bpf, sockmap: Fix bug that strp_done cannot be called bpf, sockmap: Fix map type error in sock_map_del_link xsk: fix refcount underflow in error path ipv6: adjust ndisc_is_useropt() to also return true for PIO selftests: forwarding: bridge_mdb: Make test more robust selftests: forwarding: bridge_mdb_max: Fix failing test with old libnet ...
2023-08-10RDMA/bnxt_re: Initialize dpi_tbl_lock mutexKashyap Desai
Fix the missing dpi_tbl_lock mutex initialization. Fixes: 0ac20faf5d83 ("RDMA/bnxt_re: Reorg the bar mapping") Link: https://lore.kernel.org/r/1691642677-21369-4-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-10RDMA/bnxt_re: Fix error handling in probe failure pathKalesh AP
During bnxt_re_dev_init(), when bnxt_re_setup_chip_ctx() fails unregister with L2 first before bailing out probe. Fixes: ae8637e13185 ("RDMA/bnxt_re: Add chip context to identify 57500 series") Link: https://lore.kernel.org/r/1691642677-21369-3-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-10RDMA/bnxt_re: Properly order ib_device_unalloc() to avoid UAFSelvin Xavier
ib_dealloc_device() should be called only after device cleanup. Fix the dealloc sequence. Fixes: 6d758147c7b8 ("RDMA/bnxt_re: Use auxiliary driver interface") Link: https://lore.kernel.org/r/1691642677-21369-2-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-08-10vdpa/mlx5: Fix crash on shutdown for when no ndev existsDragos Tatulea
The ndev was accessed on shutdown without a check if it actually exists. This triggered the crash pasted below. Instead of doing the ndev check, delete the shutdown handler altogether. The irqs will be released at the parent VF level (mlx5_core). BUG: kernel NULL pointer dereference, address: 0000000000000300 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-rc2_for_upstream_min_debug_2023_07_17_15_05 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286 RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017 RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000 RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000 R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000 FS: 00007f4969e0be40(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die+0x20/0x60 ? page_fault_oops+0x14c/0x3c0 ? exc_page_fault+0x75/0x140 ? asm_exc_page_fault+0x22/0x30 ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa] device_shutdown+0x13e/0x1e0 kernel_restart+0x36/0x90 __do_sys_reboot+0x141/0x210 ? vfs_writev+0xcd/0x140 ? handle_mm_fault+0x161/0x260 ? do_writev+0x6b/0x110 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f496990fb56 RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56 RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8 </TASK> CR2: 0000000000000300 ---[ end trace 0000000000000000 ]--- Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230803152648.199297-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10vdpa/mlx5: Delete control vq iotlb in destroy_mr only when necessaryEugenio Pérez
mlx5_vdpa_destroy_mr can be called from .set_map with data ASID after the control virtqueue ASID iotlb has been populated. The control vq iotlb must not be cleared, since it will not be populated again. So call the ASID aware destroy function which makes sure that the right vq resource is destroyed. Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Message-Id: <20230802171231.11001-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10vdpa/mlx5: Fix mr->initialized semanticsDragos Tatulea
The mr->initialized flag is shared between the control vq and data vq part of the mr init/uninit. But if the control vq and data vq get placed in different ASIDs, it can happen that initializing the control vq will prevent the data vq mr from being initialized. This patch consolidates the control and data vq init parts into their own init functions. The mr->initialized will now be used for the data vq only. The control vq currently doesn't need a flag. The uninitializing part is also taken care of: mlx5_vdpa_destroy_mr got split into data and control vq functions which are now also ASID aware. Fixes: 8fcd20c30704 ("vdpa/mlx5: Support different address spaces for control and data") Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Message-Id: <20230802171231.11001-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10vdpa/mlx5: Correct default number of queues when MQ is onDragos Tatulea
The standard specifies that the initial number of queues is the default, which is 1 (1 tx, 1 rx). Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230727172354.68243-2-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com>
2023-08-10virtio-vdpa: Fix cpumask memory leak in virtio_vdpa_find_vqs()Gal Pressman
Free the cpumask allocated by create_affinity_masks() before returning from the function. Fixes: 3dad56823b53 ("virtio-vdpa: Support interrupt affinity spreading mechanism") Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20230726191036.14324-1-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-08-10vduse: Use proper spinlock for IRQ injectionMaxime Coquelin
The IRQ injection work used spin_lock_irq() to protect the scheduling of the softirq, but spin_lock_bh() should be used. With spin_lock_irq(), we noticed delay of more than 6 seconds between the time a NAPI polling work is scheduled and the time it is executed. Fixes: c8a6153b6c59 ("vduse: Introduce VDUSE - vDPA Device in Userspace") Cc: xieyongji@bytedance.com Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com> Message-Id: <20230705114505.63274-1-maxime.coquelin@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
2023-08-10vdpa: Enable strict validation for netlinks opsDragos Tatulea
The previous patches added the missing nla policies that were required for validation to work. Now strict validation on netlink ops can be enabled. This patch does it. Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Cc: stable@vger.kernel.org Message-Id: <20230727175757.73988-9-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10vdpa: Add max vqp attr to vdpa_nl_policy for nlattr length checkLin Ma
The vdpa_nl_policy structure is used to validate the nlattr when parsing the incoming nlmsg. It will ensure the attribute being described produces a valid nlattr pointer in info->attrs before entering into each handler in vdpa_nl_ops. That is to say, the missing part in vdpa_nl_policy may lead to illegal nlattr after parsing, which could lead to OOB read just like CVE-2023-3773. This patch adds the missing nla_policy for vdpa max vqp attr to avoid such bugs. Fixes: ad69dd0bf26b ("vdpa: Introduce query of device config layout") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernel.org Message-Id: <20230727175757.73988-7-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10vdpa: Add queue index attr to vdpa_nl_policy for nlattr length checkLin Ma
The vdpa_nl_policy structure is used to validate the nlattr when parsing the incoming nlmsg. It will ensure the attribute being described produces a valid nlattr pointer in info->attrs before entering into each handler in vdpa_nl_ops. That is to say, the missing part in vdpa_nl_policy may lead to illegal nlattr after parsing, which could lead to OOB read just like CVE-2023-3773. This patch adds the missing nla_policy for vdpa queue index attr to avoid such bugs. Fixes: 13b00b135665 ("vdpa: Add support for querying vendor statistics") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernelorg Message-Id: <20230727175757.73988-5-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10vdpa: Add features attr to vdpa_nl_policy for nlattr length checkLin Ma
The vdpa_nl_policy structure is used to validate the nlattr when parsing the incoming nlmsg. It will ensure the attribute being described produces a valid nlattr pointer in info->attrs before entering into each handler in vdpa_nl_ops. That is to say, the missing part in vdpa_nl_policy may lead to illegal nlattr after parsing, which could lead to OOB read just like CVE-2023-3773. This patch adds the missing nla_policy for vdpa features attr to avoid such bugs. Fixes: 90fea5a800c3 ("vdpa: device feature provisioning") Signed-off-by: Lin Ma <linma@zju.edu.cn> Cc: stable@vger.kernel.org Message-Id: <20230727175757.73988-3-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10virtio-pci: Fix legacy device flag setting error in probeFeng Liu
The 'is_legacy' flag is used to differentiate between legacy vs modern device. Currently, it is based on the value of vp_dev->ldev.ioaddr. However, due to the shared memory of the union between struct virtio_pci_legacy_device and struct virtio_pci_modern_device, when virtio_pci_modern_probe modifies the content of struct virtio_pci_modern_device, it affects the content of struct virtio_pci_legacy_device, and ldev.ioaddr is no longer zero, causing the 'is_legacy' flag to be set as true. To resolve issue, when legacy device is probed, mark 'is_legacy' as true, when modern device is probed, keep 'is_legacy' as false. Fixes: 4f0fc22534e3 ("virtio_pci: Optimize virtio_pci_device structure size") Signed-off-by: Feng Liu <feliu@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20230719154550.79536-1-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10MAINTAINERS: add vhost-scsi entry and myself as a co-maintainerMike Christie
I've been doing a lot of the development on vhost-scsi the last couple of years, so per Michael T's suggestion this adds me as co-maintainer. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230715142027.5572-1-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-10vhost-scsi: Rename vhost_scsi_iov_to_sglMike Christie
Rename vhost_scsi_iov_to_sgl to vhost_scsi_map_iov_to_sgl so it matches matches the naming style used for vhost_scsi_copy_iov_to_sgl. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230709202859.138387-3-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-10vhost-scsi: Fix alignment handling with windowsMike Christie
The linux block layer requires bios/requests to have lengths with a 512 byte alignment. Some drivers/layers like dm-crypt and the directi IO code will test for it and just fail. Other drivers like SCSI just assume the requirement is met and will end up in infinte retry loops. The problem for drivers like SCSI is that it uses functions like blk_rq_cur_sectors and blk_rq_sectors which divide the request's length by 512. If there's lefovers then it just gets dropped. But other code in the block/scsi layer may use blk_rq_bytes/blk_rq_cur_bytes and end up thinking there is still data left and try to retry the cmd. We can then end up getting stuck in retry loops where part of the block/scsi thinks there is data left, but other parts think we want to do IOs of zero length. Linux will always check for alignment, but windows will not. When vhost-scsi then translates the iovec it gets from a windows guest to a scatterlist, we can end up with sg items where the sg->length is not divisible by 512 due to the misaligned offset: sg[0].offset = 255; sg[0].length = 3841; sg... sg[N].offset = 0; sg[N].length = 255; When the lio backends then convert the SG to bios or other iovecs, we end up sending them with the same misaligned values and can hit the issues above. This just has us drop down to allocating a temp page and copying the data when we detect a misaligned buffer and the IO is large enough that it will get split into multiple bad IOs. Signed-off-by: Mike Christie <michael.christie@oracle.com> Message-Id: <20230709202859.138387-2-michael.christie@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-10pds_vdpa: protect Makefile from unconfigured debugfsShannon Nelson
debugfs.h protects itself from an undefined DEBUG_FS, so it is not necessary to check it in the driver code or the Makefile. The driver code had been updated for this, but the Makefile had missed the update. Link: https://lore.kernel.org/linux-next/fec68c3c-8249-7af4-5390-0495386a76f9@infradead.org/ Fixes: a16291b5bcbb ("pds_vdpa: Add new vDPA driver for AMD/Pensando DSC") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230706231718.54198-1-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Acked-by: Jason Wang <jasowang@redhat.com>
2023-08-10virtio-mmio: don't break lifecycle of vm_devWolfram Sang
vm_dev has a separate lifecycle because it has a 'struct device' embedded. Thus, having a release callback for it is correct. Allocating the vm_dev struct with devres totally breaks this protection, though. Instead of waiting for the vm_dev release callback, the memory is freed when the platform_device is removed. Resulting in a use-after-free when finally the callback is to be called. To easily see the problem, compile the kernel with CONFIG_DEBUG_KOBJECT_RELEASE and unbind with sysfs. The fix is easy, don't use devres in this case. Found during my research about object lifetime problems. Fixes: 7eb781b1bbb7 ("virtio_mmio: add cleanup for virtio_mmio_probe") Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Message-Id: <20230629120526.7184-1-wsa+renesas@sang-engineering.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-08-10net: hns3: fix strscpy causing content truncation issueHao Chen
hns3_dbg_fill_content()/hclge_dbg_fill_content() is aim to integrate some items to a string for content, and we add '\n' and '\0' in the last two bytes of content. strscpy() will add '\0' in the last byte of destination buffer(one of items), it result in finishing content print ahead of schedule and some dump content truncation. One Error log shows as below: cat mac_list/uc UC MAC_LIST: Expected: UC MAC_LIST: FUNC_ID MAC_ADDR STATE pf 00:2b:19:05:03:00 ACTIVE The destination buffer is length-bounded and not required to be NUL-terminated, so just change strscpy() to memcpy() to fix it. Fixes: 1cf3d5567f27 ("net: hns3: fix strncpy() not using dest-buf length as length issue") Signed-off-by: Hao Chen <chenhao418@huawei.com> Signed-off-by: Jijie Shao <shaojijie@huawei.com> Link: https://lore.kernel.org/r/20230809020902.1941471-1-shaojijie@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10net: tls: set MSG_SPLICE_PAGES consistentlyJakub Kicinski
We used to change the flags for the last segment, because non-last segments had the MSG_SENDPAGE_NOTLAST flag set. That flag is no longer a thing so remove the setting. Since flags most likely don't have MSG_SPLICE_PAGES set this avoids passing parts of the sg as splice and parts as non-splice. Before commit under Fixes we'd have called tcp_sendpage() which would add the MSG_SPLICE_PAGES. Why this leads to trouble remains unclear but Tariq reports hitting the WARN_ON(!sendpage_ok()) due to page refcount of 0. Fixes: e117dcfd646e ("tls: Inline do_tcp_sendpages()") Reported-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/all/4c49176f-147a-4283-f1b1-32aac7b4b996@gmail.com/ Tested-by: Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20230808180917.1243540-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10Merge tag 'dmaengine-fix-6.5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine Pull dmaengine fixes from Vinod Koul: - HAS_IOMEM fixes for fsl edma and intel idma - return-value fix, interrupt vector setting and typo fix for xilinx xdma - email updates for codeaurora email domain move - correct pause status for pl330 driver - idxd clear flag on disable fix - function documentation fix for owl dma - potential un-allocated memory fix for mcf driver * tag 'dmaengine-fix-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: xilinx: xdma: Fix typo dmaengine: xilinx: xdma: Fix interrupt vector setting dmaengine: owl-dma: Modify mismatched function name dmaengine: idxd: Clear PRS disable flag when disabling IDXD device dmaengine: pl330: Return DMA_PAUSED when transaction is paused dmaengine: qcom_hidma: Update codeaurora email domain dmaengine: mcf-edma: Fix a potential un-allocated memory access dmaengine: xilinx: xdma: Fix Judgment of the return value idmaengine: make FSL_EDMA and INTEL_IDMA64 depends on HAS_IOMEM
2023-08-10Merge tag 'nf-23-08-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The existing attempt to resolve races between control plane and GC work is error prone, as reported by Bien Pham <phamnnb@sea.com>, some places forgot to call nft_set_elem_mark_busy(), leading to double-deactivation of elements. This series contains the following patches: 1) Do not skip expired elements during walk otherwise elements might never decrement the reference counter on data, leading to memleak. 2) Add a GC transaction API to replace the former attempt to deal with races between control plane and GC. GC worker sets on NFT_SET_ELEM_DEAD_BIT on elements and it creates a GC transaction to remove the expired elements, GC transaction could abort in case of interference with control plane and retried later (GC async). Set backends such as rbtree and pipapo also perform GC from control plane (GC sync), in such case, element deactivation and removal is safe because mutex is held then collected elements are released via call_rcu(). 3) Adapt existing set backends to use the GC transaction API. 4) Update rhash set backend to set on _DEAD bit to report deleted elements from datapath for GC. 5) Remove old GC batch API and the NFT_SET_ELEM_BUSY_BIT. * tag 'nf-23-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: remove busy mark and gc batch API netfilter: nft_set_hash: mark set element as dead when deleting from packet path netfilter: nf_tables: adapt set backend to use GC transaction API netfilter: nf_tables: GC transaction API to avoid race with control plane netfilter: nf_tables: don't skip expired elements during walk ==================== Link: https://lore.kernel.org/r/20230810070830.24064-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Martin KaFai Lau says: ==================== pull-request: bpf 2023-08-09 We've added 5 non-merge commits during the last 7 day(s) which contain a total of 6 files changed, 102 insertions(+), 8 deletions(-). The main changes are: 1) A bpf sockmap memleak fix and a fix in accessing the programs of a sockmap under the incorrect map type from Xu Kuohai. 2) A refcount underflow fix in xsk from Magnus Karlsson. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add sockmap test for redirecting partial skb data selftests/bpf: fix a CI failure caused by vsock sockmap test bpf, sockmap: Fix bug that strp_done cannot be called bpf, sockmap: Fix map type error in sock_map_del_link xsk: fix refcount underflow in error path ==================== Link: https://lore.kernel.org/r/20230810055303.120917-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-10ibmvnic: Ensure login failure recovery is safe from other resetsNick Child
If a login request fails, the recovery process should be protected against parallel resets. It is a known issue that freeing and registering CRQ's in quick succession can result in a failover CRQ from the VIOS. Processing a failover during login recovery is dangerous for two reasons: 1. This will result in two parallel initialization processes, this can cause serious issues during login. 2. It is possible that the failover CRQ is received but never executed. We get notified of a pending failover through a transport event CRQ. The reset is not performed until a INIT CRQ request is received. Previously, if CRQ init fails during login recovery, then the ibmvnic irq is freed and the login process returned error. If failover_pending is true (a transport event was received), then the ibmvnic device would never be able to process the reset since it cannot receive the CRQ_INIT request due to the irq being freed. This leaved the device in a inoperable state. Therefore, the login failure recovery process must be hardened against these possible issues. Possible failovers (due to quick CRQ free and init) must be avoided and any issues during re-initialization should be dealt with instead of being propagated up the stack. This logic is similar to that of ibmvnic_probe(). Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20230809221038.51296-5-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>