summaryrefslogtreecommitdiff
path: root/drivers/virtio
AgeCommit message (Collapse)Author
2024-12-02module: Convert symbol namespace to string literalPeter Zijlstra
Clean up the existing export namespace code along the same lines of commit 33def8498fdd ("treewide: Convert macro and uses of __section(foo) to __section("foo")") and for the same reason, it is not desired for the namespace argument to be a macro expansion itself. Scripted using git grep -l -e MODULE_IMPORT_NS -e EXPORT_SYMBOL_NS | while read file; do awk -i inplace ' /^#define EXPORT_SYMBOL_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /^#define MODULE_IMPORT_NS/ { gsub(/__stringify\(ns\)/, "ns"); print; next; } /MODULE_IMPORT_NS/ { $0 = gensub(/MODULE_IMPORT_NS\(([^)]*)\)/, "MODULE_IMPORT_NS(\"\\1\")", "g"); } /EXPORT_SYMBOL_NS/ { if ($0 ~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+),/) { if ($0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/ && $0 !~ /(EXPORT_SYMBOL_NS[^(]*)\(\)/ && $0 !~ /^my/) { getline line; gsub(/[[:space:]]*\\$/, ""); gsub(/[[:space:]]/, "", line); $0 = $0 " " line; } $0 = gensub(/(EXPORT_SYMBOL_NS[^(]*)\(([^,]+), ([^)]+)\)/, "\\1(\\2, \"\\3\")", "g"); } } { print }' $file; done Requested-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://mail.google.com/mail/u/2/#inbox/FMfcgzQXKWgMmjdFwwdsfgxzKpVHWPlc Acked-by: Greg KH <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-12-01Get rid of 'remove_new' relic from platform driver structLinus Torvalds
The continual trickle of small conversion patches is grating on me, and is really not helping. Just get rid of the 'remove_new' member function, which is just an alias for the plain 'remove', and had a comment to that effect: /* * .remove_new() is a relic from a prototype conversion of .remove(). * New drivers are supposed to implement .remove(). Once all drivers are * converted to not use .remove_new any more, it will be dropped. */ This was just a tree-wide 'sed' script that replaced '.remove_new' with '.remove', with some care taken to turn a subsequent tab into two tabs to make things line up. I did do some minimal manual whitespace adjustment for places that used spaces to line things up. Then I just removed the old (sic) .remove_new member function, and this is the end result. No more unnecessary conversion noise. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-11-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "A small number of improvements all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_vdpa: remove redundant check on desc virtio_fs: store actual queue index in mq_map virtio_fs: add informative log for new tag discovery virtio: Make vring_new_virtqueue support packed vring virtio_pmem: Add freeze/restore callbacks vdpa/mlx5: Fix suboptimal range on iotlb iteration
2024-11-27Merge tag 'vfio-v6.13-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO updates from Alex Williamson: - Constify an unmodified structure used in linking vfio and kvm (Christophe JAILLET) - Add ID for an additional hardware SKU supported by the nvgrace-gpu vfio-pci variant driver (Ankit Agrawal) - Fix incorrect signed cast in QAT vfio-pci variant driver, negating test in check_add_overflow(), though still caught by later tests (Giovanni Cabiddu) - Additional debugfs attributes exposed in hisi_acc vfio-pci variant driver for migration debugging (Longfang Liu) - Migration support is added to the virtio vfio-pci variant driver, becoming the primary feature of the driver while retaining emulation of virtio legacy support as a secondary option (Yishai Hadas) - Fixes to a few unwind flows in the mlx5 vfio-pci driver discovered through reviews of the virtio variant driver (Yishai Hadas) - Fix an unlikely issue where a PCI device exposed to userspace with an unknown capability at the base of the extended capability chain can overflow an array index (Avihai Horon) * tag 'vfio-v6.13-rc1' of https://github.com/awilliam/linux-vfio: vfio/pci: Properly hide first-in-list PCIe extended capability vfio/mlx5: Fix unwind flows in mlx5vf_pci_save/resume_device_data() vfio/mlx5: Fix an unwind issue in mlx5vf_add_migration_pages() vfio/virtio: Enable live migration once VIRTIO_PCI was configured vfio/virtio: Add PRE_COPY support for live migration vfio/virtio: Add support for the basic live migration functionality virtio-pci: Introduce APIs to execute device parts admin commands virtio: Manage device and driver capabilities via the admin commands virtio: Extend the admin command to include the result size virtio_pci: Introduce device parts access commands Documentation: add debugfs description for hisi migration hisi_acc_vfio_pci: register debugfs for hisilicon migration driver hisi_acc_vfio_pci: create subfunction for data reading hisi_acc_vfio_pci: extract public functions for container_of vfio/qat: fix overflow check in qat_vf_resume_write() vfio/nvgrace-gpu: Add a new GH200 SKU to the devid table kvm/vfio: Constify struct kvm_device_ops
2024-11-21Merge tag 'net-next-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "The most significant set of changes is the per netns RTNL. The new behavior is disabled by default, regression risk should be contained. Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its default value from PTP_1588_CLOCK_KVM, as the first is intended to be a more reliable replacement for the latter. Core: - Started a very large, in-progress, effort to make the RTNL lock scope per network-namespace, thus reducing the lock contention significantly in the containerized use-case, comprising: - RCU-ified some relevant slices of the FIB control path - introduce basic per netns locking helpers - namespacified the IPv4 address hash table - remove rtnl_register{,_module}() in favour of rtnl_register_many() - refactor rtnl_{new,del,set}link() moving as much validation as possible out of RTNL lock - convert all phonet doit() and dumpit() handlers to RCU - convert IPv4 addresses manipulation to per-netns RTNL - convert virtual interface creation to per-netns RTNL the per-netns lock infrastructure is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim. - Introduce NAPI suspension, to efficiently switching between busy polling (NAPI processing suspended) and normal processing. - Migrate the IPv4 routing input, output and control path from direct ToS usage to DSCP macros. This is a work in progress to make ECN handling consistent and reliable. - Add drop reasons support to the IPv4 rotue input path, allowing better introspection in case of packets drop. - Make FIB seqnum lockless, dropping RTNL protection for read access. - Make inet{,v6} addresses hashing less predicable. - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets and timestamps Things we sprinkled into general kernel code: - Add small file operations for debugfs, to reduce the struct ops size. - Refactoring and optimization for the implementation of page_frag API, This is a preparatory work to consolidate the page_frag implementation. Netfilter: - Optimize set element transactions to reduce memory consumption - Extended netlink error reporting for attribute parser failure. - Make legacy xtables configs user selectable, giving users the option to configure iptables without enabling any other config. - Address a lot of false-positive RCU issues, pointed by recent CI improvements. BPF: - Put xsk sockets on a struct diet and add various cleanups. Overall, this helps to bump performance by 12% for some workloads. - Extend BPF selftests to increase coverage of XDP features in combination with BPF cpumap. - Optimize and homogenize bpf_csum_diff helper for all archs and also add a batch of new BPF selftests for it. - Extend netkit with an option to delegate skb->{mark,priority} scrubbing to its BPF program. - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs. Protocols: - Introduces 4-tuple hash for connected udp sockets, speeding-up significantly connected sockets lookup. - Add a fastpath for some TCP timers that usually expires after close, the socket lock contention. - Add inbound and outbound xfrm state caches to speed up state lookups. - Avoid sending MPTCP advertisements on stale subflows, reducing risks on loosing them. - Make neighbours table flushing more scalable, maintaining per device neigh lists. Driver API: - Introduce a unified interface to configure transmission H/W shaping, and expose it to user-space via generic-netlink. - Add support for per-NAPI config via netlink. This makes napi configuration persistent across queues removal and re-creation. Requires driver updates, currently supported drivers are: nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice. - Add ethtool support for writing SFP / PHY firmware blocks. - Track RSS context allocation from ethtool core. - Implement support for mirroring to DSA CPU port, via TC mirror offload. - Consolidate FDB updates notification, to avoid duplicates on device-specific entries. - Expose DPLL clock quality level to the user-space. - Support master-slave PHY config via device tree. Tests and tooling: - forwarding: introduce deferred commands, to simplify the cleanup phase Drivers: - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic, Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the IRQs and queues to NAPI IDs, allowing busy polling and better introspection. - Ethernet high-speed NICs: - nVidia/Mellanox: - mlx5: - a large refactor to implement support for cross E-Switch scheduling - refactor H/W conter management to let it scale better - H/W GRO cleanups - Intel (100G, ice):: - add support for ethtool reset - implement support for per TX queue H/W shaping - AMD/Solarflare: - implement per device queue stats support - Broadcom (bnxt): - improve wildcard l4proto on IPv4/IPv6 ntuple rules - Marvell Octeon: - Add representor support for each Resource Virtualization Unit (RVU) device. - Hisilicon: - add support for the BMC Gigabit Ethernet - IBM (EMAC): - driver cleanup and modernization - Cisco (VIC): - raise the queues number limit to 256 - Ethernet virtual: - Google vNIC: - implement page pool support - macsec: - inherit lower device's features and TSO limits when offloading - virtio_net: - enable premapped mode by default - support for XDP socket(AF_XDP) zerocopy TX - wireguard: - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger packets. - Ethernet NICs embedded and virtual: - Broadcom ASP: - enable software timestamping - Freescale: - add enetc4 PF driver - MediaTek: Airoha SoC: - implement BQL support - RealTek r8169: - enable TSO by default on r8168/r8125 - implement extended ethtool stats - Renesas AVB: - enable TX checksum offload - Synopsys (stmmac): - support header splitting for vlan tagged packets - move common code for DWMAC4 and DWXGMAC into a separate FPE module. - add dwmac driver support for T-HEAD TH1520 SoC - Synopsys (xpcs): - driver refactor and cleanup - TI: - icssg_prueth: add VLAN offload support - Xilinx emaclite: - add clock support - Ethernet switches: - Microchip: - implement support for the lan969x Ethernet switch family - add LAN9646 switch support to KSZ DSA driver - Ethernet PHYs: - Marvel: 88q2x: enable auto negotiation - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2 - PTP: - Add support for the Amazon virtual clock device - Add PtP driver for s390 clocks - WiFi: - mac80211 - EHT 1024 aggregation size for transmissions - new operation to indicate that a new interface is to be added - support radio separation of multi-band devices - move wireless extension spy implementation to libiw - Broadcom: - brcmfmac: optional LPO clock support - Microchip: - add support for Atmel WILC3000 - Qualcomm (ath12k): - firmware coredump collection support - add debugfs support for a multitude of statistics - Qualcomm (ath5k): - Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support - Realtek: - rtw88: 8821au and 8812au USB adapters support - rtw89: add thermal protection - rtw89: fine tune BT-coexsitence to improve user experience - rtw89: firmware secure boot for WiFi 6 chip - Bluetooth - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and 0x13d3:0x3623 - add Realtek RTL8852BE support for id Foxconn 0xe123 - add MediaTek MT7920 support for wireless module ids - btintel_pcie: add handshake between driver and firmware - btintel_pcie: add recovery mechanism - btnxpuart: add GPIO support to power save feature" * tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits) mm: page_frag: fix a compile error when kernel is not compiled Documentation: tipc: fix formatting issue in tipc.rst selftests: nic_performance: Add selftest for performance of NIC driver selftests: nic_link_layer: Add selftest case for speed and duplex states selftests: nic_link_layer: Add link layer selftest for NIC driver bnxt_en: Add FW trace coredump segments to the coredump bnxt_en: Add a new ethtool -W dump flag bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr() bnxt_en: Add functions to copy host context memory bnxt_en: Do not free FW log context memory bnxt_en: Manage the FW trace context memory bnxt_en: Allocate backing store memory for FW trace logs bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem() bnxt_en: Refactor bnxt_free_ctx_mem() bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type bnxt_en: Update firmware interface spec to 1.10.3.85 selftests/bpf: Add some tests with sockmap SK_PASS bpf: fix recursive lock when verdict program return SK_PASS wireguard: device: support big tcp GSO wireguard: selftests: load nf_conntrack if not present ...
2024-11-18Merge tag 's390-6.13-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Add firmware sysfs interface which allows user space to retrieve the dump area size of the machine - Add 'measurement_chars_full' CHPID sysfs attribute to make the complete associated Channel-Measurements Characteristics Block available - Add virtio-mem support - Move gmap aka KVM page fault handling from the main fault handler to KVM code. This is the first step to make s390 KVM page fault handling similar to other architectures. With this first step the main fault handler does not have any special handling anymore, and therefore convert it to support LOCK_MM_AND_FIND_VMA - With gcc 14 s390 support for flag output operand support for inline assemblies was added. This allows for several optimizations: - Provide a cmpxchg inline assembly which makes use of this, and provide all variants of arch_try_cmpxchg() so that the compiler can generate slightly better code - Convert a few cmpxchg() loops to try_cmpxchg() loops - Similar to x86 add a CC_OUT() helper macro (and other macros), and convert all inline assemblies to make use of them, so that depending on compiler version better code can be generated - List installed host-key hashes in sysfs if the machine supports the Query Ultravisor Keys UVC - Add 'Retrieve Secret' ioctl which allows user space in protected execution guests to retrieve previously stored secrets from the Ultravisor - Add pkey-uv module which supports the conversion of Ultravisor retrievable secrets to protected keys - Extend the existing paes cipher to exploit the full AES-XTS hardware acceleration introduced with message-security assist extension 10 - Convert hopefully all sysfs show functions to use sysfs_emit() so that the constant flow of such patches stop - For PCI devices make use of the newly added Topology ID attribute to enable whole card multi-function support despite the change to PCHID per port. Additionally improve the overall robustness and usability of the multifunction support - Various other small improvements, fixes, and cleanups * tag 's390-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (133 commits) s390/cio/ioasm: Convert to use flag output macros s390/cio/qdio: Convert to use flag output macros s390/sclp: Convert to use flag output macros s390/dasd: Convert to use flag output macros s390/boot/physmem: Convert to use flag output macros s390/pci: Convert to use flag output macros s390/kvm: Convert to use flag output macros s390/extmem: Convert to use flag output macros s390/string: Convert to use flag output macros s390/diag: Convert to use flag output macros s390/irq: Convert to use flag output macros s390/smp: Convert to use flag output macros s390/uv: Convert to use flag output macros s390/pai: Convert to use flag output macros s390/mm: Convert to use flag output macros s390/cpu_mf: Convert to use flag output macros s390/cpcmd: Convert to use flag output macros s390/topology: Convert to use flag output macros s390/time: Convert to use flag output macros s390/pageattr: Convert to use flag output macros ...
2024-11-15virtio_ring: remove API virtqueue_set_dma_premappedXuan Zhuo
Now, this API is useless. remove it. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20241112012928.102478-8-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15virtio_ring: introduce add api for premappedXuan Zhuo
Two APIs are introduced to submit premapped per-buffers. int virtqueue_add_inbuf_premapped(struct virtqueue *vq, struct scatterlist *sg, unsigned int num, void *data, void *ctx, gfp_t gfp); int virtqueue_add_outbuf_premapped(struct virtqueue *vq, struct scatterlist *sg, unsigned int num, void *data, gfp_t gfp); Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20241112012928.102478-6-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15virtio_ring: perform premapped operations based on per-bufferXuan Zhuo
The current configuration sets the virtqueue (vq) to premapped mode, implying that all buffers submitted to this queue must be mapped ahead of time. This presents a challenge for the virtnet send queue (sq): the virtnet driver would be required to keep track of dma information for vq size * 17, which can be substantial. However, if the premapped mode were applied on a per-buffer basis, the complexity would be greatly reduced. With AF_XDP enabled, AF_XDP buffers would become premapped, while kernel skb buffers could remain unmapped. And consider that some sgs are not generated by the virtio driver, that may be passed from the block stack. So we can not change the sgs, new APIs are the better way. So we pass the new argument 'premapped' to indicate the buffers submitted to virtio are premapped in advance. Additionally, DMA unmap operations for these buffers will be bypassed. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20241112012928.102478-5-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15virtio_ring: packed: record extras for indirect buffersXuan Zhuo
The subsequent commit needs to know whether every indirect buffer is premapped or not. So we need to introduce an extra struct for every indirect buffer to record this info. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20241112012928.102478-4-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15virtio_ring: split: record extras for indirect buffersXuan Zhuo
The subsequent commit needs to know whether every indirect buffer is premapped or not. So we need to introduce an extra struct for every indirect buffer to record this info. Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20241112012928.102478-3-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15virtio_ring: introduce vring_need_unmap_bufferXuan Zhuo
To make the code readable, introduce vring_need_unmap_buffer() to replace do_unmap. use_dma_api premapped -> vring_need_unmap_buffer() 1. false false false 2. true false true 3. true true false Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20241112012928.102478-2-xuanzhuo@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-13virtio-pci: Introduce APIs to execute device parts admin commandsYishai Hadas
Introduce APIs to handle the execution of device parts admin commands. These APIs cover functionalities such as mode setting, object creation and destruction, and operations like parts get/set and metadata retrieval. These APIs will be utilized in upcoming patches within this series. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20241113115200.209269-5-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-13virtio: Manage device and driver capabilities via the admin commandsYishai Hadas
Manage device and driver capabilities via the admin commands. The device exposes its supported features and resource object limits via an administrative command called VIRTIO_ADMIN_CMD_CAP_ID_LIST_QUERY, using the 'self group type.' Each capability is identified by a unique ID, and the driver communicates the functionality and resource limits it plans to utilize. The capability VIRTIO_DEV_PARTS_CAP specifically represents the device's parts resource object limit. Manage the device's parts resource object ID using a common IDA for both get and set operations. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20241113115200.209269-4-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-13virtio: Extend the admin command to include the result sizeYishai Hadas
Extend the admin command by incorporating a result size field. This allows higher layers to determine the actual result size from the backend when this information is not included in the result_sg. The additional information introduced here will be used in subsequent patches of this series. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20241113115200.209269-3-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-12virtio_vdpa: remove redundant check on descColin Ian King
The boolean variable has_affinity is true when desc is non-null and ops->set_vq_affinity is non-null. Hence the call to create_affinity_masks does not need to check for desc being non-null is redundant when has_affinity is true, so it can be removed as well as the now unused default_affd. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Message-Id: <20241107132114.22188-1-colin.i.king@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
2024-11-12virtio: Make vring_new_virtqueue support packed vringWenyu Huang
It is used for testing in tools/virtio/vringh_test.c. If vring_new_virtqueue supports packed vring, we can add support for packed vring to vringh and test it. Signed-off-by: Wenyu Huang <huangwenyu1998@gmail.com> Message-Id: <20241013033233.65026-1-huangwenyu1998@gmail.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2024-11-07virtio-mem: s390 supportDavid Hildenbrand
Now that s390 code is prepared for memory devices that reside above the maximum storage increment exposed through SCLP, everything is in place to unlock virtio-mem support. As virtio-mem in Linux currently supports logically onlining/offlining memory in pageblock granularity, we have an effective hot(un)plug granularity of 1 MiB on s390. As virito-mem adds/removes individual Linux memory blocks (256MB), we will currently never use gigantic pages in the identity mapping. It is worth noting that neither storage keys nor storage attributes (e.g., data / nodat) are touched when onlining memory blocks, which is good because we are not supposed to touch these parts for unplugged device blocks that are logically offline in Linux. We will currently never initialize storage keys for virtio-mem memory -- IOW, storage_key_init_range() is never called. It could be added in the future when plugging device blocks. But as that function essentially does nothing without modifying the code (changing PAGE_DEFAULT_ACC), that's just fine for now. kexec should work as intended and just like on other architectures that support virtio-mem: we will never place kexec binaries on virtio-mem memory, and never indicate virtio-mem memory to the 2nd kernel. The device driver in the 2nd kernel can simply reset the device -- turning all memory unplugged, to then start plugging memory and adding them to Linux, without causing trouble because the memory is already used elsewhere. The special s390 kdump mode, whereby the 2nd kernel creates the ELF core header, won't currently dump virtio-mem memory. The virtio-mem driver has a special kdump mode, from where we can detect memory ranges to dump. Based on this, support for dumping virtio-mem memory can be added in the future fairly easily. Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20241025141453.1210600-5-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-11-06virtio_pci: Fix admin vq cleanup by using correct info pointerFeng Liu
vp_modern_avq_cleanup() and vp_del_vqs() clean up admin vq resources by virtio_pci_vq_info pointer. The info pointer of admin vq is stored in vp_dev->admin_vq.info instead of vp_dev->vqs[]. Using the info pointer from vp_dev->vqs[] for admin vq causes a kernel NULL pointer dereference bug. In vp_modern_avq_cleanup() and vp_del_vqs(), get the info pointer from vp_dev->admin_vq.info for admin vq to clean up the resources. Also make info ptr as argument of vp_del_vq() to be symmetric with vp_setup_vq(). vp_reset calls vp_modern_avq_cleanup, and causes the Call Trace: ================================================================== BUG: kernel NULL pointer dereference, address:0000000000000000 ... CPU: 49 UID: 0 PID: 4439 Comm: modprobe Not tainted 6.11.0-rc5 #1 RIP: 0010:vp_reset+0x57/0x90 [virtio_pci] Call Trace: <TASK> ... ? vp_reset+0x57/0x90 [virtio_pci] ? vp_reset+0x38/0x90 [virtio_pci] virtio_reset_device+0x1d/0x30 remove_vq_common+0x1c/0x1a0 [virtio_net] virtnet_remove+0xa1/0xc0 [virtio_net] virtio_dev_remove+0x46/0xa0 ... virtio_pci_driver_exit+0x14/0x810 [virtio_pci] ================================================================== Fixes: 4c3b54af907e ("virtio_pci_modern: use completion instead of busy loop to wait on admin cmd result") Signed-off-by: Feng Liu <feliu@nvidia.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Message-Id: <20241024135406.81388-1-feliu@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-10-07Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Several small bugfixes all over the place. Most notably, fixes the vsock allocation with GFP_KERNEL in atomic context, which has been triggering warnings for lots of testers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost/scsi: null-ptr-dereference in vhost_scsi_get_req() vsock/virtio: use GFP_ATOMIC under RCU read lock virtio_console: fix misc probe bugs virtio_ring: tag event_triggered as racy for KCSAN vdpa/octeon_ep: Fix format specifier for pointers in debug messages
2024-09-26Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "Several new features here: - virtio-balloon supports new stats - vdpa supports setting mac address - vdpa/mlx5 suspend/resume as well as MKEY ops are now faster - virtio_fs supports new sysfs entries for queue info - virtio/vsock performance has been improved And fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vsock/virtio: avoid queuing packets when intermediate queue is empty vsock/virtio: refactor virtio_transport_send_pkt_work fw_cfg: Constify struct kobj_type vdpa/mlx5: Postpone MR deletion vdpa/mlx5: Introduce init/destroy for MR resources vdpa/mlx5: Rename mr_mtx -> lock vdpa/mlx5: Extract mr members in own resource struct vdpa/mlx5: Rename function vdpa/mlx5: Delete direct MKEYs in parallel vdpa/mlx5: Create direct MKEYs in parallel MAINTAINERS: add virtio-vsock driver in the VIRTIO CORE section virtio_fs: add sysfs entries for queue information virtio_fs: introduce virtio_fs_put_locked helper vdpa: Remove unused declarations vdpa/mlx5: Parallelize VQ suspend/resume for CVQ MQ command vdpa/mlx5: Small improvement for change_num_qps() vdpa/mlx5: Keep notifiers during suspend but ignore vdpa/mlx5: Parallelize device resume vdpa/mlx5: Parallelize device suspend vdpa/mlx5: Use async API for vq modify commands ...
2024-09-25virtio_ring: tag event_triggered as racy for KCSANMichael S. Tsirkin
Setting event_triggered from the interrupt handler is fundamentally racy. There are races of 2 types: 1. vq processing can read false value while interrupt triggered and set it to true. result will be a bit of extra work when disabling cbs, no big deal. 1. vq processing can set false value then interrupt immediately sets true value since interrupt then triggers a callback which will process buffers, this is also not an issue. However, looks like KCSAN can not figure all this out, and warns about the race between the write and the read. Tag the access data_racy for now. We should probably look at ways to make this more straight-forwardly correct. Message-ID: <6bdd771a4fb7625a9227971b3cf4745c34c31a32.1726153334.git.mst@redhat.com> Reported-by: syzbot+8a02104389c2e0ef5049@syzkaller.appspotmail.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Marco Elver <elver@google.com>
2024-09-10virtio_balloon: introduce memory scan/reclaim infozhenwei pi
Expose memory scan/reclaim information to the host side via virtio balloon device. Now we have a metric to analyze the memory performance: y: counter increases n: counter does not changes h: the rate of counter change is high l: the rate of counter change is low OOM: VIRTIO_BALLOON_S_OOM_KILL STALL: VIRTIO_BALLOON_S_ALLOC_STALL ASCAN: VIRTIO_BALLOON_S_SCAN_ASYNC DSCAN: VIRTIO_BALLOON_S_SCAN_DIRECT ARCLM: VIRTIO_BALLOON_S_RECLAIM_ASYNC DRCLM: VIRTIO_BALLOON_S_RECLAIM_DIRECT - OOM[y], STALL[*], ASCAN[*], DSCAN[*], ARCLM[*], DRCLM[*]: the guest runs under really critial memory pressure - OOM[n], STALL[h], ASCAN[*], DSCAN[l], ARCLM[*], DRCLM[l]: the memory allocation stalls due to cgroup, not the global memory pressure. - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[h]: the memory allocation stalls due to global memory pressure. The performance gets hurt a lot. A high ratio between DRCLM/DSCAN shows quite effective memory reclaiming. - OOM[n], STALL[h], ASCAN[*], DSCAN[h], ARCLM[*], DRCLM[l]: the memory allocation stalls due to global memory pressure. the ratio between DRCLM/DSCAN gets low, the guest OS is thrashing heavily, the serious case leads poor performance and difficult trouble shooting. Ex, sshd may block on memory allocation when accepting new connections, a user can't login a VM by ssh command. - OOM[n], STALL[n], ASCAN[h], DSCAN[n], ARCLM[l], DRCLM[n]: the low ratio between ARCLM/ASCAN shows that the guest tries to reclaim more memory, but it can't. Once more memory is required in future, it will struggle to reclaim memory. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20240423034109.1552866-5-pizhenwei@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-09-10virtio_balloon: introduce memory allocation stall counterzhenwei pi
Memory allocation stall counter represents the performance/latency of memory allocation, expose this counter to the host side by virtio balloon device via out-of-bound way. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20240423034109.1552866-4-pizhenwei@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-09-10virtio_balloon: introduce oom-kill invocationszhenwei pi
When the guest OS runs under critical memory pressure, the guest starts to kill processes. A guest monitor agent may scan 'oom_kill' from /proc/vmstat, and reports the OOM KILL event. However, the agent may be killed and we will loss this critical event(and the later events). For now we can also grep for magic words in guest kernel log from host side. Rather than this unstable way, virtio balloon reports OOM-KILL invocations instead. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: zhenwei pi <pizhenwei@bytedance.com> Message-Id: <20240423034109.1552866-3-pizhenwei@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-08-15virtio: allow driver to disable the configure change notificationJason Wang
Sometime, it would be useful to disable the configure change notification from the driver. So this patch allows this by introducing a variable config_change_driver_disabled and only allow the configure change notification callback to be triggered when it is allowed by both the virtio core and the driver. It is set to false by default to hold the current semantic so we don't need to change any drivers. The first user for this would be virtio-net. Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Cc: Gia-Khanh Nguyen <gia-khanh.nguyen@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20240814052228.4654-3-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-15virtio: rename virtio_config_enabled to virtio_config_core_enabledJason Wang
Following patch will allow the config interrupt to be disabled by a specific driver via another boolean. So this patch renames virtio_config_enabled and relevant helpers to virtio_config_core_enabled. Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Cc: Gia-Khanh Nguyen <gia-khanh.nguyen@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20240814052228.4654-2-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-29Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "The biggest thing here is the adminq change - but it looks like the only way to avoid headq blocking causing indefinite stalls. This fixes three issues: - Prevent admin commands on one VF blocking another. This prevents a bad VF from blocking a good one, as well as fixing a scalability issue with large # of VFs - Correctly return error on command failure on octeon. We used to treat failed commands as a success. - Fix modpost warning when building virtio_dma_buf. Harmless, but the fix is trivial" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_pci_modern: remove admin queue serialization lock virtio_pci_modern: use completion instead of busy loop to wait on admin cmd result virtio_pci_modern: pass cmd as an identification token virtio_pci_modern: create admin queue of queried size virtio: create admin queues alongside other virtqueues virtio_pci: pass vq info as an argument to vp_setup_vq() virtio: push out code to vp_avq_index() virtio_pci_modern: treat vp_dev->admin_vq.info.vq pointer as static virtio_pci: introduce vector allocation fallback for slow path virtqueues virtio_pci: pass vector policy enum to vp_find_one_vq_msix() virtio_pci: pass vector policy enum to vp_find_vqs_msix() virtio_pci: simplify vp_request_msix_vectors() call a bit virtio_pci: push out single vq find code to vp_find_one_vq_msix() vdpa/octeon_ep: Fix error code in octep_process_mbox() virtio: add missing MODULE_DESCRIPTION() macro
2024-07-25Merge tag 'driver-core-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-20Merge tag 'riscv-for-linus-6.11-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for various new ISA extensions: * The Zve32[xf] and Zve64[xfd] sub-extensios of the vector extension * Zimop and Zcmop for may-be-operations * The Zca, Zcf, Zcd and Zcb sub-extensions of the C extension * Zawrs - riscv,cpu-intc is now dtschema - A handful of performance improvements and cleanups to text patching - Support for memory hot{,un}plug - The highest user-allocatable virtual address is now visible in hwprobe * tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (58 commits) riscv: lib: relax assembly constraints in hweight riscv: set trap vector earlier KVM: riscv: selftests: Add Zawrs extension to get-reg-list test KVM: riscv: Support guest wrs.nto riscv: hwprobe: export Zawrs ISA extension riscv: Add Zawrs support for spinlocks dt-bindings: riscv: Add Zawrs ISA extension description riscv: Provide a definition for 'pause' riscv: hwprobe: export highest virtual userspace address riscv: Improve sbi_ecall() code generation by reordering arguments riscv: Add tracepoints for SBI calls and returns riscv: Optimize crc32 with Zbc extension riscv: Enable DAX VMEMMAP optimization riscv: mm: Add support for ZONE_DEVICE virtio-mem: Enable virtio-mem for RISC-V riscv: Enable memory hotplugging for RISC-V riscv: mm: Take memory hotplug read-lock during kernel page table dump riscv: mm: Add memory hotplugging support riscv: mm: Add pfn_to_kaddr() implementation riscv: mm: Refactor create_linear_mapping_range() for memory hot add ...
2024-07-19Merge tag 'char-misc-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc and other driver updates from Greg KH: "Here is the "big" set of char/misc and other driver subsystem changes for 6.11-rc1. Nothing major in here, just loads of new drivers and updates. Included in here are: - IIO api updates and new drivers added - wait_interruptable_timeout() api cleanups for some drivers - MODULE_DESCRIPTION() additions for loads of drivers - parport out-of-bounds fix - interconnect driver updates and additions - mhi driver updates and additions - w1 driver fixes - binder speedups and fixes - eeprom driver updates - coresight driver updates - counter driver update - new misc driver additions - other minor api updates All of these, EXCEPT for the final Kconfig build fix for 32bit systems, have been in linux-next for a while with no reported issues. The Kconfig fixup went in 29 hours ago, so might have missed the latest linux-next, but was acked by everyone involved" * tag 'char-misc-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (330 commits) misc: Kconfig: exclude mrvl-cn10k-dpi compilation for 32-bit systems misc: delete Makefile.rej binder: fix hang of unregistered readers misc: Kconfig: add a new dependency for MARVELL_CN10K_DPI virtio: add missing MODULE_DESCRIPTION() macro agp: uninorth: add missing MODULE_DESCRIPTION() macro spmi: add missing MODULE_DESCRIPTION() macros dev/parport: fix the array out-of-bounds risk samples: configfs: add missing MODULE_DESCRIPTION() macro misc: mrvl-cn10k-dpi: add Octeon CN10K DPI administrative driver misc: keba: Fix missing AUXILIARY_BUS dependency slimbus: Fix struct and documentation alignment in stream.c MAINTAINERS: CC dri-devel list on Qualcomm FastRPC patches misc: fastrpc: use coherent pool for untranslated Compute Banks misc: fastrpc: support complete DMA pool access to the DSP misc: fastrpc: add missing MODULE_DESCRIPTION() macro misc: fastrpc: Add missing dev_err newlines misc: fastrpc: Use memdup_user() nvmem: core: Implement force_ro sysfs attribute nvmem: Use sysfs_emit() for type attribute ...
2024-07-17virtio_pci_modern: remove admin queue serialization lockJiri Pirko
The admin queue operations are protected by newly introduced spin lock. To make it possible to issue parallel commands, remove the admin queue serialization lock. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-14-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci_modern: use completion instead of busy loop to wait on admin cmd ↵Jiri Pirko
result Currently, the code waits in a busy loop on every admin virtqueue issued command to get a reply. That prevents callers from issuing multiple commands in parallel. To overcome this limitation, introduce a virtqueue event callback for admin virtqueue. For every issued command, use completion mechanism to wait on a reply. In the event callback, trigger the completion is done for every incoming reply. Alongside with that, introduce a spin lock to protect the admin virtqueue operations. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-13-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci_modern: pass cmd as an identification tokenJiri Pirko
In preparation to asynchronous admin queue processing, pass cmd pointer as a data arg to virtqueue_add_sgs(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-12-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci_modern: create admin queue of queried sizeJiri Pirko
Don't limit the admin queue size to VIRTIO_AVQ_SGS_MAX and rather rely on the queried queue size. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-11-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio: create admin queues alongside other virtqueuesJiri Pirko
Admin virtqueue is just another virtqueue nothing that special about it. The current implementation treats it somehow separate though in terms of creation and deletion. Unify the admin virtqueue creation and deletion flows to be aligned with the rest of virtqueues, creating it from vp_find_vqs_*() helpers. Let the admin virtqueue to be deleted by vp_del_vqs() as the rest. Call vp_find_one_vq_msix() with slow_path argument being "true" to make sure that in case of limited interrupt vectors the config vector is used for admin queue. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-10-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci: pass vq info as an argument to vp_setup_vq()Jiri Pirko
Instead vp_setup_vq() storing vq info directly to vp_dev->vqs, let the caller provide a pointer to store the info to. This prepares vp_setup_vq() to be able to store admin queue info as well. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-9-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio: push out code to vp_avq_index()Jiri Pirko
To prepare for the follow-up patch to use the code as an op, push out the code that gets admin virtqueue base index and count to a separate helper. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-8-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci_modern: treat vp_dev->admin_vq.info.vq pointer as staticJiri Pirko
It is guaranteed by the virtio_pci and PCI layers that none of the VFs is probed before setup_vq() is called for admin queue and after del_vq() is called for admin queue. Therefore treat vp_dev->admin_vq.info.vq as static, don't null it and don't take cmd lock during assign. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-7-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci: introduce vector allocation fallback for slow path virtqueuesJiri Pirko
If there is not enough vectors to satisfy all virtqueues, currently the fallback is to use one vector for all virtqueues. That may be unnecessary in some cases, when there is enough vectors per data queues. Introduce another fallback policy that tries to allocate vector for all data queues, however for slow path queues (control/admin) it shares config vector. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-6-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci: pass vector policy enum to vp_find_one_vq_msix()Jiri Pirko
Instead of accessing vp_dev->per_vq_vectors, pass vector policy enum as an argument of vp_find_one_vq_msix() in preparation for another irq allocation policy. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-5-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci: pass vector policy enum to vp_find_vqs_msix()Jiri Pirko
In preparation for another irq allocation fallback, introduce vector policy enum and pass the values to vp_find_vqs_msix() instead of bool arg. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-4-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci: simplify vp_request_msix_vectors() call a bitJiri Pirko
Pass desc arg as it is always to vp_request_msix_vectors(). There rely on per_vq_vectors arg and null desc in case it is false. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-3-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_pci: push out single vq find code to vp_find_one_vq_msix()Jiri Pirko
In order to be reused for admin queue setup, push out common code to setup and configure irq for one vq into a separate helper. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240716113552.80599-2-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio: add missing MODULE_DESCRIPTION() macroJeff Johnson
make allmodconfig && make W=1 C=1 reports: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/virtio/virtio_dma_buf.o Add the missing invocation of the MODULE_DESCRIPTION() macro. Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Message-Id: <20240602-md-virtio_dma_buf-v1-1-ce602d47e257@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio: rename virtio_find_vqs_info() to virtio_find_vqs()Jiri Pirko
Since the original virtio_find_vqs() is no longer present, rename virtio_find_vqs_info() back to virtio_find_vqs(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-20-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio: convert the rest virtio_find_vqs() users to virtio_find_vqs_info()Jiri Pirko
Instead of passing separate names and callbacks arrays to virtio_find_vqs(), have one of virtual_queue_info structs and pass it to virtio_find_vqs_info(). Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-18-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio_balloon: convert to use virtio_find_vqs_info()Jiri Pirko
Instead of passing separate names and callbacks arrays to virtio_find_vqs(), have one of virtual_queue_info structs and pass it to virtio_find_vqs_info(). Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-17-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-17virtio: rename find_vqs_info() op to find_vqs()Jiri Pirko
Since the original find_vqs() is no longer present, rename find_vqs_info() back to find_vqs(). Signed-off-by: Jiri Pirko <jiri@nvidia.com> Message-Id: <20240708074814.1739223-10-jiri@resnulli.us> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>