summaryrefslogtreecommitdiff
path: root/Documentation/virt
AgeCommit message (Collapse)Author
2022-05-23Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - Initial support for the ARMv9 Scalable Matrix Extension (SME). SME takes the approach used for vectors in SVE and extends this to provide architectural support for matrix operations. No KVM support yet, SME is disabled in guests. - Support for crashkernel reservations above ZONE_DMA via the 'crashkernel=X,high' command line option. - btrfs search_ioctl() fix for live-lock with sub-page faults. - arm64 perf updates: support for the Hisilicon "CPA" PMU for monitoring coherent I/O traffic, support for Arm's CMN-650 and CMN-700 interconnect PMUs, minor driver fixes, kerneldoc cleanup. - Kselftest updates for SME, BTI, MTE. - Automatic generation of the system register macros from a 'sysreg' file describing the register bitfields. - Update the type of the function argument holding the ESR_ELx register value to unsigned long to match the architecture register size (originally 32-bit but extended since ARMv8.0). - stacktrace cleanups. - ftrace cleanups. - Miscellaneous updates, most notably: arm64-specific huge_ptep_get(), avoid executable mappings in kexec/hibernate code, drop TLB flushing from get_clear_flush() (and rename it to get_clear_contig()), ARCH_NR_GPIO bumped to 2048 for ARCH_APPLE. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (145 commits) arm64/sysreg: Generate definitions for FAR_ELx arm64/sysreg: Generate definitions for DACR32_EL2 arm64/sysreg: Generate definitions for CSSELR_EL1 arm64/sysreg: Generate definitions for CPACR_ELx arm64/sysreg: Generate definitions for CONTEXTIDR_ELx arm64/sysreg: Generate definitions for CLIDR_EL1 arm64/sve: Move sve_free() into SVE code section arm64: Kconfig.platforms: Add comments arm64: Kconfig: Fix indentation and add comments arm64: mm: avoid writable executable mappings in kexec/hibernate code arm64: lds: move special code sections out of kernel exec segment arm64/hugetlb: Implement arm64 specific huge_ptep_get() arm64/hugetlb: Use ptep_get() to get the pte value of a huge page arm64: kdump: Do not allocate crash low memory if not needed arm64/sve: Generate ZCR definitions arm64/sme: Generate defintions for SVCR arm64/sme: Generate SMPRI_EL1 definitions arm64/sme: Automatically generate SMPRIMAP_EL2 definitions arm64/sme: Automatically generate SMIDR_EL1 defines arm64/sme: Automatically generate defines for SMCR ...
2022-05-23Merge tag 'x86_sev_for_v5.19_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull AMD SEV-SNP support from Borislav Petkov: "The third AMD confidential computing feature called Secure Nested Paging. Add to confidential guests the necessary memory integrity protection against malicious hypervisor-based attacks like data replay, memory remapping and others, thus achieving a stronger isolation from the hypervisor. At the core of the functionality is a new structure called a reverse map table (RMP) with which the guest has a say in which pages get assigned to it and gets notified when a page which it owns, gets accessed/modified under the covers so that the guest can take an appropriate action. In addition, add support for the whole machinery needed to launch a SNP guest, details of which is properly explained in each patch. And last but not least, the series refactors and improves parts of the previous SEV support so that the new code is accomodated properly and not just bolted on" * tag 'x86_sev_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) x86/entry: Fixup objtool/ibt validation x86/sev: Mark the code returning to user space as syscall gap x86/sev: Annotate stack change in the #VC handler x86/sev: Remove duplicated assignment to variable info x86/sev: Fix address space sparse warning x86/sev: Get the AP jump table address from secrets page x86/sev: Add missing __init annotations to SEV init routines virt: sevguest: Rename the sevguest dir and files to sev-guest virt: sevguest: Change driver name to reflect generic SEV support x86/boot: Put globals that are accessed early into the .data section x86/boot: Add an efi.h header for the decompressor virt: sevguest: Fix bool function returning negative value virt: sevguest: Fix return value check in alloc_shared_pages() x86/sev-es: Replace open-coded hlt-loop with sev_es_terminate() virt: sevguest: Add documentation for SEV-SNP CPUID Enforcement virt: sevguest: Add support to get extended report virt: sevguest: Add support to derive key virt: Add SEV-SNP guest driver x86/sev: Register SEV-SNP guest request platform device x86/sev: Provide support for SNP guest request NAEs ...
2022-04-29KVM: arm64: uapi: Add kvm_debug_exit_arch.hsr_highAlexandru Elisei
When userspace is debugging a VM, the kvm_debug_exit_arch part of the kvm_run struct contains arm64 specific debug information: the ESR_EL2 value, encoded in the field "hsr", and the address of the instruction that caused the exception, encoded in the field "far". Linux has moved to treating ESR_EL2 as a 64-bit register, but unfortunately kvm_debug_exit_arch.hsr cannot be changed because that would change the memory layout of the struct on big endian machines: Current layout: | Layout with "hsr" extended to 64 bits: | offset 0: ESR_EL2[31:0] (hsr) | offset 0: ESR_EL2[61:32] (hsr[61:32]) offset 4: padding | offset 4: ESR_EL2[31:0] (hsr[31:0]) offset 8: FAR_EL2[61:0] (far) | offset 8: FAR_EL2[61:0] (far) which breaks existing code. The padding is inserted by the compiler because the "far" field must be aligned to 8 bytes (each field must be naturally aligned - aapcs64 [1], page 18), and the struct itself must be aligned to 8 bytes (the struct must be aligned to the maximum alignment of its fields - aapcs64, page 18), which means that "hsr" must be aligned to 8 bytes as it is the first field in the struct. To avoid changing the struct size and layout for the existing fields, add a new field, "hsr_high", which replaces the existing padding. "hsr_high" will be used to hold the ESR_EL2[61:32] bits of the register. The memory layout, both on big and little endian machine, becomes: offset 0: ESR_EL2[31:0] (hsr) offset 4: ESR_EL2[61:32] (hsr_high) offset 8: FAR_EL2[61:0] (far) The padding that the compiler inserts for the current struct layout is unitialized. To prevent an updated userspace running on an old kernel mistaking the padding for a valid "hsr_high" value, add a new flag, KVM_DEBUG_ARCH_HSR_HIGH_VALID, to kvm_run->flags to let userspace know that "hsr_high" holds a valid ESR_EL2[61:32] value. [1] https://github.com/ARM-software/abi-aa/releases/download/2021Q3/aapcs64.pdf Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220425114444.368693-6-alexandru.elisei@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2022-04-29KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENTPaolo Bonzini
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags member that at the time was unused. Unfortunately this extensibility mechanism has several issues: - x86 is not writing the member, so it would not be possible to use it on x86 except for new events - the member is not aligned to 64 bits, so the definition of the uAPI struct is incorrect for 32- on 64-bit userspace. This is a problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately usage of flags was only introduced in 5.18. Since padding has to be introduced, place a new field in there that tells if the flags field is valid. To allow further extensibility, in fact, change flags to an array of 16 values, and store how many of the values are valid. The availability of the new ndata field is tied to a system capability; all architectures are changed to fill in the field. To avoid breaking compilation of userspace that was using the flags field, provide a userspace-only union to overlap flags with data[0]. The new field is placed at the same offset for both 32- and 64-bit userspace. Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Message-Id: <20220422103013.34832-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-27virt: sevguest: Rename the sevguest dir and files to sev-guestTom Lendacky
Rename the drivers/virt/coco/sevguest directory and files to sev-guest so as to match the driver name. [ bp: Rename Documentation/virt/coco/sevguest.rst too, as reported by sfr: https://lore.kernel.org/r/20220427101059.3bf55262@canb.auug.org.au ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/2f5c9cb16e3a67599c8e3170f6c72c8712c47d53.1650464054.git.thomas.lendacky@amd.com
2022-04-11Documentation: KVM: Add SPDX-License-Identifier tagLike Xu
+new file mode 100644 +WARNING: Missing or malformed SPDX-License-Identifier tag in line 1 +#27: FILE: Documentation/virt/kvm/x86/errata.rst:1: Opportunistically update all other non-added KVM documents and remove a new extra blank line at EOF for x86/errata.rst. Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20220406063715.55625-5-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-07virt: sevguest: Add documentation for SEV-SNP CPUID EnforcementMichael Roth
Update the documentation with information regarding SEV-SNP CPUID Enforcement details and what sort of assurances it provides to guests. Signed-off-by: Michael Roth <michael.roth@amd.com> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-47-brijesh.singh@amd.com
2022-04-07virt: sevguest: Add support to get extended reportBrijesh Singh
Version 2 of GHCB specification defines Non-Automatic-Exit (NAE) to get extended guest report which is similar to the SNP_GET_REPORT ioctl. The main difference is related to the additional data that will be returned. That additional data returned is a certificate blob that can be used by the SNP guest user. The certificate blob layout is defined in the GHCB specification. The driver simply treats the blob as a opaque data and copies it to userspace. [ bp: Massage commit message, cast 1st arg of access_ok() ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-46-brijesh.singh@amd.com
2022-04-07virt: sevguest: Add support to derive keyBrijesh Singh
The SNP_GET_DERIVED_KEY ioctl interface can be used by the SNP guest to ask the firmware to provide a key derived from a root key. The derived key may be used by the guest for any purposes it chooses, such as a sealing key or communicating with the external entities. See SEV-SNP firmware spec for more information. [ bp: No need to memset "req" - it will get overwritten. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Link: https://lore.kernel.org/r/20220307213356.2797205-45-brijesh.singh@amd.com
2022-04-07virt: Add SEV-SNP guest driverBrijesh Singh
The SEV-SNP specification provides the guest a mechanism to communicate with the PSP without risk from a malicious hypervisor who wishes to read, alter, drop or replay the messages sent. The driver uses snp_issue_guest_request() to issue GHCB SNP_GUEST_REQUEST or SNP_EXT_GUEST_REQUEST NAE events to submit the request to PSP. The PSP requires that all communication should be encrypted using key specified through a struct snp_guest_platform_data descriptor. Userspace can use SNP_GET_REPORT ioctl() to query the guest attestation report. See SEV-SNP spec section Guest Messages for more details. [ bp: Remove the "what" from the commit message, massage. ] Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220307213356.2797205-44-brijesh.singh@amd.com
2022-04-05Documentation: kvm: Add missing line break in api.rstBagas Sanjaya
Add missing line break separator between literal block and description of KVM_EXIT_RISCV_SBI. This fixes: </path/to/linux>/Documentation/virt/kvm/api.rst:6118: WARNING: Literal block ends without a blank line; unexpected unindent. Fixes: da40d85805937d (RISC-V: KVM: Document RISC-V specific parts of KVM API, 2021-09-27) Cc: Anup Patel <anup.patel@wdc.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-riscv@lists.infradead.org Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com> Message-Id: <20220403065735.23859-1-bagasdotme@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: - Only do MSR filtering for MSRs accessed by rdmsr/wrmsr - Documentation improvements - Prevent module exit until all VMs are freed - PMU Virtualization fixes - Fix for kvm_irq_delivery_to_apic_fast() NULL-pointer dereferences - Other miscellaneous bugfixes * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (42 commits) KVM: x86: fix sending PV IPI KVM: x86/mmu: do compare-and-exchange of gPTE via the user address KVM: x86: Remove redundant vm_entry_controls_clearbit() call KVM: x86: cleanup enter_rmode() KVM: x86: SVM: fix tsc scaling when the host doesn't support it kvm: x86: SVM: remove unused defines KVM: x86: SVM: move tsc ratio definitions to svm.h KVM: x86: SVM: fix avic spec based definitions again KVM: MIPS: remove reference to trap&emulate virtualization KVM: x86: document limitations of MSR filtering KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr KVM: x86/emulator: Emulate RDPID only if it is enabled in guest KVM: x86/pmu: Fix and isolate TSX-specific performance event logic KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs KVM: x86: Trace all APICv inhibit changes and capture overall status KVM: x86: Add wrappers for setting/clearing APICv inhibits KVM: x86: Make APICv inhibit reasons an enum and cleanup naming KVM: X86: Handle implicit supervisor access with SMAP KVM: X86: Rename variable smap to not_smap in permission_fault() ...
2022-04-02KVM: MIPS: remove reference to trap&emulate virtualizationPaolo Bonzini
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220313140522.1307751-1-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: x86: document limitations of MSR filteringPaolo Bonzini
MSR filtering requires an exit to userspace that is hard to implement and would be very slow in the case of nested VMX vmexit and vmentry MSR accesses. Document the limitation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: Remove dirty handling from gfn_to_pfn_cache completelyDavid Woodhouse
It isn't OK to cache the dirty status of a page in internal structures for an indefinite period of time. Any time a vCPU exits the run loop to userspace might be its last; the VMM might do its final check of the dirty log, flush the last remaining dirty pages to the destination and complete a live migration. If we have internal 'dirty' state which doesn't get flushed until the vCPU is finally destroyed on the source after migration is complete, then we have lost data because that will escape the final copy. This problem already exists with the use of kvm_vcpu_unmap() to mark pages dirty in e.g. VMX nesting. Note that the actual Linux MM already considers the page to be dirty since we have a writeable mapping of it. This is just about the KVM dirty logging. For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to track which gfn_to_pfn_caches have been used and explicitly mark the corresponding pages dirty before returning to userspace. But we would have needed external tracking of that anyway, rather than walking the full list of GPCs to find those belonging to this vCPU which are dirty. So let's rely *solely* on that external tracking, and keep it simple rather than laying a tempting trap for callers to fall into. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-02KVM: Don't actually set a request when evicting vCPUs for GFN cache invdSean Christopherson
Don't actually set a request bit in vcpu->requests when making a request purely to force a vCPU to exit the guest. Logging a request but not actually consuming it would cause the vCPU to get stuck in an infinite loop during KVM_RUN because KVM would see the pending request and bail from VM-Enter to service the request. Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as nothing in KVM is wired up to set guest_uses_pa=true. But, it'd be all too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init() without implementing handling of the request, especially since getting test coverage of MMU notifier interaction with specific KVM features usually requires a directed test. Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus to evict_vcpus. The purpose of the request is to get vCPUs out of guest mode, it's supposed to _avoid_ waking vCPUs that are blocking. Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to what it wants to accomplish, and to genericize the name so that it can used for similar but unrelated scenarios, should they arise in the future. Add a comment and documentation to explain why the "no action" request exists. Add compile-time assertions to help detect improper usage. Use the inner assertless helper in the one s390 path that makes requests without a hardcoded request. Cc: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220223165302.3205276-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-31Merge tag 'for-linus-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Devicetree support (for testing) - Various cleanups and fixes: UBD, port_user, uml_mconsole - Maintainer update * tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: run_helper: Write error message to kernel log on exec failure on host um: port_user: Improve error handling when port-helper is not found um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar um: port_user: Search for in.telnetd in PATH um: clang: Strip out -mno-global-merge from USER_CFLAGS docs: UML: Mention telnetd for port channel um: Remove unused timeval_to_ns() function um: Fix uml_mconsole stop/go um: Cleanup syscall_handler_t definition/cast, fix warning uml: net: vector: fix const issue um: Fix WRITE_ZEROES in the UBD Driver um: Migrate vector drivers to NAPI um: Fix order of dtb unflatten/early init um: fix and optimize xor select template for CONFIG64 and timetravel mode um: Document dtb command line option lib/logic_iomem: correct fallback config references um: Remove duplicated include in syscalls_64.c MAINTAINERS: Update UserModeLinux entry
2022-03-29Documentation: KVM: add API issues sectionPaolo Bonzini
Add a section to document all the different ways in which the KVM API sucks. I am sure there are way more, give people a place to vent so that userspace authors are aware. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110712.222449-4-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Documentation: KVM: add virtual CPU errata documentationPaolo Bonzini
Add a file to document all the different ways in which the virtual CPU emulation is imperfect. Include an example to show how to document such errata. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Jim Mattson <jmattson@google.com> Reviewed-by: Oliver Upton <oupton@google.com> Message-Id: <20220322110712.222449-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Documentation: KVM: add separate directories for architecture-specific ↵Paolo Bonzini
documentation ARM already has an arm/ subdirectory, but s390 and x86 do not even though they have a relatively large number of files specific to them. Create new directories in Documentation/virt/kvm for these two architectures as well. While at it, group the API documentation and the developer documentation in the table of contents. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110712.222449-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Documentation: kvm: include new locksPaolo Bonzini
kvm->mn_invalidate_lock and kvm->slots_arch_lock were not included in the documentation, add them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110720.222499-3-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-29Documentation: kvm: fixes for locking.rstPaolo Bonzini
Separate the various locks clearly, and include the new names of blocked_vcpu_on_cpu_lock and blocked_vcpu_on_cpu. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110720.222499-2-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-24Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes RISC-V: - Prevent KVM_COMPAT from being selected - Optimize __kvm_riscv_switch_to() implementation - RISC-V SBI v0.3 support s390: - memop selftest - fix SCK locking - adapter interruptions virtualization for secure guests - add Claudio Imbrenda as maintainer - first step to do proper storage key checking x86: - Continue switching kvm_x86_ops to static_call(); introduce static_call_cond() and __static_call_ret0 when applicable. - Cleanup unused arguments in several functions - Synthesize AMD 0x80000021 leaf - Fixes and optimization for Hyper-V sparse-bank hypercalls - Implement Hyper-V's enlightened MSR bitmap for nested SVM - Remove MMU auditing - Eager splitting of page tables (new aka "TDP" MMU only) when dirty page tracking is enabled - Cleanup the implementation of the guest PGD cache - Preparation for the implementation of Intel IPI virtualization - Fix some segment descriptor checks in the emulator - Allow AMD AVIC support on systems with physical APIC ID above 255 - Better API to disable virtualization quirks - Fixes and optimizations for the zapping of page tables: - Zap roots in two passes, avoiding RCU read-side critical sections that last too long for very large guests backed by 4 KiB SPTEs. - Zap invalid and defunct roots asynchronously via concurrency-managed work queue. - Allowing yielding when zapping TDP MMU roots in response to the root's last reference being put. - Batch more TLB flushes with an RCU trick. Whoever frees the paging structure now holds RCU as a proxy for all vCPUs running in the guest, i.e. to prolongs the grace period on their behalf. It then kicks the the vCPUs out of guest mode before doing rcu_read_unlock(). Generic: - Introduce __vcalloc and use it for very large allocations that need memcg accounting" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits) KVM: use kvcalloc for array allocations KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 kvm: x86: Require const tsc for RT KVM: x86: synthesize CPUID leaf 0x80000021h if useful KVM: x86: add support for CPUID leaf 0x80000021 KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()" kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU KVM: arm64: fix typos in comments KVM: arm64: Generalise VM features into a set of flags KVM: s390: selftests: Add error memop tests KVM: s390: selftests: Add more copy memop tests KVM: s390: selftests: Add named stages for memop test KVM: s390: selftests: Add macro as abstraction for MEM_OP KVM: s390: selftests: Split memop tests KVM: s390x: fix SCK locking RISC-V: KVM: Implement SBI HSM suspend call RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function RISC-V: Add SBI HSM suspend related defines RISC-V: KVM: Implement SBI v0.3 SRST extension ...
2022-03-21Merge tag 'docs-5.18' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "It has been a moderately busy cycle for documentation; some of the highlights are: - Numerous PDF-generation improvements - Kees's new document with guidelines for researchers studying the development community. - The ongoing stream of Chinese translations - Thorsten's new document on regression handling - A major reworking of the internal documentation for the kernel-doc script. Plus the usual stream of typo fixes and such" * tag 'docs-5.18' of git://git.lwn.net/linux: (80 commits) docs/kernel-parameters: update description of mem= docs/zh_CN: Add sched-nice-design Chinese translation docs: scheduler: Convert schedutil.txt to ReST Docs: ktap: add code-block type docs: serial: fix a reference file name in driver.rst docs: UML: Mention telnetd for port channel docs/zh_CN: add damon reclaim translation docs/zh_CN: add damon usage translation docs/zh_CN: add admin-guide damon start translation docs/zh_CN: add admin-guide damon index translation docs/zh_CN: Refactoring the admin-guide directory index zh_CN: Add translation for admin-guide/mm/index.rst zh_CN: Add translations for admin-guide/mm/ksm.rst Add Chinese translation for vm/ksm.rst docs/zh_CN: Add sched-stats Chinese translation docs/zh_CN: add devicetree of_unittest translation docs/zh_CN: add devicetree usage-model translation docs/zh_CN: add devicetree index translation Documentation: describe how to apply incremental stable patches docs/zh_CN: add peci subsystem translation ...
2022-03-21KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2Oliver Upton
KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not advertise the set of quirks which may be disabled to userspace, so it is impossible to predict the behavior of KVM. Worse yet, KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning it fails to reject attempts to set invalid quirk bits. The only valid workaround for the quirky quirks API is to add a new CAP. Actually advertise the set of quirks that can be disabled to userspace so it can predict KVM's behavior. Reject values for cap->args[0] that contain invalid bits. Finally, add documentation for the new capability and describe the existing quirks. Signed-off-by: Oliver Upton <oupton@google.com> Message-Id: <20220301060351.442881-5-oupton@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-18Merge tag 'kvmarm-5.18' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.18 - Proper emulation of the OSLock feature of the debug architecture - Scalibility improvements for the MMU lock when dirty logging is on - New VMID allocator, which will eventually help with SVA in VMs - Better support for PMUs in heterogenous systems - PSCI 1.1 support, enabling support for SYSTEM_RESET2 - Implement CONFIG_DEBUG_LIST at EL2 - Make CONFIG_ARM64_ERRATUM_2077057 default y - Reduce the overhead of VM exit when no interrupt is pending - Remove traces of 32bit ARM host support from the documentation - Updated vgic selftests - Various cleanups, doc updates and spelling fixes
2022-03-11docs: UML: Mention telnetd for port channelVincent Whitchurch
It is not obvious from the documentation that using the "port" channel for the console requires telnetd to be installed (see port_connection() in arch/um/drivers/port_user.c). Mention this, and the fact that UML will not boot until a client connects. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://lore.kernel.org/r/20220310124230.3069354-1-vincent.whitchurch@axis.com Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2022-03-11docs: UML: Mention telnetd for port channelVincent Whitchurch
It is not obvious from the documentation that using the "port" channel for the console requires telnetd to be installed (see port_connection() in arch/um/drivers/port_user.c). Mention this, and the fact that UML will not boot until a client connects. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2022-03-11um: Document dtb command line optionAnton Ivanov
Add documentation for the dtb command line option and the ability to load/parse device trees. Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Reviewed-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2022-03-09Merge branch kvm-arm64/misc-5.18 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/misc-5.18: : . : Misc fixes for KVM/arm64 5.18: : : - Drop unused kvm parameter to kvm_psci_version() : : - Implement CONFIG_DEBUG_LIST at EL2 : : - Make CONFIG_ARM64_ERRATUM_2077057 default y : : - Only do the interrupt dance if we have exited because of an interrupt : : - Remove traces of 32bit ARM host support from the documentation : . Documentation: KVM: Update documentation to indicate KVM is arm64-only KVM: arm64: Only open the interrupt window on exit due to an interrupt KVM: arm64: Enable Cortex-A510 erratum 2077057 by default Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-03-09Documentation: KVM: Update documentation to indicate KVM is arm64-onlyOliver Upton
KVM support for 32-bit ARM hosts (KVM/arm) has been removed from the kernel since commit 541ad0150ca4 ("arm: Remove 32bit KVM host support"). There still exists some remnants of the old architecture in the KVM documentation. Remove all traces of 32-bit host support from the documentation. Note that AArch32 guests are still supported. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oupton@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220308172856.2997250-1-oupton@google.com
2022-03-04Merge branch 'kvm-bugfixes' into HEADPaolo Bonzini
Merge bugfixes from 5.17 before merging more tricky work.
2022-03-01KVM: Drop KVM_REQ_MMU_RELOAD and update vcpu-requests.rst documentationSean Christopherson
Remove the now unused KVM_REQ_MMU_RELOAD, shift KVM_REQ_VM_DEAD into the unoccupied space, and update vcpu-requests.rst, which was missing an entry for KVM_REQ_VM_DEAD. Switching KVM_REQ_VM_DEAD to entry '1' also fixes the stale comment about bits 4-7 being reserved. Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Ben Gardon <bgardon@google.com> Message-Id: <20220225182248.3812651-7-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-25Merge branch kvm-arm64/psci-1.1 into kvmarm-master/nextMarc Zyngier
* kvm-arm64/psci-1.1: : . : Limited PSCI-1.1 support from Will Deacon: : : This small series exposes the PSCI SYSTEM_RESET2 call to guests, which : allows the propagation of a "reset_type" and a "cookie" back to the VMM. : Although Linux guests only ever pass 0 for the type ("SYSTEM_WARM_RESET"), : the vendor-defined range can be used by a bootloader to provide additional : information about the reset, such as an error code. : . KVM: arm64: Remove unneeded semicolons KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field KVM: arm64: Expose PSCI SYSTEM_RESET2 call to the guest KVM: arm64: Bump guest PSCI version to 1.1 Signed-off-by: Marc Zyngier <maz@kernel.org>
2022-02-25KVM: x86: Provide per VM capability for disabling PMU virtualizationDavid Dunn
Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of settings/features to allow userspace to configure PMU virtualization on a per-VM basis. For now, support a single flag, KVM_PMU_CAP_DISABLE, to allow disabling PMU virtualization for a VM even when KVM is configured with enable_pmu=true a module level. To keep KVM simple, disallow changing VM's PMU configuration after vCPUs have been created. Signed-off-by: David Dunn <daviddunn@google.com> Message-Id: <20220223225743.2703915-2-daviddunn@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-24Merge branch 'kvm-ppc-cap-210' into kvm-next-5.18Paolo Bonzini
2022-02-22Merge branch 'kvm-ppc-cap-210' into kvm-masterPaolo Bonzini
By request of Nick Piggin: > Patch 3 requires a KVM_CAP_PPC number allocated. QEMU maintainers are > happy with it (link in changelog) just waiting on KVM upstreaming. Do > you have objections to the series going to ppc/kvm tree first, or > another option is you could take patch 3 alone first (it's relatively > independent of the other 2) and ppc/kvm gets it from you?
2022-02-22KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3Nicholas Piggin
Add KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL resource mode to 3 with the H_SET_MODE hypercall. This capability differs between processor types and KVM types (PR, HV, Nested HV), and affects guest-visible behaviour. QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and use the KVM CAP if available to determine KVM support[2]. Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-22KVM: s390: Clarify key argument for MEM_OP in api docsJanis Schoetterl-Glausch
Clarify that the key argument represents the access key, not the whole storage key. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Link: https://lore.kernel.org/r/20220221143657.3712481-1-scgl@linux.ibm.com Fixes: 5e35d0eb472b ("KVM: s390: Update api documentation for memop ioctl") Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-21KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags fieldWill Deacon
When handling reset and power-off PSCI calls from the guest, we initialise X0 to PSCI_RET_INTERNAL_FAILURE in case the VMM tries to re-run the vCPU after issuing the call. Unfortunately, this also means that the VMM cannot see which PSCI call was issued and therefore cannot distinguish between PSCI SYSTEM_RESET and SYSTEM_RESET2 calls, which is necessary in order to determine the validity of the "reset_type" in X1. Allocate bit 0 of the previously unused 'flags' field of the system_event structure so that we can indicate the PSCI call used to initiate the reset. Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Alexandru Elisei <alexandru.elisei@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220221153524.15397-4-will@kernel.org
2022-02-17KVM: x86: Add KVM_CAP_ENABLE_CAP to x86Aaron Lewis
Follow the precedent set by other architectures that support the VCPU ioctl, KVM_ENABLE_CAP, and advertise the VM extension, KVM_CAP_ENABLE_CAP. This way, userspace can ensure that KVM_ENABLE_CAP is available on a vcpu before using it. Fixes: 5c919412fe61 ("kvm/x86: Hyper-V synthetic interrupt controller") Signed-off-by: Aaron Lewis <aaronlewis@google.com> Message-Id: <20220214212950.1776943-1-aaronlewis@google.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-02-14KVM: s390: Update api documentation for memop ioctlJanis Schoetterl-Glausch
Document all currently existing operations, flags and explain under which circumstances they are available. Document the recently introduced absolute operations and the storage key protection flag, as well as the existing SIDA operations. Signed-off-by: Janis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-10-scgl@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2022-02-08KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPUAlexandru Elisei
Userspace can assign a PMU to a VCPU with the KVM_ARM_VCPU_PMU_V3_SET_PMU device ioctl. If the VCPU is scheduled on a physical CPU which has a different PMU, the perf events needed to emulate a guest PMU won't be scheduled in and the guest performance counters will stop counting. Treat it as an userspace error and refuse to run the VCPU in this situation. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-7-alexandru.elisei@arm.com
2022-02-08KVM: arm64: Add KVM_ARM_VCPU_PMU_V3_SET_PMU attributeAlexandru Elisei
When KVM creates an event and there are more than one PMUs present on the system, perf_init_event() will go through the list of available PMUs and will choose the first one that can create the event. The order of the PMUs in this list depends on the probe order, which can change under various circumstances, for example if the order of the PMU nodes change in the DTB or if asynchronous driver probing is enabled on the kernel command line (with the driver_async_probe=armv8-pmu option). Another consequence of this approach is that on heteregeneous systems all virtual machines that KVM creates will use the same PMU. This might cause unexpected behaviour for userspace: when a VCPU is executing on the physical CPU that uses this default PMU, PMU events in the guest work correctly; but when the same VCPU executes on another CPU, PMU events in the guest will suddenly stop counting. Fortunately, perf core allows user to specify on which PMU to create an event by using the perf_event_attr->type field, which is used by perf_init_event() as an index in the radix tree of available PMUs. Add the KVM_ARM_VCPU_PMU_V3_CTRL(KVM_ARM_VCPU_PMU_V3_SET_PMU) VCPU attribute to allow userspace to specify the arm_pmu that KVM will use when creating events for that VCPU. KVM will make no attempt to run the VCPU on the physical CPUs that share the PMU, leaving it up to userspace to manage the VCPU threads' affinity accordingly. To ensure that KVM doesn't expose an asymmetric system to the guest, the PMU set for one VCPU will be used by all other VCPUs. Once a VCPU has run, the PMU cannot be changed in order to avoid changing the list of available events for a VCPU, or to change the semantics of existing events. Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-6-alexandru.elisei@arm.com
2022-02-08KVM: arm64: Do not change the PMU event filter after a VCPU has runMarc Zyngier
Userspace can specify which events a guest is allowed to use with the KVM_ARM_VCPU_PMU_V3_FILTER attribute. The list of allowed events can be identified by a guest from reading the PMCEID{0,1}_EL0 registers. Changing the PMU event filter after a VCPU has run can cause reads of the registers performed before the filter is changed to return different values than reads performed with the new event filter in place. The architecture defines the two registers as read-only, and this behaviour contradicts that. Keep track when the first VCPU has run and deny changes to the PMU event filter to prevent this from happening. Signed-off-by: Marc Zyngier <maz@kernel.org> [ Alexandru E: Added commit message, updated ioctl documentation ] Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220127161759.53553-2-alexandru.elisei@arm.com
2022-01-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "Two larger x86 series: - Redo incorrect fix for SEV/SMAP erratum - Windows 11 Hyper-V workaround Other x86 changes: - Various x86 cleanups - Re-enable access_tracking_perf_test - Fix for #GP handling on SVM - Fix for CPUID leaf 0Dh in KVM_GET_SUPPORTED_CPUID - Fix for ICEBP in interrupt shadow - Avoid false-positive RCU splat - Enable Enlightened MSR-Bitmap support for real ARM: - Correctly update the shadow register on exception injection when running in nVHE mode - Correctly use the mm_ops indirection when performing cache invalidation from the page-table walker - Restrict the vgic-v3 workaround for SEIS to the two known broken implementations Generic code changes: - Dead code cleanup" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits) KVM: eventfd: Fix false positive RCU usage warning KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread() KVM: nVMX: Rename vmcs_to_field_offset{,_table} KVM: nVMX: eVMCS: Filter out VM_EXIT_SAVE_VMX_PREEMPTION_TIMER KVM: nVMX: Also filter MSR_IA32_VMX_TRUE_PINBASED_CTLS when eVMCS selftests: kvm: check dynamic bits against KVM_X86_XCOMP_GUEST_SUPP KVM: x86: add system attribute to retrieve full set of supported xsave states KVM: x86: Add a helper to retrieve userspace address from kvm_device_attr selftests: kvm: move vm_xsave_req_perm call to amx_test KVM: x86: Sync the states size with the XCR0/IA32_XSS at, any time KVM: x86: Update vCPU's runtime CPUID on write to MSR_IA32_XSS KVM: x86: Keep MSR_IA32_XSS unchanged for INIT KVM: x86: Free kvm_cpuid_entry2 array on post-KVM_RUN KVM_SET_CPUID{,2} KVM: nVMX: WARN on any attempt to allocate shadow VMCS for vmcs02 KVM: selftests: Don't skip L2's VMCALL in SMM test for SVM guest KVM: x86: Check .flags in kvm_cpuid_check_equal() too KVM: x86: Forcibly leave nested virt when SMM state is toggled KVM: SVM: drop unnecessary code in svm_hv_vmcb_dirty_nested_enlightenments() KVM: SVM: hyper-v: Enable Enlightened MSR-Bitmap support for real ...
2022-01-28KVM: x86: add system attribute to retrieve full set of supported xsave statesPaolo Bonzini
Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that have not been enabled. Probing those, for example so that they can be passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl. The latter is in fact worse, even though that is what the rest of the API uses, because it would require supported_xcr0 to be moved from the KVM module to the kernel just for this use. In addition, the value would be nonsensical (or an error would have to be returned) until the KVM module is loaded in. Therefore, to limit the growth of system ioctls, add a /dev/kvm variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86 with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-22Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull more kvm updates from Paolo Bonzini: "Generic: - selftest compilation fix for non-x86 - KVM: avoid warning on s390 in mark_page_dirty x86: - fix page write-protection bug and improve comments - use binary search to lookup the PMU event filter, add test - enable_pmu module parameter support for Intel CPUs - switch blocked_vcpu_on_cpu_lock to raw spinlock - cleanups of blocked vCPU logic - partially allow KVM_SET_CPUID{,2} after KVM_RUN (5.16 regression) - various small fixes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits) docs: kvm: fix WARNINGs from api.rst selftests: kvm/x86: Fix the warning in lib/x86_64/processor.c selftests: kvm/x86: Fix the warning in pmu_event_filter_test.c kvm: selftests: Do not indent with spaces kvm: selftests: sync uapi/linux/kvm.h with Linux header selftests: kvm: add amx_test to .gitignore KVM: SVM: Nullify vcpu_(un)blocking() hooks if AVIC is disabled KVM: SVM: Move svm_hardware_setup() and its helpers below svm_x86_ops KVM: SVM: Drop AVIC's intermediate avic_set_running() helper KVM: VMX: Don't do full kick when handling posted interrupt wakeup KVM: VMX: Fold fallback path into triggering posted IRQ helper KVM: VMX: Pass desired vector instead of bool for triggering posted IRQ KVM: VMX: Don't do full kick when triggering posted interrupt "fails" KVM: SVM: Skip AVIC and IRTE updates when loading blocking vCPU KVM: SVM: Use kvm_vcpu_is_blocking() in AVIC load to handle preemption KVM: SVM: Remove unnecessary APICv/AVIC update in vCPU unblocking path KVM: SVM: Don't bother checking for "running" AVIC when kicking for IPIs KVM: SVM: Signal AVIC doorbell iff vCPU is in guest mode KVM: x86: Remove defunct pre_block/post_block kvm_x86_ops hooks KVM: x86: Unexport LAPIC's switch_to_{hv,sw}_timer() helpers ...
2022-01-20docs: kvm: fix WARNINGs from api.rstWei Wang
Use the api number 134 for KVM_GET_XSAVE2, instead of 42, which has been used by KVM_GET_XSAVE. Also, fix the WARNINGs of the underlines being too short. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Tested-by: Stephen Rothwell <sfr@canb.auug.org.au> Message-Id: <20220120045003.315177-1-wei.w.wang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-01-16Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "RISCV: - Use common KVM implementation of MMU memory caches - SBI v0.2 support for Guest - Initial KVM selftests support - Fix to avoid spurious virtual interrupts after clearing hideleg CSR - Update email address for Anup and Atish ARM: - Simplification of the 'vcpu first run' by integrating it into KVM's 'pid change' flow - Refactoring of the FP and SVE state tracking, also leading to a simpler state and less shared data between EL1 and EL2 in the nVHE case - Tidy up the header file usage for the nvhe hyp object - New HYP unsharing mechanism, finally allowing pages to be unmapped from the Stage-1 EL2 page-tables - Various pKVM cleanups around refcounting and sharing - A couple of vgic fixes for bugs that would trigger once the vcpu xarray rework is merged, but not sooner - Add minimal support for ARMv8.7's PMU extension - Rework kvm_pgtable initialisation ahead of the NV work - New selftest for IRQ injection - Teach selftests about the lack of default IPA space and page sizes - Expand sysreg selftest to deal with Pointer Authentication - The usual bunch of cleanups and doc update s390: - fix sigp sense/start/stop/inconsistency - cleanups x86: - Clean up some function prototypes more - improved gfn_to_pfn_cache with proper invalidation, used by Xen emulation - add KVM_IRQ_ROUTING_XEN_EVTCHN and event channel delivery - completely remove potential TOC/TOU races in nested SVM consistency checks - update some PMCs on emulated instructions - Intel AMX support (joint work between Thomas and Intel) - large MMU cleanups - module parameter to disable PMU virtualization - cleanup register cache - first part of halt handling cleanups - Hyper-V enlightened MSR bitmap support for nested hypervisors Generic: - clean up Makefiles - introduce CONFIG_HAVE_KVM_DIRTY_RING - optimize memslot lookup using a tree - optimize vCPU array usage by converting to xarray" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (268 commits) x86/fpu: Fix inline prefix warnings selftest: kvm: Add amx selftest selftest: kvm: Move struct kvm_x86_state to header selftest: kvm: Reorder vcpu_load_state steps for AMX kvm: x86: Disable interception for IA32_XFD on demand x86/fpu: Provide fpu_sync_guest_vmexit_xfd_state() kvm: selftests: Add support for KVM_CAP_XSAVE2 kvm: x86: Add support for getting/setting expanded xstate buffer x86/fpu: Add uabi_size to guest_fpu kvm: x86: Add CPUID support for Intel AMX kvm: x86: Add XCR0 support for Intel AMX kvm: x86: Disable RDMSR interception of IA32_XFD_ERR kvm: x86: Emulate IA32_XFD_ERR for guest kvm: x86: Intercept #NM for saving IA32_XFD_ERR x86/fpu: Prepare xfd_err in struct fpu_guest kvm: x86: Add emulation for IA32_XFD x86/fpu: Provide fpu_update_guest_xfd() for IA32_XFD emulation kvm: x86: Enable dynamic xfeatures at KVM_SET_CPUID2 x86/fpu: Provide fpu_enable_guest_xfd_features() for KVM x86/fpu: Add guest support to xfd_enable_feature() ...