summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-10-30Merge branch kvm-arm64/mops into kvmarm/nextOliver Upton
* kvm-arm64/mops: : KVM support for MOPS, courtesy of Kristina Martsenko : : MOPS adds new instructions for accelerating memcpy(), memset(), and : memmove() operations in hardware. This series brings virtualization : support for KVM guests, and allows VMs to run on asymmetrict systems : that may have different MOPS implementations. KVM: arm64: Expose MOPS instructions to guests KVM: arm64: Add handler for MOPS exceptions Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/writable-id-regs into kvmarm/nextOliver Upton
* kvm-arm64/writable-id-regs: : Writable ID registers, courtesy of Jing Zhang : : This series significantly expands the architectural feature set that : userspace can manipulate via the ID registers. A new ioctl is defined : that makes the mutable fields in the ID registers discoverable to : userspace. KVM: selftests: Avoid using forced target for generating arm64 headers tools headers arm64: Fix references to top srcdir in Makefile KVM: arm64: selftests: Test for setting ID register from usersapce tools headers arm64: Update sysreg.h with kernel sources KVM: selftests: Generate sysreg-defs.h and add to include path perf build: Generate arm64's sysreg-defs.h and add to include path tools: arm64: Add a Makefile for generating sysreg-defs.h KVM: arm64: Document vCPU feature selection UAPIs KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1 KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1 KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1 KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1 KVM: arm64: Bump up the default KVM sanitised debug version to v8p8 KVM: arm64: Reject attempts to set invalid debug arch version KVM: arm64: Advertise selected DebugVer in DBGDIDR.Version KVM: arm64: Use guest ID register values for the sake of emulation KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKS KVM: arm64: Allow userspace to get the writable masks for feature ID registers Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30KVM: selftests: Avoid using forced target for generating arm64 headersOliver Upton
The 'prepare' target that generates the arm64 sysreg headers had no prerequisites, so it wound up forcing a rebuild of all KVM selftests each invocation. Add a rule for the generated headers and just have dependents use that for a prerequisite. Reported-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Fixes: 9697d84cc3b6 ("KVM: selftests: Generate sysreg-defs.h and add to include path") Tested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Link: https://lore.kernel.org/r/20231027005439.3142015-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30tools headers arm64: Fix references to top srcdir in MakefileOliver Upton
Aishwarya reports that KVM selftests for arm64 fail with the following error: | make[4]: Entering directory '/tmp/kci/linux/tools/testing/selftests/kvm' | Makefile:270: warning: overriding recipe for target | '/tmp/kci/linux/build/kselftest/kvm/get-reg-list' | Makefile:265: warning: ignoring old recipe for target | '/tmp/kci/linux/build/kselftest/kvm/get-reg-list' | make -C ../../../../tools/arch/arm64/tools/ | make[5]: Entering directory '/tmp/kci/linux/tools/arch/arm64/tools' | Makefile:10: ../tools/scripts/Makefile.include: No such file or directory | make[5]: *** No rule to make target '../tools/scripts/Makefile.include'. | Stop. It would appear that this only affects builds from the top-level Makefile (e.g. make kselftest-all), as $(srctree) is set to ".". Work around the issue by shadowing the kselftest naming scheme for the source tree variable. Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com> Fixes: 0359c946b131 ("tools headers arm64: Update sysreg.h with kernel sources") Link: https://lore.kernel.org/r/20231027005439.3142015-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/sgi-injection into kvmarm/nextOliver Upton
* kvm-arm64/sgi-injection: : vSGI injection improvements + fixes, courtesy Marc Zyngier : : Avoid linearly searching for vSGI targets using a compressed MPIDR to : index a cache. While at it, fix some egregious bugs in KVM's mishandling : of vcpuid (user-controlled value) and vcpu_idx. KVM: arm64: Clarify the ordering requirements for vcpu/RD creation KVM: arm64: vgic-v3: Optimize affinity-based SGI injection KVM: arm64: Fast-track kvm_mpidr_to_vcpu() when mpidr_data is available KVM: arm64: Build MPIDR to vcpu index cache at runtime KVM: arm64: Simplify kvm_vcpu_get_mpidr_aff() KVM: arm64: Use vcpu_idx for invalidation tracking KVM: arm64: vgic: Use vcpu_idx for the debug information KVM: arm64: vgic-v2: Use cpuid from userspace as vcpu_id KVM: arm64: vgic-v3: Refactor GICv3 SGI generation KVM: arm64: vgic-its: Treat the collection target address as a vcpu_id KVM: arm64: vgic: Make kvm_vgic_inject_irq() take a vcpu pointer Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/stage2-vhe-load into kvmarm/nextOliver Upton
* kvm-arm64/stage2-vhe-load: : Setup stage-2 MMU from vcpu_load() for VHE : : Unlike nVHE, there is no need to switch the stage-2 MMU around on guest : entry/exit in VHE mode as the host is running at EL2. Despite this KVM : reloads the stage-2 on every guest entry, which is needless. : : This series moves the setup of the stage-2 MMU context to vcpu_load() : when running in VHE mode. This is likely to be a win across the board, : but also allows us to remove an ISB on the guest entry path for systems : with one of the speculative AT errata. KVM: arm64: Move VTCR_EL2 into struct s2_mmu KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe() KVM: arm64: Rename helpers for VHE vCPU load/put KVM: arm64: Reload stage-2 for VMID change on VHE KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host() KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/nv-trap-fixes into kvmarm/nextOliver Upton
* kvm-arm64/nv-trap-fixes: : NV trap forwarding fixes, courtesy Miguel Luis and Marc Zyngier : : - Explicitly define the effects of HCR_EL2.NV on EL2 sysregs in the : NV trap encoding : : - Make EL2 registers that access AArch32 guest state UNDEF or RAZ/WI : where appropriate for NV guests KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WI KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregs KVM: arm64: Refine _EL2 system register list that require trap reinjection arm64: Add missing _EL2 encodings arm64: Add missing _EL12 encodings Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/smccc-filter-cleanups into kvmarm/nextOliver Upton
* kvm-arm64/smccc-filter-cleanups: : Cleanup the management of KVM's SMCCC maple tree : : Avoid the cost of maintaining the SMCCC filter maple tree if userspace : hasn't writen a rule to the filter. While at it, rip out the now : unnecessary VM flag to indicate whether or not the SMCCC filter was : configured. KVM: arm64: Use mtree_empty() to determine if SMCCC filter configured KVM: arm64: Only insert reserved ranges when SMCCC filter is used KVM: arm64: Add a predicate for testing if SMCCC filter is configured Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/pmevtyper-filter into kvmarm/nextOliver Upton
* kvm-arm64/pmevtyper-filter: : Fixes to KVM's handling of the PMUv3 exception level filtering bits : : - NSH (count at EL2) and M (count at EL3) should be stateful when the : respective EL is advertised in the ID registers but have no effect on : event counting. : : - NSU and NSK modify the event filtering of EL0 and EL1, respectively. : Though the kernel may not use these bits, other KVM guests might. : Implement these bits exactly as written in the pseudocode if EL3 is : advertised. KVM: arm64: Add PMU event filter bits required if EL3 is implemented KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertised Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/feature-flag-refactor into kvmarm/nextOliver Upton
* kvm-arm64/feature-flag-refactor: : vCPU feature flag cleanup : : Clean up KVM's handling of vCPU feature flags to get rid of the : vCPU-scoped bitmaps and remove failure paths from kvm_reset_vcpu(). KVM: arm64: Get rid of vCPU-scoped feature bitmap KVM: arm64: Remove unused return value from kvm_reset_vcpu() KVM: arm64: Hoist NV+SVE check into KVM_ARM_VCPU_INIT ioctl handler KVM: arm64: Prevent NV feature flag on systems w/o nested virt KVM: arm64: Hoist PAuth checks into KVM_ARM_VCPU_INIT ioctl KVM: arm64: Hoist SVE check into KVM_ARM_VCPU_INIT ioctl handler KVM: arm64: Hoist PMUv3 check into KVM_ARM_VCPU_INIT ioctl handler KVM: arm64: Add generic check for system-supported vCPU features Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30Merge branch kvm-arm64/misc into kvmarm/nextOliver Upton
* kvm-arm64/misc: : Miscellaneous updates : : - Put an upper bound on the number of I-cache invalidations by : cacheline to avoid soft lockups : : - Get rid of bogus refererence count transfer for THP mappings : : - Do a local TLB invalidation on permission fault race : : - Fixes for page_fault_test KVM selftest : : - Add a tracepoint for detecting MMIO instructions unsupported by KVM KVM: arm64: Add tracepoint for MMIO accesses where ISV==0 KVM: arm64: selftest: Perform ISB before reading PAR_EL1 KVM: arm64: selftest: Add the missing .guest_prepare() KVM: arm64: Always invalidate TLB for stage-2 permission faults KVM: arm64: Do not transfer page refcount for THP adjustment KVM: arm64: Avoid soft lockups due to I-cache maintenance arm64: tlbflush: Rename MAX_TLBI_OPS KVM: arm64: Don't use kerneldoc comment for arm64_check_features() Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30KVM: arm64: Add tracepoint for MMIO accesses where ISV==0Oliver Upton
It is a pretty well known fact that KVM does not support MMIO emulation without valid instruction syndrome information (ESR_EL2.ISV == 0). The current kvm_pr_unimpl() is pretty useless, as it contains zero context to relate the event to a vCPU. Replace it with a precise tracepoint that dumps the relevant context so the user can make sense of what the guest is doing. Acked-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231026205306.3045075-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30KVM: arm64: selftest: Perform ISB before reading PAR_EL1Zenghui Yu
It looks like a mistake to issue ISB *after* reading PAR_EL1, we should instead perform it between the AT instruction and the reads of PAR_EL1. As according to DDI0487J.a IJTYVP, "When an address translation instruction is executed, explicit synchronization is required to guarantee the result is visible to subsequent direct reads of PAR_EL1." Otherwise all guest_at testcases fail on my box with ==== Test Assertion Failure ==== aarch64/page_fault_test.c:142: par & 1 == 0 pid=1355864 tid=1355864 errno=4 - Interrupted system call 1 0x0000000000402853: vcpu_run_loop at page_fault_test.c:681 2 0x0000000000402cdb: run_test at page_fault_test.c:730 3 0x0000000000403897: for_each_guest_mode at guest_modes.c:100 4 0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1105 5 (inlined by) main at page_fault_test.c:1131 6 0x0000ffffb153c03b: ?? ??:0 7 0x0000ffffb153c113: ?? ??:0 8 0x0000000000401aaf: _start at ??:? 0x1 != 0x0 (par & 1 != 0) Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231007124043.626-2-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30KVM: arm64: selftest: Add the missing .guest_prepare()Zenghui Yu
Running page_fault_test on a Cortex A72 fails with Test: ro_memslot_no_syndrome_guest_cas Testing guest mode: PA-bits:40, VA-bits:48, 4K pages Testing memory backing src type: anonymous ==== Test Assertion Failure ==== aarch64/page_fault_test.c:117: guest_check_lse() pid=1944087 tid=1944087 errno=4 - Interrupted system call 1 0x00000000004028b3: vcpu_run_loop at page_fault_test.c:682 2 0x0000000000402d93: run_test at page_fault_test.c:731 3 0x0000000000403957: for_each_guest_mode at guest_modes.c:100 4 0x00000000004019f3: for_each_test_and_guest_mode at page_fault_test.c:1108 5 (inlined by) main at page_fault_test.c:1134 6 0x0000ffff868e503b: ?? ??:0 7 0x0000ffff868e5113: ?? ??:0 8 0x0000000000401aaf: _start at ??:? guest_check_lse() because we don't have a guest_prepare stage to check the presence of FEAT_LSE and skip the related guest_cas testing, and we end-up failing in GUEST_ASSERT(guest_check_lse()). Add the missing .guest_prepare() where it's indeed required. Signed-off-by: Zenghui Yu <yuzenghui@huawei.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231007124043.626-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-30KVM: arm64: Always invalidate TLB for stage-2 permission faultsOliver Upton
It is possible for multiple vCPUs to fault on the same IPA and attempt to resolve the fault. One of the page table walks will actually update the PTE and the rest will return -EAGAIN per our race detection scheme. KVM elides the TLB invalidation on the racing threads as the return value is nonzero. Before commit a12ab1378a88 ("KVM: arm64: Use local TLBI on permission relaxation") KVM always used broadcast TLB invalidations when handling permission faults, which had the convenient property of making the stage-2 updates visible to all CPUs in the system. However now we do a local invalidation, and TLBI elision leads to the vCPU thread faulting again on the stale entry. Remember that the architecture permits the TLB to cache translations that precipitate a permission fault. Invalidate the TLB entry responsible for the permission fault if the stage-2 descriptor has been relaxed, regardless of which thread actually did the job. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230922223229.1608155-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25KVM: arm64: Handle AArch32 SPSR_{irq,abt,und,fiq} as RAZ/WIMarc Zyngier
When trapping accesses from a NV guest that tries to access SPSR_{irq,abt,und,fiq}, make sure we handle them as RAZ/WI, as if AArch32 wasn't implemented. This involves a bit of repainting to make the visibility handler more generic. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-6-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25KVM: arm64: Do not let a L1 hypervisor access the *32_EL2 sysregsMarc Zyngier
DBGVCR32_EL2, DACR32_EL2, IFSR32_EL2 and FPEXC32_EL2 are required to UNDEF when AArch32 isn't implemented, which is definitely the case when running NV. Given that this is the only case where these registers can trap, unconditionally inject an UNDEF exception. Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20231023095444.1587322-5-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25KVM: arm64: Refine _EL2 system register list that require trap reinjectionMiguel Luis
Implement a fine grained approach in the _EL2 sysreg range instead of the current wide cast trap. This ensures that we don't mistakenly inject the wrong exception into the guest. [maz: commit message massaging, dropped secure and AArch32 registers from the list] Fixes: d0fc0a2519a6 ("KVM: arm64: nv: Add trap forwarding for HCR_EL2") Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-4-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25arm64: Add missing _EL2 encodingsMiguel Luis
Some _EL2 encodings are missing. Add them. Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> [maz: dropped secure encodings] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-25arm64: Add missing _EL12 encodingsMiguel Luis
Some _EL12 encodings are missing. Add them. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Miguel Luis <miguel.luis@oracle.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231023095444.1587322-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Add PMU event filter bits required if EL3 is implementedOliver Upton
Suzuki noticed that KVM's PMU emulation is oblivious to the NSU and NSK event filter bits. On systems that have EL3 these bits modify the filter behavior in non-secure EL0 and EL1, respectively. Even though the kernel doesn't use these bits, it is entirely possible some other guest OS does. Additionally, it would appear that these and the M bit are required by the architecture if EL3 is implemented. Allow the EL3 event filter bits to be set if EL3 is advertised in the guest's ID register. Implement the behavior of NSU and NSK according to the pseudocode, and entirely ignore the M bit for perf event creation. Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20231019185618.3442949-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-24KVM: arm64: Make PMEVTYPER<n>_EL0.NSH RES0 if EL2 isn't advertisedOliver Upton
The NSH bit, which filters event counting at EL2, is required by the architecture if an implementation has EL2. Even though KVM doesn't support nested virt yet, it makes no effort to hide the existence of EL2 from the ID registers. Userspace can, however, change the value of PFR0 to hide EL2. Align KVM's sysreg emulation with the architecture and make NSH RES0 if EL2 isn't advertised. Keep in mind the bit is ignored when constructing the backing perf event. While at it, build the event type mask using explicit field definitions instead of relying on ARMV8_PMU_EVTYPE_MASK. KVM probably should've been doing this in the first place, as it avoids changes to the aforementioned mask affecting sysreg emulation. Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20231019185618.3442949-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-23KVM: arm64: Move VTCR_EL2 into struct s2_mmuMarc Zyngier
We currently have a global VTCR_EL2 value for each guest, even if the guest uses NV. This implies that the guest's own S2 must fit in the host's. This is odd, for multiple reasons: - the PARange values and the number of IPA bits don't necessarily match: you can have 33 bits of IPA space, and yet you can only describe 32 or 36 bits of PARange - When userspace set the IPA space, it creates a contract with the kernel saying "this is the IPA space I'm prepared to handle". At no point does it constraint the guest's own IPA space as long as the guest doesn't try to use a [I]PA outside of the IPA space set by userspace - We don't even try to hide the value of ID_AA64MMFR0_EL1.PARange. And then there is the consequence of the above: if a guest tries to create a S2 that has for input address something that is larger than the IPA space defined by the host, we inject a fatal exception. This is no good. For all intent and purposes, a guest should be able to have the S2 it really wants, as long as the *output* address of that S2 isn't outside of the IPA space. For that, we need to have a per-s2_mmu VTCR_EL2 setting, which allows us to represent the full PARange. Move the vctr field into the s2_mmu structure, which has no impact whatsoever, except for NV. Note that once we are able to override ID_AA64MMFR0_EL1.PARange from userspace, we'll also be able to restrict the size of the shadow S2 that NV uses. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231012205108.3937270-1-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Load the stage-2 MMU context in kvm_vcpu_load_vhe()Oliver Upton
To date the VHE code has aggressively reloaded the stage-2 MMU context on every guest entry, despite the fact that this isn't necessary. This was probably done for consistency with the nVHE code, which needs to switch in/out the stage-2 MMU context as both the host and guest run at EL1. Hoist __load_stage2() into kvm_vcpu_load_vhe(), thus avoiding a reload on every guest entry/exit. This is likely to be beneficial to systems with one of the speculative AT errata, as there is now one fewer context synchronization event on the guest entry path. Additionally, it is possible that implementations have hitched correctness mitigations on writes to VTTBR_EL2, which are now elided on guest re-entry. Note that __tlb_switch_to_guest() is deliberately left untouched as it can be called outside the context of a running vCPU. Link: https://lore.kernel.org/r/20231018233212.2888027-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Rename helpers for VHE vCPU load/putOliver Upton
The names for the helpers we expose to the 'generic' KVM code are a bit imprecise; we switch the EL0 + EL1 sysreg context and setup trap controls that do not need to change for every guest entry/exit. Rename + shuffle things around a bit in preparation for loading the stage-2 MMU context on vcpu_load(). Link: https://lore.kernel.org/r/20231018233212.2888027-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Reload stage-2 for VMID change on VHEMarc Zyngier
Naturally, a change to the VMID for an MMU implies a new value for VTTBR. Reload on VMID change in anticipation of loading stage-2 on vcpu_load() instead of every guest entry. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231018233212.2888027-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Restore the stage-2 context in VHE's __tlb_switch_to_host()Marc Zyngier
An MMU notifier could cause us to clobber the stage-2 context loaded on a CPU when we switch to another VM's context to invalidate. This isn't an issue right now as the stage-2 context gets reloaded on every guest entry, but is disastrous when moving __load_stage2() into the vcpu_load() path. Restore the previous stage-2 context on the way out of a TLB invalidation if we installed something else. Deliberately do this after TGE=1 is synchronized to keep things safe in light of the speculative AT errata. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231018233212.2888027-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-20KVM: arm64: Don't zero VTTBR in __tlb_switch_to_host()Oliver Upton
HCR_EL2.TGE=0 is sufficient to disable stage-2 translation, so there's no need to explicitly zero VTTBR_EL2. Link: https://lore.kernel.org/r/20231018233212.2888027-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18KVM: arm64: selftests: Test for setting ID register from usersapceJing Zhang
Add tests to verify setting ID registers from userspace is handled correctly by KVM. Also add a test case to use ioctl KVM_ARM_GET_REG_WRITABLE_MASKS to get writable masks. Signed-off-by: Jing Zhang <jingzhangos@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231011195740.3349631-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18tools headers arm64: Update sysreg.h with kernel sourcesJing Zhang
The users of sysreg.h (perf, KVM selftests) are now generating the necessary sysreg-defs.h; sync sysreg.h with the kernel sources and fix the KVM selftests that use macros which suffered a rename. Signed-off-by: Jing Zhang <jingzhangos@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231011195740.3349631-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18KVM: selftests: Generate sysreg-defs.h and add to include pathOliver Upton
Start generating sysreg-defs.h for arm64 builds in anticipation of updating sysreg.h to a version that depends on it. Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231011195740.3349631-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18perf build: Generate arm64's sysreg-defs.h and add to include pathOliver Upton
Start generating sysreg-defs.h in anticipation of updating sysreg.h to a version that needs the generated output. Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231011195740.3349631-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-18tools: arm64: Add a Makefile for generating sysreg-defs.hOliver Upton
Use a common Makefile for generating sysreg-defs.h, which will soon be needed by perf and KVM selftests. The naming scheme of the generated macros is not expected to change, so just refer to the canonical script/data in the kernel source rather than copying to tools. Co-developed-by: Jing Zhang <jingzhangos@google.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231011195740.3349631-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-09KVM: arm64: Expose MOPS instructions to guestsKristina Martsenko
Expose the Armv8.8 FEAT_MOPS feature to guests in the ID register and allow the MOPS instructions to be run in a guest. Only expose MOPS if the whole system supports it. Note, it is expected that guests do not use these instructions on MMIO, similarly to other instructions where ESR_EL2.ISV==0 such as LDP/STP. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230922112508.1774352-3-kristina.martsenko@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-09KVM: arm64: Add handler for MOPS exceptionsKristina Martsenko
An Armv8.8 FEAT_MOPS main or epilogue instruction will take an exception if executed on a CPU with a different MOPS implementation option (A or B) than the CPU where the preceding prologue instruction ran. In this case the OS exception handler is expected to reset the registers and restart execution from the prologue instruction. A KVM guest may use the instructions at EL1 at times when the guest is not able to handle the exception, expecting that the instructions will only run on one CPU (e.g. when running UEFI boot services in the guest). As KVM may reschedule the guest between different types of CPUs at any time (on an asymmetric system), it needs to also handle the resulting exception itself in case the guest is not able to. A similar situation will also occur in the future when live migrating a guest from one type of CPU to another. Add handling for the MOPS exception to KVM. The handling can be shared with the EL0 exception handler, as the logic and register layouts are the same. The exception can be handled right after exiting a guest, which avoids the cost of returning to the host exit handler. Similarly to the EL0 exception handler, in case the main or epilogue instruction is being single stepped, it makes sense to finish the step before executing the prologue instruction, so advance the single step state machine. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230922112508.1774352-2-kristina.martsenko@arm.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-05KVM: arm64: Use mtree_empty() to determine if SMCCC filter configuredOliver Upton
The smccc_filter maple tree is only populated if userspace attempted to configure it. Use the state of the maple tree to determine if the filter has been configured, eliminating the VM flag. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231004234947.207507-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-05KVM: arm64: Only insert reserved ranges when SMCCC filter is usedOliver Upton
The reserved ranges are only useful for preventing userspace from adding a rule that intersects with functions we must handle in KVM. If userspace never writes to the SMCCC filter than this is all just wasted work/memory. Insert reserved ranges on the first call to KVM_ARM_VM_SMCCC_FILTER. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231004234947.207507-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-05KVM: arm64: Add a predicate for testing if SMCCC filter is configuredOliver Upton
Eventually we can drop the VM flag, move around the existing implementation for now. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231004234947.207507-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Document vCPU feature selection UAPIsOliver Upton
KVM/arm64 has a couple schemes for handling vCPU feature selection now, which is a lot to put on userspace. Add some documentation about how these interact and provide some recommendations for how to use the writable ID register scheme. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-11-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Allow userspace to change ID_AA64ZFR0_EL1Oliver Upton
All known fields in ID_AA64ZFR0_EL1 describe the unprivileged instructions supported by the PE's SVE implementation. Allow userspace to pick and choose the advertised feature set, though nothing stops the guest from using undisclosed instructions. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-10-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Allow userspace to change ID_AA64PFR0_EL1Jing Zhang
Allow userspace to change the guest-visible value of the register with some severe limitation: - No changes to features not virtualized by KVM (AMU, MPAM, RAS) - Short of full GICv2 emulation in kernel, hiding GICv3 from the guest makes absolutely no sense. - FP is effectively assumed for KVM VMs. Signed-off-by: Jing Zhang <jingzhangos@google.com> [oliver: restrict features that are illogical to change] Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-9-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Allow userspace to change ID_AA64MMFR{0-2}_EL1Jing Zhang
Allow userspace to modify the guest-visible values of these ID registers. Prevent changes to any of the virtualization features until KVM picks up support for nested and we have a handle on managing NV features. Signed-off-by: Jing Zhang <jingzhangos@google.com> [oliver: prevent changes to EL2 features for now] Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-8-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Allow userspace to change ID_AA64ISAR{0-2}_EL1Oliver Upton
Almost all of the features described by the ISA registers have no KVM involvement. Allow userspace to change the value of these registers with a couple exceptions: - MOPS is not writable as KVM does not currently virtualize FEAT_MOPS. - The PAuth fields are not writable as KVM requires both address and generic authentication be enabled. Co-developed-by: Jing Zhang <jingzhangos@google.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-7-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Bump up the default KVM sanitised debug version to v8p8Oliver Upton
Since ID_AA64DFR0_EL1 and ID_DFR0_EL1 are now writable from userspace, it is safe to bump up the default KVM sanitised debug version to v8p8. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Reject attempts to set invalid debug arch versionOliver Upton
The debug architecture is mandatory in ARMv8, so KVM should not allow userspace to configure a vCPU with less than that. Of course, this isn't handled elegantly by the generic ID register plumbing, as the respective ID register fields have a nonzero starting value. Add an explicit check for debug versions less than v8 of the architecture. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-5-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Advertise selected DebugVer in DBGDIDR.VersionOliver Upton
Much like we do for other fields, extract the Debug architecture version from the ID register to populate the corresponding field in DBGDIDR. Rewrite the existing sysreg field extractors to use SYS_FIELD_GET() for consistency. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Use guest ID register values for the sake of emulationJing Zhang
Since KVM now supports per-VM ID registers, use per-VM ID register values for the sake of emulation for DBGDIDR and LORegion. Signed-off-by: Jing Zhang <jingzhangos@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-4-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Document KVM_ARM_GET_REG_WRITABLE_MASKSJing Zhang
Add some basic documentation on how to get feature ID register writable masks from userspace. Signed-off-by: Jing Zhang <jingzhangos@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-3-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-10-04KVM: arm64: Allow userspace to get the writable masks for feature ID registersJing Zhang
While the Feature ID range is well defined and pretty large, it isn't inconceivable that the architecture will eventually grow some other ranges that will need to similarly be described to userspace. Add a VM ioctl to allow userspace to get writable masks for feature ID registers in below system register space: op0 = 3, op1 = {0, 1, 3}, CRn = 0, CRm = {0 - 7}, op2 = {0 - 7} This is used to support mix-and-match userspace and kernels for writable ID registers, where userspace may want to know upfront whether it can actually tweak the contents of an idreg or not. Add a new capability (KVM_CAP_ARM_SUPPORTED_FEATURE_ID_RANGES) that returns a bitmap of the valid ranges, which can subsequently be retrieved, one at a time by setting the index of the set bit as the range identifier. Suggested-by: Marc Zyngier <maz@kernel.org> Suggested-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Jing Zhang <jingzhangos@google.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20231003230408.3405722-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-09-30KVM: arm64: Clarify the ordering requirements for vcpu/RD creationMarc Zyngier
It goes without saying, but it is probably better to spell it out: If userspace tries to restore and VM, but creates vcpus and/or RDs in a different order, the vcpu/RD mapping will be different. Yes, our API is an ugly piece of crap and I can't believe that we missed this. If we want to relax the above, we'll need to define a new userspace API that allows the mapping to be specified, rather than relying on the kernel to perform the mapping on its own. Reviewed-by: Zenghui Yu <yuzenghui@huawei.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230927090911.3355209-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>