Age | Commit message (Collapse) | Author |
|
Guard them with CONFIG_KVM_X86_COMMON rather than the two vendor modules.
In practice this has no functional change, because CONFIG_KVM_X86_COMMON
is set if and only if at least one vendor-specific module is being built.
However, it is cleaner to specify CONFIG_KVM_X86_COMMON for functions that
are used in kvm.ko.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Fixes: 6d55a94222db ("x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
kvm.ko is nothing but library code shared by kvm-intel.ko and kvm-amd.ko.
It provides no functionality on its own and it is unnecessary unless one
of the vendor-specific module is compiled. In particular, /dev/kvm is
not created until one of kvm-intel.ko or kvm-amd.ko is loaded.
Use CONFIG_KVM to decide if it is built-in or a module, but use the
vendor-specific modules for the actual decision on whether to build it.
This also fixes a build failure when CONFIG_KVM_INTEL and CONFIG_KVM_AMD
are both disabled. The cpu_emergency_register_virt_callback() function
is called from kvm.ko, but it is only defined if at least one of
CONFIG_KVM_INTEL and CONFIG_KVM_AMD is provided.
Fixes: 590b09b1d88e ("KVM: x86: Register "emergency disable" callbacks when virt is enabled")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As was tried in commit 4e103134b862 ("KVM: x86/mmu: Zap only the relevant
pages when removing a memslot"), all shadow pages, i.e. non-leaf SPTEs,
need to be zapped. All of the accounting for a shadow page is tied to the
memslot, i.e. the shadow page holds a reference to the memslot, for all
intents and purposes. Deleting the memslot without removing all relevant
shadow pages, as is done when KVM_X86_QUIRK_SLOT_ZAP_ALL is disabled,
results in NULL pointer derefs when tearing down the VM.
Reintroduce from that commit the code that walks the whole memslot when
there are active shadow MMU pages.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The recent addition of support for testing with the x86 specific quirk
KVM_X86_QUIRK_SLOT_ZAP_ALL disabled in the generic memslot tests broke the
build of the KVM selftests for all other architectures:
In file included from include/kvm_util.h:8,
from include/memstress.h:13,
from memslot_modification_stress_test.c:21:
memslot_modification_stress_test.c: In function ‘main’:
memslot_modification_stress_test.c:176:38: error: ‘KVM_X86_QUIRK_SLOT_ZAP_ALL’ undeclared (first use in this function)
176 | KVM_X86_QUIRK_SLOT_ZAP_ALL);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
Add __x86_64__ guard defines to avoid building the relevant code on other
architectures.
Fixes: 61de4c34b51c ("KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled")
Fixes: 218f6415004a ("KVM: selftests: Allow slot modification stress test with quirk disabled")
Reported-by: Aishwarya TCV <aishwarya.tcv@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Message-ID: <20240930-kvm-build-breakage-v1-1-866fad3cc164@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The warning
Documentation/virt/kvm/locking.rst:31: ERROR: Unexpected indentation.
is caused by incorrectly treating a line as the continuation of a paragraph,
rather than as the first line in a bullet list.
Fixed: 44d174596260 ("KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
KVM VMX changes for 6.12:
- Set FINAL/PAGE in the page fault error code for EPT Violations if and only
if the GVA is valid. If the GVA is NOT valid, there is no guest-side page
table walk and so stuffing paging related metadata is nonsensical.
- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of
emulating posted interrupt delivery to L2.
- Add a lockdep assertion to detect unsafe accesses of vmcs12 structures.
- Harden eVMCS loading against an impossible NULL pointer deref (really truly
should be impossible).
- Minor SGX fix and a cleanup.
|
|
KVM SVM changes for 6.12:
- Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled,
i.e. when the CPU has already flushed the RSB.
- Trace the per-CPU host save area as a VMCB pointer to improve readability
and cleanup the retrieval of the SEV-ES host save area.
- Remove unnecessary accounting of temporary nested VMCB related allocations.
|
|
into HEAD
KVM VMX and x86 PAT MSR macro cleanup for 6.12:
- Add common defines for the x86 architectural memory types, i.e. the types
that are shared across PAT, MTRRs, VMCSes, and EPTPs.
- Clean up the various VMX MSR macros to make the code self-documenting
(inasmuch as possible), and to make it less painful to add new macros.
|
|
KVM x86 MMU changes for 6.12:
- Overhaul the "unprotect and retry" logic to more precisely identify cases
where retrying is actually helpful, and to harden all retry paths against
putting the guest into an infinite retry loop.
- Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in
the shadow MMU.
- Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for
adding MGLRU support in KVM.
- Misc cleanups
|
|
KVM selftests changes for 6.12:
- Fix a goof that caused some Hyper-V tests to be skipped when run on bare
metal, i.e. NOT in a VM.
- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest.
- Explicitly include one-off assets in .gitignore. Past Sean was completely
wrong about not being able to detect missing .gitignore entries.
- Verify userspace single-stepping works when KVM happens to handle a VM-Exit
in its fastpath.
- Misc cleanups
|
|
KVM x86 misc changes for 6.12
- Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10
functionality that is on the horizon).
- Rework common MSR handling code to suppress errors on userspace accesses to
unsupported-but-advertised MSRs. This will allow removing (almost?) all of
KVM's exemptions for userspace access to MSRs that shouldn't exist based on
the vCPU model (the actual cleanup is non-trivial future work).
- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the
64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv)
stores the entire 64-bit value a the ICR offset.
- Fix a bug where KVM would fail to exit to userspace if one was triggered by
a fastpath exit handler.
- Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when
there's already a pending wake event at the time of the exit.
- Finally fix the RSM vs. nested VM-Enter WARN by forcing the vCPU out of
guest mode prior to signalling SHUTDOWN (architecturally, the SHUTDOWN is
supposed to hit L1, not L2).
|
|
KVK generic changes for 6.12:
- Fix a bug that results in KVM prematurely exiting to userspace for coalesced
MMIO/PIO in many cases, clean up the related code, and add a testcase.
- Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
the gpa+len crosses a page boundary, which thankfully is guaranteed to not
happen in the current code base. Add WARNs in more helpers that read/write
guest memory to detect similar bugs.
|
|
Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
virtualization as needed.
The primary motivation for this series is to simplify dealing with enabling
virtualization for Intel's TDX, which needs to enable virtualization
when kvm-intel.ko is loaded, i.e. long before the first VM is created.
That said, this is a nice cleanup on its own. By registering the callbacks
on-demand, the callbacks themselves don't need to check kvm_usage_count,
because their very existence implies a non-zero count.
Patch 1 (re)adds a dedicated lock for kvm_usage_count. This avoids a
lock ordering issue between cpus_read_lock() and kvm_lock. The lock
ordering issue still exist in very rare cases, and will be fixed for
good by switching vm_list to an (S)RCU-protected list.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Today whenever a memslot is moved or deleted, KVM invalidates the entire
page tables and generates fresh ones based on the new memslot layout.
This behavior traditionally was kept because of a bug which was never
fully investigated and caused VM instability with assigned GeForce
GPUs. It generally does not have a huge overhead, because the old
MMU is able to reuse cached page tables and the new one is more
scalabale and can resolve EPT violations/nested page faults in parallel,
but it has worse performance if the guest frequently deletes and
adds small memslots, and it's entirely not viable for TDX. This is
because TDX requires re-accepting of private pages after page dropping.
For non-TDX VMs, this series therefore introduces the
KVM_X86_QUIRK_SLOT_ZAP_ALL quirk, enabling users to control the behavior
of memslot zapping when a memslot is moved/deleted. The quirk is turned
on by default, leading to the zapping of all SPTEs when a memslot is
moved/deleted; users however have the option to turn off the quirk,
which limits the zapping only to those SPTEs hat lie within the range
of memslot being moved/deleted.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
* New ucontrol selftest
* Inline assembly touchups
|
|
To simplify testing enable UCONTROL KVM by default in debug kernels.
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20240807154512.316936-11-schlameuss@linux.ibm.com
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20240807154512.316936-11-schlameuss@linux.ibm.com>
|
|
Add test case running code interacting with registers within a
ucontrol VM.
* Add uc_gprs test case
The test uses the same VM setup using the fixture and debug macros
introduced in earlier patches in this series.
Signed-off-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20240807154512.316936-7-schlameuss@linux.ibm.com
[frankja@linux.ibm.com: Removed leftover comment line]
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Message-ID: <20240807154512.316936-7-schlameuss@linux.ibm.com>
|
|
KVM/riscv changes for 6.12
- Fix sbiret init before forwarding to userspace
- Don't zero-out PMU snapshot area before freeing data
- Allow legacy PMU access from guest
- Fix to allow hpmcounter31 from the guest
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.12
1. Revert qspinlock to test-and-set simple lock on VM.
2. Add Loongson Binary Translation extension support.
3. Add PMU support for guest.
4. Enable paravirt feature control from VMM.
5. Implement function kvm_para_has_feature().
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.12
* New features:
- Add a Stage-2 page table dumper, reusing the main ptdump
infrastructure, and allowing easier debugging of the our
page-table infrastructure
- Add FP8 support to the KVM/arm64 floating point handling.
- Add NV support for the AT family of instructions, which mostly
results in adding a page table walker that deals with most of the
complexity of the architecture.
* Improvements, fixes and cleanups:
- Add selftest checks for a bunch of timer emulation corner cases
- Fix the multiple of cases where KVM/arm64 doesn't correctly handle
the guest trying to use a GICv3 that isn't advertised
- Remove REG_HIDDEN_USER from the sysreg infrastructure, making
things little more simple
- Prevent MTE tags being restored by userspace if we are actively
logging writes, as that's a recipe for disaster
- Correct the refcount on a page that is not considered for MTE tag
copying (such as a device)
- Relax the synchronisation when walking a page table to split block
mappings, moving it at the end the walk, as there is no need to
perform it on every store.
- Fix boundary check when transfering memory using FFA
- Fix pKVM TLB invalidation, only affecting currently out of tree
code but worth addressing for peace of mind
|
|
Implement function kvm_para_has_feature() to detect supported paravirt
features. It can be used by device driver to detect and enable paravirt
features, such as the EIOINTC irqchip driver is able to detect feature
KVM_FEATURE_VIRT_EXTIOI and do some optimization.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Export kernel paravirt features to user space, so that VMM can control
each single paravirt feature. By default paravirt features will be the
same with kvm supported features if VMM does not set it.
Also a new feature KVM_FEATURE_VIRT_EXTIOI is added which can be set
from user space. This feature indicates that the virt EIOINTC can route
interrupts to 256 vCPUs, rather than 4 vCPUs like with real HW.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
On LoongArch, the host and guest have their own PMU CSRs registers and
they share PMU hardware resources. A set of PMU CSRs consists of a CTRL
register and a CNTR register. We can set which PMU CSRs are used by the
guest by writing to the GCFG register [24:26] bits.
On KVM side:
- Save the host PMU CSRs into structure kvm_context.
- If the host supports the PMU feature.
- When entering guest mode, save the host PMU CSRs and restore the guest PMU CSRs.
- When exiting guest mode, save the guest PMU CSRs and restore the host PMU CSRs.
Reviewed-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Song Gao <gaosong@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
* kvm-arm64/visibility-cleanups:
: .
: Remove REG_HIDDEN_USER from the sysreg infrastructure, making things
: a little more simple. From the cover letter:
:
: "Since 4d4f52052ba8 ("KVM: arm64: nv: Drop EL12 register traps that are
: redirected to VNCR") and the admission that KVM would never be supporting
: the original FEAT_NV, REG_HIDDEN_USER only had a few users, all of which
: could either be replaced by a more ad-hoc mechanism, or removed altogether."
: .
KVM: arm64: Get rid of REG_HIDDEN_USER visibility qualifier
KVM: arm64: Simplify visibility handling of AArch32 SPSR_*
KVM: arm64: Simplify handling of CNTKCTL_EL12
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/s2-ptdump:
: .
: Stage-2 page table dumper, reusing the main ptdump infrastructure,
: courtesy of Sebastian Ene. From the cover letter:
:
: "This series extends the ptdump support to allow dumping the guest
: stage-2 pagetables. When CONFIG_PTDUMP_STAGE2_DEBUGFS is enabled, ptdump
: registers the new following files under debugfs:
: - /sys/debug/kvm/<guest_id>/stage2_page_tables
: - /sys/debug/kvm/<guest_id>/stage2_levels
: - /sys/debug/kvm/<guest_id>/ipa_range
:
: This allows userspace tools (eg. cat) to dump the stage-2 pagetables by
: reading the 'stage2_page_tables' file.
: [...]"
: .
KVM: arm64: Register ptdump with debugfs on guest creation
arm64: ptdump: Don't override the level when operating on the stage-2 tables
arm64: ptdump: Use the ptdump description from a local context
arm64: ptdump: Expose the attribute parsing functionality
KVM: arm64: Move pagetable definitions to common header
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/nv-at-pan:
: .
: Add NV support for the AT family of instructions, which mostly results
: in adding a page table walker that deals with most of the complexity
: of the architecture.
:
: From the cover letter:
:
: "Another task that a hypervisor supporting NV on arm64 has to deal with
: is to emulate the AT instruction, because we multiplex all the S1
: translations on a single set of registers, and the guest S2 is never
: truly resident on the CPU.
:
: So given that we lie about page tables, we also have to lie about
: translation instructions, hence the emulation. Things are made
: complicated by the fact that guest S1 page tables can be swapped out,
: and that our shadow S2 is likely to be incomplete. So while using AT
: to emulate AT is tempting (and useful), it is not going to always
: work, and we thus need a fallback in the shape of a SW S1 walker."
: .
KVM: arm64: nv: Add support for FEAT_ATS1A
KVM: arm64: nv: Plumb handling of AT S1* traps from EL2
KVM: arm64: nv: Make AT+PAN instructions aware of FEAT_PAN3
KVM: arm64: nv: Sanitise SCTLR_EL1.EPAN according to VM configuration
KVM: arm64: nv: Add SW walker for AT S1 emulation
KVM: arm64: nv: Make ps_to_output_size() generally available
KVM: arm64: nv: Add emulation of AT S12E{0,1}{R,W}
KVM: arm64: nv: Add basic emulation of AT S1E2{R,W}
KVM: arm64: nv: Add basic emulation of AT S1E1{R,W}P
KVM: arm64: nv: Add basic emulation of AT S1E{0,1}{R,W}
KVM: arm64: nv: Honor absence of FEAT_PAN2
KVM: arm64: nv: Turn upper_attr for S2 walk into the full descriptor
KVM: arm64: nv: Enforce S2 alignment when contiguous bit is set
arm64: Add ESR_ELx_FSC_ADDRSZ_L() helper
arm64: Add system register encoding for PSTATE.PAN
arm64: Add PAR_EL1 field description
arm64: Add missing APTable and TCR_ELx.HPD masks
KVM: arm64: Make kvm_at() take an OP_AT_*
Signed-off-by: Marc Zyngier <maz@kernel.org>
# Conflicts:
# arch/arm64/kvm/nested.c
|
|
* kvm-arm64/selftests-6.12:
: .
: KVM/arm64 selftest updates for 6.12
:
: - Check for a bunch of timer emulation corner cases (COlton Lewis)
: .
KVM: arm64: selftests: Add arch_timer_edge_cases selftest
KVM: arm64: selftests: Ensure pending interrupts are handled in arch_timer test
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/vgic-sre-traps:
: .
: Fix the multiple of cases where KVM/arm64 doesn't correctly
: handle the guest trying to use a GICv3 that isn't advertised.
:
: From the cover letter:
:
: "It recently appeared that, when running on a GICv3-equipped platform
: (which is what non-ancient arm64 HW has), *not* configuring a GICv3
: for the guest could result in less than desirable outcomes.
:
: We have multiple issues to fix:
:
: - for registers that *always* trap (the SGI registers) or that *may*
: trap (the SRE register), we need to check whether a GICv3 has been
: instantiated before acting upon the trap.
:
: - for registers that only conditionally trap, we must actively trap
: them even in the absence of a GICv3 being instantiated, and handle
: those traps accordingly.
:
: - finally, ID registers must reflect the absence of a GICv3, so that
: we are consistent.
:
: This series goes through all these requirements. The main complexity
: here is to apply a GICv3 configuration on the host in the absence of a
: GICv3 in the guest. This is pretty hackish, but I don't have a much
: better solution so far.
:
: As part of making wider use of of the trap bits, we fully define the
: trap routing as per the architecture, something that we eventually
: need for NV anyway."
: .
KVM: arm64: selftests: Cope with lack of GICv3 in set_id_regs
KVM: arm64: Add selftest checking how the absence of GICv3 is handled
KVM: arm64: Unify UNDEF injection helpers
KVM: arm64: Make most GICv3 accesses UNDEF if they trap
KVM: arm64: Honor guest requested traps in GICv3 emulation
KVM: arm64: Add trap routing information for ICH_HCR_EL2
KVM: arm64: Add ICH_HCR_EL2 to the vcpu state
KVM: arm64: Zero ID_AA64PFR0_EL1.GIC when no GICv3 is presented to the guest
KVM: arm64: Add helper for last ditch idreg adjustments
KVM: arm64: Force GICv3 trap activation when no irqchip is configured on VHE
KVM: arm64: Force SRE traps when SRE access is not enabled
KVM: arm64: Move GICv3 trap configuration to kvm_calculate_traps()
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/fpmr:
: .
: Add FP8 support to the KVM/arm64 floating point handling.
:
: This includes new ID registers (ID_AA64PFR2_EL1 ID_AA64FPFR0_EL1)
: being made visible to guests, as well as a new confrol register
: (FPMR) which gets context-switched.
: .
KVM: arm64: Expose ID_AA64PFR2_EL1 to userspace and guests
KVM: arm64: Enable FP8 support when available and configured
KVM: arm64: Expose ID_AA64FPFR0_EL1 as a writable ID reg
KVM: arm64: Honor trap routing for FPMR
KVM: arm64: Add save/restore support for FPMR
KVM: arm64: Move FPMR into the sysreg array
KVM: arm64: Add predicate for FPMR support in a VM
KVM: arm64: Move SVCR into the sysreg array
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
* kvm-arm64/mmu-misc-6.12:
: .
: Various minor MMU improvements and bug-fixes:
:
: - Prevent MTE tags being restored by userspace if we are actively
: logging writes, as that's a recipe for disaster
:
: - Correct the refcount on a page that is not considered for MTE
: tag copying (such as a device)
:
: - When walking a page table to split blocks, keep the DSB at the end
: the walk, as there is no need to perform it on every store.
:
: - Fix boundary check when transfering memory using FFA
: .
KVM: arm64: Add memory length checks and remove inline in do_ffa_mem_xfer
KVM: arm64: Disallow copying MTE to guest memory while KVM is dirty logging
KVM: arm64: Release pfn, i.e. put page, if copying MTE tags hits ZONE_DEVICE
KVM: arm64: Move data barrier to end of split walk
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Now that REG_HIDDEN_USER has no direct user anymore, remove it
entirely and update all users of sysreg_hidden_user() to call
sysreg_hidden() instead.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240904082419.1982402-4-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Since SPSR_* are not associated with any register in the sysreg array,
nor do they have .get_user()/.set_user() helpers, they are invisible to
userspace with that encoding.
Therefore hidden_user_visibility() serves no purpose here, and can be
safely removed.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240904082419.1982402-3-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
We go trough a great deal of effort to map CNTKCTL_EL12 to CNTKCTL_EL1
while hidding this mapping from userspace via a special visibility helper.
However, it would be far simpler to just provide an accessor doing the
mapping job, removing the need for a visibility helper.
With that done, we can also remove the EL12_REG() macro which serves
no purpose.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240904082419.1982402-2-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Every vcpu has separate LBT registers. And there are four scr registers,
one flags and ftop register for LBT extension. When VM migrates, VMM
needs to get LBT registers for every vcpu.
Here macro KVM_REG_LOONGARCH_LBT is added for new vcpu lbt register type,
the following macro is added to get/put LBT registers.
KVM_REG_LOONGARCH_LBT_SCR0
KVM_REG_LOONGARCH_LBT_SCR1
KVM_REG_LOONGARCH_LBT_SCR2
KVM_REG_LOONGARCH_LBT_SCR3
KVM_REG_LOONGARCH_LBT_EFLAGS
KVM_REG_LOONGARCH_LBT_FTOP
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Loongson Binary Translation (LBT) is used to accelerate binary translation,
which contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags)
and x87 fpu stack pointer (ftop).
Like FPU extension, here a lazy enabling method is used for LBT. the LBT
context is saved/restored on the vcpu context switch path.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Loongson SIMD Extension (LSX), Loongson Advanced SIMD Extension (LASX)
and Loongson Binary Translation (LBT) features are defined in register
CPUCFG2. Two kinds of LSX/LASX/LBT feature detection are added here, one
is VCPU feature, and the other is VM feature. VCPU feature dection can
only work with VCPU thread itself, and requires VCPU thread is created
already. So LSX/LASX/LBT feature detection for VM is added also, it can
be done even if VM is not created, and also can be done by any threads
besides VCPU threads.
Here ioctl command KVM_HAS_DEVICE_ATTR is added for VM, and macro
KVM_LOONGARCH_VM_FEAT_CTRL is added to check supported feature. And
five sub-features relative with LSX/LASX/LBT are added as following:
KVM_LOONGARCH_VM_FEAT_LSX
KVM_LOONGARCH_VM_FEAT_LASX
KVM_LOONGARCH_VM_FEAT_X86BT
KVM_LOONGARCH_VM_FEAT_ARMBT
KVM_LOONGARCH_VM_FEAT_MIPSBT
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Similar with x86, when VM is detected, revert to a simple test-and-set
lock to avoid the horrors of queue preemption.
Tested on 3C5000 Dual-way machine with 32 cores and 2 numa nodes,
test case is kcbench on kernel mainline 6.10, the detailed command is
"kcbench --src /root/src/linux"
Performance on host machine
kernel compile time performance impact
Original 150.29 seconds
With patch 150.19 seconds almost no impact
Performance on virtual machine:
1. 1 VM with 32 vCPUs and 2 numa node, numa node pinned
kernel compile time performance impact
Original 170.87 seconds
With patch 171.73 seconds almost no impact
2. 2 VMs, each VM with 32 vCPUs and 2 numa node, numa node pinned
kernel compile time performance impact
Original 2362.04 seconds
With patch 354.73 seconds +565%
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
While arch/*/mem/ptdump handles the kernel pagetable dumping code,
introduce KVM/ptdump to show the guest stage-2 pagetables. The
separation is necessary because most of the definitions from the
stage-2 pagetable reside in the KVM path and we will be invoking
functionality specific to KVM. Introduce the PTDUMP_STAGE2_DEBUGFS config.
When a guest is created, register a new file entry under the guest
debugfs dir which allows userspace to show the contents of the guest
stage-2 pagetables when accessed.
[maz: moved function prototypes from kvm_host.h to kvm_mmu.h]
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/r/20240909124721.1672199-6-sebastianene@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Ptdump uses the init_mm structure directly to dump the kernel
pagetables. When ptdump is called on the stage-2 pagetables, this mm
argument is not used. Prevent the level from being overwritten by
checking the argument against NULL.
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240909124721.1672199-5-sebastianene@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Rename the attributes description array to allow the parsing method
to use the description from a local context. To be able to do this,
store a pointer to the description array in the state structure. This
will allow for the later introduced callers (stage_2 ptdump) to specify
their own page table description format to the ptdump parser.
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240909124721.1672199-4-sebastianene@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Reuse the descriptor parsing functionality to keep the same output format
as the original ptdump code. In order for this to happen, move the state
tracking objects into a common header.
[maz: Fixed note_page() stub as suggested by Will]
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20240909124721.1672199-3-sebastianene@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
When we share memory through FF-A and the description of the buffers
exceeds the size of the mapped buffer, the fragmentation API is used.
The fragmentation API allows specifying chunks of descriptors in subsequent
FF-A fragment calls and no upper limit has been established for this.
The entire memory region transferred is identified by a handle which can be
used to reclaim the transferred memory.
To be able to reclaim the memory, the description of the buffers has to fit
in the ffa_desc_buf.
Add a bounds check on the FF-A sharing path to prevent the memory reclaim
from failing.
Also do_ffa_mem_xfer() does not need __always_inline, except for the
BUILD_BUG_ON() aspect, which gets moved to a macro.
[maz: fixed the BUILD_BUG_ON() breakage with LLVM, thanks to Wei-Lin Chang
for the timely report]
Fixes: 634d90cf0ac65 ("KVM: arm64: Handle FFA_MEM_LEND calls from the host")
Cc: stable@vger.kernel.org
Reviewed-by: Sebastian Ene <sebastianene@google.com>
Signed-off-by: Snehal Koukuntla <snehalreddy@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Link: https://lore.kernel.org/r/20240909180154.3267939-1-snehalreddy@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Remove superfluous RSB filling after a VMEXIT when the CPU already has
flushed the RSB after a VMEXIT when AutoIBRS is enabled.
The initial implementation for adding RETPOLINES added an ALTERNATIVES
implementation for filling the RSB after a VMEXIT in commit 117cc7a908c8
("x86/retpoline: Fill return stack buffer on vmexit").
Later, X86_FEATURE_RSB_VMEXIT was added in commit 9756bba28470
("x86/speculation: Fill RSB on vmexit for IBRS") to handle stuffing the
RSB if RETPOLINE=y *or* KERNEL_IBRS=y, i.e. to also stuff the RSB if the
kernel is configured to do IBRS mitigations on entry/exit.
The AutoIBRS (on AMD) feature implementation added in commit e7862eda309e
("x86/cpu: Support AMD Automatic IBRS") used the already-implemented logic
for EIBRS in spectre_v2_determine_rsb_fill_type_on_vmexit() -- but did not
update the code at VMEXIT to act on the mode selected in that function --
resulting in VMEXITs continuing to clear the RSB when RETPOLINES are
enabled, despite the presence of AutoIBRS.
Signed-off-by: Amit Shah <amit.shah@amd.com>
Link: https://lore.kernel.org/r/20240807123531.69677-1-amit@kernel.org
[sean: massage changeloge, drop comment about AMD not needing RSB_VMEXIT_LITE]
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
In preparation for using the stage-2 definitions in ptdump, move some of
these macros in the common header.
Signed-off-by: Sebastian Ene <sebastianene@google.com>
Link: https://lore.kernel.org/r/20240909124721.1672199-2-sebastianene@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Set PFERR_GUEST_{FINAL,PAGE}_MASK based on EPT_VIOLATION_GVA_TRANSLATED if
and only if EPT_VIOLATION_GVA_IS_VALID is also set in exit qualification.
Per the SDM, bit 8 (EPT_VIOLATION_GVA_TRANSLATED) is valid if and only if
bit 7 (EPT_VIOLATION_GVA_IS_VALID) is set, and is '0' if bit 7 is '0'.
Bit 7 (a.k.a. EPT_VIOLATION_GVA_IS_VALID)
Set if the guest linear-address field is valid. The guest linear-address
field is valid for all EPT violations except those resulting from an
attempt to load the guest PDPTEs as part of the execution of the MOV CR
instruction and those due to trace-address pre-translation
Bit 8 (a.k.a. EPT_VIOLATION_GVA_TRANSLATED)
If bit 7 is 1:
• Set if the access causing the EPT violation is to a guest-physical
address that is the translation of a linear address.
• Clear if the access causing the EPT violation is to a paging-structure
entry as part of a page walk or the update of an accessed or dirty bit.
Reserved if bit 7 is 0 (cleared to 0).
Failure to guard the logic on GVA_IS_VALID results in KVM marking the page
fault as PFERR_GUEST_PAGE_MASK when there is no known GVA, which can put
the vCPU into an infinite loop due to kvm_mmu_page_fault() getting false
positive on its PFERR_NESTED_GUEST_PAGE logic (though only because that
logic is also buggy/flawed).
In practice, this is largely a non-issue because so GVA_IS_VALID is almost
always set. However, when TDX comes along, GVA_IS_VALID will *never* be
set, as the TDX Module deliberately clears bits 12:7 in exit qualification,
e.g. so that the faulting virtual address and other metadata that aren't
practically useful for the hypervisor aren't leaked to the untrusted host.
When exit is due to EPT violation, bits 12-7 of the exit qualification
are cleared to 0.
Fixes: eebed2438923 ("kvm: nVMX: Add support for fast unprotection of nested guest page tables")
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Link: https://lore.kernel.org/r/20240831001538.336683-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Use KVM_PAGES_PER_HPAGE() instead of open coding equivalent logic that is
anything but obvious.
No functional change intended, and verified by compiling with the below
assertions:
BUILD_BUG_ON((1UL << KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K)) !=
KVM_PAGES_PER_HPAGE(PG_LEVEL_4K));
BUILD_BUG_ON((1UL << KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M)) !=
KVM_PAGES_PER_HPAGE(PG_LEVEL_2M));
BUILD_BUG_ON((1UL << KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G)) !=
KVM_PAGES_PER_HPAGE(PG_LEVEL_1G));
Link: https://lore.kernel.org/r/20240809194335.1726916-19-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Replace all of the open coded '1' literals used to mark a PTE list as
having many/multiple entries with a proper define. It's hard enough to
read the code with one magic bit, and a future patch to support "locking"
a single rmap will add another.
No functional change intended.
Link: https://lore.kernel.org/r/20240809194335.1726916-17-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Fold mmu_spte_age() into its sole caller now that aging and testing for
young SPTEs is handled in a common location, i.e. doesn't require more
helpers.
Opportunistically remove the use of mmu_spte_get_lockless(), as mmu_lock
is held (for write!), and marking SPTEs for access tracking outside of
mmu_lock is unsafe (at least, as written). I.e. using the lockless
accessor is quite misleading.
No functional change intended.
Link: https://lore.kernel.org/r/20240809194335.1726916-16-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Rework kvm_handle_gfn_range() into an aging-specic helper,
kvm_rmap_age_gfn_range(). In addition to purging a bunch of unnecessary
boilerplate code, this sets the stage for aging rmap SPTEs outside of
mmu_lock.
Note, there's a small functional change, as kvm_test_age_gfn() will now
return immediately if a young SPTE is found, whereas previously KVM would
continue iterating over other levels.
Link: https://lore.kernel.org/r/20240809194335.1726916-15-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Convert kvm_unmap_gfn_range(), which is the helper that zaps rmap SPTEs in
response to an mmu_notifier invalidation, to use __kvm_rmap_zap_gfn_range()
and feed in range->may_block. In other words, honor NEED_RESCHED by way of
cond_resched() when zapping rmaps. This fixes a long-standing issue where
KVM could process an absurd number of rmap entries without ever yielding,
e.g. if an mmu_notifier fired on a PUD (or larger) range.
Opportunistically rename __kvm_zap_rmap() to kvm_zap_rmap(), and drop the
old kvm_zap_rmap(). Ideally, the shuffling would be done in a different
patch, but that just makes the compiler unhappy, e.g.
arch/x86/kvm/mmu/mmu.c:1462:13: error: ‘kvm_zap_rmap’ defined but not used
Reported-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240809194335.1726916-14-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|