summaryrefslogtreecommitdiff
path: root/Documentation/virtual/kvm
AgeCommit message (Collapse)Author
2018-09-20KVM: x86: Control guest reads of MSR_PLATFORM_INFODrew Schmitt
Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest access to reads of MSR_PLATFORM_INFO. Disabling access to reads of this MSR gives userspace the control to "expose" this platform-dependent information to guests in a clear way. As it exists today, guests that read this MSR would get unpopulated information if userspace hadn't already set it (and prior to this patch series, only the CPUID faulting information could have been populated). This existing interface could be confusing if guests don't handle the potential for incorrect/incomplete information gracefully (e.g. zero reported for base frequency). Signed-off-by: Drew Schmitt <dasch@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-12KVM: s390: Make huge pages unavailable in ucontrol VMsJanosch Frank
We currently do not notify all gmaps when using gmap_pmdp_xchg(), due to locking constraints. This makes ucontrol VMs, which is the only VM type that creates multiple gmaps, incompatible with huge pages. Also we would need to hold the guest_table_lock of all gmaps that have this vmaddr maped to synchronize access to the pmd. ucontrol VMs are rather exotic and creating a new locking concept is no easy task. Hence we return EINVAL when trying to active KVM_CAP_S390_HPAGE_1M and report it as being not available when checking for it. Fixes: a4499382 ("KVM: s390: Add huge page enablement control") Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-Id: <20180801112508.138159-1-frankja@linux.ibm.com> Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2018-08-22KVM: Documentation: rename the capability of KVM_CAP_ARM_SET_SERROR_ESRDongjiu Geng
In the documentation description, this capability's name is KVM_CAP_ARM_SET_SERROR_ESR, but in the header file this capability's name is KVM_CAP_ARM_INJECT_SERROR_ESR, so change the documentation description to make it same. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Reported-by: Andrew Jones <drjones@redhat.com> Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-22Merge tag 'kvmarm-for-v4.19' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm updates for 4.19 - Support for Group0 interrupts in guests - Cache management optimizations for ARMv8.4 systems - Userspace interface for RAS, allowing error retrival and injection - Fault path optimization - Emulated physical timer fixes - Random cleanups
2018-08-06KVM: X86: Implement "send IPI" hypercallWanpeng Li
Using hypercall to send IPIs by one vmexit instead of one by one for xAPIC/x2APIC physical mode and one vmexit per-cluster for x2APIC cluster mode. Intel guest can enter x2apic cluster mode when interrupt remmaping is enabled in qemu, however, latest AMD EPYC still just supports xapic mode which can get great improvement by Exit-less IPIs. This patchset lets a guest send multicast IPIs, with at most 128 destinations per hypercall in 64-bit mode and 64 vCPUs per hypercall in 32-bit mode. Hardware: Xeon Skylake 2.5GHz, 2 sockets, 40 cores, 80 threads, the VM is 80 vCPUs, IPI microbenchmark(https://lkml.org/lkml/2017/12/19/141): x2apic cluster mode, vanilla Dry-run: 0, 2392199 ns Self-IPI: 6907514, 15027589 ns Normal IPI: 223910476, 251301666 ns Broadcast IPI: 0, 9282161150 ns Broadcast lock: 0, 8812934104 ns x2apic cluster mode, pv-ipi Dry-run: 0, 2449341 ns Self-IPI: 6720360, 15028732 ns Normal IPI: 228643307, 255708477 ns Broadcast IPI: 0, 7572293590 ns => 22% performance boost Broadcast lock: 0, 8316124651 ns x2apic physical mode, vanilla Dry-run: 0, 3135933 ns Self-IPI: 8572670, 17901757 ns Normal IPI: 226444334, 255421709 ns Broadcast IPI: 0, 19845070887 ns Broadcast lock: 0, 19827383656 ns x2apic physical mode, pv-ipi Dry-run: 0, 2446381 ns Self-IPI: 6788217, 15021056 ns Normal IPI: 219454441, 249583458 ns Broadcast IPI: 0, 7806540019 ns => 154% performance boost Broadcast lock: 0, 9143618799 ns Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-08-06kvm: nVMX: Introduce KVM_CAP_NESTED_STATEJim Mattson
For nested virtualization L0 KVM is managing a bit of state for L2 guests, this state can not be captured through the currently available IOCTLs. In fact the state captured through all of these IOCTLs is usually a mix of L1 and L2 state. It is also dependent on whether the L2 guest was running at the moment when the process was interrupted to save its state. With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM that is in VMX operation. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Jim Mattson <jmattson@google.com> [karahmed@ - rename structs and functions and make them ready for AMD and address previous comments. - handle nested.smm state. - rebase & a bit of refactoring. - Merge 7/8 and 8/8 into one patch. ] Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-07-30KVM: s390: Add huge page enablement controlJanosch Frank
General KVM huge page support on s390 has to be enabled via the kvm.hpage module parameter. Either nested or hpage can be enabled, as we currently do not support vSIE for huge backed guests. Once the vSIE support is added we will either drop the parameter or enable it as default. For a guest the feature has to be enabled through the new KVM_CAP_S390_HPAGE_1M capability and the hpage module parameter. Enabling it means that cmm can't be enabled for the vm and disables pfmf and storage key interpretation. This is due to the fact that in some cases, in upcoming patches, we have to split huge pages in the guest mapping to be able to set more granular memory protection on 4k pages. These split pages have fake page tables that are not visible to the Linux memory management which subsequently will not manage its PGSTEs, while the SIE will. Disabling these features lets us manage PGSTE data in a consistent matter and solve that problem. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com>
2018-07-21KVM: arm: Add 32bit get/set events supportJames Morse
arm64's new use of KVMs get_events/set_events API calls isn't just or RAS, it allows an SError that has been made pending by KVM as part of its device emulation to be migrated. Wire this up for 32bit too. We only need to read/write the HCR_VA bit, and check that no esr has been provided, as we don't yet support VDFSR. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-21arm64: KVM: export the capability to set guest SError syndromeDongjiu Geng
For the arm64 RAS Extension, user space can inject a virtual-SError with specified ESR. So user space needs to know whether KVM support to inject such SError, this interface adds this query for this capability. KVM will check whether system support RAS Extension, if supported, KVM returns true to user space, otherwise returns false. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> [expanded documentation wording] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-21arm/arm64: KVM: Add KVM_GET/SET_VCPU_EVENTSDongjiu Geng
For the migrating VMs, user space may need to know the exception state. For example, in the machine A, KVM make an SError pending, when migrate to B, KVM also needs to pend an SError. This new IOCTL exports user-invisible states related to SError. Together with appropriate user space changes, user space can get/set the SError exception state to do migrate/snapshot/suspend. Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com> Reviewed-by: James Morse <james.morse@arm.com> [expanded documentation wording] Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-07-21KVM: arm/arm64: vgic: Update documentation of the GIC devices wrt IIDRChristoffer Dall
Update the documentation to reflect the ordering requirements of restoring the GICD_IIDR register before any other registers and the effects this has on restoring the interrupt groups for an emulated GICv2 instance. Also remove some outdated limitations in the documentation while we're at it. Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-06-22KVM: fix KVM_CAP_HYPERV_TLBFLUSH paragraph numberVitaly Kuznetsov
KVM_CAP_HYPERV_TLBFLUSH collided with KVM_CAP_S390_PSW-BPB, its paragraph number should now be 8.18. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-06-01KVM: docs: nVMX: Remove known limitations as they do not exist nowLiran Alon
We can document other "Known Limitations" but the ones currently referenced don't hold anymore... Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-01KVM: docs: mmu: KVM support exposing SLAT to guestsLiran Alon
Fix outdated statement that KVM is not able to expose SLAT (Second-Layer-Address-Translation) to guests. This was implemented a long time ago... Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-01Merge tag 'kvmarm-for-v4.18' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/ARM updates for 4.18 - Lazy context-switching of FPSIMD registers on arm64 - Allow virtual redistributors to be part of two or more MMIO ranges
2018-05-26KVM: docs: mmu: Fix link to NPT presentation from KVM Forum 2008Liran Alon
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-26kvm: x86: Amend the KVM_GET_SUPPORTED_CPUID API documentationJim Mattson
Document the subtle nuances that KVM_CAP_X86_DISABLE_EXITS induces in the KVM_GET_SUPPORTED_CPUID API. Fixes: 4d5422cea3b6 ("KVM: X86: Provide a capability to disable MWAIT intercepts") Signed-off-by: Jim Mattson <jmattson@google.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-26KVM: x86: hyperv: declare KVM_CAP_HYPERV_TLBFLUSH capabilityVitaly Kuznetsov
We need a new capability to indicate support for the newly added HvFlushVirtualAddress{List,Space}{,Ex} hypercalls. Upon seeing this capability, userspace is supposed to announce PV TLB flush features by setting the appropriate CPUID bits (if needed). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-05-25KVM: arm/arm64: Document KVM_VGIC_V3_ADDR_TYPE_REDIST_REGIONEric Auger
We introduce a new KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION attribute in KVM_DEV_ARM_VGIC_GRP_ADDR group. It allows userspace to provide the base address and size of a redistributor region Compared to KVM_VGIC_V3_ADDR_TYPE_REDIST, this new attribute allows to declare several separate redistributor regions. So the whole redist space does not need to be contiguous anymore. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-05-17kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIMEMichael S. Tsirkin
KVM_HINTS_DEDICATED seems to be somewhat confusing: Guest doesn't really care whether it's the only task running on a host CPU as long as it's not preempted. And there are more reasons for Guest to be preempted than host CPU sharing, for example, with memory overcommit it can get preempted on a memory access, post copy migration can cause preemption, etc. Let's call it KVM_HINTS_REALTIME which seems to better match what guests expect. Also, the flag most be set on all vCPUs - current guests assume this. Note so in the documentation. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-20arm/arm64: KVM: Add PSCI version selection APIMarc Zyngier
Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1 or 1.0 to a guest, defaulting to the latest version of the PSCI implementation that is compatible with the requested version. This is no different from doing a firmware upgrade on KVM. But in order to give a chance to hypothetical badly implemented guests that would have a fit by discovering something other than PSCI 0.2, let's provide a new API that allows userspace to pick one particular version of the API. This is implemented as a new class of "firmware" registers, where we expose the PSCI version. This allows the PSCI version to be save/restored as part of a guest migration, and also set to any supported version if the guest requires it. Cc: stable@vger.kernel.org #4.16 Reviewed-by: Christoffer Dall <cdall@kernel.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-03-28KVM: trivial documentation cleanupsAndrew Jones
Add missing entries to the index and ensure the entries are in alphabetical order. Also amd-memory-encryption.rst is an .rst not a .txt. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-16KVM: X86: Provide a capability to disable HLT interceptsWanpeng Li
If host CPUs are dedicated to a VM, we can avoid VM exits on HLT. This patch adds the per-VM capability to disable them. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-16KVM: X86: Provide a capability to disable MWAIT interceptsWanpeng Li
Allowing a guest to execute MWAIT without interception enables a guest to put a (physical) CPU into a power saving state, where it takes longer to return from than what may be desired by the host. Don't give a guest that power over a host by default. (Especially, since nothing prevents a guest from using MWAIT even when it is not advertised via CPUID.) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Jan H. Schönherr <jschoenh@amazon.de> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-03-16Merge tag 'kvm-s390-next-4.17-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: fixes and features - more kvm stat counters - virtio gpu plumbing. The 3 non-KVM/s390 patches have Acks from Bartlomiej Zolnierkiewicz, Heiko Carstens and Greg Kroah-Hartman but all belong together to make virtio-gpu work as a tty. So I carried them in the KVM/s390 tree. - document some KVM_CAPs - cpu-model only facilities - cleanups
2018-03-14KVM: document KVM_CAP_S390_[BPB|PSW|GMAP|COW]Christian Borntraeger
commit 35b3fde6203b ("KVM: s390: wire up bpb feature") has no documentation for KVM_CAP_S390_BPB. While adding this let's also add other missing capabilities like KVM_CAP_S390_PSW, KVM_CAP_S390_GMAP and KVM_CAP_S390_COW. Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2018-03-06KVM: Introduce paravirtualization hints and KVM_HINTS_DEDICATEDWanpeng Li
This patch introduces kvm_para_has_hint() to query for hints about the configuration of the guests. The first hint KVM_HINTS_DEDICATED, is set if the guest has dedicated physical CPUs for each vCPU (i.e. pinning and no over-commitment). This allows optimizing spinlocks and tells the guest to avoid PV TLB flush. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06KVM: x86: KVM_CAP_SYNC_REGSKen Hofsass
This commit implements an enhanced x86 version of S390 KVM_CAP_SYNC_REGS functionality. KVM_CAP_SYNC_REGS "allow[s] userspace to access certain guest registers without having to call SET/GET_*REGS”. This reduces ioctl overhead which is particularly important when userspace is making synchronous guest state modifications (e.g. when emulating and/or intercepting instructions). Originally implemented upstream for the S390, the x86 differences follow: - userspace can select the register sets to be synchronized with kvm_run using bit-flags in the kvm_valid_registers and kvm_dirty_registers fields. - vcpu_events is available in addition to the regs and sregs register sets. Signed-off-by: Ken Hofsass <hofsass@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> [Removed wrapper around check for reserved kvm_valid_regs. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06KVM: x86: add SYNC_REGS_SIZE_BYTES #define.Ken Hofsass
Replace hardcoded padding size value for struct kvm_sync_regs with #define SYNC_REGS_SIZE_BYTES. Also update the value specified in api.txt from outdated hardcoded value to SYNC_REGS_SIZE_BYTES. Signed-off-by: Ken Hofsass <hofsass@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-06kvm: x86: hyperv: guest->host event signaling via eventfdRoman Kagan
In Hyper-V, the fast guest->host notification mechanism is the SIGNAL_EVENT hypercall, with a single parameter of the connection ID to signal. Currently this hypercall incurs a user exit and requires the userspace to decode the parameters and trigger the notification of the potentially different I/O context. To avoid the costly user exit, process this hypercall and signal the corresponding eventfd in KVM, similar to ioeventfd. The association between the connection id and the eventfd is established via the newly introduced KVM_HYPERV_EVENTFD ioctl, and maintained in an (srcu-protected) IDR. Signed-off-by: Roman Kagan <rkagan@virtuozzo.com> Reviewed-by: David Hildenbrand <david@redhat.com> [asm/hyperv.h changes approved by KY Srinivasan. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-03-01KVM: x86: Add a framework for supporting MSR-based featuresTom Lendacky
Provide a new KVM capability that allows bits within MSRs to be recognized as features. Two new ioctls are added to the /dev/kvm ioctl routine to retrieve the list of these MSRs and then retrieve their values. A kvm_x86_ops callback is used to determine support for the listed MSR-based features. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [Tweaked documentation. - Radim] Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-02-24KVM: x86: fix backward migration with async_PFRadim Krčmář
Guests on new hypersiors might set KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT bit when enabling async_PF, but this bit is reserved on old hypervisors, which results in a failure upon migration. To avoid breaking different cases, we are checking for CPUID feature bit before enabling the feature and nothing else. Fixes: 52a5c155cf79 ("KVM: async_pf: Let guest support delivery of async_pf from guest mode") Cc: <stable@vger.kernel.org> Reviewed-by: Wanpeng Li <wanpengli@tencent.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-01Merge tag 'kvm-ppc-next-4.16-1' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc PPC KVM update for 4.16 - Allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization that earlier CPU versions required. - A series from Ben Herrenschmidt to improve the handling of escalation interrupts with the XIVE interrupt controller. - Provide for the decrementer register to be copied across on migration. - Various minor cleanups and bugfixes.
2018-02-01Merge branch 'x86/hyperv' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Topic branch for stable KVM clockource under Hyper-V. Thanks to Christoffer Dall for resolving the ARM conflict.
2018-01-31Merge tag 'kvm-arm-for-v4.16' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm KVM/ARM Changes for v4.16 The changes for this version include icache invalidation optimizations (improving VM startup time), support for forwarded level-triggered interrupts (improved performance for timers and passthrough platform devices), a small fix for power-management notifiers, and some cosmetic changes.
2018-01-19KVM: PPC: Book3S: Provide information about hardware/firmware CVE workaroundsPaul Mackerras
This adds a new ioctl, KVM_PPC_GET_CPU_CHAR, that gives userspace information about the underlying machine's level of vulnerability to the recently announced vulnerabilities CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754, and whether the machine provides instructions to assist software to work around the vulnerabilities. The ioctl returns two u64 words describing characteristics of the CPU and required software behaviour respectively, plus two mask words which indicate which bits have been filled in by the kernel, for extensibility. The bit definitions are the same as for the new H_GET_CPU_CHARACTERISTICS hypercall. There is also a new capability, KVM_CAP_PPC_GET_CPU_CHAR, which indicates whether the new ioctl is available. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-19Merge remote-tracking branch 'remotes/powerpc/topic/ppc-kvm' into kvm-ppc-nextPaul Mackerras
This merges in the ppc-kvm topic branch of the powerpc tree to get two patches which are prerequisites for the following patch series, plus another patch which touches both powerpc and KVM code. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-16Merge branch 'sev-v9-p2' of https://github.com/codomania/kvmPaolo Bonzini
This part of Secure Encrypted Virtualization (SEV) patch series focuses on KVM changes required to create and manage SEV guests. SEV is an extension to the AMD-V architecture which supports running encrypted virtual machine (VMs) under the control of a hypervisor. Encrypted VMs have their pages (code and data) secured such that only the guest itself has access to unencrypted version. Each encrypted VM is associated with a unique encryption key; if its data is accessed to a different entity using a different key the encrypted guest's data will be incorrectly decrypted, leading to unintelligible data. This security model ensures that hypervisor will no longer able to inspect or alter any guest code or data. The key management of this feature is handled by a separate processor known as the AMD Secure Processor (AMD-SP) which is present on AMD SOCs. The SEV Key Management Specification (see below) provides a set of commands which can be used by hypervisor to load virtual machine keys through the AMD-SP driver. The patch series adds a new ioctl in KVM driver (KVM_MEMORY_ENCRYPT_OP). The ioctl will be used by qemu to issue SEV guest-specific commands defined in Key Management Specification. The following links provide additional details: AMD Memory Encryption white paper: http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf AMD64 Architecture Programmer's Manual: http://support.amd.com/TechDocs/24593.pdf SME is section 7.10 SEV is section 15.34 SEV Key Management: http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf KVM Forum Presentation: http://www.linux-kvm.org/images/7/74/02x08A-Thomas_Lendacky-AMDs_Virtualizatoin_Memory_Encryption_Technology.pdf SEV Guest BIOS support: SEV support has been add to EDKII/OVMF BIOS https://github.com/tianocore/edk2 Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-01-16KVM: X86: use paravirtualized TLB ShootdownWanpeng Li
Remote TLB flush does a busy wait which is fine in bare-metal scenario. But with-in the guest, the vcpus might have been pre-empted or blocked. In this scenario, the initator vcpu would end up busy-waiting for a long amount of time; it also consumes CPU unnecessarily to wake up the target of the shootdown. This patch set adds support for KVM's new paravirtualized TLB flush; remote TLB flush does not wait for vcpus that are sleeping, instead KVM will flush the TLB as soon as the vCPU starts running again. The improvement is clearly visible when the host is overcommitted; in this case, the PV TLB flush (in addition to avoiding the wait on the main CPU) prevents preempted vCPUs from stealing precious execution time from the running ones. Testing on a Xeon Gold 6142 2.6GHz 2 sockets, 32 cores, 64 threads, so 64 pCPUs, and each VM is 64 vCPUs. ebizzy -M vanilla optimized boost 1VM 46799 48670 4% 2VM 23962 42691 78% 3VM 16152 37539 132% Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2018-01-16KVM: PPC: Book3S HV: Enable migration of decrementer registerPaul Mackerras
This adds a register identifier for use with the one_reg interface to allow the decrementer expiry time to be read and written by userspace. The decrementer expiry time is in guest timebase units and is equal to the sum of the decrementer and the guest timebase. (The expiry time is used rather than the decrementer value itself because the expiry time is not constantly changing, though the decrementer value is, while the guest vcpu is not running.) Without this, a guest vcpu migrated to a new host will see its decrementer set to some random value. On POWER8 and earlier, the decrementer is 32 bits wide and counts down at 512MHz, so the guest vcpu will potentially see no decrementer interrupts for up to about 4 seconds, which will lead to a stall. With POWER9, the decrementer is now 56 bits side, so the stall can be much longer (up to 2.23 years) and more noticeable. To help work around the problem in cases where userspace has not been updated to migrate the decrementer expiry time, we now set the default decrementer expiry at vcpu creation time to the current time rather than the maximum possible value. This should mean an immediate decrementer interrupt when a migrated vcpu starts running. In cases where the decrementer is 32 bits wide and more than 4 seconds elapse between the creation of the vcpu and when it first runs, the decrementer would have wrapped around to positive values and there may still be a stall - but this is no worse than the current situation. In the large-decrementer case, we are sure to get an immediate decrementer interrupt (assuming the time from vcpu creation to first run is less than 2.23 years) and we thus avoid a very long stall. Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-01-02KVM: arm/arm64: Delete outdated forwarded irq documentationChristoffer Dall
The reason I added this documentation originally was that the concept of "never taking the interrupt", but just use the timer to generate an exit from the guest, was confusing to most, and we had to explain it several times over. But as we can clearly see, we've failed to update the documentation as the code has evolved, and people who need to understand these details are probably better off reading the code. Let's lighten our maintenance burden slightly and just get rid of this. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-12-06KVM: s390: mark irq_state.flags as non-usableChristian Borntraeger
Old kernels did not check for zero in the irq_state.flags field and old QEMUs did not zero the flag/reserved fields when calling KVM_S390_*_IRQ_STATE. Let's add comments to prevent future uses of these fields. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2017-12-04KVM: Define SEV key management command idBrijesh Singh
Define Secure Encrypted Virtualization (SEV) key management command id and structure. The command definition is available in SEV KM spec 0.14 (http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf) and Documentation/virtual/kvm/amd-memory-encryption.txt. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Improvements-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04KVM: Introduce KVM_MEMORY_ENCRYPT_{UN,}REG_REGION ioctlBrijesh Singh
If hardware supports memory encryption then KVM_MEMORY_ENCRYPT_REG_REGION and KVM_MEMORY_ENCRYPT_UNREG_REGION ioctl's can be used by userspace to register/unregister the guest memory regions which may contain the encrypted data (e.g guest RAM, PCI BAR, SMRAM etc). Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Improvements-by: Borislav Petkov <bp@suse.de> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04KVM: Introduce KVM_MEMORY_ENCRYPT_OP ioctlBrijesh Singh
If the hardware supports memory encryption then the KVM_MEMORY_ENCRYPT_OP ioctl can be used by qemu to issue a platform specific memory encryption commands. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: x86@kernel.org Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-12-04Documentation/virtual/kvm: Add AMD Secure Encrypted Virtualization (SEV)Brijesh Singh
Create a Documentation entry to describe the AMD Secure Encrypted Virtualization (SEV) feature. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Borislav Petkov <bp@suse.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: kvm@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Brijesh Singh <brijesh.singh@amd.com> Reviewed-by: Borislav Petkov <bp@suse.de>
2017-11-17Merge tag 'kvm-arm-gicv4-for-v4.15' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD GICv4 Support for KVM/ARM for v4.15
2017-11-16Merge tag 'kvm-s390-next-4.15-1' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux KVM: s390: fixes and improvements for 4.15 - Some initial preparation patches for exitless interrupts and crypto - New capability for AIS migration - Fixes - merge of the sthyi tree from the base s390 team, which moves the sthyi out of KVM into a shared function also for non-KVM
2017-11-10KVM: arm/arm64: GICv4: Prevent a VM using GICv4 from being savedMarc Zyngier
The GICv4 architecture doesn't make it easy for save/restore to work, as it doesn't give any guarantee that the pending state is written into the pending table. So let's not take any chance, and let's return an error if we encounter any LPI that has the HW bit set. In order for userspace to distinguish this error from other failure modes, use -EACCES as an error code. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
2017-11-09KVM: s390: provide a capability for AIS state migrationChristian Borntraeger
The AIS capability was introduced in 4.12, while the interface to migrate the state was added in 4.13. Unfortunately it is not possible for userspace to detect the migration capability without creating a flic kvm device. As in QEMU the cpu model detection runs on the "none" machine this will result in cpu model issues regarding the "ais" capability. To get the "ais" capability properly let's add a new KVM capability that tells userspace that AIS states can be migrated. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Halil Pasic <pasic@linux.vnet.ibm.com>