diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-07-20 12:41:03 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-07-20 12:41:03 -0700 |
commit | 2c9b3512402ed192d1f43f4531fb5da947e72bd0 (patch) | |
tree | d63534a1e9cf5b12a1362a348e2237c9c592a493 /include | |
parent | c43a20e4a520b37c2ef6d4f422de989992c9129f (diff) | |
parent | 332d2c1d713e232e163386c35a3ba0c1b90df83f (diff) |
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Initial infrastructure for shadow stage-2 MMUs, as part of nested
virtualization enablement
- Support for userspace changes to the guest CTR_EL0 value, enabling
(in part) migration of VMs between heterogenous hardware
- Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1
of the protocol
- FPSIMD/SVE support for nested, including merged trap configuration
and exception routing
- New command-line parameter to control the WFx trap behavior under
KVM
- Introduce kCFI hardening in the EL2 hypervisor
- Fixes + cleanups for handling presence/absence of FEAT_TCRX
- Miscellaneous fixes + documentation updates
LoongArch:
- Add paravirt steal time support
- Add support for KVM_DIRTY_LOG_INITIALLY_SET
- Add perf kvm-stat support for loongarch
RISC-V:
- Redirect AMO load/store access fault traps to guest
- perf kvm stat support
- Use guest files for IMSIC virtualization, when available
s390:
- Assortment of tiny fixes which are not time critical
x86:
- Fixes for Xen emulation
- Add a global struct to consolidate tracking of host values, e.g.
EFER
- Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the
effective APIC bus frequency, because TDX
- Print the name of the APICv/AVIC inhibits in the relevant
tracepoint
- Clean up KVM's handling of vendor specific emulation to
consistently act on "compatible with Intel/AMD", versus checking
for a specific vendor
- Drop MTRR virtualization, and instead always honor guest PAT on
CPUs that support self-snoop
- Update to the newfangled Intel CPU FMS infrastructure
- Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as
it reads '0' and writes from userspace are ignored
- Misc cleanups
x86 - MMU:
- Small cleanups, renames and refactoring extracted from the upcoming
Intel TDX support
- Don't allocate kvm_mmu_page.shadowed_translation for shadow pages
that can't hold leafs SPTEs
- Unconditionally drop mmu_lock when allocating TDP MMU page tables
for eager page splitting, to avoid stalling vCPUs when splitting
huge pages
- Bug the VM instead of simply warning if KVM tries to split a SPTE
that is non-present or not-huge. KVM is guaranteed to end up in a
broken state because the callers fully expect a valid SPTE, it's
all but dangerous to let more MMU changes happen afterwards
x86 - AMD:
- Make per-CPU save_area allocations NUMA-aware
- Force sev_es_host_save_area() to be inlined to avoid calling into
an instrumentable function from noinstr code
- Base support for running SEV-SNP guests. API-wise, this includes a
new KVM_X86_SNP_VM type, encrypting/measure the initial image into
guest memory, and finalizing it before launching it. Internally,
there are some gmem/mmu hooks needed to prepare gmem-allocated
pages before mapping them into guest private memory ranges
This includes basic support for attestation guest requests, enough
to say that KVM supports the GHCB 2.0 specification
There is no support yet for loading into the firmware those signing
keys to be used for attestation requests, and therefore no need yet
for the host to provide certificate data for those keys.
To support fetching certificate data from userspace, a new KVM exit
type will be needed to handle fetching the certificate from
userspace.
An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS
exit type to handle this was introduced in v1 of this patchset, but
is still being discussed by community, so for now this patchset
only implements a stub version of SNP Extended Guest Requests that
does not provide certificate data
x86 - Intel:
- Remove an unnecessary EPT TLB flush when enabling hardware
- Fix a series of bugs that cause KVM to fail to detect nested
pending posted interrupts as valid wake eents for a vCPU executing
HLT in L2 (with HLT-exiting disable by L1)
- KVM: x86: Suppress MMIO that is triggered during task switch
emulation
Explicitly suppress userspace emulated MMIO exits that are
triggered when emulating a task switch as KVM doesn't support
userspace MMIO during complex (multi-step) emulation
Silently ignoring the exit request can result in the
WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace
for some other reason prior to purging mmio_needed
See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write
exits if emulator detects exception") for more details on KVM's
limitations with respect to emulated MMIO during complex emulator
flows
Generic:
- Rename the AS_UNMOVABLE flag that was introduced for KVM to
AS_INACCESSIBLE, because the special casing needed by these pages
is not due to just unmovability (and in fact they are only
unmovable because the CPU cannot access them)
- New ioctl to populate the KVM page tables in advance, which is
useful to mitigate KVM page faults during guest boot or after live
migration. The code will also be used by TDX, but (probably) not
through the ioctl
- Enable halt poll shrinking by default, as Intel found it to be a
clear win
- Setup empty IRQ routing when creating a VM to avoid having to
synchronize SRCU when creating a split IRQCHIP on x86
- Rework the sched_in/out() paths to replace kvm_arch_sched_in() with
a flag that arch code can use for hooking both sched_in() and
sched_out()
- Take the vCPU @id as an "unsigned long" instead of "u32" to avoid
truncating a bogus value from userspace, e.g. to help userspace
detect bugs
- Mark a vCPU as preempted if and only if it's scheduled out while in
the KVM_RUN loop, e.g. to avoid marking it preempted and thus
writing guest memory when retrieving guest state during live
migration blackout
Selftests:
- Remove dead code in the memslot modification stress test
- Treat "branch instructions retired" as supported on all AMD Family
17h+ CPUs
- Print the guest pseudo-RNG seed only when it changes, to avoid
spamming the log for tests that create lots of VMs
- Make the PMU counters test less flaky when counting LLC cache
misses by doing CLFLUSH{OPT} in every loop iteration"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
crypto: ccp: Add the SNP_VLEK_LOAD command
KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops
KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops
KVM: x86: Replace static_call_cond() with static_call()
KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event
x86/sev: Move sev_guest.h into common SEV header
KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event
KVM: x86: Suppress MMIO that is triggered during task switch emulation
KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro
KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE
KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY
KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level
KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault"
KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler
KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
KVM: Document KVM_PRE_FAULT_MEMORY ioctl
mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE
perf kvm: Add kvm-stat for loongarch64
LoongArch: KVM: Add PV steal time support in guest side
...
Diffstat (limited to 'include')
-rw-r--r-- | include/linux/arm_ffa.h | 3 | ||||
-rw-r--r-- | include/linux/kvm_host.h | 53 | ||||
-rw-r--r-- | include/linux/pagemap.h | 13 | ||||
-rw-r--r-- | include/linux/psp-sev.h | 4 | ||||
-rw-r--r-- | include/linux/srcu.h | 14 | ||||
-rw-r--r-- | include/uapi/linux/kvm.h | 27 | ||||
-rw-r--r-- | include/uapi/linux/psp-sev.h | 27 | ||||
-rw-r--r-- | include/uapi/linux/sev-guest.h | 3 |
8 files changed, 132 insertions, 12 deletions
diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h index c82d56768101..c6d18f50f671 100644 --- a/include/linux/arm_ffa.h +++ b/include/linux/arm_ffa.h @@ -212,6 +212,9 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev) { return false; } extern const struct bus_type ffa_bus_type; +/* The FF-A 1.0 partition structure lacks the uuid[4] */ +#define FFA_1_0_PARTITON_INFO_SZ (8) + /* FFA transport related */ struct ffa_partition_info { u16 id; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 692c01e41a18..689e8be873a7 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -378,8 +378,10 @@ struct kvm_vcpu { bool dy_eligible; } spin_loop; #endif + bool wants_to_run; bool preempted; bool ready; + bool scheduled_out; struct kvm_vcpu_arch arch; struct kvm_vcpu_stat stat; char stats_id[KVM_STATS_NAME_SIZE]; @@ -1494,8 +1496,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, struct kvm_guest_debug *dbg); int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu); -void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu); - void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu); void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu); int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id); @@ -1955,8 +1955,6 @@ struct _kvm_stats_desc { HALT_POLL_HIST_COUNT), \ STATS_DESC_IBOOLEAN(VCPU_GENERIC, blocking) -extern struct dentry *kvm_debugfs_dir; - ssize_t kvm_stats_read(char *id, const struct kvm_stats_header *header, const struct _kvm_stats_desc *desc, void *stats, size_t size_stats, @@ -2096,6 +2094,7 @@ int kvm_set_irq_routing(struct kvm *kvm, const struct kvm_irq_routing_entry *entries, unsigned nr, unsigned flags); +int kvm_init_irq_routing(struct kvm *kvm); int kvm_set_routing_entry(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e, const struct kvm_irq_routing_entry *ue); @@ -2105,6 +2104,11 @@ void kvm_free_irq_routing(struct kvm *kvm); static inline void kvm_free_irq_routing(struct kvm *kvm) {} +static inline int kvm_init_irq_routing(struct kvm *kvm) +{ + return 0; +} + #endif int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi); @@ -2441,4 +2445,45 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm, } #endif /* CONFIG_KVM_PRIVATE_MEM */ +#ifdef CONFIG_HAVE_KVM_GMEM_PREPARE +int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_order); +bool kvm_arch_gmem_prepare_needed(struct kvm *kvm); +#endif + +/** + * kvm_gmem_populate() - Populate/prepare a GPA range with guest data + * + * @kvm: KVM instance + * @gfn: starting GFN to be populated + * @src: userspace-provided buffer containing data to copy into GFN range + * (passed to @post_populate, and incremented on each iteration + * if not NULL) + * @npages: number of pages to copy from userspace-buffer + * @post_populate: callback to issue for each gmem page that backs the GPA + * range + * @opaque: opaque data to pass to @post_populate callback + * + * This is primarily intended for cases where a gmem-backed GPA range needs + * to be initialized with userspace-provided data prior to being mapped into + * the guest as a private page. This should be called with the slots->lock + * held so that caller-enforced invariants regarding the expected memory + * attributes of the GPA range do not race with KVM_SET_MEMORY_ATTRIBUTES. + * + * Returns the number of pages that were populated. + */ +typedef int (*kvm_gmem_populate_cb)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, + void __user *src, int order, void *opaque); + +long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages, + kvm_gmem_populate_cb post_populate, void *opaque); + +#ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE +void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end); +#endif + +#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY +long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu, + struct kvm_pre_fault_memory *range); +#endif + #endif diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index a0a026d2d244..2d72bd89bf7b 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -208,7 +208,8 @@ enum mapping_flags { AS_RELEASE_ALWAYS, /* Call ->release_folio(), even if no private data */ AS_STABLE_WRITES, /* must wait for writeback before modifying folio contents */ - AS_UNMOVABLE, /* The mapping cannot be moved, ever */ + AS_INACCESSIBLE, /* Do not attempt direct R/W access to the mapping, + including to move the mapping */ }; /** @@ -309,20 +310,20 @@ static inline void mapping_clear_stable_writes(struct address_space *mapping) clear_bit(AS_STABLE_WRITES, &mapping->flags); } -static inline void mapping_set_unmovable(struct address_space *mapping) +static inline void mapping_set_inaccessible(struct address_space *mapping) { /* - * It's expected unmovable mappings are also unevictable. Compaction + * It's expected inaccessible mappings are also unevictable. Compaction * migrate scanner (isolate_migratepages_block()) relies on this to * reduce page locking. */ set_bit(AS_UNEVICTABLE, &mapping->flags); - set_bit(AS_UNMOVABLE, &mapping->flags); + set_bit(AS_INACCESSIBLE, &mapping->flags); } -static inline bool mapping_unmovable(struct address_space *mapping) +static inline bool mapping_inaccessible(struct address_space *mapping) { - return test_bit(AS_UNMOVABLE, &mapping->flags); + return test_bit(AS_INACCESSIBLE, &mapping->flags); } static inline gfp_t mapping_gfp_mask(struct address_space * mapping) diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h index 3705c2044fc0..903ddfea8585 100644 --- a/include/linux/psp-sev.h +++ b/include/linux/psp-sev.h @@ -658,6 +658,7 @@ struct sev_data_snp_launch_update { * @id_auth_paddr: system physical address of ID block authentication structure * @id_block_en: indicates whether ID block is present * @auth_key_en: indicates whether author key is present in authentication structure + * @vcek_disabled: indicates whether use of VCEK is allowed for attestation reports * @rsvd: reserved * @host_data: host-supplied data for guest, not interpreted by firmware */ @@ -667,7 +668,8 @@ struct sev_data_snp_launch_finish { u64 id_auth_paddr; u8 id_block_en:1; u8 auth_key_en:1; - u64 rsvd:62; + u8 vcek_disabled:1; + u64 rsvd:61; u8 host_data[32]; } __packed; diff --git a/include/linux/srcu.h b/include/linux/srcu.h index 6f6cb5fc1242..835bbb2d1f88 100644 --- a/include/linux/srcu.h +++ b/include/linux/srcu.h @@ -378,6 +378,20 @@ static inline void smp_mb__after_srcu_read_unlock(void) /* __srcu_read_unlock has smp_mb() internally so nothing to do here. */ } +/** + * smp_mb__after_srcu_read_lock - ensure full ordering after srcu_read_lock + * + * Converts the preceding srcu_read_lock into a two-way memory barrier. + * + * Call this after srcu_read_lock, to guarantee that all memory operations + * that occur after smp_mb__after_srcu_read_lock will appear to happen after + * the preceding srcu_read_lock. + */ +static inline void smp_mb__after_srcu_read_lock(void) +{ + /* __srcu_read_lock has smp_mb() internally so nothing to do here. */ +} + DEFINE_LOCK_GUARD_1(srcu, struct srcu_struct, _T->idx = srcu_read_lock(_T->lock), srcu_read_unlock(_T->lock, _T->idx), diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index d03842abae57..637efc055145 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -192,11 +192,24 @@ struct kvm_xen_exit { /* Flags that describe what fields in emulation_failure hold valid data. */ #define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0) +/* + * struct kvm_run can be modified by userspace at any time, so KVM must be + * careful to avoid TOCTOU bugs. In order to protect KVM, HINT_UNSAFE_IN_KVM() + * renames fields in struct kvm_run from <symbol> to <symbol>__unsafe when + * compiled into the kernel, ensuring that any use within KVM is obvious and + * gets extra scrutiny. + */ +#ifdef __KERNEL__ +#define HINT_UNSAFE_IN_KVM(_symbol) _symbol##__unsafe +#else +#define HINT_UNSAFE_IN_KVM(_symbol) _symbol +#endif + /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */ struct kvm_run { /* in */ __u8 request_interrupt_window; - __u8 immediate_exit; + __u8 HINT_UNSAFE_IN_KVM(immediate_exit); __u8 padding1[6]; /* out */ @@ -917,6 +930,9 @@ struct kvm_enable_cap { #define KVM_CAP_MEMORY_ATTRIBUTES 233 #define KVM_CAP_GUEST_MEMFD 234 #define KVM_CAP_VM_TYPES 235 +#define KVM_CAP_PRE_FAULT_MEMORY 236 +#define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237 +#define KVM_CAP_X86_GUEST_MODE 238 struct kvm_irq_routing_irqchip { __u32 irqchip; @@ -1548,4 +1564,13 @@ struct kvm_create_guest_memfd { __u64 reserved[6]; }; +#define KVM_PRE_FAULT_MEMORY _IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory) + +struct kvm_pre_fault_memory { + __u64 gpa; + __u64 size; + __u64 flags; + __u64 padding[5]; +}; + #endif /* __LINUX_KVM_H */ diff --git a/include/uapi/linux/psp-sev.h b/include/uapi/linux/psp-sev.h index b7a2c2ee35b7..2289b7c76c59 100644 --- a/include/uapi/linux/psp-sev.h +++ b/include/uapi/linux/psp-sev.h @@ -31,6 +31,7 @@ enum { SNP_PLATFORM_STATUS, SNP_COMMIT, SNP_SET_CONFIG, + SNP_VLEK_LOAD, SEV_MAX, }; @@ -215,6 +216,32 @@ struct sev_user_data_snp_config { } __packed; /** + * struct sev_data_snp_vlek_load - SNP_VLEK_LOAD structure + * + * @len: length of the command buffer read by the PSP + * @vlek_wrapped_version: version of wrapped VLEK hashstick (Must be 0h) + * @rsvd: reserved + * @vlek_wrapped_address: address of a wrapped VLEK hashstick + * (struct sev_user_data_snp_wrapped_vlek_hashstick) + */ +struct sev_user_data_snp_vlek_load { + __u32 len; /* In */ + __u8 vlek_wrapped_version; /* In */ + __u8 rsvd[3]; /* In */ + __u64 vlek_wrapped_address; /* In */ +} __packed; + +/** + * struct sev_user_data_snp_vlek_wrapped_vlek_hashstick - Wrapped VLEK data + * + * @data: Opaque data provided by AMD KDS (as described in SEV-SNP Firmware ABI + * 1.54, SNP_VLEK_LOAD) + */ +struct sev_user_data_snp_wrapped_vlek_hashstick { + __u8 data[432]; /* In */ +} __packed; + +/** * struct sev_issue_cmd - SEV ioctl parameters * * @cmd: SEV commands to execute diff --git a/include/uapi/linux/sev-guest.h b/include/uapi/linux/sev-guest.h index 154a87a1eca9..fcdfea767fca 100644 --- a/include/uapi/linux/sev-guest.h +++ b/include/uapi/linux/sev-guest.h @@ -89,6 +89,9 @@ struct snp_ext_report_req { #define SNP_GUEST_FW_ERR_MASK GENMASK_ULL(31, 0) #define SNP_GUEST_VMM_ERR_SHIFT 32 #define SNP_GUEST_VMM_ERR(x) (((u64)x) << SNP_GUEST_VMM_ERR_SHIFT) +#define SNP_GUEST_FW_ERR(x) ((x) & SNP_GUEST_FW_ERR_MASK) +#define SNP_GUEST_ERR(vmm_err, fw_err) (SNP_GUEST_VMM_ERR(vmm_err) | \ + SNP_GUEST_FW_ERR(fw_err)) #define SNP_GUEST_VMM_ERR_INVALID_LEN 1 #define SNP_GUEST_VMM_ERR_BUSY 2 |