Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- Fix a bug in a python script for Hyper-V (Ani Sinha)
- Workaround a bug in Hyper-V when IBT is enabled (Michael Kelley)
- Fix an issue parsing MP table when Linux runs in VTL2 (Saurabh
Sengar)
- Several cleanup patches (Nischala Yelchuri, Kameron Carr, YueHaibing,
ZhiHu)
* tag 'hyperv-fixes-signed-20230804' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
Drivers: hv: vmbus: Remove unused extern declaration vmbus_ontimer()
x86/hyperv: add noop functions to x86_init mpparse functions
vmbus_testing: fix wrong python syntax for integer value comparison
x86/hyperv: fix a warning in mshyperv.h
x86/hyperv: Disable IBT when hypercall page lacks ENDBR instruction
x86/hyperv: Improve code for referencing hyperv_pcpu_input_arg
Drivers: hv: Change hv_free_hyperv_page() to take void * argument
|
|
Hyper-V can run VMs at different privilege "levels" known as Virtual
Trust Levels (VTL). Sometimes, it chooses to run two different VMs
at different levels but they share some of their address space. In
such setups VTL2 (higher level VM) has visibility of all of the
VTL0 (level 0) memory space.
When the CONFIG_X86_MPPARSE is enabled for VTL2, the VTL2 kernel
performs a search within the low memory to locate MP tables. However,
in systems where VTL0 manages the low memory and may contain valid
tables, this scanning can result in incorrect MP table information
being provided to the VTL2 kernel, mistakenly considering VTL0's MP
table as its own
Add noop functions to avoid MP parse scan by VTL2.
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/r/1687537688-5397-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"x86:
- Do not register IRQ bypass consumer if posted interrupts not
supported
- Fix missed device interrupt due to non-atomic update of IRR
- Use GFP_KERNEL_ACCOUNT for pid_table in ipiv
- Make VMREAD error path play nice with noinstr
- x86: Acquire SRCU read lock when handling fastpath MSR writes
- Support linking rseq tests statically against glibc 2.35+
- Fix reference count for stats file descriptors
- Detect userspace setting invalid CR0
Non-KVM:
- Remove coccinelle script that has caused multiple confusion
("debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE()
usage", acked by Greg)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: selftests: Expand x86's sregs test to cover illegal CR0 values
KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest
KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid
Revert "debugfs, coccinelle: check for obsolete DEFINE_SIMPLE_ATTRIBUTE() usage"
KVM: selftests: Verify stats fd is usable after VM fd has been closed
KVM: selftests: Verify stats fd can be dup()'d and read
KVM: selftests: Verify userspace can create "redundant" binary stats files
KVM: selftests: Explicitly free vcpus array in binary stats test
KVM: selftests: Clean up stats fd in common stats_test() helper
KVM: selftests: Use pread() to read binary stats header
KVM: Grab a reference to KVM for VM and vCPU stats file descriptors
selftests/rseq: Play nice with binaries statically linked against glibc 2.35+
Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"
KVM: x86: Acquire SRCU read lock when handling fastpath MSR writes
KVM: VMX: Use vmread_error() to report VM-Fail in "goto" path
KVM: VMX: Make VMREAD error path play nice with noinstr
KVM: x86/irq: Conditionally register IRQ bypass consumer again
KVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipiv
KVM: x86: check the kvm_cpu_get_interrupt result before using it
KVM: x86: VMX: set irr_pending in kvm_apic_update_irr
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- AMD's automatic IBRS doesn't enable cross-thread branch target
injection protection (STIBP) for user processes. Enable STIBP on such
systems.
- Do not delete (but put the ref instead) of AMD MCE error thresholding
sysfs kobjects when destroying them in order not to delete the kernfs
pointer prematurely
- Restore annotation in ret_from_fork_asm() in order to fix kthread
stack unwinding from being marked as unreliable and thus breaking
livepatching
* tag 'x86_urgent_for_v6.5_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks
x86: Fix kthread unwind
|
|
Commit a2225d931f75 ("autofs: remove left-over autofs4 stubs")
promised the removal of the fs/autofs/Kconfig fragment for AUTOFS4_FS
within a couple of releases, but five years later this still has not
happened yet, and AUTOFS4_FS is still enabled in 63 defconfigs.
Get rid of it mechanically:
git grep -l CONFIG_AUTOFS4_FS -- '*defconfig' |
xargs sed -i 's/AUTOFS4_FS/AUTOFS_FS/'
Also just remove the AUTOFS4_FS config option stub. Anybody who hasn't
regenerated their config file in the last five years will need to just
get the new name right when they do.
Signed-off-by: Sven Joachim <svenjoac@gmx.de>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Stuff CR0 and/or CR4 to be compliant with a restricted guest if and only
if KVM itself is not configured to utilize unrestricted guests, i.e. don't
stuff CR0/CR4 for a restricted L2 that is running as the guest of an
unrestricted L1. Any attempt to VM-Enter a restricted guest with invalid
CR0/CR4 values should fail, i.e. in a nested scenario, KVM (as L0) should
never observe a restricted L2 with incompatible CR0/CR4, since nested
VM-Enter from L1 should have failed.
And if KVM does observe an active, restricted L2 with incompatible state,
e.g. due to a KVM bug, fudging CR0/CR4 instead of letting VM-Enter fail
does more harm than good, as KVM will often neglect to undo the side
effects, e.g. won't clear rmode.vm86_active on nested VM-Exit, and thus
the damage can easily spill over to L1. On the other hand, letting
VM-Enter fail due to bad guest state is more likely to contain the damage
to L2 as KVM relies on hardware to perform most guest state consistency
checks, i.e. KVM needs to be able to reflect a failed nested VM-Enter into
L1 irrespective of (un)restricted guest behavior.
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Fixes: bddd82d19e2e ("KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid,
e.g. due to setting bits 63:32, illegal combinations, or to a value that
isn't allowed in VMX (non-)root mode. The VMX checks in particular are
"fun" as failure to disallow Real Mode for an L2 that is configured with
unrestricted guest disabled, when KVM itself has unrestricted guest
enabled, will result in KVM forcing VM86 mode to virtual Real Mode for
L2, but then fail to unwind the related metadata when synthesizing a
nested VM-Exit back to L1 (which has unrestricted guest enabled).
Opportunistically fix a benign typo in the prototype for is_valid_cr4().
Cc: stable@vger.kernel.org
Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows
dereferencing memslots during WRMSR emulation, drop the requirement that
"next RIP" is valid. In hindsight, acquiring kvm->srcu would have been a
better fix than avoiding the pastpath, but at the time it was thought that
accessing SRCU-protected data in the fastpath was a one-off edge case.
This reverts commit 5c30e8101e8d5d020b1d7119117889756a6ed713.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721224337.2335137-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Temporarily acquire kvm->srcu for read when potentially emulating WRMSR in
the VM-Exit fastpath handler, as several of the common helpers used during
emulation expect the caller to provide SRCU protection. E.g. if the guest
is counting instructions retired, KVM will query the PMU event filter when
stepping over the WRMSR.
dump_stack+0x85/0xdf
lockdep_rcu_suspicious+0x109/0x120
pmc_event_is_allowed+0x165/0x170
kvm_pmu_trigger_event+0xa5/0x190
handle_fastpath_set_msr_irqoff+0xca/0x1e0
svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
vcpu_enter_guest+0x2108/0x2580
Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this
isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM:
SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"). Providing
protection for the entirety of WRMSR emulation will allow reverting the
aforementioned commit, and will avoid having to play whack-a-mole when new
uses of SRCU-protected structures are inevitably added in common emulation
helpers.
Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
Reported-by: Greg Thelen <gthelen@google.com>
Reported-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721224337.2335137-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use vmread_error() to report VM-Fail on VMREAD for the "asm goto" case,
now that trampoline case has yet another wrapper around vmread_error() to
play nice with instrumentation.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721235637.2345403-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Mark vmread_error_trampoline() as noinstr, and add a second trampoline
for the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=n case to enable instrumentation
when handling VM-Fail on VMREAD. VMREAD is used in various noinstr
flows, e.g. immediately after VM-Exit, and objtool rightly complains that
the call to the error trampoline leaves a no-instrumentation section
without annotating that it's safe to do so.
vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0xc9:
call to vmread_error_trampoline() leaves .noinstr.text section
Note, strictly speaking, enabling instrumentation in the VM-Fail path
isn't exactly safe, but if VMREAD fails the kernel/system is likely hosed
anyways, and logging that there is a fatal error is more important than
*maybe* encountering slightly unsafe instrumentation.
Reported-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721235637.2345403-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As was attempted commit 14717e203186 ("kvm: Conditionally register IRQ
bypass consumer"): "if we don't support a mechanism for bypassing IRQs,
don't register as a consumer. Initially this applied to AMD processors,
but when AVIC support was implemented for assigned devices,
kvm_arch_has_irq_bypass() was always returning true.
We can still skip registering the consumer where enable_apicv
or posted-interrupts capability is unsupported or globally disabled.
This eliminates meaningless dev_info()s when the connect fails
between producer and consumer", such as on Linux hosts where enable_apicv
or posted-interrupts capability is unsupported or globally disabled.
Cc: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Yong He <alexyonghe@tencent.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217379
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20230724111236.76570-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The pid_table of ipiv is the persistent memory allocated by
per-vcpu, which should be counted into the memory cgroup.
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Message-Id: <CAPm50aLxCQ3TQP2Lhc0PX3y00iTRg+mniLBqNDOC=t9CLxMwwA@mail.gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The code was blindly assuming that kvm_cpu_get_interrupt never returns -1
when there is a pending interrupt.
While this should be true, a bug in KVM can still cause this.
If -1 is returned, the code before this patch was converting it to 0xFF,
and 0xFF interrupt was injected to the guest, which results in an issue
which was hard to debug.
Add WARN_ON_ONCE to catch this case and skip the injection
if this happens again.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230726135945.260841-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When the APICv is inhibited, the irr_pending optimization is used.
Therefore, when kvm_apic_update_irr sets bits in the IRR,
it must set irr_pending to true as well.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230726135945.260841-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If APICv is inhibited, then IPIs from peer vCPUs are done by
atomically setting bits in IRR.
This means, that when __kvm_apic_update_irr copies PIR to IRR,
it has to modify IRR atomically as well.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230726135945.260841-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Commit c4e34dd99f2e ("x86: simplify load_unaligned_zeropad()
implementation") changes how exceptions around load_unaligned_zeropad()
handled. The kernel now uses the fault_address in fixup_exception() to
verify the address calculations for the load_unaligned_zeropad().
It works fine for #PF, but breaks on #VE since no fault address is
passed down to fixup_exception().
Propagating ve_info.gla down to fixup_exception() resolves the issue.
See commit 1e7769653b06 ("x86/tdx: Handle load_unaligned_zeropad()
page-cross to a shared page") for more context.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Michael Kelley <mikelley@microsoft.com>
Fixes: c4e34dd99f2e ("x86: simplify load_unaligned_zeropad() implementation")
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The following checkpatch warning is removed:
WARNING: Use #include <linux/io.h> instead of <asm/io.h>
Signed-off-by: ZhiHu <huzhi001@208suo.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
with ConfigVersion 9.3 or later support IBT in the guest. However,
current versions of Hyper-V have a bug in that there's not an ENDBR64
instruction at the beginning of the hypercall page. Since hypercalls are
made with an indirect call to the hypercall page, all hypercall attempts
fail with an exception and Linux panics.
A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
with ENDBR. The VM will boot and run without IBT.
If future Linux 32-bit kernels were to support IBT, additional hypercall
page hackery would be needed to make IBT work for such kernels in a
Hyper-V VM.
Cc: stable@vger.kernel.org
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1690001476-98594-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Unlike Intel's Enhanced IBRS feature, AMD's Automatic IBRS does not
provide protection to processes running at CPL3/user mode, see section
"Extended Feature Enable Register (EFER)" in the APM v2 at
https://bugzilla.kernel.org/attachment.cgi?id=304652
Explicitly enable STIBP to protect against cross-thread CPL3
branch target injections on systems with Automatic IBRS enabled.
Also update the relevant documentation.
Fixes: e7862eda309e ("x86/cpu: Support AMD Automatic IBRS")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230720194727.67022-1-kim.phillips@amd.com
|
|
AMD systems from Family 10h to 16h share MCA bank 4 across multiple CPUs.
Therefore, the threshold_bank structure for bank 4, and its threshold_block
structures, will be initialized once at boot time. And the kobject for the
shared bank will be added to each of the CPUs that share it. Furthermore,
the threshold_blocks for the shared bank will be added again to the bank's
kobject. These additions will increase the refcount for the bank's kobject.
For example, a shared bank with two blocks and shared across two CPUs will
be set up like this:
CPU0 init
bank create and add; bank refcount = 1; threshold_create_bank()
block 0 init and add; bank refcount = 2; allocate_threshold_blocks()
block 1 init and add; bank refcount = 3; allocate_threshold_blocks()
CPU1 init
bank add; bank refcount = 3; threshold_create_bank()
block 0 add; bank refcount = 4; __threshold_add_blocks()
block 1 add; bank refcount = 5; __threshold_add_blocks()
Currently in threshold_remove_bank(), if the bank is shared then
__threshold_remove_blocks() is called. Here the shared bank's kobject and
the bank's blocks' kobjects are deleted. This is done on the first call
even while the structures are still shared. Subsequent calls from other
CPUs that share the structures will attempt to delete the kobjects.
During kobject_del(), kobject->sd is removed. If the kobject is not part of
a kset with default_groups, then subsequent kobject_del() calls seem safe
even with kobject->sd == NULL.
Originally, the AMD MCA thresholding structures did not use default_groups.
And so the above behavior was not apparent.
However, a recent change implemented default_groups for the thresholding
structures. Therefore, kobject_del() will go down the sysfs_remove_groups()
code path. In this case, the first kobject_del() may succeed and remove
kobject->sd. But subsequent kobject_del() calls will give a WARNing in
kernfs_remove_by_name_ns() since kobject->sd == NULL.
Use kobject_put() on the shared bank's kobject when "removing" blocks. This
decrements the bank's refcount while keeping kobjects enabled until the
bank is no longer shared. At that point, kobject_put() will be called on
the blocks which drives their refcount to 0 and deletes them and also
decrementing the bank's refcount. And finally kobject_put() will be called
on the bank driving its refcount to 0 and deleting it.
The same example above:
CPU1 shutdown
bank is shared; bank refcount = 5; threshold_remove_bank()
block 0 put parent bank; bank refcount = 4; __threshold_remove_blocks()
block 1 put parent bank; bank refcount = 3; __threshold_remove_blocks()
CPU0 shutdown
bank is no longer shared; bank refcount = 3; threshold_remove_bank()
block 0 put block; bank refcount = 2; deallocate_threshold_blocks()
block 1 put block; bank refcount = 1; deallocate_threshold_blocks()
put bank; bank refcount = 0; threshold_remove_bank()
Fixes: 7f99cb5e6039 ("x86/CPU/AMD: Use default_groups in kobj_type")
Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/alpine.LRH.2.02.2205301145540.25840@file01.intranet.prod.int.rdu2.redhat.com
|
|
The rewrite of ret_from_form() misplaced an unwind hint which caused
all kthread stack unwinds to be marked unreliable, breaking
livepatching.
Restore the annotation and add a comment to explain the how and why of
things.
Fixes: 3aec4ecb3d1f ("x86: Rewrite ret_from_fork() in C")
Reported-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Petr Mladek <pmladek@suse.com>
Link: https://lkml.kernel.org/r/20230719201538.GA3553016@hirez.programming.kicks-ass.net
|
|
Add a fix for the Zen2 VZEROUPPER data corruption bug where under
certain circumstances executing VZEROUPPER can cause register
corruption or leak data.
The optimal fix is through microcode but in the case the proper
microcode revision has not been applied, enable a fallback fix using
a chicken bit.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
Avoid new and remove old forward declarations.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Borislav Petkov:
- Fix a lockdep warning when the event given is the first one, no event
group exists yet but the code still goes and iterates over event
siblings
* tag 'perf_urgent_for_v6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CFI fixes from Peter Zijlstra:
"Fix kCFI/FineIBT weaknesses
The primary bug Alyssa noticed was that with FineIBT enabled function
prologues have a spurious ENDBR instruction:
__cfi_foo:
endbr64
subl $hash, %r10d
jz 1f
ud2
nop
1:
foo:
endbr64 <--- *sadface*
This means that any indirect call that fails to target the __cfi
symbol and instead targets (the regular old) foo+0, will succeed due
to that second ENDBR.
Fixing this led to the discovery of a single indirect call that was
still doing this: ret_from_fork(). Since that's an assembly stub the
compiler would not generate the proper kCFI indirect call magic and it
would not get patched.
Brian came up with the most comprehensive fix -- convert the thing to
C with only a very thin asm wrapper. This ensures the kernel thread
boostrap is a proper kCFI call.
While discussing all this, Kees noted that kCFI hashes could/should be
poisoned to seal all functions whose address is never taken, further
limiting the valid kCFI targets -- much like we already do for IBT.
So what was a 'simple' observation and fix cascaded into a bunch of
inter-related CFI infrastructure fixes"
* tag 'x86_urgent_for_6.5_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cfi: Only define poison_cfi() if CONFIG_X86_KERNEL_IBT=y
x86/fineibt: Poison ENDBR at +0
x86: Rewrite ret_from_fork() in C
x86/32: Remove schedule_tail_wrapper()
x86/cfi: Extend ENDBR sealing to kCFI
x86/alternative: Rename apply_ibt_endbr()
x86/cfi: Extend {JMP,CAKK}_NOSPEC comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
- Fix some missing-prototype warnings
- Fix user events struct args (did not include size of struct)
When creating a user event, the "struct" keyword is to denote that
the size of the field will be passed in. But the parsing failed to
handle this case.
- Add selftest to struct sizes for user events
- Fix sample code for direct trampolines.
The sample code for direct trampolines attached to handle_mm_fault().
But the prototype changed and the direct trampoline sample code was
not updated. Direct trampolines needs to have the arguments correct
otherwise it can fail or crash the system.
- Remove unused ftrace_regs_caller_ret() prototype.
- Quiet false positive of FORTIFY_SOURCE
Due to backward compatibility, the structure used to save stack
traces in the kernel had a fixed size of 8. This structure is
exported to user space via the tracing format file. A change was made
to allow more than 8 functions to be recorded, and user space now
uses the size field to know how many functions are actually in the
stack.
But the structure still has size of 8 (even though it points into the
ring buffer that has the required amount allocated to hold a full
stack.
This was fine until the fortifier noticed that the
memcpy(&entry->caller, stack, size) was greater than the 8 functions
and would complain at runtime about it.
Hide this by using a pointer to the stack location on the ring buffer
instead of using the address of the entry structure caller field.
- Fix a deadloop in reading trace_pipe that was caused by a mismatch
between ring_buffer_empty() returning false which then asked to read
the data, but the read code uses rb_num_of_entries() that returned
zero, and causing a infinite "retry".
- Fix a warning caused by not using all pages allocated to store ftrace
functions, where this can happen if the linker inserts a bunch of
"NULL" entries, causing the accounting of how many pages needed to be
off.
- Fix histogram synthetic event crashing when the start event is
removed and the end event is still using a variable from it
- Fix memory leak in freeing iter->temp in tracing_release_pipe()
* tag 'trace-v6.5-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Fix memory leak of iter->temp when reading trace_pipe
tracing/histograms: Add histograms to hist_vars if they have referenced variables
tracing: Stop FORTIFY_SOURCE complaining about stack trace caller
ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
ring-buffer: Fix deadloop issue on reading trace_pipe
tracing: arm64: Avoid missing-prototype warnings
selftests/user_events: Test struct size match cases
tracing/user_events: Fix struct arg size match check
x86/ftrace: Remove unsued extern declaration ftrace_regs_caller_ret()
arm64: ftrace: Add direct call trampoline samples support
samples: ftrace: Save required argument registers in sample trampolines
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- a cleanup of the Xen related ELF-notes
- a fix for virtio handling in Xen dom0 when running Xen in a VM
* tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent
x86/Xen: tidy xen-head.S
|
|
poison_cfi() was introduced in:
9831c6253ace ("x86/cfi: Extend ENDBR sealing to kCFI")
... but it's only ever used under CONFIG_X86_KERNEL_IBT=y,
and if that option is disabled, we get:
arch/x86/kernel/alternative.c:1243:13: error: ‘poison_cfi’ defined but not used [-Werror=unused-function]
Guard the definition with CONFIG_X86_KERNEL_IBT.
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This is now unused, so can remove it.
Link: https://lore.kernel.org/linux-trace-kernel/20230623091640.21952-1-yuehaibing@huawei.com
Cc: <mark.rutland@arm.com>
Cc: <tglx@linutronix.de>
Cc: <mingo@redhat.com>
Cc: <bp@alien8.de>
Cc: <dave.hansen@linux.intel.com>
Cc: <x86@kernel.org>
Cc: <hpa@zytor.com>
Cc: <peterz@infradead.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Alyssa noticed that when building the kernel with CFI_CLANG+IBT and
booting on IBT enabled hardware to obtain FineIBT, the indirect
functions look like:
__cfi_foo:
endbr64
subl $hash, %r10d
jz 1f
ud2
nop
1:
foo:
endbr64
This is because the compiler generates code for kCFI+IBT. In that case
the caller does the hash check and will jump to +0, so there must be
an ENDBR there. The compiler doesn't know about FineIBT at all; also
it is possible to actually use kCFI+IBT when booting with 'cfi=kcfi'
on IBT enabled hardware.
Having this second ENDBR however makes it possible to elide the CFI
check. Therefore, we should poison this second ENDBR when switching to
FineIBT mode.
Fixes: 931ab63664f0 ("x86/ibt: Implement FineIBT")
Reported-by: "Milburn, Alyssa" <alyssa.milburn@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230615193722.194131053@infradead.org
|
|
When kCFI is enabled, special handling is needed for the indirect call
to the kernel thread function. Rewrite the ret_from_fork() function in
C so that the compiler can properly handle the indirect call.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230623225529.34590-3-brgerst@gmail.com
|
|
The unwinder expects a return address at the very top of the kernel
stack just below pt_regs and before any stack frame is created. Instead
of calling a wrapper, set up a return address as if ret_from_fork()
was called from the syscall entry code.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230623225529.34590-2-brgerst@gmail.com
|
|
Kees noted that IBT sealing could be extended to kCFI.
Fundamentally it is the list of functions that do not have their
address taken and are thus never called indirectly. It doesn't matter
that objtool uses IBT infrastructure to determine this list, once we
have it it can also be used to clobber kCFI hashes and avoid kCFI
indirect calls.
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230622144321.494426891%40infradead.org
|
|
The current name doesn't reflect what it does very well.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230622144321.427441595%40infradead.org
|
|
With the introduction of kCFI these helpers are no longer equivalent
to C indirect calls and should be used with care.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230622144321.360957723%40infradead.org
|
|
On SPR, the load latency event needs an auxiliary event in the same
group to work properly. There's a check in intel_pmu_hw_config()
for this to iterate sibling events and find a mem-loads-aux event.
The for_each_sibling_event() has a lockdep assert to make sure if it
disabled hardirq or hold leader->ctx->mutex. This works well if the
given event has a separate leader event since perf_try_init_event()
grabs the leader->ctx->mutex to protect the sibling list. But it can
cause a problem when the event itself is a leader since the event is
not initialized yet and there's no ctx for the event.
Actually I got a lockdep warning when I run the below command on SPR,
but I guess it could be a NULL pointer dereference.
$ perf record -d -e cpu/mem-loads/uP true
The code path to the warning is:
sys_perf_event_open()
perf_event_alloc()
perf_init_event()
perf_try_init_event()
x86_pmu_event_init()
hsw_hw_config()
intel_pmu_hw_config()
for_each_sibling_event()
lockdep_assert_event_ctx()
We don't need for_each_sibling_event() when it's a standalone event.
Let's return the error code directly.
Fixes: f3c0eba28704 ("perf: Add a few assertions")
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20230704181516.3293665-1-namhyung@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu fix from Borislav Petkov:
- Do FPU AP initialization on Xen PV too which got missed by the recent
boot reordering work
* tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Fix secondary processors' FPU initialization
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
|
|
Parking CPUs in INIT works well, except for the crash case when the CPU
which invokes smp_park_other_cpus_in_init() is not the boot CPU. Sending
INIT to the boot CPU resets the whole machine.
Prevent this by validating that this runs on the boot CPU. If not fall back
and let CPUs hang in HLT.
Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
Reported-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/87ttui91jo.ffs@tglx
|
|
Moving the call of fpu__init_cpu() from cpu_init() to start_secondary()
broke Xen PV guests, as those don't call start_secondary() for APs.
Call fpu__init_cpu() in Xen's cpu_bringup(), which is the Xen PV
replacement of start_secondary().
Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230703130032.22916-1-jgross@suse.com
|
|
First of all move PV-only ELF notes inside the XEN_PV conditional; note
that
- HV_START_LOW is dropped altogether, as it was meaningful for 32-bit PV
only,
- the 32-bit instance of VIRT_BASE is dropped, as it would be dead code
once inside the conditional,
- while PADDR_OFFSET is not exactly unused for PVH, it defaults to zero
there, and the hypervisor (or tool stack) complains if it is present
but VIRT_BASE isn't.
Then have the "supported features" note actually report reality: All
three of the features there are supported and/or applicable only in
certain cases.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/f99bacc6-2a2f-41b0-5c0b-e01b7051cb07@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Eager page splitting optimization for dirty logging, optionally
allowing for a VM to avoid the cost of hugepage splitting in the
stage-2 fault path.
- Arm FF-A proxy for pKVM, allowing a pKVM host to safely interact
with services that live in the Secure world. pKVM intervenes on
FF-A calls to guarantee the host doesn't misuse memory donated to
the hyp or a pKVM guest.
- Support for running the split hypervisor with VHE enabled, known as
'hVHE' mode. This is extremely useful for testing the split
hypervisor on VHE-only systems, and paves the way for new use cases
that depend on having two TTBRs available at EL2.
- Generalized framework for configurable ID registers from userspace.
KVM/arm64 currently prevents arbitrary CPU feature set
configuration from userspace, but the intent is to relax this
limitation and allow userspace to select a feature set consistent
with the CPU.
- Enable the use of Branch Target Identification (FEAT_BTI) in the
hypervisor.
- Use a separate set of pointer authentication keys for the
hypervisor when running in protected mode, as the host is untrusted
at runtime.
- Ensure timer IRQs are consistently released in the init failure
paths.
- Avoid trapping CTR_EL0 on systems with Enhanced Virtualization
Traps (FEAT_EVT), as it is a register commonly read from userspace.
- Erratum workaround for the upcoming AmpereOne part, which has
broken hardware A/D state management.
RISC-V:
- Redirect AMO load/store misaligned traps to KVM guest
- Trap-n-emulate AIA in-kernel irqchip for KVM guest
- Svnapot support for KVM Guest
s390:
- New uvdevice secret API
- CMM selftest and fixes
- fix racy access to target CPU for diag 9c
x86:
- Fix missing/incorrect #GP checks on ENCLS
- Use standard mmu_notifier hooks for handling APIC access page
- Drop now unnecessary TR/TSS load after VM-Exit on AMD
- Print more descriptive information about the status of SEV and
SEV-ES during module load
- Add a test for splitting and reconstituting hugepages during and
after dirty logging
- Add support for CPU pinning in demand paging test
- Add support for AMD PerfMonV2, with a variety of cleanups and minor
fixes included along the way
- Add a "nx_huge_pages=never" option to effectively avoid creating NX
hugepage recovery threads (because nx_huge_pages=off can be toggled
at runtime)
- Move handling of PAT out of MTRR code and dedup SVM+VMX code
- Fix output of PIC poll command emulation when there's an interrupt
- Add a maintainer's handbook to document KVM x86 processes,
preferred coding style, testing expectations, etc.
- Misc cleanups, fixes and comments
Generic:
- Miscellaneous bugfixes and cleanups
Selftests:
- Generate dependency files so that partial rebuilds work as
expected"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (153 commits)
Documentation/process: Add a maintainer handbook for KVM x86
Documentation/process: Add a label for the tip tree handbook's coding style
KVM: arm64: Fix misuse of KVM_ARM_VCPU_POWER_OFF bit index
RISC-V: KVM: Remove unneeded semicolon
RISC-V: KVM: Allow Svnapot extension for Guest/VM
riscv: kvm: define vcpu_sbi_ext_pmu in header
RISC-V: KVM: Expose IMSIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel virtualization of AIA IMSIC
RISC-V: KVM: Expose APLIC registers as attributes of AIA irqchip
RISC-V: KVM: Add in-kernel emulation of AIA APLIC
RISC-V: KVM: Implement device interface for AIA irqchip
RISC-V: KVM: Skeletal in-kernel AIA irqchip support
RISC-V: KVM: Set kvm_riscv_aia_nr_hgei to zero
RISC-V: KVM: Add APLIC related defines
RISC-V: KVM: Add IMSIC related defines
RISC-V: KVM: Implement guest external interrupt line management
KVM: x86: Remove PRIx* definitions as they are solely for user space
s390/uv: Update query for secret-UVCs
s390/uv: replace scnprintf with sysfs_emit
s390/uvdevice: Add 'Lock Secret Store' UVC
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single regression fix for x86:
Moving the invocation of arch_cpu_finalize_init() earlier in the boot
process caused a boot regression on IBT enabled system.
The root cause is not the move of arch_cpu_finalize_init() itself. The
system fails to boot because the subsequent efi_enter_virtual_mode()
code has a non-IBT safe EFI call inside. This was not noticed before
because IBT was enabled after the EFI initialization.
Switching the EFI call to use the IBT safe wrapper cures the problem"
* tag 'x86-urgent-2023-07-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Make efi_set_virtual_address_map IBT safe
|
|
KVM VMX changes for 6.5:
- Fix missing/incorrect #GP checks on ENCLS
- Use standard mmu_notifier hooks for handling APIC access page
- Misc cleanups
|
|
KVM SVM changes for 6.5:
- Drop manual TR/TSS load after VM-Exit now that KVM uses VMLOAD for host state
- Fix a not-yet-problematic missing call to trace_kvm_exit() for VM-Exits that
are handled in the fastpath
- Print more descriptive information about the status of SEV and SEV-ES during
module load
- Assert that misc_cg_set_capacity() doesn't fail to avoid should-be-impossible
memory leaks
|
|
KVM x86/pmu changes for 6.5:
- Add support for AMD PerfMonV2, with a variety of cleanups and minor fixes
included along the way
|
|
KVM x86/mmu changes for 6.5:
- Add back a comment about the subtle side effect of try_cmpxchg64() in
tdp_mmu_set_spte_atomic()
- Add an assertion in __kvm_mmu_invalidate_addr() to verify that the target
KVM MMU is the current MMU
- Add a "never" option to effectively avoid creating NX hugepage recovery
threads
|
|
KVM x86 changes for 6.5:
* Move handling of PAT out of MTRR code and dedup SVM+VMX code
* Fix output of PIC poll command emulation when there's an interrupt
* Add a maintainer's handbook to document KVM x86 processes, preferred coding
style, testing expectations, etc.
* Misc cleanups
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Although some more stuff is brewing, the EFI changes that are ready
for mainline are few this cycle:
- improve the PCI DMA paranoia logic in the EFI stub
- some constification changes
- add statfs support to efivarfs
- allow user space to enumerate updatable firmware resources without
CAP_SYS_ADMIN"
* tag 'efi-next-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/libstub: Disable PCI DMA before grabbing the EFI memory map
efi/esrt: Allow ESRT access without CAP_SYS_ADMIN
efivarfs: expose used and total size
efi: make kobj_type structure constant
efi: x86: make kobj_type structure constant
|