Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.5, part #2
- Fixes for the configuration of SVE/SME traps when hVHE mode is in use
- Allow use of pKVM on systems with FF-A implementations that are v1.0
compatible
- Request/release percpu IRQs (arch timer, vGIC maintenance) correctly
when pKVM is in use
- Fix function prototype after __kvm_host_psci_cpu_entry() rename
- Skip to the next instruction when emulating writes to TCR_EL1 on
AmpereOne systems
|
|
To avoid possible time-of-check/time-of-use issues, the GHCB should
almost never be accessed outside dump_ghcb, sev_es_sync_to_ghcb
and sev_es_sync_from_ghcb. The only legitimate uses are to set the
exitinfo fields and to find the address of the scratch area embedded
in the ghcb. Accessing ghcb_usage also goes through svm->sev_es.ghcb
in sev_es_validate_vmgexit(), but that is because anyway the value is
not used.
Removing a shortcut variable that contains the value of svm->sev_es.ghcb
makes these cases a bit more verbose, but it limits the chance of someone
reading the ghcb by mistake.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
A KVM guest using SEV-ES or SEV-SNP with multiple vCPUs can trigger
a double fetch race condition vulnerability and invoke the VMGEXIT
handler recursively.
sev_handle_vmgexit() maps the GHCB page using kvm_vcpu_map() and then
fetches the exit code using ghcb_get_sw_exit_code(). Soon after,
sev_es_validate_vmgexit() fetches the exit code again. Since the GHCB
page is shared with the guest, the guest is able to quickly swap the
values with another vCPU and hence bypass the validation. One vmexit code
that can be rejected by sev_es_validate_vmgexit() is SVM_EXIT_VMGEXIT;
if sev_handle_vmgexit() observes it in the second fetch, the call
to svm_invoke_exit_handler() will invoke sev_handle_vmgexit() again
recursively.
To avoid the race, always fetch the GHCB data from the places where
sev_es_sync_from_ghcb stores it.
Exploiting recursions on linux kernel has been proven feasible
in the past, but the impact is mitigated by stack guard pages
(CONFIG_VMAP_STACK). Still, if an attacker manages to call the handler
multiple times, they can theoretically trigger a stack overflow and
cause a denial-of-service, or potentially guest-to-host escape in kernel
configurations without stack guard pages.
Note that winning the race reliably in every iteration is very tricky
due to the very tight window of the fetches; depending on the compiler
settings, they are often consecutive because of optimization and inlining.
Tested by booting an SEV-ES RHEL9 guest.
Fixes: CVE-2023-4155
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Reported-by: Andy Nguyen <theflow@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Validation of the GHCB is susceptible to time-of-check/time-of-use vulnerabilities.
To avoid them, we would like to always snapshot the fields that are read in
sev_es_validate_vmgexit(), and not use the GHCB anymore after it returns.
This means:
- invoking sev_es_sync_from_ghcb() before any GHCB access, including before
sev_es_validate_vmgexit()
- snapshotting all fields including the valid bitmap and the sw_scratch field,
which are currently not caching anywhere.
The valid bitmap is the first thing to be copied out of the GHCB; then,
further accesses will use the copy in svm->sev_es.
Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Stuff CR0 and/or CR4 to be compliant with a restricted guest if and only
if KVM itself is not configured to utilize unrestricted guests, i.e. don't
stuff CR0/CR4 for a restricted L2 that is running as the guest of an
unrestricted L1. Any attempt to VM-Enter a restricted guest with invalid
CR0/CR4 values should fail, i.e. in a nested scenario, KVM (as L0) should
never observe a restricted L2 with incompatible CR0/CR4, since nested
VM-Enter from L1 should have failed.
And if KVM does observe an active, restricted L2 with incompatible state,
e.g. due to a KVM bug, fudging CR0/CR4 instead of letting VM-Enter fail
does more harm than good, as KVM will often neglect to undo the side
effects, e.g. won't clear rmode.vm86_active on nested VM-Exit, and thus
the damage can easily spill over to L1. On the other hand, letting
VM-Enter fail due to bad guest state is more likely to contain the damage
to L2 as KVM relies on hardware to perform most guest state consistency
checks, i.e. KVM needs to be able to reflect a failed nested VM-Enter into
L1 irrespective of (un)restricted guest behavior.
Cc: Jim Mattson <jmattson@google.com>
Cc: stable@vger.kernel.org
Fixes: bddd82d19e2e ("KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid,
e.g. due to setting bits 63:32, illegal combinations, or to a value that
isn't allowed in VMX (non-)root mode. The VMX checks in particular are
"fun" as failure to disallow Real Mode for an L2 that is configured with
unrestricted guest disabled, when KVM itself has unrestricted guest
enabled, will result in KVM forcing VM86 mode to virtual Real Mode for
L2, but then fail to unwind the related metadata when synthesizing a
nested VM-Exit back to L1 (which has unrestricted guest enabled).
Opportunistically fix a benign typo in the prototype for is_valid_cr4().
Cc: stable@vger.kernel.org
Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230613203037.1968489-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows
dereferencing memslots during WRMSR emulation, drop the requirement that
"next RIP" is valid. In hindsight, acquiring kvm->srcu would have been a
better fix than avoiding the pastpath, but at the time it was thought that
accessing SRCU-protected data in the fastpath was a one-off edge case.
This reverts commit 5c30e8101e8d5d020b1d7119117889756a6ed713.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721224337.2335137-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Temporarily acquire kvm->srcu for read when potentially emulating WRMSR in
the VM-Exit fastpath handler, as several of the common helpers used during
emulation expect the caller to provide SRCU protection. E.g. if the guest
is counting instructions retired, KVM will query the PMU event filter when
stepping over the WRMSR.
dump_stack+0x85/0xdf
lockdep_rcu_suspicious+0x109/0x120
pmc_event_is_allowed+0x165/0x170
kvm_pmu_trigger_event+0xa5/0x190
handle_fastpath_set_msr_irqoff+0xca/0x1e0
svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
vcpu_enter_guest+0x2108/0x2580
Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this
isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM:
SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"). Providing
protection for the entirety of WRMSR emulation will allow reverting the
aforementioned commit, and will avoid having to play whack-a-mole when new
uses of SRCU-protected structures are inevitably added in common emulation
helpers.
Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events")
Reported-by: Greg Thelen <gthelen@google.com>
Reported-by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721224337.2335137-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Use vmread_error() to report VM-Fail on VMREAD for the "asm goto" case,
now that trampoline case has yet another wrapper around vmread_error() to
play nice with instrumentation.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721235637.2345403-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Mark vmread_error_trampoline() as noinstr, and add a second trampoline
for the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=n case to enable instrumentation
when handling VM-Fail on VMREAD. VMREAD is used in various noinstr
flows, e.g. immediately after VM-Exit, and objtool rightly complains that
the call to the error trampoline leaves a no-instrumentation section
without annotating that it's safe to do so.
vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0xc9:
call to vmread_error_trampoline() leaves .noinstr.text section
Note, strictly speaking, enabling instrumentation in the VM-Fail path
isn't exactly safe, but if VMREAD fails the kernel/system is likely hosed
anyways, and logging that there is a fatal error is more important than
*maybe* encountering slightly unsafe instrumentation.
Reported-by: Su Hui <suhui@nfschina.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230721235637.2345403-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
As was attempted commit 14717e203186 ("kvm: Conditionally register IRQ
bypass consumer"): "if we don't support a mechanism for bypassing IRQs,
don't register as a consumer. Initially this applied to AMD processors,
but when AVIC support was implemented for assigned devices,
kvm_arch_has_irq_bypass() was always returning true.
We can still skip registering the consumer where enable_apicv
or posted-interrupts capability is unsupported or globally disabled.
This eliminates meaningless dev_info()s when the connect fails
between producer and consumer", such as on Linux hosts where enable_apicv
or posted-interrupts capability is unsupported or globally disabled.
Cc: Alex Williamson <alex.williamson@redhat.com>
Reported-by: Yong He <alexyonghe@tencent.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217379
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20230724111236.76570-1-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The pid_table of ipiv is the persistent memory allocated by
per-vcpu, which should be counted into the memory cgroup.
Signed-off-by: Peng Hao <flyingpeng@tencent.com>
Message-Id: <CAPm50aLxCQ3TQP2Lhc0PX3y00iTRg+mniLBqNDOC=t9CLxMwwA@mail.gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The code was blindly assuming that kvm_cpu_get_interrupt never returns -1
when there is a pending interrupt.
While this should be true, a bug in KVM can still cause this.
If -1 is returned, the code before this patch was converting it to 0xFF,
and 0xFF interrupt was injected to the guest, which results in an issue
which was hard to debug.
Add WARN_ON_ONCE to catch this case and skip the injection
if this happens again.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230726135945.260841-4-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When the APICv is inhibited, the irr_pending optimization is used.
Therefore, when kvm_apic_update_irr sets bits in the IRR,
it must set irr_pending to true as well.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230726135945.260841-3-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
If APICv is inhibited, then IPIs from peer vCPUs are done by
atomically setting bits in IRR.
This means, that when __kvm_apic_update_irr copies PIR to IRR,
it has to modify IRR atomically as well.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20230726135945.260841-2-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Whelp, this is embarrassing. Since commit 082fdfd13841 ("KVM: arm64:
Prevent guests from enabling HA/HD on Ampere1") KVM traps writes to
TCR_EL1 on AmpereOne to work around an erratum in the unadvertised
HAFDBS implementation, preventing the guest from enabling the feature.
Unfortunately, I failed virtualization 101 when working on that change,
and forgot to advance PC after instruction emulation.
Do the right thing and skip the MSR instruction after emulating the
write.
Fixes: 082fdfd13841 ("KVM: arm64: Prevent guests from enabling HA/HD on Ampere1")
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230728000824.3848025-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The kvm_host_psci_cpu_entry() function was renamed in order to add a wrapper around
it, but the prototype did not change, so now the missing-prototype warning came
back in W=1 builds:
arch/arm64/kvm/hyp/nvhe/psci-relay.c:203:28: error: no previous prototype for function '__kvm_host_psci_cpu_entry' [-Werror,-Wmissing-prototypes]
asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on)
Fixes: dcf89d1111995 ("KVM: arm64: Add missing BTI instructions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230724121850.1386668-1-arnd@kernel.org
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Ensure that SME traps are disabled for (h)VHE when getting the
reset value for the architectural feature control register.
Fixes: 75c76ab5a641 ("KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration")
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230724123829.2929609-9-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Ensure that SVE traps are disabled for hVHE, if the FPSIMD state
isn't owned by the guest, when getting the reset value for the
architectural feature control register.
Fixes: 75c76ab5a641 ("KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration")
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230724123829.2929609-8-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Instead of writing directly to cptr_el2, use the helper that
selects which feature trap register to write to based on the KVM
mode.
Fixes: 75c76ab5a641 ("KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration")
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230724123829.2929609-7-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Factor out the code that decides whether to write to the feature
trap registers, CPTR_EL2 or CPACR_EL1, based on the KVM mode,
i.e., (h)VHE or nVHE.
This function will be used in the subsequent patch.
No functional change intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230724123829.2929609-6-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Ensure that SME traps are disabled for (h)VHE when setting up
EL2, as they are for nVHE.
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230724123829.2929609-5-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Use the architectural feature trap/control register that
corresponds to the current KVM mode, i.e., CPTR_EL2 or CPACR_EL1,
when setting up SVE feature traps.
Signed-off-by: Fuad Tabba <tabba@google.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230724123829.2929609-4-tabba@google.com
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
The code for checking whether the kernel is in (h)VHE mode is
repeated, and will be needed again in future patches. Factor it
out in a macro.
No functional change intended.
No change in emitted assembly code intended.
Signed-off-by: Fuad Tabba <tabba@google.com>
Link: https://lore.kernel.org/kvmarm/20230724123829.2929609-3-tabba@google.com/
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Avoid pKVM finalization if KVM initialization fails
- Add missing BTI instructions in the hypervisor, fixing an early
boot failure on BTI systems
- Handle MMU notifiers correctly for non hugepage-aligned memslots
- Work around a bug in the architecture where hypervisor timer
controls have UNKNOWN behavior under nested virt
- Disable preemption in kvm_arch_hardware_enable(), fixing a kernel
BUG in cpu hotplug resulting from per-CPU accessor sanity checking
- Make WFI emulation on GICv4 systems robust w.r.t. preemption,
consistently requesting a doorbell interrupt on vcpu_put()
- Uphold RES0 sysreg behavior when emulating older PMU versions
- Avoid macro expansion when initializing PMU register names,
ensuring the tracepoints pretty-print the sysreg
s390:
- Two fixes for asynchronous destroy
x86 fixes will come early next week"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: s390: pv: fix index value of replaced ASCE
KVM: s390: pv: simplify shutdown and fix race
KVM: arm64: Fix the name of sys_reg_desc related to PMU
KVM: arm64: Correctly handle RES0 bits PMEVTYPER<n>_EL0.evtCount
KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption
KVM: arm64: Add missing BTI instructions
KVM: arm64: Correctly handle page aging notifiers for unaligned memslot
KVM: arm64: Disable preemption in kvm_arch_hardware_enable()
KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm
KVM: arm64: timers: Use CNTHCTL_EL2 when setting non-CNTKCTL_EL1 bits
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
Two fixes for asynchronous destroy
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.5, part #1
- Avoid pKVM finalization if KVM initialization fails
- Add missing BTI instructions in the hypervisor, fixing an early boot
failure on BTI systems
- Handle MMU notifiers correctly for non hugepage-aligned memslots
- Work around a bug in the architecture where hypervisor timer controls
have UNKNOWN behavior under nested virt.
- Disable preemption in kvm_arch_hardware_enable(), fixing a kernel BUG
in cpu hotplug resulting from per-CPU accessor sanity checking.
- Make WFI emulation on GICv4 systems robust w.r.t. preemption,
consistently requesting a doorbell interrupt on vcpu_put()
- Uphold RES0 sysreg behavior when emulating older PMU versions
- Avoid macro expansion when initializing PMU register names, ensuring
the tracepoints pretty-print the sysreg.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Reinstate support for little endian ELFv1 binaries, which it turns
out still exist in the wild.
- Revert a change which used asm goto for WARN_ON/__WARN_FLAGS, as it
lead to dead code generation and seemed to trigger compiler bugs in
some edge cases.
- Fix a deadlock in the pseries VAS code, between live migration and
the driver's mmap handler.
- Disable KCOV instrumentation in the powerpc KASAN code.
Thanks to Andrew Donnellan, Benjamin Gray, Christophe Leroy, Haren
Myneni, Russell Currey, and Uwe Kleine-König.
* tag 'powerpc-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
Revert "powerpc/64s: Remove support for ELFv1 little endian userspace"
powerpc/kasan: Disable KCOV in KASAN code
powerpc/512x: lpbfifo: Convert to platform remove callback returning void
powerpc/crypto: Add gitignore for generated P10 AES/GCM .S files
Revert "powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto"
powerpc/pseries/vas: Hold mmap_mutex after mmap lock during window close
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Fix per vma lock fault handling: add missing !(fault & VM_FAULT_ERROR)
check to fault handler to prevent error handling for return values
that don't indicate an error
- Use kfree_sensitive() instead of kfree() in paes crypto code to clear
memory that may contain keys before freeing it
- Fix reply buffer size calculation for CCA replies in zcrypt device
driver
* tag 's390-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/zcrypt: fix reply buffer calculations for CCA replies
s390/crypto: use kfree_sensitive() instead of kfree()
s390/mm: fix per vma lock fault handling
|
|
Pull io_uring fixes from Jens Axboe:
- Fix for io-wq not always honoring REQ_F_NOWAIT, if it was set and
punted directly (eg via DRAIN) (me)
- Capability check fix (Ondrej)
- Regression fix for the mmap changes that went into 6.4, which
apparently broke IA64 (Helge)
* tag 'io_uring-6.5-2023-07-21' of git://git.kernel.dk/linux:
ia64: mmap: Consider pgoff when searching for free mapping
io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()
io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq
io_uring: don't audit the capability check in io_uring_create()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"I've picked up a handful of arm64 fixes while Catalin's been away, so
here they are. Below is the usual summary, but we have basically have
two cleanups, a fix for an SME crash and a fix for hibernation:
- Fix saving of SME state after SVE vector length is changed
- Fix sparse warnings for missing vDSO function prototypes
- Fix hibernation resume path when kfence is enabled
- Fix field names for the HFGxTR_EL2 register"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
arm64: vdso: Clear common make C=2 warnings
arm64: mm: Make hibernation aware of KFENCE
arm64: Fix HFGxTR_EL2 field naming
|
|
IA64 is the only architecture which does not consider the pgoff value when
searching for a possible free memory region with vm_unmapped_area().
Adding this seems to have no negative side effect on IA64, so add it now
to make IA64 consistent with all other architectures.
Cc: stable@vger.kernel.org # 6.4
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-ia64@vger.kernel.org
Link: https://lore.kernel.org/r/20230721152432.196382-3-deller@gmx.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The io_uring testcase is broken on IA-64 since commit d808459b2e31
("io_uring: Adjust mapping wrt architecture aliasing requirements").
The reason is, that this commit introduced an own architecture
independend get_unmapped_area() search algorithm which finds on IA-64 a
memory region which is outside of the regular memory region used for
shared userspace mappings and which can't be used on that platform
due to aliasing.
To avoid similar problems on IA-64 and other platforms in the future,
it's better to switch back to the architecture-provided
get_unmapped_area() function and adjust the needed input parameters
before the call. Beside fixing the issue, the function now becomes
easier to understand and maintain.
This patch has been successfully tested with the io_uring testcase on
physical x86-64, ppc64le, IA-64 and PA-RISC machines. On PA-RISC the LTP
mmmap testcases did not report any regressions.
Cc: stable@vger.kernel.org # 6.4
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk>
Fixes: d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements")
Link: https://lore.kernel.org/r/20230721152432.196382-2-deller@gmx.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When we reconfigure the SVE vector length we discard the backing storage
for the SVE vectors and then reallocate on next SVE use, leaving the SME
specific state alone. This means that we do not enable SME traps if they
were already disabled. That means that userspace code can enter streaming
mode without trapping, putting the task in a state where if we try to save
the state of the task we will fault.
Since the ABI does not specify that changing the SVE vector length disturbs
SME state, and since SVE code may not be aware of SME code in the process,
we shouldn't simply discard any ZA state. Instead immediately reallocate
the storage for SVE, and disable SME if we change the SVE vector length
while there is no SME state active.
Disabling SME traps on SVE vector length changes would make the overall
code more complex since we would have a state where we have valid SME state
stored but might get a SME trap.
Fixes: 9e4ab6c89109 ("arm64/sme: Implement vector length configuration prctl()s")
Reported-by: David Spickett <David.Spickett@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230720-arm64-fix-sve-sme-vl-change-v2-1-8eea06b82d57@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from BPF, netfilter, bluetooth and CAN.
Current release - regressions:
- eth: r8169: multiple fixes for PCIe ASPM-related problems
- vrf: fix RCU lockdep splat in output path
Previous releases - regressions:
- gso: fall back to SW segmenting with GSO_UDP_L4 dodgy bit set
- dsa: mv88e6xxx: do a final check before timing out when polling
- nf_tables: fix sleep in atomic in nft_chain_validate
Previous releases - always broken:
- sched: fix undoing tcf_bind_filter() in multiple classifiers
- bpf, arm64: fix BTI type used for freplace attached functions
- can: gs_usb: fix time stamp counter initialization
- nft_set_pipapo: fix improper element removal (leading to UAF)
Misc:
- net: support STP on bridge in non-root netns, STP prevents packet
loops so not supporting it results in freezing systems of
unsuspecting users, and in turn very upset noises being made
- fix kdoc warnings
- annotate various bits of TCP state to prevent data races"
* tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
net: phy: prevent stale pointer dereference in phy_init()
tcp: annotate data-races around fastopenq.max_qlen
tcp: annotate data-races around icsk->icsk_user_timeout
tcp: annotate data-races around tp->notsent_lowat
tcp: annotate data-races around rskq_defer_accept
tcp: annotate data-races around tp->linger2
tcp: annotate data-races around icsk->icsk_syn_retries
tcp: annotate data-races around tp->keepalive_probes
tcp: annotate data-races around tp->keepalive_intvl
tcp: annotate data-races around tp->keepalive_time
tcp: annotate data-races around tp->tsoffset
tcp: annotate data-races around tp->tcp_tx_delay
Bluetooth: MGMT: Use correct address for memcpy()
Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014
Bluetooth: SCO: fix sco_conn related locking and validity issues
Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link
Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()
Bluetooth: coredump: fix building with coredump disabled
Bluetooth: ISO: fix iso_conn related locking and validity issues
Bluetooth: hci_event: call disconnect callback before deleting conn
...
|
|
kvm_arm_hardware_enabled is rather misleading, since it doesn't track
the state of all hardware resources needed for running a VM. What it
actually tracks is whether or not the hyp cpu context has been
initialized.
Since we're now at the point where vgic + timer irq management has
been separated from kvm_arm_hardware_enabled, rephrase it (and the
associated helpers) to make it clear what state is being tracked.
Acked-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230719231855.262973-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
When running in protected mode, the hyp stub is disabled after pKVM is
initialized, meaning the host cannot enable/disable the hyp at
runtime. As such, kvm_arm_hardware_enabled is always 1 after
initialization, and kvm_arch_hardware_enable() never enables the vgic
maintenance irq or timer irqs.
Unconditionally enable/disable the vgic + timer irqs in the respective
calls, instead relying on the percpu bookkeeping in the generic code
to keep track of which cpus have the interrupts unmasked.
Fixes: 466d27e48d7c ("KVM: arm64: Simplify the CPUHP logic")
Reported-by: Oliver Upton <oliver.upton@linux.dev>
Suggested-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Raghavendra Rao Ananta <rananta@google.com>
Link: https://lore.kernel.org/r/20230719175400.647154-1-rananta@google.com
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
key might contain private part of the key, so better use
kfree_sensitive() to free it.
Signed-off-by: Wang Ming <machel@vivo.com>
Reviewed-by: Harald Freudenberger <freude@linux.ibm.com>
Link: https://lore.kernel.org/r/20230717094533.18418-1-machel@vivo.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
make C=2 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- xxx.o
When I use the command above to do a 'make C=2' check on any object file,
the following warnings are always output:
CHECK arch/arm64/kernel/vdso/vgettimeofday.c
arch/arm64/kernel/vdso/vgettimeofday.c:9:5: warning:
symbol '__kernel_clock_gettime' was not declared. Should it be static?
arch/arm64/kernel/vdso/vgettimeofday.c:15:5: warning:
symbol '__kernel_gettimeofday' was not declared. Should it be static?
arch/arm64/kernel/vdso/vgettimeofday.c:21:5: warning:
symbol '__kernel_clock_getres' was not declared. Should it be static?
Therefore, the declaration of the three functions is added to eliminate
these common warnings to provide a clean output.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230713115831.777-1-thunder.leizhen@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
In the restore path, swsusp_arch_suspend_exit uses copy_page() to
over-write memory. However, with features like KFENCE enabled, there could
be situations where it may have marked some pages as not valid, due to
which it could be reported as invalid accesses.
Consider a situation where page 'P' was part of the hibernation image.
Now, when the resume kernel tries to restore the pages, the same page 'P'
is already in use in the resume kernel and is kfence protected, due to
which its mapping is removed from linear map. Since restoring pages happens
with the resume kernel page tables, we would end up accessing 'P' during
copy and results in kernel pagefault.
The proposed fix tries to solve this issue by marking PTE as valid for such
kfence protected pages.
Co-developed-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Nikhil V <quic_nprakash@quicinc.com>
Link: https://lore.kernel.org/r/20230713070757.4093-1-quic_nprakash@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
pKVM initialization fails on systems with v1.1+ FF-A implementations, as
the hyp does a strict match on the returned version from FFA_VERSION.
This is a stronger assertion than required by the specification, which
requires minor revisions be backwards compatible with earlier revisions
of the same major version.
Relax the check in hyp_ffa_init() to only test the returned major
version. Even though v1.1 broke ABI, the expectation is that firmware
incapable of using the v1.0 ABI return NOT_SUPPORTED instead of a valid
version.
Acked-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230718184537.3220867-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
|
|
This reverts commit 606787fed7268feb256957872586370b56af697a.
ELFv1 with LE has never been a thing, and people who try to make ELFv1 LE
binaries are maniacs who need to be stopped, but unfortunately there are
ELFv1 LE binaries out there in the wild.
One such binary is the ppc64el (as Debian calls it) helper for
arch-test[0], a tool for detecting architectures that can be executed on a
given machine by means of attempting to execute helper binaries compiled
for each architecture and seeing which binaries succeed and fail. The
helpers are small snippets of assembly, and the ppc64el assembly doesn't
include the right directives to generate an ELFv2 binary.
This results in arch-test incorrectly determining that a ppc64el kernel
can't execute a ppc64el userspace, which in turn means that a number of
developer tools such as debootstrap will break (assuming arch-test is
installed).
[0] https://github.com/kilobyte/arch-test
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230719071821.320594-1-ajd@linux.ibm.com
|
|
When running an freplace attached bpf program on an arm64 system w were
seeing the following issue:
Unhandled 64-bit el1h sync exception on CPU47, ESR 0x0000000036000003 -- BTI
After a bit of work to track it down I determined that what appeared to be
happening is that the 'bti c' at the start of the program was somehow being
reached after a 'br' instruction. Further digging pointed me toward the
fact that the function was attached via freplace. This in turn led me to
build_plt which I believe is invoking the long jump which is triggering
this error.
To resolve it we can replace the 'bti c' with 'bti jc' and add a comment
explaining why this has to be modified as such.
Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/168926677665.316237.9953845318337455525.stgit@ahduyck-xeon-server.home.arpa
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
With per-vma locks, handle_mm_fault() may return non-fatal error
flags. In this case the code should reset the fault flags before
returning.
Fixes: e06f47a16573 ("s390/mm: try VMA lock-based page fault handling first")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
The index field of the struct page corresponding to a guest ASCE should
be 0. When replacing the ASCE in s390_replace_asce(), the index of the
new ASCE should also be set to 0.
Having the wrong index might lead to the wrong addresses being passed
around when notifying pte invalidations, and eventually to validity
intercepts (VM crash) if the prefix gets unmapped and the notifier gets
called with the wrong address.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Fixes: faa2f72cb356 ("KVM: s390: pv: leak the topmost page table when destroy fails")
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20230705111937.33472-3-imbrenda@linux.ibm.com>
|
|
Simplify the shutdown of non-protected VMs. There is no need to do
complex manipulations of the counter if it was zero.
This also fixes a very rare race which caused pages to be torn down
from the address space with a non-zero counter even on older machines
that don't support the UVC instruction, causing a crash.
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot")
Reviewed-by: Nico Boehr <nrb@linux.ibm.com>
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-ID: <20230705111937.33472-2-imbrenda@linux.ibm.com>
|
|
As per the generic KASAN code in mm/kasan, disable KCOV with
KCOV_INSTRUMENT := n in the makefile.
This fixes a ppc64 boot hang when KCOV and KASAN are enabled.
kasan_early_init() gets called before a PACA is initialised, but the
KCOV hook expects a valid PACA.
Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230710044143.146840-1-bgray@linux.ibm.com
|
|
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new() which already returns void. Eventually after all drivers
are converted, .remove_new() is renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230711143145.1192651-1-u.kleine-koenig@pengutronix.de
|
|
aesp10-ppc.S and ghashp10-ppc.S are autogenerated and not tracked by
git, so they should be ignored. This is doing the same as the P8 files
in drivers/crypto/vmx/.gitignore but for the P10 files in
arch/powerpc/crypto.
Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230713042206.85669-1-ruscur@russell.cc
|
|
with asm goto"
This partly reverts commit 1e688dd2a3d6759d416616ff07afc4bb836c4213.
That commit aimed at optimising the code around generation of
WARN_ON/BUG_ON but this leads to a lot of dead code erroneously
generated by GCC.
That dead code becomes a problem when we start using objtool validation
because objtool will abort validation with a warning as soon as it
detects unreachable code. This is because unreachable code might
be the indication that objtool doesn't properly decode object text.
text data bss dec hex filename
9551585 3627834 224376 13403795 cc8693 vmlinux.before
9535281 3628358 224376 13388015 cc48ef vmlinux.after
Once this change is reverted, in a standard configuration (pmac32 +
function tracer) the text is reduced by 16k which is around 1.7%
We already had problem with it when starting to use objtool on powerpc
as a replacement for recordmcount, see commit 93e3f45a2631 ("powerpc:
Fix __WARN_FLAGS() for use with Objtool")
There is also a problem with at least GCC 12, on ppc64_defconfig +
CONFIG_CC_OPTIMIZE_FOR_SIZE=y + CONFIG_DEBUG_SECTION_MISMATCH=y :
LD .tmp_vmlinux.kallsyms1
powerpc64-linux-ld: net/ipv4/tcp_input.o:(__ex_table+0xc4): undefined reference to `.L2136'
make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1
make[1]: *** [/home/chleroy/linux-powerpc/Makefile:1238: vmlinux] Error 2
Taking into account that other problems are encountered with that
'asm goto' in WARN_ON(), including build failures, keeping that
change is not worth it allthough it is primarily a compiler bug.
Revert it for now.
mpe: Retain EMIT_WARN_ENTRY as a synonym for EMIT_BUG_ENTRY to reduce
churn, as there are now nearly as many uses of EMIT_WARN_ENTRY as
EMIT_BUG_ENTRY.
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Naveen N Rao <naveen@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230712134552.534955-1-mpe@ellerman.id.au
|