path: root/arch
Age  Commit message  Author
2023-08-04  Merge tag 'kvmarm-fixes-6.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini)
KVM/arm64 fixes for 6.5, part #2
- Fixes for the configuration of SVE/SME traps when hVHE mode is in use
- Allow use of pKVM on systems with FF-A implementations that are v1.0 compatible
- Request/release percpu IRQs (arch timer, vGIC maintenance) correctly when pKVM is in use
- Fix function prototype after __kvm_host_psci_cpu_entry() rename
- Skip to the next instruction when emulating writes to TCR_EL1 on AmpereOne systems
2023-08-04  KVM: SEV: remove ghcb variable declarations  (Paolo Bonzini)
To avoid possible time-of-check/time-of-use issues, the GHCB should almost never be accessed outside dump_ghcb, sev_es_sync_to_ghcb and sev_es_sync_from_ghcb. The only legitimate uses are to set the exitinfo fields and to find the address of the scratch area embedded in the GHCB. Accessing ghcb_usage also goes through svm->sev_es.ghcb in sev_es_validate_vmgexit(), but in that case the value is not used anyway. Removing a shortcut variable that holds the value of svm->sev_es.ghcb makes these cases a bit more verbose, but it limits the chance of someone reading the GHCB by mistake. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
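A minimal before/after sketch of the pattern being removed (the accessor and field names follow the upstream SEV code, but treat the exact call sites as illustrative):

  /* Before: a shortcut pointer invites casual reads of guest-shared memory. */
  struct ghcb *ghcb = svm->sev_es.ghcb;
  ghcb_set_sw_exit_info_1(ghcb, 0);
  ghcb_set_sw_exit_info_2(ghcb, 0);

  /* After: spell out the dereference so any stray GHCB access stands out in review. */
  ghcb_set_sw_exit_info_1(svm->sev_es.ghcb, 0);
  ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, 0);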
2023-08-04  KVM: SEV: only access GHCB fields once  (Paolo Bonzini)
A KVM guest using SEV-ES or SEV-SNP with multiple vCPUs can trigger a double fetch race condition vulnerability and invoke the VMGEXIT handler recursively. sev_handle_vmgexit() maps the GHCB page using kvm_vcpu_map() and then fetches the exit code using ghcb_get_sw_exit_code(). Soon after, sev_es_validate_vmgexit() fetches the exit code again. Since the GHCB page is shared with the guest, the guest is able to quickly swap the values with another vCPU and hence bypass the validation. One exit code that can be rejected by sev_es_validate_vmgexit() is SVM_EXIT_VMGEXIT; if sev_handle_vmgexit() observes it in the second fetch, the call to svm_invoke_exit_handler() will invoke sev_handle_vmgexit() again recursively. To avoid the race, always fetch the GHCB data from the places where sev_es_sync_from_ghcb stores it. Exploiting recursion in the Linux kernel has been proven feasible in the past, but the impact is mitigated by stack guard pages (CONFIG_VMAP_STACK). Still, if an attacker manages to call the handler multiple times, they can theoretically trigger a stack overflow and cause a denial of service, or potentially a guest-to-host escape in kernel configurations without stack guard pages. Note that winning the race reliably in every iteration is very tricky due to the very tight window of the fetches; depending on the compiler settings, they are often consecutive because of optimization and inlining. Tested by booting an SEV-ES RHEL9 guest. Fixes: CVE-2023-4155 Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: stable@vger.kernel.org Reported-by: Andy Nguyen <theflow@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
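The shape of the bug and of the fix, as a hedged sketch rather than the actual diff (validate() and cached_sw_exit_code() are hypothetical stand-ins):

  /* Racy: two loads from the guest-shared GHCB page. */
  exit_code = ghcb_get_sw_exit_code(ghcb);            /* fetch #1 */
  if (!validate(ghcb_get_sw_exit_code(ghcb)))         /* fetch #2 may observe a different value */
          goto vmgexit_err;

  /* Fixed: fetch once into KVM-private storage, then use only the copy. */
  sev_es_sync_from_ghcb(svm);                         /* snapshot the guest-written fields */
  exit_code = cached_sw_exit_code(svm);               /* hypothetical accessor for the cached copy */
  if (!validate(exit_code))
          goto vmgexit_err;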
2023-08-04  KVM: SEV: snapshot the GHCB before accessing it  (Paolo Bonzini)
Validation of the GHCB is susceptible to time-of-check/time-of-use vulnerabilities. To avoid them, we would like to always snapshot the fields that are read in sev_es_validate_vmgexit(), and not use the GHCB anymore after it returns. This means:
- invoking sev_es_sync_from_ghcb() before any GHCB access, including before sev_es_validate_vmgexit()
- snapshotting all fields, including the valid bitmap and the sw_scratch field, which are currently not cached anywhere.
The valid bitmap is the first thing to be copied out of the GHCB; then, further accesses will use the copy in svm->sev_es. Fixes: 291bd20d5d88 ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29  KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest  (Sean Christopherson)
Stuff CR0 and/or CR4 to be compliant with a restricted guest if and only if KVM itself is not configured to utilize unrestricted guests, i.e. don't stuff CR0/CR4 for a restricted L2 that is running as the guest of an unrestricted L1. Any attempt to VM-Enter a restricted guest with invalid CR0/CR4 values should fail, i.e. in a nested scenario, KVM (as L0) should never observe a restricted L2 with incompatible CR0/CR4, since nested VM-Enter from L1 should have failed. And if KVM does observe an active, restricted L2 with incompatible state, e.g. due to a KVM bug, fudging CR0/CR4 instead of letting VM-Enter fail does more harm than good, as KVM will often neglect to undo the side effects, e.g. won't clear rmode.vm86_active on nested VM-Exit, and thus the damage can easily spill over to L1. On the other hand, letting VM-Enter fail due to bad guest state is more likely to contain the damage to L2 as KVM relies on hardware to perform most guest state consistency checks, i.e. KVM needs to be able to reflect a failed nested VM-Enter into L1 irrespective of (un)restricted guest behavior. Cc: Jim Mattson <jmattson@google.com> Cc: stable@vger.kernel.org Fixes: bddd82d19e2e ("KVM: nVMX: KVM needs to unset "unrestricted guest" VM-execution control in vmcs02 if vmcs12 doesn't set it") Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230613203037.1968489-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29  KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid  (Sean Christopherson)
Reject KVM_SET_SREGS{2} with -EINVAL if the incoming CR0 is invalid, e.g. due to setting bits 63:32, illegal combinations, or to a value that isn't allowed in VMX (non-)root mode. The VMX checks in particular are "fun" as failure to disallow Real Mode for an L2 that is configured with unrestricted guest disabled, when KVM itself has unrestricted guest enabled, will result in KVM forcing VM86 mode to virtual Real Mode for L2, but then fail to unwind the related metadata when synthesizing a nested VM-Exit back to L1 (which has unrestricted guest enabled). Opportunistically fix a benign typo in the prototype for is_valid_cr4(). Cc: stable@vger.kernel.org Reported-by: syzbot+5feef0b9ee9c8e9e5689@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000f316b705fdf6e2b4@google.com Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230613203037.1968489-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29Revert "KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"Sean Christopherson
Now that handle_fastpath_set_msr_irqoff() acquires kvm->srcu, i.e. allows dereferencing memslots during WRMSR emulation, drop the requirement that "next RIP" is valid. In hindsight, acquiring kvm->srcu would have been a better fix than avoiding the fastpath, but at the time it was thought that accessing SRCU-protected data in the fastpath was a one-off edge case. This reverts commit 5c30e8101e8d5d020b1d7119117889756a6ed713. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721224337.2335137-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29  KVM: x86: Acquire SRCU read lock when handling fastpath MSR writes  (Sean Christopherson)
Temporarily acquire kvm->srcu for read when potentially emulating WRMSR in the VM-Exit fastpath handler, as several of the common helpers used during emulation expect the caller to provide SRCU protection. E.g. if the guest is counting instructions retired, KVM will query the PMU event filter when stepping over the WRMSR.

  dump_stack+0x85/0xdf
  lockdep_rcu_suspicious+0x109/0x120
  pmc_event_is_allowed+0x165/0x170
  kvm_pmu_trigger_event+0xa5/0x190
  handle_fastpath_set_msr_irqoff+0xca/0x1e0
  svm_vcpu_run+0x5c3/0x7b0 [kvm_amd]
  vcpu_enter_guest+0x2108/0x2580

Alternatively, check_pmu_event_filter() could acquire kvm->srcu, but this isn't the first bug of this nature, e.g. see commit 5c30e8101e8d ("KVM: SVM: Skip WRMSR fastpath on VM-Exit if next RIP isn't valid"). Providing protection for the entirety of WRMSR emulation will allow reverting the aforementioned commit, and will avoid having to play whack-a-mole when new uses of SRCU-protected structures are inevitably added in common emulation helpers. Fixes: dfdeda67ea2d ("KVM: x86/pmu: Prevent the PMU from counting disallowed events") Reported-by: Greg Thelen <gthelen@google.com> Reported-by: Aaron Lewis <aaronlewis@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721224337.2335137-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
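A sketch of the locking pattern being added; handle_wrmsr_fastpath() is a simplified stand-in for handle_fastpath_set_msr_irqoff(), while the SRCU calls are the standard KVM/SRCU API:

  static int handle_wrmsr_fastpath(struct kvm_vcpu *vcpu, u32 msr, u64 data)
  {
          int idx, ret;

          /* Emulation helpers (PMU event filter, memslots) expect SRCU to be held. */
          idx = srcu_read_lock(&vcpu->kvm->srcu);
          ret = kvm_set_msr(vcpu, msr, data);
          srcu_read_unlock(&vcpu->kvm->srcu, idx);

          return ret;
  }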
2023-07-29  KVM: VMX: Use vmread_error() to report VM-Fail in "goto" path  (Sean Christopherson)
Use vmread_error() to report VM-Fail on VMREAD for the "asm goto" case, now that the trampoline case has yet another wrapper around vmread_error() to play nice with instrumentation. Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721235637.2345403-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29  KVM: VMX: Make VMREAD error path play nice with noinstr  (Sean Christopherson)
Mark vmread_error_trampoline() as noinstr, and add a second trampoline for the CONFIG_CC_HAS_ASM_GOTO_OUTPUT=n case to enable instrumentation when handling VM-Fail on VMREAD. VMREAD is used in various noinstr flows, e.g. immediately after VM-Exit, and objtool rightly complains that the call to the error trampoline leaves a no-instrumentation section without annotating that it's safe to do so.

  vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0xc9: call to vmread_error_trampoline() leaves .noinstr.text section

Note, strictly speaking, enabling instrumentation in the VM-Fail path isn't exactly safe, but if VMREAD fails the kernel/system is likely hosed anyways, and logging that there is a fatal error is more important than *maybe* encountering slightly unsafe instrumentation. Reported-by: Su Hui <suhui@nfschina.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20230721235637.2345403-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
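Roughly the pattern involved, sketched with an illustrative argument list (the real trampolines live in the VMX C code and assembly):

  /* Called from noinstr code, so it must be noinstr itself... */
  noinstr void vmread_error_trampoline2(unsigned long field, bool fault)
  {
          /* ...and must explicitly bracket the instrumentable work it does. */
          instrumentation_begin();
          vmread_error(field, fault);
          instrumentation_end();
  }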
2023-07-29  KVM: x86/irq: Conditionally register IRQ bypass consumer again  (Like Xu)
As was attempted in commit 14717e203186 ("kvm: Conditionally register IRQ bypass consumer"): "if we don't support a mechanism for bypassing IRQs, don't register as a consumer. Initially this applied to AMD processors, but when AVIC support was implemented for assigned devices, kvm_arch_has_irq_bypass() was always returning true. We can still skip registering the consumer where enable_apicv or posted-interrupts capability is unsupported or globally disabled. This eliminates meaningless dev_info()s when the connect fails between producer and consumer", such as on Linux hosts where enable_apicv or posted-interrupts capability is unsupported or globally disabled. Cc: Alex Williamson <alex.williamson@redhat.com> Reported-by: Yong He <alexyonghe@tencent.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217379 Signed-off-by: Like Xu <likexu@tencent.com> Message-Id: <20230724111236.76570-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29  KVM: X86: Use GFP_KERNEL_ACCOUNT for pid_table in ipiv  (Peng Hao)
The pid_table of IPIv is persistent memory allocated per vCPU, and should be charged to the memory cgroup. Signed-off-by: Peng Hao <flyingpeng@tencent.com> Message-Id: <CAPm50aLxCQ3TQP2Lhc0PX3y00iTRg+mniLBqNDOC=t9CLxMwwA@mail.gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
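The substance of the change is the allocation flag; a sketch of the pattern (the allocation site is simplified):

  /* before: pages are not charged to the VM's memory cgroup */
  pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);

  /* after: GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) charges them to the memcg */
  pages = alloc_pages(GFP_KERNEL_ACCOUNT | __GFP_ZERO, order);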
2023-07-29  KVM: x86: check the kvm_cpu_get_interrupt result before using it  (Maxim Levitsky)
The code was blindly assuming that kvm_cpu_get_interrupt never returns -1 when there is a pending interrupt. While this should be true, a bug in KVM could still cause it. If -1 is returned, the code before this patch converted it to 0xFF and injected the 0xFF interrupt into the guest, which resulted in an issue that was hard to debug. Add WARN_ON_ONCE to catch this case and skip the injection if it happens again. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230726135945.260841-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
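A sketch of the guard being added (the surrounding injection code is abbreviated and illustrative):

  int vector = kvm_cpu_get_interrupt(vcpu);

  /* Should never happen, but don't turn -1 into a bogus 0xFF vector. */
  if (WARN_ON_ONCE(vector < 0))
          return;

  kvm_queue_interrupt(vcpu, vector, false);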
2023-07-29  KVM: x86: VMX: set irr_pending in kvm_apic_update_irr  (Maxim Levitsky)
When APICv is inhibited, the irr_pending optimization is used. Therefore, when kvm_apic_update_irr sets bits in the IRR, it must set irr_pending to true as well. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230726135945.260841-3-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-07-29  KVM: x86: VMX: __kvm_apic_update_irr must update the IRR atomically  (Maxim Levitsky)
If APICv is inhibited, then IPIs from peer vCPUs are done by atomically setting bits in the IRR. This means that when __kvm_apic_update_irr copies PIR to IRR, it has to modify the IRR atomically as well. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20230726135945.260841-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
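Conceptually, the PIR-to-IRR copy has to be an atomic read-modify-write rather than a plain OR. A sketch, assuming a 32-bit chunk of the IRR (the upstream fix may use a cmpxchg loop instead; the cast and surrounding names are illustrative):

  u32 *irr = (u32 *)(regs + APIC_IRR + i * 0x10);
  u32 pir_val = READ_ONCE(pir[i]);

  /* before: a non-atomic OR can lose bits set concurrently by a peer vCPU's IPI */
  *irr |= pir_val;

  /* after: atomic OR, so concurrent senders and the PIR sync cannot clobber each other */
  if (pir_val)
          atomic_or(pir_val, (atomic_t *)irr);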
2023-07-28  KVM: arm64: Skip instruction after emulating write to TCR_EL1  (Oliver Upton)
Whelp, this is embarrassing. Since commit 082fdfd13841 ("KVM: arm64: Prevent guests from enabling HA/HD on Ampere1") KVM traps writes to TCR_EL1 on AmpereOne to work around an erratum in the unadvertised HAFDBS implementation, preventing the guest from enabling the feature. Unfortunately, I failed virtualization 101 when working on that change, and forgot to advance PC after instruction emulation. Do the right thing and skip the MSR instruction after emulating the write. Fixes: 082fdfd13841 ("KVM: arm64: Prevent guests from enabling HA/HD on Ampere1") Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230728000824.3848025-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-26  KVM: arm64: fix __kvm_host_psci_cpu_entry() prototype  (Arnd Bergmann)
The kvm_host_psci_cpu_entry() function was renamed in order to add a wrapper around it, but the prototype did not change, so now the missing-prototype warning came back in W=1 builds:

  arch/arm64/kvm/hyp/nvhe/psci-relay.c:203:28: error: no previous prototype for function '__kvm_host_psci_cpu_entry' [-Werror,-Wmissing-prototypes]
  asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on)

Fixes: dcf89d1111995 ("KVM: arm64: Add missing BTI instructions") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724121850.1386668-1-arnd@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
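The warning already quotes the needed signature, so the fix boils down to a one-line declaration in the relevant nVHE header (exact header location not shown here):

  asmlinkage void __noreturn __kvm_host_psci_cpu_entry(bool is_cpu_on);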
2023-07-26  KVM: arm64: Fix resetting SME trap values on reset for (h)VHE  (Fuad Tabba)
Ensure that SME traps are disabled for (h)VHE when getting the reset value for the architectural feature control register. Fixes: 75c76ab5a641 ("KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration") Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724123829.2929609-9-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-26  KVM: arm64: Fix resetting SVE trap values on reset for hVHE  (Fuad Tabba)
Ensure that SVE traps are disabled for hVHE, if the FPSIMD state isn't owned by the guest, when getting the reset value for the architectural feature control register. Fixes: 75c76ab5a641 ("KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration") Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724123829.2929609-8-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-26  KVM: arm64: Use the appropriate feature trap register when activating traps  (Fuad Tabba)
Instead of writing directly to cptr_el2, use the helper that selects which feature trap register to write to based on the KVM mode. Fixes: 75c76ab5a641 ("KVM: arm64: Rework CPTR_EL2 programming for HVHE configuration") Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724123829.2929609-7-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-26  KVM: arm64: Helper to write to appropriate feature trap register based on mode  (Fuad Tabba)
Factor out the code that decides whether to write to the feature trap registers, CPTR_EL2 or CPACR_EL1, based on the KVM mode, i.e., (h)VHE or nVHE. This function will be used in the subsequent patch. No functional change intended. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724123829.2929609-6-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
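A sketch of what such a helper looks like; has_vhe()/has_hvhe() and write_sysreg() are existing primitives, but the helper's upstream name and exact value handling are not reproduced here:

  static inline void kvm_write_trap_reg(u64 val)
  {
          if (has_vhe() || has_hvhe())
                  write_sysreg(val, cpacr_el1);   /* VHE/hVHE: traps are configured in CPACR_EL1 */
          else
                  write_sysreg(val, cptr_el2);    /* nVHE: traps are configured in CPTR_EL2 */
  }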
2023-07-26  KVM: arm64: Disable SME traps for (h)VHE at setup  (Fuad Tabba)
Ensure that SME traps are disabled for (h)VHE when setting up EL2, as they are for nVHE. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724123829.2929609-5-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-26  KVM: arm64: Use the appropriate feature trap register for SVE at EL2 setup  (Fuad Tabba)
Use the architectural feature trap/control register that corresponds to the current KVM mode, i.e., CPTR_EL2 or CPACR_EL1, when setting up SVE feature traps. Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230724123829.2929609-4-tabba@google.com Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-26  KVM: arm64: Factor out code for checking (h)VHE mode into a macro  (Fuad Tabba)
The code for checking whether the kernel is in (h)VHE mode is repeated, and will be needed again in future patches. Factor it out into a macro. No functional change intended. No change in emitted assembly code intended. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/kvmarm/20230724123829.2929609-3-tabba@google.com/ Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-23  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds)
Pull kvm fixes from Paolo Bonzini:
 "ARM:
  - Avoid pKVM finalization if KVM initialization fails
  - Add missing BTI instructions in the hypervisor, fixing an early boot failure on BTI systems
  - Handle MMU notifiers correctly for non hugepage-aligned memslots
  - Work around a bug in the architecture where hypervisor timer controls have UNKNOWN behavior under nested virt
  - Disable preemption in kvm_arch_hardware_enable(), fixing a kernel BUG in cpu hotplug resulting from per-CPU accessor sanity checking
  - Make WFI emulation on GICv4 systems robust w.r.t. preemption, consistently requesting a doorbell interrupt on vcpu_put()
  - Uphold RES0 sysreg behavior when emulating older PMU versions
  - Avoid macro expansion when initializing PMU register names, ensuring the tracepoints pretty-print the sysreg

  s390:
  - Two fixes for asynchronous destroy

  x86 fixes will come early next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: s390: pv: fix index value of replaced ASCE
  KVM: s390: pv: simplify shutdown and fix race
  KVM: arm64: Fix the name of sys_reg_desc related to PMU
  KVM: arm64: Correctly handle RES0 bits PMEVTYPER<n>_EL0.evtCount
  KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption
  KVM: arm64: Add missing BTI instructions
  KVM: arm64: Correctly handle page aging notifiers for unaligned memslot
  KVM: arm64: Disable preemption in kvm_arch_hardware_enable()
  KVM: arm64: Handle kvm_arm_init failure correctly in finalize_pkvm
  KVM: arm64: timers: Use CNTHCTL_EL2 when setting non-CNTKCTL_EL1 bits
2023-07-23  Merge tag 'kvm-s390-master-6.5-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD  (Paolo Bonzini)
Two fixes for asynchronous destroy
2023-07-23  Merge tag 'kvmarm-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD  (Paolo Bonzini)
KVM/arm64 fixes for 6.5, part #1
- Avoid pKVM finalization if KVM initialization fails
- Add missing BTI instructions in the hypervisor, fixing an early boot failure on BTI systems
- Handle MMU notifiers correctly for non hugepage-aligned memslots
- Work around a bug in the architecture where hypervisor timer controls have UNKNOWN behavior under nested virt
- Disable preemption in kvm_arch_hardware_enable(), fixing a kernel BUG in cpu hotplug resulting from per-CPU accessor sanity checking
- Make WFI emulation on GICv4 systems robust w.r.t. preemption, consistently requesting a doorbell interrupt on vcpu_put()
- Uphold RES0 sysreg behavior when emulating older PMU versions
- Avoid macro expansion when initializing PMU register names, ensuring the tracepoints pretty-print the sysreg
2023-07-22  Merge tag 'powerpc-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux  (Linus Torvalds)
Pull powerpc fixes from Michael Ellerman:
 - Reinstate support for little endian ELFv1 binaries, which it turns out still exist in the wild.
 - Revert a change which used asm goto for WARN_ON/__WARN_FLAGS, as it led to dead code generation and seemed to trigger compiler bugs in some edge cases.
 - Fix a deadlock in the pseries VAS code, between live migration and the driver's mmap handler.
 - Disable KCOV instrumentation in the powerpc KASAN code.
Thanks to Andrew Donnellan, Benjamin Gray, Christophe Leroy, Haren Myneni, Russell Currey, and Uwe Kleine-König.

* tag 'powerpc-6.5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  Revert "powerpc/64s: Remove support for ELFv1 little endian userspace"
  powerpc/kasan: Disable KCOV in KASAN code
  powerpc/512x: lpbfifo: Convert to platform remove callback returning void
  powerpc/crypto: Add gitignore for generated P10 AES/GCM .S files
  Revert "powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto"
  powerpc/pseries/vas: Hold mmap_mutex after mmap lock during window close
2023-07-22  Merge tag 's390-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  (Linus Torvalds)
Pull s390 fixes from Heiko Carstens:
 - Fix per vma lock fault handling: add missing !(fault & VM_FAULT_ERROR) check to fault handler to prevent error handling for return values that don't indicate an error
 - Use kfree_sensitive() instead of kfree() in paes crypto code to clear memory that may contain keys before freeing it
 - Fix reply buffer size calculation for CCA replies in zcrypt device driver

* tag 's390-6.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/zcrypt: fix reply buffer calculations for CCA replies
  s390/crypto: use kfree_sensitive() instead of kfree()
  s390/mm: fix per vma lock fault handling
2023-07-22  Merge tag 'io_uring-6.5-2023-07-21' of git://git.kernel.dk/linux  (Linus Torvalds)
Pull io_uring fixes from Jens Axboe:
 - Fix for io-wq not always honoring REQ_F_NOWAIT, if it was set and punted directly (eg via DRAIN) (me)
 - Capability check fix (Ondrej)
 - Regression fix for the mmap changes that went into 6.4, which apparently broke IA64 (Helge)

* tag 'io_uring-6.5-2023-07-21' of git://git.kernel.dk/linux:
  ia64: mmap: Consider pgoff when searching for free mapping
  io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()
  io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq
  io_uring: don't audit the capability check in io_uring_create()
2023-07-21  Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux  (Linus Torvalds)
Pull arm64 fixes from Will Deacon:
 "I've picked up a handful of arm64 fixes while Catalin's been away, so here they are. Below is the usual summary, but we basically have two cleanups, a fix for an SME crash and a fix for hibernation:
  - Fix saving of SME state after SVE vector length is changed
  - Fix sparse warnings for missing vDSO function prototypes
  - Fix hibernation resume path when kfence is enabled
  - Fix field names for the HFGxTR_EL2 register"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
  arm64: vdso: Clear common make C=2 warnings
  arm64: mm: Make hibernation aware of KFENCE
  arm64: Fix HFGxTR_EL2 field naming
2023-07-21  ia64: mmap: Consider pgoff when searching for free mapping  (Helge Deller)
IA64 is the only architecture which does not consider the pgoff value when searching for a possible free memory region with vm_unmapped_area(). Adding this seems to have no negative side effect on IA64, so add it now to make IA64 consistent with all other architectures. Cc: stable@vger.kernel.org # 6.4 Signed-off-by: Helge Deller <deller@gmx.de> Tested-by: matoro <matoro_mailinglist_kernel@matoro.tk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-ia64@vger.kernel.org Link: https://lore.kernel.org/r/20230721152432.196382-3-deller@gmx.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
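The gist of the change, sketched against the generic vm_unmapped_area() interface (limits and masks are shown only schematically, not as the actual ia64 diff):

  struct vm_unmapped_area_info info;

  info.flags = 0;
  info.length = len;
  info.low_limit = addr;
  info.high_limit = TASK_SIZE;
  info.align_mask = align_mask;
  info.align_offset = pgoff << PAGE_SHIFT;   /* new: honour the file offset, as other arches do */
  addr = vm_unmapped_area(&info);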
2023-07-21  io_uring: Fix io_uring mmap() by using architecture-provided get_unmapped_area()  (Helge Deller)
The io_uring testcase is broken on IA-64 since commit d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements"). The reason is that this commit introduced its own architecture-independent get_unmapped_area() search algorithm, which on IA-64 finds a memory region that is outside of the regular memory region used for shared userspace mappings and which can't be used on that platform due to aliasing. To avoid similar problems on IA-64 and other platforms in the future, it's better to switch back to the architecture-provided get_unmapped_area() function and adjust the needed input parameters before the call. Besides fixing the issue, the function now becomes easier to understand and maintain. This patch has been successfully tested with the io_uring testcase on physical x86-64, ppc64le, IA-64 and PA-RISC machines. On PA-RISC the LTP mmap testcases did not report any regressions. Cc: stable@vger.kernel.org # 6.4 Signed-off-by: Helge Deller <deller@gmx.de> Reported-by: matoro <matoro_mailinglist_kernel@matoro.tk> Fixes: d808459b2e31 ("io_uring: Adjust mapping wrt architecture aliasing requirements") Link: https://lore.kernel.org/r/20230721152432.196382-2-deller@gmx.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-21  arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes  (Mark Brown)
When we reconfigure the SVE vector length we discard the backing storage for the SVE vectors and then reallocate on next SVE use, leaving the SME specific state alone. This means that we do not enable SME traps if they were already disabled. That means that userspace code can enter streaming mode without trapping, putting the task in a state where if we try to save the state of the task we will fault. Since the ABI does not specify that changing the SVE vector length disturbs SME state, and since SVE code may not be aware of SME code in the process, we shouldn't simply discard any ZA state. Instead immediately reallocate the storage for SVE, and disable SME if we change the SVE vector length while there is no SME state active. Disabling SME traps on SVE vector length changes would make the overall code more complex since we would have a state where we have valid SME state stored but might get a SME trap. Fixes: 9e4ab6c89109 ("arm64/sme: Implement vector length configuration prctl()s") Reported-by: David Spickett <David.Spickett@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230720-arm64-fix-sve-sme-vl-change-v2-1-8eea06b82d57@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2023-07-20  Merge tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  (Linus Torvalds)
Pull networking fixes from Jakub Kicinski:
 "Including fixes from BPF, netfilter, bluetooth and CAN.

  Current release - regressions:
   - eth: r8169: multiple fixes for PCIe ASPM-related problems
   - vrf: fix RCU lockdep splat in output path

  Previous releases - regressions:
   - gso: fall back to SW segmenting with GSO_UDP_L4 dodgy bit set
   - dsa: mv88e6xxx: do a final check before timing out when polling
   - nf_tables: fix sleep in atomic in nft_chain_validate

  Previous releases - always broken:
   - sched: fix undoing tcf_bind_filter() in multiple classifiers
   - bpf, arm64: fix BTI type used for freplace attached functions
   - can: gs_usb: fix time stamp counter initialization
   - nft_set_pipapo: fix improper element removal (leading to UAF)

  Misc:
   - net: support STP on bridge in non-root netns, STP prevents packet loops so not supporting it results in freezing systems of unsuspecting users, and in turn very upset noises being made
   - fix kdoc warnings
   - annotate various bits of TCP state to prevent data races"

* tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits)
  net: phy: prevent stale pointer dereference in phy_init()
  tcp: annotate data-races around fastopenq.max_qlen
  tcp: annotate data-races around icsk->icsk_user_timeout
  tcp: annotate data-races around tp->notsent_lowat
  tcp: annotate data-races around rskq_defer_accept
  tcp: annotate data-races around tp->linger2
  tcp: annotate data-races around icsk->icsk_syn_retries
  tcp: annotate data-races around tp->keepalive_probes
  tcp: annotate data-races around tp->keepalive_intvl
  tcp: annotate data-races around tp->keepalive_time
  tcp: annotate data-races around tp->tsoffset
  tcp: annotate data-races around tp->tcp_tx_delay
  Bluetooth: MGMT: Use correct address for memcpy()
  Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014
  Bluetooth: SCO: fix sco_conn related locking and validity issues
  Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link
  Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()
  Bluetooth: coredump: fix building with coredump disabled
  Bluetooth: ISO: fix iso_conn related locking and validity issues
  Bluetooth: hci_event: call disconnect callback before deleting conn
  ...
2023-07-20  KVM: arm64: Rephrase percpu enable/disable tracking in terms of hyp  (Oliver Upton)
kvm_arm_hardware_enabled is rather misleading, since it doesn't track the state of all hardware resources needed for running a VM. What it actually tracks is whether or not the hyp cpu context has been initialized. Since we're now at the point where vgic + timer irq management has been separated from kvm_arm_hardware_enabled, rephrase it (and the associated helpers) to make it clear what state is being tracked. Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20230719231855.262973-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-20  KVM: arm64: Fix hardware enable/disable flows for pKVM  (Raghavendra Rao Ananta)
When running in protected mode, the hyp stub is disabled after pKVM is initialized, meaning the host cannot enable/disable the hyp at runtime. As such, kvm_arm_hardware_enabled is always 1 after initialization, and kvm_arch_hardware_enable() never enables the vgic maintenance irq or timer irqs. Unconditionally enable/disable the vgic + timer irqs in the respective calls, instead relying on the percpu bookkeeping in the generic code to keep track of which cpus have the interrupts unmasked. Fixes: 466d27e48d7c ("KVM: arm64: Simplify the CPUHP logic") Reported-by: Oliver Upton <oliver.upton@linux.dev> Suggested-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Raghavendra Rao Ananta <rananta@google.com> Link: https://lore.kernel.org/r/20230719175400.647154-1-rananta@google.com Acked-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2023-07-20  s390/crypto: use kfree_sensitive() instead of kfree()  (Wang Ming)
The key buffer might contain the private part of the key, so it is better to use kfree_sensitive() to free it. Signed-off-by: Wang Ming <machel@vivo.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Link: https://lore.kernel.org/r/20230717094533.18418-1-machel@vivo.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
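The pattern, for reference (the variable holding the key is illustrative); kfree_sensitive() zeroes the buffer before handing it back to the allocator:

  /* before */
  kfree(ctx->key);

  /* after: wipe possible key material before the memory is reused */
  kfree_sensitive(ctx->key);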
2023-07-20  arm64: vdso: Clear common make C=2 warnings  (Zhen Lei)
  make C=2 ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- xxx.o

When I use the command above to do a 'make C=2' check on any object file, the following warnings are always output:

  CHECK   arch/arm64/kernel/vdso/vgettimeofday.c
  arch/arm64/kernel/vdso/vgettimeofday.c:9:5: warning: symbol '__kernel_clock_gettime' was not declared. Should it be static?
  arch/arm64/kernel/vdso/vgettimeofday.c:15:5: warning: symbol '__kernel_gettimeofday' was not declared. Should it be static?
  arch/arm64/kernel/vdso/vgettimeofday.c:21:5: warning: symbol '__kernel_clock_getres' was not declared. Should it be static?

Therefore, the declaration of the three functions is added to eliminate these common warnings and provide a clean output. Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230713115831.777-1-thunder.leizhen@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
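The fix amounts to declaring the three vDSO entry points ahead of their definitions. The signatures below follow the generic vDSO ones; treat the exact types as illustrative:

  int __kernel_clock_gettime(clockid_t clock, struct __kernel_timespec *ts);
  int __kernel_gettimeofday(struct __kernel_old_timeval *tv, struct timezone *tz);
  int __kernel_clock_getres(clockid_t clock, struct __kernel_timespec *res);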
2023-07-20  arm64: mm: Make hibernation aware of KFENCE  (Nikhil V)
In the restore path, swsusp_arch_suspend_exit uses copy_page() to overwrite memory. However, with features like KFENCE enabled, some pages may have been marked as not valid, in which case accessing them is reported as an invalid access. Consider a situation where page 'P' was part of the hibernation image. When the resume kernel tries to restore the pages, the same page 'P' is already in use in the resume kernel and is KFENCE-protected, so its mapping has been removed from the linear map. Since restoring pages happens with the resume kernel's page tables, we would end up accessing 'P' during the copy, which results in a kernel page fault. The proposed fix solves this issue by marking the PTE as valid for such KFENCE-protected pages. Co-developed-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Signed-off-by: Nikhil V <quic_nprakash@quicinc.com> Link: https://lore.kernel.org/r/20230713070757.4093-1-quic_nprakash@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2023-07-19  KVM: arm64: Allow pKVM on v1.0 compatible FF-A implementations  (Oliver Upton)
pKVM initialization fails on systems with v1.1+ FF-A implementations, as the hyp does a strict match on the returned version from FFA_VERSION. This is a stronger assertion than required by the specification, which requires minor revisions be backwards compatible with earlier revisions of the same major version. Relax the check in hyp_ffa_init() to only test the returned major version. Even though v1.1 broke ABI, the expectation is that firmware incapable of using the v1.0 ABI return NOT_SUPPORTED instead of a valid version. Acked-by: Marc Zyngier <maz@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230718184537.3220867-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
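A sketch of the relaxed check; FFA_MAJOR_VERSION() exists in the FF-A headers, while hyp_ffa_version stands in for the hypervisor's supported-version value:

  /* before: demand an exact version match */
  if (res.a0 != hyp_ffa_version)
          return -EOPNOTSUPP;

  /* after: only the major version must match; minors are backwards compatible */
  if (FFA_MAJOR_VERSION(res.a0) != FFA_MAJOR_VERSION(hyp_ffa_version))
          return -EOPNOTSUPP;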
2023-07-19Revert "powerpc/64s: Remove support for ELFv1 little endian userspace"Andrew Donnellan
This reverts commit 606787fed7268feb256957872586370b56af697a. ELFv1 with LE has never been a thing, and people who try to make ELFv1 LE binaries are maniacs who need to be stopped, but unfortunately there are ELFv1 LE binaries out there in the wild. One such binary is the ppc64el (as Debian calls it) helper for arch-test[0], a tool for detecting architectures that can be executed on a given machine by means of attempting to execute helper binaries compiled for each architecture and seeing which binaries succeed and fail. The helpers are small snippets of assembly, and the ppc64el assembly doesn't include the right directives to generate an ELFv2 binary. This results in arch-test incorrectly determining that a ppc64el kernel can't execute a ppc64el userspace, which in turn means that a number of developer tools such as debootstrap will break (assuming arch-test is installed). [0] https://github.com/kilobyte/arch-test Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230719071821.320594-1-ajd@linux.ibm.com
2023-07-18  bpf, arm64: Fix BTI type used for freplace attached functions  (Alexander Duyck)
When running an freplace attached bpf program on an arm64 system we were seeing the following issue:

  Unhandled 64-bit el1h sync exception on CPU47, ESR 0x0000000036000003 -- BTI

After a bit of work to track it down I determined that what appeared to be happening is that the 'bti c' at the start of the program was somehow being reached after a 'br' instruction. Further digging pointed me toward the fact that the function was attached via freplace. This in turn led me to build_plt which I believe is invoking the long jump which is triggering this error. To resolve it we can replace the 'bti c' with 'bti jc' and add a comment explaining why this has to be modified as such. Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Acked-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/168926677665.316237.9953845318337455525.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-07-18  s390/mm: fix per vma lock fault handling  (Sven Schnelle)
With per-vma locks, handle_mm_fault() may return non-fatal error flags. In this case the code should reset the fault flags before returning. Fixes: e06f47a16573 ("s390/mm: try VMA lock-based page fault handling first") Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
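The shape of the fix, sketched (the surrounding code in the s390 fault handler is abbreviated):

  fault = handle_mm_fault(vma, address, flags | FAULT_FLAG_VMA_LOCK, regs);
  vma_end_read(vma);
  if (!(fault & VM_FAULT_RETRY)) {
          /* new: a successful per-VMA fault must not leak non-fatal flags to the caller */
          if (likely(!(fault & VM_FAULT_ERROR)))
                  fault = 0;
          goto out;
  }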
2023-07-18  KVM: s390: pv: fix index value of replaced ASCE  (Claudio Imbrenda)
The index field of the struct page corresponding to a guest ASCE should be 0. When replacing the ASCE in s390_replace_asce(), the index of the new ASCE should also be set to 0. Having the wrong index might lead to the wrong addresses being passed around when notifying pte invalidations, and eventually to validity intercepts (VM crash) if the prefix gets unmapped and the notifier gets called with the wrong address. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Fixes: faa2f72cb356 ("KVM: s390: pv: leak the topmost page table when destroy fails") Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20230705111937.33472-3-imbrenda@linux.ibm.com>
2023-07-18  KVM: s390: pv: simplify shutdown and fix race  (Claudio Imbrenda)
Simplify the shutdown of non-protected VMs. There is no need to do complex manipulations of the counter if it was zero. This also fixes a very rare race which caused pages to be torn down from the address space with a non-zero counter even on older machines that don't support the UVC instruction, causing a crash. Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Fixes: fb491d5500a7 ("KVM: s390: pv: asynchronous destroy for reboot") Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20230705111937.33472-2-imbrenda@linux.ibm.com>
2023-07-17  powerpc/kasan: Disable KCOV in KASAN code  (Benjamin Gray)
As per the generic KASAN code in mm/kasan, disable KCOV with KCOV_INSTRUMENT := n in the makefile. This fixes a ppc64 boot hang when KCOV and KASAN are enabled. kasan_early_init() gets called before a PACA is initialised, but the KCOV hook expects a valid PACA. Suggested-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Benjamin Gray <bgray@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230710044143.146840-1-bgray@linux.ibm.com
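The change is the usual one-liner mirrored from mm/kasan; the path shown is where the powerpc KASAN objects live (treat it as approximate):

  # arch/powerpc/mm/kasan/Makefile
  KCOV_INSTRUMENT := n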
2023-07-17  powerpc/512x: lpbfifo: Convert to platform remove callback returning void  (Uwe Kleine-König)
The .remove() callback for a platform driver returns an int which makes many driver authors wrongly assume it's possible to do error handling by returning an error code. However the value returned is ignored (apart from emitting a warning) and this typically results in resource leaks. To improve here there is a quest to make the remove callback return void. In the first step of this quest all drivers are converted to .remove_new() which already returns void. Eventually after all drivers are converted, .remove_new() is renamed to .remove(). Trivially convert this driver from always returning zero in the remove callback to the void returning variant. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230711143145.1192651-1-u.kleine-koenig@pengutronix.de
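The conversion pattern, as a sketch with placeholder driver names (foo_probe/foo_remove are not the real lpbfifo symbols):

  static void foo_remove(struct platform_device *pdev)   /* now returns void */
  {
          /* teardown unchanged; the int return value was ignored by the core anyway */
  }

  static struct platform_driver foo_driver = {
          .probe      = foo_probe,
          .remove_new = foo_remove,   /* was: .remove = foo_remove, returning int */
  };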
2023-07-17  powerpc/crypto: Add gitignore for generated P10 AES/GCM .S files  (Russell Currey)
aesp10-ppc.S and ghashp10-ppc.S are autogenerated and not tracked by git, so they should be ignored. This is doing the same as the P8 files in drivers/crypto/vmx/.gitignore but for the P10 files in arch/powerpc/crypto. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230713042206.85669-1-ruscur@russell.cc
2023-07-17Revert "powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() ↵Christophe Leroy
with asm goto" This partly reverts commit 1e688dd2a3d6759d416616ff07afc4bb836c4213. That commit aimed at optimising the code around generation of WARN_ON/BUG_ON but this leads to a lot of dead code erroneously generated by GCC. That dead code becomes a problem when we start using objtool validation because objtool will abort validation with a warning as soon as it detects unreachable code. This is because unreachable code might be the indication that objtool doesn't properly decode object text. text data bss dec hex filename 9551585 3627834 224376 13403795 cc8693 vmlinux.before 9535281 3628358 224376 13388015 cc48ef vmlinux.after Once this change is reverted, in a standard configuration (pmac32 + function tracer) the text is reduced by 16k which is around 1.7% We already had problem with it when starting to use objtool on powerpc as a replacement for recordmcount, see commit 93e3f45a2631 ("powerpc: Fix __WARN_FLAGS() for use with Objtool") There is also a problem with at least GCC 12, on ppc64_defconfig + CONFIG_CC_OPTIMIZE_FOR_SIZE=y + CONFIG_DEBUG_SECTION_MISMATCH=y : LD .tmp_vmlinux.kallsyms1 powerpc64-linux-ld: net/ipv4/tcp_input.o:(__ex_table+0xc4): undefined reference to `.L2136' make[2]: *** [scripts/Makefile.vmlinux:36: vmlinux] Error 1 make[1]: *** [/home/chleroy/linux-powerpc/Makefile:1238: vmlinux] Error 2 Taking into account that other problems are encountered with that 'asm goto' in WARN_ON(), including build failures, keeping that change is not worth it allthough it is primarily a compiler bug. Revert it for now. mpe: Retain EMIT_WARN_ENTRY as a synonym for EMIT_BUG_ENTRY to reduce churn, as there are now nearly as many uses of EMIT_WARN_ENTRY as EMIT_BUG_ENTRY. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20230712134552.534955-1-mpe@ellerman.id.au