summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2025-02-11x86/cpu/kvm: SRSO: Fix possible missing IBPB on VM-ExitPatrick Bellasi
In [1] the meaning of the synthetic IBPB flags has been redefined for a better separation of concerns: - ENTRY_IBPB -- issue IBPB on entry only - IBPB_ON_VMEXIT -- issue IBPB on VM-Exit only and the Retbleed mitigations have been updated to match this new semantics. Commit [2] was merged shortly before [1], and their interaction was not handled properly. This resulted in IBPB not being triggered on VM-Exit in all SRSO mitigation configs requesting an IBPB there. Specifically, an IBPB on VM-Exit is triggered only when X86_FEATURE_IBPB_ON_VMEXIT is set. However: - X86_FEATURE_IBPB_ON_VMEXIT is not set for "spec_rstack_overflow=ibpb", because before [1] having X86_FEATURE_ENTRY_IBPB was enough. Hence, an IBPB is triggered on entry but the expected IBPB on VM-exit is not. - X86_FEATURE_IBPB_ON_VMEXIT is not set also when "spec_rstack_overflow=ibpb-vmexit" if X86_FEATURE_ENTRY_IBPB is already set. That's because before [1] this was effectively redundant. Hence, e.g. a "retbleed=ibpb spec_rstack_overflow=bpb-vmexit" config mistakenly reports the machine still vulnerable to SRSO, despite an IBPB being triggered both on entry and VM-Exit, because of the Retbleed selected mitigation config. - UNTRAIN_RET_VM won't still actually do anything unless CONFIG_MITIGATION_IBPB_ENTRY is set. For "spec_rstack_overflow=ibpb", enable IBPB on both entry and VM-Exit and clear X86_FEATURE_RSB_VMEXIT which is made superfluous by X86_FEATURE_IBPB_ON_VMEXIT. This effectively makes this mitigation option similar to the one for 'retbleed=ibpb', thus re-order the code for the RETBLEED_MITIGATION_IBPB option to be less confusing by having all features enabling before the disabling of the not needed ones. For "spec_rstack_overflow=ibpb-vmexit", guard this mitigation setting with CONFIG_MITIGATION_IBPB_ENTRY to ensure UNTRAIN_RET_VM sequence is effectively compiled in. Drop instead the CONFIG_MITIGATION_SRSO guard, since none of the SRSO compile cruft is required in this configuration. Also, check only that the required microcode is present to effectively enabled the IBPB on VM-Exit. Finally, update the KConfig description for CONFIG_MITIGATION_IBPB_ENTRY to list also all SRSO config settings enabled by this guard. Fixes: 864bcaa38ee4 ("x86/cpu/kvm: Provide UNTRAIN_RET_VM") [1] Fixes: d893832d0e1e ("x86/srso: Add IBPB on VMEXIT") [2] Reported-by: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Patrick Bellasi <derkling@google.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-02-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM: - Correctly clean the BSS to the PoC before allowing EL2 to access it on nVHE/hVHE/protected configurations - Propagate ownership of debug registers in protected mode after the rework that landed in 6.14-rc1 - Stop pretending that we can run the protected mode without a GICv3 being present on the host - Fix a use-after-free situation that can occur if a vcpu fails to initialise the NV shadow S2 MMU contexts - Always evaluate the need to arm a background timer for fully emulated guest timers - Fix the emulation of EL1 timers in the absence of FEAT_ECV - Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0 s390: - move some of the guest page table (gmap) logic into KVM itself, inching towards the final goal of completely removing gmap from the non-kvm memory management code. As an initial set of cleanups, move some code from mm/gmap into kvm and start using __kvm_faultin_pfn() to fault-in pages as needed; but especially stop abusing page->index and page->lru to aid in the pgdesc conversion. x86: - Add missing check in the fix to defer starting the huge page recovery vhost_task - SRSO_USER_KERNEL_NO does not need SYNTHESIZED_F" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (31 commits) KVM: x86/mmu: Ensure NX huge page recovery thread is alive before waking KVM: remove kvm_arch_post_init_vm KVM: selftests: Fix spelling mistake "initally" -> "initially" kvm: x86: SRSO_USER_KERNEL_NO is not synthesized KVM: arm64: timer: Don't adjust the EL2 virtual timer offset KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECV KVM: arm64: timer: Always evaluate the need for a soft timer KVM: arm64: Fix nested S2 MMU structures reallocation KVM: arm64: Fail protected mode init if no vgic hardware is present KVM: arm64: Flush/sync debug state in protected mode KVM: s390: selftests: Streamline uc_skey test to issue iske after sske KVM: s390: remove the last user of page->index KVM: s390: move PGSTE softbits KVM: s390: remove useless page->index usage KVM: s390: move gmap_shadow_pgt_lookup() into kvm KVM: s390: stop using lists to keep track of used dat tables KVM: s390: stop using page->index for non-shadow gmaps KVM: s390: move some gmap shadowing functions away from mm/gmap.c KVM: s390: get rid of gmap_translate() KVM: s390: get rid of gmap_fault() ...
2025-02-08Merge tag 'execve-v6.14-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve fix from Kees Cook: "This is an alpha-specific fix, but since it touched ELF I was asked to carry it. - alpha/elf: Fix misc/setarch test of util-linux by removing 32bit support (Eric W. Biederman)" * tag 'execve-v6.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: alpha/elf: Fix misc/setarch test of util-linux by removing 32bit support
2025-02-08Merge tag 'x86-urgent-2025-02-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Fix a build regression on GCC 15 builds, caused by GCC changing the default C version that is overriden in the main Makefile but not in the x86 boot code Makefile" * tag 'x86-urgent-2025-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Use '-std=gnu11' to fix build with GCC 15
2025-02-07genirq: Remove leading space from irq_chip::irq_print_chip() callbacksGeert Uytterhoeven
The space separator was factored out from the multiple chip name prints, but several irq_chip::irq_print_chip() callbacks still print a leading space. Remove the superfluous double spaces. Fixes: 9d9f204bdf7243bf ("genirq/proc: Add missing space separator back") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/893f7e9646d8933cd6786d5a1ef3eb076d263768.1738764803.git.geert+renesas@glider.be
2025-02-06Merge tag 'for-linus-6.14-rc2-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Three fixes for xen_hypercall_hvm() that was introduced in the 6.13 cycle" * tag 'for-linus-6.14-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: remove unneeded dummy push from xen_hypercall_hvm() x86/xen: add FRAME_END to xen_hypercall_hvm() x86/xen: fix xen_hypercall_hvm() to not clobber %rbx
2025-02-06alpha/elf: Fix misc/setarch test of util-linux by removing 32bit supportEric W. Biederman
Richard Henderson <richard.henderson@linaro.org> writes[1]: > There was a Spec benchmark (I forget which) which was memory bound and ran > twice as fast with 32-bit pointers. > > I copied the idea from DEC to the ELF abi, but never did all the other work > to allow the toolchain to take advantage. > > Amusingly, a later Spec changed the benchmark data sets to not fit into a > 32-bit address space, specifically because of this. > > I expect one could delete the ELF bit and personality and no one would > notice. Not even the 10 remaining Alpha users. In [2] it was pointed out that parts of setarch weren't working properly on alpha because it has it's own SET_PERSONALITY implementation. In the discussion that followed Richard Henderson pointed out that the 32bit pointer support for alpha was never completed. Fix this by removing alpha's 32bit pointer support. As a bit of paranoia refuse to execute any alpha binaries that have the EF_ALPHA_32BIT flag set. Just in case someone somewhere has binaries that try to use alpha's 32bit pointer support. Link: https://lkml.kernel.org/r/CAFXwXrkgu=4Qn-v1PjnOR4SG0oUb9LSa0g6QXpBq4ttm52pJOQ@mail.gmail.com [1] Link: https://lkml.kernel.org/r/20250103140148.370368-1-glaubitz@physik.fu-berlin.de [2] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Link: https://lore.kernel.org/r/87y0zfs26i.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-02-05x86/xen: remove unneeded dummy push from xen_hypercall_hvm()Juergen Gross
Stack alignment of the kernel in 64-bit mode is 8, not 16, so the dummy push in xen_hypercall_hvm() for aligning the stack to 16 bytes can be removed. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-05x86/xen: add FRAME_END to xen_hypercall_hvm()Juergen Gross
xen_hypercall_hvm() is missing a FRAME_END at the end, add it. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202502030848.HTNTTuo9-lkp@intel.com/ Fixes: b4845bb63838 ("x86/xen: add central hypercall functions") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-05x86/xen: fix xen_hypercall_hvm() to not clobber %rbxJuergen Gross
xen_hypercall_hvm(), which is used when running as a Xen PVH guest at most only once during early boot, is clobbering %rbx. Depending on whether the caller relies on %rbx to be preserved across the call or not, this clobbering might result in an early crash of the system. This can be avoided by using an already saved register instead of %rbx. Fixes: b4845bb63838 ("x86/xen: add central hypercall functions") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com>
2025-02-04KVM: x86/mmu: Ensure NX huge page recovery thread is alive before wakingSean Christopherson
When waking a VM's NX huge page recovery thread, ensure the thread is actually alive before trying to wake it. Now that the thread is spawned on-demand during KVM_RUN, a VM without a recovery thread is reachable via the related module params. BUG: kernel NULL pointer dereference, address: 0000000000000040 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:vhost_task_wake+0x5/0x10 Call Trace: <TASK> set_nx_huge_pages+0xcc/0x1e0 [kvm] param_attr_store+0x8a/0xd0 module_attr_store+0x1a/0x30 kernfs_fop_write_iter+0x12f/0x1e0 vfs_write+0x233/0x3e0 ksys_write+0x60/0xd0 do_syscall_64+0x5b/0x160 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7f3b52710104 </TASK> Modules linked in: kvm_intel kvm CR2: 0000000000000040 Fixes: 931656b9e2ff ("kvm: defer huge page recovery vhost task to later") Cc: stable@vger.kernel.org Cc: Keith Busch <kbusch@kernel.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20250124234623.3609069-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-04KVM: remove kvm_arch_post_init_vmPaolo Bonzini
The only statement in a kvm_arch_post_init_vm implementation can be moved into the x86 kvm_arch_init_vm. Do so and remove all traces from architecture-independent code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-04Merge tag 'kvmarm-fixes-6.14-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.14, take #1 - Correctly clean the BSS to the PoC before allowing EL2 to access it on nVHE/hVHE/protected configurations - Propagate ownership of debug registers in protected mode after the rework that landed in 6.14-rc1 - Stop pretending that we can run the protected mode without a GICv3 being present on the host - Fix a use-after-free situation that can occur if a vcpu fails to initialise the NV shadow S2 MMU contexts - Always evaluate the need to arm a background timer for fully emulated guest timers - Fix the emulation of EL1 timers in the absence of FEAT_ECV - Correctly handle the EL2 virtual timer, specially when HCR_EL2.E2H==0
2025-02-04Merge tag 'kvm-s390-next-6.14-2' of ↵Paolo Bonzini
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD - some selftest fixes - move some kvm-related functions from mm into kvm - remove all usage of page->index and page->lru from kvm - fixes and cleanups for vsie
2025-02-04kvm: x86: SRSO_USER_KERNEL_NO is not synthesizedPaolo Bonzini
SYNTHESIZED_F() generally is used together with setup_force_cpu_cap(), i.e. when it makes sense to present the feature even if cpuid does not have it *and* the VM is not able to see the difference. For example, it can be used when mitigations on the host automatically protect the guest as well. The "SYNTHESIZED_F(SRSO_USER_KERNEL_NO)" line came in as a conflict resolution between the CPUID overhaul from the KVM tree and support for the feature in the x86 tree. Using it right now does not hurt, or make a difference for that matter, because there is no setup_force_cpu_cap(X86_FEATURE_SRSO_USER_KERNEL_NO). However, it is a little less future proof in case such a setup_force_cpu_cap() appears later, for a case where the kernel somehow is not vulnerable but the guest would have to apply the mitigation. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-02-04KVM: arm64: timer: Don't adjust the EL2 virtual timer offsetMarc Zyngier
The way we deal with the EL2 virtual timer is a bit odd. We try to cope with E2H being flipped, and adjust which offset applies to that timer depending on the current E2H value. But that's a complexity we shouldn't have to worry about. What we have to deal with is either E2H being RES1, in which case there is no offset, or E2H being RES0, and the virtual timer simply does not exist. Drop the adjusting of the timer offset, which makes things a bit simpler. At the same time, make sure that accessing the HV timer when E2H is RES0 results in an UNDEF in the guest. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-4-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04KVM: arm64: timer: Correctly handle EL1 timer emulation when !FEAT_ECVMarc Zyngier
Both Wei-Lin Chang and Volodymyr Babchuk report that the way we handle the emulation of EL1 timers with NV is completely wrong, specially in the case of HCR_EL2.E2H==0. There are three problems in about as many lines of code: - With E2H==0, the EL1 timers are overwritten with the EL1 state, while they should actually contain the EL2 state (as per the timer map) - With E2H==1, we run the full EL1 timer emulation even when ECV is present, hiding a bug in timer_emulate() (see previous patch) - The comments are actively misleading, and say all the wrong things. This is only attributable to the code having been initially written for FEAT_NV, hacked up to handle FEAT_NV2 *in parallel*, and vaguely hacked again to be FEAT_NV2 only. Oh, and yours truly being a gold plated idiot. The fix is obvious: just delete most of the E2H==0 code, have a unified handling of the timers (because they really are E2H agnostic), and make sure we don't execute any of that when FEAT_ECV is present. Fixes: 4bad3068cfa9f ("KVM: arm64: nv: Sync nested timer state with FEAT_NV2") Reported-by: Wei-Lin Chang <r09922117@csie.ntu.edu.tw> Reported-by: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> Link: https://lore.kernel.org/r/fqiqfjzwpgbzdtouu2pwqlu7llhnf5lmy4hzv5vo6ph4v3vyls@jdcfy3fjjc5k Link: https://lore.kernel.org/r/87frl51tse.fsf@epam.com Tested-by: Dmytro Terletskyi <dmytro_terletskyi@epam.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-3-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04KVM: arm64: timer: Always evaluate the need for a soft timerMarc Zyngier
When updating the interrupt state for an emulated timer, we return early and skip the setup of a soft timer that runs in parallel with the guest. While this is OK if we have set the interrupt pending, it is pretty wrong if the guest moved CVAL into the future. In that case, no timer is armed and the guest can wait for a very long time (it will take a full put/load cycle for the situation to resolve). This is specially visible with EDK2 running at EL2, but still using the EL1 virtual timer, which in that case is fully emulated. Any key-press takes ages to be captured, as there is no UART interrupt and EDK2 relies on polling from a timer... The fix is simply to drop the early return. If the timer interrupt is pending, we will still return early, and otherwise arm the soft timer. Fixes: 4d74ecfa6458b ("KVM: arm64: Don't arm a hrtimer for an already pending timer") Cc: stable@vger.kernel.org Tested-by: Dmytro Terletskyi <dmytro_terletskyi@epam.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250204110050.150560-2-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04KVM: arm64: Fix nested S2 MMU structures reallocationMarc Zyngier
For each vcpu that userspace creates, we allocate a number of s2_mmu structures that will eventually contain our shadow S2 page tables. Since this is a dynamically allocated array, we reallocate the array and initialise the newly allocated elements. Once everything is correctly initialised, we adjust pointer and size in the kvm structure, and move on. But should that initialisation fail *and* the reallocation triggered a copy to another location, we end-up returning early, with the kvm structure still containing the (now stale) old pointer. Weeee! Cure it by assigning the pointer early, and use this to perform the initialisation. If everything succeeds, we adjust the size. Otherwise, we just leave the size as it was, no harm done, and the new memory is as good as the ol' one (we hope...). Fixes: 4f128f8e1aaac ("KVM: arm64: nv: Support multiple nested Stage-2 mmu structures") Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Link: https://lore.kernel.org/r/20250204145554.774427-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-04KVM: arm64: Fail protected mode init if no vgic hardware is presentOliver Upton
Protected mode assumes that at minimum vgic-v3 is present, however KVM fails to actually enforce this at the time of initialization. As such, when running protected mode in a half-baked state on GICv2 hardware we see the hyp go belly up at vcpu_load() when it tries to restore the vgic-v3 cpuif: $ ./arch_timer_edge_cases [ 130.599140] kvm [4518]: nVHE hyp panic at: [<ffff800081102b58>] __kvm_nvhe___vgic_v3_restore_vmcr_aprs+0x8/0x84! [ 130.603685] kvm [4518]: Cannot dump pKVM nVHE stacktrace: !CONFIG_PROTECTED_NVHE_STACKTRACE [ 130.611962] kvm [4518]: Hyp Offset: 0xfffeca95ed000000 [ 130.617053] Kernel panic - not syncing: HYP panic: [ 130.617053] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000 [ 130.617053] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000 [ 130.617053] VCPU:0000000000000000 [ 130.638013] CPU: 0 UID: 0 PID: 4518 Comm: arch_timer_edge Tainted: G C 6.13.0-rc3-00009-gf7d03fcbf1f4 #1 [ 130.648790] Tainted: [C]=CRAP [ 130.651721] Hardware name: Libre Computer AML-S905X-CC (DT) [ 130.657242] Call trace: [ 130.659656] show_stack+0x18/0x24 (C) [ 130.663279] dump_stack_lvl+0x38/0x90 [ 130.666900] dump_stack+0x18/0x24 [ 130.670178] panic+0x388/0x3e8 [ 130.673196] nvhe_hyp_panic_handler+0x104/0x208 [ 130.677681] kvm_arch_vcpu_load+0x290/0x548 [ 130.681821] vcpu_load+0x50/0x80 [ 130.685013] kvm_arch_vcpu_ioctl_run+0x30/0x868 [ 130.689498] kvm_vcpu_ioctl+0x2e0/0x974 [ 130.693293] __arm64_sys_ioctl+0xb4/0xec [ 130.697174] invoke_syscall+0x48/0x110 [ 130.700883] el0_svc_common.constprop.0+0x40/0xe0 [ 130.705540] do_el0_svc+0x1c/0x28 [ 130.708818] el0_svc+0x30/0xd0 [ 130.711837] el0t_64_sync_handler+0x10c/0x138 [ 130.716149] el0t_64_sync+0x198/0x19c [ 130.719774] SMP: stopping secondary CPUs [ 130.723660] Kernel Offset: disabled [ 130.727103] CPU features: 0x000,00000800,02800000,0200421b [ 130.732537] Memory Limit: none [ 130.735561] ---[ end Kernel panic - not syncing: HYP panic: [ 130.735561] PS:800003c9 PC:0000b56a94102b58 ESR:0000000002000000 [ 130.735561] FAR:ffff00007b98d4d0 HPFAR:00000000007b98d0 PAR:0000000000000000 [ 130.735561] VCPU:0000000000000000 ]--- Fix it by failing KVM initialization if the system doesn't implement vgic-v3, as protected mode will never do anything useful on such hardware. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/5ca7588c-7bf2-4352-8661-e4a56a9cd9aa@sirena.org.uk/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250203231543.233511-1-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-02-02Merge tag 'sh-for-v6.14-tag1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux Pull sh updates from John Paul Adrian Glaubitz: "Fixes and improvements for sh: - replace seq_printf() with the more efficient seq_put_decimal_ull_width() to increase performance when stress reading /proc/interrupts (David Wang) - migrate sh to the generic rule for built-in DTB to help avoid race conditions during parallel builds which can occur because Kbuild decends into arch/*/boot/dts twice (Masahiro Yamada) - replace select with imply in the board Kconfig for enabling hardware with complex dependencies. This addresses warnings which were reported by the kernel test robot (Geert Uytterhoeven)" * tag 'sh-for-v6.14-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux: sh: boards: Use imply to enable hardware with complex dependencies sh: Migrate to the generic rule for built-in DTB sh: irq: Use seq_put_decimal_ull_width() for decimal values
2025-02-01Merge tag 'mips_6.14_1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: "Revert commit breaking sysv ipc for o32 ABI" * tag 'mips_6.14_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: Revert "mips: fix shmctl/semctl/msgctl syscall for o32"
2025-02-01Merge tag 'mm-hotfixes-stable-2025-02-01-03-56' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "21 hotfixes. 8 are cc:stable and the remainder address post-6.13 issues. 13 are for MM and 8 are for non-MM. All are singletons, please see the changelogs for details" * tag 'mm-hotfixes-stable-2025-02-01-03-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) MAINTAINERS: include linux-mm for xarray maintenance revert "xarray: port tests to kunit" MAINTAINERS: add lib/test_xarray.c mailmap, MAINTAINERS, docs: update Carlos's email address mm/hugetlb: fix hugepage allocation for interleaved memory nodes mm: gup: fix infinite loop within __get_longterm_locked mm, swap: fix reclaim offset calculation error during allocation .mailmap: update email address for Christopher Obbard kfence: skip __GFP_THISNODE allocations on NUMA systems nilfs2: fix possible int overflows in nilfs_fiemap() mm: compaction: use the proper flag to determine watermarks kernel: be more careful about dup_mmap() failures and uprobe registering mm/fake-numa: handle cases with no SRAT info mm: kmemleak: fix upper boundary check for physical address objects mailmap: add an entry for Hamza Mahfooz MAINTAINERS: mailmap: update Yosry Ahmed's email address scripts/gdb: fix aarch64 userspace detection in get_current_task mm/vmscan: accumulate nr_demoted for accurate demotion statistics ocfs2: fix incorrect CPU endianness conversion causing mount failure mm/zsmalloc: add __maybe_unused attribute for is_first_zpdesc() ...
2025-02-01revert "xarray: port tests to kunit"Andrew Morton
Revert c7bb5cf9fc4e ("xarray: port tests to kunit"). It broke the build when compiing the xarray userspace test harness code. Reported-by: Sidhartha Kumar <sidhartha.kumar@oracle.com> Closes: https://lkml.kernel.org/r/07cf896e-adf8-414f-a629-a808fc26014a@oracle.com Cc: David Gow <davidgow@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Tamir Duberstein <tamird@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-02-01sh: boards: Use imply to enable hardware with complex dependenciesGeert Uytterhoeven
If CONFIG_I2C=n: WARNING: unmet direct dependencies detected for SND_SOC_AK4642 Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && I2C [=n] Selected by [y]: - SH_7724_SOLUTION_ENGINE [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y] WARNING: unmet direct dependencies detected for SND_SOC_DA7210 Depends on [n]: SOUND [=y] && SND [=y] && SND_SOC [=y] && SND_SOC_I2C_AND_SPI [=n] Selected by [y]: - SH_ECOVEC [=y] && CPU_SUBTYPE_SH7724 [=y] && SND_SIMPLE_CARD [=y] Fix this by replacing select by imply, instead of adding a dependency on I2C. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202501240836.OvXqmANX-lkp@intel.com/ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2025-02-01sh: Migrate to the generic rule for built-in DTBMasahiro Yamada
Commit 654102df2ac2 ("kbuild: add generic support for built-in boot DTBs") introduced generic support for built-in DTBs. Select GENERIC_BUILTIN_DTB when built-in DTB support is enabled. To keep consistency across architectures, this commit also renames CONFIG_USE_BUILTIN_DTB to CONFIG_BUILTIN_DTB, and CONFIG_BUILTIN_DTB_SOURCE to CONFIG_BUILTIN_DTB_NAME. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2025-02-01sh: irq: Use seq_put_decimal_ull_width() for decimal valuesDavid Wang
On a system with n CPUs and m interrupts, there will be n*m decimal values yielded via seq_printf(.."%10u "..) which has significant costs parsing format string and is less efficient than seq_put_decimal_ull_width(). Stress reading /proc/interrupts indicates ~30% performance improvement with this patch. Signed-off-by: David Wang <00107082@163.com> Reviewed-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2025-02-01KVM: arm64: Flush/sync debug state in protected modeOliver Upton
The recent changes to debug state management broke self-hosted debug for guests when running in protected mode, since both the debug owner and the debug state itself aren't shared with the hyp's view of the vcpu. Fix it by flushing/syncing the relevant bits with the hyp vcpu. Fixes: beb470d96cec ("KVM: arm64: Use debug_owner to track if debug regs need save/restore") Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/kvmarm/5f62740f-a065-42d9-9f56-8fb648b9c63f@sirena.org.uk/ Signed-off-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20250131222922.1548780-3-oliver.upton@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-01-31Merge tag 'for-linus-hexagon-6.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux Pull hexagon updates from Brian Cain: - Move kernel prototypes out of uapi header to internal header - Fix to address an unbalanced spinlock - Miscellaneous patches to fix static checks - Update bcain@quicinc.com->brian.cain@oss.qualcomm.com * tag 'for-linus-hexagon-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/bcain/linux: MAINTAINERS: Update my email address hexagon: Fix unbalanced spinlock in die() hexagon: Fix warning comparing pointer to 0 hexagon: Move kernel prototypes out of uapi/asm/setup.h header hexagon: time: Remove redundant null check for resource hexagon: fix using plain integer as NULL pointer warning in cmpxchg
2025-01-31Merge tag 'riscv-for-linus-6.14-mw1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - The PH1520 pinctrl and dwmac drivers are enabeled in defconfig - A redundant AQRL barrier has been removed from the futex cmpxchg implementation - Support for the T-Head vector extensions, which includes exposing these extensions to userspace on systems that implement them - Some more page table information is now printed on die() and systems that cause PA overflows * tag 'riscv-for-linus-6.14-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: add a warning when physical memory address overflows riscv/mm/fault: add show_pte() before die() riscv: Add ghostwrite vulnerability selftests: riscv: Support xtheadvector in vector tests selftests: riscv: Fix vector tests riscv: hwprobe: Document thead vendor extensions and xtheadvector extension riscv: hwprobe: Add thead vendor extension probing riscv: vector: Support xtheadvector save/restore riscv: Add xtheadvector instruction definitions riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT RISC-V: define the elements of the VCSR vector CSR riscv: vector: Use vlenb from DT for thead riscv: Add thead and xtheadvector as a vendor extension riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree dt-bindings: cpus: add a thead vlen register length property dt-bindings: riscv: Add xtheadvector ISA extension description RISC-V: Mark riscv_v_init() as __init riscv: defconfig: drop RT_GROUP_SCHED=y riscv/futex: Optimize atomic cmpxchg riscv: defconfig: enable pinctrl and dwmac support for TH1520
2025-01-31Merge tag 'kbuild-v6.14' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Support multiple hook locations for maint scripts of Debian package - Remove 'cpio' from the build tool requirement - Introduce gendwarfksyms tool, which computes CRCs for export symbols based on the DWARF information - Support CONFIG_MODVERSIONS for Rust - Resolve all conflicts in the genksyms parser - Fix several syntax errors in genksyms * tag 'kbuild-v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (64 commits) kbuild: fix Clang LTO with CONFIG_OBJTOOL=n kbuild: Strip runtime const RELA sections correctly kconfig: fix memory leak in sym_warn_unmet_dep() kconfig: fix file name in warnings when loading KCONFIG_DEFCONFIG_LIST genksyms: fix syntax error for attribute before init-declarator genksyms: fix syntax error for builtin (u)int*x*_t types genksyms: fix syntax error for attribute after 'union' genksyms: fix syntax error for attribute after 'struct' genksyms: fix syntax error for attribute after abstact_declarator genksyms: fix syntax error for attribute before nested_declarator genksyms: fix syntax error for attribute before abstract_declarator genksyms: decouple ATTRIBUTE_PHRASE from type-qualifier genksyms: record attributes consistently for init-declarator genksyms: restrict direct-declarator to take one parameter-type-list genksyms: restrict direct-abstract-declarator to take one parameter-type-list genksyms: remove Makefile hack genksyms: fix last 3 shift/reduce conflicts genksyms: fix 6 shift/reduce conflicts and 5 reduce/reduce conflicts genksyms: reduce type_qualifier directly to decl_specifier genksyms: rename cvar_qualifier to type_qualifier ...
2025-02-01kbuild: Strip runtime const RELA sections correctlyArd Biesheuvel
Due to the fact that runtime const ELF sections are named without a leading period or double underscore, the RSTRIP logic that removes the static RELA sections from vmlinux fails to identify them. This results in a situation like below, where some sections that were supposed to get removed are left behind. [Nr] Name Type Address Off Size ES Flg Lk Inf Al [58] runtime_shift_d_hash_shift PROGBITS ffffffff83500f50 2900f50 000014 00 A 0 0 1 [59] .relaruntime_shift_d_hash_shift RELA 0000000000000000 55b6f00 000078 18 I 70 58 8 [60] runtime_ptr_dentry_hashtable PROGBITS ffffffff83500f68 2900f68 000014 00 A 0 0 1 [61] .relaruntime_ptr_dentry_hashtable RELA 0000000000000000 55b6f78 000078 18 I 70 60 8 [62] runtime_ptr_USER_PTR_MAX PROGBITS ffffffff83500f80 2900f80 000238 00 A 0 0 1 [63] .relaruntime_ptr_USER_PTR_MAX RELA 0000000000000000 55b6ff0 000d50 18 I 70 62 8 So tweak the match expression to strip all sections starting with .rel. While at it, consolidate the logic used by RISC-V, s390 and x86 into a single shared Makefile library command. Link: https://lore.kernel.org/all/CAHk-=wjk3ynjomNvFN8jf9A1k=qSc=JFF591W00uXj-qqNUxPQ@mail.gmail.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Charlie Jenkins <charlie@rivosinc.com> Tested-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2025-01-31Merge tag 'x86-mm-2025-01-31' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: - The biggest changes are the TLB flushing scalability optimizations, to update the mm_cpumask lazily and related changes. This feature has both a track record and a continued risk of performance regressions, so it was already delayed by a cycle - but it's all 100% perfect now™ (Rik van Riel) - Also miscellaneous fixes and cleanups. (Gautam Somani, Kirill Shutemov, Sebastian Andrzej Siewior) * tag 'x86-mm-2025-01-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Remove unnecessary include of <linux/extable.h> x86/mtrr: Rename mtrr_overwrite_state() to guest_force_mtrr_state() x86/mm/selftests: Fix typo in lam.c x86/mm/tlb: Only trim the mm_cpumask once a second x86/mm/tlb: Also remove local CPU from mm_cpumask if stale x86/mm/tlb: Add tracepoint for TLB flush IPI to stale CPU x86/mm/tlb: Update mm_cpumask lazily
2025-01-31KVM: s390: remove the last user of page->indexClaudio Imbrenda
Shadow page tables use page->index to keep the g2 address of the guest page table being shadowed. Instead of keeping the information in page->index, split the address and smear it over the 16-bit softbits areas of 4 PGSTEs. This removes the last s390 user of page->index. Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-16-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-16-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: move PGSTE softbitsClaudio Imbrenda
Move the softbits in the PGSTEs to the other usable area. This leaves the 16-bit block of usable bits free, which will be used in the next patch for something else. Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-15-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-15-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: remove useless page->index usageClaudio Imbrenda
The page->index field for VSIE dat tables is only used for segment tables. Stop setting the field for all region tables. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-14-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-14-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: move gmap_shadow_pgt_lookup() into kvmClaudio Imbrenda
Move gmap_shadow_pgt_lookup() from mm/gmap.c into kvm/gaccess.c . Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-13-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-13-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: stop using lists to keep track of used dat tablesClaudio Imbrenda
Until now, every dat table allocated to map a guest was put in a linked list. The page->lru field of struct page was used to keep track of which pages were being used, and when the gmap is torn down, the list was walked and all pages freed. This patch gets rid of the usage of page->lru. Page tables are now freed by recursively walking the dat table tree. Since s390_unlist_old_asce() becomes useless now, remove it. Acked-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-12-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-12-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: stop using page->index for non-shadow gmapsClaudio Imbrenda
The host_to_guest radix tree will now map userspace addresses to guest addresses, instead of userspace addresses to segment tables. When segment tables and page tables are needed, they are found using an additional gmap_table_walk(). This gets rid of all usage of page->index for non-shadow gmaps. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-11-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-11-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: move some gmap shadowing functions away from mm/gmap.cClaudio Imbrenda
Move some gmap shadowing functions from mm/gmap.c to kvm/kvm-s390.c and the newly created kvm/gmap-vsie.c This is a step toward removing gmap from mm. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-10-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-10-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: get rid of gmap_translate()Claudio Imbrenda
Add gpa_to_hva(), which uses memslots, and use it to replace all uses of gmap_translate(). Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-9-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-9-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: get rid of gmap_fault()Claudio Imbrenda
All gmap page faults are already handled in kvm by the function kvm_s390_handle_dat_fault(); only few users of gmap_fault remained, all within kvm. Convert those calls to use kvm_s390_handle_dat_fault() instead. Remove gmap_fault() entirely since it has no more users. Acked-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-8-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-8-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: use __kvm_faultin_pfn()Claudio Imbrenda
Refactor the existing page fault handling code to use __kvm_faultin_pfn(). This possible now that memslots are always present. Acked-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-7-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-7-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: move pv gmap functions into kvmClaudio Imbrenda
Move gmap related functions from kernel/uv into kvm. Create a new file to collect gmap-related functions. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> [fixed unpack_one(), thanks mhartmay@linux.ibm.com] Link: https://lore.kernel.org/r/20250123144627.312456-6-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-6-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: fake memslot for ucontrol VMsClaudio Imbrenda
Create a fake memslot for ucontrol VMs. The fake memslot identity-maps userspace. Now memslots will always be present, and ucontrol is not a special case anymore. Suggested-by: Sean Christopherson <seanjc@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20250123144627.312456-4-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-4-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: wrapper for KVM_BUGClaudio Imbrenda
Wrap the call to KVM_BUG; this reduces code duplication and improves readability. Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250123144627.312456-3-imbrenda@linux.ibm.com Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Message-ID: <20250123144627.312456-3-imbrenda@linux.ibm.com>
2025-01-31KVM: s390: vsie: stop using "struct page" for vsie pageDavid Hildenbrand
Now that we no longer use page->index and the page refcount explicitly, let's avoid messing with "struct page" completely. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Tested-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Message-ID: <20250107154344.1003072-5-david@redhat.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2025-01-31KVM: s390: vsie: stop messing with page refcountDavid Hildenbrand
Let's stop messing with the page refcount, and use a flag that is set / cleared atomically to remember whether a vsie page is currently in use. Note that we could use a page flag, or a lower bit of the scb_gpa. Let's keep it simple for now, we have sufficient space. While at it, stop passing "struct kvm *" to put_vsie_page(), it's unused. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Tested-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Message-ID: <20250107154344.1003072-4-david@redhat.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2025-01-31KVM: s390: vsie: stop using page->indexDavid Hildenbrand
Let's stop using page->index, and instead use a field inside "struct vsie_page" to hold that value. We have plenty of space left in there. This is one part of stopping using "struct page" when working with vsie pages. We place the "page_to_virt(page)" strategically, so the next cleanups requires less churn. Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Tested-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Message-ID: <20250107154344.1003072-3-david@redhat.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
2025-01-31KVM: s390: vsie: fix some corner-cases when grabbing vsie pagesDavid Hildenbrand
We try to reuse the same vsie page when re-executing the vsie with a given SCB address. The result is that we use the same shadow SCB -- residing in the vsie page -- and can avoid flushing the TLB when re-running the vsie on a CPU. So, when we allocate a fresh vsie page, or when we reuse a vsie page for a different SCB address -- reusing the shadow SCB in different context -- we set ihcpu=0xffff to trigger the flush. However, after we looked up the SCB address in the radix tree, but before we grabbed the vsie page by raising the refcount to 2, someone could reuse the vsie page for a different SCB address, adjusting page->index and the radix tree. In that case, we would be reusing the vsie page with a wrong page->index. Another corner case is that we might set the SCB address for a vsie page, but fail the insertion into the radix tree. Whoever would reuse that page would remove the corresponding radix tree entry -- which might now be a valid entry pointing at another page, resulting in the wrong vsie page getting removed from the radix tree. Let's handle such races better, by validating that the SCB address of a vsie page didn't change after we grabbed it (not reuse for a different SCB; the alternative would be performing another tree lookup), and by setting the SCB address to invalid until the insertion in the tree succeeded (SCB addresses are aligned to 512, so ULONG_MAX is invalid). These scenarios are rare, the effects a bit unclear, and these issues were only found by code inspection. Let's CC stable to be safe. Fixes: a3508fbe9dc6 ("KVM: s390: vsie: initial support for nested virtualization") Cc: stable@vger.kernel.org Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Tested-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Message-ID: <20250107154344.1003072-2-david@redhat.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>