summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2024-06-08Merge tag 'x86-urgent-2024-06-08' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Miscellaneous fixes: - Fix kexec() crash if call depth tracking is enabled - Fix SMN reads on inaccessible registers on certain AMD systems" * tag 'x86-urgent-2024-06-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd_nb: Check for invalid SMN reads x86/kexec: Fix bug with call depth tracking
2024-06-07Merge tag 'mm-hotfixes-stable-2024-06-07-15-24' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "14 hotfixes, 6 of which are cc:stable. All except the nilfs2 fix affect MM and all are singletons - see the chagelogs for details" * tag 'mm-hotfixes-stable-2024-06-07-15-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors mm: fix xyz_noprof functions calling profiled functions codetag: avoid race at alloc_slab_obj_exts mm/hugetlb: do not call vma_add_reservation upon ENOMEM mm/ksm: fix ksm_zero_pages accounting mm/ksm: fix ksm_pages_scanned accounting kmsan: do not wipe out origin when doing partial unpoisoning vmalloc: check CONFIG_EXECMEM in is_vmalloc_or_module_addr() mm: page_alloc: fix highatomic typing in multi-block buddies nilfs2: fix potential kernel bug due to lack of writeback flag waiting memcg: remove the lockdep assert from __mod_objcg_mlstate() mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptes mm: huge_mm: fix undefined reference to `mthp_stats' for CONFIG_SYSFS=n mm: drop the 'anon_' prefix for swap-out mTHP counters
2024-06-07Merge tag 'riscv-for-linus-6.10-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - Another fix to avoid allocating pages that overlap with ERR_PTR, which manifests on rv32 - A revert for the badaccess patch I incorrectly picked up an early version of * tag 'riscv-for-linus-6.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: Revert "riscv: mm: accelerate pagefault when badaccess" riscv: fix overlap of allocated page and PTR_ERR
2024-06-07Merge tag 's390-6.10-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - Do not create PT_LOAD program header for the kenel image when the virtual memory informaton in OS_INFO data is not available. That fixes stand-alone dump failures against kernels that do not provide the virtual memory informaton - Add KVM s390 shared zeropage selftest * tag 's390-6.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: KVM: s390x: selftests: Add shared zeropage test s390/crash: Do not use VM info if os_info does not have it
2024-06-07Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Fix spurious CPU hotplug warning message from SETEND emulation code - Fix the build when GCC wasn't inlining our I/O accessor internals * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/io: add constant-argument check arm64: armv8_deprecated: Fix warning in isndep cpuhp starting process
2024-06-05mm: arm64: fix the out-of-bounds issue in contpte_clear_young_dirty_ptesBarry Song
We are passing a huge nr to __clear_young_dirty_ptes() right now. While we should pass the number of pages, we are actually passing CONT_PTE_SIZE. This is causing lots of crashes of MADV_FREE, panic oops could vary everytime. Link: https://lkml.kernel.org/r/20240524005444.135417-1-21cnbao@gmail.com Fixes: 89e86854fb0a ("mm/arm64: override clear_young_dirty_ptes() batch helper") Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Lance Yang <ioworker0@gmail.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Chris Li <chrisl@kernel.org> Cc: Barry Song <21cnbao@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Jeff Xie <xiehuan09@gmail.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-06-05x86/amd_nb: Check for invalid SMN readsYazen Ghannam
AMD Zen-based systems use a System Management Network (SMN) that provides access to implementation-specific registers. SMN accesses are done indirectly through an index/data pair in PCI config space. The PCI config access may fail and return an error code. This would prevent the "read" value from being updated. However, the PCI config access may succeed, but the return value may be invalid. This is in similar fashion to PCI bad reads, i.e. return all bits set. Most systems will return 0 for SMN addresses that are not accessible. This is in line with AMD convention that unavailable registers are Read-as-Zero/Writes-Ignored. However, some systems will return a "PCI Error Response" instead. This value, along with an error code of 0 from the PCI config access, will confuse callers of the amd_smn_read() function. Check for this condition, clear the return value, and set a proper error code. Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h") Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230403164244.471141-1-yazen.ghannam@amd.com
2024-06-05Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "This is dominated by a couple large series for ARM and x86 respectively, but apart from that things are calm. ARM: - Large set of FP/SVE fixes for pKVM, addressing the fallout from the per-CPU data rework and making sure that the host is not involved in the FP/SVE switching any more - Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH is completely supported - Fix for the respective priorities of Failed PAC, Illegal Execution state and Instruction Abort exceptions - Fix the handling of AArch32 instruction traps failing their condition code, which was broken by the introduction of ESR_EL2.ISS2 - Allow vcpus running in AArch32 state to be restored in System mode - Fix AArch32 GPR restore that would lose the 64 bit state under some conditions RISC-V: - No need to use mask when hart-index-bits is 0 - Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext() x86: - Fixes and debugging help for the #VE sanity check. Also disable it by default, even for CONFIG_DEBUG_KERNEL, because it was found to trigger spuriously (most likely a processor erratum as the exact symptoms vary by generation). - Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled situation (GIF=0 or interrupt shadow) when the processor supports virtual NMI. While generally KVM will not request an NMI window when virtual NMIs are supported, in this case it *does* have to single-step over the interrupt shadow or enable the STGI intercept, in order to deliver the latched second NMI. - Drop support for hand tuning APIC timer advancement from userspace. Since we have adaptive tuning, and it has proved to work well, drop the module parameter for manual configuration and with it a few stupid bugs that it had" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (32 commits) KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attr KVM: arm64: Ensure that SME controls are disabled in protected mode KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx format KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVM KVM: arm64: Eagerly restore host fpsimd/sve state in pKVM KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVM KVM: arm64: Specialize handling of host fpsimd state on trap KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helper KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_state KVM: arm64: Reintroduce __sve_save_state KVM: x86: Drop support for hand tuning APIC timer advancement from userspace KVM: SEV-ES: Delegate LBR virtualization to the processor KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent KVM: SEV-ES: Prevent MSR access post VMSA encryption RISC-V: KVM: Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext function RISC-V: KVM: No need to use mask when hart-index-bit is 0 KVM: arm64: nv: Expose BTI and CSV_frac to a guest hypervisor KVM: arm64: nv: Fix relative priorities of exceptions generated by ERETAx KVM: arm64: AArch32: Fix spurious trapping of conditional instructions KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode ...
2024-06-05s390/crash: Do not use VM info if os_info does not have itAlexander Gordeev
The virtual memory information stored in os_info area is required for creation of the kernel image PT_LOAD program header for kernels since commit a2ec5bec56dd ("s390/mm: uncouple physical vs virtual address spaces"). By contrast, if such information in os_info is absent the PT_LOAD program header should not be created. Currently the proper PT_LOAD program header is created for kernels that contain the virtual memory information, but for kernels without one an invalid header of zero size is created. That in turn leads to stand-alone dump failures. Use OS_INFO_KASLR_OFFSET variable to check whether os_info is present or not (same as crash and makedumpfile tools do) and based on that create or do not create the kernel image PT_LOAD program header. Fixes: f4cac27dc0d6 ("s390/crash: Use old os_info to create PT_LOAD headers") Tested-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Acked-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
2024-06-05arm64/io: add constant-argument checkArnd Bergmann
In some configurations __const_iowrite32_copy() does not get inlined and gcc runs into the BUILD_BUG(): In file included from <command-line>: In function '__const_memcpy_toio_aligned32', inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:203:3, inlined from '__const_iowrite32_copy' at arch/arm64/include/asm/io.h:199:20: include/linux/compiler_types.h:487:45: error: call to '__compiletime_assert_538' declared with attribute error: BUILD_BUG failed 487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^ include/linux/compiler_types.h:468:25: note: in definition of macro '__compiletime_assert' 468 | prefix ## suffix(); \ | ^~~~~~ include/linux/compiler_types.h:487:9: note: in expansion of macro '_compiletime_assert' 487 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | ^~~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert' 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg) | ^~~~~~~~~~~~~~~~~~ include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG' 59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed") | ^~~~~~~~~~~~~~~~ arch/arm64/include/asm/io.h:193:17: note: in expansion of macro 'BUILD_BUG' 193 | BUILD_BUG(); | ^~~~~~~~~ Move the check for constant arguments into the inline function to ensure it is still constant if the compiler decides against inlining it, and mark them as __always_inline to override the logic that sometimes leads to the compiler not producing the simplified output. Note that either the __always_inline annotation or the check for a constant value are sufficient here, but combining the two looks cleaner as it also avoids the macro. With clang-8 and older, the macro was still needed, but all versions of gcc and clang can reliably perform constant folding here. Fixes: ead79118dae6 ("arm64/io: Provide a WC friendly __iowriteXX_copy()") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20240604210006.668912-1-arnd@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-06-05KVM: x86/mmu: Don't save mmu_invalidate_seq after checking private attrTao Su
Drop the second snapshot of mmu_invalidate_seq in kvm_faultin_pfn(). Before checking the mismatch of private vs. shared, mmu_invalidate_seq is saved to fault->mmu_seq, which can be used to detect an invalidation related to the gfn occurred, i.e. KVM will not install a mapping in page table if fault->mmu_seq != mmu_invalidate_seq. Currently there is a second snapshot of mmu_invalidate_seq, which may not be same as the first snapshot in kvm_faultin_pfn(), i.e. the gfn attribute may be changed between the two snapshots, but the gfn may be mapped in page table without hindrance. Therefore, drop the second snapshot as it has no obvious benefits. Fixes: f6adeae81f35 ("KVM: x86/mmu: Handle no-slot faults at the beginning of kvm_faultin_pfn()") Signed-off-by: Tao Su <tao1.su@linux.intel.com> Message-ID: <20240528102234.2162763-1-tao1.su@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-05Merge tag 'kvmarm-fixes-6.10-1' of ↵Paolo Bonzini
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.10, take #1 - Large set of FP/SVE fixes for pKVM, addressing the fallout from the per-CPU data rework and making sure that the host is not involved in the FP/SVE switching any more - Allow FEAT_BTI to be enabled with NV now that FEAT_PAUTH is copletely supported - Fix for the respective priorities of Failed PAC, Illegal Execution state and Instruction Abort exceptions - Fix the handling of AArch32 instruction traps failing their condition code, which was broken by the introduction of ESR_EL2.ISS2 - Allow vpcus running in AArch32 state to be restored in System mode - Fix AArch32 GPR restore that would lose the 64 bit state under some conditions
2024-06-05arm64: armv8_deprecated: Fix warning in isndep cpuhp starting processWei Li
The function run_all_insn_set_hw_mode() is registered as startup callback of 'CPUHP_AP_ARM64_ISNDEP_STARTING', it invokes set_hw_mode() methods of all emulated instructions. As the STARTING callbacks are not expected to fail, if one of the set_hw_mode() fails, e.g. due to el0 mixed-endian is not supported for 'setend', it will report a warning: ``` CPU[2] cannot support the emulation of setend CPU 2 UP state arm64/isndep:starting (136) failed (-22) CPU2: Booted secondary processor 0x0000000002 [0x414fd0c1] ``` To fix it, add a check for INSN_UNAVAILABLE status and skip the process. Signed-off-by: Wei Li <liwei391@huawei.com> Tested-by: Huisong Li <lihuisong@huawei.com> Link: https://lore.kernel.org/r/20240423093501.3460764-1-liwei391@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2024-06-04KVM: arm64: Ensure that SME controls are disabled in protected modeFuad Tabba
KVM (and pKVM) do not support SME guests. Therefore KVM ensures that the host's SME state is flushed and that SME controls for enabling access to ZA storage and for streaming are disabled. pKVM needs to protect against a buggy/malicious host. Ensure that it wouldn't run a guest when protected mode is enabled should any of the SME controls be enabled. Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-10-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Refactor CPACR trap bit setting/clearing to use ELx formatFuad Tabba
When setting/clearing CPACR bits for EL0 and EL1, use the ELx format of the bits, which covers both. This makes the code clearer, and reduces the chances of accidentally missing a bit. No functional change intended. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-9-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Consolidate initializing the host data's fpsimd_state/sve in pKVMFuad Tabba
Now that we have introduced finalize_init_hyp_mode(), lets consolidate the initializing of the host_data fpsimd_state and sve state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-8-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Eagerly restore host fpsimd/sve state in pKVMFuad Tabba
When running in protected mode we don't want to leak protected guest state to the host, including whether a guest has used fpsimd/sve. Therefore, eagerly restore the host state on guest exit when running in protected mode, which happens only if the guest has used fpsimd/sve. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-7-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Allocate memory mapped at hyp for host sve state in pKVMFuad Tabba
Protected mode needs to maintain (save/restore) the host's sve state, rather than relying on the host kernel to do that. This is to avoid leaking information to the host about guests and the type of operations they are performing. As a first step towards that, allocate memory mapped at hyp, per cpu, for the host sve state. The following patch will use this memory to save/restore the host state. Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-6-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Specialize handling of host fpsimd state on trapFuad Tabba
In subsequent patches, n/vhe will diverge on saving the host fpsimd/sve state when taking a guest fpsimd/sve trap. Add a specialized helper to handle it. No functional change intended. Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-5-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Abstract set/clear of CPTR_EL2 bits behind helperFuad Tabba
The same traps controlled by CPTR_EL2 or CPACR_EL1 need to be toggled in different parts of the code, but the exact bits and their polarity differ between these two formats and the mode (vhe/nvhe/hvhe). To reduce the amount of duplicated code and the chance of getting the wrong bit/polarity or missing a field, abstract the set/clear of CPTR_EL2 bits behind a helper. Since (h)VHE is the way of the future, use the CPACR_EL1 format, which is a subset of the VHE CPTR_EL2, as a reference. No functional change intended. Suggested-by: Oliver Upton <oliver.upton@linux.dev> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-4-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Fix prototype for __sve_save_state/__sve_restore_stateFuad Tabba
Since the prototypes for __sve_save_state/__sve_restore_state at hyp were added, the underlying macro has acquired a third parameter for saving/restoring ffr. Fix the prototypes to account for the third parameter, and restore the ffr for the guest since it is saved. Suggested-by: Mark Brown <broonie@kernel.org> Signed-off-by: Fuad Tabba <tabba@google.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240603122852.3923848-3-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-04KVM: arm64: Reintroduce __sve_save_stateFuad Tabba
Now that the hypervisor is handling the host sve state in protected mode, it needs to be able to save it. This reverts commit e66425fc9ba3 ("KVM: arm64: Remove unused __sve_save_state"). Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Fuad Tabba <tabba@google.com> Link: https://lore.kernel.org/r/20240603122852.3923848-2-tabba@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-06-03Merge tag 'kvm-riscv-fixes-6.10-1' of https://github.com/kvm-riscv/linux ↵Paolo Bonzini
into HEAD KVM/riscv fixes for 6.10, take #1 - No need to use mask when hart-index-bits is 0 - Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext()
2024-06-03Merge branch 'kvm-fixes-6.10-1' into HEADPaolo Bonzini
* Fixes and debugging help for the #VE sanity check. Also disable it by default, even for CONFIG_DEBUG_KERNEL, because it was found to trigger spuriously (most likely a processor erratum as the exact symptoms vary by generation). * Avoid WARN() when two NMIs arrive simultaneously during an NMI-disabled situation (GIF=0 or interrupt shadow) when the processor supports virtual NMI. While generally KVM will not request an NMI window when virtual NMIs are supported, in this case it *does* have to single-step over the interrupt shadow or enable the STGI intercept, in order to deliver the latched second NMI. * Drop support for hand tuning APIC timer advancement from userspace. Since we have adaptive tuning, and it has proved to work well, drop the module parameter for manual configuration and with it a few stupid bugs that it had.
2024-06-03KVM: x86: Drop support for hand tuning APIC timer advancement from userspaceSean Christopherson
Remove support for specifying a static local APIC timer advancement value, and instead present a read-only boolean parameter to let userspace enable or disable KVM's dynamic APIC timer advancement. Realistically, it's all but impossible for userspace to specify an advancement that is more precise than what KVM's adaptive tuning can provide. E.g. a static value needs to be tuned for the exact hardware and kernel, and if KVM is using hrtimers, likely requires additional tuning for the exact configuration of the entire system. Dropping support for a userspace provided value also fixes several flaws in the interface. E.g. KVM interprets a negative value other than -1 as a large advancement, toggling between a negative and positive value yields unpredictable behavior as vCPUs will switch from dynamic to static advancement, changing the advancement in the middle of VM creation can result in different values for vCPUs within a VM, etc. Those flaws are mostly fixable, but there's almost no justification for taking on yet more complexity (it's minimal complexity, but still non-zero). The only arguments against using KVM's adaptive tuning is if a setup needs a higher maximum, or if the adjustments are too reactive, but those are arguments for letting userspace control the absolute max advancement and the granularity of each adjustment, e.g. similar to how KVM provides knobs for halt polling. Link: https://lore.kernel.org/all/20240520115334.852510-1-zhoushuling@huawei.com Cc: Shuling Zhou <zhoushuling@huawei.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240522010304.1650603-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV-ES: Delegate LBR virtualization to the processorRavi Bangoria
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. Although KVM currently enforces LBRV for SEV-ES guests, there are multiple issues with it: o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR interception is used to dynamically toggle LBRV for performance reasons, this can be fatal for SEV-ES guests. For ex SEV-ES guest on Zen3: [guest ~]# wrmsr 0x1d9 0x4 KVM: entry failed, hardware error 0xffffffff EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000 Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR is of swap type A. o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA encryption. [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653 Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Co-developed-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240531044644.768-4-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absentRavi Bangoria
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. So, prevent SEV-ES guests when LBRV support is missing. [1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653 Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240531044644.768-3-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03KVM: SEV-ES: Prevent MSR access post VMSA encryptionNikunj A Dadhania
KVM currently allows userspace to read/write MSRs even after the VMSA is encrypted. This can cause unintentional issues if MSR access has side- effects. For ex, while migrating a guest, userspace could attempt to migrate MSR_IA32_DEBUGCTLMSR and end up unintentionally disabling LBRV on the target. Fix this by preventing access to those MSRs which are context switched via the VMSA, once the VMSA is encrypted. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Message-ID: <20240531044644.768-2-ravi.bangoria@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-06-03x86/kexec: Fix bug with call depth trackingDavid Kaplan
The call to cc_platform_has() triggers a fault and system crash if call depth tracking is active because the GS segment has been reset by load_segments() and GS_BASE is now 0 but call depth tracking uses per-CPU variables to operate. Call cc_platform_has() earlier in the function when GS is still valid. [ bp: Massage. ] Fixes: 5d8213864ade ("x86/retbleed: Add SKL return thunk") Signed-off-by: David Kaplan <david.kaplan@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20240603083036.637-1-bp@kernel.org
2024-06-03Revert "riscv: mm: accelerate pagefault when badaccess"Palmer Dabbelt
I accidentally picked up an earlier version of this patch, which had already landed via mm. The patch I picked up contains a bug, which I kept as I thought it was a fix. So let's just revert it. This reverts commit 4c6c0020427a4547845a83f7e4d6085e16c3e24f. Fixes: 4c6c0020427a ("riscv: mm: accelerate pagefault when badaccess") Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240530164451.21336-1-palmer@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-03riscv: fix overlap of allocated page and PTR_ERRNam Cao
On riscv32, it is possible for the last page in virtual address space (0xfffff000) to be allocated. This page overlaps with PTR_ERR, so that shouldn't happen. There is already some code to ensure memblock won't allocate the last page. However, buddy allocator is left unchecked. Fix this by reserving physical memory that would be mapped at virtual addresses greater than 0xfffff000. Reported-by: Björn Töpel <bjorn@kernel.org> Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong.to.us Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: <stable@vger.kernel.org> Tested-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Björn Töpel <bjorn@rivosinc.com> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org> Link: https://lore.kernel.org/r/20240425115201.3044202-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-03LoongArch: Fix GMAC's phy-mode definitions in dtsHuacai Chen
The GMAC of Loongson chips cannot insert the correct 1.5-2ns delay. So we need the PHY to insert internal delays for both transmit and receive data lines from/to the PHY device. Fix this by changing the "phy-mode" from "rgmii" to "rgmii-id" in dts. Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03LoongArch: Override higher address bits in JUMP_VIRT_ADDRJiaxun Yang
In JUMP_VIRT_ADDR we are performing an or calculation on address value directly from pcaddi. This will only work if we are currently running from direct 1:1 mapping addresses or firmware's DMW is configured exactly same as kernel. Still, we should not rely on such assumption. Fix by overriding higher bits in address comes from pcaddi, so we can get rid of or operator. Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03LoongArch: Fix entry point in kernel image headerJiaxun Yang
Currently kernel entry in head.S is in DMW address range, firmware is instructed to jump to this address after loading the kernel image. However kernel should not make any assumption on firmware's DMW setting, thus the entry point should be a physical address falls into direct translation region. Fix by converting entry address to physical and amend entry calculation logic in libstub accordingly. BTW, use ABSOLUTE() to calculate variables to make Clang/LLVM happy. Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03LoongArch: Add all CPUs enabled by fdt to NUMA node 0Jiaxun Yang
NUMA enabled kernel on FDT based machine fails to boot because CPUs are all in NUMA_NO_NODE and mm subsystem won't accept that. Fix by adding them to default NUMA node at FDT parsing phase and move numa_add_cpu(0) to a later point. Cc: stable@vger.kernel.org Fixes: 88d4d957edc7 ("LoongArch: Add FDT booting support from efi system table") Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03LoongArch: Fix built-in DTB detectionJiaxun Yang
fdt_check_header(__dtb_start) will always success because kernel provides a dummy dtb, and by coincidence __dtb_start clashed with entry of this dummy dtb. The consequence is fdt passed from firmware will never be taken. Fix by trying to utilise __dtb_start only when CONFIG_BUILTIN_DTB is enabled. Cc: stable@vger.kernel.org Fixes: 7b937cc243e5 ("of: Create of_root if no dtb provided by firmware") Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-03LoongArch: Remove CONFIG_ACPI_TABLE_UPGRADE in platform_init()Tiezhu Yang
Both acpi_table_upgrade() and acpi_boot_table_init() are defined as empty functions under !CONFIG_ACPI_TABLE_UPGRADE and !CONFIG_ACPI in include/linux/acpi.h, there are no implicit declaration errors with various configs. #ifdef CONFIG_ACPI_TABLE_UPGRADE void acpi_table_upgrade(void); #else static inline void acpi_table_upgrade(void) { } #endif #ifdef CONFIG_ACPI ... void acpi_boot_table_init (void); ... #else /* !CONFIG_ACPI */ ... static inline void acpi_boot_table_init(void) { } ... #endif /* !CONFIG_ACPI */ As Huacai suggested, CONFIG_ACPI_TABLE_UPGRADE is ugly and not necessary here, just remove it. At the same time, just keep CONFIG_ACPI to prevent potential build errors in future, and give a signal to indicate the code is ACPI-specific. For the same reason, we also put acpi_table_upgrade() under CONFIG_ACPI. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
2024-06-02Merge tag 'x86-urgent-2024-06-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Miscellaneous topology parsing fixes: - Fix topology parsing regression on older CPUs in the new AMD/Hygon parser - Fix boot crash on odd Intel Quark and similar CPUs that do not fill out cpuinfo_x86::x86_clflush_size and zero out cpuinfo_x86::x86_cache_alignment as a result. Provide 32 bytes as a general fallback value. - Fix topology enumeration on certain rare CPUs where the BIOS locks certain CPUID leaves and the kernel unlocked them late, which broke with the new topology parsing code. Factor out this unlocking logic and move it earlier in the parsing sequence" * tag 'x86-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/topology/intel: Unlock CPUID before evaluating anything x86/cpu: Provide default cache line size if not enumerated x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater
2024-06-02Merge tag 'sched-urgent-2024-06-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Export a symbol to make life easier for instrumentation/debugging" * tag 'sched-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/x86: Export 'percpu arch_freq_scale'
2024-06-02Merge tag 'perf-urgent-2024-06-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fix from Ingo Molnar: "Add missing MODULE_DESCRIPTION() lines" * tag 'perf-urgent-2024-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Add missing MODULE_DESCRIPTION() lines perf/x86/rapl: Add missing MODULE_DESCRIPTION() line
2024-06-01Merge tag 'powerpc-6.10-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Enforce full ordering for ATOMIC operations with BPF_FETCH - Fix uaccess build errors seen with GCC 13/14 - Fix build errors on ppc32 due to ARCH_HAS_KERNEL_FPU_SUPPORT - Drop error message from lparcfg guest name lookup Thanks to Christophe Leroy, Guenter Roeck, Nathan Lynch, Naveen N Rao, Puranjay Mohan, and Samuel Holland. * tag 'powerpc-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64 powerpc/uaccess: Use YZ asm constraint for ld powerpc/uaccess: Fix build errors seen with GCC 13/14 powerpc/pseries/lparcfg: drop error message from guest name lookup powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
2024-05-31Merge tag 'riscv-for-linus-6.10-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V fixes from Palmer Dabbelt: - A fix to avoid pt_regs aliasing with idle thread stacks on secondary harts. - HAVE_ARCH_HUGE_VMAP is enabled on XIP kernels, which fixes boot issues on XIP systems with huge pages. - An update to the uABI documentation clarifying that only scalar misaligned accesses were grandfathered in as supported, as the vector extension did not exist at the time the uABI was frozen. - A fix for the recently-added byte/half atomics to avoid losing the fully ordered decorations. * tag 'riscv-for-linus-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: riscv: Fix fully ordered LR/SC xchg[8|16]() implementations Documentation: RISC-V: uabi: Only scalar misaligned loads are supported riscv: enable HAVE_ARCH_HUGE_VMAP for XIP kernel riscv: prevent pt_regs corruption for secondary idle threads
2024-05-31x86/topology/intel: Unlock CPUID before evaluating anythingThomas Gleixner
Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If this bit is set by the BIOS then CPUID evaluation including topology enumeration does not work correctly as the evaluation code does not try to analyze any leaf greater than two. This went unnoticed before because the original topology code just repeated evaluation several times and managed to overwrite the initial limited information with the correct one later. The new evaluation code does it once and therefore ends up with the limited and wrong information. Cure this by unlocking CPUID right before evaluating anything which depends on the maximum CPUID leaf being greater than two instead of rereading stuff after unlock. Fixes: 22d63660c35e ("x86/cpu: Use common topology code for Intel") Reported-by: Peter Schneider <pschneider1968@googlemail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Peter Schneider <pschneider1968@googlemail.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/fd3f73dc-a86f-4bcf-9c60-43556a21eb42@googlemail.com
2024-05-31sched/x86: Export 'percpu arch_freq_scale'Phil Auld
Commit: 7bc263840bc3 ("sched/topology: Consolidate and clean up access to a CPU's max compute capacity") removed rq->cpu_capacity_orig in favor of using arch_scale_freq_capacity() calls. Export the underlying percpu symbol on x86 so that external trace point helper modules can be made to work again. Signed-off-by: Phil Auld <pauld@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240530181548.2039216-1-pauld@redhat.com
2024-05-31perf/x86/intel: Add missing MODULE_DESCRIPTION() linesJeff Johnson
Fix the 'make W=1 C=1' warnings: WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-uncore.o WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/intel/intel-cstate.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-intel-v1-1-8252194ed20a@quicinc.com
2024-05-31perf/x86/rapl: Add missing MODULE_DESCRIPTION() lineJeff Johnson
Fix the warning from 'make C=1 W=1': WARNING: modpost: missing MODULE_DESCRIPTION() in arch/x86/events/rapl.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: https://lore.kernel.org/r/20240530-md-arch-x86-events-v1-1-e45ffa8af99f@quicinc.com
2024-05-31RISC-V: KVM: Fix incorrect reg_subtype labels in ↵Quan Zhou
kvm_riscv_vcpu_set_reg_isa_ext function In the function kvm_riscv_vcpu_set_reg_isa_ext, the original code used incorrect reg_subtype labels KVM_REG_RISCV_SBI_MULTI_EN/DIS. These have been corrected to KVM_REG_RISCV_ISA_MULTI_EN/DIS respectively. Although they are numerically equivalent, the actual processing will not result in errors, but it may lead to ambiguous code semantics. Fixes: 613029442a4b ("RISC-V: KVM: Extend ONE_REG to enable/disable multiple ISA extensions") Signed-off-by: Quan Zhou <zhouquan@iscas.ac.cn> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/ff1c6771a67d660db94372ac9aaa40f51e5e0090.1716429371.git.zhouquan@iscas.ac.cn Signed-off-by: Anup Patel <anup@brainfault.org>
2024-05-31RISC-V: KVM: No need to use mask when hart-index-bit is 0Yong-Xuan Wang
When the maximum hart number within groups is 1, hart-index-bit is set to 0. Consequently, there is no need to restore the hart ID from IMSIC addresses and hart-index-bit settings. Currently, QEMU and kvmtool do not pass correct hart-index-bit values when the maximum hart number is a power of 2, thereby avoiding this issue. Corresponding patches for QEMU and kvmtool will also be dispatched. Fixes: 89d01306e34d ("RISC-V: KVM: Implement device interface for AIA irqchip") Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Link: https://lore.kernel.org/r/20240415064905.25184-1-yongxuan.wang@sifive.com Signed-off-by: Anup Patel <anup@brainfault.org>
2024-05-30riscv: Fix fully ordered LR/SC xchg[8|16]() implementationsAlexandre Ghiti
The fully ordered versions of xchg[8|16]() using LR/SC lack the necessary memory barriers to guarantee the order. Fix this by matching what is already implemented in the fully ordered versions of cmpxchg() using LR/SC. Suggested-by: Andrea Parri <parri.andrea@gmail.com> Reported-by: Andrea Parri <parri.andrea@gmail.com> Closes: https://lore.kernel.org/linux-riscv/ZlYbupL5XgzgA0MX@andrea/T/#u Fixes: a8ed2b7a2c13 ("riscv/cmpxchg: Implement xchg for variables of size 1 and 2") Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com> Reviewed-by: Andrea Parri <parri.andrea@gmail.com> Link: https://lore.kernel.org/r/20240530145546.394248-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-05-30riscv: enable HAVE_ARCH_HUGE_VMAP for XIP kernelNam Cao
HAVE_ARCH_HUGE_VMAP also works on XIP kernel, so remove its dependency on !XIP_KERNEL. This also fixes a boot problem for XIP kernel introduced by the commit in "Fixes:". This commit used huge page mapping for vmemmap, but huge page vmap was not enabled for XIP kernel. Fixes: ff172d4818ad ("riscv: Use hugepage mappings for vmemmap") Signed-off-by: Nam Cao <namcao@linutronix.de> Cc: <stable@vger.kernel.org> Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Link: https://lore.kernel.org/r/20240526110104.470429-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>