summaryrefslogtreecommitdiff
path: root/arch/arm64/include
AgeCommit message (Collapse)Author
2024-10-31KVM: arm64: Add basic support for POR_EL2Marc Zyngier
S1POE support implies support for POR_EL2, which we provide by - adding it to the vcpu_sysreg enum - advertising it as mapped to its EL1 counterpart in get_el2_to_el1_mapping - wiring it in the sys_reg_desc table with the correct visibility - handling POR_EL1 in __vcpu_{read,write}_sys_reg_from_cpu() Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-32-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add kvm_has_s1poe() helperMarc Zyngier
Just like we have kvm_has_s1pie(), add its S1POE counterpart, making the code slightly more readable. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-31-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Hide S1PIE registers from userspace when disabled for guestsMark Brown
When the guest does not support S1PIE we should not allow any access to the system registers it adds in order to ensure that we do not create spurious issues with guest migration. Add a visibility operation for these registers. Fixes: 86f9de9db178 ("KVM: arm64: Save/restore PIE registers") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-3-376624fa829c@kernel.org [maz: simplify by using __el2_visibility(), kvm_has_s1pie() throughout] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-26-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Hide TCR2_EL1 from userspace when disabled for guestsMark Brown
When the guest does not support FEAT_TCR2 we should not allow any access to it in order to ensure that we do not create spurious issues with guest migration. Add a visibility operation for it. Fixes: fbff56068232 ("KVM: arm64: Save/restore TCR2_EL1") Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240822-kvm-arm64-hide-pie-regs-v2-2-376624fa829c@kernel.org [maz: simplify by using __el2_visibility(), kvm_has_tcr2() throughout] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-25-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add PIR{,E0}_EL2 to the sysreg arraysMarc Zyngier
Add the FEAT_S1PIE EL2 registers to the per-vcpu sysreg register array. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-15-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Extend masking facility to arbitrary registersMarc Zyngier
We currently only use the masking (RES0/RES1) facility for VNCR registers, as they are memory-based and thus easy to sanitise. But we could apply the same thing to other registers if we: - split the sanitisation from __VNCR_START__ - apply the sanitisation when reading from a HW register This involves a new "marker" in the vcpu_sysreg enum, which defines the point at which the sanitisation applies (the VNCR registers being of course after this marker). Whle we are at it, rename kvm_vcpu_sanitise_vncr_reg() to kvm_vcpu_apply_reg_masks(), which is vaguely more explicit, and harden set_sysreg_masks() against setting masks for random registers... Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-10-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Correctly access TCR2_EL1, PIR_EL1, PIRE0_EL1 with VHEMarc Zyngier
For code that accesses any of the guest registers for emulation purposes, it is crucial to know where the most up-to-date data is. While this is pretty clear for nVHE (memory is the sole repository), things are a lot muddier for VHE, as depending on the SYSREGS_ON_CPU flag, registers can either be loaded on the HW or be in memory. Even worse with NV, where the loaded state is by definition partial. For these reasons, KVM offers the vcpu_read_sys_reg() and vcpu_write_sys_reg() primitives that always do the right thing. However, these primitive must know what register to access, and this is the role of the __vcpu_read_sys_reg_from_cpu() and __vcpu_write_sys_reg_to_cpu() helpers. As it turns out, TCR2_EL1, PIR_EL1, PIRE0_EL1 and not described in the latter helpers, meaning that the AT code cannot use them to emulate S1PIE. Add the three registers to the (long) list. Fixes: 86f9de9db178 ("KVM: arm64: Save/restore PIE registers") Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Joey Gouly <joey.gouly@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20241023145345.1613824-9-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Add TCR2_EL2 to the sysreg arraysMarc Zyngier
Add the TCR2_EL2 register to the per-vcpu sysreg register array, the sysreg descriptor array, and advertise it as mapped to TCR2_EL1 for NV purposes. Access to this register is conditional based on ID_AA64MMFR3_EL1.TCRX being advertised. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-12-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31arm64: Remove VNCR definition for PIRE0_EL2Marc Zyngier
As of the ARM ARM Known Issues document 102105_K.a_04_en, D22677 fixes a problem with the PIRE0_EL2 register, resulting in its removal from the VNCR page (it had no purpose being there the first place). Follow the architecture update by removing this offset. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-3-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-30KVM: Protect vCPU's "last run PID" with rwlock, not RCUSean Christopherson
To avoid jitter on KVM_RUN due to synchronize_rcu(), use a rwlock instead of RCU to protect vcpu->pid, a.k.a. the pid of the task last used to a vCPU. When userspace is doing M:N scheduling of tasks to vCPUs, e.g. to run SEV migration helper vCPUs during post-copy, the synchronize_rcu() needed to change the PID associated with the vCPU can stall for hundreds of milliseconds, which is problematic for latency sensitive post-copy operations. In the directed yield path, do not acquire the lock if it's contended, i.e. if the associated PID is changing, as that means the vCPU's task is already running. Reported-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240802200136.329973-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-10-28arm64: Use new fallback IO memcpy/memsetJulian Vetter
Use the new fallback memcpy_{from,to}io and memset_io functions from lib/iomem_copy.c on the arm64 processor architecture. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Yann Sionneau <ysionneau@kalrayinc.com> Signed-off-by: Julian Vetter <jvetter@kalrayinc.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28asm-generic: provide generic page_to_phys and phys_to_page implementationsChristoph Hellwig
page_to_phys is duplicated by all architectures, and from some strange reason placed in <asm/io.h> where it doesn't fit at all. phys_to_page is only provided by a few architectures despite having a lot of open coded users. Provide generic versions in <asm-generic/memory_model.h> to make these helpers more easily usable. Note with this patch powerpc loses the CONFIG_DEBUG_VIRTUAL pfn_valid check. It will be added back in a generic version later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-10-28perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access controlRob Herring (Arm)
Armv8.9/9.4 PMUv3.9 adds per counter EL0 access controls. Per counter access is enabled with the UEN bit in PMUSERENR_EL1 register. Individual counters are enabled/disabled in the PMUACR_EL1 register. When UEN is set, the CR/ER bits control EL0 write access and must be set to disable write access. With the access controls, the clearing of unused counters can be skipped. KVM also configures PMUSERENR_EL1 in order to trap to EL2. UEN does not need to be set for it since only PMUv3.5 is exposed to guests. Signed-off-by: Rob Herring (Arm) <robh@kernel.org> Link: https://lore.kernel.org/r/20241002184326.1105499-1-robh@kernel.org Signed-off-by: Will Deacon <will@kernel.org>
2024-10-25KVM: arm64: Don't mark "struct page" accessed when making SPTE youngSean Christopherson
Don't mark pages/folios as accessed in the primary MMU when making a SPTE young in KVM's secondary MMU, as doing so relies on kvm_pfn_to_refcounted_page(), and generally speaking is unnecessary and wasteful. KVM participates in page aging via mmu_notifiers, so there's no need to push "accessed" updates to the primary MMU. Dropping use of kvm_set_pfn_accessed() also paves the way for removing kvm_pfn_to_refcounted_page() and all its users. Tested-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20241010182427.1434605-84-seanjc@google.com>
2024-10-24KVM: arm64: Add PSCI v1.3 SYSTEM_OFF2 function for hibernationDavid Woodhouse
The PSCI v1.3 specification adds support for a SYSTEM_OFF2 function which is analogous to ACPI S4 state. This will allow hosting environments to determine that a guest is hibernated rather than just powered off, and ensure that they preserve the virtual environment appropriately to allow the guest to resume safely (or bump the hardware_signature in the FACS to trigger a clean reboot instead). This feature is safe to enable unconditionally (in a subsequent commit) because it is exposed to userspace through the existing KVM_SYSTEM_EVENT_SHUTDOWN event, just with an additional flag which userspace can use to know that the instance intended hibernation instead of a plain power-off. As with SYSTEM_RESET2, there is only one type available (in this case HIBERNATE_OFF), and it is not explicitly reported to userspace through the event; userspace can get it from the registers if it cares). Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Miguel Luis <miguel.luis@oracle.com> Link: https://lore.kernel.org/r/20241019172459.2241939-3-dwmw2@infradead.org [oliver: slight cleanup of comments] Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-23arm64/mm: Drop _PROT_SECT_DEFAULTAnshuman Khandual
'commit db95ea787bd1 ("arm64: mm: Wire up TCR.DS bit to PTE shareability fields")' dropped the last reference to symbol _PROT_SECT_DEFAULT, while transitioning from PMD_SECT_S to PMD_MAYBE_SHARED for PROT_SECT_DEFAULT. Hence let's just drop that symbol which is now unused. Cc: Will Deacon <will@kernel.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20241021063713.750870-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-23arm64: Enable memory encrypt for RealmsSuzuki K Poulose
Use the memory encryption APIs to trigger a RSI call to request a transition between protected memory and shared memory (or vice versa) and updating the kernel's linear map of modified pages to flip the top bit of the IPA. This requires that block mappings are not used in the direct map for realm guests. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Co-developed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20241017131434.40935-10-steven.price@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-23arm64: rsi: Add support for checking whether an MMIO is protectedSuzuki K Poulose
On Arm CCA, with RMM-v1.0, all MMIO regions are shared. However, in the future, an Arm CCA-v1.0 compliant guest may be run in a lesser privileged partition in the Realm World (with Arm CCA-v1.1 Planes feature). In this case, some of the MMIO regions may be emulated by a higher privileged component in the Realm world, i.e, protected. Thus the guest must decide today, whether a given MMIO region is shared vs Protected and create the stage1 mapping accordingly. On Arm CCA, this detection is based on the "IPA State" (RIPAS == RIPAS_IO). Provide a helper to run this check on a given range of MMIO. Also, provide a arm64 helper which may be hooked in by other solutions. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20241017131434.40935-5-steven.price@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-23arm64: realm: Query IPA size from the RMMSteven Price
The top bit of the configured IPA size is used as an attribute to control whether the address is protected or shared. Query the configuration from the RMM to assertain which bit this is. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20241017131434.40935-4-steven.price@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-23arm64: Detect if in a realm and set RIPAS RAMSuzuki K Poulose
Detect that the VM is a realm guest by the presence of the RSI interface. This is done after PSCI has been initialised so that we can check the SMCCC conduit before making any RSI calls. If in a realm then iterate over all memory ensuring that it is marked as RIPAS RAM. The loader is required to do this for us, however if some memory is missed this will cause the guest to receive a hard to debug external abort at some random point in the future. So for a belt-and-braces approach set all memory to RIPAS RAM. Any failure here implies that the RAM regions passed to Linux are incorrect so panic() promptly to make the situation clear. Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Co-developed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20241017131434.40935-3-steven.price@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-23arm64: rsi: Add RSI definitionsSuzuki K Poulose
The RMM (Realm Management Monitor) provides functionality that can be accessed by a realm guest through SMC (Realm Services Interface) calls. The SMC definitions are based on DEN0137[1] version 1.0-rel0. [1] https://developer.arm.com/documentation/den0137/1-0rel0/ Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Link: https://lore.kernel.org/r/20241017131434.40935-2-steven.price@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-22arm64: preserve pt_regs::stackframe during exec*()Mark Rutland
When performing an exec*(), there's a transient period before the return to userspace where any stacktrace will result in a warning triggered by kunwind_next_frame_record_meta() encountering a struct frame_record_meta with an unknown type. This can be seen fairly reliably by enabling KASAN or KFENCE, e.g. | WARNING: CPU: 3 PID: 143 at arch/arm64/kernel/stacktrace.c:223 arch_stack_walk+0x264/0x3b0 | Modules linked in: | CPU: 3 UID: 0 PID: 143 Comm: login Not tainted 6.12.0-rc2-00010-g0f0b9a3f6a50 #1 | Hardware name: linux,dummy-virt (DT) | pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--) | pc : arch_stack_walk+0x264/0x3b0 | lr : arch_stack_walk+0x1ec/0x3b0 | sp : ffff80008060b970 | x29: ffff80008060ba10 x28: fff00000051133c0 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: fff000007fe84000 | x23: ffff9d1b3c940af0 x22: 0000000000000000 x21: fff00000051133c0 | x20: ffff80008060ba50 x19: ffff9d1b3c9408e0 x18: 0000000000000014 | x17: 000000006d50da47 x16: 000000008e3f265e x15: fff0000004e8bf40 | x14: 0000ffffc5e50e48 x13: 000000000000000f x12: 0000ffffc5e50fed | x11: 000000000000001f x10: 000018007f8bffff x9 : 0000000000000000 | x8 : ffff80008060b9c0 x7 : ffff80008060bfd8 x6 : ffff80008060ba80 | x5 : ffff80008060ba00 x4 : ffff80008060c000 x3 : ffff80008060bff0 | x2 : 0000000000000018 x1 : ffff80008060bfd8 x0 : 0000000000000000 | Call trace: | arch_stack_walk+0x264/0x3b0 (P) | arch_stack_walk+0x1ec/0x3b0 (L) | stack_trace_save+0x50/0x80 | metadata_update_state+0x98/0xa0 | kfence_guarded_free+0xec/0x2c4 | __kfence_free+0x50/0x100 | kmem_cache_free+0x1a4/0x37c | putname+0x9c/0xc0 | do_execveat_common.isra.0+0xf0/0x1e4 | __arm64_sys_execve+0x40/0x60 | invoke_syscall+0x48/0x104 | el0_svc_common.constprop.0+0x40/0xe0 | do_el0_svc+0x1c/0x28 | el0_svc+0x34/0xe0 | el0t_64_sync_handler+0x120/0x12c | el0t_64_sync+0x198/0x19c This happens because start_thread_common() zeroes the entirety of current_pt_regs(), including pt_regs::stackframe::type, changing this from FRAME_META_TYPE_FINAL to 0 and making the final record invalid. The stacktrace code will reject this until the next return to userspace, where a subsequent exception entry will reinitialize the type to FRAME_META_TYPE_FINAL. This zeroing wasn't a problem prior to commit: c2c6b27b5aa14fa2 ("arm64: stacktrace: unwind exception boundaries") ... as before that commit the stacktrace code only expected the final pt_regs::stackframe to contain zeroes, which was unchanged by start_thread_common(). A stacktrace could occur at any time, either due to instrumentation or an exception, and so start_thread_common() must ensure that pt_regs::stackframe is always valid. Fix this by changing the way start_thread_common() zeroes and reinitializes the pt_regs fields: * The '{regs,pc,pstate}' fields are initialized in one go via a struct assignment to the user_regs, with start_thread() and compat_start_thread() modified to pass 'pstate' into start_thread_common(). * The 'sp' and 'compat_sp' fields are zeroed by the struct assignment in start_thread_common(), and subsequently overwritten in start_thread() and compat_start_thread respectively, matching existing behaviour. * The 'syscallno' field is implicitly preserved while the 'orig_x0' field is explicitly zeroed, maintaining existing ABI. * The 'pmr' field is explicitly initialized, as necessary for an exec*() from a kernel thread, matching existing behaviour. * The 'stackframe' field is implicitly preserved, with a new comment and some assertions to ensure we don't accidentally break this in future. * All other fields are implicitly preserved, and should have no functional impact: - 'sdei_ttbr1' is only used for SDEI exception entry/exit, and we never exec*() inside an SDEI handler. - 'lockdep_hardirqs' and 'exit_rcu' are only used for EL1 exception entry/exit, and we never exec*() inside an EL1 exception handler. While updating compat_start_thread() to pass 'pstate' into start_thread_common(), I've also updated the logic to remove the ifdeffery, replacing: | #ifdef __AARCH64EB__ | regs->pstate |= PSR_AA32_E_BIT; | #endif ... with: | if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) | pstate |= PSR_AA32_E_BIT; ... which should be functionally equivalent, and matches our preferred code style. Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries") Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Mark Brown <broonie@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Puranjay Mohan <puranjay12@gmail.com> Cc: Will Deacon <will@kernel.org> Fixes: c2c6b27b5aa1 ("arm64: stacktrace: unwind exception boundaries") Tested-by: Puranjay Mohan <puranjay12@gmail.com> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20241021164456.2275285-1-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm fixes from Paolo Bonzini: "ARM64: - Fix the guest view of the ID registers, making the relevant fields writable from userspace (affecting ID_AA64DFR0_EL1 and ID_AA64PFR1_EL1) - Correcly expose S1PIE to guests, fixing a regression introduced in 6.12-rc1 with the S1POE support - Fix the recycling of stage-2 shadow MMUs by tracking the context (are we allowed to block or not) as well as the recycling state - Address a couple of issues with the vgic when userspace misconfigures the emulation, resulting in various splats. Headaches courtesy of our Syzkaller friends - Stop wasting space in the HYP idmap, as we are dangerously close to the 4kB limit, and this has already exploded in -next - Fix another race in vgic_init() - Fix a UBSAN error when faking the cache topology with MTE enabled RISCV: - RISCV: KVM: use raw_spinlock for critical section in imsic x86: - A bandaid for lack of XCR0 setup in selftests, which causes trouble if the compiler is configured to have x86-64-v3 (with AVX) as the default ISA. Proper XCR0 setup will come in the next merge window. - Fix an issue where KVM would not ignore low bits of the nested CR3 and potentially leak up to 31 bytes out of the guest memory's bounds - Fix case in which an out-of-date cached value for the segments could by returned by KVM_GET_SREGS. - More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL - Override MTRR state for KVM confidential guests, making it WB by default as is already the case for Hyper-V guests. Generic: - Remove a couple of unused functions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits) RISCV: KVM: use raw_spinlock for critical section in imsic KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups KVM: selftests: x86: Avoid using SSE/AVX instructions KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset() KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range() KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot x86/kvm: Override default caching mode for SEV-SNP and TDX KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic KVM: Remove unused kvm_vcpu_gfn_to_pfn KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS KVM: arm64: Fix shift-out-of-bounds bug KVM: arm64: Shave a few bytes from the EL2 idmap code KVM: arm64: Don't eagerly teardown the vgic on init error KVM: arm64: Expose S1PIE to guests KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule KVM: arm64: nv: Punt stage-2 recycling to a vCPU request KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed ...
2024-10-17arm64: Support AT_HWCAP3Mark Brown
We have filled all 64 bits of AT_HWCAP2 so in order to support discovery of further features provide the framework to use the already defined AT_HWCAP3 for further CPU features. Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20241004-arm64-elf-hwcap3-v2-2-799d1daad8b0@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: stacktrace: unwind exception boundariesMark Rutland
When arm64's stack unwinder encounters an exception boundary, it uses the pt_regs::stackframe created by the entry code, which has a copy of the PC and FP at the time the exception was taken. The unwinder doesn't know anything about pt_regs, and reports the PC from the stackframe, but does not report the LR. The LR is only guaranteed to contain the return address at function call boundaries, and can be used as a scratch register at other times, so the LR at an exception boundary may or may not be a legitimate return address. It would be useful to report the LR value regardless, as it can be helpful when debugging, and in future it will be helpful for reliable stacktrace support. This patch changes the way we unwind across exception boundaries, allowing both the PC and LR to be reported. The entry code creates a frame_record_meta structure embedded within pt_regs, which the unwinder uses to find the pt_regs. The unwinder can then extract pt_regs::pc and pt_regs::lr as two separate unwind steps before continuing with a regular walk of frame records. When a PC is unwound from pt_regs::lr, dump_backtrace() will log this with an "L" marker so that it can be identified easily. For example, an unwind across an exception boundary will appear as follows: | el1h_64_irq+0x6c/0x70 | _raw_spin_unlock_irqrestore+0x10/0x60 (P) | __aarch64_insn_write+0x6c/0x90 (L) | aarch64_insn_patch_text_nosync+0x28/0x80 ... with a (P) entry for pt_regs::pc, and an (L) entry for pt_regs:lr. Note that the LR may be stale at the point of the exception, for example, shortly after a return: | el1h_64_irq+0x6c/0x70 | default_idle_call+0x34/0x180 (P) | default_idle_call+0x28/0x180 (L) | do_idle+0x204/0x268 ... where the LR points a few instructions before the current PC. This plays nicely with all the other unwind metadata tracking. With the ftrace_graph profiler enabled globally, and kretprobes installed on generic_handle_domain_irq() and do_interrupt_handler(), a backtrace triggered by magic-sysrq + L reports: | Call trace: | show_stack+0x20/0x40 (CF) | dump_stack_lvl+0x60/0x80 (F) | dump_stack+0x18/0x28 | nmi_cpu_backtrace+0xfc/0x140 | nmi_trigger_cpumask_backtrace+0x1c8/0x200 | arch_trigger_cpumask_backtrace+0x20/0x40 | sysrq_handle_showallcpus+0x24/0x38 (F) | __handle_sysrq+0xa8/0x1b0 (F) | handle_sysrq+0x38/0x50 (F) | pl011_int+0x460/0x5a8 (F) | __handle_irq_event_percpu+0x60/0x220 (F) | handle_irq_event+0x54/0xc0 (F) | handle_fasteoi_irq+0xa8/0x1d0 (F) | generic_handle_domain_irq+0x34/0x58 (F) | gic_handle_irq+0x54/0x140 (FK) | call_on_irq_stack+0x24/0x58 (F) | do_interrupt_handler+0x88/0xa0 | el1_interrupt+0x34/0x68 (FK) | el1h_64_irq_handler+0x18/0x28 | el1h_64_irq+0x6c/0x70 | default_idle_call+0x34/0x180 (P) | default_idle_call+0x28/0x180 (L) | do_idle+0x204/0x268 | cpu_startup_entry+0x3c/0x50 (F) | rest_init+0xe4/0xf0 | start_kernel+0x744/0x750 | __primary_switched+0x88/0x98 Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241017092538.1859841-11-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: stacktrace: split unwind_consume_stack()Mark Rutland
When unwinding stacks, we use unwind_consume_stack() to both find whether an object (e.g. a frame record) is on an accessible stack *and* to update the stack boundaries. This works fine today since we only care about one type of object which does not overlap other objects. In subsequent patches we'll want to check whether an object (e.g a frame record) is on the stack and follow this up by accessing a larger object containing the first (e.g. a pt_regs with an embedded frame record). To make that pattern easier to implement, this patch reworks unwind_find_next_stack() and unwind_consume_stack() so that the former can be used to check if an object is on any accessible stack, and the latter is purely used to update the stack boundaries. As unwind_find_next_stack() is modified to also check the stack currently being unwound, it is renamed to unwind_find_stack(). There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241017092538.1859841-10-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: use a common struct frame_recordMark Rutland
Currently the signal handling code has its own struct frame_record, the definition of struct pt_regs open-codes a frame record as an array, and the kernel unwinder hard-codes frame record offsets. Move to a common struct frame_record that can be used throughout the kernel. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241017092538.1859841-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: pt_regs: swap 'unused' and 'pmr' fieldsMark Rutland
In subsequent patches we'll want to add an additional u64 to struct pt_regs. To make space, this patch swaps the 'unused' and 'pmr' fields, as the 'pmr' value only requires bits[7:0] and can safely fit into a u32, which frees up a 64-bit unused field. The 'lockdep_hardirqs' and 'exit_rcu' fields should eventually be moved out of pt_regs and managed locally within entry-common.c, so I've left those as-is for the moment. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241017092538.1859841-5-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: pt_regs: rename "pmr_save" -> "pmr"Mark Rutland
The pt_regs::pmr_save field is weirdly named relative to all other pt_regs fields, with a '_save' suffix that doesn't make anything clearer and only leads to more typing to access the field. Remove the '_save' suffix. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241017092538.1859841-4-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: pt_regs: remove stale big-endian layoutMark Rutland
For historical reasons the layout of struct pt_regs depends on the configured endianness, with the order of the 'syscallno' and 'unused2' fields varying dependent upon whether __AARCH64EB__ is defined. We no longer depend on the order of these two fields and can remove the ifdeffery. The current conditional layout was introduced in commit: 35d0e6fb4d219d64 ("arm64: syscallno is secretly an int, make it official") At the time, this was necessary so that the entry assembly could use a single STP instruction to save the pt_regs::{orig_x0,syscallno} fields, without logic that was conditional on the endianness of the kernel: | el0_svc_naked: | stp x0, xscno, [sp, #S_ORIG_X0] // save the original x0 and syscall number This logic was converted to C in commit: f37099b6992a0b81 ("arm64: convert syscall trace logic to C") Since that commit, we no longer manipulate pt_regs::orig_x0 from assembly, and only manipulate pt_regs::syscallno as a 32-bit quantity early in the kernel_entry assembly: | /* Not in a syscall by default (el0_svc overwrites for real syscall) */ | .if \el == 0 | mov w21, #NO_SYSCALL | str w21, [sp, #S_SYSCALLNO] | .endif Given the above, there's no longer a need for the layout of pt_regs::{syscallno,unused2} to depend on the endianness of the kernel. This patch removes the ifdeffery and places 'syscallno' before 'unused2' regardless of the endianess of the kernel. At the same time, 'unused2' is renamed to 'unused', as it is the only unused field within pt_regs. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241017092538.1859841-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: pt_regs: assert pt_regs is a multiple of 16 bytesMark Rutland
To ensure that the stack is correctly aligned when branching to C code, we require that struct pt_regs is a multiple of 16 bytes, as noted in a comment. Add an explicit assertion for this, so that any accidental violation of this requirement will be caught by the compiler. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Puranjay Mohan <puranjay12@gmail.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Madhavan T. Venkataraman <madvenka@linux.microsoft.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241017092538.1859841-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: - Disable software tag-based KASAN when compiling with GCC, as functions are incorrectly instrumented leading to a crash early during boot - Fix pkey configuration for kernel threads when POE is enabled - Fix invalid memory accesses in uprobes when targetting load-literal instructions * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: kasan: Disable Software Tag-Based KASAN with GCC Documentation/protection-keys: add AArch64 to documentation arm64: set POR_EL0 for kernel threads arm64: probes: Fix uprobes for big-endian kernels arm64: probes: Fix simulate_ldr*_literal() arm64: probes: Remove broken LDR (literal) uprobe support
2024-10-17arm64: mops: Handle MOPS exceptions from EL1Kristina Martsenko
We will soon be using MOPS instructions in the kernel, so wire up the exception handler to handle exceptions from EL1 caused by the copy/set operation being stopped and resumed on a different type of CPU. Add a helper for advancing the single step state machine, similarly to what the EL0 exception handler does. Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20240930161051.3777828-3-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17arm64: probes: Disable kprobes/uprobes on MOPS instructionsKristina Martsenko
FEAT_MOPS instructions require that all three instructions (prologue, main and epilogue) appear consecutively in memory. Placing a kprobe/uprobe on one of them doesn't work as only a single instruction gets executed out-of-line or simulated. So don't allow placing a probe on a MOPS instruction. Fixes: b7564127ffcb ("arm64: mops: detect and enable FEAT_MOPS") Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com> Link: https://lore.kernel.org/r/20240930161051.3777828-2-kristina.martsenko@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-17KVM: arm64: Shave a few bytes from the EL2 idmap codeMarc Zyngier
Our idmap is becoming too big, to the point where it doesn't fit in a 4kB page anymore. There are some low-hanging fruits though, such as the el2_init_state horror that is expanded 3 times in the kernel. Let's at least limit ourselves to two copies, which makes the kernel link again. At some point, we'll have to have a better way of doing this. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241009204903.GA3353168@thelio-3990X
2024-10-16arm64: head: Drop SWAPPER_TABLE_SHIFTGavin Shan
There is no users of SWAPPER_TABLE_SHIFT after commit 84b04d3e6bdb ("arm64: kernel: Create initial ID map from C code"). Just drop it. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20241014030341.995806-1-gshan@redhat.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-16arm64: cpufeature: add POE to cpucap_is_possible()Joey Gouly
Since de66cb37ab6 ("arm64: Add cpucap_is_possible()"), alternative_has_cap_unlikely() includes the IS_ENABLED() check. Add CONFIG_ARM64_POE to cpucap_is_possible() to avoid the explicit check. Signed-off-by: Joey Gouly <joey.gouly@arm.com> Cc: Will Deacon <will@kernel.org> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241008140121.2774348-1-joey.gouly@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-16hugetlb: arm64: add mte supportYang Shi
Enable MTE support for hugetlb. The MTE page flags will be set on the folio only. When copying hugetlb folio (for example, CoW), the tags for all subpages will be copied when copying the first subpage. When freeing hugetlb folio, the MTE flags will be cleared. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Yang Shi <yang@os.amperecomputing.com> Link: https://lore.kernel.org/r/20241001225220.271178-1-yang@os.amperecomputing.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-16arm64/mm: Change pgattr_change_is_safe() arguments as pteval_tAnshuman Khandual
pgattr_change_is_safe() processes two distinct page table entries that just happen to be 64 bits for all levels. This changes both arguments to reflect the actual data type being processed in the function. This change is important when moving to FEAT_D128 based 128 bit page tables because it makes it simple to change the entry size in one place. Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20241001045804.1119881-1-anshuman.khandual@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-16arm64: optimize flush tlb kernel rangeKefeng Wang
Currently the kernel TLBs is flushed page by page if the target VA range is less than MAX_DVM_OPS * PAGE_SIZE, otherwise we'll brutally issue a TLBI ALL. But we could optimize it when CPU supports TLB range operations, convert to use __flush_tlb_range_op() like other tlb range flush to improve performance. Co-developed-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20240923131351.713304-3-wangkefeng.wang@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-16arm64: tlbflush: add __flush_tlb_range_limit_excess()Kefeng Wang
The __flush_tlb_range_limit_excess() helper will be used when flush tlb kernel range soon. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/r/20240923131351.713304-2-wangkefeng.wang@huawei.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-16vdso: Introduce vdso/page.hVincenzo Frascino
The VDSO implementation includes headers from outside of the vdso/ namespace. Introduce vdso/page.h to make sure that the generic library uses only the allowed namespace. Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Link: https://lore.kernel.org/all/20241014151340.1639555-3-vincenzo.frascino@arm.com
2024-10-15arm64: insn: Simulate nop instruction for better uprobe performanceLiao Chang
v2->v1: 1. Remove the simuation of STP and the related bits. 2. Use arm64_skip_faulting_instruction for single-stepping or FEAT_BTI scenario. As Andrii pointed out, the uprobe/uretprobe selftest bench run into a counterintuitive result that nop and push variants are much slower than ret variant [0]. The root cause lies in the arch_probe_analyse_insn(), which excludes 'nop' and 'stp' from the emulatable instructions list. This force the kernel returns to userspace and execute them out-of-line, then trapping back to kernel for running uprobe callback functions. This leads to a significant performance overhead compared to 'ret' variant, which is already emulated. Typicall uprobe is installed on 'nop' for USDT and on function entry which starts with the instrucion 'stp x29, x30, [sp, #imm]!' to push lr and fp into stack regardless kernel or userspace binary. In order to improve the performance of handling uprobe for common usecases. This patch supports the emulation of Arm64 equvialents instructions of 'nop' and 'push'. The benchmark results below indicates the performance gain of emulation is obvious. On Kunpeng916 (Hi1616), 4 NUMA nodes, 64 Arm64 cores@2.4GHz. xol (1 cpus) ------------ uprobe-nop: 0.916 ± 0.001M/s (0.916M/prod) uprobe-push: 0.908 ± 0.001M/s (0.908M/prod) uprobe-ret: 1.855 ± 0.000M/s (1.855M/prod) uretprobe-nop: 0.640 ± 0.000M/s (0.640M/prod) uretprobe-push: 0.633 ± 0.001M/s (0.633M/prod) uretprobe-ret: 0.978 ± 0.003M/s (0.978M/prod) emulation (1 cpus) ------------------- uprobe-nop: 1.862 ± 0.002M/s (1.862M/prod) uprobe-push: 1.743 ± 0.006M/s (1.743M/prod) uprobe-ret: 1.840 ± 0.001M/s (1.840M/prod) uretprobe-nop: 0.964 ± 0.004M/s (0.964M/prod) uretprobe-push: 0.936 ± 0.004M/s (0.936M/prod) uretprobe-ret: 0.940 ± 0.001M/s (0.940M/prod) As shown above, the performance gap between 'nop/push' and 'ret' variants has been significantly reduced. Due to the emulation of 'push' instruction needs to access userspace memory, it spent more cycles than the other. As Mark suggested [1], it is painful to emulate the correct atomicity and ordering properties of STP, especially when it interacts with MTE, POE, etc. So this patch just focus on the simuation of 'nop'. The simluation of STP and related changes will be addressed in a separate patch. [0] https://lore.kernel.org/all/CAEf4BzaO4eG6hr2hzXYpn+7Uer4chS0R99zLn02ezZ5YruVuQw@mail.gmail.com/ [1] https://lore.kernel.org/all/Zr3RN4zxF5XPgjEB@J2N7QTR9R3/ CC: Andrii Nakryiko <andrii.nakryiko@gmail.com> CC: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Liao Chang <liaochang1@huawei.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/r/20240909071114.1150053-1-liaochang1@huawei.com [catalin.marinas@arm.com: small tweaks following MarkR's comments] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-15arm64: asm-offsets: remove VMA_VM_*Mark Rutland
The VMA_VM_MM definition is only used by the vma_vm_mm macro, which itself is unused. The VMA_VM_FLAGS definition isn't used anywhere. Remove them all. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241007123921.549340-3-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-15arm64: probes: Remove probe_opcode_tMark Rutland
The probe_opcode_t typedef for u32 isn't necessary, and is a source of confusion as it is easily confused with kprobe_opcode_t, which is a typedef for __le32. The typedef is only used within arch/arm64, and all of arm64's commn insn code uses u32 for the endian-agnostic value of an instruction, so it'd be clearer to use u32 consistently. Remove probe_opcode_t and use u32 directly. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marnias@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241008155851.801546-7-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-15arm64: probes: Cleanup kprobes endianness conversionsMark Rutland
The core kprobes code uses kprobe_opcode_t for the in-memory representation of an instruction, using 'kprobe_opcode_t *' for XOL slots. As arm64 instructions are always little-endian 32-bit values, kprobes_opcode_t should be __le32, but at the moment kprobe_opcode_t is typedef'd to u32. Today there is no functional issue as we convert values via cpu_to_le32() and le32_to_cpu() where necessary, but these conversions are inconsistent with the types used, causing sparse warnings: | CHECK arch/arm64/kernel/probes/kprobes.c | arch/arm64/kernel/probes/kprobes.c:102:21: warning: cast to restricted __le32 | CHECK arch/arm64/kernel/probes/decode-insn.c | arch/arm64/kernel/probes/decode-insn.c:122:46: warning: cast to restricted __le32 | arch/arm64/kernel/probes/decode-insn.c:124:50: warning: cast to restricted __le32 | arch/arm64/kernel/probes/decode-insn.c:136:31: warning: cast to restricted __le32 Improve this by making kprobes_opcode_t a typedef for __le32 and consistently using this for pointers to executable instructions. With this change we can rely on the type system to tell us where conversions are necessary. Since kprobe::opcode is changed from u32 to __le32, the existing le32_to_cpu() converion moves from the point this is initialized (in arch_prepare_kprobe()) to the points this is consumed when passed to a handler or text patching function. As kprobe::opcode isn't altered or consumed elsewhere, this shouldn't result in a functional change. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241008155851.801546-6-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-15arm64: probes: Move kprobes-specific fieldsMark Rutland
We share struct arch_probe_insn between krpboes and uprobes, but most of its fields aren't necessary for uprobes: * The 'insn' field is only used by kprobes as a pointer to the XOL slot. * The 'restore' field is only used by probes as the PC to restore after stepping an instruction in the XOL slot. * The 'pstate_cc' field isn't used by kprobes or uprobes, and seems to only exist as a result of copy-pasting the 32-bit arm implementation of kprobes. As these fields live in struct arch_probe_insn they cannot use definitions that only exist when CONFIG_KPROBES=y, such as the kprobe_opcode_t typedef, which we'd like to use in subsequent patches. Clean this up by removing the 'pstate_cc' field, and moving the kprobes-specific fields into the kprobes-specific struct arch_specific_insn. To make it clear that the fields are related to stepping instructions in the XOL slot, 'insn' is renamed to 'xol_insn' and 'restore' is renamed to 'xol_restore' At the same time, remove the misleading and useless comment above struct arch_probe_insn. The should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20241008155851.801546-5-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-15vdso: Remove timekeeper argument of __arch_update_vsyscall()Thomas Weißschuh
No implementation of this hook uses the passed in timekeeper anymore. This avoids including a non-VDSO header while building the VDSO, which can lead to compilation errors. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/all/20241010-vdso-generic-arch_update_vsyscall-v1-1-7fe5a3ea4382@linutronix.de
2024-10-15ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regsSteven Rostedt
Most architectures use pt_regs within ftrace_regs making a lot of the accessor functions just calls to the pt_regs internally. Instead of duplication this effort, use a HAVE_ARCH_FTRACE_REGS for architectures that have their own ftrace_regs that is not based on pt_regs and will define all the accessor functions, and for the architectures that just use pt_regs, it will leave it undefined, and the default accessor functions will be used. Note, this will also make it easier to add new accessor functions to ftrace_regs as it will mean having to touch less architectures. Cc: <linux-arch@vger.kernel.org> Cc: "x86@kernel.org" <x86@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/20241010202114.2289f6fd@gandalf.local.home Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> # powerpc Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-10-10ftrace: Make ftrace_regs abstract from direct useSteven Rostedt
ftrace_regs was created to hold registers that store information to save function parameters, return value and stack. Since it is a subset of pt_regs, it should only be used by its accessor functions. But because pt_regs can easily be taken from ftrace_regs (on most archs), it is tempting to use it directly. But when running on other architectures, it may fail to build or worse, build but crash the kernel! Instead, make struct ftrace_regs an empty structure and have the architectures define __arch_ftrace_regs and all the accessor functions will typecast to it to get to the actual fields. This will help avoid usage of ftrace_regs directly. Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/ Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "x86@kernel.org" <x86@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>