summaryrefslogtreecommitdiff
path: root/arch/x86/entry/entry_fred.c
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2024-03-11 16:00:17 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2024-03-11 16:00:17 -0700
commit720c857907530e6cdc86c9bc1102ea6b372fbfb6 (patch)
tree03f492c411e076f009d4daf7b9755ded20756347 /arch/x86/entry/entry_fred.c
parentca7e917769121195bae45d4886f6e24efd6f99ae (diff)
parentc416b5bac6ad6ffe21e36225553b82ff2ec1558c (diff)
Merge tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 FRED support from Thomas Gleixner: "Support for x86 Fast Return and Event Delivery (FRED). FRED is a replacement for IDT event delivery on x86 and addresses most of the technical nightmares which IDT exposes: 1) Exception cause registers like CR2 need to be manually preserved in nested exception scenarios. 2) Hardware interrupt stack switching is suboptimal for nested exceptions as the interrupt stack mechanism rewinds the stack on each entry which requires a massive effort in the low level entry of #NMI code to handle this. 3) No hardware distinction between entry from kernel or from user which makes establishing kernel context more complex than it needs to be especially for unconditionally nestable exceptions like NMI. 4) NMI nesting caused by IRET unconditionally reenabling NMIs, which is a problem when the perf NMI takes a fault when collecting a stack trace. 5) Partial restore of ESP when returning to a 16-bit segment 6) Limitation of the vector space which can cause vector exhaustion on large systems. 7) Inability to differentiate NMI sources FRED addresses these shortcomings by: 1) An extended exception stack frame which the CPU uses to save exception cause registers. This ensures that the meta information for each exception is preserved on stack and avoids the extra complexity of preserving it in software. 2) Hardware interrupt stack switching is non-rewinding if a nested exception uses the currently interrupt stack. 3) The entry points for kernel and user context are separate and GS BASE handling which is required to establish kernel context for per CPU variable access is done in hardware. 4) NMIs are now nesting protected. They are only reenabled on the return from NMI. 5) FRED guarantees full restore of ESP 6) FRED does not put a limitation on the vector space by design because it uses a central entry points for kernel and user space and the CPUstores the entry type (exception, trap, interrupt, syscall) on the entry stack along with the vector number. The entry code has to demultiplex this information, but this removes the vector space restriction. The first hardware implementations will still have the current restricted vector space because lifting this limitation requires further changes to the local APIC. 7) FRED stores the vector number and meta information on stack which allows having more than one NMI vector in future hardware when the required local APIC changes are in place. The series implements the initial FRED support by: - Reworking the existing entry and IDT handling infrastructure to accomodate for the alternative entry mechanism. - Expanding the stack frame to accomodate for the extra 16 bytes FRED requires to store context and meta information - Providing FRED specific C entry points for events which have information pushed to the extended stack frame, e.g. #PF and #DB. - Providing FRED specific C entry points for #NMI and #MCE - Implementing the FRED specific ASM entry points and the C code to demultiplex the events - Providing detection and initialization mechanisms and the necessary tweaks in context switching, GS BASE handling etc. The FRED integration aims for maximum code reuse vs the existing IDT implementation to the extent possible and the deviation in hot paths like context switching are handled with alternatives to minimalize the impact. The low level entry and exit paths are seperate due to the extended stack frame and the hardware based GS BASE swichting and therefore have no impact on IDT based systems. It has been extensively tested on existing systems and on the FRED simulation and as of now there are no outstanding problems" * tag 'x86-fred-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits) x86/fred: Fix init_task thread stack pointer initialization MAINTAINERS: Add a maintainer entry for FRED x86/fred: Fix a build warning with allmodconfig due to 'inline' failing to inline properly x86/fred: Invoke FRED initialization code to enable FRED x86/fred: Add FRED initialization functions x86/syscall: Split IDT syscall setup code into idt_syscall_init() KVM: VMX: Call fred_entry_from_kvm() for IRQ/NMI handling x86/entry: Add fred_entry_from_kvm() for VMX to handle IRQ/NMI x86/entry/calling: Allow PUSH_AND_CLEAR_REGS being used beyond actual entry code x86/fred: Fixup fault on ERETU by jumping to fred_entrypoint_user x86/fred: Let ret_from_fork_asm() jmp to asm_fred_exit_user when FRED is enabled x86/traps: Add sysvec_install() to install a system interrupt handler x86/fred: FRED entry/exit and dispatch code x86/fred: Add a machine check entry stub for FRED x86/fred: Add a NMI entry stub for FRED x86/fred: Add a debug fault entry stub for FRED x86/idtentry: Incorporate definitions/declarations of the FRED entries x86/fred: Make exc_page_fault() work for FRED x86/fred: Allow single-step trap and NMI when starting a new task x86/fred: No ESPFIX needed when FRED is enabled ...
Diffstat (limited to 'arch/x86/entry/entry_fred.c')
-rw-r--r--arch/x86/entry/entry_fred.c294
1 files changed, 294 insertions, 0 deletions
diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
new file mode 100644
index 000000000000..ac120cbdaaf2
--- /dev/null
+++ b/arch/x86/entry/entry_fred.c
@@ -0,0 +1,294 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * The FRED specific kernel/user entry functions which are invoked from
+ * assembly code and dispatch to the associated handlers.
+ */
+#include <linux/kernel.h>
+#include <linux/kdebug.h>
+#include <linux/nospec.h>
+
+#include <asm/desc.h>
+#include <asm/fred.h>
+#include <asm/idtentry.h>
+#include <asm/syscall.h>
+#include <asm/trapnr.h>
+#include <asm/traps.h>
+
+/* FRED EVENT_TYPE_OTHER vector numbers */
+#define FRED_SYSCALL 1
+#define FRED_SYSENTER 2
+
+static noinstr void fred_bad_type(struct pt_regs *regs, unsigned long error_code)
+{
+ irqentry_state_t irq_state = irqentry_nmi_enter(regs);
+
+ instrumentation_begin();
+
+ /* Panic on events from a high stack level */
+ if (regs->fred_cs.sl > 0) {
+ pr_emerg("PANIC: invalid or fatal FRED event; event type %u "
+ "vector %u error 0x%lx aux 0x%lx at %04x:%016lx\n",
+ regs->fred_ss.type, regs->fred_ss.vector, regs->orig_ax,
+ fred_event_data(regs), regs->cs, regs->ip);
+ die("invalid or fatal FRED event", regs, regs->orig_ax);
+ panic("invalid or fatal FRED event");
+ } else {
+ unsigned long flags = oops_begin();
+ int sig = SIGKILL;
+
+ pr_alert("BUG: invalid or fatal FRED event; event type %u "
+ "vector %u error 0x%lx aux 0x%lx at %04x:%016lx\n",
+ regs->fred_ss.type, regs->fred_ss.vector, regs->orig_ax,
+ fred_event_data(regs), regs->cs, regs->ip);
+
+ if (__die("Invalid or fatal FRED event", regs, regs->orig_ax))
+ sig = 0;
+
+ oops_end(flags, regs, sig);
+ }
+
+ instrumentation_end();
+ irqentry_nmi_exit(regs, irq_state);
+}
+
+static noinstr void fred_intx(struct pt_regs *regs)
+{
+ switch (regs->fred_ss.vector) {
+ /* Opcode 0xcd, 0x3, NOT INT3 (opcode 0xcc) */
+ case X86_TRAP_BP:
+ return exc_int3(regs);
+
+ /* Opcode 0xcd, 0x4, NOT INTO (opcode 0xce) */
+ case X86_TRAP_OF:
+ return exc_overflow(regs);
+
+#ifdef CONFIG_IA32_EMULATION
+ /* INT80 */
+ case IA32_SYSCALL_VECTOR:
+ if (ia32_enabled())
+ return int80_emulation(regs);
+ fallthrough;
+#endif
+
+ default:
+ return exc_general_protection(regs, 0);
+ }
+}
+
+static __always_inline void fred_other(struct pt_regs *regs)
+{
+ /* The compiler can fold these conditions into a single test */
+ if (likely(regs->fred_ss.vector == FRED_SYSCALL && regs->fred_ss.lm)) {
+ regs->orig_ax = regs->ax;
+ regs->ax = -ENOSYS;
+ do_syscall_64(regs, regs->orig_ax);
+ return;
+ } else if (ia32_enabled() &&
+ likely(regs->fred_ss.vector == FRED_SYSENTER && !regs->fred_ss.lm)) {
+ regs->orig_ax = regs->ax;
+ regs->ax = -ENOSYS;
+ do_fast_syscall_32(regs);
+ return;
+ } else {
+ exc_invalid_op(regs);
+ return;
+ }
+}
+
+#define SYSVEC(_vector, _function) [_vector - FIRST_SYSTEM_VECTOR] = fred_sysvec_##_function
+
+static idtentry_t sysvec_table[NR_SYSTEM_VECTORS] __ro_after_init = {
+ SYSVEC(ERROR_APIC_VECTOR, error_interrupt),
+ SYSVEC(SPURIOUS_APIC_VECTOR, spurious_apic_interrupt),
+ SYSVEC(LOCAL_TIMER_VECTOR, apic_timer_interrupt),
+ SYSVEC(X86_PLATFORM_IPI_VECTOR, x86_platform_ipi),
+
+ SYSVEC(RESCHEDULE_VECTOR, reschedule_ipi),
+ SYSVEC(CALL_FUNCTION_SINGLE_VECTOR, call_function_single),
+ SYSVEC(CALL_FUNCTION_VECTOR, call_function),
+ SYSVEC(REBOOT_VECTOR, reboot),
+
+ SYSVEC(THRESHOLD_APIC_VECTOR, threshold),
+ SYSVEC(DEFERRED_ERROR_VECTOR, deferred_error),
+ SYSVEC(THERMAL_APIC_VECTOR, thermal),
+
+ SYSVEC(IRQ_WORK_VECTOR, irq_work),
+
+ SYSVEC(POSTED_INTR_VECTOR, kvm_posted_intr_ipi),
+ SYSVEC(POSTED_INTR_WAKEUP_VECTOR, kvm_posted_intr_wakeup_ipi),
+ SYSVEC(POSTED_INTR_NESTED_VECTOR, kvm_posted_intr_nested_ipi),
+};
+
+static bool fred_setup_done __initdata;
+
+void __init fred_install_sysvec(unsigned int sysvec, idtentry_t handler)
+{
+ if (WARN_ON_ONCE(sysvec < FIRST_SYSTEM_VECTOR))
+ return;
+
+ if (WARN_ON_ONCE(fred_setup_done))
+ return;
+
+ if (!WARN_ON_ONCE(sysvec_table[sysvec - FIRST_SYSTEM_VECTOR]))
+ sysvec_table[sysvec - FIRST_SYSTEM_VECTOR] = handler;
+}
+
+static noinstr void fred_handle_spurious_interrupt(struct pt_regs *regs)
+{
+ spurious_interrupt(regs, regs->fred_ss.vector);
+}
+
+void __init fred_complete_exception_setup(void)
+{
+ unsigned int vector;
+
+ for (vector = 0; vector < FIRST_EXTERNAL_VECTOR; vector++)
+ set_bit(vector, system_vectors);
+
+ for (vector = 0; vector < NR_SYSTEM_VECTORS; vector++) {
+ if (sysvec_table[vector])
+ set_bit(vector + FIRST_SYSTEM_VECTOR, system_vectors);
+ else
+ sysvec_table[vector] = fred_handle_spurious_interrupt;
+ }
+ fred_setup_done = true;
+}
+
+static noinstr void fred_extint(struct pt_regs *regs)
+{
+ unsigned int vector = regs->fred_ss.vector;
+ unsigned int index = array_index_nospec(vector - FIRST_SYSTEM_VECTOR,
+ NR_SYSTEM_VECTORS);
+
+ if (WARN_ON_ONCE(vector < FIRST_EXTERNAL_VECTOR))
+ return;
+
+ if (likely(vector >= FIRST_SYSTEM_VECTOR)) {
+ irqentry_state_t state = irqentry_enter(regs);
+
+ instrumentation_begin();
+ sysvec_table[index](regs);
+ instrumentation_end();
+ irqentry_exit(regs, state);
+ } else {
+ common_interrupt(regs, vector);
+ }
+}
+
+static noinstr void fred_hwexc(struct pt_regs *regs, unsigned long error_code)
+{
+ /* Optimize for #PF. That's the only exception which matters performance wise */
+ if (likely(regs->fred_ss.vector == X86_TRAP_PF))
+ return exc_page_fault(regs, error_code);
+
+ switch (regs->fred_ss.vector) {
+ case X86_TRAP_DE: return exc_divide_error(regs);
+ case X86_TRAP_DB: return fred_exc_debug(regs);
+ case X86_TRAP_BR: return exc_bounds(regs);
+ case X86_TRAP_UD: return exc_invalid_op(regs);
+ case X86_TRAP_NM: return exc_device_not_available(regs);
+ case X86_TRAP_DF: return exc_double_fault(regs, error_code);
+ case X86_TRAP_TS: return exc_invalid_tss(regs, error_code);
+ case X86_TRAP_NP: return exc_segment_not_present(regs, error_code);
+ case X86_TRAP_SS: return exc_stack_segment(regs, error_code);
+ case X86_TRAP_GP: return exc_general_protection(regs, error_code);
+ case X86_TRAP_MF: return exc_coprocessor_error(regs);
+ case X86_TRAP_AC: return exc_alignment_check(regs, error_code);
+ case X86_TRAP_XF: return exc_simd_coprocessor_error(regs);
+
+#ifdef CONFIG_X86_MCE
+ case X86_TRAP_MC: return fred_exc_machine_check(regs);
+#endif
+#ifdef CONFIG_INTEL_TDX_GUEST
+ case X86_TRAP_VE: return exc_virtualization_exception(regs);
+#endif
+#ifdef CONFIG_X86_CET
+ case X86_TRAP_CP: return exc_control_protection(regs, error_code);
+#endif
+ default: return fred_bad_type(regs, error_code);
+ }
+
+}
+
+static noinstr void fred_swexc(struct pt_regs *regs, unsigned long error_code)
+{
+ switch (regs->fred_ss.vector) {
+ case X86_TRAP_BP: return exc_int3(regs);
+ case X86_TRAP_OF: return exc_overflow(regs);
+ default: return fred_bad_type(regs, error_code);
+ }
+}
+
+__visible noinstr void fred_entry_from_user(struct pt_regs *regs)
+{
+ unsigned long error_code = regs->orig_ax;
+
+ /* Invalidate orig_ax so that syscall_get_nr() works correctly */
+ regs->orig_ax = -1;
+
+ switch (regs->fred_ss.type) {
+ case EVENT_TYPE_EXTINT:
+ return fred_extint(regs);
+ case EVENT_TYPE_NMI:
+ if (likely(regs->fred_ss.vector == X86_TRAP_NMI))
+ return fred_exc_nmi(regs);
+ break;
+ case EVENT_TYPE_HWEXC:
+ return fred_hwexc(regs, error_code);
+ case EVENT_TYPE_SWINT:
+ return fred_intx(regs);
+ case EVENT_TYPE_PRIV_SWEXC:
+ if (likely(regs->fred_ss.vector == X86_TRAP_DB))
+ return fred_exc_debug(regs);
+ break;
+ case EVENT_TYPE_SWEXC:
+ return fred_swexc(regs, error_code);
+ case EVENT_TYPE_OTHER:
+ return fred_other(regs);
+ default: break;
+ }
+
+ return fred_bad_type(regs, error_code);
+}
+
+__visible noinstr void fred_entry_from_kernel(struct pt_regs *regs)
+{
+ unsigned long error_code = regs->orig_ax;
+
+ /* Invalidate orig_ax so that syscall_get_nr() works correctly */
+ regs->orig_ax = -1;
+
+ switch (regs->fred_ss.type) {
+ case EVENT_TYPE_EXTINT:
+ return fred_extint(regs);
+ case EVENT_TYPE_NMI:
+ if (likely(regs->fred_ss.vector == X86_TRAP_NMI))
+ return fred_exc_nmi(regs);
+ break;
+ case EVENT_TYPE_HWEXC:
+ return fred_hwexc(regs, error_code);
+ case EVENT_TYPE_PRIV_SWEXC:
+ if (likely(regs->fred_ss.vector == X86_TRAP_DB))
+ return fred_exc_debug(regs);
+ break;
+ case EVENT_TYPE_SWEXC:
+ return fred_swexc(regs, error_code);
+ default: break;
+ }
+
+ return fred_bad_type(regs, error_code);
+}
+
+#if IS_ENABLED(CONFIG_KVM_INTEL)
+__visible noinstr void __fred_entry_from_kvm(struct pt_regs *regs)
+{
+ switch (regs->fred_ss.type) {
+ case EVENT_TYPE_EXTINT:
+ return fred_extint(regs);
+ case EVENT_TYPE_NMI:
+ return fred_exc_nmi(regs);
+ default:
+ WARN_ON_ONCE(1);
+ }
+}
+#endif