Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Although some more stuff is brewing, the EFI changes that are ready
for mainline are few this cycle:
- improve the PCI DMA paranoia logic in the EFI stub
- some constification changes
- add statfs support to efivarfs
- allow user space to enumerate updatable firmware resources without
CAP_SYS_ADMIN"
* tag 'efi-next-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
efi/libstub: Disable PCI DMA before grabbing the EFI memory map
efi/esrt: Allow ESRT access without CAP_SYS_ADMIN
efivarfs: expose used and total size
efi: make kobj_type structure constant
efi: x86: make kobj_type structure constant
|
|
Currently, the EFI stub will disable PCI DMA as the very last thing it
does before calling ExitBootServices(), to avoid interfering with the
firmware's normal operation as much as possible.
However, the stub will invoke DisconnectController() on all endpoints
downstream of the PCI bridges it disables, and this may affect the
layout of the EFI memory map, making it substantially more likely that
ExitBootServices() will fail the first time around, and that the EFI
memory map needs to be reloaded.
This, in turn, increases the likelihood that the slack space we
allocated is insufficient (and we can no longer allocate memory via boot
services after having called ExitBootServices() once), causing the
second call to GetMemoryMap (and therefore the boot) to fail. This makes
the PCI DMA disable feature a bit more fragile than it already is, so
let's make it more robust, by allocating the space for the EFI memory
map after disabling PCI DMA.
Fixes: 4444f8541dad16fe ("efi: Allow disabling PCI busmastering on bridges during boot")
Reported-by: Glenn Washburn <development@efficientek.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The UEFI v2.9 specification includes a new memory type to be used in
environments where the OS must accept memory that is provided from its
host. Before the introduction of this memory type, all memory was
accepted eagerly in the firmware. In order for the firmware to safely
stop accepting memory on the OS's behalf, the OS must affirmatively
indicate support to the firmware. This is only a problem for AMD
SEV-SNP, since Linux has had support for it since 5.19. The other
technology that can make use of unaccepted memory, Intel TDX, does not
yet have Linux support, so it can strictly require unaccepted memory
support as a dependency of CONFIG_TDX and not require communication with
the firmware.
Enabling unaccepted memory requires calling a 0-argument enablement
protocol before ExitBootServices. This call is only made if the kernel
is compiled with UNACCEPTED_MEMORY=y
This protocol will be removed after the end of life of the first LTS
that includes it, in order to give firmware implementations an
expiration date for it. When the protocol is removed, firmware will
strictly infer that a SEV-SNP VM is running an OS that supports the
unaccepted memory type. At the earliest convenience, when unaccepted
memory support is added to Linux, SEV-SNP may take strict dependence in
it. After the firmware removes support for the protocol, this should be
reverted.
[tl: address some checkscript warnings]
Signed-off-by: Dionna Glaze <dionnaglaze@google.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/0d5f3d9a20b5cf361945b7ab1263c36586a78a42.1686063086.git.thomas.lendacky@amd.com
|
|
UEFI Specification version 2.9 introduces the concept of memory
acceptance: Some Virtual Machine platforms, such as Intel TDX or AMD
SEV-SNP, requiring memory to be accepted before it can be used by the
guest. Accepting happens via a protocol specific for the Virtual
Machine platform.
Accepting memory is costly and it makes VMM allocate memory for the
accepted guest physical address range. It's better to postpone memory
acceptance until memory is needed. It lowers boot time and reduces
memory overhead.
The kernel needs to know what memory has been accepted. Firmware
communicates this information via memory map: a new memory type --
EFI_UNACCEPTED_MEMORY -- indicates such memory.
Range-based tracking works fine for firmware, but it gets bulky for
the kernel: e820 (or whatever the arch uses) has to be modified on every
page acceptance. It leads to table fragmentation and there's a limited
number of entries in the e820 table.
Another option is to mark such memory as usable in e820 and track if the
range has been accepted in a bitmap. One bit in the bitmap represents a
naturally aligned power-2-sized region of address space -- unit.
For x86, unit size is 2MiB: 4k of the bitmap is enough to track 64GiB or
physical address space.
In the worst-case scenario -- a huge hole in the middle of the
address space -- It needs 256MiB to handle 4PiB of the address
space.
Any unaccepted memory that is not aligned to unit_size gets accepted
upfront.
The bitmap is allocated and constructed in the EFI stub and passed down
to the kernel via EFI configuration table. allocate_e820() allocates the
bitmap if unaccepted memory is present, according to the size of
unaccepted region.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230606142637.5171-4-kirill.shutemov@linux.intel.com
|
|
Currently allocate_e820() is only interested in the size of map and size
of memory descriptor to determine how many e820 entries the kernel
needs.
UEFI Specification version 2.9 introduces a new memory type --
unaccepted memory. To track unaccepted memory, the kernel needs to
allocate a bitmap. The size of the bitmap is dependent on the maximum
physical address present in the system. A full memory map is required to
find the maximum address.
Modify allocate_e820() to get a full memory map.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230606142637.5171-3-kirill.shutemov@linux.intel.com
|
|
The cper.c file needs to include an extra header, and efi_zboot_entry
needs an extern declaration to avoid these 'make W=1' warnings:
drivers/firmware/efi/libstub/zboot.c:65:1: error: no previous prototype for 'efi_zboot_entry' [-Werror=missing-prototypes]
drivers/firmware/efi/efi.c:176:16: error: no previous prototype for 'efi_attr_is_visible' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:626:6: error: no previous prototype for 'cper_estatus_print' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:649:5: error: no previous prototype for 'cper_estatus_check_header' [-Werror=missing-prototypes]
drivers/firmware/efi/cper.c:662:5: error: no previous prototype for 'cper_estatus_check' [-Werror=missing-prototypes]
To make this easier, move the cper specific declarations to
include/linux/cper.h.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The Make variable containing the objcopy flags may be constructed from
the output of build tools operating on build artifacts, and these may
not exist when doing a make clean.
So avoid evaluating them eagerly, to prevent spurious build warnings.
Suggested-by: Pedro Falcato <pedro.falcato@gmail.com>
Tested-by: Alan Bartlett <ajb@elrepo.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Instead of relying on a dodgy dd hack to copy the image code size from
the uncompressed image's PE header to the end of the compressed image,
let's grab the code size from the symbol that is injected into the ELF
object by the Kbuild rules that generate the compressed payload.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
|
|
The EFI zboot code is not built as part of the kernel proper, like the
ordinary EFI stub, but still needs access to symbols that are defined
only internally in the kernel, and are left unexposed deliberately to
avoid creating ABI inadvertently that we're stuck with later.
So capture the kernel code size of the kernel image, and inject it as an
ELF symbol into the object that contains the compressed payload, where
it will be accessible to zboot code that needs it.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
|
|
Add some plumbing to the zboot EFI header generation to set the newly
introduced DllCharacteristicsEx flag associated with forward edge CFI
enforcement instructions (BTI on arm64, IBT on x86)
x86 does not currently uses the zboot infrastructure, so let's wire it
up only for arm64.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
We don't really care about the size of the decompressed image - what
matters is how much space needs to be allocated for the image to
execute, and this includes space for BSS that is not part of the
loadable image and so it is not accounted for in the decompressed size.
So let's add some zero padding to the end of the image: this compresses
well, and it ensures that BSS is accounted for, and as a bonus, it will
be zeroed before launching the image.
Since all architectures that implement support for EFI zboot carry this
value in the header in the same location, we can just grab it from the
binary that is being compressed.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
UEFI heavily relies on so-called protocols, which are essentially
tables populated with pointers to executable code, and these are invoked
indirectly using BR or BLR instructions.
This makes the EFI execution context vulnerable to attacks on forward
edge control flow, and so it would help if we could enable hardware
enforcement (BTI) on CPUs that implement it.
So let's no longer disable BTI codegen for the EFI stub, and set the
newly introduced PE/COFF header flag when the kernel is built with BTI
landing pads.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
|
|
Since Linux-6.3, LoongArch supports PIE kernel now, so let's reintroduce
efi_relocate_kernel() to relocate the core kernel.
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The logic in efi_random_alloc() will iterate over the memory map twice,
once to count the number of candidate slots, and another time to locate
the chosen slot after randomization.
If there is insufficient memory to do the allocation, the second loop
will run to completion without actually having located a slot, but we
currently return EFI_SUCCESS in this case, as we fail to initialize
status to the appropriate error value of EFI_OUT_OF_RESOURCES.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
In some cases, we expose the kernel's struct screen_info to the EFI stub
directly, so it gets populated before even entering the kernel. This
means the early console is available as soon as the early param parsing
happens, which is nice. It also means we need two different ways to pass
this information, as this trick only works if the EFI stub is baked into
the core kernel image, which is not always the case.
Huacai reports that the preparatory refactoring that was needed to
implement this alternative method for zboot resulted in a non-functional
efifb earlycon for other cases as well, due to the reordering of the
kernel image relocation with the population of the screen_info struct,
and the latter now takes place after copying the image to its new
location, which means we copy the old, uninitialized state.
So let's ensure that the same-image version of alloc_screen_info()
produces the correct screen_info pointer, by taking the displacement of
the loaded image into account.
Reported-by: Huacai Chen <chenhuacai@loongson.cn>
Tested-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/linux-efi/20230310021749.921041-1-chenhuacai@loongson.cn/
Fixes: 42c8ea3dca094ab8 ("efi: libstub: Factor out EFI stub entrypoint into separate file")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Avoid needlessly rebuilding the compressed image by adding the file
'vmlinuz' to the 'targets' Kbuild make variable.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
We no longer use the recsize argument for locating the string table in
an SMBIOS record, so we can drop it from the internal API.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Instead of using the SMBIOS type 1 record 'family' field, which is often
modified by OEMs, use the type 4 'processor ID' and 'processor version'
fields, which are set to a small set of probe-able values on all known
Ampere EFI systems in the field.
Fixes: 550b33cfd4452968 ("arm64: efi: Force the use of ...")
Tested-by: Andrea Righi <andrea.righi@canonical.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The type 1 SMBIOS record happens to always be the same size, but there
are other record types which have been augmented over time, and so we
should really use the length field in the header to decide where the
string table starts.
Fixes: 550b33cfd4452968 ("arm64: efi: Force the use of ...")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
After relocating the executable image, use the EFI memory attributes
protocol to remap the code and data regions with the appropriate
permissions.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Now that the zboot loader will invoke the EFI memory attributes protocol
to remap the decompressed code and rodata as read-only/executable, we
can set the PE/COFF header flag that indicates to the firmware that the
application does not rely on writable memory being executable at the
same time.
Cc: <stable@vger.kernel.org> # v6.2+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon (Andy)
- Preparatory refactoring and cleanup work in the efivar layer, which
is needed to accommodate the Snapdragon arm64 laptops that expose
their EFI variable store via a TEE secure world API (Johan)
- Enhancements to the EFI memory map handling so that Xen dom0 can
safely access EFI configuration tables (Demi Marie)
- Wire up the newly introduced IBT/BTI flag in the EFI memory
attributes table, so that firmware that is generated with ENDBR/BTI
landing pads will be mapped with enforcement enabled
- Clean up how we check and print the EFI revision exposed by the
firmware
- Incorporate EFI memory attributes protocol definition and wire it
up in the EFI zboot code (Evgeniy)
This ensures that these images can execute under new and stricter
rules regarding the default memory permissions for EFI page
allocations (More work is in progress here)
- CPER header cleanup (Dan Williams)
- Use a raw spinlock to protect the EFI runtime services stack on
arm64 to ensure the correct semantics under -rt (Pierre)
- EFI framebuffer quirk for Lenovo Ideapad (Darrell)"
* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
arm64: efi: Make efi_rt_lock a raw_spinlock
efi: Add mixed-mode thunk recipe for GetMemoryAttributes
efi: x86: Wire up IBT annotation in memory attributes table
efi: arm64: Wire up BTI annotation in memory attributes table
efi: Discover BTI support in runtime services regions
efi/cper, cxl: Remove cxl_err.h
efi: Use standard format for printing the EFI revision
efi: Drop minimum EFI version check at boot
efi: zboot: Use EFI protocol to remap code/data with the right attributes
efi/libstub: Add memory attribute protocol definitions
efi: efivars: prevent double registration
efi: verify that variable services are supported
efivarfs: always register filesystem
efi: efivars: add efivars printk prefix
efi: Warn if trying to reserve memory under Xen
efi: Actually enable the ESRT under Xen
efi: Apply allowlist to EFI configuration tables when running under Xen
efi: xen: Implement memory descriptor lookup based on hypercall
efi: memmap: Disregard bogus entries instead of returning them
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
architectural register (ZT0, for the look-up table feature) that
Linux needs to save/restore
- Include TPIDR2 in the signal context and add the corresponding
kselftests
- Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
(ARM CMN) at probe time
- Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64
- Permit EFI boot with MMU and caches on. Instead of cleaning the
entire loaded kernel image to the PoC and disabling the MMU and
caches before branching to the kernel bare metal entry point, leave
the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
all of system RAM to populate the initial page tables
- Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
kernel (the arm32 kernel only defines the values)
- Harden the arm64 shadow call stack pointer handling: stash the shadow
stack pointer in the task struct on interrupt, load it directly from
this structure
- Signal handling cleanups to remove redundant validation of size
information and avoid reading the same data from userspace twice
- Refactor the hwcap macros to make use of the automatically generated
ID registers. It should make new hwcaps writing less error prone
- Further arm64 sysreg conversion and some fixes
- arm64 kselftest fixes and improvements
- Pointer authentication cleanups: don't sign leaf functions, unify
asm-arch manipulation
- Pseudo-NMI code generation optimisations
- Minor fixes for SME and TPIDR2 handling
- Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
replace strtobool() to kstrtobool() in the cpufeature.c code, apply
dynamic shadow call stack in two passes, intercept pfn changes in
set_pte_at() without the required break-before-make sequence, attempt
to dump all instructions on unhandled kernel faults
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
arm64: fix .idmap.text assertion for large kernels
kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
kselftest/arm64: Copy whole EXTRA context
arm64: kprobes: Drop ID map text from kprobes blacklist
perf: arm_spe: Print the version of SPE detected
perf: arm_spe: Add support for SPEv1.2 inverted event filtering
perf: Add perf_event_attr::config3
arm64/sme: Fix __finalise_el2 SMEver check
drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
arm64/signal: Only read new data when parsing the ZT context
arm64/signal: Only read new data when parsing the ZA context
arm64/signal: Only read new data when parsing the SVE context
arm64/signal: Avoid rereading context frame sizes
arm64/signal: Make interface for restore_fpsimd_context() consistent
arm64/signal: Remove redundant size validation from parse_user_sigframe()
arm64/signal: Don't redundantly verify FPSIMD magic
arm64/cpufeature: Use helper macros to specify hwcaps
arm64/cpufeature: Always use symbolic name for feature value in hwcaps
arm64/sysreg: Initial unsigned annotations for ID registers
arm64/sysreg: Initial annotation of signed ID registers
...
|
|
machines
Commit 550b33cfd445 ("arm64: efi: Force the use of SetVirtualAddressMap()
on Altra machines") identifies the Altra family via the family field in
the type#1 SMBIOS record. eMAG and Altra Max machines are similarly
affected but not detected with the strict strcmp test.
The type1_family smbios string is not an entirely reliable means of
identifying systems with this issue as OEMs can, and do, use their own
strings for these fields. However, until we have a better solution,
capture the bulk of these systems by adding strcmp matching for "eMAG"
and "Altra Max".
Fixes: 550b33cfd445 ("arm64: efi: Force the use of SetVirtualAddressMap() on Altra machines")
Cc: <stable@vger.kernel.org> # 6.1.x
Cc: Alexandru Elisei <alexandru.elisei@gmail.com>
Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
Tested-by: Justin He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Use the recently introduced EFI_MEMORY_ATTRIBUTES_PROTOCOL in the zboot
implementation to set the right attributes for the code and data
sections of the decompressed image, i.e., EFI_MEMORY_RO for code and
EFI_MEMORY_XP for data.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
EFI_MEMORY_ATTRIBUTE_PROTOCOL servers as a better alternative to
DXE services for setting memory attributes in EFI Boot Services
environment. This protocol is better since it is a part of UEFI
specification itself and not UEFI PI specification like DXE
services.
Add EFI_MEMORY_ATTRIBUTE_PROTOCOL definitions.
Support mixed mode properly for its calls.
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Evgeniy Baskov <baskov@ispras.ru>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Instead of cleaning the entire loaded kernel image to the PoC and
disabling the MMU and caches before branching to the kernel's bare metal
entry point, we can leave the MMU and caches enabled, and rely on EFI's
cacheable 1:1 mapping of all of system RAM (which is mandated by the
spec) to populate the initial page tables.
This removes the need for managing coherency in software, which is
tedious and error prone.
Note that we still need to clean the executable region of the image to
the PoU if this is required for I/D coherency, but only if we actually
decided to move the image in memory, as otherwise, this will have been
taken care of by the loader.
This change affects both the builtin EFI stub as well as the zboot
decompressor, which now carries the entire EFI stub along with the
decompression code and the compressed image.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://lore.kernel.org/r/20230111102236.1430401-7-ardb@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Borislav Petkov:
"A of early boot cleanups and fixes.
- Do some spring cleaning to the compressed boot code by moving the
EFI mixed-mode code to a separate compilation unit, the AMD memory
encryption early code where it belongs and fixing up build
dependencies. Make the deprecated EFI handover protocol optional
with the goal of removing it at some point (Ard Biesheuvel)
- Skip realmode init code on Xen PV guests as it is not needed there
- Remove an old 32-bit PIC code compiler workaround"
* tag 'x86_boot_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot: Remove x86_32 PIC using %ebx workaround
x86/boot: Skip realmode init code when running as Xen PV guest
x86/efi: Make the deprecated EFI handover protocol optional
x86/boot/compressed: Only build mem_encrypt.S if AMD_MEM_ENCRYPT=y
x86/boot/compressed: Adhere to calling convention in get_sev_encryption_bit()
x86/boot/compressed: Move startup32_check_sev_cbit() out of head_64.S
x86/boot/compressed: Move startup32_check_sev_cbit() into .text
x86/boot/compressed: Move startup32_load_idt() out of head_64.S
x86/boot/compressed: Move startup32_load_idt() into .text section
x86/boot/compressed: Pull global variable reference into startup32_load_idt()
x86/boot/compressed: Avoid touching ECX in startup32_set_idt_entry()
x86/boot/compressed: Simplify IDT/GDT preserve/restore in the EFI thunk
x86/boot/compressed, efi: Merge multiple definitions of image_offset into one
x86/boot/compressed: Move efi32_pe_entry() out of head_64.S
x86/boot/compressed: Move efi32_entry out of head_64.S
x86/boot/compressed: Move efi32_pe_entry into .text section
x86/boot/compressed: Move bootargs parsing out of 32-bit startup code
x86/boot/compressed: Move 32-bit entrypoint code into .text section
x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Another fairly sizable pull request, by EFI subsystem standards.
Most of the work was done by me, some of it in collaboration with the
distro and bootloader folks (GRUB, systemd-boot), where the main focus
has been on removing pointless per-arch differences in the way EFI
boots a Linux kernel.
- Refactor the zboot code so that it incorporates all the EFI stub
logic, rather than calling the decompressed kernel as a EFI app.
- Add support for initrd= command line option to x86 mixed mode.
- Allow initrd= to be used with arbitrary EFI accessible file systems
instead of just the one the kernel itself was loaded from.
- Move some x86-only handling and manipulation of the EFI memory map
into arch/x86, as it is not used anywhere else.
- More flexible handling of any random seeds provided by the boot
environment (i.e., systemd-boot) so that it becomes available much
earlier during the boot.
- Allow improved arch-agnostic EFI support in loaders, by setting a
uniform baseline of supported features, and adding a generic magic
number to the DOS/PE header. This should allow loaders such as GRUB
or systemd-boot to reduce the amount of arch-specific handling
substantially.
- (arm64) Run EFI runtime services from a dedicated stack, and use it
to recover from synchronous exceptions that might occur in the
firmware code.
- (arm64) Ensure that we don't allocate memory outside of the 48-bit
addressable physical range.
- Make EFI pstore record size configurable
- Add support for decoding CXL specific CPER records"
* tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
arm64: efi: Recover from synchronous exceptions occurring in firmware
arm64: efi: Execute runtime services from a dedicated stack
arm64: efi: Limit allocations to 48-bit addressable physical region
efi: Put Linux specific magic number in the DOS header
efi: libstub: Always enable initrd command line loader and bump version
efi: stub: use random seed from EFI variable
efi: vars: prohibit reading random seed variables
efi: random: combine bootloader provided RNG seed with RNG protocol output
efi/cper, cxl: Decode CXL Error Log
efi/cper, cxl: Decode CXL Protocol Error Section
efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
efi: x86: Move EFI runtime map sysfs code to arch/x86
efi: runtime-maps: Clarify purpose and enable by default for kexec
efi: pstore: Add module parameter for setting the record size
efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
efi: memmap: Move manipulation routines into x86 arch tree
efi: memmap: Move EFI fake memmap support into x86 arch tree
efi: libstub: Undeprecate the command line initrd loader
efi: libstub: Add mixed mode support to command line initrd loader
efi: libstub: Permit mixed mode return types other than efi_status_t
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The highlights this time are support for dynamically enabling and
disabling Clang's Shadow Call Stack at boot and a long-awaited
optimisation to the way in which we handle the SVE register state on
system call entry to avoid taking unnecessary traps from userspace.
Summary:
ACPI:
- Enable FPDT support for boot-time profiling
- Fix CPU PMU probing to work better with PREEMPT_RT
- Update SMMUv3 MSI DeviceID parsing to latest IORT spec
- APMT support for probing Arm CoreSight PMU devices
CPU features:
- Advertise new SVE instructions (v2.1)
- Advertise range prefetch instruction
- Advertise CSSC ("Common Short Sequence Compression") scalar
instructions, adding things like min, max, abs, popcount
- Enable DIT (Data Independent Timing) when running in the kernel
- More conversion of system register fields over to the generated
header
CPU misfeatures:
- Workaround for Cortex-A715 erratum #2645198
Dynamic SCS:
- Support for dynamic shadow call stacks to allow switching at
runtime between Clang's SCS implementation and the CPU's pointer
authentication feature when it is supported (complete with scary
DWARF parser!)
Tracing and debug:
- Remove static ftrace in favour of, err, dynamic ftrace!
- Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace
and existing arch code
- Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the
old FTRACE_WITH_REGS
- Extend 'crashkernel=' parameter with default value and fallback to
placement above 4G physical if initial (low) allocation fails
SVE:
- Optimisation to avoid disabling SVE unconditionally on syscall
entry and just zeroing the non-shared state on return instead
Exceptions:
- Rework of undefined instruction handling to avoid serialisation on
global lock (this includes emulation of user accesses to the ID
registers)
Perf and PMU:
- Support for TLP filters in Hisilicon's PCIe PMU device
- Support for the DDR PMU present in Amlogic Meson G12 SoCs
- Support for the terribly-named "CoreSight PMU" architecture from
Arm (and Nvidia's implementation of said architecture)
Misc:
- Tighten up our boot protocol for systems with memory above 52 bits
physical
- Const-ify static keys to satisty jump label asm constraints
- Trivial FFA driver cleanups in preparation for v1.1 support
- Export the kernel_neon_* APIs as GPL symbols
- Harden our instruction generation routines against instrumentation
- A bunch of robustness improvements to our arch-specific selftests
- Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits)
arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
arm64: Prohibit instrumentation on arch_stack_walk()
arm64:uprobe fix the uprobe SWBP_INSN in big-endian
arm64: alternatives: add __init/__initconst to some functions/variables
arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
kselftest/arm64: Allow epoll_wait() to return more than one result
kselftest/arm64: Don't drain output while spawning children
kselftest/arm64: Hold fp-stress children until they're all spawned
arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
arm64/sysreg: Convert MVFR2_EL1 to automatic generation
arm64/sysreg: Convert MVFR1_EL1 to automatic generation
arm64/sysreg: Convert MVFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
...
|
|
The UEFI spec does not mention or reason about the configured size of
the virtual address space at all, but it does mention that all memory
should be identity mapped using a page size of 4 KiB.
This means that a LPA2 capable system that has any system memory outside
of the 48-bit addressable physical range and follows the spec to the
letter may serve page allocation requests from regions of memory that
the kernel cannot access unless it was built with LPA2 support and
enables it at runtime.
So let's ensure that all page allocations are limited to the 48-bit
range.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
GRUB currently relies on the magic number in the image header of ARM and
arm64 EFI kernel images to decide whether or not the image in question
is a bootable kernel.
However, the purpose of the magic number is to identify the image as one
that implements the bare metal boot protocol, and so GRUB, which only
does EFI boot, is limited unnecessarily to booting images that could
potentially be booted in a non-EFI manner as well.
This is problematic for the new zboot decompressor image format, as it
can only boot in EFI mode, and must therefore not use the bare metal
boot magic number in its header.
For this reason, the strict magic number was dropped from GRUB, to
permit essentially any kind of EFI executable to be booted via the
'linux' command, blurring the line between the linux loader and the
chainloader.
So let's use the same field in the DOS header that RISC-V and arm64
already use for their 'bare metal' magic numbers to store a 'generic
Linux kernel' magic number, which can be used to identify bootable
kernel images in PE format which don't necessarily implement a bare
metal boot protocol in the same binary. Note that, in the context of
EFI, the MS-DOS header is only described in terms of the fields that it
shares with the hybrid PE/COFF image format, (i.e., the MS-DOS EXE magic
number at offset #0 and the PE header offset at byte offset #0x3c).
Since we aim for compatibility with EFI only, and not with MS-DOS or
MS-Windows, we can use the remaining space in the MS-DOS header however
we want.
Let's set the generic magic number for x86 images as well: existing
bootloaders already have their own methods to identify x86 Linux images
that can be booted in a non-EFI manner, and having the magic number in
place there will ease any future transitions in loader implementations
to merge the x86 and non-x86 EFI boot paths.
Note that 32-bit ARM already uses the same location in the header for a
different purpose, but the ARM support is already widely implemented and
the EFI zboot decompressor is not available on ARM anyway, so we just
disregard it here.
Acked-by: Leif Lindholm <quic_llindhol@quicinc.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
In preparation for setting a cross-architecture baseline for EFI boot
support, remove the Kconfig option that permits the command line initrd
loader to be disabled. Also, bump the minor version so that any image
built with the new version can be identified as supporting this.
Acked-by: Leif Lindholm <quic_llindhol@quicinc.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
EFI has a rather unique benefit that it has access to some limited
non-volatile storage, where the kernel can store a random seed. Read
that seed in EFISTUB and concatenate it with other seeds we wind up
passing onward to the kernel in the configuration table. This is
complementary to the current other two sources - previous bootloaders,
and the EFI RNG protocol.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
[ardb: check for non-NULL RNG protocol pointer, call GetVariable()
without buffer first to obtain the size]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
There is no need for head_32.S and head_64.S both declaring a copy of
the global 'image_offset' variable, so drop those and make the extern C
declaration the definition.
When image_offset is moved to the .c file, it needs to be placed
particularly in the .data section because it lands by default in the
.bss section which is cleared too late, in .Lrelocated, before the first
access to it and thus garbage gets read, leading to SEV guests exploding
in early boot.
This happens only when the SEV guest kernel is loaded through grub. If
supplied with qemu's -kernel command line option, that memory is always
cleared upfront by qemu and all is fine there.
[ bp: Expand commit message with SEV aspect. ]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20221122161017.2426828-8-ardb@kernel.org
|
|
Instead of blindly creating the EFI random seed configuration table if
the RNG protocol is implemented and works, check whether such a EFI
configuration table was provided by an earlier boot stage and if so,
concatenate the existing and the new seeds, leaving it up to the core
code to mix it in and credit it the way it sees fit.
This can be used for, e.g., systemd-boot, to pass an additional seed to
Linux in a way that can be consumed by the kernel very early. In that
case, the following definitions should be used to pass the seed to the
EFI stub:
struct linux_efi_random_seed {
u32 size; // of the 'seed' array in bytes
u8 seed[];
};
The memory for the struct must be allocated as EFI_ACPI_RECLAIM_MEMORY
pool memory, and the address of the struct in memory should be installed
as a EFI configuration table using the following GUID:
LINUX_EFI_RANDOM_SEED_TABLE_GUID 1ce1e5bc-7ceb-42f2-81e5-8aadf180f57b
Note that doing so is safe even on kernels that were built without this
patch applied, but the seed will simply be overwritten with a seed
derived from the EFI RNG protocol, if available. The recommended seed
size is 32 bytes, and seeds larger than 512 bytes are considered
corrupted and ignored entirely.
In order to preserve forward secrecy, seeds from previous bootloaders
are memzero'd out, and in order to preserve memory, those older seeds
are also freed from memory. Freeing from memory without first memzeroing
is not safe to do, as it's possible that nothing else will ever
overwrite those pages used by EFI.
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
[ardb: incorporate Jason's followup changes to extend the maximum seed
size on the consumer end, memzero() it and drop a needless printk]
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
commit f4dc7fffa987 ("efi: libstub: unify initrd loading between
architectures") merge the first and the second parameters into a
struct without updating the kernel-doc. Let's fix it.
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Now that we have support for calling protocols that need additional
marshalling for mixed mode, wire up the initrd command line loader.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Rework the EFI stub macro wrappers around protocol method calls and
other indirect calls in order to allow return types other than
efi_status_t. This means the widening should be conditional on whether
or not the return type is efi_status_t, and should be omitted otherwise.
Also, switch to _Generic() to implement the type based compile time
conditionals, which is more concise, and distinguishes between
efi_status_t and u64 properly.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Currently, the initrd= command line option to the EFI stub only supports
loading files that reside on the same volume as the loaded image, which
is not workable for loaders like GRUB that don't even implement the
volume abstraction (EFI_SIMPLE_FILE_SYSTEM_PROTOCOL), and load the
kernel from an anonymous buffer in memory. For this reason, another
method was devised that relies on the LoadFile2 protocol.
However, the command line loader is rather useful when using the UEFI
shell or other generic loaders that have no awareness of Linux specific
protocols so let's make it a bit more flexible, by permitting textual
device paths to be provided to initrd= as well, provided that they refer
to a file hosted on a EFI_SIMPLE_FILE_SYSTEM_PROTOCOL volume. E.g.,
initrd=PciRoot(0x0)/Pci(0x3,0x0)/HD(1,MBR,0xBE1AFDFA,0x3F,0xFBFC1)/rootfs.cpio.gz
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The EFI spec is not very clear about which permissions are being given
when allocating pages of a certain type. However, it is quite obvious
that EFI_LOADER_CODE is more likely to permit execution than
EFI_LOADER_DATA, which becomes relevant once we permit booting the
kernel proper with the firmware's 1:1 mapping still active.
Ostensibly, recent systems such as the Surface Pro X grant executable
permissions to EFI_LOADER_CODE regions but not EFI_LOADER_DATA regions.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
|
|
Ampere Altra machines are reported to misbehave when the SetTime() EFI
runtime service is called after ExitBootServices() but before calling
SetVirtualAddressMap(). Given that the latter is horrid, pointless and
explicitly documented as optional by the EFI spec, we no longer invoke
it at boot if the configured size of the VA space guarantees that the
EFI runtime memory regions can remain mapped 1:1 like they are at boot
time.
On Ampere Altra machines, this results in SetTime() calls issued by the
rtc-efi driver triggering synchronous exceptions during boot. We can
now recover from those without bringing down the system entirely, due to
commit 23715a26c8d81291 ("arm64: efi: Recover from synchronous
exceptions occurring in firmware"). However, it would be better to avoid
the issue entirely, given that the firmware appears to remain in a funny
state after this.
So attempt to identify these machines based on the 'family' field in the
type #1 SMBIOS record, and call SetVirtualAddressMap() unconditionally
in that case.
Tested-by: Alexandru Elisei <alexandru.elisei@gmail.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Enable asynchronous unwind table generation for both the core kernel as
well as modules, and emit the resulting .eh_frame sections as init code
so we can use the unwind directives for code patching at boot or module
load time.
This will be used by dynamic shadow call stack support, which will rely
on code patching rather than compiler codegen to emit the shadow call
stack push and pop instructions.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Tested-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20221027155908.1940624-2-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Even though our EFI zboot decompressor is pedantically spec compliant
and idiomatic for EFI image loaders, calling LoadImage() and
StartImage() for the nested image is a bit of a burden. Not only does it
create workflow issues for the distros (as both the inner and outer
PE/COFF images need to be signed for secure boot), it also copies the
image around in memory numerous times:
- first, the image is decompressed into a buffer;
- the buffer is consumed by LoadImage(), which copies the sections into
a newly allocated memory region to hold the executable image;
- once the EFI stub is invoked by StartImage(), it will also move the
image in memory in case of KASLR, mirrored memory or if the image must
execute from a certain a priori defined address.
There are only two EFI spec compliant ways to load code into memory and
execute it:
- use LoadImage() and StartImage(),
- call ExitBootServices() and take ownership of the entire system, after
which anything goes.
Given that the EFI zboot decompressor always invokes the EFI stub, and
given that both are built from the same set of objects, let's merge the
two, so that we can avoid LoadImage()/StartImage but still load our
image into memory without breaking the above rules.
This also means we can decompress the image directly into its final
location, which could be randomized or meet other platform specific
constraints that LoadImage() does not know how to adhere to. It also
means that, even if the encapsulated image still has the EFI stub
incorporated as well, it does not need to be signed for secure boot when
wrapping it in the EFI zboot decompressor.
In the future, we might decide to retire the EFI stub attached to the
decompressed image, but for the time being, they can happily coexist.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The LoongArch build of the EFI stub is part of the core kernel image, and
therefore accesses section markers directly when it needs to figure out
the size of the various section.
The zboot decompressor does not have access to those symbols, but
doesn't really need that either. So let's move handle_kernel_image()
into a separate file (or rather, move everything else into a separate
file) so that the zboot build does not pull in unused code that links to
symbols that it does not define.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Currently, the EFI entry code for LoongArch is set up to copy the
executable image to the preferred offset, but instead of branching
directly into that image, it branches to the local copy of kernel_entry,
and relies on the logic in that function to switch to the link time
address instead.
This is a bit sloppy, and not something we can support once we merge the
EFI decompressor with the EFI stub. So let's clean this up a bit, by
adding a helper that computes the offset of kernel_entry from the start
of the image, and simply adding the result to VMLINUX_LOAD_ADDRESS.
And considering that we cannot execute from anywhere else anyway, let's
avoid efi_relocate_kernel() and just allocate the pages instead.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The arm64 build of the EFI stub is part of the core kernel image, and
therefore accesses section markers directly when it needs to figure out
the size of the various section.
The zboot decompressor does not have access to those symbols, but
doesn't really need that either. So let's move handle_kernel_image()
into a separate file (or rather, move everything else into a separate
file) so that the zboot build does not pull in unused code that links to
symbols that it does not define.
While at it, introduce a helper routine that the generic zboot loader
will need to invoke after decompressing the image but before invoking
it, to ensure that the I-side view of memory is consistent.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
The RISC-V build of the EFI stub is part of the core kernel image, and
therefore accesses section markers directly when it needs to figure out
the size of the various section.
The zboot decompressor does not have access to those symbols, but
doesn't really need that either. So let's move handle_kernel_image()
into a separate file (or rather, move everything else into a separate
file) so that the zboot build does not pull in unused code that links to
symbols that it does not define.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
Factor out the expressions that describe the preferred placement of the
loaded image as well as the minimum alignment so we can reuse them in
the decompressor.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|