summaryrefslogtreecommitdiff
path: root/arch/um/kernel
AgeCommit message (Collapse)Author
2024-11-30Merge tag 'uml-for-linus-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Lots of cleanups, mostly from Benjamin Berg and Tiwei Bie - Removal of unused code - Fix for sparse warnings - Cleanup around stub_exe() * tag 'uml-for-linus-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (68 commits) hostfs: Fix the NULL vs IS_ERR() bug for __filemap_get_folio() um: move thread info into task um: Always dump trace for specified task in show_stack um: vector: Do not use drvdata in release um: net: Do not use drvdata in release um: ubd: Do not use drvdata in release um: ubd: Initialize ubd's disk pointer in ubd_add um: virtio_uml: query the number of vqs if supported um: virtio_uml: fix call_fd IRQ allocation um: virtio_uml: send SET_MEM_TABLE message with the exact size um: remove broken double fault detection um: remove duplicate UM_NSEC_PER_SEC definition um: remove file sync for stub data um: always include kconfig.h and compiler-version.h um: set DONTDUMP and DONTFORK flags on KASAN shadow memory um: fix sparse warnings in signal code um: fix sparse warnings from regset refactor um: Remove double zero check um: fix stub exe build with CONFIG_GCOV um: Use os_set_pdeathsig helper in winch thread/process ...
2024-11-23Merge tag 'mm-stable-2024-11-18-19-27' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "zram: optimal post-processing target selection" from Sergey Senozhatsky improves zram's post-processing selection algorithm. This leads to improved memory savings. - Wei Yang has gone to town on the mapletree code, contributing several series which clean up the implementation: - "refine mas_mab_cp()" - "Reduce the space to be cleared for maple_big_node" - "maple_tree: simplify mas_push_node()" - "Following cleanup after introduce mas_wr_store_type()" - "refine storing null" - The series "selftests/mm: hugetlb_fault_after_madv improvements" from David Hildenbrand fixes this selftest for s390. - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng implements some rationaizations and cleanups in the page mapping code. - The series "mm: optimize shadow entries removal" from Shakeel Butt optimizes the file truncation code by speeding up the handling of shadow entries. - The series "Remove PageKsm()" from Matthew Wilcox completes the migration of this flag over to being a folio-based flag. - The series "Unify hugetlb into arch_get_unmapped_area functions" from Oscar Salvador implements a bunch of consolidations and cleanups in the hugetlb code. - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain takes away the wp-fault time practice of turning a huge zero page into small pages. Instead we replace the whole thing with a THP. More consistent cleaner and potentiall saves a large number of pagefaults. - The series "percpu: Add a test case and fix for clang" from Andy Shevchenko enhances and fixes the kernel's built in percpu test code. - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett optimizes mremap() by avoiding doing things which we didn't need to do. - The series "Improve the tmpfs large folio read performance" from Baolin Wang teaches tmpfs to copy data into userspace at the folio size rather than as individual pages. A 20% speedup was observed. - The series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting. - The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt removes the long-deprecated memcgv2 charge moving feature. - The series "fix error handling in mmap_region() and refactor" from Lorenzo Stoakes cleanup up some of the mmap() error handling and addresses some potential performance issues. - The series "x86/module: use large ROX pages for text allocations" from Mike Rapoport teaches x86 to use large pages for read-only-execute module text. - The series "page allocation tag compression" from Suren Baghdasaryan is followon maintenance work for the new page allocation profiling feature. - The series "page->index removals in mm" from Matthew Wilcox remove most references to page->index in mm/. A slow march towards shrinking struct page. - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs interface tests" from Andrew Paniakin performs maintenance work for DAMON's self testing code. - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar improves zswap's batching of compression and decompression. It is a step along the way towards using Intel IAA hardware acceleration for this zswap operation. - The series "kasan: migrate the last module test to kunit" from Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests over to the KUnit framework. - The series "implement lightweight guard pages" from Lorenzo Stoakes permits userapace to place fault-generating guard pages within a single VMA, rather than requiring that multiple VMAs be created for this. Improved efficiencies for userspace memory allocators are expected. - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses tracepoints to provide increased visibility into memcg stats flushing activity. - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky fixes a zram buglet which potentially affected performance. - The series "mm: add more kernel parameters to control mTHP" from Maíra Canal enhances our ability to control/configuremultisize THP from the kernel boot command line. - The series "kasan: few improvements on kunit tests" from Sabyrzhan Tasbolatov has a couple of fixups for the KASAN KUnit tests. - The series "mm/list_lru: Split list_lru lock into per-cgroup scope" from Kairui Song optimizes list_lru memory utilization when lockdep is enabled. * tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits) cma: enforce non-zero pageblock_order during cma_init_reserved_mem() mm/kfence: add a new kunit test test_use_after_free_read_nofault() zram: fix NULL pointer in comp_algorithm_show() memcg/hugetlb: add hugeTLB counters to memcg vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount zram: ZRAM_DEF_COMP should depend on ZRAM MAINTAINERS/MEMORY MANAGEMENT: add document files for mm Docs/mm/damon: recommend academic papers to read and/or cite mm: define general function pXd_init() kmemleak: iommu/iova: fix transient kmemleak false positive mm/list_lru: simplify the list_lru walk callback function mm/list_lru: split the lock to per-cgroup scope mm/list_lru: simplify reparenting and initial allocation mm/list_lru: code clean up for reparenting mm/list_lru: don't export list_lru_add mm/list_lru: don't pass unnecessary key parameters kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols ...
2024-11-12um: move thread info into taskBenjamin Berg
This selects the THREAD_INFO_IN_TASK option for UM and changes the way that the current task is discovered. This is trivial though, as UML already tracks the current task in cpu_tasks[] and this can be used to retrieve it. Also remove the signal handler code that copies the thread information into the IRQ stack. It is obsolete now, which also means that the mentioned race condition cannot happen anymore. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Reviewed-by: Hajime Tazaki <thehajime@gmail.com> Link: https://patch.msgid.link/20241111102910.46512-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07x86/module: prepare module loading for ROX allocations of textMike Rapoport (Microsoft)
When module text memory will be allocated with ROX permissions, the memory at the actual address where the module will live will contain invalid instructions and there will be a writable copy that contains the actual module code. Update relocations and alternatives patching to deal with it. [rppt@kernel.org: fix writable address in cfi_rewrite_endbr()] Link: https://lkml.kernel.org/r/ZysRwR29Ji8CcbXc@kernel.org Link: https://lkml.kernel.org/r/20241023162711.2579610-7-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Tested-by: Nathan Chancellor <nathan@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07asm-generic: introduce text-patching.hMike Rapoport (Microsoft)
Several architectures support text patching, but they name the header files that declare patching functions differently. Make all such headers consistently named text-patching.h and add an empty header in asm-generic for architectures that do not support text patching. Link: https://lkml.kernel.org/r/20241023162711.2579610-4-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Tested-by: kdevops <kdevops@lists.linux.dev> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Helge Deller <deller@gmx.de> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Song Liu <song@kernel.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-07um: Always dump trace for specified task in show_stackTiwei Bie
Currently, show_stack() always dumps the trace of the current task. However, it should dump the trace of the specified task if one is provided. Otherwise, things like running "echo t > sysrq-trigger" won't work as expected. Fixes: 970e51feaddb ("um: Add support for CONFIG_STACKTRACE") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241106103933.1132365-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07um: remove broken double fault detectionBenjamin Berg
The show_stack function had some code to detect double faults. However, the logic is wrong and it would e.g. trigger if a WARNING happened inside an IRQ. Remove it without trying to add a new logic. The current behaviour, which will just fault repeatedly until the IRQ stack is used up and the host kills UML, seems to be good enough. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241103150506.1367695-5-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07um: remove file sync for stub dataBenjamin Berg
There is no need to sync the stub code to "disk" for the other process to see the correct memory. Drop the fsync there and remove the helper function. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241103150506.1367695-3-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07um: Remove double zero checkShaojie Dong
free_pages() performs a parameter null check inside therefore remove double zero check here. Signed-off-by: Shaojie Dong <quic_shaojied@quicinc.com> Link: https://patch.msgid.link/20241025-upstream_branch-v5-1-b8998beb2c64@quicinc.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-29of/fdt: add dt_phys arg to early_init_dt_scan and early_init_dt_verifyUsama Arif
__pa() is only intended to be used for linear map addresses and using it for initial_boot_params which is in fixmap for arm64 will give an incorrect value. Hence save the physical address when it is known at boot time when calling early_init_dt_scan for arm64 and use it at kexec time instead of converting the virtual address using __pa(). Note that arm64 doesn't need the FDT region reserved in the DT as the kernel explicitly reserves the passed in FDT. Therefore, only a debug warning is fixed with this change. Reported-by: Breno Leitao <leitao@debian.org> Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Usama Arif <usamaarif642@gmail.com> Fixes: ac10be5cdbfa ("arm64: Use common of_kexec_alloc_and_setup_fdt()") Link: https://lore.kernel.org/r/20241023171426.452688-1-usamaarif642@gmail.com Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
2024-10-26um: fix stub exe build with CONFIG_GCOVJohannes Berg
CONFIG_GCOV is special and only in UML since it builds the kernel with a "userspace" option. This is fine, but the stub is even more special and not really a full userspace process, so it then fails to link as reported. Remove the GCOV options from the stub build. For good measure, also remove the GPROF options, even though they don't seem to cause build failures now. To be able to do this, export the specific options (GCOV_OPT and GPROF_OPT) but rename them so there's less chance of any conflicts. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202410242238.SXhs2kQ4-lkp@intel.com/ Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs") Link: https://patch.msgid.link/20241025102700.9fbb9c34473f.I7f1537fe075638f8da64beb52ef6c9e5adc51bc3@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: switch to regset API and depend on XSTATEBenjamin Berg
The PTRACE_GETREGSET API has now existed since Linux 2.6.33. The XSAVE CPU feature should also be sufficiently common to be able to rely on it. With this, define our internal FP state to be the hosts XSAVE data. Add discovery for the hosts XSAVE size and place the FP registers at the end of task_struct so that we can adjust the size at runtime. Next we can implement the regset API on top and update the signal handling as well as ptrace APIs to use them. Also switch coredump creation to use the regset API and finally set HAVE_ARCH_TRACEHOOK. This considerably improves the signal frames. Previously they might not have contained all the registers (i386) and also did not have the sizes and magic values set to the correct values to permit userspace to decode the frame. As a side effect, this will permit UML to run on hosts with newer CPU extensions (such as AMX) that need even more register state. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241023094120.4083426-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: insert scheduler ticks when userspace does not yieldBenjamin Berg
In time-travel mode userspace can do a lot of work without any time passing. Unfortunately, this can result in OOM situations as the RCU core code will never be run. Work around this by keeping track of userspace processes that do not yield for a lot of operations. When this happens, insert a jiffie into the sched_clock clock to account time against the process and cause the bookkeeping to run. As sched_clock is used for tracing, it is useful to keep it in sync between the different VMs. As such, try to remove added ticks again when the actual clock ticks. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241010142537.1134685-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Rename _PAGE_NEWPAGE to _PAGE_NEEDSYNCTiwei Bie
The _PAGE_NEWPAGE bit does not really indicate that this is a new page, but rather whether this entry needs to be synced or not. Renaming it to _PAGE_NEEDSYNC will make it more clear how everything ties together. Suggested-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011102354.1682626-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Abandon the _PAGE_NEWPROT bitTiwei Bie
When a PTE is updated in the page table, the _PAGE_NEWPAGE bit will always be set. And the corresponding page will always be mapped or unmapped depending on whether the PTE is present or not. The check on the _PAGE_NEWPROT bit is not really reachable. Abandoning it will allow us to simplify the code and remove the unreachable code. Reviewed-by: Benjamin Berg <benjamin.berg@intel.com> Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011102354.1682626-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Do not propagate initrd parameter to kernelTiwei Bie
This parameter is UML specific. It specifies the name of the file containing the initrd image, which is unknown to kernel. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011040441.1586345-10-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Do not propagate dtb parameter to kernelTiwei Bie
This parameter is UML specific and is unknown to kernel. It should not be propagated to kernel, otherwise it will be passed to user space as an environment option by kernel with a warning like: Unknown kernel command line parameters "dtb=/foo", will be passed to user space. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011040441.1586345-5-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Do not propagate mem parameter to kernelTiwei Bie
This parameter is UML specific and is unknown to kernel. It should not be propagated to kernel, otherwise it will be passed to user space as an environment option by kernel with a warning like: Unknown kernel command line parameters "mem=2G", will be passed to user space. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011040441.1586345-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: Remove UML specific debug parameterTiwei Bie
It does nothing but emit a warning when 'debug' is provided in the kernel command line. It can be a bit annoying, as 'debug' is also a valid kernel parameter to enable kernel debugging. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20241011040441.1586345-2-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: remove fault_catcher infrastructureJohannes Berg
This was perhaps intended to do _nofault copies, but the real reason is lost to history. Remove this, it's not needed, and using longjmp() out of the middle of the signal handler with all the state it has modified is not going to be a good idea anyway. Link: https://patch.msgid.link/20241010224513.901c4d390b3e.Ia74742668b44603c1ca23dd36f90e964e6e7ee55@changeid Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-23um: make stub_exe _start() pure inline asmJohannes Berg
Since __attribute__((naked)) cannot be used with functions containing C statements, just generate the few instructions it needs in assembly directly. While at it, fix the stack usage ("1 + 2*x - 1" is odd) and document what it must do, and why it must adjust the stack. Fixes: 8508a5e0e9db ("um: Fix misaligned stack in stub_exe") Link: https://lore.kernel.org/linux-um/CABVgOSntH-uoOFMP5HwMXjx_f1osMnVdhgKRKm4uz6DFm2Lb8Q@mail.gmail.com/ Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-18um: Fix misaligned stack in stub_exeDavid Gow
The stub_exe could segfault when built with some compilers (e.g. gcc 13.2.0), as SSE instructions which relied on stack alignment could be generated, but the stack was misaligned. This seems to be due to the __start entry point being run with a 16-byte aligned stack, but the x86_64 SYSV ABI wanting the stack to be so aligned _before_ a function call (so it is misaligned when the function is entered due to the return address being pushed). The function prologue then realigns it. Because the entry point is never _called_, and hence there is no return address, the prologue is therefore actually misaligning it, and causing the generated movaps instructions to SIGSEGV. This results in the following error: start_userspace : expected SIGSTOP, got status = 139 Don't generate this prologue for __start by using __attribute__((naked)), which resolves the issue. Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs") Signed-off-by: David Gow <davidgow@google.com> Link: https://lore.kernel.org/linux-um/CABVgOS=boUoG6=LHFFhxEd8H8jDP1zOaPKFEjH+iy2n2Q5S2aQ@mail.gmail.com/ Link: https://patch.msgid.link/20241017231007.1500497-2-davidgow@google.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-17um: Disable auto variable initialization for stub_exe.cNathan Chancellor
When automatic variable initialization is enabled via CONFIG_INIT_STACK_ALL_{PATTERN,ZERO}, clang will insert a call to memset() to initialize an object created with __builtin_alloca(). This ultimately breaks the build when linking stub_exe because it is a standalone executable that does not include or link against memset(). ld: arch/um/kernel/skas/stub_exe.o: in function `_start': arch/um/kernel/skas/stub_exe.c:83:(.ltext+0x15): undefined reference to `memset' Disable automatic variable initialization for stub_exe.c by passing the default value of 'uninitialized' to '-ftrivial-auto-var-init', which avoids generating the call to memset(). This code is small and runs quickly as it is just designed to set up an environment, so stack variable initialization is unnecessary overhead for little gain. Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20241016-uml-fix-stub_exe-clang-v1-2-3d6381dc5a78@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-17um: Fix passing '-n' to linker for stub_exeNathan Chancellor
When building stub_exe with clang, there is an error because '-n' is not a recognized flag by the clang driver (which is being used to invoke the linker): clang: error: unknown argument: '-n' '-n' should be passed along to the linker, as it is the short flag for '--nmagic', so prefix it with '-Wl,'. Fixes: 32e8eaf263d9 ("um: use execveat to create userspace MMs") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20241016-uml-fix-stub_exe-clang-v1-1-3d6381dc5a78@kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Switch to 4 level page tables on 64 bitBenjamin Berg
The larger memory space is useful to support more applications inside UML. One example for this is ASAN instrumentation of userspace applications which requires addresses that would otherwise not be available. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-11-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: clear all memory in new userspace processesBenjamin Berg
With the change to use execve() we can now safely clear the memory up to STUB_START as rseq will not be trying to use memory in that region. Also, on 64 bit the previous changes should mean that there is no usable memory range above the stub. Make the change and remove the comment as it is not needed anymore. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-10-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Discover host_task_size from envpBenjamin Berg
When loading the UML binary, the host kernel will place the stack at the highest possible address. It will then map the program name and environment variables onto the start of the stack. As such, an easy way to figure out the host_task_size is to use the highest pointer to an environment variable as a reference. Ensure that this works by disabling address layout randomization and re-executing UML in case it was enabled. This increases the available TASK_SIZE for 64 bit UML considerably. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-9-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Limit TASK_SIZE to the addressable rangeBenjamin Berg
We may have a TASK_SIZE from the host that is bigger than UML is able to address with a three-level pagetable on 64-bit. Guard against that by clipping the maximum TASK_SIZE to the maximum addressable area. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-8-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Calculate stub data address relative to stub codeBenjamin Berg
Instead of using the current stack pointer, we can also use the current instruction to calculate where the stub data is. With this the stub data only needs to be aligned to a full page boundary. Changing this has the advantage that we do not have a hole in the memory space above the stub data (which would need to be explicitly cleared). Another motivation to do this is that with the planned addition of a SECCOMP based userspace the stack pointer may not be fully trustworthy. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-7-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Add compile time assert that stub fits on a pageBenjamin Berg
The code assumes that the stub code can fit into a single page. This is unlikely to ever change, but add a link time assert instead so that there will be no hard to debug error. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-6-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Set parent death signal for userspace processBenjamin Berg
Enable PR_SET_PDEATHSIG so that the UML userspace process will be killed when the kernel exits unexpectedly. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-4-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: use execveat to create userspace MMsBenjamin Berg
Using clone will not undo features that have been enabled by libc. An example of this already happening is rseq, which could cause the kernel to read/write memory of the userspace process. In the future the standard library might also use mseal by default to protect itself, which would also thwart our attempts at unmapping everything. Solve all this by taking a step back and doing an execve into a tiny static binary that sets up the minimal environment required for the stub without using any standard library. That way we have a clean execution environment that is fully under the control of UML. Note that this changes things a bit as the FDs are not anymore shared with the kernel. Instead, we explicitly share the FDs for the physical memory and all existing iomem regions. Doing this is fine, as iomem regions cannot be added at runtime. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240919124511.282088-3-benjamin@sipsolutions.net [use pipe() instead of pipe2(), remove unneeded close() calls] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: remove auxiliary FP registersBenjamin Berg
We do not need the extra save/restore of the FP registers when getting the fault information. This was originally added in commit 2f56debd77a8 ("uml: fix FP register corruption") but at that time the code was not saving/restoring the FP registers when switching to userspace. This was fixed in commit fbfe9c847edf ("um: Save FPU registers between task switches") and since then the auxiliary registers have not been useful. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20241004233821.2130874-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: always use the internal copy of the FP registersBenjamin Berg
When switching from userspace to the kernel, all registers including the FP registers are copied into the kernel and restored later on. As such, the true source for the FP register state is actually already in the kernel and they should never be grabbed from the userspace process. Change the various places to simply copy the data from the internal FP register storage area. Note that on i386 the format of PTRACE_GETFPREGS and PTRACE_GETFPXREGS is different enough that conversion would be needed. With this patch, -EINVAL is returned if the non-native format is requested. The upside is, that this patchset fixes setting registers via ptrace (which simply did not work before) as well as fixing setting floating point registers using the mcontext on signal return on i386. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240913133845.964292-1-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Fix the return value of elf_core_copy_task_fpregsTiwei Bie
This function is expected to return a boolean value, which should be true on success and false on failure. Fixes: d1254b12c93e ("uml: fix x86_64 core dump crash") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240913023302.130300-1-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Fix the definition for physmem_sizeTiwei Bie
Currently physmem_size is defined as long long but declared locally as unsigned long long before using it in separate .c files. Make them match by defining physmem_size as unsigned long long and also move the declaration to a common header to allow the compiler to check it. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240916045950.508910-5-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Remove highmem leftoversTiwei Bie
Highmem was only supported on UML/i386. And the support has been removed by commit a98a6d864d3b ("um: Remove broken highmem support"). Remove the leftovers and stop UML from trying to setup highmem when the sum of physmem_size and iomem_size exceeds max_physmem. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240916045950.508910-4-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-10-10um: Fix potential integer overflow during physmem setupTiwei Bie
This issue happens when the real map size is greater than LONG_MAX, which can be easily triggered on UML/i386. Fixes: fe205bdd1321 ("um: Print minimum physical memory requirement") Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Link: https://patch.msgid.link/20240916045950.508910-3-tiwei.btw@antgroup.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-09-27Merge tag 'uml-for-linus-6.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Removal of dead code (TT mode leftovers, etc) - Fixes for the network vector driver - Fixes for time-travel mode * tag 'uml-for-linus-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: um: fix time-travel syscall scheduling hack um: Remove outdated asm/sysrq.h header um: Remove the declaration of user_thread function um: Remove the call to SUBARCH_EXECVE1 macro um: Remove unused mm_fd field from mm_id um: Remove unused fields from thread_struct um: Remove the redundant newpage check in update_pte_range um: Remove unused kpte_clear_flush macro um: Remove obsoleted declaration for execute_syscall_skas user_mode_linux_howto_v2: add VDE vector support in doc vector_user: add VDE support um: remove ARCH_NO_PREEMPT_DYNAMIC um: vector: Fix NAPI budget handling um: vector: Replace locks guarding queue depth with atomics um: remove variable stack array in os_rcv_fd_msg()
2024-09-12um: fix time-travel syscall scheduling hackJohannes Berg
The schedule() call there really never did anything at least since the introduction of the EEVDF scheduler, but now I found a case where we permanently hang in a loop of -ERESTARTNOINTR (due to locking.) Work around it by making any syscalls with error return take time (and then schedule after) so we cannot hang in such a loop forever. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove outdated asm/sysrq.h headerTiwei Bie
This header no longer serves a purpose after show_trace was removed by commit 9d1ee8ce92e1 ("um: Rewrite show_stack()"). Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove the call to SUBARCH_EXECVE1 macroTiwei Bie
This macro has never been defined by any supported sub-architectures in tree since it was introduced by commit 1d3468a6643a ("[PATCH uml: move _kern.c files"). Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove unused mm_fd field from mm_idTiwei Bie
It's no longer used since the removal of the SKAS3/4 support. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove unused fields from thread_structTiwei Bie
These fields are no longer used since the removal of tt mode. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-09-12um: Remove the redundant newpage check in update_pte_rangeTiwei Bie
The two checks have been identical since commit ef714f15027c ("um: remove force_flush_all from fork_handler"). And the inner one isn't necessary anymore. Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2024-08-08Merge tag 'drm-misc-next-2024-08-01' of ↵Daniel Vetter
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: virtio: - Define DRM capset Cross-subsystem Changes: dma-buf: - heaps: Clean up documentation printk: - Pass description to kmsg_dump() Core Changes: CI: - Update IGT tests - Point upstream repo to GitLab instance modesetting: - Introduce Power Saving Policy property for connectors - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support panic: - Avoid build-time interference with framebuffer console docs: - Document Colorspace property scheduler: - Remove full_recover from drm_sched_start TTM: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory Driver Changes: amdgpu: - Support Power Saving Policy connector property ast: - astdp: Support AST2600 with VGA; Clean up HPD bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean aup - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable gma500: - Update i2c terminology ivpu: - Add MODULE_FIRMWARE() lcdif: - Fix pixel clock loongson: - Use GEM refcount over TTM's mgag200: - Improve BMC handling - Support VBLANK intterupts nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's panel: - Shutdown fixes plus documentation - Refactor several drivers for better code sharing - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing sti: - Fix module owner stm: - Avoid UAF wih managed plane and CRTC helpers - Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - Call drm_atomic_helper_shutdown() v3d: - Clean up perfmon vkms: - Clean up Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240801121406.GA102996@linux.fritz.box
2024-07-17printk: Add a short description string to kmsg_dump()Jocelyn Falempe
kmsg_dump doesn't forward the panic reason string to the kmsg_dumper callback. This patch adds a new struct kmsg_dump_detail, that will hold the reason and description, and pass it to the dump() callback. To avoid updating all kmsg_dump() call, it adds a kmsg_dump_desc() function and a macro for backward compatibility. I've written this for drm_panic, but it can be useful for other kmsg_dumper. It allows to see the panic reason, like "sysrq triggered crash" or "VFS: Unable to mount root fs on xxxx" on the drm panic screen. v2: * Use a struct kmsg_dump_detail to hold the reason and description pointer, for more flexibility if we want to add other parameters. (Kees Cook) * Fix powerpc/nvram_64 build, as I didn't update the forward declaration of oops_to_nvram() Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Acked-by: Petr Mladek <pmladek@suse.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Kees Cook <kees@kernel.org> Link: https://patchwork.freedesktop.org/patch/msgid/20240702122639.248110-1-jfalempe@redhat.com
2024-07-04um: register power-off handlerJohannes Berg
Otherwise we always get reboot: Power off not available: System halted instead which is really quite pointless. Link: https://patch.msgid.link/20240703173839.fcbb538c6686.I3d333f4773cff93c4337c4d128ee0b1b501b3dfa@changeid Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: Enable preemption in UMLAnton Ivanov
Since userspace state is saved in the MM process, kernel using FPU still doesn't really need to do anything, so this really is as simple as enabling preemption. The irq critical section in sigio_handler() needs preempt_disable()/preempt_enable(). Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Link: https://patch.msgid.link/20240702102549.d2fcea450854.I12f5a53d80ec1e425e66ef272b1e95cb523b608e@changeid [rebase, remove FPU save/restore, fix x86/um Makefile, rewrite commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-07-03um: refactor TLB update handlingBenjamin Berg
Conceptually, we want the memory mappings to always be up to date and represent whatever is in the TLB. To ensure that, we need to sync them over in the userspace case and for the kernel we need to process the mappings. The kernel will call flush_tlb_* if page table entries that were valid before become invalid. Unfortunately, this is not the case if entries are added. As such, change both flush_tlb_* and set_ptes to track the memory range that has to be synchronized. For the kernel, we need to execute a flush_tlb_kern_* immediately but we can wait for the first page fault in case of set_ptes. For userspace in contrast we only store that a range of memory needs to be synced and do so whenever we switch to that process. Signed-off-by: Benjamin Berg <benjamin.berg@intel.com> Link: https://patch.msgid.link/20240703134536.1161108-13-benjamin@sipsolutions.net Signed-off-by: Johannes Berg <johannes.berg@intel.com>