Age | Commit message (Collapse) | Author |
|
When a PTE is updated in the page table, the _PAGE_NEWPAGE bit will
always be set. And the corresponding page will always be mapped or
unmapped depending on whether the PTE is present or not. The check
on the _PAGE_NEWPROT bit is not really reachable. Abandoning it will
allow us to simplify the code and remove the unreachable code.
Reviewed-by: Benjamin Berg <benjamin.berg@intel.com>
Signed-off-by: Tiwei Bie <tiwei.btw@antgroup.com>
Link: https://patch.msgid.link/20241011102354.1682626-2-tiwei.btw@antgroup.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Using clone will not undo features that have been enabled by libc. An
example of this already happening is rseq, which could cause the kernel
to read/write memory of the userspace process. In the future the
standard library might also use mseal by default to protect itself,
which would also thwart our attempts at unmapping everything.
Solve all this by taking a step back and doing an execve into a tiny
static binary that sets up the minimal environment required for the
stub without using any standard library. That way we have a clean
execution environment that is fully under the control of UML.
Note that this changes things a bit as the FDs are not anymore shared
with the kernel. Instead, we explicitly share the FDs for the physical
memory and all existing iomem regions. Doing this is fine, as iomem
regions cannot be added at runtime.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240919124511.282088-3-benjamin@sipsolutions.net
[use pipe() instead of pipe2(), remove unneeded close() calls]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The kernel flushes the memory ranges anyway for CoW and does not assume
that the userspace process has anything set up already. So, start with a
fresh process for the new mm context.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-8-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
The current LDT code has a few issues that mean it should be redone in a
different way once we always start with a fresh MM even when cloning.
In a new and better world, the kernel would just ensure its own LDT is
clear at startup. At that point, all that is needed is a simple function
to populate the LDT from another MM in arch_dup_mmap combined with some
tracking of the installed LDT entries for each MM.
Note that the old implementation was even incorrect with regard to
reading, as it copied out the LDT entries in the internal format rather
than converting them to the userspace structure.
Removal should be fine as the LDT is not used for thread-local storage
anymore.
Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20240703134536.1161108-7-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Rework syscall handling to be platform independent. Also create a clean
split between queueing of syscalls and flushing them out, removing the
need to keep state in the code that triggers the syscalls.
The code adds syscall_data_len to the global mm_id structure. This will
be used later to allow surrounding code to track whether syscalls still
need to run and if errors occurred.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20240703134536.1161108-5-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
When we switch to use seccomp, we need both the signal stack and other
data (i.e. syscall information) to co-exist in the stub data. To
facilitate this, start by defining separate memory areas for the stack
and syscall data.
This moves the signal stack onto a new page as the memory area is not
sufficient to hold both signal stack and syscall information.
Only change the signal stack setup for now, as the syscall code will be
reworked later.
Signed-off-by: Benjamin Berg <benjamin@sipsolutions.net>
Link: https://patch.msgid.link/20240703134536.1161108-3-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
If the two are mixed up, then it looks as though the parent
returned an error if the child failed (before) the mmap(),
and then the resulting process never gets killed. Fix this
by splitting the child and parent errors, reporting and
using them appropriately.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Convert files to use SPDX header. All files are licensed under the GPLv2.
Signed-off-by: Alex Dewar <alex.dewar@gmx.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
UML is using an obsolete itimer call for
all timers and "polls" for kernel space timer firing
in its userspace portion resulting in a long list
of bugs and incorrect behaviour(s). It also uses
ITIMER_VIRTUAL for its timer which results in the
timer being dependent on it running and the cpu
load.
This patch fixes this by moving to posix high resolution
timers firing off CLOCK_MONOTONIC and relaying the timer
correctly to the UML userspace.
Fixes:
- crashes when hosts suspends/resumes
- broken userspace timers - effecive ~40Hz instead
of what they should be. Note - this modifies skas behavior
by no longer setting an itimer per clone(). Timer events
are relayed instead.
- kernel network packet scheduling disciplines
- tcp behaviour especially under load
- various timer related corner cases
Finally, overall responsiveness of userspace is better.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Anton Ivanov <aivanov@brocade.com>
[rw: massaged commit message]
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
We can't just plop asm/* into it - userland helpers are built with it
in search path and seeing asm/* show up there suddenly would be a bad
idea.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|