Age | Commit message (Collapse) | Author |
|
Mention the alternative of adding trace_clock=global to the kernel
command line when we detect that we've used an unstable clock across a
suspend/resume cycle.
Link: http://lkml.kernel.org/r/20180330150132.16903-2-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Across suspend, we may see a very large drift in timestamps if the sched
clock is unstable, prompting the global trace's ringbuffer code to warn
and suggest switching to the global clock. Preempt this request by
detecting when the sched clock is unstable (determined during
late_initcall) and automatically switching the default clock over to
trace_global_clock.
This should prevent requiring user interaction to resolve warnings such
as:
Delta way too big! 18446743856563626466 ts=18446744054496180323 write stamp = 197932553857
If you just came from a suspend/resume,
please switch to the trace global clock:
echo global > /sys/kernel/debug/tracing/trace_clock
Link: http://lkml.kernel.org/r/20180330150132.16903-1-chris@chris-wilson.co.uk
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Commit 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
tried to spread the interrupts accross all possible CPUs to make sure that
in case of phsyical hotplug (e.g. virtualization) the CPUs which get
plugged in after the device was initialized are targeted by a hardware
queue and the corresponding interrupt.
This has a downside in cases where the ACPI tables claim that there are
more possible CPUs than present CPUs and the number of interrupts to spread
out is smaller than the number of possible CPUs. These bogus ACPI tables
are unfortunately not uncommon.
In such a case the vector spreading algorithm assigns interrupts to CPUs
which can never be utilized and as a consequence these interrupts are
unused instead of being mapped to present CPUs. As a result the performance
of the device is suboptimal.
To fix this spread the interrupt vectors in two stages:
1) Spread as many interrupts as possible among the present CPUs
2) Spread the remaining vectors among non present CPUs
On a 8 core system, where CPU 0-3 are present and CPU 4-7 are not present,
for a device with 4 queues the resulting interrupt affinity is:
1) Before 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
irq 39, cpu list 0
irq 40, cpu list 1
irq 41, cpu list 2
irq 42, cpu list 3
2) With 84676c1f21 ("genirq/affinity: assign vectors to all possible CPUs")
irq 39, cpu list 0-2
irq 40, cpu list 3-4,6
irq 41, cpu list 5
irq 42, cpu list 7
3) With the refined vector spread applied:
irq 39, cpu list 0,4
irq 40, cpu list 1,6
irq 41, cpu list 2,5
irq 42, cpu list 3,7
On a 8 core system, where all CPUs are present the resulting interrupt
affinity for the 4 queues is:
irq 39, cpu list 0,1
irq 40, cpu list 2,3
irq 41, cpu list 4,5
irq 42, cpu list 6,7
This is independent of the number of CPUs which are online at the point of
initialization because in such a system the offline CPUs can be easily
onlined afterwards, while in non-present CPUs need to be plugged physically
or virtually which requires external interaction.
The downside of this approach is that in case of physical hotplug the
interrupt vector spreading might be suboptimal when CPUs 4-7 are physically
plugged. Suboptimal from a NUMA point of view and due to the single target
nature of interrupt affinities the later plugged CPUs might not be targeted
by interrupts at all.
Though, physical hotplug systems are not the common case while the broken
ACPI table disease is wide spread. So it's preferred to have as many
interrupts as possible utilized at the point where the device is
initialized.
Block multi-queue devices like NVME create a hardware queue per possible
CPU, so the goal of commit 84676c1f21 to assign one interrupt vector per
possible CPU is still achieved even with physical/virtual hotplug.
[ tglx: Changed from online to present CPUs for the first spreading stage,
renamed variables for readability sake, added comments and massaged
changelog ]
Reported-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Link: https://lkml.kernel.org/r/20180308105358.1506-5-ming.lei@redhat.com
|
|
To support two stage irq vector spreading, it's required to add a starting
point to the spreading function. No functional change, just preparatory
work for the actual two stage change.
[ tglx: Renamed variables, tidied up the code and massaged changelog ]
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Link: https://lkml.kernel.org/r/20180308105358.1506-4-ming.lei@redhat.com
|
|
No functional change, just prepare for converting to 2-stage irq vector
spreading.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Link: https://lkml.kernel.org/r/20180308105358.1506-3-ming.lei@redhat.com
|
|
The following patches will introduce two stage irq spreading for improving
irq spread on all possible CPUs.
No functional change.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Cc: Laurence Oberman <loberman@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Link: https://lkml.kernel.org/r/20180308105358.1506-2-ming.lei@redhat.com
|
|
When the allocation of node_to_possible_cpumask fails, then
irq_create_affinity_masks() returns with a pointer to the empty affinity
masks array, which will cause malfunction.
Reorder the allocations so the masks array allocation comes last and every
failure path returns NULL.
Fixes: 9a0ef98e186d ("genirq/affinity: Assign vectors to all present CPUs")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
|
|
Add a new pointer argument to cpuidle_select() and to the ->select
cpuidle governor callback to allow a boolean value indicating
whether or not the tick should be stopped before entering the
selected state to be returned from there.
Make the ladder governor ignore that pointer (to preserve its
current behavior) and make the menu governor return 'false" through
it if:
(1) the idle exit latency is constrained at 0, or
(2) the selected state is a polling one, or
(3) the expected idle period duration is within the tick period
range.
In addition to that, the correction factor computations in the menu
governor need to take the possibility that the tick may not be
stopped into account to avoid artificially small correction factor
values. To that end, add a mechanism to record tick wakeups, as
suggested by Peter Zijlstra, and use it to modify the menu_update()
behavior when tick wakeup occurs. Namely, if the CPU is woken up by
the tick and the return value of tick_nohz_get_sleep_length() is not
within the tick boundary, the predicted idle duration is likely too
short, so make menu_update() try to compensate for that by updating
the governor statistics as though the CPU was idle for a long time.
Since the value returned through the new argument pointer of
cpuidle_select() is not used by its caller yet, this change by
itself is not expected to alter the functionality of the code.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
Since the subsequent changes will need a TICK_USEC definition
analogous to TICK_NSEC, rename the existing TICK_USEC as
USER_TICK_USEC, update its users and redefine TICK_USEC
accordingly.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
|
|
Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
reason. It looks like it's only a convenience, so remove kmemleak.h
from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
don't already #include it. Also remove <linux/kmemleak.h> from source
files that do not use it.
This is tested on i386 allmodconfig and x86_64 allmodconfig. It would
be good to run it through the 0day bot for other $ARCHes. I have
neither the horsepower nor the storage space for the other $ARCHes.
Update: This patch has been extensively build-tested by both the 0day
bot & kisskb/ozlabs build farms. Both of them reported 2 build failures
for which patches are included here (in v2).
[ slab.h is the second most used header file after module.h; kernel.h is
right there with slab.h. There could be some minor error in the
counting due to some #includes having comments after them and I didn't
combine all of those. ]
[akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au> [2 build failures]
Reported-by: Fengguang Wu <fengguang.wu@intel.com> [2 build failures]
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
KASAN splats indicate that in some cases we free a live mm, then
continue to access it, with potentially disastrous results. This is
likely due to a mismatched mmdrop() somewhere in the kernel, but so far
the culprit remains elusive.
Let's have __mmdrop() verify that the mm isn't live for the current
task, similar to the existing check for init_mm. This way, we can catch
this class of issue earlier, and without requiring KASAN.
Currently, idle_task_exit() leaves active_mm stale after it switches to
init_mm. This isn't harmful, but will trigger the new assertions, so we
must adjust idle_task_exit() to update active_mm.
Link: http://lkml.kernel.org/r/20180312140103.19235-1-mark.rutland@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Add info about loaded kdump kernel into the dump stack header
- Move dump-stack related code from printk.c to lib/dump_stack.c
- Write message about suspending consoles in KERN_INFO log level
* 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: change message to pr_info
printk: move dump stack related code to lib/dump_stack.c
print kdump kernel loaded status in stack dump
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
kfifo: fix inaccurate comment
tools/thermal: tmon: fix for segfault
net: Spelling s/stucture/structure/
edd: don't spam log if no EDD information is present
Documentation: Fix early-microcode.txt references after file rename
tracing: Block comments should align the * on each line
treewide: Fix typos in printk
GenWQE: Fix a typo in two comments
treewide: Align function definition open/close braces
|
|
Make cpuidle_idle_call() decide whether or not to stop the tick.
First, the cpuidle_enter_s2idle() path deals with the tick (and with
the entire timekeeping for that matter) by itself and it doesn't need
the tick to be stopped beforehand.
Second, to address the issue with short idle duration predictions
by the idle governor after the tick has been stopped, it will be
necessary to change the ordering of cpuidle_select() with respect
to tick_nohz_idle_stop_tick(). To prepare for that, put a
tick_nohz_idle_stop_tick() call in the same branch in which
cpuidle_select() is called.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
Push the decision whether or not to stop the tick somewhat deeper
into the idle loop.
Stopping the tick upfront leads to unpleasant outcomes in case the
idle governor doesn't agree with the nohz code on the duration of the
upcoming idle period. Specifically, if the tick has been stopped and
the idle governor predicts short idle, the situation is bad regardless
of whether or not the prediction is accurate. If it is accurate, the
tick has been stopped unnecessarily which means excessive overhead.
If it is not accurate, the CPU is likely to spend too much time in
the (shallow, because short idle has been predicted) idle state
selected by the governor [1].
As the first step towards addressing this problem, change the code
to make the tick stopping decision inside of the loop in do_idle().
In particular, do not stop the tick in the cpu_idle_poll() code path.
Also don't do that in tick_nohz_irq_exit() which doesn't really have
enough information on whether or not to stop the tick.
Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1]
Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
Prepare the scheduler tick code for reworking the idle loop to
avoid stopping the tick in some cases.
The idea is to split the nohz idle entry call to decouple the idle
time stats accounting and preparatory work from the actual tick stop
code, in order to later be able to delay the tick stop once we reach
more power-knowledgeable callers.
Move away the tick_nohz_start_idle() invocation from
__tick_nohz_idle_enter(), rename the latter to
__tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick()
as a wrapper around it for calling it from the outside.
Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead
of calling the entire __tick_nohz_idle_enter(), add another wrapper
disabling and enabling interrupts around tick_nohz_idle_stop_tick()
and make the current callers of tick_nohz_idle_enter() call it too
to retain their current functionality.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|
|
It may be useful for an architecture to override the definitions of the
COMPAT_SYSCALL_DEFINE0() and __COMPAT_SYSCALL_DEFINEx() macros in
<linux/compat.h>, in particular to use a different calling convention
for syscalls. This patch provides a mechanism to do so, based on the
previously introduced CONFIG_ARCH_HAS_SYSCALL_WRAPPER. If it is enabled,
<asm/sycall_wrapper.h> is included in <linux/compat.h> and may be used
to define the macros mentioned above. Moreover, as the syscall calling
convention may be different if CONFIG_ARCH_HAS_SYSCALL_WRAPPER is set,
the compat syscall function prototypes in <linux/compat.h> are #ifndef'd
out in that case.
As some of the syscalls and/or compat syscalls may not be present,
the COND_SYSCALL() and COND_SYSCALL_COMPAT() macros in kernel/sys_ni.c
as well as the SYS_NI() and COMPAT_SYS_NI() macros in
kernel/time/posix-stubs.c can be re-defined in <asm/syscall_wrapper.h> iff
CONFIG_ARCH_HAS_SYSCALL_WRAPPER is enabled.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180405095307.3730-5-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
By renaming the functions we can get rid of the skip parameter
and have better code redability. It makes zero sense to have
things such as:
rq_clock_skip_update(rq, false)
When the skip request is in fact not going to happen. Ever. Rename
things such that we end up with:
rq_clock_skip_update(rq)
rq_clock_cancel_skipupdate(rq)
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: matt@codeblueprint.co.uk
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20180404161539.nhadkff2aats74jh@linux-n805
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
While running rt-tests' pi_stress program I got the following splat:
rq->clock_update_flags < RQCF_ACT_SKIP
WARNING: CPU: 27 PID: 0 at kernel/sched/sched.h:960 assert_clock_updated.isra.38.part.39+0x13/0x20
[...]
<IRQ>
enqueue_top_rt_rq+0xf4/0x150
? cpufreq_dbs_governor_start+0x170/0x170
sched_rt_rq_enqueue+0x65/0x80
sched_rt_period_timer+0x156/0x360
? sched_rt_rq_enqueue+0x80/0x80
__hrtimer_run_queues+0xfa/0x260
hrtimer_interrupt+0xcb/0x220
smp_apic_timer_interrupt+0x62/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
[...]
do_idle+0x183/0x1e0
cpu_startup_entry+0x5f/0x70
start_secondary+0x192/0x1d0
secondary_startup_64+0xa5/0xb0
We can get rid of it be the "traditional" means of adding an
update_rq_clock() call after acquiring the rq->lock in
do_sched_rt_period_timer().
The case for the RT task throttling (which this workload also hits)
can be ignored in that the skip_update call is actually bogus and
quite the contrary (the request bits are removed/reverted).
By setting RQCF_UPDATED we really don't care if the skip is happening
or not and will therefore make the assert_clock_updated() check happy.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: linux-kernel@vger.kernel.org
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/20180402164954.16255-1-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the "big" set of driver core patches for 4.17-rc1.
There's really not much here, just a bunch of firmware code
refactoring from Luis as he attempts to wrangle that codebase into
something that is managable, along with a bunch of userspace tests for
it. Other than that, a handful of small bugfixes and reverts of things
that didn't work out.
Full details are in the shortlog, it's not all that much.
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (30 commits)
drivers: base: remove check for callback in coredump_store()
mt7601u: use firmware_request_cache() to address cache on reboot
firmware: add firmware_request_cache() to help with cache on reboot
firmware: fix typo on pr_info_once() when ignore_sysfs_fallback is used
firmware: explicitly include vmalloc.h
firmware: ensure the firmware cache is not used on incompatible calls
test_firmware: modify custom fallback tests to use unique files
firmware: add helper to check to see if fw cache is setup
firmware: fix checking for return values for fw_add_devm_name()
rename: _request_firmware_load() fw_load_sysfs_fallback()
test_firmware: test three firmware kernel configs using a proc knob
test_firmware: expand on library with shared helpers
firmware: enable to force disable the fallback mechanism at run time
firmware: enable run time change of forcing fallback loader
firmware: move firmware loader into its own directory
firmware: split firmware fallback functionality into its own file
firmware: move loading timeout under struct firmware_fallback_config
firmware: use helpers for setting up a temporary cache timeout
firmware: simplify CONFIG_FW_LOADER_USER_HELPER_FALLBACK further
drivers: base: add description for .coredump() callback
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"Nothing particularly stands out here, probably because people were
tied up with spectre/meltdown stuff last time around. Still, the main
pieces are:
- Rework of our CPU features framework so that we can whitelist CPUs
that don't require kpti even in a heterogeneous system
- Support for the IDC/DIC architecture extensions, which allow us to
elide instruction and data cache maintenance when writing out
instructions
- Removal of the large memory model which resulted in suboptimal
codegen by the compiler and increased the use of literal pools,
which could potentially be used as ROP gadgets since they are
mapped as executable
- Rework of forced signal delivery so that the siginfo_t is
well-formed and handling of show_unhandled_signals is consolidated
and made consistent between different fault types
- More siginfo cleanup based on the initial patches from Eric
Biederman
- Workaround for Cortex-A55 erratum #1024718
- Some small ACPI IORT updates and cleanups from Lorenzo Pieralisi
- Misc cleanups and non-critical fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
arm64: uaccess: Fix omissions from usercopy whitelist
arm64: fpsimd: Split cpu field out from struct fpsimd_state
arm64: tlbflush: avoid writing RES0 bits
arm64: cmpxchg: Include linux/compiler.h in asm/cmpxchg.h
arm64: move percpu cmpxchg implementation from cmpxchg.h to percpu.h
arm64: cmpxchg: Include build_bug.h instead of bug.h for BUILD_BUG
arm64: lse: Include compiler_types.h and export.h for out-of-line LL/SC
arm64: fpsimd: include <linux/init.h> in fpsimd.h
drivers/perf: arm_pmu_platform: do not warn about affinity on uniprocessor
perf: arm_spe: include linux/vmalloc.h for vmap()
Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)"
arm64: cpufeature: Avoid warnings due to unused symbols
arm64: Add work around for Arm Cortex-A55 Erratum 1024718
arm64: Delay enabling hardware DBM feature
arm64: Add MIDR encoding for Arm Cortex-A55 and Cortex-A35
arm64: capabilities: Handle shared entries
arm64: capabilities: Add support for checks based on a list of MIDRs
arm64: Add helpers for checking CPU MIDR against a range
arm64: capabilities: Clean up midr range helpers
arm64: capabilities: Change scope of VHE to Boot CPU feature
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"The usual pile of boring changes:
- Consolidate tasklet functions to share code instead of duplicating
it
- The first step for making the low level entry handler management on
multi-platform kernels generic
- A new sysfs file which allows to retrieve the wakeup state of
interrupts.
- Ensure that the interrupt thread follows the effective affinity and
not the programmed affinity to avoid cross core wakeups.
- Two new interrupt controller drivers (Microsemi Ocelot and Qualcomm
PDC)
- Fix the wakeup path clock handling for Reneasas interrupt chips.
- Rework the boot time register reset for ARM GIC-V2/3
- Better suspend/resume support for ARM GIV-V3/ITS
- Add missing locking to the ARM GIC set_type() callback
- Small fixes for the irq simulator code
- SPDX identifiers for the irq core code and removal of boiler plate
- Small cleanups all over the place"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
openrisc: Set CONFIG_MULTI_IRQ_HANDLER
arm64: Set CONFIG_MULTI_IRQ_HANDLER
genirq: Make GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER
irqchip/gic: Take lock when updating irq type
irqchip/gic: Update supports_deactivate static key to modern api
irqchip/gic-v3: Ensure GICR_CTLR.EnableLPI=0 is observed before enabling
irqchip: Add a driver for the Microsemi Ocelot controller
dt-bindings: interrupt-controller: Add binding for the Microsemi Ocelot interrupt controller
irqchip/gic-v3: Probe for SCR_EL3 being clear before resetting AP0Rn
irqchip/gic-v3: Don't try to reset AP0Rn
irqchip/gic-v3: Do not check trigger configuration of partitionned LPIs
genirq: Remove license boilerplate/references
genirq: Add missing SPDX identifiers
genirq/matrix: Cleanup SPDX identifier
genirq: Cleanup top of file comments
genirq: Pass desc to __irq_free instead of irq number
irqchip/gic-v3: Loudly complain about the use of IRQ_TYPE_NONE
irqchip/gic: Loudly complain about the use of IRQ_TYPE_NONE
RISC-V: Move to the new GENERIC_IRQ_MULTI_HANDLER handler
genirq: Add CONFIG_GENERIC_IRQ_MULTI_HANDLER
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull time(r) updates from Thomas Gleixner:
"A small set of updates for timers and timekeeping:
- The most interesting change is the consolidation of clock MONOTONIC
and clock BOOTTIME.
Clock MONOTONIC behaves now exactly like clock BOOTTIME and does
not longer ignore the time spent in suspend. A new clock
MONOTONIC_ACTIVE is provived which behaves like clock MONOTONIC in
kernels before this change. This allows applications to
programmatically check for the clock MONOTONIC behaviour.
As discussed in the review thread, this has the potential of
breaking user space and we might have to revert this. Knock on wood
that we can avoid that exercise.
- Updates to the NTP mechanism to improve accuracy
- A new kernel internal data structure to aid the ongoing Y2038 work.
- Cleanups and simplifications of the clocksource code.
- Make the alarmtimer code play nicely with debugobjects"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
alarmtimer: Init nanosleep alarm timer on stack
y2038: Introduce struct __kernel_old_timeval
tracing: Unify the "boot" and "mono" tracing clocks
hrtimer: Unify MONOTONIC and BOOTTIME clock behavior
posix-timers: Unify MONOTONIC and BOOTTIME clock behavior
timekeeping: Remove boot time specific code
Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior
timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock
timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock
timekeeping/ntp: Determine the multiplier directly from NTP tick length
timekeeping/ntp: Don't align NTP frequency adjustments to ticks
clocksource: Use ATTRIBUTE_GROUPS
clocksource: Use DEVICE_ATTR_RW/RO/WO to define device attributes
clocksource: Don't walk the clocksource list for empty override
|
|
These config switches enable the same code in the core and the not yet
converted architecture code. They can be selected both by randconfig builds
and cause linker error because the same symbols are defined twice.
Make the new GENERIC_IRQ_MULTI_HANDLER depend on !MULTI_IRQ_HANDLER to
prevent that. The dependency will be removed once all architectures are
converted over.
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lkml.kernel.org/r/20180404043130.31277-4-palmer@sifive.com
|
|
There will be a build warning -Wunused-function if CONFIG_CGROUP_BPF
isn't defined, since the only user is inside #ifdef CONFIG_CGROUP_BPF:
kernel/bpf/syscall.c:1229:12: warning: ‘bpf_prog_attach_check_attach_type’
defined but not used [-Wunused-function]
static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Current code moves function bpf_prog_attach_check_attach_type inside
ifdef CONFIG_CGROUP_BPF.
Fixes: 5e43f899b03a ("bpf: Check attach type at prog load time")
Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
It is possible to have multiple ULP tcp_release call paths in flight
if a sock is closed and simultaneously being removed from the sockmap
control path. The result would be setting the sk_prot to the saved
values on the first iteration and then on the second iteration setting
the value to NULL.
This patch resolves this by ensuring we only reset the sk_prot pointer
if we have a valid saved state to set.
Fixes: 4f738adba30a7 ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
If a socket with pending cork data is closed we do not return the
memory to the socket until the garbage collector free's the psock
structure. The garbage collector though can run after the sock has
completed its close operation. If this ordering happens the sock code
will through a WARN_ON because there is still outstanding memory
accounted to the sock.
To resolve this ensure we return memory to the sock when a socket
is closed.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Fixes: 91843d540a13 ("bpf: sockmap, add msg_cork_bytes() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace updates from Eric Biederman:
"There was a lot of work this cycle fixing bugs that were discovered
after the merge window and getting everything ready where we can
reasonably support fully unprivileged fuse. The bug fixes you already
have and much of the unprivileged fuse work is coming in via other
trees.
Still left for fully unprivileged fuse is figuring out how to cleanly
handle .set_acl and .get_acl in the legacy case, and properly handling
of evm xattrs on unprivileged mounts.
Included in the tree is a cleanup from Alexely that replaced a linked
list with a statically allocated fix sized array for the pid caches,
which simplifies and speeds things up.
Then there is are some cleanups and fixes for the ipc namespace. The
motivation was that in reviewing other code it was discovered that
access ipc objects from different pid namespaces recorded pids in such
a way that when asked the wrong pids were returned. In the worst case
there has been a measured 30% performance impact for sysvipc
semaphores. Other test cases showed no measurable performance impact.
Manfred Spraul and Davidlohr Bueso who tend to work on sysvipc
performance both gave the nod that this is good enough.
Casey Schaufler and James Morris have given their approval to the LSM
side of the changes.
I simplified the types and the code dealing with sysvipc to pass just
kern_ipc_perm for all three types of ipc. Which reduced the header
dependencies throughout the kernel and simplified the lsm code.
Which let me work on the pid fixes without having to worry about
trivial changes causing complete kernel recompiles"
* 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
ipc/shm: Fix pid freeing.
ipc/shm: fix up for struct file no longer being available in shm.h
ipc/smack: Tidy up from the change in type of the ipc security hooks
ipc: Directly call the security hook in ipc_ops.associate
ipc/sem: Fix semctl(..., GETPID, ...) between pid namespaces
ipc/msg: Fix msgctl(..., IPC_STAT, ...) between pid namespaces
ipc/shm: Fix shmctl(..., IPC_STAT, ...) between pid namespaces.
ipc/util: Helpers for making the sysvipc operations pid namespace aware
ipc: Move IPCMNI from include/ipc.h into ipc/util.h
msg: Move struct msg_queue into ipc/msg.c
shm: Move struct shmid_kernel into ipc/shm.c
sem: Move struct sem and struct sem_array into ipc/sem.c
msg/security: Pass kern_ipc_perm not msg_queue into the msg_queue security hooks
shm/security: Pass kern_ipc_perm not shmid_kernel into the shm security hooks
sem/security: Pass kern_ipc_perm not sem_array into the sem security hooks
pidns: simpler allocation of pid_* caches
|
|
Pull workqueue updates from Tejun Heo:
"rcu_work addition and a couple trivial changes"
* 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: remove the comment about the old manager_arb mutex
workqueue: fix the comments of nr_idle
fs/aio: Use rcu_work instead of explicit rcu and work item
cgroup: Use rcu_work instead of explicit rcu and work item
RCU, workqueue: Implement rcu_work
|
|
Pull networking updates from David Miller:
1) Support offloading wireless authentication to userspace via
NL80211_CMD_EXTERNAL_AUTH, from Srinivas Dasari.
2) A lot of work on network namespace setup/teardown from Kirill Tkhai.
Setup and cleanup of namespaces now all run asynchronously and thus
performance is significantly increased.
3) Add rx/tx timestamping support to mv88e6xxx driver, from Brandon
Streiff.
4) Support zerocopy on RDS sockets, from Sowmini Varadhan.
5) Use denser instruction encoding in x86 eBPF JIT, from Daniel
Borkmann.
6) Support hw offload of vlan filtering in mvpp2 dreiver, from Maxime
Chevallier.
7) Support grafting of child qdiscs in mlxsw driver, from Nogah
Frankel.
8) Add packet forwarding tests to selftests, from Ido Schimmel.
9) Deal with sub-optimal GSO packets better in BBR congestion control,
from Eric Dumazet.
10) Support 5-tuple hashing in ipv6 multipath routing, from David Ahern.
11) Add path MTU tests to selftests, from Stefano Brivio.
12) Various bits of IPSEC offloading support for mlx5, from Aviad
Yehezkel, Yossi Kuperman, and Saeed Mahameed.
13) Support RSS spreading on ntuple filters in SFC driver, from Edward
Cree.
14) Lots of sockmap work from John Fastabend. Applications can use eBPF
to filter sendmsg and sendpage operations.
15) In-kernel receive TLS support, from Dave Watson.
16) Add XDP support to ixgbevf, this is significant because it should
allow optimized XDP usage in various cloud environments. From Tony
Nguyen.
17) Add new Intel E800 series "ice" ethernet driver, from Anirudh
Venkataramanan et al.
18) IP fragmentation match offload support in nfp driver, from Pieter
Jansen van Vuuren.
19) Support XDP redirect in i40e driver, from Björn Töpel.
20) Add BPF_RAW_TRACEPOINT program type for accessing the arguments of
tracepoints in their raw form, from Alexei Starovoitov.
21) Lots of striding RQ improvements to mlx5 driver with many
performance improvements, from Tariq Toukan.
22) Use rhashtable for inet frag reassembly, from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1678 commits)
net: mvneta: improve suspend/resume
net: mvneta: split rxq/txq init and txq deinit into SW and HW parts
ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh
net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
net: bgmac: Correctly annotate register space
route: check sysctl_fib_multipath_use_neigh earlier than hash
fix typo in command value in drivers/net/phy/mdio-bitbang.
sky2: Increase D3 delay to sky2 stops working after suspend
net/mlx5e: Set EQE based as default TX interrupt moderation mode
ibmvnic: Disable irqs before exiting reset from closed state
net: sched: do not emit messages while holding spinlock
vlan: also check phy_driver ts_info for vlan's real device
Bluetooth: Mark expected switch fall-throughs
Bluetooth: Set HCI_QUIRK_SIMULTANEOUS_DISCOVERY for BTUSB_QCA_ROME
Bluetooth: btrsi: remove unused including <linux/version.h>
Bluetooth: hci_bcm: Remove DMI quirk for the MINIX Z83-4
sh_eth: kill useless check in __sh_eth_get_regs()
sh_eth: add sh_eth_cpu_data::no_xdfar flag
ipv6: factorize sk_wmem_alloc updates done by __ip6_append_data()
ipv4: factorize sk_wmem_alloc updates done by __ip_append_data()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These update the cpuidle poll state definition to reduce excessive
energy usage related to it, add new CPU ID to the RAPL power capping
driver, update the ACPI system suspend code to handle some special
cases better, extend the PM core's device links code slightly, add new
sysfs attribute for better suspend-to-idle diagnostics and easier
hibernation handling, update power management tools and clean up
cpufreq quite a bit.
Specifics:
- Modify the cpuidle poll state implementation to prevent CPUs from
staying in the loop in there for excessive times (Rafael Wysocki).
- Add Intel Cannon Lake chips support to the RAPL power capping
driver (Joe Konno).
- Add reference counting to the device links handling code in the PM
core (Lukas Wunner).
- Avoid reconfiguring GPEs on suspend-to-idle in the ACPI system
suspend code (Rafael Wysocki).
- Allow devices to be put into deeper low-power states via ACPI if
both _SxD and _SxW are missing (Daniel Drake).
- Reorganize the core ACPI suspend-to-idle wakeup code to avoid a
keyboard wakeup issue on Asus UX331UA (Chris Chiu).
- Prevent the PCMCIA library code from aborting suspend-to-idle due
to noirq suspend failures resulting from incorrect assumptions
(Rafael Wysocki).
- Add coupled cpuidle supprt to the Exynos3250 platform (Marek
Szyprowski).
- Add new sysfs file to make it easier to specify the image storage
location during hibernation (Mario Limonciello).
- Add sysfs files for collecting suspend-to-idle usage and time
statistics for CPU idle states (Rafael Wysocki).
- Update the pm-graph utilities (Todd Brandt).
- Reduce the kernel log noise related to reporting Low-power Idle
constraings by the ACPI system suspend code (Rafael Wysocki).
- Make it easier to distinguish dedicated wakeup IRQs in the
/proc/interrupts output (Tony Lindgren).
- Add the frequency table validation in cpufreq to the core and drop
it from a number of cpufreq drivers (Viresh Kumar).
- Drop "cooling-{min|max}-level" for CPU nodes from a couple of DT
bindings (Viresh Kumar).
- Clean up the CPU online error code path in the cpufreq core (Viresh
Kumar).
- Fix assorted issues in the SCPI, CPPC, mediatek and tegra186
cpufreq drivers (Arnd Bergmann, Chunyu Hu, George Cherian, Viresh
Kumar).
- Drop memory allocation error messages from a few places in cpufreq
and cpuildle drivers (Markus Elfring)"
* tag 'pm-4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (56 commits)
ACPI / PM: Fix keyboard wakeup from suspend-to-idle on ASUS UX331UA
cpufreq: CPPC: Use transition_delay_us depending transition_latency
PM / hibernate: Change message when writing to /sys/power/resume
PM / hibernate: Make passing hibernate offsets more friendly
cpuidle: poll_state: Avoid invoking local_clock() too often
PM: cpuidle/suspend: Add s2idle usage and time state attributes
cpuidle: Enable coupled cpuidle support on Exynos3250 platform
cpuidle: poll_state: Add time limit to poll_idle()
cpufreq: tegra186: Don't validate the frequency table twice
cpufreq: speedstep: Don't validate the frequency table twice
cpufreq: sparc: Don't validate the frequency table twice
cpufreq: sh: Don't validate the frequency table twice
cpufreq: sfi: Don't validate the frequency table twice
cpufreq: scpi: Don't validate the frequency table twice
cpufreq: sc520: Don't validate the frequency table twice
cpufreq: s3c24xx: Don't validate the frequency table twice
cpufreq: qoirq: Don't validate the frequency table twice
cpufreq: pxa: Don't validate the frequency table twice
cpufreq: ppc_cbe: Don't validate the frequency table twice
cpufreq: powernow: Don't validate the frequency table twice
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull removal of in-kernel calls to syscalls from Dominik Brodowski:
"System calls are interaction points between userspace and the kernel.
Therefore, system call functions such as sys_xyzzy() or
compat_sys_xyzzy() should only be called from userspace via the
syscall table, but not from elsewhere in the kernel.
At least on 64-bit x86, it will likely be a hard requirement from
v4.17 onwards to not call system call functions in the kernel: It is
better to use use a different calling convention for system calls
there, where struct pt_regs is decoded on-the-fly in a syscall wrapper
which then hands processing over to the actual syscall function. This
means that only those parameters which are actually needed for a
specific syscall are passed on during syscall entry, instead of
filling in six CPU registers with random user space content all the
time (which may cause serious trouble down the call chain). Those
x86-specific patches will be pushed through the x86 tree in the near
future.
Moreover, rules on how data may be accessed may differ between kernel
data and user data. This is another reason why calling sys_xyzzy() is
generally a bad idea, and -- at most -- acceptable in arch-specific
code.
This patchset removes all in-kernel calls to syscall functions in the
kernel with the exception of arch/. On top of this, it cleans up the
three places where many syscalls are referenced or prototyped, namely
kernel/sys_ni.c, include/linux/syscalls.h and include/linux/compat.h"
* 'syscalls-next' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux: (109 commits)
bpf: whitelist all syscalls for error injection
kernel/sys_ni: remove {sys_,sys_compat} from cond_syscall definitions
kernel/sys_ni: sort cond_syscall() entries
syscalls/x86: auto-create compat_sys_*() prototypes
syscalls: sort syscall prototypes in include/linux/compat.h
net: remove compat_sys_*() prototypes from net/compat.h
syscalls: sort syscall prototypes in include/linux/syscalls.h
kexec: move sys_kexec_load() prototype to syscalls.h
x86/sigreturn: use SYSCALL_DEFINE0
x86: fix sys_sigreturn() return type to be long, not unsigned long
x86/ioport: add ksys_ioperm() helper; remove in-kernel calls to sys_ioperm()
mm: add ksys_readahead() helper; remove in-kernel calls to sys_readahead()
mm: add ksys_mmap_pgoff() helper; remove in-kernel calls to sys_mmap_pgoff()
mm: add ksys_fadvise64_64() helper; remove in-kernel call to sys_fadvise64_64()
fs: add ksys_fallocate() wrapper; remove in-kernel calls to sys_fallocate()
fs: add ksys_p{read,write}64() helpers; remove in-kernel calls to syscalls
fs: add ksys_truncate() wrapper; remove in-kernel calls to sys_truncate()
fs: add ksys_sync_file_range helper(); remove in-kernel calls to syscall
kernel: add ksys_setsid() helper; remove in-kernel call to sys_setsid()
kernel: add ksys_unshare() helper; remove in-kernel calls to sys_unshare()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pul removal of obsolete architecture ports from Arnd Bergmann:
"This removes the entire architecture code for blackfin, cris, frv,
m32r, metag, mn10300, score, and tile, including the associated device
drivers.
I have been working with the (former) maintainers for each one to
ensure that my interpretation was right and the code is definitely
unused in mainline kernels. Many had fond memories of working on the
respective ports to start with and getting them included in upstream,
but also saw no point in keeping the port alive without any users.
In the end, it seems that while the eight architectures are extremely
different, they all suffered the same fate: There was one company in
charge of an SoC line, a CPU microarchitecture and a software
ecosystem, which was more costly than licensing newer off-the-shelf
CPU cores from a third party (typically ARM, MIPS, or RISC-V). It
seems that all the SoC product lines are still around, but have not
used the custom CPU architectures for several years at this point. In
contrast, CPU instruction sets that remain popular and have actively
maintained kernel ports tend to all be used across multiple licensees.
[ See the new nds32 port merged in the previous commit for the next
generation of "one company in charge of an SoC line, a CPU
microarchitecture and a software ecosystem" - Linus ]
The removal came out of a discussion that is now documented at
https://lwn.net/Articles/748074/. Unlike the original plans, I'm not
marking any ports as deprecated but remove them all at once after I
made sure that they are all unused. Some architectures (notably tile,
mn10300, and blackfin) are still being shipped in products with old
kernels, but those products will never be updated to newer kernel
releases.
After this series, we still have a few architectures without mainline
gcc support:
- unicore32 and hexagon both have very outdated gcc releases, but the
maintainers promised to work on providing something newer. At least
in case of hexagon, this will only be llvm, not gcc.
- openrisc, risc-v and nds32 are still in the process of finishing
their support or getting it added to mainline gcc in the first
place. They all have patched gcc-7.3 ports that work to some
degree, but complete upstream support won't happen before gcc-8.1.
Csky posted their first kernel patch set last week, their situation
will be similar
[ Palmer Dabbelt points out that RISC-V support is in mainline gcc
since gcc-7, although gcc-7.3.0 is the recommended minimum - Linus ]"
This really says it all:
2498 files changed, 95 insertions(+), 467668 deletions(-)
* tag 'arch-removal' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (74 commits)
MAINTAINERS: UNICORE32: Change email account
staging: iio: remove iio-trig-bfin-timer driver
tty: hvc: remove tile driver
tty: remove bfin_jtag_comm and hvc_bfin_jtag drivers
serial: remove tile uart driver
serial: remove m32r_sio driver
serial: remove blackfin drivers
serial: remove cris/etrax uart drivers
usb: Remove Blackfin references in USB support
usb: isp1362: remove blackfin arch glue
usb: musb: remove blackfin port
usb: host: remove tilegx platform glue
pwm: remove pwm-bfin driver
i2c: remove bfin-twi driver
spi: remove blackfin related host drivers
watchdog: remove bfin_wdt driver
can: remove bfin_can driver
mmc: remove bfin_sdh driver
input: misc: remove blackfin rotary driver
input: keyboard: remove bf54x driver
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull wait_var_event updates from Ingo Molnar:
"This introduces the new wait_var_event() API, which is a more flexible
waiting primitive than wait_on_atomic_t().
All wait_on_atomic_t() users are migrated over to the new API and
wait_on_atomic_t() is removed. The migration fixes one bug and should
result in no functional changes for the other usecases"
* 'sched-wait-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/wait: Improve __var_waitqueue() code generation
sched/wait: Remove the wait_on_atomic_t() API
sched/wait, arch/mips: Fix and convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/ocfs2: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/nfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/fscache: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/btrfs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, fs/afs: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, drivers/media: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait, drivers/drm: Convert wait_on_atomic_t() usage to the new wait_var_event() API
sched/wait: Introduce wait_var_event()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP hotplug updates from Ingo Molnar:
"Simplify the CPU hot-plug state machine"
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Fix unused function warning
cpu/hotplug: Merge cpuhp_bp_states and cpuhp_ap_states
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The main scheduler changes in this cycle were:
- NUMA balancing improvements (Mel Gorman)
- Further load tracking improvements (Patrick Bellasi)
- Various NOHZ balancing cleanups and optimizations (Peter Zijlstra)
- Improve blocked load handling, in particular we can now reduce and
eventually stop periodic load updates on 'very idle' CPUs. (Vincent
Guittot)
- On isolated CPUs offload the final 1Hz scheduler tick as well, plus
related cleanups and reorganization. (Frederic Weisbecker)
- Core scheduler code cleanups (Ingo Molnar)"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
sched/core: Update preempt_notifier_key to modern API
sched/cpufreq: Rate limits for SCHED_DEADLINE
sched/fair: Update util_est only on util_avg updates
sched/cpufreq/schedutil: Use util_est for OPP selection
sched/fair: Use util_est in LB and WU paths
sched/fair: Add util_est on top of PELT
sched/core: Remove TASK_ALL
sched/completions: Use bool in try_wait_for_completion()
sched/fair: Update blocked load when newly idle
sched/fair: Move idle_balance()
sched/nohz: Merge CONFIG_NO_HZ_COMMON blocks
sched/fair: Move rebalance_domains()
sched/nohz: Optimize nohz_idle_balance()
sched/fair: Reduce the periodic update duration
sched/nohz: Stop NOHZ stats when decayed
sched/cpufreq: Provide migration hint
sched/nohz: Clean up nohz enter/exit
sched/fair: Update blocked load from NEWIDLE
sched/fair: Add NOHZ stats balancing
sched/fair: Restructure nohz_balance_kick()
...
|
|
This keeps it in line with the SYSCALL_DEFINEx() / COMPAT_SYSCALL_DEFINEx()
calling convention.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Shuffle the cond_syscall() entries in kernel/sys_ni.c around so that they
are kept in the same order as in include/uapi/asm-generic/unistd.h. For
better structuring, add the same comments as in that file, but keep a few
additional comments and extend the commentary where it seems useful.
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Using this helper allows us to avoid the in-kernel call to the
sys_setsid() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_setsid().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Using this helper allows us to avoid the in-kernel calls to the
sys_unshare() syscall. The ksys_ prefix denotes that this function is meant
as a drop-in replacement for the syscall. In particular, it uses the same
calling convention as sys_unshare().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Using this helper allows us to avoid the in-kernel calls to the
sys_sync() syscall. The ksys_ prefix denotes that this function
is meant as a drop-in replacement for the syscall. In particular, it
uses the same calling convention as sys_sync().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Using the fs-interal do_fchownat() wrapper allows us to get rid of
fs-internal calls to the sys_fchownat() syscall.
Introducing the ksys_fchown() helper and the ksys_{,}chown() wrappers
allows us to avoid the in-kernel calls to the sys_{,l,f}chown() syscalls.
The ksys_ prefix denotes that these functions are meant as a drop-in
replacement for the syscalls. In particular, they use the same calling
convention as sys_{,l,f}chown().
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
While sys32_quotactl() is only needed on x86, it can use the recommended
COMPAT_SYSCALL_DEFINEx() machinery for its setup.
Acked-by: Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Move compat_sys_move_pages() to mm/migrate.c and make it call a newly
introduced helper -- kernel_move_pages() -- instead of the syscall.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Move compat_sys_migrate_pages() to mm/mempolicy.c and make it call a newly
introduced helper -- kernel_migrate_pages() -- instead of the syscall.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Using the sched-internal do_sched_yield() helper allows us to get rid of
the sched-internal call to the sys_sched_yield() syscall.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Ingo Molnar <mingo@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Using these helpers allows us to avoid the in-kernel calls to these
syscalls: sys_setregid(), sys_setgid(), sys_setreuid(), sys_setuid(),
sys_setresuid(), sys_setresgid(), sys_setfsuid(), and sys_setfsgid().
The ksys_ prefix denotes that these function are meant as a drop-in
replacement for the syscall. In particular, they use the same calling
convention.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
syscall
Using this helper allows us to avoid the in-kernel call to the
compat_sys_sigaltstack() syscall.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|
|
Using the do_getpgid() helper removes an in-kernel call to the
sys_getpgid() syscall.
This patch is part of a series which removes in-kernel calls to syscalls.
On this basis, the syscall entry path can be streamlined. For details, see
http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
|