path: root/kernel
Age | Commit message | Author
2014-11-11 | audit: keep inode pinned | Miklos Szeredi
Audit rules disappear when an inode they watch is evicted from the cache. This is likely not what we want. The guilty commit is "fsnotify: allow marks to not pin inodes in core", which didn't take into account that audit_tree adds watches with a zero mask. Adding any mask should fix this. Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core") Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: stable@vger.kernel.org # 2.6.36+ Signed-off-by: Paul Moore <pmoore@redhat.com>
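A minimal sketch of the fix's shape, assuming the mark is set up in audit_tree's chunk allocation (a reconstruction, not the verbatim hunk): any non-zero mask makes fsnotify pin the inode again, with FS_IN_IGNORED serving as a harmless always-set bit.

    fsnotify_init_mark(&chunk->mark, audit_tree_destroy_watch);
    /* A zero mask let fsnotify evict the watched inode; any bit pins it. */
    chunk->mark.mask = FS_IN_IGNORED;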
2014-11-11 | tracing: Add entry->next_cpu to trace_ctxwake_bin() | Jiang Liu
Function trace_ctxwake_bin() misses the ctx_switch_entry->next_cpu field, so the user will get a stale value for "next_cpu". Link: http://lkml.kernel.org/p/1377176379-27908-1-git-send-email-liuj97@gmail.com Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11 | tracing: Move tracing_sched_{switch,wakeup}() into wakeup tracer | Steven Rostedt (Red Hat)
The only code that references tracing_sched_switch_trace() and tracing_sched_wakeup_trace() is the wakeup latency tracer. Those two functions used to belong to the sched_switch tracer, which has long been removed. These functions were left behind because the wakeup latency tracer used them. But since the wakeup latency tracer is the only one to use them, they should be static functions inside that code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11 | tracing: Kill the dead code in probe_sched_switch() and probe_sched_wakeup() | Oleg Nesterov
After the previous patch it is clear that "tracer_enabled" can never be true, so we can remove the "if (tracer_enabled)" code in probe_sched_switch() and probe_sched_wakeup(). Plus we can obviously remove tracer_enabled, ctx_trace, and sched_stopped as well. Link: http://lkml.kernel.org/p/20140723193503.GA30217@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11 | tracing: Kill tracing_{start,stop}_sched_switch_record() and tracing_sched_switch_assign_trace() | Oleg Nesterov
tracing_{start,stop}_sched_switch_record() have no callers since 87d80de2800d "tracing: Remove obsolete sched_switch tracer". The last caller of tracing_sched_switch_assign_trace() was removed by 30dbb20e68e6 "tracing: Remove boot tracer". Link: http://lkml.kernel.org/p/20140723193501.GA30214@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11 | ftrace: Add more information to ftrace_bug() output | Steven Rostedt (Red Hat)
With the introduction of dynamic trampolines, it is useful for ftrace_bug() to produce more information about the current state when things go wrong; this can help debug issues that may arise. Ftrace has lots of checks to make sure that the state of the system it touches is exactly what it expects it to be. When it detects an abnormality it calls ftrace_bug() and disables itself to prevent any further damage. It is crucial that ftrace_bug() produces sufficient information that can be used to debug the situation. Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11 | ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines | Steven Rostedt (Red Hat)
When a static ftrace_ops (like the function tracer) enables tracing and is the only callback referencing a function, a trampoline is dynamically allocated for that function; it calls the callback directly instead of going through a loop function that iterates over all registered ftrace ops (needed when more than one ops is registered). But when it comes to dynamically allocated ftrace_ops, which may be freed, on a CONFIG_PREEMPT kernel there's no way to know when it is safe to free the trampoline. If a task was preempted while executing on the trampoline, there's currently no way to know when it will be off that trampoline. But this is not true for !CONFIG_PREEMPT. The current method of calling schedule_on_each_cpu() will force tasks off the trampoline, because they cannot schedule while on it (kernel preemption is not configured). That means it is safe to free a dynamically allocated ftrace ops trampoline when CONFIG_PREEMPT is not configured. Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-11 | kernel/debug/debug_core.c: Logging clean-up | Fabian Frederick
- Convert printk() to pr_foo()
- Add pr_fmt
- Coalesce formats
Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11 | kgdb: timeout if secondary CPUs ignore the roundup | Daniel Thompson
Currently if an active CPU fails to respond to a roundup request the CPU that requested the roundup will become stuck. This needlessly reduces the robustness of the debugger. This patch introduces a timeout allowing the system state to be examined even when the system contains unresponsive processors. It also modifies kdb's cpu command to make it censor attempts to switch to unresponsive processors and to report their state as (D)ead. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11 | kdb: Allow access to sensitive commands to be restricted by default | Daniel Thompson
Currently kiosk mode must be explicitly requested by the bootloader or userspace. It is convenient to be able to change the default value in a similar manner to CONFIG_MAGIC_SYSRQ_DEFAULT_MASK. Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11 | kdb: Add enable mask for groups of commands | Anton Vorontsov
Currently all kdb commands are enabled whenever kdb is deployed. This makes it difficult to deploy kdb to help debug certain types of systems. Android phones provide one example; the FIQ debugger found on some Android devices has a deliberately weak set of commands to allow the debugger to be enabled very late in the production cycle. Certain kiosk environments offer another interesting case where an engineer might wish to probe the system state using passive inspection commands without providing sufficient power for a passer-by to root it. Without any restrictions, obtaining root rights via KDB is a matter of a few commands, and works everywhere. For example, log in as a normal user:

cbou:~$ id
uid=1001(cbou) gid=1001(cbou) groups=1001(cbou)

Now enter KDB (for example via sysrq):

Entering kdb (current=0xffff8800065bc740, pid 920) due to Keyboard Entry
kdb> ps
23 sleeping system daemon (state M) processes suppressed, use 'ps A' to see all.
Task Addr            Pid  Parent [*] cpu State Thread             Command
0xffff8800065bc740   920  919     1   0  R     0xffff8800065bca20 *bash
0xffff880007078000     1    0     0   0  S     0xffff8800070782e0  init
[...snip...]
0xffff8800065be3c0   918    1     0   0  S     0xffff8800065be6a0  getty
0xffff8800065b9c80   919    1     0   0  S     0xffff8800065b9f60  login
0xffff8800065bc740   920  919     1   0  R     0xffff8800065bca20 *bash

All we need is the offset of the cred pointers. We can look up the offset in the distro's kernel source, but it is unnecessary. We can just start dumping init's task_struct until we see the process name:

kdb> md 0xffff880007078000
0xffff880007078000 0000000000000001 ffff88000703c000   ................
0xffff880007078010 0040210000000002 0000000000000000   .....!@.........
[...snip...]
0xffff8800070782b0 ffff8800073e0580 ffff8800073e0580   ..>.......>.....
0xffff8800070782c0 0000000074696e69 0000000000000000   init............

Here, 'init'. The creds are just above it, so the offset is 0x02b0. Now we set up init's creds for our non-privileged shell:

kdb> mm 0xffff8800065bc740+0x02b0 0xffff8800073e0580
0xffff8800065bc9f0 = 0xffff8800073e0580
kdb> mm 0xffff8800065bc740+0x02b8 0xffff8800073e0580
0xffff8800065bc9f8 = 0xffff8800073e0580

And thus gaining root:

kdb> go
cbou:~$ id
uid=0(root) gid=0(root) groups=0(root)
cbou:~$ bash
root:~#

p.s. No distro enables kdb by default (although, with the nice KDB-over-KMS feature available, I would expect at least some would enable it), so it's not actually some kind of major issue.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11 | kdb: Categorize kdb commands (similar to SysRq categorization) | Daniel Thompson
This patch introduces several new flags to collect kdb commands into groups (later allowing them to be optionally disabled). This follows similar prior art to enable/disable magic sysrq commands. The commands have been categorized as follows:

Always on: go (w/o args), env, set, help, ?, cpu (w/o args), sr, dmesg, disable_nmi, defcmd, summary, grephelp
Mem read:  md, mdr, mdp, mds, ef, bt (with args), per_cpu
Mem write: mm
Reg read:  rd
Reg write: go (with args), rm
Inspect:   bt (w/o args), btp, bta, btc, btt, ps, pid, lsmod
Flow ctrl: bp, bl, bph, bc, be, bd, ss
Signal:    kill
Reboot:    reboot
All:       cpu, kgdb, (and all of the above), nmi_console

Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
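As a rough illustration of how such grouping works, a sketch in C (the KDB_ENABLE_* names follow this series, but the exact bit values and the helper's signature here are assumptions):

    #define KDB_ENABLE_ALWAYS_SAFE  0x0001
    #define KDB_ENABLE_MEM_READ     0x0002
    #define KDB_ENABLE_MEM_WRITE    0x0004
    #define KDB_ENABLE_REG_READ     0x0008
    #define KDB_ENABLE_REG_WRITE    0x0010
    #define KDB_ENABLE_INSPECT      0x0020
    #define KDB_ENABLE_FLOW_CTRL    0x0040
    #define KDB_ENABLE_SIGNAL       0x0080
    #define KDB_ENABLE_REBOOT       0x0100

    /* A command may run only if one of its category bits is permitted. */
    static bool kdb_check_flags(int cmd_flags, int permissions)
    {
        return (cmd_flags & permissions) != 0;
    }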
2014-11-11 | kdb: Remove KDB_REPEAT_NONE flag | Anton Vorontsov
Since we now treat KDB_REPEAT_* as flags, there is no need to pass KDB_REPEAT_NONE. It's just the default behaviour when no flags are specified. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11 | kdb: Use KDB_REPEAT_* values as flags | Anton Vorontsov
The actual values of KDB_REPEAT_* enum values and overall logic stayed the same, but we now treat the values as flags. This makes it possible to add other flags and combine them, plus makes the code a lot simpler and shorter. But functionality-wise, there should be no changes. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11 | kdb: Rename kdb_register_repeat() to kdb_register_flags() | Anton Vorontsov
We're about to add more options for commands behaviour, so let's give a more generic name to the low-level kdb command registration function. There are just various renames, no functional changes. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11 | kdb: Rename kdb_repeat_t to kdb_cmdflags_t, cmd_repeat to cmd_flags | Anton Vorontsov
We're about to add more options for command behaviour, so let's expand the meaning of kdb_repeat_t. So far we just do various renames, there should be no functional changes. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11 | kdb: Remove currently unused kdbtab_t->cmd_flags | Anton Vorontsov
The struct member is never used in the code, so we can remove it. We will introduce real flags soon by renaming cmd_repeat to cmd_flags. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2014-11-11 | params: cleanup sysfs allocation | Rusty Russell
commit 63662139e519ce06090b2759cf4a1d291b9cc0e2 attempted to patch a leak (which would only happen on OOM, i.e. never), but it didn't quite work. This rewrites the code to be as simple as possible. add_sysfs_param() adds a parameter. If it fails, it's the caller's responsibility to clean up the parameters which already exist. The kzalloc-then-always-krealloc pattern is perhaps overly simplistic, but this code has clearly confused people. It worked on me... Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 | kernel:module Fix coding style errors and warnings. | Ionut Alexa
Fixed coding style errors and warnings. Replaced printk calls with pr_debug/pr_warn equivalents. Changed seq_printf to seq_puts. Signed-off-by: Ionut Alexa <ionut.m.alexa@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed bogus KERN_DEFAULT conversion)
2014-11-11 | module: Remove stop_machine from module unloading | Masami Hiramatsu
Remove stop_machine from module unloading by adding a new reference counting algorithm. This atomic refcounter works like a semaphore: it can be incremented only when the counter is not 0. When loading a module, the module subsystem sets the counter to MODULE_REF_BASE (= 1). When unloading the module, it subtracts MODULE_REF_BASE from the counter. If no one refers to the module, the refcounter becomes 0 and we can remove the module safely. If someone refers to it, we try to recover the counter by adding MODULE_REF_BASE back, unless the counter has already become 0 (the referrer can put the module right before the recovery). If the recovery fails, the refcount stays at 0 and can never be incremented again, so the module can be removed safely in that case too. Note that __module_get() forcibly increments the module refcounter; users should use try_module_get() instead. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
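The algorithm can be illustrated with a self-contained userspace sketch using C11 atomics (names mirror the commit message; this is an illustration, not the kernel code):

    #include <stdatomic.h>
    #include <stdbool.h>

    #define MODULE_REF_BASE 1

    static atomic_int refcnt = MODULE_REF_BASE;  /* taken at "load" time */

    /* Succeeds only while the counter is non-zero (module not dying). */
    static bool try_get(void)
    {
        int old = atomic_load(&refcnt);
        while (old != 0)
            if (atomic_compare_exchange_weak(&refcnt, &old, old + 1))
                return true;
        return false;
    }

    static void put(void)
    {
        atomic_fetch_sub(&refcnt, 1);
    }

    /* Unload: drop the base reference; removal is safe iff we reach 0. */
    static bool try_release(void)
    {
        if (atomic_fetch_sub(&refcnt, MODULE_REF_BASE) == MODULE_REF_BASE)
            return true;              /* counter hit 0: safe to remove */
        /* Others still hold it: restore the base, which fails if the
         * last put raced us down to 0 in the meantime. */
        int old = atomic_load(&refcnt);
        while (old != 0)
            if (atomic_compare_exchange_weak(&refcnt, &old,
                                             old + MODULE_REF_BASE))
                return false;         /* recovered: removal aborted */
        return true;                  /* dropped to 0 anyway: safe */
    }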
2014-11-11 | module: Replace module_ref with atomic_t refcnt | Masami Hiramatsu
Replace the complex per-cpu module_ref reference counter with a simple atomic_t refcnt. This is for code simplification. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 | lib/bug: Use RCU list ops for module_bug_list | Masami Hiramatsu
Since module_bug_list is meant to be used in BUG context, we may not strictly need this. But for anyone who wants to use it from normal context, this makes module_bug_list an RCU list. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-11 | module: Unlink module with RCU synchronizing instead of stop_machine | Masami Hiramatsu
Unlink a module from the module list with RCU synchronizing instead of using stop_machine(). Since the module list is already protected by RCU, we don't need stop_machine() anymore. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
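The pattern behind this and the following commit, sketched (simplified; the real code in kernel/module.c carries more context):

    #include <linux/module.h>
    #include <linux/rculist.h>
    #include <linux/rcupdate.h>

    static void unlink_module_sketch(struct module *mod)
    {
        /* Readers walking the list under rcu_read_lock() either still
         * see the entry or already don't; no stop_machine() needed. */
        list_del_rcu(&mod->list);
        /* Wait for pre-existing readers (e.g. kallsyms walkers) to
         * finish before the memory behind mod->list may be released. */
        synchronize_rcu();
    }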
2014-11-11 | module: Wait for RCU synchronizing before releasing a module | Masami Hiramatsu
Wait for RCU synchronization on the failure path of module loading before releasing struct module, because the memory of mod->list can still be accessed by list walkers (e.g. kallsyms). Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2014-11-10 | tracing: Do not risk busy looping in buffer splice | Rabin Vincent
If the read loop in tracing_buffers_splice_read() keeps failing due to memory allocation failures without reading even a single page, then this function will keep busy looping. Remove that risk by exiting the function if memory allocation failures are seen. Link: http://lkml.kernel.org/r/1415309167-2373-2-git-send-email-rabin@rab.in Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-11-10 | tracing: Do not busy wait in buffer splice | Rabin Vincent
On a !PREEMPT kernel, attempting to use trace-cmd results in a soft lockup: # trace-cmd record -e raw_syscalls:* -F false NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61] ... Call Trace: [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90 [<ffffffff81092e25>] wait_on_pipe+0x35/0x40 [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0 [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0 [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290 [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff8112415d>] do_splice_to+0x6d/0x90 [<ffffffff81126971>] SyS_splice+0x7c1/0x800 [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8 The problem is this: tracing_buffers_splice_read() calls ring_buffer_wait() to wait for data in the ring buffers. The buffers are not empty so ring_buffer_wait() returns immediately. But tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1, meaning it only wants to read a full page. When the full page is not available, tracing_buffers_splice_read() tries to wait again with ring_buffer_wait(), which again returns immediately, and so on. Fix this by adding a "full" argument to ring_buffer_wait() which will make ring_buffer_wait() wait until the writer has left the reader's page, i.e. until full-page reads will succeed. Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.16+ Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function") Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
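A hedged sketch of the fixed loop shape (the function and parameter names here are illustrative; the real code is in tracing_buffers_splice_read()):

    /* With the new "full" argument, the wait really sleeps until a full
     * page can be read instead of returning immediately and spinning. */
    static int read_full_page_sketch(struct ring_buffer *buffer, int cpu,
                                     void **page, int nonblock)
    {
        for (;;) {
            int ret = ring_buffer_read_page(buffer, page, PAGE_SIZE, cpu, 1);
            if (ret >= 0)
                return ret;               /* got a full page */
            if (nonblock)
                return -EAGAIN;
            ret = ring_buffer_wait(buffer, cpu, 1 /* full */);
            if (ret)                      /* e.g. interrupted */
                return ret;
        }
    }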
2014-11-10 | sched/numa: Fix out of bounds read in sched_init_numa() | Andrey Ryabinin
On latest mm + KASan patchset I've got this: ================================================================== BUG: AddressSanitizer: out of bounds access in sched_init_smp+0x3ba/0x62c at addr ffff88006d4bee6c ============================================================================= BUG kmalloc-8 (Not tainted): kasan error ----------------------------------------------------------------------------- Disabling lock debugging due to kernel taint INFO: Allocated in alloc_vfsmnt+0xb0/0x2c0 age=75 cpu=0 pid=0 __slab_alloc+0x4b4/0x4f0 __kmalloc_track_caller+0x15f/0x1e0 kstrdup+0x44/0x90 alloc_vfsmnt+0xb0/0x2c0 vfs_kern_mount+0x35/0x190 kern_mount_data+0x25/0x50 pid_ns_prepare_proc+0x19/0x50 alloc_pid+0x5e2/0x630 copy_process.part.41+0xdf5/0x2aa0 do_fork+0xf5/0x460 kernel_thread+0x21/0x30 rest_init+0x1e/0x90 start_kernel+0x522/0x531 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0x15b/0x16a INFO: Slab 0xffffea0001b52f80 objects=24 used=22 fp=0xffff88006d4befc0 flags=0x100000000004080 INFO: Object 0xffff88006d4bed20 @offset=3360 fp=0xffff88006d4bee70 Bytes b4 ffff88006d4bed10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ Object ffff88006d4bed20: 70 72 6f 63 00 6b 6b a5 proc.kk. Redzone ffff88006d4bed28: cc cc cc cc cc cc cc cc ........ Padding ffff88006d4bee68: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ CPU: 0 PID: 1 Comm: swapper/0 Tainted: G B 3.18.0-rc3-mm1+ #108 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 ffff88006d4be000 0000000000000000 ffff88006d4bed20 ffff88006c86fd18 ffffffff81cd0a59 0000000000000058 ffff88006d404240 ffff88006c86fd48 ffffffff811fa3a8 ffff88006d404240 ffffea0001b52f80 ffff88006d4bed20 Call Trace: dump_stack (lib/dump_stack.c:52) print_trailer (mm/slub.c:645) object_err (mm/slub.c:652) ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063) kasan_report_error (mm/kasan/report.c:102 mm/kasan/report.c:178) ? kasan_poison_shadow (mm/kasan/kasan.c:48) ? kasan_unpoison_shadow (mm/kasan/kasan.c:54) ? kasan_poison_shadow (mm/kasan/kasan.c:48) ? kasan_kmalloc (mm/kasan/kasan.c:311) __asan_load4 (mm/kasan/kasan.c:371) ? sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063) sched_init_smp (kernel/sched/core.c:6552 kernel/sched/core.c:7063) kernel_init_freeable (init/main.c:869 init/main.c:997) ? finish_task_switch (kernel/sched/sched.h:1036 kernel/sched/core.c:2248) ? rest_init (init/main.c:924) kernel_init (init/main.c:929) ? rest_init (init/main.c:924) ret_from_fork (arch/x86/kernel/entry_64.S:348) ? rest_init (init/main.c:924) Read of size 4 by task swapper/0: Memory state around the buggy address: ffff88006d4beb80: fc fc fc fc fc fc fc fc fc fc 00 fc fc fc fc fc ffff88006d4bec00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bec80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bed00: fc fc fc fc 00 fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bed80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88006d4bee00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc 04 fc ^ ffff88006d4bee80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bef00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88006d4bef80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff88006d4bf000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88006d4bf080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Zero 'level' (e.g. 
on a non-NUMA system) causes an out-of-bounds access in this line: sched_max_numa_distance = sched_domains_numa_distance[level - 1]; Fix this by exiting from sched_init_numa() earlier. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Reviewed-by: Rik van Riel <riel@redhat.com> Fixes: 9942f79ba ("sched/numa: Export info needed for NUMA balancing on complex topologies") Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1415372020-1871-1-git-send-email-a.ryabinin@samsung.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
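The shape of the fix, sketched (a reconstruction of the relevant part of sched_init_numa()):

    static void sched_init_numa_sketch(void)
    {
        int level = 0;
        /* ... build sched_domains_numa_distance[], counting levels ... */
        if (!level)
            return;   /* non-NUMA: nothing to index below */
        sched_max_numa_distance = sched_domains_numa_distance[level - 1];
        /* ... */
    }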
2014-11-09 | genirq: Generic chip: Add big endian I/O accessors | Kevin Cernekee
Use io{read,write}32be if the caller specified IRQ_GC_BE_IO when creating the irqchip. Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/1415342669-30640-5-git-send-email-cernekee@gmail.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
2014-11-09 | genirq: Generic chip: Change irq_reg_{readl,writel} arguments | Kevin Cernekee
Pass in the irq_chip_generic struct so we can use different readl/writel settings for each irqchip driver, when appropriate. Compute (gc->reg_base + reg_offset) in the helper function because this is pretty much what all callers want to do anyway. Compile-tested using the following configurations: at91_dt_defconfig (CONFIG_ATMEL_AIC_IRQ=y), sama5_defconfig (CONFIG_ATMEL_AIC5_IRQ=y), sunxi_defconfig (CONFIG_ARCH_SUNXI=y). tb10x (ARC) is untested. Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lkml.kernel.org/r/1415342669-30640-3-git-send-email-cernekee@gmail.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
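A sketch of the resulting helpers (close to what kernel/irq/generic-chip.c looks like after these two patches, but treat it as a reconstruction):

    static u32 irq_reg_readl(struct irq_chip_generic *gc, int reg_offset)
    {
        if (gc->reg_readl)       /* set to ioread32be under IRQ_GC_BE_IO */
            return gc->reg_readl(gc->reg_base + reg_offset);
        return readl(gc->reg_base + reg_offset);
    }

    static void irq_reg_writel(struct irq_chip_generic *gc, u32 val,
                               int reg_offset)
    {
        if (gc->reg_writel)      /* set to iowrite32be under IRQ_GC_BE_IO */
            gc->reg_writel(val, gc->reg_base + reg_offset);
        else
            writel(val, gc->reg_base + reg_offset);
    }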
2014-11-08 | PM / sleep: Fix entering suspend-to-IDLE if no freeze_ops is set | Dmitry Eremin-Solenikov
If no freeze_ops is set, trying to enter suspend-to-IDLE will cause a nice oops in platform_suspend_prepare_late(). Add respective checks to platform_suspend_prepare_late() and platform_resume_early() functions. Fixes: a8d46b9e4e48 (ACPI / sleep: Rework the handling of ACPI GPE wakeup ...) Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
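A sketch of the added guard (modeled on the helper in kernel/power/suspend.c; a reconstruction, not the verbatim hunk):

    static int platform_suspend_prepare_late(suspend_state_t state)
    {
        /* Don't dereference freeze_ops if the platform never set any. */
        return state == PM_SUSPEND_FREEZE && freeze_ops &&
               freeze_ops->prepare ? freeze_ops->prepare() : 0;
    }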
2014-11-05 | tty: Move session_of_pgrp() and make static | Peter Hurley
tiocspgrp() is the lone caller of session_of_pgrp(); relocate and limit to file scope. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Reviewed-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-11-05 | pstore: Honor dmesg_restrict sysctl on dmesg dumps | Sebastian Schmidt
When the kernel.dmesg_restrict restriction is in place, only users with CAP_SYSLOG should be able to access crash dumps (for example: an attacker is trying to exploit a bug, the watchdog reboots, and the attacker can happily read crash dumps and logs). This puts the restriction on console-* types as well, since sensitive information could have been leaked there. Other log types are unaffected. Signed-off-by: Sebastian Schmidt <yath@yath.de> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Tony Luck <tony.luck@intel.com>
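The policy, sketched (the helper name and call site are assumptions; only the rule itself comes from the commit):

    static bool pstore_record_readable(enum pstore_type_id type)
    {
        /* dmesg and console records may hold crash-time log contents. */
        if (type == PSTORE_TYPE_DMESG || type == PSTORE_TYPE_CONSOLE)
            return !dmesg_restrict || capable(CAP_SYSLOG);
        return true;   /* other record types are unaffected */
    }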
2014-11-04 | sched: Refactor task_struct to use numa_faults instead of numa_* pointers | Iulia Manda
This patch simplifies task_struct by removing the four numa_* pointers and replacing them with a single array pointer. By doing this, the size of task_struct is reduced by 3 ulong pointers (24 bytes on x86_64). A new parameter is added to the task_faults_idx function so that it can return an index to the correct offset, corresponding to the old precalculated pointers. All of the code in sched/ that depended on task_faults_idx and numa_* was changed in order to match the new logic. Signed-off-by: Iulia Manda <iulia.manda21@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: mgorman@suse.de Cc: dave@stgolabs.net Cc: riel@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141031001331.GA30662@winterfell Signed-off-by: Ingo Molnar <mingo@kernel.org>
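The new indexing, sketched (the enum and helper names follow the commit; the exact layout formula is a reconstruction):

    enum numa_faults_stats {
        NUMA_MEM = 0, NUMA_CPU, NUMA_MEMBUF, NUMA_CPUBUF
    };

    /* One flat p->numa_faults buffer replaces four separate pointers;
     * the new stats parameter selects which quadrant to index. */
    static inline int task_faults_idx(enum numa_faults_stats s,
                                      int nid, int priv)
    {
        return NR_NUMA_HINT_FAULT_TYPES * (s * nr_node_ids + nid) + priv;
    }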
2014-11-04 | sched/deadline: Don't check CONFIG_SMP in switched_from_dl() | Wanpeng Li
There are both UP and SMP versions of pull_dl_task(), so there is no need to check CONFIG_SMP in switched_from_dl(). Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-6-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 | sched/deadline: Reschedule from switched_from_dl() after a successful pull | Wanpeng Li
In switched_from_dl() we have to issue a resched if we successfully pulled some task from other cpus. This patch also aligns the behavior with -rt. Suggested-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-5-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 | sched/deadline: Push task away if the deadline is equal to curr during wakeup | Wanpeng Li
This patch pushes the task away during wakeup if its deadline is equal to curr's, the same behavior as the rt class. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-4-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 | sched/deadline: Add deadline rq status print | Wanpeng Li
This patch adds a printout of the deadline rq status. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-3-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 | sched/deadline: Fix artificial overrun introduced by yield_task_dl() | Wanpeng Li
The yield semantic of the deadline class is to reduce the remaining runtime to zero, after which update_curr_dl() will stop the task. However, the consumed bandwidth is subtracted from the yielding task's budget again even though it has already been set to zero, which leads to an artificial overrun. This patch fixes it by making sure we don't steal any more time from a task that has yielded, in update_curr_dl(). Suggested-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-2-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
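The core of the fix, sketched as the relevant line inside update_curr_dl() (surrounding context omitted):

    /* A task that yielded already had its runtime forced to zero;
     * charging delta_exec against it again would fake an overrun. */
    dl_se->runtime -= dl_se->dl_yielded ? 0 : delta_exec;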
2014-11-04 | sched/rt: Clean up check_preempt_equal_prio() | Wanpeng Li
This patch checks in advance whether current can be pushed/pulled somewhere else, to make the logic clear; this is the same behavior as the dl class. - If current can't be migrated, it is useless to reschedule; let's hope the task can move out. - If the task is migratable, let's not reschedule it and instead see if it can be pushed or pulled somewhere else. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@arm.com> Cc: Kirill Tkhai <ktkhai@parallels.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414708776-124078-1-git-send-email-wanpeng.li@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 | sched/core: Use dl_bw_of() under rcu_read_lock_sched() | Juri Lelli
As per commit f10e00f4bf36 ("sched/dl: Use dl_bw_of() under rcu_read_lock_sched()"), dl_bw_of() has to be protected by rcu_read_lock_sched(). Signed-off-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414497286-28824-1-git-send-email-juri.lelli@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 | sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu | Yao Dongdong
An idle cpu is idler than a non-idle cpu, so we needn't search for the least_loaded_cpu after we have found an idle cpu. Signed-off-by: Yao Dongdong <yaodongdong@huawei.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414469286-6023-1-git-send-email-yaodongdong@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
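The short-circuit, sketched as the tail of find_idlest_cpu() (a reconstruction):

    /* Any idle CPU beats the least-loaded busy one; fall back to the
     * load comparison only when no idle CPU was found. */
    return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;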
2014-11-04 | sched/deadline: Implement cancel_dl_timer() to use in switched_from_dl() | Kirill Tkhai
Currently used hrtimer_try_to_cancel() is racy:

raw_spin_lock(&rq->lock) ... dl_task_timer raw_spin_lock(&rq->lock) ... raw_spin_lock(&rq->lock) ... switched_from_dl() ... ... hrtimer_try_to_cancel() ... ... switched_to_fair() ... ... ... ... ... ... ... ... raw_spin_unlock(&rq->lock) ... (acquired) ... ... ... ... ... ... do_exit() ... ... schedule() ... ... raw_spin_lock(&rq->lock) ... raw_spin_unlock(&rq->lock) ... ... ... raw_spin_unlock(&rq->lock) ... raw_spin_lock(&rq->lock) ... ... (acquired) put_task_struct() ... ... free_task_struct() ... ... ... ... raw_spin_unlock(&rq->lock) ... (acquired) ... ... ... ... ... (use after free) ...

So, let's implement a 100% guaranteed way to cancel the timer and be sure we are safe even in very unlikely situations. The rq unlocking does not limit the area of switched_from_dl() use, because this has already been possible in pull_dl_task() below. Let's consider the safety of this unlocking. The new code in the patch runs when hrtimer_try_to_cancel() fails. This means the callback is running. In this case hrtimer_cancel() just waits until the callback is finished. Two cases:

1) Since we are in switched_from_dl(), the new class is not dl_sched_class and the new prio is not less than MAX_DL_PRIO. So, the callback returns early; it's right after the !dl_task() check. After that hrtimer_cancel() returns back too. The above is: raw_spin_lock(rq->lock); ... dl_task_timer() ... raw_spin_lock(rq->lock); switched_from_dl() ... hrtimer_try_to_cancel() ... raw_spin_unlock(rq->lock); ... hrtimer_cancel() ... ... raw_spin_unlock(rq->lock); ... return HRTIMER_NORESTART; ... ... raw_spin_lock(rq->lock); ...

2) But the below is also possible: dl_task_timer() raw_spin_lock(rq->lock); ... raw_spin_unlock(rq->lock); raw_spin_lock(rq->lock); ... switched_from_dl() ... hrtimer_try_to_cancel() ... ... return HRTIMER_NORESTART; raw_spin_unlock(rq->lock); ... hrtimer_cancel(); ... raw_spin_lock(rq->lock); ... In this case hrtimer_cancel() returns immediately. A very unlikely case, just to mention.

Nobody can manipulate the task, because check_class_changed() is always called with pi_lock locked. Nobody can force the task to participate in (concurrent) priority inheritance schemes (the same reason). All concurrent task operations require pi_lock, which is held by us. No deadlocks with dl_task_timer() are possible, because it returns right after the !dl_task() check (it does nothing). If we receive a new dl_task during the time of unlocked rq, we just don't have to do pull_dl_task() in switched_from_dl() any further.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> [Added comments] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414420852.19914.186.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
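A hedged reconstruction of cancel_dl_timer() as described (simplified; comments paraphrase the commit message):

    static void cancel_dl_timer(struct rq *rq, struct task_struct *p)
    {
        struct hrtimer *dl_timer = &p->dl.dl_timer;

        /* Nobody can change the task's class: pi_lock is held. */
        lockdep_assert_held(&p->pi_lock);

        if (hrtimer_active(dl_timer) &&
            hrtimer_try_to_cancel(dl_timer) == -1) {
            /* The callback is running: drop rq->lock so it can finish,
             * then hrtimer_cancel() waits for it to complete. */
            raw_spin_unlock(&rq->lock);
            hrtimer_cancel(dl_timer);
            raw_spin_lock(&rq->lock);
        }
    }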
2014-11-04 | sched: Use WARN_ONCE for the might_sleep() TASK_RUNNING test | Peter Zijlstra
In some cases this can trigger a true flood of output. Requested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-04 | audit, sched/wait: Fixup kauditd_thread() wait loop | Peter Zijlstra
The kauditd_thread wait loop is a bit iffy; it has a number of problems:
- it calls try_to_freeze() before schedule(); you typically want the thread to re-evaluate the sleep condition when unfreezing, and freeze_task() also issues a wakeup.
- it unconditionally does the {add,remove}_wait_queue(), even when the sleep condition is false.
Use wait_event_freezable(), which does the right thing. Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Eric Paris <eparis@redhat.com> Cc: oleg@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20141002102251.GA6324@worktop.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
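The simplified loop, sketched (a reconstruction of kauditd_thread()'s shape after the patch):

    for (;;) {
        struct sk_buff *skb = skb_dequeue(&audit_skb_queue);

        if (skb) {
            kauditd_send_skb(skb);   /* drain pending records */
            continue;
        }
        /* Sleeps, handles freezing, and re-evaluates the condition on
         * every wakeup; no open-coded waitqueue/freezer dance. */
        wait_event_freezable(kauditd_wait,
                             skb_queue_len(&audit_skb_queue));
    }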
2014-11-04 | sched/wait: Fix a kthread race with wait_woken() | Peter Zijlstra
There is a race between kthread_stop() and the new wait_woken() that can result in a lack of progress:

CPU 0                                    CPU 1
rfcomm_run()                             kthread_stop()
  ...
  if (!test_bit(KTHREAD_SHOULD_STOP))
                                           set_bit(KTHREAD_SHOULD_STOP)
                                           wake_up_process()
  wait_woken()                             wait_for_completion()
    set_current_state(INTERRUPTIBLE)
    if (!WQ_FLAG_WOKEN)
      schedule_timeout()

After which both tasks will wait.. forever. Fix this by having wait_woken() check for kthread_should_stop() but only for kthreads (obviously). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
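The fix, sketched (the helper matches the commit's description; the exact code is a reconstruction):

    /* Only kthreads have a kthread_stop() to observe. */
    static inline bool is_kthread_should_stop(void)
    {
        return (current->flags & PF_KTHREAD) && kthread_should_stop();
    }

    /* ...and wait_woken() gates its sleep on it: */
    if (!(wait->flags & WQ_FLAG_WOKEN) && !is_kthread_should_stop())
        timeout = schedule_timeout(timeout);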
2014-11-04 | sched: Remove lockdep check in sched_move_task() | Kirill Tkhai
sched_move_task() is the only interface to change sched_task_group: cpu_cgrp_subsys methods and autogroup_move_group() use it. Everything is synchronized by task_rq_lock(), so cpu_cgroup_attach() is ordered with other users of sched_move_task(). This means we do not need RCU here: if we've dereferenced a tg here, the .attach method hasn't been called for it yet. Thus, we should pass "true" to task_css_check() to silence lockdep warnings. Fixes: eeb61e53ea19 ("sched: Fix race between task_group and sched_task_group") Reported-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1414473874.8574.2.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
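The resulting call in sched_move_task(), sketched:

    /* Holding task_rq_lock() already orders us against ->attach();
     * passing "true" tells task_css_check() the access is safe and
     * silences the lockdep/RCU warning. */
    tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
                      struct task_group, css);
    tg = autogroup_task_group(tsk, tg);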
2014-11-03 | rcutorture: Fix rcu_torture_cbflood() memory leak | Paul E. McKenney
Commit 38706bc5a29a (rcutorture: Add callback-flood test) vmalloc()ed a bunch of RCU callbacks, but failed to free them. This commit fixes that oversight. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
2014-11-03 | rcutorture: Add early boot self tests | Pranith Kumar
Add early boot self tests for RCU under CONFIG_PROVE_RCU. Currently the only test is adding a dummy callback that increments a counter, which we then verify after calling rcu_barrier*(). Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2014-11-03 | cpu: Avoid puts_pending overflow | Paul E. McKenney
A long string of get_online_cpus() with each followed by a put_online_cpu() that fails to acquire cpu_hotplug.lock can result in overflow of the cpu_hotplug.puts_pending counter. Although this is perhaps improbable, a system with absolutely no CPU-hotplug operations will have an arbitrarily long time in which this overflow could occur. This commit therefore adds overflow checks to get_online_cpus() and try_get_online_cpus(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>
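A hedged reconstruction of the overflow check (the helper name and threshold are assumptions):

    /* Fold deferred puts back into refcount while holding the lock,
     * long before puts_pending could ever overflow. */
    static void apply_puts_pending(int max)
    {
        if (atomic_read(&cpu_hotplug.puts_pending) >= max) {
            int delta = atomic_xchg(&cpu_hotplug.puts_pending, 0);
            cpu_hotplug.refcount -= delta;
        }
    }
    /* Called from get_online_cpus() and try_get_online_cpus() with
     * cpu_hotplug.lock held. */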
2014-11-03 | rcu: Remove "cpu" argument to rcu_cleanup_after_idle() | Paul E. McKenney
The "cpu" argument to rcu_cleanup_after_idle() is always the current CPU, so drop it. This moves the smp_processor_id() from the caller to rcu_cleanup_after_idle(), saving argument-passing overhead. Again, the anticipated cross-CPU uses of these functions has been replaced by NO_HZ_FULL. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com>