summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-08-07take fs_pin stuff to fs/*Al Viro
Add a new field to fs_pin - kill(pin). That's what umount and r/o remount will be calling for all pins attached to vfsmount and superblock resp. Called after bumping the refcount, so it won't go away under us. Dropping the refcount is responsibility of the instance. All generic stuff moved to fs/fs_pin.c; the next step will rip all the knowledge of kernel/acct.c from fs/super.c and fs/namespace.c. After that - death to mnt_pin(); it was intended to be usable as generic mechanism for code that wants to attach objects to vfsmount, so that they would not make the sucker busy and would get killed on umount. Never got it right; it remained acct.c-specific all along. Now it's very close to being killable. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07start carving bsd_acct_struct upAl Viro
pull generic parts into struct fs_pin. Eventually we want those to replace mnt_pin()/mnt_unpin() mess; that stuff will move to fs/*. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07acct: move mnt_pin() upwards.Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07make acct_kill() wait for file closing.Al Viro
Do actual closing of file via schedule_work(). And use __fput_sync() there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07drop ->s_umount around acct_auto_close()Al Viro
just repeat the frozen check after regaining it, and check that sb is still alive. If several threads hit acct_auto_close() at the same time, acct_auto_close() will survive that just fine. And we really don't want to play with writes and closing the file with ->s_umount held exclusive - it's a deadlock country. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07acct: get rid of acct_lock for acct->countAl Viro
* make acct->count atomic and acct freeing - rcu-delayed. * instead of grabbing acct_lock around the places where we take a reference, do that under rcu_read_lock() with atomic_long_inc_not_zero(). * have the new acct locked before making ns->bacct point to it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07acct: get rid of acct_listAl Viro
Put these suckers on per-vfsmount and per-superblock lists instead. Note: right now it's still acct_lock for everything, but that's going to change. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07acct: simplify check_free_space()Al Viro
a) file can't be NULL b) file can't be changed under us c) all writes are serialized by acct->lock; no need to mess with spinlock there. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07acct: new lifetime rulesAl Viro
Do not reuse bsd_acct_struct after closing the damn thing. Structure lifetime is controlled by refcount now. We also have a mutex in there, held over closing and writing (the file is O_APPEND, so we are not losing any concurrency). As the result, we do not need to bother with get_file()/fput() on log write anymore. Moreover, do_acct_process() only needs acct itself; file and pidns are picked from it. Killed instances are distinguished by having NULL ->ns. Refcount is protected by acct_lock; anybody taking the mutex needs to grab a reference first. The things will get a lot simpler in the next commits - this is just the minimal chunk switching to the new lifetime rules. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07acct: serialize acct_on()Al Viro
brute-force - on a global mutex that isn't nested into anything. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07acct() should honour the limits from the very beginningAl Viro
We need to check free space on the first write to freshly opened log. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07split the slow path in acct_process() offAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07separate namespace-independent parts of filling acct_tAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07acct: switch to __kernel_write()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07acct: encode_comp_t(0) is 0, fortunately...Al Viro
There was an amusing bogosity in ac_rw calculation - it tried to do encode_comp_t(encode_comp_t(0) / 1024). Seeing that comp_t is a 3-bit exponent + 13-bit mantissa... it's a good thing that 0 is represented by all-bits-clear. The history of that one is interesting - it was introduced in 2.1.68pre1, when acct.c had been reworked and moved to separate file. Two months later (2.1.86) somebody has noticed that the sucker won't compile - there was no task_struct::io_usage. At which point the ac_io calculation had changed from encode_comp_t(current->io_usage) to encode_comp_t(0) and the bug in the next line (absolutely real back then, had it ever managed to compile) become a harmless bogosity. Looks like nobody has ever noticed until now. Anyway, let's bury that idiocy now that it got noticed. 17 years is long enough... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-07Merge commit 'ccbf62d8a284cf181ac28c8e8407dd077d90dd4b' into for-nextAl Viro
backmerge to avoid kernel/acct.c conflict
2014-08-03Linux 3.16v3.16Linus Torvalds
2014-08-03Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes in the timer area: - a long-standing lock inversion due to a printk - suspend-related hrtimer corruption in sched_clock" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks sched_clock: Avoid corrupting hrtimer tree during suspend
2014-08-02Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-armLinus Torvalds
Pull ARM fixes from Russell King: "A few fixes for ARM. Some of these are correctness issues: - TLBs must be flushed after the old mappings are removed by the DMA mapping code, but before the new mappings are established. - An off-by-one entry error in the Keystone LPAE setup code. Fixes include: - ensuring that the identity mapping for LPAE does not remove the kernel image from the identity map. - preventing userspace from trapping into kgdb. - fixing a preemption issue in the Intel iwmmxt code. - fixing a build error with nommu. Other changes include: - Adding a note about which areas of memory are expected to be accessible while the identity mapping tables are in place" * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction ARM: idmap: add identity mapping usage note ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout ARM: fix alignment of keystone page table fixup ARM: 8112/1: only select ARM_PATCH_PHYS_VIRT if MMU is enabled ARM: 8100/1: Fix preemption disable in iwmmxt_task_enable() ARM: DMA: ensure that old section mappings are flushed from the TLB
2014-08-02ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instructionOmar Sandoval
The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn) should only be entered when a kgdb break instruction is executed from the kernel. Otherwise, if kgdb is enabled, a userspace program can cause the kernel to drop into the debugger by executing either KGDB_BREAKINST or KGDB_COMPILED_BREAK. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Omar Sandoval <osandov@osandov.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-08-02ARM: idmap: add identity mapping usage noteRussell King
Add a note about the usage of the identity mapping; we do not support accesses outside of the identity map region and kernel image while a CPU is using the identity map. This is because the identity mapping may overwrite vmalloc space, IO mappings, the vectors pages, etc. Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2014-08-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "This contains a couple of fixes - one is the aio fix from Christoph, the other a fallocate() one from Eric" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: fix check for fallocate on active swapfile direct-io: fix AIO regression
2014-08-01Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Peter Anvin: "A single fix to not invoke the espfix code on Xen PV, as it turns out to oops the guest when invoked after all. This patch leaves some amount of dead code, in particular unnecessary initialization of the espfix stacks when they won't be used, but in the interest of keeping the patch minimal that cleanup can wait for the next cycle" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64/entry/xen: Do not invoke espfix64 on Xen
2014-08-01Merge tag 'staging-3.16-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver bugfixes from Greg KH: "Here are some tiny staging driver bugfixes that I've had in my tree for the past week that resolve some reported issues. Nothing major at all, but it would be good to get them merged for 3.16-rc8 or -final" * tag 'staging-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: vt6655: Fix disassociated messages every 10 seconds staging: vt6655: Fix Warning on boot handle_irq_event_percpu. staging: rtl8723au: rtw_resume(): release semaphore before exit on error iio:bma180: Missing check for frequency fractional part iio:bma180: Fix scale factors to report correct acceleration units iio: buffer: Fix demux table creation
2014-08-01Merge tag 'dm-3.16-fixes-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: "Fix dm bufio shrinker to properly zero-fill all fields. Fix race in dm cache that caused improper reporting of the number of dirty blocks in the cache" * tag 'dm-3.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache: fix race affecting dirty block count dm bufio: fully initialize shrinker
2014-08-01Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM straggler SoC fix from Olof Johansson: "A DT bugfix for Nomadik that had an ambigouos double-inversion of a gpio line, and one MAINTAINER URL update that might as well go in now. We could hold off until the merge window, but then we'll just have to mark the DT fix for stable and it just seems like in total causing more work" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: MAINTAINERS: Update Tegra Git URL ARM: nomadik: fix up double inversion in DT
2014-08-01dm cache: fix race affecting dirty block countAnssi Hannula
nr_dirty is updated without locking, causing it to drift so that it is non-zero (either a small positive integer, or a very large one when an underflow occurs) even when there are no actual dirty blocks. This was due to a race between the workqueue and map function accessing nr_dirty in parallel without proper protection. People were seeing under runs due to a race on increment/decrement of nr_dirty, see: https://lkml.org/lkml/2014/6/3/648 Fix this by using an atomic_t for nr_dirty. Reported-by: roma1390@gmail.com Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
2014-08-01dm bufio: fully initialize shrinkerGreg Thelen
1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to struct shrinker assuming that all shrinkers were zero filled. The dm bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE. But there are proposed patches which add other bits to shrinker.flags (e.g. memcg awareness). Rather than simply initializing the shrinker, this patch uses kzalloc() when allocating the dm_bufio_client to ensure that the embedded shrinker and any other similar structures are zeroed. This fixes theoretical over aggressive shrinking of dm bufio objects. If the uninitialized dm_bufio_client.shrinker.flags contains SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for each numa node rather than just once. This has been broken since 3.12. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.12+
2014-08-01timer: Fix lock inversion between hrtimer_bases.lock and scheduler locksJan Kara
clockevents_increase_min_delta() calls printk() from under hrtimer_bases.lock. That causes lock inversion on scheduler locks because printk() can call into the scheduler. Lockdep puts it as: ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc8-06195-g939f04b #2 Not tainted ------------------------------------------------------- trinity-main/74 is trying to acquire lock: (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c but task is already holding lock: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (hrtimer_bases.lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197 [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85 [<81080792>] task_clock_event_start+0x3a/0x3f [<810807a4>] task_clock_event_add+0xd/0x14 [<8108259a>] event_sched_in+0xb6/0x17a [<810826a2>] group_sched_in+0x44/0x122 [<81082885>] ctx_sched_in.isra.67+0x105/0x11f [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b [<81082bf6>] __perf_install_in_context+0x8b/0xa3 [<8107eb8e>] remote_function+0x12/0x2a [<8105f5af>] smp_call_function_single+0x2d/0x53 [<8107e17d>] task_function_call+0x30/0x36 [<8107fb82>] perf_install_in_context+0x87/0xbb [<810852c9>] SYSC_perf_event_open+0x5c6/0x701 [<810856f9>] SyS_perf_event_open+0x17/0x19 [<8142f8ee>] syscall_call+0x7/0xb -> #4 (&ctx->lock){......}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 -> #3 (&rq->lock){-.-.-.}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81040873>] __task_rq_lock+0x33/0x3a [<8104184c>] wake_up_new_task+0x25/0xc2 [<8102474b>] do_fork+0x15c/0x2a0 [<810248a9>] kernel_thread+0x1a/0x1f [<814232a2>] rest_init+0x1a/0x10e [<817af949>] start_kernel+0x303/0x308 [<817af2ab>] i386_start_kernel+0x79/0x7d -> #2 (&p->pi_lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<810413dd>] try_to_wake_up+0x1d/0xd6 [<810414cd>] default_wake_function+0xb/0xd [<810461f3>] __wake_up_common+0x39/0x59 [<81046346>] __wake_up+0x29/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #1 (&tty->write_wait){-.....}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<81046332>] __wake_up+0x15/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #0 (&port_lock_key){-.....}: [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105c548>] clockevents_program_event+0xe7/0xf3 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 other info that might help us debug this: Chain exists of: &port_lock_key --> &ctx->lock --> hrtimer_bases.lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hrtimer_bases.lock); lock(&ctx->lock); lock(hrtimer_bases.lock); lock(&port_lock_key); *** DEADLOCK *** 4 locks held by trinity-main/74: #0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb #1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f #2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 #3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4 stack backtrace: CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003 Call Trace: [<81426f69>] dump_stack+0x16/0x18 [<81425a99>] print_circular_bug+0x18f/0x19c [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104af87>] ? lock_release+0x191/0x223 [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8104416d>] ? __dequeue_entity+0x23/0x27 [<81044505>] ? pick_next_task_fair+0xb1/0x120 [<8142cacc>] __schedule+0x4c6/0x4cb [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108 [<810475b0>] ? trace_hardirqs_off+0xb/0xd [<81056346>] ? rcu_irq_exit+0x64/0x77 Fix the problem by using printk_deferred() which does not call into the scheduler. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-01vfs: fix check for fallocate on active swapfileEric Biggers
Fix the broken check for calling sys_fallocate() on an active swapfile, introduced by commit 0790b31b69374ddadefe ("fs: disallow all fallocate operation on active swapfile"). Signed-off-by: Eric Biggers <ebiggers3@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-08-01direct-io: fix AIO regressionChristoph Hellwig
The direct-io.c rewrite to use the iov_iter infrastructure stopped updating the size field in struct dio_submit, and thus rendered the check for allowing asynchronous completions to always return false. Fix this by comparing it to the count of bytes in the iov_iter instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Tim Chen <tim.c.chen@linux.intel.com> Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
2014-07-31Merge tag 'pm+acpi-3.16-rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "One commit that fixes a problem causing PNP devices to be associated with wrong ACPI device objects sometimes during device enumeration due to an incorrect check in a matching function. That problem was uncovered by the ACPI device enumeration rework in 3.14" * tag 'pm+acpi-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / PNP: Fix acpi_pnp_match()
2014-07-31Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.linaro.org/people/mike.turquette/linux Pull clock driver fix from Mike Turquette: "A single patch to re-enable audio which is broken on all DRA7 SoC-based platforms. Missed this one from the last set of fixes" * tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux: clk: ti: clk-7xx: Correct ABE DPLL configuration
2014-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds
Pull crypto fix from Herbert Xu: "This adds missing SELinux labeling to AF_ALG sockets which apparently causes SELinux (or at least the SELinux people) to misbehave :)" * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: af_alg - properly label AF_ALG socket
2014-07-31Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI barrier fix from James Bottomley: "This is a potential data corruption fix: If we get an error sending down a barrier, we simply ignore it meaning the barrier semantics get violated without anyone being any the wiser. If the system crashes at this point, the filesystem potentially becomes corrupt. Fix is to report errors on failed barriers" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: handle flush errors properly
2014-07-31clk: ti: clk-7xx: Correct ABE DPLL configurationPeter Ujfalusi
ABE DPLL frequency need to be lowered from 361267200 to 180633600 to facilitate the ATL requironments. The dpll_abe_m2x2_ck clock need to be set to double of ABE DPLL rate in order to have correct clocks for audio. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Tero Kristo <t-kristo@ti.com> Signed-off-by: Mike Turquette <mturquette@linaro.org>
2014-07-31crypto: af_alg - properly label AF_ALG socketMilan Broz
Th AF_ALG socket was missing a security label (e.g. SELinux) which means that socket was in "unlabeled" state. This was recently demonstrated in the cryptsetup package (cryptsetup v1.6.5 and later.) See https://bugzilla.redhat.com/show_bug.cgi?id=1115120 This patch clones the sock's label from the parent sock and resolves the issue (similar to AF_BLUETOOTH protocol family). Cc: stable@vger.kernel.org Signed-off-by: Milan Broz <gmazyland@gmail.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-07-30kexec: fix build error when hugetlbfs is disabledDavid Rientjes
free_huge_page() is undefined without CONFIG_HUGETLBFS and there's no need to filter PageHuge() page is such a configuration either, so avoid exporting the symbol to fix a build error: In file included from kernel/kexec.c:14:0: kernel/kexec.c: In function 'crash_save_vmcoreinfo_init': kernel/kexec.c:1623:20: error: 'free_huge_page' undeclared (first use in this function) VMCOREINFO_SYMBOL(free_huge_page); ^ Introduced by commit 8f1d26d0e59b ("kexec: export free_huge_page to VMCOREINFO") Reported-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Olof Johansson <olof@lixom.net> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds
Merge fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: Josh has moved kexec: export free_huge_page to VMCOREINFO mm: fix filemap.c pagecache_get_page() kernel-doc warnings mm: debugfs: move rounddown_pow_of_two() out from do_fault path memcg: oom_notify use-after-free fix hwpoison: call action_result() in failure path of hwpoison_user_mappings() hwpoison: fix hugetlbfs/thp precheck in hwpoison_user_mappings() rapidio/tsi721_dma: fix failure to obtain transaction descriptor mm, thp: do not allow thp faults to avoid cpuset restrictions mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()
2014-07-30Josh has movedJosh Triplett
My IBM email addresses haven't worked for years; also map some old-but-functional forwarding addresses to my canonical address. Update my GPG key fingerprint; I moved to 4096R a long time ago. Update description. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30kexec: export free_huge_page to VMCOREINFOAtsushi Kumagai
PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56bfe1 ("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still need another symbol to filter *hugetlbfs* pages. If a user hope to filter user pages, makedumpfile tries to exclude them by checking the condition whether the page is anonymous, but hugetlbfs pages aren't anonymous while they also be user pages. We know it's possible to detect them in the same way as PageHuge(), so we need the start address of free_huge_page(): int PageHuge(struct page *page) { if (!PageCompound(page)) return 0; page = compound_head(page); return get_compound_page_dtor(page) == free_huge_page; } For that reason, this patch changes free_huge_page() into public to export it to VMCOREINFO. Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Acked-by: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30mm: fix filemap.c pagecache_get_page() kernel-doc warningsRandy Dunlap
Fix kernel-doc warnings in mm/filemap.c: pagecache_get_page(): Warning(..//mm/filemap.c:1054): No description found for parameter 'cache_gfp_mask' Warning(..//mm/filemap.c:1054): No description found for parameter 'radix_gfp_mask' Warning(..//mm/filemap.c:1054): Excess function parameter 'gfp_mask' description in 'pagecache_get_page' Fixes: 2457aec63745 ("mm: non-atomically mark page accessed during page cache allocation where possible") [mgorman@suse.de: change everything] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30mm: debugfs: move rounddown_pow_of_two() out from do_fault pathAndrey Ryabinin
do_fault_around() expects fault_around_bytes rounded down to nearest page order. Instead of calling rounddown_pow_of_two every time in fault_around_pages()/fault_around_mask() we could do round down when user changes fault_around_bytes via debugfs interface. This also fixes bug when user set fault_around_bytes to 0. Result of rounddown_pow_of_two(0) is not defined, therefore fault_around_bytes == 0 doesn't work without this patch. Let's set fault_around_bytes to PAGE_SIZE if user sets to something less than PAGE_SIZE [akpm@linux-foundation.org: tweak code layout] Fixes: a9b0f861("mm: nominate faultaround area in bytes rather than page order") Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: <stable@vger.kernel.org> [3.15.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30memcg: oom_notify use-after-free fixMichal Hocko
Paul Furtado has reported the following GPF: general protection fault: 0000 [#1] SMP Modules linked in: ipv6 dm_mod xen_netfront coretemp hwmon x86_pkg_temp_thermal crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw gf128mul glue_helper aes_x86_64 microcode pcspkr ext4 jbd2 mbcache raid0 xen_blkfront CPU: 3 PID: 3062 Comm: java Not tainted 3.16.0-rc5 #1 task: ffff8801cfe8f170 ti: ffff8801d2ec4000 task.ti: ffff8801d2ec4000 RIP: e030:mem_cgroup_oom_synchronize+0x140/0x240 RSP: e02b:ffff8801d2ec7d48 EFLAGS: 00010283 RAX: 0000000000000001 RBX: ffff88009d633800 RCX: 000000000000000e RDX: fffffffffffffffe RSI: ffff88009d630200 RDI: ffff88009d630200 RBP: ffff8801d2ec7da8 R08: 0000000000000012 R09: 00000000fffffffe R10: 0000000000000000 R11: 0000000000000000 R12: ffff88009d633800 R13: ffff8801d2ec7d48 R14: dead000000100100 R15: ffff88009d633a30 FS: 00007f1748bb4700(0000) GS:ffff8801def80000(0000) knlGS:0000000000000000 CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f4110300308 CR3: 00000000c05f7000 CR4: 0000000000002660 Call Trace: pagefault_out_of_memory+0x18/0x90 mm_fault_error+0xa9/0x1a0 __do_page_fault+0x478/0x4c0 do_page_fault+0x2c/0x40 page_fault+0x28/0x30 Code: 44 00 00 48 89 df e8 40 ca ff ff 48 85 c0 49 89 c4 74 35 4c 8b b0 30 02 00 00 4c 8d b8 30 02 00 00 4d 39 fe 74 1b 0f 1f 44 00 00 <49> 8b 7e 10 be 01 00 00 00 e8 42 d2 04 00 4d 8b 36 4d 39 fe 75 RIP mem_cgroup_oom_synchronize+0x140/0x240 Commit fb2a6fc56be6 ("mm: memcg: rework and document OOM waiting and wakeup") has moved mem_cgroup_oom_notify outside of memcg_oom_lock assuming it is protected by the hierarchical OOM-lock. Although this is true for the notification part the protection doesn't cover unregistration of event which can happen in parallel now so mem_cgroup_oom_notify can see already unlinked and/or freed mem_cgroup_eventfd_list. Fix this by using memcg_oom_lock also in mem_cgroup_oom_notify. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=80881 Fixes: fb2a6fc56be6 (mm: memcg: rework and document OOM waiting and wakeup) Signed-off-by: Michal Hocko <mhocko@suse.cz> Reported-by: Paul Furtado <paulfurtado91@gmail.com> Tested-by: Paul Furtado <paulfurtado91@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30hwpoison: call action_result() in failure path of hwpoison_user_mappings()Naoya Horiguchi
hwpoison_user_mappings() could fail for various reasons, so printk()s to print out the reasons should be done in each failure check inside hwpoison_user_mappings(). And currently we don't call action_result() when hwpoison_user_mappings() fails, which is not consistent with other exit points of memory error handler. So this patch fixes these messaging problems. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Chen Yucong <slaoub@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30hwpoison: fix hugetlbfs/thp precheck in hwpoison_user_mappings()Naoya Horiguchi
A recent fix from Chen Yucong, commit 0bc1f8b0682c ("hwpoison: fix the handling path of the victimized page frame that belong to non-LRU") rejects going into unmapping operation for hugetlbfs/thp pages, which results in failing error containing on such pages. This patch fixes it. With this patch, hwpoison functional tests in mce-test testsuite pass. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Chen Yucong <slaoub@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30rapidio/tsi721_dma: fix failure to obtain transaction descriptorAlexandre Bounine
This is a bug fix for the situation when function tsi721_desc_get() fails to obtain a free transaction descriptor. The bug usually results in a memory access crash dump when data transfer scatter-gather list has more entries than size of hardware buffer descriptors ring. This fix ensures that error is properly returned to a caller instead of an invalid entry. This patch is applicable to kernel versions starting from v3.5. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com> Cc: Stef van Os <stef.van.os@prodrive-technologies.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <stable@vger.kernel.org> [3.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30mm, thp: do not allow thp faults to avoid cpuset restrictionsDavid Rientjes
The page allocator relies on __GFP_WAIT to determine if ALLOC_CPUSET should be set in allocflags. ALLOC_CPUSET controls if a page allocation should be restricted only to the set of allowed cpuset mems. Transparent hugepages clears __GFP_WAIT when defrag is disabled to prevent the fault path from using memory compaction or direct reclaim. Thus, it is unfairly able to allocate outside of its cpuset mems restriction as a side-effect. This patch ensures that ALLOC_CPUSET is only cleared when the gfp mask is truly GFP_ATOMIC by verifying it is also not a thp allocation. Signed-off-by: David Rientjes <rientjes@google.com> Reported-by: Alex Thorlton <athorlton@sgi.com> Tested-by: Alex Thorlton <athorlton@sgi.com> Cc: Bob Liu <lliubbo@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hedi Berriche <hedi@sgi.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30mm/page-writeback.c: fix divide by zero in bdi_dirty_limits()Maxim Patlasov
Under memory pressure, it is possible for dirty_thresh, calculated by global_dirty_limits() in balance_dirty_pages(), to equal zero. Then, if strictlimit is true, bdi_dirty_limits() tries to resolve the proportion: bdi_bg_thresh : bdi_thresh = background_thresh : dirty_thresh by dividing by zero. Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30MAINTAINERS: Update Tegra Git URLAndreas Färber
swarren/linux-tegra.git is a stale location; it has moved to tegra/linux.git. While the git protocol re-directs to the new location, HTTP does not. Besides, MAINTAINERS should contain the canonical URL. Signed-off-by: Andreas Färber <afaerber@suse.de> [swarren, updated commit message] Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Olof Johansson <olof@lixom.net>