Age | Commit message (Collapse) | Author |
|
Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000
broke AddressSanitizer. This is a partial revert of:
eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
The AddressSanitizer tool has hard-coded expectations about where
executable mappings are loaded.
The motivation for changing the PIE base in the above commits was to
avoid the Stack-Clash CVEs that allowed executable mappings to get too
close to heap and stack. This was mainly a problem on 32-bit, but the
64-bit bases were moved too, in an effort to proactively protect those
systems (proofs of concept do exist that show 64-bit collisions, but
other recent changes to fix stack accounting and setuid behaviors will
minimize the impact).
The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC
base), so only the 64-bit PIE base needs to be reverted to let x86 and
arm64 ASan binaries run again. Future changes to the 64-bit PIE base on
these architectures can be made optional once a more dynamic method for
dealing with AddressSanitizer is found. (e.g. always loading PIE into
the mmap region for marked binaries.)
Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast
Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
Fixes: 02445990a96e ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Kostya Serebryany <kcc@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly") added
use of __GFP_HIGHMEM for allocations. vmalloc_32 may use
GFP_DMA/GFP_DMA32 which does not play nice with __GFP_HIGHMEM and will
trigger a BUG in gfp_zone.
Only add __GFP_HIGHMEM if we aren't using GFP_DMA/GFP_DMA32.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1482249
Link: http://lkml.kernel.org/r/20170816220705.31374-1-labbott@redhat.com
Fixes: 19809c2da28a ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I hit a use after free issue when executing trinity and repoduced it
with KASAN enabled. The related call trace is as follows.
BUG: KASan: use after free in SyS_get_mempolicy+0x3c8/0x960 at addr ffff8801f582d766
Read of size 2 by task syz-executor1/798
INFO: Allocated in mpol_new.part.2+0x74/0x160 age=3 cpu=1 pid=799
__slab_alloc+0x768/0x970
kmem_cache_alloc+0x2e7/0x450
mpol_new.part.2+0x74/0x160
mpol_new+0x66/0x80
SyS_mbind+0x267/0x9f0
system_call_fastpath+0x16/0x1b
INFO: Freed in __mpol_put+0x2b/0x40 age=4 cpu=1 pid=799
__slab_free+0x495/0x8e0
kmem_cache_free+0x2f3/0x4c0
__mpol_put+0x2b/0x40
SyS_mbind+0x383/0x9f0
system_call_fastpath+0x16/0x1b
INFO: Slab 0xffffea0009cb8dc0 objects=23 used=8 fp=0xffff8801f582de40 flags=0x200000000004080
INFO: Object 0xffff8801f582d760 @offset=5984 fp=0xffff8801f582d600
Bytes b4 ffff8801f582d750: ae 01 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
Object ffff8801f582d760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8801f582d770: 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkk.
Redzone ffff8801f582d778: bb bb bb bb bb bb bb bb ........
Padding ffff8801f582d8b8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Memory state around the buggy address:
ffff8801f582d600: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801f582d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8801f582d700: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fc
!shared memory policy is not protected against parallel removal by other
thread which is normally protected by the mmap_sem. do_get_mempolicy,
however, drops the lock midway while we can still access it later.
Early premature up_read is a historical artifact from times when
put_user was called in this path see https://lwn.net/Articles/124754/
but that is gone since 8bccd85ffbaf ("[PATCH] Implement sys_* do_*
layering in the memory policy layer."). but when we have the the
current mempolicy ref count model. The issue was introduced
accordingly.
Fix the issue by removing the premature release.
Link: http://lkml.kernel.org/r/1502950924-27521-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org> [2.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
name[] in cma_debugfs_add_one() can only accommodate 16 chars including
NULL to store sprintf output. It's common for cma device name to be
larger than 15 chars. This can cause stack corrpution. If the gcc
stack protector is turned on, this can cause a panic due to stack
corruption.
Below is one example trace:
Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in:
ffffff8e69a75730
Call trace:
dump_backtrace+0x0/0x2c4
show_stack+0x20/0x28
dump_stack+0xb8/0xf4
panic+0x154/0x2b0
print_tainted+0x0/0xc0
cma_debugfs_init+0x274/0x290
do_one_initcall+0x5c/0x168
kernel_init_freeable+0x1c8/0x280
Fix the short sprintf buffer in cma_debugfs_add_one() by using
scnprintf() instead of sprintf().
Link: http://lkml.kernel.org/r/1502446217-21840-1-git-send-email-guptap@codeaurora.org
Fixes: f318dd083c81 ("cma: Store a name in the cma structure")
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
Acked-by: Laura Abbott <labbott@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When forcing a signal, SIGNAL_UNKILLABLE is removed to prevent recursive
faults, but this is undesirable when tracing. For example, debugging an
init process (whether global or namespace), hitting a breakpoint and
SIGTRAP will force SIGTRAP and then remove SIGNAL_UNKILLABLE.
Everything continues fine, but then once debugging has finished, the
init process is left killable which is unlikely what the user expects,
resulting in either an accidentally killed init or an init that stops
reaping zombies.
Link: http://lkml.kernel.org/r/20170815112806.10728-1-jamie.iles@oracle.com
Signed-off-by: Jamie Iles <jamie.iles@oracle.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Wenwei Tao has noticed that our current assumption that the oom victim
is dying and never doing any visible changes after it dies, and so the
oom_reaper can tear it down, is not entirely true.
__task_will_free_mem consider a task dying when SIGNAL_GROUP_EXIT is set
but do_group_exit sends SIGKILL to all threads _after_ the flag is set.
So there is a race window when some threads won't have
fatal_signal_pending while the oom_reaper could start unmapping the
address space. Moreover some paths might not check for fatal signals
before each PF/g-u-p/copy_from_user.
We already have a protection for oom_reaper vs. PF races by checking
MMF_UNSTABLE. This has been, however, checked only for kernel threads
(use_mm users) which can outlive the oom victim. A simple fix would be
to extend the current check in handle_mm_fault for all tasks but that
wouldn't be sufficient because the current check assumes that a kernel
thread would bail out after EFAULT from get_user*/copy_from_user and
never re-read the same address which would succeed because the PF path
has established page tables already. This seems to be the case for the
only existing use_mm user currently (virtio driver) but it is rather
fragile in general.
This is even more fragile in general for more complex paths such as
generic_perform_write which can re-read the same address more times
(e.g. iov_iter_copy_from_user_atomic to fail and then
iov_iter_fault_in_readable on retry).
Therefore we have to implement MMF_UNSTABLE protection in a robust way
and never make a potentially corrupted content visible. That requires
to hook deeper into the PF path and check for the flag _every time_
before a pte for anonymous memory is established (that means all
!VM_SHARED mappings).
The corruption can be triggered artificially
(http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp)
but there doesn't seem to be any real life bug report. The race window
should be quite tight to trigger most of the time.
Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org
Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Tetsuo Handa has noticed that MMF_UNSTABLE SIGBUS path in
handle_mm_fault causes a lockdep splat
Out of memory: Kill process 1056 (a.out) score 603 or sacrifice child
Killed process 1056 (a.out) total-vm:4268108kB, anon-rss:2246048kB, file-rss:0kB, shmem-rss:0kB
a.out (1169) used greatest stack depth: 11664 bytes left
DEBUG_LOCKS_WARN_ON(depth <= 0)
------------[ cut here ]------------
WARNING: CPU: 6 PID: 1339 at kernel/locking/lockdep.c:3617 lock_release+0x172/0x1e0
CPU: 6 PID: 1339 Comm: a.out Not tainted 4.13.0-rc3-next-20170803+ #142
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
RIP: 0010:lock_release+0x172/0x1e0
Call Trace:
up_read+0x1a/0x40
__do_page_fault+0x28e/0x4c0
do_page_fault+0x30/0x80
page_fault+0x28/0x30
The reason is that the page fault path might have dropped the mmap_sem
and returned with VM_FAULT_RETRY. MMF_UNSTABLE check however rewrites
the error path to VM_FAULT_SIGBUS and we always expect mmap_sem taken in
that path. Fix this by taking mmap_sem when VM_FAULT_RETRY is held in
the MMF_UNSTABLE path.
We cannot simply add VM_FAULT_SIGBUS to the existing error code because
all arch specific page fault handlers and g-u-p would have to learn a
new error code combination.
Link: http://lkml.kernel.org/r/20170807113839.16695-2-mhocko@kernel.org
Fixes: 3f70dc38cec2 ("mm: make sure that kthreads will not refault oom reaped memory")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Wenwei Tao <wenwei.tww@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
To avoid a possible deadlock, sysfs_slab_remove() schedules an
asynchronous work to delete sysfs entries corresponding to the kmem
cache. To ensure the cache isn't freed before the work function is
called, it takes a reference to the cache kobject. The reference is
supposed to be released by the work function.
However, the work function (sysfs_slab_remove_workfn()) does nothing in
case the cache sysfs entry has already been deleted, leaking the kobject
and the corresponding cache.
This may happen on a per memcg cache destruction, because sysfs entries
of a per memcg cache are deleted on memcg offline if the cache is empty
(see __kmemcg_cache_deactivate()).
The kmemleak report looks like this:
unreferenced object 0xffff9f798a79f540 (size 32):
comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.554s)
hex dump (first 32 bytes):
6b 6d 61 6c 6c 6f 63 2d 31 36 28 31 35 39 39 3a kmalloc-16(1599:
6e 65 77 72 6f 6f 74 29 00 23 6b c0 ff ff ff ff newroot).#k.....
backtrace:
kmemleak_alloc+0x4a/0xa0
__kmalloc_track_caller+0x148/0x2c0
kvasprintf+0x66/0xd0
kasprintf+0x49/0x70
memcg_create_kmem_cache+0xe6/0x160
memcg_kmem_cache_create_func+0x20/0x110
process_one_work+0x205/0x5d0
worker_thread+0x4e/0x3a0
kthread+0x109/0x140
ret_from_fork+0x2a/0x40
unreferenced object 0xffff9f79b6136840 (size 416):
comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.573s)
hex dump (first 32 bytes):
40 fb 80 c2 3e 33 00 00 00 00 00 40 00 00 00 00 @...>3.....@....
00 00 00 00 00 00 00 00 10 00 00 00 10 00 00 00 ................
backtrace:
kmemleak_alloc+0x4a/0xa0
kmem_cache_alloc+0x128/0x280
create_cache+0x3b/0x1e0
memcg_create_kmem_cache+0x118/0x160
memcg_kmem_cache_create_func+0x20/0x110
process_one_work+0x205/0x5d0
worker_thread+0x4e/0x3a0
kthread+0x109/0x140
ret_from_fork+0x2a/0x40
Fix the leak by adding the missing call to kobject_put() to
sysfs_slab_remove_workfn().
Link: http://lkml.kernel.org/r/20170812181134.25027-1-vdavydov.dev@gmail.com
Fixes: 3b7b314053d02 ("slub: make sysfs file removal asynchronous")
Signed-off-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Reported-by: Andrei Vagin <avagin@gmail.com>
Tested-by: Andrei Vagin <avagin@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org> [4.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is existing use after free bug when deferred struct pages are
enabled:
The memblock_add() allocates memory for the memory array if more than
128 entries are needed. See comment in e820__memblock_setup():
* The bootstrap memblock region count maximum is 128 entries
* (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
* than that - so allow memblock resizing.
This memblock memory is freed here:
free_low_memory_core_early()
We access the freed memblock.memory later in boot when deferred pages
are initialized in this path:
deferred_init_memmap()
for_each_mem_pfn_range()
__next_mem_pfn_range()
type = &memblock.memory;
One possible explanation for why this use-after-free hasn't been hit
before is that the limit of INIT_MEMBLOCK_REGIONS has never been
exceeded at least on systems where deferred struct pages were enabled.
Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128,
and verifying in qemu that this code is getting excuted and that the
freed pages are sane.
Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The descriptions were reversed, correct this.
Link: http://lkml.kernel.org/r/20170809234635.13443-4-mcgrof@kernel.org
Fixes: 64b671204afd71 ("test_sysctl: add generic script to expand on tests")
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reported-by: Daniel Mentz <danielmentz@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matt Redfearn <matt.redfearn@imgetc.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Recursive loops with module loading were previously handled in kmod by
restricting the number of modprobe calls to 50 and if that limit was
breached request_module() would return an error and a user would see the
following on their kernel dmesg:
request_module: runaway loop modprobe binfmt-464c
Starting init:/sbin/init exists but couldn't execute it (error -8)
This issue could happen for instance when a 64-bit kernel boots a 32-bit
userspace on some architectures and has no 32-bit binary format
hanlders. This is visible, for instance, when a CONFIG_MODULES enabled
64-bit MIPS kernel boots a into o32 root filesystem and the binfmt
handler for o32 binaries is not built-in.
After commit 6d7964a722af ("kmod: throttle kmod thread limit") we now
don't have any visible signs of an error and the kernel just waits for
the loop to end somehow.
Although this *particular* recursive loop could also be addressed by
doing a sanity check on search_binary_handler() and disallowing a
modular binfmt to be required for modprobe, a generic solution for any
recursive kernel kmod issues is still needed.
This should catch these loops. We can investigate each loop and address
each one separately as they come in, this however puts a stop gap for
them as before.
Link: http://lkml.kernel.org/r/20170809234635.13443-3-mcgrof@kernel.org
Fixes: 6d7964a722af ("kmod: throttle kmod thread limit")
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reported-by: Matt Redfearn <matt.redfearn@imgtec.com>
Tested-by: Matt Redfearn <matt.redfearn@imgetc.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
These are the few pending fixes I have queued up for v4.13-final. One
is a a generic regression fix for recursive loops on kmod and the other
one is a trivial print out correction.
During the v4.13 development we assumed that recursive kmod loops were
no longer possible. Clearly that is not true. The regression fix makes
use of a new killable wait. We use a killable wait to be paranoid in
how signals might be sent to modprobe and only accept a proper SIGKILL.
The signal will only be available to userspace to issue *iff* a thread
has already entered a wait state, and that happens only if we've already
throttled after 50 kmod threads have been hit.
Note that although it may seem excessive to trigger a failure afer 5
seconds if all kmod thread remain busy, prior to the series of changes
that went into v4.13 we would actually *always* fatally fail any request
which came in if the limit was already reached. The new waiting
implemented in v4.13 actually gives us *more* breathing room -- the wait
for 5 seconds is a wait for *any* kmod thread to finish. We give up and
fail *iff* no kmod thread has finished and they're *all* running
straight for 5 consecutive seconds. If 50 kmod threads are running
consecutively for 5 seconds something else must be really bad.
Recursive loops with kmod are bad but they're also hard to implement
properly as a selftest without currently fooling current userspace tools
like kmod [1]. For instance kmod will complain when you run depmod if
it finds a recursive loop with symbol dependency between modules as such
this type of recursive loop cannot go upstream as the modules_install
target will fail after running depmod.
These tests already exist on userspace kmod upstream though (refer to
the testsuite/module-playground/mod-loop-*.c files). The same is not
true if request_module() is used though, or worst if aliases are used.
Likewise the issue with 64-bit kernels booting 32-bit userspace without
a binfmt handler built-in is also currently not detected and proactively
avoided by userspace kmod tools, or kconfig for all architectures.
Although we could complain in the kernel when some of these individual
recursive issues creep up, proactively avoiding these situations in
userspace at build time is what we should keep striving for.
Lastly, since recursive loops could happen with kmod it may mean
recursive loops may also be possible with other kernel usermode helpers,
this should be investigated and long term if we can come up with a more
sensible generic solution even better!
[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20170809-kmod-for-v4.13-final
[1] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git
This patch (of 3):
This wait is similar to wait_event_interruptible_timeout() but only
accepts SIGKILL interrupt signal. Other signals are ignored.
Link: http://lkml.kernel.org/r/20170809234635.13443-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Matt Redfearn <matt.redfearn@imgetc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 05a4a9527931 ("kernel/watchdog: split up config options") lost
the perf-based hardlockup detector's dependency on PERF_EVENTS, which
can result in broken builds with some powerpc configurations.
Restore the dependency. Add it in for x86 too, despite x86 always
selecting PERF_EVENTS it seems reasonable to make the dependency
explicit.
Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
Fixes: 05a4a9527931 ("kernel/watchdog: split up config options")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Jaegeuk and Brad report a NULL pointer crash when writeback ending tries
to update the memcg stats:
BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
IP: test_clear_page_writeback+0x12e/0x2c0
[...]
RIP: 0010:test_clear_page_writeback+0x12e/0x2c0
Call Trace:
<IRQ>
end_page_writeback+0x47/0x70
f2fs_write_end_io+0x76/0x180 [f2fs]
bio_endio+0x9f/0x120
blk_update_request+0xa8/0x2f0
scsi_end_request+0x39/0x1d0
scsi_io_completion+0x211/0x690
scsi_finish_command+0xd9/0x120
scsi_softirq_done+0x127/0x150
__blk_mq_complete_request_remote+0x13/0x20
flush_smp_call_function_queue+0x56/0x110
generic_smp_call_function_single_interrupt+0x13/0x30
smp_call_function_single_interrupt+0x27/0x40
call_function_single_interrupt+0x89/0x90
RIP: 0010:native_safe_halt+0x6/0x10
(gdb) l *(test_clear_page_writeback+0x12e)
0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619).
614 mod_node_page_state(page_pgdat(page), idx, val);
615 if (mem_cgroup_disabled() || !page->mem_cgroup)
616 return;
617 mod_memcg_state(page->mem_cgroup, idx, val);
618 pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
619 this_cpu_add(pn->lruvec_stat->count[idx], val);
620 }
621
622 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
623 gfp_t gfp_mask,
The issue is that writeback doesn't hold a page reference and the page
might get freed after PG_writeback is cleared (and the mapping is
unlocked) in test_clear_page_writeback(). The stat functions looking up
the page's node or zone are safe, as those attributes are static across
allocation and free cycles. But page->mem_cgroup is not, and it will
get cleared if we race with truncation or migration.
It appears this race window has been around for a while, but less likely
to trigger when the memcg stats were updated first thing after
PG_writeback is cleared. Recent changes reshuffled this code to update
the global node stats before the memcg ones, though, stretching the race
window out to an extent where people can reproduce the problem.
Update test_clear_page_writeback() to look up and pin page->mem_cgroup
before clearing PG_writeback, then not use that pointer afterward. It
is a partial revert of 62cccb8c8e7a ("mm: simplify lock_page_memcg()")
but leaves the pageref-holding callsites that aren't affected alone.
Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org
Fixes: 62cccb8c8e7a ("mm: simplify lock_page_memcg()")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reported-by: Bradley Bolen <bradleybolen@gmail.com>
Tested-by: Brad Bolen <bradleybolen@gmail.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A bug in the VSX register saving that could cause userspace FP/VMX
register corruption.
Never seen to happen (that we know of), was found by code inspection,
but still tagged for stable given the consequences"
* tag 'powerpc-4.13-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"A small number of bugfixes, nothing serious this time. Here is a full
list.
4.13 regression fix:
- imx7d-sdb pinctrl support regressed in 4.13 due to an incomplete
patch
DT fixes for recently added devices:
- badly copied DT entries on imx6qdl-nitrogen6_som broke PCI reset
- sama5d2 memory controller had the wrong ID and registers
- imx7 power domains did not work correctly with deferred probing
(driver added in 4.12)
- Allwinner H5 pinctrl (added in 4.12) did not work right with GPIO
interrupts
Fixes for older bugs that just got noticed:
- i.MX25 ADC support (added in 4.6) apparently never worked right due
to a missing 'ranges' property in DT.
- Renesas Salvador Audio support (added in v4.5) was broken for
device repeated bind/unbind due to a naming conflict.
- Various allwinner boards are missing an 'ethernet' alias in DT,
leading to unstable device naming.
Preventive bugfix:
- TI Keystone needs a fix to prevent a NULL pointer dereference with
an upcoming PM change"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
soc: ti: ti_sci_pm_domains: Populate name for genpd
ARM: dts: imx6qdl-nitrogen6_som2: fix PCIe reset
arm64: allwinner: h5: fix pinctrl IRQs
arm64: allwinner: a64: sopine: add missing ethernet0 alias
arm64: allwinner: a64: pine64: add missing ethernet0 alias
arm64: allwinner: a64: bananapi-m64: add missing ethernet0 alias
arm64: renesas: salvator-common: avoid audio_clkout naming conflict
ARM: dts: i.MX25: add ranges to tscadc
soc: imx: gpcv2: fix regulator deferred probe
ARM: dts: at91: sama5d2: fix EBI/NAND controllers declaration
ARM: dts: at91: sama5d2: use sama5d2 compatible string for SMC
ARM: dts: imx7d-sdb: Put pinctrl_spi4 in the correct location
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes, mostly for regression fixes (sequencer
kconfig and emu10k1 probe) and device-specific quirks (three for USB
and one for HD-audio).
One significant change is a fix for races in ALSA sequencer core,
which covers over the previous incomplete fix"
* tag 'sound-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: emu10k1: Fix forgotten user-copy conversion in init code
ALSA: usb-audio: add DSD support for new Amanero PID
ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices
ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset
ALSA: seq: 2nd attempt at fixing race creating a queue
ALSA: hda/realtek - Fix pincfg for Dell XPS 13 9370
ALSA: seq: Fix CONFIG_SND_SEQ_MIDI dependency
|
|
Pull dma-mapping fix from Christoph Hellwig:
"Another dma-mapping regression fix"
* tag 'dma-mapping-4.13-3' of git://git.infradead.org/users/hch/dma-mapping:
of: fix DMA mask generation
|
|
Commit b6a1d093f96b ("PM / Domains: Extend generic power domain
debugfs") now creates a debugfs directory for each genpd based on the
name of the genpd. Currently no name is given to the genpd created by
ti_sci_pm_domains driver so because of this we see a NULL pointer
dereferences when it is accessed on boot when the debugfs entry creation
is attempted.
Give the genpd a name before registering it to avoid this.
Fixes: 52835d59fc6c ("soc: ti: Add ti_sci_pm_domains driver")
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes
Pull "i.MX fixes for 4.13, round 3" from Shawn Guo:
- Fix PCIe reset GPIO of imx6qdl-nitrogen6_som2 board, which was
a bad copy from nitrogen6_max device tree.
* tag 'imx-fixes-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6qdl-nitrogen6_som2: fix PCIe reset
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes
Pull "Allwinner fixes for 4.13, round 2" from Chen-Yu Tsai:
Three fixes adding a missing alias for the Ethernet controller on A64
boards. One adding a missing interrupt for the pin controller.
* tag 'sunxi-fixes-for-4.13-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: allwinner: h5: fix pinctrl IRQs
arm64: allwinner: a64: sopine: add missing ethernet0 alias
arm64: allwinner: a64: pine64: add missing ethernet0 alias
arm64: allwinner: a64: bananapi-m64: add missing ethernet0 alias
|
|
The commit d42fe63d5839 ("ALSA: emu10k1: Get rid of set_fs() usage")
converted the user-space copy hack with set_fs() to the direct
memcpy(), but one place was forgotten. This resulted in the error
from snd_emu10k1_init_efx(), eventually failed to load the driver.
Fix the missing piece.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196687
Fixes: d42fe63d5839 ("ALSA: emu10k1: Get rid of set_fs() usage")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Add DSD support for new Amanero Combo384 firmware version with a new
PID. This firmware uses DSD_U32_BE.
Fixes: 3eff682d765b ("ALSA: usb-audio: Support both DSD LE/BE Amanero firmware versions")
Signed-off-by: Jussi Laako <jussi@sonarnerd.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Previous value was a bad copy of nitrogen6_max device tree.
Signed-off-by: Gary Bisson <gary.bisson@boundarydevices.com>
Fixes: 3faa1bb2e89c ("ARM: dts: imx: add Boundary Devices Nitrogen6_SOM2 support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
|
|
git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Seems to be slowing down nicely, just one amdgpu fix, and a bunch of
i915 fixes"
* tag 'drm-fixes-for-v4.13-rc6' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: save list length when fence is signaled
drm/i915: Avoid the gpu reset vs. modeset deadlock
drm/i915: Suppress switch_mm emission between the same aliasing_ppgtt
drm/i915: Return correct EDP voltage swing table for 0.85V
drm/i915/cnl: Add slice and subslice information to debugfs.
drm/i915: Perform an invalidate prior to executing golden renderstate
drm/i915: remove unused function declaration
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two issues related to exposing the current CPU frequency to
user space on x86.
Specifics:
- Disable interrupts around reading IA32_APERF and IA32_MPERF in
aperfmperf_snapshot_khz() (introduced recently) to avoid excessive
delays between the reads that may result from interrupt handling
(Doug Smythies).
- Fix the computation of the CPU frequency to be reported through the
pstate_sample tracepoint in intel_pstate (Doug Smythies)"
* tag 'pm-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: x86: Disable interrupts during MSRs reading
cpufreq: intel_pstate: report correct CPU frequencies during trace
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c - Add antoher Lenovo ACPI ID for upcoming Lenovo NB
Input: elan_i2c - add ELAN0608 to the ACPI table
Input: trackpoint - assume 3 buttons when buttons detection fails
|
|
into drm-fixes
single amdgpu fix.
* 'drm-fixes-4.13' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: save list length when fence is signaled
|
|
git://anongit.freedesktop.org/git/drm-intel into drm-fixes
drm/i915 fixes for v4.13-rc6
"Chris' "drm/i915: Perform an invalidate prior to executing golden renderstate" and Daniel's
"drm/i915: Avoid the gpu reset vs. modeset deadlock" seem like the most important ones.
* tag 'drm-intel-fixes-2017-08-16' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Avoid the gpu reset vs. modeset deadlock
drm/i915: Suppress switch_mm emission between the same aliasing_ppgtt
drm/i915: Return correct EDP voltage swing table for 0.85V
drm/i915/cnl: Add slice and subslice information to debugfs.
drm/i915: Perform an invalidate prior to executing golden renderstate
drm/i915: remove unused function declaration
|
|
* intel_pstate-fix:
cpufreq: intel_pstate: report correct CPU frequencies during trace
* cpufreq-x86-fix:
cpufreq: x86: Disable interrupts during MSRs reading
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
- Fix PCI memory bar assignments with 64-bit kernels on machines with
Dino/Cujo PCI chipsets. This makes PCI graphic cards work on such
machines (from Thomas Bogendoerfer).
- Fix documentation to be more clear about the difference between %pF
and %pS printk format usage. There are still many places in the
kernel which have it wrong (from Petr Mladek, Sergey Senozhatsky &
me).
* 'parisc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
printk-formats.txt: Better describe the difference between %pS and %pF
parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota fix from Jan Kara:
"A fix of a check for quota limit"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: correct space limit check
|
|
Christian Brauner reported that if you use the TIOCGPTPEER ioctl() to
get a slave pty file descriptor, the resulting file descriptor doesn't
look right in /proc/<pid>/fd/<fd>. In particular, he wanted to use
readlink() on /proc/self/fd/<fd> to get the pathname of the slave pty
(basically implementing "ptsname{_r}()").
The reason for that was that we had generated the wrong 'struct path'
when we create the pty in ptmx_open().
In particular, the dentry was correct, but the vfsmount pointed to the
mount of the ptmx node. That _can_ be correct - in case you use
"/dev/pts/ptmx" to open the master - but usually is not. The normal
case is to use /dev/ptmx, which then looks up the pts/ directory, and
then the vfsmount of the ptmx node is obviously the /dev directory, not
the /dev/pts/ directory.
We actually did have the right vfsmount available, but in the wrong
place (it gets looked up in 'devpts_acquire()' when we get a reference
to the pts filesystem), and so ptmx_open() used the wrong mnt pointer.
The end result of this confusion was that the pty worked fine, but when
if you did TIOCGPTPEER to get the slave side of the pty, end end result
would also work, but have that dodgy 'struct path'.
And then when doing "d_path()" on to get the pathname, the vfsmount
would not match the root of the pts directory, and d_path() would return
an empty pathname thinking that the entry had escaped a bind mount into
another mount.
This fixes the problem by making devpts_acquire() return the vfsmount
for the pts filesystem, allowing ptmx_open() to trivially just use the
right mount for the pts dentry, and create the proper 'struct path'.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
C-Media devices (at least some models) mute the playback stream when
volumes are set to the minimum value. But this isn't informed via TLV
and the user-space, typically PulseAudio, gets confused as if it's
still played in a low volume.
This patch adds the new flag, min_mute, to struct usb_mixer_elem_info
for indicating that the mixer element is with the minimum-mute volume.
This flag is set for known C-Media devices in
snd_usb_mixer_fu_apply_quirk() in turn.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=196669
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes
Pull "Fourth Round of Renesas ARM Based SoC Fixes for v4.13" from Simon Horman:
* Avoid audio_clkout naming conflict for salvator boards using
Renesas R-Car Gen 3 SoCs
Morimoto-san says "The clock name of "audio_clkout" is used by the
Renesas sound driver. This duplicated naming breaks its clock
registering/unregistering. Especially when unbind/bind it can't handle
clkout correctly. This patch renames "audio_clkout" to "audio-clkout" to
avoid the naming conflict."
* tag 'renesas-fixes4-for-v4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
arm64: renesas: salvator-common: avoid audio_clkout naming conflict
|
|
Historically, DMA masks have suffered some ambiguity between whether
they represent the range of physical memory a device can access, or the
address bits a device is capable of driving, particularly since on many
platforms the two are equivalent. Whilst there are some stragglers left
(dma_max_pfn(), I'm looking at you...), the majority of DMA code has
been cleaned up to follow the latter definition, not least since it is
the only one which makes sense once IOMMUs are involved.
In this respect, of_dma_configure() has always done the wrong thing in
how it generates initial masks based on "dma-ranges". Although rounding
down did not affect the TI Keystone platform where dma_addr + size is
already a power of two, in any other case it results in a mask which is
at best unnecessarily constrained and at worst unusable.
BCM2837 illustrates the problem nicely, where we have a DMA base of 3GB
and a size of 1GB - 16MB, giving dma_addr + size = 0xff000000 and a
resultant mask of 0x7fffffff, which is then insufficient to even cover
the necessary offset, effectively making all DMA addresses out-of-range.
This has been hidden until now (mostly because we don't yet prevent
drivers from simply overwriting this initial mask later upon probe), but
due to recent changes elsewhere now shows up as USB being broken on
Raspberry Pi 3.
Make it right by rounding up instead of down, such that the mask
correctly correctly describes all possisble bits the device needs to
emit.
Fixes: 9a6d7298b083 ("of: Calculate device DMA masks based on DT dma-range size")
Reported-by: Stefan Wahren <stefan.wahren@i2se.com>
Reported-by: Andreas Färber <afaerber@suse.de>
Reported-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A couple of minor fixes (st, ses) and some bigger driver fixes for
qla2xxx (crash triggered by fw dump) and ipr (lockdep problems with
mq)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ses: Fix wrong page error
scsi: ipr: Fix scsi-mq lockdep issue
scsi: st: fix blk_get_queue usage
scsi: qla2xxx: Fix system crash while triggering FW dump
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fixes from Paul Moore:
"Two small fixes to the audit code, both explained well in the
respective patch descriptions, but the quick summary is one
use-after-free fix, and one silly fanotify notification flag fix"
* tag 'audit-pr-20170816' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: Receive unmount event
audit: Fix use after free in audit_remove_watch_rule()
|
|
Sometimes people seems unclear when to use the %pS or %pF printk format.
For example, see commit 51d96dc2e2dc ("random: fix warning message on ia64
and parisc") which fixed such a wrong format string.
The documentation should be more clear about the difference.
Signed-off-by: Helge Deller <deller@gmx.de>
[pmladek@suse.com: Restructure the entire section]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
VSX uses a combination of the old vector registers, the old FP
registers and new "second halves" of the FP registers.
Thus when we need to see the VSX state in the thread struct
(flush_vsx_to_thread()) or when we'll use the VSX in the kernel
(enable_kernel_vsx()) we need to ensure they are all flushed into
the thread struct if either of them is individually enabled.
Unfortunately we only tested if the whole VSX was enabled, not if they
were individually enabled.
Fixes: 72cd7b44bc99 ("powerpc: Uncomment and make enable_kernel_vsx() routine available")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
|
|
For 64bit kernels the lmmio_space_offset of the host bridge window
isn't set correctly on systems with dino/cujo PCI host bridges.
This leads to not assigned memory bars and failing drivers, which
need to use these bars.
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: <stable@vger.kernel.org>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Pull networking fixes from David Miller:
1) Fix TCP checksum offload handling in iwlwifi driver, from Emmanuel
Grumbach.
2) In ksz DSA tagging code, free SKB if skb_put_padto() fails. From
Vivien Didelot.
3) Fix two regressions with bonding on wireless, from Andreas Born.
4) Fix build when busypoll is disabled, from Daniel Borkmann.
5) Fix copy_linear_skb() wrt. SO_PEEK_OFF, from Eric Dumazet.
6) Set SKB cached route properly in inet_rtm_getroute(), from Florian
Westphal.
7) Fix PCI-E relaxed ordering handling in cxgb4 driver, from Ding
Tianhong.
8) Fix module refcnt leak in ULP code, from Sabrina Dubroca.
9) Fix use of GFP_KERNEL in atomic contexts in AF_KEY code, from Eric
Dumazet.
10) Need to purge socket write queue in dccp_destroy_sock(), also from
Eric Dumazet.
11) Make bpf_trace_printk() work properly on 32-bit architectures, from
Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
bpf: fix bpf_trace_printk on 32 bit archs
PCI: fix oops when try to find Root Port for a PCI device
sfc: don't try and read ef10 data on non-ef10 NIC
net_sched: remove warning from qdisc_hash_add
net_sched/sfq: update hierarchical backlog when drop packet
net_sched: reset pointers to tcf blocks in classful qdiscs' destructors
ipv4: fix NULL dereference in free_fib_info_rcu()
net: Fix a typo in comment about sock flags.
ipv6: fix NULL dereference in ip6_route_dev_notify()
tcp: fix possible deadlock in TCP stack vs BPF filter
dccp: purge write queue in dccp_destroy_sock()
udp: fix linear skb reception with PEEK_OFF
ipv6: release rt6->rt6i_idev properly during ifdown
af_key: do not use GFP_KERNEL in atomic contexts
tcp: ulp: avoid module refcnt leak in tcp_set_ulp
net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
PCI: Disable Relaxed Ordering Attributes for AMD A1100
PCI: Disable Relaxed Ordering for some Intel processors
PCI: Disable PCIe Relaxed Ordering if unsupported
...
|
|
James reported that on MIPS32 bpf_trace_printk() is currently
broken while MIPS64 works fine:
bpf_trace_printk() uses conditional operators to attempt to
pass different types to __trace_printk() depending on the
format operators. This doesn't work as intended on 32-bit
architectures where u32 and long are passed differently to
u64, since the result of C conditional operators follows the
"usual arithmetic conversions" rules, such that the values
passed to __trace_printk() will always be u64 [causing issues
later in the va_list handling for vscnprintf()].
For example the samples/bpf/tracex5 test printed lines like
below on MIPS32, where the fd and buf have come from the u64
fd argument, and the size from the buf argument:
[...] 1180.941542: 0x00000001: write(fd=1, buf= (null), size=6258688)
Instead of this:
[...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)
One way to get it working is to expand various combinations
of argument types into 8 different combinations for 32 bit
and 64 bit kernels. Fix tested by James on MIPS32 and MIPS64
as well that it resolves the issue.
Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Reported-by: James Hogan <james.hogan@imgtec.com>
Tested-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Eric report a oops when booting the system after applying
the commit a99b646afa8a ("PCI: Disable PCIe Relaxed..."):
[ 4.241029] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 4.247001] IP: pci_find_pcie_root_port+0x62/0x80
[ 4.253011] PGD 0
[ 4.253011] P4D 0
[ 4.253011]
[ 4.258013] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 4.262015] Modules linked in:
[ 4.265005] CPU: 31 PID: 1 Comm: swapper/0 Not tainted 4.13.0-dbx-DEV #316
[ 4.271002] Hardware name: Intel RML,PCH/Iota_QC_19, BIOS 2.40.0 06/22/2016
[ 4.279002] task: ffffa2ee38cfa040 task.stack: ffffa51ec0004000
[ 4.285001] RIP: 0010:pci_find_pcie_root_port+0x62/0x80
[ 4.290012] RSP: 0000:ffffa51ec0007ab8 EFLAGS: 00010246
[ 4.295003] RAX: 0000000000000000 RBX: ffffa2ee36bae000 RCX: 0000000000000006
[ 4.303002] RDX: 000000000000081c RSI: ffffa2ee38cfa8c8 RDI: ffffa2ee36bae000
[ 4.310013] RBP: ffffa51ec0007b58 R08: 0000000000000001 R09: 0000000000000000
[ 4.317001] R10: 0000000000000000 R11: 0000000000000000 R12: ffffa51ec0007ad0
[ 4.324005] R13: ffffa2ee36bae098 R14: 0000000000000002 R15: ffffa2ee37204818
[ 4.331002] FS: 0000000000000000(0000) GS:ffffa2ee3fcc0000(0000) knlGS:0000000000000000
[ 4.339002] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4.345001] CR2: 0000000000000050 CR3: 000000401000f000 CR4: 00000000001406e0
[ 4.351002] Call Trace:
[ 4.354012] ? pci_configure_device+0x19f/0x570
[ 4.359002] ? pci_conf1_read+0xb8/0xf0
[ 4.363002] ? raw_pci_read+0x23/0x40
[ 4.366011] ? pci_read+0x2c/0x30
[ 4.370014] ? pci_read_config_word+0x67/0x70
[ 4.374012] pci_device_add+0x28/0x230
[ 4.378012] ? pci_vpd_f0_read+0x50/0x80
[ 4.382014] pci_scan_single_device+0x96/0xc0
[ 4.386012] pci_scan_slot+0x79/0xf0
[ 4.389001] pci_scan_child_bus+0x31/0x180
[ 4.394014] acpi_pci_root_create+0x1c6/0x240
[ 4.398013] pci_acpi_scan_root+0x15f/0x1b0
[ 4.402012] acpi_pci_root_add+0x2e6/0x400
[ 4.406012] ? acpi_evaluate_integer+0x37/0x60
[ 4.411002] acpi_bus_attach+0xdf/0x200
[ 4.415002] acpi_bus_attach+0x6a/0x200
[ 4.418014] acpi_bus_attach+0x6a/0x200
[ 4.422013] acpi_bus_scan+0x38/0x70
[ 4.426011] acpi_scan_init+0x10c/0x271
[ 4.429001] acpi_init+0x2fa/0x348
[ 4.433004] ? acpi_sleep_proc_init+0x2d/0x2d
[ 4.437001] do_one_initcall+0x43/0x169
[ 4.441001] kernel_init_freeable+0x1d0/0x258
[ 4.445003] ? rest_init+0xe0/0xe0
[ 4.449001] kernel_init+0xe/0x150
====================== cut here =============================
It looks like the pci_find_pcie_root_port() was trying to
find the Root Port for the PCI device which is the Root
Port already, it will return NULL and trigger the problem,
so check the highest_pcie_bridge to fix thie problem.
Fixes: a99b646afa8a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
Fixes: c56d4450eb68 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The MAC stats command takes a port ID, which doesn't exist on
pre-ef10 NICs (5000- and 6000- series). This is extracted from the
NIC specific data; we misinterpret this as the ef10 data structure,
causing us to read potentially unallocated data. With a KASAN kernel
this can cause errors with:
BUG: KASAN: slab-out-of-bounds in efx_mcdi_mac_stats
Fixes: 0a2ab4d988d7 ("sfc: set the port-id when calling MC_CMD_MAC_STATS")
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Tested-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
It was added in commit e57a784d8cae ("pkt_sched: set root qdisc
before change() in attach_default_qdiscs()") to hide duplicates
from "tc qdisc show" for incative deivices.
After 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
it triggered when classful qdisc is added to inactive device because
default qdiscs are added before switching root qdisc.
Anyway after commit ea3274695353 ("net: sched: avoid duplicates in
qdisc dump") duplicates are filtered right in dumper.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When sfq_enqueue() drops head packet or packet from another queue it
have to update backlog at upper qdiscs too.
Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Traffic filters could keep direct pointers to classes in classful qdisc,
thus qdisc destruction first removes all filters before freeing classes.
Class destruction methods also tries to free attached filters but now
this isn't safe because tcf_block_put() unlike to tcf_destroy_chain()
cannot be called second time.
This patch set class->block to NULL after first tcf_block_put() and
turn second call into no-op.
Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure")
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If fi->fib_metrics could not be allocated in fib_create_info()
we attempt to dereference a NULL pointer in free_fib_info_rcu() :
m = fi->fib_metrics;
if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
kfree(m);
Before my recent patch, we used to call kfree(NULL) and nothing wrong
happened.
Instead of using RCU to defer freeing while we are under memory stress,
it seems better to take immediate action.
This was reported by syzkaller team.
Fixes: 3fb07daff8e9 ("ipv4: add reference counting to metrics")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|