summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-12-29netfilter: nf_conncount: fix argument order to find_next_bitFlorian Westphal
Size and 'next bit' were swapped, this bug could cause worker to reschedule itself even if system was idle. Fixes: 5c789e131cbb9 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-29netfilter: nf_conncount: speculative garbage collection on empty listsPablo Neira Ayuso
Instead of removing a empty list node that might be reintroduced soon thereafter, tentatively place the empty list node on the list passed to tree_nodes_free(), then re-check if the list is empty again before erasing it from the tree. [ Florian: rebase on top of pending nf_conncount fixes ] Fixes: 5c789e131cbb9 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-29netfilter: nf_conncount: move all list iterations under spinlockPablo Neira Ayuso
Two CPUs may race to remove a connection from the list, the existing conn->dead will result in a use-after-free. Use the per-list spinlock to protect list iterations. As all accesses to the list now happen while holding the per-list lock, we no longer need to delay free operations with rcu. Joint work with Florian. Fixes: 5c789e131cbb9 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-29netfilter: nf_conncount: merge lookup and add functionsFlorian Westphal
'lookup' is always followed by 'add'. Merge both and make the list-walk part of nf_conncount_add(). This also avoids one unneeded unlock/re-lock pair. Extra care needs to be taken in count_tree, as we only hold rcu read lock, i.e. we can only insert to an existing tree node after acquiring its lock and making sure it has a nonzero count. As a zero count should be rare, just fall back to insert_tree() (which acquires tree lock). This issue and its solution were pointed out by Shawn Bohrer during patch review. Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-29netfilter: nf_conncount: restart search when nodes have been erasedFlorian Westphal
Shawn Bohrer reported a following crash: |RIP: 0010:rb_erase+0xae/0x360 [..] Call Trace: nf_conncount_destroy+0x59/0xc0 [nf_conncount] cleanup_match+0x45/0x70 [ip_tables] ... Shawn tracked this down to bogus 'parent' pointer: Problem is that when we insert a new node, then there is a chance that the 'parent' that we found was also passed to tree_nodes_free() (because that node was empty) for erase+free. Instead of trying to be clever and detect when this happens, restart the search if we have evicted one or more nodes. To prevent frequent restarts, do not perform gc on the second round. Also, unconditionally schedule the gc worker. The condition gc_count > ARRAY_SIZE(gc_nodes)) cannot be true unless tree grows very large, as the height of the tree will be low even with hundreds of nodes present. Fixes: 5c789e131cbb9 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Reported-by: Shawn Bohrer <sbohrer@cloudflare.com> Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-29netfilter: nf_conncount: split gc in two phasesFlorian Westphal
The lockless workqueue garbage collector can race with packet path garbage collector to delete list nodes, as it calls tree_nodes_free() with the addresses of nodes that might have been free'd already from another cpu. To fix this, split gc into two phases. One phase to perform gc on the connections: From a locking perspective, this is the same as count_tree(): we hold rcu lock, but we do not change the tree, we only change the nodes' contents. The second phase acquires the tree lock and reaps empty nodes. This avoids a race condition of the garbage collection vs. packet path: If a node has been free'd already, the second phase won't find it anymore. This second phase is, from locking perspective, same as insert_tree(). The former only modifies nodes (list content, count), latter modifies the tree itself (rb_erase or rb_insert). Fixes: 5c789e131cbb9 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-29netfilter: nf_conncount: don't skip eviction when age is negativeFlorian Westphal
age is signed integer, so result can be negative when the timestamps have a large delta. In this case we want to discard the entry. Instead of using age >= 2 || age < 0, just make it unsigned. Fixes: b36e4523d4d56 ("netfilter: nf_conncount: fix garbage collection confirm race") Reviewed-by: Shawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-29netfilter: nf_conncount: replace CONNCOUNT_LOCK_SLOTS with CONNCOUNT_SLOTSShawn Bohrer
Most of the time these were the same value anyway, but when CONFIG_LOCKDEP was enabled we would use a smaller number of locks to reduce overhead. Unfortunately having two values is confusing and not worth the complexity. This fixes a bug where tree_gc_worker() would only GC up to CONNCOUNT_LOCK_SLOTS trees which meant when CONFIG_LOCKDEP was enabled not all trees would be GCed by tree_gc_worker(). Fixes: 5c789e131cbb9 ("netfilter: nf_conncount: Add list lock and gc worker, and RCU for init tree search") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Shawn Bohrer <sbohrer@cloudflare.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-29netfilter: nf_tables: fix a missing check of nla_put_failureKangjie Lu
If nla_nest_start() may fail. The fix checks its return value and goes to nla_put_failure if it fails. Signed-off-by: Kangjie Lu <kjlu@umn.edu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-12-28Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - large KASAN update to use arm's "software tag-based mode" - a few misc things - sh updates - ocfs2 updates - just about all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits) kernel/fork.c: mark 'stack_vm_area' with __maybe_unused memcg, oom: notify on oom killer invocation from the charge path mm, swap: fix swapoff with KSM pages include/linux/gfp.h: fix typo mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization memory_hotplug: add missing newlines to debugging output mm: remove __hugepage_set_anon_rmap() include/linux/vmstat.h: remove unused page state adjustment macro mm/page_alloc.c: allow error injection mm: migrate: drop unused argument of migrate_page_move_mapping() blkdev: avoid migration stalls for blkdev pages mm: migrate: provide buffer_migrate_page_norefs() mm: migrate: move migrate_page_lock_buffers() mm: migrate: lock buffers before migrate_page_move_mapping() mm: migration: factor out code to compute expected number of page references mm, page_alloc: enable pcpu_drain with zone capability kmemleak: add config to select auto scan mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init ...
2018-12-28Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This has been a fairly typical cycle, with the usual sorts of driver updates. Several series continue to come through which improve and modernize various parts of the core code, and we finally are starting to get the uAPI command interface cleaned up. - Various driver fixes for bnxt_re, cxgb3/4, hfi1, hns, i40iw, mlx4, mlx5, qib, rxe, usnic - Rework the entire syscall flow for uverbs to be able to run over ioctl(). Finally getting past the historic bad choice to use write() for command execution - More functional coverage with the mlx5 'devx' user API - Start of the HFI1 series for 'TID RDMA' - SRQ support in the hns driver - Support for new IBTA defined 2x lane widths - A big series to consolidate all the driver function pointers into a big struct and have drivers provide a 'static const' version of the struct instead of open coding initialization - New 'advise_mr' uAPI to control device caching/loading of page tables - Support for inline data in SRPT - Modernize how umad uses the driver core and creates cdev's and sysfs files - First steps toward removing 'uobject' from the view of the drivers" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (193 commits) RDMA/srpt: Use kmem_cache_free() instead of kfree() RDMA/mlx5: Signedness bug in UVERBS_HANDLER() IB/uverbs: Signedness bug in UVERBS_HANDLER() IB/mlx5: Allocate the per-port Q counter shared when DEVX is supported IB/umad: Start using dev_groups of class IB/umad: Use class_groups and let core create class file IB/umad: Refactor code to use cdev_device_add() IB/umad: Avoid destroying device while it is accessed IB/umad: Simplify and avoid dynamic allocation of class IB/mlx5: Fix wrong error unwind IB/mlx4: Remove set but not used variable 'pd' RDMA/iwcm: Don't copy past the end of dev_name() string IB/mlx5: Fix long EEH recover time with NVMe offloads IB/mlx5: Simplify netdev unbinding IB/core: Move query port to ioctl RDMA/nldev: Expose port_cap_flags2 IB/core: uverbs copy to struct or zero helper IB/rxe: Reuse code which sets port state IB/rxe: Make counters thread safe IB/mlx5: Use the correct commands for UMEM and UCTX allocation ...
2018-12-28Merge tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "This is the main pull request for block/storage for 4.21. Larger than usual, it was a busy round with lots of goodies queued up. Most notable is the removal of the old IO stack, which has been a long time coming. No new features for a while, everything coming in this week has all been fixes for things that were previously merged. This contains: - Use atomic counters instead of semaphores for mtip32xx (Arnd) - Cleanup of the mtip32xx request setup (Christoph) - Fix for circular locking dependency in loop (Jan, Tetsuo) - bcache (Coly, Guoju, Shenghui) * Optimizations for writeback caching * Various fixes and improvements - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith) * host and target support for NVMe over TCP * Error log page support * Support for separate read/write/poll queues * Much improved polling * discard OOM fallback * Tracepoint improvements - lightnvm (Hans, Hua, Igor, Matias, Javier) * Igor added packed metadata to pblk. Now drives without metadata per LBA can be used as well. * Fix from Geert on uninitialized value on chunk metadata reads. * Fixes from Hans and Javier to pblk recovery and write path. * Fix from Hua Su to fix a race condition in the pblk recovery code. * Scan optimization added to pblk recovery from Zhoujie. * Small geometry cleanup from me. - Conversion of the last few drivers that used the legacy path to blk-mq (me) - Removal of legacy IO path in SCSI (me, Christoph) - Removal of legacy IO stack and schedulers (me) - Support for much better polling, now without interrupts at all. blk-mq adds support for multiple queue maps, which enables us to have a map per type. This in turn enables nvme to have separate completion queues for polling, which can then be interrupt-less. Also means we're ready for async polled IO, which is hopefully coming in the next release. - Killing of (now) unused block exports (Christoph) - Unification of the blk-rq-qos and blk-wbt wait handling (Josef) - Support for zoned testing with null_blk (Masato) - sx8 conversion to per-host tag sets (Christoph) - IO priority improvements (Damien) - mq-deadline zoned fix (Damien) - Ref count blkcg series (Dennis) - Lots of blk-mq improvements and speedups (me) - sbitmap scalability improvements (me) - Make core inflight IO accounting per-cpu (Mikulas) - Export timeout setting in sysfs (Weiping) - Cleanup the direct issue path (Jianchao) - Export blk-wbt internals in block debugfs for easier debugging (Ming) - Lots of other fixes and improvements" * tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits) kyber: use sbitmap add_wait_queue/list_del wait helpers sbitmap: add helpers for add/del wait queue handling block: save irq state in blkg_lookup_create() dm: don't reuse bio for flushes nvme-pci: trace SQ status on completions nvme-rdma: implement polling queue map nvme-fabrics: allow user to pass in nr_poll_queues nvme-fabrics: allow nvmf_connect_io_queue to poll nvme-core: optionally poll sync commands block: make request_to_qc_t public nvme-tcp: fix spelling mistake "attepmpt" -> "attempt" nvme-tcp: fix endianess annotations nvmet-tcp: fix endianess annotations nvme-pci: refactor nvme_poll_irqdisable to make sparse happy nvme-pci: only set nr_maps to 2 if poll queues are supported nvmet: use a macro for default error location nvmet: fix comparison of a u16 with -1 blk-mq: enable IO poll if .nr_queues of type poll > 0 blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() blk-mq: skip zero-queue maps in blk_mq_map_swqueue ...
2018-12-28Merge tag 'y2038-for-4.21' of ↵Linus Torvalds
ssh://gitolite.kernel.org:/pub/scm/linux/kernel/git/arnd/playground Pull y2038 updates from Arnd Bergmann: "More syscalls and cleanups This concludes the main part of the system call rework for 64-bit time_t, which has spread over most of year 2018, the last six system calls being - ppoll - pselect6 - io_pgetevents - recvmmsg - futex - rt_sigtimedwait As before, nothing changes for 64-bit architectures, while 32-bit architectures gain another entry point that differs only in the layout of the timespec structure. Hopefully in the next release we can wire up all 22 of those system calls on all 32-bit architectures, which gives us a baseline version for glibc to start using them. This does not include the clock_adjtime, getrusage/waitid, and getitimer/setitimer system calls. I still plan to have new versions of those as well, but they are not required for correct operation of the C library since they can be emulated using the old 32-bit time_t based system calls. Aside from the system calls, there are also a few cleanups here, removing old kernel internal interfaces that have become unused after all references got removed. The arch/sh cleanups are part of this, there were posted several times over the past year without a reaction from the maintainers, while the corresponding changes made it into all other architectures" * tag 'y2038-for-4.21' of ssh://gitolite.kernel.org:/pub/scm/linux/kernel/git/arnd/playground: timekeeping: remove obsolete time accessors vfs: replace current_kernel_time64 with ktime equivalent timekeeping: remove timespec_add/timespec_del timekeeping: remove unused {read,update}_persistent_clock sh: remove board_time_init() callback sh: remove unused rtc_sh_get/set_time infrastructure sh: sh03: rtc: push down rtc class ops into driver sh: dreamcast: rtc: push down rtc class ops into driver y2038: signal: Add compat_sys_rt_sigtimedwait_time64 y2038: signal: Add sys_rt_sigtimedwait_time32 y2038: socket: Add compat_sys_recvmmsg_time64 y2038: futex: Add support for __kernel_timespec y2038: futex: Move compat implementation into futex.c io_pgetevents: use __kernel_timespec pselect6: use __kernel_timespec ppoll: use __kernel_timespec signal: Add restore_user_sigmask() signal: Add set_user_sigmask()
2018-12-28mm: convert totalram_pages and totalhigh_pages variables to atomicArun KS
totalram_pages and totalhigh_pages are made static inline function. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes better to remove the lock and convert variables to atomic, with preventing poteintial store-to-read tearing as a bonus. [akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Suggested-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28mm: reference totalram_pages and managed_pages once per functionArun KS
Patch series "mm: convert totalram_pages, totalhigh_pages and managed pages to atomic", v5. This series converts totalram_pages, totalhigh_pages and zone->managed_pages to atomic variables. totalram_pages, zone->managed_pages and totalhigh_pages updates are protected by managed_page_count_lock, but readers never care about it. Convert these variables to atomic to avoid readers potentially seeing a store tear. Main motivation was that managed_page_count_lock handling was complicating things. It was discussed in length here, https://lore.kernel.org/patchwork/patch/995739/#1181785 It seemes better to remove the lock and convert variables to atomic. With the change, preventing poteintial store-to-read tearing comes as a bonus. This patch (of 4): This is in preparation to a later patch which converts totalram_pages and zone->managed_pages to atomic variables. Please note that re-reading the value might lead to a different value and as such it could lead to unexpected behavior. There are no known bugs as a result of the current code but it is better to prevent from them in principle. Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.org Signed-off-by: Arun KS <arunks@codeaurora.org> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-27sunrpc: fix debug message in svc_create_xprt()Vasily Averin
_svc_create_xprt() returns positive port number so its non-zero return value is not an error Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: make visible processing error in bc_svc_process()Vasily Averin
Force bc_svc_process() to generate debug message after processing errors Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: remove unused xpo_prep_reply_hdr callbackVasily Averin
xpo_prep_reply_hdr are not used now. It was defined for tcp transport only, however it cannot be called indirectly, so let's move it to its caller and remove unused callback. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: remove svc_rdma_bc_classVasily Averin
Remove svc_xprt_class svc_rdma_bc_class and related functions. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: remove svc_tcp_bc_classVasily Averin
Remove svc_xprt_class svc_tcp_bc_class and related functions Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: remove unused bc_up operation from rpc_xprt_opsVasily Averin
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: replace svc_serv->sv_bc_xprt by boolean flagVasily Averin
svc_serv-> sv_bc_xprt is netns-unsafe and cannot be used as pointer. To prevent its misuse in future it is replaced by new boolean flag. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: use-after-free in svc_process_common()Vasily Averin
if node have NFSv41+ mounts inside several net namespaces it can lead to use-after-free in svc_process_common() svc_process_common() /* Setup reply header */ rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE svc_process_common() can use incorrect rqstp->rq_xprt, its caller function bc_svc_process() takes it from serv->sv_bc_xprt. The problem is that serv is global structure but sv_bc_xprt is assigned per-netnamespace. According to Trond, the whole "let's set up rqstp->rq_xprt for the back channel" is nothing but a giant hack in order to work around the fact that svc_process_common() uses it to find the xpt_ops, and perform a couple of (meaningless for the back channel) tests of xpt_flags. All we really need in svc_process_common() is to be able to run rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr() Bruce J Fields points that this xpo_prep_reply_hdr() call is an awfully roundabout way just to do "svc_putnl(resv, 0);" in the tcp case. This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(), now it calls svc_process_common() with rqstp->rq_xprt = NULL. To adjust reply header svc_process_common() just check rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case. To handle rqstp->rq_xprt = NULL case in functions called from svc_process_common() patch intruduces net namespace pointer svc_rqst->rq_bc_net and adjust SVC_NET() definition. Some other function was also adopted to properly handle described case. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: stable@vger.kernel.org Fixes: 23c20ecd4475 ("NFS: callback up - users counting cleanup") Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27sunrpc: use SVC_NET() in svcauth_gss_* functionsVasily Averin
Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-12-27tipc: fix a missing check of genlmsg_putKangjie Lu
genlmsg_put could fail. The fix inserts a check of its return value, and if it fails, returns -EMSGSIZE. Signed-off-by: Kangjie Lu <kjlu@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-27ipv6/route: Add a missing check on proc_dointvecAditya Pakki
While flushing the cache via ipv6_sysctl_rtcache_flush(), the call to proc_dointvec() may fail. The fix adds a check that returns the error, on failure. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-27tipc: fix a double free in tipc_enable_bearer()Cong Wang
bearer_disable() already calls kfree_rcu() to free struct tipc_bearer, we don't need to call kfree() again. Fixes: cb30a63384bc ("tipc: refactor function tipc_enable_bearer()") Reported-by: syzbot+b981acf1fb240c0c128b@syzkaller.appspotmail.com Cc: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-27Merge branch 'linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add 1472-byte test to tcrypt for IPsec - Reintroduced crypto stats interface with numerous changes - Support incremental algorithm dumps Algorithms: - Add xchacha12/20 - Add nhpoly1305 - Add adiantum - Add streebog hash - Mark cts(cbc(aes)) as FIPS allowed Drivers: - Improve performance of arm64/chacha20 - Improve performance of x86/chacha20 - Add NEON-accelerated nhpoly1305 - Add SSE2 accelerated nhpoly1305 - Add AVX2 accelerated nhpoly1305 - Add support for 192/256-bit keys in gcmaes AVX - Add SG support in gcmaes AVX - ESN for inline IPsec tx in chcr - Add support for CryptoCell 703 in ccree - Add support for CryptoCell 713 in ccree - Add SM4 support in ccree - Add SM3 support in ccree - Add support for chacha20 in caam/qi2 - Add support for chacha20 + poly1305 in caam/jr - Add support for chacha20 + poly1305 in caam/qi2 - Add AEAD cipher support in cavium/nitrox" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (130 commits) crypto: skcipher - remove remnants of internal IV generators crypto: cavium/nitrox - Fix build with !CONFIG_DEBUG_FS crypto: salsa20-generic - don't unnecessarily use atomic walk crypto: skcipher - add might_sleep() to skcipher_walk_virt() crypto: x86/chacha - avoid sleeping under kernel_fpu_begin() crypto: cavium/nitrox - Added AEAD cipher support crypto: mxc-scc - fix build warnings on ARM64 crypto: api - document missing stats member crypto: user - remove unused dump functions crypto: chelsio - Fix wrong error counter increments crypto: chelsio - Reset counters on cxgb4 Detach crypto: chelsio - Handle PCI shutdown event crypto: chelsio - cleanup:send addr as value in function argument crypto: chelsio - Use same value for both channel in single WR crypto: chelsio - Swap location of AAD and IV sent in WR crypto: chelsio - remove set but not used variable 'kctx_len' crypto: ux500 - Use proper enum in hash_set_dma_transfer crypto: ux500 - Use proper enum in cryp_set_dma_transfer crypto: aesni - Add scatter/gather avx stubs, and use them in C crypto: aesni - Introduce partial block macro ..
2018-12-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) New ipset extensions for matching on destination MAC addresses, from Stefano Brivio. 2) Add ipv4 ttl and tos, plus ipv6 flow label and hop limit offloads to nfp driver. From Stefano Brivio. 3) Implement GRO for plain UDP sockets, from Paolo Abeni. 4) Lots of work from Michał Mirosław to eliminate the VLAN_TAG_PRESENT bit so that we could support the entire vlan_tci value. 5) Rework the IPSEC policy lookups to better optimize more usecases, from Florian Westphal. 6) Infrastructure changes eliminating direct manipulation of SKB lists wherever possible, and to always use the appropriate SKB list helpers. This work is still ongoing... 7) Lots of PHY driver and state machine improvements and simplifications, from Heiner Kallweit. 8) Various TSO deferral refinements, from Eric Dumazet. 9) Add ntuple filter support to aquantia driver, from Dmitry Bogdanov. 10) Batch dropping of XDP packets in tuntap, from Jason Wang. 11) Lots of cleanups and improvements to the r8169 driver from Heiner Kallweit, including support for ->xmit_more. This driver has been getting some much needed love since he started working on it. 12) Lots of new forwarding selftests from Petr Machata. 13) Enable VXLAN learning in mlxsw driver, from Ido Schimmel. 14) Packed ring support for virtio, from Tiwei Bie. 15) Add new Aquantia AQtion USB driver, from Dmitry Bezrukov. 16) Add XDP support to dpaa2-eth driver, from Ioana Ciocoi Radulescu. 17) Implement coalescing on TCP backlog queue, from Eric Dumazet. 18) Implement carrier change in tun driver, from Nicolas Dichtel. 19) Support msg_zerocopy in UDP, from Willem de Bruijn. 20) Significantly improve garbage collection of neighbor objects when the table has many PERMANENT entries, from David Ahern. 21) Remove egdev usage from nfp and mlx5, and remove the facility completely from the tree as it no longer has any users. From Oz Shlomo and others. 22) Add a NETDEV_PRE_CHANGEADDR so that drivers can veto the change and therefore abort the operation before the commit phase (which is the NETDEV_CHANGEADDR event). From Petr Machata. 23) Add indirect call wrappers to avoid retpoline overhead, and use them in the GRO code paths. From Paolo Abeni. 24) Add support for netlink FDB get operations, from Roopa Prabhu. 25) Support bloom filter in mlxsw driver, from Nir Dotan. 26) Add SKB extension infrastructure. This consolidates the handling of the auxiliary SKB data used by IPSEC and bridge netfilter, and is designed to support the needs to MPTCP which could be integrated in the future. 27) Lots of XDP TX optimizations in mlx5 from Tariq Toukan. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1845 commits) net: dccp: fix kernel crash on module load drivers/net: appletalk/cops: remove redundant if statement and mask bnx2x: Fix NULL pointer dereference in bnx2x_del_all_vlans() on some hw net/net_namespace: Check the return value of register_pernet_subsys() net/netlink_compat: Fix a missing check of nla_parse_nested ieee802154: lowpan_header_create check must check daddr net/mlx4_core: drop useless LIST_HEAD mlxsw: spectrum: drop useless LIST_HEAD net/mlx5e: drop useless LIST_HEAD iptunnel: Set tun_flags in the iptunnel_metadata_reply from src net/mlx5e: fix semicolon.cocci warnings staging: octeon: fix build failure with XFRM enabled net: Revert recent Spectre-v1 patches. can: af_can: Fix Spectre v1 vulnerability packet: validate address length if non-zero nfc: af_nfc: Fix Spectre v1 vulnerability phonet: af_phonet: Fix Spectre v1 vulnerability net: core: Fix Spectre v1 vulnerability net: minor cleanup in skb_ext_add() net: drop the unused helper skb_ext_get() ...
2018-12-26Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The biggest RCU changes in this cycle were: - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar. - Replace calls of RCU-bh and RCU-sched update-side functions to their vanilla RCU counterparts. This series is a step towards complete removal of the RCU-bh and RCU-sched update-side functions. ( Note that some of these conversions are going upstream via their respective maintainers. ) - Documentation updates, including a number of flavor-consolidation updates from Joel Fernandes. - Miscellaneous fixes. - Automate generation of the initrd filesystem used for rcutorture testing. - Convert spin_is_locked() assertions to instead use lockdep. ( Note that some of these conversions are going upstream via their respective maintainers. ) - SRCU updates, especially including a fix from Dennis Krein for a bag-on-head-class bug. - RCU torture-test updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits) rcutorture: Don't do busted forward-progress testing rcutorture: Use 100ms buckets for forward-progress callback histograms rcutorture: Recover from OOM during forward-progress tests rcutorture: Print forward-progress test age upon failure rcutorture: Print time since GP end upon forward-progress failure rcutorture: Print histogram of CB invocation at OOM time rcutorture: Print GP age upon forward-progress failure rcu: Print per-CPU callback counts for forward-progress failures rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings rcutorture: Dump grace-period diagnostics upon forward-progress OOM rcutorture: Prepare for asynchronous access to rcu_fwd_startat torture: Remove unnecessary "ret" variables rcutorture: Affinity forward-progress test to avoid housekeeping CPUs rcutorture: Break up too-long rcu_torture_fwd_prog() function rcutorture: Remove cbflood facility torture: Bring any extra CPUs online during kernel startup rcutorture: Add call_rcu() flooding forward-progress tests rcutorture/formal: Replace synchronize_sched() with synchronize_rcu() tools/kernel.h: Replace synchronize_sched() with synchronize_rcu() net/decnet: Replace rcu_barrier_bh() with rcu_barrier() ...
2018-12-259p/net: put a lower bound on msizeDominique Martinet
If the requested msize is too small (either from command line argument or from the server version reply), we won't get any work done. If it's *really* too small, nothing will work, and this got caught by syzbot recently (on a new kmem_cache_create_usercopy() call) Just set a minimum msize to 4k in both code paths, until someone complains they have a use-case for a smaller msize. We need to check in both mount option and server reply individually because the msize for the first version request would be unchecked with just a global check on clnt->msize. Link: http://lkml.kernel.org/r/1541407968-31350-1-git-send-email-asmadeus@codewreck.org Reported-by: syzbot+0c1d61e4db7db94102ca@syzkaller.appspotmail.com Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Cc: stable@vger.kernel.org
2018-12-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Pull in bug fixes before respinning my net-next pull request. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24net: dccp: fix kernel crash on module loadPeter Oskolkov
Patch eedbbb0d98b2 "net: dccp: initialize (addr,port) ..." added calling to inet_hashinfo2_init() from dccp_init(). However, inet_hashinfo2_init() is marked as __init(), and thus the kernel panics when dccp is loaded as module. Removing __init() tag from inet_hashinfo2_init() is not feasible because it calls into __init functions in mm. This patch adds inet_hashinfo2_init_mod() function that can be called after the init phase is done; changes dccp_init() to call the new function; un-marks inet_hashinfo2_init() as exported. Fixes: eedbbb0d98b2 ("net: dccp: initialize (addr,port) ...") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24net/net_namespace: Check the return value of register_pernet_subsys()Aditya Pakki
In net_ns_init(), register_pernet_subsys() could fail while registering network namespace subsystems. The fix checks the return value and sends a panic() on failure. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24net/netlink_compat: Fix a missing check of nla_parse_nestedAditya Pakki
In tipc_nl_compat_sk_dump(), if nla_parse_nested() fails, it could return an error. To be consistent with other invocations of the function call, on error, the fix passes the return value upstream. Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24ieee802154: lowpan_header_create check must check daddrWillem de Bruijn
Packet sockets may call dev_header_parse with NULL daddr. Make lowpan_header_ops.create fail. Fixes: 87a93e4eceb4 ("ieee802154: change needed headroom/tailroom") Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-24iptunnel: Set tun_flags in the iptunnel_metadata_reply from srcwenxu
ip l add tun type gretap external ip r a 10.0.0.2 encap ip id 1000 dst 172.168.0.2 key dev tun ip a a 10.0.0.1/24 dev tun The peer arp request to 10.0.0.1 with tunnel_id, but the arp reply only set the tun_id but not the tun_flags with TUNNEL_KEY. The arp reply packet don't contain tun_id field. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-23net: Revert recent Spectre-v1 patches.David S. Miller
This reverts: 50d5258634ae ("net: core: Fix Spectre v1 vulnerability") d686026b1e6e ("phonet: af_phonet: Fix Spectre v1 vulnerability") a95386f0390a ("nfc: af_nfc: Fix Spectre v1 vulnerability") a3ac5817ffe8 ("can: af_can: Fix Spectre v1 vulnerability") After some discussion with Alexei Starovoitov these all seem to be completely unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22can: af_can: Fix Spectre v1 vulnerabilityGustavo A. R. Silva
protocol is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/can/af_can.c:115 can_get_proto() warn: potential spectre issue 'proto_tab' [w] Fix this by sanitizing protocol before using it to index proto_tab. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22packet: validate address length if non-zeroWillem de Bruijn
Validate packet socket address length if a length is given. Zero length is equivalent to not setting an address. Fixes: 99137b7888f4 ("packet: validate address length") Reported-by: Ido Schimmel <idosch@idosch.org> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22nfc: af_nfc: Fix Spectre v1 vulnerabilityGustavo A. R. Silva
proto is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/nfc/af_nfc.c:42 nfc_sock_create() warn: potential spectre issue 'proto_tab' [w] (local cap) Fix this by sanitizing proto before using it to index proto_tab. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22phonet: af_phonet: Fix Spectre v1 vulnerabilityGustavo A. R. Silva
protocol is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/phonet/af_phonet.c:48 phonet_proto_get() warn: potential spectre issue 'proto_tab' [w] (local cap) Fix this by sanitizing protocol before using it to index proto_tab. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-22net: core: Fix Spectre v1 vulnerabilityGustavo A. R. Silva
flen is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/core/filter.c:1101 bpf_check_classic() warn: potential spectre issue 'filter' [w] Fix this by sanitizing flen before using it to index filter at line 1101: switch (filter[flen - 1].code) { and through pc at line 1040: const struct sock_filter *ftest = &filter[pc]; Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-12-21tls: Do not call sk_memcopy_from_iter with zero lengthVakul Garg
In some conditions e.g. when tls_clone_plaintext_msg() returns -ENOSPC, the number of bytes to be copied using subsequent function sk_msg_memcopy_from_iter() becomes zero. This causes function sk_msg_memcopy_from_iter() to fail which in turn causes tls_sw_sendmsg() to return failure. To prevent it, do not call sk_msg_memcopy_from_iter() when number of bytes to copy (indicated by 'try_to_copy') is zero. Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21net: minor cleanup in skb_ext_add()Paolo Abeni
When the extension to be added is already present, the only skb field we may need to update is 'extensions': we can reorder the code and avoid a branch. v1 -> v2: - be sure to flag the newly added extension as active Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21net: fix possible user-after-free in skb_ext_add()Paolo Abeni
On cow we can free the old extension: we must avoid dereferencing such extension after skb_ext_maybe_cow(). Since 'new' contents are always equal to 'old' after the copy, we can fix the above accessing the relevant data using 'new'. Fixes: df5042f4c5b9 ("sk_buff: add skb extension infrastructure") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21ipv6: tunnels: fix two use-after-freeEric Dumazet
xfrm6_policy_check() might have re-allocated skb->head, we need to reload ipv6 header pointer. sysbot reported : BUG: KASAN: use-after-free in __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40 Read of size 4 at addr ffff888191b8cb70 by task syz-executor2/1304 CPU: 0 PID: 1304 Comm: syz-executor2 Not tainted 4.20.0-rc7+ #356 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x244/0x39d lib/dump_stack.c:113 print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:432 __ipv6_addr_type+0x302/0x32f net/ipv6/addrconf_core.c:40 ipv6_addr_type include/net/ipv6.h:403 [inline] ip6_tnl_get_cap+0x27/0x190 net/ipv6/ip6_tunnel.c:727 ip6_tnl_rcv_ctl+0xdb/0x2a0 net/ipv6/ip6_tunnel.c:757 vti6_rcv+0x336/0x8f3 net/ipv6/ip6_vti.c:321 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434 NF_HOOK include/linux/netfilter.h:289 [inline] ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443 IPVS: ftp: loaded support on port[0] = 21 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537 dst_input include/net/dst.h:450 [inline] ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:289 [inline] ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083 process_backlog+0x24e/0x7a0 net/core/dev.c:5923 napi_poll net/core/dev.c:6346 [inline] net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412 __do_softirq+0x308/0xb7e kernel/softirq.c:292 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1027 </IRQ> do_softirq.part.14+0x126/0x160 kernel/softirq.c:337 do_softirq+0x19/0x20 kernel/softirq.c:340 netif_rx_ni+0x521/0x860 net/core/dev.c:4569 dev_loopback_xmit+0x287/0x8c0 net/core/dev.c:3576 NF_HOOK include/linux/netfilter.h:289 [inline] ip6_finish_output2+0x193a/0x2930 net/ipv6/ip6_output.c:84 ip6_fragment+0x2b06/0x3850 net/ipv6/ip6_output.c:727 ip6_finish_output+0x6b7/0xc50 net/ipv6/ip6_output.c:152 NF_HOOK_COND include/linux/netfilter.h:278 [inline] ip6_output+0x232/0x9d0 net/ipv6/ip6_output.c:171 dst_output include/net/dst.h:444 [inline] ip6_local_out+0xc5/0x1b0 net/ipv6/output_core.c:176 ip6_send_skb+0xbc/0x340 net/ipv6/ip6_output.c:1727 ip6_push_pending_frames+0xc5/0xf0 net/ipv6/ip6_output.c:1747 rawv6_push_pending_frames net/ipv6/raw.c:615 [inline] rawv6_sendmsg+0x3a3e/0x4b40 net/ipv6/raw.c:945 kobject: 'queues' (0000000089e6eea2): kobject_add_internal: parent: 'tunl0', set: '<NULL>' kobject: 'queues' (0000000089e6eea2): kobject_uevent_env inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 kobject: 'queues' (0000000089e6eea2): kobject_uevent_env: filter function caused the event to drop! sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 sock_write_iter+0x35e/0x5c0 net/socket.c:900 call_write_iter include/linux/fs.h:1857 [inline] new_sync_write fs/read_write.c:474 [inline] __vfs_write+0x6b8/0x9f0 fs/read_write.c:487 kobject: 'rx-0' (00000000e2d902d9): kobject_add_internal: parent: 'queues', set: 'queues' kobject: 'rx-0' (00000000e2d902d9): kobject_uevent_env vfs_write+0x1fc/0x560 fs/read_write.c:549 ksys_write+0x101/0x260 fs/read_write.c:598 kobject: 'rx-0' (00000000e2d902d9): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/rx-0' __do_sys_write fs/read_write.c:610 [inline] __se_sys_write fs/read_write.c:607 [inline] __x64_sys_write+0x73/0xb0 fs/read_write.c:607 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 kobject: 'tx-0' (00000000443b70ac): kobject_add_internal: parent: 'queues', set: 'queues' entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457669 Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f9bd200bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000457669 RDX: 000000000000058f RSI: 00000000200033c0 RDI: 0000000000000003 kobject: 'tx-0' (00000000443b70ac): kobject_uevent_env RBP: 000000000072bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9bd200c6d4 R13: 00000000004c2dcc R14: 00000000004da398 R15: 00000000ffffffff Allocated by task 1304: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553 __do_kmalloc_node mm/slab.c:3684 [inline] __kmalloc_node_track_caller+0x50/0x70 mm/slab.c:3698 __kmalloc_reserve.isra.41+0x41/0xe0 net/core/skbuff.c:140 __alloc_skb+0x155/0x760 net/core/skbuff.c:208 kobject: 'tx-0' (00000000443b70ac): fill_kobj_path: path = '/devices/virtual/net/tunl0/queues/tx-0' alloc_skb include/linux/skbuff.h:1011 [inline] __ip6_append_data.isra.49+0x2f1a/0x3f50 net/ipv6/ip6_output.c:1450 ip6_append_data+0x1bc/0x2d0 net/ipv6/ip6_output.c:1619 rawv6_sendmsg+0x15ab/0x4b40 net/ipv6/raw.c:938 inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798 sock_sendmsg_nosec net/socket.c:621 [inline] sock_sendmsg+0xd5/0x120 net/socket.c:631 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2116 __sys_sendmsg+0x11d/0x280 net/socket.c:2154 __do_sys_sendmsg net/socket.c:2163 [inline] __se_sys_sendmsg net/socket.c:2161 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2161 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe kobject: 'gre0' (00000000cb1b2d7b): kobject_add_internal: parent: 'net', set: 'devices' Freed by task 1304: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kfree+0xcf/0x230 mm/slab.c:3817 skb_free_head+0x93/0xb0 net/core/skbuff.c:553 pskb_expand_head+0x3b2/0x10d0 net/core/skbuff.c:1498 __pskb_pull_tail+0x156/0x18a0 net/core/skbuff.c:1896 pskb_may_pull include/linux/skbuff.h:2188 [inline] _decode_session6+0xd11/0x14d0 net/ipv6/xfrm6_policy.c:150 __xfrm_decode_session+0x71/0x140 net/xfrm/xfrm_policy.c:3272 kobject: 'gre0' (00000000cb1b2d7b): kobject_uevent_env __xfrm_policy_check+0x380/0x2c40 net/xfrm/xfrm_policy.c:3322 __xfrm_policy_check2 include/net/xfrm.h:1170 [inline] xfrm_policy_check include/net/xfrm.h:1175 [inline] xfrm6_policy_check include/net/xfrm.h:1185 [inline] vti6_rcv+0x4bd/0x8f3 net/ipv6/ip6_vti.c:316 xfrm6_ipcomp_rcv+0x1a5/0x3a0 net/ipv6/xfrm6_protocol.c:132 ip6_protocol_deliver_rcu+0x372/0x1940 net/ipv6/ip6_input.c:394 ip6_input_finish+0x84/0x170 net/ipv6/ip6_input.c:434 NF_HOOK include/linux/netfilter.h:289 [inline] ip6_input+0xe9/0x600 net/ipv6/ip6_input.c:443 ip6_mc_input+0x514/0x11c0 net/ipv6/ip6_input.c:537 dst_input include/net/dst.h:450 [inline] ip6_rcv_finish+0x17a/0x330 net/ipv6/ip6_input.c:76 NF_HOOK include/linux/netfilter.h:289 [inline] ipv6_rcv+0x115/0x640 net/ipv6/ip6_input.c:272 __netif_receive_skb_one_core+0x14d/0x200 net/core/dev.c:4973 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:5083 process_backlog+0x24e/0x7a0 net/core/dev.c:5923 kobject: 'gre0' (00000000cb1b2d7b): fill_kobj_path: path = '/devices/virtual/net/gre0' napi_poll net/core/dev.c:6346 [inline] net_rx_action+0x7fa/0x19b0 net/core/dev.c:6412 __do_softirq+0x308/0xb7e kernel/softirq.c:292 The buggy address belongs to the object at ffff888191b8cac0 which belongs to the cache kmalloc-512 of size 512 The buggy address is located 176 bytes inside of 512-byte region [ffff888191b8cac0, ffff888191b8ccc0) The buggy address belongs to the page: page:ffffea000646e300 count:1 mapcount:0 mapping:ffff8881da800940 index:0x0 flags: 0x2fffc0000000200(slab) raw: 02fffc0000000200 ffffea0006eaaa48 ffffea00065356c8 ffff8881da800940 raw: 0000000000000000 ffff888191b8c0c0 0000000100000006 0000000000000000 page dumped because: kasan: bad access detected kobject: 'queues' (000000005fd6226e): kobject_add_internal: parent: 'gre0', set: '<NULL>' Memory state around the buggy address: ffff888191b8ca00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888191b8ca80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb >ffff888191b8cb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888191b8cb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888191b8cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Fixes: 0d3c703a9d17 ("ipv6: Cleanup IPv6 tunnel receive path") Fixes: ed1efb2aefbb ("ipv6: Add support for IPsec virtual tunnel interfaces") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21Prevent overflow of sk_msg in sk_msg_clone()Vakul Garg
Fixed function sk_msg_clone() to prevent overflow of 'dst' while adding pages in scatterlist entries. The overflow of 'dst' causes crash in kernel tls module while doing record encryption. Crash fixed by this patch. [ 78.796119] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [ 78.804900] Mem abort info: [ 78.807683] ESR = 0x96000004 [ 78.810744] Exception class = DABT (current EL), IL = 32 bits [ 78.816677] SET = 0, FnV = 0 [ 78.819727] EA = 0, S1PTW = 0 [ 78.822873] Data abort info: [ 78.825759] ISV = 0, ISS = 0x00000004 [ 78.829600] CM = 0, WnR = 0 [ 78.832576] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000bf8ee311 [ 78.839195] [0000000000000008] pgd=0000000000000000 [ 78.844081] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 78.849642] Modules linked in: tls xt_conntrack ipt_REJECT nf_reject_ipv4 ip6table_filter ip6_tables xt_CHECKSUM cpve cpufreq_conservative lm90 ina2xx crct10dif_ce [ 78.865377] CPU: 0 PID: 6007 Comm: openssl Not tainted 4.20.0-rc6-01647-g754d5da63145-dirty #107 [ 78.874149] Hardware name: LS1043A RDB Board (DT) [ 78.878844] pstate: 60000005 (nZCv daif -PAN -UAO) [ 78.883632] pc : scatterwalk_copychunks+0x164/0x1c8 [ 78.888500] lr : scatterwalk_copychunks+0x160/0x1c8 [ 78.893366] sp : ffff00001d04b600 [ 78.896668] x29: ffff00001d04b600 x28: ffff80006814c680 [ 78.901970] x27: 0000000000000000 x26: ffff80006c8de786 [ 78.907272] x25: ffff00001d04b760 x24: 000000000000001a [ 78.912573] x23: 0000000000000006 x22: ffff80006814e440 [ 78.917874] x21: 0000000000000100 x20: 0000000000000000 [ 78.923175] x19: 000081ffffffffff x18: 0000000000000400 [ 78.928476] x17: 0000000000000008 x16: 0000000000000000 [ 78.933778] x15: 0000000000000100 x14: 0000000000000001 [ 78.939079] x13: 0000000000001080 x12: 0000000000000020 [ 78.944381] x11: 0000000000001080 x10: 00000000ffff0002 [ 78.949683] x9 : ffff80006814c248 x8 : 00000000ffff0000 [ 78.954985] x7 : ffff80006814c318 x6 : ffff80006c8de786 [ 78.960286] x5 : 0000000000000f80 x4 : ffff80006c8de000 [ 78.965588] x3 : 0000000000000000 x2 : 0000000000001086 [ 78.970889] x1 : ffff7e0001b74e02 x0 : 0000000000000000 [ 78.976192] Process openssl (pid: 6007, stack limit = 0x00000000291367f9) [ 78.982968] Call trace: [ 78.985406] scatterwalk_copychunks+0x164/0x1c8 [ 78.989927] skcipher_walk_next+0x28c/0x448 [ 78.994099] skcipher_walk_done+0xfc/0x258 [ 78.998187] gcm_encrypt+0x434/0x4c0 [ 79.001758] tls_push_record+0x354/0xa58 [tls] [ 79.006194] bpf_exec_tx_verdict+0x1e4/0x3e8 [tls] [ 79.010978] tls_sw_sendmsg+0x650/0x780 [tls] [ 79.015326] inet_sendmsg+0x2c/0xf8 [ 79.018806] sock_sendmsg+0x18/0x30 [ 79.022284] __sys_sendto+0x104/0x138 [ 79.025935] __arm64_sys_sendto+0x24/0x30 [ 79.029936] el0_svc_common+0x60/0xe8 [ 79.033588] el0_svc_handler+0x2c/0x80 [ 79.037327] el0_svc+0x8/0xc [ 79.040200] Code: 6b01005f 54fff788 940169b1 f9000320 (b9400801) [ 79.046283] ---[ end trace 74db007d069c1cf7 ]--- Fixes: d829e9c4112b ("tls: convert to generic sk_msg interface") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-12-21packet: validate address lengthWillem de Bruijn
Packet sockets with SOCK_DGRAM may pass an address for use in dev_hard_header. Ensure that it is of sufficient length. Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>