summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-07-22bonding: Force slave speed check after link state recovery for 802.3adThomas Falcon
The following scenario was encountered during testing of logical partition mobility on pseries partitions with bonded ibmvnic adapters in LACP mode. 1. Driver receives a signal that the device has been swapped, and it needs to reset to initialize the new device. 2. Driver reports loss of carrier and begins initialization. 3. Bonding driver receives NETDEV_CHANGE notifier and checks the slave's current speed and duplex settings. Because these are unknown at the time, the bond sets its link state to BOND_LINK_FAIL and handles the speed update, clearing AD_PORT_LACP_ENABLE. 4. Driver finishes recovery and reports that the carrier is on. 5. Bond receives a new notification and checks the speed again. The speeds are valid but miimon has not altered the link state yet. AD_PORT_LACP_ENABLE remains off. Because the slave's link state is still BOND_LINK_FAIL, no further port checks are made when it recovers. Though the slave devices are operational and have valid speed and duplex settings, the bond will not send LACPDU's. The simplest fix I can see is to force another speed check in bond_miimon_commit. This way the bond will update AD_PORT_LACP_ENABLE if needed when transitioning from BOND_LINK_FAIL to BOND_LINK_UP. CC: Jarod Wilson <jarod@redhat.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull preemption Kconfig fix from Thomas Gleixner: "The PREEMPT_RT stub config renamed PREEMPT to PREEMPT_LL and defined PREEMPT outside of the menu and made it selectable by both PREEMPT_LL and PREEMPT_RT. Stupid me missed that 114 defconfigs select CONFIG_PREEMPT which obviously can't work anymore. oldconfig builds are affected as well, but it's more obvious as the user gets asked. [old]defconfig silently fixes it up and selects PREEMPT_NONE. Unbreak it by undoing the rename and adding a intermediate config symbol which is selected by both PREEMPT and PREEMPT_RT. That requires to chase down a few #ifdefs, but it's better than tweaking 114 defconfigs and annoying users" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=y
2019-07-22Merge tag 'for-linus-20190722' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull pidfd polling fix from Christian Brauner: "A fix for pidfd polling. It ensures that the task's exit state is visible to all waiters" * tag 'for-linus-20190722' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: pidfd: fix a poll race when setting exit_state
2019-07-22Merge tag 'for-5.3-rc1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: - fixes for leaks caused by recently merged patches - one build fix - a fix to prevent mixing of incompatible features * tag 'for-5.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: don't leak extent_map in btrfs_get_io_geometry() btrfs: free checksum hash on in close_ctree btrfs: Fix build error while LIBCRC32C is module btrfs: inode: Don't compress if NODATASUM or NODATACOW set
2019-07-22sched/rt, Kconfig: Unbreak def/oldconfig with CONFIG_PREEMPT=yThomas Gleixner
The merge of the CONFIG_PREEMPT_RT stub renamed CONFIG_PREEMPT to CONFIG_PREEMPT_LL which causes all defconfigs which have CONFIG_PREEMPT=y set to fall back to CONFIG_PREEMPT_NONE because CONFIG_PREEMPT depends on the preemption mode choice wich defaults to NONE. This also affects oldconfig builds. So rather than changing 114 defconfig files and being an annoyance to users, revert the rename and select a new config symbol PREEMPTION. That keeps everything working smoothly and the revelant ifdef's are going to be fixed up step by step. Reported-by: Mark Rutland <mark.rutland@arm.com> Fixes: a50a3f4b6a31 ("sched/rt, Kconfig: Introduce CONFIG_PREEMPT_RT") Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2019-07-22Merge tag 'media/v5.3-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "For two regressions in media core: - v4l2-subdev: fix regression in check_pad() - videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use" * tag 'media/v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: videodev2.h: change V4L2_PIX_FMT_BGRA444 define: fourcc was already in use media: v4l2-subdev: fix regression in check_pad()
2019-07-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Several netfilter fixes including a nfnetlink deadlock fix from Florian Westphal and fix for dropping VRF packets from Miaohe Lin. 2) Flow offload fixes from Pablo Neira Ayuso including a fix to restore proper block sharing. 3) Fix r8169 PHY init from Thomas Voegtle. 4) Fix memory leak in mac80211, from Lorenzo Bianconi. 5) Missing NULL check on object allocation in cxgb4, from Navid Emamdoost. 6) Fix scaling of RX power in sfp phy driver, from Andrew Lunn. 7) Check that there is actually an ip header to access in skb->data in VRF, from Peter Kosyh. 8) Remove spurious rcu unlock in hv_netvsc, from Haiyang Zhang. 9) One more tweak the the TCP fragmentation memory limit changes, to be less harmful to applications setting small SO_SNDBUF values. From Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits) tcp: be more careful in tcp_fragment() hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback() vrf: make sure skb->data contains ip header to make routing connector: remove redundant input callback from cn_dev qed: Prefer pcie_capability_read_word() igc: Prefer pcie_capability_read_word() cxgb4: Prefer pcie_capability_read_word() be2net: Synchronize be_update_queues with dev_watchdog bnx2x: Prevent load reordering in tx completion processing net: phy: sfp: hwmon: Fix scaling of RX power net: sched: verify that q!=NULL before setting q->flags chelsio: Fix a typo in a function name allocate_flower_entry: should check for null deref net: hns3: typo in the name of a constant kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist. tipc: Fix a typo mac80211: don't warn about CW params when not using them mac80211: fix possible memory leak in ieee80211_assign_beacon nl80211: fix NL80211_HE_MAX_CAPABILITY_LEN nl80211: fix VENDOR_CMD_RAW_DATA ...
2019-07-22selftests/bpf: fix sendmsg6_prog on s390Ilya Leoshkevich
"sendmsg6: rewrite IP & port (C)" fails on s390, because the code in sendmsg_v6_prog() assumes that (ctx->user_ip6[0] & 0xFFFF) refers to leading IPv6 address digits, which is not the case on big-endian machines. Since checking bitwise operations doesn't seem to be the point of the test, replace two short comparisons with a single int comparison. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Acked-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22libbpf: Avoid designated initializers for unnamed union membersArnaldo Carvalho de Melo
As it fails to build in some systems with: libbpf.c: In function 'perf_buffer__new': libbpf.c:4515: error: unknown field 'sample_period' specified in initializer libbpf.c:4516: error: unknown field 'wakeup_events' specified in initializer Doing as: attr.sample_period = 1; I.e. not as a designated initializer makes it build everywhere. Cc: Andrii Nakryiko <andriin@fb.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: fb84b8224655 ("libbpf: add perf buffer API") Link: https://lkml.kernel.org/n/tip-hnlmch8qit1ieksfppmr32si@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22libbpf: Fix endianness macro usage for some compilersArnaldo Carvalho de Melo
Using endian.h and its endianness macros makes this code build in a wider range of compilers, as some don't have those macros (__BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__, __ORDER_BIG_ENDIAN__), so use instead endian.h's macros (__BYTE_ORDER, __LITTLE_ENDIAN, __BIG_ENDIAN) which makes this code even shorter :-) Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Fixes: 12ef5634a855 ("libbpf: simplify endianness check") Fixes: e6c64855fd7a ("libbpf: add btf__parse_elf API to load .BTF and .BTF.ext") Link: https://lkml.kernel.org/n/tip-eep5n8vgwcdphw3uc058k03u@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22Merge branch 'bpf-sockmap-tls-fixes'Daniel Borkmann
Jakub Kicinski says: ==================== John says: Resolve a series of splats discovered by syzbot and an unhash TLS issue noted by Eric Dumazet. The main issues revolved around interaction between TLS and sockmap tear down. TLS and sockmap could both reset sk->prot ops creating a condition where a close or unhash op could be called forever. A rare race condition resulting from a missing rcu sync operation was causing a use after free. Then on the TLS side dropping the sock lock and re-acquiring it during the close op could hang. Finally, sockmap must be deployed before tls for current stack assumptions to be met. This is enforced now. A feature series can enable it. To fix this first refactor TLS code so the lock is held for the entire teardown operation. Then add an unhash callback to ensure TLS can not transition from ESTABLISHED to LISTEN state. This transition is a similar bug to the one found and fixed previously in sockmap. Then apply three fixes to sockmap to fix up races on tear down around map free and close. Finally, if sockmap is destroyed before TLS we add a new ULP op update to inform the TLS stack it should not call sockmap ops. This last one appears to be the most commonly found issue from syzbot. v4: - fix some use after frees; - disable disconnect work for offload (ctx lifetime is much more complex); - remove some of the dead code which made it hard to understand (for me) that things work correctly (e.g. the checks TLS is the top ULP); - add selftets. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: add shutdown testsJakub Kicinski
Add test for killing the connection via shutdown. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: close the socket with open recordJakub Kicinski
Add test which sends some data with MSG_MORE and then closes the socket (never calling send without MSG_MORE). This should make sure we clean up open records correctly. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: add a bidirectional testJakub Kicinski
Add a simple test which installs the TLS state for both directions, sends and receives data on both sockets. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: test error codes around TLS ULP installationJakub Kicinski
Test the error codes returned when TCP connection is not in ESTABLISHED state. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22selftests/tls: add a test for ULP but no keysJakub Kicinski
Make sure we test the TLS_BASE/TLS_BASE case both with data and the tear down/clean up path. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap/tls, close can race with map freeJohn Fastabend
When a map free is called and in parallel a socket is closed we have two paths that can potentially reset the socket prot ops, the bpf close() path and the map free path. This creates a problem with which prot ops should be used from the socket closed side. If the map_free side completes first then we want to call the original lowest level ops. However, if the tls path runs first we want to call the sockmap ops. Additionally there was no locking around prot updates in TLS code paths so the prot ops could be changed multiple times once from TLS path and again from sockmap side potentially leaving ops pointed at either TLS or sockmap when psock and/or tls context have already been destroyed. To fix this race first only update ops inside callback lock so that TLS, sockmap and lowest level all agree on prot state. Second and a ULP callback update() so that lower layers can inform the upper layer when they are being removed allowing the upper layer to reset prot ops. This gets us close to allowing sockmap and tls to be stacked in arbitrary order but will save that patch for *next trees. v4: - make sure we don't free things for device; - remove the checks which swap the callbacks back only if TLS is at the top. Reported-by: syzbot+06537213db7ba2745c4a@syzkaller.appspotmail.com Fixes: 02c558b2d5d6 ("bpf: sockmap, support for msg_peek in sk_msg with redirect ingress") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap, only create entry if ulp is not already enabledJohn Fastabend
Sockmap does not currently support adding sockets after TLS has been enabled. There never was a real use case for this so it was never added. But, we lost the test for ULP at some point so add it here and fail the socket insert if TLS is enabled. Future work could make sockmap support this use case but fixup the bug here. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap, synchronize_rcu before free'ing mapJohn Fastabend
We need to have a synchronize_rcu before free'ing the sockmap because any outstanding psock references will have a pointer to the map and when they use this could trigger a use after free. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22bpf: sockmap, sock_map_delete needs to use xchgJohn Fastabend
__sock_map_delete() may be called from a tcp event such as unhash or close from the following trace, tcp_bpf_close() tcp_bpf_remove() sk_psock_unlink() sock_map_delete_from_link() __sock_map_delete() In this case the sock lock is held but this only protects against duplicate removals on the TCP side. If the map is free'd then we have this trace, sock_map_free xchg() <- replaces map entry sock_map_unref() sk_psock_put() sock_map_del_link() The __sock_map_delete() call however uses a read, test, null over the map entry which can result in both paths trying to free the map entry. To fix use xchg in TCP paths as well so we avoid having two references to the same map entry. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: fix transition through disconnect with closeJohn Fastabend
It is possible (via shutdown()) for TCP socks to go through TCP_CLOSE state via tcp_disconnect() without actually calling tcp_close which would then call the tls close callback. Because of this a user could disconnect a socket then put it in a LISTEN state which would break our assumptions about sockets always being ESTABLISHED state. More directly because close() can call unhash() and unhash is implemented by sockmap if a sockmap socket has TLS enabled we can incorrectly destroy the psock from unhash() and then call its close handler again. But because the psock (sockmap socket representation) is already destroyed we call close handler in sk->prot. However, in some cases (TLS BASE/BASE case) this will still point at the sockmap close handler resulting in a circular call and crash reported by syzbot. To fix both above issues implement the unhash() routine for TLS. v4: - add note about tls offload still needing the fix; - move sk_proto to the cold cache line; - split TX context free into "release" and "free", otherwise the GC work itself is in already freed memory; - more TX before RX for consistency; - reuse tls_ctx_free(); - schedule the GC work after we're done with context to avoid UAF; - don't set the unhash in all modes, all modes "inherit" TLS_BASE's callbacks anyway; - disable the unhash hook for TLS_HW. Fixes: 3c4d7559159bf ("tls: kernel TLS support") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: remove sock unlock/lock around strp_done()John Fastabend
The tls close() callback currently drops the sock lock to call strp_done(). Split up the RX cleanup into stopping the strparser and releasing most resources, syncing strparser and finally freeing the context. To avoid the need for a strp_done() call on the cleanup path of device offload make sure we don't arm the strparser until we are sure init will be successful. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: remove close callback sock unlock/lock around TX work flushJohn Fastabend
The tls close() callback currently drops the sock lock, makes a cancel_delayed_work_sync() call, and then relocks the sock. By restructuring the code we can avoid droping lock and then reclaiming it. To simplify this we do the following, tls_sk_proto_close set_bit(CLOSING) set_bit(SCHEDULE) cancel_delay_work_sync() <- cancel workqueue lock_sock(sk) ... release_sock(sk) strp_done() Setting the CLOSING bit prevents the SCHEDULE bit from being cleared by any workqueue items e.g. if one happens to be scheduled and run between when we set SCHEDULE bit and cancel work. Then because SCHEDULE bit is set now no new work will be scheduled. Tested with net selftests and bpf selftests. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: don't call tls_sk_proto_close for hw record offloadJakub Kicinski
The deprecated TOE offload doesn't actually do anything in tls_sk_proto_close() - all TLS code is skipped and context not freed. Remove the callback to make it easier to refactor tls_sk_proto_close(). Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22net/tls: don't arm strparser immediately in tls_set_sw_offload()Jakub Kicinski
In tls_set_device_offload_rx() we prepare the software context for RX fallback and proceed to add the connection to the device. Unfortunately, software context prep includes arming strparser so in case of a later error we have to release the socket lock to call strp_done(). In preparation for not releasing the socket lock half way through callbacks move arming strparser into a separate function. Following patches will make use of that. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-07-22pidfd: fix a poll race when setting exit_stateSuren Baghdasaryan
There is a race between reading task->exit_state in pidfd_poll and writing it after do_notify_parent calls do_notify_pidfd. Expected sequence of events is: CPU 0 CPU 1 ------------------------------------------------ exit_notify do_notify_parent do_notify_pidfd tsk->exit_state = EXIT_DEAD pidfd_poll if (tsk->exit_state) However nothing prevents the following sequence: CPU 0 CPU 1 ------------------------------------------------ exit_notify do_notify_parent do_notify_pidfd pidfd_poll if (tsk->exit_state) tsk->exit_state = EXIT_DEAD This causes a polling task to wait forever, since poll blocks because exit_state is 0 and the waiting task is not notified again. A stress test continuously doing pidfd poll and process exits uncovered this bug. To fix it, we make sure that the task's exit_state is always set before calling do_notify_pidfd. Fixes: b53b0b9d9a6 ("pidfd: add polling support") Cc: kernel-team@android.com Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Link: https://lore.kernel.org/r/20190717172100.261204-1-joel@joelfernandes.org [christian@brauner.io: adapt commit message and drop unneeded changes from wait_task_zombie] Signed-off-by: Christian Brauner <christian@brauner.io>
2019-07-21tcp: be more careful in tcp_fragment()Eric Dumazet
Some applications set tiny SO_SNDBUF values and expect TCP to just work. Recent patches to address CVE-2019-11478 broke them in case of losses, since retransmits might be prevented. We should allow these flows to make progress. This patch allows the first and last skb in retransmit queue to be split even if memory limits are hit. It also adds the some room due to the fact that tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by about one full TSO skb (64KB size). Note this allowance was already present in stable backports for kernels < 4.15 Note for < 4.15 backports : tcp_rtx_queue_tail() will probably look like : static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk) { struct sk_buff *skb = tcp_send_head(sk); return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk); } Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Andrew Prout <aprout@ll.mit.edu> Tested-by: Jonathan Lemon <jonathan.lemon@gmail.com> Tested-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Acked-by: Christoph Paasch <cpaasch@apple.com> Cc: Jonathan Looney <jtl@netflix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()Haiyang Zhang
There is an extra rcu_read_unlock left in netvsc_recv_callback(), after a previous patch that removes RCU from this function. This patch removes the extra RCU unlock. Fixes: 345ac08990b8 ("hv_netvsc: pass netvsc_device to receive callback") Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21Linus 5.3-rc1v5.3-rc1Linus Torvalds
2019-07-21vrf: make sure skb->data contains ip header to make routingPeter Kosyh
vrf_process_v4_outbound() and vrf_process_v6_outbound() do routing using ip/ipv6 addresses, but don't make sure the header is available in skb->data[] (skb_headlen() is less then header size). Case: 1) igb driver from intel. 2) Packet size is greater then 255. 3) MPLS forwards to VRF device. So, patch adds pskb_may_pull() calls in vrf_process_v4/v6_outbound() functions. Signed-off-by: Peter Kosyh <p.kosyh@gmail.com> Reviewed-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21connector: remove redundant input callback from cn_devVasily Averin
A small cleanup: this callback is never used. Originally fixed by Stanislav Kinsburskiy <skinsbursky@virtuozzo.com> for OpenVZ7 bug OVZ-6877 cc: stanislav.kinsburskiy@gmail.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21qed: Prefer pcie_capability_read_word()Frederick Lawler
Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability") added accessors for the PCI Express Capability so that drivers didn't need to be aware of differences between v1 and v2 of the PCI Express Capability. Replace pci_read_config_word() and pci_write_config_word() calls with pcie_capability_read_word() and pcie_capability_write_word(). Signed-off-by: Frederick Lawler <fred@fredlawl.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21igc: Prefer pcie_capability_read_word()Frederick Lawler
Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability") added accessors for the PCI Express Capability so that drivers didn't need to be aware of differences between v1 and v2 of the PCI Express Capability. Replace pci_read_config_word() and pci_write_config_word() calls with pcie_capability_read_word() and pcie_capability_write_word(). Signed-off-by: Frederick Lawler <fred@fredlawl.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21cxgb4: Prefer pcie_capability_read_word()Frederick Lawler
Commit 8c0d3a02c130 ("PCI: Add accessors for PCI Express Capability") added accessors for the PCI Express Capability so that drivers didn't need to be aware of differences between v1 and v2 of the PCI Express Capability. Replace pci_read_config_word() and pci_write_config_word() calls with pcie_capability_read_word() and pcie_capability_write_word(). Signed-off-by: Frederick Lawler <fred@fredlawl.com> Reviewed-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21be2net: Synchronize be_update_queues with dev_watchdogBenjamin Poirier
As pointed out by Firo Yang, a netdev tx timeout may trigger just before an ethtool set_channels operation is started. be_tx_timeout(), which dumps some queue structures, is not written to run concurrently with be_update_queues(), which frees/allocates those queues structures. Add some synchronization between the two. Message-id: <CH2PR18MB31898E033896F9760D36BFF288C90@CH2PR18MB3189.namprd18.prod.outlook.com> Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21bnx2x: Prevent load reordering in tx completion processingBrian King
This patch fixes an issue seen on Power systems with bnx2x which results in the skb is NULL WARN_ON in bnx2x_free_tx_pkt firing due to the skb pointer getting loaded in bnx2x_free_tx_pkt prior to the hw_cons load in bnx2x_tx_int. Adding a read memory barrier resolves the issue. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21netfilter: ebtables: fix a memory leak bug in compatWenwen Wang
In compat_do_replace(), a temporary buffer is allocated through vmalloc() to hold entries copied from the user space. The buffer address is firstly saved to 'newinfo->entries', and later on assigned to 'entries_tmp'. Then the entries in this temporary buffer is copied to the internal kernel structure through compat_copy_entries(). If this copy process fails, compat_do_replace() should be terminated. However, the allocated temporary buffer is not freed on this path, leading to a memory leak. To fix the bug, free the buffer before returning from compat_do_replace(). Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-21net: phy: sfp: hwmon: Fix scaling of RX powerAndrew Lunn
The RX power read from the SFP uses units of 0.1uW. This must be scaled to units of uW for HWMON. This requires a divide by 10, not the current 100. With this change in place, sensors(1) and ethtool -m agree: sff2-isa-0000 Adapter: ISA adapter in0: +3.23 V temp1: +33.1 C power1: 270.00 uW power2: 200.00 uW curr1: +0.01 A Laser output power : 0.2743 mW / -5.62 dBm Receiver signal average optical power : 0.2014 mW / -6.96 dBm Reported-by: chris.healy@zii.aero Signed-off-by: Andrew Lunn <andrew@lunn.ch> Fixes: 1323061a018a ("net: phy: sfp: Add HWMON support for module sensors") Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21net: sched: verify that q!=NULL before setting q->flagsVlad Buslov
In function int tc_new_tfilter() q pointer can be NULL when adding filter on a shared block. With recent change that resets TCQ_F_CAN_BYPASS after filter creation, following NULL pointer dereference happens in case parent block is shared: [ 212.925060] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 212.925445] #PF: supervisor write access in kernel mode [ 212.925709] #PF: error_code(0x0002) - not-present page [ 212.925965] PGD 8000000827923067 P4D 8000000827923067 PUD 827924067 PMD 0 [ 212.926302] Oops: 0002 [#1] SMP KASAN PTI [ 212.926539] CPU: 18 PID: 2617 Comm: tc Tainted: G B 5.2.0+ #512 [ 212.926938] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 212.927364] RIP: 0010:tc_new_tfilter+0x698/0xd40 [ 212.927633] Code: 74 0d 48 85 c0 74 08 48 89 ef e8 03 aa 62 00 48 8b 84 24 a0 00 00 00 48 8d 78 10 48 89 44 24 18 e8 4d 0c 6b ff 48 8b 44 24 18 <83> 60 10 f b 48 85 ed 0f 85 3d fe ff ff e9 4f fe ff ff e8 81 26 f8 [ 212.928607] RSP: 0018:ffff88884fd5f5d8 EFLAGS: 00010296 [ 212.928905] RAX: 0000000000000000 RBX: 0000000000000000 RCX: dffffc0000000000 [ 212.929201] RDX: 0000000000000007 RSI: 0000000000000004 RDI: 0000000000000297 [ 212.929402] RBP: ffff88886bedd600 R08: ffffffffb91d4b51 R09: fffffbfff7616e4d [ 212.929609] R10: fffffbfff7616e4c R11: ffffffffbb0b7263 R12: ffff88886bc61040 [ 212.929803] R13: ffff88884fd5f950 R14: ffffc900039c5000 R15: ffff88835e927680 [ 212.929999] FS: 00007fe7c50b6480(0000) GS:ffff88886f980000(0000) knlGS:0000000000000000 [ 212.930235] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 212.930394] CR2: 0000000000000010 CR3: 000000085bd04002 CR4: 00000000001606e0 [ 212.930588] Call Trace: [ 212.930682] ? tc_del_tfilter+0xa40/0xa40 [ 212.930811] ? __lock_acquire+0x5b5/0x2460 [ 212.930948] ? find_held_lock+0x85/0xa0 [ 212.931081] ? tc_del_tfilter+0xa40/0xa40 [ 212.931201] rtnetlink_rcv_msg+0x4ab/0x5f0 [ 212.931332] ? rtnl_dellink+0x490/0x490 [ 212.931454] ? lockdep_hardirqs_on+0x260/0x260 [ 212.931589] ? netlink_deliver_tap+0xab/0x5a0 [ 212.931717] ? match_held_lock+0x1b/0x240 [ 212.931844] netlink_rcv_skb+0xd0/0x200 [ 212.931958] ? rtnl_dellink+0x490/0x490 [ 212.932079] ? netlink_ack+0x440/0x440 [ 212.932205] ? netlink_deliver_tap+0x161/0x5a0 [ 212.932335] ? lock_downgrade+0x360/0x360 [ 212.932457] ? lock_acquire+0xe5/0x210 [ 212.932579] netlink_unicast+0x296/0x350 [ 212.932705] ? netlink_attachskb+0x390/0x390 [ 212.932834] ? _copy_from_iter_full+0xe0/0x3a0 [ 212.932976] netlink_sendmsg+0x394/0x600 [ 212.937998] ? netlink_unicast+0x350/0x350 [ 212.943033] ? move_addr_to_kernel.part.0+0x90/0x90 [ 212.948115] ? netlink_unicast+0x350/0x350 [ 212.953185] sock_sendmsg+0x96/0xa0 [ 212.958099] ___sys_sendmsg+0x482/0x520 [ 212.962881] ? match_held_lock+0x1b/0x240 [ 212.967618] ? copy_msghdr_from_user+0x250/0x250 [ 212.972337] ? lock_downgrade+0x360/0x360 [ 212.976973] ? rwlock_bug.part.0+0x60/0x60 [ 212.981548] ? __mod_node_page_state+0x1f/0xa0 [ 212.986060] ? match_held_lock+0x1b/0x240 [ 212.990567] ? find_held_lock+0x85/0xa0 [ 212.994989] ? do_user_addr_fault+0x349/0x5b0 [ 212.999387] ? lock_downgrade+0x360/0x360 [ 213.003713] ? find_held_lock+0x85/0xa0 [ 213.007972] ? __fget_light+0xa1/0xf0 [ 213.012143] ? sockfd_lookup_light+0x91/0xb0 [ 213.016165] __sys_sendmsg+0xba/0x130 [ 213.020040] ? __sys_sendmsg_sock+0xb0/0xb0 [ 213.023870] ? handle_mm_fault+0x337/0x470 [ 213.027592] ? page_fault+0x8/0x30 [ 213.031316] ? lockdep_hardirqs_off+0xbe/0x100 [ 213.034999] ? mark_held_locks+0x24/0x90 [ 213.038671] ? do_syscall_64+0x1e/0xe0 [ 213.042297] do_syscall_64+0x74/0xe0 [ 213.045828] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 213.049354] RIP: 0033:0x7fe7c527c7b8 [ 213.052792] Code: 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 65 8f 0c 00 8b 00 85 c0 75 17 b8 2e 00 00 00 0f 05 <48> 3d 00 f 0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 89 54 [ 213.060269] RSP: 002b:00007ffc3f7908a8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 213.064144] RAX: ffffffffffffffda RBX: 000000005d34716f RCX: 00007fe7c527c7b8 [ 213.068094] RDX: 0000000000000000 RSI: 00007ffc3f790910 RDI: 0000000000000003 [ 213.072109] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007fe7c5340cc0 [ 213.076113] R10: 0000000000404ec2 R11: 0000000000000246 R12: 0000000000000080 [ 213.080146] R13: 0000000000480640 R14: 0000000000000080 R15: 0000000000000000 [ 213.084147] Modules linked in: act_gact cls_flower sch_ingress nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc sunrpc intel_rapl_msr intel_rapl_common [<1;69;32Msb_edac rdma_ucm rdma_cm x86_pkg_temp_thermal iw_cm intel_powerclamp ib_cm coretemp kvm_intel kvm irqbypass mlx5_ib ib_uverbs ib_core crct10dif_pclmul crc32_pc lmul crc32c_intel ghash_clmulni_intel mlx5_core intel_cstate intel_uncore iTCO_wdt igb iTCO_vendor_support mlxfw mei_me ptp ses intel_rapl_perf mei pcspkr ipmi _ssif i2c_i801 joydev enclosure pps_core lpc_ich ioatdma wmi dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad ast i2c_algo_bit drm_vram_helpe r ttm drm_kms_helper drm mpt3sas raid_class scsi_transport_sas [ 213.112326] CR2: 0000000000000010 [ 213.117429] ---[ end trace adb58eb0a4ee6283 ]--- Verify that q pointer is not NULL before setting the 'flags' field. Fixes: 3f05e6886a59 ("net_sched: unset TCQ_F_CAN_BYPASS when adding filters") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21chelsio: Fix a typo in a function nameChristophe JAILLET
It is likely that 'my3216_poll()' should be 'my3126_poll()'. (1 and 2 switched in 3126. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21allocate_flower_entry: should check for null derefNavid Emamdoost
allocate_flower_entry does not check for allocation success, but tries to deref the result. I only moved the spin_lock under null check, because the caller is checking allocation's status at line 652. Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21net: hns3: typo in the name of a constantChristophe JAILLET
All constant in 'enum HCLGE_MBX_OPCODE' start with HCLGE, except 'HLCGE_MBX_PUSH_VLAN_INFO' (C and L switched) s/HLC/HCL/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21kbuild: add net/netfilter/nf_tables_offload.h to header-test blacklist.Jeremy Sowden
net/netfilter/nf_tables_offload.h includes net/netfilter/nf_tables.h which is itself on the blacklist. Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jeremy Sowden <jeremy@azazel.net> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21tipc: Fix a typoChristophe JAILLET
s/tipc_toprsv_listener_data_ready/tipc_topsrv_listener_data_ready/ (r and s switched in topsrv) Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21Merge tag 'mac80211-for-davem-2019-07-20' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== We have a handful of fixes: * ignore bad CW parameters if we aren't using them, instead of warning * fix operation (and then build) with the new netlink vendor command policy requirement * fix a memory leak in an error path when setting beacons ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-21Merge tag 'devicetree-fixes-for-5.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull Devicetree fixes from Rob Herring: "Fix several warnings/errors in validation of binding schemas" * tag 'devicetree-fixes-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: pinctrl: stm32: Fix missing 'clocks' property in examples dt-bindings: iio: ad7124: Fix dtc warnings in example dt-bindings: iio: avia-hx711: Fix avdd-supply typo in example dt-bindings: pinctrl: aspeed: Fix AST2500 example errors dt-bindings: pinctrl: aspeed: Fix 'compatible' schema errors dt-bindings: riscv: Limit cpus schema to only check RiscV 'cpu' nodes dt-bindings: Ensure child nodes are of type 'object'
2019-07-21Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs documentation typo fix from Al Viro. * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: typo fix: it's d_make_root, not d_make_inode...
2019-07-21Merge tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Two fixes for stable, one that had dependency on earlier patch in this merge window and can now go in, and a perf improvement in SMB3 open" * tag '5.3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal module number cifs: flush before set-info if we have writeable handles smb3: optimize open to not send query file internal info cifs: copy_file_range needs to strip setuid bits and update timestamps CIFS: fix deadlock in cached root handling
2019-07-21iommu/amd: fix a crash in iova_magazine_free_pfnsQian Cai
The commit b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method") incorrectly changed the checking from dma_ops_alloc_iova() in map_sg() causes a crash under memory pressure as dma_ops_alloc_iova() never return DMA_MAPPING_ERROR on failure but 0, so the error handling is all wrong. kernel BUG at drivers/iommu/iova.c:801! Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:iova_magazine_free_pfns+0x7d/0xc0 Call Trace: free_cpu_cached_iovas+0xbd/0x150 alloc_iova_fast+0x8c/0xba dma_ops_alloc_iova.isra.6+0x65/0xa0 map_sg+0x8c/0x2a0 scsi_dma_map+0xc6/0x160 pqi_aio_submit_io+0x1f6/0x440 [smartpqi] pqi_scsi_queue_command+0x90c/0xdd0 [smartpqi] scsi_queue_rq+0x79c/0x1200 blk_mq_dispatch_rq_list+0x4dc/0xb70 blk_mq_sched_dispatch_requests+0x249/0x310 __blk_mq_run_hw_queue+0x128/0x200 blk_mq_run_work_fn+0x27/0x30 process_one_work+0x522/0xa10 worker_thread+0x63/0x5b0 kthread+0x1d2/0x1f0 ret_from_fork+0x22/0x40 Fixes: b3aa14f02254 ("iommu: remove the mapping_error dma_map_ops method") Signed-off-by: Qian Cai <cai@lca.pw> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-21hexagon: switch to generic version of pte allocationMike Rapoport
The hexagon implementation pte_alloc_one(), pte_alloc_one_kernel(), pte_free_kernel() and pte_free() is identical to the generic except of lack of __GFP_ACCOUNT for the user PTEs allocation. Switch hexagon to use generic version of these functions. Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>