summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-08-07ipv6: sr: allow SRH insertion with arbitrary segments_left valueDavid Lebrun
The seg6_validate_srh() function only allows SRHs whose active segment is the first segment of the path. However, an application may insert an SRH whose active segment is not the first one. Such an application might be for example an SR-aware Virtual Network Function. This patch enables to insert SRHs with an arbitrary active segment. Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07bpf: devmap fix mutex in rcu critical sectionJohn Fastabend
Originally we used a mutex to protect concurrent devmap update and delete operations from racing with netdev unregister notifier callbacks. The notifier hook is needed because we increment the netdev ref count when a dev is added to the devmap. This ensures the netdev reference is valid in the datapath. However, we don't want to block unregister events, hence the initial mutex and notifier handler. The concern was in the notifier hook we search the map for dev entries that hold a refcnt on the net device being torn down. But, in order to do this we require two steps, (i) dereference the netdev: dev = rcu_dereference(map[i]) (ii) test ifindex: dev->ifindex == removing_ifindex and then finally we can swap in the NULL dev in the map via an xchg operation, xchg(map[i], NULL) The danger here is a concurrent update could run a different xchg op concurrently leading us to replace the new dev with a NULL dev incorrectly. CPU 1 CPU 2 notifier hook bpf devmap update dev = rcu_dereference(map[i]) dev = rcu_dereference(map[i]) xchg(map[i]), new_dev); rcu_call(dev,...) xchg(map[i], NULL) The above flow would create the incorrect state with the dev reference in the update path being lost. To resolve this the original code used a mutex around the above block. However, updates, deletes, and lookups occur inside rcu critical sections so we can't use a mutex in this context safely. Fortunately, by writing slightly better code we can avoid the mutex altogether. If CPU 1 in the above example uses a cmpxchg and _only_ replaces the dev reference in the map when it is in fact the expected dev the race is removed completely. The two cases being illustrated here, first the race condition, CPU 1 CPU 2 notifier hook bpf devmap update dev = rcu_dereference(map[i]) dev = rcu_dereference(map[i]) xchg(map[i]), new_dev); rcu_call(dev,...) odev = cmpxchg(map[i], dev, NULL) Now we can test the cmpxchg return value, detect odev != dev and abort. Or in the good case, CPU 1 CPU 2 notifier hook bpf devmap update dev = rcu_dereference(map[i]) odev = cmpxchg(map[i], dev, NULL) [...] Now 'odev == dev' and we can do proper cleanup. And viola the original race we tried to solve with a mutex is corrected and the trace noted by Sasha below is resolved due to removal of the mutex. Note: When walking the devmap and removing dev references as needed we depend on the core to fail any calls to dev_get_by_index() using the ifindex of the device being removed. This way we do not race with the user while searching the devmap. Additionally, the mutex was also protecting list add/del/read on the list of maps in-use. This patch converts this to an RCU list and spinlock implementation. This protects the list from concurrent alloc/free operations. The notifier hook walks this list so it uses RCU read semantics. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747 in_atomic(): 1, irqs_disabled(): 0, pid: 16315, name: syz-executor1 1 lock held by syz-executor1/16315: #0: (rcu_read_lock){......}, at: [<ffffffff8c363bc2>] map_delete_elem kernel/bpf/syscall.c:577 [inline] #0: (rcu_read_lock){......}, at: [<ffffffff8c363bc2>] SYSC_bpf kernel/bpf/syscall.c:1427 [inline] #0: (rcu_read_lock){......}, at: [<ffffffff8c363bc2>] SyS_bpf+0x1d32/0x4ba0 kernel/bpf/syscall.c:1388 Fixes: 2ddf71e23cc2 ("net: add notifier hooks for devmap bpf map") Reported-by: Sasha Levin <alexander.levin@verizon.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07Merge branch 'net_sched-clean-up-filter-handle'David S. Miller
Cong Wang says: ==================== net_sched: clean up filter handle This patchset sits in my local branch for a long time, it is time to send it out. It cleans up the ambiguous use of 'unsigned long fh', please see each of them for details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net_sched: use void pointer for filter handleWANG Cong
Now we use 'unsigned long fh' as a pointer in every place, it is safe to convert it to a void pointer now. This gets rid of many casts to pointer. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net_sched: refactor notification code for RTM_DELTFILTERWANG Cong
It is confusing to use 'unsigned long fh' as both a handle and a pointer, especially commit 9ee7837449b3 ("net sched filters: fix notification of filter delete with proper handle"). This patch introduces tfilter_del_notify() so that we can pass it as a pointer as before, and we don't need to check RTM_DELTFILTER in tcf_fill_node() any more. This prepares for the next patch. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07lwtunnel: replace EXPORT_SYMBOL with EXPORT_SYMBOL_GPLRoopa Prabhu
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07Merge branch 'bpf-add-support-for-sys-enter-exit-tracepoints'David S. Miller
Yonghong Song says: ==================== bpf: add support for sys_{enter|exit}_* tracepoints Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_* style tracepoints. The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_* tracepoints are treated differently from other tracepoints and there is no bpf hook to it. This patch set adds bpf support for these syscalls tracepoints and also adds a test case for it. Changelogs: v3 -> v4: - Check the legality of ctx offset access for syscall tracepoint as well. trace_event_get_offsets will return correct max offset for each specific syscall tracepoint. - Use variable length array to avoid hardcode 6 as the maximum arguments beyond syscall_nr. v2 -> v3: - Fix a build issue v1 -> v2: - Do not use TRACE_EVENT_FL_CAP_ANY to identify syscall tracepoint. Instead use trace_event_call->class. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07bpf: add a test case for syscalls/sys_{enter|exit}_* tracepointsYonghong Song
Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07bpf: add support for sys_enter_* and sys_exit_* tracepointsYonghong Song
Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_* style tracepoints. The iovisor/bcc issue #748 (https://github.com/iovisor/bcc/issues/748) documents this issue. For example, if you try to attach a bpf program to tracepoints syscalls/sys_enter_newfstat, you will get the following error: # ./tools/trace.py t:syscalls:sys_enter_newfstat Ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument Failed to attach BPF to tracepoint The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_* tracepoints are treated differently from other tracepoints and there is no bpf hook to it. This patch adds bpf support for these syscalls tracepoints by . permitting bpf attachment in ioctl PERF_EVENT_IOC_SET_BPF . calling bpf programs in perf_syscall_enter and perf_syscall_exit The legality of bpf program ctx access is also checked. Function trace_event_get_offsets returns correct max offset for each specific syscall tracepoint, which is compared against the maximum offset access in bpf program. Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07of_mdio: use of_property_read_u32_array()Sergei Shtylyov
The "fixed-link" prop support predated of_property_read_u32_array(), so basically had to open-code it. Using the modern API saves 24 bytes of the object code (ARM gcc 4.8.5); the only behavior change would be that the prop length check is now less strict (however the strict pre-check done in of_phy_is_fixed_link() is left intact anyway)... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07ibmvnic: Report rx buffer return codes as netdev_dbgJohn Allen
Reporting any return code for a receive buffer as an "rx error" only produces alarming noise and the only values that have been observed to be used in this field are not error conditions. Change this to a netdev_dbg with a more descriptive message. Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07Merge branch 'net-l3mdev-Support-for-sockets-bound-to-enslaved-device'David S. Miller
David Ahern says: ==================== net: l3mdev: Support for sockets bound to enslaved device A missing piece to the VRF puzzle is the ability to bind sockets to devices enslaved to a VRF. This patch set adds the enslaved device index, sdif, to IPv4 and IPv6 socket lookups. The end result for users is the following scope options for services: 1. "global" services - sockets not bound to any device Allows 1 service to work across all network interfaces with connected sockets bound to the VRF the connection originates (Requires net.ipv4.tcp_l3mdev_accept=1 for TCP and net.ipv4.udp_l3mdev_accept=1 for UDP) 2. "VRF" local services - sockets bound to a VRF Sockets work across all network interfaces enslaved to a VRF but are limited to just the one VRF. 3. "device" services - sockets bound to a specific network interface Service works only through the one specific interface. v3 - convert __inet_lookup_established in dccp_v4_err; missed in v2 v2 - remove sk_lookup struct and add sdif as an argument to existing functions Changes since RFC: - no significant logic changes; mainly whitespace cleanups ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv6: add second dif to raw socket lookupsDavid Ahern
Add a second device index, sdif, to raw socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv6: add second dif to inet6 socket lookupsDavid Ahern
Add a second device index, sdif, to inet6 socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the ingress index is obtained from IPCB using inet_sdif and after tcp_v4_rcv tcp_v4_sdif is used. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv6: add second dif to udp socket lookupsDavid Ahern
Add a second device index, sdif, to udp socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Early demux lookups are handled in the next patch as part of INET_MATCH changes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv4: add second dif to multicast source filterDavid Ahern
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv4: add second dif to raw socket lookupsDavid Ahern
Add a second device index, sdif, to raw socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv4: add second dif to inet socket lookupsDavid Ahern
Add a second device index, sdif, to inet socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. TCP moves the data in the cb. Prior to tcp_v4_rcv (e.g., early demux) the ingress index is obtained from IPCB using inet_sdif and after the cb move in tcp_v4_rcv the tcp_v4_sdif helper is used. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: ipv4: add second dif to udp socket lookupsDavid Ahern
Add a second device index, sdif, to udp socket lookups. sdif is the index for ingress devices enslaved to an l3mdev. It allows the lookups to consider the enslaved device as well as the L3 domain when searching for a socket. Early demux lookups are handled in the next patch as part of INET_MATCH changes. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07Merge tag 'wireless-drivers-next-for-davem-2017-08-07' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.14 The first wireless-drivers-next pull request for 4.14. I'm submitting this unusally late in the cycle as my vacation postponed this. But even if this is late there's not still that much new features, mostly cleanup or fixes. Major changes: ath10k * preparation for wcn3990 support iwlwifi * Reorganization of the code into separate directories continues qtnfmac * regulatory support updates * add get_channel, dump_survey and channel_switch cfg80211 handlers ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07hns3: fix unused function warningArnd Bergmann
Without CONFIG_PCI_IOV, we get a harmless warning about an unused function: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c:2273:13: error: 'hclge_disable_sriov' defined but not used [-Werror=unused-function] The #ifdefs in this driver are obviously wrong, so this just removes them and uses an IS_ENABLED() check that does the same thing correctly in a more readable way. Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07Merge tag 'mlx5-shared-2017-08-07' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Saeed Mahameed says: ==================== mlx5-shared-2017-08-07 This series includes some mlx5 updates for both net-next and rdma trees. From Saeed, Core driver updates to allow selectively building the driver with or without some large driver components, such as - E-Switch (Ethernet SRIOV support). - Multi-Physical Function Switch (MPFs) support. For that we split E-Switch and MPFs functionalities into separate files. From Erez, Delay mlx5_core events when mlx5 interfaces, namely mlx5_ib, registration is taking place and until it completes. From Rabie, Increase the maximum supported flow counters. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07Merge branch 'net-sched-summer-cleanup-part-2-ndo_setup_tc'David S. Miller
Jiri Pirko says: ==================== net: sched: summer cleanup part 2, ndo_setup_tc This patchset focuses on ndo_setup_tc and its args. Currently there are couple of things that do not make much sense. The type is passed in struct tc_to_netdev, but as it is always required, should be arg of the ndo. Other things are passed as args but they are only relevant for cls offloads and not mqprio. Therefore, they should be pushed to struct. As the tc_to_netdev struct in the end is just a container of single pointer, we get rid of it and pass the struct according to type. So in the end, we have: ndo_setup_tc(dev, type, type_data_struct) There are couple of cosmetics done on the way to make things smooth. Also, reported error is consolidated to eopnotsupp in case the asked offload is not supported. v1->v2: - added forgotten hns3pf bits ==================== Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: get rid of struct tc_to_netdevJiri Pirko
Get rid of struct tc_to_netdev which is now just unnecessary container and rather pass per-type structures down to drivers directly. Along with that, consolidate the naming of per-type structure variables in cls_*. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: change return value of ndo_setup_tc for driver supporting mqprio ↵Jiri Pirko
only Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the drivers have it like that, so be aligned. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: move prio into cls_commonJiri Pirko
prio is not cls_flower specific, but it is meaningful for all classifiers. Seems that only mlxsw cares about the value. Obviously, cls offload in other drivers is broken. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: push cls related args into cls_common structureJiri Pirko
As ndo_setup_tc is generic offload op for whole tc subsystem, does not really make sense to have cls-specific args. So move them under cls_common structurure which is embedded in all cls structs. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07hns3pf: don't check handle during mqprio offloadJiri Pirko
Similar to the rest offloaders of mqprio, no need to check handle. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07nfp: change flows in apps that offload ndo_setup_tcJiri Pirko
Change the flows a bit in preparation of follow-up changes in ndo_setup_tc args. Also, change the error code to align with the rest of the drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07dsa: push cls_matchall setup_tc processing into a separate functionJiri Pirko
Let dsa_slave_setup_tc be a splitter for specific setup_tc types and push out cls_matchall specific code into a separate function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07mlxsw: spectrum: rename cls arg in matchall processingJiri Pirko
To sync-up with the naming in the rest of the driver, rename the cls arg. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07mlxsw: spectrum: push cls_flower and cls_matchall setup_tc processing into ↵Jiri Pirko
separate functions Let mlxsw_sp_setup_tc be a splitter for specific setup_tc types and push out cls_flower and cls_matchall specific codes into separate functions. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07mlx5e_rep: push cls_flower setup_tc processing into a separate functionJiri Pirko
Let mlx5e_rep_setup_tc (former mlx5e_rep_ndo_setup_tc) be a splitter for specific setup_tc types and push out cls_flower specific code into a separate function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07mlx5e: push cls_flower and mqprio setup_tc processing into separate functionsJiri Pirko
Let mlx5e_setup_tc (former mlx5e_ndo_setup_tc) be a splitter for specific setup_tc types and push out cls_flower and mqprio specific codes into separate functions. Also change the return values so they are the same as in the rest of the drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07ixgbe: push cls_u32 and mqprio setup_tc processing into separate functionsJiri Pirko
Let __ixgbe_setup_tc be a splitter for specific setup_tc types and push out cls_u32 and mqprio specific codes into separate functions. Also change the return values so they are the same as in the rest of the drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07cxgb4: push cls_u32 setup_tc processing into a separate functionJiri Pirko
Let cxgb_setup_tc be a splitter for specific setup_tc types and push out cls_u32 specific code into a separate function. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: make egress_dev flag part of flower offload structJiri Pirko
Since this is specific to flower now, make it part of the flower offload struct. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: rename TC_SETUP_MATCHALL to TC_SETUP_CLSMATCHALLJiri Pirko
In order to be aligned with the rest of the types, rename TC_SETUP_MATCHALL to TC_SETUP_CLSMATCHALL. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net: sched: make type an argument for ndo_setup_tcJiri Pirko
Since the type is always present, push it to be a separate argument to ndo_setup_tc. On the way, name the type enum and use it for arg type. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-07net/mlx5: Increase the maximum flow counters supportedRabie Loulou
Read new NIC capability field which represnts 16 MSBs of the max flow counters number supported (max_flow_counter_31_16). Backward compatibility with older firmware is preserved, the modified driver reads max_flow_counter_31_16 as 0 from the older firmware and uses up to 64K counters. Changed flow counter id from 16 bits to 32 bits. Backward compatibility with older firmware is preserved as we kept the 16 LSBs of the counter id in place and added 16 MSBs from reserved field. Changed the background bulk reading of flow counters to work in chunks of at most 32K counters, to make sure we don't attempt to allocate very large buffers. Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-07net/mlx5: Fix counter list hardware structureRabie Loulou
The counter list hardware structure doesn't contain a clear and num_of_counters fields, remove them. These wrong fields were never used by the driver hence no other driver changes. Fixes: a351a1b03bf1 ("net/mlx5: Introduce bulk reading of flow counters") Signed-off-by: Rabie Loulou <rabiel@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-07net/mlx5: Delay events till ib registration endsErez Shitrit
When mlx5_ib registers itself to mlx5_core as an interface, it will call mlx5_add_device which will call mlx5_ib interface add callback, in case the latter successfully returns, only then mlx5_core will add it to the interface list and async events will be forwarded to mlx5_ib. Between mlx5_ib interface add callback and mlx5_core adding the mlx5_ib interface to its devices list, arriving mlx5_core events can be missed by the new mlx5_ib registering interface. In other words: thread 1: mlx5_ib: mlx5_register_interface(dev) thread 1: mlx5_core: mlx5_add_device(dev) thread 1: mlx5_core: ctx = dev->add => (mlx5_ib)->mlx5_ib_add thread 2: mlx5_core_event: **new event arrives, forward to dev_list thread 1: mlx5_core: add_ctx_to_dev_list(ctx) /* previous event was missed by the new interface.*/ It is ok to miss events before dev->add (mlx5_ib)->mlx5_ib_add_device but not after. We fix this race by accumulating the events that come between the ib_register_device (inside mlx5_add_device->(dev->add)) till the adding to the list completes and fire them to the new registering interface after that. Fixes: f1ee87fe55c8 ("net/mlx5: Organize device list API in one place") Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-07net/mlx5: Add CONFIG_MLX5_ESWITCH KconfigSaeed Mahameed
Allow to selectively build the driver with or without sriov eswitch, VF representors and TC offloads. Also remove the need of two ndo ops structures (sriov & basic) and keep only one unified ndo ops, compile out VF SRIOV ndos when not needed (MLX5_ESWITCH=n), and for VF netdev calling those ndos will result in returning -EPERM. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: Jes Sorensen <jsorensen@fb.com> Cc: kernel-team@fb.com
2017-08-07net/mlx5: Separate between E-Switch and MPFSSaeed Mahameed
Multi-Physical Function Switch (MPFs) is required for when multi-PF configuration is enabled to allow passing user configured unicast MAC addresses to the requesting PF. Before this patch eswitch.c used to manage the HW MPFS l2 table, E-Switch always (regardless of sriov) enabled vport(0) (NIC PF) vport's contexts update on unicast mac address list changes, to populate the PF's MPFS L2 table accordingly. In downstream patch we would like to allow compiling the driver without E-Switch functionalities, for that we move MPFS l2 table logic out of eswitch.c into its own file, and provide Kconfig flag (MLX5_MPFS) to allow compiling out MPFS for those who don't want Multi-PF support. NIC PF netdevice will now directly update MPFS l2 table via the new MPFS API. VF netdevice has no access to MPFS L2 table, so E-Switch will remain responsible of updating its MPFS l2 table on behalf of its VFs. Due to this change we also don't require enabling vport(0) (PF vport) unicast mac changes events anymore, for when SRIOV is not enabled. Which means E-Switch is now activated only on SRIOV activation, and not required otherwise. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Jes Sorensen <jsorensen@fb.com> Cc: kernel-team@fb.com
2017-08-07net/mlx5: Unify vport manager capability checkSaeed Mahameed
Expose MLX5_VPORT_MANAGER macro to check for strict vport manager E-switch and MPFS (Multi Physical Function Switch) abilities. VPORT manager must be a PF with an ethernet link and with FW advertised vport group manager capability Replace older checks with the new macro and use it where needed in eswitch.c and mlx5e netdev eswitch related flows. The same macro will be reused in MPFS separation downstream patch. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2017-08-07net/mlx5e: NIC netdev init flow cleanupSaeed Mahameed
Remove redundant call to unregister vport representor in mlx5e_add error flow. Hide the representor priv and eswitch internal structures from en_main.c as preparation step for downstream patches which would allow building the driver without support for representors and eswitch. Fixes: 6f08a22c5fb2 ("net/mlx5e: Register/unregister vport representors on interface attach/detach") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2017-08-07net/mlx5e: Rearrange netdevice ops structuresSaeed Mahameed
Since we are going to allow building the driver without eswitch support, it would be possible to compile out the sriov netdevice ops struct such that the basic ops instance will be used for non VF devices too. Add missing udp tunnel ndos into mlx5e_netdev_ops_basic. While here, rearrange some ndos in the sriov ops struct and put vf/eswitch related ndos towards the end of it. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
2017-08-06Merge branch 'sctp-remove-typedefs-from-structures-part-5'David S. Miller
Xin Long says: ==================== sctp: remove typedefs from structures part 5 As we know, typedef is suggested not to use in kernel, even checkpatch.pl also gives warnings about it. Now sctp is using it for many structures. All this kind of typedef's using should be removed. This patchset is the part 5 to remove all typedefs in include/net/sctp/constants.h. Just as the part 1-4, No any code's logic would be changed in these patches, only cleaning up. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-06sctp: remove the typedef sctp_subtype_tXin Long
This patch is to remove the typedef sctp_subtype_t, and replace with union sctp_subtype in the places where it's using this typedef. Note that it doesn't fix many indents although it should, as sctp_disposition_t's removal would mess them up again. So better to fix them when removing sctp_disposition_t in later patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-06sctp: remove the typedef sctp_event_tXin Long
This patch is to remove the typedef sctp_event_t, and replace with enum sctp_event in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>