summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2023-08-14devlink: introduce dumpit callbacks for split opsJiri Pirko
Introduce dumpit callbacks for generated split ops. Have them as a thin wrapper around iteration function and allow to pass dump_one() function pointer directly without need to store in devlink_cmd structs. Note that the function prototypes are temporary until the generated ones will replace them in a follow-up patch. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230811155714.1736405-6-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14devlink: rename doit callbacks for per-instance dump commandsJiri Pirko
Rename netlink doit callback functions for the commands that do implement per-instance dump to match the generated names that are going to be introduce in the follow-up patch. Note that the function prototypes are temporary until the generated ones will replace them in a follow-up patch. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230811155714.1736405-5-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14devlink: introduce devlink_nl_pre_doit_port*() helper functionsJiri Pirko
Define port handling helpers what don't rely on internal_flags. Have __devlink_nl_pre_doit() to accept the flags as a function arg and make devlink_nl_pre_doit() a wrapper helper function calling it. Introduce new helpers devlink_nl_pre_doit_port() and devlink_nl_pre_doit_port_optional() to be used by split ops in follow-up patch. Note that the function prototypes are temporary until the generated ones will replace them in a follow-up patch. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230811155714.1736405-4-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14devlink: parse rate attrs in doit() callbacksJiri Pirko
No need to give the rate any special treatment in netlink attributes parsing, as unlike for ports, there is only a couple of commands benefiting from that. Remove DEVLINK_NL_FLAG_NEED_RATE*, make pre_doit() callback simpler by moving the rate attributes parsing to rate_*_doit() ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230811155714.1736405-3-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14devlink: parse linecard attr in doit() callbacksJiri Pirko
No need to give the linecards any special treatment in netlink attribute parsing, as unlike for ports, there is only a couple of commands benefiting from that. Remove DEVLINK_NL_FLAG_NEED_LINECARD, make pre_doit() callback simpler by moving the linecard attribute parsing to linecard_[gs]et_doit() ops. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230811155714.1736405-2-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14net: phy: Introduce PSGMII PHY interface modeGabor Juhos
The PSGMII interface is similar to QSGMII. The main difference is that the PSGMII interface combines five SGMII lines into a single link while in QSGMII only four lines are combined. Similarly to the QSGMII, this interface mode might also needs special handling within the MAC driver. It is commonly used by Qualcomm with their QCA807x PHY series and modern WiSoC-s. Add definitions for the PHY layer to allow to express this type of connection between the MAC and PHY. Signed-off-by: Gabor Juhos <j4g8y7@gmail.com> Signed-off-by: Robert Marko <robert.marko@sartura.hr> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14dt-bindings: net: ethernet-controller: add PSGMII modeRobert Marko
Add a new PSGMII mode which is similar to QSGMII with the difference being that it combines 5 SGMII lines into a single link compared to 4 on QSGMII. It is commonly used by Qualcomm on their QCA807x PHY series. Signed-off-by: Robert Marko <robert.marko@sartura.hr> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14Merge branch 'mlxsw-redirection'David S. Miller
Petr Machata says: ==================== mlxsw: Support traffic redirection from a locked bridge port Ido Schimmel writes: It is possible to add a filter that redirects traffic from the ingress of a bridge port that is locked (i.e., performs security / SMAC lookup) and has learning enabled. For example: # ip link add name br0 type bridge # ip link set dev swp1 master br0 # bridge link set dev swp1 learning on locked on mab on # tc qdisc add dev swp1 clsact # tc filter add dev swp1 ingress pref 1 proto ip flower skip_sw src_ip 192.0.2.1 action mirred egress redirect dev swp2 In the kernel's Rx path, this filter is evaluated before the Rx handler of the bridge, which means that redirected traffic should not be affected by bridge port configuration such as learning. However, the hardware data path is a bit different and the redirect action (FORWARDING_ACTION in hardware) merely attaches a pointer to the packet, which is later used by the L2 lookup stage to understand how to forward the packet. Between both stages - ingress ACL and L2 lookup - learning and security lookup are performed, which means that redirected traffic is affected by bridge port configuration, unlike in the kernel's data path. The learning discrepancy was handled in commit 577fa14d2100 ("mlxsw: spectrum: Do not process learned records with a dummy FID") by simply ignoring learning notifications generated by the redirected traffic. A similar solution is not possible for the security / SMAC lookup since - unlike learning - the CPU is not involved and packets that failed the lookup are dropped by the device. Instead, solve this by prepending the ignore action to the redirect action and use it to instruct the device to disable both learning and the security / SMAC lookup for redirected traffic. Patch #1 adds the ignore action. Patch #2 prepends the action to the redirect action in flower offload code. Patch #3 removes the workaround in commit 577fa14d2100 ("mlxsw: spectrum: Do not process learned records with a dummy FID") since it is no longer needed. Patch #4 adds a test case. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14selftests: forwarding: Add test case for traffic redirection from a locked portIdo Schimmel
Check that traffic can be redirected from a locked bridge port and that it does not create locked FDB entries. Cc: Hans J. Schultz <netdev@kapio-technology.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mlxsw: spectrum: Stop ignoring learning notifications from redirected trafficIdo Schimmel
As explained in the previous patch, with the ignore action prepended to the redirect action, it is not longer possible for redirected traffic to generate learning notifications. Therefore, remove the workaround that was added in commit 577fa14d2100 ("mlxsw: spectrum: Do not process learned records with a dummy FID") as it is no longer needed. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mlxsw: spectrum_flower: Disable learning and security lookup when redirectingIdo Schimmel
It is possible to add a filter that redirects traffic from the ingress of a bridge port that is locked (i.e., performs security / SMAC lookup) and has learning enabled. For example: # ip link add name br0 type bridge # ip link set dev swp1 master br0 # bridge link set dev swp1 learning on locked on mab on # tc qdisc add dev swp1 clsact # tc filter add dev swp1 ingress pref 1 proto ip flower skip_sw src_ip 192.0.2.1 action mirred egress redirect dev swp2 In the kernel's Rx path, this filter is evaluated before the Rx handler of the bridge, which means that redirected traffic should not be affected by bridge port configuration such as learning. However, the hardware data path is a bit different and the redirect action (FORWARDING_ACTION in hardware) merely attaches a pointer to the packet, which is later used by the L2 lookup stage to understand how to forward the packet. Between both stages - ingress ACL and L2 lookup - learning and security lookup are performed, which means that redirected traffic is affected by bridge port configuration, unlike in the kernel's data path. The learning discrepancy was handled in commit 577fa14d2100 ("mlxsw: spectrum: Do not process learned records with a dummy FID") by simply ignoring learning notifications generated by the redirected traffic. A similar solution is not possible for the security / SMAC lookup since - unlike learning - the CPU is not involved and packets that failed the lookup are dropped by the device. Instead, solve this by prepending the ignore action to the redirect action and use it to instruct the device to disable both learning and the security / SMAC lookup for redirected traffic. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mlxsw: core_acl_flex_actions: Add IGNORE_ACTIONIdo Schimmel
Add the IGNORE_ACTION which is used to ignore basic switching functions such as learning on a per-packet basis. The action will be prepended to the FORWARDING_ACTION in subsequent patches. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14net: stmmac: xgmac: show more MAC HW features in debugfsFurong Xu
1. Show TSSTSSEL(Timestamp System Time Source), ADDMACADRSEL(additional MAC addresses), SMASEL(SMA/MDIO Interface), HDSEL(Half-duplex Support) in debugfs. 2. Show exact number of additional MAC address registers for XGMAC2 core. 3. XGMAC2 core does not have different IP checksum offload types, so just show rx_coe instead of rx_coe_type1 or rx_coe_type2. 4. XGMAC2 core does not have rxfifo_over_2048 definition, skip it. Signed-off-by: Furong Xu <0x1207@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14Merge branch 'net-stats-helpers'David S. Miller
Li Zetao says: ==================== Use helper functions to update stats The patch set uses the helper functions dev_sw_netstats_rx_add() and dev_sw_netstats_tx_add() to update stats, which is the same as implementing the function separately. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14vxlan: Use helper functions to update statsLi Zetao
Use the helper functions dev_sw_netstats_rx_add() and dev_sw_netstats_tx_add() to update stats, which helps to provide code readability. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14net: macsec: Use helper functions to update statsLi Zetao
Use the helper functions dev_sw_netstats_rx_add() and dev_sw_netstats_tx_add() to update stats, which helps to provide code readability. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14vmxnet3: Add XDP support.William Tu
The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT. Background: The vmxnet3 rx consists of three rings: ring0, ring1, and dataring. For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma mapped to the ring's descriptor. If LRO is enabled and packet size larger than 3K, VMXNET3_MAX_SKB_BUF_SIZE, then r1 is used to mapped the rest of the buffer larger than VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is allocated using alloc_page. So for LRO packets, the payload will be in one buffer from r0 and multiple from r1, for non-LRO packets, only one descriptor in r0 is used for packet size less than 3k. When receiving a packet, the first descriptor will have the sop (start of packet) bit set, and the last descriptor will have the eop (end of packet) bit set. Non-LRO packets will have only one descriptor with both sop and eop set. Other than r0 and r1, vmxnet3 dataring is specifically designed for handling packets with small size, usually 128 bytes, defined in VMXNET3_DEF_RXDATA_DESC_SIZE, by simply copying the packet from the backend driver in ESXi to the ring's memory region at front-end vmxnet3 driver, in order to avoid memory mapping/unmapping overhead. In summary, packet size: A. < 128B: use dataring B. 128B - 3K: use ring0 (VMXNET3_RX_BUF_SKB) C. > 3K: use ring0 and ring1 (VMXNET3_RX_BUF_SKB + VMXNET3_RX_BUF_PAGE) As a result, the patch adds XDP support for packets using dataring and r0 (case A and B), not the large packet size when LRO is enabled. XDP Implementation: When user loads and XDP prog, vmxnet3 driver checks configurations, such as mtu, lro, and re-allocate the rx buffer size for reserving the extra headroom, XDP_PACKET_HEADROOM, for XDP frame. The XDP prog will then be associated with every rx queue of the device. Note that when using dataring for small packet size, vmxnet3 (front-end driver) doesn't control the buffer allocation, as a result we allocate a new page and copy packet from the dataring to XDP frame. The receive side of XDP is implemented for case A and B, by invoking the bpf program at vmxnet3_rq_rx_complete and handle its returned action. The vmxnet3_process_xdp(), vmxnet3_process_xdp_small() function handles the ring0 and dataring case separately, and decides the next journey of the packet afterward. For TX, vmxnet3 has split header design. Outgoing packets are parsed first and protocol headers (L2/L3/L4) are copied to the backend. The rest of the payload are dma mapped. Since XDP_TX does not parse the packet protocol, the entire XDP frame is dma mapped for transmission and transmitted in a batch. Later on, the frame is freed and recycled back to the memory pool. Performance: Tested using two VMs inside one ESXi vSphere 7.0 machine, using single core on each vmxnet3 device, sender using DPDK testpmd tx-mode attached to vmxnet3 device, sending 64B or 512B UDP packet. VM1 txgen: $ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \ --forward-mode=txonly --eth-peer=0,<mac addr of vm2> option: add "--txonly-multi-flow" option: use --txpkts=512 or 64 byte VM2 running XDP: $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> --skb-mode $ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> options: XDP_DROP, XDP_PASS, XDP_TX To test REDIRECT to cpu 0, use $ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop Single core performance comparison with skb-mode. 64B: skb-mode -> native-mode XDP_DROP: 1.6Mpps -> 2.4Mpps XDP_PASS: 338Kpps -> 367Kpps XDP_TX: 1.1Mpps -> 2.3Mpps REDIRECT-drop: 1.3Mpps -> 2.3Mpps 512B: skb-mode -> native-mode XDP_DROP: 863Kpps -> 1.3Mpps XDP_PASS: 275Kpps -> 376Kpps XDP_TX: 554Kpps -> 1.2Mpps REDIRECT-drop: 659Kpps -> 1.2Mpps Demo: https://youtu.be/4lm1CSCi78Q Future work: - XDP frag support - use napi_consume_skb() instead of dev_kfree_skb_any at unmap - stats using u64_stats_t - using bitfield macro BIT() - optimization for DMA synchronization using actual frame length, instead of always max_len Signed-off-by: William Tu <u9012063@gmail.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14Merge branch 'ovs-drop-reasons'David S. Miller
Adrian Moreno says: ==================== openvswitch: add drop reasons There is currently a gap in drop visibility in the openvswitch module. This series tries to improve this by adding a new drop reason subsystem for OVS. Apart from adding a new drop reasson subsystem and some common drop reasons, this series takes Eric's preliminary work [1] on adding an explicit drop action and integrates it into the same subsystem. A limitation of this series is that it does not report upcall errors. The reason is that there could be many sources of upcall drops and the most common one, which is the netlink buffer overflow, cannot be reported via kfree_skb() because the skb is freed in the netlink layer (see [2]). Therefore, using a reason for the rare events and not the common one would be even more misleading. I'd propose we add (in a follow up patch) a tracepoint to better report upcall errors. [1] https://lore.kernel.org/netdev/202306300609.tdRdZscy-lkp@intel.com/T/ [2] commit 1100248a5c5c ("openvswitch: Fix double reporting of drops in dropwatch") --- v4 -> v5: - Rebased - Added a helper function to explicitly convert drop reason enum types v3 -> v4: - Changed names of errors following Ilya's suggestions - Moved the ovs-dpctl.py changes from patch 7/7 to 3/7 - Added a test to ensure actions following a drop are rejected rfc2 -> v3: - Rebased on top of latest net-next rfc1 -> rfc2: - Fail when an explicit drop is not the last - Added a drop reason for action errors - Added braces around macros - Dropped patch that added support for masks in ovs-dpctl.py as it's now included in Aaron's series [2]. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14selftests: openvswitch: add explicit drop testcaseAdrian Moreno
Test explicit drops generate the right drop reason. Also, verify that the kernel rejects flows with actions following an explicit drop. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14selftests: openvswitch: add drop reason testcaseAdrian Moreno
Test if the correct drop reason is reported when OVS drops a packet due to an explicit flow. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14net: openvswitch: add misc error drop reasonsAdrian Moreno
Use drop reasons from include/net/dropreason-core.h when a reasonable candidate exists. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14net: openvswitch: add meter drop reasonAdrian Moreno
By using an independent drop reason it makes it easy to distinguish between QoS-triggered or flow-triggered drop. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14net: openvswitch: add explicit drop actionEric Garver
From: Eric Garver <eric@garver.life> This adds an explicit drop action. This is used by OVS to drop packets for which it cannot determine what to do. An explicit action in the kernel allows passing the reason _why_ the packet is being dropped or zero to indicate no particular error happened (i.e: OVS intentionally dropped the packet). Since the error codes coming from userspace mean nothing for the kernel, we squash all of them into only two drop reasons: - OVS_DROP_EXPLICIT_WITH_ERROR to indicate a non-zero value was passed - OVS_DROP_EXPLICIT to indicate a zero value was passed (no error) e.g. trace all OVS dropped skbs # perf trace -e skb:kfree_skb --filter="reason >= 0x30000" [..] 106.023 ping/2465 skb:kfree_skb(skbaddr: 0xffffa0e8765f2000, \ location:0xffffffffc0d9b462, protocol: 2048, reason: 196611) reason: 196611 --> 0x30003 (OVS_DROP_EXPLICIT) Also, this patch allows ovs-dpctl.py to add explicit drop actions as: "drop" -> implicit empty-action drop "drop(0)" -> explicit non-error action drop "drop(42)" -> explicit error action drop Signed-off-by: Eric Garver <eric@garver.life> Co-developed-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14net: openvswitch: add action error drop reasonAdrian Moreno
Add a drop reason for packets that are dropped because an action returns a non-zero error code. Acked-by: Aaron Conole <aconole@redhat.com> Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14net: openvswitch: add last-action drop reasonAdrian Moreno
Create a new drop reason subsystem for openvswitch and add the first drop reason to represent last-action drops. Last-action drops happen when a flow has an empty action list or there is no action that consumes the packet (output, userspace, recirc, etc). It is the most common way in which OVS drops packets. Implementation-wise, most of these skb-consuming actions already call "consume_skb" internally and return directly from within the do_execute_actions() loop so with minimal changes we can assume that any skb that exits the loop normally is a packet drop. Signed-off-by: Adrian Moreno <amorenoz@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14Merge branch 'mptcp-remove-msk-subflow'David S. Miller
Matthieu Baerts says: ==================== mptcp: get rid of msk->subflow The MPTCP protocol maintains an additional struct socket per connection, mainly to be able to easily use tcp-level struct socket operations. This leads to several side effects, beyond the quite unfortunate / confusing 'subflow' field name: - active and passive sockets behaviour is inconsistent: only active ones have a not NULL msk->subflow, leading to different error handling and different error code returned to the user-space in several places. - active sockets uses an unneeded, larger amount of memory - passive sockets can't successfully go through accept(), disconnect(), accept() sequence, see [1] for more details. The 13 first patches of this series are from Paolo and address all the above, finally getting rid of the blamed field: - The first patch is a minor clean-up. - In the next 11 patches, msk->subflow usage is systematically removed from the MPTCP protocol, replacing it with direct msk->first usage, eventually introducing new core helpers when needed. - The 13th patch finally disposes the field, and it's the only patch in the series intended to produce functional changes. The last and 14th patch is from Kuniyuki and it is not linked to the previous ones: it is a small clean-up to get rid of an unnecessary check in mptcp_init_sock(). [1] https://github.com/multipath-tcp/mptcp_net-next/issues/290 ==================== Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
2023-08-14mptcp: Remove unnecessary test for __mptcp_init_sock()Kuniyuki Iwashima
__mptcp_init_sock() always returns 0 because mptcp_init_sock() used to return the value directly. But after commit 18b683bff89d ("mptcp: queue data for mptcp level retransmission"), __mptcp_init_sock() need not return value anymore. Let's remove the unnecessary test for __mptcp_init_sock() and make it return void. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: get rid of msk->subflowPaolo Abeni
Such field is now unused just as a flag to control the first subflow deletion at close() time. Introduce a new bit flag for that and finally drop the mentioned field. As an intended side effect, now the first subflow sock is not freed before close() even for passive sockets. The msk has no open/active subflows if the first one is closed and the subflow list is singular, update accordingly the state check in mptcp_stream_accept(). Among other benefits, the subflow removal, reduces the amount of memory used on the client side for each mptcp connection, allows passive sockets to go through successful accept()/disconnect()/connect() and makes return error code consistent for failing both passive and active sockets. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/290 Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: change the mpc check helper to return a skPaolo Abeni
After the previous patch the __mptcp_nmpc_socket helper is used only to ensure that the MPTCP socket is a suitable status - that is, the mptcp capable handshake is not started yet. Change the return value to the relevant subflow sock, to finally remove the last references to first subflow socket in the MPTCP stack. As a bonus, we can get rid of a few local variables in different functions. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: avoid ssock usage in mptcp_pm_nl_create_listen_socket()Paolo Abeni
This is one of the few remaining spots actually manipulating the first subflow socket. We can leverage the recently introduced inet helpers to get rid of ssock there. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: avoid additional indirection in sockoptPaolo Abeni
The mptcp sockopt infrastructure unneedly uses the first subflow socket struct in a few spots. We are going to remove such field soon, so use directly the first subflow sock instead. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: avoid unneeded indirection in mptcp_stream_accept()Paolo Abeni
We are going to remove the first subflow socket soon, so avoid the additional indirection at accept() time. Instead access directly the first subflow sock, and update mptcp_accept() to operate on it. This allows dropping a duplicated check in mptcp_accept(). No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: avoid additional indirection in mptcp_poll()Paolo Abeni
We are going to remove the first subflow socket soon, so avoid the additional indirection at poll() time. Instead access directly the first subflow sock. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: avoid additional indirection in mptcp_listen()Paolo Abeni
We are going to remove the first subflow socket soon, so avoid the additional indirection via at listen() time. Instead call directly the recently introduced helper on the first subflow sock. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14net: factor out __inet_listen_sk() helperPaolo Abeni
The mptcp protocol maintains an additional socket just to easily invoke a few stream operations on the first subflow. One of them is inet_listen(). Factor out an helper operating directly on the (locked) struct sock, to allow get rid of the above dependency in the next patch without duplicating the existing code. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: mptcp: avoid additional indirection in mptcp_bind()Paolo Abeni
We are going to remove the first subflow socket soon, so avoid the additional indirection via at bind() time. Instead call directly the recently introduced helpers on the first subflow sock. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14net: factor out inet{,6}_bind_sk helpersPaolo Abeni
The mptcp protocol maintains an additional socket just to easily invoke a few stream operations on the first subflow. One of them is bind(). Factor out the helpers operating directly on the struct sock, to allow get rid of the above dependency in the next patch without duplicating the existing code. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: avoid subflow socket usage in mptcp_get_port()Paolo Abeni
We are going to remove the first subflow socket soon, so avoid accessing it in mptcp_get_port(). Instead, access directly the first subflow sock. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: avoid additional __inet_stream_connect() callPaolo Abeni
The mptcp protocol maintains an additional socket just to easily invoke a few stream operations on the first subflow. One of them is __inet_stream_connect(). We are going to remove the first subflow socket soon, so avoid the additional indirection via at connect time, calling directly into the sock-level connect() ops. The sk-level connect never return -EINPROGRESS, cleanup the error path accordingly. Additionally, the ssk status on error is always TCP_CLOSE. Avoid unneeded access to the subflow sk state. No functional change intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-14mptcp: avoid unneeded mptcp_token_destroy() callsPaolo Abeni
The MPTCP protocol currently clears the msk token both at connect() and listen() time. That is needed to deal with failing connect() calls that can create a new token while leaving the sk in TCP_CLOSE,SS_UNCONNECTED status and thus allowing later connect() and/or listen() calls. Let's deal with such failures explicitly, cleaning the token in a timely manner and avoid the confusing early mptcp_token_destroy(). Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-13net: Remove leftover include from nftables.hJörn-Thorben Hinz
Commit db3685b4046f ("net: remove obsolete members from struct net") removed the uses of struct list_head from this header, without removing the corresponding included header. Signed-off-by: Jörn-Thorben Hinz <jthinz@mailbox.tu-berlin.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-13Merge tag 'for-net-next-2023-08-11' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next bluetooth-next pull request for net-next: - Add new VID/PID for Mediatek MT7922 - Add support multiple BIS/BIG - Add support for Intel Gale Peak - Add support for Qualcomm WCN3988 - Add support for BT_PKT_STATUS for ISO sockets - Various fixes for experimental ISO support - Load FW v2 for RTL8852C - Add support for NXP AW693 chipset - Add support for Mediatek MT2925
2023-08-13net: pcs: lynx: fix lynx_pcs_link_up_sgmii() not doing anything in ↵Vladimir Oltean
fixed-link mode lynx_pcs_link_up_sgmii() is supposed to update the PCS speed and duplex for the non-inband operating modes, and prior to the blamed commit, it did just that, but a mistake sneaked into the conversion and reversed the condition. It is easy for this to go undetected on platforms that also initialize the PCS in the bootloader, because Linux doesn't reset it (although maybe it should). The nature of the bug is that phylink will not touch the IF_MODE_HALF_DUPLEX | IF_MODE_SPEED_MSK fields when it should, and it will apparently keep working if the previous values set by the bootloader were correct. Fixes: c689a6528c22 ("net: pcs: lynx: update PCS driver to use neg_mode") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-13Merge branch 'net-pci_dev_id'David S. Miller
Zheng Zengkai says: ==================== net: Use pci_dev_id() to simplify the code PCI core API pci_dev_id() can be used to get the BDF number for a pci device. Use the API to simplify the code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-13net: ngbe: use pci_dev_id() to simplify the codeZheng Zengkai
PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it manually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-13net: tc35815: Use pci_dev_id() to simplify the codeZheng Zengkai
PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it manually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-13net: smsc: Use pci_dev_id() to simplify the codeZheng Zengkai
PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it manually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-13tg3: Use pci_dev_id() to simplify the codeZheng Zengkai
PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it manually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-13et131x: Use pci_dev_id() to simplify the codeZheng Zengkai
PCI core API pci_dev_id() can be used to get the BDF number for a pci device. We don't need to compose it manually. Use pci_dev_id() to simplify the code a little bit. Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-13net: e1000: Remove unused declarationsYue Haibing
Commit 675ad47375c7 ("e1000: Use netdev_<level>, pr_<level> and dev_<level>") declared but never implemented e1000_get_hw_dev_name(). Commit 1532ecea1deb ("e1000: drop dead pcie code from e1000") removed e1000_check_mng_mode()/e1000_blink_led_start() but not the declarations. Commit c46b59b241ec ("e1000: Remove unused function e1000_mta_set.") removed e1000_mta_set() but not its declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>