summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/ibm
AgeCommit message (Collapse)Author
2024-11-03net: ibm: emac: mal: move irq maps downRosen Penev
Moves the handling right before they are used and allows merging a branch. Also get rid of the error handling as devm_request_irq can handle that. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-13-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: mal: use devm for request_irqRosen Penev
Avoids manual frees. Also replaced irq_of_parse_and_map with platform_get_irq since it's simpler and does the same thing. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-12-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: mal: use devm for kzallocRosen Penev
Simplifies the probe function by removing gotos. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-11-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: zmii: devm_platform_get_resourceRosen Penev
Simplifies the probe function by a bit and allows removing the _remove function such that devm now handles all cleanup. printk gets converted to dev_err as np is now gone. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-10-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: zmii: use devm for mutex_initRosen Penev
It seems that since inception, this driver never called mutex_destroy in _remove. Use devm to handle this automatically. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-9-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: zmii: use devm for kzallocRosen Penev
Simplifies the probe function by removing gotos. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-8-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: rgmii: devm_platform_get_resourceRosen Penev
Simplifies the probe function by a bit and allows removing the _remove function such that devm now handles all cleanup. printk gets converted to dev_err as np is now gone. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-7-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: rgmii: use devm for mutex_initRosen Penev
It seems that since inception, this driver never called mutex_destroy in _remove. Use devm to handle this automatically. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-6-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: rgmii: use devm for kzallocRosen Penev
Simplifies the probe function by removing gotos. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-5-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: tah: devm_platform_get_resourcesRosen Penev
Simplifies the probe function by a bit and allows removing the _remove function such that devm now handles all cleanup. printk gets converted to dev_err as np is now gone. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-4-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: tah: use devm for mutex_initRosen Penev
It seems that since inception, this driver never called mutex_destroy in _remove. Use devm to handle this automatically. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-3-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-03net: ibm: emac: tah: use devm for kzallocRosen Penev
Simplifies the probe function by removing gotos. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241030203727.6039-2-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29ibmvnic: use ethtool string helpersRosen Penev
They are the preferred way to copy ethtool strings. Avoids manually incrementing the data pointer. Signed-off-by: Rosen Penev <rosenp@gmail.com> Tested-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241022203240.391648-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-29net: ibm: emac: generate random MAC if not foundRosen Penev
On this Cisco MX60W, u-boot sets the local-mac-address property. Unfortunately by default, the MAC is wrong and is actually located on a UBI partition. Which means nvmem needs to be used to grab it. In the case where that fails, EMAC fails to initialize instead of generating a random MAC as many other drivers do. Match behavior with other drivers to have a working ethernet interface. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29net: ibm: emac: use devm for mutex_initRosen Penev
It seems since inception that mutex_destroy was never called for these in _remove. Instead of handling this manually, just use devm for simplicity. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29net: ibm: emac: use platform_get_irqRosen Penev
No need for irq_of_parse_and_map since we have platform_device. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29net: ibm: emac: use devm_platform_ioremap_resourceRosen Penev
No need to have a struct resource. Gets rid of the TODO. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-29net: ibm: emac: use netif_receive_skb_listRosen Penev
Small rx improvement. Would use napi_gro_receive instead but that's a lot more involved than netif_receive_skb_list because of how the function is implemented. Before: > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 51556 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 559 MBytes 467 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 48228 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.03 sec 558 MBytes 467 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 47600 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 557 MBytes 466 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 37252 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.05 sec 559 MBytes 467 Mbits/sec After: > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 40786 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.05 sec 572 MBytes 478 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 52482 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 571 MBytes 477 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 48370 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 572 MBytes 478 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 46086 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.05 sec 571 MBytes 476 Mbits/sec > iperf -c 192.168.1.1 ------------------------------------------------------------ Client connecting to 192.168.1.1, TCP port 5001 TCP window size: 16.0 KByte (default) ------------------------------------------------------------ [ 1] local 192.168.1.101 port 46062 connected with 192.168.1.1 port 5001 [ ID] Interval Transfer Bandwidth [ 1] 0.00-10.04 sec 572 MBytes 478 Mbits/sec Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc3). No conflicts and no adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-09net: ibm: emac: mal: add dcr_unmap to _removeRosen Penev
It's done in probe so it should be undone here. Fixes: 1d3bb996481e ("Device tree aware EMAC driver") Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20241008233050.9422-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-08net: ibm: emac: mal: fix wrong gotoRosen Penev
dcr_map is called in the previous if and therefore needs to be unmapped. Fixes: 1ff0fcfcb1a6 ("ibm_newemac: Fix new MAL feature handling") Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20241007235711.5714-1-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04net: ethernet: Switch back to struct platform_driver::remove()Uwe Kleine-König
After commit 0edb555a65d1 ("platform: Make platform_driver::remove() return void") .remove() is (again) the right callback to implement for platform drivers. Convert all platform drivers below drivers/net/ethernet to use .remove(), with the eventual goal to drop struct platform_driver::remove_new(). As .remove() and .remove_new() have the same prototypes, conversion is done by just changing the structure member name in the driver initializer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Link: https://patch.msgid.link/18f7c585a1a8a8ac8b03a2fca7de19bd5c52ac2b.1727949050.git.u.kleine-koenig@baylibre.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-04ibmvnic: Inspect header requirements before using scrq directNick Child
Previously, the TX header requirement for standard frames was ignored. This requirement is a bitstring sent from the VIOS which maps to the type of header information needed during TX. If no header information, is needed then send subcrq direct can be used (which can be more performant). This bitstring was previously ignored for standard packets (AKA non LSO, non CSO) due to the belief that the bitstring was over-cautionary. It turns out that there are some configurations where the backing device does need header information for transmission of standard packets. If the information is not supplied then this causes continuous "Adapter error" transport events. Therefore, this bitstring should be respected and observed before considering the use of send subcrq direct. Fixes: 74839f7a8268 ("ibmvnic: Introduce send sub-crq direct") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001163200.1802522-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-03ibmvnic: Add stat for tx direct vs tx batchedNick Child
Allow tracking of packets sent with send_subcrq direct vs indirect. `ethtool -S <dev>` will now provide a counter of the number of uses of each xmit method. This metric will be useful in performance debugging. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241001163531.1803152-1-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13net: ibm: emac: get rid of wol_irqRosen Penev
This is completely unused. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-10-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13net: ibm: emac: remove all waiting codeRosen Penev
EPROBE_DEFER, which probably wasn't available when this driver was written, can be used instead of waiting manually. Signed-off-by: Rosen Penev <rosenp@gmail.com> Link: https://patch.msgid.link/20240912024903.6201-9-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13net: ibm: emac: replace of_get_propertyRosen Penev
of_property_read_u32 can be used. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-8-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13net: ibm: emac: use netdev's phydev directlyRosen Penev
Avoids having to use own struct member. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-7-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13net: ibm: emac: use devm for register_netdevRosen Penev
Cleans it up automatically. No need to handle manually. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-6-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13net: ibm: emac: remove mii_bus with devmRosen Penev
Switching to devm management of mii_bus allows to remove mdiobus_unregister calls and thus avoids needing a mii_bus global struct member. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-5-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13net: ibm: emac: use devm for of_iomapRosen Penev
Allows removing manual iounmap. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-4-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13net: ibm: emac: manage emac_irq with devmRosen Penev
It's the last to go in remove. Safe to let devm handle it. Also move request_irq to probe for clarity. It's removed in _remove not close. Use dev_err_probe instead of printk. Handles EPROBE_DEFER automatically. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-3-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-13net: ibm: emac: use devm for alloc_etherdevRosen Penev
Allows to simplify the code slightly. This is safe to do as free_netdev gets called last. Signed-off-by: Rosen Penev <rosenp@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20240912024903.6201-2-rosenp@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09net: ibm: emac: Use __iomem annotation for emac_[xg]aht_baseSimon Horman
dev->emacp contains an __iomem pointer and values derived from it are used as __iomem pointers. So use this annotation in the return type for helpers that derive pointers from dev->emacp. Flagged by Sparse as: .../core.c:444:36: warning: incorrect type in argument 1 (different address spaces) .../core.c:444:36: expected unsigned int volatile [noderef] [usertype] __iomem *addr .../core.c:444:36: got unsigned int [usertype] * .../core.c: note: in included file: .../core.h:416:25: warning: cast removes address space '__iomem' of expression Compile tested only. No functional change intended. Signed-off-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240906-emac-iomem-v1-1-207cc4f3fed0@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20net: ethernet: ibm: Simpify code with for_each_child_of_node()Zhang Zekun
for_each_child_of_node can help to iterate through the device_node, and we don't need to use while loop. No functional change with this conversion. Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240816015837.109627-1-zhangzekun11@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-09ibmvnic: Perform tx CSO during send scrq directNick Child
During initialization with the vnic server, a bitstring is communicated to the client regarding header info needed during CSO (See "VNIC Capabilities" in PAPR). Most of the time, to be safe, vnic server requests header info for CSO. When header info is needed, multiple TX descriptors are required per skb; This limits the driver to use send_subcrq_indirect instead of send_subcrq_direct. Previously, the vnic server request for header info was ignored. This allowed the use of send_sub_crq_direct. Transmissions were successful because the bitstring returned by vnic server is broad and over cautionary. It was observed that mlx backing devices could actually transmit and handle CSO packets without the vnic server receiving header info (despite the fact that the bitstring requested it). There was a trust issue: The bitstring was overcautionary. This extra precaution (requesting header info when the backing device may not use it) comes at the cost of performance (using direct vs indirect hcalls has a 30% delta in small packet RR transaction rate). So it has been requested that the vnic server team tries to ensure that the bitstring is more exact. In the meantime, disable CSO when it is possible to use the skb in the send_subcrq_direct path. In other words, calculate the checksum before handing the packet to FW when the packet is not segmented and xmit_more is false. Since the code path is only possible if the skb is non GSO and xmit_more is false, the cost of doing checksum in the send_subcrq_direct path is minimal. Any large segmented skb will have xmit_more set to true more frequently and it is inexpensive to do checksumming on a small skb. The worst-case workload would be a 9000 MTU TCP_RR test with close to MTU sized packets (and TSO off). This allows xmit_more to be false more frequently and open the code path up to use send_subcrq_direct. Observing trace data (graph-time = 1) and packet rate with this workload shows minimal performance degradation: 1. NIC does checksum w headers, safely use send_subcrq_indirect: - Packet rate: 631k txs - Trace data: ibmvnic_xmit = 44344685.87 us / 6234576 hits = AVG 7.11 us skb_checksum_help = 4.07 us / 2 hits = AVG 2.04 us ^ Notice hits, tracing this just for reassurance ibmvnic_tx_scrq_flush = 33040649.69 us / 5638441 hits = AVG 5.86 us send_subcrq_indirect = 37438922.24 us / 6030859 hits = AVG 6.21 us 2. NIC does checksum w/o headers, dangerously use send_subcrq_direct: - Packet rate: 831k txs - Trace data: ibmvnic_xmit = 48940092.29 us / 8187630 hits = AVG 5.98 us skb_checksum_help = 2.03 us / 1 hits = AVG 2.03 ibmvnic_tx_scrq_flush = 31141879.57 us / 7948960 hits = AVG 3.92 us send_subcrq_indirect = 8412506.03 us / 728781 hits = AVG 11.54 ^ notice hits is much lower b/c send_subcrq_direct was called ^ wasn't traceable 3. driver does checksum, safely use send_subcrq_direct (THIS PATCH): - Packet rate: 829k txs - Trace data: ibmvnic_xmit = 56696077.63 us / 8066168 hits = AVG 7.03 us skb_checksum_help = 8587456.16 us / 7526072 hits = AVG 1.14 us ibmvnic_tx_scrq_flush = 30219545.55 us / 7782409 hits = AVG 3.88 us send_subcrq_indirect = 8638326.44 us / 763693 hits = AVG 11.31 us When the bitstring ever specifies that CSO does not require headers (dependent on VIOS vnic server changes), then this patch should be removed and replaced with one that investigates the bitstring before using send_subcrq_direct. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-8-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09ibmvnic: Only record tx completed bytes once per handlerNick Child
Byte Queue Limits depends on dql_completed being called once per tx completion round in order to adjust its algorithm appropriately. The dql->limit value is an approximation of the amount of bytes that the NIC can consume per irq interval. If this approximation is too high then the NIC will become over-saturated. Too low and the NIC will starve. The dql->limit depends on dql->prev-* stats to calculate an optimal value. If dql_completed() is called more than once per irq handler then those prev-* values become unreliable (because they are not an accurate representation of the previous state of the NIC) resulting in a sub-optimal limit value. Therefore, move the call to netdev_tx_completed_queue() to the end of ibmvnic_complete_tx(). When performing 150 sessions of TCP rr (request-response 1 byte packets) workloads, one could observe: PREVIOUSLY: - limit and inflight values hovering around 130 - transaction rate of around 750k pps. NOW: - limit rises and falls in response to inflight (130-900) - transaction rate of around 1M pps (33% improvement) Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-7-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09ibmvnic: Introduce send sub-crq directNick Child
Firmware supports two hcalls to send a sub-crq request: H_SEND_SUB_CRQ_INDIRECT and H_SEND_SUB_CRQ. The indirect hcall allows for submission of batched messages while the other hcall is limited to only one message. This protocol is defined in PAPR section 17.2.3.3. Previously, the ibmvnic xmit function only used the indirect hcall. This allowed the driver to batch it's skbs. A single skb can occupy a few entries per hcall depending on if FW requires skb header information or not. The FW only needs header information if the packet is segmented. By this logic, if an skb is not GSO then it can fit in one sub-crq message and therefore is a candidate for H_SEND_SUB_CRQ. Batching skb transmission is only useful when there are more packets coming down the line (ie netdev_xmit_more is true). As it turns out, H_SEND_SUB_CRQ induces less latency than H_SEND_SUB_CRQ_INDIRECT. Therefore, use H_SEND_SUB_CRQ where appropriate. Small latency gains seen when doing TCP_RR_150 (request/response workload). Ftrace results (graph-time=1): Previous: ibmvnic_xmit = 29618270.83 us / 8860058.0 hits = AVG 3.34 ibmvnic_tx_scrq_flush = 21972231.02 us / 6553972.0 hits = AVG 3.35 Now: ibmvnic_xmit = 22153350.96 us / 8438942.0 hits = AVG 2.63 ibmvnic_tx_scrq_flush = 15858922.4 us / 6244076.0 hits = AVG 2.54 Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-6-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09ibmvnic: Remove duplicate memory barriers in txNick Child
send_subcrq_[in]direct() already has a dma memory barrier. Remove the earlier one. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-5-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09ibmvnic: Reduce memcpys in tx descriptor generationNick Child
Previously when creating the header descriptors, the driver would: 1. allocate a temporary buffer on the stack (in build_hdr_descs_arr) 2. memcpy the header info into the temporary buffer (in build_hdr_data) 3. memcpy the temp buffer into a local variable (in create_hdr_descs) 4. copy the local variable into the return buffer (in create_hdr_descs) Since, there is no opportunity for errors during this process, the temp buffer is not needed and work can be done on the return buffer directly. Repurpose build_hdr_data() to only calculate the header lengths. Rename it to get_hdr_lens(). Edit create_hdr_descs() to read from the skb directly and copy directly into the returned useful buffer. The process now involves less memory and write operations while also being more readable. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-4-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09ibmvnic: Use header len helper functions on txNick Child
Use the header length helper functions rather than trying to calculate it within the driver. There are defined functions for mac and network headers (skb_mac_header_len and skb_network_header_len) but no such function exists for the transport header length. Also, hdr_data was memset during allocation to all 0's so no need to memset again. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-3-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-09ibmvnic: Only replenish rx pool when resources are getting lowNick Child
Previously, the driver would replenish the rx pool if the polling function consumed less than the budget. The logic being that the driver did not exhaust its budget so that must mean that the driver is not busy and has cycles to spare for replenishing the pool. So pool replenishment happens on every poll which did not consume the budget. This can very costly during request-response tests. In fact, an extra ~100pps can be seen in TCP_RR_150 tests when we remove this conditional. Trace results (ftrace, graph-time=1) for the poll function are below: Previous results: ibmvnic_poll = 64951846.0 us / 4167628.0 hits = AVG 15.58 replenish_rx_pool = 17602846.0 us / 4710437.0 hits = AVG 3.74 Now: ibmvnic_poll = 57673941.0 us / 4791737.0 hits = AVG 12.04 replenish_rx_pool = 3938171.6 us / 4314.0 hits = AVG 912.88 While the replenish function takes longer, it is hit less frequently meaning the ibmvnic_poll function, on average, is faster. Furthermore, this change does not have a negative effect on performance bandwidth/latency measurements. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://patch.msgid.link/20240807211809.1259563-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02ibmveth: Recycle buffers during replenish phaseNick Child
When the length of a packet is under the rx_copybreak threshold, the buffer is copied into a new skb and sent up the stack. This allows the dma mapped memory to be recycled back to FW. Previously, the reuse of the DMA space was handled immediately. This means that further packet processing has to wait until h_add_logical_lan finishes for this packet. Therefore, when reusing a packet, offload the hcall to the replenish function. As a result, much of the shared logic between the recycle and replenish functions can be removed. This change increases TCP_RR packet rate by another 15% (370k to 430k txns). We can see the ftrace data supports this: PREV: ibmveth_poll = 8078553.0 us / 190999.0 hits = AVG 42.3 us NEW: ibmveth_poll = 7632787.0 us / 224060.0 hits = AVG 34.07 us Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20240801211215.128101-3-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-02ibmveth: Optimize poll rescheduling processNick Child
When the ibmveth driver processes less than the budget, it must call napi_complete_done() to release the instance. This function will return false if the driver should avoid rearming interrupts. Previously, the driver was ignoring the return code of napi_complete_done(). As a result, there were unnecessary calls to enable the veth irq. Therefore, use the return code napi_complete_done() to determine if irq rearm is necessary. Additionally, in the event that new data is received immediately after rearming interrupts, rather than just rescheduling napi, also jump back to the poll processing loop since we are already in the poll function (and know that we did not expense all of budget). This slight tweak results in a 15% increase in TCP_RR transaction rate (320k to 370k txns). We can see the ftrace data supports this: PREV: ibmveth_poll = 8818014.0 us / 182802.0 hits = AVG 48.24 NEW: ibmveth_poll = 8082398.0 us / 191413.0 hits = AVG 42.22 Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://patch.msgid.link/20240801211215.128101-2-nnac123@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-25ibmvnic: Add tx check to prevent skb leakNick Child
Below is a summary of how the driver stores a reference to an skb during transmit: tx_buff[free_map[consumer_index]]->skb = new_skb; free_map[consumer_index] = IBMVNIC_INVALID_MAP; consumer_index ++; Where variable data looks like this: free_map == [4, IBMVNIC_INVALID_MAP, IBMVNIC_INVALID_MAP, 0, 3] consumer_index^ tx_buff == [skb=null, skb=<ptr>, skb=<ptr>, skb=null, skb=null] The driver has checks to ensure that free_map[consumer_index] pointed to a valid index but there was no check to ensure that this index pointed to an unused/null skb address. So, if, by some chance, our free_map and tx_buff lists become out of sync then we were previously risking an skb memory leak. This could then cause tcp congestion control to stop sending packets, eventually leading to ETIMEDOUT. Therefore, add a conditional to ensure that the skb address is null. If not then warn the user (because this is still a bug that should be patched) and free the old pointer to prevent memleak/tcp problems. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-22ibmvnic: Free any outstanding tx skbs during scrq resetNick Child
There are 2 types of outstanding tx skb's: Type 1: Packets that are sitting in the drivers ind_buff that are waiting to be batch sent to the NIC. During a device reset, these are freed with a call to ibmvnic_tx_scrq_clean_buffer() Type 2: Packets that have been sent to the NIC and are awaiting a TX completion IRQ. These are free'd during a reset with a call to clean_tx_pools() During any reset which requires us to free the tx irq, ensure that the Type 2 skb references are freed. Since the irq is released, it is impossible for the NIC to inform of any completions. Furthermore, later in the reset process is a call to init_tx_pools() which marks every entry in the tx pool as free (ie not outstanding). So if the driver is to make a call to init_tx_pools(), it must first be sure that the tx pool is empty of skb references. This issue was discovered by observing the following in the logs during EEH testing: TX free map points to untracked skb (tso_pool 0 idx=4) TX free map points to untracked skb (tso_pool 0 idx=5) TX free map points to untracked skb (tso_pool 1 idx=36) Fixes: 65d6470d139a ("ibmvnic: clean pending indirect buffs during reset") Signed-off-by: Nick Child <nnac123@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-07net: annotate writes on dev->mtu from ndo_change_mtu()Eric Dumazet
Simon reported that ndo_change_mtu() methods were never updated to use WRITE_ONCE(dev->mtu, new_mtu) as hinted in commit 501a90c94510 ("inet: protect against too small mtu values.") We read dev->mtu without holding RTNL in many places, with READ_ONCE() annotations. It is time to take care of ndo_change_mtu() methods to use corresponding WRITE_ONCE() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Simon Horman <horms@kernel.org> Closes: https://lore.kernel.org/netdev/20240505144608.GB67882@kernel.org/ Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Simon Horman <horms@kernel.org> Acked-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240506102812.3025432-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-24net: ibm/emac: allocate dummy net_device dynamicallyBreno Leitao
Embedding net_device into structures prohibits the usage of flexible arrays in the net_device structure. For more details, see the discussion at [1]. Un-embed the net_device from the private struct by converting it into a pointer. Then use the leverage the new alloc_netdev_dummy() helper to allocate and initialize dummy devices. [1] https://lore.kernel.org/all/20240229225910.79e224cf@kernel.org/ Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-18ibmvnic: Return error code on TX scrq flush failNick Child
In ibmvnic_xmit() if ibmvnic_tx_scrq_flush() returns H_CLOSED then it will inform upper level networking functions to disable tx queues. H_CLOSED signals that the connection with the vnic server is down and a transport event is expected to recover the device. Previously, ibmvnic_tx_scrq_flush() was hard-coded to return success. Therefore, the queues would remain active until ibmvnic_cleanup() is called within do_reset(). The problem is that do_reset() depends on the RTNL lock. If several ibmvnic devices are resetting then there can be a long wait time until the last device can grab the lock. During this time the tx/rx queues still appear active to upper level functions. FYI, we do make a call to netif_carrier_off() outside the RTNL lock but its calls to dev_deactivate() are also dependent on the RTNL lock. As a result, large amounts of retransmissions were observed in a short period of time, eventually leading to ETIMEOUT. This was specifically seen with HNV devices, likely because of even more RTNL dependencies. Therefore, ensure the return code of ibmvnic_tx_scrq_flush() is propagated to the xmit function to allow for an earlier (and lock-less) response to a transport event. Signed-off-by: Nick Child <nnac123@linux.ibm.com> Link: https://lore.kernel.org/r/20240416164128.387920-1-nnac123@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-08mm, treewide: rename MAX_ORDER to MAX_PAGE_ORDERKirill A. Shutemov
commit 23baf831a32c ("mm, treewide: redefine MAX_ORDER sanely") has changed the definition of MAX_ORDER to be inclusive. This has caused issues with code that was not yet upstream and depended on the previous definition. To draw attention to the altered meaning of the define, rename MAX_ORDER to MAX_PAGE_ORDER. Link: https://lkml.kernel.org/r/20231228144704.14033-2-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>