summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-04-19dt-bindings: vendor-prefix: add prefix for the Czech Technical University in ↵Pavel Pisa
Prague. The Czech Technical University in Prague (CTU) is one of the biggest and oldest (founded 1707) technical universities in Europe. The abbreviation in Czech language is ČVUT according to official name in Czech language České vysoké učení technické v Praze The English translation The Czech Technical University in Prague The university pages in English https://www.cvut.cz/en Link: https://lore.kernel.org/all/ff3a7216114fcd83530e70b994ef0e4277ddf000.1647904780.git.pisa@cmp.felk.cvut.cz Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> Acked-by: Rob Herring <robh@kernel.org> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19can: mcp251xfd: add support for mcp251863Marc Kleine-Budde
The MCP251863 device is a CAN-FD controller (MCP2518FD) with an integrated transceiver (ATA6563). This patch add support for the new device. Link: https://lore.kernel.org/all/20220419072805.2840340-3-mkl@pengutronix.de Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19dt-binding: can: mcp251xfd: add binding information for mcp251863Marc Kleine-Budde
The MCP251863 device is a CAN-FD controller (MCP2518FD) with an integrated Transceiver (ATA6563). Add the microchip,mcp251863 as a new compatible to the binding. Link: https://lore.kernel.org/all/20220419072805.2840340-2-mkl@pengutronix.de Cc: devicetree@vger.kernel.org Cc: Thomas Kopp <thomas.kopp@microchip.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19dt-bindings: can: renesas,rcar-canfd: document r8a77961 supportWolfram Sang
This patch adds documentation for the r8a77961 to the renesas,rcar-canfd binding. Link: https://lore.kernel.org/all/20220401153743.77871-1-wsa+renesas@sang-engineering.com Cc: devicetree@vger.kernel.org Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19can: xilinx_can: mark bit timing constants as constMarc Kleine-Budde
This patch marks the bit timing constants as const. Fixes: c223da689324 ("can: xilinx_can: Add support for CANFD FD frames") Link: https://lore.kernel.org/all/20220317203119.792552-1-mkl@pengutronix.de Cc: Appana Durga Kedareswara rao <appana.durga.rao@xilinx.com> Cc: Naga Sureshkumar Relli <naga.sureshkumar.relli@xilinx.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19MAINTAINERS: rectify entry for XILINX CAN DRIVERLukas Bulwahn
Commit 7843d3c8e5e6 ("dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML") converts xilinx_can.txt to xilinx,can.yaml, but missed to adjust its reference in MAINTAINERS. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a broken reference. Repair this file reference in XILINX CAN DRIVER. Fixes: 7843d3c8e5e6 ("dt-bindings: can: xilinx_can: Convert Xilinx CAN binding to YAML") Link: https://lore.kernel.org/all/20220321122840.17841-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19can: flexcan: using pm_runtime_resume_and_get instead of pm_runtime_get_syncMinghao Chi
Using pm_runtime_resume_and_get is more appropriate for simplifing code Link: https://lore.kernel.org/all/20220419081449.2574026-1-chi.minghao@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19can: mscan: mpc5xxx_can: Prepare cleanup of powerpc's asm/prom.hChristophe Leroy
powerpc's asm/prom.h brings some headers that it doesn't need itself. In order to clean it up, first add missing headers in users of asm/prom.h Link: https://lore.kernel.org/all/878888f9057ad2f66ca0621a0007472bf57f3e3d.1648833432.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19can: Fix Links to Technologic Systems web resourcesKris Bahnsen
Technologic Systems has rebranded as embeddedTS with the current domain eventually going offline. Update web/doc URLs to correct resource locations. Link: https://lore.kernel.org/all/20220329201229.16279-1-kris@embeddedTS.com Signed-off-by: Kris Bahnsen <kris@embeddedTS.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19can: bittiming: can_calc_bittiming(): prefer small bit rate pre-scalers over ↵Marc Kleine-Budde
larger ones The CiA (CAN in Automation) lists in their Newsletter 1/2018 in the "Recommendation for the CAN FD bit-timing" [1] article several recommendations, one of them is: | Recommendation 3: Choose BRPA and BRPD as low as possible [1] https://can-newsletter.org/uploads/media/raw/f6a36d1461371a2f86ef0011a513712c.pdf With the current bit timing algorithm Srinivas Neeli noticed that on the Xilinx Versal ACAP board the CAN data bit timing parameters are not calculated optimally. For most bit rates, the bit rate prescaler (BRP) is != 1, although it's possible to configure the requested with a bit rate with a prescaler of 1: | Data Bit timing parameters for xilinx_can_fd2i with 79.999999 MHz ref clock (cmd-line) using algo 'v4.8' | nominal real Bitrt nom real SampP | Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error | 12000000 12 2 2 2 1 1 11428571 4.8% 75.0% 71.4% 4.8% | 10000000 25 1 1 1 1 2 9999999 0.0% 75.0% 75.0% 0.0% | 8000000 12 3 3 3 1 1 7999999 0.0% 75.0% 70.0% 6.7% | 5000000 50 1 1 1 1 4 4999999 0.0% 75.0% 75.0% 0.0% | 4000000 62 1 1 1 1 5 3999999 0.0% 75.0% 75.0% 0.0% | 2000000 125 1 1 1 1 10 1999999 0.0% 75.0% 75.0% 0.0% | 1000000 250 1 1 1 1 20 999999 0.0% 75.0% 75.0% 0.0% The bit timing parameter calculation algorithm iterates effectively from low to high BRP values. It selects a new best parameter set, if the sample point error of the current parameter set is equal or less to old best parameter set. If the given hardware constraints (clock rate and bit timing parameter constants) don't allow a sample point error of 0, the algorithm will first find a valid bit timing parameter set with a low BRP, but then will accept parameter sets with higher BRPs that have the same sample point error. This patch changes the algorithm to only accept a new parameter set, if the resulting sample point error is lower. This leads to the following data bit timing parameter for the Versal ACAP board: | Data Bit timing parameters for xilinx_can_fd2i with 79.999999 MHz ref clock (cmd-line) using algo 'can-next' | nominal real Bitrt nom real SampP | Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error | 12000000 12 2 2 2 1 1 11428571 4.8% 75.0% 71.4% 4.8% | 10000000 12 2 3 2 1 1 9999999 0.0% 75.0% 75.0% 0.0% | 8000000 12 3 3 3 1 1 7999999 0.0% 75.0% 70.0% 6.7% | 5000000 12 5 6 4 1 1 4999999 0.0% 75.0% 75.0% 0.0% | 4000000 12 7 7 5 1 1 3999999 0.0% 75.0% 75.0% 0.0% | 2000000 12 14 15 10 1 1 1999999 0.0% 75.0% 75.0% 0.0% | 1000000 25 14 15 10 1 2 999999 0.0% 75.0% 75.0% 0.0% Note: Due to HW constraints a data bit rate of 1 MBit/s with BRP = 1 is not possible. Link: https://lore.kernel.org/all/20220318144913.873614-1-mkl@pengutronix.de Link: https://lore.kernel.org/all/20220113203004.jf2rqj2pirhgx72i@pengutronix.de Cc: Srinivas Neeli <sneeli@xilinx.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19can: rx-offload: rename can_rx_offload_queue_sorted() -> ↵Marc Kleine-Budde
can_rx_offload_queue_timestamp() This patch renames the function can_rx_offload_queue_sorted() to can_rx_offload_queue_timestamp(). This better describes what the function does, it adds a newly RX'ed skb to the sorted queue by its timestamp. Link: https://lore.kernel.org/all/20220417194327.2699059-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2022-04-19Merge branch 'rtnetlink-improve-alt_ifname-config-and-fix-dangerous-group-usage'Paolo Abeni
Florent Fourcot says: ==================== rtnetlink: improve ALT_IFNAME config and fix dangerous GROUP usage First commit forbids dangerous calls when both IFNAME and GROUP are given, since it can introduce unexpected behaviour when IFNAME does not match any interface. Second patch achieves primary goal of this patchset to fix/improve IFLA_ALT_IFNAME attribute, since previous code was never working for newlink/setlink. ip-link command is probably getting interface index before, and was not using this feature. Last two patches are improving error code on corner cases. Changes in v2: * Remove ifname argument in rtnl_dev_get/do_setlink functions (simplify code) * Use a boolean to avoid condition duplication in __rtnl_newlink Changes in v3: * Simplify rtnl_dev_get signature Changes in v4: * Rename link_lookup to link_specified Changes in v5: * Re-order patches ==================== Link: https://lore.kernel.org/r/20220415165330.10497-1-florent.fourcot@wifirst.fr Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19rtnetlink: return EINVAL when request cannot succeedFlorent Fourcot
A request without interface name/interface index/interface group cannot work. We should return EINVAL Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19rtnetlink: return ENODEV when IFLA_ALT_IFNAME is used in dellinkFlorent Fourcot
If IFLA_ALT_IFNAME is set and given interface is not found, we should return ENODEV and be consistent with IFLA_IFNAME behaviour This commit extends feature of commit 76c9ac0ee878, "net: rtnetlink: add possibility to use alternative names as message handle" CC: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19rtnetlink: enable alt_ifname for setlink/newlinkFlorent Fourcot
buffer called "ifname" given in function rtnl_dev_get is always valid when called by setlink/newlink, but contains only empty string when IFLA_IFNAME is not given. So IFLA_ALT_IFNAME is always ignored This patch fixes rtnl_dev_get function with a remove of ifname argument, and move ifname copy in do_setlink when required. It extends feature of commit 76c9ac0ee878, "net: rtnetlink: add possibility to use alternative names as message handle"" CC: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19rtnetlink: return ENODEV when ifname does not exist and group is givenFlorent Fourcot
When the interface does not exist, and a group is given, the given parameters are being set to all interfaces of the given group. The given IFNAME/ALT_IF_NAME are being ignored in that case. That can be dangerous since a typo (or a deleted interface) can produce weird side effects for caller: Case 1: IFLA_IFNAME=valid_interface IFLA_GROUP=1 MTU=1234 Case 1 will update MTU and group of the given interface "valid_interface". Case 2: IFLA_IFNAME=doesnotexist IFLA_GROUP=1 MTU=1234 Case 2 will update MTU of all interfaces in group 1. IFLA_IFNAME is ignored in this case This behaviour is not consistent and dangerous. In order to fix this issue, we now return ENODEV when the given IFNAME does not exist. Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr> Signed-off-by: Brian Baboch <brian.baboch@wifirst.fr> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19Merge branch 'net-sched-allow-user-to-select-txqueue'Paolo Abeni
Tonghao Zhang says: ==================== net: sched: allow user to select txqueue From: Tonghao Zhang <xiangxia.m.yue@gmail.com> Patch 1 allow user to select txqueue in clsact hook. Patch 2 support skbhash to select txqueue. ==================== Link: https://lore.kernel.org/r/20220415164046.26636-1-xiangxia.m.yue@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19net: sched: support hash selecting tx queueTonghao Zhang
This patch allows users to pick queue_mapping, range from A to B. Then we can load balance packets from A to B tx queue. The range is an unsigned 16bit value in decimal format. $ tc filter ... action skbedit queue_mapping skbhash A B "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is enhanced with flags: SKBEDIT_F_TXQ_SKBHASH +----+ +----+ +----+ | P1 | | P2 | | Pn | +----+ +----+ +----+ | | | +-----------+-----------+ | | clsact/skbedit | MQ v +-----------+-----------+ | q0 | qn | qm v v v HTB/FQ FIFO ... FIFO For example: If P1 sends out packets to different Pods on other host, and we want distribute flows from qn - qm. Then we can use skb->hash as hash. setup commands: $ NETDEV=eth0 $ ip netns add n1 $ ip link add ipv1 link $NETDEV type ipvlan mode l2 $ ip link set ipv1 netns n1 $ ip netns exec n1 ifconfig ipv1 2.2.2.100/24 up $ tc qdisc add dev $NETDEV clsact $ tc filter add dev $NETDEV egress protocol ip prio 1 \ flower skip_hw src_ip 2.2.2.100 action skbedit queue_mapping skbhash 2 6 $ tc qdisc add dev $NETDEV handle 1: root mq $ tc qdisc add dev $NETDEV parent 1:1 handle 2: htb $ tc class add dev $NETDEV parent 2: classid 2:1 htb rate 100kbit $ tc class add dev $NETDEV parent 2: classid 2:2 htb rate 200kbit $ tc qdisc add dev $NETDEV parent 1:2 tbf rate 100mbit burst 100mb latency 1 $ tc qdisc add dev $NETDEV parent 1:3 pfifo $ tc qdisc add dev $NETDEV parent 1:4 pfifo $ tc qdisc add dev $NETDEV parent 1:5 pfifo $ tc qdisc add dev $NETDEV parent 1:6 pfifo $ tc qdisc add dev $NETDEV parent 1:7 pfifo $ ip netns exec n1 iperf3 -c 2.2.2.1 -i 1 -t 10 -P 10 pick txqueue from 2 - 6: $ ethtool -S $NETDEV | grep -i tx_queue_[0-9]_bytes tx_queue_0_bytes: 42 tx_queue_1_bytes: 0 tx_queue_2_bytes: 11442586444 tx_queue_3_bytes: 7383615334 tx_queue_4_bytes: 3981365579 tx_queue_5_bytes: 3983235051 tx_queue_6_bytes: 6706236461 tx_queue_7_bytes: 42 tx_queue_8_bytes: 0 tx_queue_9_bytes: 0 txqueues 2 - 6 are mapped to classid 1:3 - 1:7 $ tc -s class show dev $NETDEV ... class mq 1:3 root leaf 8002: Sent 11949133672 bytes 7929798 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 class mq 1:4 root leaf 8003: Sent 7710449050 bytes 5117279 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 class mq 1:5 root leaf 8004: Sent 4157648675 bytes 2758990 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 class mq 1:6 root leaf 8005: Sent 4159632195 bytes 2759990 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 class mq 1:7 root leaf 8006: Sent 7003169603 bytes 4646912 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 ... Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Lobakin <alobakin@pm.me> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Kevin Hao <haokexin@gmail.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Kees Cook <keescook@chromium.org> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Cc: Antoine Tenart <atenart@kernel.org> Cc: Wei Wang <weiwan@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-19net: sched: use queue_mapping to pick tx queueTonghao Zhang
This patch fixes issue: * If we install tc filters with act_skbedit in clsact hook. It doesn't work, because netdev_core_pick_tx() overwrites queue_mapping. $ tc filter ... action skbedit queue_mapping 1 And this patch is useful: * We can use FQ + EDT to implement efficient policies. Tx queues are picked by xps, ndo_select_queue of netdev driver, or skb hash in netdev_core_pick_tx(). In fact, the netdev driver, and skb hash are _not_ under control. xps uses the CPUs map to select Tx queues, but we can't figure out which task_struct of pod/containter running on this cpu in most case. We can use clsact filters to classify one pod/container traffic to one Tx queue. Why ? In containter networking environment, there are two kinds of pod/ containter/net-namespace. One kind (e.g. P1, P2), the high throughput is key in these applications. But avoid running out of network resource, the outbound traffic of these pods is limited, using or sharing one dedicated Tx queues assigned HTB/TBF/FQ Qdisc. Other kind of pods (e.g. Pn), the low latency of data access is key. And the traffic is not limited. Pods use or share other dedicated Tx queues assigned FIFO Qdisc. This choice provides two benefits. First, contention on the HTB/FQ Qdisc lock is significantly reduced since fewer CPUs contend for the same queue. More importantly, Qdisc contention can be eliminated completely if each CPU has its own FIFO Qdisc for the second kind of pods. There must be a mechanism in place to support classifying traffic based on pods/container to different Tx queues. Note that clsact is outside of Qdisc while Qdisc can run a classifier to select a sub-queue under the lock. In general recording the decision in the skb seems a little heavy handed. This patch introduces a per-CPU variable, suggested by Eric. The xmit.skip_txqueue flag is firstly cleared in __dev_queue_xmit(). - Tx Qdisc may install that skbedit actions, then xmit.skip_txqueue flag is set in qdisc->enqueue() though tx queue has been selected in netdev_tx_queue_mapping() or netdev_core_pick_tx(). That flag is cleared firstly in __dev_queue_xmit(), is useful: - Avoid picking Tx queue with netdev_tx_queue_mapping() in next netdev in such case: eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy: For example, eth0, macvlan in pod, which root Qdisc install skbedit queue_mapping, send packets to eth0.3, vlan in host. In __dev_queue_xmit() of eth0.3, clear the flag, does not select tx queue according to skb->queue_mapping because there is no filters in clsact or tx Qdisc of this netdev. Same action taked in eth0, ixgbe in Host. - Avoid picking Tx queue for next packet. If we set xmit.skip_txqueue in tx Qdisc (qdisc->enqueue()), the proper way to clear it is clearing it in __dev_queue_xmit when processing next packets. For performance reasons, use the static key. If user does not config the NET_EGRESS, the patch will not be compiled. +----+ +----+ +----+ | P1 | | P2 | | Pn | +----+ +----+ +----+ | | | +-----------+-----------+ | | clsact/skbedit | MQ v +-----------+-----------+ | q0 | q1 | qn v v v HTB/FQ HTB/FQ ... FIFO Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Alexander Lobakin <alobakin@pm.me> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Talal Ahmad <talalahmad@google.com> Cc: Kevin Hao <haokexin@gmail.com> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Kees Cook <keescook@chromium.org> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Cc: Antoine Tenart <atenart@kernel.org> Cc: Wei Wang <weiwan@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-18docs: net: dsa: describe issues with checksum offloadLuiz Angelo Daros de Luca
DSA tags before IP header (categories 1 and 2) or after the payload (3) might introduce offload checksum issues. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18Merge branch 'mlxsw-line-card'David S. Miller
Ido Schimmel says: ==================== mlxsw: Introduce line card support for modular switch Jiri says: This patchset introduces support for modular switch systems and also introduces mlxsw support for NVIDIA Mellanox SN4800 modular switch. It contains 8 slots to accommodate line cards - replaceable PHY modules which may contain gearboxes. Currently supported line card: 16X 100GbE (QSFP28) Other line cards that are going to be supported: 8X 200GbE (QSFP56) 4X 400GbE (QSFP-DD) There may be other types of line cards added in the future. To be consistent with the port split configuration (splitter cabels), the line card entities are treated in the similar way. The nature of a line card is not "a pluggable device", but "a pluggable PHY module". A concept of "provisioning" is introduced. The user may "provision" certain slot with a line card type. Driver then creates all instances (devlink ports, netdevices, etc) related to this line card type. It does not matter if the line card is plugged-in at the time. User is able to configure netdevices, devlink ports, setup port splitters, etc. From the perspective of the switch ASIC, all is present and can be configured. The carrier of netdevices stays down if the line card is not plugged-in. Once the line card is inserted and activated, the carrier of the related netdevices is then reflecting the physical line state, same as for an ordinary fixed port. Once user does not want to use the line card related instances anymore, he can "unprovision" the slot. Driver then removes the instances. Patches 1-4 are extending devlink driver API and UAPI in order to register, show, dump, provision and activate the line card. Patches 5-17 are implementing the introduced API in mlxsw. The last patch adds a selftest for mlxsw line cards. Example: $ devlink port # No ports are listed $ devlink lc pci/0000:01:00.0: lc 1 state unprovisioned supported_types: 16x100G lc 2 state unprovisioned supported_types: 16x100G lc 3 state unprovisioned supported_types: 16x100G lc 4 state unprovisioned supported_types: 16x100G lc 5 state unprovisioned supported_types: 16x100G lc 6 state unprovisioned supported_types: 16x100G lc 7 state unprovisioned supported_types: 16x100G lc 8 state unprovisioned supported_types: 16x100G Note that driver exposes list supported line card types. Currently there is only one: "16x100G". To provision the slot #8: $ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G $ devlink lc show pci/0000:01:00.0 lc 8 pci/0000:01:00.0: lc 8 state active type 16x100G supported_types: 16x100G $ devlink port pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4 pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4 pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4 pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4 pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4 pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4 pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4 pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4 pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4 pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4 pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4 pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4 pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4 pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4 pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4 pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4 To uprovision the slot #8: $ devlink lc set pci/0000:01:00.0 lc 8 notype ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18selftests: mlxsw: Introduce devlink line card ↵Jiri Pirko
provision/unprovision/activation tests Introduce basic line card manipulation which consists of provisioning, unprovisioning and activation of a line card. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: spectrum: Add port to linecard mappingJiri Pirko
For each port get slot_index using PMLP register. For ports residing on a linecard, identify it with the linecard by setting mapping using devlink_port_linecard_set() helper. Use linecard slot index for PMTDB register queries. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: core: Extend driver ops by remove selected ports opJiri Pirko
In case of line card implementation, the core has to have a way to remove relevant ports manually. Extend the Spectrum driver ops by an op that implements port removal of selected ports upon request. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: core_linecards: Implement line card activation processJiri Pirko
Allow to process events generated upon line card getting "ready" and "active". When DSDSC event with "ready" bit set is delivered, that means the line card is powered up. Use MDDC register to push the line card to active state. Once FW is done with that, the DSDSC event with "active" bit set is delivered. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: core_linecards: Add line card objects and implement provisioningJiri Pirko
Introduce objects for line cards and an infrastructure around that. Use devlink_linecard_create/destroy() to register the line card with devlink core. Implement provisioning ops with a list of supported line cards. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: reg: Add Management Binary Code Transfer RegisterJiri Pirko
The MBCT register allows to transfer binary INI codes from the host to the management FW by transferring it by chunks of maximum 1KB. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: reg: Add Management DownStream Device Control RegisterJiri Pirko
The MDDC register allows to control downstream devices and line cards. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: reg: Add Management DownStream Device Query RegisterJiri Pirko
The MDDQ register allows to query the DownStream device properties. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: spectrum: Introduce port mapping change event processingJiri Pirko
Register PMLPE trap and process the port mapping changes delivered by it by creating related ports. Note that this happens after provisioning. The INI of the linecard is processed and merged by FW. PMLPE is generated for each port. Process this mapping change. Layout of PMLPE is the same as layout of PMLP. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: Narrow the critical section of devl_lock during ports creation/removalJiri Pirko
No need to hold the lock for alloc and freecpu. So narrow the critical section. Follow-up patch is going to benefit from this by adding more code to the functions which will be out of the critical as well. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: reg: Add Ports Mapping Event Configuration RegisterJiri Pirko
The PMECR register is used to enable/disable event triggering in case of local port mapping change. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: spectrum: Allocate port mapping array of structs instead of pointersJiri Pirko
Instead of array of pointers to port mapping structures, allocate the array of structures directly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18mlxsw: spectrum: Allow lane to start from non-zero indexJiri Pirko
So far, the lane index always started from zero. That is not true for modular systems with gearbox-equipped linecards. Loose the check so the lanes can start from non-zero index. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18devlink: add port to line card relationship setJiri Pirko
In order to properly inform user about relationship between port and line card, introduce a driver API to set line card for a port. Use this information to extend port devlink netlink message by line card index and also include the line card index into phys_port_name and by that into a netdevice name. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18devlink: implement line card active stateJiri Pirko
Allow driver to mark a line card as active. Expose this state to the userspace over devlink netlink interface with proper notifications. 'active' state means that line card was plugged in after being provisioned. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18devlink: implement line card provisioningJiri Pirko
In order to be able to configure all needed stuff on a port/netdevice of a line card without the line card being present, introduce line card provisioning. Basically by setting a type, provisioning process will start and driver is supposed to create a placeholder for instances (ports/netdevices) for a line card type. Allow the user to query the supported line card types over line card get command. Then implement two netlink command SET to allow user to set/unset the card type. On the driver API side, add provision/unprovision ops and supported types array to be advertised. Upon provision op call, the driver should take care of creating the instances for the particular line card type. Introduce provision_set/clear() functions to be called by the driver once the provisioning/unprovisioning is done on its side. These helpers are not to be called directly due to the async nature of provisioning. Example: $ devlink port # No ports are listed $ devlink lc pci/0000:01:00.0: lc 1 state unprovisioned supported_types: 16x100G lc 2 state unprovisioned supported_types: 16x100G lc 3 state unprovisioned supported_types: 16x100G lc 4 state unprovisioned supported_types: 16x100G lc 5 state unprovisioned supported_types: 16x100G lc 6 state unprovisioned supported_types: 16x100G lc 7 state unprovisioned supported_types: 16x100G lc 8 state unprovisioned supported_types: 16x100G $ devlink lc set pci/0000:01:00.0 lc 8 type 16x100G $ devlink lc show pci/0000:01:00.0 lc 8 pci/0000:01:00.0: lc 8 state active type 16x100G supported_types: 16x100G $ devlink port pci/0000:01:00.0/0: type notset flavour cpu port 0 splittable false pci/0000:01:00.0/53: type eth netdev enp1s0nl8p1 flavour physical lc 8 port 1 splittable true lanes 4 pci/0000:01:00.0/54: type eth netdev enp1s0nl8p2 flavour physical lc 8 port 2 splittable true lanes 4 pci/0000:01:00.0/55: type eth netdev enp1s0nl8p3 flavour physical lc 8 port 3 splittable true lanes 4 pci/0000:01:00.0/56: type eth netdev enp1s0nl8p4 flavour physical lc 8 port 4 splittable true lanes 4 pci/0000:01:00.0/57: type eth netdev enp1s0nl8p5 flavour physical lc 8 port 5 splittable true lanes 4 pci/0000:01:00.0/58: type eth netdev enp1s0nl8p6 flavour physical lc 8 port 6 splittable true lanes 4 pci/0000:01:00.0/59: type eth netdev enp1s0nl8p7 flavour physical lc 8 port 7 splittable true lanes 4 pci/0000:01:00.0/60: type eth netdev enp1s0nl8p8 flavour physical lc 8 port 8 splittable true lanes 4 pci/0000:01:00.0/61: type eth netdev enp1s0nl8p9 flavour physical lc 8 port 9 splittable true lanes 4 pci/0000:01:00.0/62: type eth netdev enp1s0nl8p10 flavour physical lc 8 port 10 splittable true lanes 4 pci/0000:01:00.0/63: type eth netdev enp1s0nl8p11 flavour physical lc 8 port 11 splittable true lanes 4 pci/0000:01:00.0/64: type eth netdev enp1s0nl8p12 flavour physical lc 8 port 12 splittable true lanes 4 pci/0000:01:00.0/125: type eth netdev enp1s0nl8p13 flavour physical lc 8 port 13 splittable true lanes 4 pci/0000:01:00.0/126: type eth netdev enp1s0nl8p14 flavour physical lc 8 port 14 splittable true lanes 4 pci/0000:01:00.0/127: type eth netdev enp1s0nl8p15 flavour physical lc 8 port 15 splittable true lanes 4 pci/0000:01:00.0/128: type eth netdev enp1s0nl8p16 flavour physical lc 8 port 16 splittable true lanes 4 $ devlink lc set pci/0000:01:00.0 lc 8 notype Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18devlink: add support to create line card and expose to userJiri Pirko
Extend the devlink API so the driver is going to be able to create and destroy linecard instances. There can be multiple line cards per devlink device. Expose this new type of object over devlink netlink API to the userspace, with notifications. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-18tcp: fix signed/unsigned comparisonEric Dumazet
Kernel test robot reported: smatch warnings: net/ipv4/tcp_input.c:5966 tcp_rcv_established() warn: unsigned 'reason' is never less than zero. I actually had one packetdrill failing because of this bug, and was about to send the fix :) v2: Andreas Schwab also pointed out that @reason needs to be negated before we reach tcp_drop_reason() Fixes: 4b506af9c5b8 ("tcp: add two drop reasons for tcp_ack()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17Merge branch 'tcp-drop-reason-additions'David S. Miller
Eric Dumazet says: ==================== tcp: drop reason additions Currently, TCP is either missing drop reasons, or pretending that some useful packets are dropped. This patch series makes "perf record -a -e skb:kfree_skb" much more usable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: add drop reason support to tcp_ofo_queue()Eric Dumazet
packets in OFO queue might be redundant, and dropped. tcp_drop() is no longer needed. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: add drop reasons to tcp_rcv_synsent_state_process()Eric Dumazet
Re-use existing reasons. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: make tcp_rcv_synsent_state_process() drop monitor friendEric Dumazet
1) A valid RST packet should be consumed, to not confuse drop monitor. 2) Same remark for packet validating cross syn setup, even if we might ignore part of it. 3) When third packet of 3WHS is delayed, do not pretend the SYNACK was dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: add drop reason support to tcp_prune_ofo_queue()Eric Dumazet
Add one reason for packets dropped from OFO queue because of memory pressure. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: add two drop reasons for tcp_ack()Eric Dumazet
Add TCP_TOO_OLD_ACK and TCP_ACK_UNSENT_DATA drop reasons so that tcp_rcv_established() can report them. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: add drop reasons to tcp_rcv_state_process()Eric Dumazet
Add basic support for drop reasons in tcp_rcv_state_process() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: make tcp_rcv_state_process() drop monitor friendlyEric Dumazet
tcp_rcv_state_process() incorrectly drops packets instead of consuming it, making drop monitor very noisy, if not unusable. Calling tcp_time_wait() or tcp_done() is part of standard behavior, packets triggering these actions were not dropped. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: add drop reason support to tcp_validate_incoming()Eric Dumazet
Creates four new drop reasons for the following cases: 1) packet being rejected by RFC 7323 PAWS check 2) packet being rejected by SEQUENCE check 3) Invalid RST packet 4) Invalid SYN packet Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: get rid of rst_seq_matchEric Dumazet
Small cleanup in tcp_validate_incoming(), no need for rst_seq_match setting and testing. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-17tcp: consume incoming skb leading to a resetEric Dumazet
Whenever tcp_validate_incoming() handles a valid RST packet, we should not pretend the packet was dropped. Create a special section at the end of tcp_validate_incoming() to handle this case. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>