summaryrefslogtreecommitdiff
path: root/drivers/net/dsa/ocelot/felix.c
AgeCommit message (Collapse)Author
2021-07-20net: dsa: tag_8021q: absorb dsa_8021q_setup into dsa_tag_8021q_{,un}registerVladimir Oltean
Right now, setting up tag_8021q is a 2-step operation for a driver, first the context structure needs to be created, then the VLANs need to be installed on the ports. A similar thing is true for teardown. Merge the 2 steps into the register/unregister methods, to be as transparent as possible for the driver as to what tag_8021q does behind the scenes. This also gets rid of the funny "bool setup == true means setup, == false means teardown" API that tag_8021q used to expose. Note that dsa_tag_8021q_register() must be called at least in the .setup() driver method and never earlier (like in the driver probe function). This is because the DSA switch tree is not initialized at probe time, and the cross-chip notifiers will not work. For symmetry with .setup(), the unregister method should be put in .teardown(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: dsa: make tag_8021q operations part of the coreVladimir Oltean
Make tag_8021q a more central element of DSA and move the 2 driver specific operations outside of struct dsa_8021q_context (which is supposed to hold dynamic data and not really constant function pointers). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: dsa: let the core manage the tag_8021q contextVladimir Oltean
The basic problem description is as follows: Be there 3 switches in a daisy chain topology: | sw0p0 sw0p1 sw0p2 sw0p3 sw0p4 [ user ] [ user ] [ user ] [ dsa ] [ cpu ] | +---------+ | sw1p0 sw1p1 sw1p2 sw1p3 sw1p4 [ user ] [ user ] [ user ] [ dsa ] [ dsa ] | +---------+ | sw2p0 sw2p1 sw2p2 sw2p3 sw2p4 [ user ] [ user ] [ user ] [ user ] [ dsa ] The CPU will not be able to ping through the user ports of the bottom-most switch (like for example sw2p0), simply because tag_8021q was not coded up for this scenario - it has always assumed DSA switch trees with a single switch. To add support for the topology above, we must admit that the RX VLAN of sw2p0 must be added on some ports of switches 0 and 1 as well. This is in fact a textbook example of thing that can use the cross-chip notifier framework that DSA has set up in switch.c. There is only one problem: core DSA (switch.c) is not able right now to make the connection between a struct dsa_switch *ds and a struct dsa_8021q_context *ctx. Right now, it is drivers who call into tag_8021q.c and always provide a struct dsa_8021q_context *ctx pointer, and tag_8021q.c calls them back with the .tag_8021q_vlan_{add,del} methods. But with cross-chip notifiers, it is possible for tag_8021q to call drivers without drivers having ever asked for anything. A good example is right above: when sw2p0 wants to set itself up for tag_8021q, the .tag_8021q_vlan_add method needs to be called for switches 1 and 0, so that they transport sw2p0's VLANs towards the CPU without dropping them. So instead of letting drivers manage the tag_8021q context, add a tag_8021q_ctx pointer inside of struct dsa_switch, which will be populated when dsa_tag_8021q_register() returns success. The patch is fairly long-winded because we are partly reverting commit 5899ee367ab3 ("net: dsa: tag_8021q: add a context structure") which made the driver-facing tag_8021q API use "ctx" instead of "ds". Now that we can access "ctx" directly from "ds", this is no longer needed. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-07-20net: dsa: tag_8021q: create dsa_tag_8021q_{register,unregister} helpersVladimir Oltean
In preparation of moving tag_8021q to core DSA, move all initialization and teardown related to tag_8021q which is currently done by drivers in 2 functions called "register" and "unregister". These will gather more functionality in future patches, which will better justify the chosen naming scheme. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-08net: dsa: felix: set TX flow control according to the phylink_mac_link_up ↵Vladimir Oltean
resolution Instead of relying on the static initialization done by ocelot_init_port() which enables flow control unconditionally, set SYS_PAUSE_CFG_PAUSE_ENA according to the parameters negotiated by the PHY. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: mscc: ocelot: convert to ocelot_port_txtstamp_request()Yangbo Lu
Convert to a common ocelot_port_txtstamp_request() for TX timestamp request handling. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: free skb->cb usage in core driverYangbo Lu
Free skb->cb usage in core driver and let device drivers decide to use or not. The reason having a DSA_SKB_CB(skb)->clone was because dsa_skb_tx_timestamp() which may set the clone pointer was called before p->xmit() which would use the clone if any, and the device driver has no way to initialize the clone pointer. This patch just put memset(skb->cb, 0, sizeof(skb->cb)) at beginning of dsa_slave_xmit(). Some new features in the future, like one-step timestamp may need more bytes of skb->cb to use in dsa_skb_tx_timestamp(), and p->xmit(). Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: no longer clone skb in core driverYangbo Lu
It was a waste to clone skb directly in dsa_skb_tx_timestamp(). For one-step timestamping, a clone was not needed. For any failure of port_txtstamp (this may usually happen), the skb clone had to be freed. So this patch moves skb cloning for tx timestamp out of dsa core, and let drivers clone skb in port_txtstamp if they really need. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: no longer identify PTP packet in core driverYangbo Lu
Move ptp_classify_raw out of dsa core driver for handling tx timestamp request. Let device drivers do this if they want. Not all drivers want to limit tx timestamping for only PTP packet. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: dsa: check tx timestamp request in core driverYangbo Lu
Check tx timestamp request in core driver at very beginning of dsa_skb_tx_timestamp(), so that most skbs not requiring tx timestamp just return. And drop such checking in device drivers. Signed-off-by: Yangbo Lu <yangbo.lu@nxp.com> Tested-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-23net: ocelot: replay switchdev events when joining bridgeVladimir Oltean
The premise of this change is that the switchdev port attributes and objects offloaded by ocelot might have been missed when we are joining an already existing bridge port, such as a bonding interface. The patch pulls these switchdev attributes and objects from the bridge, on behalf of the 'bridge port' net device which might be either the ocelot switch interface, or the bonding upper interface. The ocelot_net.c belongs strictly to the switchdev ocelot driver, while ocelot.c is part of a library shared with the DSA felix driver. The ocelot_port_bridge_leave function (part of the common library) used to call ocelot_port_vlan_filtering(false), something which is not necessary for DSA, since the framework deals with that already there. So we move this function to ocelot_switchdev_unsync, which is specific to the switchdev driver. The code movement described above makes ocelot_port_bridge_leave no longer return an error code, so we change its type from int to void. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16net: dsa: felix: Add support for MRPHoratiu Vultur
Implement functions 'port_mrp_add', 'port_mrp_del', 'port_mrp_add_ring_role' and 'port_mrp_del_ring_role' to call the mrp functions from ocelot. Also all MRP frames that arrive to CPU on queue number OCELOT_MRP_CPUQ will be forward by the SW. Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16net: dsa: felix: perform teardown on error in felix_setupVladimir Oltean
If the driver fails to probe, it would be nice to not leak memory. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-16net: dsa: felix: don't deinitialize unused portsVladimir Oltean
ocelot_init_port is called only if dsa_is_unused_port == false, however ocelot_deinit_port is called unconditionally. This causes a warning in the skb_queue_purge inside ocelot_deinit_port saying that the spin lock protecting ocelot_port->tx_skbs was not initialized. Fixes: e5fb512d81d0 ("net: mscc: ocelot: deinitialize only initialized ports") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: propagate extack to .port_vlan_filteringVladimir Oltean
Some drivers can't dynamically change the VLAN filtering option, or impose some restrictions, it would be nice to propagate this info through netlink instead of printing it to a kernel log that might never be read. Also netlink extack includes the module that emitted the message, which means that it's easier to figure out which ones are driver-generated errors as opposed to command misuse. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: propagate extack to .port_vlan_addVladimir Oltean
Allow drivers to communicate their restrictions to user space directly, instead of printing to the kernel log. Where the conversion would have been lossy and things like VLAN ID could no longer be conveyed (due to the lack of support for printf format specifier in netlink extack), I chose to keep the messages in full form to the kernel log only, and leave it up to individual driver maintainers to move more messages to extack. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: tag_ocelot_8021q: add support for PTP timestampingVladimir Oltean
For TX timestamping, we use the felix_txtstamp method which is common with the regular (non-8021q) ocelot tagger. This method says that skb deferral is needed, prepares a timestamp request ID, and puts a clone of the skb in a queue waiting for the timestamp IRQ. felix_txtstamp is called by dsa_skb_tx_timestamp() just before the tagger's xmit method. In the tagger xmit, we divert the packets classified by dsa_skb_tx_timestamp() as PTP towards the MMIO-based injection registers, and we declare them as dead towards dsa_slave_xmit. If not PTP, we proceed with normal tag_8021q stuff. Then the timestamp IRQ fires, the clone queued up from felix_txtstamp is matched to the TX timestamp retrieved from the switch's FIFO based on the timestamp request ID, and the clone is delivered to the stack. On RX, thanks to the VCAP IS2 rule that redirects the frames with an EtherType for 1588 towards two destinations: - the CPU port module (for MMIO based extraction) and - if the "no XTR IRQ" workaround is in place, the dsa_8021q CPU port the relevant data path processing starts in the ptp_classify_raw BPF classifier installed by DSA in the RX data path (post tagger, which is completely unaware that it saw a PTP packet). This time we can't reuse the same implementation of .port_rxtstamp that also works with the default ocelot tagger. That is because felix_rxtstamp is given an skb with a freshly stripped DSA header, and it says "I don't need deferral for its RX timestamp, it's right in it, let me show you"; and it just points to the header right behind skb->data, from where it unpacks the timestamp and annotates the skb with it. The same thing cannot happen with tag_ocelot_8021q, because for one thing, the skb did not have an extraction frame header in the first place, but a VLAN tag with no timestamp information. So the code paths in felix_rxtstamp for the regular and 8021q tagger are completely independent. With tag_8021q, the timestamp must come from the packet's duplicate delivered to the CPU port module, but there is potentially complex logic to be handled [ and prone to reordering ] if we were to just start reading packets from the CPU port module, and try to match them to the one we received over Ethernet and which needs an RX timestamp. So we do something simple: we tell DSA "give me some time to think" (we request skb deferral by returning false from .port_rxtstamp) and we just drop the frame we got over Ethernet with no attempt to match it to anything - we just treat it as a notification that there's data to be processed from the CPU port module's queues. Then we proceed to read the packets from those, one by one, which we deliver up the stack, timestamped, using netif_rx - the same function that any driver would use anyway if it needed RX timestamp deferral. So the assumption is that we'll come across the PTP packet that triggered the CPU extraction notification eventually, but we don't know when exactly. Thanks to the VCAP IS2 trap/redirect rule and the exclusion of the CPU port module from the flooding replicators, only PTP frames should be present in the CPU port module's RX queues anyway. There is just one conflict between the VCAP IS2 trapping rule and the semantics of the BPF classifier. Namely, ptp_classify_raw() deems general messages as non-timestampable, but still, those are trapped to the CPU port module since they have an EtherType of ETH_P_1588. So, if the "no XTR IRQ" workaround is in place, we need to run another BPF classifier on the frames extracted over MMIO, to avoid duplicates being sent to the stack (once over Ethernet, once over MMIO). It doesn't look like it's possible to install VCAP IS2 rules based on keys extracted from the 1588 frame headers. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: felix: setup MMIO filtering rules for PTP when using tag_8021qVladimir Oltean
Since the tag_8021q tagger is software-defined, it has no means by itself for retrieving hardware timestamps of PTP event messages. Because we do want to support PTP on ocelot even with tag_8021q, we need to use the CPU port module for that. The RX timestamp is present in the Extraction Frame Header. And because we can't use NPI mode which redirects the CPU queues to an "external CPU" (meaning the ARM CPU running Linux), then we need to poll the CPU port module through the MMIO registers to retrieve TX and RX timestamps. Sadly, on NXP LS1028A, the Felix switch was integrated into the SoC without wiring the extraction IRQ line to the ARM GIC. So, if we want to be notified of any PTP packets received on the CPU port module, we have a problem. There is a possible workaround, which is to use the Ethernet CPU port as a notification channel that packets are available on the CPU port module as well. When a PTP packet is received by the DSA tagger (without timestamp, of course), we go to the CPU extraction queues, poll for it there, then we drop the original Ethernet packet and masquerade the packet retrieved over MMIO (plus the timestamp) as the original when we inject it up the stack. Create a quirk in struct felix is selected by the Felix driver (but not by Seville, since that doesn't support PTP at all). We want to do this such that the workaround is minimally invasive for future switches that don't require this workaround. The only traffic for which we need timestamps is PTP traffic, so add a redirection rule to the CPU port module for this. Currently we only have the need for PTP over L2, so redirection rules for UDP ports 319 and 320 are TBD for now. Note that for the workaround of matching of PTP-over-Ethernet-port with PTP-over-MMIO queues to work properly, both channels need to be absolutely lossless. There are two parts to achieving that: - We keep flow control enabled on the tag_8021q CPU port - We put the DSA master interface in promiscuous mode, so it will never drop a PTP frame (for the profiles we are interested in, these are sent to the multicast MAC addresses of 01-80-c2-00-00-0e and 01-1b-19-00-00-00). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: dsa: tag_ocelot: create separate tagger for SevilleVladimir Oltean
The ocelot tagger is a hot mess currently, it relies on memory initialized by the attached driver for basic frame transmission. This is against all that DSA tagging protocols stand for, which is that the transmission and reception of a DSA-tagged frame, the data path, should be independent from the switch control path, because the tag protocol is in principle hot-pluggable and reusable across switches (even if in practice it wasn't until very recently). But if another driver like dsa_loop wants to make use of tag_ocelot, it couldn't. This was done to have common code between Felix and Ocelot, which have one bit difference in the frame header format. Quoting from commit 67c2404922c2 ("net: dsa: felix: create a template for the DSA tags on xmit"): Other alternatives have been analyzed, such as: - Create a separate tag_seville.c: too much code duplication for just 1 bit field difference. - Create a separate DSA_TAG_PROTO_SEVILLE under tag_ocelot.c, just like tag_brcm.c, which would have a separate .xmit function. Again, too much code duplication for just 1 bit field difference. - Allocate the template from the init function of the tag_ocelot.c module, instead of from the driver: couldn't figure out a method of accessing the correct port template corresponding to the correct tagger in the .xmit function. The really interesting part is that Seville should have had its own tagging protocol defined - it is not compatible on the wire with Ocelot, even for that single bit. In principle, a packet generated by DSA_TAG_PROTO_OCELOT when booted on NXP LS1028A would look in a certain way, but when booted on NXP T1040 it would look differently. The reverse is also true: a packet generated by a Seville switch would be interpreted incorrectly by Wireshark if it was told it was generated by an Ocelot switch. Actually things are a bit more nuanced. If we concentrate only on the DSA tag, what I said above is true, but Ocelot/Seville also support an optional DSA tag prefix, which can be short or long, and it is possible to distinguish the two taggers based on an integer constant put in that prefix. Nonetheless, creating a separate tagger is still justified, since the tag prefix is optional, and without it, there is again no way to distinguish. Claiming backwards binary compatibility is a bit more tough, since I've already changed the format of tag_ocelot once, in commit 5124197ce58b ("net: dsa: tag_ocelot: use a short prefix on both ingress and egress"). Therefore I am not very concerned with treating this as a bugfix and backporting it to stable kernels (which would be another mess due to the fact that there would be lots of conflicts with the other DSA_TAG_PROTO* definitions). It's just simpler to say that the string values of the taggers have ABI value starting with kernel 5.12, which will be when the changing of tag protocol via /sys/class/net/<dsa-master>/dsa/tagging goes live. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-14net: mscc: ocelot: use common tag parsing code with DSAVladimir Oltean
The Injection Frame Header and Extraction Frame Header that the switch prepends to frames over the NPI port is also prepended to frames delivered over the CPU port module's queues. Let's unify the handling of the frame headers by making the ocelot driver call some helpers exported by the DSA tagger. Among other things, this allows us to get rid of the strange cpu_to_be32 when transmitting the Injection Frame Header on ocelot, since the packing API uses network byte order natively (when "quirks" is 0). The comments above ocelot_gen_ifh talk about setting pop_cnt to 3, and the cpu extraction queue mask to something, but the code doesn't do it, so we don't do it either. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: mscc: ocelot: offload bridge port flags to deviceVladimir Oltean
We should not be unconditionally enabling address learning, since doing that is actively detrimential when a port is standalone and not offloading a bridge. Namely, if a port in the switch is standalone and others are offloading the bridge, then we could enter a situation where we learn an address towards the standalone port, but the bridged ports could not forward the packet there, because the CPU is the only path between the standalone and the bridged ports. The solution of course is to not enable address learning unless the bridge asks for it. We need to set up the initial port flags for no learning and flooding everything, and also when the port joins and leaves the bridge. The flood configuration was already configured ok for standalone mode in ocelot_init, we just need to disable learning in ocelot_init_port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: mscc: ocelot: use separate flooding PGID for broadcastVladimir Oltean
In preparation of offloading the bridge port flags which have independent settings for unknown multicast and for broadcast, we should also start reserving one destination Port Group ID for the flooding of broadcast packets, to allow configuring it individually. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-12net: dsa: felix: restore multicast flood to CPU when NPI tagger reinitializesVladimir Oltean
ocelot_init sets up PGID_MC to include the CPU port module, and that is fine, but the ocelot-8021q tagger removes the CPU port module from the unknown multicast replicator. So after a transition from the default ocelot tagger towards ocelot-8021q and then again towards ocelot, multicast flooding towards the CPU port module will be disabled. Fixes: e21268efbe26 ("net: dsa: felix: perform switch setup for tag_8021q") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
2021-02-09net: dsa: felix: implement port flushing on .phylink_mac_link_downVladimir Oltean
There are several issues which may be seen when the link goes down while forwarding traffic, all of which can be attributed to the fact that the port flushing procedure from the reference manual was not closely followed. With flow control enabled on both the ingress port and the egress port, it may happen when a link goes down that Ethernet packets are in flight. In flow control mode, frames are held back and not dropped. When there is enough traffic in flight (example: iperf3 TCP), then the ingress port might enter congestion and never exit that state. This is a problem, because it is the egress port's link that went down, and that has caused the inability of the ingress port to send packets to any other port. This is solved by flushing the egress port's queues when it goes down. There is also a problem when performing stream splitting for IEEE 802.1CB traffic (not yet upstream, but a sort of multicast, basically). There, if one port from the destination ports mask goes down, splitting the stream towards the other destinations will no longer be performed. This can be traced down to this line: ocelot_port_writel(ocelot_port, 0, DEV_MAC_ENA_CFG); which should have been instead, as per the reference manual: ocelot_port_rmwl(ocelot_port, 0, DEV_MAC_ENA_CFG_RX_ENA, DEV_MAC_ENA_CFG); Basically only DEV_MAC_ENA_CFG_RX_ENA should be disabled, but not DEV_MAC_ENA_CFG_TX_ENA - I don't have further insight into why that is the case, but apparently multicasting to several ports will cause issues if at least one of them doesn't have DEV_MAC_ENA_CFG_TX_ENA set. I am not sure what the state of the Ocelot VSC7514 driver is, but probably not as bad as Felix/Seville, since VSC7514 uses phylib and has the following in ocelot_adjust_link: if (!phydev->link) return; therefore the port is not really put down when the link is lost, unlike the DSA drivers which use .phylink_mac_link_down for that. Nonetheless, I put ocelot_port_flush() in the common ocelot.c because it needs to access some registers from drivers/net/ethernet/mscc/ocelot_rew.h which are not exported in include/soc/mscc/ and a bugfix patch should probably not move headers around. Fixes: bdeced75b13f ("net: dsa: felix: Add PCS operations for PHYLINK") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-06net: dsa: felix: propagate the LAG offload ops towards the ocelot libVladimir Oltean
The ocelot switch has been supporting LAG offload since its initial commit, however felix could not make use of that, due to lack of a LAG abstraction in DSA. Now that we have that, let's forward DSA's calls towards the ocelot library, who will deal with setting up the bonding. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: dsa: felix: perform switch setup for tag_8021qVladimir Oltean
Unlike sja1105, the only other user of the software-defined tag_8021q.c tagger format, the implementation we choose for the Felix DSA switch driver preserves full functionality under a vlan_filtering bridge (i.e. IP termination works through the DSA user ports under all circumstances). The tag_8021q protocol just wants: - Identifying the ingress switch port based on the RX VLAN ID, as seen by the CPU. We achieve this by using the TCAM engines (which are also used for tc-flower offload) to push the RX VLAN as a second, outer tag, on egress towards the CPU port. - Steering traffic injected into the switch from the network stack towards the correct front port based on the TX VLAN, and consuming (popping) that header on the switch's egress. A tc-flower pseudocode of the static configuration done by the driver would look like this: $ tc qdisc add dev <cpu-port> clsact $ for eth in swp0 swp1 swp2 swp3; do \ tc filter add dev <cpu-port> egress flower indev ${eth} \ action vlan push id <rxvlan> protocol 802.1ad; \ tc filter add dev <cpu-port> ingress protocol 802.1Q flower vlan_id <txvlan> action vlan pop \ action mirred egress redirect dev ${eth}; \ done but of course since DSA does not register network interfaces for the CPU port, this configuration would be impossible for the user to do. Also, due to the same reason, it is impossible for the user to inadvertently delete these rules using tc. These rules do not collide in any way with tc-flower, they just consume some TCAM space, which is something we can live with. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: dsa: felix: convert to the new .change_tag_protocol DSA APIVladimir Oltean
In expectation of the new tag_ocelot_8021q tagger implementation, we need to be able to do runtime switchover between one tagger and another. So we must structure the existing code for the current NPI-based tagger in a certain way. We move the felix_npi_port_init function in expectation of the future driver configuration necessary for tag_ocelot_8021q: we would like to not have the NPI-related bits interspersed with the tag_8021q bits. The conversion from this: ocelot_write_rix(ocelot, ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports, 0)), ANA_PGID_PGID, PGID_UC); to this: cpu_flood = ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)); ocelot_rmw_rix(ocelot, cpu_flood, cpu_flood, ANA_PGID_PGID, PGID_UC); is perhaps non-trivial, but is nonetheless non-functional. The PGID_UC (replicator for unknown unicast) is already configured out of hardware reset to flood to all ports except ocelot->num_phys_ports (the CPU port module). All we change is that we use a read-modify-write to only add the CPU port module to the unknown unicast replicator, as opposed to doing a full write to the register. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-29net: mscc: ocelot: don't use NPI tag prefix for the CPU port moduleVladimir Oltean
Context: Ocelot switches put the injection/extraction frame header in front of the Ethernet header. When used in NPI mode, a DSA master would see junk instead of the destination MAC address, and it would most likely drop the packets. So the Ocelot frame header can have an optional prefix, which is just "ff:ff:ff:ff:ff:fe > ff:ff:ff:ff:ff:ff" padding put before the actual tag (still before the real Ethernet header) such that the DSA master thinks it's looking at a broadcast frame with a strange EtherType. Unfortunately, a lesson learned in commit 69df578c5f4b ("net: mscc: ocelot: eliminate confusion between CPU and NPI port") seems to have been forgotten in the meanwhile. The CPU port module and the NPI port have independent settings for the length of the tag prefix. However, the driver is using the same variable to program both of them. There is no reason really to use any tag prefix with the CPU port module, since that is not connected to any Ethernet port. So this patch makes the inj_prefix and xtr_prefix variables apply only to the NPI port (which the switchdev ocelot_vsc7514 driver does not use). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: mscc: ocelot: configure watermarks using devlink-sbVladimir Oltean
Using devlink-sb, we can configure 12/16 (the important 75%) of the switch's controlling watermarks for congestion drops, and we can monitor 50% of the watermark occupancies (we can monitor the reservation watermarks, but not the sharing watermarks, which are exposed as pool sizes). The following definitions can be made: SB_BUF=0 # The devlink-sb for frame buffers SB_REF=1 # The devlink-sb for frame references POOL_ING=0 # The pool for ingress traffic. Both devlink-sb instances # have one of these. POOL_EGR=1 # The pool for egress traffic. Both devlink-sb instances # have one of these. Editing the hardware watermarks is done in the following way: BUF_xxxx_I is accessed when sb=$SB_BUF and pool=$POOL_ING REF_xxxx_I is accessed when sb=$SB_REF and pool=$POOL_ING BUF_xxxx_E is accessed when sb=$SB_BUF and pool=$POOL_EGR REF_xxxx_E is accessed when sb=$SB_REF and pool=$POOL_EGR Configuring the sharing watermarks for COL_SHR(dp=0) is done implicitly by modifying the corresponding pool size. By default, the pool size has maximum size, so this can be skipped. devlink sb pool set pci/0000:00:00.5 sb $SB_BUF pool $POOL_ING \ size 129840 thtype static Since by default there is no buffer reservation, the above command has maxed out BUF_COL_SHR_I(dp=0). Configuring the per-port reservation watermark (P_RSRV) is done in the following way: devlink sb port pool set pci/0000:00:00.5/0 sb $SB_BUF \ pool $POOL_ING th 1000 The above command sets BUF_P_RSRV_I(port 0) to 1000 bytes. After this command, the sharing watermarks are internally reconfigured with 1000 bytes less, i.e. from 129840 bytes to 128840 bytes. Configuring the per-port-tc reservation watermarks (Q_RSRV) is done in the following way: for tc in {0..7}; do devlink sb tc bind set pci/0000:00:00.5/0 sb 0 tc $tc \ type ingress pool $POOL_ING \ th 3000 done The above command sets BUF_Q_RSRV_I(port 0, tc 0..7) to 3000 bytes. The sharing watermarks are again reconfigured with 24000 bytes less. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: mscc: ocelot: export NUM_TC constant from felix to common switch libVladimir Oltean
We should be moving anything that isn't DSA-specific or SoC-specific out of the felix DSA driver, and into the common mscc_ocelot switch library. The number of traffic classes is one of the aspects that is common between all ocelot switches, so it belongs in the library. This patch also makes seville use 8 TX queues, and therefore enables prioritization via the QOS_CLASS field in the NPI injection header. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: dsa: felix: perform teardown in reverse order of setupVladimir Oltean
In general it is desirable that cleanup is the reverse process of setup. In this case I am not seeing any particular issue, but with the introduction of devlink-sb for felix, a non-obvious decision had to be made as to where to put its cleanup method. When there's a convention in place, that decision becomes obvious. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: dsa: felix: reindent struct dsa_switch_opsVladimir Oltean
The devlink function pointer names are super long, and they would break the alignment. So reindent the existing ops now by adding one tab. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: mscc: ocelot: auto-detect packet buffer size and number of frame referencesVladimir Oltean
Instead of reading these values from the reference manual and writing them down into the driver, it appears that the hardware gives us the option of detecting them dynamically. The number of frame references corresponds to what the reference manual notes, however it seems that the frame buffers are reported as slightly less than the books would indicate. On VSC9959 (Felix), the books say it should have 128KB of packet buffer, but the registers indicate only 129840 bytes (126.79 KB). Also, the unit of measurement for FREECNT from the documentation of all these devices is incorrect (taken from an older generation). This was confirmed by Younes Leroul from Microchip support. Not having anything better to do with these values at the moment* (this will change soon), let's just print them. *The frame buffer size is, in fact, used to calculate the tail dropping watermarks. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15net: dsa: set configure_vlan_while_not_filtering to true by defaultVladimir Oltean
As explained in commit 54a0ed0df496 ("net: dsa: provide an option for drivers to always receive bridge VLANs"), DSA has historically been skipping VLAN switchdev operations when the bridge wasn't in vlan_filtering mode, but the reason why it was doing that has never been clear. So the configure_vlan_while_not_filtering option is there merely to preserve functionality for existing drivers. It isn't some behavior that drivers should opt into. Ideally, when all drivers leave this flag set, we can delete the dsa_port_skip_vlan_configuration() function. New drivers always seem to omit setting this flag, for some reason. So let's reverse the logic: the DSA core sets it by default to true before the .setup() callback, and legacy drivers can turn it off. This way, new drivers get the new behavior by default, unless they explicitly set the flag to false, which is more obvious during review. Remove the assignment from drivers which were setting it to true, and add the assignment to false for the drivers that didn't previously have it. This way, it should be easier to see how many we have left. The following drivers: lan9303, mv88e6060 were skipped from setting this flag to false, because they didn't have any VLAN offload ops in the first place. The Broadcom Starfighter 2 driver calls the common b53_switch_alloc and therefore also inherits the configure_vlan_while_not_filtering=true behavior. Also, print a message through netlink extack every time a VLAN has been skipped. This is mildly annoying on purpose, so that (a) it is at least clear that VLANs are being skipped - the legacy behavior in itself is confusing, and the extack should be much more difficult to miss, unlike kernel logs - and (b) people have one more incentive to convert to the new behavior. No behavior change except for the added prints is intended at this time. $ ip link add br0 type bridge vlan_filtering 0 $ ip link set sw0p2 master br0 [ 60.315148] br0: port 1(sw0p2) entered blocking state [ 60.320350] br0: port 1(sw0p2) entered disabled state [ 60.327839] device sw0p2 entered promiscuous mode [ 60.334905] br0: port 1(sw0p2) entered blocking state [ 60.340142] br0: port 1(sw0p2) entered forwarding state Warning: dsa_core: skipping configuration of VLAN. # This was the pvid $ bridge vlan add dev sw0p2 vid 100 Warning: dsa_core: skipping configuration of VLAN. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210115231919.43834-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11net: dsa: remove the transactional logic from VLAN objectsVladimir Oltean
It should be the driver's business to logically separate its VLAN offloading into a preparation and a commit phase, and some drivers don't need / can't do this. So remove the transactional shim from DSA and let drivers propagate errors directly from the .port_vlan_add callback. It would appear that the code has worse error handling now than it had before. DSA is the only in-kernel user of switchdev that offloads one switchdev object to more than one port: for every VLAN object offloaded to a user port, that VLAN is also offloaded to the CPU port. So the "prepare for user port -> check for errors -> prepare for CPU port -> check for errors -> commit for user port -> commit for CPU port" sequence appears to make more sense than the one we are using now: "offload to user port -> check for errors -> offload to CPU port -> check for errors", but it is really a compromise. In the new way, we can catch errors from the commit phase that we previously had to ignore. But we have our hands tied and cannot do any rollback now: if we add a VLAN on the CPU port and it fails, we can't do the rollback by simply deleting it from the user port, because the switchdev API is not so nice with us: it could have simply been there already, even with the same flags. So we don't even attempt to rollback anything on addition error, just leave whatever VLANs managed to get offloaded right where they are. This should not be a problem at all in practice. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11net: dsa: remove the transactional logic from MDB entriesVladimir Oltean
For many drivers, the .port_mdb_prepare callback was not a good opportunity to avoid any error condition, and they would suppress errors found during the actual commit phase. Where a logical separation between the prepare and the commit phase existed, the function that used to implement the .port_mdb_prepare callback still exists, but now it is called directly from .port_mdb_add, which was modified to return an int code. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Reviewed-by: Linus Wallei <linus.walleij@linaro.org> # RTL8366 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11net: switchdev: remove the transaction structure from port attributesVladimir Oltean
Since the introduction of the switchdev API, port attributes were transmitted to drivers for offloading using a two-step transactional model, with a prepare phase that was supposed to catch all errors, and a commit phase that was supposed to never fail. Some classes of failures can never be avoided, like hardware access, or memory allocation. In the latter case, merely attempting to move the memory allocation to the preparation phase makes it impossible to avoid memory leaks, since commit 91cf8eceffc1 ("switchdev: Remove unused transaction item queue") which has removed the unused mechanism of passing on the allocated memory between one phase and another. It is time we admit that separating the preparation from the commit phase is something that is best left for the driver to decide, and not something that should be baked into the API, especially since there are no switchdev callers that depend on this. This patch removes the struct switchdev_trans member from switchdev port attribute notifier structures, and converts drivers to not look at this member. In part, this patch contains a revert of my previous commit 2e554a7a5d8a ("net: dsa: propagate switchdev vlan_filtering prepare phase to drivers"). For the most part, the conversion was trivial except for: - Rocker's world implementation based on Broadcom OF-DPA had an odd implementation of ofdpa_port_attr_bridge_flags_set. The conversion was done mechanically, by pasting the implementation twice, then only keeping the code that would get executed during prepare phase on top, then only keeping the code that gets executed during the commit phase on bottom, then simplifying the resulting code until this was obtained. - DSA's offloading of STP state, bridge flags, VLAN filtering and multicast router could be converted right away. But the ageing time could not, so a shim was introduced and this was left for a further commit. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Reviewed-by: Linus Walleij <linus.walleij@linaro.org> # RTL8366RB Reviewed-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11net: switchdev: remove vid_begin -> vid_end range from VLAN objectsVladimir Oltean
The call path of a switchdev VLAN addition to the bridge looks something like this today: nbp_vlan_init | __br_vlan_set_default_pvid | | | | | br_afspec | | | | | | | v | | | br_process_vlan_info | | | | | | | v | | | br_vlan_info | | | / \ / | | / \ / | | / \ / | | / \ / v v v v v nbp_vlan_add br_vlan_add ------+ | ^ ^ | | | / | | | | / / / | \ br_vlan_get_master/ / v \ ^ / / br_vlan_add_existing \ | / / | \ | / / / \ | / / / \ | / / / \ | / / / v | | v / __vlan_add / / | / / | / v | / __vlan_vid_add | / \ | / v v v br_switchdev_port_vlan_add The ranges UAPI was introduced to the bridge in commit bdced7ef7838 ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") (Jan 10 2015). But the VLAN ranges (parsed in br_afspec) have always been passed one by one, through struct bridge_vlan_info tmp_vinfo, to br_vlan_info. So the range never went too far in depth. Then Scott Feldman introduced the switchdev_port_bridge_setlink function in commit 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink"). That marked the introduction of the SWITCHDEV_OBJ_PORT_VLAN, which made full use of the range. But switchdev_port_bridge_setlink was called like this: br_setlink -> br_afspec -> switchdev_port_bridge_setlink Basically, the switchdev and the bridge code were not tightly integrated. Then commit 41c498b9359e ("bridge: restore br_setlink back to original") came, and switchdev drivers were required to implement .ndo_bridge_setlink = switchdev_port_bridge_setlink for a while. In the meantime, commits such as 0944d6b5a2fa ("bridge: try switchdev op first in __vlan_vid_add/del") finally made switchdev penetrate the br_vlan_info() barrier and start to develop the call path we have today. But remember, br_vlan_info() still receives VLANs one by one. Then Arkadi Sharshevsky refactored the switchdev API in 2017 in commit 29ab586c3d83 ("net: switchdev: Remove bridge bypass support from switchdev") so that drivers would not implement .ndo_bridge_setlink any longer. The switchdev_port_bridge_setlink also got deleted. This refactoring removed the parallel bridge_setlink implementation from switchdev, and left the only switchdev VLAN objects to be the ones offloaded from __vlan_vid_add (basically RX filtering) and __vlan_add (the latter coming from commit 9c86ce2c1ae3 ("net: bridge: Notify about bridge VLANs")). That is to say, today the switchdev VLAN object ranges are not used in the kernel. Refactoring the above call path is a bit complicated, when the bridge VLAN call path is already a bit complicated. Let's go off and finish the job of commit 29ab586c3d83 by deleting the bogus iteration through the VLAN ranges from the drivers. Some aspects of this feature never made too much sense in the first place. For example, what is a range of VLANs all having the BRIDGE_VLAN_INFO_PVID flag supposed to mean, when a port can obviously have a single pvid? This particular configuration _is_ denied as of commit 6623c60dc28e ("bridge: vlan: enforce no pvid flag in vlan ranges"), but from an API perspective, the driver still has to play pretend, and only offload the vlan->vid_end as pvid. And the addition of a switchdev VLAN object can modify the flags of another, completely unrelated, switchdev VLAN object! (a VLAN that is PVID will invalidate the PVID flag from whatever other VLAN had previously been offloaded with switchdev and had that flag. Yet switchdev never notifies about that change, drivers are supposed to guess). Nonetheless, having a VLAN range in the API makes error handling look scarier than it really is - unwinding on errors and all of that. When in reality, no one really calls this API with more than one VLAN. It is all unnecessary complexity. And despite appearing pretentious (two-phase transactional model and all), the switchdev API is really sloppy because the VLAN addition and removal operations are not paired with one another (you can add a VLAN 100 times and delete it just once). The bridge notifies through switchdev of a VLAN addition not only when the flags of an existing VLAN change, but also when nothing changes. There are switchdev drivers out there who don't like adding a VLAN that has already been added, and those checks don't really belong at driver level. But the fact that the API contains ranges is yet another factor that prevents this from being addressed in the future. Of the existing switchdev pieces of hardware, it appears that only Mellanox Spectrum supports offloading more than one VLAN at a time, through mlxsw_sp_port_vlan_set. I have kept that code internal to the driver, because there is some more bookkeeping that makes use of it, but I deleted it from the switchdev API. But since the switchdev support for ranges has already been de facto deleted by a Mellanox employee and nobody noticed for 4 years, I'm going to assume it's not a biggie. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> # switchdev and mlxsw Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> # hellcreek Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07net: dsa: ocelot: request DSA to fix up lack of address learning on CPU portVladimir Oltean
Given the following setup: ip link add br0 type bridge ip link set eno0 master br0 ip link set swp0 master br0 ip link set swp1 master br0 ip link set swp2 master br0 ip link set swp3 master br0 Currently, packets received on a DSA slave interface (such as swp0) which should be routed by the software bridge towards a non-switch port (such as eno0) are also flooded towards the other switch ports (swp1, swp2, swp3) because the destination is unknown to the hardware switch. This patch addresses the issue by monitoring the addresses learnt by the software bridge on eno0, and adding/deleting them as static FDB entries on the CPU port accordingly. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
xdp_return_frame_bulk() needs to pass a xdp_buff to __xdp_return(). strlcpy got converted to strscpy but here it makes no functional difference, so just keep the right code. Conflicts: net/netfilter/nf_tables_api.c Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-05net: mscc: ocelot: fix dropping of unknown IPv4 multicast on SevilleVladimir Oltean
The current assumption is that the felix DSA driver has flooding knobs per traffic class, while ocelot switchdev has a single flooding knob. This was correct for felix VSC9959 and ocelot VSC7514, but with the introduction of seville VSC9953, we see a switch driven by felix.c which has a single flooding knob. So it is clear that we must do what should have been done from the beginning, which is not to overwrite the configuration done by ocelot.c in felix, but instead to teach the common ocelot library about the differences in our switches, and set up the flooding PGIDs centrally. The effect that the bogus iteration through FELIX_NUM_TC has upon seville is quite dramatic. ANA_FLOODING is located at 0x00b548, and ANA_FLOODING_IPMC is located at 0x00b54c. So the bogus iteration will actually overwrite ANA_FLOODING_IPMC when attempting to write ANA_FLOODING[1]. There is no ANA_FLOODING[1] in sevile, just ANA_FLOODING. And when ANA_FLOODING_IPMC is overwritten with a bogus value, the effect is that ANA_FLOODING_IPMC gets the value of 0x0003CF7D: MC6_DATA = 61, MC6_CTRL = 61, MC4_DATA = 60, MC4_CTRL = 0. Because MC4_CTRL is zero, this means that IPv4 multicast control packets are not flooded, but dropped. An invalid configuration, and this is how the issue was actually spotted. Reported-by: Eldar Gasanov <eldargasanov2@gmail.com> Reported-by: Maxim Kochetkov <fido_max@inbox.ru> Tested-by: Eldar Gasanov <eldargasanov2@gmail.com> Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 3c7b51bd39b2 ("net: dsa: felix: allow flooding for all traffic classes") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20201204175416.1445937-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02net: dsa: felix: improve the workaround for multiple native VLANs on NPI portVladimir Oltean
After the good discussion with Florian from here: https://lore.kernel.org/netdev/20200911000337.htwr366ng3nc3a7d@skbuf/ I realized that the VLAN settings on the NPI port (the hardware "CPU port", in DSA parlance) don't actually make any difference, because that port is hardcoded in hardware to use what mv88e6xxx would call "unmodified" egress policy for VLANs. So earlier patch 183be6f967fe ("net: dsa: felix: send VLANs on CPU port as egress-tagged") was incorrect in the sense that it didn't actually make the VLANs be sent on the NPI port as egress-tagged. It only made ocelot_port_set_native_vlan shut up. Now that we have moved the check from ocelot_port_set_native_vlan to ocelot_vlan_prepare, we can simply shunt ocelot_vlan_prepare from DSA, and avoid calling it. This is the correct way to deal with things, because the NPI port configuration is DSA-specific, so the ocelot switch library should not have the check for multiple native VLANs refined in any way, it is correct the way it is. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-02net: mscc: ocelot: deny changing the native VLAN from the prepare phaseVladimir Oltean
Put the preparation phase of switchdev VLAN objects to some good use, and move the check we already had, for preventing the existence of more than one egress-untagged VLAN per port, to the preparation phase of the addition. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-05net: dsa: propagate switchdev vlan_filtering prepare phase to driversVladimir Oltean
A driver may refuse to enable VLAN filtering for any reason beyond what the DSA framework cares about, such as: - having tc-flower rules that rely on the switch being VLAN-aware - the particular switch does not support VLAN, even if the driver does (the DSA framework just checks for the presence of the .port_vlan_add and .port_vlan_del pointers) - simply not supporting this configuration to be toggled at runtime Currently, when a driver rejects a configuration it cannot support, it does this from the commit phase, which triggers various warnings in switchdev. So propagate the prepare phase to drivers, to give them the ability to refuse invalid configurations cleanly and avoid the warnings. Since we need to modify all function prototypes and check for the prepare phase from within the drivers, take that opportunity and move the existing driver restrictions within the prepare phase where that is possible and easy. Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Cc: Hauke Mehrtens <hauke@hauke-m.de> Cc: Woojung Huh <woojung.huh@microchip.com> Cc: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com> Cc: Sean Wang <sean.wang@mediatek.com> Cc: Landen Chao <Landen.Chao@mediatek.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Vivien Didelot <vivien.didelot@gmail.com> Cc: Jonathan McDowell <noodles@earth.li> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-10-02net: mscc: ocelot: introduce conversion helpers between port and netdevVladimir Oltean
Since the mscc_ocelot_switch_lib is common between a pure switchdev and a DSA driver, the procedure of retrieving a net_device for a certain port index differs, as those are registered by their individual front-ends. Up to now that has been dealt with by always passing the port index to the switch library, but now, we're going to need to work with net_device pointers from the tc-flower offload, for things like indev, or mirred. It is not desirable to refactor that, so let's make sure that the flower offload core has the ability to translate between a net_device and a port index properly. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29net: mscc: ocelot: generalize existing code for VCAPVladimir Oltean
In the Ocelot switches there are 3 TCAMs: VCAP ES0, IS1 and IS2, which have the same configuration interface, but different sets of keys and actions. The driver currently only supports VCAP IS2. In preparation of VCAP IS1 and ES0 support, the existing code must be generalized to work with any VCAP. In that direction, we should move the structures that depend upon VCAP instantiation, like vcap_is2_keys and vcap_is2_actions, out of struct ocelot and into struct vcap_props .keys and .actions, a structure that is replicated 3 times, once per VCAP. We'll pass that structure as an argument to each function that does the key and action packing - only the control logic needs to distinguish between ocelot->vcap[VCAP_IS2] or IS1 or ES0. Another change is to make use of the newly introduced ocelot_target_read and ocelot_target_write API, since the 3 VCAPs have the same registers but put at different addresses. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26net: dsa: tag_ocelot: use a short prefix on both ingress and egressVladimir Oltean
There are 2 goals that we follow: - Reduce the header size - Make the header size equal between RX and TX The issue that required long prefix on RX was the fact that the ocelot DSA tag, being put before Ethernet as it is, would overlap with the area that a DSA master uses for RX filtering (destination MAC address mainly). Now that we can ask DSA to put the master in promiscuous mode, in theory we could remove the prefix altogether and call it a day, but it looks like we can't. Using no prefix on ingress, some packets (such as ICMP) would be received, while others (such as PTP) would not be received. This is because the DSA master we use (enetc) triggers parse errors ("MAC rx frame errors") presumably because it sees Ethernet frames with a bad length. And indeed, when using no prefix, the EtherType (bytes 12-13 of the frame, bits 96-111) falls over the REW_VAL field from the extraction header, aka the PTP timestamp. When turning the short (32-bit) prefix on, the EtherType overlaps with bits 64-79 of the extraction header, which are a reserved area transmitted as zero by the switch. The packets are not dropped by the DSA master with a short prefix. Actually, the frames look like this in tcpdump (below is a PTP frame, with an extra dsa_8021q tag - dadb 0482 - added by a downstream sja1105). 89:0c:a9:f2:01:00 > 88:80:00:0a:00:1d, 802.3, length 0: LLC, \ dsap Unknown (0x10) Individual, ssap ProWay NM (0x0e) Response, \ ctrl 0x0004: Information, send seq 2, rcv seq 0, \ Flags [Response], length 78 0x0000: 8880 000a 001d 890c a9f2 0100 0000 100f ................ 0x0010: 0400 0000 0180 c200 000e 001f 7b63 0248 ............{c.H 0x0020: dadb 0482 88f7 1202 0036 0000 0000 0000 .........6...... 0x0030: 0000 0000 0000 0000 0000 001f 7bff fe63 ............{..c 0x0040: 0248 0001 1f81 0500 0000 0000 0000 0000 .H.............. 0x0050: 0000 0000 0000 0000 0000 0000 ............ So the short prefix is our new default: we've shortened our RX frames by 12 octets, increased TX by 4, and headers are now equal between RX and TX. Note that we still need promiscuous mode for the DSA master to not drop it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26net: mscc: ocelot: move NPI port configuration to DSAVladimir Oltean
Remove the ocelot_configure_cpu() function, which was in fact bringing up 2 ports: the CPU port module, which both switchdev and DSA have, and the NPI port, which only DSA has. The (non-Ethernet) CPU port module is at a fixed index in the analyzer, whereas the NPI port is selected through the "ethernet" property in the device tree. Therefore, the function to set up an NPI port is DSA-specific, so we move it there, simplifying the ocelot switch library a little bit. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Cc: UNGLinuxDriver <UNGLinuxDriver@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-24net: mscc: ocelot: always pass skb clone to ocelot_port_add_txtstamp_skbVladimir Oltean
Currently, ocelot switchdev passes the skb directly to the function that enqueues it to the list of skb's awaiting a TX timestamp. Whereas the felix DSA driver first clones the skb, then passes the clone to this queue. This matters because in the case of felix, the common IRQ handler, which is ocelot_get_txtstamp(), currently clones the clone, and frees the original clone. This is useless and can be simplified by using skb_complete_tx_timestamp() instead of skb_tstamp_tx(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>