summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice/ice_lib.c
AgeCommit message (Collapse)Author
2022-04-05ice: Set txq_teid to ICE_INVAL_TEID on ring creationAnatolii Gerasymenko
When VF is freshly created, but not brought up, ring->txq_teid value is by default set to 0. But 0 is a valid TEID. On some platforms the Root Node of Tx scheduler has a TEID = 0. This can cause issues as shown below. The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF). Testing Hints: echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs ip link set dev ens785f0v0 up ip link set dev ens785f0v0 down If we have freshly created VF and quickly turn it on and off, so there would be no time to reach VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then VIRTCHNL_OP_DISABLE_QUEUES stage will fail with error: [ 639.531454] disable queue 89 failed 14 [ 639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR [ 639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5 The reason for the fail is that we are trying to send AQ command to delete queue 89, which has never been created and receive an "invalid argument" error from firmware. As this queue has never been created, it's teid and ring->txq_teid have default value 0. ice_dis_vsi_txq has a check against non-existent queues: node = ice_sched_find_node_by_teid(pi->root, q_teids[i]); if (!node) continue; But on some platforms the Root Node of Tx scheduler has a teid = 0. Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is pi->root), and we go further to submit an erroneous request to firmware. Fixes: 37bb83901286 ("ice: Move common functions out of ice_main.c part 7/7") Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-01ice: Clear default forwarding VSI during VSI releaseIvan Vecera
VSI is set as default forwarding one when promisc mode is set for PF interface, when PF is switched to switchdev mode or when VF driver asks to enable allmulticast or promisc mode for the VF interface (when vf-true-promisc-support priv flag is off). The third case is buggy because in that case VSI associated with VF remains as default one after VF removal. Reproducer: 1. Create VF echo 1 > sys/class/net/ens7f0/device/sriov_numvfs 2. Enable allmulticast or promisc mode on VF ip link set ens7f0v0 allmulticast on ip link set ens7f0v0 promisc on 3. Delete VF echo 0 > sys/class/net/ens7f0/device/sriov_numvfs 4. Try to enable promisc mode on PF ip link set ens7f0 promisc on Although it looks that promisc mode on PF is enabled the opposite is true because ice_vsi_sync_fltr() responsible for IFF_PROMISC handling first checks if any other VSI is set as default forwarding one and if so the function does not do anything. At this point it is not possible to enable promisc mode on PF without re-probe device. To resolve the issue this patch clear default forwarding VSI during ice_vsi_release() when the VSI to be released is the default one. Fixes: 01b5e89aab49 ("ice: Add VF promiscuous support") Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Alice Michael <alice.michael@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-09ice: change "can't set link" message to dbg levelJonathan Toppins
In the case where the link is owned by manageability, the firmware is not allowed to set the link state, so an error code is returned. This however is non-fatal and there is nothing the operator can do, so instead of confusing the operator with messages they can do nothing about hide this message behind the debug log level. Signed-off-by: Jonathan Toppins <jtoppins@redhat.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: convert VF storage to hash table with krefs and RCUJacob Keller
The ice driver stores VF structures in a simple array which is allocated once at the time of VF creation. The VF structures are then accessed from the array by their VF ID. The ID must be between 0 and the number of allocated VFs. Multiple threads can access this table: * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust * interrupts, such as due to messages from the VF using the virtchnl communication * processing such as device reset * commands to add or remove VFs The current implementation does not keep track of when all threads are done operating on a VF and can potentially result in use-after-free issues caused by one thread accessing a VF structure after it has been released when removing VFs. Some of these are prevented with various state flags and checks. In addition, this structure is quite static and does not support a planned future where virtualization can be more dynamic. As we begin to look at supporting Scalable IOV with the ice driver (as opposed to just supporting Single Root IOV), this structure is not sufficient. In the future, VFs will be able to be added and removed individually and dynamically. To allow for this, and to better protect against a whole class of use-after-free bugs, replace the VF storage with a combination of a hash table and krefs to reference track all of the accesses to VFs through the hash table. A hash table still allows efficient look up of the VF given its ID, but also allows adding and removing VFs. It does not require contiguous VF IDs. The use of krefs allows the cleanup of the VF memory to be delayed until after all threads have released their reference (by calling ice_put_vf). To prevent corruption of the hash table, a combination of RCU and the mutex table_lock are used. Addition and removal from the hash table use the RCU-aware hash macros. This allows simple read-only look ups that iterate to locate a single VF can be fast using RCU. Accesses which modify the hash table, or which can't take RCU because they sleep, will hold the mutex lock. By using this design, we have a stronger guarantee that the VF structure can't be released until after all threads are finished operating on it. We also pave the way for the more dynamic Scalable IOV implementation in the future. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: introduce VF accessor functionsJacob Keller
Before we switch the VF data structure storage mechanism to a hash, introduce new accessor functions to define the new interface. * ice_get_vf_by_id is a function used to obtain a reference to a VF from the table based on its VF ID * ice_has_vfs is used to quickly check if any VFs are configured * ice_get_num_vfs is used to get an exact count of how many VFs are configured We can drop the old ice_validate_vf_id function, since every caller was just going to immediately access the VF table to get a reference anyways. This way we simply use the single ice_get_vf_by_id to both validate the VF ID is within range and that there exists a VF with that ID. This change enables us to more easily convert the codebase to the hash table since most callers now properly use the interface. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: factor VF variables to separate structureJacob Keller
We maintain a number of values for VFs within the ice_pf structure. This includes the VF table, the number of allocated VFs, the maximum number of supported SR-IOV VFs, the number of queue pairs per VF, the number of MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events. We're about to add a few more variables to this list. Clean this up first by extracting these members out into a new ice_vfs structure defined in ice_virtchnl_pf.h Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: convert ice_for_each_vf to include VF entry iteratorJacob Keller
The ice_for_each_vf macro is intended to be used to loop over all VFs. The current implementation relies on an iterator that is the index into the VF array in the PF structure. This forces all users to perform a look up themselves. This abstraction forces a lot of duplicate work on callers and leaks the interface implementation to the caller. Replace this with an implementation that includes the VF pointer the primary iterator. This version simplifies callers which just want to iterate over every VF, as they no longer need to perform their own lookup. The "i" iterator value is replaced with a new unsigned int "bkt" parameter, as this will match the necessary interface for replacing the VF array with a hash table. For now, the bkt is the VF ID, but in the future it will simply be the hash bucket index. Document that it should not be treated as a VF ID. This change aims to simplify switching from the array to a hash table. I considered alternative implementations such as an xarray but decided that the hash table was the simplest and most suitable implementation. I also looked at methods to hide the bkt iterator entirely, but I couldn't come up with a feasible solution that worked for hash table iterators. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: store VF pointer instead of VF IDJacob Keller
The VSI structure contains a vf_id field used to associate a VSI with a VF. This is used mainly for ICE_VSI_VF as well as partially for ICE_VSI_CTRL associated with the VFs. This API was designed with the idea that VFs are stored in a simple array that was expected to be static throughout most of the driver's life. We plan on refactoring VF storage in a few key ways: 1) converting from a simple static array to a hash table 2) using krefs to track VF references obtained from the hash table 3) use RCU to delay release of VF memory until after all references are dropped This is motivated by the goal to ensure that the lifetime of VF structures is accounted for, and prevent various use-after-free bugs. With the existing vsi->vf_id, the reference tracking for VFs would become somewhat convoluted, because each VSI maintains a vf_id field which will then require performing a look up. This means all these flows will require reference tracking and proper usage of rcu_read_lock, etc. We know that the VF VSI will always be backed by a valid VF structure, because the VSI is created during VF initialization and removed before the VF is destroyed. Rely on this and store a reference to the VF in the VSI structure instead of storing a VF ID. This will simplify the usage and avoid the need to perform lookups on the hash table in the future. For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a vsi->vf pointer is valid when dealing with VF VSIs. This will aid in debugging code which violates this assumption and avoid more disastrous panics. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03ice: add TTY for GNSS module for E810T deviceKarol Kolacinski
Add a new ice_gnss.c file for holding the basic GNSS module functions. If the device supports GNSS module, call the new ice_gnss_init and ice_gnss_release functions where appropriate. Implement basic functionality for reading the data from GNSS module using TTY device. Add I2C read AQ command. It is now required for controlling the external physical connectors via external I2C port expander on E810-T adapters. Future changes will introduce write functionality. Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Sudhansu Sekhar Mishra <sudhansu.mishra@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-14ice: Simplify tracking status of RDMA supportDave Ertman
The status of support for RDMA is currently being tracked with two separate status flags. This is unnecessary with the current state of the driver. Simplify status tracking down to a single flag. Rename the helper function to denote the RDMA specific status and universally use the helper function to test the status bit. Signed-off-by: Dave Ertman <david.m.ertman@intel.com> Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14ice: enable parsing IPSEC SPI headers for RSSJesse Brandeburg
The COMMS package can enable the hardware parser to recognize IPSEC frames with ESP header and SPI identifier. If this package is available and configured for loading in /lib/firmware, then the driver will succeed in enabling this protocol type for RSS. This in turn allows the hardware to hash over the SPI and use it to pick a consistent receive queue for the same secure flow. Without this all traffic is steered to the same queue for multiple traffic threads from the same IP address. For that reason this is marked as a fix, as the driver supports the model, but it wasn't enabled. If the package is not available, adding this type will fail, but the failure is ignored on purpose as it has no negative affect. Fixes: c90ed40cefe1 ("ice: Enable writing hardware filtering tables") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-09ice: Advertise 802.1ad VLAN filtering and offloads for PF netdevBrett Creeley
In order for the driver to support 802.1ad VLAN filtering and offloads, it needs to advertise those VLAN features and also support modifying those VLAN features, so make the necessary changes to ice_set_netdev_features(). By default, enable CTAG insertion/stripping and CTAG filtering for both Single and Double VLAN Modes (SVM/DVM). Also, in DVM, enable STAG filtering by default. This is done by setting the feature bits in netdev->features. Also, in DVM, support toggling of STAG insertion/stripping, but don't enable them by default. This is done by setting the feature bits in netdev->hw_features. Since 802.1ad VLAN filtering and offloads are only supported in DVM, make sure they are not enabled by default and that they cannot be enabled during runtime, when the device is in SVM. Add an implementation for the ndo_fix_features() callback. This is needed since the hardware cannot support multiple VLAN ethertypes for VLAN insertion/stripping simultaneously and all supported VLAN filtering must either be enabled or disabled together. Disable inner VLAN stripping by default when DVM is enabled. If a VSI supports stripping the inner VLAN in DVM, then it will have to configure that during runtime. For example if a VF is configured in a port VLAN while DVM is enabled it will be allowed to offload inner VLANs. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add hot path support for 802.1Q and 802.1ad VLAN offloadsBrett Creeley
Currently the driver only supports 802.1Q VLAN insertion and stripping. However, once Double VLAN Mode (DVM) is fully supported, then both 802.1Q and 802.1ad VLAN insertion and stripping will be supported. Unfortunately the VSI context parameters only allow for one VLAN ethertype at a time for VLAN offloads so only one or the other VLAN ethertype offload can be supported at once. To support this, multiple changes are needed. Rx path changes: [1] In DVM, the Rx queue context l2tagsel field needs to be cleared so the outermost tag shows up in the l2tag2_2nd field of the Rx flex descriptor. In Single VLAN Mode (SVM), the l2tagsel field should remain 1 to support SVM configurations. [2] Modify the ice_test_staterr() function to take a __le16 instead of the ice_32b_rx_flex_desc union pointer so this function can be used for both rx_desc->wb.status_error0 and rx_desc->wb.status_error1. [3] Add the new inline function ice_get_vlan_tag_from_rx_desc() that checks if there is a VLAN tag in l2tag1 or l2tag2_2nd. [4] In ice_receive_skb(), add a check to see if NETIF_F_HW_VLAN_STAG_RX is enabled in netdev->features. If it is, then this is the VLAN ethertype that needs to be added to the stripping VLAN tag. Since ice_fix_features() prevents CTAG_RX and STAG_RX from being enabled simultaneously, the VLAN ethertype will only ever be 802.1Q or 802.1ad. Tx path changes: [1] In DVM, the VLAN tag needs to be placed in the l2tag2 field of the Tx context descriptor. The new define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN was added to the list of tx_flags to handle this case. [2] When the stack requests the VLAN tag to be offloaded on Tx, the driver needs to set either ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN or ICE_TX_FLAGS_HW_VLAN, so the tag is inserted in l2tag2 or l2tag1 respectively. To determine which location to use, set a bit in the Tx ring flags field during ring allocation that can be used to determine which field to use in the Tx descriptor. In DVM, always use l2tag2, and in SVM, always use l2tag1. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add outer_vlan_ops and VSI specific VLAN ops implementationsBrett Creeley
Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN ops are only available when the device is in Double VLAN Mode (DVM). Depending on the VSI type, the requirements for what operations to use/allow differ. By default all VSI's have unsupported inner and outer VSI VLAN ops. This implementation was chosen to prevent unexpected crashes due to null pointer dereferences. Instead, if a VSI calls an unsupported op, it will just return -EOPNOTSUPP. Add implementations to support modifying outer VLAN fields for VSI context. This includes the ability to modify VLAN stripping, insertion, and the port VLAN based on the outer VLAN handling fields of the VSI context. These functions should only ever be used if DVM is enabled because that means the firmware supports the outer VLAN fields in the VSI context. If the device is in DVM, then always use the outer_vlan_ops, else use the vlan_ops since the device is in Single VLAN Mode (SVM). Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to ice_vsi_vlan_setup() as the latter function is specific to the PF and all other VSI types that need an untagged VLAN 0 filter already do this in their specific flows. Without this change, Flow Director is failing to initialize because it does not implement any VSI VLAN ops. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Adjust naming for inner VLAN operationsBrett Creeley
Current operations act on inner VLAN fields. To support double VLAN, outer VLAN operations and functions will be implemented. Add the "inner" naming to existing VLAN operations to distinguish them from the upcoming outer values and functions. Some spacing adjustments are made to align values. Note that the inner is not talking about a tunneled VLAN, but the second VLAN in the packet. For SVM the driver uses inner or single VLAN filtering and offloads and in Double VLAN Mode the driver uses the inner filtering and offloads for SR-IOV VFs in port VLANs in order to support offloading the guest VLAN while a port VLAN is configured. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Use the proto argument for VLAN opsBrett Creeley
Currently the proto argument is unused. This is because the driver only supports 802.1Q VLAN filtering. This policy is enforced via netdev features that the driver sets up when configuring the netdev, so the proto argument won't ever be anything other than 802.1Q. However, this will allow for future iterations of the driver to seemlessly support 802.1ad filtering. Begin using the proto argument and extend the related structures to support its use. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Introduce ice_vlan structBrett Creeley
Add a new struct for VLAN related information. Currently this holds VLAN ID and priority values, but will be expanded to hold TPID value. This reduces the changes necessary if any other values are added in future. Remove the action argument from these calls as it's always ICE_FWD_VSI. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add new VSI VLAN opsBrett Creeley
Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and offloads require more flexibility when configuring VLANs. The VSI VLAN interface will allow flexibility for configuring VLANs for all VSI types. Add new files to separate the VSI VLAN ops and move functions to make the code more organized. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Add helper function for adding VLAN 0Brett Creeley
There are multiple places where VLAN 0 is being added. Create a function to be called in order to minimize changes as the implementation is expanded to support double VLAN and avoid duplicated code. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09ice: Refactor spoofcheck configuration functionsBrett Creeley
Add functions to configure Tx VLAN antispoof based on iproute configuration and/or VLAN mode and VF driver support. This is needed later so the driver can control when it can be configured. Also, add functions that can be used to enable and disable MAC and VLAN spoofcheck. Move spoofchk configuration during VSI setup into the SR-IOV initialization path and into the post VSI rebuild flow for VF VSIs. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-30ice: Add flow director support for channel modeKiran Patil
Add support to enable flow-director filter when multiple TCs are configured. Flow director filter can be configured using ethtool (--config-ntuple option). When multiple TCs are configured, each TC is mapped to an unique HW VSI. So VSI corresponding to queue used in filter is identified and flow director context is updated with correct VSI while configuring ntuple filter in HW. Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-14ice: Propagate error codesTony Nguyen
As all functions now return standard error codes, propagate the values being returned instead of converting them to generic values. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14ice: Remove excess error variablesTony Nguyen
ice_status previously had a variable to contain these values where other error codes had a variable as well. With ice_status now being an int, there is no need for two variables to hold error values. In cases where this occurs, remove one of the excess variables and use a single one. Some initialization of variables are no longer needed and have been removed. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14ice: Cleanup after ice_status removalTony Nguyen
Clean up code after changing ice_status to int. Rearrange to fix reverse Christmas tree and pull lines up where applicable. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14ice: Remove enum ice_statusTony Nguyen
Replace uses of ice_status to, as equivalent as possible, error codes. Remove enum ice_status and its helper conversion function as they are no longer needed. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14ice: Use int for ice_statusTony Nguyen
To prepare for removal of ice_status, change the variables from ice_status to int. This eases the transition when values are changed to return standard int error codes over enum ice_status. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-12-14ice: Remove string printing for ice_statusTony Nguyen
Remove the ice_stat_str() function which prints the string representation of the ice_status error code. With upcoming changes moving away from ice_status, there will be no need for this function. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com>
2021-11-22ice: fix vsi->txq_map sizingMaciej Fijalkowski
The approach of having XDP queue per CPU regardless of user's setting exposed a hidden bug that could occur in case when Rx queue count differ from Tx queue count. Currently vsi->txq_map's size is equal to the doubled vsi->alloc_txq, which is not correct due to the fact that XDP rings were previously based on the Rx queue count. Below splat can be seen when ethtool -L is used and XDP rings are configured: [ 682.875339] BUG: kernel NULL pointer dereference, address: 000000000000000f [ 682.883403] #PF: supervisor read access in kernel mode [ 682.889345] #PF: error_code(0x0000) - not-present page [ 682.895289] PGD 0 P4D 0 [ 682.898218] Oops: 0000 [#1] PREEMPT SMP PTI [ 682.903055] CPU: 42 PID: 2878 Comm: ethtool Tainted: G OE 5.15.0-rc5+ #1 [ 682.912214] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016 [ 682.923380] RIP: 0010:devres_remove+0x44/0x130 [ 682.928527] Code: 49 89 f4 55 48 89 fd 4c 89 ff 53 48 83 ec 10 e8 92 b9 49 00 48 8b 9d a8 02 00 00 48 8d 8d a0 02 00 00 49 89 c2 48 39 cb 74 0f <4c> 3b 63 10 74 25 48 8b 5b 08 48 39 cb 75 f1 4c 89 ff 4c 89 d6 e8 [ 682.950237] RSP: 0018:ffffc90006a679f0 EFLAGS: 00010002 [ 682.956285] RAX: 0000000000000286 RBX: ffffffffffffffff RCX: ffff88908343a370 [ 682.964538] RDX: 0000000000000001 RSI: ffffffff81690d60 RDI: 0000000000000000 [ 682.972789] RBP: ffff88908343a0d0 R08: 0000000000000000 R09: 0000000000000000 [ 682.981040] R10: 0000000000000286 R11: 3fffffffffffffff R12: ffffffff81690d60 [ 682.989282] R13: ffffffff81690a00 R14: ffff8890819807a8 R15: ffff88908343a36c [ 682.997535] FS: 00007f08c7bfa740(0000) GS:ffff88a03fd00000(0000) knlGS:0000000000000000 [ 683.006910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 683.013557] CR2: 000000000000000f CR3: 0000001080a66003 CR4: 00000000003706e0 [ 683.021819] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 683.030075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 683.038336] Call Trace: [ 683.041167] devm_kfree+0x33/0x50 [ 683.045004] ice_vsi_free_arrays+0x5e/0xc0 [ice] [ 683.050380] ice_vsi_rebuild+0x4c8/0x750 [ice] [ 683.055543] ice_vsi_recfg_qs+0x9a/0x110 [ice] [ 683.060697] ice_set_channels+0x14f/0x290 [ice] [ 683.065962] ethnl_set_channels+0x333/0x3f0 [ 683.070807] genl_family_rcv_msg_doit+0xea/0x150 [ 683.076152] genl_rcv_msg+0xde/0x1d0 [ 683.080289] ? channels_prepare_data+0x60/0x60 [ 683.085432] ? genl_get_cmd+0xd0/0xd0 [ 683.089667] netlink_rcv_skb+0x50/0xf0 [ 683.094006] genl_rcv+0x24/0x40 [ 683.097638] netlink_unicast+0x239/0x340 [ 683.102177] netlink_sendmsg+0x22e/0x470 [ 683.106717] sock_sendmsg+0x5e/0x60 [ 683.110756] __sys_sendto+0xee/0x150 [ 683.114894] ? handle_mm_fault+0xd0/0x2a0 [ 683.119535] ? do_user_addr_fault+0x1f3/0x690 [ 683.134173] __x64_sys_sendto+0x25/0x30 [ 683.148231] do_syscall_64+0x3b/0xc0 [ 683.161992] entry_SYSCALL_64_after_hwframe+0x44/0xae Fix this by taking into account the value that num_possible_cpus() yields in addition to vsi->alloc_txq instead of doubling the latter. Fixes: efc2214b6047 ("ice: Add support for XDP") Fixes: 22bf877e528f ("ice: introduce XDP_TX fallback path") Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-29ice: Remove boolean vlan_promisc flag from functionBrett Creeley
Currently, the vlan_promisc flag is used exclusively by VF VSI to determine whether or not to toggle VLAN pruning along with trusted/true-promiscuous mode. This is not needed for a couple of reasons. First, trusted/true-promiscuous mode is only supposed to allow all MAC filters within VLANs that a VF has added filters for, so VLAN pruning should not be disabled. Second, the boolean argument makes the function confusing and unintuitive. Remove this flag. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Tony Brelinski <tony.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-28ice: Fix clang -Wimplicit-fallthrough in ice_pull_qvec_from_rc()Nathan Chancellor
Clang warns: drivers/net/ethernet/intel/ice/ice_lib.c:1906:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough] default: ^ drivers/net/ethernet/intel/ice/ice_lib.c:1906:2: note: insert 'break;' to avoid fall-through default: ^ break; 1 error generated. Clang is a little more pedantic than GCC, which does not warn when falling through to a case that is just break or return. Clang's version is more in line with the kernel's own stance in deprecated.rst, which states that all switch/case blocks must end in either break, fallthrough, continue, goto, or return. Add the missing break to silence the warning. Link: https://github.com/ClangBuiltLinux/linux/issues/1482 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller
Lots of simnple overlapping additions. With a build fix from Stephen Rothwell. Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-20ice: enable ndo_setup_tc support for mqprio_qdiscKiran Patil
Add support in driver for TC_QDISC_SETUP_MQPRIO. This support enables instantiation of channels in HW using existing MQPRIO infrastructure which is extended to be offloadable. This provides a mechanism to configure dedicated set of queues for each TC. Configuring channels using "tc mqprio": -------------------------------------- tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \ queues 4@0 4@4 4@8 hw 1 mode channel Above command configures 3 TCs having 4 queues each. "hw 1 mode channel" implies offload of channel configuration to HW. When driver processes configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each TC maps to HW VSI with specified queues. User can optionally specify bandwidth min and max rate limit per TC (see example below). If shaper params like min and/or max bandwidth rate limit are specified, driver configures VSI specific rate limiter in HW. Configuring channels and bandwidth shaper parameters using "tc mqprio": ---------------------------------------------------------------- tc qdisc add dev <ethX> root mqprio \ num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \ shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \ max_rate 4Gbit 5Gbit 6Gbit 7Gbit Command to view configured TCs: ----------------------------- tc qdisc show dev <ethX> Deleting TCs: ------------ tc qdisc del dev <ethX> root mqprio Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-20ice: Add infrastructure for mqprio support via ndo_setup_tcKiran Patil
Add infrastructure required for "ndo_setup_tc:qdisc_mqprio". ice_vsi_setup is modified to configure traffic classes based on mqprio data received from the stack. This includes low-level functions to configure min, max rate-limit parameters in hardware for traffic classes. Each traffic class gets mapped to a hardware channel (VSI) which can be individually configured with different bandwidth parameters. Co-developed-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Kiran Patil <kiran.patil@intel.com> Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: fix rate limit update after coalesce changeJesse Brandeburg
If the adaptive settings are changed with ethtool -C ethx adaptive-rx off adaptive-tx off then the interrupt rate limit should be maintained as a user set value, but only if BOTH adaptive settings are off. Fix a bug where the rate limit that was being used in adaptive mode was staying set in the register but was not reported correctly by ethtool -c ethx. Due to long lines include a small refactor of q_vector variable. Fixes: b8b4772377dd ("ice: refactor interrupt moderation writes") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: update dim usage and moderationJesse Brandeburg
The driver was having trouble with unreliable latency when doing single threaded ping-pong tests. This was root caused to the DIM algorithm landing on a too slow interrupt value, which caused high latency, and it was especially present when queues were being switched frequently by the scheduler as happens on default setups today. In attempting to improve this, we allow the upper rate limit for interrupts to move to rate limit of 4 microseconds as a max, which means that no vector can generate more than 250,000 interrupts per second. The old config was up to 100,000. The driver previously tried to program the rate limit too frequently and if the receive and transmit side were both active on the same vector, the INTRL would be set incorrectly, and this change fixes that issue as a side effect of the redesign. This driver will operate from now on with a slightly changed DIM table with more emphasis towards latency sensitivity by having more table entries with lower latency than with high latency (high being >= 64 microseconds). The driver also resets the DIM algorithm state with a new stats set when there is no work done and the data becomes stale (older than 1 second), for the respective receive or transmit portion of the interrupt. Add a new helper for setting rate limit, which will be used more in a followup patch. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19ice: Add support for VF rate limitingBrett Creeley
Implement ndo_set_vf_rate to support setting of min_tx_rate and max_tx_rate; set the appropriate bandwidth in the scheduler for the node representing the specified VF VSI. Co-developed-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Tarun Singh <tarun.k.singh@intel.com> Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: make use of ice_for_each_* macrosMaciej Fijalkowski
Go through the code base and use ice_for_each_* macros. While at it, introduce ice_for_each_xdp_txq() macro that can be used for looping over xdp_rings array. Commit is not introducing any new functionality. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: introduce XDP_TX fallback pathMaciej Fijalkowski
Under rare circumstances there might be a situation where a requirement of having XDP Tx queue per CPU could not be fulfilled and some of the Tx resources have to be shared between CPUs. This yields a need for placing accesses to xdp_ring inside a critical section protected by spinlock. These accesses happen to be in the hot path, so let's introduce the static branch that will be triggered from the control plane when driver could not provide Tx queue dedicated for XDP on each CPU. Currently, the design that has been picked is to allow any number of XDP Tx queues that is at least half of a count of CPUs that platform has. For lower number driver will bail out with a response to user that there were not enough Tx resources that would allow configuring XDP. The sharing of rings is signalled via static branch enablement which in turn indicates that lock for xdp_ring accesses needs to be taken in hot path. Approach based on static branch has no impact on performance of a non-fallback path. One thing that is needed to be mentioned is a fact that the static branch will act as a global driver switch, meaning that if one PF got out of Tx resources, then other PFs that ice driver is servicing will suffer. However, given the fact that HW that ice driver is handling has 1024 Tx queues per each PF, this is currently an unlikely scenario. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: unify xdp_rings accessesMaciej Fijalkowski
There has been a long lasting issue of improper xdp_rings indexing for XDP_TX and XDP_REDIRECT actions. Given that currently rx_ring->q_index is mixed with smp_processor_id(), there could be a situation where Tx descriptors are produced onto XDP Tx ring, but tail is never bumped - for example pin a particular queue id to non-matching IRQ line. Address this problem by ignoring the user ring count setting and always initialize the xdp_rings array to be of num_possible_cpus() size. Then, always use the smp_processor_id() as an index to xdp_rings array. This provides serialization as at given time only a single softirq can run on a particular CPU. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: split ice_ring onto Tx/Rx separate structsMaciej Fijalkowski
While it was convenient to have a generic ring structure that served both Tx and Rx sides, next commits are going to introduce several Tx-specific fields, so in order to avoid hurting the Rx side, let's pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs. Rx ring could be handled by the old ice_ring which would reduce the code churn within this patch, but this would make things asymmetric. Make the union out of the ring container within ice_q_vector so that it is possible to iterate over newly introduced ice_tx_ring. Remove the @size as it's only accessed from control path and it can be calculated pretty easily. Change definitions of ice_update_ring_stats and ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be used for both Rx and Tx rings. Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In Rx ring xdp_rxq_info occupies its own cacheline, so it's the major difference now. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15ice: remove ring_active from ice_ringMaciej Fijalkowski
This field is dead and driver is not making any use of it. Simply remove it. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Fix failure to re-add LAN/RDMA Tx queuesBrett Creeley
Currently if the VSI is rebuilt/removed and the RDMA PF driver is active the RDMA Tx queue scheduler node configuration will not be cleaned up. This will cause the rebuild/re-add of the VSI to fail due to the software structures not being correctly cleaned up for the VSI index. Fix this by always calling ice_rm_vsi_rdma_cfg() for all VSI. If there are no RDMA scheduler nodes created, then there is no harm in calling ice_rm_vsi_rdma_cfg(). This change applies to all VSI types, so if RDMA support is added for other VSI types they will also get this change. Fixes: 348048e724a0 ("ice: Implement iidc operations") Signed-off-by: Brett Creeley <brett.creeley@intel.com> Tested-by: Jerzy Wiktor Jurkowski <jerzy.wiktor.jurkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-14ice: Implement support for SMA and U.FL on E810-TMaciej Machnikowski
Expose SMA and U.FL connectors as ptp_pins on E810-T based adapters and allow controlling them. E810-T adapters are equipped with: - 2 external bidirectional SMA connectors - 1 internal TX U.FL - 1 internal RX U.FL U.FL connectors share signal lines with the SMA connectors. The TX U.FL1 share the line with the SMA1 and the RX U.FL2 share line with the SMA2. This dependence is controlled by the ice_verify_pin_e810t. Additionally add support for the E810-T-based devices which don't use the SMA/U.FL controller. If the IO expander is not detected don't expose pins and use 2 predefined 1PPS input and output pins. Signed-off-by: Maciej Machnikowski <maciej.machnikowski@intel.com> Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07ice: introduce new type of VSI for switchdevGrzegorz Nitka
New type of VSI has to be defined for switchdev control plane VSI. Number of allocated Tx and Rx queue has to be equal to amount of VFs, because each port representor should have one Tx and Rx queue. Also to not increase number of used irqs too much, control plane VSI uses only one q_vector and handle all queues in one irq. To allow handling all queues in one irq , new function to clean msix for eswitch was introduced. This function will schedule napi for each representor instead of scheduling it only for one like in normal clean irq function. Only one additional msix has to be requested. Always try to request it in ice_ena_msix_range function. Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07ice: manage VSI antispoof and destination overrideMichal Swiatkowski
Implement functions to make setting VSI security config easier. Main function ice_update_security fills security section field and checks against error in updating VSI. Reset functions are responsible for correct filling config according to user expectations. This helper is needed because destination override is located in this section. Driver has to set this bit to allow strering Tx packet on VSI based on value in Tx descriptors. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07ice: Move devlink port to PF/VF structWojciech Drewek
Keeping devlink port inside VSI data structure causes some issues. Since VF VSI is released during reset that means that we have to unregister devlink port and register it again every time reset is triggered. With the new changes in devlink API it might cause deadlock issues. After calling devlink_port_register/devlink_port_unregister devlink API is going to lock rtnl_mutex. It's an issue when VF reset is triggered in netlink operation context (like setting VF MAC address or VLAN), because rtnl_lock is already taken by netlink. Another call of rtnl_lock from devlink API results in dead-lock. By moving devlink port to PF/VF we avoid creating/destroying it during reset. Since this patch, devlink ports are created during ice_probe, destroyed during ice_remove for PF and created during ice_repr_add, destroyed during ice_repr_rem for VF. Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-09-28ice: Add feature bitmap, helpers and a check for DSCPAnirudh Venkataramanan
DSCP a.k.a L3 QoS is only supported on certain devices. To enforce this, this patch introduces a bitmap of features and helper functions. The feature bitmap is set based on device IDs on driver init. Currently, DSCP is the only feature in this bitmap, but there will be more in the future. In the DCB netlink flow, check if the feature bit is set before exercising DSCP. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Gurucharan G <gurucharanx.g@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Trivial conflicts in net/can/isotp.c and tools/testing/selftests/net/mptcp/mptcp_connect.sh scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c to include/linux/ptp_clock_kernel.h in -next so re-apply the fix there. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-17ice: reduce scope of variablesPaul M Stillwell Jr
There are some places where the scope of a variable can be reduced so do that. Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>