summaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/intel/ice
AgeCommit message (Collapse)Author
2024-08-26ice: improve debug print for control queue messagesJacob Keller
The ice_debug_cq function is called to print debug data for a control queue descriptor in multiple places. This includes both before we send a message on a transmit queue, after the writeback completion of a message on the transmit queue, and when we receive a message on a receive queue. This function does not include data about *which* control queue the message is on, nor whether it was what we sent to the queue or what we received from the queue. Modify ice_debug_cq to take two extra parameters, a pointer to the control queue and a boolean indicating if this was a response or a command. Improve the debug messages by replacing "CQ CMD" with a string indicating which specific control queue (based on cq->qtype) and whether this was a command sent by the PF or a response from the queue. This helps make the log output easier to understand and consume when debugging. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-26ice: implement and use rd32_poll_timeout for ice_sq_done timeoutJacob Keller
The ice_sq_done function is used to check the control queue head register and determine whether or not the control queue processing is done. This function is called in a loop checking against jiffies for a specified timeout. The pattern of reading a register in a loop until a condition is true or a timeout is reached is a relatively common pattern. In fact, the kernel provides a read_poll_timeout function implementing this behavior in <linux/iopoll.h> Use of read_poll_timeout is preferred over directly coding these loops. However, using it in the ice driver is a bit more difficult because of the rd32 wrapper. Implement a rd32_poll_timeout wrapper based on read_poll_timeout. Refactor ice_sq_done to use rd32_poll_timeout, replacing the loop calling ice_sq_done in ice_sq_send_cmd. This simplifies the logic down to a single ice_sq_done() call. The implementation of rd32_poll_timeout uses microseconds for its timeout value, so update the CQ timeout macros used to be specified in microseconds units as well instead of using HZ for jiffies. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.h c948c0973df5 ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops") f2878cdeb754 ("bnxt_en: Add support to call FW to update a VNIC") Link: https://patch.msgid.link/20240822210125.1542769-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-21ice: Fix a 32bit bugDan Carpenter
BIT() is unsigned long but ->pu.flg_msk and ->pu.flg_val are u64 type. On 32 bit systems, unsigned long is a u32 and the mismatch between u32 and u64 will break things for the high 32 bits. Fixes: 9a4c07aaa0f5 ("ice: add parser execution main loop") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://patch.msgid.link/ddc231a8-89c1-4ff4-8704-9198bcb41f8d@stanley.mountain Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20ice: use internal pf id instead of function numberMichal Swiatkowski
Use always the same pf id in devlink port number. When doing pass-through the PF to VM bus info func number can be any value. Fixes: 2ae0aa4758b0 ("ice: Move devlink port to PF/VF struct") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Suggested-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-20ice: fix truesize operations for PAGE_SIZE >= 8192Maciej Fijalkowski
When working on multi-buffer packet on arch that has PAGE_SIZE >= 8192, truesize is calculated and stored in xdp_buff::frame_sz per each processed Rx buffer. This means that frame_sz will contain the truesize based on last received buffer, but commit 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling") assumed this value will be constant for each buffer, which breaks the page recycling scheme and mess up the way we update the page::page_offset. To fix this, let us work on constant truesize when PAGE_SIZE >= 8192 instead of basing this on size of a packet read from Rx descriptor. This way we can simplify the code and avoid calculating truesize per each received frame and on top of that when using xdp_update_skb_shared_info(), current formula for truesize update will be valid. This means ice_rx_frame_truesize() can be removed altogether. Furthermore, first call to it within ice_clean_rx_irq() for 4k PAGE_SIZE was redundant as xdp_buff::frame_sz is initialized via xdp_init_buff() in ice_vsi_cfg_rxq(). This should have been removed at the point where xdp_buff struct started to be a member of ice_rx_ring and it was no longer a stack based variable. There are two fixes tags as my understanding is that the first one exposed us to broken truesize and page_offset handling and then second introduced broken skb_shared_info update in ice_{construct,build}_skb(). Reported-and-tested-by: Luiz Capitulino <luizcap@redhat.com> Closes: https://lore.kernel.org/netdev/8f9e2a5c-fd30-4206-9311-946a06d031bb@redhat.com/ Fixes: 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling") Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-20ice: fix ICE_LAST_OFFSET formulaMaciej Fijalkowski
For bigger PAGE_SIZE archs, ice driver works on 3k Rx buffers. Therefore, ICE_LAST_OFFSET should take into account ICE_RXBUF_3072, not ICE_RXBUF_2048. Fixes: 7237f5b0dba4 ("ice: introduce legacy Rx flag") Suggested-by: Luiz Capitulino <luizcap@redhat.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-20ice: fix page reuse when PAGE_SIZE is over 8kMaciej Fijalkowski
Architectures that have PAGE_SIZE >= 8192 such as arm64 should act the same as x86 currently, meaning reuse of a page should only take place when no one else is busy with it. Do two things independently of underlying PAGE_SIZE: - store the page count under ice_rx_buf::pgcnt - then act upon its value vs ice_rx_buf::pagecnt_bias when making the decision regarding page reuse Fixes: 2b245cb29421 ("ice: Implement transmit and NAPI support") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13iavf: add support for offloading tc U32 cls filtersAhmed Zaki
Add support for offloading cls U32 filters. Only "skbedit queue_mapping" and "drop" actions are supported. Also, only "ip" and "802_3" tc protocols are allowed. The PF must advertise the VIRTCHNL_VF_OFFLOAD_TC_U32 capability flag. Since the filters will be enabled via the FD stage at the PF, a new type of FDIR filters is added and the existing list and state machine are used. The new filters can be used to configure flow directors based on raw (binary) pattern in the rx packet. Examples: 0. # tc qdisc add dev enp175s0v0 ingress 1. Redirect UDP from src IP 192.168.2.1 to queue 12: # tc filter add dev <dev> protocol ip ingress u32 \ match u32 0x45000000 0xff000000 at 0 \ match u32 0x00110000 0x00ff0000 at 8 \ match u32 0xC0A80201 0xffffffff at 12 \ match u32 0x00000000 0x00000000 at 24 \ action skbedit queue_mapping 12 skip_sw 2. Drop all ICMP: # tc filter add dev <dev> protocol ip ingress u32 \ match u32 0x45000000 0xff000000 at 0 \ match u32 0x00010000 0x00ff0000 at 8 \ match u32 0x00000000 0x00000000 at 24 \ action drop skip_sw 3. Redirect ICMP traffic from MAC 3c:fd:fe:a5:47:e0 to queue 7 (note proto: 802_3): # tc filter add dev <dev> protocol 802_3 ingress u32 \ match u32 0x00003CFD 0x0000ffff at 4 \ match u32 0xFEA547E0 0xffffffff at 8 \ match u32 0x08004500 0xffffff00 at 12 \ match u32 0x00000001 0x000000ff at 20 \ match u32 0x0000 0x0000 at 40 \ action skbedit queue_mapping 7 skip_sw Notes on matches: 1 - All intermediate fields that are needed to parse the correct PTYPE must be provided (in e.g. 3: Ethernet Type 0x0800 in MAC, IP version and IP length: 0x45 and protocol: 0x01 (ICMP)). 2 - The last match must provide an offset that guarantees all required headers are accounted for, even if the last header is not matched. For example, in #2, the last match is 4 bytes at offset 24 starting from IP header, so the total is 14 (MAC) + 24 + 4 = 42, which is the sum of MAC+IP+ICMP headers. Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: enable FDIR filters from raw binary patterns for VFsJunfeng Guo
Enable VFs to create FDIR filters from raw binary patterns. The corresponding processes for raw flow are added in the Parse / Create / Destroy stages. Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: add method to disable FDIR SWAP optionJunfeng Guo
The SWAP Flag in the FDIR Programming Descriptor doesn't work properly, it is always set and cannot be unset (hardware bug). Thus, add a method to effectively disable the FDIR SWAP option by setting the FDSWAP instead of FDINSET registers. Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: add API for parser profile initializationJunfeng Guo
Add API ice_parser_profile_init() to init a parser profile based on a parser result and a mask buffer. The ice_parser_profile struct is used by the low level FXP engine to create HW profile/field vectors. Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: add UDP tunnels support to the parserJunfeng Guo
Add support for the vxlan, geneve, ecpri UDP tunnels through the following APIs: - ice_parser_vxlan_tunnel_set() - ice_parser_geneve_tunnel_set() - ice_parser_ecpri_tunnel_set() Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: support turning on/off the parser's double vlan modeJunfeng Guo
Add API ice_parser_dvm_set() to support turning on/off the parser's double vlan mode. Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: add parser execution main loopJunfeng Guo
Implement the core work of the runtime parser via: - ice_parser_rt_execute() - ice_parser_rt_reset() - ice_parser_rt_pkt_buf_set() Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: add parser internal helper functionsJunfeng Guo
Add the following internal helper functions: - ice_bst_tcam_match(): to perform ternary match on boost TCAM. - ice_pg_cam_match(): to perform parse graph key match in cam table. - ice_pg_nm_cam_match(): to perform parse graph key no match in cam table. - ice_ptype_mk_tcam_match(): to perform ptype markers match in tcam table. - ice_flg_redirect(): to redirect parser flags to packet flags. - ice_xlt_kb_flag_get(): to aggregate 64 bit packet flag into 16 bit key builder flags. Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: add debugging functions for the parser sectionsJunfeng Guo
Add debug for all parser sections. Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: parse and init various DDP parser sectionsJunfeng Guo
Parse the following DDP sections: - ICE_SID_RXPARSER_IMEM into an array of struct ice_imem_item - ICE_SID_RXPARSER_METADATA_INIT into an array of struct ice_metainit_item - ICE_SID_RXPARSER_CAM or ICE_SID_RXPARSER_PG_SPILL into an array of struct ice_pg_cam_item - ICE_SID_RXPARSER_NOMATCH_CAM or ICE_SID_RXPARSER_NOMATCH_SPILL into an array of struct ice_pg_nm_cam_item - ICE_SID_RXPARSER_CAM into an array of ice_bst_tcam_item - ICE_SID_LBL_RXPARSER_TMEM into an array of ice_lbl_item - ICE_SID_RXPARSER_MARKER_PTYPE into an array of ice_ptype_mk_tcam_item - ICE_SID_RXPARSER_MARKER_GRP into an array of ice_mk_grp_item - ICE_SID_RXPARSER_PROTO_GRP into an array of ice_proto_grp_item - ICE_SID_RXPARSER_FLAG_REDIR into an array of ice_flg_rd_item - ICE_SID_XLT_KEY_BUILDER_SW, ICE_SID_XLT_KEY_BUILDER_ACL, ICE_SID_XLT_KEY_BUILDER_FD and ICE_SID_XLT_KEY_BUILDER_RSS into struct ice_xlt_kb Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-13ice: add parser create and destroy skeletonJunfeng Guo
Add new parser module which can parse a packet in binary and generate information like ptype, protocol/offset pairs and flags which can be later used to feed the FXP profile creation directly. Add skeleton of the create and destroy APIs: ice_parser_create() ice_parser_destroy() Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Qi Zhang <qi.z.zhang@intel.com> Signed-off-by: Junfeng Guo <junfeng.guo@intel.com> Co-developed-by: Ahmed Zaki <ahmed.zaki@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-12ethtool: rss: don't report key if device doesn't support itJakub Kicinski
marvell/otx2 and mvpp2 do not support setting different keys for different RSS contexts. Contexts have separate indirection tables but key is shared with all other contexts. This is likely fine, indirection table is the most important piece. Don't report the key-related parameters from such drivers. This prevents driver-errors, e.g. otx2 always writes the main key, even when user asks to change per-context key. The second reason is that without this change tracking the keys by the core gets complicated. Even if the driver correctly reject setting key with rss_context != 0, change of the main key would have to be reflected in the XArray for all additional contexts. Since the additional contexts don't have their own keys not including the attributes (in Netlink speak) seems intuitive. ethtool CLI seems to deal with it just fine. Having to set the flag in majority of the drivers is a bit tedious but not reporting the key is a safer default. Reviewed-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-07ice: Fix incorrect assigns of FEC countsMateusz Polchlopek
Commit ac21add2540e ("ice: Implement driver functionality to dump fec statistics") introduces obtaining FEC correctable and uncorrectable stats per netdev in ICE driver. Unfortunately the assignment of values to fec_stats structure has been done incorrectly. This commit fixes the assignments. Fixes: ac21add2540e ("ice: Implement driver functionality to dump fec statistics") Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-07ice: Skip PTP HW writes during PTP reset procedureGrzegorz Nitka
Block HW write access for the driver while the device is in reset to avoid potential race condition and access to the PTP HW in non-nominal state which could lead to undesired effects Fixes: 4aad5335969f ("ice: add individual interrupt allocation") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Co-developed-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com> Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-08-07ice: Fix reset handlerGrzegorz Nitka
Synchronize OICR IRQ when preparing for reset to avoid potential race conditions between the reset procedure and OICR Fixes: 4aad5335969f ("ice: add individual interrupt allocation") Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com> Signed-off-by: Sergey Temerkhanov <sergey.temerkhanov@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: xsk: fix txq interrupt mappingMaciej Fijalkowski
ice_cfg_txq_interrupt() internally handles XDP Tx ring. Do not use ice_for_each_tx_ring() in ice_qvec_cfg_msix() as this causing us to treat XDP ring that belongs to queue vector as Tx ring and therefore misconfiguring the interrupts. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: add missing WRITE_ONCE when clearing ice_rx_ring::xdp_progMaciej Fijalkowski
It is read by data path and modified from process context on remote cpu so it is needed to use WRITE_ONCE to clear the pointer. Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: improve updating ice_{t,r}x_ring::xsk_poolMaciej Fijalkowski
xsk_buff_pool pointers that ice ring structs hold are updated via ndo_bpf that is executed in process context while it can be read by remote CPU at the same time within NAPI poll. Use synchronize_net() after pointer update and {READ,WRITE}_ONCE() when working with mentioned pointer. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: toggle netif_carrier when setting up XSK poolMaciej Fijalkowski
This so we prevent Tx timeout issues. One of conditions checked on running in the background dev_watchdog() is netif_carrier_ok(), so let us turn it off when we disable the queues that belong to a q_vector where XSK pool is being configured. Turn carrier on in ice_qp_ena() only when ice_get_link_status() tells us that physical link is up. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: modify error handling when setting XSK pool in ndo_bpfMaciej Fijalkowski
Don't bail out right when spotting an error within ice_qp_{dis,ena}() but rather track error and go through whole flow of disabling and enabling queue pair. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: replace synchronize_rcu with synchronize_netMaciej Fijalkowski
Given that ice_qp_dis() is called under rtnl_lock, synchronize_net() can be called instead of synchronize_rcu() so that XDP rings can finish its job in a faster way. Also let us do this as earlier in XSK queue disable flow. Additionally, turn off regular Tx queue before disabling irqs and NAPI. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: don't busy wait for Rx queue disable in ice_qp_dis()Maciej Fijalkowski
When ice driver is spammed with multiple xdpsock instances and flow control is enabled, there are cases when Rx queue gets stuck and unable to reflect the disable state in QRX_CTRL register. Similar issue has previously been addressed in commit 13a6233b033f ("ice: Add support to enable/disable all Rx queues before waiting"). To workaround this, let us simply not wait for a disabled state as later patch will make sure that regardless of the encountered error in the process of disabling a queue pair, the Rx queue will be enabled. Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-29ice: respect netif readiness in AF_XDP ZC related ndo'sMichal Kubiak
Address a scenario in which XSK ZC Tx produces descriptors to XDP Tx ring when link is either not yet fully initialized or process of stopping the netdev has already started. To avoid this, add checks against carrier readiness in ice_xsk_wakeup() and in ice_xmit_zc(). One could argue that bailing out early in ice_xsk_wakeup() would be sufficient but given the fact that we produce Tx descriptors on behalf of NAPI that is triggered for Rx traffic, the latter is also needed. Bringing link up is an asynchronous event executed within ice_service_task so even though interface has been brought up there is still a time frame where link is not yet ok. Without this patch, when AF_XDP ZC Tx is used simultaneously with stack Tx, Tx timeouts occur after going through link flap (admin brings interface down then up again). HW seem to be unable to transmit descriptor to the wire after HW tail register bump which in turn causes bit __QUEUE_STATE_STACK_XOFF to be set forever as netdev_tx_completed_queue() sees no cleaned bytes on the input. Fixes: 126cdfe1007a ("ice: xsk: Improve AF_XDP ZC Tx and use batching API") Fixes: 2d4238f55697 ("ice: Add support for AF_XDP") Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Tested-by: Chandan Kumar Rout <chandanx.rout@intel.com> (A Contingent Worker at Intel) Signed-off-by: Michal Kubiak <michal.kubiak@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-25Merge tag 'net-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. A lot of networking people were at a conference last week, busy catching COVID, so relatively short PR. Current release - regressions: - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP Current release - new code bugs: - l2tp: protect session IDR and tunnel session list with one lock, make sure the state is coherent to avoid a warning - eth: bnxt_en: update xdp_rxq_info in queue restart logic - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field Previous releases - regressions: - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len, the field reuses previously un-validated pad Previous releases - always broken: - tap/tun: drop short frames to prevent crashes later in the stack - eth: ice: add a per-VF limit on number of FDIR filters - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash" * tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) tun: add missing verification for short frame tap: add missing verification for short frame mISDN: Fix a use after free in hfcmulti_tx() gve: Fix an edge case for TSO skb validity check bnxt_en: update xdp_rxq_info in queue restart logic tcp: process the 3rd ACK with sk_socket for TFO/MPTCP selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling MAINTAINERS: make Breno the netconsole maintainer MAINTAINERS: Update bonding entry net: nexthop: Initialize all fields in dumped nexthops net: stmmac: Correct byte order of perfect_match selftests: forwarding: skip if kernel not support setting bridge fdb learning limit tipc: Return non-zero value from tipc_udp_addr2str() on error netfilter: nft_set_pipapo_avx2: disable softinterrupts ice: Fix recipe read procedure ice: Add a per-VF limit on number of FDIR filters net: bonding: correctly annotate RCU in bond_should_notify_peers() ...
2024-07-25Merge tag 'driver-core-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the big set of driver core changes for 6.11-rc1. Lots of stuff in here, with not a huge diffstat, but apis are evolving which required lots of files to be touched. Highlights of the changes in here are: - platform remove callback api final fixups (Uwe took many releases to get here, finally!) - Rust bindings for basic firmware apis and initial driver-core interactions. It's not all that useful for a "write a whole driver in rust" type of thing, but the firmware bindings do help out the phy rust drivers, and the driver core bindings give a solid base on which others can start their work. There is still a long way to go here before we have a multitude of rust drivers being added, but it's a great first step. - driver core const api changes. This reached across all bus types, and there are some fix-ups for some not-common bus types that linux-next and 0-day testing shook out. This work is being done to help make the rust bindings more safe, as well as the C code, moving toward the end-goal of allowing us to put driver structures into read-only memory. We aren't there yet, but are getting closer. - minor devres cleanups and fixes found by code inspection - arch_topology minor changes - other minor driver core cleanups All of these have been in linux-next for a very long time with no reported problems" * tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits) ARM: sa1100: make match function take a const pointer sysfs/cpu: Make crash_hotplug attribute world-readable dio: Have dio_bus_match() callback take a const * zorro: make match function take a const pointer driver core: module: make module_[add|remove]_driver take a const * driver core: make driver_find_device() take a const * driver core: make driver_[create|remove]_file take a const * firmware_loader: fix soundness issue in `request_internal` firmware_loader: annotate doctests as `no_run` devres: Correct code style for functions that return a pointer type devres: Initialize an uninitialized struct member devres: Fix memory leakage caused by driver API devm_free_percpu() devres: Fix devm_krealloc() wasting memory driver core: platform: Switch to use kmemdup_array() driver core: have match() callback in struct bus_type take a const * MAINTAINERS: add Rust device abstractions to DRIVER CORE device: rust: improve safety comments MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER firmware: rust: improve safety comments ...
2024-07-23ice: Fix recipe read procedureWojciech Drewek
When ice driver reads recipes from firmware information about need_pass_l2 and allow_pass_l2 flags is not stored correctly. Those flags are stored as one bit each in ice_sw_recipe structure. Because of that, the result of checking a flag has to be casted to bool. Note that the need_pass_l2 flag currently works correctly, because it's stored in the first bit. Fixes: bccd9bce29e0 ("ice: Add guard rule when creating FDB in switchdev") Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-23ice: Add a per-VF limit on number of FDIR filtersAhmed Zaki
While the iavf driver adds a s/w limit (128) on the number of FDIR filters that the VF can request, a malicious VF driver can request more than that and exhaust the resources for other VFs. Add a similar limit in ice. CC: stable@vger.kernel.org Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Suggested-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki@intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-16Merge tag 'net-next-6.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Not much excitement - a handful of large patchsets (devmem among them) did not make it in time. Core & protocols: - Use local_lock in addition to local_bh_disable() to protect per-CPU resources in networking, a step closer for local_bh_disable() not to act as a big lock on PREEMPT_RT - Use flex array for netdevice priv area, ensure its cache alignment - Add a sysctl knob to allow user to specify a default rto_min at socket init time. Bit of a big hammer but multiple companies were independently carrying such patch downstream so clearly it's useful - Support scheduling transmission of packets based on CLOCK_TAI - Un-pin TCP TIMEWAIT timer to avoid it firing on CPUs later cordoned off using cpusets - Support multiple L2TPv3 UDP tunnels using the same 5-tuple address - Allow configuration of multipath hash seed, to both allow synchronizing hashing of two routers, and preventing partial accidental sync - Improve TCP compliance with RFC 9293 for simultaneous connect() - Support sending NAT keepalives in IPsec ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it - Support sending supervision HSR frames with MAC addresses stored in ProxyNodeTable when RedBox (i.e. HSR-SAN) is enabled - Introduce IPPROTO_SMC for selecting SMC when socket is created - Allow UDP GSO transmit from devices with no checksum offload - openvswitch: add packet sampling via psample, separating the sampled traffic from "upcall" packets sent to user space for forwarding - nf_tables: shrink memory consumption for transaction objects Things we sprinkled into general kernel code: - Power Sequencing subsystem (used by Qualcomm Bluetooth driver for QCA6390) [ Already merged separately - Linus ] - Add IRQ information in sysfs for auxiliary bus - Introduce guard definition for local_lock - Add aligned flavor of __cacheline_group_{begin, end}() markings for grouping fields in structures BPF: - Notify user space (via epoll) when a struct_ops object is getting detached/unregistered - Add new kfuncs for a generic, open-coded bits iterator - Enable BPF programs to declare arrays of kptr, bpf_rb_root, and bpf_list_head - Support resilient split BTF which cuts down on duplication and makes BTF as compact as possible WRT BTF from modules - Add support for dumping kfunc prototypes from BTF which enables both detecting as well as dumping compilable prototypes for kfuncs - riscv64 BPF JIT improvements in particular to add 12-argument support for BPF trampolines and to utilize bpf_prog_pack for the latter - Add the capability to offload the netfilter flowtable in XDP layer through kfuncs Driver API: - Allow users to configure IRQ tresholds between which automatic IRQ moderation can choose - Expand Power Sourcing (PoE) status with power, class and failure reason. Support setting power limits - Track additional RSS contexts in the core, make sure configuration changes don't break them - Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths - Support updating firmware on SFP modules Tests and tooling: - mptcp: use net/lib.sh to manage netns - TCP-AO and TCP-MD5: replace debug prints used by tests with tracepoints - openvswitch: make test self-contained (don't depend on OvS CLI tools) Drivers: - Ethernet high-speed NICs: - Broadcom (bnxt): - increase the max total outstanding PTP TX packets to 4 - add timestamping statistics support - implement netdev_queue_mgmt_ops - support new RSS context API - Intel (100G, ice, idpf): - implement FEC statistics and dumping signal quality indicators - support E825C products (with 56Gbps PHYs) - nVidia/Mellanox: - support HW-GRO - mlx4/mlx5: support per-queue statistics via netlink - obey the max number of EQs setting in sub-functions - AMD/Solarflare: - support new RSS context API - AMD/Pensando: - ionic: rework fix for doorbell miss to lower overhead and skip it on new HW - Wangxun: - txgbe: support Flow Director perfect filters - Ethernet NICs consumer, embedded and virtual: - Add driver for Tehuti Networks TN40xx chips - Add driver for Meta's internal NIC chips - Add driver for Ethernet MAC on Airoha EN7581 SoCs - Add driver for Renesas Ethernet-TSN devices - Google cloud vNIC: - flow steering support - Microsoft vNIC: - support page sizes other than 4KB on ARM64 - vmware vNIC: - support latency measurement (update to version 9) - VirtIO net: - support for Byte Queue Limits - support configuring thresholds for automatic IRQ moderation - support for AF_XDP Rx zero-copy - Synopsys (stmmac): - support for STM32MP13 SoC - let platforms select the right PCS implementation - TI: - icssg-prueth: add multicast filtering support - icssg-prueth: enable PTP timestamping and PPS - Renesas: - ravb: improve Rx performance 30-400% by using page pool, theaded NAPI and timer-based IRQ coalescing - ravb: add MII support for R-Car V4M - Cadence (macb): - macb: add ARP support to Wake-On-LAN - Cortina: - use phylib for RX and TX pause configuration - Ethernet switches: - nVidia/Mellanox: - support configuration of multipath hash seed - report more accurate max MTU - use page_pool to improve Rx performance - MediaTek: - mt7530: add support for bridge port isolation - Qualcomm: - qca8k: add support for bridge port isolation - Microchip: - lan9371/2: add 100BaseTX PHY support - NXP: - vsc73xx: implement VLAN operations - Ethernet PHYs: - aquantia: enable support for aqr115c - aquantia: add support for PHY LEDs - realtek: add support for rtl8224 2.5Gbps PHY - xpcs: add memory-mapped device support - add BroadR-Reach link mode and support in Broadcom's PHY driver - CAN: - add document for ISO 15765-2 protocol support - mcp251xfd: workaround for erratum DS80000789E, use timestamps to catch when device returns incorrect FIFO status - WiFi: - mac80211/cfg80211: - parse Transmit Power Envelope (TPE) data in mac80211 instead of in drivers - improvements for 6 GHz regulatory flexibility - multi-link improvements - support multiple radios per wiphy - remove DEAUTH_NEED_MGD_TX_PREP flag - Intel (iwlwifi): - bump FW API to 91 for BZ/SC devices - report 64-bit radiotap timestamp - enable P2P low latency by default - handle Transmit Power Envelope (TPE) advertised by AP - remove support for older FW for new devices - fast resume (keeping the device configured) - mvm: re-enable Multi-Link Operation (MLO) - aggregation (A-MSDU) optimizations - MediaTek (mt76): - mt7925 Multi-Link Operation (MLO) support - Qualcomm (ath10k): - LED support for various chipsets - Qualcomm (ath12k): - remove unsupported Tx monitor handling - support channel 2 in 6 GHz band - support Spatial Multiplexing Power Save (SMPS) in 6 GHz band - supprt multiple BSSID (MBSSID) and Enhanced Multi-BSSID Advertisements (EMA) - support dynamic VLAN - add panic handler for resetting the firmware state - DebugFS support for datapath statistics - WCN7850: support for Wake on WLAN - Microchip (wilc1000): - read MAC address during probe to make it visible to user space - suspend/resume improvements - TI (wl18xx): - support newer firmware versions - RealTek (rtw89): - preparation for RTL8852BE-VT support - Wake on WLAN support for WiFi 6 chips - 36-bit PCI DMA support - RealTek (rtlwifi): - RTL8192DU support - Broadcom (brcmfmac): - Management Frame Protection support (to enable WPA3) - Bluetooth: - qualcomm: use the power sequencer for QCA6390 - btusb: mediatek: add ISO data transmission functions - hci_bcm4377: add BCM4388 support - btintel: add support for BlazarU core - btintel: add support for Whale Peak2 - btnxpuart: add support for AW693 A1 chipset - btnxpuart: add support for IW615 chipset - btusb: add Realtek RTL8852BE support ID 0x13d3:0x3591" * tag 'net-next-6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1589 commits) eth: fbnic: Fix spelling mistake "tiggerring" -> "triggering" tcp: Replace strncpy() with strscpy() wifi: ath12k: fix build vs old compiler tcp: Don't access uninit tcp_rsk(req)->ao_keyid in tcp_create_openreq_child(). eth: fbnic: Write the TCAM tables used for RSS control and Rx to host eth: fbnic: Add L2 address programming eth: fbnic: Add basic Rx handling eth: fbnic: Add basic Tx handling eth: fbnic: Add link detection eth: fbnic: Add initial messaging to notify FW of our presence eth: fbnic: Implement Rx queue alloc/start/stop/free eth: fbnic: Implement Tx queue alloc/start/stop/free eth: fbnic: Allocate a netdevice and napi vectors with queues eth: fbnic: Add FW communication mechanism eth: fbnic: Add message parsing for FW messages eth: fbnic: Add register init to set PCIe/Ethernet device config eth: fbnic: Allocate core device specific structures and devlink interface eth: fbnic: Add scaffolding for Meta's NIC driver PCI: Add Meta Platforms vendor ID net/sched: cls_flower: propagate tca[TCA_OPTIONS] to NL_REQ_ATTR_CHECK ...
2024-07-15Merge branch 'net-make-timestamping-selectable'Jakub Kicinski
First part of "net: Make timestamping selectable" from Kory Maincent. Change the driver-facing type already to lower rebasing pain. Link: https://lore.kernel.org/20240709-feature_ptp_netnext-v17-0-b5317f50df2a@bootlin.com/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-15net: Add struct kernel_ethtool_ts_infoKory Maincent
In prevision to add new UAPI for hwtstamp we will be limited to the struct ethtool_ts_info that is currently passed in fixed binary format through the ETHTOOL_GET_TS_INFO ethtool ioctl. It would be good if new kernel code already started operating on an extensible kernel variant of that structure, similar in concept to struct kernel_hwtstamp_config vs struct hwtstamp_config. Since struct ethtool_ts_info is in include/uapi/linux/ethtool.h, here we introduce the kernel-only structure in include/linux/ethtool.h. The manual copy is then made in the function called by ETHTOOL_GET_TS_INFO. Acked-by: Shannon Nelson <shannon.nelson@amd.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20240709-feature_ptp_netnext-v17-6-b5317f50df2a@bootlin.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== ice: Switch API optimizations Marcin Szycik says: Optimize the process of creating a recipe in the switch block by removing duplicate switch ID words and changing how result indexes are fitted into recipes. In many cases this can decrease the number of recipes required to add a certain set of rules, potentially allowing a more varied set of rules to be created. Total rule count will also increase, since less words will be left unused/wasted. There are only 64 rules available in total, so every one counts. After this modification, many fields and some structs became unused or were simplified, resulting in overall simpler implementation. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: Add tracepoint for adding and removing switch rules ice: Remove unused members from switch API ice: Optimize switch recipe creation ice: remove unused recipe bookkeeping data ice: Simplify bitmap setting in adding recipe ice: Remove reading all recipes before adding a new one ice: Remove unused struct ice_prot_lkup_ext members ==================== Link: https://patch.msgid.link/20240711181312.2019606-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-13Merge tag 'timers-v6.11-rc1' of ↵Thomas Gleixner
https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clocksource/event driver updates from Daniel Lezcano: - Remove unnecessary local variables initialization as they will be initialized in the code path anyway right after on the ARM arch timer and the ARM global timer (Li kunyu) - Fix a race condition in the interrupt leading to a deadlock on the SH CMT driver. Note that this fix was not tested on the platform using this timer but the fix seems reasonable enough to be picked confidently (Niklas Söderlund) - Increase the rating of the gic-timer and use the configured width clocksource register on the MIPS architecture (Jiaxun Yang) - Add the DT bindings for the TMU on the Renesas platforms (Geert Uytterhoeven) - Add the DT bindings for the SOPHGO SG2002 clint on RiscV (Thomas Bonnefille) - Add the rtl-otto timer driver along with the DT bindings for the Realtek platform (Chris Packham) Link: https://lore.kernel.org/all/91cd05de-4c5d-4242-a381-3b8a4fe6a2a2@linaro.org
2024-07-11ice: remove eswitch rebuildMichal Swiatkowski
Since the port representors are added one by one there is no need to do eswitch rebuild. Each port representor is detached and attached in VF reset path. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-11ice: Add support for devlink local_forwarding paramPawel Kaminski
Add support for driver-specific devlink local_forwarding param. Supported values are "enabled", "disabled" and "prioritized". Default configuration is set to "enabled". Add documentation in networking/devlink/ice.rst. In previous generations of Intel NICs the transmit scheduler was only limited by PCIe bandwidth when scheduling/assigning hairpin-bandwidth between VFs. Changes to E810 HW design introduced scheduler limitation, so that available hairpin-bandwidth is bound to external port speed. In order to address this limitation and enable NFV services such as "service chaining" a knob to adjust the scheduler config was created. Driver can send a configuration message to the FW over admin queue and internal FW logic will reconfigure HW to prioritize and add more BW to VF to VF traffic. An end result, for example, 10G port will no longer limit hairpin-bandwidth to 10G and much higher speeds can be achieved. Devlink local_forwarding param set to "prioritized" enables higher hairpin-bandwitdh on related PFs. Configuration is applicable only to 8x10G and 4x25G cards. Changing local_forwarding configuration will trigger CORER reset in order to take effect. Example command to change current value: devlink dev param set pci/0000:b2:00.3 name local_forwarding \ value prioritized \ cmode runtime Co-developed-by: Michal Wilczynski <michal.wilczynski@intel.com> Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Pawel Kaminski <pawel.kaminski@intel.com> Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-11net: intel: Remove MODULE_AUTHORsTony Nguyen
We are moving away from the Sourceforge email address. Rather than removing or updating the email for the affected entries, remove the MODULE_AUTHOR altogether as its usage is incorrect [1]. Link: https://lore.kernel.org/netdev/20200626115236.7f36d379@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/ [1] Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Acked-by: Alexander Lobakin <aleksander.lobakin@intel.com> # libeth, libie Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-11ice: Add tracepoint for adding and removing switch rulesMarcin Szycik
Track the number of rules and recipes added to switch. Add a tracepoint to ice_aq_sw_rules(), which shows both rule and recipe count. This information can be helpful when designing a set of rules to program to the hardware, as it shows where the practical limit is. Actual limits are known (64 recipes, 32k rules), but it's hard to translate these values to how many rules the *user* can actually create, because of extra metadata being implicitly added, and recipe/rule chaining. Chaining combines several recipes/rules to create a larger recipe/rule, so one large rule added by the user might actually consume multiple rules from hardware perspective. Rule counter is simply incremented/decremented in ice_aq_sw_rules(), since all rules are added or removed via it. Counting recipes is harder, as recipes can't be removed (only overwritten). Recipes added via ice_aq_add_recipe() could end up being unused, when there is an error in later stages of rule creation. Instead, track the allocation and freeing of recipes, which should reflect the actual usage of recipes (if something fails after recipe(s) were created, caller should free them). Also, a number of recipes are loaded from NVM by default - initialize the recipe counter with the number of these recipes on switch initialization. Example configuration: cd /sys/kernel/tracing echo function > current_tracer echo ice_aq_sw_rules > set_ftrace_filter echo ice_aq_sw_rules > set_event echo 1 > tracing_on cat trace Example output: tc-4097 [069] ...1. 787.595536: ice_aq_sw_rules <-ice_rem_adv_rule tc-4097 [069] ..... 787.595705: ice_aq_sw_rules: rules=9 recipes=15 tc-4098 [057] ...1. 787.652033: ice_aq_sw_rules <-ice_add_adv_rule tc-4098 [057] ..... 787.652201: ice_aq_sw_rules: rules=10 recipes=16 Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-11ice: Remove unused members from switch APIMarcin Szycik
Remove several members of struct ice_sw_recipe and struct ice_prot_lkup_ext. Remove struct ice_recp_grp_entry and struct ice_pref_recipe_group, since they are now unused as well. All of the deleted members were only written to and never read, so it's pointless to keep them. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-11ice: Optimize switch recipe creationMarcin Szycik
Currently when creating switch recipes, switch ID is always added as the first word in every recipe. There are only 5 words in a recipe, so one word is always wasted. This is also true for the last recipe, which stores result indexes (in case of chain recipes). Therefore the maximum usable length of a chain recipe is 4 * 4 = 16 words. 4 words in a recipe, 4 recipes that can be chained (using a 5th one for result indexes). Current max size chained recipe: 0: smmmm 1: smmmm 2: smmmm 3: smmmm 4: srrrr Where: s - switch ID m - regular match (e.g. ipv4 src addr, udp dst port, etc.) r - result index Switch ID does not actually need to be present in every recipe, only in one of them (in case of chained recipe). This frees up to 8 extra words: 3 from recipes in the middle (because first recipe still needs to have switch ID), and 5 from one extra recipe (because now the last recipe also does not have switch ID, so it can chain 1 more recipe). Max size chained recipe after changes: 0: smmmm 1: Mmmmm 2: Mmmmm 3: Mmmmm 4: MMMMM 5: Rrrrr Extra usable words available after this change are highlighted with capital letters. Changing how switch ID is added is not straightforward, because it's not a regular lookup. Its FV index and mask can't be determined based on protocol + offset pair read from package and instead need to be added manually. Additionally, change how result indexes are added. Currently they are always inserted in a new recipe at the end. Example for 13 words, (with above optimization, switch ID being one of the words): 0: smmmm 1: mmmmm 2: mmmxx 3: rrrxx Where: x - unused word In this and some other cases, the result indexes can be moved just after last matches because there are unused words, saving one recipe. Example for 13 words after both optimizations: 0: smmmm 1: mmmmm 2: mmmrr Note how one less result index is needed in this case, because the last recipe does not need to "link" to itself. There are cases when adding an additional recipe for result indexes cannot be avoided. In that cases result indexes are all put in the last recipe. Example for 14 words after both optimizations: 0: smmmm 1: mmmmm 2: mmmmx 3: rrrxx With these two changes, recipes/rules are more space efficient, allowing more to be created in total. Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-11ice: remove unused recipe bookkeeping dataMichal Swiatkowski
Remove root_buf from recipe struct. Its only usage was in ice_find_recp(), where if recipe had an inverse action, it was skipped, but actually the driver never adds inverse actions, so effectively it was pointless. Without root_buf, the recipe data element in ice_add_sw_recipe() does not need to be persistent and can also be automatically deallocated with __free, which nicely simplifies unroll. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-11ice: Simplify bitmap setting in adding recipeMichal Swiatkowski
Remove unnecessary size checks when copying bitmaps in ice_add_sw_recipe() and replace them with compile time assert. Check if the bitmaps are equal size, as they are copied both ways. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Co-developed-by: Marcin Szycik <marcin.szycik@linux.intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-11ice: Remove reading all recipes before adding a new oneMichal Swiatkowski
The content of the first read recipe is used as a template when adding a recipe. It isn't needed - only prune index is directly set from there. Set it in the code instead. Also, now there's no need to set rid and lookup indexes to 0, as the whole recipe buffer is initialized to 0. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-07-11ice: Remove unused struct ice_prot_lkup_ext membersMarcin Szycik
Remove field_off as it's never used. Remove done bitmap, as its value is only checked and never assigned. Reusing sub-recipes while creating new root recipes is currently not supported in the driver. Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>