Age | Commit message (Collapse) | Author |
|
Add the support for 200G phy speeds and the mapping for their
advertisement in link. Add the new PHY type bits for AQ command, as
needed for 200G E830 controllers.
Signed-off-by: Alice Michael <alice.michael@intel.com>
Co-developed-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Tony Brelinski <tony.brelinski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231025214157.1222758-3-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Get ahead of the game and fix all the -Wformat=2 noted warnings in the
intel drivers directory.
There are one set of i40e and iavf warnings I couldn't figure out how to
fix because the driver is already using vsnprintf without an explicit
"const char *" format string.
Tested with both gcc-12 and clang-15. I found gcc-12 runs clean after
this series but clang-15 is a little worried about the vsnprintf lines.
summary of warnings:
drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c:148:34: warning: format string is not a string literal [-Wformat-nonliteral]
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1416:24: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c:1421:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/igc/igc_ethtool.c:776:24: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/igc/igc_ethtool.c:779:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/iavf/iavf_ethtool.c:199:34: warning: format string is not a string literal [-Wformat-nonliteral]
drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/igb/igb_ethtool.c:2360:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/igb/igb_ethtool.c:2363:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:208:34: warning: format string is not a string literal [-Wformat-nonliteral]
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2515:23: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/i40e/i40e_ethtool.c:2519:23: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ice/ice_ethtool.c:1064:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ice/ice_ethtool.c:1084:6: note: treat the string as an argument to avoid this
drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: warning: format string is not a string literal (potentially insecure) [-Wformat-security]
drivers/net/ethernet/intel/ice/ice_ethtool.c:1100:24: note: treat the string as an argument to avoid this
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://lore.kernel.org/r/20231017190411.2199743-3-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Refactor ice_get_link_ksettings to using forced speed to link modes
mapping.
Suggested-by : Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The PHC clock id used to be moved between PFs using FW admin queue
shared parameters - move the implementation to auxiliary bus.
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Calling ethtool during reload can lead to call trace, because VSI isn't
configured for some time, but netdev is alive.
To fix it add rtnl lock for VSI deconfig and config. Set ::num_q_vectors
to 0 after freeing and add a check for ::tx/rx_rings in ring related
ethtool ops.
Add proper unroll of filters in ice_start_eth().
Reproduction:
$watch -n 0.1 -d 'ethtool -g enp24s0f0np0'
$devlink dev reload pci/0000:18:00.0 action driver_reinit
Call trace before fix:
[66303.926205] BUG: kernel NULL pointer dereference, address: 0000000000000000
[66303.926259] #PF: supervisor read access in kernel mode
[66303.926286] #PF: error_code(0x0000) - not-present page
[66303.926311] PGD 0 P4D 0
[66303.926332] Oops: 0000 [#1] PREEMPT SMP PTI
[66303.926358] CPU: 4 PID: 933821 Comm: ethtool Kdump: loaded Tainted: G OE 6.4.0-rc5+ #1
[66303.926400] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.00.01.0014.070920180847 07/09/2018
[66303.926446] RIP: 0010:ice_get_ringparam+0x22/0x50 [ice]
[66303.926649] Code: 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 48 8b 87 c0 09 00 00 c7 46 04 e0 1f 00 00 c7 46 10 e0 1f 00 00 48 8b 50 20 <48> 8b 12 0f b7 52 3a 89 56 14 48 8b 40 28 48 8b 00 0f b7 40 58 48
[66303.926722] RSP: 0018:ffffad40472f39c8 EFLAGS: 00010246
[66303.926749] RAX: ffff98a8ada05828 RBX: ffff98a8c46dd060 RCX: ffffad40472f3b48
[66303.926781] RDX: 0000000000000000 RSI: ffff98a8c46dd068 RDI: ffff98a8b23c4000
[66303.926811] RBP: ffffad40472f3b48 R08: 00000000000337b0 R09: 0000000000000000
[66303.926843] R10: 0000000000000001 R11: 0000000000000100 R12: ffff98a8b23c4000
[66303.926874] R13: ffff98a8c46dd060 R14: 000000000000000f R15: ffffad40472f3a50
[66303.926906] FS: 00007f6397966740(0000) GS:ffff98b390900000(0000) knlGS:0000000000000000
[66303.926941] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66303.926967] CR2: 0000000000000000 CR3: 000000011ac20002 CR4: 00000000007706e0
[66303.926999] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[66303.927029] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[66303.927060] PKRU: 55555554
[66303.927075] Call Trace:
[66303.927094] <TASK>
[66303.927111] ? __die+0x23/0x70
[66303.927140] ? page_fault_oops+0x171/0x4e0
[66303.927176] ? exc_page_fault+0x7f/0x180
[66303.927209] ? asm_exc_page_fault+0x26/0x30
[66303.927244] ? ice_get_ringparam+0x22/0x50 [ice]
[66303.927433] rings_prepare_data+0x62/0x80
[66303.927469] ethnl_default_doit+0xe2/0x350
[66303.927501] genl_family_rcv_msg_doit.isra.0+0xe3/0x140
[66303.927538] genl_rcv_msg+0x1b1/0x2c0
[66303.927561] ? __pfx_ethnl_default_doit+0x10/0x10
[66303.927590] ? __pfx_genl_rcv_msg+0x10/0x10
[66303.927615] netlink_rcv_skb+0x58/0x110
[66303.927644] genl_rcv+0x28/0x40
[66303.927665] netlink_unicast+0x19e/0x290
[66303.927691] netlink_sendmsg+0x254/0x4d0
[66303.927717] sock_sendmsg+0x93/0xa0
[66303.927743] __sys_sendto+0x126/0x170
[66303.927780] __x64_sys_sendto+0x24/0x30
[66303.928593] do_syscall_64+0x5d/0x90
[66303.929370] ? __count_memcg_events+0x60/0xa0
[66303.930146] ? count_memcg_events.constprop.0+0x1a/0x30
[66303.930920] ? handle_mm_fault+0x9e/0x350
[66303.931688] ? do_user_addr_fault+0x258/0x740
[66303.932452] ? exc_page_fault+0x7f/0x180
[66303.933193] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixes: 5b246e533d01 ("ice: split probe into smaller functions")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Refactor ice_phy_type_to_ethtool to use phy_type_[low|high]_lkup table to
map PHY type to AQ link speed and ethtool link mode. This removes
complexity and simplifies future changes.
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Currently interrupt allocations, depending on a feature are distributed
in batches. Also, after allocation there is a series of operations that
distributes per irq settings through that batch of interrupts.
Although driver does not yet support dynamic interrupt allocation, keep
allocated interrupts in a pool and add allocation abstraction logic to
make code more flexible. Keep per interrupt information in the
ice_q_vector structure, which yields ice_vsi::base_vector redundant.
Also, as a result there are a few functions that can be removed.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
ice_get_module_eeprom() is broken since commit e9c9692c8a81 ("ice:
Reimplement module reads used by ethtool") In this refactor,
ice_get_module_eeprom() reads the eeprom in blocks of size 8.
But the condition that should protect the buffer overflow
ignores the last block. The last block always contains zeros.
Bug uncovered by ethtool upstream commit 9538f384b535
("netlink: eeprom: Defer page requests to individual parsers")
After this commit, ethtool reads a block with length = 1;
to read the SFF-8024 identifier value.
unpatched driver:
$ ethtool -m enp65s0f0np0 offset 0x90 length 8
Offset Values
------ ------
0x0090: 00 00 00 00 00 00 00 00
$ ethtool -m enp65s0f0np0 offset 0x90 length 12
Offset Values
------ ------
0x0090: 00 00 01 a0 4d 65 6c 6c 00 00 00 00
$
$ ethtool -m enp65s0f0np0
Offset Values
------ ------
0x0000: 11 06 06 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0060: 00 00 00 00 00 00 00 00 00 00 00 00 00 01 08 00
0x0070: 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
patched driver:
$ ethtool -m enp65s0f0np0 offset 0x90 length 8
Offset Values
------ ------
0x0090: 00 00 01 a0 4d 65 6c 6c
$ ethtool -m enp65s0f0np0 offset 0x90 length 12
Offset Values
------ ------
0x0090: 00 00 01 a0 4d 65 6c 6c 61 6e 6f 78
$ ethtool -m enp65s0f0np0
Identifier : 0x11 (QSFP28)
Extended identifier : 0x00
Extended identifier description : 1.5W max. Power consumption
Extended identifier description : No CDR in TX, No CDR in RX
Extended identifier description : High Power Class (> 3.5 W) not enabled
Connector : 0x23 (No separable connector)
Transceiver codes : 0x88 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Transceiver type : 40G Ethernet: 40G Base-CR4
Transceiver type : 25G Ethernet: 25G Base-CR CA-N
Encoding : 0x05 (64B/66B)
BR, Nominal : 25500Mbps
Rate identifier : 0x00
Length (SMF,km) : 0km
Length (OM3 50um) : 0m
Length (OM2 50um) : 0m
Length (OM1 62.5um) : 0m
Length (Copper or Active cable) : 1m
Transmitter technology : 0xa0 (Copper cable unequalized)
Attenuation at 2.5GHz : 4db
Attenuation at 5.0GHz : 5db
Attenuation at 7.0GHz : 7db
Attenuation at 12.9GHz : 10db
........
....
Fixes: e9c9692c8a81 ("ice: Reimplement module reads used by ethtool")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
====================
pull-request: bpf-next 2023-02-11
We've added 96 non-merge commits during the last 14 day(s) which contain
a total of 152 files changed, 4884 insertions(+), 962 deletions(-).
There is a minor conflict in drivers/net/ethernet/intel/ice/ice_main.c
between commit 5b246e533d01 ("ice: split probe into smaller functions")
from the net-next tree and commit 66c0e13ad236 ("drivers: net: turn on
XDP features") from the bpf-next tree. Remove the hunk given ice_cfg_netdev()
is otherwise there a 2nd time, and add XDP features to the existing
ice_cfg_netdev() one:
[...]
ice_set_netdev_features(netdev);
netdev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
NETDEV_XDP_ACT_XSK_ZEROCOPY;
ice_set_ops(netdev);
[...]
Stephen's merge conflict mail:
https://lore.kernel.org/bpf/20230207101951.21a114fa@canb.auug.org.au/
The main changes are:
1) Add support for BPF trampoline on s390x which finally allows to remove many
test cases from the BPF CI's DENYLIST.s390x, from Ilya Leoshkevich.
2) Add multi-buffer XDP support to ice driver, from Maciej Fijalkowski.
3) Add capability to export the XDP features supported by the NIC.
Along with that, add a XDP compliance test tool,
from Lorenzo Bianconi & Marek Majtyka.
4) Add __bpf_kfunc tag for marking kernel functions as kfuncs,
from David Vernet.
5) Add a deep dive documentation about the verifier's register
liveness tracking algorithm, from Eduard Zingerman.
6) Fix and follow-up cleanups for resolve_btfids to be compiled
as a host program to avoid cross compile issues,
from Jiri Olsa & Ian Rogers.
7) Batch of fixes to the BPF selftest for xdp_hw_metadata which resulted
when testing on different NICs, from Jesper Dangaard Brouer.
8) Fix libbpf to better detect kernel version code on Debian, from Hao Xiang.
9) Extend libbpf to add an option for when the perf buffer should
wake up, from Jon Doron.
10) Follow-up fix on xdp_metadata selftest to just consume on TX
completion, from Stanislav Fomichev.
11) Extend the kfuncs.rst document with description on kfunc
lifecycle & stability expectations, from David Vernet.
12) Fix bpftool prog profile to skip attaching to offline CPUs,
from Tonghao Zhang.
====================
Link: https://lore.kernel.org/r/20230211002037.8489-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Do few small cleanups:
1) Rename the function to reflect that it doesn't configure all things
related to VSI. ice_vsi_cfg_lan() better fits to what function is doing.
ice_vsi_cfg() can be use to name function that will configure whole VSI.
2) Remove unused ethtype field from VSI. There is no need to set
ethtype here, because it is never used.
3) Remove unnecessary check for ICE_VSI_CHNL. There is check for
ICE_VSI_CHNL in ice_vsi_get_qs, so there is no need to check it before
calling the function.
4) Simplify ice_vsi_alloc() call. There is no need to check the type of
VSI before calling ice_vsi_alloc(). For ICE_VSI_CHNL vf is always NULL
(ice_vsi_setup() is called with vf=NULL).
For ICE_VSI_VF or ICE_VSI_CTRL ch is always NULL and for other VSI types
ch and vf are always NULL.
5) Remove unnecessary call to ice_vsi_dis_irq(). ice_vsi_dis_irq() will
be called in ice_vsi_close() flow (ice_vsi_close() -> ice_vsi_down() ->
ice_vsi_dis_irq()). Remove unnecessary call.
6) Don't remove specific filters in release. All hw filters are removed
in ice_fltr_remove_alli(), which is always called in VSI release flow.
There is no need to remove only ethertype filters before calling
ice_fltr_remove_all().
7) Rename ice_vsi_clear() to ice_vsi_free(). As ice_vsi_clear() only
free memory allocated in ice_vsi_alloc() rename it to ice_vsi_free()
which better shows what function is doing.
8) Free coalesce param in rebuild. There is potential memory leak if
configuration of VSI lan fails. Free coalesce to avoid it.
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
net/core/gro.c
7d2c89b32587 ("skb: Do mix page pool and page referenced frags in GRO")
b1a78b9b9886 ("net: add support for ipv4 big tcp")
https://lore.kernel.org/all/20230203094454.5766f160@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that both ZC and standard XDP data paths stopped using Tx logic
based on next_dd and next_rs fields, we can safely remove these fields
and shrink Tx ring structure.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/bpf/20230131204506.219292-13-maciej.fijalkowski@intel.com
|
|
Ice driver needs to be a bit reworked on Rx data path in order to
support multi-buffer XDP. For skb path, it currently works in a way that
Rx ring carries pointer to skb so if driver didn't manage to combine
fragmented frame at current NAPI instance, it can restore the state on
next instance and keep looking for last fragment (so descriptor with EOP
bit set). What needs to be achieved is that xdp_buff needs to be
combined in such way (linear + frags part) in the first place. Then skb
will be ready to go in case of XDP_PASS or BPF program being not present
on interface. If BPF program is there, it would work on multi-buffer
XDP. At this point xdp_buff resides directly on Rx ring, so given the
fact that skb will be built straight from xdp_buff, there will be no
further need to carry skb on Rx ring.
Besides removing skb pointer from Rx ring, lots of members have been
moved around within ice_rx_ring. First and foremost reason was to place
rx_buf with xdp_buff on the same cacheline. This means that once we
touch rx_buf (which is a preceding step before touching xdp_buff),
xdp_buff will already be hot in cache. Second thing was that xdp_rxq is
used rather rarely and it occupies a separate cacheline, so maybe it is
better to have it at the end of ice_rx_ring.
Other change that affects ice_rx_ring is the introduction of
ice_rx_ring::first_desc. Its purpose is twofold - first is to propagate
rx_buf->act to all the parts of current xdp_buff after running XDP
program, so that ice_put_rx_buf() that got moved out of the main Rx
processing loop will be able to tak an appriopriate action on each
buffer. Second is for ice_construct_skb().
ice_construct_skb() has a copybreak mechanism which had an explicit
impact on xdp_buff->skb conversion in the new approach when legacy Rx
flag is toggled. It works in a way that linear part is 256 bytes long,
if frame is bigger than that, remaining bytes are going as a frag to
skb_shared_info.
This means while memcpying frags from xdp_buff to newly allocated skb,
care needs to be taken when picking the destination frag array entry.
Upon the time ice_construct_skb() is called, when dealing with
fragmented frame, current rx_buf points to the *last* fragment, but
copybreak needs to be done against the first one. That's where
ice_rx_ring::first_desc helps.
When frame building spans across NAPI polls (DD bit is not set on
current descriptor and xdp->data is not NULL) with current Rx buffer
handling state there might be some problems.
Since calls to ice_put_rx_buf() were pulled out of the main Rx
processing loop and were scoped from cached_ntc to current ntc, remember
that now mentioned function relies on rx_buf->act, which is set within
ice_run_xdp(). ice_run_xdp() is called when EOP bit was found, so
currently we could put Rx buffer with rx_buf->act being *uninitialized*.
To address this, change scoping to rely on first_desc on both boundaries
instead.
This also implies that cleaned_count which is used as an input to
ice_alloc_rx_buffers() and tells how many new buffers should be refilled
has to be adjusted. If it stayed as is, what could happen is a case
where ntc would go over ntu.
Therefore, remove cleaned_count altogether and use against allocing
routine newly introduced ICE_RX_DESC_UNUSED() macro which is an
equivalent of ICE_DESC_UNUSED() dedicated for Rx side and based on
struct ice_rx_ring::first_desc instead of next_to_clean.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/bpf/20230131204506.219292-11-maciej.fijalkowski@intel.com
|
|
The PF controls the set of queues that the RDMA auxiliary_driver requests
resources from. The set_channel command will alter that pool and trigger a
reconfiguration of the VSI, which breaks RDMA functionality.
Prevent set_channel from executing when RDMA driver bound to auxiliary
device.
Adding a locked variable to pass down the call chain to avoid double
locking the device_lock.
Fixes: 348048e724a0 ("ice: Implement iidc operations")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Previous checks, and goto, will catch all errors meaning these returns
will only return 0; explicitly return 0 for these cases.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
|
|
Combine if statements setting the same link speed together.
Suggested-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit 2736d94f351b ("ethtool: Added support for 50Gbps per lane link modes")
in v5.1 added (among other things) support for 100G CR2/KR2/SR2 link modes.
Advertise these link modes if the firmware reports the corresponding PHY types.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Resets may occur with or without user interaction. For example, a TX hang
or reconfiguration of parameters will result in a reset. During reset, the
VSI is freed, freeing any statistics structures inside as well. This would
create an issue for the user where a reset happens in the background,
statistics set to zero, and the user checks ring statistics expecting them
to be populated.
To ensure this doesn't happen, accumulate ring statistics over reset.
Define a new ring statistics structure, ice_ring_stats. The new structure
lives in the VSI's parent, preserving ring statistics when VSI is freed.
1. Define a new structure vsi_ring_stats in the PF scope
2. Allocate/free stats only during probe, unload, or change in ring size
3. Replace previous ring statistics functionality with new structure
Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In the event of a Tx hang it can be useful to read a variety of hardware
registers to capture some state about why the transmit queue got stuck.
Extend the ETHTOOL_GREGS dump provided by the ice driver with several CSR
registers that provide such relevant information regarding the hardware Tx
state. This enables capturing relevant data to enable debugging such a Tx
hang.
Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Link: https://lore.kernel.org/r/20221027104239.1691549-1-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For certain devices, 100M speeds are supported. Do not mask off
100M speed for these devices.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Co-developed-by: Chinh T Cao <chinh.t.cao@intel.com>
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Signed-off-by: Mikael Barsehyan <mikael.barsehyan@intel.com>
Tested-by: Kavya AV <kavyax.av@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The driver can allow the user to configure whether the CRC aka the FCS
(Frame Check Sequence) is DMA'd to the host as part of the receive
buffer. The driver usually wants this feature disabled so that the
hardware checks the FCS and strips it in order to save PCI bandwidth.
Control the reception of FCS to the host using the command:
ethtool -K eth0 rx-fcs <on|off>
The default shown in ethtool -k eth0 | grep fcs; should be "off", as the
hardware will drop any frame with a bad checksum, and DMA of the
checksum is useless overhead especially for small packets.
Testing Hints:
test the FCS/CRC arrives with received packets using
tcpdump -nnpi eth0 -xxxx
and it should show crc data as the last 4 bytes of the packet. Can also
use wireshark to turn on CRC checking and check the data is correct.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Co-developed-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
Co-developed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice driver requires a cached copy of the PHC time in order to perform
timestamp extension on Tx and Rx hardware timestamp values. This cached PHC
time must always be updated at least once every 2 seconds. Otherwise, the
math used to perform the extension would produce invalid results.
The updates are supposed to occur periodically in the PTP kthread work
item, which is scheduled to run every half second. Thus, we do not expect
an update to be delayed for so long. However, there are error conditions
which can cause the update to be delayed.
Track this situation by using jiffies to determine approximately how long
ago the last update occurred. Add a new statistic and a dev_warn when we
have failed to update the cached PHC time. This makes the error case more
obvious.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Several Intel networking drivers which support PTP track when Tx timestamps
are skipped or when they timeout without a timestamp from hardware. The
conditions which could cause these events are rare, but it can be useful to
know when and how often they occur.
Implement similar statistics for the ice driver, tx_hwtstamp_skipped,
tx_hwtstamp_timeouts, and tx_hwtstamp_flushed.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When we create new Rx rings, the cached_phctime field is zero initialized.
This could result in incorrect timestamp reporting due to the cached value
not yet being updated. Although a background task will periodically update
the cached value, ensure it matches the existing cached value in the PF
structure at ring initialization.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When the user changes the number of queues via ethtool, the driver
allocates new rings. This allocation did not initialize tx_tstamps. This
results in the tx_tstamps field being zero (due to kcalloc allocation), and
would result in a NULL pointer dereference when attempting a transmit
timestamp on the new ring.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-07-28
This series contains updates to ice driver only.
Michal allows for VF true promiscuous mode to be set for multiple VFs
and adds clearing of promiscuous filters when VF trust is removed.
Maciej refactors ice_set_features() to track/check changed features
instead of constantly checking against netdev features and adds support for
NETIF_F_LOOPBACK.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: allow toggling loopback mode via ndo_set_features callback
ice: compress branches in ice_set_features()
ice: Fix promiscuous mode not turning off
ice: Introduce enabling promiscuous mode on multiple VF's
====================
Link: https://lore.kernel.org/r/20220728195538.3391360-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
In current implementation default VSI switch filter is only able to
forward traffic to a single VSI. This limits promiscuous mode with
private flag 'vf-true-promisc-support' to a single VF. Enabling it on
the second VF won't work. Also allmulticast support doesn't seem to be
properly implemented when vf-true-promisc-support is true.
Use standard ice_add_rule_internal() function that already implements
forwarding to multiple VSI's instead of constructing AQ call manually.
Add switch filter for allmulticast mode when vf-true-promisc-support is
enabled. The same filter is added regardless of the flag - it doesn't
matter for this case.
Remove unnecessary fields in switch structure. From now on book keeping
will be done by ice_add_rule_internal().
Refactor unnecessarily passed function arguments.
To test:
1) Create 2 VM's, and two VF's. Attach VF's to VM's.
2) Enable promiscuous mode on both of them and check if
traffic is seen on both of them.
Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Tx side sets EOP and RS bits on descriptors to indicate that a
particular descriptor is the last one and needs to generate an irq when
it was sent. These bits should not be checked on completion path
regardless whether it's the Tx or the Rx. DD bit serves this purpose and
it indicates that a particular descriptor is either for Rx or was
successfully Txed. EOF is also set as loopback test does not xmit
fragmented frames.
Look at (DD | EOF) bits setting in ice_lbtest_receive_frames() instead
of EOP and RS pair.
Fixes: 0e674aeb0b77 ("ice: Add handler for ethtool selftest")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Do not allow setting less channels, than Traffic Classes there are
via ethtool. There must be at least one channel per Traffic Class.
If you set less channels, than Traffic Classes there are, then during
ice_vsi_rebuild there would be allocated only the requested amount
of tx/rx rings in ice_vsi_alloc_arrays. But later in ice_vsi_setup_q_map
there would be requested at least one channel per Traffic Class. This
results in setting num_rxq > alloc_rxq and num_txq > alloc_txq.
Later, there would be a NULL pointer dereference in
ice_vsi_map_rings_to_vectors, because we go beyond of rx_rings or
tx_rings arrays.
Change ice_set_channels() to return error if you try to allocate less
channels, than Traffic Classes there are.
Change ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() to return
status code instead of void.
Add error handling for ice_vsi_setup_q_map() and
ice_vsi_setup_q_map_mqprio() in ice_vsi_init() and ice_vsi_cfg_tc().
[53753.889983] INFO: Flow control is disabled for this traffic class (0) on this vsi.
[53763.984862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[53763.992915] PGD 14b45f5067 P4D 0
[53763.996444] Oops: 0002 [#1] SMP NOPTI
[53764.000312] CPU: 12 PID: 30661 Comm: ethtool Kdump: loaded Tainted: GOE --------- - - 4.18.0-240.el8.x86_64 #1
[53764.011825] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P21.2012150710 12/15/2020
[53764.022584] RIP: 0010:ice_vsi_map_rings_to_vectors+0x7e/0x120 [ice]
[53764.029089] Code: 41 0d 0f b7 b7 12 05 00 00 0f b6 d0 44 29 de 44 0f b7 c6 44 01 c2 41 39 d0 7d 2d 4c 8b 47 28 44 0f b7 ce 83 c6 01 4f 8b 04 c8 <49> 89 48 28 4 c 8b 89 b8 01 00 00 4d 89 08 4c 89 81 b8 01 00 00 44
[53764.048379] RSP: 0018:ff550dd88ea47b20 EFLAGS: 00010206
[53764.053884] RAX: 0000000000000002 RBX: 0000000000000004 RCX: ff385ea42fa4a018
[53764.061301] RDX: 0000000000000006 RSI: 0000000000000005 RDI: ff385e9baeedd018
[53764.068717] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000004
[53764.076133] R10: 0000000000000002 R11: 0000000000000004 R12: 0000000000000000
[53764.083553] R13: 0000000000000000 R14: ff385e658fdd9000 R15: ff385e9baeedd018
[53764.090976] FS: 000014872c5b5740(0000) GS:ff385e847f100000(0000) knlGS:0000000000000000
[53764.099362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[53764.105409] CR2: 0000000000000028 CR3: 0000000a820fa002 CR4: 0000000000761ee0
[53764.112851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[53764.120301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[53764.127747] PKRU: 55555554
[53764.130781] Call Trace:
[53764.133564] ice_vsi_rebuild+0x611/0x870 [ice]
[53764.138341] ice_vsi_recfg_qs+0x94/0x100 [ice]
[53764.143116] ice_set_channels+0x1a8/0x3e0 [ice]
[53764.147975] ethtool_set_channels+0x14e/0x240
[53764.152667] dev_ethtool+0xd74/0x2a10
[53764.156665] ? __mod_lruvec_state+0x44/0x110
[53764.161280] ? __mod_lruvec_state+0x44/0x110
[53764.165893] ? page_add_file_rmap+0x15/0x170
[53764.170518] ? inet_ioctl+0xd1/0x220
[53764.174445] ? netdev_run_todo+0x5e/0x290
[53764.178808] dev_ioctl+0xb5/0x550
[53764.182485] sock_do_ioctl+0xa0/0x140
[53764.186512] sock_ioctl+0x1a8/0x300
[53764.190367] ? selinux_file_ioctl+0x161/0x200
[53764.195090] do_vfs_ioctl+0xa4/0x640
[53764.199035] ksys_ioctl+0x60/0x90
[53764.202722] __x64_sys_ioctl+0x16/0x20
[53764.206845] do_syscall_64+0x5b/0x1a0
[53764.210887] entry_SYSCALL_64_after_hwframe+0x65/0xca
Fixes: 87324e747fde ("ice: Implement ethtool ops for channels")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
In current implementation ice_update_phy_type enables all link modes
for selected speed. This approach doesn't work for 1000M speeds,
because both copper (1000baseT) and optical (1000baseX) standards
cannot be enabled at once.
Fix this, by adding the function `ice_set_phy_type_from_speed()`
for 1000M speeds.
Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations")
Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
When ADQ queue groups (TCs) are created via tc mqprio command,
RSS contexts and associated RSS indirection tables are configured
automatically per TC based on the queue ranges specified for
each traffic class.
For ex:
tc qdisc add dev enp175s0f0 root mqprio num_tc 3 map 0 1 2 \
queues 2@0 8@2 4@10 hw 1 mode channel
will create 3 queue groups (TC 0-2) with queue ranges 2, 8 and 4
in 3 queue groups. Each queue group is associated with its
own RSS context and RSS indirection table.
Add support to expose RSS indirection tables for all ADQ queue
groups using ethtool RSS contexts interface.
ethtool -x enp175s0f0 context <tc-num>
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20220512213249.3747424-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This reverts commit bfaaba99e680bf82bf2cbf69866c3f37434ff766.
Commit bfaaba99e680 ("ice: Hide bus-info in ethtool for PRs in switchdev
mode") was a workaround for lshw tool displaying incorrect
descriptions for port representors and PF in switchdev mode. Now the issue
has been fixed in the lshw tool itself [1].
Removing the workaround can be considered a regression, as the user might
be running older, unpatched lshw version. However, another important change
(ice: link representors to PCI device, which improves port representor
netdev naming with SET_NETDEV_DEV) also causes the same "regression" as
removing the workaround, i.e. unpatched lshw is able to access bus-info
information (this time not via ethtool) and the bug can occur. Therefore,
the workaround no longer prevents the bug and can be removed.
[1] https://ezix.org/src/pkg/lshw/commit/9bf4e4c9c1
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
net/dsa/dsa2.c
commit afb3cc1a397d ("net: dsa: unlock the rtnl_mutex when dsa_master_setup() fails")
commit e83d56537859 ("net: dsa: replay master state events in dsa_tree_{setup,teardown}_master")
https://lore.kernel.org/all/20220307101436.7ae87da0@canb.auug.org.au/
drivers/net/ethernet/intel/ice/ice.h
commit 97b0129146b1 ("ice: Fix error with handling of bonding MTU")
commit 43113ff73453 ("ice: add TTY for GNSS module for E810T device")
https://lore.kernel.org/all/20220310112843.3233bcf1@canb.auug.org.au/
drivers/staging/gdm724x/gdm_lte.c
commit fc7f750dc9d1 ("staging: gdm724x: fix use after free in gdm_lte_rx()")
commit 4bcc4249b4cf ("staging: Use netif_rx().")
https://lore.kernel.org/all/20220308111043.1018a59d@canb.auug.org.au/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Change curr_link_speed advertised speed, due to
link_info.link_speed is not equal phy.curr_user_speed_req.
Without this patch it is impossible to set advertised
speed to same as link_speed.
Testing Hints: Try to set advertised speed
to 25G only with 25G default link (use ethtool -s 0x80000000)
Fixes: 48cb27f2fd18 ("ice: Implement handlers for ethtool PHY/link operations")
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice driver stores VF structures in a simple array which is allocated
once at the time of VF creation. The VF structures are then accessed
from the array by their VF ID. The ID must be between 0 and the number
of allocated VFs.
Multiple threads can access this table:
* .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
* interrupts, such as due to messages from the VF using the virtchnl
communication
* processing such as device reset
* commands to add or remove VFs
The current implementation does not keep track of when all threads are
done operating on a VF and can potentially result in use-after-free
issues caused by one thread accessing a VF structure after it has been
released when removing VFs. Some of these are prevented with various
state flags and checks.
In addition, this structure is quite static and does not support a
planned future where virtualization can be more dynamic. As we begin to
look at supporting Scalable IOV with the ice driver (as opposed to just
supporting Single Root IOV), this structure is not sufficient.
In the future, VFs will be able to be added and removed individually and
dynamically.
To allow for this, and to better protect against a whole class of
use-after-free bugs, replace the VF storage with a combination of a hash
table and krefs to reference track all of the accesses to VFs through
the hash table.
A hash table still allows efficient look up of the VF given its ID, but
also allows adding and removing VFs. It does not require contiguous VF
IDs.
The use of krefs allows the cleanup of the VF memory to be delayed until
after all threads have released their reference (by calling ice_put_vf).
To prevent corruption of the hash table, a combination of RCU and the
mutex table_lock are used. Addition and removal from the hash table use
the RCU-aware hash macros. This allows simple read-only look ups that
iterate to locate a single VF can be fast using RCU. Accesses which
modify the hash table, or which can't take RCU because they sleep, will
hold the mutex lock.
By using this design, we have a stronger guarantee that the VF structure
can't be released until after all threads are finished operating on it.
We also pave the way for the more dynamic Scalable IOV implementation in
the future.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Before we switch the VF data structure storage mechanism to a hash,
introduce new accessor functions to define the new interface.
* ice_get_vf_by_id is a function used to obtain a reference to a VF from
the table based on its VF ID
* ice_has_vfs is used to quickly check if any VFs are configured
* ice_get_num_vfs is used to get an exact count of how many VFs are
configured
We can drop the old ice_validate_vf_id function, since every caller was
just going to immediately access the VF table to get a reference
anyways. This way we simply use the single ice_get_vf_by_id to both
validate the VF ID is within range and that there exists a VF with that
ID.
This change enables us to more easily convert the codebase to the hash
table since most callers now properly use the interface.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
We maintain a number of values for VFs within the ice_pf structure. This
includes the VF table, the number of allocated VFs, the maximum number
of supported SR-IOV VFs, the number of queue pairs per VF, the number of
MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.
We're about to add a few more variables to this list. Clean this up
first by extracting these members out into a new ice_vfs structure
defined in ice_virtchnl_pf.h
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
The ice_for_each_vf macro is intended to be used to loop over all VFs.
The current implementation relies on an iterator that is the index into
the VF array in the PF structure. This forces all users to perform a
look up themselves.
This abstraction forces a lot of duplicate work on callers and leaks the
interface implementation to the caller. Replace this with an
implementation that includes the VF pointer the primary iterator. This
version simplifies callers which just want to iterate over every VF, as
they no longer need to perform their own lookup.
The "i" iterator value is replaced with a new unsigned int "bkt"
parameter, as this will match the necessary interface for replacing
the VF array with a hash table. For now, the bkt is the VF ID, but in
the future it will simply be the hash bucket index. Document that it
should not be treated as a VF ID.
This change aims to simplify switching from the array to a hash table. I
considered alternative implementations such as an xarray but decided
that the hash table was the simplest and most suitable implementation. I
also looked at methods to hide the bkt iterator entirely, but I couldn't
come up with a feasible solution that worked for hash table iterators.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2022-02-09
This series contains updates to ice driver only.
Brett adds support for QinQ. This begins with code refactoring and
re-organization of VLAN configuration functions to allow for
introduction of VSI VLAN ops to enable setting and calling of
respective operations based on device support of single or double
VLANs. Implementations are added for outer VLAN support.
To support QinQ, the device must be set to double VLAN mode (DVM).
In order for this to occur, the DDP package and NVM must also support
DVM. Functions to determine compatibility and properly configure the
device are added as well as setting the proper bits to advertise and
utilize the proper offloads. Support for VIRTCHNL_VF_OFFLOAD_VLAN_V2
is also included to allow for VF to negotiate and utilize this
functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
VFs by default are able to see all tagged traffic regardless of trust
and VLAN filters. Based on legacy devices (i.e. ixgbe, i40e), customers
expect VFs to receive all VLAN tagged traffic with a matching
destination MAC.
Add an ethtool private flag 'vf-vlan-pruning' and set the default to
off so VFs will receive all VLAN traffic directed towards them. When
the flag is turned on, VF will only be able to receive untagged
traffic or traffic with VLAN tags it has created interfaces for.
Also, the flag cannot be changed while any VFs are allocated. This was
done to simplify the implementation. So, if this flag is needed, then
the PF admin must enable it. If the user tries to enable the flag while
VFs are active, then print an unsupported message with the
vf-vlan-pruning flag included. In case multiple flags were specified, this
makes it clear to the user which flag failed.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
XDP_TX workloads use a concept of Tx threshold that indicates the
interval of setting RS bit on descriptors which in turn tells the HW to
generate an interrupt to signal the completion of Tx on HW side. It is
currently based on a constant value of 32 which might not work out well
for various sizes of ring combined with for example batch size that can
be set via SO_BUSY_POLL_BUDGET.
Internal tests based on AF_XDP showed that most convenient setup of
mentioned threshold is when it is equal to quarter of a ring length.
Make use of recently introduced ICE_RING_QUARTER macro and use this
value as a substitute for ICE_TX_THRESH.
Align also ethtool -G callback so that next_dd/next_rs fields are up to
date in terms of the ring size.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220125160446.78976-5-maciej.fijalkowski@intel.com
|
|
The driver had comments to the effect of: This flag should be set before
calling this function. While reviewing code it was found that there were
several violations of this policy, which could introduce hard to find
bugs or races.
Fix the violations of the "VSI DOWN state must be set before calling
ice_down" and make checking the state into code with a WARN_ON.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
As all functions now return standard error codes, propagate the values
being returned instead of converting them to generic values.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
|
|
ice_status previously had a variable to contain these values where other
error codes had a variable as well. With ice_status now being an int,
there is no need for two variables to hold error values. In cases where
this occurs, remove one of the excess variables and use a single one.
Some initialization of variables are no longer needed and have been
removed.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
|
|
Clean up code after changing ice_status to int. Rearrange to fix reverse
Christmas tree and pull lines up where applicable.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
|
|
To prepare for removal of ice_status, change the variables from
ice_status to int. This eases the transition when values are changed to
return standard int error codes over enum ice_status.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
|
|
Remove the ice_stat_str() function which prints the string
representation of the ice_status error code. With upcoming changes
moving away from ice_status, there will be no need for this function.
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
|
|
Add two new parameters kernel_ringparam and extack for
.get_ringparam and .set_ringparam to extend more ring params
through netlink.
Signed-off-by: Hao Chen <chenhao288@hisilicon.com>
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Disable showing bus-info information for port representors in switchdev
mode. This fixes a bug that caused displaying wrong netdev descriptions in
lshw tool - one port representor displayed PF branding string, and in turn
one PF displayed a "generic" description. The bug occurs when many devices
show the same bus-info in ethtool, which was the case in switchdev mode (PF
and its port representors displayed the same bus-info). The bug occurs only
if a port representor netdev appears before PF netdev in /proc/net/dev.
In the examples below:
ens6fX is PF
ens6fXvY is VF
ethX is port representor
One irrelevant column was removed from output
Before:
$ sudo lshw -c net -businfo
Bus info Device Description
=========================================
pci@0000:02:00.0 eth102 Ethernet Controller E810-XXV for SFP
pci@0000:02:00.1 ens6f1 Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0 ens6f0v0 Ethernet Adaptive Virtual Function
pci@0000:02:01.1 ens6f0v1 Ethernet Adaptive Virtual Function
pci@0000:02:01.2 ens6f0v2 Ethernet Adaptive Virtual Function
pci@0000:02:00.0 ens6f0 Ethernet interface
Notice that eth102 and ens6f0 have the same bus-info and their descriptions
are swapped.
After:
$ sudo lshw -c net -businfo
Bus info Device Description
=========================================
pci@0000:02:00.0 ens6f0 Ethernet Controller E810-XXV for SFP
pci@0000:02:00.1 ens6f1 Ethernet Controller E810-XXV for SFP
pci@0000:02:01.0 ens6f0v0 Ethernet Adaptive Virtual Function
pci@0000:02:01.1 ens6f0v1 Ethernet Adaptive Virtual Function
pci@0000:02:01.2 ens6f0v2 Ethernet Adaptive Virtual Function
Fixes: 7aae80cef7ba ("ice: add port representor ethtool ops and stats")
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Add support in driver for TC_QDISC_SETUP_MQPRIO. This support
enables instantiation of channels in HW using existing MQPRIO
infrastructure which is extended to be offloadable. This
provides a mechanism to configure dedicated set of queues for
each TC.
Configuring channels using "tc mqprio":
--------------------------------------
tc qdisc add dev <ethX> root mqprio num_tc 3 map 0 1 2 \
queues 4@0 4@4 4@8 hw 1 mode channel
Above command configures 3 TCs having 4 queues each. "hw 1 mode channel"
implies offload of channel configuration to HW. When driver processes
configuration received via "ndo_setup_tc: QDISC_SETUP_MQPRIO", each
TC maps to HW VSI with specified queues.
User can optionally specify bandwidth min and max rate limit per TC
(see example below). If shaper params like min and/or max bandwidth
rate limit are specified, driver configures VSI specific rate limiter
in HW.
Configuring channels and bandwidth shaper parameters using "tc mqprio":
----------------------------------------------------------------
tc qdisc add dev <ethX> root mqprio \
num_tc 4 map 0 1 2 3 queues 4@0 4@4 4@8 4@12 hw 1 mode channel \
shaper bw_rlimit min_rate 1Gbit 2Gbit 3Gbit 4Gbit \
max_rate 4Gbit 5Gbit 6Gbit 7Gbit
Command to view configured TCs:
-----------------------------
tc qdisc show dev <ethX>
Deleting TCs:
------------
tc qdisc del dev <ethX> root mqprio
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|