Age | Commit message (Collapse) | Author |
|
Commit 0c3d79bce48034018e840468ac5a642894a521a3 ("tcp: reduce SYN-ACK
retrans for TCP_DEFER_ACCEPT") introduces syn_ack_recalc() which decides
if a minisock is held and a SYN+ACK is retransmitted or not.
If rskq_defer_accept is not zero in syn_ack_recalc(), max_retries always
has the same value because max_retries is overwritten by rskq_defer_accept
in reqsk_timer_handler().
This commit adds three changes:
- remove redundant non-zero check for rskq_defer_accept in
reqsk_timer_handler().
- remove max_retries from the arguments of syn_ack_recalc() and use
rskq_defer_accept instead.
- rename thresh to max_syn_ack_retries for readability.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
CC: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to use new devlink port health reporters infrastructure, add
corresponding constructor and destructor functions.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add devlink-health reporter support on per-port basis.
The main difference existing devlink-health is that port reporters are
stored in per-devlink_port lists. Upon creation of such health reporter the
reference to a port it belongs to is stored in reporter struct.
Fill the port index attribute in devlink-health response to
allow devlink userspace utility to distinguish between device and port
reporters.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a generic __devlink_health_reporter_find_by_name() that can be used
with arbitrary devlink health reporter list.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Devlink keeps its own reference to every reporter in a list and inits
refcount to 1 upon reporter's creation. Existing destructor waits to
free the memory indefinitely using msleep() until all references except
devlink's own are put.
Rework this mechanism by moving memory free routine to a separate
function, which is called when the last reporter reference is put.
Besides, it allows to call __devlink_health_reporter_destroy() while
locked on a reporters list mutex in symmetry to
__devlink_health_reporter_create(), which is required in follow-up
patch.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Prepare a common routine in devlink_health_reporter_create() for usage
in similar functions for devlink port health reporters.
Signed-off-by: Vladyslav Tarasiuk <vladyslavt@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add an interface to report offloaded UDP ports via ethtool netlink.
Now that core takes care of tracking which UDP tunnel ports the NICs
are aware of we can quite easily export this information out to
user space.
The responsibility of writing the netlink dumps is split between
ethtool code and udp_tunnel_nic.c - since udp_tunnel module may
not always be loaded, yet we should always report the capabilities
of the NIC.
$ ethtool --show-tunnels eth0
Tunnel information for eth0:
UDP port table 0:
Size: 4
Types: vxlan
No entries
UDP port table 1:
Size: 4
Types: geneve, vxlan-gpe
Entries (1):
port 1230, vxlan-gpe
v4:
- back to v2, build fix is now directly in udp_tunnel.h
v3:
- don't compile ETHTOOL_MSG_TUNNEL_INFO_GET in if CONFIG_INET
not set.
v2:
- fix string set count,
- reorder enums in the uAPI,
- fix type of ETHTOOL_A_TUNNEL_UDP_TABLE_TYPES to bitset
in docs and comments.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Cater to devices which:
(a) may want to sleep in the callbacks;
(b) only have IPv4 support;
(c) need all the programming to happen while the netdev is up.
Drivers attach UDP tunnel offload info struct to their netdevs,
where they declare how many UDP ports of various tunnel types
they support. Core takes care of tracking which ports to offload.
Use a fixed-size array since this matches what almost all drivers
do, and avoids a complexity and uncertainty around memory allocations
in an atomic context.
Make sure that tunnel drivers don't try to replay the ports when
new NIC netdev is registered. Automatic replays would mess up
reference counting, and will be removed completely once all drivers
are converted.
v4:
- use a #define NULL to avoid build issues with CONFIG_INET=n.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, all the input checks are done in driver.
After adding the split capability to devlink port, move the checks to
devlink.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new attribute that indicates the split ability of devlink port.
Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add a new devlink port attribute that indicates the port's number of lanes.
Drivers are expected to set it via devlink_port_attrs_set(), before
registering the port.
The attribute is not passed to user space in case the number of lanes is
invalid (0).
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently, devlink_port_attrs_set accepts a long list of parameters,
that most of them are devlink port's attributes.
Use the devlink_port_attrs struct to replace the relevant parameters.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The struct devlink_port_attrs holds the attributes of devlink_port.
Similarly to the previous patch, 'switch_port' attribute is another
exception.
Move 'switch_port' to be devlink_port's field.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The struct devlink_port_attrs holds the attributes of devlink_port.
The 'set' field is not devlink_port's attribute as opposed to most of the
others.
Move 'set' to be devlink_port's field called 'attrs_set'.
Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
exposes basic inet socket attribute, plus some MPTCP socket
fields comprising PM status and MPTCP-level sequence numbers.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
mptcp_token_iter_next() allow traversing all the MPTCP
sockets inside the token container belonging to the given
network namespace with a quite standard iterator semantic.
That will be used by the next patch, but keep the API generic,
as we plan to use this later for PM's sake.
Additionally export mptcp_token_get_sock(), as it also
will be used by the diag module.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
After commit bf9765145b85 ("sock: Make sk_protocol a 16-bit value")
the current size of 'sdiag_protocol' is not sufficient to represent
the possible protocol values.
This change introduces a new inet diag request attribute to let
user space specify the relevant protocol number using u32 values.
The attribute is parsed by inet diag core on get/dump command
and the extended protocol value, if available, is preferred to
'sdiag_protocol' to lookup the diag handler.
The parse attributed are exposed to all the diag handlers via
the cb->data.
Note that inet_diag_dump_one_icsk() is left unmodified, as it
will not be used by protocol using the extended attribute.
Suggested-by: David S. Miller <davem@davemloft.net>
Co-developed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This implements the known parts of the Realtek 4 byte
tag protocol version 0xA, as found in the RTL8366RB
DSA switch.
It is designated as protocol version 0xA as a
different Realtek 4 byte tag format with protocol
version 0x9 is known to exist in the Realtek RTL8306
chips.
The tag and switch chip lacks public documentation, so
the tag format has been reverse-engineered from
packet dumps. As only ingress traffic has been available
for analysis an egress tag has not been possible to
develop (even using educated guesses about bit fields)
so this is as far as it gets. It is not known if the
switch even supports egress tagging.
Excessive attempts to figure out the egress tag format
was made. When nothing else worked, I just tried all bit
combinations with 0xannp where a is protocol and p is
port. I looped through all values several times trying
to get a response from ping, without any positive
result.
Using just these ingress tags however, the switch
functionality is vastly improved and the packets find
their way into the destination port without any
tricky VLAN configuration. On the D-Link DIR-685 the
LAN ports now come up and respond to ping without
any command line configuration so this is a real
improvement for users.
Egress packets need to be restricted to the proper
target ports using VLAN, which the RTL8366RB DSA
switch driver already sets up.
Cc: DENG Qingfang <dqfext@gmail.com>
Cc: Mauri Sandberg <sandberg@mailfence.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Define 100G, 200G and 400G link modes using 100Gbps per lane
LR, ER and FR are defined as a single link mode because they are
using same technology and by design are fully interoperable.
EEPROM content indicates if the module is LR, ER, or FR, and the
user space ethtool decoder is planned to support decoding these
modes in the EEPROM.
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
CC: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Support for rejecting packets from the prerouting chain, from
Laura Garcia Liebana.
2) Remove useless assignment in pipapo, from Stefano Brivio.
3) On demand hook registration in IPVS, from Julian Anastasov.
4) Expire IPVS connection from process context to not overload
timers, also from Julian.
5) Fallback to conntrack TCP tracker to handle connection reuse
in IPVS, from Julian Anastasov.
6) Several patches to support for chain bindings.
7) Expose enum nft_chain_flags through UAPI.
8) Reject unsupported chain flags from the netlink control plane.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that we have moved the PHY ethtool statistics to be dynamically
registered, we no longer need to inline those for ethtool. This used to
be done to avoid cross symbol referencing and allow ethtool to be
decoupled from PHYLIB entirely.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/latest/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Now that we have introduced ethtool_phy_ops and the PHY library
dynamically registers its operations with that function pointer, we can
remove the direct PHYLIB dependency in favor of using dynamic
operations.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
In order to decouple ethtool from its PHY library dependency, define an
ethtool_phy_ops singleton which can be overriden by the PHY library when
it loads with an appropriate set of function pointers.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We can re-use the existing work queue to handle path management
instead of a dedicated work queue. Just move pm_worker to protocol.c,
call it from the mptcp worker and get rid of the msk lock (already held).
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
syzkaller was able to make the kernel reach subflow_data_ready() for a
server subflow that was closed before subflow_finish_connect() completed.
In these cases we can avoid using the path for regular/fallback MPTCP
data, and just wake the main socket, to avoid the following warning:
WARNING: CPU: 0 PID: 9370 at net/mptcp/subflow.c:885
subflow_data_ready+0x1e6/0x290 net/mptcp/subflow.c:885
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 9370 Comm: syz-executor.0 Not tainted 5.7.0 #106
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0xb7/0xfe lib/dump_stack.c:118
panic+0x29e/0x692 kernel/panic.c:221
__warn.cold+0x2f/0x3d kernel/panic.c:582
report_bug+0x28b/0x2f0 lib/bug.c:195
fixup_bug arch/x86/kernel/traps.c:105 [inline]
fixup_bug arch/x86/kernel/traps.c:100 [inline]
do_error_trap+0x10f/0x180 arch/x86/kernel/traps.c:197
do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:216
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:1027
RIP: 0010:subflow_data_ready+0x1e6/0x290 net/mptcp/subflow.c:885
Code: 04 02 84 c0 74 06 0f 8e 91 00 00 00 41 0f b6 5e 48 31 ff 83 e3 18
89 de e8 37 ec 3d fe 84 db 0f 85 65 ff ff ff e8 fa ea 3d fe <0f> 0b e9
59 ff ff ff e8 ee ea 3d fe 48 89 ee 4c 89 ef e8 f3 77 ff
RSP: 0018:ffff88811b2099b0 EFLAGS: 00010206
RAX: ffff888111197000 RBX: 0000000000000000 RCX: ffffffff82fbc609
RDX: 0000000000000100 RSI: ffffffff82fbc616 RDI: 0000000000000001
RBP: ffff8881111bc800 R08: ffff888111197000 R09: ffffed10222a82af
R10: ffff888111541577 R11: ffffed10222a82ae R12: 1ffff11023641336
R13: ffff888111541000 R14: ffff88810fd4ca00 R15: ffff888111541570
tcp_child_process+0x754/0x920 net/ipv4/tcp_minisocks.c:841
tcp_v4_do_rcv+0x749/0x8b0 net/ipv4/tcp_ipv4.c:1642
tcp_v4_rcv+0x2666/0x2e60 net/ipv4/tcp_ipv4.c:1999
ip_protocol_deliver_rcu+0x29/0x1f0 net/ipv4/ip_input.c:204
ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
NF_HOOK include/linux/netfilter.h:421 [inline]
ip_local_deliver+0x2da/0x390 net/ipv4/ip_input.c:252
dst_input include/net/dst.h:441 [inline]
ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
ip_rcv_finish net/ipv4/ip_input.c:414 [inline]
NF_HOOK include/linux/netfilter.h:421 [inline]
ip_rcv+0xef/0x140 net/ipv4/ip_input.c:539
__netif_receive_skb_one_core+0x197/0x1e0 net/core/dev.c:5268
__netif_receive_skb+0x27/0x1c0 net/core/dev.c:5382
process_backlog+0x1e5/0x6d0 net/core/dev.c:6226
napi_poll net/core/dev.c:6671 [inline]
net_rx_action+0x3e3/0xd70 net/core/dev.c:6739
__do_softirq+0x18c/0x634 kernel/softirq.c:292
do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
</IRQ>
do_softirq.part.0+0x26/0x30 kernel/softirq.c:337
do_softirq arch/x86/include/asm/preempt.h:26 [inline]
__local_bh_enable_ip+0x46/0x50 kernel/softirq.c:189
local_bh_enable include/linux/bottom_half.h:32 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:723 [inline]
ip_finish_output2+0x78a/0x19c0 net/ipv4/ip_output.c:229
__ip_finish_output+0x471/0x720 net/ipv4/ip_output.c:306
dst_output include/net/dst.h:435 [inline]
ip_local_out+0x181/0x1e0 net/ipv4/ip_output.c:125
__ip_queue_xmit+0x7a1/0x14e0 net/ipv4/ip_output.c:530
__tcp_transmit_skb+0x19dc/0x35e0 net/ipv4/tcp_output.c:1238
__tcp_send_ack.part.0+0x3c2/0x5b0 net/ipv4/tcp_output.c:3785
__tcp_send_ack net/ipv4/tcp_output.c:3791 [inline]
tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3791
tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6040 [inline]
tcp_rcv_state_process+0x36a4/0x49c2 net/ipv4/tcp_input.c:6209
tcp_v4_do_rcv+0x343/0x8b0 net/ipv4/tcp_ipv4.c:1651
sk_backlog_rcv include/net/sock.h:996 [inline]
__release_sock+0x1ad/0x310 net/core/sock.c:2548
release_sock+0x54/0x1a0 net/core/sock.c:3064
inet_wait_for_connect net/ipv4/af_inet.c:594 [inline]
__inet_stream_connect+0x57e/0xd50 net/ipv4/af_inet.c:686
inet_stream_connect+0x53/0xa0 net/ipv4/af_inet.c:725
mptcp_stream_connect+0x171/0x5f0 net/mptcp/protocol.c:1920
__sys_connect_file net/socket.c:1854 [inline]
__sys_connect+0x267/0x2f0 net/socket.c:1871
__do_sys_connect net/socket.c:1882 [inline]
__se_sys_connect net/socket.c:1879 [inline]
__x64_sys_connect+0x6f/0xb0 net/socket.c:1879
do_syscall_64+0xb7/0x3d0 arch/x86/entry/common.c:295
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7fb577d06469
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
RSP: 002b:00007fb5783d5dd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 000000000068bfa0 RCX: 00007fb577d06469
RDX: 000000000000004d RSI: 0000000020000040 RDI: 0000000000000003
RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000041427c R14: 00007fb5783d65c0 R15: 0000000000000003
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/39
Reported-by: Christoph Paasch <cpaasch@apple.com>
Fixes: e1ff9e82e2ea ("net: mptcp: improve fallback to TCP")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net/dsa/tag_qca.c:48:15: warning: incorrect type in assignment (different base types)
net/dsa/tag_qca.c:48:15: expected unsigned short [usertype]
net/dsa/tag_qca.c:48:15: got restricted __be16 [usertype]
net/dsa/tag_qca.c:68:13: warning: incorrect type in assignment (different base types)
net/dsa/tag_qca.c:68:13: expected restricted __be16 [usertype] hdr
net/dsa/tag_qca.c:68:13: got int
net/dsa/tag_qca.c:71:16: warning: restricted __be16 degrades to integer
net/dsa/tag_qca.c:81:17: warning: restricted __be16 degrades to integer
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net/dsa/tag_mtk.c:84:13: warning: incorrect type in assignment (different base types)
net/dsa/tag_mtk.c:84:13: expected restricted __be16 [usertype] hdr
net/dsa/tag_mtk.c:84:13: got int
net/dsa/tag_mtk.c:94:17: warning: restricted __be16 degrades to integer
The result of a ntohs() is not __be16, but u16.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net/dsa/tag_lan9303.c:76:24: warning: incorrect type in assignment (different base types)
net/dsa/tag_lan9303.c:76:24: expected unsigned short [usertype]
net/dsa/tag_lan9303.c:76:24: got restricted __be16 [usertype]
net/dsa/tag_lan9303.c:80:24: warning: incorrect type in assignment (different base types)
net/dsa/tag_lan9303.c:80:24: expected unsigned short [usertype]
net/dsa/tag_lan9303.c:80:24: got restricted __be16 [usertype]
net/dsa/tag_lan9303.c:106:31: warning: restricted __be16 degrades to integer
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
net/dsa/tag_lan9303.c:111:24: warning: cast to restricted __be16
Make use of __be16 where appropriate to fix these warnings.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
cpu_to_be16 returns a __be16 value. So what it is assigned to needs to
have the same type to avoid warnings.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
net/dsa/slave.c:505:13: warning: incorrect type in initializer (different address spaces)
net/dsa/slave.c:505:13: expected void const [noderef] <asn:3> *__vpp_verify
net/dsa/slave.c:505:13: got struct pcpu_sw_netstats *
Add the needed _percpu property to prevent this warning.
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Without this, Opensshd fails to open an ipv6 socket listening
socket:
error: setsockopt IPV6_V6ONLY: Operation not supported
error: Bind to port 22 on :: failed: Address already in use.
Opensshd opens an ipv4 and and ipv6 listening socket, but because
IPV6_V6ONLY setsockopt fails, the port number is already in use.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This will e.g. make 'sshd restart' work when MPTCP is used, as we will
now set this option on the listener socket instead of only the mptcp
socket (where it has no effect).
We still need to copy the setting to the master socket so that a
subsequent getsockopt() returns the expected value.
Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
setsockopt(mptcp_fd, SOL_SOCKET, ...)... appears to work (returns 0),
but it has no effect -- this is because the MPTCP layer never has a
chance to copy the settings to the subflow socket.
Skip the generic handling for the mptcp case and instead call the
mptcp specific handler instead for SOL_SOCKET too.
Next patch adds more specific handling for SOL_SOCKET to mptcp.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2020-07-04
The following pull-request contains BPF updates for your *net-next* tree.
We've added 73 non-merge commits during the last 17 day(s) which contain
a total of 106 files changed, 5233 insertions(+), 1283 deletions(-).
The main changes are:
1) bpftool ability to show PIDs of processes having open file descriptors
for BPF map/program/link/BTF objects, relying on BPF iterator progs
to extract this info efficiently, from Andrii Nakryiko.
2) Addition of BPF iterator progs for dumping TCP and UDP sockets to
seq_files, from Yonghong Song.
3) Support access to BPF map fields in struct bpf_map from programs
through BTF struct access, from Andrey Ignatov.
4) Add a bpf_get_task_stack() helper to be able to dump /proc/*/stack
via seq_file from BPF iterator progs, from Song Liu.
5) Make SO_KEEPALIVE and related options available to bpf_setsockopt()
helper, from Dmitry Yakunin.
6) Optimize BPF sk_storage selection of its caching index, from Martin
KaFai Lau.
7) Removal of redundant synchronize_rcu()s from BPF map destruction which
has been a historic leftover, from Alexei Starovoitov.
8) Several improvements to test_progs to make it easier to create a shell
loop that invokes each test individually which is useful for some CIs,
from Jesper Dangaard Brouer.
9) Fix bpftool prog dump segfault when compiled without skeleton code on
older clang versions, from John Fastabend.
10) Bunch of cleanups and minor improvements, from various others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Bail out if userspace sends unsupported chain flags.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This new chain flag specifies that:
* the kernel dynamically allocates the chain name, if no chain name
is specified.
* If the immediate expression that refers to this chain is removed,
then this bound chain (and its content) is destroyed.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This patch adds a helper function to add the chain to the hashtable and
the chain list.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This enum definition was never exposed through UAPI. Rename
NFT_BASE_CHAIN to NFT_CHAIN_BASE for consistency.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This netlink attribute allows you to identify the chain to jump/goto by
means of the chain ID.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This new netlink attribute allows you to add rules to chains by the
chain ID.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
This netlink attribute allows you to refer to chains inside a
transaction as an alternative to the name and the handle. The chain
binding support requires this new chain ID approach.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
YangYuxi is reporting that connection reuse
is causing one-second delay when SYN hits
existing connection in TIME_WAIT state.
Such delay was added to give time to expire
both the IPVS connection and the corresponding
conntrack. This was considered a rare case
at that time but it is causing problem for
some environments such as Kubernetes.
As nf_conntrack_tcp_packet() can decide to
release the conntrack in TIME_WAIT state and
to replace it with a fresh NEW conntrack, we
can use this to allow rescheduling just by
tuning our check: if the conntrack is
confirmed we can not schedule it to different
real server and the one-second delay still
applies but if new conntrack was created,
we are free to select new real server without
any delays.
YangYuxi lists some of the problem reports:
- One second connection delay in masquerading mode:
https://marc.info/?t=151683118100004&r=1&w=2
- IPVS low throughput #70747
https://github.com/kubernetes/kubernetes/issues/70747
- Apache Bench can fill up ipvs service proxy in seconds #544
https://github.com/cloudnativelabs/kube-router/issues/544
- Additional 1s latency in `host -> service IP -> pod`
https://github.com/kubernetes/kubernetes/issues/90854
Fixes: f719e3754ee2 ("ipvs: drop first packet to redirect conntrack")
Co-developed-by: YangYuxi <yx.atom1@gmail.com>
Signed-off-by: YangYuxi <yx.atom1@gmail.com>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
IPv6 ping sockets route based on fwmark, but do not yet set skb->mark.
Add this. IPv4 ping sockets also do both.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This patch extends the function br_fill_ifinfo to return also the MRP
status for each instance on a bridge. It also adds a new filter
RTEXT_FILTER_MRP to return the MRP status only when this is set, not to
interfer with the vlans. The MRP status is return only on the bridge
interfaces.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Add the function br_mrp_fill_info which populates the MRP attributes
regarding the status of each MRP instance.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
When mptcp is used, userspace doesn't read from the tcp (subflow)
socket but from the parent (mptcp) socket receive queue.
skbs are moved from the subflow socket to the mptcp rx queue either from
'data_ready' callback (if mptcp socket can be locked), a work queue, or
the socket receive function.
This means tcp_rcv_space_adjust() is never called and thus no receive
buffer size auto-tuning is done.
An earlier (not merged) patch added tcp_rcv_space_adjust() calls to the
function that moves skbs from subflow to mptcp socket.
While this enabled autotuning, it also meant tuning was done even if
userspace was reading the mptcp socket very slowly.
This adds mptcp_rcv_space_adjust() and calls it after userspace has
read data from the mptcp socket rx queue.
Its very similar to tcp_rcv_space_adjust, with two differences:
1. The rtt estimate is the largest one observed on a subflow
2. The rcvbuf size and window clamp of all subflows is adjusted
to the mptcp-level rcvbuf.
Otherwise, we get spurious drops at tcp (subflow) socket level if
the skbs are not moved to the mptcp socket fast enough.
Before:
time mptcp_connect.sh -t -f $((4*1024*1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap
[..]
ns4 MPTCP -> ns3 (10.0.3.2:10108 ) MPTCP (duration 40823ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109 ) TCP (duration 23119ms) [ OK ]
ns4 TCP -> ns3 (10.0.3.2:10110 ) MPTCP (duration 5421ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP (duration 41446ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP (duration 23427ms) [ OK ]
ns4 TCP -> ns3 (dead:beef:3::2:10113) MPTCP (duration 5426ms) [ OK ]
Time: 1396 seconds
After:
ns4 MPTCP -> ns3 (10.0.3.2:10108 ) MPTCP (duration 5417ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109 ) TCP (duration 5427ms) [ OK ]
ns4 TCP -> ns3 (10.0.3.2:10110 ) MPTCP (duration 5422ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP (duration 5415ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP (duration 5422ms) [ OK ]
ns4 TCP -> ns3 (dead:beef:3::2:10113) MPTCP (duration 5423ms) [ OK ]
Time: 296 seconds
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Similar to fq_codel and the other qdiscs that can set as default,
fq_pie is also suitable for general use without explicit configuration,
which makes it a valid choice for this.
This is useful in situations where a painless out-of-the-box solution
for reducing bufferbloat is desired but fq_codel is not necessarily the
best choice. For example, fq_pie can be better for DASH streaming, but
there could be more cases where it's the better choice of the two simple
AQMs available in the kernel.
Signed-off-by: Danny Lin <danny@kdrag0n.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
|