diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-10-31 05:10:11 -1000 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-10-31 05:10:11 -1000 |
commit | 89ed67ef126c4160349c1b96fdb775ea6170ac90 (patch) | |
tree | 98caaf8bba44b21f9345a0af1dd2bd9987764e27 /net/ipv4/tcp_bbr.c | |
parent | 5a6a09e97199d6600d31383055f9d43fbbcbe86f (diff) | |
parent | f1c73396133cb3d913e2075298005644ee8dfade (diff) |
Merge tag 'net-next-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Support usec resolution of TCP timestamps, enabled selectively by a
route attribute.
- Defer regular TCP ACK while processing socket backlog, try to send
a cumulative ACK at the end. Increase single TCP flow performance
on a 200Gbit NIC by 20% (100Gbit -> 120Gbit).
- The Fair Queuing (FQ) packet scheduler:
- add built-in 3 band prio / WRR scheduling
- support bypass if the qdisc is mostly idle (5% speed up for TCP RR)
- improve inactive flow reporting
- optimize the layout of structures for better cache locality
- Support TCP Authentication Option (RFC 5925, TCP-AO), a more modern
replacement for the old MD5 option.
- Add more retransmission timeout (RTO) related statistics to
TCP_INFO.
- Support sending fragmented skbs over vsock sockets.
- Make sure we send SIGPIPE for vsock sockets if socket was
shutdown().
- Add sysctl for ignoring lower limit on lifetime in Router
Advertisement PIO, based on an in-progress IETF draft.
- Add sysctl to control activation of TCP ping-pong mode.
- Add sysctl to make connection timeout in MPTCP configurable.
- Support rcvlowat and notsent_lowat on MPTCP sockets, to help apps
limit the number of wakeups.
- Support netlink GET for MDB (multicast forwarding), allowing user
space to request a single MDB entry instead of dumping the entire
table.
- Support selective FDB flushing in the VXLAN tunnel driver.
- Allow limiting learned FDB entries in bridges, prevent OOM attacks.
- Allow controlling via configfs netconsole targets which were
created via the kernel cmdline at boot, rather than via configfs at
runtime.
- Support multiple PTP timestamp event queue readers with different
filters.
- MCTP over I3C.
BPF:
- Add new veth-like netdevice where BPF program defines the logic of
the xmit routine. It can operate in L3 and L2 mode.
- Support exceptions - allow asserting conditions which should never
be true but are hard for the verifier to infer. With some extra
flexibility around handling of the exit / failure:
https://lwn.net/Articles/938435/
- Add support for local per-cpu kptr, allow allocating and storing
per-cpu objects in maps. Access to those objects operates on the
value for the current CPU.
This allows to deprecate local one-off implementations of per-CPU
storage like BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE maps.
- Extend cgroup BPF sockaddr hooks for UNIX sockets. The use case is
for systemd to re-implement the LogNamespace feature which allows
running multiple instances of systemd-journald to process the logs
of different services.
- Enable open-coded task_vma iteration, after maple tree conversion
made it hard to directly walk VMAs in tracing programs.
- Add open-coded task, css_task and css iterator support. One of the
use cases is customizable OOM victim selection via BPF.
- Allow source address selection with bpf_*_fib_lookup().
- Add ability to pin BPF timer to the current CPU.
- Prevent creation of infinite loops by combining tail calls and
fentry/fexit programs.
- Add missed stats for kprobes to retrieve the number of missed
kprobe executions and subsequent executions of BPF programs.
- Inherit system settings for CPU security mitigations.
- Add BPF v4 CPU instruction support for arm32 and s390x.
Changes to common code:
- overflow: add DEFINE_FLEX() for on-stack definition of structs with
flexible array members.
- Process doc update with more guidance for reviewers.
Driver API:
- Simplify locking in WiFi (cfg80211 and mac80211 layers), use wiphy
mutex in most places and remove a lot of smaller locks.
- Create a common DPLL configuration API. Allow configuring and
querying state of PLL circuits used for clock syntonization, in
network time distribution.
- Unify fragmented and full page allocation APIs in page pool code.
Let drivers be ignorant of PAGE_SIZE.
- Rework PHY state machine to avoid races with calls to phy_stop().
- Notify DSA drivers of MAC address changes on user ports, improve
correctness of offloads which depend on matching port MAC
addresses.
- Allow antenna control on injected WiFi frames.
- Reduce the number of variants of napi_schedule().
- Simplify error handling when composing devlink health messages.
Misc:
- A lot of KCSAN data race "fixes", from Eric.
- A lot of __counted_by() annotations, from Kees.
- A lot of strncpy -> strscpy and printf format fixes.
- Replace master/slave terminology with conduit/user in DSA drivers.
- Handful of KUnit tests for netdev and WiFi core.
Removed:
- AppleTalk COPS.
- AppleTalk ipddp.
- TI AR7 CPMAC Ethernet driver.
Drivers:
- Ethernet high-speed NICs:
- Intel (100G, ice, idpf):
- add a driver for the Intel E2000 IPUs
- make CRC/FCS stripping configurable
- cross-timestamping for E823 devices
- basic support for E830 devices
- use aux-bus for managing client drivers
- i40e: report firmware versions via devlink
- nVidia/Mellanox:
- support 4-port NICs
- increase max number of channels to 256
- optimize / parallelize SF creation flow
- Broadcom (bnxt):
- enhance NIC temperature reporting
- support PAM4 speeds and lane configuration
- Marvell OcteonTX2:
- PTP pulse-per-second output support
- enable hardware timestamping for VFs
- Solarflare/AMD:
- conntrack NAT offload and offload for tunnels
- Wangxun (ngbe/txgbe):
- expose HW statistics
- Pensando/AMD:
- support PCI level reset
- narrow down the condition under which skbs are linearized
- Netronome/Corigine (nfp):
- support CHACHA20-POLY1305 crypto in IPsec offload
- Ethernet NICs embedded, slower, virtual:
- Synopsys (stmmac):
- add Loongson-1 SoC support
- enable use of HW queues with no offload capabilities
- enable PPS input support on all 5 channels
- increase TX coalesce timer to 5ms
- RealTek USB (r8152): improve efficiency of Rx by using GRO frags
- xen: support SW packet timestamping
- add drivers for implementations based on TI's PRUSS (AM64x EVM)
- nVidia/Mellanox Ethernet datacenter switches:
- avoid poor HW resource use on Spectrum-4 by better block
selection for IPv6 multicast forwarding and ordering of blocks
in ACL region
- Ethernet embedded switches:
- Microchip:
- support configuring the drive strength for EMI compliance
- ksz9477: partial ACL support
- ksz9477: HSR offload
- ksz9477: Wake on LAN
- Realtek:
- rtl8366rb: respect device tree config of the CPU port
- Ethernet PHYs:
- support Broadcom BCM5221 PHYs
- TI dp83867: support hardware LED blinking
- CAN:
- add support for Linux-PHY based CAN transceivers
- at91_can: clean up and use rx-offload helpers
- WiFi:
- MediaTek (mt76):
- new sub-driver for mt7925 USB/PCIe devices
- HW wireless <> Ethernet bridging in MT7988 chips
- mt7603/mt7628 stability improvements
- Qualcomm (ath12k):
- WCN7850:
- enable 320 MHz channels in 6 GHz band
- hardware rfkill support
- enable IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS to
make scan faster
- read board data variant name from SMBIOS
- QCN9274: mesh support
- RealTek (rtw89):
- TDMA-based multi-channel concurrency (MCC)
- Silicon Labs (wfx):
- Remain-On-Channel (ROC) support
- Bluetooth:
- ISO: many improvements for broadcast support
- mark BCM4378/BCM4387 as BROKEN_LE_CODED
- add support for QCA2066
- btmtksdio: enable Bluetooth wakeup from suspend"
* tag 'net-next-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1816 commits)
net: pcs: xpcs: Add 2500BASE-X case in get state for XPCS drivers
net: bpf: Use sockopt_lock_sock() in ip_sock_set_tos()
net: mana: Use xdp_set_features_flag instead of direct assignment
vxlan: Cleanup IFLA_VXLAN_PORT_RANGE entry in vxlan_get_size()
iavf: delete the iavf client interface
iavf: add a common function for undoing the interrupt scheme
iavf: use unregister_netdev
iavf: rely on netdev's own registered state
iavf: fix the waiting time for initial reset
iavf: in iavf_down, don't queue watchdog_task if comms failed
iavf: simplify mutex_trylock+sleep loops
iavf: fix comments about old bit locks
doc/netlink: Update schema to support cmd-cnt-name and cmd-max-name
tools: ynl: introduce option to process unknown attributes or types
ipvlan: properly track tx_errors
netdevsim: Block until all devices are released
nfp: using napi_build_skb() to replace build_skb()
net: dsa: microchip: ksz9477: Fix spelling mistake "Enery" -> "Energy"
net: dsa: microchip: Ensure Stable PME Pin State for Wake-on-LAN
net: dsa: microchip: Refactor switch shutdown routine for WoL preparation
...
Diffstat (limited to 'net/ipv4/tcp_bbr.c')
-rw-r--r-- | net/ipv4/tcp_bbr.c | 13 |
1 files changed, 7 insertions, 6 deletions
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index 146792cd26fe..22358032dd48 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -258,7 +258,7 @@ static unsigned long bbr_bw_to_pacing_rate(struct sock *sk, u32 bw, int gain) u64 rate = bw; rate = bbr_rate_bytes_per_sec(sk, rate, gain); - rate = min_t(u64, rate, sk->sk_max_pacing_rate); + rate = min_t(u64, rate, READ_ONCE(sk->sk_max_pacing_rate)); return rate; } @@ -278,7 +278,8 @@ static void bbr_init_pacing_rate_from_rtt(struct sock *sk) } bw = (u64)tcp_snd_cwnd(tp) * BW_UNIT; do_div(bw, rtt_us); - sk->sk_pacing_rate = bbr_bw_to_pacing_rate(sk, bw, bbr_high_gain); + WRITE_ONCE(sk->sk_pacing_rate, + bbr_bw_to_pacing_rate(sk, bw, bbr_high_gain)); } /* Pace using current bw estimate and a gain factor. */ @@ -290,14 +291,14 @@ static void bbr_set_pacing_rate(struct sock *sk, u32 bw, int gain) if (unlikely(!bbr->has_seen_rtt && tp->srtt_us)) bbr_init_pacing_rate_from_rtt(sk); - if (bbr_full_bw_reached(sk) || rate > sk->sk_pacing_rate) - sk->sk_pacing_rate = rate; + if (bbr_full_bw_reached(sk) || rate > READ_ONCE(sk->sk_pacing_rate)) + WRITE_ONCE(sk->sk_pacing_rate, rate); } /* override sysctl_tcp_min_tso_segs */ __bpf_kfunc static u32 bbr_min_tso_segs(struct sock *sk) { - return sk->sk_pacing_rate < (bbr_min_tso_rate >> 3) ? 1 : 2; + return READ_ONCE(sk->sk_pacing_rate) < (bbr_min_tso_rate >> 3) ? 1 : 2; } static u32 bbr_tso_segs_goal(struct sock *sk) @@ -309,7 +310,7 @@ static u32 bbr_tso_segs_goal(struct sock *sk) * driver provided sk_gso_max_size. */ bytes = min_t(unsigned long, - sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift), + READ_ONCE(sk->sk_pacing_rate) >> READ_ONCE(sk->sk_pacing_shift), GSO_LEGACY_MAX_SIZE - 1 - MAX_TCP_HEADER); segs = max_t(u32, bytes / tp->mss_cache, bbr_min_tso_segs(sk)); |