<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/net/ipv4/tcp.c, branch v4.7</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.</subtitle>
<id>https://git.kobert.dev/pm24.git/atom/net/ipv4/tcp.c?h=v4.7</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom/net/ipv4/tcp.c?h=v4.7'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2016-05-04T16:44:36Z</updated>
<entry>
<title>tcp: guarantee forward progress in tcp_sendmsg()</title>
<updated>2016-05-04T16:44:36Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-05-03T04:49:25Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d4011239f46ac6e407af61e3f74d1e3874fc9394'/>
<id>urn:sha1:d4011239f46ac6e407af61e3f74d1e3874fc9394</id>
<content type='text'>
Under high rx pressure, it is possible tcp_sendmsg() never has a
chance to allocate an skb and loop forever as sk_flush_backlog()
would always return true.

Fix this by calling sk_flush_backlog() only if one skb had been
allocated and filled before last backlog check.

Fixes: d41a69f1d390 ("tcp: make tcp_sendmsg() aware of socket backlog")
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: make tcp_sendmsg() aware of socket backlog</title>
<updated>2016-05-02T21:02:26Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-29T21:16:53Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d41a69f1d390fa3f2546498103cdcd78b30676ff'/>
<id>urn:sha1:d41a69f1d390fa3f2546498103cdcd78b30676ff</id>
<content type='text'>
Large sendmsg()/write() hold socket lock for the duration of the call,
unless sk-&gt;sk_sndbuf limit is hit. This is bad because incoming packets
are parked into socket backlog for a long time.
Critical decisions like fast retransmit might be delayed.
Receivers have to maintain a big out of order queue with additional cpu
overhead, and also possible stalls in TX once windows are full.

Bidirectional flows are particularly hurt since the backlog can become
quite big if the copy from user space triggers IO (page faults)

Some applications learnt to use sendmsg() (or sendmmsg()) with small
chunks to avoid this issue.

Kernel should know better, right ?

Add a generic sk_flush_backlog() helper and use it right
before a new skb is allocated. Typically we put 64KB of payload
per skb (unless MSG_EOR is requested) and checking socket backlog
every 64KB gives good results.

As a matter of fact, tests with TSO/GSO disabled give very nice
results, as we manage to keep a small write queue and smaller
perceived rtt.

Note that sk_flush_backlog() maintains socket ownership,
so is not equivalent to a {release_sock(sk); lock_sock(sk);},
to ensure implicit atomicity rules that sendmsg() was
giving to (possibly buggy) applications.

In this simple implementation, I chose to not call tcp_release_cb(),
but we might consider this later.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Alexei Starovoitov &lt;ast@fb.com&gt;
Cc: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: do not block bh during prequeue processing</title>
<updated>2016-05-02T21:02:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-29T21:16:48Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=fb3477c0f45aad5dfb2de559949872770e6cd431'/>
<id>urn:sha1:fb3477c0f45aad5dfb2de559949872770e6cd431</id>
<content type='text'>
AFAIK, nothing in current TCP stack absolutely wants BH
being disabled once socket is owned by a thread running in
process context.

As mentioned in my prior patch ("tcp: give prequeue mode some care"),
processing a batch of packets might take time, better not block BH
at all.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: do not assume TCP code is non preemptible</title>
<updated>2016-05-02T21:02:25Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-29T21:16:47Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=c10d9310edf5aa4a676991139d1a43ec7d87e56b'/>
<id>urn:sha1:c10d9310edf5aa4a676991139d1a43ec7d87e56b</id>
<content type='text'>
We want to to make TCP stack preemptible, as draining prequeue
and backlog queues can take lot of time.

Many SNMP updates were assuming that BH (and preemption) was disabled.

Need to convert some __NET_INC_STATS() calls to NET_INC_STATS()
and some __TCP_INC_STATS() to TCP_INC_STATS()

Before using this_cpu_ptr(net-&gt;ipv4.tcp_sk) in tcp_v4_send_reset()
and tcp_v4_send_ack(), we add an explicit preempt disabled section.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Make use of MSG_EOR in tcp_sendmsg</title>
<updated>2016-04-28T20:14:18Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2016-04-25T21:44:48Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=c134ecb87817ce70fd62b2dc48bb079c44fc08df'/>
<id>urn:sha1:c134ecb87817ce70fd62b2dc48bb079c44fc08df</id>
<content type='text'>
This patch adds an eor bit to the TCP_SKB_CB.  When MSG_EOR
is passed to tcp_sendmsg, the eor bit will be set at the skb
containing the last byte of the userland's msg.  The eor bit
will prevent data from appending to that skb in the future.

The change in do_tcp_sendpages is to honor the eor set
during the previous tcp_sendmsg(MSG_EOR) call.

This patch handles the tcp_sendmsg case.  The followup patches
will handle other skb coalescing and fragment cases.

One potential use case is to use MSG_EOR with
SOF_TIMESTAMPING_TX_ACK to get a more accurate
TCP ack timestamping on application protocol with
multiple outgoing response messages (e.g. HTTP2).

Packetdrill script for testing:
~~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 &lt; S 0:0(0) win 32792 &lt;mss 1460,sackOK,nop,nop,nop,wscale 7&gt;
0.100 &gt; S. 0:0(0) ack 1 &lt;mss 1460,nop,nop,sackOK,nop,wscale 7&gt;
0.200 &lt; . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 14600) = 14600
0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730
0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730

0.200 &gt; .  1:7301(7300) ack 1
0.200 &gt; P. 7301:14601(7300) ack 1

0.300 &lt; . 1:1(0) ack 14601 win 257
0.300 &gt; P. 14601:15331(730) ack 1
0.300 &gt; P. 15331:16061(730) ack 1

0.400 &lt; . 1:1(0) ack 16061 win 257
0.400 close(4) = 0
0.400 &gt; F. 16061:16061(0) ack 1
0.400 &lt; F. 1:1(0) ack 16062 win 257
0.400 &gt; . 16062:16062(0) ack 2

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Suggested-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: remove SKBTX_ACK_TSTAMP since it is redundant</title>
<updated>2016-04-28T20:06:10Z</updated>
<author>
<name>Soheil Hassas Yeganeh</name>
<email>soheil@google.com</email>
</author>
<published>2016-04-28T03:39:01Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=0a2cf20c3fb62ad4717276b5303bf831f7b29d54'/>
<id>urn:sha1:0a2cf20c3fb62ad4717276b5303bf831f7b29d54</id>
<content type='text'>
The SKBTX_ACK_TSTAMP flag is set in skb_shinfo-&gt;tx_flags when
the timestamp of the TCP acknowledgement should be reported on
error queue. Since accessing skb_shinfo is likely to incur a
cache-line miss at the time of receiving the ack, the
txstamp_ack bit was added in tcp_skb_cb, which is set iff
the SKBTX_ACK_TSTAMP flag is set for an skb. This makes
SKBTX_ACK_TSTAMP flag redundant.

Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit
everywhere.

Note that this frees one bit in shinfo-&gt;tx_flags.

Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Suggested-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: remove an unnecessary check in tcp_tx_timestamp</title>
<updated>2016-04-28T20:06:10Z</updated>
<author>
<name>Soheil Hassas Yeganeh</name>
<email>soheil@google.com</email>
</author>
<published>2016-04-28T03:39:00Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=863c1fd9814618eefba02218f8fadf8a430c2a17'/>
<id>urn:sha1:863c1fd9814618eefba02218f8fadf8a430c2a17</id>
<content type='text'>
Remove the redundant check for sk-&gt;sk_tsflags in tcp_tx_timestamp.

tcp_tx_timestamp() receives the tsflags as a parameter. As a
result the "sk-&gt;sk_tsflags || tsflags" is redundant, since
tsflags already includes sk-&gt;sk_tsflags plus overrides from
control messages.

Signed-off-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: rename NET_{ADD|INC}_STATS_BH()</title>
<updated>2016-04-28T02:48:24Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-27T23:44:39Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=02a1d6e7a6bb025a77da77012190e1efc1970f1c'/>
<id>urn:sha1:02a1d6e7a6bb025a77da77012190e1efc1970f1c</id>
<content type='text'>
Rename NET_INC_STATS_BH() to __NET_INC_STATS()
and NET_ADD_STATS_BH() to __NET_ADD_STATS()

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: tcp: rename TCP_INC_STATS_BH</title>
<updated>2016-04-28T02:48:23Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-27T23:44:32Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=90bbcc608369a1b46089b0f5aa22b8ea31ffa12e'/>
<id>urn:sha1:90bbcc608369a1b46089b0f5aa22b8ea31ffa12e</id>
<content type='text'>
Rename TCP_INC_STATS_BH() to __TCP_INC_STATS()

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: snmp: kill various STATS_USER() helpers</title>
<updated>2016-04-28T02:48:22Z</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-04-27T23:44:27Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=6aef70a851ac77967992340faaff33f44598f60a'/>
<id>urn:sha1:6aef70a851ac77967992340faaff33f44598f60a</id>
<content type='text'>
In the old days (before linux-3.0), SNMP counters were duplicated,
one for user context, and one for BH context.

After commit 8f0ea0fe3a03 ("snmp: reduce percpu needs by 50%")
we have a single copy, and what really matters is preemption being
enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc()
respectively.

We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(),
NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(),
SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(),
UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER()

Following patches will rename __BH helpers to make clear their
usage is not tied to BH being disabled.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
