<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/include/net, branch v4.6-rc7</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v4.6-rc7</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v4.6-rc7'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2016-05-03T20:00:54Z</updated>
<entry>
<title>vxlan: Add checksum check to the features check function</title>
<updated>2016-05-03T20:00:54Z</updated>
<author>
<name>Alexander Duyck</name>
<email>aduyck@mirantis.com</email>
</author>
<published>2016-05-02T16:25:16Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=af67eb9e7e1ab37880459f83153d34b3c42b0075'/>
<id>urn:sha1:af67eb9e7e1ab37880459f83153d34b3c42b0075</id>
<content type='text'>
We need to perform an additional check on the inner headers to determine if
we can offload the checksum for them.  Previously this check didn't occur
so we would generate an invalid frame in the case of an IPv6 header
encapsulated inside of an IPv4 tunnel.  To fix this I added a secondary
check to vxlan_features_check so that we can verify that we can offload the
inner checksum.

Signed-off-by: Alexander Duyck &lt;aduyck@mirantis.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>switchdev: Adding complete operation to deferred switchdev ops</title>
<updated>2016-04-24T18:23:32Z</updated>
<author>
<name>Elad Raz</name>
<email>eladr@mellanox.com</email>
</author>
<published>2016-04-21T10:52:43Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=7ceb2afbd6aee4643056b47156baad6841db8e78'/>
<id>urn:sha1:7ceb2afbd6aee4643056b47156baad6841db8e78</id>
<content type='text'>
When using switchdev deferred operation (SWITCHDEV_F_DEFER), the operation
is executed in different context and the application doesn't have any way
to get the operation real status.

Adding a completion callback fixes that. This patch adds fields to
switchdev_attr and switchdev_obj "complete_priv" field which is used by
the "complete" callback.

Application can set a complete function which will be called once the
operation executed.

Signed-off-by: Elad Raz &lt;eladr@mellanox.com&gt;
Signed-off-by: Jiri Pirko &lt;jiri@mellanox.com&gt;
Reviewed-by: Ido Schimmel &lt;idosch@mellanox.com&gt;
Acked-by: Nikolay Aleksandrov &lt;nikolay@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>tcp: Merge tx_flags and tskey in tcp_shifted_skb</title>
<updated>2016-04-21T18:40:55Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2016-04-20T05:39:29Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=cfea5a688eb37bcd1081255df9f9f777f4e61999'/>
<id>urn:sha1:cfea5a688eb37bcd1081255df9f9f777f4e61999</id>
<content type='text'>
After receiving sacks, tcp_shifted_skb() will collapse
skbs if possible.  tx_flags and tskey also have to be
merged.

This patch reuses the tcp_skb_collapse_tstamp() to handle
them.

BPF Output Before:
~~~~~
&lt;no-output-due-to-missing-tstamp-event&gt;

BPF Output After:
~~~~~
&lt;...&gt;-2024  [007] d.s.    88.644374: : ee_data:14599

Packetdrill Script:
~~~~~
+0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10`
+0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1`
+0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0

0.100 &lt; S 0:0(0) win 32792 &lt;mss 1460,sackOK,nop,nop,nop,wscale 7&gt;
0.100 &gt; S. 0:0(0) ack 1 &lt;mss 1460,nop,nop,sackOK,nop,wscale 7&gt;
0.200 &lt; . 1:1(0) ack 1 win 257
0.200 accept(3, ..., ...) = 4
+0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0

0.200 write(4, ..., 1460) = 1460
+0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0
0.200 write(4, ..., 13140) = 13140

0.200 &gt; P. 1:1461(1460) ack 1
0.200 &gt; . 1461:8761(7300) ack 1
0.200 &gt; P. 8761:14601(5840) ack 1

0.300 &lt; . 1:1(0) ack 1 win 257 &lt;sack 1461:14601,nop,nop&gt;
0.300 &gt; P. 1:1461(1460) ack 1
0.400 &lt; . 1:1(0) ack 14601 win 257

0.400 close(4) = 0
0.400 &gt; F. 14601:14601(0) ack 1
0.500 &lt; F. 1:1(0) ack 14602 win 257
0.500 &gt; . 14602:14602(0) ack 2

Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Neal Cardwell &lt;ncardwell@google.com&gt;
Cc: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Cc: Willem de Bruijn &lt;willemb@google.com&gt;
Cc: Yuchung Cheng &lt;ycheng@google.com&gt;
Acked-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Tested-by: Soheil Hassas Yeganeh &lt;soheil@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>cls_cgroup: get sk_classid only from full sockets</title>
<updated>2016-04-20T00:09:25Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@yandex-team.ru</email>
</author>
<published>2016-04-18T11:37:10Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=2309236c13feae619572324efd3e910e66ef6bd0'/>
<id>urn:sha1:2309236c13feae619572324efd3e910e66ef6bd0</id>
<content type='text'>
skb-&gt;sk could point to timewait or request socket which has no sk_classid.
Detected as "BUG: KASAN: slab-out-of-bounds in cls_cgroup_classify".

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@yandex-team.ru&gt;
Acked-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>soreuseport: fix ordering for mixed v4/v6 sockets</title>
<updated>2016-04-15T01:14:03Z</updated>
<author>
<name>Craig Gallek</name>
<email>kraig@google.com</email>
</author>
<published>2016-04-12T17:11:25Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d894ba18d4e449b3a7f6eb491f16c9e02933736e'/>
<id>urn:sha1:d894ba18d4e449b3a7f6eb491f16c9e02933736e</id>
<content type='text'>
With the SO_REUSEPORT socket option, it is possible to create sockets
in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address.
This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on
the AF_INET6 sockets.

Prior to the commits referenced below, an incoming IPv4 packet would
always be routed to a socket of type AF_INET when this mixed-mode was used.
After those changes, the same packet would be routed to the most recently
bound socket (if this happened to be an AF_INET6 socket, it would
have an IPv4 mapped IPv6 address).

The change in behavior occurred because the recent SO_REUSEPORT optimizations
short-circuit the socket scoring logic as soon as they find a match.  They
did not take into account the scoring logic that favors AF_INET sockets
over AF_INET6 sockets in the event of a tie.

To fix this problem, this patch changes the insertion order of AF_INET
and AF_INET6 addresses in the TCP and UDP socket lists when the sockets
have SO_REUSEPORT set.  AF_INET sockets will be inserted at the head of the
list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at
the tail of the list.  This will force AF_INET sockets to always be
considered first.

Fixes: e32ea7e74727 ("soreuseport: fast reuseport UDP socket selection")
Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection")

Reported-by: Maciej Żenczykowski &lt;maze@google.com&gt;
Signed-off-by: Craig Gallek &lt;kraig@google.com&gt;
Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: udp: Do a route lookup and update during release_cb</title>
<updated>2016-04-14T20:29:53Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2016-04-11T22:29:37Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=e646b657f6983017783914a951039e323120dc55'/>
<id>urn:sha1:e646b657f6983017783914a951039e323120dc55</id>
<content type='text'>
This patch adds a release_cb for UDPv6.  It does a route lookup
and updates sk-&gt;sk_dst_cache if it is needed.  It picks up the
left-over job from ip6_sk_update_pmtu() if the sk was owned
by user during the pmtu update.

It takes a rcu_read_lock to protect the __sk_dst_get() operations
because another thread may do ip6_dst_store() without taking the
sk lock (e.g. sendmsg).

Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Reported-by: Wei Wang &lt;weiwan@google.com&gt;
Cc: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Wei Wang &lt;weiwan@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>ipv6: datagram: Update dst cache of a connected datagram sk during pmtu update</title>
<updated>2016-04-14T20:29:51Z</updated>
<author>
<name>Martin KaFai Lau</name>
<email>kafai@fb.com</email>
</author>
<published>2016-04-11T22:29:36Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=33c162a980fe03498fcecb917f618ad7e7c55e61'/>
<id>urn:sha1:33c162a980fe03498fcecb917f618ad7e7c55e61</id>
<content type='text'>
There is a case in connected UDP socket such that
getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible
sequence could be the following:
1. Create a connected UDP socket
2. Send some datagrams out
3. Receive a ICMPV6_PKT_TOOBIG
4. No new outgoing datagrams to trigger the sk_dst_check()
   logic to update the sk-&gt;sk_dst_cache.
5. getsockopt(IPV6_MTU) returns the mtu from the invalid
   sk-&gt;sk_dst_cache instead of the newly created RTF_CACHE clone.

This patch updates the sk-&gt;sk_dst_cache for a connected datagram sk
during pmtu-update code path.

Note that the sk-&gt;sk_v6_daddr is used to do the route lookup
instead of skb-&gt;data (i.e. iph).  It is because a UDP socket can become
connected after sending out some datagrams in un-connected state.  or
It can be connected multiple times to different destinations.  Hence,
iph may not be related to where sk is currently connected to.

It is done under '!sock_owned_by_user(sk)' condition because
the user may make another ip6_datagram_connect()  (i.e changing
the sk-&gt;sk_v6_daddr) while dst lookup is happening in the pmtu-update
code path.

For the sock_owned_by_user(sk) == true case, the next patch will
introduce a release_cb() which will update the sk-&gt;sk_dst_cache.

Test:

Server (Connected UDP Socket):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Route Details:
[root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac'
2fac::/64 dev eth0  proto kernel  metric 256  pref medium
2fac:face::/64 via 2fac::face dev eth0  metric 1024  pref medium

A simple python code to create a connected UDP socket:

import socket
import errno

HOST = '2fac::1'
PORT = 8080

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
s.bind((HOST, PORT))
s.connect(('2fac:face::face', 53))
print("connected")
while True:
    try:
	data = s.recv(1024)
    except socket.error as se:
	if se.errno == errno.EMSGSIZE:
		pmtu = s.getsockopt(41, 24)
		print("PMTU:%d" % pmtu)
		break
s.close()

Python program output after getting a ICMPV6_PKT_TOOBIG:
[root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py
connected
PMTU:1300

Cache routes after recieving TOOBIG:
[root@arch-fb-vm1 ~]# ip -6 r show table cache
2fac:face::face via 2fac::face dev eth0  metric 0
    cache  expires 463sec mtu 1300 pref medium

Client (Send the ICMPV6_PKT_TOOBIG):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
scapy is used to generate the TOOBIG message.  Here is the scapy script I have
used:

&gt;&gt;&gt; p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac::
1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53)
&gt;&gt;&gt; sendp(p, iface='qemubr0')

Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception")
Signed-off-by: Martin KaFai Lau &lt;kafai@fb.com&gt;
Reported-by: Wei Wang &lt;weiwan@google.com&gt;
Cc: Cong Wang &lt;xiyou.wangcong@gmail.com&gt;
Cc: Eric Dumazet &lt;edumazet@google.com&gt;
Cc: Wei Wang &lt;weiwan@google.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>net: vrf: Fix dst reference counting</title>
<updated>2016-04-11T19:56:20Z</updated>
<author>
<name>David Ahern</name>
<email>dsa@cumulusnetworks.com</email>
</author>
<published>2016-04-07T18:10:06Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=9ab179d83b4e31ea277a123492e419067c2f129a'/>
<id>urn:sha1:9ab179d83b4e31ea277a123492e419067c2f129a</id>
<content type='text'>
Vivek reported a kernel exception deleting a VRF with an active
connection through it. The root cause is that the socket has a cached
reference to a dst that is destroyed. Converting the dst_destroy to
dst_release and letting proper reference counting kick in does not
work as the dst has a reference to the device which needs to be released
as well.

I talked to Hannes about this at netdev and he pointed out the ipv4 and
ipv6 dst handling has dst_ifdown for just this scenario. Rather than
continuing with the reinvented dst wheel in VRF just remove it and
leverage the ipv4 and ipv6 versions.

Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver")
Fixes: 35402e3136634 ("net: Add IPv6 support to VRF device")

Signed-off-by: David Ahern &lt;dsa@cumulusnetworks.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>sctp: avoid refreshing heartbeat timer too often</title>
<updated>2016-04-11T02:22:34Z</updated>
<author>
<name>Marcelo Ricardo Leitner</name>
<email>marcelo.leitner@gmail.com</email>
</author>
<published>2016-04-06T18:15:19Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=ba6f5e33bdbb9ed2014b778fbbaecf20060ca989'/>
<id>urn:sha1:ba6f5e33bdbb9ed2014b778fbbaecf20060ca989</id>
<content type='text'>
Currently on high rate SCTP streams the heartbeat timer refresh can
consume quite a lot of resources as timer updates are costly and it
contains a random factor, which a) is also costly and b) invalidates
mod_timer() optimization for not editing a timer to the same value.
It may even cause the timer to be slightly advanced, for no good reason.

As suggested by David Laight this patch now removes this timer update
from hot path by leaving the timer on and re-evaluating upon its
expiration if the heartbeat is still needed or not, similarly to what is
done for TCP. If it's not needed anymore the timer is re-scheduled to
the new timeout, considering the time already elapsed.

For this, we now record the last tx timestamp per transport, updated in
the same spots as hb timer was restarted on tx. Also split up
sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
heartbeat one.

On loopback with MTU of 65535 and data chunks with 1636, so that we
have a considerable amount of chunks without stressing system calls,
netperf -t SCTP_STREAM -l 30, perf looked like this before:

Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
  Overhead  Command  Shared Object      Symbol
+    6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
-    5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
   - _raw_write_unlock_irqrestore
      - 96,54% _raw_spin_unlock_irqrestore
         - 36,14% mod_timer
            + 97,24% sctp_transport_reset_timers
            + 2,76% sctp_do_sm
         + 33,65% __wake_up_sync_key
         + 28,77% sctp_ulpq_tail_event
         + 1,40% del_timer
      - 1,84% mod_timer
         + 99,03% sctp_transport_reset_timers
         + 0,97% sctp_do_sm
      + 1,50% sctp_ulpq_tail_event

And after this patch, now with netperf -l 60:

Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
  Overhead  Command  Shared Object      Symbol
+    5,65%  netperf  [kernel.vmlinux]   [k] memcpy_erms
+    5,59%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
-    5,05%  netperf  [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
   - _raw_spin_unlock_irqrestore
      + 49,89% __wake_up_sync_key
      + 45,68% sctp_ulpq_tail_event
      - 2,85% mod_timer
         + 76,51% sctp_transport_reset_t3_rtx
         + 23,49% sctp_do_sm
      + 1,55% del_timer
+    2,50%  netperf  [sctp]             [k] sctp_datamsg_from_user
+    2,26%  netperf  [sctp]             [k] sctp_sendmsg

Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
~3.7%.

Signed-off-by: Marcelo Ricardo Leitner &lt;marcelo.leitner@gmail.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mac80211-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211</title>
<updated>2016-04-08T20:41:28Z</updated>
<author>
<name>David S. Miller</name>
<email>davem@davemloft.net</email>
</author>
<published>2016-04-08T20:41:28Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=30d237a6c2e9be1bb816fe8e787b88fd7aad833b'/>
<id>urn:sha1:30d237a6c2e9be1bb816fe8e787b88fd7aad833b</id>
<content type='text'>
Johannes Berg says:

====================
For the current RC series, we have the following fixes:
 * TDLS fixes from Arik and Ilan
 * rhashtable fixes from Ben and myself
 * documentation fixes from Luis
 * U-APSD fixes from Emmanuel
 * a TXQ fix from Felix
 * and a compiler warning suppression from Jeff
====================

Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
</content>
</entry>
</feed>
