summaryrefslogtreecommitdiff
path: root/net/ipv6
AgeCommit message (Collapse)Author
2018-01-26net/ipv6: Add support for onlink flagDavid Ahern
Similar to IPv4 allow routes to be added with the RTNH_F_ONLINK flag. The onlink option requires a gateway and a nexthop device. Any unicast gateway is allowed (including IPv4 mapped addresses and unresolved ones) as long as the gateway is not a local address and if it resolves it must match the given device. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26net/ipv6: Add flags and table id to ip6_nh_lookup_tableDavid Ahern
onlink verification needs to do a lookup in potentially different table than the table in fib6_config and without the RT6_LOOKUP_F_IFACE flag. Change ip6_nh_lookup_table to take table id and flags as input arguments. Both verifications want to ignore link state, so add that flag can stay in the lookup helper. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-26net/ipv6: Move gateway validation into helperDavid Ahern
Move existing code to validate nexthop into a helper. Follow on patch adds support for nexthops marked with onlink, and this helper keeps the complexity of ip6_route_info_create in check. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net: erspan: use bitfield instead of mask and offsetWilliam Tu
Originally the erspan fields are defined as a group into a __be16 field, and use mask and offset to access each field. This is more costly due to calling ntohs/htons. The patch changes it to use bitfields. Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net: don't call update_pmtu unconditionallyNicolas Dichtel
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to: "BUG: unable to handle kernel NULL pointer dereference at (null)" Let's add a helper to check if update_pmtu is available before calling it. Fixes: 52a589d51f10 ("geneve: update skb dst pmtu on tx path") Fixes: a93bf0ff4490 ("vxlan: update skb dst pmtu on tx path") CC: Roman Kapl <code@rkapl.cz> CC: Xin Long <lucien.xin@gmail.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net/ipv6: Do not allow route add with a device that is downDavid Ahern
IPv6 allows routes to be installed when the device is not up (admin up). Worse, it does not mark it as LINKDOWN. IPv4 does not allow it and really there is no reason for IPv6 to allow it, so check the flags and deny if device is admin down. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-01-24 1) Only offloads SAs after they are fully initialized. Otherwise a NIC may receive packets on a SA we can not yet handle in the stack. From Yossi Kuperman. 2) Fix negative refcount in case of a failing offload. From Aviad Yehezkel. 3) Fix inner IP ptoro version when decapsulating from interaddress family tunnels. From Yossi Kuperman. 4) Use true or false for boolean variables instead of an integer value in xfrm_get_type_offload. From Gustavo A. R. Silva. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABELBen Hutchings
Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl setting") removed the initialisation of ipv6_pinfo::autoflowlabel and added a second flag to indicate whether this field or the net namespace default should be used. The getsockopt() handling for this case was not updated, so it currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is not explicitly enabled. Fix it to return the effective value, whether that has been set at the socket or net namespace level. Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
en_rx_am.c was deleted in 'net-next' but had a bug fixed in it in 'net'. The esp{4,6}_offload.c conflicts were overlapping changes. The 'out' label is removed so we just return ERR_PTR(-EINVAL) directly. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23xfrm: Fix eth_hdr(skb)->h_proto to reflect inner IP versionYossi Kuperman
IPSec tunnel mode supports encapsulation of IPv4 over IPv6 and vice-versa. The outer IP header is stripped and the inner IP inherits the original Ethernet header. Tcpdump fails to properly decode the inner packet in case that h_proto is different than the inner IP version. Fix h_proto to reflect the inner IP version. Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-22gso: validate gso_type in GSO handlersWillem de Bruijn
Validate gso_type during segmentation as SKB_GSO_DODGY sources may pass packets where the gso_type does not match the contents. Syzkaller was able to enter the SCTP gso handler with a packet of gso_type SKB_GSO_TCPV4. On entry of transport layer gso handlers, verify that the gso_type matches the transport protocol. Fixes: 90017accff61 ("sctp: Add GSO support") Link: http://lkml.kernel.org/r/<001a1137452496ffc305617e5fe0@google.com> Reported-by: syzbot+fee64147a25aecd48055@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. Basically, a new extension for ip6tables, simplification work of nf_tables that saves us 500 LoC, allow raw table registration before defragmentation, conversion of the SNMP helper to use the ASN.1 code generator, unique 64-bit handle for all nf_tables objects and fixes to address fallout from previous nf-next batch. More specifically, they are: 1) Seven patches to remove family abstraction layer (struct nft_af_info) in nf_tables, this simplifies our codebase and it saves us 64 bytes per net namespace. 2) Add IPv6 segment routing header matching for ip6tables, from Ahmed Abdelsalam. 3) Allow to register iptable_raw table before defragmentation, some people do not want to waste cycles on defragmenting traffic that is going to be dropped, hence add a new module parameter to enable this behaviour in iptables and ip6tables. From Subash Abhinov Kasiviswanathan. This patch needed a couple of follow up patches to get things tidy from Arnd Bergmann. 4) SNMP helper uses the ASN.1 code generator, from Taehee Yoo. Several patches for this helper to prepare this change are also part of this patch series. 5) Add 64-bit handles to uniquely objects in nf_tables, from Harsha Sharma. 6) Remove log message that several netfilter subsystems print at boot/load time. 7) Restore x_tables module autoloading, that got broken in a previous patch to allow singleton NAT hook callback registration per hook spot, from Florian Westphal. Moreover, return EBUSY to report that the singleton NAT hook slot is already in instead. 8) Several fixes for the new nf_tables flowtable representation, including incorrect error check after nf_tables_flowtable_lookup(), missing Kconfig dependencies that lead to build breakage and missing initialization of priority and hooknum in flowtable object. 9) Missing NETFILTER_FAMILY_ARP dependency in Kconfig for the clusterip target. This is due to recent updates in the core to shrink the hook array size and compile it out if no specific family is enabled via .config file. Patch from Florian Westphal. 10) Remove duplicated include header files, from Wei Yongjun. 11) Sparse warning fix for the NFPROTO_INET handling from the core due to missing static function definition, also from Wei Yongjun. 12) Restore ICMPv6 Parameter Problem error reporting when defragmentation fails, from Subash Abhinov Kasiviswanathan. 13) Remove obsolete owner field initialization from struct file_operations, patch from Alexey Dobriyan. 14) Use boolean datatype where needed in the Netfilter codebase, from Gustavo A. R. Silva. 15) Remove double semicolon in dynset nf_tables expression, from Luis de Bethencourt. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
The BPF verifier conflict was some minor contextual issue. The TUN conflict was less trivial. Cong Wang fixed a memory leak of tfile->tx_array in 'net'. This is an skb_array. But meanwhile in net-next tun changed tfile->tx_arry into tfile->tx_ring which is a ptr_ring. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19ipv6: mcast: remove dead codeEric Dumazet
Since commit 41033f029e39 ("snmp: Remove duplicate OUTMCAST stat increment") one line of code became unneeded. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-19netfilter: remove messages print and boot/module load timePablo Neira Ayuso
Several reasons for this: * Several modules maintain internal version numbers, that they print at boot/module load time, that are not exposed to userspace, as a primitive mechanism to make revision number control from the earlier days of Netfilter. * IPset shows the protocol version at boot/module load time, instead display this via module description, as Jozsef suggested. * Remove copyright notice at boot/module load time in two spots, the Netfilter codebase is a collective development effort, if we would have to display copyrights for each contributor at boot/module load time for each extensions we have, we would probably fill up logs with lots of useless information - from a technical standpoint. So let's be consistent and remove them all. Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-18ipv6: don't let tb6_root node share routes with other nodeWei Wang
After commit 4512c43eac7e, if we add a route to the subtree of tb6_root which does not have any route attached to it yet, the current code will let tb6_root and the node in the subtree share the same route. This could cause problem cause tb6_root has RTN_INFO flag marked and the tree repair and clean up code will not work properly. This commit makes sure tb6_root->leaf points back to null_entry instead of sharing route with other node. It fixes the following syzkaller reported issue: BUG: KASAN: use-after-free in ipv6_prefix_equal include/net/ipv6.h:540 [inline] BUG: KASAN: use-after-free in fib6_add_1+0x165f/0x1790 net/ipv6/ip6_fib.c:618 Read of size 8 at addr ffff8801bc043498 by task syz-executor5/19819 CPU: 1 PID: 19819 Comm: syz-executor5 Not tainted 4.15.0-rc7+ #186 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430 ipv6_prefix_equal include/net/ipv6.h:540 [inline] fib6_add_1+0x165f/0x1790 net/ipv6/ip6_fib.c:618 fib6_add+0x5fa/0x1540 net/ipv6/ip6_fib.c:1214 __ip6_ins_rt+0x6c/0x90 net/ipv6/route.c:1003 ip6_route_add+0x141/0x190 net/ipv6/route.c:2790 ipv6_route_ioctl+0x4db/0x6b0 net/ipv6/route.c:3299 inet6_ioctl+0xef/0x1e0 net/ipv6/af_inet6.c:520 sock_do_ioctl+0x65/0xb0 net/socket.c:958 sock_ioctl+0x2c2/0x440 net/socket.c:1055 vfs_ioctl fs/ioctl.c:46 [inline] do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686 SYSC_ioctl fs/ioctl.c:701 [inline] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692 entry_SYSCALL_64_fastpath+0x23/0x9a RIP: 0033:0x452ac9 RSP: 002b:00007fd42b321c58 EFLAGS: 00000212 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000000000071bea0 RCX: 0000000000452ac9 RDX: 0000000020fd7000 RSI: 000000000000890b RDI: 0000000000000013 RBP: 000000000000049e R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000212 R12: 00000000006f4f70 R13: 00000000ffffffff R14: 00007fd42b3226d4 R15: 0000000000000000 Fixes: 4512c43eac7e ("ipv6: remove null_entry before adding default route") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-18ip6_gre: init dev->mtu and dev->hard_header_len correctlyAlexey Kodanev
Commit b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") moved dev->mtu initialization from ip6gre_tunnel_setup() to ip6gre_tunnel_init(), as a result, the previously set values, before ndo_init(), are reset in the following cases: * rtnl_create_link() can update dev->mtu from IFLA_MTU parameter. * ip6gre_tnl_link_config() is invoked before ndo_init() in netlink and ioctl setup, so ndo_init() can reset MTU adjustments with the lower device MTU as well, dev->mtu and dev->hard_header_len. Not applicable for ip6gretap because it has one more call to ip6gre_tnl_link_config(tunnel, 1) in ip6gre_tap_init(). Fix the first case by updating dev->mtu with 'tb[IFLA_MTU]' parameter if a user sets it manually on a device creation, and fix the second one by moving ip6gre_tnl_link_config() call after register_netdevice(). Fixes: b05229f44228 ("gre6: Cleanup GREv6 transmit path, call common GRE functions") Fixes: db2ec95d1ba4 ("ip6_gre: Fix MTU setting") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Overlapping changes all over. The mini-qdisc bits were a little bit tricky, however. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16net: delete /proc THIS_MODULE referencesAlexey Dobriyan
/proc has been ignoring struct file_operations::owner field for 10 years. Specifically, it started with commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where inode->i_fop is initialized with proxy struct file_operations for regular files: - if (de->proc_fops) - inode->i_fop = de->proc_fops; + if (de->proc_fops) { + if (S_ISREG(inode->i_mode)) + inode->i_fop = &proc_reg_file_ops; + else + inode->i_fop = de->proc_fops; + } VFS stopped pinning module at this point. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-16netfilter: nf_defrag: move NF_CONNTRACK bits into #ifdefArnd Bergmann
We cannot access the skb->_nfct field when CONFIG_NF_CONNTRACK is disabled: net/ipv4/netfilter/nf_defrag_ipv4.c: In function 'ipv4_conntrack_defrag': net/ipv4/netfilter/nf_defrag_ipv4.c:83:9: error: 'struct sk_buff' has no member named '_nfct' net/ipv6/netfilter/nf_defrag_ipv6_hooks.c: In function 'ipv6_defrag': net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68:9: error: 'struct sk_buff' has no member named '_nfct' Both functions already have an #ifdef for this, so let's move the check in there. Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16netfilter: nf_defrag: mark xt_table structures 'const' againArnd Bergmann
As a side-effect of adding the module option, we now get a section mismatch warning: WARNING: net/ipv4/netfilter/iptable_raw.o(.data+0x1c): Section mismatch in reference from the variable packet_raw to the function .init.text:iptable_raw_table_init() The variable packet_raw references the function __init iptable_raw_table_init() If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable: *_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console Apparently it's ok to link to a __net_init function from .rodata but not from .data. We can address this by rearranging the logic so that the structure is read-only again. Instead of writing to the .priority field later, we have an extra copies of the structure with that flag. An added advantage is that that we don't have writable function pointers with this approach. Fixes: 902d6a4c2a4f ("netfilter: nf_defrag: Skip defrag if NOTRACK is set") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-16netfilter: ipv6: nf_defrag: Pass on packets to stack per RFC2460Subash Abhinov Kasiviswanathan
ipv6_defrag pulls network headers before fragment header. In case of an error, the netfilter layer is currently dropping these packets. This results in failure of some IPv6 standards tests which passed on older kernels due to the netfilter framework using cloning. The test case run here is a check for ICMPv6 error message replies when some invalid IPv6 fragments are sent. This specific test case is listed in https://www.ipv6ready.org/docs/Core_Conformance_Latest.pdf in the Extension Header Processing Order section. A packet with unrecognized option Type 11 is sent and the test expects an ICMP error in line with RFC2460 section 4.2 - 11 - discard the packet and, only if the packet's Destination Address was not a multicast address, send an ICMP Parameter Problem, Code 2, message to the packet's Source Address, pointing to the unrecognized Option Type. Since netfilter layer now drops all invalid IPv6 frag packets, we no longer see the ICMP error message and fail the test case. To fix this, save the transport header. If defrag is unable to process the packet due to RFC2460, restore the transport header and allow packet to be processed by stack. There is no change for other packet processing paths. Tested by confirming that stack sends an ICMP error when it receives these packets. Also tested that fragmented ICMP pings succeed. v1->v2: Instead of cloning always, save the transport_header and restore it in case of this specific error. Update the title and commit message accordingly. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-15ip: Define usercopy region in IP proto slab cacheDavid Windsor
The ICMP filters for IPv4 and IPv6 raw sockets need to be copied to/from userspace. In support of usercopy hardening, this patch defines a region in the struct proto slab cache in which userspace copy operations are allowed. example usage trace: net/ipv4/raw.c: raw_seticmpfilter(...): ... copy_from_user(&raw_sk(sk)->filter, ..., optlen) raw_geticmpfilter(...): ... copy_to_user(..., &raw_sk(sk)->filter, len) net/ipv6/raw.c: rawv6_seticmpfilter(...): ... copy_from_user(&raw6_sk(sk)->filter, ..., optlen) rawv6_geticmpfilter(...): ... copy_to_user(..., &raw6_sk(sk)->filter, len) This region is known as the slab cache's usercopy region. Slab caches can now check that each dynamically sized copy operation involving cache-managed memory falls entirely within the slab's usercopy region. This patch is modified from Brad Spengler/PaX Team's PAX_USERCOPY whitelisting code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Signed-off-by: David Windsor <dave@nullcore.net> [kees: split from network patch, provide usage trace] Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org>
2018-01-15ipv6: Fix build with gcc-4.4.5Ido Schimmel
Emil reported the following compiler errors: net/ipv6/route.c: In function `rt6_sync_up`: net/ipv6/route.c:3586: error: unknown field `nh_flags` specified in initializer net/ipv6/route.c:3586: warning: missing braces around initializer net/ipv6/route.c:3586: warning: (near initialization for `arg.<anonymous>`) net/ipv6/route.c: In function `rt6_sync_down_dev`: net/ipv6/route.c:3695: error: unknown field `event` specified in initializer net/ipv6/route.c:3695: warning: missing braces around initializer net/ipv6/route.c:3695: warning: (near initialization for `arg.<anonymous>`) Problem is with the named initializers for the anonymous union members. Fix this by adding curly braces around the initialization. Fixes: 4c981e28d373 ("ipv6: Prepare to handle multiple netdev events") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Emil S Tantilov <emils.tantilov@gmail.com> Tested-by: Emil S Tantilov <emils.tantilov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv6: ip6_make_skb() needs to clear cork.base.dstEric Dumazet
In my last patch, I missed fact that cork.base.dst was not initialized in ip6_make_skb() : If ip6_setup_cork() returns an error, we might attempt a dst_release() on some random pointer. Fixes: 862c03ee1deb ("ipv6: fix possible mem leaks in ipv6_make_skb()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-15ipv6: fix udpv6 sendmsg crash caused by too small MTUMike Maloney
The logic in __ip6_append_data() assumes that the MTU is at least large enough for the headers. A device's MTU may be adjusted after being added while sendmsg() is processing data, resulting in __ip6_append_data() seeing any MTU. For an mtu smaller than the size of the fragmentation header, the math results in a negative 'maxfraglen', which causes problems when refragmenting any previous skb in the skb_write_queue, leaving it possibly malformed. Instead sendmsg returns EINVAL when the mtu is calculated to be less than IPV6_MIN_MTU. Found by syzkaller: kernel BUG at ./include/linux/skbuff.h:2064! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801d0b68580 task.stack: ffff8801ac6b8000 RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline] RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216 RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000 RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0 RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000 R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8 R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000 FS: 00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip6_finish_skb include/net/ipv6.h:911 [inline] udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762 sock_sendmsg_nosec net/socket.c:633 [inline] sock_sendmsg+0xca/0x110 net/socket.c:643 SYSC_sendto+0x352/0x5a0 net/socket.c:1750 SyS_sendto+0x40/0x50 net/socket.c:1718 entry_SYSCALL_64_fastpath+0x1f/0xbe RIP: 0033:0x4512e9 RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9 RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005 RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69 R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000 Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570 RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Mike Maloney <maloney@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-12Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-01-11 1) Don't allow to change the encap type on state updates. The encap type is set on state initialization and should not change anymore. From Herbert Xu. 2) Skip dead policies when rehashing to fix a slab-out-of-bounds bug in xfrm_hash_rebuild. From Florian Westphal. 3) Two buffer overread fixes in pfkey. From Eric Biggers. 4) Fix rcu usage in xfrm_get_type_offload, request_module can sleep, so can't be used under rcu_read_lock. From Sabrina Dubroca. 5) Fix an uninitialized lock in xfrm_trans_queue. Use __skb_queue_tail instead of skb_queue_tail in xfrm_trans_queue as we don't need the lock. From Herbert Xu. 6) Currently it is possible to create an xfrm state with an unknown encap type in ESP IPv4. Fix this by returning an error on unknown encap types. Also from Herbert Xu. 7) Fix sleeping inside a spinlock in xfrm_policy_cache_flush. From Florian Westphal. 8) Fix ESP GRO when the headers not fully in the linear part of the skb. We need to pull before we can access them. 9) Fix a skb leak on error in key_notify_policy. 10) Fix a race in the xdst pcpu cache, we need to run the resolver routines with bottom halfes off like the old flowcache did. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
BPF alignment tests got a conflict because the registers are output as Rn_w instead of just Rn in net-next, and in net a fixup for a testcase prohibits logical operations on pointers before using them. Also, we should attempt to patch BPF call args if JIT always on is enabled. Instead, if we fail to JIT the subprogs we should pass an error back up and fail immediately. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-11netfilter: nf_defrag: Skip defrag if NOTRACK is setSubash Abhinov Kasiviswanathan
conntrack defrag is needed only if some module like CONNTRACK or NAT explicitly requests it. For plain forwarding scenarios, defrag is not needed and can be skipped if NOTRACK is set in a rule. Since conntrack defrag is currently higher priority than raw table, setting NOTRACK is not sufficient. We need to move raw to a higher priority for iptables only. This is achieved by introducing a module parameter "raw_before_defrag" which allows to change the priority of raw table to place it before defrag. By default, the parameter is disabled and the priority of raw table is NF_IP_PRI_RAW to support legacy behavior. If the module parameter is enabled, then the priority of the raw table is set to NF_IP_PRI_RAW_BEFORE_DEFRAG. Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-10ipv6: sr: fix TLVs not being copied using setsockoptMathieu Xhonneux
Function ipv6_push_rthdr4 allows to add an IPv6 Segment Routing Header to a socket through setsockopt, but the current implementation doesn't copy possible TLVs at the end of the SRH received from userspace. Therefore, the execution of the following branch if (sr_has_hmac(sr_phdr)) { ... } will never complete since the len and type fields of a possible HMAC TLV are not copied, hence seg6_get_tlv_hmac will return an error, and the HMAC will not be computed. This commit adds a memcpy in case TLVs have been appended to the SRH. Fixes: a149e7c7ce81 ("ipv6: sr: add support for SRH injection through setsockopt") Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10ipv6: fix possible mem leaks in ipv6_make_skb()Eric Dumazet
ip6_setup_cork() might return an error, while memory allocations have been done and must be rolled back. Fixes: 6422398c2ab0 ("ipv6: introduce ipv6_make_skb") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Reported-by: Mike Maloney <maloney@google.com> Acked-by: Mike Maloney <maloney@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10ipv6: Add support for non-equal-cost multipathIdo Schimmel
The use of hash-threshold instead of modulo-N makes it trivial to add support for non-equal-cost multipath. Instead of dividing the multipath hash function's output space equally between the nexthops, each nexthop is assigned a region size which is proportional to its weight. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10ipv6: Use hash-threshold instead of modulo-NIdo Schimmel
Now that each nexthop stores its region boundary in the multipath hash function's output space, we can use hash-threshold instead of modulo-N in multipath selection. This reduces the number of checks we need to perform during lookup, as dead and linkdown nexthops are assigned a negative region boundary. In addition, in contrast to modulo-N, only flows near region boundaries are affected when a nexthop is added or removed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10ipv6: Use a 31-bit multipath hashIdo Schimmel
The hash thresholds assigned to IPv6 nexthops are in the range of [-1, 2^31 - 1], where a negative value is assigned to nexthops that should not be considered during multipath selection. Therefore, in a similar fashion to IPv4, we need to use the upper 31-bits of the multipath hash for multipath selection. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10ipv6: Calculate hash thresholds for IPv6 nexthopsIdo Schimmel
Before we convert IPv6 to use hash-threshold instead of modulo-N, we first need each nexthop to store its region boundary in the hash function's output space. The boundary is calculated by dividing the output space equally between the different active nexthops. That is, nexthops that are not dead or linkdown. The boundaries are rebalanced whenever a nexthop is added or removed to a multipath route and whenever a nexthop becomes active or inactive. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-10netfilter: improve flow table Kconfig dependenciesArnd Bergmann
The newly added NF_FLOW_TABLE options cause some build failures in randconfig kernels: - when CONFIG_NF_CONNTRACK is disabled, or is a loadable module but NF_FLOW_TABLE is built-in: In file included from net/netfilter/nf_flow_table.c:8:0: include/net/netfilter/nf_conntrack.h:59:22: error: field 'ct_general' has incomplete type struct nf_conntrack ct_general; include/net/netfilter/nf_conntrack.h: In function 'nf_ct_get': include/net/netfilter/nf_conntrack.h:148:15: error: 'const struct sk_buff' has no member named '_nfct' include/net/netfilter/nf_conntrack.h: In function 'nf_ct_put': include/net/netfilter/nf_conntrack.h:157:2: error: implicit declaration of function 'nf_conntrack_put'; did you mean 'nf_ct_put'? [-Werror=implicit-function-declaration] net/netfilter/nf_flow_table.o: In function `nf_flow_offload_work_gc': (.text+0x1540): undefined reference to `nf_ct_delete' - when CONFIG_NF_TABLES is disabled: In file included from net/ipv6/netfilter/nf_flow_table_ipv6.c:13:0: include/net/netfilter/nf_tables.h: In function 'nft_gencursor_next': include/net/netfilter/nf_tables.h:1189:14: error: 'const struct net' has no member named 'nft'; did you mean 'nf'? - when CONFIG_NF_FLOW_TABLE_INET is enabled, but NF_FLOW_TABLE_IPV4 or NF_FLOW_TABLE_IPV6 are not, or are loadable modules net/netfilter/nf_flow_table_inet.o: In function `nf_flow_offload_inet_hook': nf_flow_table_inet.c:(.text+0x94): undefined reference to `nf_flow_offload_ipv6_hook' nf_flow_table_inet.c:(.text+0x40): undefined reference to `nf_flow_offload_ip_hook' - when CONFIG_NF_FLOW_TABLES is disabled, but the other options are enabled: net/netfilter/nf_flow_table_inet.o: In function `nf_flow_offload_inet_hook': nf_flow_table_inet.c:(.text+0x6c): undefined reference to `nf_flow_offload_ipv6_hook' net/netfilter/nf_flow_table_inet.o: In function `nf_flow_inet_module_exit': nf_flow_table_inet.c:(.exit.text+0x8): undefined reference to `nft_unregister_flowtable_type' net/netfilter/nf_flow_table_inet.o: In function `nf_flow_inet_module_init': nf_flow_table_inet.c:(.init.text+0x8): undefined reference to `nft_register_flowtable_type' net/ipv4/netfilter/nf_flow_table_ipv4.o: In function `nf_flow_ipv4_module_exit': nf_flow_table_ipv4.c:(.exit.text+0x8): undefined reference to `nft_unregister_flowtable_type' net/ipv4/netfilter/nf_flow_table_ipv4.o: In function `nf_flow_ipv4_module_init': nf_flow_table_ipv4.c:(.init.text+0x8): undefined reference to `nft_register_flowtable_type' This adds additional Kconfig dependencies to ensure that NF_CONNTRACK and NF_TABLES are always visible from NF_FLOW_TABLE, and that the internal dependencies between the four new modules are met. Fixes: 7c23b629a808 ("netfilter: flow table support for the mixed IPv4/IPv6 family") Fixes: 0995210753a2 ("netfilter: flow table support for IPv6") Fixes: 97add9f0d66d ("netfilter: flow table support for IPv4") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-10netfilter: add IPv6 segment routing header 'srh' matchAhmed Abdelsalam
It allows matching packets based on Segment Routing Header (SRH) information. The implementation considers revision 7 of the SRH draft. https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07 Currently supported match options include: (1) Next Header (2) Hdr Ext Len (3) Segments Left (4) Last Entry (5) Tag value of SRH Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-10netfilter: remove duplicated includeWei Yongjun
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-10netfilter: nf_tables: get rid of struct nft_af_info abstractionPablo Neira Ayuso
Remove the infrastructure to register/unregister nft_af_info structure, this structure stores no useful information anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-10netfilter: nf_tables: get rid of pernet familiesPablo Neira Ayuso
Now that we have a single table list for each netns, we can get rid of one pointer per family and the global afinfo list, thus, shrinking struct netns for nftables that now becomes 64 bytes smaller. And call __nft_release_afinfo() from __net_exit path accordingly to release netnamespace objects on removal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-10netfilter: nf_tables: remove nhooks field from struct nft_af_infoPablo Neira Ayuso
We already validate the hook through bitmask, so this check is superfluous. When removing this, this patch is also fixing a bug in the new flowtable codebase, since ctx->afi points to the table family instead of the netdev family which is where the flowtable is really hooked in. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-09ipv6: remove null_entry before adding default routeWei Wang
In the current code, when creating a new fib6 table, tb6_root.leaf gets initialized to net->ipv6.ip6_null_entry. If a default route is being added with rt->rt6i_metric = 0xffffffff, fib6_add() will add this route after net->ipv6.ip6_null_entry. As null_entry is shared, it could cause problem. In order to fix it, set fn->leaf to NULL before calling fib6_add_rt2node() when trying to add the first default route. And reset fn->leaf to null_entry when adding fails or when deleting the last default route. syzkaller reported the following issue which is fixed by this commit: WARNING: suspicious RCU usage 4.15.0-rc5+ #171 Not tainted ----------------------------- net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by swapper/0/0: #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline] #0: ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310 #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline] #1: (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007 #2: (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560 #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline] #3: (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948 stack backtrace: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701 fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892 fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815 fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863 fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933 __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949 fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline] fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016 fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033 call_timer_fn+0x228/0x820 kernel/time/timer.c:1320 expire_timers kernel/time/timer.c:1357 [inline] __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660 run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:540 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904 </IRQ> Reported-by: syzbot <syzkaller@googlegroups.com> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table") Signed-off-by: Wei Wang <weiwan@google.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09ipv6: use ARRAY_SIZE for array sizing calculation on array seg6_action_tableColin Ian King
Use the ARRAY_SIZE macro on array seg6_action_table to determine size of the array. Improvement suggested by coccinelle. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2018-01-09esp: Fix GRO when the headers not fully in the linear part of the skb.Steffen Klassert
The GRO layer does not necessarily pull the complete headers into the linear part of the skb, a part may remain on the first page fragment. This can lead to a crash if we try to pull the headers, so make sure we have them on the linear part before pulling. Fixes: 7785bba299a8 ("esp: Add a software GRO codepath") Reported-by: syzbot+82bbd65569c49c6c0c4d@syzkaller.appspotmail.com Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2018-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree: 1) Free hooks via call_rcu to speed up netns release path, from Florian Westphal. 2) Reduce memory footprint of hook arrays, skip allocation if family is not present - useful in case decnet support is not compiled built-in. Patches from Florian Westphal. 3) Remove defensive check for malformed IPv4 - including ihl field - and IPv6 headers in x_tables and nf_tables. 4) Add generic flow table offload infrastructure for nf_tables, this includes the netlink control plane and support for IPv4, IPv6 and mixed IPv4/IPv6 dataplanes. This comes with NAT support too. This patchset adds the IPS_OFFLOAD conntrack status bit to indicate that this flow has been offloaded. 5) Add secpath matching support for nf_tables, from Florian. 6) Save some code bytes in the fast path for the nf_tables netdev, bridge and inet families. 7) Allow one single NAT hook per point and do not allow to register NAT hooks in nf_tables before the conntrack hook, patches from Florian. 8) Seven patches to remove the struct nf_af_info abstraction, instead we perform direct calls for IPv4 which is faster. IPv6 indirections are still needed to avoid dependencies with the 'ipv6' module, but these now reside in struct nf_ipv6_ops. 9) Seven patches to handle NFPROTO_INET from the Netfilter core, hence we can remove specific code in nf_tables to handle this pseudofamily. 10) No need for synchronize_net() call for nf_queue after conversion to hook arrays. Also from Florian. 11) Call cond_resched_rcu() when dumping large sets in ipset to avoid softlockup. Again from Florian. 12) Pass lockdep_nfnl_is_held() to rcu_dereference_protected(), patch from Florian Westphal. 13) Fix matching of counters in ipset, from Jozsef Kadlecsik. 14) Missing nfnl lock protection in the ip_set_net_exit path, also from Jozsef. 15) Move connlimit code that we can reuse from nf_tables into nf_conncount, from Florian Westhal. And asorted cleanups: 16) Get rid of nft_dereference(), it only has one single caller. 17) Add nft_set_is_anonymous() helper function. 18) Remove NF_ARP_FORWARD leftover chain definition in nf_tables_arp. 19) Remove unnecessary comments in nf_conntrack_h323_asn1.c From Varsha Rao. 20) Remove useless parameters in frag_safe_skb_hp(), from Gao Feng. 21) Constify layer 4 conntrack protocol definitions, function parameters to register/unregister these protocol trackers, and timeouts. Patches from Florian Westphal. 22) Remove nlattr_size indirection, from Florian Westphal. 23) Add fall-through comments as -Wimplicit-fallthrough needs this, from Gustavo A. R. Silva. 24) Use swap() macro to exchange values in ipset, patch from Gustavo A. R. Silva. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08net: ipv6: Allow connect to linklocal address from socket bound to vrfDavid Ahern
Allow a process bound to a VRF to connect to a linklocal address. Currently, this fails because of a mismatch between the scope of the linklocal address and the sk_bound_dev_if inherited by the VRF binding: $ ssh -6 fe80::70b8:cff:fedd:ead8%eth1 ssh: connect to host fe80::70b8:cff:fedd:ead8%eth1 port 22: Invalid argument Relax the scope check to allow the socket to be bound to the same L3 device as the scope id. This makes ipv6 linklocal consistent with other relaxed checks enabled by commits 1ff23beebdd3 ("net: l3mdev: Allow send on enslaved interface") and 7bb387c5ab12a ("net: Allow IP_MULTICAST_IF to set index to L3 slave"). Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-08netfilter: flow table support for the mixed IPv4/IPv6 familyPablo Neira Ayuso
This patch adds the IPv6 flow table type, that implements the datapath flow table to forward IPv6 traffic. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-01-08netfilter: flow table support for IPv6Pablo Neira Ayuso
This patch adds the IPv6 flow table type, that implements the datapath flow table to forward IPv6 traffic. This patch exports ip6_dst_mtu_forward() that is required to check for mtu to pass up packets that need PMTUD handling to the classic forwarding path. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>