From 9c90c9b3e50e16d03c7f87d63e9db373974781e0 Mon Sep 17 00:00:00 2001 From: Michal Kubecek Date: Mon, 23 May 2022 22:05:24 +0200 Subject: Revert "net: af_key: add check for pfkey_broadcast in function pfkey_process" This reverts commit 4dc2a5a8f6754492180741facf2a8787f2c415d7. A non-zero return value from pfkey_broadcast() does not necessarily mean an error occurred as this function returns -ESRCH when no registered listener received the message. In particular, a call with BROADCAST_PROMISC_ONLY flag and null one_sk argument can never return zero so that this commit in fact prevents processing any PF_KEY message. One visible effect is that racoon daemon fails to find encryption algorithms like aes and refuses to start. Excluding -ESRCH return value would fix this but it's not obvious that we really want to bail out here and most other callers of pfkey_broadcast() also ignore the return value. Also, as pointed out by Steffen Klassert, PF_KEY is kind of deprecated and newer userspace code should use netlink instead so that we should only disturb the code for really important fixes. v2: add a comment explaining why is the return value ignored Signed-off-by: Michal Kubecek Signed-off-by: Steffen Klassert --- net/key/af_key.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/key/af_key.c b/net/key/af_key.c index 339d95df19d3..d93bde657359 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -2826,10 +2826,12 @@ static int pfkey_process(struct sock *sk, struct sk_buff *skb, const struct sadb void *ext_hdrs[SADB_EXT_MAX]; int err; - err = pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, - BROADCAST_PROMISC_ONLY, NULL, sock_net(sk)); - if (err) - return err; + /* Non-zero return value of pfkey_broadcast() does not always signal + * an error and even on an actual error we may still want to process + * the message so rather ignore the return value. + */ + pfkey_broadcast(skb_clone(skb, GFP_KERNEL), GFP_KERNEL, + BROADCAST_PROMISC_ONLY, NULL, sock_net(sk)); memset(ext_hdrs, 0, sizeof(ext_hdrs)); err = parse_exthdrs(skb, hdr, ext_hdrs); -- cgit v1.2.3-70-g09d2 From 6821ad8770340825f17962cf5ef64ebaffee7fd7 Mon Sep 17 00:00:00 2001 From: Maciej Żenczykowski Date: Wed, 18 May 2022 14:05:48 -0700 Subject: xfrm: do not set IPv4 DF flag when encapsulating IPv6 frames <= 1280 bytes. One may want to have DF set on large packets to support discovering path mtu and limiting the size of generated packets (hence not setting the XFRM_STATE_NOPMTUDISC tunnel flag), while still supporting networks that are incapable of carrying even minimal sized IPv6 frames (post encapsulation). Having IPv4 Don't Frag bit set on encapsulated IPv6 frames that are not larger than the minimum IPv6 mtu of 1280 isn't useful, because the resulting ICMP Fragmentation Required error isn't actionable (even assuming you receive it) because IPv6 will not drop it's path mtu below 1280 anyway. While the IPv4 stack could prefrag the packets post encap, this requires the ICMP error to be successfully delivered and causes a loss of the original IPv6 frame (thus requiring a retransmit and latency hit). Luckily with IPv4 if we simply don't set the DF flag, we'll just make further fragmenting the packets some other router's problems. We'll still learn the correct IPv4 path mtu through encapsulation of larger IPv6 frames. I'm still not convinced this patch is entirely sufficient to make everything happy... but I don't see how it could possibly make things worse. See also recent: 4ff2980b6bd2 'xfrm: fix tunnel model fragmentation behavior' and friends Cc: Lorenzo Colitti Cc: Eric Dumazet Cc: Lina Wang Cc: Steffen Klassert Signed-off-by: Maciej Zenczykowski Signed-off-by: Steffen Klassert --- net/xfrm/xfrm_output.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c index d4935b3b9983..555ab35cd119 100644 --- a/net/xfrm/xfrm_output.c +++ b/net/xfrm/xfrm_output.c @@ -273,6 +273,7 @@ static int xfrm4_beet_encap_add(struct xfrm_state *x, struct sk_buff *skb) */ static int xfrm4_tunnel_encap_add(struct xfrm_state *x, struct sk_buff *skb) { + bool small_ipv6 = (skb->protocol == htons(ETH_P_IPV6)) && (skb->len <= IPV6_MIN_MTU); struct dst_entry *dst = skb_dst(skb); struct iphdr *top_iph; int flags; @@ -303,7 +304,7 @@ static int xfrm4_tunnel_encap_add(struct xfrm_state *x, struct sk_buff *skb) if (flags & XFRM_STATE_NOECN) IP_ECN_clear(top_iph); - top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) ? + top_iph->frag_off = (flags & XFRM_STATE_NOPMTUDISC) || small_ipv6 ? 0 : (XFRM_MODE_SKB_CB(skb)->frag_off & htons(IP_DF)); top_iph->ttl = ip4_dst_hoplimit(xfrm_dst_child(dst)); -- cgit v1.2.3-70-g09d2