summaryrefslogtreecommitdiff
path: root/net/xfrm
AgeCommit message (Collapse)Author
2024-11-18Merge tag 'ipsec-next-2024-11-15' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next-11-15 1) Add support for RFC 9611 per cpu xfrm state handling. 2) Add inbound and outbound xfrm state caches to speed up state lookups. 3) Convert xfrm to dscp_t. From Guillaume Nault. 4) Fix error handling in build_aevent. From Everest K.C. 5) Replace strncpy with strscpy_pad in copy_to_user_auth. From Daniel Yang. 6) Fix an uninitialized symbol during acquire state insertion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-15xfrm: Fix acquire state insertion.Steffen Klassert
A recent commit jumped over the dst hash computation and left the symbol uninitialized. Fix this by explicitly computing the dst hash before it is used. Fixes: 0045e3d80613 ("xfrm: Cache used outbound xfrm states at the policy.") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-14xfrm: replace deprecated strncpy with strscpy_padDaniel Yang
The function strncpy is deprecated since it does not guarantee the destination buffer is NULL terminated. Recommended replacement is strscpy. The padded version was used to remain consistent with the other strscpy_pad usage in the modified function. Signed-off-by: Daniel Yang <danielyangkang@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-14xfrm: Add error handling when nla_put_u32() returns an errorEverest K.C
Error handling is missing when call to nla_put_u32() fails. Handle the error when the call to nla_put_u32() returns an error. The error was reported by Coverity Scan. Report: CID 1601525: (#1 of 1): Unused value (UNUSED_VALUE) returned_value: Assigning value from nla_put_u32(skb, XFRMA_SA_PCPU, x->pcpu_num) to err here, but that stored value is overwritten before it can be used Fixes: 1ddf9916ac09 ("xfrm: Add support for per cpu xfrm state handling.") Signed-off-by: Everest K.C. <everestkc@everestkc.com.np> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-11net: convert to nla_get_*_default()Johannes Berg
Most of the original conversion is from the spatch below, but I edited some and left out other instances that were either buggy after conversion (where default values don't fit into the type) or just looked strange. @@ expression attr, def; expression val; identifier fn =~ "^nla_get_.*"; fresh identifier dfn = fn ## "_default"; @@ ( -if (attr) - val = fn(attr); -else - val = def; +val = dfn(attr, def); | -if (!attr) - val = def; -else - val = fn(attr); +val = dfn(attr, def); | -if (!attr) - return def; -return fn(attr); +return dfn(attr, def); | -attr ? fn(attr) : def +dfn(attr, def) | -!attr ? def : fn(attr) +dfn(attr, def) ) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://patch.msgid.link/20241108114145.0580b8684e7f.I740beeaa2f70ebfc19bfca1045a24d6151992790@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-06xfrm: Convert struct xfrm_dst_lookup_params -> tos to dscp_t.Guillaume Nault
Add type annotation to the "tos" field of struct xfrm_dst_lookup_params, to ensure that the ECN bits aren't mistakenly taken into account when doing route lookups. Rename that field (tos -> dscp) to make that change explicit. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-06xfrm: Convert xfrm_dst_lookup() to dscp_t.Guillaume Nault
Pass a dscp_t variable to xfrm_dst_lookup(), instead of an int, to prevent accidental setting of ECN bits in ->flowi4_tos. Only xfrm_bundle_create() actually calls xfrm_dst_lookup(). Since it already has a dscp_t variable to pass as parameter, we only need to remove the inet_dscp_to_dsfield() conversion. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-06xfrm: Convert xfrm_bundle_create() to dscp_t.Guillaume Nault
Use a dscp_t variable to store the result of xfrm_get_dscp(). This prepares for the future conversion of xfrm_dst_lookup(). Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-11-06xfrm: Convert xfrm_get_tos() to dscp_t.Guillaume Nault
Return a dscp_t variable to prepare for the future conversion of xfrm_bundle_create() to dscp_t. While there, rename the function "xfrm_get_dscp", to align its name with the new return type. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-10-29xfrm: Restrict percpu SA attribute to specific netlink message typesSteffen Klassert
Reject the usage of XFRMA_SA_PCPU in xfrm netlink messages when it's not applicable. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29xfrm: Add an inbound percpu state cache.Steffen Klassert
Now that we can have percpu xfrm states, the number of active states might increase. To get a better lookup performance, we add a percpu cache to cache the used inbound xfrm states. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29xfrm: Cache used outbound xfrm states at the policy.Steffen Klassert
Now that we can have percpu xfrm states, the number of active states might increase. To get a better lookup performance, we cache the used xfrm states at the policy for outbound IPsec traffic. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-29xfrm: Add support for per cpu xfrm state handling.Steffen Klassert
Currently all flows for a certain SA must be processed by the same cpu to avoid packet reordering and lock contention of the xfrm state lock. To get rid of this limitation, the IETF standardized per cpu SAs in RFC 9611. This patch implements the xfrm part of it. We add the cpu as a lookup key for xfrm states and a config option to generate acquire messages for each cpu. With that, we can have on each cpu a SA with identical traffic selector so that flows can be processed in parallel on all cpus. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Tested-by: Antony Antony <antony.antony@secunet.com> Tested-by: Tobias Brunner <tobias@strongswan.org>
2024-10-24Merge tag 'ipsec-2024-10-22' of ↵Paolo Abeni
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-10-22 1) Fix routing behavior that relies on L4 information for xfrm encapsulated packets. From Eyal Birger. 2) Remove leftovers of pernet policy_inexact lists. From Florian Westphal. 3) Validate new SA's prefixlen when the selector family is not set from userspace. From Sabrina Dubroca. 4) Fix a kernel-infoleak when dumping an auth algorithm. From Petr Vaganov. Please pull or let me know if there are problems. ipsec-2024-10-22 * tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: fix one more kernel-infoleak in algo dumping xfrm: validate new SA's prefixlen using SA family when sel.family is unset xfrm: policy: remove last remnants of pernet inexact list xfrm: respect ip protocols rules criteria when performing dst lookups xfrm: extract dst lookup parameters into a struct ==================== Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-11xfrm: fix one more kernel-infoleak in algo dumpingPetr Vaganov
During fuzz testing, the following issue was discovered: BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x598/0x2a30 _copy_to_iter+0x598/0x2a30 __skb_datagram_iter+0x168/0x1060 skb_copy_datagram_iter+0x5b/0x220 netlink_recvmsg+0x362/0x1700 sock_recvmsg+0x2dc/0x390 __sys_recvfrom+0x381/0x6d0 __x64_sys_recvfrom+0x130/0x200 x64_sys_call+0x32c8/0x3cc0 do_syscall_64+0xd8/0x1c0 entry_SYSCALL_64_after_hwframe+0x79/0x81 Uninit was stored to memory at: copy_to_user_state_extra+0xcc1/0x1e00 dump_one_state+0x28c/0x5f0 xfrm_state_walk+0x548/0x11e0 xfrm_dump_sa+0x1e0/0x840 netlink_dump+0x943/0x1c40 __netlink_dump_start+0x746/0xdb0 xfrm_user_rcv_msg+0x429/0xc00 netlink_rcv_skb+0x613/0x780 xfrm_netlink_rcv+0x77/0xc0 netlink_unicast+0xe90/0x1280 netlink_sendmsg+0x126d/0x1490 __sock_sendmsg+0x332/0x3d0 ____sys_sendmsg+0x863/0xc30 ___sys_sendmsg+0x285/0x3e0 __x64_sys_sendmsg+0x2d6/0x560 x64_sys_call+0x1316/0x3cc0 do_syscall_64+0xd8/0x1c0 entry_SYSCALL_64_after_hwframe+0x79/0x81 Uninit was created at: __kmalloc+0x571/0xd30 attach_auth+0x106/0x3e0 xfrm_add_sa+0x2aa0/0x4230 xfrm_user_rcv_msg+0x832/0xc00 netlink_rcv_skb+0x613/0x780 xfrm_netlink_rcv+0x77/0xc0 netlink_unicast+0xe90/0x1280 netlink_sendmsg+0x126d/0x1490 __sock_sendmsg+0x332/0x3d0 ____sys_sendmsg+0x863/0xc30 ___sys_sendmsg+0x285/0x3e0 __x64_sys_sendmsg+0x2d6/0x560 x64_sys_call+0x1316/0x3cc0 do_syscall_64+0xd8/0x1c0 entry_SYSCALL_64_after_hwframe+0x79/0x81 Bytes 328-379 of 732 are uninitialized Memory access of size 732 starts at ffff88800e18e000 Data copied to user address 00007ff30f48aff0 CPU: 2 PID: 18167 Comm: syz-executor.0 Not tainted 6.8.11 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Fixes copying of xfrm algorithms where some random data of the structure fields can end up in userspace. Padding in structures may be filled with random (possibly sensitve) data and should never be given directly to user-space. A similar issue was resolved in the commit 8222d5910dae ("xfrm: Zero padding when dumping algos and encap") Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Fixes: c7a5899eb26e ("xfrm: redact SA secret with lockdown confidentiality") Cc: stable@vger.kernel.org Co-developed-by: Boris Tonofa <b.tonofa@ideco.ru> Signed-off-by: Boris Tonofa <b.tonofa@ideco.ru> Signed-off-by: Petr Vaganov <p.vaganov@ideco.ru> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-10-07xfrm: validate new SA's prefixlen using SA family when sel.family is unsetSabrina Dubroca
This expands the validation introduced in commit 07bf7908950a ("xfrm: Validate address prefix lengths in the xfrm selector.") syzbot created an SA with usersa.sel.family = AF_UNSPEC usersa.sel.prefixlen_s = 128 usersa.family = AF_INET Because of the AF_UNSPEC selector, verify_newsa_info doesn't put limits on prefixlen_{s,d}. But then copy_from_user_state sets x->sel.family to usersa.family (AF_INET). Do the same conversion in verify_newsa_info before validating prefixlen_{s,d}, since that's how prefixlen is going to be used later on. Reported-by: syzbot+cc39f136925517aed571@syzkaller.appspotmail.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-09-24xfrm: policy: remove last remnants of pernet inexact listFlorian Westphal
xfrm_net still contained the no-longer-used inexact policy list heads, remove them. Fixes: a54ad727f745 ("xfrm: policy: remove remaining use of inexact list") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-23xfrm: respect ip protocols rules criteria when performing dst lookupsEyal Birger
The series in the "fixes" tag added the ability to consider L4 attributes in routing rules. The dst lookup on the outer packet of encapsulated traffic in the xfrm code was not adapted to this change, thus routing behavior that relies on L4 information is not respected. Pass the ip protocol information when performing dst lookups. Fixes: a25724b05af0 ("Merge branch 'fib_rules-support-sport-dport-and-proto-match'") Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-23xfrm: extract dst lookup parameters into a structEyal Birger
Preparation for adding more fields to dst lookup functions without changing their signatures. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-10Merge tag 'ipsec-next-2024-09-10' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-09-10 1) Remove an unneeded WARN_ON on packet offload. From Patrisious Haddad. 2) Add a copy from skb_seq_state to buffer function. This is needed for the upcomming IPTFS patchset. From Christian Hopps. 3) Spelling fix in xfrm.h. From Simon Horman. 4) Speed up xfrm policy insertions. From Florian Westphal. 5) Add and revert a patch to support xfrm interfaces for packet offload. This patch was just half cooked. 6) Extend usage of the new xfrm_policy_is_dead_or_sk helper. From Florian Westphal. 7) Update comments on sdb and xfrm_policy. From Florian Westphal. 8) Fix a null pointer dereference in the new policy insertion code From Florian Westphal. 9) Fix an uninitialized variable in the new policy insertion code. From Nathan Chancellor. * tag 'ipsec-next-2024-09-10' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: policy: Restore dir assignments in xfrm_hash_rebuild() xfrm: policy: fix null dereference Revert "xfrm: add SA information to the offloaded packet" xfrm: minor update to sdb and xfrm_policy comments xfrm: policy: use recently added helper in more places xfrm: add SA information to the offloaded packet xfrm: policy: remove remaining use of inexact list xfrm: switch migrate to xfrm_policy_lookup_bytype xfrm: policy: don't iterate inexact policies twice at insert time selftests: add xfrm policy insertion speed test script xfrm: Correct spelling in xfrm.h net: add copy from skb_seq_state to buffer function xfrm: Remove documentation WARN_ON to limit return values for offloaded SA ==================== Link: https://patch.msgid.link/20240910065507.2436394-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-09xfrm: policy: Restore dir assignments in xfrm_hash_rebuild()Nathan Chancellor
Clang warns (or errors with CONFIG_WERROR): net/xfrm/xfrm_policy.c:1286:8: error: variable 'dir' is uninitialized when used here [-Werror,-Wuninitialized] 1286 | if ((dir & XFRM_POLICY_MASK) == XFRM_POLICY_OUT) { | ^~~ net/xfrm/xfrm_policy.c:1257:9: note: initialize the variable 'dir' to silence this warning 1257 | int dir; | ^ | = 0 1 error generated. A recent refactoring removed some assignments to dir because xfrm_policy_is_dead_or_sk() has a dir assignment in it. However, dir is used elsewhere in xfrm_hash_rebuild(), including within loops where it needs to be reloaded for each policy. Restore the assignments before the first use of dir to fix the warning and ensure dir is properly initialized throughout the function. Fixes: 08c2182cf0b4 ("xfrm: policy: use recently added helper in more places") Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-09xfrm: policy: fix null dereferenceFlorian Westphal
Julian Wiedmann says: > + if (!xfrm_pol_hold_rcu(ret)) Coverity spotted that ^^^ needs a s/ret/pol fix-up: > CID 1599386: Null pointer dereferences (FORWARD_NULL) > Passing null pointer "ret" to "xfrm_pol_hold_rcu", which dereferences it. Ditch the bogus 'ret' variable. Fixes: 563d5ca93e88 ("xfrm: switch migrate to xfrm_policy_lookup_bytype") Reported-by: Julian Wiedmann <jwiedmann.dev@gmail.com> Closes: https://lore.kernel.org/netdev/06dc2499-c095-4bd4-aee3-a1d0e3ec87c4@gmail.com/ Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-09Revert "xfrm: add SA information to the offloaded packet"Steffen Klassert
This reverts commit e7cd191f83fd899c233dfbe7dc6d96ef703dcbbd. While supporting xfrm interfaces in the packet offload API is needed, this patch does not do the right thing. There are more things to do to really support xfrm interfaces, so revert it for now. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-09-03netdev_features: convert NETIF_F_LLTX to dev->lltxAlexander Lobakin
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-31xfrm: Unmask upper DSCP bits in xfrm_get_tos()Ido Schimmel
The function returns a value that is used to initialize 'flowi4_tos' before being passed to the FIB lookup API in the following call chain: xfrm_bundle_create() tos = xfrm_get_tos(fl, family) xfrm_dst_lookup(..., tos, ...) __xfrm_dst_lookup(..., tos, ...) xfrm4_dst_lookup(..., tos, ...) __xfrm4_dst_lookup(..., tos, ...) fl4->flowi4_tos = tos __ip_route_output_key(net, fl4) Unmask the upper DSCP bits so that in the future the output route lookup could be performed according to the full DSCP value. Remove IPTOS_RT_MASK since it is no longer used. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-28xfrm: minor update to sdb and xfrm_policy commentsFlorian Westphal
The spd is no longer maintained as a linear list. We also haven't been caching bundles in the xfrm_policy struct since 2010. While at it, add kdoc style comments for the xfrm_policy structure and extend the description of the current rbtree based search to mention why it needs to search the candidate set. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-28xfrm: policy: use recently added helper in more placesFlorian Westphal
No logical change intended. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-27xfrm: add SA information to the offloaded packetwangfe
In packet offload mode, append Security Association (SA) information to each packet, replicating the crypto offload implementation. The XFRM_XMIT flag is set to enable packet to be returned immediately from the validate_xmit_xfrm function, thus aligning with the existing code path for packet offload mode. This SA info helps HW offload match packets to their correct security policies. The XFRM interface ID is included, which is crucial in setups with multiple XFRM interfaces where source/destination addresses alone can't pinpoint the right policy. Signed-off-by: wangfe <wangfe@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-24xfrm: policy: remove remaining use of inexact listFlorian Westphal
No consumers anymore, remove it. After this, insertion of policies no longer require list walk of all inexact policies but only those that are reachable via the candidate sets. This gives almost linear insertion speeds provided the inserted policies are for non-overlapping networks. Before: Inserted 1000 policies in 70 ms Inserted 10000 policies in 1155 ms Inserted 100000 policies in 216848 ms After: Inserted 1000 policies in 56 ms Inserted 10000 policies in 478 ms Inserted 100000 policies in 4580 ms Insertion of 1m entries takes about ~40s after this change on my test vm. Cc: Noel Kuntze <noel@familie-kuntze.de> Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-24xfrm: switch migrate to xfrm_policy_lookup_bytypeFlorian Westphal
XFRM_MIGRATE still uses the old lookup method: first check the bydst hash table, then search the list of all the other policies. Switch MIGRATE to use the same lookup function as the packetpath. This is done to remove the last remaining users of the pernet xfrm.policy_inexact lists with the intent of removing this list. After this patch, policies are still added to the list on insertion and they are rehashed as-needed but no single API makes use of these anymore. This change is compile tested only. Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-24xfrm: policy: don't iterate inexact policies twice at insert timeFlorian Westphal
Since commit 6be3b0db6db8 ("xfrm: policy: add inexact policy search tree infrastructure") policy lookup no longer walks a list but has a set of candidate lists. This set has to be searched for the best match. In case there are several matches, the priority wins. If the priority is also the same, then the historic behaviour with a single list was to return the first match (first-in-list). With introduction of serval lists, this doesn't work and a new 'pos' member was added that reflects the xfrm_policy structs position in the list. This value is not exported to userspace and it does not need to be the 'position in the list', it just needs to make sure that a->pos < b->pos means that a was added to the lists more recently than b. This re-walk is expensive when many inexact policies are in use. Speed this up: when appending the policy to the end of the walker list, then just take the ->pos value of the last entry made and add 1. Add a slowpath version to prevent overflow, if we'd assign UINT_MAX then iterate the entire list and fix the ordering. While this speeds up insertion considerably finding the insertion spot in the inexact list still requires a partial list walk. This is addressed in followup patches. Before: ./xfrm_policy_add_speed.sh Inserted 1000 policies in 72 ms Inserted 10000 policies in 1540 ms Inserted 100000 policies in 334780 ms After: Inserted 1000 policies in 68 ms Inserted 10000 policies in 1137 ms Inserted 100000 policies in 157307 ms Reported-by: Noel Kuntze <noel@familie-kuntze.de> Cc: Tobias Brunner <tobias@strongswan.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-08-16xfrm: Remove documentation WARN_ON to limit return values for offloaded SAPatrisious Haddad
The original idea to put WARN_ON() on return value from driver code was to make sure that packet offload doesn't have silent fallback to SW implementation, like crypto offload has. In reality, this is not needed as all *swan implementations followed this request and used explicit configuration style to make sure that "users will get what they ask". So instead of forcing drivers to make sure that even their internal flows don't return -EOPNOTSUPP, let's remove this WARN_ON. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Merge in late fixes to prepare for the 6.11 net-next PR. Conflicts: 93c3a96c301f ("net: pse-pd: Do not return EOPNOSUPP if config is null") 4cddb0f15ea9 ("net: ethtool: pse-pd: Fix possible null-deref") 30d7b6727724 ("net: ethtool: Add new power limit get and set features") https://lore.kernel.org/20240715123204.623520bb@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14Merge tag 'ipsec-next-2024-07-13' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-07-13 1) Support sending NAT keepalives in ESP in UDP states. Userspace IKE daemon had to do this before, but the kernel can better keep track of it. From Eyal Birger. 2) Support IPsec crypto offload for IPv6 ESP and IPv4 UDP-encapsulated ESP data paths. Currently, IPsec crypto offload is enabled for GRO code path only. This patchset support UDP encapsulation for the non GRO path. From Mike Yu. * tag 'ipsec-next-2024-07-13' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Support crypto offload for outbound IPv4 UDP-encapsulated ESP packet xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packet xfrm: Allow UDP encapsulation in crypto offload control path xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO path xfrm: support sending NAT keepalives in ESP in UDP states ==================== Link: https://patch.msgid.link/20240713102416.3272997-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-14Merge tag 'ipsec-2024-07-11' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2024-07-11 1) Fix esp_output_tail_tcp() on unsupported ESPINTCP. From Hagar Hemdan. 2) Fix two bugs in the recently introduced SA direction separation. From Antony Antony. 3) Fix unregister netdevice hang on hardware offload. We had to add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. 4) Fix netdev reference count imbalance in xfrm_state_find. From Jianbo Liu. 5) Call xfrm_dev_policy_delete when killingi them on offloaded policies. Jianbo Liu. * tag 'ipsec-2024-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: call xfrm_dev_policy_delete when kill policy xfrm: fix netdev reference count imbalance xfrm: Export symbol xfrm_dev_state_delete. xfrm: Fix unregister netdevice hang on hardware offload. xfrm: Log input direction mismatch error in one place xfrm: Fix input error path memory access net: esp: cleanup esp_output_tail_tcp() in case of unsupported ESPINTCP ==================== Link: https://patch.msgid.link/20240711100025.1949454-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-07-12xfrm: Support crypto offload for inbound IPv4 UDP-encapsulated ESP packetMike Yu
If xfrm_input() is called with UDP_ENCAP_ESPINUDP, the packet is already processed in UDP layer that removes the UDP header. Therefore, there should be no much difference to treat it as an ESP packet in the XFRM stack. Test: Enabled dir=in IPsec crypto offload, and verified IPv4 UDP-encapsulated ESP packets on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12xfrm: Allow UDP encapsulation in crypto offload control pathMike Yu
Unblock this limitation so that SAs with encapsulation specified can be passed to HW drivers. HW drivers can still reject the SA in their implementation of xdo_dev_state_add if the encapsulation is not supported. Test: Verified on Android device Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-12xfrm: Support crypto offload for inbound IPv6 ESP packets not in GRO pathMike Yu
IPsec crypt offload supports outbound IPv6 ESP packets, but it doesn't support inbound IPv6 ESP packets. This change enables the crypto offload for inbound IPv6 ESP packets that are not handled through GRO code path. If HW drivers add the offload information to the skb, the packet will be handled in the crypto offload rx code path. Apart from the change in crypto offload rx code path, the change in xfrm_policy_check is also needed. Exampe of RX data path: +-----------+ +-------+ | HW Driver |-->| wlan0 |--------+ +-----------+ +-------+ | v +---------------+ +------+ +------>| Network Stack |-->| Apps | | +---------------+ +------+ | | | v +--------+ +------------+ | ipsec1 |<--| XFRM Stack | +--------+ +------------+ Test: Enabled both in/out IPsec crypto offload, and verified IPv6 ESP packets on Android device on both wifi/cellular network Signed-off-by: Mike Yu <yumike@google.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-08xfrm: call xfrm_dev_policy_delete when kill policyJianbo Liu
xfrm_policy_kill() is called at different places to delete xfrm policy. It will call xfrm_pol_put(). But xfrm_dev_policy_delete() is not called to free the policy offloaded to hardware. The three commits cited here are to handle this issue by calling xfrm_dev_policy_delete() outside xfrm_get_policy(). But they didn't cover all the cases. An example, which is not handled for now, is xfrm_policy_insert(). It is called when XFRM_MSG_UPDPOLICY request is received. Old policy is replaced by new one, but the offloaded policy is not deleted, so driver doesn't have the chance to release hardware resources. To resolve this issue for all cases, move xfrm_dev_policy_delete() into xfrm_policy_kill(), so the offloaded policy can be deleted from hardware when it is called, which avoids hardware resources leakage. Fixes: 919e43fad516 ("xfrm: add an interface to offload policy") Fixes: bf06fcf4be0f ("xfrm: add missed call to delete offloaded policies") Fixes: 982c3aca8bac ("xfrm: delete offloaded policy") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-08xfrm: fix netdev reference count imbalanceJianbo Liu
In cited commit, netdev_tracker_alloc() is called for the newly allocated xfrm state, but dev_hold() is missed, which causes netdev reference count imbalance, because netdev_put() is called when the state is freed in xfrm_dev_state_free(). Fix the issue by replacing netdev_tracker_alloc() with netdev_hold(). Fixes: f8a70afafc17 ("xfrm: add TX datapath support for IPsec packet offload mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-07-01xfrm: Export symbol xfrm_dev_state_delete.Steffen Klassert
This fixes a build failure if xfrm_user is build as a module. Fixes: 07b87f9eea0c ("xfrm: Fix unregister netdevice hang on hardware offload.") Reported-by: Mark Brown <broonie@kernel.org> Tested-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-26xfrm: support sending NAT keepalives in ESP in UDP statesEyal Birger
Add the ability to send out RFC-3948 NAT keepalives from the xfrm stack. To use, Userspace sets an XFRM_NAT_KEEPALIVE_INTERVAL integer property when creating XFRM outbound states which denotes the number of seconds between keepalive messages. Keepalive messages are sent from a per net delayed work which iterates over the xfrm states. The logic is guarded by the xfrm state spinlock due to the xfrm state walk iterator. Possible future enhancements: - Adding counters to keep track of sent keepalives. - deduplicate NAT keepalives between states sharing the same nat keepalive parameters. - provisioning hardware offloads for devices capable of implementing this. - revise xfrm state list to use an rcu list in order to avoid running this under spinlock. Suggested-by: Paul Wouters <paul.wouters@aiven.io> Tested-by: Paul Wouters <paul.wouters@aiven.io> Tested-by: Antony Antony <antony.antony@secunet.com> Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-25xfrm: Fix unregister netdevice hang on hardware offload.Steffen Klassert
When offloading xfrm states to hardware, the offloading device is attached to the skbs secpath. If a skb is free is deferred, an unregister netdevice hangs because the netdevice is still refcounted. Fix this by removing the netdevice from the xfrm states when the netdevice is unregistered. To find all xfrm states that need to be cleared we add another list where skbs linked to that are unlinked from the lists (deleted) but not yet freed. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-17xfrm: Log input direction mismatch error in one placeAntony Antony
Previously, the offload data path decrypted the packet before checking the direction, leading to error logging and packet dropping. However, dropped packets wouldn't be visible in tcpdump or audit log. With this fix, the offload path, upon noticing SA direction mismatch, will pass the packet to the stack without decrypting it. The L3 layer will then log the error, audit, and drop ESP without decrypting or decapsulating it. This also ensures that the slow path records the error and audit log, making dropped packets visible in tcpdump. Fixes: 304b44f0d5a4 ("xfrm: Add dir validation to "in" data path lookup") Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-06-17xfrm: Fix input error path memory accessAntony Antony
When there is a misconfiguration of input state slow path KASAN report error. Fix this error. west login: [ 52.987278] eth1: renamed from veth11 [ 53.078814] eth1: renamed from veth21 [ 53.181355] eth1: renamed from veth31 [ 54.921702] ================================================================== [ 54.922602] BUG: KASAN: wild-memory-access in xfrmi_rcv_cb+0x2d/0x295 [ 54.923393] Read of size 8 at addr 6b6b6b6b00000000 by task ping/512 [ 54.924169] [ 54.924386] CPU: 0 PID: 512 Comm: ping Not tainted 6.9.0-08574-gcd29a4313a1b #25 [ 54.925290] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 54.926401] Call Trace: [ 54.926731] <IRQ> [ 54.927009] dump_stack_lvl+0x2a/0x3b [ 54.927478] kasan_report+0x84/0xa6 [ 54.927930] ? xfrmi_rcv_cb+0x2d/0x295 [ 54.928410] xfrmi_rcv_cb+0x2d/0x295 [ 54.928872] ? xfrm4_rcv_cb+0x3d/0x5e [ 54.929354] xfrm4_rcv_cb+0x46/0x5e [ 54.929804] xfrm_rcv_cb+0x7e/0xa1 [ 54.930240] xfrm_input+0x1b3a/0x1b96 [ 54.930715] ? xfrm_offload+0x41/0x41 [ 54.931182] ? raw_rcv+0x292/0x292 [ 54.931617] ? nf_conntrack_confirm+0xa2/0xa2 [ 54.932158] ? skb_sec_path+0xd/0x3f [ 54.932610] ? xfrmi_input+0x90/0xce [ 54.933066] xfrm4_esp_rcv+0x33/0x54 [ 54.933521] ip_protocol_deliver_rcu+0xd7/0x1b2 [ 54.934089] ip_local_deliver_finish+0x110/0x120 [ 54.934659] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 54.935248] NF_HOOK.constprop.0+0xf8/0x138 [ 54.935767] ? ip_sublist_rcv_finish+0x68/0x68 [ 54.936317] ? secure_tcpv6_ts_off+0x23/0x168 [ 54.936859] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 54.937454] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 54.938135] NF_HOOK.constprop.0+0xf8/0x138 [ 54.938663] ? ip_sublist_rcv_finish+0x68/0x68 [ 54.939220] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 54.939904] ? ip_local_deliver_finish+0x120/0x120 [ 54.940497] __netif_receive_skb_one_core+0xc9/0x107 [ 54.941121] ? __netif_receive_skb_list_core+0x1c2/0x1c2 [ 54.941771] ? blk_mq_start_stopped_hw_queues+0xc7/0xf9 [ 54.942413] ? blk_mq_start_stopped_hw_queue+0x38/0x38 [ 54.943044] ? virtqueue_get_buf_ctx+0x295/0x46b [ 54.943618] process_backlog+0xb3/0x187 [ 54.944102] __napi_poll.constprop.0+0x57/0x1a7 [ 54.944669] net_rx_action+0x1cb/0x380 [ 54.945150] ? __napi_poll.constprop.0+0x1a7/0x1a7 [ 54.945744] ? vring_new_virtqueue+0x17a/0x17a [ 54.946300] ? note_interrupt+0x2cd/0x367 [ 54.946805] handle_softirqs+0x13c/0x2c9 [ 54.947300] do_softirq+0x5f/0x7d [ 54.947727] </IRQ> [ 54.948014] <TASK> [ 54.948300] __local_bh_enable_ip+0x48/0x62 [ 54.948832] __neigh_event_send+0x3fd/0x4ca [ 54.949361] neigh_resolve_output+0x1e/0x210 [ 54.949896] ip_finish_output2+0x4bf/0x4f0 [ 54.950410] ? __ip_finish_output+0x171/0x1b8 [ 54.950956] ip_send_skb+0x25/0x57 [ 54.951390] raw_sendmsg+0xf95/0x10c0 [ 54.951850] ? check_new_pages+0x45/0x71 [ 54.952343] ? raw_hash_sk+0x21b/0x21b [ 54.952815] ? kernel_init_pages+0x42/0x51 [ 54.953337] ? prep_new_page+0x44/0x51 [ 54.953811] ? get_page_from_freelist+0x72b/0x915 [ 54.954390] ? signal_pending_state+0x77/0x77 [ 54.954936] ? preempt_count_sub+0x14/0xb3 [ 54.955450] ? __might_resched+0x8a/0x240 [ 54.955951] ? __might_sleep+0x25/0xa0 [ 54.956424] ? first_zones_zonelist+0x2c/0x43 [ 54.956977] ? __rcu_read_lock+0x2d/0x3a [ 54.957476] ? __pte_offset_map+0x32/0xa4 [ 54.957980] ? __might_resched+0x8a/0x240 [ 54.958483] ? __might_sleep+0x25/0xa0 [ 54.958963] ? inet_send_prepare+0x54/0x54 [ 54.959478] ? sock_sendmsg_nosec+0x42/0x6c [ 54.960000] sock_sendmsg_nosec+0x42/0x6c [ 54.960502] __sys_sendto+0x15d/0x1cc [ 54.960966] ? __x64_sys_getpeername+0x44/0x44 [ 54.961522] ? __handle_mm_fault+0x679/0xae4 [ 54.962068] ? find_vma+0x6b/0x8b [ 54.962497] ? find_vma_intersection+0x8a/0x8a [ 54.963052] ? handle_mm_fault+0x38/0x154 [ 54.963556] ? handle_mm_fault+0xeb/0x154 [ 54.964059] ? preempt_latency_start+0x29/0x34 [ 54.964613] ? preempt_count_sub+0x14/0xb3 [ 54.965141] ? up_read+0x4b/0x5c [ 54.965557] __x64_sys_sendto+0x76/0x82 [ 54.966041] do_syscall_64+0x69/0xd5 [ 54.966497] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 54.967119] RIP: 0033:0x7f2d2fec9a73 [ 54.967572] Code: 8b 15 a9 83 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 71 0b 0d 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24 [ 54.969747] RSP: 002b:00007ffe85756418 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 54.970655] RAX: ffffffffffffffda RBX: 0000558bebad1340 RCX: 00007f2d2fec9a73 [ 54.971511] RDX: 0000000000000040 RSI: 0000558bebad73c0 RDI: 0000000000000003 [ 54.972366] RBP: 0000558bebad73c0 R08: 0000558bebad35c0 R09: 0000000000000010 [ 54.973234] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040 [ 54.974091] R13: 00007ffe85757b00 R14: 0000001d00000001 R15: 0000558bebad4680 [ 54.974951] </TASK> [ 54.975244] ================================================================== [ 54.976133] Disabling lock debugging due to kernel taint [ 54.976784] Oops: stack segment: 0000 [#1] PREEMPT DEBUG_PAGEALLOC KASAN [ 54.977603] CPU: 0 PID: 512 Comm: ping Tainted: G B 6.9.0-08574-gcd29a4313a1b #25 [ 54.978654] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 54.979750] RIP: 0010:xfrmi_rcv_cb+0x2d/0x295 [ 54.980293] Code: 00 00 41 57 41 56 41 89 f6 41 55 41 54 55 53 48 89 fb 51 85 f6 75 31 48 89 df e8 d7 e8 ff ff 48 89 c5 48 89 c7 e8 8b a4 4f ff <48> 8b 7d 00 48 89 ee e8 eb f3 ff ff 49 89 c5 b8 01 00 00 00 4d 85 [ 54.982462] RSP: 0018:ffffc90000007990 EFLAGS: 00010282 [ 54.983099] RAX: 0000000000000001 RBX: ffff8881126e9900 RCX: fffffbfff07b77cd [ 54.983948] RDX: fffffbfff07b77cd RSI: fffffbfff07b77cd RDI: ffffffff83dbbe60 [ 54.984794] RBP: 6b6b6b6b00000000 R08: 0000000000000008 R09: 0000000000000001 [ 54.985647] R10: ffffffff83dbbe67 R11: fffffbfff07b77cc R12: 00000000ffffffff [ 54.986512] R13: 00000000ffffffff R14: 00000000ffffffff R15: 0000000000000002 [ 54.987365] FS: 00007f2d2fc0dc40(0000) GS:ffffffff82eb2000(0000) knlGS:0000000000000000 [ 54.988329] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 54.989026] CR2: 00007ffe85755ff8 CR3: 0000000109941000 CR4: 0000000000350ef0 [ 54.989897] Call Trace: [ 54.990223] <IRQ> [ 54.990500] ? __die_body+0x1a/0x56 [ 54.990950] ? die+0x30/0x49 [ 54.991326] ? do_trap+0x9b/0x132 [ 54.991751] ? do_error_trap+0x7d/0xaf [ 54.992223] ? exc_stack_segment+0x35/0x45 [ 54.992734] ? asm_exc_stack_segment+0x22/0x30 [ 54.993294] ? xfrmi_rcv_cb+0x2d/0x295 [ 54.993764] ? xfrm4_rcv_cb+0x3d/0x5e [ 54.994228] xfrm4_rcv_cb+0x46/0x5e [ 54.994670] xfrm_rcv_cb+0x7e/0xa1 [ 54.995106] xfrm_input+0x1b3a/0x1b96 [ 54.995572] ? xfrm_offload+0x41/0x41 [ 54.996038] ? raw_rcv+0x292/0x292 [ 54.996472] ? nf_conntrack_confirm+0xa2/0xa2 [ 54.997011] ? skb_sec_path+0xd/0x3f [ 54.997466] ? xfrmi_input+0x90/0xce [ 54.997925] xfrm4_esp_rcv+0x33/0x54 [ 54.998378] ip_protocol_deliver_rcu+0xd7/0x1b2 [ 54.998944] ip_local_deliver_finish+0x110/0x120 [ 54.999520] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 55.000111] NF_HOOK.constprop.0+0xf8/0x138 [ 55.000630] ? ip_sublist_rcv_finish+0x68/0x68 [ 55.001195] ? secure_tcpv6_ts_off+0x23/0x168 [ 55.001743] ? ip_protocol_deliver_rcu+0x1b2/0x1b2 [ 55.002331] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 55.003008] NF_HOOK.constprop.0+0xf8/0x138 [ 55.003527] ? ip_sublist_rcv_finish+0x68/0x68 [ 55.004078] ? __xfrm_policy_check2.constprop.0+0x18d/0x18d [ 55.004755] ? ip_local_deliver_finish+0x120/0x120 [ 55.005351] __netif_receive_skb_one_core+0xc9/0x107 [ 55.005972] ? __netif_receive_skb_list_core+0x1c2/0x1c2 [ 55.006626] ? blk_mq_start_stopped_hw_queues+0xc7/0xf9 [ 55.007266] ? blk_mq_start_stopped_hw_queue+0x38/0x38 [ 55.007899] ? virtqueue_get_buf_ctx+0x295/0x46b [ 55.008476] process_backlog+0xb3/0x187 [ 55.008961] __napi_poll.constprop.0+0x57/0x1a7 [ 55.009540] net_rx_action+0x1cb/0x380 [ 55.010020] ? __napi_poll.constprop.0+0x1a7/0x1a7 [ 55.010610] ? vring_new_virtqueue+0x17a/0x17a [ 55.011173] ? note_interrupt+0x2cd/0x367 [ 55.011675] handle_softirqs+0x13c/0x2c9 [ 55.012169] do_softirq+0x5f/0x7d [ 55.012597] </IRQ> [ 55.012882] <TASK> [ 55.013179] __local_bh_enable_ip+0x48/0x62 [ 55.013704] __neigh_event_send+0x3fd/0x4ca [ 55.014227] neigh_resolve_output+0x1e/0x210 [ 55.014761] ip_finish_output2+0x4bf/0x4f0 [ 55.015278] ? __ip_finish_output+0x171/0x1b8 [ 55.015823] ip_send_skb+0x25/0x57 [ 55.016261] raw_sendmsg+0xf95/0x10c0 [ 55.016729] ? check_new_pages+0x45/0x71 [ 55.017229] ? raw_hash_sk+0x21b/0x21b [ 55.017708] ? kernel_init_pages+0x42/0x51 [ 55.018225] ? prep_new_page+0x44/0x51 [ 55.018704] ? get_page_from_freelist+0x72b/0x915 [ 55.019292] ? signal_pending_state+0x77/0x77 [ 55.019840] ? preempt_count_sub+0x14/0xb3 [ 55.020357] ? __might_resched+0x8a/0x240 [ 55.020860] ? __might_sleep+0x25/0xa0 [ 55.021345] ? first_zones_zonelist+0x2c/0x43 [ 55.021896] ? __rcu_read_lock+0x2d/0x3a [ 55.022396] ? __pte_offset_map+0x32/0xa4 [ 55.022901] ? __might_resched+0x8a/0x240 [ 55.023404] ? __might_sleep+0x25/0xa0 [ 55.023879] ? inet_send_prepare+0x54/0x54 [ 55.024391] ? sock_sendmsg_nosec+0x42/0x6c [ 55.024918] sock_sendmsg_nosec+0x42/0x6c [ 55.025428] __sys_sendto+0x15d/0x1cc [ 55.025892] ? __x64_sys_getpeername+0x44/0x44 [ 55.026441] ? __handle_mm_fault+0x679/0xae4 [ 55.026988] ? find_vma+0x6b/0x8b [ 55.027414] ? find_vma_intersection+0x8a/0x8a [ 55.027966] ? handle_mm_fault+0x38/0x154 [ 55.028470] ? handle_mm_fault+0xeb/0x154 [ 55.028972] ? preempt_latency_start+0x29/0x34 [ 55.029532] ? preempt_count_sub+0x14/0xb3 [ 55.030047] ? up_read+0x4b/0x5c [ 55.030463] __x64_sys_sendto+0x76/0x82 [ 55.030949] do_syscall_64+0x69/0xd5 [ 55.031406] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 55.032028] RIP: 0033:0x7f2d2fec9a73 [ 55.032481] Code: 8b 15 a9 83 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 80 3d 71 0b 0d 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24 [ 55.034660] RSP: 002b:00007ffe85756418 EFLAGS: 00000202 ORIG_RAX: 000000000000002c [ 55.035567] RAX: ffffffffffffffda RBX: 0000558bebad1340 RCX: 00007f2d2fec9a73 [ 55.036424] RDX: 0000000000000040 RSI: 0000558bebad73c0 RDI: 0000000000000003 [ 55.037293] RBP: 0000558bebad73c0 R08: 0000558bebad35c0 R09: 0000000000000010 [ 55.038153] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000040 [ 55.039012] R13: 00007ffe85757b00 R14: 0000001d00000001 R15: 0000558bebad4680 [ 55.039871] </TASK> [ 55.040167] Modules linked in: [ 55.040585] ---[ end trace 0000000000000000 ]--- [ 55.041164] RIP: 0010:xfrmi_rcv_cb+0x2d/0x295 [ 55.041714] Code: 00 00 41 57 41 56 41 89 f6 41 55 41 54 55 53 48 89 fb 51 85 f6 75 31 48 89 df e8 d7 e8 ff ff 48 89 c5 48 89 c7 e8 8b a4 4f ff <48> 8b 7d 00 48 89 ee e8 eb f3 ff ff 49 89 c5 b8 01 00 00 00 4d 85 [ 55.043889] RSP: 0018:ffffc90000007990 EFLAGS: 00010282 [ 55.044528] RAX: 0000000000000001 RBX: ffff8881126e9900 RCX: fffffbfff07b77cd [ 55.045386] RDX: fffffbfff07b77cd RSI: fffffbfff07b77cd RDI: ffffffff83dbbe60 [ 55.046250] RBP: 6b6b6b6b00000000 R08: 0000000000000008 R09: 0000000000000001 [ 55.047104] R10: ffffffff83dbbe67 R11: fffffbfff07b77cc R12: 00000000ffffffff [ 55.047960] R13: 00000000ffffffff R14: 00000000ffffffff R15: 0000000000000002 [ 55.048820] FS: 00007f2d2fc0dc40(0000) GS:ffffffff82eb2000(0000) knlGS:0000000000000000 [ 55.049805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 55.050507] CR2: 00007ffe85755ff8 CR3: 0000000109941000 CR4: 0000000000350ef0 [ 55.051366] Kernel panic - not syncing: Fatal exception in interrupt [ 55.052136] Kernel Offset: disabled [ 55.052577] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: 304b44f0d5a4 ("xfrm: Add dir validation to "in" data path lookup") Signed-off-by: Antony Antony <antony.antony@secunet.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2024-05-29net: fix __dst_negative_advice() raceEric Dumazet
__dst_negative_advice() does not enforce proper RCU rules when sk->dst_cache must be cleared, leading to possible UAF. RCU rules are that we must first clear sk->sk_dst_cache, then call dst_release(old_dst). Note that sk_dst_reset(sk) is implementing this protocol correctly, while __dst_negative_advice() uses the wrong order. Given that ip6_negative_advice() has special logic against RTF_CACHE, this means each of the three ->negative_advice() existing methods must perform the sk_dst_reset() themselves. Note the check against NULL dst is centralized in __dst_negative_advice(), there is no need to duplicate it in various callbacks. Many thanks to Clement Lecigne for tracking this issue. This old bug became visible after the blamed commit, using UDP sockets. Fixes: a87cb3e48ee8 ("net: Facility to report route quality of connected sockets") Reported-by: Clement Lecigne <clecigne@google.com> Diagnosed-by: Clement Lecigne <clecigne@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Tom Herbert <tom@herbertland.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20240528114353.1794151-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 35d92abfbad8 ("net: hns3: fix kernel crash when devlink reload during initialization") 2a1a1a7b5fd7 ("net: hns3: add command queue trace for hns3") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-07rtnetlink: allow rtnl_fill_link_netnsid() to run under RCU protectionEric Dumazet
We want to be able to run rtnl_fill_ifinfo() under RCU protection instead of RTNL in the future. All rtnl_link_ops->get_link_net() methods already using dev_net() are ready. I added READ_ONCE() annotations on others. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-06Merge tag 'ipsec-next-2024-05-03' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2024-05-03 1) Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support. This was defined by an early version of an IETF draft that did not make it to a standard. 2) Introduce direction attribute for xfrm states. xfrm states have a direction, a stsate can be used either for input or output packet processing. Add a direction to xfrm states to make it clear for what a xfrm state is used. * tag 'ipsec-next-2024-05-03' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Restrict SA direction attribute to specific netlink message types xfrm: Add dir validation to "in" data path lookup xfrm: Add dir validation to "out" data path lookup xfrm: Add Direction to the SA in or out udpencap: Remove Obsolete UDP_ENCAP_ESPINUDP_NON_IKE Support ==================== Link: https://lore.kernel.org/r/20240503082732.2835810-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>