Age | Commit message (Collapse) | Author |
|
As discussed before, return -ENODATA (No data available) would be more
meaningful than ENOENT (No such file or directory).
Suggested-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221231151436.6541-1-changbin.du@gmail.com
|
|
Add PT_REGS macros for LoongArch ([0]).
[0]: https://loongson.github.io/LoongArch-Documentation/LoongArch-ELF-ABI-EN.html
Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Huacai Chen <chenhuacai@loongson.cn>
Link: https://lore.kernel.org/bpf/20221231100757.3177034-1-hengqi.chen@gmail.com
|
|
pr_warn calls into user-provided callback, which can clobber errno, so
`errno = saved_errno` should happen after pr_warn.
Fixes: 07453245620c ("libbpf: fix errno is overwritten after being closed.")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Currently, many API functions are not described in the document.
Add add API description of the following four API functions:
- libbpf_set_print;
- bpf_object__open;
- bpf_object__load;
- bpf_object__close.
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224112058.12038-1-liuxin350@huawei.com
|
|
"Daniel T. Lee" says:
====================
Syscall tracing using kprobe is quite unstable. Since it uses the exact
name of the kernel function, the program might broke due to the rename
of a function. The problem can also be caused by a changes in the
arguments of the function to which the kprobe connects. This commit
enhances syscall tracing program with the following instruments.
In this patchset, ksyscall is used instead of kprobe. By using
ksyscall, libbpf will detect the appropriate kernel function name.
(e.g. sys_write -> __s390_sys_write). This eliminates the need to worry
about which wrapper function to attach in order to parse arguments.
Also ksyscall provides more fine method with attaching system call, the
coarse SYSCALL helper at trace_common.h can be removed.
Next, BPF_SYSCALL is used to reduce the inconvenience of parsing
arguments. Since the nature of SYSCALL_WRAPPER function wraps the
argument once, additional process of argument extraction is required
to properly parse the argument. The BPF_SYSCALL macro will reduces the
hassle of parsing arguments from pt_regs.
Lastly, vmlinux.h is applied to syscall tracing program. This change
allows the bpf program to refer to the internal structure as a single
"vmlinux.h" instead of including each header referenced by the bpf
program.
Additionally, this patchset changes the suffix of _kern to .bpf to make
use of the new compile rule (CLANG-BPF) which is more simple and neat.
By just changing the _kern suffix to .bpf will inherit the benefit of
the new CLANG-BPF compile target.
Also, this commit adds dummy gnu/stub.h to the samples/bpf directory.
This will fix the compiling problem with 'clang -target bpf'.
To fix the build error with the s390x, this patchset also includes the
fix of libbpf invalid return address register mapping in s390.
---
Changes in V2:
- add gnu/stub.h hack to fix compile error with 'clang -target bpf'
Changes in V3:
- fix libbpf invalid return address register mapping in s390
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
There is currently an invalid register mapping in the s390 return
address register. As the manual[1] states, the return address can be
found at r14. In bpf_tracing.h, the s390 registers were named
gprs(general purpose registers). This commit fixes the problem by
correcting the mistyped mapping.
[1]: https://uclibc.org/docs/psABI-s390x.pdf#page=14
Fixes: 3cc31d794097 ("libbpf: Normalize PT_REGS_xxx() macro definitions")
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224071527.2292-7-danieltimlee@gmail.com
|
|
This commit enhances the syscall tracing programs by using the
BPF_SYSCALL macro to reduce the inconvenience of parsing arguments from
pt_regs. By simplifying argument extraction, bpf program will become
clear to understand.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224071527.2292-6-danieltimlee@gmail.com
|
|
Currently, there is a problem with tracex2, as it doesn't print the
histogram properly and the results are misleading. (all results report
as 0)
The problem is caused by a change in arguments of the function to which
the kprobe connects. This tracex2 bpf program uses kprobe (attached
to __x64_sys_write) to figure out the size of the write system call. In
order to achieve this, the third argument 'count' must be intact.
The following is a prototype of the sys_write variant. (checked with
pfunct)
~/git/linux$ pfunct -P fs/read_write.o | grep sys_write
ssize_t ksys_write(unsigned int fd, const char * buf, size_t count);
long int __x64_sys_write(const struct pt_regs * regs);
... cross compile with s390x ...
long int __s390_sys_write(struct pt_regs * regs);
Since the nature of SYSCALL_WRAPPER function wraps the argument once,
additional process of argument extraction is required to properly parse
the argument.
#define BPF_KSYSCALL(name, args...)
... snip ...
struct pt_regs *regs = LINUX_HAS_SYSCALL_WRAPPER \
? (struct pt_regs *)PT_REGS_PARM1(ctx) \
: ctx; \
In order to fix this problem, the BPF_SYSCALL macro has been used. This
reduces the hassle of parsing arguments from pt_regs. Since the macro
uses the CORE version of argument extraction, additional portability
comes too.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224071527.2292-5-danieltimlee@gmail.com
|
|
Currently old compile rule (CLANG-bpf) doesn't contains VMLINUX_H define
flag which is essential for the bpf program that includes "vmlinux.h".
Also old compile rule doesn't directly specify the compile target as bpf,
instead it uses bunch of extra options with clang followed by long chain
of commands. (e.g. clang | opt | llvm-dis | llc)
In Makefile, there is already new compile rule which is more simple and
neat. And it also has -D__VMLINUX_H__ option. By just changing the _kern
suffix to .bpf will inherit the benefit of the new CLANG-BPF compile
target.
Also, this commit adds dummy gnu/stub.h to the samples/bpf directory.
As commit 1c2dd16add7e ("selftests/bpf: get rid of -D__x86_64__") noted,
compiling with 'clang -target bpf' will raise an error with stubs.h
unless workaround (-D__x86_64) is used. This commit solves this problem
by adding dummy stub.h to make /usr/include/features.h to follow the
expected path as the same way selftests/bpf dealt with.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224071527.2292-4-danieltimlee@gmail.com
|
|
program
This commit applies vmlinux.h to syscall tracing program. This change
allows the bpf program to refer to the internal structure as a single
"vmlinux.h" instead of including each header referenced by the bpf
program.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224071527.2292-3-danieltimlee@gmail.com
|
|
Syscall tracing using kprobe is quite unstable. Since it uses the exact
name of the kernel function, the program might broke due to the rename
of a function. The problem can also be caused by a changes in the
arguments of the function to which the kprobe connects.
In this commit, ksyscall is used instead of kprobe. By using ksyscall,
libbpf will detect the appropriate kernel function name.
(e.g. sys_write -> __s390_sys_write). This eliminates the need to worry
about which wrapper function to attach in order to parse arguments.
In addition, ksyscall provides more fine method with attaching system
call, the coarse SYSCALL helper at trace_common.h can be removed.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221224071527.2292-2-danieltimlee@gmail.com
|
|
Many of the structs recently added to track field info for linked-list
head are useful as-is for rbtree root. So let's do a mechanical renaming
of list_head-related types and fields:
include/linux/bpf.h:
struct btf_field_list_head -> struct btf_field_graph_root
list_head -> graph_root in struct btf_field union
kernel/bpf/btf.c:
list_head -> graph_root in struct btf_field_info
This is a nonfunctional change, functionality to actually use these
fields for rbtree will be added in further patches.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20221217082506.1570898-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In the ensure_good_fd function, if the fcntl function succeeds but
the close function fails, ensure_good_fd returns a normal fd and
sets errno, which may cause users to misunderstand. The close
failure is not a serious problem, and the correct FD has been
handed over to the upper-layer application. Let's restore errno here.
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Link: https://lore.kernel.org/r/20221223133618.10323-1-liuxin350@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Comparing IDs exactly between two separate states is not just
suboptimal, but also incorrect in some cases. So update regs_exact()
check to do byte-by-byte memcmp() only up to id/ref_obj_id. For id and
ref_obj_id perform proper check_ids() checks, taking into account idmap.
This change makes more states equivalent improving insns and states
stats across a bunch of selftest BPF programs:
File Program Insns (A) Insns (B) Insns (DIFF) States (A) States (B) States (DIFF)
------------------------------------------- -------------------------------- --------- --------- -------------- ---------- ---------- -------------
cgrp_kfunc_success.bpf.linked1.o test_cgrp_get_release 141 137 -4 (-2.84%) 13 13 +0 (+0.00%)
cgrp_kfunc_success.bpf.linked1.o test_cgrp_xchg_release 142 139 -3 (-2.11%) 14 13 -1 (-7.14%)
connect6_prog.bpf.linked1.o connect_v6_prog 139 102 -37 (-26.62%) 9 6 -3 (-33.33%)
ima.bpf.linked1.o bprm_creds_for_exec 68 61 -7 (-10.29%) 6 5 -1 (-16.67%)
linked_list.bpf.linked1.o global_list_in_list 569 499 -70 (-12.30%) 60 52 -8 (-13.33%)
linked_list.bpf.linked1.o global_list_push_pop 167 150 -17 (-10.18%) 18 16 -2 (-11.11%)
linked_list.bpf.linked1.o global_list_push_pop_multiple 881 815 -66 (-7.49%) 74 63 -11 (-14.86%)
linked_list.bpf.linked1.o inner_map_list_in_list 579 534 -45 (-7.77%) 61 55 -6 (-9.84%)
linked_list.bpf.linked1.o inner_map_list_push_pop 190 181 -9 (-4.74%) 19 18 -1 (-5.26%)
linked_list.bpf.linked1.o inner_map_list_push_pop_multiple 916 850 -66 (-7.21%) 75 64 -11 (-14.67%)
linked_list.bpf.linked1.o map_list_in_list 588 525 -63 (-10.71%) 62 55 -7 (-11.29%)
linked_list.bpf.linked1.o map_list_push_pop 183 174 -9 (-4.92%) 18 17 -1 (-5.56%)
linked_list.bpf.linked1.o map_list_push_pop_multiple 909 843 -66 (-7.26%) 75 64 -11 (-14.67%)
map_kptr.bpf.linked1.o test_map_kptr 264 256 -8 (-3.03%) 26 26 +0 (+0.00%)
map_kptr.bpf.linked1.o test_map_kptr_ref 95 91 -4 (-4.21%) 9 8 -1 (-11.11%)
task_kfunc_success.bpf.linked1.o test_task_xchg_release 139 136 -3 (-2.16%) 14 13 -1 (-7.14%)
test_bpf_nf.bpf.linked1.o nf_skb_ct_test 815 509 -306 (-37.55%) 57 30 -27 (-47.37%)
test_bpf_nf.bpf.linked1.o nf_xdp_ct_test 815 509 -306 (-37.55%) 57 30 -27 (-47.37%)
test_cls_redirect.bpf.linked1.o cls_redirect 78925 78390 -535 (-0.68%) 4782 4704 -78 (-1.63%)
test_cls_redirect_subprogs.bpf.linked1.o cls_redirect 64901 63897 -1004 (-1.55%) 4612 4470 -142 (-3.08%)
test_sk_lookup.bpf.linked1.o access_ctx_sk 181 95 -86 (-47.51%) 19 10 -9 (-47.37%)
test_sk_lookup.bpf.linked1.o ctx_narrow_access 447 437 -10 (-2.24%) 38 37 -1 (-2.63%)
test_sk_lookup_kern.bpf.linked1.o sk_lookup_success 148 133 -15 (-10.14%) 14 12 -2 (-14.29%)
test_tcp_check_syncookie_kern.bpf.linked1.o check_syncookie_clsact 304 300 -4 (-1.32%) 23 22 -1 (-4.35%)
test_tcp_check_syncookie_kern.bpf.linked1.o check_syncookie_xdp 304 300 -4 (-1.32%) 23 22 -1 (-4.35%)
test_verify_pkcs7_sig.bpf.linked1.o bpf 87 76 -11 (-12.64%) 7 6 -1 (-14.29%)
------------------------------------------- -------------------------------- --------- --------- -------------- ---------- ---------- -------------
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Extract byte-by-byte comparison of bpf_reg_state in regsafe() into
a helper function, which makes it more convenient to use it "on demand"
only for registers that benefit from such checks, instead of doing it
all the time, even if result of such comparison is ignored.
Also, remove WARN_ON_ONCE(1)+return false dead code. There is no risk of
missing some case as compiler will warn about non-void function not
returning value in some branches (and that under assumption that default
case is removed in the future).
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Generalize the (somewhat implicit) rule of regsafe(), which states that
if register types in old and current states do not match *exactly*, they
can't be safely considered equivalent.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Make generic check to prevent XXX_OR_NULL and XXX register types to be
intermixed. While technically in some situations it could be safe, it's
impossible to enforce due to the loss of an ID when converting
XXX_OR_NULL to its non-NULL variant. So prevent this in general, not
just for PTR_TO_MAP_KEY and PTR_TO_MAP_VALUE.
PTR_TO_MAP_KEY_OR_NULL and PTR_TO_MAP_VALUE_OR_NULL checks, which were
previously special-cased, are simplified to generic check that takes
into account range_within() and tnum_in(). This is correct as BPF
verifier doesn't allow arithmetic on XXX_OR_NULL register types, so
var_off and ranges should stay zero. But even if in the future this
restriction is lifted, it's even more important to enforce that var_off
and ranges are compatible, otherwise it's possible to construct case
where this can be exploited to bypass verifier's memory range safety
checks.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Move id and ref_obj_id fields after scalar data section (var_off and
ranges). This is necessary to simplify next patch which will change
regsafe()'s logic to be safer, as it makes the contents that has to be
an exact match (type-specific parts, off, type, and var_off+ranges)
a single sequential block of memory, while id and ref_obj_id should
always be remapped and thus can't be memcp()'ed.
There are few places that assume that var_off is after id/ref_obj_id to
clear out id/ref_obj_id with the single memset(0). These are changed to
explicitly zero-out id/ref_obj_id fields. Other places are adjusted to
preserve exact byte-by-byte comparison behavior.
No functional changes.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
states_equal() check performs ID mapping between old and new states to
establish a 1-to-1 correspondence between IDs, even if their absolute
numberic values across two equivalent states differ. This is important
both for correctness and to avoid unnecessary work when two states are
equivalent.
With recent changes we partially fixed this logic by maintaining ID map
across all function frames. This patch also makes refsafe() check take
into account (and maintain) ID map, making states_equal() behavior more
optimal and correct.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221223054921.958283-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Maxlen is used by standard proc_handlers such as proc_dointvec(), but in this
case we have our own proc_handler via bpf_stats_handler(). Therefore, remove
the initialization.
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221221-bpf-syscall-v1-0-9550f5f2c3fc@chromium.org
|
|
BPF CI fails for s390x with the following result:
[...]
All error logs:
libbpf: prog 'test_jit_probe_mem': BPF program load failed: ERROR: strerror_r(524)=22
libbpf: prog 'test_jit_probe_mem':
-- BEGIN PROG LOAD LOG --
JIT does not support calling kernel function
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'test_jit_probe_mem': failed to load: -524
libbpf: failed to load object 'jit_probe_mem'
libbpf: failed to load BPF skeleton 'jit_probe_mem': -524
test_jit_probe_mem:FAIL:jit_probe_mem__open_and_load unexpected error: -524
#89 jit_probe_mem:FAIL
[...]
Add the test to the deny list.
Fixes: 59fe41b5255f ("selftests/bpf: Add verifier test exercising jit PROBE_MEM logic")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
|
|
This patch adds a test exercising logic that was fixed / improved in
the previous patch in the series, as well as general sanity checking for
jit's PROBE_MEM logic which should've been unaffected by the previous
patch.
The added verifier test does the following:
* Acquire a referenced kptr to struct prog_test_ref_kfunc using
existing net/bpf/test_run.c kfunc
* Helper returns ptr to a specific prog_test_ref_kfunc whose first
two fields - both ints - have been prepopulated w/ vals 42 and
108, respectively
* kptr_xchg the acquired ptr into an arraymap
* Do a direct map_value load of the just-added ptr
* Goal of all this setup is to get an unreferenced kptr pointing to
struct with ints of known value, which is the result of this step
* Using unreferenced kptr obtained in previous step, do loads of
prog_test_ref_kfunc.a (offset 0) and .b (offset 4)
* Then incr the kptr by 8 and load prog_test_ref_kfunc.a again (this
time at offset -8)
* Add all the loaded ints together and return
Before the PROBE_MEM fixes in previous patch, the loads at offset 0 and
4 would succeed, while the load at offset -8 would incorrectly fail
runtime check emitted by the JIT and 0 out dst reg as a result. This
confirmed by retval of 150 for this test before previous patch - since
second .a read is 0'd out - and a retval of 192 with the fixed logic.
The test exercises the two optimizations to fixed logic added in last
patch as well:
* First load, with insn "r8 = *(u32 *)(r9 + 0)" exercises "insn->off
is 0, no need to add / sub from src_reg" optimization
* Third load, with insn "r9 = *(u32 *)(r9 - 8)" exercises "src_reg ==
dst_reg, no need to restore src_reg after load" optimization
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221216214319.3408356-2-davemarchevsky@fb.com
|
|
This patch rewrites the runtime PROBE_MEM check insns emitted by the BPF
JIT in order to ensure load safety. The changes in the patch fix two
issues with the previous logic and more generally improve size of
emitted code. Paragraphs between this one and "FIX 1" below explain the
purpose of the runtime check and examine the current implementation.
When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the
address being loaded from is not necessarily valid. The BPF jit sets up
exception handlers for each such load which catch page faults and 0 out
the destination register.
Arbitrary register-relative loads can escape this exception handling
mechanism. Specifically, a load like dst_reg = *(src_reg + off) will not
trigger BPF exception handling if (src_reg + off) is outside of kernel
address space, resulting in an uncaught page fault. A concrete example
of such behavior is a program like:
struct result {
char space[40];
long a;
};
/* if err, returns ERR_PTR(-EINVAL) */
struct result *ptr = get_ptr_maybe_err();
long x = ptr->a;
If get_ptr_maybe_err returns ERR_PTR(-EINVAL) and the result isn't
checked for err, 'result' will be (u64)-EINVAL, a number close to
U64_MAX. The ptr->a load will be > U64_MAX and will wrap over to a small
positive u64, which will be in userspace and thus not covered by BPF
exception handling mechanism.
In order to prevent such loads from occurring, the BPF jit emits some
instructions which do runtime checking of (src_reg + off) and skip the
actual load if it's out of range. As an example, here are instructions
emitted for a %rdi = *(%rdi + 0x10) PROBE_MEM load:
72: movabs $0x800000000010,%r11 --|
7c: cmp %r11,%rdi |- 72 - 7f: Check 1
7f: jb 0x000000000000008d --|
81: mov %rdi,%r11 -----|
84: add $0x0000000000000010,%r11 |- 81-8b: Check 2
8b: jnc 0x0000000000000091 -----|
8d: xor %edi,%edi ---- 0 out dest
8f: jmp 0x0000000000000095
91: mov 0x10(%rdi),%rdi ---- Actual load
95:
The JIT considers kernel address space to start at MAX_TASK_SIZE +
PAGE_SIZE. Determining whether a load will be outside of kernel address
space should be a simple check:
(src_reg + off) >= MAX_TASK_SIZE + PAGE_SIZE
But because there is only one spare register when the checking logic is
emitted, this logic is split into two checks:
Check 1: src_reg >= (MAX_TASK_SIZE + PAGE_SIZE - off)
Check 2: src_reg + off doesn't wrap over U64_MAX and result in small pos u64
Emitted insns implementing Checks 1 and 2 are annotated in the above
example. Check 1 can be done with a single spare register since the
source reg by definition is the left-hand-side of the inequality.
Since adding 'off' to both sides of Check 1's inequality results in the
original inequality we want, it's equivalent to testing that inequality.
Except in the case where src_reg + off wraps past U64_MAX, which is why
Check 2 needs to actually add src_reg + off if Check 1 passes - again
using the single spare reg.
FIX 1: The Check 1 inequality listed above is not what current code is
doing. Current code is a bit more pessimistic, instead checking:
src_reg >= (MAX_TASK_SIZE + PAGE_SIZE + abs(off))
The 0x800000000010 in above example is from this current check. If Check
1 was corrected to use the correct right-hand-side, the value would be
0x7ffffffffff0. This patch changes the checking logic more broadly (FIX
2 below will elaborate), fixing this issue as a side-effect of the
rewrite. Regardless, it's important to understand why Check 1 should've
been doing MAX_TASK_SIZE + PAGE_SIZE - off before proceeding.
FIX 2: Current code relies on a 'jnc' to determine whether src_reg + off
addition wrapped over. For negative offsets this logic is incorrect.
Consider Check 2 insns emitted when off = -0x10:
81: mov %rdi,%r11
84: add 0xfffffffffffffff0,%r11
8b: jnc 0x0000000000000091
2's complement representation of -0x10 is a large positive u64. Any
value of src_reg that passes Check 1 will result in carry flag being set
after (src_reg + off) addition. So a load with any negative offset will
always fail Check 2 at runtime and never do the actual load. This patch
fixes the negative offset issue by rewriting both checks in order to not
rely on carry flag.
The rewrite takes advantage of the fact that, while we only have one
scratch reg to hold arbitrary values, we know the offset at JIT time.
This we can use src_reg as a temporary scratch reg to hold src_reg +
offset since we can return it to its original value by later subtracting
offset. As a result we can directly check the original inequality we
care about:
(src_reg + off) >= MAX_TASK_SIZE + PAGE_SIZE
For a load like %rdi = *(%rsi + -0x10), this results in emitted code:
43: movabs $0x800000000000,%r11
4d: add $0xfffffffffffffff0,%rsi --- src_reg += off
54: cmp %r11,%rsi --- Check original inequality
57: jae 0x000000000000005d
59: xor %edi,%edi
5b: jmp 0x0000000000000061
5d: mov 0x0(%rdi),%rsi --- Actual Load
61: sub $0xfffffffffffffff0,%rsi --- src_reg -= off
Note that the actual load is always done with offset 0, since previous
insns have already done src_reg += off. Regardless of whether the new
check succeeds or fails, insn 61 is always executed, returning src_reg
to its original value.
Because the goal of these checks is to ensure that loaded-from address
will be protected by BPF exception handler, the new check can safely
ignore any wrapover from insn 4d. If such wrapped-over address passes
insn 54 + 57's cmp-and-jmp it will have such protection so the load can
proceed.
IMPROVEMENTS: The above improved logic is 8 insns vs original logic's 9,
and has 1 fewer jmp. The number of checking insns can be further
improved in common scenarios:
If src_reg == dst_reg, the actual load insn will clobber src_reg, so
there's no original src_reg state for the sub insn immediately following
the load to restore, so it can be omitted. In fact, it must be omitted
since it would incorrectly subtract from the result of the load if it
wasn't. So for src_reg == dst_reg, JIT emits these insns:
3c: movabs $0x800000000000,%r11
46: add $0xfffffffffffffff0,%rdi
4d: cmp %r11,%rdi
50: jae 0x0000000000000056
52: xor %edi,%edi
54: jmp 0x000000000000005a
56: mov 0x0(%rdi),%rdi
5a:
The only difference from larger example being the omitted sub, which
would've been insn 5a in this example.
If offset == 0, we can similarly omit the sub as in previous case, since
there's nothing added to subtract. For the same reason we can omit the
addition as well, resulting in JIT emitting these insns:
46: movabs $0x800000000000,%r11
4d: cmp %r11,%rdi
50: jae 0x0000000000000056
52: xor %edi,%edi
54: jmp 0x000000000000005a
56: mov 0x0(%rdi),%rdi
5a:
Although the above example also has src_reg == dst_reg, the same
offset == 0 optimization is valid to apply if src_reg != dst_reg.
To summarize the improvements in emitted insn count for the
check-and-load:
BEFORE: 8 check insns, 3 jmps
AFTER (general case): 7 check insns, 2 jmps (12.5% fewer insn, 33% jmp)
AFTER (src == dst): 6 check insns, 2 jmps (25% fewer insn)
AFTER (offset == 0): 5 check insns, 2 jmps (37.5% fewer insn)
(Above counts don't include the 1 load insn, just checking around it)
Based on BPF bytecode + JITted x86 insn I saw while experimenting with
these improvements, I expect the src_reg == dst_reg case to occur most
often, followed by offset == 0, then the general case.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221216214319.3408356-1-davemarchevsky@fb.com
|
|
Bump current version for new development cycle to v1.2.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221221180049.853365-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
'struct bpf_local_storage_elem' has an unused 56 byte padding at the
end due to struct's cache-line alignment requirement. This padding
space is overlapped by storage value contents, so if we use sizeof()
to calculate the total size, we overinflate it by 56 bytes. Use
offsetof() instead to calculate more exact memory use.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221221013036.3427431-1-martin.lau@linux.dev
|
|
Changbin Du says:
====================
Display error message for missing ".BTF" section and clean up empty
vmlinux.h file.
v3:
- fix typo and make error message consistent. (Andrii Nakryiko)
- split out perf change.
v2:
- remove vmlinux specific error info.
- use builtin target .DELETE_ON_ERROR: to delete empty vmlinux.h
====================
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Remove the empty vmlinux.h if bpftool failed to dump btf info.
The empty vmlinux.h can hide real error when reading output
of make.
This is done by adding .DELETE_ON_ERROR special target in related
makefiles.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20221217223509.88254-3-changbin.du@gmail.com
|
|
Show the real problem instead of just saying "No such file or directory".
Now will print below info:
libbpf: failed to find '.BTF' ELF section in /home/changbin/work/linux/vmlinux
Error: failed to load BTF from /home/changbin/work/linux/vmlinux: No such file or directory
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221217223509.88254-2-changbin.du@gmail.com
|
|
Clang warns on 32-bit ARM on this comparision:
libbpf.c:10497:18: error: result of comparison of constant 4294967296 with expression of type 'size_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare]
if (ref_ctr_off >= (1ULL << PERF_UPROBE_REF_CTR_OFFSET_BITS))
~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Typecast ref_ctr_off to __u64 in the check conditional, it is false on
32bit anyways.
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221219191526.296264-1-raj.khem@gmail.com
|
|
Add documentation for BPF_MAP_TYPE_SOCK[MAP|HASH]
including kernel versions introduced, usage
and examples.
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20221219095512.26534-1-mtahhan@redhat.com
|
|
This patch adds a selftest simulating a GRE sender and receiver using
tunnel headers without tunnel keys. It validates if packets encapsulated
using BPF_F_NO_TUNNEL_KEY are decapsulated by a GRE receiver not
configured with tunnel keys.
Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221218051734.31411-2-cehrig@cloudflare.com
|
|
This patch allows to remove TUNNEL_KEY from the tunnel flags bitmap
when using bpf_skb_set_tunnel_key by providing a BPF_F_NO_TUNNEL_KEY
flag. On egress, the resulting tunnel header will not contain a tunnel
key if the protocol and implementation supports it.
At the moment bpf_tunnel_key wants a user to specify a numeric tunnel
key. This will wrap the inner packet into a tunnel header with the key
bit and value set accordingly. This is problematic when using a tunnel
protocol that supports optional tunnel keys and a receiving tunnel
device that is not expecting packets with the key bit set. The receiver
won't decapsulate and drop the packet.
RFC 2890 and RFC 2784 GRE tunnels are examples where this flag is
useful. It allows for generating packets, that can be decapsulated by
a GRE tunnel device not operating in collect metadata mode or not
expecting the key bit set.
Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221218051734.31411-1-cehrig@cloudflare.com
|
|
"Daniel T. Lee" says:
====================
Currently, compiling samples/bpf with LLVM emits several warning. They
are only small details, but they do not appear when compiled with GCC.
Detailed compilation command and warning logs can be found from bpf CI.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Currently, compiling samples/bpf with LLVM warns about the uninitialized
use of variable with test_current_task_under_cgroup.
./samples/bpf/test_current_task_under_cgroup_user.c:57:6:
warning: variable 'cg2' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
if (setup_cgroup_environment())
^~~~~~~~~~~~~~~~~~~~~~~~~~
./samples/bpf/test_current_task_under_cgroup_user.c:106:8:
note: uninitialized use occurs here
close(cg2);
^~~
./samples/bpf/test_current_task_under_cgroup_user.c:57:2:
note: remove the 'if' if its condition is always false
if (setup_cgroup_environment())
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./samples/bpf/test_current_task_under_cgroup_user.c:19:9:
note: initialize the variable 'cg2' to silence this warning
int cg2, idx = 0, rc = 1;
^
= 0
1 warning generated.
This commit resolve this compiler warning by pre-initialize the variable
with error for safeguard.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221218061453.6287-4-danieltimlee@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Currently, compiling samples/bpf with LLVM warns about the unused but
set variable with tracex4_user.
./samples/bpf/tracex4_user.c:54:14:
warning: variable 'i' set but not used [-Wunused-but-set-variable]
int map_fd, i, j = 0;
^
1 warning generated.
This commit resolve this compiler warning by replacing the meaningless
counter.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221218061453.6287-3-danieltimlee@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Currently, compiling samples/bpf with LLVM warns about the unused
function with test_lru_dist.
./samples/bpf/test_lru_dist.c:45:19:
warning: unused function 'list_empty' [-Wunused-function]
static inline int list_empty(const struct list_head *head)
^
1 warning generated.
This commit resolve this compiler warning by removing the abandoned
function.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221218061453.6287-2-danieltimlee@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Both bpf_trace_printk and bpf_trace_vprintk helpers use static buffer guarded
with trace_printk_lock spin lock.
The spin lock contention causes issues with bpf programs attached to
contention_begin tracepoint [1][2].
Andrii suggested we could get rid of the contention by using trylock, but we
could actually get rid of the spinlock completely by using percpu buffers the
same way as for bin_args in bpf_bprintf_prepare function.
Adding new return 'buf' argument to struct bpf_bprintf_data and making
bpf_bprintf_prepare to return also the buffer for printk helpers.
[1] https://lore.kernel.org/bpf/CACkBjsakT_yWxnSWr4r-0TpPvbKm9-OBmVUhJb7hV3hY8fdCkw@mail.gmail.com/
[2] https://lore.kernel.org/bpf/CACkBjsaCsTovQHFfkqJKto6S4Z8d02ud1D7MPESrHa1cVNNTrw@mail.gmail.com/
Reported-by: Hao Sun <sunhao.th@gmail.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221215214430.1336195-4-jolsa@kernel.org
|
|
Currently we always cleanup/decrement bpf_bprintf_nest_level variable
in bpf_bprintf_cleanup if it's > 0.
There's possible scenario where this could cause a problem, when
bpf_bprintf_prepare does not get bin_args buffer (because num_args is 0)
and following bpf_bprintf_cleanup call decrements bpf_bprintf_nest_level
variable, like:
in task context:
bpf_bprintf_prepare(num_args != 0) increments 'bpf_bprintf_nest_level = 1'
-> first irq :
bpf_bprintf_prepare(num_args == 0)
bpf_bprintf_cleanup decrements 'bpf_bprintf_nest_level = 0'
-> second irq:
bpf_bprintf_prepare(num_args != 0) bpf_bprintf_nest_level = 1
gets same buffer as task context above
Adding check to bpf_bprintf_cleanup and doing the real cleanup only if we
got bin_args data in the first place.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221215214430.1336195-3-jolsa@kernel.org
|
|
Adding struct bpf_bprintf_data to hold bin_args argument for
bpf_bprintf_prepare function.
We will add another return argument to bpf_bprintf_prepare and
pass the struct to bpf_bprintf_cleanup for proper cleanup in
following changes.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221215214430.1336195-2-jolsa@kernel.org
|
|
When libelf was not installed in the standard location, it cannot be
located by the current building config.
Use pkg-config to help locate libelf in such cases.
Signed-off-by: Shen Jiamin <shen_jiamin@comp.nus.edu.sg>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20221215044703.400139-1-shen_jiamin@comp.nus.edu.sg
|
|
Fix bug in btf_dump's logic of determining if a given struct type is
packed or not. The notion of "natural alignment" is not needed and is
even harmful in this case, so drop it altogether. The biggest difference
in btf_is_struct_packed() compared to its original implementation is
that we don't really use btf__align_of() to determine overall alignment
of a struct type (because it could be 1 for both packed and non-packed
struct, depending on specifci field definitions), and just use field's
actual alignment to calculate whether any field is requiring packing or
struct's size overall necessitates packing.
Add two simple test cases that demonstrate the difference this change
would make.
Fixes: ea2ce1ba99aa ("libbpf: Fix BTF-to-C converter's padding logic")
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20221215183605.4149488-1-andrii@kernel.org
|
|
Add few hand-crafted cases and few randomized cases found using script
from [0] that tests btf_dump's padding logic.
[0] https://lore.kernel.org/bpf/85f83c333f5355c8ac026f835b18d15060725fcb.camel@ericsson.com/
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-7-andrii@kernel.org
|
|
Turns out that btf_dump API doesn't handle a bunch of tricky corner
cases, as reported by Per, and further discovered using his testing
Python script ([0]).
This patch revamps btf_dump's padding logic significantly, making it
more correct and also avoiding unnecessary explicit padding, where
compiler would pad naturally. This overall topic turned out to be very
tricky and subtle, there are lots of subtle corner cases. The comments
in the code tries to give some clues, but comments themselves are
supposed to be paired with good understanding of C alignment and padding
rules. Plus some experimentation to figure out subtle things like
whether `long :0;` means that struct is now forced to be long-aligned
(no, it's not, turns out).
Anyways, Per's script, while not completely correct in some known
situations, doesn't show any obvious cases where this logic breaks, so
this is a nice improvement over the previous state of this logic.
Some selftests had to be adjusted to accommodate better use of natural
alignment rules, eliminating some unnecessary padding, or changing it to
`type: 0;` alignment markers.
Note also that for when we are in between bitfields, we emit explicit
bit size, while otherwise we use `: 0`, this feels much more natural in
practice.
Next patch will add few more test cases, found through randomized Per's
script.
[0] https://lore.kernel.org/bpf/85f83c333f5355c8ac026f835b18d15060725fcb.camel@ericsson.com/
Reported-by: Per Sundström XP <per.xp.sundstrom@ericsson.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-6-andrii@kernel.org
|
|
btf__align_of() is supposed to be return alignment requirement of
a requested BTF type. For STRUCT/UNION it doesn't always return correct
value, because it calculates alignment only based on field types. But
for packed structs this is not enough, we need to also check field
offsets and struct size. If field offset isn't aligned according to
field type's natural alignment, then struct must be packed. Similarly,
if struct size is not a multiple of struct's natural alignment, then
struct must be packed as well.
This patch fixes this issue precisely by additionally checking these
conditions.
Fixes: 3d208f4ca111 ("libbpf: Expose btf__align_of() API")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-5-andrii@kernel.org
|
|
Add few custom enum definitions testing mode(byte) and mode(word)
attributes.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-4-andrii@kernel.org
|
|
Turns out C allows to force enum to be 1-byte or 8-byte explicitly using
mode(byte) or mode(word), respecticely. Linux sources are using this in
some cases. This is imporant to handle correctly, as enum size
determines corresponding fields in a struct that use that enum type. And
if enum size is incorrect, this will lead to invalid struct layout. So
add mode(byte) and mode(word) attribute support to btf_dump APIs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-3-andrii@kernel.org
|
|
btf_dump APIs emit unnecessary tabs when emitting struct/union
definition that fits on the single line. Before this patch we'd get:
struct blah {<tab>};
This patch fixes this and makes sure that we get more natural:
struct blah {};
Fixes: 44a726c3f23c ("bpftool: Print newline before '}' for struct with padding only fields")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-2-andrii@kernel.org
|
|
This is a small improvement in libbpf_strerror. When libbpf_strerror
is used to obtain the system error description, if the length of the
buf is insufficient, libbpf_sterror returns ERANGE and sets errno to
ERANGE.
However, this processing is not performed when the error code
customized by libbpf is obtained. Make some minor improvements here,
return -ERANGE and set errno to ERANGE when buf is not enough for
custom description.
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221210082045.233697-1-liuxin350@huawei.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the
performances of complex queue discipline configurations
- Add inet drop monitor support
- A few GRO performance improvements
- Add infrastructure for atomic dev stats, addressing long standing
data races
- De-duplicate common code between OVS and conntrack offloading
infrastructure
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up the
workload with the number of available CPUs
- Add the helper support for connection-tracking OVS offload
BPF:
- Support for user defined BPF objects: the use case is to allocate
own objects, build own object hierarchies and use the building
blocks to build own data structures flexibly, for example, linked
lists in BPF
- Make cgroup local storage available to non-cgroup attached BPF
programs
- Avoid unnecessary deadlock detection and failures wrt BPF task
storage helpers
- A relevant bunch of BPF verifier fixes and improvements
- Veristat tool improvements to support custom filtering, sorting,
and replay of results
- Add LLVM disassembler as default library for dumping JITed code
- Lots of new BPF documentation for various BPF maps
- Add bpf_rcu_read_{,un}lock() support for sleepable programs
- Add RCU grace period chaining to BPF to wait for the completion of
access from both sleepable and non-sleepable BPF programs
- Add support storing struct task_struct objects as kptrs in maps
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
Protocols:
- TCP: implement Protective Load Balancing across switch links
- TCP: allow dynamically disabling TCP-MD5 static key, reverting back
to fast[er]-path
- UDP: Introduce optional per-netns hash lookup table
- IPv6: simplify and cleanup sockets disposal
- Netlink: support different type policies for each generic netlink
operation
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support
- MPTCP: add netlink notification support for listener sockets events
- SCTP: add VRF support, allowing sctp sockets binding to VRF devices
- Add bridging MAC Authentication Bypass (MAB) support
- Extensions for Ethernet VPN bridging implementation to better
support multicast scenarios
- More work for Wi-Fi 7 support, comprising conversion of all the
existing drivers to internal TX queue usage
- IPSec: introduce a new offload type (packet offload) allowing
complete header processing and crypto offloading
- IPSec: extended ack support for more descriptive XFRM error
reporting
- RXRPC: increase SACK table size and move processing into a
per-local endpoint kernel thread, reducing considerably the
required locking
- IEEE 802154: synchronous send frame and extended filtering support,
initial support for scanning available 15.4 networks
- Tun: bump the link speed from 10Mbps to 10Gbps
- Tun/VirtioNet: implement UDP segmentation offload support
Driver API:
- PHY/SFP: improve power level switching between standard level 1 and
the higher power levels
- New API for netdev <-> devlink_port linkage
- PTP: convert existing drivers to new frequency adjustment
implementation
- DSA: add support for rx offloading
- Autoload DSA tagging driver when dynamically changing protocol
- Add new PCP and APPTRUST attributes to Data Center Bridging
- Add configuration support for 800Gbps link speed
- Add devlink port function attribute to enable/disable RoCE and
migratable
- Extend devlink-rate to support strict prioriry and weighted fair
queuing
- Add devlink support to directly reading from region memory
- New device tree helper to fetch MAC address from nvmem
- New big TCP helper to simplify temporary header stripping
New hardware / drivers:
- Ethernet:
- Marvel Octeon CNF95N and CN10KB Ethernet Switches
- Marvel Prestera AC5X Ethernet Switch
- WangXun 10 Gigabit NIC
- Motorcomm yt8521 Gigabit Ethernet
- Microchip ksz9563 Gigabit Ethernet Switch
- Microsoft Azure Network Adapter
- Linux Automation 10Base-T1L adapter
- PHY:
- Aquantia AQR112 and AQR412
- Motorcomm YT8531S
- PTP:
- Orolia ART-CARD
- WiFi:
- MediaTek Wi-Fi 7 (802.11be) devices
- RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
devices
- Bluetooth:
- Broadcom BCM4377/4378/4387 Bluetooth chipsets
- Realtek RTL8852BE and RTL8723DS
- Cypress.CYW4373A0 WiFi + Bluetooth combo device
Drivers:
- CAN:
- gs_usb: bus error reporting support
- kvaser_usb: listen only and bus error reporting support
- Ethernet NICs:
- Intel (100G):
- extend action skbedit to RX queue mapping
- implement devlink-rate support
- support direct read from memory
- nVidia/Mellanox (mlx5):
- SW steering improvements, increasing rules update rate
- Support for enhanced events compression
- extend H/W offload packet manipulation capabilities
- implement IPSec packet offload mode
- nVidia/Mellanox (mlx4):
- better big TCP support
- Netronome Ethernet NICs (nfp):
- IPsec offload support
- add support for multicast filter
- Broadcom:
- RSS and PTP support improvements
- AMD/SolarFlare:
- netlink extened ack improvements
- add basic flower matches to offload, and related stats
- Virtual NICs:
- ibmvnic: introduce affinity hint support
- small / embedded:
- FreeScale fec: add initial XDP support
- Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
- TI am65-cpsw: add suspend/resume support
- Mediatek MT7986: add RX wireless wthernet dispatch support
- Realtek 8169: enable GRO software interrupt coalescing per
default
- Ethernet high-speed switches:
- Microchip (sparx5):
- add support for Sparx5 TC/flower H/W offload via VCAP
- Mellanox mlxsw:
- add 802.1X and MAC Authentication Bypass offload support
- add ip6gre support
- Embedded Ethernet switches:
- Mediatek (mtk_eth_soc):
- improve PCS implementation, add DSA untag support
- enable flow offload support
- Renesas:
- add rswitch R-Car Gen4 gPTP support
- Microchip (lan966x):
- add full XDP support
- add TC H/W offload via VCAP
- enable PTP on bridge interfaces
- Microchip (ksz8):
- add MTU support for KSZ8 series
- Qualcomm 802.11ax WiFi (ath11k):
- support configuring channel dwell time during scan
- MediaTek WiFi (mt76):
- enable Wireless Ethernet Dispatch (WED) offload support
- add ack signal support
- enable coredump support
- remain_on_channel support
- Intel WiFi (iwlwifi):
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
- 320 MHz channels support
- RealTek WiFi (rtw89):
- new dynamic header firmware format support
- wake-over-WLAN support"
* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
ipvs: fix type warning in do_div() on 32 bit
net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
net: ipa: add IPA v4.7 support
dt-bindings: net: qcom,ipa: Add SM6350 compatible
bnxt: Use generic HBH removal helper in tx path
IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
selftests: forwarding: Add bridge MDB test
selftests: forwarding: Rename bridge_mdb test
bridge: mcast: Support replacement of MDB port group entries
bridge: mcast: Allow user space to specify MDB entry routing protocol
bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
bridge: mcast: Add support for (*, G) with a source list and filter mode
bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
bridge: mcast: Add a flag for user installed source entries
bridge: mcast: Expose __br_multicast_del_group_src()
bridge: mcast: Expose br_multicast_new_group_src()
bridge: mcast: Add a centralized error path
bridge: mcast: Place netlink policy before validation functions
bridge: mcast: Split (*, G) and (S, G) addition into different functions
bridge: mcast: Do not derive entry type from its filter mode
...
|
|
Pull Xtensa updates from Max Filippov:
- fix kernel build with gcc-13
- various minor fixes
* tag 'xtensa-20221213' of https://github.com/jcmvbkbc/linux-xtensa:
xtensa: add __umulsidi3 helper
xtensa: update config files
MAINTAINERS: update the 'T:' entry for xtensa
|