summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2021-01-26selftests/bpf: Don't exit on failed bpf_testmod unloadAndrii Nakryiko
Fix bug in handling bpf_testmod unloading that will cause test_progs exiting prematurely if bpf_testmod unloading failed. This is especially problematic when running a subset of test_progs that doesn't require root permissions and doesn't rely on bpf_testmod, yet will fail immediately due to exit(1) in unload_bpf_testmod(). Fixes: 9f7fa225894c ("selftests/bpf: Add bpf_testmod kernel module for testing") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210126065019.1268027-1-andrii@kernel.org
2021-01-26samples/bpf: Set flag __SANE_USERSPACE_TYPES__ for MIPS to fix build warningsTiezhu Yang
There exists many build warnings when make M=samples/bpf on the Loongson platform, this issue is MIPS related, x86 compiles just fine. Here are some warnings: CC samples/bpf/ibumad_user.o samples/bpf/ibumad_user.c: In function ‘dump_counts’: samples/bpf/ibumad_user.c:46:24: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf("0x%02x : %llu\n", key, value); ~~~^ ~~~~~ %lu CC samples/bpf/offwaketime_user.o samples/bpf/offwaketime_user.c: In function ‘print_ksym’: samples/bpf/offwaketime_user.c:34:17: warning: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf("%s/%llx;", sym->name, addr); ~~~^ ~~~~ %lx samples/bpf/offwaketime_user.c: In function ‘print_stack’: samples/bpf/offwaketime_user.c:68:17: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 3 has type ‘__u64’ {aka ‘long unsigned int’} [-Wformat=] printf(";%s %lld\n", key->waker, count); ~~~^ ~~~~~ %ld MIPS needs __SANE_USERSPACE_TYPES__ before <linux/types.h> to select 'int-ll64.h' in arch/mips/include/uapi/asm/types.h, then it can avoid build warnings when printing __u64 with %llu, %llx or %lld. The header tools/include/linux/types.h defines __SANE_USERSPACE_TYPES__, it seems that we can include <linux/types.h> in the source files which have build warnings, but it has no effect due to actually it includes usr/include/linux/types.h instead of tools/include/linux/types.h, the problem is that "usr/include" is preferred first than "tools/include" in samples/bpf/Makefile, that sounds like a ugly hack to -Itools/include before -Iusr/include. So define __SANE_USERSPACE_TYPES__ for MIPS in samples/bpf/Makefile is proper, if add "TPROGS_CFLAGS += -D__SANE_USERSPACE_TYPES__" in samples/bpf/Makefile, it appears the following error: Auto-detecting system features: ... libelf: [ on ] ... zlib: [ on ] ... bpf: [ OFF ] BPF API too old make[3]: *** [Makefile:293: bpfdep] Error 1 make[2]: *** [Makefile:156: all] Error 2 With #ifndef __SANE_USERSPACE_TYPES__ in tools/include/linux/types.h, the above error has gone and this ifndef change does not hurt other compilations. Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/1611551146-14052-1-git-send-email-yangtiezhu@loongson.cn
2021-01-26tools, headers: Sync struct bpf_perf_event_dataFlorian Lehner
Update struct bpf_perf_event_data with the addr field to match the tools headers with the kernel headers. Signed-off-by: Florian Lehner <dev@der-flo.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210123185221.23946-1-dev@der-flo.net
2021-01-26selftests/bpf: Avoid useless void *-castsBjörn Töpel
There is no need to cast to void * when the argument is void *. Avoid cluttering of code. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-13-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Consistent malloc/calloc usageBjörn Töpel
Use calloc instead of malloc where it makes sense, and avoid C++-style void *-cast. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-12-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Avoid heap allocationBjörn Töpel
The data variable is only used locally. Instead of using the heap, stick to using the stack. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-11-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Define local variables at the beginning of a blockBjörn Töpel
Use C89 rules for variable definition. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-10-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Change type from void * to struct generic_data *Björn Töpel
Instead of casting from void *, let us use the actual type in gen_udp_hdr(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-9-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Change type from void * to struct ifaceconfigobj *Björn Töpel
Instead of casting from void *, let us use the actual type in init_iface_config(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-8-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Remove casting by introduce local variableBjörn Töpel
Let us use a local variable in nsswitchthread(), so we can remove a lot of casting for better readability. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-7-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Improve readability of xdpxceiver/worker_pkt_validate()Björn Töpel
Introduce a local variable to get rid of lot of casting. Move common code outside the if/else-clause. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-6-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Remove memory leakBjörn Töpel
The allocated entry is immediately overwritten by an assignment. Fix that. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-5-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Fix style warningsBjörn Töpel
Silence three checkpatch style warnings. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-4-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Remove unused enumsBjörn Töpel
The enums undef and bidi are not used. Remove them. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-3-bjorn.topel@gmail.com
2021-01-26selftests/bpf: Remove a lot of ifobject castingBjörn Töpel
Instead of passing void * all over the place, let us pass the actual type (ifobject) and remove the void-ptr-to-type-ptr casting. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210122154725.22140-2-bjorn.topel@gmail.com
2021-01-25libbpf, xsk: Select AF_XDP BPF program based on kernel versionBjörn Töpel
Add detection for kernel version, and adapt the BPF program based on kernel support. This way, users will get the best possible performance from the BPF program. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Marek Majtyka <alardam@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/bpf/20210122105351.11751-4-bjorn.topel@gmail.com
2021-01-21selftest/bpf: Fix typoJunlin Yang
Change 'exeeds' to 'exceeds'. Signed-off-by: Junlin Yang <yangjunlin@yulong.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210121122309.1501-1-angkery@163.com
2021-01-21libbpf: Use string table index from index table if neededJiri Olsa
For very large ELF objects (with many sections), we could get special value SHN_XINDEX (65535) for elf object's string table index - e_shstrndx. Call elf_getshdrstrndx to get the proper string table index, instead of reading it directly from ELF header. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210121202203.9346-4-jolsa@kernel.org
2021-01-20bpf: Remove extra lock_sock for TCP_ZEROCOPY_RECEIVEStanislav Fomichev
Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE. We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom call in do_tcp_getsockopt using the on-stack data. This removes 3% overhead for locking/unlocking the socket. Without this patch: 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt | --3.30%--__cgroup_bpf_run_filter_getsockopt | --0.81%--__kmalloc With the patch applied: 0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern Note, exporting uapi/tcp.h requires removing netinet/tcp.h from test_progs.h because those headers have confliciting definitions. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-2-sdf@google.com
2021-01-20bpf: Permit size-0 datasecYonghong Song
llvm patch https://reviews.llvm.org/D84002 permitted to emit empty rodata datasec if the elf .rodata section contains read-only data from local variables. These local variables will be not emitted as BTF_KIND_VARs since llvm converted these local variables as static variables with private linkage without debuginfo types. Such an empty rodata datasec will make skeleton code generation easy since for skeleton a rodata struct will be generated if there is a .rodata elf section. The existence of a rodata btf datasec is also consistent with the existence of a rodata map created by libbpf. The btf with such an empty rodata datasec will fail in the kernel though as kernel will reject a datasec with zero vlen and zero size. For example, for the below code, int sys_enter(void *ctx) { int fmt[6] = {1, 2, 3, 4, 5, 6}; int dst[6]; bpf_probe_read(dst, sizeof(dst), fmt); return 0; } We got the below btf (bpftool btf dump ./test.o): [1] PTR '(anon)' type_id=0 [2] FUNC_PROTO '(anon)' ret_type_id=3 vlen=1 'ctx' type_id=1 [3] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED [4] FUNC 'sys_enter' type_id=2 linkage=global [5] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED [6] ARRAY '(anon)' type_id=5 index_type_id=7 nr_elems=4 [7] INT '__ARRAY_SIZE_TYPE__' size=4 bits_offset=0 nr_bits=32 encoding=(none) [8] VAR '_license' type_id=6, linkage=global-alloc [9] DATASEC '.rodata' size=0 vlen=0 [10] DATASEC 'license' size=0 vlen=1 type_id=8 offset=0 size=4 When loading the ./test.o to the kernel with bpftool, we see the following error: libbpf: Error loading BTF: Invalid argument(22) libbpf: magic: 0xeb9f ... [6] ARRAY (anon) type_id=5 index_type_id=7 nr_elems=4 [7] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none) [8] VAR _license type_id=6 linkage=1 [9] DATASEC .rodata size=24 vlen=0 vlen == 0 libbpf: Error loading .BTF into kernel: -22. BTF is optional, ignoring. Basically, libbpf changed .rodata datasec size to 24 since elf .rodata section size is 24. The kernel then rejected the BTF since vlen = 0. Note that the above kernel verifier failure can be worked around with changing local variable "fmt" to a static or global, optionally const, variable. This patch permits a datasec with vlen = 0 in kernel. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210119153519.3901963-1-yhs@fb.com
2021-01-20selftests: bpf: Add a new test for bare tracepointsQais Yousef
Reuse module_attach infrastructure to add a new bare tracepoint to check we can attach to it as a raw tracepoint. Signed-off-by: Qais Yousef <qais.yousef@arm.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210119122237.2426878-3-qais.yousef@arm.com
2021-01-20selftests/bpf: Add verifier tests for x64 jit jump paddingGary Lin
There are 3 tests added into verifier's jit tests to trigger x64 jit jump padding. The first test can be represented as the following assembly code: 1: bpf_call bpf_get_prandom_u32 2: if r0 == 1 goto pc+128 3: if r0 == 2 goto pc+128 ... 129: if r0 == 128 goto pc+128 130: goto pc+128 131: goto pc+127 ... 256: goto pc+2 257: goto pc+1 258: r0 = 1 259: ret We first store a random number to r0 and add the corresponding conditional jumps (2~129) to make verifier believe that those jump instructions from 130 to 257 are reachable. When the program is sent to x64 jit, it starts to optimize out the NOP jumps backwards from 257. Since there are 128 such jumps, the program easily reaches 15 passes and triggers jump padding. Here is the x64 jit code of the first test: 0: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0] 5: 66 90 xchg ax,ax 7: 55 push rbp 8: 48 89 e5 mov rbp,rsp b: e8 4c 90 75 e3 call 0xffffffffe375905c 10: 48 83 f8 01 cmp rax,0x1 14: 0f 84 fe 04 00 00 je 0x518 1a: 48 83 f8 02 cmp rax,0x2 1e: 0f 84 f9 04 00 00 je 0x51d ... f6: 48 83 f8 18 cmp rax,0x18 fa: 0f 84 8b 04 00 00 je 0x58b 100: 48 83 f8 19 cmp rax,0x19 104: 0f 84 86 04 00 00 je 0x590 10a: 48 83 f8 1a cmp rax,0x1a 10e: 0f 84 81 04 00 00 je 0x595 ... 500: 0f 84 83 01 00 00 je 0x689 506: 48 81 f8 80 00 00 00 cmp rax,0x80 50d: 0f 84 76 01 00 00 je 0x689 513: e9 71 01 00 00 jmp 0x689 518: e9 6c 01 00 00 jmp 0x689 ... 5fe: e9 86 00 00 00 jmp 0x689 603: e9 81 00 00 00 jmp 0x689 608: 0f 1f 00 nop DWORD PTR [rax] 60b: eb 7c jmp 0x689 60d: eb 7a jmp 0x689 ... 683: eb 04 jmp 0x689 685: eb 02 jmp 0x689 687: 66 90 xchg ax,ax 689: b8 01 00 00 00 mov eax,0x1 68e: c9 leave 68f: c3 ret As expected, a 3 bytes NOPs is inserted at 608 due to the transition from imm32 jmp to imm8 jmp. A 2 bytes NOPs is also inserted at 687 to replace a NOP jump. The second test case is tricky. Here is the assembly code: 1: bpf_call bpf_get_prandom_u32 2: if r0 == 1 goto pc+2048 3: if r0 == 2 goto pc+2048 ... 2049: if r0 == 2048 goto pc+2048 2050: goto pc+2048 2051: goto pc+16 2052: goto pc+15 ... 2064: goto pc+3 2065: goto pc+2 2066: goto pc+1 ... [repeat "goto pc+16".."goto pc+1" 127 times] ... 4099: r0 = 2 4100: ret There are 4 major parts of the program. 1) 1~2049: Those are instructions to make 2050~4098 reachable. Some of them also could generate the padding for jmp_cond. 2) 2050: This is the target instruction for the imm32 nop jmp padding. 3) 2051~4098: The repeated "goto 1~16" instructions are designed to be consumed by the nop jmp optimization. In the end, those instrucitons become 128 continuous 0 offset jmp and are optimized out in 1 pass, and this make insn 2050 an imm32 nop jmp in the next pass, so that we can trigger the 5 bytes padding. 4) 4099~4100: Those are the instructions to end the program. The x64 jit code is like this: 0: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0] 5: 66 90 xchg ax,ax 7: 55 push rbp 8: 48 89 e5 mov rbp,rsp b: e8 bc 7b d5 d3 call 0xffffffffd3d57bcc 10: 48 83 f8 01 cmp rax,0x1 14: 0f 84 7e 66 00 00 je 0x6698 1a: 48 83 f8 02 cmp rax,0x2 1e: 0f 84 74 66 00 00 je 0x6698 24: 48 83 f8 03 cmp rax,0x3 28: 0f 84 6a 66 00 00 je 0x6698 2e: 48 83 f8 04 cmp rax,0x4 32: 0f 84 60 66 00 00 je 0x6698 38: 48 83 f8 05 cmp rax,0x5 3c: 0f 84 56 66 00 00 je 0x6698 42: 48 83 f8 06 cmp rax,0x6 46: 0f 84 4c 66 00 00 je 0x6698 ... 666c: 48 81 f8 fe 07 00 00 cmp rax,0x7fe 6673: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 6677: 74 1f je 0x6698 6679: 48 81 f8 ff 07 00 00 cmp rax,0x7ff 6680: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 6684: 74 12 je 0x6698 6686: 48 81 f8 00 08 00 00 cmp rax,0x800 668d: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 6691: 74 05 je 0x6698 6693: 0f 1f 44 00 00 nop DWORD PTR [rax+rax*1+0x0] 6698: b8 02 00 00 00 mov eax,0x2 669d: c9 leave 669e: c3 ret Since insn 2051~4098 are optimized out right before the padding pass, there are several conditional jumps from the first part are replaced with imm8 jmp_cond, and this triggers the 4 bytes padding, for example at 6673, 6680, and 668d. On the other hand, Insn 2050 is replaced with the 5 bytes nops at 6693. The third test is to invoke the first and second tests as subprogs to test bpf2bpf. Per the system log, there was one more jit happened with only one pass and the same jit code was produced. v4: - Add the second test case which triggers jmp_cond padding and imm32 nop jmp padding. - Add the new test case as another subprog Signed-off-by: Gary Lin <glin@suse.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210119102501.511-4-glin@suse.com
2021-01-20bpf, selftests: Fold test_current_pid_tgid_new_ns into test_progs.Carlos Neira
Currently tests for bpf_get_ns_current_pid_tgid() are outside test_progs. This change folds test cases into test_progs. Changes from v11: - Fixed test failure is not detected. - Removed EXIT(3) call as it will stop test_progs execution. Signed-off-by: Carlos Neira <cneirabustos@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210114141033.GA17348@localhost Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Conflicts: drivers/net/can/dev.c commit 03f16c5075b2 ("can: dev: can_restart: fix use after free bug") commit 3e77f70e7345 ("can: dev: move driver related infrastructure into separate subdir") Code move. drivers/net/dsa/b53/b53_common.c commit 8e4052c32d6b ("net: dsa: b53: fix an off by one in checking "vlan->vid"") commit b7a9e0da2d1c ("net: switchdev: remove vid_begin -> vid_end range from VLAN objects") Field rename. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-20Merge tag 'net-5.11-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.11-rc5, including fixes from bpf, wireless, and can trees. Current release - regressions: - nfc: nci: fix the wrong NCI_CORE_INIT parameters Current release - new code bugs: - bpf: allow empty module BTFs Previous releases - regressions: - bpf: fix signed_{sub,add32}_overflows type handling - tcp: do not mess with cloned skbs in tcp_add_backlog() - bpf: prevent double bpf_prog_put call from bpf_tracing_prog_attach - bpf: don't leak memory in bpf getsockopt when optlen == 0 - tcp: fix potential use-after-free due to double kfree() - mac80211: fix encryption issues with WEP - devlink: use right genl user_ptr when handling port param get/set - ipv6: set multicast flag on the multicast route - tcp: fix TCP_USER_TIMEOUT with zero window Previous releases - always broken: - bpf: local storage helpers should check nullness of owner ptr passed - mac80211: fix incorrect strlen of .write in debugfs - cls_flower: call nla_ok() before nla_next() - skbuff: back tiny skbs with kmalloc() in __netdev_alloc_skb() too" * tag 'net-5.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (52 commits) net: systemport: free dev before on error path net: usb: cdc_ncm: don't spew notifications net: mscc: ocelot: Fix multicast to the CPU port tcp: Fix potential use-after-free due to double kfree() bpf: Fix signed_{sub,add32}_overflows type handling can: peak_usb: fix use after free bugs can: vxcan: vxcan_xmit: fix use after free bug can: dev: can_restart: fix use after free bug tcp: fix TCP socket rehash stats mis-accounting net: dsa: b53: fix an off by one in checking "vlan->vid" tcp: do not mess with cloned skbs in tcp_add_backlog() selftests: net: fib_tests: remove duplicate log test net: nfc: nci: fix the wrong NCI_CORE_INIT parameters sh_eth: Fix power down vs. is_opened flag ordering net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled netfilter: rpfilter: mask ecn bits before fib lookup udp: mask TOS bits in udp_v4_early_demux() xsk: Clear pool even for inactive queues bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callback sh_eth: Make PHY access aware of Runtime PM to fix reboot crash ...
2021-01-19selftests: forwarding: Fix spelling mistake "succeded" -> "succeeded"Colin Ian King
There are two spelling mistakes in check_fail messages. Fix them. Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210118111902.71096-1-colin.king@canonical.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-19selftests: net: fib_tests: remove duplicate log testHangbin Liu
The previous test added an address with a specified metric and check if correspond route was created. I somehow added two logs for the same test. Remove the duplicated one. Reported-by: Antoine Tenart <atenart@redhat.com> Fixes: 0d29169a708b ("selftests/net/fib_tests: update addr_metric_test for peer route testing") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20210119025930.2810532-1-liuhangbin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-17Merge tag 'perf-tools-fixes-2021-01-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools fixes from Arnaldo Carvalho de Melo: - Fix 'CPU too large' error in Intel PT - Correct event attribute sizes in 'perf inject' - Sync build_bug.h and kvm.h kernel copies - Fix bpf.h header include directive in 5sec.c 'perf trace' bpf example - libbpf tests fixes - Fix shadow stat 'perf test' for non-bash shells - Take cgroups into account for shadow stats in 'perf stat' * tag 'perf-tools-fixes-2021-01-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: perf inject: Correct event attribute sizes perf intel-pt: Fix 'CPU too large' error perf stat: Take cgroups into account for shadow stats perf stat: Introduce struct runtime_stat_data libperf tests: Fail when failing to get a tracepoint id libperf tests: If a test fails return non-zero libperf tests: Avoid uninitialized variable warning perf test: Fix shadow stat test for non-bash shells tools headers: Syncronize linux/build_bug.h with the kernel sources tools headers UAPI: Sync kvm.h headers with the kernel sources perf bpf examples: Fix bpf.h header include directive in 5sec.c example
2021-01-15GTP: add support for flow based tunneling APIPravin B Shelar
Following patch add support for flow based tunneling API to send and recv GTP tunnel packet over tunnel metadata API. This would allow this device integration with OVS or eBPF using flow based tunneling APIs. Signed-off-by: Pravin B Shelar <pbshelar@fb.com> Link: https://lore.kernel.org/r/20210110070021.26822-1-pbshelar@fb.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf-next 2021-01-16 1) Extend atomic operations to the BPF instruction set along with x86-64 JIT support, that is, atomic{,64}_{xchg,cmpxchg,fetch_{add,and,or,xor}}, from Brendan Jackman. 2) Add support for using kernel module global variables (__ksym externs in BPF programs) retrieved via module's BTF, from Andrii Nakryiko. 3) Generalize BPF stackmap's buildid retrieval and add support to have buildid stored in mmap2 event for perf, from Jiri Olsa. 4) Various fixes for cross-building BPF sefltests out-of-tree which then will unblock wider automated testing on ARM hardware, from Jean-Philippe Brucker. 5) Allow to retrieve SOL_SOCKET opts from sock_addr progs, from Daniel Borkmann. 6) Clean up driver's XDP buffer init and split into two helpers to init per- descriptor and non-changing fields during processing, from Lorenzo Bianconi. 7) Minor misc improvements to libbpf & bpftool, from Ian Rogers. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (41 commits) perf: Add build id data in mmap2 event bpf: Add size arg to build_id_parse function bpf: Move stack_map_get_build_id into lib bpf: Document new atomic instructions bpf: Add tests for new BPF atomic operations bpf: Add bitwise atomic instructions bpf: Pull out a macro for interpreting atomic ALU operations bpf: Add instructions for atomic_[cmp]xchg bpf: Add BPF_FETCH field / create atomic_fetch_add instruction bpf: Move BPF_STX reserved field check into BPF_STX verifier code bpf: Rename BPF_XADD and prepare to encode other atomics in .imm bpf: x86: Factor out a lookup table for some ALU opcodes bpf: x86: Factor out emission of REX byte bpf: x86: Factor out emission of ModR/M for *(reg + off) tools/bpftool: Add -Wall when building BPF programs bpf, libbpf: Avoid unused function warning on bpf_tail_call_static selftests/bpf: Install btf_dump test cases selftests/bpf: Fix installation of urandom_read selftests/bpf: Move generated test files to $(TEST_GEN_FILES) selftests/bpf: Fix out-of-tree build ... ==================== Link: https://lore.kernel.org/r/20210116012922.17823-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfJakub Kicinski
Daniel Borkmann says: ==================== pull-request: bpf 2021-01-16 1) Fix a double bpf_prog_put() for BPF_PROG_{TYPE_EXT,TYPE_TRACING} types in link creation's error path causing a refcount underflow, from Jiri Olsa. 2) Fix BTF validation errors for the case where kernel modules don't declare any new types and end up with an empty BTF, from Andrii Nakryiko. 3) Fix BPF local storage helpers to first check their {task,inode} owners for being NULL before access, from KP Singh. 4) Fix a memory leak in BPF setsockopt handling for the case where optlen is zero and thus temporary optval buffer should be freed, from Stanislav Fomichev. 5) Fix a syzbot memory allocation splat in BPF_PROG_TEST_RUN infra for raw_tracepoint caused by too big ctx_size_in, from Song Liu. 6) Fix LLVM code generation issues with verifier where PTR_TO_MEM{,_OR_NULL} registers were spilled to stack but not recognized, from Gilad Reti. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: MAINTAINERS: Update my email address selftests/bpf: Add verifier test for PTR_TO_MEM spill bpf: Support PTR_TO_MEM{,_OR_NULL} register spilling bpf: Reject too big ctx_size_in for raw_tp test run libbpf: Allow loading empty BTFs bpf: Allow empty module BTFs bpf: Don't leak memory in bpf getsockopt when optlen == 0 bpf: Update local storage test to check handling of null ptrs bpf: Fix typo in bpf_inode_storage.c bpf: Local storage helpers should check nullness of owner ptr passed bpf: Prevent double bpf_prog_put call from bpf_tracing_prog_attach ==================== Link: https://lore.kernel.org/r/20210116002025.15706-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-15Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Set the minimum GCC version to 5.1 for arm64 due to earlier compiler bugs. - Make atomic helpers __always_inline to avoid a section mismatch when compiling with clang. - Fix the CMA and crashkernel reservations to use ZONE_DMA (remove the arm64_dma32_phys_limit variable, no longer needed with a dynamic ZONE_DMA sizing in 5.11). - Remove redundant IRQ flag tracing that was leaving lockdep inconsistent with the hardware state. - Revert perf events based hard lockup detector that was causing smp_processor_id() to be called in preemptible context. - Some trivial cleanups - spelling fix, renaming S_FRAME_SIZE to PT_REGS_SIZE, function prototypes added. * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: selftests: Fix spelling of 'Mismatch' arm64: syscall: include prototype for EL0 SVC functions compiler.h: Raise minimum version of GCC to 5.1 for arm64 arm64: make atomic helpers __always_inline arm64: rename S_FRAME_SIZE to PT_REGS_SIZE Revert "arm64: Enable perf events based hard lockup detector" arm64: entry: remove redundant IRQ flag tracing arm64: Remove arm64_dma32_phys_limit and its uses
2021-01-15perf inject: Correct event attribute sizesAl Grant
When 'perf inject' reads a perf.data file from an older version of perf, it writes event attributes into the output with the original size field, but lays them out as if they had the size currently used. Readers see a corrupt file. Update the size field to match the layout. Signed-off-by: Al Grant <al.grant@foss.arm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20201124195818.30603-1-al.grant@arm.com Signed-off-by: Denis Nikitin <denik@chromium.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf intel-pt: Fix 'CPU too large' errorAdrian Hunter
In some cases, the number of cpus (nr_cpus_online) is confused with the maximum cpu number (nr_cpus_avail), which results in the error in the example below: Example on system with 8 cpus: Before: # echo 0 > /sys/devices/system/cpu/cpu2/online # ./perf record --kcore -e intel_pt// taskset --cpu-list 7 uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.147 MB perf.data ] # ./perf script --itrace=e Requested CPU 7 too large. Consider raising MAX_NR_CPUS 0x25908 [0x8]: failed to process type: 68 [Invalid argument] After: # ./perf script --itrace=e # Fixes: 8c7274691f0d ("perf machine: Replace MAX_NR_CPUS with perf_env::nr_cpus_online") Fixes: 7df4e36a4785 ("perf session: Replace MAX_NR_CPUS with perf_env::nr_cpus_online") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Kan Liang <kan.liang@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lore.kernel.org/lkml/20210107174159.24897-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf stat: Take cgroups into account for shadow statsNamhyung Kim
As of now it doesn't consider cgroups when collecting shadow stats and metrics so counter values from different cgroups will be saved in a same slot. This resulted in incorrect numbers when those cgroups have different workloads. For example, let's look at the scenario below: cgroups A and C runs same workload which burns a cpu while cgroup B runs a light workload. $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1 Performance counter stats for 'system wide': 3,958,116,522 cycles A 6,722,650,929 instructions A # 2.53 insn per cycle 1,132,741 cycles B 571,743 instructions B # 0.00 insn per cycle 4,007,799,935 cycles C 6,793,181,523 instructions C # 2.56 insn per cycle 1.001050869 seconds time elapsed When I run 'perf stat' with single workload, it usually shows IPC around 1.7. We can verify it (6,722,650,929.0 / 3,958,116,522 = 1.698) for cgroup A. But in this case, since cgroups are ignored, cycles are averaged so it used the lower value for IPC calculation and resulted in around 2.5. avg cycle: (3958116522 + 1132741 + 4007799935) / 3 = 2655683066 IPC (A) : 6722650929 / 2655683066 = 2.531 IPC (B) : 571743 / 2655683066 = 0.0002 IPC (C) : 6793181523 / 2655683066 = 2.557 We can simply compare cgroup pointers in the evsel and it'll be NULL when cgroups are not specified. With this patch, I can see correct numbers like below: $ perf stat -a -e cycles,instructions --for-each-cgroup A,B,C sleep 1 Performance counter stats for 'system wide': 4,171,051,687 cycles A 7,219,793,922 instructions A # 1.73 insn per cycle 1,051,189 cycles B 583,102 instructions B # 0.55 insn per cycle 4,171,124,710 cycles C 7,192,944,580 instructions C # 1.72 insn per cycle 1.007909814 seconds time elapsed Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210115071139.257042-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf stat: Introduce struct runtime_stat_dataNamhyung Kim
To pass more info to the saved_value in the runtime_stat, add a new struct runtime_stat_data. Currently it only has 'ctx' field but later patch will add more. Note that we intentionally pass 0 as ctx to clock-related events for compatibility. It was already there in a few places. So move the code into the saved_value_lookup() explicitly and add a comment. Suggested-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210115071139.257042-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15libperf tests: Fail when failing to get a tracepoint idIan Rogers
Permissions are necessary to get a tracepoint id. Fail the test when the read fails. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210114180250.3853825-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15libperf tests: If a test fails return non-zeroIan Rogers
If a test fails return -1 rather than 0. This is consistent with the return value in test-cpumap.c Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210114180250.3853825-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15libperf tests: Avoid uninitialized variable warningIan Rogers
The variable 'bf' is read (for a write call) without being initialized triggering a memory sanitizer warning. Use 'bf' in the read and switch the write to reading from a string. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210114212304.4018119-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf test: Fix shadow stat test for non-bash shellsNamhyung Kim
It was using some bash-specific features and failed to parse when running with a different shell like below: root@kbl-ppc:~/kbl-ws/perf-dev/lck-9077/acme.tmp/tools/perf# ./perf test 83 -vv 83: perf stat metrics (shadow stat) test : --- start --- test child forked, pid 3922 ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found (standard_in) 2: syntax error ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 19: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 24: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 30: ./tests/shell/stat+shadow_stat.sh: [[: not found (standard_in) 2: syntax error ./tests/shell/stat+shadow_stat.sh: 36: ./tests/shell/stat+shadow_stat.sh: [[: not found ./tests/shell/stat+shadow_stat.sh: 45: ./tests/shell/stat+shadow_stat.sh: declare: not found test child finished with -1 ---- end ---- perf stat metrics (shadow stat) test: FAILED! Reported-by: Jin Yao <yao.jin@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Laight <david.laight@aculab.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lore.kernel.org/lkml/20210114050609.1258820-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15tools headers: Syncronize linux/build_bug.h with the kernel sourcesArnaldo Carvalho de Melo
To pick up the changes in: 3a176b94609a18f5 ("Revert "kbuild: avoid static_assert for genksyms"") And silence this perf build warning: Warning: Kernel ABI header at 'tools/include/linux/build_bug.h' differs from latest version at 'include/linux/build_bug.h' diff -u tools/include/linux/build_bug.h include/linux/build_bug.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15tools headers UAPI: Sync kvm.h headers with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in: 647daca25d24fb6e ("KVM: SVM: Add support for booting APs in an SEV-ES guest") That don't cause any tooling change, just silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h' diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15perf bpf examples: Fix bpf.h header include directive in 5sec.c exampleArnaldo Carvalho de Melo
It was looking at bpf/bpf.h, which caused this problem: # perf trace -e tools/perf/examples/bpf/5sec.c /home/acme/git/perf/tools/perf/examples/bpf/5sec.c:42:10: fatal error: 'bpf/bpf.h' file not found #include <bpf/bpf.h> ^~~~~~~~~~~ 1 error generated. ERROR: unable to compile tools/perf/examples/bpf/5sec.c Hint: Check error message shown above. Hint: You can also pre-compile it into .o using: clang -target bpf -O2 -c tools/perf/examples/bpf/5sec.c with proper -I and -D options. event syntax error: 'tools/perf/examples/bpf/5sec.c' \___ Failed to load tools/perf/examples/bpf/5sec.c from source: Error when compiling BPF scriptlet # Change that to plain bpf.h, to make it work again: # perf trace -e tools/perf/examples/bpf/5sec.c sleep 5s 0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000) # perf trace -e tools/perf/examples/bpf/5sec.c/max-stack=16/ sleep 5s 0.000 perf_bpf_probe:hrtimer_nanosleep(__probe_ip: -1776891872, rqtp: 5000000000) hrtimer_nanosleep ([kernel.kallsyms]) common_nsleep ([kernel.kallsyms]) __x64_sys_clock_nanosleep ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms]) entry_SYSCALL_64_after_hwframe ([kernel.kallsyms]) __clock_nanosleep_2 (/usr/lib64/libc-2.32.so) # perf trace -e tools/perf/examples/bpf/5sec.c sleep 4s # Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-15arm64: selftests: Fix spelling of 'Mismatch'Mark Brown
The SVE and FPSIMD stress tests have a spelling mistake in the output, fix it. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20210108183144.673-1-broonie@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-01-14Merge tag 'trace-v5.11-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull bootconfig fix from Steven Rostedt: "Update bootconf scripts for tracing_on option The tracing_on option is supported by bootconfig entries, but the scripts to convert from ftrace to a bootconfig and back were not updated" * tag 'trace-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tools/bootconfig: Add tracing_on support to helper scripts
2021-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-14bpf: Add tests for new BPF atomic operationsBrendan Jackman
The prog_test that's added depends on Clang/LLVM features added by Yonghong in commit 286daafd6512 (was https://reviews.llvm.org/D72184). Note the use of a define called ENABLE_ATOMICS_TESTS: this is used to: - Avoid breaking the build for people on old versions of Clang - Avoid needing separate lists of test objects for no_alu32, where atomics are not supported even if Clang has the feature. The atomics_test.o BPF object is built unconditionally both for test_progs and test_progs-no_alu32. For test_progs, if Clang supports atomics, ENABLE_ATOMICS_TESTS is defined, so it includes the proper test code. Otherwise, progs and global vars are defined anyway, as stubs; this means that the skeleton user code still builds. The atomics_test.o userspace object is built once and used for both test_progs and test_progs-no_alu32. A variable called skip_tests is defined in the BPF object's data section, which tells the userspace object whether to skip the atomics test. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-11-jackmanb@google.com
2021-01-14bpf: Add bitwise atomic instructionsBrendan Jackman
This adds instructions for atomic[64]_[fetch_]and atomic[64]_[fetch_]or atomic[64]_[fetch_]xor All these operations are isomorphic enough to implement with the same verifier, interpreter, and x86 JIT code, hence being a single commit. The main interesting thing here is that x86 doesn't directly support the fetch_ version these operations, so we need to generate a CMPXCHG loop in the JIT. This requires the use of two temporary registers, IIUC it's safe to use BPF_REG_AX and x86's AUX_REG for this purpose. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-10-jackmanb@google.com
2021-01-14bpf: Add instructions for atomic_[cmp]xchgBrendan Jackman
This adds two atomic opcodes, both of which include the BPF_FETCH flag. XCHG without the BPF_FETCH flag would naturally encode atomic_set. This is not supported because it would be of limited value to userspace (it doesn't imply any barriers). CMPXCHG without BPF_FETCH woulud be an atomic compare-and-write. We don't have such an operation in the kernel so it isn't provided to BPF either. There are two significant design decisions made for the CMPXCHG instruction: - To solve the issue that this operation fundamentally has 3 operands, but we only have two register fields. Therefore the operand we compare against (the kernel's API calls it 'old') is hard-coded to be R0. x86 has similar design (and A64 doesn't have this problem). A potential alternative might be to encode the other operand's register number in the immediate field. - The kernel's atomic_cmpxchg returns the old value, while the C11 userspace APIs return a boolean indicating the comparison result. Which should BPF do? A64 returns the old value. x86 returns the old value in the hard-coded register (and also sets a flag). That means return-old-value is easier to JIT, so that's what we use. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-8-jackmanb@google.com
2021-01-14bpf: Add BPF_FETCH field / create atomic_fetch_add instructionBrendan Jackman
The BPF_FETCH field can be set in bpf_insn.imm, for BPF_ATOMIC instructions, in order to have the previous value of the atomically-modified memory location loaded into the src register after an atomic op is carried out. Suggested-by: Yonghong Song <yhs@fb.com> Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210114181751.768687-7-jackmanb@google.com