summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2024-10-24tools/thermal/thermal-engine: Take into account the thresholds APIDaniel Lezcano
Enhance the thermal-engine skeleton with the thresholds added in the kernel and use the API exported by the thermal library. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/20241022155147.463475-6-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-24tools/lib/thermal: Add the threshold netlink ABIDaniel Lezcano
The thermal framework supports the thresholds and allows the userspace to create, delete, flush, get the list of the thresholds as well as getting the list of the thresholds set for a specific thermal zone. Add the netlink abstraction in the thermal library to take full advantage of thresholds for the userspace program. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/20241022155147.463475-5-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-24tools/lib/thermal: Make more generic the command encoding functionDaniel Lezcano
The thermal netlink has been extended with more commands which require an encoding with more information. The generic encoding function puts the thermal zone id with the command name. It is the unique parameters. The next changes will provide more parameters to the command. Set the scene for those new parameters by making the encoding function more generic. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Link: https://patch.msgid.link/20241022155147.463475-4-daniel.lezcano@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-10-24pidfd: add ioctl to retrieve pid infoLuca Boccassi
A common pattern when using pid fds is having to get information about the process, which currently requires /proc being mounted, resolving the fd to a pid, and then do manual string parsing of /proc/N/status and friends. This needs to be reimplemented over and over in all userspace projects (e.g.: I have reimplemented resolving in systemd, dbus, dbus-daemon, polkit so far), and requires additional care in checking that the fd is still valid after having parsed the data, to avoid races. Having a programmatic API that can be used directly removes all these requirements, including having /proc mounted. As discussed at LPC24, add an ioctl with an extensible struct so that more parameters can be added later if needed. Start with returning pid/tgid/ppid and creds unconditionally, and cgroupid optionally. Signed-off-by: Luca Boccassi <luca.boccassi@gmail.com> Link: https://lore.kernel.org/r/20241010155401.2268522-1-luca.boccassi@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-10-24tools/net/ynl: improve async notification handlingDonald Hunter
The notification handling in ynl is currently very simple, using sleep() to wait a period of time and then handling all the buffered messages in a single batch. This patch changes the notification handling so that messages are processed as they are received. This makes it possible to use ynl as a library that supplies notifications in a timely manner. - Change check_ntf() to be a generator that yields 1 notification at a time and blocks until a notification is available. - Use the --sleep parameter to set an alarm and exit when it fires. This means that the CLI has the same interface, but notifications get printed as they are received: ./tools/net/ynl/cli.py --spec <SPEC> --subscribe <TOPIC> [ --sleep <SECS> ] Here is an example python snippet that shows how to use ynl as a library for receiving notifications: ynl = YnlFamily(f"{dir}/rt_route.yaml") ynl.ntf_subscribe('rtnlgrp-ipv4-route') for event in ynl.check_ntf(): handle(event) Signed-off-by: Donald Hunter <donald.hunter@gmail.com> Tested-by: Kory Maincent <kory.maincent@bootlin.com> Link: https://patch.msgid.link/20241018093228.25477-1-donald.hunter@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-23selftests/bpf: validate generic bpf_object and subskel APIs work togetherAndrii Nakryiko
Add a new subtest validating that bpf_object loaded and initialized through generic APIs is still interoperable with BPF subskeleton, including initialization and reading of global variables. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241023043908.3834423-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-23libbpf: move global data mmap()'ing into bpf_object__load()Andrii Nakryiko
Since BPF skeleton inception libbpf has been doing mmap()'ing of global data ARRAY maps in bpf_object__load_skeleton() API, which is used by code generated .skel.h files (i.e., by BPF skeletons only). This is wrong because if BPF object is loaded through generic bpf_object__load() API, global data maps won't be re-mmap()'ed after load step, and memory pointers returned from bpf_map__initial_value() would be wrong and won't reflect the actual memory shared between BPF program and user space. bpf_map__initial_value() return result is rarely used after load, so this went unnoticed for a really long time, until bpftrace project attempted to load BPF object through generic bpf_object__load() API and then used BPF subskeleton instantiated from such bpf_object. It turned out that .data/.rodata/.bss data updates through such subskeleton was "blackholed", all because libbpf wouldn't re-mmap() those maps during bpf_object__load() phase. Long story short, this step should be done by libbpf regardless of BPF skeleton usage, right after BPF map is created in the kernel. This patch moves this functionality into bpf_object__populate_internal_map() to achieve this. And bpf_object__load_skeleton() is now simple and almost trivial, only propagating these mmap()'ed pointers into user-supplied skeleton structs. We also do trivial adjustments to error reporting inside bpf_object__populate_internal_map() for consistency with the rest of libbpf's map-handling code. Reported-by: Alastair Robertson <ajor@meta.com> Reported-by: Jonathan Wiepert <jwiepert@meta.com> Fixes: d66562fba1ce ("libbpf: Add BPF object skeleton support") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241023043908.3834423-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-23selftests/bpf: fix test_spin_lock_fail.c's global vars usageAndrii Nakryiko
Global variables of special types (like `struct bpf_spin_lock`) make underlying ARRAY maps non-mmapable. To make this work with libbpf's mmaping logic, application is expected to declare such special variables as static, so libbpf doesn't even attempt to mmap() such ARRAYs. test_spin_lock_fail.c didn't follow this rule, but given it relied on this test to trigger failures, this went unnoticed, as we never got to the step of mmap()'ing these ARRAY maps. It is fragile and relies on specific sequence of libbpf steps, which are an internal implementation details. Fix the test by marking lockA and lockB as static. Fixes: c48748aea4f8 ("selftests/bpf: Add failure test cases for spin lock pairing") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241023043908.3834423-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-23perf disasm: Fix not cleaning up disasm_line in symbol__disassemble_raw()Li Huafei
In symbol__disassemble_raw(), the created disasm_line should be discarded before returning an error. When creating disasm_line fails, break the loop and then release the created lines. Fixes: 0b971e6bf1c3 ("perf annotate: Add support to capture and parse raw instruction in powerpc using dso__data_read_offset utility") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: sesse@google.com Cc: kjain@linux.ibm.com Link: https://lore.kernel.org/r/20241019154157.282038-3-lihuafei1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-23perf disasm: Use disasm_line__free() to properly free disasm_lineLi Huafei
symbol__disassemble_capstone_powerpc() goto the 'err' label when it failed in the loop that created disasm_line, and then used free() directly to free disasm_line. Since the structure disasm_line contains members that allocate memory dynamically, this can result in a memory leak. In fact, we can simply break the loop when it fails in the middle of the loop, and disasm_line__free() will then be called to properly free the created line. Other error paths do not need to consider freeing disasm_line. Fixes: c5d60de1813a ("perf annotate: Add support to use libcapstone in powerpc") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: sesse@google.com Cc: kjain@linux.ibm.com Link: https://lore.kernel.org/r/20241019154157.282038-2-lihuafei1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-23perf disasm: Use disasm_line__free() to properly free disasm_lineLi Huafei
The structure disasm_line contains members that require dynamically allocated memory and need to be freed correctly using disasm_line__free(). This patch fixes the incorrect release in symbol__disassemble_capstone(). Fixes: 6d17edc113de ("perf annotate: Use libcapstone to disassemble") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: sesse@google.com Cc: kjain@linux.ibm.com Link: https://lore.kernel.org/r/20241019154157.282038-1-lihuafei1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-23perf python: Fix up the build on architectures without HAVE_KVM_STAT_SUPPORTArnaldo Carvalho de Melo
Noticed while building on a raspbian arm 32-bit system. There was also this other case, fixed by adding a missing util/stat.h with the prototypes: /tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1396:6: error: no previous prototype for ‘perf_stat__set_no_csv_summary’ [-Werror=missing-prototypes] 1396 | void perf_stat__set_no_csv_summary(int set __maybe_unused) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /tmp/tmp.MbiSHoF3dj/perf-6.12.0-rc3/tools/perf/util/python.c:1400:6: error: no previous prototype for ‘perf_stat__set_big_num’ [-Werror=missing-prototypes] 1400 | void perf_stat__set_big_num(int set __maybe_unused) | ^~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors In other architectures this must be building due to some lucky indirect inclusion of that header. Fixes: 9dabf4003423c8d3 ("perf python: Switch module to linking libraries from building source") Reviewed-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/lkml/ZxllAtpmEw5fg9oy@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23libsubcmd: Silence compiler warningEder Zulian
Initialize the pointer 'o' in options__order to NULL to prevent a compiler warning/error which is observed when compiling with the '-Og' option, but is not emitted by the compiler with the current default compilation options. For example, when compiling libsubcmd with $ make "EXTRA_CFLAGS=-Og" -C tools/lib/subcmd/ clean all Clang version 17.0.6 and GCC 13.3.1 fail to compile parse-options.c due to following error: parse-options.c: In function ‘options__order’: parse-options.c:832:9: error: ‘o’ may be used uninitialized [-Werror=maybe-uninitialized] 832 | memcpy(&ordered[nr_opts], o, sizeof(*o)); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ parse-options.c:810:30: note: ‘o’ was declared here 810 | const struct option *o, *p = opts; | ^ cc1: all warnings being treated as errors Signed-off-by: Eder Zulian <ezulian@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20241022172329.3871958-4-ezulian@redhat.com
2024-10-23libbpf: Prevent compiler warnings/errorsEder Zulian
Initialize 'new_off' and 'pad_bits' to 0 and 'pad_type' to NULL in btf_dump_emit_bit_padding to prevent compiler warnings/errors which are observed when compiling with 'EXTRA_CFLAGS=-g -Og' options, but do not happen when compiling with current default options. For example, when compiling libbpf with $ make "EXTRA_CFLAGS=-g -Og" -C tools/lib/bpf/ clean all Clang version 17.0.6 and GCC 13.3.1 fail to compile btf_dump.c due to following errors: btf_dump.c: In function ‘btf_dump_emit_bit_padding’: btf_dump.c:903:42: error: ‘new_off’ may be used uninitialized [-Werror=maybe-uninitialized] 903 | if (new_off > cur_off && new_off <= next_off) { | ~~~~~~~~^~~~~~~~~~~ btf_dump.c:870:13: note: ‘new_off’ was declared here 870 | int new_off, pad_bits, bits, i; | ^~~~~~~ btf_dump.c:917:25: error: ‘pad_type’ may be used uninitialized [-Werror=maybe-uninitialized] 917 | btf_dump_printf(d, "\n%s%s: %d;", pfx(lvl), pad_type, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 918 | in_bitfield ? new_off - cur_off : 0); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ btf_dump.c:871:21: note: ‘pad_type’ was declared here 871 | const char *pad_type; | ^~~~~~~~ btf_dump.c:930:20: error: ‘pad_bits’ may be used uninitialized [-Werror=maybe-uninitialized] 930 | if (bits == pad_bits) { | ^ btf_dump.c:870:22: note: ‘pad_bits’ was declared here 870 | int new_off, pad_bits, bits, i; | ^~~~~~~~ cc1: all warnings being treated as errors Signed-off-by: Eder Zulian <ezulian@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20241022172329.3871958-3-ezulian@redhat.com
2024-10-23resolve_btfids: Fix compiler warningsEder Zulian
Initialize 'set' and 'set8' pointers to NULL in sets_patch to prevent possible compiler warnings which are issued for various optimization levels, but do not happen when compiling with current default compilation options. For example, when compiling resolve_btfids with $ make "HOSTCFLAGS=-O2 -Wall" -C tools/bpf/resolve_btfids/ clean all Clang version 17.0.6 and GCC 13.3.1 issue following -Wmaybe-uninitialized warnings for variables 'set8' and 'set': In function ‘sets_patch’, inlined from ‘symbols_patch’ at main.c:748:6, inlined from ‘main’ at main.c:823:6: main.c:163:9: warning: ‘set8’ may be used uninitialized [-Wmaybe-uninitialized] 163 | eprintf(1, verbose, pr_fmt(fmt), ##__VA_ARGS__) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ main.c:729:17: note: in expansion of macro ‘pr_debug’ 729 | pr_debug("sorting addr %5lu: cnt %6d [%s]\n", | ^~~~~~~~ main.c: In function ‘main’: main.c:682:37: note: ‘set8’ was declared here 682 | struct btf_id_set8 *set8; | ^~~~ In function ‘sets_patch’, inlined from ‘symbols_patch’ at main.c:748:6, inlined from ‘main’ at main.c:823:6: main.c:163:9: warning: ‘set’ may be used uninitialized [-Wmaybe-uninitialized] 163 | eprintf(1, verbose, pr_fmt(fmt), ##__VA_ARGS__) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ main.c:729:17: note: in expansion of macro ‘pr_debug’ 729 | pr_debug("sorting addr %5lu: cnt %6d [%s]\n", | ^~~~~~~~ main.c: In function ‘main’: main.c:683:36: note: ‘set’ was declared here 683 | struct btf_id_set *set; | ^~~ Signed-off-by: Eder Zulian <ezulian@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20241022172329.3871958-2-ezulian@redhat.com
2024-10-23perf test: Handle perftool-testsuite_probe failure due to broken DWARFVeronika Molnarova
Test case test_adding_blacklisted ends in failure if the blacklisted probe is of an assembler function with no DWARF available. At the same time, probing the blacklisted function with ASM DWARF doesn't test the blacklist itself as the failure is a result of the broken DWARF. When the broken DWARF output is encountered, check if the probed function was compiled by the assembler. If so, the broken DWARF message is expected and does not report a perf issue, else report a failure. If the ASM DWARF affected the probe, try the next probe on the blacklist. If the first 5 probes are defective due to broken DWARF, skip the test case. Fixes: def5480d63c1e847 ("perf testsuite probe: Add test for blacklisted kprobes handling") Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Veronika Molnarova <vmolnaro@redhat.com> Link: https://lore.kernel.org/r/20241017161555.236769-1-vmolnaro@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23selftest: rtc: Add to check rtc alarm status for alarm related testJoseph Jang
In alarm_wkalm_set and alarm_wkalm_set_minute test, they use different ioctl (RTC_ALM_SET/RTC_WKALM_SET) for alarm feature detection. They will skip testing if RTC_ALM_SET/RTC_WKALM_SET ioctl returns an EINVAL error code. This design may miss detecting real problems when the efi.set_wakeup_time() return errors and then RTC_ALM_SET/RTC_WKALM_SET ioctl returns an EINVAL error code with RTC_FEATURE_ALARM enabled. In order to make rtctest more explicit and robust, we propose to use RTC_PARAM_GET ioctl interface to check rtc alarm feature state before running alarm related tests. If the kernel does not support RTC_PARAM_GET ioctl interface, we will fallback to check the error number of (RTC_ALM_SET/RTC_WKALM_SET) ioctl call for alarm feature detection. Requires commit 101ca8d05913b ("rtc: efi: Enable SET/GET WAKEUP services as optional") Reviewed-by: Koba Ko <kobak@nvidia.com> Reviewed-by: Matthew R. Ochs <mochs@nvidia.com> Signed-off-by: Joseph Jang <jjang@nvidia.com> Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-10-23selftests/sched_ext: add order-only dependency of runner.o on BPFOBJIhor Solodrai
The runner.o may start building before libbpf headers are installed, and as a result build fails. This happened a couple of times on libbpf/ci test jobs: * https://github.com/libbpf/ci/actions/runs/11447667257/job/31849533100 * https://github.com/theihor/libbpf-ci/actions/runs/11445162764/job/31841649552 Headers are installed in a recipe for $(BPFOBJ) target, and adding an order-only dependency should ensure this doesn't happen. Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-10-23uprobe: Add data pointer to consumer handlersJiri Olsa
Adding data pointer to both entry and exit consumer handlers and all its users. The functionality itself is coming in following change. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20241018202252.693462-2-jolsa@kernel.org
2024-10-23selftests/bpf: Increase verifier log limit in veristatMykyta Yatsenko
The current default buffer size of 16MB allocated by veristat is no longer sufficient to hold the verifier logs of some production BPF programs. To address this issue, we need to increase the verifier log limit. Commit 7a9f5c65abcc ("bpf: increase verifier log limit") has already increased the supported buffer size by the kernel, but veristat users need to explicitly pass a log size argument to use the bigger log. This patch adds a function to detect the maximum verifier log size supported by the kernel and uses that by default in veristat. This ensures that veristat can handle larger verifier logs without requiring users to manually specify the log size. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20241023155314.126255-1-mykyta.yatsenko5@gmail.com
2024-10-23tools headers UAPI: Sync kvm headers with the kernel sourcesArnaldo Carvalho de Melo
To pick the changes in: aa8d1f48d353b046 ("KVM: x86/mmu: Introduce a quirk to control memslot zap behavior") That don't change functionality in tools/perf, as no new ioctl is added for the 'perf trace' scripts to harvest. This addresses these perf build warnings: Warning: Kernel ABI header differences: diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h Please see tools/include/uapi/README for further details. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Yan Zhao <yan.y.zhao@intel.com> Link: https://lore.kernel.org/lkml/ZxgN0O02YrAJ2qIC@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf trace: Fix non-listed archs in the syscalltbl routinesJiri Slaby
This fixes a build breakage on 32-bit arm, where the syscalltbl__id_at_idx() function was missing. Committer notes: Generating a proper syscall table from a copy of arch/arm/tools/syscall.tbl ends up being too big a patch for this rc stage, I started doing it but while testing noticed some other problems with using BPF to collect pointer args on arm7 (32-bit) will maybe continue trying to make it work on the next cycle... Fixes: 7a2fb5619cc1fb53 ("perf trace: Fix iteration of syscall ids in syscalltbl->entries") Suggested-by: Howard Chu <howardchu95@gmail.com> Signed-off-by: <jslaby@suse.cz> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/lkml/3a592835-a14f-40be-8961-c0cee7720a94@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf build: Change the clang check back to 12.0.1Howard Chu
This serves as a revert for this patch: https://lore.kernel.org/linux-perf-users/ZuGL9ROeTV2uXoSp@x1/ Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: James Clark <james.clark@linaro.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241011021403.4089793-2-howardchu95@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf trace augmented_raw_syscalls: Add more checks to pass the verifierHoward Chu
Add some more checks to pass the verifier in more kernels. Signed-off-by: Howard Chu <howardchu95@gmail.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20241011021403.4089793-3-howardchu95@gmail.com [ Reduced the patch removing things that can be done later ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23perf trace augmented_raw_syscalls: Add extra array index bounds checking to ↵Arnaldo Carvalho de Melo
satisfy some BPF verifiers In a RHEL8 kernel (4.18.0-513.11.1.el8_9.x86_64), that, as enterprise kernels go, have backports from modern kernels, the verifier complains about lack of bounds check for the index into the array of syscall arguments, on a BPF bytecode generated by clang 17, with: ; } else if (size < 0 && size >= -6) { /* buffer */ 116: (b7) r1 = -6 117: (2d) if r1 > r6 goto pc-30 R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv-6 R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6=inv(id=0,umin_value=18446744073709551610,var_off=(0xffffffff00000000; 0xffffffff)) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value ; index = -(size + 1); 118: (a7) r6 ^= -1 119: (67) r6 <<= 32 120: (77) r6 >>= 32 ; aug_size = args->args[index]; 121: (67) r6 <<= 3 122: (79) r1 = *(u64 *)(r10 -24) 123: (0f) r1 += r6 last_idx 123 first_idx 116 regs=40 stack=0 before 122: (79) r1 = *(u64 *)(r10 -24) regs=40 stack=0 before 121: (67) r6 <<= 3 regs=40 stack=0 before 120: (77) r6 >>= 32 regs=40 stack=0 before 119: (67) r6 <<= 32 regs=40 stack=0 before 118: (a7) r6 ^= -1 regs=40 stack=0 before 117: (2d) if r1 > r6 goto pc-30 regs=42 stack=0 before 116: (b7) r1 = -6 R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=inv1 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_w=inv(id=0) R5_w=inv40 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=0) R7_w=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8_w=invP6 R9_w=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_w=map_value fp-24_r=map_value fp-32_w=inv40 fp-40=ctx fp-48=map_value fp-56_w=inv1 fp-64_w=map_value fp-72=map_value fp-80=map_value parent didn't have regs=40 stack=0 marks last_idx 110 first_idx 98 regs=40 stack=0 before 110: (6d) if r1 s> r6 goto pc+5 regs=42 stack=0 before 109: (b7) r1 = 1 regs=40 stack=0 before 108: (65) if r6 s> 0x1000 goto pc+7 regs=40 stack=0 before 98: (55) if r6 != 0x1 goto pc+9 R0_w=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=invP12 R2_w=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3_rw=inv(id=0) R5_w=inv24 R6_rw=invP(id=0,smin_value=-2147483648,smax_value=2147483647) R7_w=map_value(id=0,off=40,ks=4,vs=8272,imm=0) R8_rw=invP4 R9_w=map_value(id=0,off=12,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16_rw=map_value fp-24_r=map_value fp-32_rw=invP24 fp-40_r=ctx fp-48_r=map_value fp-56_w=invP1 fp-64_rw=map_value fp-72_r=map_value fp-80_r=map_value parent already had regs=40 stack=0 marks 124: (79) r6 = *(u64 *)(r1 +16) R0=map_value(id=0,off=0,ks=4,vs=24688,imm=0) R1_w=map_value(id=0,off=0,ks=4,vs=8272,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R2=map_value(id=0,off=16,ks=4,vs=8272,imm=0) R3=inv(id=0) R5=inv40 R6_w=invP(id=0,umax_value=34359738360,var_off=(0x0; 0x7fffffff8),s32_max_value=2147483640,u32_max_value=-8) R7=map_value(id=0,off=56,ks=4,vs=8272,imm=0) R8=invP6 R9=map_value(id=0,off=20,ks=4,vs=24,imm=0) R10=fp0 fp-8=mmmmmmmm fp-16=map_value fp-24=map_value fp-32=inv40 fp-40=ctx fp-48=map_value fp-56=inv1 fp-64=map_value fp-72=map_value fp-80=map_value R1 unbounded memory access, make sure to bounds check any such access processed 466 insns (limit 1000000) max_states_per_insn 2 total_states 20 peak_states 20 mark_read 3 If we add this line, as used in other BPF programs, to cap that index: index &= 7; The generated BPF program is considered safe by that version of the BPF verifier, allowing perf to collect the syscall args in one more kernel using the BPF based pointer contents collector. With the above one-liner it works with that kernel: [root@dell-per740-01 ~]# uname -a Linux dell-per740-01.khw.eng.rdu2.dc.redhat.com 4.18.0-513.11.1.el8_9.x86_64 #1 SMP Thu Dec 7 03:06:13 EST 2023 x86_64 x86_64 x86_64 GNU/Linux [root@dell-per740-01 ~]# ~acme/bin/perf trace -e *sleep* sleep 1.234567890 0.000 (1234.704 ms): sleep/3863610 nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }) = 0 [root@dell-per740-01 ~]# As well as with the one in Fedora 40: root@number:~# uname -a Linux number 6.11.3-200.fc40.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Oct 10 22:31:19 UTC 2024 x86_64 GNU/Linux root@number:~# perf trace -e *sleep* sleep 1.234567890 0.000 (1234.722 ms): sleep/14873 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 234567890 }, rmtp: 0x7ffe87311a40) = 0 root@number:~# Song Liu reported that this one-liner was being optimized out by clang 18, so I suggested and he tested that adding a compiler barrier before it made clang v18 to keep it and the verifier in the kernel in Song's case (Meta's 5.12 based kernel) also was happy with the resulting bytecode. I'll investigate using virtme-ng[1] to have all the perf BPF based functionality thoroughly tested over multiple kernels and clang versions. [1] https://kernel-recipes.org/en/2024/virtme-ng/ Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alan Maguire <alan.maguire@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrea Righi <andrea.righi@linux.dev> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/lkml/Zw7JgJc0LOwSpuvx@x1 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2024-10-23kselftest/arm64: Log fp-stress child startup errors to stdoutMark Brown
Currently if we encounter an error between fork() and exec() of a child process we log the error to stderr. This means that the errors don't get annotated with the child information which makes diagnostics harder and means that if we miss the exit signal from the child we can deadlock waiting for output from the child. Improve robustness and output quality by logging to stdout instead. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241023-arm64-fp-stress-exec-fail-v1-1-ee3c62932c15@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-10-22selftests/bpf: Add test for passing in uninit mtu_lenDaniel Borkmann
Add a small test to pass an uninitialized mtu_len to the bpf_check_mtu() helper to probe whether the verifier rejects it under !CAP_PERFMON. # ./vmtest.sh -- ./test_progs -t verifier_mtu [...] ./test_progs -t verifier_mtu [ 1.414712] tsc: Refined TSC clocksource calibration: 3407.993 MHz [ 1.415327] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcd52370, max_idle_ns: 440795242006 ns [ 1.416463] clocksource: Switched to clocksource tsc [ 1.429842] bpf_testmod: loading out-of-tree module taints kernel. [ 1.430283] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #510/1 verifier_mtu/uninit/mtu: write rejected:OK #510 verifier_mtu:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-5-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22selftests/bpf: Add test for writes to .rodataDaniel Borkmann
Add a small test to write a (verification-time) fixed vs unknown but bounded-sized buffer into .rodata BPF map and assert that both get rejected. # ./vmtest.sh -- ./test_progs -t verifier_const [...] ./test_progs -t verifier_const [ 1.418717] tsc: Refined TSC clocksource calibration: 3407.994 MHz [ 1.419113] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcde90a1, max_idle_ns: 440795222066 ns [ 1.419972] clocksource: Switched to clocksource tsc [ 1.449596] bpf_testmod: loading out-of-tree module taints kernel. [ 1.449958] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel #475/1 verifier_const/rodata/strtol: write rejected:OK #475/2 verifier_const/bss/strtol: write accepted:OK #475/3 verifier_const/data/strtol: write accepted:OK #475/4 verifier_const/rodata/mtu: write rejected:OK #475/5 verifier_const/bss/mtu: write accepted:OK #475/6 verifier_const/data/mtu: write accepted:OK #475/7 verifier_const/rodata/mark: write with unknown reg rejected:OK #475/8 verifier_const/rodata/mark: write with unknown reg rejected:OK #475 verifier_const:OK #476/1 verifier_const_or/constant register |= constant should keep constant type:OK #476/2 verifier_const_or/constant register |= constant should not bypass stack boundary checks:OK #476/3 verifier_const_or/constant register |= constant register should keep constant type:OK #476/4 verifier_const_or/constant register |= constant register should not bypass stack boundary checks:OK #476 verifier_const_or:OK Summary: 2/12 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20241021152809.33343-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-10-22selftests/bpf: Retire test_sock.cJordan Rife
Completely remove test_sock.c and associated config. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20241022152913.574836-5-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-22selftests/bpf: Migrate BPF_CGROUP_INET_SOCK_CREATE test cases to prog_testsJordan Rife
Move the "load w/o expected_attach_type" test case to prog_tests/sock_create.c and drop the remaining test case, as it is made redundant with the existing coverage inside prog_tests/sock_create.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20241022152913.574836-4-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-22selftests/bpf: Migrate LOAD_REJECT test cases to prog_testsJordan Rife
Move LOAD_REJECT test cases from test_sock.c to an equivalent set of verifier tests in progs/verifier_sock.c. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20241022152913.574836-3-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-22selftests/bpf: Migrate *_POST_BIND test cases to prog_testsJordan Rife
Move all BPF_CGROUP_INET6_POST_BIND and BPF_CGROUP_INET4_POST_BIND test cases to a new prog_test, prog_tests/sock_post_bind.c, except for LOAD_REJECT test cases. Signed-off-by: Jordan Rife <jrife@google.com> Link: https://lore.kernel.org/r/20241022152913.574836-2-jrife@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-10-22perf test: Add precise_max subtest to the perf record shell testNamhyung Kim
It's a very simply test just to run with cycles:P and instructions:P events. Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-10-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf record: Just use "cycles:P" as the default eventNamhyung Kim
The fallback logic can add ":u" modifier if needed. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-9-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Check fallback error and orderNamhyung Kim
The perf_event_open might fail due to various reasons, so blindly reducing precise_ip level might not be the best way to deal with it. It seems the kernel return -EOPNOTSUPP when PMU doesn't support the given precise level. Let's try again with the correct error code. This caused a problem on AMD, as it stops on precise_ip of 2 for IBS but user events with exclude_kernel=1 cannot make progress. Let's add the evsel__handle_error_quirks() to this case specially. I plan to work on the kernel side to improve this situation but it'd still need some special handling for IBS. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-8-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Move x86__is_amd_cpu() to util/env.cNamhyung Kim
It can be called from non-x86 platform so let's move it to the general util directory. Also add a new helper perf_env__is_x86_amd_cpu() so that it can be called with an existing perf_env as well. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-7-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Detect missing kernel features properlyNamhyung Kim
The evsel__detect_missing_features() is to check if the attributes of the evsel is supported or not. But it checks the attribute based on the given evsel, it might miss something if the attr doesn't have the bit or give incorrect results if the event is special. Also it maintains the order of the feature that was added to the kernel which means it can assume older features should be supported once it detects the current feature is working. To minimized the confusion and to accurately check the kernel features, I think it's better to use a software event and go through all the features at once. Also make the function static since it's only used in evsel.c. Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-6-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Do not set exclude_guest for precise_ipNamhyung Kim
It seems perf sets the exclude_guest bit because of Intel PEBS implementation which uses a virtual address. IIUC now kernel disables PEBS when it goes to the guest mode regardless of this bit so we don't need to set it explicitly. At least for the other archs/vendors. I found the commit 1342798cc13e set the exclude_guest for precise_ip in the tool and the commit 20b279ddb38c added kernel side enforcement which was reverted by commit a706d965dcfd later. Actually it doesn't set the exclude_guest for the default event (cycles:P) already. $ grep -m1 vendor /proc/cpuinfo vendor_id : GenuineIntel $ perf record -e cycles:P true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ] $ perf evlist -v | tr ',' '\n' | grep -e exclude -e precise precise_ip: 3 But having lower 'p' modifier set the bit for some reason. $ perf record -e cycles:pp true [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB perf.data (9 samples) ] $ perf evlist -v | tr ',' '\n' | grep -e exclude -e precise precise_ip: 2 exclude_guest: 1 Actually AMD IBS suffers from this because it doesn't support excludes and having this bit effectively disables new features in the current implementation (due to the missing feature check). $ grep -m1 vendor /proc/cpuinfo vendor_id : AuthenticAMD $ perf record -W -e cycles:p -vv true 2>&1 | grep switching switching off PERF_FORMAT_LOST support switching off weight struct support switching off bpf_event switching off ksymbol switching off cloexec flag switching off mmap2 switching off exclude_guest, exclude_host By not setting exclude_guest, we can fix this inconsistency and the troubles. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-5-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Simplify evsel__add_modifier()Namhyung Kim
Since it doesn't set the exclude_guest, no need to special handle the bit and simply show only if one of host or guest bit is set. Now the default event name might not have :H prefix anymore so change the dlfilter test not to compare the ":" at the end. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-4-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Don't set attr.exclude_guest by defaultNamhyung Kim
The exclude_guest in the event attribute is to limit profiling in the host environment. But I'm not sure why we want to set it by default cause we don't care about it in most cases and I feel like it just makes new PMU implementation complicated. Of course it's useful for perf kvm command so I added the exclude_GH_default variable to preserve the old behavior for perf kvm and other commands like perf record and stat won't set the exclude bit. This is helpful for AMD IBS case since having exclude_guest bit will clear new feature bit due to the missing feature check logic. $ sysctl kernel.perf_event_paranoid kernel.perf_event_paranoid = 0 $ perf record -W -e ibs_op// -vv true 2>&1 | grep switching switching off PERF_FORMAT_LOST support switching off weight struct support switching off bpf_event switching off ksymbol switching off cloexec flag switching off mmap2 switching off exclude_guest, exclude_host Intestingly, I found it sets the exclude_bit if "u" modifier is used. I don't know why but it's neither intuitive nor consistent. Let's remove the bit there too. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-3-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22perf tools: Add fallback for exclude_guestNamhyung Kim
Commit 7b100989b4f6bce70 ("perf evlist: Remove __evlist__add_default") changed to parse "cycles:P" event instead of creating a new cycles event for perf record. But it also changed the way how modifiers are handled so it doesn't set the exclude_guest bit by default. It seems Apple M1 PMU requires exclude_guest set and returns EOPNOTSUPP if not. Let's add a fallback so that it can work with default events. Also update perf stat hybrid tests to handle possible u or H modifiers. Reviewed-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@linaro.org> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Cc: James Clark <james.clark@arm.com> Cc: Atish Patra <atishp@atishpatra.org> Cc: Mingwei Zhang <mizhang@google.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Palmer Dabbelt <palmer@rivosinc.com> Link: https://lore.kernel.org/r/20241016062359.264929-2-namhyung@kernel.org Fixes: 7b100989b4f6bce70 ("perf evlist: Remove __evlist__add_default") Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2024-10-22selftests: livepatch: test livepatching a kprobed functionMichael Vetter
The test proves that a function that is being kprobed and uses a post_handler cannot be livepatched. Only one ftrace_ops with FTRACE_OPS_FL_IPMODIFY set may be registered to any given function at a time. Note that the conflicting kprobe could not be created using the tracefs interface, see Documentation/trace/kprobetrace.rst. This interface uses only the pre_handler(), see alloc_trace_kprobe(). But FTRACE_OPS_FL_IPMODIFY is used only when the kprobe is using a post_handler, see arm_kprobe_ftrace(). Signed-off-by: Michael Vetter <mvetter@suse.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com> Tested-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20241017200132.21946-4-mvetter@suse.com Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-10-22selftests: livepatch: save and restore kprobe stateMichael Vetter
Save the state of /sys/kernel/debug/kprobes/enabled during setup_config() and restore it during cleanup(). This is in preparation for a future commit that will add a test that should confirm that we cannot livepatch a kprobed function if that kprobe has a post handler. Signed-off-by: Michael Vetter <mvetter@suse.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com> Tested-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20241017200132.21946-3-mvetter@suse.com [pmladek@suse.com: Added few more substitutions in test-syscall.sh] Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-10-22selftests: livepatch: rename KLP_SYSFS_DIR to SYSFS_KLP_DIRMichael Vetter
This naming makes more sense according to the directory structure. Especially when we later add more paths. Addtionally replace `/sys/kernel/livepatch` with `$SYSFS_KLP_DIR` in the livepatch test files. Signed-off-by: Michael Vetter <mvetter@suse.com> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Joe Lawrence <joe.lawrence@redhat.com> Tested-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Marcos Paulo de Souza <mpdesouza@suse.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Tested-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20241017200132.21946-2-mvetter@suse.com [pmladek@suse.com: Fix corrupted substitution] Signed-off-by: Petr Mladek <pmladek@suse.com>
2024-10-22tools: ynl-gen: use big-endian netlink attribute typesAsbjørn Sloth Tønnesen
Change ynl-gen-c.py to use NLA_BE16 and NLA_BE32 types to represent big-endian u16 and u32 ynl types. Doing this enables those attributes to have range checks applied, as the validator will then convert to host endianness prior to validation. The autogenerated kernel/uapi code have been regenerated by running: ./tools/net/ynl/ynl-regen.sh -f This changes the policy types of the following attributes: FOU_ATTR_PORT (NLA_U16 -> NLA_BE16) FOU_ATTR_PEER_PORT (NLA_U16 -> NLA_BE16) These two are used with nla_get_be16/nla_put_be16(). MPTCP_PM_ADDR_ATTR_ADDR4 (NLA_U32 -> NLA_BE32) This one is used with nla_get_in_addr/nla_put_in_addr(), which uses nla_get_be32/nla_put_be32(). IOWs the generated changes are AFAICT aligned with their implementations. The generated userspace code remains identical, and have been verified by comparing the output generated by the following command: make -C tools/net/ynl/generated Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20241017094704.3222173-1-ast@fiberby.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22selftests: mlxsw: devlink_trap_police: Use defer for test cleanupPetr Machata
Use the defer framework to schedule cleanups as soon as the command is executed. Note that the start_traffic commands in __burst_test() are each sending a fixed number of packets (note the -c flag) and then ending. They therefore do not need a matching stop_traffic. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22selftests: mlxsw: qos_max_descriptors: Use defer for test cleanupPetr Machata
Use the defer framework to schedule cleanups as soon as the command is executed. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22selftests: mlxsw: qos_ets_strict: Use defer for test cleanupPetr Machata
Use the defer framework to schedule cleanups as soon as the command is executed. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22selftests: mlxsw: qos_mc_aware: Use defer for test cleanupPetr Machata
Use the defer framework to schedule cleanups as soon as the command is executed. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-22selftests: ETS: Use defer for test cleanupPetr Machata
Use the defer framework to schedule cleanups as soon as the command is executed. Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>