Age | Commit message (Collapse) | Author |
|
Otherwise the different instances of _start_c from each compilation unit
will lead to linker errors:
/usr/bin/ld: /tmp/ccSNvRqs.o: in function `_start_c':
nolibc-test-foo.c:(.text.nolibc_memset+0x9): multiple definition of `_start_c'; /tmp/ccG25101.o:nolibc-test.c:(.text+0x1ea3): first defined here
Fixes: 17336755150b ("tools/nolibc: add new crt.h with _start_c")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/lkml/20231012-nolibc-start_c-multiple-v1-1-fbfc73e0283f@weissschuh.net/
Link: https://lore.kernel.org/lkml/20231012-nolibc-linkage-test-v1-1-315e682768b4@weissschuh.net/
Acked-by: Willy Tarreau <w@1wt.eu>
|
|
The ABI mandates that the %esp register must be a multiple of 16 when
executing a 'call' instruction.
Commit 2ab446336b17 ("tools/nolibc: i386: shrink _start with _start_c")
simplified the _start function, but it didn't take care of the %esp
alignment, causing SIGSEGV on SSE and AVX programs that use aligned move
instruction (e.g., movdqa, movaps, and vmovdqa).
The 'and $-16, %esp' aligns the %esp at a multiple of 16. Then 'push
%eax' will subtract the %esp by 4; thus, it breaks the 16-byte
alignment. Make sure the %esp is correctly aligned after the push by
subtracting 12 before the push.
Extra:
Add 'add $12, %esp' before the 'and $-16, %esp' to avoid over-estimating
for particular cases as suggested by Willy.
A test program to validate the %esp alignment on _start can be found at:
https://lore.kernel.org/lkml/ZOoindMFj1UKqo+s@biznet-home.integral.gnuweeb.org
[ Thomas: trim Fixes tag commit id ]
Cc: Zhangjin Wu <falcon@tinylab.org>
Fixes: 2ab446336b17 ("tools/nolibc: i386: shrink _start with _start_c")
Reported-by: Nicholas Rosenberg <inori@vnlx.org>
Acked-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Reviewed-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
|
|
Some SoCs have firmware support to notify, if the system can't lower
power limit to a value requested from user space via RAPL constraints.
This test program waits for notification of power floor and prints. This
program can be used to test this feature and also allows other user space
programs to use as a reference.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Extend x86's state to forcefully load *all* host-supported xfeatures by
modifying xstate_bv in the saved state. Stuffing xstate_bv ensures that
the selftest is verifying KVM's full ABI regardless of whether or not the
guest code is successful in getting various xfeatures out of their INIT
state, e.g. see the disaster that is/was MPX.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230928001956.924301-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Expand x86's state test to load XSAVE state into a "dummy" vCPU prior to
KVM_SET_CPUID2, and again with an empty guest CPUID model. Except for
off-by-default features, i.e. AMX, KVM's ABI for KVM_SET_XSAVE is that
userspace is allowed to load xfeatures so long as they are supported by
the host. This is a regression test for a combination of KVM bugs where
the state saved by KVM_GET_XSAVE{2} could not be loaded via KVM_SET_XSAVE
if the saved xstate_bv would load guest-unsupported xfeatures.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230928001956.924301-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Modify support XSAVE state in the "state test's" guest code so that saving
and loading state via KVM_{G,S}ET_XSAVE actually does something useful,
i.e. so that xstate_bv in XSAVE state isn't empty.
Punt on BNDCSR for now, it's easier to just stuff that xfeature from the
host side.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20230928001956.924301-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
When outputting the "new" register list we want to print all of the
new registers, decoding as much as possible of each of them. Also, we
don't want to assert while listing registers with '--list'. We output
"/* UNKNOWN */" after each new register (which we were already doing
for some), which should be enough.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Haibo Xu <haibo1.xu@intel.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
We have a new conditional operations related ISA extensions so let us
add these extensions to get-reg-list test.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
We have a new smstateen registers as separate sub-type of CSR ONE_REG
interface so let us add these registers to get-reg-list test.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
We have a new senvcfg register in the general CSR ONE_REG interface
so let us add it to get-reg-list test.
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
Add a couple macros to use when filling arrays in order to ensure
the elements are placed in the right order, regardless of the
order we prefer to read them. And immediately apply the new
macro to resorting the ISA extension lists alphabetically.
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Haibo Xu <haibo1.xu@intel.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
|
|
These selftests are written in prog_tests style instead of adding
them to the existing test_sock_addr tests. Migrating the existing
sock addr tests to prog_tests style is left for future work. This
commit adds support for testing bind() sockaddr hooks, even though
there's no unix socket sockaddr hook for bind(). We leave this code
intact for when the INET and INET6 tests are migrated in the future
which do support intercepting bind().
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-10-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
The mount directory for the selftests cgroup tree might
not exist so let's make sure it does exist by creating
it ourselves if it doesn't exist.
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-9-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add the necessary plumbing to hook up the new cgroup unix sockaddr
hooks into bpftool.
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/r/20231011185113.140426-7-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add the necessary plumbing to hook up the new cgroup unix sockaddr
hooks into libbpf.
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-6-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
These hooks allows intercepting connect(), getsockname(),
getpeername(), sendmsg() and recvmsg() for unix sockets. The unix
socket hooks get write access to the address length because the
address length is not fixed when dealing with unix sockets and
needs to be modified when a unix socket address is modified by
the hook. Because abstract socket unix addresses start with a
NUL byte, we cannot recalculate the socket address in kernelspace
after running the hook by calculating the length of the unix socket
path using strlen().
These hooks can be used when users want to multiplex syscall to a
single unix socket to multiple different processes behind the scenes
by redirecting the connect() and other syscalls to process specific
sockets.
We do not implement support for intercepting bind() because when
using bind() with unix sockets with a pathname address, this creates
an inode in the filesystem which must be cleaned up. If we rewrite
the address, the user might try to clean up the wrong file, leaking
the socket in the filesystem where it is never cleaned up. Until we
figure out a solution for this (and a use case for intercepting bind()),
we opt to not allow rewriting the sockaddr in bind() calls.
We also implement recvmsg() support for connected streams so that
after a connect() that is modified by a sockaddr hook, any corresponding
recmvsg() on the connected socket can also be modified to make the
connected program think it is connected to the "intended" remote.
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-5-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Jiri added more careful handling of output of the code generator
to avoid wiping out existing files in
commit f65f305ae008 ("tools: ynl-gen: use temporary file for rendering")
Make use of the -o option in the Makefiles, it is already used
by ynl-regen.sh.
Link: https://lore.kernel.org/r/20231010202714.4045168-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
These were missed when these hooks were first added so add them now
instead to make sure every sockaddr hook has a matching section name
test.
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Link: https://lore.kernel.org/r/20231011185113.140426-2-daan.j.demeyer@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Now that the driver core allows for struct class to be in read-only
memory, we should make all 'class' structures declared at build time
placing them into read-only memory, instead of having to be dynamically
allocated at runtime.
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/2023100611-platinum-galleria-ceb3@gregkh
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
Turns out that we were relying on the globally installed headers, not
the ones we freshly compiled.
Add a manual include in CFLAGS to sort this out.
Tested-by: Nick Desaulniers <ndesaulniers@google.com> # Build
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20230825-wip-selftests-v3-3-639963c54109@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
"make headers" is a requirement before calling make on the selftests
dir, so we should not have to manually install those headers
Acked-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com> # Build
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20230825-wip-selftests-v3-2-639963c54109@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
For the hid-bpf tests to compile, we need to have the definition of
struct hid_bpf_ctx. This definition is an internal one from the kernel
and it is supposed to be defined in the generated vmlinux.h.
This vmlinux.h header is generated based on the currently running kernel
or if the kernel was already compiled in the tree. If you just compile
the selftests without compiling the kernel beforehand and you are running
on a 6.2 kernel, you'll end up with a vmlinux.h without the hid_bpf_ctx
definition.
Use the clever trick from tools/testing/selftests/bpf/progs/bpf_iter.h
to force the definition of that symbol in case we don't find it in the
BTF and also add __attribute__((preserve_access_index)) to further
support CO-RE functionality for these tests.
Signed-off-by: Justin Stitt <justinstitt@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com> # Build
Tested-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20230825-wip-selftests-v3-1-639963c54109@kernel.org
Signed-off-by: Benjamin Tissoires <bentiss@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fixes from Wei Liu:
- fixes for Hyper-V VTL code (Saurabh Sengar and Olaf Hering)
- fix hv_kvp_daemon to support keyfile based connection profile
(Shradha Gupta)
* tag 'hyperv-fixes-signed-20231009' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
hv/hv_kvp_daemon:Support for keyfile based connection profile
hyperv: reduce size of ms_hyperv_info
x86/hyperv: Add common print prefix "Hyper-V" in hv_init
x86/hyperv: Remove hv_vtl_early_init initcall
x86/hyperv: Restrict get_vtl to only VTL platforms
|
|
Add mock_domain_alloc_user() and a new test case for
IOMMU_HWPT_ALLOC_NEST_PARENT.
Link: https://lore.kernel.org/r/20230928071528.26258-6-yi.l.liu@intel.com
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Ifcfg config file support in NetworkManger is deprecated. This patch
provides support for the new keyfile config format for connection
profiles in NetworkManager. The patch modifies the hv_kvp_daemon code
to generate the new network configuration in keyfile
format(.ini-style format) along with a ifcfg format configuration.
The ifcfg format configuration is also retained to support easy
backward compatibility for distro vendors. These configurations are
stored in temp files which are further translated using the
hv_set_ifconfig.sh script. This script is implemented by individual
distros based on the network management commands supported.
For example, RHEL's implementation could be found here:
https://gitlab.com/redhat/centos-stream/src/hyperv-daemons/-/blob/c9s/hv_set_ifconfig.sh
Debian's implementation could be found here:
https://github.com/endlessm/linux/blob/master/debian/cloud-tools/hv_set_ifconfig
The next part of this support is to let the Distro vendors consume
these modified implementations to the new configuration format.
Tested-on: Rhel9(Hyper-V, Azure)(nm and ifcfg files verified)
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Ani Sinha <anisinha@redhat.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Link: https://lore.kernel.org/r/1696847920-31125-1-git-send-email-shradhagupta@linux.microsoft.com
|
|
The code supports dumps with no input attributes currently
thru a combination of special-casing and luck.
Clean up the handling of ops with no inputs. Create empty
Structs, and skip printing of empty types.
This makes dos with no inputs work.
Tested-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Link: https://lore.kernel.org/r/20231006135032.3328523-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch extends the existing fib_lookup test suite by adding two test
cases (for each IP family):
* Test source IP selection from the egressing netdev.
* Test source IP selection when an IP route has a preferred src IP addr.
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Link: https://lore.kernel.org/r/20231007081415.33502-3-m@lambda.lt
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Extend the bpf_fib_lookup() helper by making it to return the source
IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set.
For example, the following snippet can be used to derive the desired
source IP address:
struct bpf_fib_lookup p = { .ipv4_dst = ip4->daddr };
ret = bpf_skb_fib_lookup(skb, p, sizeof(p),
BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
if (ret != BPF_FIB_LKUP_RET_SUCCESS)
return TC_ACT_SHOT;
/* the p.ipv4_src now contains the source address */
The inability to derive the proper source address may cause malfunctions
in BPF-based dataplanes for hosts containing netdevs with more than one
routable IP address or for multi-homed hosts.
For example, Cilium implements packet masquerading in BPF. If an
egressing netdev to which the Cilium's BPF prog is attached has
multiple IP addresses, then only one [hardcoded] IP address can be used for
masquerading. This breaks connectivity if any other IP address should have
been selected instead, for example, when a public and private addresses
are attached to the same egress interface.
The change was tested with Cilium [1].
Nikolay Aleksandrov helped to figure out the IPv6 addr selection.
[1]: https://github.com/cilium/cilium/pull/28283
Signed-off-by: Martynas Pumputis <m@lambda.lt>
Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
A previous commit updated the verifier to print an accurate failure
message for when someone specifies a nonzero return value from an async
callback. This adds a testcase for validating that the verifier emits
the correct message in such a case.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231009161414.235829-2-void@manifault.com
|
|
Verify the following behavior holds true for writes and reads of HWCR from
host userspace:
* Attempts to set bits 3, 6, or 8 are ignored
* Bits 18 and 24 are the only bits that can be set
* Any bit that can be set can also be cleared
Signed-off-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20230929230246.1954854-4-jmattson@google.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
A C string lacks alignment so use aligned arrays to avoid potential
alignment problems. Switch to using sizeof (less 1 for the \0
terminator) rather than a hardcode size constant.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231007044439.25171-2-irogers@google.com
|
|
libbpf accesses the ELF data requiring at least 8 byte alignment,
however, the data is generated into a C string that doesn't guarantee
alignment. Fix this by assigning to an aligned char array. Use sizeof
on the array, less one for the \0 terminator, rather than generating a
constant.
Fixes: a6cc6b34b93e ("bpftool: Provide a helper method for accessing skeleton's embedded ELF data")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20231007044439.25171-1-irogers@google.com
|
|
Now that we support pinning a BPF timer to the current core, we should
test it with some selftests. This patch adds two new testcases to the
timer suite, which verifies that a BPF timer both with and without
BPF_F_TIMER_ABS, can be pinned to the calling core with BPF_F_TIMER_CPU_PIN.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20231004162339.200702-3-void@manifault.com
|
|
BPF supports creating high resolution timers using bpf_timer_* helper
functions. Currently, only the BPF_F_TIMER_ABS flag is supported, which
specifies that the timeout should be interpreted as absolute time. It
would also be useful to be able to pin that timer to a core. For
example, if you wanted to make a subset of cores run without timer
interrupts, and only have the timer be invoked on a single core.
This patch adds support for this with a new BPF_F_TIMER_CPU_PIN flag.
When specified, the HRTIMER_MODE_PINNED flag is passed to
hrtimer_start(). A subsequent patch will update selftests to validate.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/bpf/20231004162339.200702-2-void@manifault.com
|
|
Martin reported that on his local dev machine the test_tc_chain_mixed() fails as
"test_tc_chain_mixed:FAIL:seen_tc5 unexpected seen_tc5: actual 1 != expected 0"
and others occasionally, too.
However, when running in a more isolated setup (qemu in particular), it works fine
for him. The reason is that there is a small race-window where seen_tc* could turn
into true for various test cases when there is background traffic, e.g. after the
asserts they often get reset. In such case when subsequent detach takes place,
unrelated background traffic could have already flipped the bool to true beforehand.
Add a small helper tc_skel_reset_all_seen() to reset all bools before we do the ping
test. At this point, everything is set up as expected and therefore no race can occur.
All tc_{opts,links} tests continue to pass after this change.
Reported-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20231006220655.1653-7-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add a new test case to query on an empty bpf_mprog and pass the revision
directly into expected_revision for attachment to assert that this does
succeed.
./test_progs -t tc_opts
[ 1.406778] tsc: Refined TSC clocksource calibration: 3407.990 MHz
[ 1.408863] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fcaf6eb0, max_idle_ns: 440795321766 ns
[ 1.412419] clocksource: Switched to clocksource tsc
[ 1.428671] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.430260] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#252 tc_opts_after:OK
#253 tc_opts_append:OK
#254 tc_opts_basic:OK
#255 tc_opts_before:OK
#256 tc_opts_chain_classic:OK
#257 tc_opts_chain_mixed:OK
#258 tc_opts_delete_empty:OK
#259 tc_opts_demixed:OK
#260 tc_opts_detach:OK
#261 tc_opts_detach_after:OK
#262 tc_opts_detach_before:OK
#263 tc_opts_dev_cleanup:OK
#264 tc_opts_invalid:OK
#265 tc_opts_max:OK
#266 tc_opts_mixed:OK
#267 tc_opts_prepend:OK
#268 tc_opts_query:OK
#269 tc_opts_query_attach:OK <--- (new test)
#270 tc_opts_replace:OK
#271 tc_opts_revision:OK
Summary: 20/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20231006220655.1653-6-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Simplify __assert_mprog_count() to remove the -ENOENT corner case as the
bpf_prog_query() now returns 0 when no bpf_mprog is attached. This also
allows to convert a few test cases from using raw __assert_mprog_count()
over to plain assert_mprog_count() helper.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20231006220655.1653-5-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Add a new test case which performs double query of the bpf_mprog through
libbpf API, but also via raw bpf(2) syscall. This is testing to gather
first the count and then in a subsequent probe the full information with
the program array without clearing passed structs in between.
# ./vmtest.sh -- ./test_progs -t tc_opts
[...]
./test_progs -t tc_opts
[ 1.398818] tsc: Refined TSC clocksource calibration: 3407.999 MHz
[ 1.400263] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x311fd336761, max_idle_ns: 440795243819 ns
[ 1.402734] clocksource: Switched to clocksource tsc
[ 1.426639] bpf_testmod: loading out-of-tree module taints kernel.
[ 1.428112] bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
#252 tc_opts_after:OK
#253 tc_opts_append:OK
#254 tc_opts_basic:OK
#255 tc_opts_before:OK
#256 tc_opts_chain_classic:OK
#257 tc_opts_chain_mixed:OK
#258 tc_opts_delete_empty:OK
#259 tc_opts_demixed:OK
#260 tc_opts_detach:OK
#261 tc_opts_detach_after:OK
#262 tc_opts_detach_before:OK
#263 tc_opts_dev_cleanup:OK
#264 tc_opts_invalid:OK
#265 tc_opts_max:OK
#266 tc_opts_mixed:OK
#267 tc_opts_prepend:OK
#268 tc_opts_query:OK <--- (new test)
#269 tc_opts_replace:OK
#270 tc_opts_revision:OK
Summary: 19/0 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20231006220655.1653-4-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
These duplicate defines should automatically be picked up from kernel
headers.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Remove duplicate defines which are already defined in kernel headers and
re-definition isn't required.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Remove duplicate defines which are already included in kernel headers.
MAX_PID_NS_LEVEL macro is used inside kernel only. It isn't exposed to
userspace. So it is never defined in test application. Remove #ifndef in
this case.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
These duplicate defines should automatically be picked up from kernel
headers. Use KHDR_INCLUDES to add kernel header files.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Add some tests to cover the new PR_MDWE_NO_INHERIT flag of the
PR_SET_MDWE prctl.
Check that:
- it can't be set without PR_SET_MDWE
- MDWE flags can't be unset
- when set, PR_SET_MDWE doesn't propagate to children
Link: https://lkml.kernel.org/r/20230828150858.393570-7-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ayush Jain <ayush.jain3@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This extends the current PR_SET_MDWE prctl arg with a bit to indicate that
the process doesn't want MDWE protection to propagate to children.
To implement this no-inherit mode, the tag in current->mm->flags must be
absent from MMF_INIT_MASK. This means that the encoding for "MDWE but
without inherit" is different in the prctl than in the mm flags. This
leads to a bit of bit-mangling in the prctl implementation.
Link: https://lkml.kernel.org/r/20230828150858.393570-6-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ayush Jain <ayush.jain3@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Defining a prctl flag as an int is a footgun because on a 64 bit machine
and with a variadic implementation of prctl (like in musl and glibc), when
used directly as a prctl argument, it can get casted to long with garbage
upper bits which would result in unexpected behaviors.
This patch changes the constant to an unsigned long to eliminate that
possibilities. This does not break UAPI.
I think that a stable backport would be "nice to have": to reduce the
chances that users build binaries that could end up with garbage bits in
their MDWE prctl arguments. We are not aware of anyone having yet
encountered this corner case with MDWE prctls but a backport would reduce
the likelihood it happens, since this sort of issues has happened with
other prctls. But If this is perceived as a backporting burden, I suppose
we could also live without a stable backport.
Link: https://lkml.kernel.org/r/20230828150858.393570-5-revest@chromium.org
Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl")
Signed-off-by: Florent Revest <revest@chromium.org>
Suggested-by: Alexey Izbyshev <izbyshev@ispras.ru>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ayush Jain <ayush.jain3@amd.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: Topi Miettinen <toiwoton@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Invalid prctls return a negative code and set errno. It's good practice
to check that errno is set as expected.
Link: https://lkml.kernel.org/r/20230828150858.393570-4-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ayush Jain <ayush.jain3@amd.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
I checked with the original author, the mmap_FIXED test case wasn't
properly tested and fails. Currently, it maps two consecutive (non
overlapping) pages and expects the second mapping to be denied by MDWE but
these two pages have nothing to do with each other so MDWE is actually out
of the picture here.
What the test actually intended to do was to remap a virtual address using
MAP_FIXED. However, this operation unmaps the existing mapping and
creates a new one so the va is backed by a new page and MDWE is again out
of the picture, all remappings should succeed.
This patch keeps the test case to make it clear that this situation is
expected to work: MDWE shouldn't block a MAP_FIXED replacement.
Link: https://lkml.kernel.org/r/20230828150858.393570-3-revest@chromium.org
Fixes: 4cf1fe34fd18 ("kselftest: vm: add tests for memory-deny-write-execute")
Signed-off-by: Florent Revest <revest@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Ryan Roberts <ryan.roberts@arm.com>
Tested-by: Ayush Jain <ayush.jain3@amd.com>
Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: Topi Miettinen <toiwoton@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "MDWE without inheritance", v4.
Joey recently introduced a Memory-Deny-Write-Executable (MDWE) prctl which
tags current with a flag that prevents pages that were previously not
executable from becoming executable. This tag always gets inherited by
children tasks. (it's in MMF_INIT_MASK)
At Google, we've been using a somewhat similar downstream patch for a few
years now. To make the adoption of this feature easier, we've had it
support a mode in which the W^X flag does not propagate to children. For
example, this is handy if a C process which wants W^X protection suspects
it could start children processes that would use a JIT.
I'd like to align our features with the upstream prctl. This series
proposes a new NO_INHERIT flag to the MDWE prctl to make this kind of
adoption easier. It sets a different flag in current that is not in
MMF_INIT_MASK and which does not propagate.
As part of looking into MDWE, I also fixed a couple of things in the MDWE
test.
The background for this was discussed in these threads:
v1: https://lore.kernel.org/all/66900d0ad42797a55259061f757beece@ispras.ru/
v2: https://lore.kernel.org/all/d7e3749c-a718-df94-92af-1cb0fecab772@redhat.com/
This patch (of 6):
Fix tabs/spaces inconsistency in the mdwe test.
Link: https://lkml.kernel.org/r/20230828150858.393570-1-revest@chromium.org
Link: https://lkml.kernel.org/r/20230828150858.393570-2-revest@chromium.org
Signed-off-by: Florent Revest <revest@chromium.org>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ayush Jain <ayush.jain3@amd.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Joey Gouly <joey.gouly@arm.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: Topi Miettinen <toiwoton@gmail.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Extract duplicate code from these four functions
unix_redir_to_connected()
udp_redir_to_connected()
inet_unix_redir_to_connected()
unix_inet_redir_to_connected()
to generate a new helper pairs_redir_to_connected(). Create the
different socketpairs in these four functions, then pass the
socketpairs info to the new common helper to do the connections.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Link: https://lore.kernel.org/r/54bb28dcf764e7d4227ab160883931d2173f4f3d.1696588133.git.geliang.tang@suse.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
We currently expect up to a three-digit number of tests and subtests, so:
#999/999: some_test/some_subtest: ...
Is the largest test/subtest we can see. If we happen to cross into
1000s, current logic will just truncate everything after 7th character.
This patch fixes this truncate and allows to go way higher (up to 31
characters in total). We still nicely align test numbers:
#60/66 core_reloc_btfgen/type_based___incompat:OK
#60/67 core_reloc_btfgen/type_based___fn_wrong_args:OK
#60/68 core_reloc_btfgen/type_id:OK
#60/69 core_reloc_btfgen/type_id___missing_targets:OK
#60/70 core_reloc_btfgen/enumval:OK
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20231006175744.3136675-3-andrii@kernel.org
|