Age | Commit message (Collapse) | Author |
|
For struct_ops progs, whether a particular prog uses private stack
depends on prog->aux->priv_stack_requested setting before actual
insn-level verification for that prog. One particular implementation
is to piggyback on struct_ops->check_member(). The next patch has
an example for this. The struct_ops->check_member() sets
prog->aux->priv_stack_requested to be true which enables private stack
usage.
The struct_ops prog follows the same rule as kprobe/tracing progs after
function bpf_enable_priv_stack(). For example, even a struct_ops prog
requests private stack, it could still use normal kernel stack if
the stack size is small (< 64 bytes).
Similar to tracing progs, nested same cpu same prog run will be skipped.
A field (recursion_detected()) is added to bpf_prog_aux structure.
If bpf_prog->aux->recursion_detected is implemented by the struct_ops
subsystem and nested same cpu/prog happens, the function will be
triggered to report an error, collect related info, etc.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163933.2224962-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Private stack is allocated in function bpf_int_jit_compile() with
alignment 8. Private stack allocation size includes the stack size
determined by verifier and additional space to protect stack overflow
and underflow. See below an illustration:
---> memory address increasing
[8 bytes to protect overflow] [normal stack] [8 bytes to protect underflow]
If overflow/underflow is detected, kernel messages will be
emited in dmesg like
BPF private stack overflow/underflow detected for prog Fx
BPF Private stack overflow/underflow detected for prog bpf_prog_a41699c234a1567a_subprog1x
Those messages are generated when I made some changes to jitted code
to intentially cause overflow for some progs.
For the jited prog, The x86 register 9 (X86_REG_R9) is used to replace
bpf frame register (BPF_REG_10). The private stack is used per
subprog per cpu. The X86_REG_R9 is saved and restored around every
func call (not including tailcall) to maintain correctness of
X86_REG_R9.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163922.2224385-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
If private stack is used by any subprog, set that subprog
prog->aux->jits_use_priv_stack to be true so later jit can allocate
private stack for that subprog properly.
Also set env->prog->aux->jits_use_priv_stack to be true if
any subprog uses private stack. This is a use case for a
single main prog (no subprogs) to use private stack, and
also a use case for later struct-ops progs where
env->prog->aux->jits_use_priv_stack will enable recursion
check if any subprog uses private stack.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163912.2224007-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Private stack will be allocated with percpu allocator in jit time.
To avoid complexity at runtime, only one copy of private stack is
available per cpu per prog. So runtime recursion check is necessary
to avoid stack corruption.
Current private stack only supports kprobe/perf_event/tp/raw_tp
which has recursion check in the kernel, and prog types that use
bpf trampoline recursion check. For trampoline related prog types,
currently only tracing progs have recursion checking.
To avoid complexity, all async_cb subprogs use normal kernel stack
including those subprogs used by both main prog subtree and async_cb
subtree. Any prog having tail call also uses kernel stack.
To avoid jit penalty with private stack support, a subprog stack
size threshold is set such that only if the stack size is no less
than the threshold, private stack is supported. The current threshold
is 64 bytes. This avoids jit penality if the stack usage is small.
A useless 'continue' is also removed from a loop in func
check_max_stack_depth().
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20241112163907.2223839-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Commit c25b2ae13603 ("bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX |
PTR_MAYBE_NULL") moved the fields around and misplaced the
documentation for "PTR_TO_BTF_ID_OR_NULL". So, let's replace it in the
proper place.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241111124911.1436911-1-dongml2@chinatelecom.cn
|
|
Logic to prevent callbacks from acquiring new references for the program
(i.e. leaving acquired references), and releasing caller references
(i.e. those acquired in parent frames) was introduced in commit
9d9d00ac29d0 ("bpf: Fix reference state management for synchronous callbacks").
This was necessary because back then, the verifier simulated each
callback once (that could potentially be executed N times, where N can
be zero). This meant that callbacks that left lingering resources or
cleared caller resources could do it more than once, operating on
undefined state or leaking memory.
With the fixes to callback verification in commit
ab5cfac139ab ("bpf: verify callbacks as if they are called unknown number of times"),
all of this extra logic is no longer necessary. Hence, drop it as part
of this commit.
Cc: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241109231430.2475236-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
When bpf_spin_lock was introduced originally, there was deliberation on
whether to use an array of lock IDs, but since bpf_spin_lock is limited
to holding a single lock at any given time, we've been using a single ID
to identify the held lock.
In preparation for introducing spin locks that can be taken multiple
times, introduce support for acquiring multiple lock IDs. For this
purpose, reuse the acquired_refs array and store both lock and pointer
references. We tag the entry with REF_TYPE_PTR or REF_TYPE_LOCK to
disambiguate and find the relevant entry. The ptr field is used to track
the map_ptr or btf (for bpf_obj_new allocations) to ensure locks can be
matched with protected fields within the same "allocation", i.e.
bpf_obj_new object or map value.
The struct active_lock is changed to an int as the state is part of the
acquired_refs array, and we only need active_lock as a cheap way of
detecting lock presence.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241109231430.2475236-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Adding support to attach BPF program for entry and return probe
of the same function. This is common use case which at the moment
requires to create two uprobe multi links.
Adding new BPF_TRACE_UPROBE_SESSION attach type that instructs
kernel to attach single link program to both entry and exit probe.
It's possible to control execution of the BPF program on return
probe simply by returning zero or non zero from the entry BPF
program execution to execute or not the BPF program on return
probe respectively.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241108134544.480660-4-jolsa@kernel.org
|
|
Stable tag for bpf-next's uprobe work.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
|
|
Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL. However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].
Thus, there is a discrepancy between the reality, that raw_tp arguments
can actually be NULL, and the verifier's knowledge, that they are never
NULL, causing explicit NULL checks to be deleted, and accesses to such
pointers potentially crashing the kernel.
To fix this, mark raw_tp arguments as PTR_MAYBE_NULL, and then special
case the dereference and pointer arithmetic to permit it, and allow
passing them into helpers/kfuncs; these exceptions are made for raw_tp
programs only. Ensure that we don't do this when ref_obj_id > 0, as in
that case this is an acquired object and doesn't need such adjustment.
The reason we do mask_raw_tp_trusted_reg logic is because other will
recheck in places whether the register is a trusted_reg, and then
consider our register as untrusted when detecting the presence of the
PTR_MAYBE_NULL flag.
To allow safe dereference, we enable PROBE_MEM marking when we see loads
into trusted pointers with PTR_MAYBE_NULL.
While trusted raw_tp arguments can also be passed into helpers or kfuncs
where such broken assumption may cause issues, a future patch set will
tackle their case separately, as PTR_TO_BTF_ID (without PTR_TRUSTED) can
already be passed into helpers and causes similar problems. Thus, they
are left alone for now.
It is possible that these checks also permit passing non-raw_tp args
that are trusted PTR_TO_BTF_ID with null marking. In such a case,
allowing dereference when pointer is NULL expands allowed behavior, so
won't regress existing programs, and the case of passing these into
helpers is the same as above and will be dealt with later.
Also update the failure case in tp_btf_nullable selftest to capture the
new behavior, as the verifier will no longer cause an error when
directly dereference a raw tracepoint argument marked as __nullable.
[0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Reported-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241104171959.2938862-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
The static inline btf_type_is_struct_ptr() function calls
btf_type_skip_modifiers() which is guarded by CONFIG_BPF_SYSCALL.
btf_type_is_struct_ptr() is also only called by CONFIG_BPF_SYSCALL
ifdef code, so let's only expose btf_type_is_struct_ptr() if
CONFIG_BPF_SYSCALL is defined.
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Link: https://lore.kernel.org/r/20241104060300.421403-1-alistair.francis@wdc.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Avoid taking refcount on uprobe in prepare_uretprobe(), instead take
uretprobe-specific SRCU lock and keep it active as kernel transfers
control back to user space.
Given we can't rely on user space returning from traced function within
reasonable time period, we need to make sure not to keep SRCU lock
active for too long, though. To that effect, we employ a timer callback
which is meant to terminate SRCU lock region after predefined timeout
(currently set to 100ms), and instead transfer underlying struct
uprobe's lifetime protection to refcounting.
This fallback to less scalable refcounting after 100ms is a fine
tradeoff from uretprobe's scalability and performance perspective,
because uretprobing *long running* user functions inherently doesn't run
into scalability issues (there is just not enough frequency to cause
noticeable issues with either performance or scalability).
The overall trick is in ensuring synchronization between current thread
and timer's callback fired on some other thread. To cope with that with
minimal logic complications, we add hprobe wrapper which is used to
contain all the synchronization related issues behind a small number of
basic helpers: hprobe_expire() for "downgrading" uprobe from SRCU-protected
state to refcounted state, and a hprobe_consume() and hprobe_finalize()
pair of single-use consuming helpers. Other than that, whatever current
thread's logic is there stays the same, as timer thread cannot modify
return_instance state (or add new/remove old return_instances). It only
takes care of SRCU unlock and uprobe refcounting, which is hidden from
the higher-level uretprobe handling logic.
We use atomic xchg() in hprobe_consume(), which is called from
performance critical handle_uretprobe_chain() function run in the
current context. When uncontended, this xchg() doesn't seem to hurt
performance as there are no other competing CPUs fighting for the same
cache line. We also mark struct return_instance as ____cacheline_aligned
to ensure no false sharing can happen.
Another technical moment. We need to make sure that the list of return
instances can be safely traversed under RCU from timer callback, so we
delay return_instance freeing with kfree_rcu() and make sure that list
modifications use RCU-aware operations.
Also, given SRCU lock survives transition from kernel to user space and
back we need to use lower-level __srcu_read_lock() and
__srcu_read_unlock() to avoid lockdep complaining.
Just to give an impression of a kind of performance improvements this
change brings, below are benchmarking results with and without these
SRCU changes, assuming other uprobe optimizations (mainly RCU Tasks
Trace for entry uprobes, lockless RB-tree lookup, and lockless VMA to
uprobe lookup) are left intact:
WITHOUT SRCU for uretprobes
===========================
uretprobe-nop ( 1 cpus): 2.197 ± 0.002M/s ( 2.197M/s/cpu)
uretprobe-nop ( 2 cpus): 3.325 ± 0.001M/s ( 1.662M/s/cpu)
uretprobe-nop ( 3 cpus): 4.129 ± 0.002M/s ( 1.376M/s/cpu)
uretprobe-nop ( 4 cpus): 6.180 ± 0.003M/s ( 1.545M/s/cpu)
uretprobe-nop ( 8 cpus): 7.323 ± 0.005M/s ( 0.915M/s/cpu)
uretprobe-nop (16 cpus): 6.943 ± 0.005M/s ( 0.434M/s/cpu)
uretprobe-nop (32 cpus): 5.931 ± 0.014M/s ( 0.185M/s/cpu)
uretprobe-nop (64 cpus): 5.145 ± 0.003M/s ( 0.080M/s/cpu)
uretprobe-nop (80 cpus): 4.925 ± 0.005M/s ( 0.062M/s/cpu)
WITH SRCU for uretprobes
========================
uretprobe-nop ( 1 cpus): 1.968 ± 0.001M/s ( 1.968M/s/cpu)
uretprobe-nop ( 2 cpus): 3.739 ± 0.003M/s ( 1.869M/s/cpu)
uretprobe-nop ( 3 cpus): 5.616 ± 0.003M/s ( 1.872M/s/cpu)
uretprobe-nop ( 4 cpus): 7.286 ± 0.002M/s ( 1.822M/s/cpu)
uretprobe-nop ( 8 cpus): 13.657 ± 0.007M/s ( 1.707M/s/cpu)
uretprobe-nop (32 cpus): 45.305 ± 0.066M/s ( 1.416M/s/cpu)
uretprobe-nop (64 cpus): 42.390 ± 0.922M/s ( 0.662M/s/cpu)
uretprobe-nop (80 cpus): 47.554 ± 2.411M/s ( 0.594M/s/cpu)
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241024044159.3156646-3-andrii@kernel.org
|
|
The rapl pmu is die scope, which is supported by the generic perf_event
subsystem now.
Set the scope for the rapl PMU and remove all the cpumask and hotplug
codes.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com>
Link: https://lore.kernel.org/r/20241010142604.770192-2-kan.liang@linux.intel.com
|
|
Cross-merge bpf fixes after downstream PR.
No conflicts.
Adjacent changes in:
include/linux/bpf.h
include/uapi/linux/bpf.h
kernel/bpf/btf.c
kernel/bpf/helpers.c
kernel/bpf/syscall.c
kernel/bpf/verifier.c
kernel/trace/bpf_trace.c
mm/slab_common.c
tools/include/uapi/linux/bpf.h
tools/testing/selftests/bpf/Makefile
Link: https://lore.kernel.org/all/20241024215724.60017-1-daniel@iogearbox.net/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pull bpf fixes from Daniel Borkmann:
- Fix an out-of-bounds read in bpf_link_show_fdinfo for BPF sockmap
link file descriptors (Hou Tao)
- Fix BPF arm64 JIT's address emission with tag-based KASAN enabled
reserving not enough size (Peter Collingbourne)
- Fix BPF verifier do_misc_fixups patching for inlining of the
bpf_get_branch_snapshot BPF helper (Andrii Nakryiko)
- Fix a BPF verifier bug and reject BPF program write attempts into
read-only marked BPF maps (Daniel Borkmann)
- Fix perf_event_detach_bpf_prog error handling by removing an invalid
check which would skip BPF program release (Jiri Olsa)
- Fix memory leak when parsing mount options for the BPF filesystem
(Hou Tao)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Check validity of link->type in bpf_link_show_fdinfo()
bpf: Add the missing BPF_LINK_TYPE invocation for sockmap
bpf: fix do_misc_fixups() for bpf_get_branch_snapshot()
bpf,perf: Fix perf_event_detach_bpf_prog error handling
selftests/bpf: Add test for passing in uninit mtu_len
selftests/bpf: Add test for writes to .rodata
bpf: Remove MEM_UNINIT from skb/xdp MTU helpers
bpf: Fix overloading of MEM_UNINIT's meaning
bpf: Add MEM_WRITE attribute
bpf: Preserve param->string when parsing mount options
bpf, arm64: Fix address emission with tag-based KASAN enabled
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfiler, xfrm and bluetooth.
Oddly this includes a fix for a posix clock regression; in our
previous PR we included a change there as a pre-requisite for
networking one. That fix proved to be buggy and requires the follow-up
included here. Thomas suggested we should send it, given we sent the
buggy patch.
Current release - regressions:
- posix-clock: Fix unbalanced locking in pc_clock_settime()
- netfilter: fix typo causing some targets not to load on IPv6
Current release - new code bugs:
- xfrm: policy: remove last remnants of pernet inexact list
Previous releases - regressions:
- core: fix races in netdev_tx_sent_queue()/dev_watchdog()
- bluetooth: fix UAF on sco_sock_timeout
- eth: hv_netvsc: fix VF namespace also in synthetic NIC
NETDEV_REGISTER event
- eth: usbnet: fix name regression
- eth: be2net: fix potential memory leak in be_xmit()
- eth: plip: fix transmit path breakage
Previous releases - always broken:
- sched: deny mismatched skip_sw/skip_hw flags for actions created by
classifiers
- netfilter: bpf: must hold reference on net namespace
- eth: virtio_net: fix integer overflow in stats
- eth: bnxt_en: replace ptp_lock with irqsave variant
- eth: octeon_ep: add SKB allocation failures handling in
__octep_oq_process_rx()
Misc:
- MAINTAINERS: add Simon as an official reviewer"
* tag 'net-6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
net: dsa: mv88e6xxx: support 4000ps cycle counter period
net: dsa: mv88e6xxx: read cycle counter period from hardware
net: dsa: mv88e6xxx: group cycle counter coefficients
net: usb: qmi_wwan: add Fibocom FG132 0x0112 composition
hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event
net: dsa: microchip: disable EEE for KSZ879x/KSZ877x/KSZ876x
Bluetooth: ISO: Fix UAF on iso_sock_timeout
Bluetooth: SCO: Fix UAF on sco_sock_timeout
Bluetooth: hci_core: Disable works on hci_unregister_dev
posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime()
r8169: avoid unsolicited interrupts
net: sched: use RCU read-side critical section in taprio_dump()
net: sched: fix use-after-free in taprio_change()
net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers
net: usb: usbnet: fix name regression
mlxsw: spectrum_router: fix xa_store() error checking
virtio_net: fix integer overflow in stats
net: fix races in netdev_tx_sent_queue()/dev_watchdog()
net: wwan: fix global oob in wwan_rtnl_policy
netfilter: xtables: fix typo causing some targets not to load on IPv6
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Get correct cores_per_package for SMT systems, enable IRQ if do_ale()
triggered in irq-enabled context, and fix some bugs about vDSO, memory
managenent, hrtimer in KVM, etc"
* tag 'loongarch-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Mark hrtimer to expire in hard interrupt context
LoongArch: Make KASAN usable for variable cpu_vabits
LoongArch: Set initial pte entry with PAGE_GLOBAL for kernel space
LoongArch: Don't crash in stack_top() for tasks without vDSO
LoongArch: Set correct size for vDSO code mapping
LoongArch: Enable IRQ if do_ale() triggered in irq-enabled context
LoongArch: Get correct cores_per_package for SMT systems
LoongArch: Use "Exception return address" to comment ERA
|
|
This patch adds uptr support in the map_value of the task local storage.
struct map_value {
struct user_data __uptr *uptr;
};
struct {
__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
__uint(map_flags, BPF_F_NO_PREALLOC);
__type(key, int);
__type(value, struct value_type);
} datamap SEC(".maps");
A new bpf_obj_pin_uptrs() is added to pin the user page and
also stores the kernel address back to the uptr for the
bpf prog to use later. It currently does not support
the uptr pointing to a user struct across two pages.
It also excludes PageHighMem support to keep it simple.
As of now, the 32bit bpf jit is missing other more crucial bpf
features. For example, many important bpf features depend on
bpf kfunc now but so far only one arch (x86-32) supports it
which was added by me as an example when kfunc was first
introduced to bpf.
The uptr can only be stored to the task local storage by the
syscall update_elem. Meaning the uptr will not be considered
if it is provided by the bpf prog through
bpf_task_storage_get(BPF_LOCAL_STORAGE_GET_F_CREATE).
This is enforced by only calling
bpf_local_storage_update(swap_uptrs==true) in
bpf_pid_task_storage_update_elem. Everywhere else will
have swap_uptrs==false.
This will pump down to bpf_selem_alloc(swap_uptrs==true). It is
the only case that bpf_selem_alloc() will take the uptr value when
updating the newly allocated selem. bpf_obj_swap_uptrs() is added
to swap the uptr between the SDATA(selem)->data and the user provided
map_value in "void *value". bpf_obj_swap_uptrs() makes the
SDATA(selem)->data takes the ownership of the uptr and the user space
provided map_value will have NULL in the uptr.
The bpf_obj_unpin_uptrs() is called after map->ops->map_update_elem()
returning error. If the map->ops->map_update_elem has reached
a state that the local storage has taken the uptr ownership,
the bpf_obj_unpin_uptrs() will be a no op because the uptr
is NULL. A "__"bpf_obj_unpin_uptrs is added to make this
error path unpin easier such that it does not have to check
the map->record is NULL or not.
BPF_F_LOCK is not supported when the map_value has uptr.
This can be revisited later if there is a use case. A similar
swap_uptrs idea can be considered.
The final bit is to do unpin_user_page in the bpf_obj_free_fields().
The earlier patch has ensured that the bpf_obj_free_fields() has
gone through the rcu gp when needed.
Cc: linux-mm@kvack.org
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Link: https://lore.kernel.org/r/20241023234759.860539-7-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
In a later patch, bpf_selem_free() will call unpin_user_page()
through bpf_obj_free_fields(). unpin_user_page() may take spin_lock.
However, some bpf_selem_free() call paths have held a raw_spin_lock.
Like this:
raw_spin_lock_irqsave()
bpf_selem_unlink_storage_nolock()
bpf_selem_free()
unpin_user_page()
spin_lock()
To avoid spinlock nested in raw_spinlock, bpf_selem_free() should be
done after releasing the raw_spinlock. The "bool reuse_now" arg is
replaced with "struct hlist_head *free_selem_list" in
bpf_selem_unlink_storage_nolock(). The bpf_selem_unlink_storage_nolock()
will append the to-be-free selem at the free_selem_list. The caller of
bpf_selem_unlink_storage_nolock() will need to call the new
bpf_selem_free_list(free_selem_list, reuse_now) to free the selem
after releasing the raw_spinlock.
Note that the selem->snode cannot be reused for linking to
the free_selem_list because the selem->snode is protected by the
raw_spinlock that we want to avoid holding. A new
"struct hlist_node free_node;" is union-ized with
the rcu_head. Only the first one successfully
hlist_del_init_rcu(&selem->snode) will be able
to use the free_node. After succeeding hlist_del_init_rcu(&selem->snode),
the free_node and rcu_head usage is serialized such that they
can share the 16 bytes in a union.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
bpf_selem_alloc()
In a later patch, the task local storage will only accept uptr
from the syscall update_elem and will not accept uptr from
the bpf prog. The reason is the bpf prog does not have a way
to provide a valid user space address.
bpf_local_storage_update() and bpf_selem_alloc() are used by
both bpf prog bpf_task_storage_get(BPF_LOCAL_STORAGE_GET_F_CREATE)
and bpf syscall update_elem. "bool swap_uptrs" arg is added
to bpf_local_storage_update() and bpf_selem_alloc() to tell if
it is called by the bpf prog or by the bpf syscall. When
swap_uptrs==true, it is called by the syscall.
The arg is named (swap_)uptrs because the later patch will swap
the uptrs between the newly allocated selem and the user space
provided map_value. It will make error handling easier in case
map->ops->map_update_elem() fails and the caller can decide
if it needs to unpin the uptr in the user space provided
map_value or the bpf_local_storage_update() has already
taken the uptr ownership and will take care of unpinning it also.
Only swap_uptrs==false is passed now. The logic to handle
the true case will be added in a later patch.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This patch introduces the "__uptr" type tag to BTF. It is to define
a pointer pointing to the user space memory. This patch adds BTF
logic to pass the "__uptr" type tag.
btf_find_kptr() is reused for the "__uptr" tag. The "__uptr" will only
be supported in the map_value of the task storage map. However,
btf_parse_struct_meta() also uses btf_find_kptr() but it is not
interested in "__uptr". This patch adds a "field_mask" argument
to btf_find_kptr() which will return BTF_FIELD_IGNORE if the
caller is not interested in a “__uptr” field.
btf_parse_kptr() is also reused to parse the uptr.
The btf_check_and_fixup_fields() is changed to do extra
checks on the uptr to ensure that its struct size is not larger
than PAGE_SIZE. It is not clear how a uptr pointing to a CO-RE
supported kernel struct will be used, so it is also not allowed now.
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241023234759.860539-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
There is an out-of-bounds read in bpf_link_show_fdinfo() for the sockmap
link fd. Fix it by adding the missing BPF_LINK_TYPE invocation for
sockmap link
Also add comments for bpf_link_type to prevent missing updates in the
future.
Fixes: 699c23f02c65 ("bpf: Add bpf_link support for sk_msg and sk_skb progs")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241024013558.1135167-2-houtao@huaweicloud.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Luiz Augusto von Dentz says:
====================
bluetooth pull request for net:
- hci_core: Disable works on hci_unregister_dev
- SCO: Fix UAF on sco_sock_timeout
- ISO: Fix UAF on iso_sock_timeout
* tag 'for-net-2024-10-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
Bluetooth: ISO: Fix UAF on iso_sock_timeout
Bluetooth: SCO: Fix UAF on sco_sock_timeout
Bluetooth: hci_core: Disable works on hci_unregister_dev
====================
Link: https://patch.msgid.link/20241023143005.2297694-1-luiz.dentz@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2024-10-22
1) Fix routing behavior that relies on L4 information
for xfrm encapsulated packets.
From Eyal Birger.
2) Remove leftovers of pernet policy_inexact lists.
From Florian Westphal.
3) Validate new SA's prefixlen when the selector family is
not set from userspace.
From Sabrina Dubroca.
4) Fix a kernel-infoleak when dumping an auth algorithm.
From Petr Vaganov.
Please pull or let me know if there are problems.
ipsec-2024-10-22
* tag 'ipsec-2024-10-22' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec:
xfrm: fix one more kernel-infoleak in algo dumping
xfrm: validate new SA's prefixlen using SA family when sel.family is unset
xfrm: policy: remove last remnants of pernet inexact list
xfrm: respect ip protocols rules criteria when performing dst lookups
xfrm: extract dst lookup parameters into a struct
====================
Link: https://patch.msgid.link/20241022092226.654370-1-steffen.klassert@secunet.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This change allows the uprobe consumer to behave as session which
means that 'handler' and 'ret_handler' callbacks are connected in
a way that allows to:
- control execution of 'ret_handler' from 'handler' callback
- share data between 'handler' and 'ret_handler' callbacks
The session concept fits to our common use case where we do filtering
on entry uprobe and based on the result we decide to run the return
uprobe (or not).
It's also convenient to share the data between session callbacks.
To achive this we are adding new return value the uprobe consumer
can return from 'handler' callback:
UPROBE_HANDLER_IGNORE
- Ignore 'ret_handler' callback for this consumer.
And store cookie and pass it to 'ret_handler' when consumer has both
'handler' and 'ret_handler' callbacks defined.
We store shared data in the return_consumer object array as part of
the return_instance object. This way the handle_uretprobe_chain can
find related return_consumer and its shared data.
We also store entry handler return value, for cases when there are
multiple consumers on single uprobe and some of them are ignored and
some of them not, in which case the return probe gets installed and
we need to have a way to find out which consumer needs to be ignored.
The tricky part is when consumer is registered 'after' the uprobe
entry handler is hit. In such case this consumer's 'ret_handler' gets
executed as well, but it won't have the proper data pointer set,
so we can filter it out.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241018202252.693462-3-jolsa@kernel.org
|
|
Adding data pointer to both entry and exit consumer handlers and all
its users. The functionality itself is coming in following change.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20241018202252.693462-2-jolsa@kernel.org
|
|
conn->sk maybe have been unlinked/freed while waiting for sco_conn_lock
so this checks if the conn->sk is still valid by checking if it part of
sco_sk_list.
Reported-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com
Tested-by: syzbot+4c0d0c4cde787116d465@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4c0d0c4cde787116d465
Fixes: ba316be1b6a0 ("Bluetooth: schedule SCO timeouts with delayed_work")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
|
|
Add a MEM_WRITE attribute for BPF helper functions which can be used in
bpf_func_proto to annotate an argument type in order to let the verifier
know that the helper writes into the memory passed as an argument. In
the past MEM_UNINIT has been (ab)used for this function, but the latter
merely tells the verifier that the passed memory can be uninitialized.
There have been bugs with overloading the latter but aside from that
there are also cases where the passed memory is read + written which
currently cannot be expressed, see also 4b3786a6c539 ("bpf: Zero former
ARG_PTR_TO_{LONG,INT} args in case of error").
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241021152809.33343-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM64:
- Fix the guest view of the ID registers, making the relevant fields
writable from userspace (affecting ID_AA64DFR0_EL1 and
ID_AA64PFR1_EL1)
- Correcly expose S1PIE to guests, fixing a regression introduced in
6.12-rc1 with the S1POE support
- Fix the recycling of stage-2 shadow MMUs by tracking the context
(are we allowed to block or not) as well as the recycling state
- Address a couple of issues with the vgic when userspace
misconfigures the emulation, resulting in various splats. Headaches
courtesy of our Syzkaller friends
- Stop wasting space in the HYP idmap, as we are dangerously close to
the 4kB limit, and this has already exploded in -next
- Fix another race in vgic_init()
- Fix a UBSAN error when faking the cache topology with MTE enabled
RISCV:
- RISCV: KVM: use raw_spinlock for critical section in imsic
x86:
- A bandaid for lack of XCR0 setup in selftests, which causes trouble
if the compiler is configured to have x86-64-v3 (with AVX) as the
default ISA. Proper XCR0 setup will come in the next merge window.
- Fix an issue where KVM would not ignore low bits of the nested CR3
and potentially leak up to 31 bytes out of the guest memory's
bounds
- Fix case in which an out-of-date cached value for the segments
could by returned by KVM_GET_SREGS.
- More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL
- Override MTRR state for KVM confidential guests, making it WB by
default as is already the case for Hyper-V guests.
Generic:
- Remove a couple of unused functions"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
RISCV: KVM: use raw_spinlock for critical section in imsic
KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
KVM: selftests: x86: Avoid using SSE/AVX instructions
KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
x86/kvm: Override default caching mode for SEV-SNP and TDX
KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
KVM: Remove unused kvm_vcpu_gfn_to_pfn
KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
KVM: arm64: Fix shift-out-of-bounds bug
KVM: arm64: Shave a few bytes from the EL2 idmap code
KVM: arm64: Don't eagerly teardown the vgic on init error
KVM: arm64: Expose S1PIE to guests
KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"afs:
- Fix a lock recursion in afs_wake_up_async_call() on ->notify_lock
netfs:
- Drop the references to a folio immediately after the folio has been
extracted to prevent races with future I/O collection
- Fix a documenation build error
- Downgrade the i_rwsem for buffered writes to fix a cifs reported
performance regression when switching to netfslib
vfs:
- Explicitly return -E2BIG from openat2() if the specified size is
unexpectedly large. This aligns openat2() with other extensible
struct based system calls
- When copying a mount namespace ensure that we only try to remove
the new copy from the mount namespace rbtree if it has already been
added to it
nilfs:
- Clear the buffer delay flag when clearing the buffer state clags
when a buffer head is discarded to prevent a kernel OOPs
ocfs2:
- Fix an unitialized value warning in ocfs2_setattr()
proc:
- Fix a kernel doc warning"
* tag 'vfs-6.12-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
proc: Fix W=1 build kernel-doc warning
afs: Fix lock recursion
fs: Fix uninitialized value issue in from_kuid and from_kgid
fs: don't try and remove empty rbtree node
netfs: Downgrade i_rwsem for a buffered write
nilfs2: fix kernel bug due to missing clearing of buffer delay flag
openat2: explicitly return -E2BIG for (usize > PAGE_SIZE)
netfs: fix documentation build error
netfs: In readahead, put the folio refs as soon extracted
|
|
There are two pages in one TLB entry on LoongArch system. For kernel
space, it requires both two pte entries (buddies) with PAGE_GLOBAL bit
set, otherwise HW treats it as non-global tlb, there will be potential
problems if tlb entry for kernel space is not global. Such as fail to
flush kernel tlb with the function local_flush_tlb_kernel_range() which
supposed only flush tlb with global bit.
Kernel address space areas include percpu, vmalloc, vmemmap, fixmap and
kasan areas. For these areas both two consecutive page table entries
should be enabled with PAGE_GLOBAL bit. So with function set_pte() and
pte_clear(), pte buddy entry is checked and set besides its own pte
entry. However it is not atomic operation to set both two pte entries,
there is problem with test_vmalloc test case.
So function kernel_pte_init() is added to init a pte table when it is
created for kernel address space, and the default initial pte value is
PAGE_GLOBAL rather than zero at beginning. Then only its own pte entry
need update with function set_pte() and pte_clear(), nothing to do with
the pte buddy entry.
Signed-off-by: Bibo Mao <maobibo@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
|
|
Some workloads hit the infamous dev_watchdog() message:
"NETDEV WATCHDOG: eth0 (xxxx): transmit queue XX timed out"
It seems possible to hit this even for perfectly normal
BQL enabled drivers:
1) Assume a TX queue was idle for more than dev->watchdog_timeo
(5 seconds unless changed by the driver)
2) Assume a big packet is sent, exceeding current BQL limit.
3) Driver ndo_start_xmit() puts the packet in TX ring,
and netdev_tx_sent_queue() is called.
4) QUEUE_STATE_STACK_XOFF could be set from netdev_tx_sent_queue()
before txq->trans_start has been written.
5) txq->trans_start is written later, from netdev_start_xmit()
if (rc == NETDEV_TX_OK)
txq_trans_update(txq)
dev_watchdog() running on another cpu could read the old
txq->trans_start, and then see QUEUE_STATE_STACK_XOFF, because 5)
did not happen yet.
To solve the issue, write txq->trans_start right before one XOFF bit
is set :
- _QUEUE_STATE_DRV_XOFF from netif_tx_stop_queue()
- __QUEUE_STATE_STACK_XOFF from netdev_tx_sent_queue()
From dev_watchdog(), we have to read txq->state before txq->trans_start.
Add memory barriers to enforce correct ordering.
In the future, we could avoid writing over txq->trans_start for normal
operations, and rename this field to txq->xoff_start_time.
Fixes: bec251bc8b6a ("net: no longer stop all TX queues in dev_watchdog()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://patch.msgid.link/20241015194118.3951657-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small tty and serial driver fixes for 6.12-rc4:
- qcom-geni serial driver fixes, wow what a mess of a UART chip that
thing is...
- vt infoleak fix for odd font sizes
- imx serial driver bugfix
- yet-another n_gsm ldisc bugfix, slowly chipping down the issues in
that piece of code
All of these have been in linux-next for over a week with no reported
issues"
* tag 'tty-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: qcom-geni: rename suspend functions
serial: qcom-geni: drop unused receive parameter
serial: qcom-geni: drop flip buffer WARN()
serial: qcom-geni: fix rx cancel dma status bit
serial: qcom-geni: fix receiver enable
serial: qcom-geni: fix dma rx cancellation
serial: qcom-geni: fix shutdown race
serial: qcom-geni: revert broken hibernation support
serial: qcom-geni: fix polled console initialisation
serial: imx: Update mctrl old_status on RTSD interrupt
tty: n_gsm: Fix use-after-free in gsm_cleanup_mux
vt: prevent kernel-infoleak in con_font_get()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Fix a case for sifive-plic where an interrupt gets disabled *and*
masked and remains masked when it gets reenabled later
- Plug a small race in GIC-v4 where userspace can force an affinity
change of a virtual CPU (vPE) in its unmapping path
- Do not mix the two sets of ocelot irqchip's registers in the mask
calculation of the main interrupt sticky register
- Other smaller fixlets and cleanups
* tag 'irq_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/renesas-rzg2l: Fix missing put_device
irqchip/riscv-intc: Fix SMP=n boot with ACPI
irqchip/sifive-plic: Unmask interrupt in plic_irq_enable()
irqchip/gic-v4: Don't allow a VMOVP on a dying VPE
irqchip/sifive-plic: Return error code on failure
irqchip/riscv-imsic: Fix output text of base address
irqchip/ocelot: Comment sticky register clearing code
irqchip/ocelot: Fix trigger register address
irqchip: Remove obsolete config ARM_GIC_V3_ITS_PCI
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduling fixes from Borislav Petkov:
- Add PREEMPT_RT maintainers
- Fix another aspect of delayed dequeued tasks wrt determining their
state, i.e., whether they're runnable or blocked
- Handle delayed dequeued tasks and their migration wrt PSI properly
- Fix the situation where a delayed dequeue task gets enqueued into a
new class, which should not happen
- Fix a case where memory allocation would happen while the runqueue
lock is held, which is a no-no
- Do not over-schedule when tasks with shorter slices preempt the
currently running task
- Make sure delayed to deque entities are properly handled before
unthrottling
- Other smaller cleanups and improvements
* tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add an entry for PREEMPT_RT.
sched/fair: Fix external p->on_rq users
sched/psi: Fix mistaken CPU pressure indication after corrupted task state bug
sched/core: Dequeue PSI signals for blocked tasks that are delayed
sched: Fix delayed_dequeue vs switched_from_fair()
sched/core: Disable page allocation in task_tick_mm_cid()
sched/deadline: Use hrtick_enabled_dl() before start_hrtick_dl()
sched/eevdf: Fix wakeup-preempt by checking cfs_rq->nr_running
sched: Fix sched_delayed vs cfs_bandwidth
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix from Juergen Gross:
"A single fix for a build failure introduced this merge window"
* tag 'for-linus-6.12a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen: Remove dependency between pciback and privcmd
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fix from Christoph Hellwig:
"Just another small tracing fix from Sean"
* tag 'dma-mapping-6.12-2024-10-20' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: fix tracing dma_alloc/free with vmalloc'd memory
|
|
The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit
1bbc60d0c7e5 ("KVM: x86/mmu: Remove MMU auditing")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Message-ID: <20241001141354.18009-3-linux@treblig.org>
[Adjust Documentation/virt/kvm/locking.rst. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The last use of kvm_vcpu_gfn_to_pfn was removed by commit
b1624f99aa8f ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Message-ID: <20241001141354.18009-2-linux@treblig.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to not affect subreg_def marks in its range
propagation (Eduard Zingerman)
- Fix a truncation bug in the BPF verifier's handling of
coerce_reg_to_size_sx (Dimitar Kanaliev)
- Fix the BPF verifier's delta propagation between linked registers
under 32-bit addition (Daniel Borkmann)
- Fix a NULL pointer dereference in BPF devmap due to missing rxq
information (Florian Kauer)
- Fix a memory leak in bpf_core_apply (Jiri Olsa)
- Fix an UBSAN-reported array-index-out-of-bounds in BTF parsing for
arrays of nested structs (Hou Tao)
- Fix build ID fetching where memory areas backing the file were
created with memfd_secret (Andrii Nakryiko)
- Fix BPF task iterator tid filtering which was incorrectly using pid
instead of tid (Jordan Rome)
- Several fixes for BPF sockmap and BPF sockhash redirection in
combination with vsocks (Michal Luczaj)
- Fix riscv BPF JIT and make BPF_CMPXCHG fully ordered (Andrea Parri)
- Fix riscv BPF JIT under CONFIG_CFI_CLANG to prevent the possibility
of an infinite BPF tailcall (Pu Lehui)
- Fix a build warning from resolve_btfids that bpf_lsm_key_free cannot
be resolved (Thomas Weißschuh)
- Fix a bug in kfunc BTF caching for modules where the wrong BTF object
was returned (Toke Høiland-Jørgensen)
- Fix a BPF selftest compilation error in cgroup-related tests with
musl libc (Tony Ambardar)
- Several fixes to BPF link info dumps to fill missing fields (Tyrone
Wu)
- Add BPF selftests for kfuncs from multiple modules, checking that the
correct kfuncs are called (Simon Sundberg)
- Ensure that internal and user-facing bpf_redirect flags don't overlap
(Toke Høiland-Jørgensen)
- Switch to use kvzmalloc to allocate BPF verifier environment (Rik van
Riel)
- Use raw_spinlock_t in BPF ringbuf to fix a sleep in atomic splat
under RT (Wander Lairson Costa)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (38 commits)
lib/buildid: Handle memfd_secret() files in build_id_parse()
selftests/bpf: Add test case for delta propagation
bpf: Fix print_reg_state's constant scalar dump
bpf: Fix incorrect delta propagation between linked registers
bpf: Properly test iter/task tid filtering
bpf: Fix iter/task tid filtering
riscv, bpf: Make BPF_CMPXCHG fully ordered
bpf, vsock: Drop static vsock_bpf_prot initialization
vsock: Update msg_count on read_skb()
vsock: Update rx_bytes on read_skb()
bpf, sockmap: SK_DROP on attempted redirects of unsupported af_vsock
selftests/bpf: Add asserts for netfilter link info
bpf: Fix link info netfilter flags to populate defrag flag
selftests/bpf: Add test for sign extension in coerce_subreg_to_size_sx()
selftests/bpf: Add test for truncation after sign extension in coerce_reg_to_size_sx()
bpf: Fix truncation bug in coerce_reg_to_size_sx()
selftests/bpf: Assert link info uprobe_multi count & path_size if unset
bpf: Fix unpopulated path_size when uprobe_multi fields unset
selftests/bpf: Fix cross-compiling urandom_read
selftests/bpf: Add test for kfunc module order
...
|
|
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- Fix target passthrough identifier (Nilay)
- Fix tcp locking (Hannes)
- Replace list with sbitmap for tracking RDMA rsp tags (Guixen)
- Remove unnecessary fallthrough statements (Tokunori)
- Remove ready-without-media support (Greg)
- Fix multipath partition scan deadlock (Keith)
- Fix concurrent PCI reset and remove queue mapping (Maurizio)
- Fabrics shutdown fixes (Nilay)
- Fix for a kerneldoc warning (Keith)
- Fix a race with blk-rq-qos and wakeups (Omar)
- Cleanup of checking for always-set tag_set (SurajSonawane2415)
- Fix for a crash with CPU hotplug notifiers (Ming)
- Don't allow zero-copy ublk on unprivileged device (Ming)
- Use array_index_nospec() for CDROM (Josh)
- Remove dead code in drbd (David)
- Tweaks to elevator loading (Breno)
* tag 'block-6.12-20241018' of git://git.kernel.dk/linux:
cdrom: Avoid barrier_nospec() in cdrom_ioctl_media_changed()
nvme: use helper nvme_ctrl_state in nvme_keep_alive_finish function
nvme: make keep-alive synchronous operation
nvme-loop: flush off pending I/O while shutting down loop controller
nvme-pci: fix race condition between reset and nvme_dev_disable()
ublk: don't allow user copy for unprivileged device
blk-rq-qos: fix crash on rq_qos_wait vs. rq_qos_wake_function race
nvme-multipath: defer partition scanning
blk-mq: setup queue ->tag_set before initializing hctx
elevator: Remove argument from elevator_find_get
elevator: do not request_module if elevator exists
drbd: Remove unused conn_lowest_minor
nvme: disable CC.CRIME (NVME_CC_CRIME)
nvme: delete unnecessary fallthru comment
nvmet-rdma: use sbitmap to replace rsp free list
block: Fix elevator_get_default() checking for NULL q->tag_set
nvme: tcp: avoid race between queue_lock lock and destroy
nvmet-passthru: clear EUID/NGUID/UUID while using loop target
block: fix blk_rq_map_integrity_sg kernel-doc
|
|
Pull xfs fixes from Carlos Maiolino:
- Fix integer overflow in xrep_bmap
- Fix stale dealloc punching for COW IO
* tag 'xfs-6.12-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: punch delalloc extents from the COW fork for COW writes
xfs: set IOMAP_F_SHARED for all COW fork allocations
xfs: share more code in xfs_buffered_write_iomap_begin
xfs: support the COW fork in xfs_bmap_punch_delalloc_range
xfs: IOMAP_ZERO and IOMAP_UNSHARE already hold invalidate_lock
xfs: take XFS_MMAPLOCK_EXCL xfs_file_write_zero_eof
xfs: factor out a xfs_file_write_zero_eof helper
iomap: move locking out of iomap_write_delalloc_release
iomap: remove iomap_file_buffered_write_punch_delalloc
iomap: factor out a iomap_last_written_block helper
xfs: fix integer overflow in xrep_bmap
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes, msm and xe are the two main ones, with a bunch of
scattered fixes including a largish revert in mgag200, then amdgpu,
vmwgfx and scattering of other minor ones.
All seems pretty regular.
msm:
- Display:
- move CRTC resource assignment to atomic_check otherwise to make
consecutive calls to atomic_check() consistent
- fix rounding / sign-extension issues with pclk calculation in
case of DSC
- cleanups to drop incorrect null checks in dpu snapshots
- fix to use kvzalloc in dpu snapshot to avoid allocation issues
in heavily loaded system cases
- Fix to not program merge_3d block if dual LM is not being used
- Fix to not flush merge_3d block if its not enabled otherwise
this leads to false timeouts
- GPU:
- a7xx: add a fence wait before SMMU table update
xe:
- New workaround to Xe2 (Aradhya)
- Fix unbalanced rpm put (Matthew Auld)
- Remove fragile lock optimization (Matthew Brost)
- Fix job release, delegating it to the drm scheduler (Matthew Brost)
- Fix timestamp bit width for Xe2 (Lucas)
- Fix external BO's dma-resv usag (Matthew Brost)
- Fix returning success for timeout in wait_token (Nirmoy)
- Initialize fence to avoid it being detected as signaled (Matthew
Auld)
- Improve cache flush for BMG (Matthew Auld)
- Don't allow hflip for tile4 framebuffer on Xe2 (Juha-Pekka)
amdgpu:
- SR-IOV fix
- CS chunk handling fix
- MES fixes
- SMU13 fixes
amdkfd:
- VRAM usage reporting fix
radeon:
- Fix possible_clones handling
i915:
- Two DP bandwidth related MST fixes
ast:
- Clear EDID on unplugged connectors
host1x:
- Fix boot on Tegra186
- Set DMA parameters
mgag200:
- Revert VBLANK support
panel:
- himax-hx83192: Adjust power and gamma
qaic:
- Sgtable loop fixes
vmwgfx:
- Limit display layout allocatino size
- Handle allocation errors in connector checks
- Clean up KMS code for 2d-only setup
- Report surface-check errors correctly
- Remove NULL test around kvfree()"
* tag 'drm-fixes-2024-10-18' of https://gitlab.freedesktop.org/drm/kernel: (45 commits)
drm/ast: vga: Clear EDID if no display is connected
drm/ast: sil164: Clear EDID if no display is connected
Revert "drm/mgag200: Add vblank support"
drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs
drm/i915/display: Don't allow tile4 framebuffer to do hflip on display20 or greater
drm/xe/bmg: improve cache flushing behaviour
drm/xe/xe_sync: initialise ufence.signalled
drm/xe/ufence: ufence can be signaled right after wait_woken
drm/xe: Use bookkeep slots for external BO's in exec IOCTL
drm/xe/query: Increase timestamp width
drm/xe: Don't free job in TDR
drm/xe: Take job list lock in xe_sched_add_pending_job
drm/xe: fix unbalanced rpm put() with declare_wedged()
drm/xe: fix unbalanced rpm put() with fence_fini()
drm/xe/xe2lpg: Extend Wa_15016589081 for xe2lpg
drm/i915/dp_mst: Don't require DSC hblank quirk for a non-DSC compatible mode
drm/i915/dp_mst: Handle error during DSC BW overhead/slice calculation
drm/msm/a6xx+: Insert a fence wait before SMMU table update
drm/msm/dpu: don't always program merge_3d block
drm/msm/dpu: Don't always set merge_3d pending flush
...
|
|
Commit 2fae6bb7be32 ("xen/privcmd: Add new syscall to get gsi from dev")
adds a weak reverse dependency to the config XEN_PRIVCMD definition, that
dependency causes xen-privcmd can't be loaded on domU, because dependent
xen-pciback isn't always be loaded successfully on domU.
To solve above problem, remove that dependency, and do not call
pcistub_get_gsi_from_sbdf() directly, instead add a hook in
drivers/xen/apci.c, xen-pciback register the real call function, then in
privcmd_ioctl_pcidev_get_gsi call that hook.
Fixes: 2fae6bb7be32 ("xen/privcmd: Add new syscall to get gsi from dev")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Message-ID: <20241012084537.1543059-1-Jiqian.Chen@amd.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"28 hotfixes. 13 are cc:stable. 23 are MM.
It is the usual shower of unrelated singletons - please see the
individual changelogs for details"
* tag 'mm-hotfixes-stable-2024-10-17-16-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (28 commits)
maple_tree: add regression test for spanning store bug
maple_tree: correct tree corruption on spanning store
mm/mglru: only clear kswapd_failures if reclaimable
mm/swapfile: skip HugeTLB pages for unuse_vma
selftests: mm: fix the incorrect usage() info of khugepaged
MAINTAINERS: add Jann as memory mapping/VMA reviewer
mm: swap: prevent possible data-race in __try_to_reclaim_swap
mm: khugepaged: fix the incorrect statistics when collapsing large file folios
MAINTAINERS: kasan, kcov: add bugzilla links
mm: don't install PMD mappings when THPs are disabled by the hw/process/vma
mm: huge_memory: add vma_thp_disabled() and thp_disabled_by_hw()
Docs/damon/maintainer-profile: update deprecated awslabs GitHub URLs
Docs/damon/maintainer-profile: add missing '_' suffixes for external web links
maple_tree: check for MA_STATE_BULK on setting wr_rebalance
mm: khugepaged: fix the arguments order in khugepaged_collapse_file trace point
mm/damon/tests/sysfs-kunit.h: fix memory leak in damon_sysfs_test_add_targets()
mm: remove unused stub for can_swapin_thp()
mailmap: add an entry for Andy Chiu
MAINTAINERS: add memory mapping/VMA co-maintainers
fs/proc: fix build with GCC 15 due to -Werror=unterminated-string-initialization
...
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
ast:
- Clear EDID on unplugged connectors
host1x:
- Fix boot on Tegra186
- Set DMA parameters
mgag200:
- Revert VBLANK support
panel:
- himax-hx83192: Adjust power and gamma
qaic:
- Sgtable loop fixes
vmwgfx:
- Limit display layout allocatino size
- Handle allocation errors in connector checks
- Clean up KMS code for 2d-only setup
- Report surface-check errors correctly
- Remove NULL test around kvfree()
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241017115516.GA196624@linux.fritz.box
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of small fixes, nothing really stands out:
- Usual HD-audio quirks / device-specific fixes
- Kconfig dependency fix for UM
- A series of minor fixes for SoundWire
- Updates of USB-audio LINE6 contact address"
* tag 'sound-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2
ALSA/hda: intel-sdw-acpi: add support for sdw-manager-list property read
ALSA/hda: intel-sdw-acpi: simplify sdw-master-count property read
ALSA/hda: intel-sdw-acpi: fetch fwnode once in sdw_intel_scan_controller()
ALSA/hda: intel-sdw-acpi: cleanup sdw_intel_scan_controller
ALSA: hda/tas2781: Add new quirk for Lenovo, ASUS, Dell projects
ALSA: scarlett2: Add error check after retrieving PEQ filter values
ALSA: hda/cs8409: Fix possible NULL dereference
sound: Make CONFIG_SND depend on INDIRECT_IOMEM instead of UML
ALSA: line6: update contact information
ALSA: usb-audio: Fix NULL pointer deref in snd_usb_power_domain_set()
ALSA: hda/conexant - Fix audio routing for HP EliteOne 1000 G2
ALSA: hda: Sound support for HP Spectre x360 16 inch model 2024
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Current release - new code bugs:
- eth: mlx5: HWS, don't destroy more bwc queue locks than allocated
Previous releases - regressions:
- ipv4: give an IPv4 dev to blackhole_netdev
- udp: compute L4 checksum as usual when not segmenting the skb
- tcp/dccp: don't use timer_pending() in reqsk_queue_unlink().
- eth: mlx5e: don't call cleanup on profile rollback failure
- eth: microchip: vcap api: fix memory leaks in
vcap_api_encode_rule_test()
- eth: enetc: disable Tx BD rings after they are empty
- eth: macb: avoid 20s boot delay by skipping MDIO bus registration
for fixed-link PHY
Previous releases - always broken:
- posix-clock: fix missing timespec64 check in pc_clock_settime()
- genetlink: hold RCU in genlmsg_mcast()
- mptcp: prevent MPC handshake on port-based signal endpoints
- eth: vmxnet3: fix packet corruption in vmxnet3_xdp_xmit_frame
- eth: stmmac: dwmac-tegra: fix link bring-up sequence
- eth: bcmasp: fix potential memory leak in bcmasp_xmit()
Misc:
- add Andrew Lunn as a co-maintainer of all networking drivers"
* tag 'net-6.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (47 commits)
net/mlx5e: Don't call cleanup on profile rollback failure
net/mlx5: Unregister notifier on eswitch init failure
net/mlx5: Fix command bitmask initialization
net/mlx5: Check for invalid vector index on EQ creation
net/mlx5: HWS, use lock classes for bwc locks
net/mlx5: HWS, don't destroy more bwc queue locks than allocated
net/mlx5: HWS, fixed double free in error flow of definer layout
net/mlx5: HWS, removed wrong access to a number of rules variable
mptcp: pm: fix UaF read in mptcp_pm_nl_rm_addr_or_subflow
net: ethernet: mtk_eth_soc: fix memory corruption during fq dma init
vmxnet3: Fix packet corruption in vmxnet3_xdp_xmit_frame
net: dsa: vsc73xx: fix reception from VLAN-unaware bridges
net: ravb: Only advertise Rx/Tx timestamps if hardware supports it
net: microchip: vcap api: Fix memory leaks in vcap_api_encode_rule_test()
net: phy: mdio-bcm-unimac: Add BCM6846 support
dt-bindings: net: brcm,unimac-mdio: Add bcm6846-mdio
udp: Compute L4 checksum as usual when not segmenting the skb
genetlink: hold RCU in genlmsg_mcast()
net: dsa: mv88e6xxx: Fix the max_vid definition for the MV88E6361
tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().
...
|
|
Not all virtual addresses have physical addresses, such as if they were
vmalloc'd. Just trace the virtual address instead of trying to trace a
physical address. This aligns with the API, and is good enough to
associate dma_alloc with dma_free.
Fixes: 038eb433dc14 ("dma-mapping: add tracing for dma-mapping API calls")
Reported-by: syzbot+b4bfacdec173efaa8567@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/670ebde5.050a0220.d9b66.0154.GAE@google.com/
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Don't mislead the callers of bpf_{sk,msg}_redirect_{map,hash}(): make sure
to immediately and visibly fail the forwarding of unsupported af_vsock
packets.
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20241013-vsock-fixes-for-redir-v2-1-d6577bbfe742@rbox.co
|