Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
"Core locking primitives updates:
- Remove mutex_trylock_recursive() from the API - no users left
- Simplify + constify the futex code a bit
Lockdep updates:
- Teach lockdep about local_lock_t
- Add CONFIG_DEBUG_IRQFLAGS=y debug config option to check for
potentially unsafe IRQ mask restoration patterns. (I.e.
calling raw_local_irq_restore() with IRQs enabled.)
- Add wait context self-tests
- Fix graph lock corner case corrupting internal data structures
- Fix noinstr annotations
LKMM updates:
- Simplify the litmus tests
- Documentation fixes
KCSAN updates:
- Re-enable KCSAN instrumentation in lib/random32.c
Misc fixes:
- Don't branch-trace static label APIs
- DocBook fix
- Remove stale leftover empty file"
* tag 'locking-core-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
checkpatch: Don't check for mutex_trylock_recursive()
locking/mutex: Kill mutex_trylock_recursive()
s390: Use arch_local_irq_{save,restore}() in early boot code
lockdep: Noinstr annotate warn_bogus_irq_restore()
locking/lockdep: Avoid unmatched unlock
locking/rwsem: Remove empty rwsem.h
locking/rtmutex: Add missing kernel-doc markup
futex: Remove unneeded gotos
futex: Change utime parameter to be 'const ... *'
lockdep: report broken irq restoration
jump_label: Do not profile branch annotations
locking: Add Reviewers
locking/selftests: Add local_lock inversion tests
locking/lockdep: Exclude local_lock_t from IRQ inversions
locking/lockdep: Clean up check_redundant() a bit
locking/lockdep: Add a skip() function to __bfs()
locking/lockdep: Mark local_lock_t
locking/selftests: More granular debug_locks_verbose
lockdep/selftest: Add wait context selftests
tools/memory-model: Fix typo in klitmus7 compatibility table
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
"These are the latest RCU updates for v5.12:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide
allocator return addresses to more easily locate bugs. This has a
couple of RCU-related commits, but is mostly MM. Was pulled in with
akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks, which enables
better debugging information and smarter reactions to large numbers
of callbacks.
- The first round of changes to allow CPUs to be runtime switched
from and to callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a
"torture everything" script that runs rcutorture, locktorture,
scftorture, rcuscale, and refscale. Plus does an allmodconfig
build.
- nolibc fixes for the torture tests"
* tag 'core-rcu-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (130 commits)
percpu_ref: Dump mem_dump_obj() info upon reference-count underflow
rcu: Make call_rcu() print mem_dump_obj() info for double-freed callback
mm: Make mem_obj_dump() vmalloc() dumps include start and length
mm: Make mem_dump_obj() handle vmalloc() memory
mm: Make mem_dump_obj() handle NULL and zero-sized pointers
mm: Add mem_dump_obj() to print source of memory block
tools/rcutorture: Fix position of -lgcc in mkinitrd.sh
tools/nolibc: Fix position of -lgcc in the documented example
tools/nolibc: Emit detailed error for missing alternate syscall number definitions
tools/nolibc: Remove incorrect definitions of __ARCH_WANT_*
tools/nolibc: Get timeval, timespec and timezone from linux/time.h
tools/nolibc: Implement poll() based on ppoll()
tools/nolibc: Implement fork() based on clone()
tools/nolibc: Make getpgrp() fall back to getpgid(0)
tools/nolibc: Make dup2() rely on dup3() when available
tools/nolibc: Add the definition for dup()
rcutorture: Add rcutree.use_softirq=0 to RUDE01 and TASKS01
torture: Maintain torture-specific set of CPUs-online books
torture: Clean up after torture-test CPU hotplugging
rcutorture: Make object_debug also double call_rcu() heap object
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
"Time and timer updates:
- Instead of new drivers remove tango, sirf, u300 and atlas drivers
- Add suspend/resume support for microchip pit64b
- The usual fixes, improvements and cleanups here and there"
* tag 'timers-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timens: Delete no-op time_ns_init()
alarmtimer: Update kerneldoc
clocksource/drivers/timer-microchip-pit64b: Add clocksource suspend/resume
clocksource/drivers/prima: Remove sirf prima driver
clocksource/drivers/atlas: Remove sirf atlas driver
clocksource/drivers/tango: Remove tango driver
clocksource/drivers/u300: Remove the u300 driver
dt-bindings: timer: nuvoton: Clarify that interrupt of timer 0 should be specified
clocksource/drivers/davinci: Move pr_fmt() before the includes
clocksource/drivers/efm32: Drop unused timer code
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the irq subsystem:
- The usual new irq chip driver (Realtek RTL83xx)
- Removal of sirfsoc and tango irq chip drivers
- Conversion of the sun6i chip support to hierarchical irq domains
- The usual fixes, improvements and cleanups all over the place"
* tag 'irq-core-2021-02-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/imx: IMX_INTMUX should not default to y, unconditionally
irqchip/loongson-pch-msi: Use bitmap_zalloc() to allocate bitmap
irqchip/csky-mpintc: Prevent selection on unsupported platforms
irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller
dt-bindings: interrupt-controller: Add Realtek RTL838x/RTL839x support
irqchip/ls-extirq: add IRQCHIP_SKIP_SET_WAKE to the irqchip flags
genirq: Use new tasklet API for resend_tasklet
dt-bindings: qcom,pdc: Add compatible for SM8350
dt-bindings: qcom,pdc: Add compatible for SM8250
irqchip/sun6i-r: Add wakeup support
irqchip/sun6i-r: Use a stacked irqchip driver
dt-bindings: irq: sun6i-r: Add a compatible for the H3
dt-bindings: irq: sun6i-r: Split the binding from sun7i-nmi
irqchip/gic-v3: Fix typos in PMR/RPR SCR_EL3.FIQ handling explanation
irqchip: Remove sirfsoc driver
irqchip: Remove sigma tango driver
|
|
Pull core block updates from Jens Axboe:
"Another nice round of removing more code than what is added, mostly
due to Christoph's relentless pursuit of tech debt removal/cleanups.
This pull request contains:
- Two series of BFQ improvements (Paolo, Jan, Jia)
- Block iov_iter improvements (Pavel)
- bsg error path fix (Pan)
- blk-mq scheduler improvements (Jan)
- -EBUSY discard fix (Jan)
- bvec allocation improvements (Ming, Christoph)
- bio allocation and init improvements (Christoph)
- Store bdev pointer in bio instead of gendisk + partno (Christoph)
- Block trace point cleanups (Christoph)
- hard read-only vs read-only split (Christoph)
- Block based swap cleanups (Christoph)
- Zoned write granularity support (Damien)
- Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"
* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
mm: simplify swapdev_block
sd_zbc: clear zone resources for non-zoned case
block: introduce blk_queue_clear_zone_settings()
zonefs: use zone write granularity as block size
block: introduce zone_write_granularity limit
block: use blk_queue_set_zoned in add_partition()
nullb: use blk_queue_set_zoned() to setup zoned devices
nvme: cleanup zone information initialization
block: document zone_append_max_bytes attribute
block: use bi_max_vecs to find the bvec pool
md/raid10: remove dead code in reshape_request
block: mark the bio as cloned in bio_iov_bvec_set
block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
block: remove a layer of indentation in bio_iov_iter_get_pages
block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
block: remove the 1 and 4 vec bvec_slabs entries
block: streamline bvec_alloc
block: factor out a bvec_alloc_gfp helper
block: move struct biovec_slab to bio.c
block: reuse BIO_INLINE_VECS for integrity bvecs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux
Pull oprofile and dcookies removal from Viresh Kumar:
"Remove oprofile and dcookies support
The 'oprofile' user-space tools don't use the kernel OPROFILE support
any more, and haven't in a long time. User-space has been converted to
the perf interfaces.
The dcookies stuff is only used by the oprofile code. Now that
oprofile's support is getting removed from the kernel, there is no
need for dcookies as well.
Remove kernel's old oprofile and dcookies support"
* tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux:
fs: Remove dcookies support
drivers: Remove CONFIG_OPROFILE support
arch: xtensa: Remove CONFIG_OPROFILE support
arch: x86: Remove CONFIG_OPROFILE support
arch: sparc: Remove CONFIG_OPROFILE support
arch: sh: Remove CONFIG_OPROFILE support
arch: s390: Remove CONFIG_OPROFILE support
arch: powerpc: Remove oprofile
arch: powerpc: Stop building and using oprofile
arch: parisc: Remove CONFIG_OPROFILE support
arch: mips: Remove CONFIG_OPROFILE support
arch: microblaze: Remove CONFIG_OPROFILE support
arch: ia64: Remove rest of perfmon support
arch: ia64: Remove CONFIG_OPROFILE support
arch: hexagon: Don't select HAVE_OPROFILE
arch: arc: Remove CONFIG_OPROFILE support
arch: arm: Remove CONFIG_OPROFILE support
arch: alpha: Remove CONFIG_OPROFILE support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ELF compat updates from Al Viro:
"Sanitizing ELF compat support, especially for triarch architectures:
- X32 handling cleaned up
- MIPS64 uses compat_binfmt_elf.c both for O32 and N32 now
- Kconfig side of things regularized
Eventually I hope to have compat_binfmt_elf.c killed, with both native
and compat built from fs/binfmt_elf.c, with -DELF_BITS={64,32} passed
by kbuild, but that's a separate story - not included here"
* 'work.elf-compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
get rid of COMPAT_ELF_EXEC_PAGESIZE
compat_binfmt_elf: don't bother with undef of ELF_ARCH
Kconfig: regularize selection of CONFIG_BINFMT_ELF
mips compat: switch to compat_binfmt_elf.c
mips: don't bother with ELF_CORE_EFLAGS
mips compat: don't bother with ELF_ET_DYN_BASE
mips: KVM_GUEST makes no sense for 64bit builds...
mips: kill unused definitions in binfmt_elf[on]32.c
mips binfmt_elf*32.c: use elfcore-compat.h
x32: make X32, !IA32_EMULATION setups able to execute x32 binaries
[amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
elf_prstatus: collect the common part (everything before pr_reg) into a struct
binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add a new power capping facility allowing aggregate power
constraints to be applied to sets of devices in a distributed manner,
add a new CPU ID to the RAPL power capping driver and improve it, drop
a cpufreq driver belonging to a platform that is not supported any
more, drop two redundant cpufreq driver flags, update cpufreq drivers
(intel_pstate, brcmstb-avs, qcom-hw), update the operating performance
points (OPP) framework (code cleanups, new helpers, devfreq-related
modifications), clean up devfreq, extend the PM clock layer, update
the cpupower utility and make assorted janitorial changes.
Specifics:
- Add new power capping facility called DTPM (Dynamic Thermal Power
Management), based on the existing power capping framework, to
allow aggregate power constraints to be applied to sets of devices
in a distributed manner, along with a CPU backend driver based on
the Energy Model (Daniel Lezcano, Dan Carpenter, Colin Ian King).
- Add AlderLake Mobile support to the Intel RAPL power capping driver
and make it use the topology interface when laying out the system
topology (Zhang Rui, Yunfeng Ye).
- Drop the cpufreq tango driver belonging to a platform that is not
supported any more (Arnd Bergmann).
- Drop the redundant CPUFREQ_STICKY and CPUFREQ_PM_NO_WARN cpufreq
driver flags (Viresh Kumar).
- Update cpufreq drivers:
* Fix max CPU frequency discovery in the intel_pstate driver and
make janitorial changes in it (Chen Yu, Rafael Wysocki, Nigel
Christian).
* Fix resource leaks in the brcmstb-avs-cpufreq driver (Christophe
JAILLET).
* Make the tegra20 driver use the resource-managed API (Dmitry
Osipenko).
* Enable boost support in the qcom-hw driver (Shawn Guo).
- Update the operating performance points (OPP) framework:
* Clean up the OPP core (Dmitry Osipenko, Viresh Kumar).
* Extend the OPP API by adding new helpers to it (Dmitry Osipenko,
Viresh Kumar).
* Allow required OPPs to be used for devfreq devices and update
the devfreq governor code accordingly (Saravana Kannan).
* Prepare the framework for introducing new dev_pm_opp_set_opp()
helper (Viresh Kumar).
* Drop dev_pm_opp_set_bw() and update related drivers (Viresh
Kumar).
* Allow lazy linking of required-OPPs (Viresh Kumar).
- Simplify and clean up devfreq somewhat (Lukasz Luba, Yang Li,
Pierre Kuo).
- Update the generic power domains (genpd) framework:
* Use device's next wakeup to determine domain idle state (Lina
Iyer).
* Improve initialization and debug (Dmitry Osipenko).
* Simplify computations (Abaci Team).
- Make janitorial changes in the core code handling system sleep and
PM-runtime (Bhaskar Chowdhury, Bjorn Helgaas, Rikard Falkeborn,
Zqiang).
- Update the MAINTAINERS entry for the exynos cpuidle driver and drop
DEBUG definition from intel_idle (Krzysztof Kozlowski, Tom Rix).
- Extend the PM clock layer to cover clocks that must sleep (Nicolas
Pitre).
- Update the cpupower utility:
* Update cpupower command, add support for AMD family 0x19 and
clean up the code to remove many of the family checks to make
future family updates easier (Nathan Fontenot, Robert Richter).
* Add Makefile dependencies for install targets to allow building
cpupower in parallel rather than serially (Ivan Babrou).
- Make janitorial changes in power management Kconfig (Lukasz Luba)"
* tag 'pm-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
MAINTAINERS: cpuidle: exynos: include header in file pattern
powercap: intel_rapl: Use topology interface in rapl_init_domains()
powercap: intel_rapl: Use topology interface in rapl_add_package()
PM: sleep: Constify static struct attribute_group
PM: Kconfig: remove unneeded "default n" options
PM: EM: update Kconfig description and drop "default n" option
cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN
cpufreq: Remove CPUFREQ_STICKY flag
PM / devfreq: Add required OPPs support to passive governor
PM / devfreq: Cache OPP table reference in devfreq
OPP: Add function to look up required OPP's for a given OPP
PM / devfreq: rk3399_dmc: Remove unneeded semicolon
opp: Replace ENOTSUPP with EOPNOTSUPP
opp: Fix "foo * bar" should be "foo *bar"
opp: Don't ignore clk_get() errors other than -ENOENT
opp: Update bandwidth requirements based on scaling up/down
opp: Allow lazy-linking of required-opps
opp: Remove dev_pm_opp_set_bw()
devfreq: tegra30: Migrate to dev_pm_opp_set_opp()
drm: msm: Migrate to dev_pm_opp_set_opp()
...
|
|
Pull networking updates from David Miller:
"Here is what we have this merge window:
1) Support SW steering for mlx5 Connect-X6Dx, from Yevgeny Kliteynik.
2) Add RSS multi group support to octeontx2-pf driver, from Geetha
Sowjanya.
3) Add support for KS8851 PHY. From Marek Vasut.
4) Add support for GarfieldPeak bluetooth controller from Kiran K.
5) Add support for half-duplex tcan4x5x can controllers.
6) Add batch skb rx processing to bcrm63xx_enet, from Sieng Piaw
Liew.
7) Rework RX port offload infrastructure, particularly wrt, UDP
tunneling, from Jakub Kicinski.
8) Add BCM72116 PHY support, from Florian Fainelli.
9) Remove Dsa specific notifiers, they are unnecessary. From Vladimir
Oltean.
10) Add support for picosecond rx delay in dwmac-meson8b chips. From
Martin Blumenstingl.
11) Support TSO on xfrm interfaces from Eyal Birger.
12) Add support for MP_PRIO to mptcp stack, from Geliang Tang.
13) Support BCM4908 integrated switch, from Rafał Miłecki.
14) Support for directly accessing kernel module variables via module
BTF info, from Andrii Naryiko.
15) Add DASH (esktop and mobile Architecture for System Hardware)
support to r8169 driver, from Heiner Kallweit.
16) Add rx vlan filtering to dpaa2-eth, from Ionut-robert Aron.
17) Add support for 100 base0x SFP devices, from Bjarni Jonasson.
18) Support link aggregation in DSA, from Tobias Waldekranz.
19) Support for bitwidse atomics in bpf, from Brendan Jackman.
20) SmartEEE support in at803x driver, from Russell King.
21) Add support for flow based tunneling to GTP, from Pravin B Shelar.
22) Allow arbitrary number of interconnrcts in ipa, from Alex Elder.
23) TLS RX offload for bonding, from Tariq Toukan.
24) RX decap offklload support in mac80211, from Felix Fietkou.
25) devlink health saupport in octeontx2-af, from George Cherian.
26) Add TTL attr to SCM_TIMESTAMP_OPT_STATS, from Yousuk Seung
27) Delegated actionss support in mptcp, from Paolo Abeni.
28) Support receive timestamping when doin zerocopy tcp receive. From
Arjun Ray.
29) HTB offload support for mlx5, from Maxim Mikityanskiy.
30) UDP GRO forwarding, from Maxim Mikityanskiy.
31) TAPRIO offloading in dsa hellcreek driver, from Kurt Kanzenbach.
32) Weighted random twos choice algorithm for ipvs, from Darby Payne.
33) Fix netdev registration deadlock, from Johannes Berg.
34) Various conversions to new tasklet api, from EmilRenner Berthing.
35) Bulk skb allocations in veth, from Lorenzo Bianconi.
36) New ethtool interface for lane setting, from Danielle Ratson.
37) Offload failiure notifications for routes, from Amit Cohen.
38) BCM4908 support, from Rafał Miłecki.
39) Support several new iwlwifi chips, from Ihab Zhaika.
40) Flow drector support for ipv6 in i40e, from Przemyslaw Patynowski.
41) Support for mhi prrotocols, from Loic Poulain.
42) Optimize bpf program stats.
43) Implement RFC6056, for better port randomization, from Eric
Dumazet.
44) hsr tag offloading support from George McCollister.
45) Netpoll support in qede, from Bhaskar Upadhaya.
46) 2005/400g speed support in bonding 3ad mode, from Nikolay
Aleksandrov.
47) Netlink event support in mptcp, from Florian Westphal.
48) Better skbuff caching, from Alexander Lobakin.
49) MRP (Media Redundancy Protocol) offloading in DSA and a few
drivers, from Horatiu Vultur.
50) mqprio saupport in mvneta, from Maxime Chevallier.
51) Remove of_phy_attach, no longer needed, from Florian Fainelli"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1766 commits)
octeontx2-pf: Fix otx2_get_fecparam()
cteontx2-pf: cn10k: Prevent harmless double shift bugs
net: stmmac: Add PCI bus info to ethtool driver query output
ptp: ptp_clockmatrix: clean-up - parenthesis around a == b are unnecessary
ptp: ptp_clockmatrix: Simplify code - remove unnecessary `err` variable.
ptp: ptp_clockmatrix: Coding style - tighten vertical spacing.
ptp: ptp_clockmatrix: Clean-up dev_*() messages.
ptp: ptp_clockmatrix: Remove unused header declarations.
ptp: ptp_clockmatrix: Add alignment of 1 PPS to idtcm_perout_enable.
ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.
net: stmmac: dwmac-sun8i: Add a shutdown callback
net: stmmac: dwmac-sun8i: Minor probe function cleanup
net: stmmac: dwmac-sun8i: Use reset_control_reset
net: stmmac: dwmac-sun8i: Remove unnecessary PHY power check
net: stmmac: dwmac-sun8i: Return void from PHY unpower
r8169: use macro pm_ptr
net: mdio: Remove of_phy_attach()
net: mscc: ocelot: select PACKING in the Kconfig
net: re-solve some conflicts after net -> net-next merge
net: dsa: tag_rtl4_a: Support also egress tags
...
|
|
|
|
Daniel Borkmann says:
====================
pull-request: bpf-next 2021-02-16
The following pull-request contains BPF updates for your *net-next* tree.
There's a small merge conflict between 7eeba1706eba ("tcp: Add receive timestamp
support for receive zerocopy.") from net-next tree and 9cacf81f8161 ("bpf: Remove
extra lock_sock for TCP_ZEROCOPY_RECEIVE") from bpf-next tree. Resolve as follows:
[...]
lock_sock(sk);
err = tcp_zerocopy_receive(sk, &zc, &tss);
err = BPF_CGROUP_RUN_PROG_GETSOCKOPT_KERN(sk, level, optname,
&zc, &len, err);
release_sock(sk);
[...]
We've added 116 non-merge commits during the last 27 day(s) which contain
a total of 156 files changed, 5662 insertions(+), 1489 deletions(-).
The main changes are:
1) Adds support of pointers to types with known size among global function
args to overcome the limit on max # of allowed args, from Dmitrii Banshchikov.
2) Add bpf_iter for task_vma which can be used to generate information similar
to /proc/pid/maps, from Song Liu.
3) Enable bpf_{g,s}etsockopt() from all sock_addr related program hooks. Allow
rewriting bind user ports from BPF side below the ip_unprivileged_port_start
range, both from Stanislav Fomichev.
4) Prevent recursion on fentry/fexit & sleepable programs and allow map-in-map
as well as per-cpu maps for the latter, from Alexei Starovoitov.
5) Add selftest script to run BPF CI locally. Also enable BPF ringbuffer
for sleepable programs, both from KP Singh.
6) Extend verifier to enable variable offset read/write access to the BPF
program stack, from Andrei Matei.
7) Improve tc & XDP MTU handling and add a new bpf_check_mtu() helper to
query device MTU from programs, from Jesper Dangaard Brouer.
8) Allow bpf_get_socket_cookie() helper also be called from [sleepable] BPF
tracing programs, from Florent Revest.
9) Extend x86 JIT to pad JMPs with NOPs for helping image to converge when
otherwise too many passes are required, from Gary Lin.
10) Verifier fixes on atomics with BPF_FETCH as well as function-by-function
verification both related to zero-extension handling, from Ilya Leoshkevich.
11) Better kernel build integration of resolve_btfids tool, from Jiri Olsa.
12) Batch of AF_XDP selftest cleanups and small performance improvement
for libbpf's xsk map redirect for newer kernels, from Björn Töpel.
13) Follow-up BPF doc and verifier improvements around atomics with
BPF_FETCH, from Brendan Jackman.
14) Permit zero-sized data sections e.g. if ELF .rodata section contains
read-only data from local variables, from Yonghong Song.
15) veth driver skb bulk-allocation for ndo_xdp_xmit, from Lorenzo Bianconi.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
test_global_func4 fails on s390 as reported by Yauheni in [1].
The immediate problem is that the zext code includes the instruction,
whose result needs to be zero-extended, into the zero-extension
patchlet, and if this instruction happens to be a branch, then its
delta is not adjusted. As a result, the verifier rejects the program
later.
However, according to [2], as far as the verifier's algorithm is
concerned and as specified by the insn_no_def() function, branching
insns do not define anything. This includes call insns, even though
one might argue that they define %r0.
This means that the real problem is that zero extension kicks in at
all. This happens because clear_caller_saved_regs() sets BPF_REG_0's
subreg_def after global function calls. This can be fixed in many
ways; this patch mimics what helper function call handling already
does.
[1] https://lore.kernel.org/bpf/20200903140542.156624-1-yauheni.kaliuta@redhat.com/
[2] https://lore.kernel.org/bpf/CAADnVQ+2RPKcftZw8d+B1UwB35cpBhpF5u3OocNh90D9pETPwg@mail.gmail.com/
Fixes: 51c39bb1d5d1 ("bpf: Introduce function-by-function verification")
Reported-by: Yauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210212040408.90109-1-iii@linux.ibm.com
|
|
* powercap:
powercap: intel_rapl: Use topology interface in rapl_init_domains()
powercap: intel_rapl: Use topology interface in rapl_add_package()
powercap/intel_rapl: add support for AlderLake Mobile
powercap/drivers/dtpm: Fix size of object being allocated
powercap/drivers/dtpm: Fix an IS_ERR() vs NULL check
powercap/drivers/dtpm: Fix some missing unlock bugs
powercap/drivers/dtpm: Fix a double shift bug
powercap/drivers/dtpm: Fix __udivdi3 and __aeabi_uldivmod unresolved symbols
powercap/drivers/dtpm: Add CPU energy model based support
powercap/drivers/dtpm: Add API for dynamic thermal power management
Documentation/powercap/dtpm: Add documentation for dtpm
units: Add Watt units
* pm-misc:
PM: Kconfig: remove unneeded "default n" options
PM: EM: update Kconfig description and drop "default n" option
|
|
* pm-sleep:
PM: sleep: Constify static struct attribute_group
PM: sleep: Use dev_printk() when possible
PM: sleep: No need to check PF_WQ_WORKER in thaw_kernel_threads()
* pm-core:
PM: runtime: Fix typos and grammar
PM: runtime: Fix resposible -> responsible in runtime.c
* pm-domains:
PM: domains: Simplify the calculation of variables
PM: domains: Add "performance" column to debug summary
PM: domains: Make of_genpd_add_subdomain() return -EPROBE_DEFER
PM: domains: Make set_performance_state() callback optional
PM: domains: use device's next wakeup to determine domain idle state
PM: domains: inform PM domain of a device's next wakeup
* pm-clk:
PM: clk: make PM clock layer compatible with clocks that must sleep
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"Two cgroup fixes:
- fix a NULL deref when trying to poll PSI in the root cgroup
- fix confusing controller parsing corner case when mounting cgroup
v1 hierarchies
And doc / maintainer file updates"
* 'for-5.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: update PSI file description in docs
cgroup: fix psi monitor for root cgroup
MAINTAINERS: Update my email address
MAINTAINERS: Remove stale URLs for cpuset
cgroup-v1: add disabled controller check in cgroup1_parse_param()
|
|
Add an ability to pass a pointer to a type with known size in arguments
of a global function. Such pointers may be used to overcome the limit on
the maximum number of arguments, avoid expensive and tricky workarounds
and to have multiple output arguments.
A referenced type may contain pointers but indirect access through them
isn't supported.
The implementation consists of two parts. If a global function has an
argument that is a pointer to a type with known size then:
1) In btf_check_func_arg_match(): check that the corresponding
register points to NULL or to a valid memory region that is large enough
to contain the expected argument's type.
2) In btf_prepare_func_args(): set the corresponding register type to
PTR_TO_MEM_OR_NULL and its size to the size of the expected type.
Only global functions are supported because allowance of pointers for
static functions might break validation. Consider the following
scenario. A static function has a pointer argument. A caller passes
pointer to its stack memory. Because the callee can change referenced
memory verifier cannot longer assume any particular slot type of the
caller's stack memory hence the slot type is changed to SLOT_MISC. If
there is an operation that relies on slot type other than SLOT_MISC then
verifier won't be able to infer safety of the operation.
When verifier sees a static function that has a pointer argument
different from PTR_TO_CTX then it skips arguments check and continues
with "inline" validation with more information available. The operation
that relies on the particular slot type now succeeds.
Because global functions were not allowed to have pointer arguments
different from PTR_TO_CTX it's not possible to break existing and valid
code.
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-4-me@ubique.spb.ru
|
|
Extract conversion from a register's nullable type to a type with a
value. The helper will be used in mark_ptr_not_null_reg().
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-3-me@ubique.spb.ru
|
|
Using "reg" for an array of bpf_reg_state and "reg[i + 1]" for an
individual bpf_reg_state is error-prone and verbose. Use "regs" for the
former and "reg" for the latter as other code nearby does.
Signed-off-by: Dmitrii Banshchikov <me@ubique.spb.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210212205642.620788-2-me@ubique.spb.ru
|
|
Recently noticed that when mod32 with a known src reg of 0 is performed,
then the dst register is 32-bit truncated in verifier:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = 0
1: R0_w=inv0 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b7) r1 = -1
2: R0_w=inv0 R1_w=inv-1 R10=fp0
2: (b4) w2 = -1
3: R0_w=inv0 R1_w=inv-1 R2_w=inv4294967295 R10=fp0
3: (9c) w1 %= w0
4: R0_w=inv0 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
4: (b7) r0 = 1
5: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
5: (1d) if r1 == r2 goto pc+1
R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
6: R0_w=inv1 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
6: (b7) r0 = 2
7: R0_w=inv2 R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2_w=inv4294967295 R10=fp0
7: (95) exit
7: R0=inv1 R1=inv(id=0,umin_value=4294967295,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R2=inv4294967295 R10=fp0
7: (95) exit
However, as a runtime result, we get 2 instead of 1, meaning the dst
register does not contain (u32)-1 in this case. The reason is fairly
straight forward given the 0 test leaves the dst register as-is:
# ./bpftool p d x i 23
0: (b7) r0 = 0
1: (b7) r1 = -1
2: (b4) w2 = -1
3: (16) if w0 == 0x0 goto pc+1
4: (9c) w1 %= w0
5: (b7) r0 = 1
6: (1d) if r1 == r2 goto pc+1
7: (b7) r0 = 2
8: (95) exit
This was originally not an issue given the dst register was marked as
completely unknown (aka 64 bit unknown). However, after 468f6eafa6c4
("bpf: fix 32-bit ALU op verification") the verifier casts the register
output to 32 bit, and hence it becomes 32 bit unknown. Note that for
the case where the src register is unknown, the dst register is marked
64 bit unknown. After the fix, the register is truncated by the runtime
and the test passes:
# ./bpftool p d x i 23
0: (b7) r0 = 0
1: (b7) r1 = -1
2: (b4) w2 = -1
3: (16) if w0 == 0x0 goto pc+2
4: (9c) w1 %= w0
5: (05) goto pc+1
6: (bc) w1 = w1
7: (b7) r0 = 1
8: (1d) if r1 == r2 goto pc+1
9: (b7) r0 = 2
10: (95) exit
Semantics also match with {R,W}x mod{64,32} 0 -> {R,W}x. Invalid div
has always been {R,W}x div{64,32} 0 -> 0. Rewrites are as follows:
mod32: mod64:
(16) if w0 == 0x0 goto pc+2 (15) if r0 == 0x0 goto pc+1
(9c) w1 %= w0 (9f) r1 %= r0
(05) goto pc+1
(bc) w1 = w1
Fixes: 468f6eafa6c4 ("bpf: fix 32-bit ALU op verification")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
The devmap bulk queue is allocated with GFP_ATOMIC and the allocation
may fail if there is no available space in existing percpu pool.
Since commit 75ccae62cb8d42 ("xdp: Move devmap bulk queue into struct net_device")
moved the bulk queue allocation to NETDEV_REGISTER callback, whose context
is allowed to sleep, use GFP_KERNEL instead of GFP_ATOMIC to let percpu
allocator extend the pool when needed and avoid possible failure of netdev
registration.
As the required alignment is natural, we can simply use alloc_percpu().
Fixes: 75ccae62cb8d42 ("xdp: Move devmap bulk queue into struct net_device")
Signed-off-by: Jun'ichi Nomura <junichi.nomura@nec.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210209082451.GA44021@jeru.linux.bs1.fc.nec.co.jp
|
|
Commit 15d83c4d7cef ("bpf: Allow loading of a bpf_iter program")
cached btf_id in struct bpf_iter_target_info so later on
if it can be checked cheaply compared to checking registered names.
syzbot found a bug that uninitialized value may occur to
bpf_iter_target_info->btf_id. This is because we allocated
bpf_iter_target_info structure with kmalloc and never initialized
field btf_id afterwards. This uninitialized btf_id is typically
compared to a u32 bpf program func proto btf_id, and the chance
of being equal is extremely slim.
This patch fixed the issue by using kzalloc which will also
prevent future likely instances due to adding new fields.
Fixes: 15d83c4d7cef ("bpf: Allow loading of a bpf_iter program")
Reported-by: syzbot+580f4f2a272e452d55cb@syzkaller.appspotmail.com
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210212005926.2875002-1-yhs@fb.com
|
|
task_file and task_vma iter programs have access to file->f_path. Enable
bpf_d_path to print paths of these file.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210212183107.50963-3-songliubraving@fb.com
|
|
Introduce task_vma bpf_iter to print memory information of a process. It
can be used to print customized information similar to /proc/<pid>/maps.
Current /proc/<pid>/maps and /proc/<pid>/smaps provide information of
vma's of a process. However, these information are not flexible enough to
cover all use cases. For example, if a vma cover mixed 2MB pages and 4kB
pages (x86_64), there is no easy way to tell which address ranges are
backed by 2MB pages. task_vma solves the problem by enabling the user to
generate customize information based on the vma (and vma->vm_mm,
vma->vm_file, etc.).
To access the vma safely in the BPF program, task_vma iterator holds
target mmap_lock while calling the BPF program. If the mmap_lock is
contended, task_vma unlocks mmap_lock between iterations to unblock the
writer(s). This lock contention avoidance mechanism is similar to the one
used in show_smaps_rollup().
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210212183107.50963-2-songliubraving@fb.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix buffer overflow in trace event filter.
It was reported that if an trace event was larger than a page and was
filtered, that it caused memory corruption. The reason is that
filtered events first go into a buffer to test the filter before being
written into the ring buffer. Unfortunately, this write did not check
the size"
* tag 'trace-v5.11-rc7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Check length before giving out the filter buffer
|
|
The only usage of suspend_attr_group is to put its address in an
array of pointers to const attribute_group structs.
Make it const to allow the compiler to put it into read-only memory.
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Remove "default n" options. If the "default" line is removed, it
defaults to 'n'.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Energy Model supports now other devices like GPUs, DSPs, not only CPUs.
Thus, update the description in the config option. Remove also unneeded
"default n". If the "default" line is removed, it defaults to 'n'.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- kfree_rcu() updates: Addition of mem_dump_obj() to provide allocator return
addresses to more easily locate bugs. This has a couple of RCU-related commits,
but is mostly MM. Was pulled in with akpm's agreement.
- Per-callback-batch tracking of numbers of callbacks,
which enables better debugging information and smarter
reactions to large numbers of callbacks.
- The first round of changes to allow CPUs to be runtime switched from and to
callback-offloaded state.
- CONFIG_PREEMPT_RT-related changes.
- RCU CPU stall warning updates.
- Addition of polling grace-period APIs for SRCU.
- Torture-test and torture-test scripting updates, including a "torture everything"
script that runs rcutorture, locktorture, scftorture, rcuscale, and refscale.
Plus does an allmodconfig build.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into locking/core
Pull KCSAN updates from Paul E. McKenney:
"Kernel concurrency sanitizer (KCSAN) updates from Marco Elver."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
All 32-bit variants of BPF_FETCH (add, and, or, xor, xchg, cmpxchg)
define a 32-bit subreg and thus have zext_dst set. Their encoding,
however, uses dst_reg field as a base register, which causes
opt_subreg_zext_lo32_rnd_hi32() to zero-extend said base register
instead of the one the insn really defines (r0 or src_reg).
Fix by properly choosing a register being defined, similar to how
check_atomic() already does that.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210210204502.83429-1-iii@linux.ibm.com
|
|
bpf_prog_realloc copies contents of struct bpf_prog.
The pointers have to be cleared before freeing old struct.
Reported-by: Ilya Leoshkevich <iii@linux.ibm.com>
Fixes: 700d4796ef59 ("bpf: Optimize program stats")
Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
This needs a new helper that:
- can work in a sleepable context (using sock_gen_cookie)
- takes a struct sock pointer and checks that it's not NULL
Signed-off-by: Florent Revest <revest@chromium.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210111406.785541-2-revest@chromium.org
|
|
When filters are used by trace events, a page is allocated on each CPU and
used to copy the trace event fields to this page before writing to the ring
buffer. The reason to use the filter and not write directly into the ring
buffer is because a filter may discard the event and there's more overhead
on discarding from the ring buffer than the extra copy.
The problem here is that there is no check against the size being allocated
when using this page. If an event asks for more than a page size while being
filtered, it will get only a page, leading to the caller writing more that
what was allocated.
Check the length of the request, and if it is more than PAGE_SIZE minus the
header default back to allocating from the ring buffer directly. The ring
buffer may reject the event if its too big anyway, but it wont overflow.
Link: https://lore.kernel.org/ath10k/1612839593-2308-1-git-send-email-wgong@codeaurora.org/
Cc: stable@vger.kernel.org
Fixes: 0fc1b09ff1ff4 ("tracing: Use temp buffer when filtering events")
Reported-by: Wen Gong <wgong@codeaurora.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
Since sleepable programs are now executing under migrate_disable
the per-cpu maps are safe to use.
The map-in-map were ok to use in sleepable from the time sleepable
progs were introduced.
Note that non-preallocated maps are still not safe, since there is
no rcu_read_lock yet in sleepable programs and dynamically allocated
map elements are relying on rcu protection. The sleepable programs
have rcu_read_lock_trace instead. That limitation will be addresses
in the future.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-9-alexei.starovoitov@gmail.com
|
|
Add per-program counter for number of times recursion prevention mechanism
was triggered and expose it via show_fdinfo and bpf_prog_info.
Teach bpftool to print it.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-7-alexei.starovoitov@gmail.com
|
|
Since both sleepable and non-sleepable programs execute under migrate_disable
add recursion prevention mechanism to both types of programs when they're
executed via bpf trampoline.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-5-alexei.starovoitov@gmail.com
|
|
Since sleepable programs don't migrate from the cpu the excution stats can be
computed for them as well. Reuse the same infrastructure for both sleepable and
non-sleepable programs.
run_cnt -> the number of times the program was executed.
run_time_ns -> the program execution time in nanoseconds including the
off-cpu time when the program was sleeping.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-4-alexei.starovoitov@gmail.com
|
|
In older non-RT kernels migrate_disable() was the same as preempt_disable().
Since commit 74d862b682f5 ("sched: Make migrate_disable/enable() independent of RT")
migrate_disable() is real and doesn't prevent sleeping.
Running sleepable programs with migration disabled allows to add support for
program stats and per-cpu maps later.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-3-alexei.starovoitov@gmail.com
|
|
Move bpf_prog_stats from prog->aux into prog to avoid one extra load
in critical path of program execution.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210210033634.62081-2-alexei.starovoitov@gmail.com
|
|
For double-checked locking in bpf_common_lru_push_free(), node->type is
read outside the critical section and then re-checked under the lock.
However, concurrent writes to node->type result in data races.
For example, the following concurrent access was observed by KCSAN:
write to 0xffff88801521bc22 of 1 bytes by task 10038 on cpu 1:
__bpf_lru_node_move_in kernel/bpf/bpf_lru_list.c:91
__local_list_flush kernel/bpf/bpf_lru_list.c:298
...
read to 0xffff88801521bc22 of 1 bytes by task 10043 on cpu 0:
bpf_common_lru_push_free kernel/bpf/bpf_lru_list.c:507
bpf_lru_push_free kernel/bpf/bpf_lru_list.c:555
...
Fix the data races where node->type is read outside the critical section
(for double-checked locking) by marking the access with READ_ONCE() as
well as ensuring the variable is only accessed once.
Fixes: 3a08c2fd7634 ("bpf: LRU List")
Reported-by: syzbot+3536db46dfa58c573458@syzkaller.appspotmail.com
Reported-by: syzbot+516acdb03d3e27d91bcd@syzkaller.appspotmail.com
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20210209112701.3341724-1-elver@google.com
|
|
|
|
Pull networking fixes from David Miller:
"Another pile of networing fixes:
1) ath9k build error fix from Arnd Bergmann
2) dma memory leak fix in mediatec driver from Lorenzo Bianconi.
3) bpf int3 kprobe fix from Alexei Starovoitov.
4) bpf stackmap integer overflow fix from Bui Quang Minh.
5) Add usb device ids for Cinterion MV31 to qmi_qwwan driver, from
Christoph Schemmel.
6) Don't update deleted entry in xt_recent netfilter module, from
Jazsef Kadlecsik.
7) Use after free in nftables, fix from Pablo Neira Ayuso.
8) Header checksum fix in flowtable from Sven Auhagen.
9) Validate user controlled length in qrtr code, from Sabyrzhan
Tasbolatov.
10) Fix race in xen/netback, from Juergen Gross,
11) New device ID in cxgb4, from Raju Rangoju.
12) Fix ring locking in rxrpc release call, from David Howells.
13) Don't return LAPB error codes from x25_open(), from Xie He.
14) Missing error returns in gsi_channel_setup() from Alex Elder.
15) Get skb_copy_and_csum_datagram working properly with odd segment
sizes, from Willem de Bruijn.
16) Missing RFS/RSS table init in enetc driver, from Vladimir Oltean.
17) Do teardown on probe failure in DSA, from Vladimir Oltean.
18) Fix compilation failures of txtimestamp selftest, from Vadim
Fedorenko.
19) Limit rx per-napi gro queue size to fix latency regression, from
Eric Dumazet.
20) dpaa_eth xdp fixes from Camelia Groza.
21) Missing txq mode update when switching CBS off, in stmmac driver,
from Mohammad Athari Bin Ismail.
22) Failover pending logic fix in ibmvnic driver, from Sukadev
Bhattiprolu.
23) Null deref fix in vmw_vsock, from Norbert Slusarek.
24) Missing verdict update in xdp paths of ena driver, from Shay
Agroskin.
25) seq_file iteration fix in sctp from Neil Brown.
26) bpf 32-bit src register truncation fix on div/mod, from Daniel
Borkmann.
27) Fix jmp32 pruning in bpf verifier, from Daniel Borkmann.
28) Fix locking in vsock_shutdown(), from Stefano Garzarella.
29) Various missing index bound checks in hns3 driver, from Yufeng Mo.
30) Flush ports on .phylink_mac_link_down() in dsa felix driver, from
Vladimir Oltean.
31) Don't mix up stp and mrp port states in bridge layer, from Horatiu
Vultur.
32) Fix locking during netif_tx_disable(), from Edwin Peer"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
bpf: Fix 32 bit src register truncation on div/mod
bpf: Fix verifier jmp32 pruning decision logic
bpf: Fix verifier jsgt branch analysis on max bound
vsock: fix locking in vsock_shutdown()
net: hns3: add a check for index in hclge_get_rss_key()
net: hns3: add a check for tqp_index in hclge_get_ring_chain_from_mbx()
net: hns3: add a check for queue_id in hclge_reset_vf_queue()
net: dsa: felix: implement port flushing on .phylink_mac_link_down
switchdev: mrp: Remove SWITCHDEV_ATTR_ID_MRP_PORT_STAT
bridge: mrp: Fix the usage of br_mrp_port_switchdev_set_state
net: watchdog: hold device global xmit lock during tx disable
netfilter: nftables: relax check for stateful expressions in set definition
netfilter: conntrack: skip identical origin tuple in same zone only
vsock/virtio: update credit only if socket is not closed
net: fix iteration for sctp transport seq_files
net: ena: Update XDP verdict upon failure
net/vmw_vsock: improve locking in vsock_connect_timeout()
net/vmw_vsock: fix NULL pointer dereference
ibmvnic: Clear failover_pending if unable to schedule
net: stmmac: set TxQ mode back to DCB after disabling CBS
...
|
|
Before this patch, variable offset access to the stack was dissalowed
for regular instructions, but was allowed for "indirect" accesses (i.e.
helpers). This patch removes the restriction, allowing reading and
writing to the stack through stack pointers with variable offsets. This
makes stack-allocated buffers more usable in programs, and brings stack
pointers closer to other types of pointers.
The motivation is being able to use stack-allocated buffers for data
manipulation. When the stack size limit is sufficient, allocating
buffers on the stack is simpler than per-cpu arrays, or other
alternatives.
In unpriviledged programs, variable-offset reads and writes are
disallowed (they were already disallowed for the indirect access case)
because the speculative execution checking code doesn't support them.
Additionally, when writing through a variable-offset stack pointer, if
any pointers are in the accessible range, there's possilibities of later
leaking pointers because the write cannot be tracked precisely.
Writes with variable offset mark the whole range as initialized, even
though we don't know which stack slots are actually written. This is in
order to not reject future reads to these slots. Note that this doesn't
affect writes done through helpers; like before, helpers need the whole
stack range to be initialized to begin with.
All the stack slots are in range are considered scalars after the write;
variable-offset register spills are not tracked.
For reads, all the stack slots in the variable range needs to be
initialized (but see above about what writes do), otherwise the read is
rejected. All register spilled in stack slots that might be read are
marked as having been read, however reads through such pointers don't do
register filling; the target register will always be either a scalar or
a constant zero.
Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20210207011027.676572-2-andreimatei1@gmail.com
|
|
There are not users of mutex_trylock_recursive() in tree as of
v5.11-rc7.
Remove it.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210210085248.219210-2-bigeasy@linutronix.de
|
|
vmlinux.o: warning: objtool: lock_is_held_type()+0x107: call to warn_bogus_irq_restore() leaves .noinstr.text section
As per the general rule that WARNs are allowed to violate noinstr to
get out, annotate it away.
Fixes: 997acaf6b4b5 ("lockdep: report broken irq restoration")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Link: https://lkml.kernel.org/r/YCKyYg53mMp4E7YI@hirez.programming.kicks-ass.net
|
|
While reviewing a different fix, John and I noticed an oddity in one of the
BPF program dumps that stood out, for example:
# bpftool p d x i 13
0: (b7) r0 = 808464450
1: (b4) w4 = 808464432
2: (bc) w0 = w0
3: (15) if r0 == 0x0 goto pc+1
4: (9c) w4 %= w0
[...]
In line 2 we noticed that the mov32 would 32 bit truncate the original src
register for the div/mod operation. While for the two operations the dst
register is typically marked unknown e.g. from adjust_scalar_min_max_vals()
the src register is not, and thus verifier keeps tracking original bounds,
simplified:
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = -1
1: R0_w=invP-1 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b7) r1 = -1
2: R0_w=invP-1 R1_w=invP-1 R10=fp0
2: (3c) w0 /= w1
3: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP-1 R10=fp0
3: (77) r1 >>= 32
4: R0_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1_w=invP4294967295 R10=fp0
4: (bf) r0 = r1
5: R0_w=invP4294967295 R1_w=invP4294967295 R10=fp0
5: (95) exit
processed 6 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Runtime result of r0 at exit is 0 instead of expected -1. Remove the
verifier mov32 src rewrite in div/mod and replace it with a jmp32 test
instead. After the fix, we result in the following code generation when
having dividend r1 and divisor r6:
div, 64 bit: div, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8
1: (b7) r1 = 8 1: (b7) r1 = 8
2: (55) if r6 != 0x0 goto pc+2 2: (56) if w6 != 0x0 goto pc+2
3: (ac) w1 ^= w1 3: (ac) w1 ^= w1
4: (05) goto pc+1 4: (05) goto pc+1
5: (3f) r1 /= r6 5: (3c) w1 /= w6
6: (b7) r0 = 0 6: (b7) r0 = 0
7: (95) exit 7: (95) exit
mod, 64 bit: mod, 32 bit:
0: (b7) r6 = 8 0: (b7) r6 = 8
1: (b7) r1 = 8 1: (b7) r1 = 8
2: (15) if r6 == 0x0 goto pc+1 2: (16) if w6 == 0x0 goto pc+1
3: (9f) r1 %= r6 3: (9c) w1 %= w6
4: (b7) r0 = 0 4: (b7) r0 = 0
5: (95) exit 5: (95) exit
x86 in particular can throw a 'divide error' exception for div
instruction not only for divisor being zero, but also for the case
when the quotient is too large for the designated register. For the
edx:eax and rdx:rax dividend pair it is not an issue in x86 BPF JIT
since we always zero edx (rdx). Hence really the only protection
needed is against divisor being zero.
Fixes: 68fda450a7df ("bpf: fix 32-bit divide by zero")
Co-developed-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
Anatoly has been fuzzing with kBdysch harness and reported a hang in
one of the outcomes:
func#0 @0
0: R1=ctx(id=0,off=0,imm=0) R10=fp0
0: (b7) r0 = 808464450
1: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R10=fp0
1: (b4) w4 = 808464432
2: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP808464432 R10=fp0
2: (9c) w4 %= w0
3: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0
3: (66) if w4 s> 0x30303030 goto pc+0
R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
4: (7f) r0 >>= r0
5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff),s32_max_value=808464432) R10=fp0
5: (9c) w4 %= w0
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
9: (95) exit
propagating r0
from 6 to 7: safe
4: R0_w=invP808464450 R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
4: (7f) r0 >>= r0
5: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0,umin_value=808464433,umax_value=2147483647,var_off=(0x0; 0x7fffffff)) R10=fp0
5: (9c) w4 %= w0
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
propagating r0
7: safe
propagating r0
from 6 to 7: safe
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
The underlying program was xlated as follows:
# bpftool p d x i 10
0: (b7) r0 = 808464450
1: (b4) w4 = 808464432
2: (bc) w0 = w0
3: (15) if r0 == 0x0 goto pc+1
4: (9c) w4 %= w0
5: (66) if w4 s> 0x30303030 goto pc+0
6: (7f) r0 >>= r0
7: (bc) w0 = w0
8: (15) if r0 == 0x0 goto pc+1
9: (9c) w4 %= w0
10: (66) if w0 s> 0x3030 goto pc+0
11: (d6) if w0 s<= 0x303030 goto pc+1
12: (05) goto pc-1
13: (95) exit
The verifier rewrote original instructions it recognized as dead code with
'goto pc-1', but reality differs from verifier simulation in that we are
actually able to trigger a hang due to hitting the 'goto pc-1' instructions.
Taking a closer look at the verifier analysis, the reason is that it misjudges
its pruning decision at the first 'from 6 to 7: safe' occasion. What happens
is that while both old/cur registers are marked as precise, they get misjudged
for the jmp32 case as range_within() yields true, meaning that the prior
verification path with a wider register bound could be verified successfully
and therefore the current path with a narrower register bound is deemed safe
as well whereas in reality it's not. R0 old/cur path's bounds compare as
follows:
old: smin_value=0x8000000000000000,smax_value=0x7fffffffffffffff,umin_value=0x0,umax_value=0xffffffffffffffff,var_off=(0x0; 0xffffffffffffffff)
cur: smin_value=0x8000000000000000,smax_value=0x7fffffff7fffffff,umin_value=0x0,umax_value=0xffffffff7fffffff,var_off=(0x0; 0xffffffff7fffffff)
old: s32_min_value=0x80000000,s32_max_value=0x00003030,u32_min_value=0x00000000,u32_max_value=0xffffffff
cur: s32_min_value=0x00003031,s32_max_value=0x7fffffff,u32_min_value=0x00003031,u32_max_value=0x7fffffff
The 64 bit bounds generally look okay and while the information that got
propagated from 32 to 64 bit looks correct as well, it's not precise enough
for judging a conditional jmp32. Given the latter only operates on subregisters
we also need to take these into account as well for a range_within() probe
in order to be able to prune paths. Extending the range_within() constraint
to both bounds will be able to tell us that the old signed 32 bit bounds are
not wider than the cur signed 32 bit bounds.
With the fix in place, the program will now verify the 'goto' branch case as
it should have been:
[...]
6: R0_w=invP(id=0) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
6: (66) if w0 s> 0x3030 goto pc+0
R0_w=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
9: R0=invP(id=0,s32_max_value=12336) R1=ctx(id=0,off=0,imm=0) R4=invP(id=0) R10=fp0
9: (95) exit
7: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=12337,u32_min_value=12337,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
7: (d6) if w0 s<= 0x303030 goto pc+1
R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
8: R0_w=invP(id=0,smax_value=9223372034707292159,umax_value=18446744071562067967,var_off=(0x0; 0xffffffff7fffffff),s32_min_value=3158065,u32_min_value=3158065,u32_max_value=2147483647) R1=ctx(id=0,off=0,imm=0) R4_w=invP(id=0) R10=fp0
8: (30) r0 = *(u8 *)skb[808464432]
BPF_LD_[ABS|IND] uses reserved fields
processed 11 insns (limit 1000000) max_states_per_insn 1 total_states 1 peak_states 1 mark_read 1
The bug is quite subtle in the sense that when verifier would determine that
a given branch is dead code, it would (here: wrongly) remove these instructions
from the program and hard-wire the taken branch for privileged programs instead
of the 'goto pc-1' rewrites which will cause hard to debug problems.
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
Fix incorrect is_branch{32,64}_taken() analysis for the jsgt case. The return
code for both will tell the caller whether a given conditional jump is taken
or not, e.g. 1 means branch will be taken [for the involved registers] and the
goto target will be executed, 0 means branch will not be taken and instead we
fall-through to the next insn, and last but not least a -1 denotes that it is
not known at verification time whether a branch will be taken or not. Now while
the jsgt has the branch-taken case correct with reg->s32_min_value > sval, the
branch-not-taken case is off-by-one when testing for reg->s32_max_value < sval
since the branch will also be taken for reg->s32_max_value == sval. The jgt
branch analysis, for example, gets this right.
Fixes: 3f50f132d840 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Fixes: 4f7b3e82589e ("bpf: improve verifier branch analysis")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix output of top level event tracing 'enable' file.
When writing a tool for enabling events in the tracing system, an
anomaly was discovered. The top level event 'enable' file would never
show '1' when all events were enabled.
The system and event 'enable' files worked as expected.
The reason was because the top level event 'enable' file included the
'ftrace' tracer events, which are not controlled by the 'enable' file
and would cause the output to be wrong. This appears to have been a
bug since it was created"
* tag 'trace-v5.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Do not count ftrace events in top level enable output
|