Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. All singletons, please see the
changelogs for details"
* tag 'mm-hotfixes-stable-2024-11-16-15-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: revert "mm: shmem: fix data-race in shmem_getattr()"
ocfs2: uncache inode which has failed entering the group
mm: fix NULL pointer dereference in alloc_pages_bulk_noprof
mm, doc: update read_ahead_kb for MADV_HUGEPAGE
fs/proc/task_mmu: prevent integer overflow in pagemap_scan_get_args()
sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers
crash, powerpc: default to CRASH_DUMP=n on PPC_BOOK3S_32
mm/mremap: fix address wraparound in move_page_tables()
tools/mm: fix compile error
mm, swap: fix allocation and scanning race with swapoff
|
|
Pull drm fixes from Dave Airlie:
"Final week of fixes, lots of small amdgpu fixes, some i915 and xe
fixes, the nouveau changes fix a recent regression and some laptop
panel black screens, then a couple of other misc ones.
It's probably a little busier than I'd like, but each fix seems fine.
amdgpu:
- PSR fix
- Panel replay fixes
- DML fix
- vblank power fix
- Fix video caps
- SMU 14.0 fix
- GPUVM fix
- MES 12 fix
- APU carve out fix
- DC vbios fix
- NBIO fix
i915:
- Don't load GSC on ARL-H and ARL-U if too old FW
- Avoid potential OOPS in enabling/disabling TV output
xe:
- Fix unlock on exec ioctl error path
- Fix hibernation on LNL due to ggtt getting lost
- Fix missing runtime PM in OA release
bridge:
- tc358768: Fix DSI command tx
nouveau:
- Fix GSP AUX error handling
- dp: Handle retires for AUX CH transfers with GSP
- fw: Sync DMA after setup
panthor:
- Fix partial BO mappings to GPU
rockchip:
- vop: Avoid null-ptr deref in plane-state check
vmwgfx:
- Avoid null-ptr deref in surface creation"
* tag 'drm-fixes-2024-11-16' of https://gitlab.freedesktop.org/drm/kernel: (27 commits)
drm/bridge: tc358768: Fix DSI command tx
drm/vmwgfx: avoid null_ptr_deref in vmw_framebuffer_surface_create_handle
nouveau/dp: handle retries for AUX CH transfers with GSP.
nouveau: handle EBUSY and EAGAIN for GSP aux errors.
nouveau: fw: sync dma after setup is called.
drm/xe/oa: Fix "Missing outer runtime PM protection" warning
drm/xe: handle flat ccs during hibernation on igpu
drm/xe: improve hibernation on igpu
drm/xe: Restore system memory GGTT mappings
drm/xe: Ensure all locks released in exec IOCTL
drm/panthor: Fix handling of partial GPU mapping of BOs
drm/amd: Fix initialization mistake for NBIO 7.7.0
Revert "drm/amd/display: parse umc_info or vram_info based on ASIC"
drm/amd/display: Fix failure to read vram info due to static BP_RESULT
drm/amdgpu: enable GTT fallback handling for dGPUs only
drm/i915: Grab intel_display from the encoder to avoid potential oopsies
drm/i915/gsc: ARL-H and ARL-U need a newer GSC FW.
drm/amdgpu/mes12: correct kiq unmap latency
drm/amdgpu: fix check in gmc_v9_0_get_vm_pte()
drm/amd/pm: print pp_dpm_mclk in ascending order on SMU v14.0.0
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm
Pull pmdomain fixes from Ulf Hansson:
"pmdomain core:
- Add GENPD_FLAG_DEV_NAME_FW flag to generate unique names
pmdomain providers:
- arm: Use FLAG_DEV_NAME_FW to ensure unique names
- imx93-blk-ctrl: Fix the remove path
arm_scmi/qcom-cpucp:
- Report duplicate OPPs as firmware bugs for arm_scmi
- Skip OPP duplicates for arm_scmi
- Mark the qcom-cpucp mailbox irq with IRQF_NO_SUSPEND flag"
* tag 'pmdomain-v6.12-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm:
mailbox: qcom-cpucp: Mark the irq with IRQF_NO_SUSPEND flag
firmware: arm_scmi: Report duplicate opps as firmware bugs
firmware: arm_scmi: Skip opp duplicates
pmdomain: imx93-blk-ctrl: correct remove path
pmdomain: arm: Use FLAG_DEV_NAME_FW to ensure unique names
pmdomain: core: Add GENPD_FLAG_DEV_NAME_FW flag
|
|
When CONFIG_KASAN_SW_TAGS and CONFIG_KASAN_STACK are enabled, the
object_is_on_stack() function may produce incorrect results due to the
presence of tags in the obj pointer, while the stack pointer does not have
tags. This discrepancy can lead to incorrect stack object detection and
subsequently trigger warnings if CONFIG_DEBUG_OBJECTS is also enabled.
Example of the warning:
ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:557 __debug_object_init+0x330/0x364
Modules linked in:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5 #4
Hardware name: linux,dummy-virt (DT)
pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : __debug_object_init+0x330/0x364
lr : __debug_object_init+0x330/0x364
sp : ffff800082ea7b40
x29: ffff800082ea7b40 x28: 98ff0000c0164518 x27: 98ff0000c0164534
x26: ffff800082d93ec8 x25: 0000000000000001 x24: 1cff0000c00172a0
x23: 0000000000000000 x22: ffff800082d93ed0 x21: ffff800081a24418
x20: 3eff800082ea7bb0 x19: efff800000000000 x18: 0000000000000000
x17: 00000000000000ff x16: 0000000000000047 x15: 206b63617473206e
x14: 0000000000000018 x13: ffff800082ea7780 x12: 0ffff800082ea78e
x11: 0ffff800082ea790 x10: 0ffff800082ea79d x9 : 34d77febe173e800
x8 : 34d77febe173e800 x7 : 0000000000000001 x6 : 0000000000000001
x5 : feff800082ea74b8 x4 : ffff800082870a90 x3 : ffff80008018d3c4
x2 : 0000000000000001 x1 : ffff800082858810 x0 : 0000000000000050
Call trace:
__debug_object_init+0x330/0x364
debug_object_init_on_stack+0x30/0x3c
schedule_hrtimeout_range_clock+0xac/0x26c
schedule_hrtimeout+0x1c/0x30
wait_task_inactive+0x1d4/0x25c
kthread_bind_mask+0x28/0x98
init_rescuer+0x1e8/0x280
workqueue_init+0x1a0/0x3cc
kernel_init_freeable+0x118/0x200
kernel_init+0x28/0x1f0
ret_from_fork+0x10/0x20
---[ end trace 0000000000000000 ]---
ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
------------[ cut here ]------------
Link: https://lkml.kernel.org/r/20241113042544.19095-1-qun-wei.lin@mediatek.com
Signed-off-by: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Cc: Andrew Yang <andrew.yang@mediatek.com>
Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Cc: Casper Li <casper.li@mediatek.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chinwen Chang <chinwen.chang@mediatek.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Matthias Brugger <matthias.bgg@gmail.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bluetooth.
Quite calm week. No new regression under investigation.
Current release - regressions:
- eth: revert "igb: Disable threaded IRQ for igb_msix_other"
Current release - new code bugs:
- bluetooth: btintel: direct exception event to bluetooth stack
Previous releases - regressions:
- core: fix data-races around sk->sk_forward_alloc
- netlink: terminate outstanding dump on socket close
- mptcp: error out earlier on disconnect
- vsock: fix accept_queue memory leak
- phylink: ensure PHY momentary link-fails are handled
- eth: mlx5:
- fix null-ptr-deref in add rule err flow
- lock FTE when checking if active
- eth: dwmac-mediatek: fix inverted handling of mediatek,mac-wol
Previous releases - always broken:
- sched: fix u32's systematic failure to free IDR entries for hnodes.
- sctp: fix possible UAF in sctp_v6_available()
- eth: bonding: add ns target multicast address to slave device
- eth: mlx5: fix msix vectors to respect platform limit
- eth: icssg-prueth: fix 1 PPS sync"
* tag 'net-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits)
net: sched: u32: Add test case for systematic hnode IDR leaks
selftests: bonding: add ns multicast group testing
bonding: add ns target multicast address to slave device
net: ti: icssg-prueth: Fix 1 PPS sync
stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines
net: Make copy_safe_from_sockptr() match documentation
net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol
ipmr: Fix access to mfc_cache_list without lock held
samples: pktgen: correct dev to DEV
net: phylink: ensure PHY momentary link-fails are handled
mptcp: pm: use _rcu variant under rcu_read_lock
mptcp: hold pm lock when deleting entry
mptcp: update local address flags when setting it
net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
MAINTAINERS: Re-add cancelled Renesas driver sections
Revert "igb: Disable threaded IRQ for igb_msix_other"
Bluetooth: btintel: Direct exception event to bluetooth stack
Bluetooth: hci_core: Fix calling mgmt_device_connected
virtio/vsock: Improve MSG_ZEROCOPY error handling
vsock: Fix sk_error_queue memory leak
...
|
|
Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves")
tried to resolve the issue where backup slaves couldn't be brought up when
receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only
worked for drivers that receive all multicast messages, such as the veth
interface.
For standard drivers, the NS multicast message is silently dropped because
the slave device is not a member of the NS target multicast group.
To address this, we need to make the slave device join the NS target
multicast group, ensuring it can receive these IPv6 NS messages to validate
the slave’s status properly.
There are three policies before joining the multicast group:
1. All settings must be under active-backup mode (alb and tlb do not support
arp_validate), with backup slaves and slaves supporting multicast.
2. We can add or remove multicast groups when arp_validate changes.
3. Other operations, such as enslaving, releasing, or setting NS targets,
need to be guarded by arp_validate.
Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
copy_safe_from_sockptr()
return copy_from_sockptr()
return copy_from_sockptr_offset()
return copy_from_user()
copy_from_user() does not return an error on fault. Instead, it returns a
number of bytes that were not copied. Have it handled.
Patch has a side effect: it un-breaks garbage input handling of
nfc_llcp_setsockopt() and mISDN's data_sock_setsockopt().
Fixes: 6309863b31dd ("net: add copy_safe_from_sockptr() helper")
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Link: https://patch.msgid.link/20241111-sockptr-copy-ret-fix-v1-1-a520083a93fb@rbox.co
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fixes from Jarkko Sakkinen:
"Two bug fixes for TPM bus encryption (the remaining reported issues in
the feature)"
* tag 'tpmdd-next-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm: Disable TPM on tpm2_create_primary() failure
tpm: Opt-in in disable PCR integrity protection
|
|
The initial HMAC session feature added TPM bus encryption and/or integrity
protection to various in-kernel TPM operations. This can cause performance
bottlenecks with IMA, as it heavily utilizes PCR extend operations.
In order to mitigate this performance issue, introduce a kernel
command-line parameter to the TPM driver for disabling the integrity
protection for PCR extend operations (i.e. TPM2_PCR_Extend).
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Link: https://lore.kernel.org/linux-integrity/20241015193916.59964-1-zohar@linux.ibm.com/
Fixes: 6519fea6fd37 ("tpm: add hmac checks to tpm2_pcr_extend()")
Tested-by: Mimi Zohar <zohar@linux.ibm.com>
Co-developed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Co-developed-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
Pull bpf fixes from Daniel Borkmann:
- Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye)
- Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS
context retrieval (Zijian Zhang)
- Fix BPF bits iterator selftest for s390x (Hou Tao)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6
bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx
selftests/bpf: Use -4095 as the bad address for bits iterator
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"10 hotfixes, 7 of which are cc:stable. 7 are MM, 3 are not. All
singletons"
* tag 'mm-hotfixes-stable-2024-11-12-16-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: swapfile: fix cluster reclaim work crash on rotational devices
selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
mm/thp: fix deferred split queue not partially_mapped: fix
mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases
nommu: pass NULL argument to vma_iter_prealloc()
ocfs2: fix UBSAN warning in ocfs2_verify_volume()
nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
mm: page_alloc: move mlocked flag clearance into free_pages_prepare()
mm: count zeromap read and set for swapout and swapin
|
|
All MTL and ARL SKUs share the same GSC FW, but the newer platforms are
only supported in newer blobs. In particular, ARL-S is supported
starting from 102.0.10.1878 (which is already the minimum required
version for ARL in the code), while ARL-H and ARL-U are supported from
102.1.15.1926. Therefore, the driver needs to check which specific ARL
subplatform its running on when verifying that the GSC FW is new enough
for it.
Fixes: 2955ae8186c8 ("drm/i915: ARL requires a newer GSC firmware")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241028233132.149745-1-daniele.ceraolospurio@intel.com
(cherry picked from commit 3c1d5ced18db8a67251c8436cf9bdc061f972bdb)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
When the proportion of folios from the zeromap is small, missing their
accounting may not significantly impact profiling. However, it's easy to
construct a scenario where this becomes an issue—for example, allocating
1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT,
and then swapping it back in. In this case, the swap-out and swap-in
counts seem to vanish into a black hole, potentially causing semantic
ambiguity.
On the other hand, Usama reported that zero-filled pages can exceed 10% in
workloads utilizing zswap, while Hailong noted that some app in Android
have more than 6% zero-filled pages. Before commit 0ca0c24e3211 ("mm:
store zero pages to be swapped out in a bitmap"), both zswap and zRAM
implemented similar optimizations, leading to these optimized-out pages
being counted in either zswap or zRAM counters (with pswpin/pswpout also
increasing for zRAM). With zeromap functioning prior to both zswap and
zRAM, userspace will no longer detect these swap-out and swap-in actions.
We have three ways to address this:
1. Introduce a dedicated counter specifically for the zeromap.
2. Use pswpin/pswpout accounting, treating the zero map as a standard
backend. This approach aligns with zRAM's current handling of
same-page fills at the device level. However, it would mean losing the
optimized-out page counters previously available in zRAM and would not
align with systems using zswap. Additionally, as noted by Nhat Pham,
pswpin/pswpout counters apply only to I/O done directly to the backend
device.
3. Count zeromap pages under zswap, aligning with system behavior when
zswap is enabled. However, this would not be consistent with zRAM, nor
would it align with systems lacking both zswap and zRAM.
Given the complications with options 2 and 3, this patch selects
option 1.
We can find these counters from /proc/vmstat (counters for the whole
system) and memcg's memory.stat (counters for the interested memcg).
For example:
$ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
swpin_zero 1648
swpout_zero 33536
$ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
swpin_zero 3905
swpout_zero 3985
This patch does not address any specific zeromap bug, but the missing
swpout and swpin counts for zero-filled pages can be highly confusing and
may mislead user-space agents that rely on changes in these counters as
indicators. Therefore, we add a Fixes tag to encourage the inclusion of
this counter in any kernel versions with zeromap.
Many thanks to Kanchana for the contribution of changing
count_objcg_event() to count_objcg_events() to support large folios[1],
which has now been incorporated into this patch.
[1] https://lkml.kernel.org/r/20241001053222.6944-5-kanchana.p.sridhar@intel.com
Link: https://lkml.kernel.org/r/20241107011246.59137-1-21cnbao@gmail.com
Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
Co-developed-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Hailong Liu <hailong.liu@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kairui Song <kasong@tencent.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"20 hotfixes, 14 of which are cc:stable.
Three affect DAMON. Lorenzo's five-patch series to address the
mmap_region error handling is here also.
Apart from that, various singletons"
* tag 'mm-hotfixes-stable-2024-11-09-22-40' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mailmap: add entry for Thorsten Blum
ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
signal: restore the override_rlimit logic
fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
ucounts: fix counter leak in inc_rlimit_get_ucounts()
selftests: hugetlb_dio: check for initial conditions to skip in the start
mm: fix docs for the kernel parameter ``thp_anon=``
mm/damon/core: avoid overflow in damon_feed_loop_next_input()
mm/damon/core: handle zero schemes apply interval
mm/damon/core: handle zero {aggregation,ops_update} intervals
mm/mlock: set the correct prev on failure
objpool: fix to make percpu slot allocation more robust
mm/page_alloc: keep track of free highatomic
mm: resolve faulty mmap_region() error path behaviour
mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
mm: refactor map_deny_write_exec()
mm: unconditionally close VMAs on error
mm: avoid unsafe VMA hook invocation when error arises on mmap hook
mm/thp: fix deferred split unqueue naming and locking
mm/thp: fix deferred split queue not partially_mapped
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Fix the ACPI processor driver initialization ordering after recent
changes to avoid calling init_freq_invariance_cppc() too early on AMD
platforms (Mario Limonciello)"
* tag 'acpi-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: processor: Move arch_init_invariance_cppc() call later
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Here is a (hopefully) final round of arm64 fixes for 6.12 that address
some user-visible floating point register corruption. Both of the
Marks have been working on this for a couple of weeks and we've ended
up in a position where SVE is solid but SME still has enough pending
issues that the most pragmatic solution for the release and stable
backports is to disable the feature. Yes, it's a shame, but the
hardware is rare as hen's teeth at the moment and we're better off
getting back to a known good state before fixing it all properly.
We're also improving the selftests for 6.13 to help avoid merging
broken code in the future.
Anyway, the good news is that we're removing a lot more code than
we're adding.
Summary:
- Fix handling of SVE traps from userspace on preemptible kernels
when converting the saved floating point state into SVE state.
- Remove broken support for the SMCCCv1.3 "SVE discard hint"
optimisation.
- Disable SME support, as the current support code suffers from
numerous issues around signal delivery, ptrace access and
context-switch which can lead to user-visible corruption of the
register state"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Kconfig: Make SME depend on BROKEN for now
arm64: smccc: Remove broken support for SMCCCv1.3 SVE discard hint
arm64/sve: Discard stale CPU state when handling SVE traps
|
|
Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of
ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of
signals. However now it's enforced unconditionally, even if
override_rlimit is set. This behavior change caused production issues.
For example, if the limit is reached and a process receives a SIGSEGV
signal, sigqueue_alloc fails to allocate the necessary resources for the
signal delivery, preventing the signal from being delivered with siginfo.
This prevents the process from correctly identifying the fault address and
handling the error. From the user-space perspective, applications are
unaware that the limit has been reached and that the siginfo is
effectively 'corrupted'. This can lead to unpredictable behavior and
crashes, as we observed with java applications.
Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip
the comparison to max there if override_rlimit is set. This effectively
restores the old behavior.
Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev
Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts")
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Co-developed-by: Andrei Vagin <avagin@google.com>
Signed-off-by: Andrei Vagin <avagin@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Alexey Gladkov <legion@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
OOM kills due to vastly overestimated free highatomic reserves were
observed:
... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ...
Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ...
Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB
The second line above shows that the OOM kill was due to the following
condition:
free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB)
And the third line shows there were no free pages in any
MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'.
Therefore __zone_watermark_unusable_free() underestimated the usable free
memory by over 1GB, which resulted in the unnecessary OOM kill above.
The comments in __zone_watermark_unusable_free() warns about the potential
risk, i.e.,
If the caller does not have rights to reserves below the min
watermark then subtract the high-atomic reserves. This will
over-estimate the size of the atomic reserve but it avoids a search.
However, it is possible to keep track of free pages in reserved highatomic
pageblocks with a new per-zone counter nr_free_highatomic protected by the
zone lock, to avoid a search when calculating the usable free memory. And
the cost would be minimal, i.e., simple arithmetics in the highatomic
alloc/free/move paths.
Note that since nr_free_highatomic can be relatively small, using a
per-cpu counter might cause too much drift and defeat its purpose, in
addition to the extra memory overhead.
Dependson e0932b6c1f94 ("mm: page_alloc: consolidate free page accounting") - see [1]
[akpm@linux-foundation.org: s/if/else if/, per Johannes, stealth whitespace tweak]
Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com
Link: https://lkml.kernel.org/r/0d0ddb33-fcdc-43e2-801f-0c1df2031afb@suse.cz [1]
Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Reported-by: Link Lin <linkl@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from can and netfilter.
Things are slowing down quite a bit, mostly driver fixes here. No
known ongoing investigations.
Current release - new code bugs:
- eth: ti: am65-cpsw:
- fix multi queue Rx on J7
- fix warning in am65_cpsw_nuss_remove_rx_chns()
Previous releases - regressions:
- mptcp: do not require admin perm to list endpoints, got missed in a
refactoring
- mptcp: use sock_kfree_s instead of kfree
Previous releases - always broken:
- sctp: properly validate chunk size in sctp_sf_ootb() fix OOB access
- virtio_net: make RSS interact properly with queue number
- can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation
- can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing
configuration when switching CAN modes
Misc:
- revert earlier hns3 fixes, they were ignoring IOMMU abstractions
and need to be reworked
- can: {cc770,sja1000}_isa: allow building on x86_64"
* tag 'net-6.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (42 commits)
drivers: net: ionic: add missed debugfs cleanup to ionic_probe() error path
net/smc: do not leave a dangling sk pointer in __smc_create()
rxrpc: Fix missing locking causing hanging calls
net/smc: Fix lookup of netdev by using ib_device_get_netdev()
net: arc: rockchip: fix emac mdio node support
net: arc: fix the device for dma_map_single/dma_unmap_single
virtio_net: Update rss when set queue
virtio_net: Sync rss config to device when virtnet_probe
virtio_net: Add hash_key_length check
virtio_net: Support dynamic rss indirection table size
netfilter: nf_tables: wait for rcu grace period on net_device removal
net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case
net: vertexcom: mse102x: Fix possible double free of TX skb
mptcp: use sock_kfree_s instead of kfree
mptcp: no admin perm to list endpoints
net: phy: ti: add PHY_RST_AFTER_CLK_EN flag
net: ethernet: ti: am65-cpsw: fix warning in am65_cpsw_nuss_remove_rx_chns()
net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7
net: hns3: fix kernel crash when uninstalling driver
Revert "Merge branch 'there-are-some-bugfix-for-the-hns3-ethernet-driver'"
...
|
|
If a call gets aborted (e.g. because kafs saw a signal) between it being
queued for connection and the I/O thread picking up the call, the abort
will be prioritised over the connection and it will be removed from
local->new_client_calls by rxrpc_disconnect_client_call() without a lock
being held. This may cause other calls on the list to disappear if a race
occurs.
Fix this by taking the client_call_lock when removing a call from whatever
list its ->wait_link happens to be on.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Fixes: 9d35d880e0e4 ("rxrpc: Move client call connection to the I/O thread")
Link: https://patch.msgid.link/726660.1730898202@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter fix for net
The following series contains a Netfilter fix:
1) Wait for rcu grace period after netdevice removal is reported via event.
* tag 'nf-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: wait for rcu grace period on net_device removal
====================
Link: https://patch.msgid.link/20241107113212.116634-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
8c873e219970 ("netfilter: core: free hooks with call_rcu") removed
synchronize_net() call when unregistering basechain hook, however,
net_device removal event handler for the NFPROTO_NETDEV was not updated
to wait for RCU grace period.
Note that 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks
on net_device removal") does not remove basechain rules on device
removal, I was hinted to remove rules on net_device removal later, see
5ebe0b0eec9d ("netfilter: nf_tables: destroy basechain and rules on
netdevice removal").
Although NETDEV_UNREGISTER event is guaranteed to be handled after
synchronize_net() call, this path needs to wait for rcu grace period via
rcu callback to release basechain hooks if netns is alive because an
ongoing netlink dump could be in progress (sockets hold a reference on
the netns).
Note that nf_tables_pre_exit_net() unregisters and releases basechain
hooks but it is possible to see NETDEV_UNREGISTER at a later stage in
the netns exit path, eg. veth peer device in another netns:
cleanup_net()
default_device_exit_batch()
unregister_netdevice_many_notify()
notifier_call_chain()
nf_tables_netdev_event()
__nft_release_basechain()
In this particular case, same rule of thumb applies: if netns is alive,
then wait for rcu grace period because netlink dump in the other netns
could be in progress. Otherwise, if the other netns is going away then
no netlink dump can be in progress and basechain hooks can be released
inmediately.
While at it, turn WARN_ON() into WARN_ON_ONCE() for the basechain
validation, which should not ever happen.
Fixes: 835b803377f5 ("netfilter: nf_tables_netdev: unregister hooks on net_device removal")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
SMCCCv1.3 added a hint bit which callers can set in an SMCCC function ID
(AKA "FID") to indicate that it is acceptable for the SMCCC
implementation to discard SVE and/or SME state over a specific SMCCC
call. The kernel support for using this hint is broken and SMCCC calls
may clobber the SVE and/or SME state of arbitrary tasks, though FPSIMD
state is unaffected.
The kernel support is intended to use the hint when there is no SVE or
SME state to save, and to do this it checks whether TIF_FOREIGN_FPSTATE
is set or TIF_SVE is clear in assembly code:
| ldr <flags>, [<current_task>, #TSK_TI_FLAGS]
| tbnz <flags>, #TIF_FOREIGN_FPSTATE, 1f // Any live FP state?
| tbnz <flags>, #TIF_SVE, 2f // Does that state include SVE?
|
| 1: orr <fid>, <fid>, ARM_SMCCC_1_3_SVE_HINT
| 2:
| << SMCCC call using FID >>
This is not safe as-is:
(1) SMCCC calls can be made in a preemptible context and preemption can
result in TIF_FOREIGN_FPSTATE being set or cleared at arbitrary
points in time. Thus checking for TIF_FOREIGN_FPSTATE provides no
guarantee.
(2) TIF_FOREIGN_FPSTATE only indicates that the live FP/SVE/SME state in
the CPU does not belong to the current task, and does not indicate
that clobbering this state is acceptable.
When the live CPU state is clobbered it is necessary to update
fpsimd_last_state.st to ensure that a subsequent context switch will
reload FP/SVE/SME state from memory rather than consuming the
clobbered state. This and the SMCCC call itself must happen in a
critical section with preemption disabled to avoid races.
(3) Live SVE/SME state can exist with TIF_SVE clear (e.g. with only
TIF_SME set), and checking TIF_SVE alone is insufficient.
Remove the broken support for the SMCCCv1.3 SVE saving hint. This is
effectively a revert of commits:
* cfa7ff959a78 ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint")
* a7c3acca5380 ("arm64: smccc: Save lr before calling __arm_smccc_sve_check()")
... leaving behind the ARM_SMCCC_VERSION_1_3 and ARM_SMCCC_1_3_SVE_HINT
definitions, since these are simply definitions from the SMCCC
specification, and the latter is used in KVM via ARM_SMCCC_CALL_HINTS.
If we want to bring this back in future, we'll probably want to handle
this logic in C where we can use all the usual FPSIMD/SVE/SME helper
functions, and that'll likely require some rework of the SMCCC code
and/or its callers.
Fixes: cfa7ff959a78 ("arm64: smccc: Support SMCCC v1.3 SVE register saving hint")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: stable@vger.kernel.org
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241106160448.2712997-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Pull NFS client fixes from Anna Schumaker:
"These are mostly fixes that came up during the nfs bakeathon the other
week.
Stable Fixes:
- Fix KMSAN warning in decode_getfattr_attrs()
Other Bugfixes:
- Handle -ENOTCONN in xs_tcp_setup_socked()
- NFSv3: only use NFS timeout for MOUNT when protocols are compatible
- Fix attribute delegation behavior on exclusive create and a/mtime
changes
- Fix localio to cope with racing nfs_local_probe()
- Avoid i_lock contention in fs_clear_invalid_mapping()"
* tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: avoid i_lock contention in nfs_clear_invalid_mapping
nfs_common: fix localio to cope with racing nfs_local_probe()
NFS: Further fixes to attribute delegation a/mtime changes
NFS: Fix attribute delegation behaviour on exclusive create
nfs: Fix KMSAN warning in decode_getfattr_attrs()
NFSv3: only use NFS timeout for MOUNT when protocols are compatible
sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()
|
|
arch_init_invariance_cppc() is called at the end of
acpi_cppc_processor_probe() in order to configure frequency invariance
based upon the values from _CPC.
This however doesn't work on AMD CPPC shared memory designs that have
AMD preferred cores enabled because _CPC needs to be analyzed from all
cores to judge if preferred cores are enabled.
This issue manifests to users as a warning since commit 21fb59ab4b97
("ACPI: CPPC: Adjust debug messages in amd_set_max_freq_ratio() to warn"):
```
Could not retrieve highest performance (-19)
```
However the warning isn't the cause of this, it was actually
commit 279f838a61f9 ("x86/amd: Detect preferred cores in
amd_get_boost_ratio_numerator()") which exposed the issue.
To fix this problem, change arch_init_invariance_cppc() into a new weak
symbol that is called at the end of acpi_processor_driver_init().
Each architecture that supports it can declare the symbol to override
the weak one.
Define it for x86, in arch/x86/kernel/acpi/cppc.c, and for all of the
architectures using the generic arch_topology.c code.
Fixes: 279f838a61f9 ("x86/amd: Detect preferred cores in amd_get_boost_ratio_numerator()")
Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219431
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://patch.msgid.link/20241104222855.3959267-1-superm1@kernel.org
[ rjw: Changelog edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
As the introduction of the support for vsock and unix sockets in sockmap,
tls_sw_has_ctx_tx/rx cannot presume the socket passed in must be IS_ICSK.
vsock and af_unix sockets have vsock_sock and unix_sock instead of
inet_connection_sock. For these sockets, tls_get_ctx may return an invalid
pointer and cause page fault in function tls_sw_ctx_rx.
BUG: unable to handle page fault for address: 0000000000040030
Workqueue: vsock-loopback vsock_loopback_work
RIP: 0010:sk_psock_strp_data_ready+0x23/0x60
Call Trace:
? __die+0x81/0xc3
? no_context+0x194/0x350
? do_page_fault+0x30/0x110
? async_page_fault+0x3e/0x50
? sk_psock_strp_data_ready+0x23/0x60
virtio_transport_recv_pkt+0x750/0x800
? update_load_avg+0x7e/0x620
vsock_loopback_work+0xd0/0x100
process_one_work+0x1a7/0x360
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? __kthread_cancel_work+0x40/0x40
ret_from_fork+0x1f/0x40
v2:
- Add IS_ICSK check
v3:
- Update the commits in Fixes
Fixes: 634f1a7110b4 ("vsock: support sockmap")
Fixes: 94531cfcbe79 ("af_unix: Add unix_stream_proto for sockmap")
Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20241106003742.399240-1-zijianzhang@bytedance.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
|
Currently MTE is permitted in two circumstances (desiring to use MTE
having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is
specified, as checked by arch_calc_vm_flag_bits() and actualised by
setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is
shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap
hook is activated in mmap_region().
The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also
set is the arm64 implementation of arch_validate_flags().
Unfortunately, we intend to refactor mmap_region() to perform this check
earlier, meaning that in the case of a shmem backing we will not have
invoked shmem_mmap() yet, causing the mapping to fail spuriously.
It is inappropriate to set this architecture-specific flag in general mm
code anyway, so a sensible resolution of this issue is to instead move the
check somewhere else.
We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via
the arch_calc_vm_flag_bits() call.
This is an appropriate place to do this as we already check for the
MAP_ANONYMOUS case here, and the shmem file case is simply a variant of
the same idea - we permit RAM-backed memory.
This requires a modification to the arch_calc_vm_flag_bits() signature to
pass in a pointer to the struct file associated with the mapping, however
this is not too egregious as this is only used by two architectures anyway
- arm64 and parisc.
So this patch performs this adjustment and removes the unnecessary
assignment of VM_MTE_ALLOWED in shmem_mmap().
[akpm@linux-foundation.org: fix whitespace, per Catalin]
Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Refactor the map_deny_write_exec() to not unnecessarily require a VMA
parameter but rather to accept VMA flags parameters, which allows us to
use this function early in mmap_region() in a subsequent commit.
While we're here, we refactor the function to be more readable and add
some additional documentation.
Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.1730224667.git.lorenzo.stoakes@oracle.com
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Jann Horn <jannh@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC fixes from Arnd Bergmann:
"Where the last set of fixes was mostly drivers, this time the
devicetree changes all come at once, targeting mostly the Rockchips,
Qualcomm and NXP platforms.
The Qualcomm bugfixes target the Snapdragon X Elite laptops,
specifically problems with PCIe and NVMe support to improve
reliability, and a boot regresion on msm8939.
Also for Snapdragon platforms, there are a number of correctness
changes in the several platform specific device drivers, but none of
these are as impactful.
On the NXP i.MX platform, the fixes are all for 64-bit i.MX8 variants,
correcting individual entries in the devicetree that were incorrect
and causing the media, video, mmc and spi drivers to misbehave in
minor ways.
The Arm SCMI firmware driver gets fixes for a use-after-free bug and
for correctly parsing firmware information.
On the RISC-V side, there are three minor devicetree fixes for
starfive and sophgo, again addressing only minor mistakes. One device
driver patch fixes a problem with spurious interrupt handling"
* tag 'arm-fixes-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (63 commits)
firmware: arm_scmi: Use vendor string in max-rx-timeout-ms
dt-bindings: firmware: arm,scmi: Add missing vendor string
riscv: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices
arm64: dts: rockchip: Correct GPIO polarity on brcm BT nodes
arm64: dts: rockchip: Drop invalid clock-names from es8388 codec nodes
ARM: dts: rockchip: Fix the realtek audio codec on rk3036-kylin
ARM: dts: rockchip: Fix the spi controller on rk3036
ARM: dts: rockchip: drop grf reference from rk3036 hdmi
ARM: dts: rockchip: fix rk3036 acodec node
arm64: dts: rockchip: remove orphaned pinctrl-names from pinephone pro
soc: qcom: pmic_glink: Handle GLINK intent allocation rejections
rpmsg: glink: Handle rejected intent request better
arm64: dts: qcom: x1e80100: fix PCIe5 interconnect
arm64: dts: qcom: x1e80100: fix PCIe4 interconnect
arm64: dts: qcom: x1e80100: Fix up BAR spaces
MAINTAINERS: invert Misc RISC-V SoC Support's pattern
soc: qcom: socinfo: fix revision check in qcom_socinfo_probe()
arm64: dts: qcom: x1e80100-qcp: fix nvme regulator boot glitch
arm64: dts: qcom: x1e80100-microsoft-romulus: fix nvme regulator boot glitch
arm64: dts: qcom: x1e80100-yoga-slim7x: fix nvme regulator boot glitch
...
|
|
Fix the possibility of racing nfs_local_probe() resulting in:
list_add double add: new=ffff8b99707f9f58, prev=ffff8b99707f9f58, next=ffffffffc0f30000.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:35!
Add nfs_uuid_init() to properly initialize all nfs_uuid_t members
(particularly its list_head).
Switch to returning bool from nfs_uuid_begin(), returns false if
nfs_uuid_t is already in-use (its list_head is on a list). Update
nfs_local_probe() to return early if the nfs_client's cl_uuid
(nfs_uuid_t) is in-use.
Also, switch nfs_uuid_begin() from using list_add_tail_rcu() to
list_add_tail() -- rculist was used in an earlier version of the
localio code that had a lockless nfs_uuid_lookup interface.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes
Qualcomm driver fixes for v6.12
The Qualcomm EDAC driver's configuration of interrupts is made optional,
to avoid violating security constriants on X Elite platform .
The SCM drivers' detection mechanism for the presence of SHM bridge in QTEE,
is corrected to handle the case where firmware successfully returns that
the interface isn't supported.
The GLINK driver and the PMIC GLINK interface is updated to handle
buffer allocation issues during initialization of the communication
channel.
Allocation error handling in the socinfo dirver is corrected, and then
the fix is corrected.
* tag 'qcom-drivers-fixes-for-6.12' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
soc: qcom: pmic_glink: Handle GLINK intent allocation rejections
rpmsg: glink: Handle rejected intent request better
soc: qcom: socinfo: fix revision check in qcom_socinfo_probe()
firmware: qcom: scm: Return -EOPNOTSUPP for unsupported SHM bridge enabling
EDAC/qcom: Make irq configuration optional
firmware: qcom: scm: fix a NULL-pointer dereference
firmware: qcom: scm: suppress download mode error
soc: qcom: Add check devm_kasprintf() returned value
MAINTAINERS: Qualcomm SoC: Match reserved-memory bindings
Link: https://lore.kernel.org/r/20241101161455.746290-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"17 hotfixes. 9 are cc:stable. 13 are MM and 4 are non-MM.
The usual collection of singletons - please see the changelogs"
* tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()
mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats
mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes
mm: shrinker: avoid memleak in alloc_shrinker_info
.mailmap: update e-mail address for Eugen Hristev
vmscan,migrate: fix page count imbalance on node stats when demoting pages
mailmap: update Jarkko's email addresses
mm: allow set/clear page_type again
nilfs2: fix potential deadlock with newly created symlinks
Squashfs: fix variable overflow in squashfs_readpage_block
kasan: remove vmalloc_percpu test
tools/mm: -Werror fixes in page-types/slabinfo
mm, swap: avoid over reclaim of full clusters
mm: fix PSWPIN counter for large folios swap-in
mm: avoid VM_BUG_ON when try to map an anon large folio to zero page.
mm/codetag: fix null pointer check logic for ref and tag
mm/gup: stop leaking pinned pages in low memory conditions
|
|
When the MM_WALK capability is enabled, memory that is mostly accessed by
a VM appears younger than it really is, therefore this memory will be less
likely to be evicted. Therefore, the presence of a running VM can
significantly increase swap-outs for non-VM memory, regressing the
performance for the rest of the system.
Fix this regression by always calling {ptep,pmdp}_clear_young_notify()
whenever we clear the young bits on PMDs/PTEs.
[jthoughton@google.com: fix link-time error]
Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Reported-by: David Stevens <stevensd@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Cc: kernel test robot <lkp@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in
MM_WALK".
Today, the MM_WALK capability causes MGLRU to clear the young bit from
PMDs and PTEs during the page table walk before eviction, but MGLRU does
not call the clear_young() MMU notifier in this case. By not calling this
notifier, the MM walk takes less time/CPU, but it causes pages that are
accessed mostly through KVM / secondary MMUs to appear younger than they
should be.
We do call the clear_young() notifier today, but only when attempting to
evict the page, so we end up clearing young/accessed information less
frequently for secondary MMUs than for mm PTEs, and therefore they appear
younger and are less likely to be evicted. Therefore, memory that is
*not* being accessed mostly by KVM will be evicted *more* frequently,
worsening performance.
ChromeOS observed a tab-open latency regression when enabling MGLRU with a
setup that involved running a VM:
Tab-open latency histogram (ms)
Version p50 mean p95 p99 max
base 1315 1198 2347 3454 10319
mglru 2559 1311 7399 12060 43758
fix 1119 926 2470 4211 6947
This series replaces the final non-selftest patchs from this series[1],
which introduced a similar change (and a new MMU notifier) with KVM
optimizations. I'll send a separate series (to Sean and Paolo) for the
KVM optimizations.
This series also makes proactive reclaim with MGLRU possible for KVM
memory. I have verified that this functions correctly with the selftest
from [1], but given that that test is a KVM selftest, I'll send it with
the rest of the KVM optimizations later. Andrew, let me know if you'd
like to take the test now anyway.
[1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google.com/
This patch (of 2):
The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful
and become more complicated to properly compute when adding
test/clear_young() notifiers in MGLRU's mm walk.
Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com
Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com
Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Signed-off-by: James Houghton <jthoughton@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Matlack <dmatlack@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Stevens <stevensd@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Wei Xu <weixugc@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
- a fix for regression in input core introduced in 6.11 preventing
re-registering input handlers
- a fix for adp5588-keys driver tyring to disable interrupt 0 at
suspend when devices is used without interrupt
- a fix for edt-ft5x06 to stop leaking regmap structure when probing
fails and to make sure it is not released too early on removal.
* tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: fix regression when re-registering input handlers
Input: adp5588-keys - do not try to disable interrupt 0
Input: edt-ft5x06 - fix regmap leak when probe fails
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Thomas Gleixner:
"A single fix for posix CPU timers.
When a thread is cloned, the posix CPU timers are not inherited.
If the parent has a CPU timer armed the corresponding tick dependency
in the tasks tick_dep_mask is set and copied to the new thread, which
means the new thread and all decendants will prevent the system to go
into full NOHZ operation.
Clear the tick dependency mask in copy_process() to fix this"
* tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
|
|
Commit d469647bafd9 ("Input: simplify event handling logic") introduced
code that would set handler->events() method to either
input_handler_events_filter() or input_handler_events_default() or
input_handler_events_null(), depending on the kind of input handler
(a filter or a regular one) we are dealing with. Unfortunately this
breaks cases when we try to re-register the same filter (as is the case
with sysrq handler): after initial registration the handler will have 2
event handling methods defined, and will run afoul of the check in
input_handler_check_methods():
input: input_handler_check_methods: only one event processing method can be defined (sysrq)
sysrq: Failed to register input handler, error -22
Fix this by adding handle_events() method to input_handle structure and
setting it up when registering a new input handle according to event
handling methods defined in associated input_handler structure, thus
avoiding modifying the input_handler structure.
Reported-by: "Ned T. Crigler" <crigler@gmail.com>
Reported-by: Christian Heusel <christian@heusel.eu>
Tested-by: "Ned T. Crigler" <crigler@gmail.com>
Tested-by: Peter Seiderer <ps.report@gmx.net>
Fixes: d469647bafd9 ("Input: simplify event handling logic")
Link: https://lore.kernel.org/r/Zx2iQp6csn42PJA7@xavtug
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Pull drm fixes from Dave Airlie:
"Regular fixes pull, nothing too out of the ordinary, the mediatek
fixes came in a batch that I might have preferred a bit earlier but
all seem fine, otherwise regular xe/amdgpu and a few misc ones.
xe:
- Fix missing HPD interrupt enabling, bringing one PM refactor with it
- Workaround LNL GGTT invalidation not being visible to GuC
- Avoid getting jobs stuck without a protecting timeout
ivpu:
- Fix firewall IRQ handling
panthor:
- Fix firmware initialization wrt page sizes
- Fix handling and reporting of dead job groups
sched:
- Guarantee forward progress via WC_MEM_RECLAIM
tests:
- Fix memory leak in drm_display_mode_from_cea_vic()
amdgpu:
- DCN 3.5 fix
- Vangogh SMU KASAN fix
- SMU 13 profile reporting fix
mediatek:
- Fix degradation problem of alpha blending
- Fix color format MACROs in OVL
- Fix get efuse issue for MT8188 DPTX
- Fix potential NULL dereference in mtk_crtc_destroy()
- Correct dpi power-domains property
- Add split subschema property constraints"
* tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel: (27 commits)
drm/xe: Don't short circuit TDR on jobs not started
drm/xe: Add mmio read before GGTT invalidate
drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic()
drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic()
drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()
drm/panthor: Report group as timedout when we fail to properly suspend
drm/panthor: Fail job creation when the group is dead
drm/panthor: Fix firmware initialization on systems with a page size > 4k
accel/ivpu: Fix NOC firewall interrupt handling
drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume
drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling
drm/xe: Remove runtime argument from display s/r functions
drm/amdgpu/smu13: fix profile reporting
drm/amd/pm: Vangogh: Fix kernel memory out of bounds write
Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"
drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM
drm/tegra: Fix NULL vs IS_ERR() check in probe()
dt-bindings: display: mediatek: split: add subschema property constraints
dt-bindings: display: mediatek: dpi: correct power-domains property
drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl fixes from Ira Weiny:
"The bulk of these fixes center around an initialization order bug
reported by Gregory Price and some additional fall out from the
debugging effort.
In summary, cxl_acpi and cxl_mem race and previously worked because of
a bus_rescan_devices() while testing without modules built in.
Unfortunately with modules built in the rescan would fail due to the
cxl_port driver being registered late via the build order. Furthermore
it was found bus_rescan_devices() did not guarantee a probe barrier
which CXL was expecting. Additional fixes to cxl-test and decoder
allocation came along as they were found in this debugging effort.
The other fixes are pretty minor but one affects trace point data seen
by user space.
Summary:
- Fix crashes when running with cxl-test code
- Fix Trace DRAM Event Record field decodes
- Fix module/built in initialization order errors
- Fix use after free on decoder shutdowns
- Fix out of order decoder allocations
- Improve cxl-test to better reflect real world systems"
* tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/test: Improve init-order fidelity relative to real-world systems
cxl/port: Prevent out-of-order decoder allocation
cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
cxl/port: Fix CXL port initialization order when the subsystem is built-in
cxl/events: Fix Trace DRAM Event Record
cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fix from Rafael Wysocki:
"Make the ACPI CPPC library use a raw spinlock for operations carried
out in scheduler context via the schedutil governor and the ACPI CPPC
cpufreq driver (Pierre Gondois)"
* tag 'acpi-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: CPPC: Make rmw_lock a raw_spin_lock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"The important one is a change to the way in which we handle protection
keys around signal delivery so that we're more closely aligned with
the x86 behaviour, however there is also a revert of the previous fix
to disable software tag-based KASAN with GCC, since a workaround
materialised shortly afterwards.
I'd love to say we're done with 6.12, but we're aware of some
longstanding fpsimd register corruption issues that we're almost at
the bottom of resolving.
Summary:
- Fix handling of POR_EL0 during signal delivery so that pushing the
signal context doesn't fail based on the pkey configuration of the
interrupted context and align our user-visible behaviour with that
of x86.
- Fix a bogus pointer being passed to the CPU hotplug code from the
Arm SDEI driver.
- Re-enable software tag-based KASAN with GCC by using an alternative
implementation of '__no_sanitize_address'"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state()
Revert "kasan: Disable Software Tag-Based KASAN with GCC"
kasan: Fix Software Tag-Based KASAN with GCC
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull iomap fixes from Christian Brauner:
"Fixes for iomap to prevent data corruption bugs in the fallocate
unshare range implementation of fsdax and a small cleanup to turn
iomap_want_unshare_iter() into an inline function"
* tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
iomap: turn iomap_want_unshare_iter into an inline function
fsdax: dax_unshare_iter needs to copy entire blocks
fsdax: remove zeroing code from dax_unshare_iter
iomap: share iomap_unshare_iter predicate code with fsdax
xfs: don't allocate COW extents when unsharing a hole
|
|
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs
Pull filesystem fixes from Christian Brauner:
"VFS:
- Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set
- Add a get_tree_bdev_flags() helper that allows to modify e.g.,
whether errors are logged into the filesystem context during
superblock creation. This is used by erofs to fix a userspace
regression where an error is currently logged when its used on a
regular file which is an new allowed mode in erofs.
netfs:
- Fix the sysfs debug path in the documentation.
- Fix iov_iter_get_pages*() for folio queues by skipping the page
extracation if we're at the end of a folio.
afs:
- Fix moving subdirectories to different parent directory.
autofs:
- Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in
validate_dev_ioctl(). The actual ioctl number, not the ioctl
command needs to be checked for autofs"
* tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs:
iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP
autofs: fix thinko in validate_dev_ioctl()
iov_iter: Fix iov_iter_get_pages*() for folio_queue
afs: Fix missing subdir edit when renamed between parent dirs
doc: correcting the debug path for cachefiles
erofs: use get_tree_bdev_flags() to avoid misleading messages
fs/super.c: introduce get_tree_bdev_flags()
|
|
Pull bpf fixes from Daniel Borkmann:
- Fix BPF verifier to force a checkpoint when the program's jump
history becomes too long (Eduard Zingerman)
- Add several fixes to the BPF bits iterator addressing issues like
memory leaks and overflow problems (Hou Tao)
- Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong)
- Fix BPF test infra's LIVE_FRAME frame update after a page has been
recycled (Toke Høiland-Jørgensen)
- Fix BPF verifier and undo the 40-bytes extra stack space for
bpf_fastcall patterns due to various bugs (Eduard Zingerman)
- Fix a BPF sockmap race condition which could trigger a NULL pointer
dereference in sock_map_link_update_prog (Cong Wang)
- Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under
the socket lock (Jiayuan Chen)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled
selftests/bpf: Add three test cases for bits_iter
bpf: Use __u64 to save the bits in bits iterator
bpf: Check the validity of nr_words in bpf_iter_bits_new()
bpf: Add bpf_mem_alloc_check_size() helper
bpf: Free dynamically allocated bits in bpf_iter_bits_destroy()
bpf: disallow 40-bytes extra stack for bpf_fastcall patterns
selftests/bpf: Add test for trie_get_next_key()
bpf: Fix out-of-bounds write in trie_get_next_key()
selftests/bpf: Test with a very short loop
bpf: Force checkpoint when jmp history is too long
bpf: fix filed access without lock
sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from WiFi, bluetooth and netfilter.
No known new regressions outstanding.
Current release - regressions:
- wifi: mt76: do not increase mcu skb refcount if retry is not
supported
Current release - new code bugs:
- wifi:
- rtw88: fix the RX aggregation in USB 3 mode
- mac80211: fix memory corruption bug in struct ieee80211_chanctx
Previous releases - regressions:
- sched:
- stop qdisc_tree_reduce_backlog on TC_H_ROOT
- sch_api: fix xa_insert() error path in tcf_block_get_ext()
- wifi:
- revert "wifi: iwlwifi: remove retry loops in start"
- cfg80211: clear wdev->cqm_config pointer on free
- netfilter: fix potential crash in nf_send_reset6()
- ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find()
- bluetooth: fix null-ptr-deref in hci_read_supported_codecs
- eth: mlxsw: add missing verification before pushing Tx header
- eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of
bounds issue
Previous releases - always broken:
- wifi: mac80211: do not pass a stopped vif to the driver in
.get_txpower
- netfilter: sanitize offset and length before calling skb_checksum()
- core:
- fix crash when config small gso_max_size/gso_ipv4_max_size
- skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
- mptcp: protect sched with rcu_read_lock
- eth: ice: fix crash on probe for DPLL enabled E810 LOM
- eth: macsec: fix use-after-free while sending the offloading packet
- eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data
- eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices
- eth: mtk_wed: fix path of MT7988 WO firmware"
* tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits)
net: hns3: fix kernel crash when 1588 is sent on HIP08 devices
net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue
net: hns3: initialize reset_timer before hclgevf_misc_irq_init()
net: hns3: don't auto enable misc vector
net: hns3: Resolved the issue that the debugfs query result is inconsistent.
net: hns3: fix missing features due to dev->features configuration too early
net: hns3: fixed reset failure issues caused by the incorrect reset type
net: hns3: add sync command to sync io-pgtable
net: hns3: default enable tx bounce buffer when smmu enabled
netfilter: nft_payload: sanitize offset and length before calling skb_checksum()
net: ethernet: mtk_wed: fix path of MT7988 WO firmware
selftests: forwarding: Add IPv6 GRE remote change tests
mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address
mlxsw: pci: Sync Rx buffers for device
mlxsw: pci: Sync Rx buffers for CPU
mlxsw: spectrum_ptp: Add missing verification before pushing Tx header
net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension
Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs
netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6()
netfilter: Fix use-after-free in get_info()
...
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
ivpu:
- Fix firewall IRQ handling
panthor:
- Fix firmware initialization wrt page sizes
- Fix handling and reporting of dead job groups
sched:
- Guarantee forward progress via WC_MEM_RECLAIM
tests:
- Fix memory leak in drm_display_mode_from_cea_vic()
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20241031144348.GA7826@linux-2.fritz.box
|
|
As Maxime suggested, add a new helper
drm_kunit_display_mode_from_cea_vic(), it can replace the direct call
of drm_display_mode_from_cea_vic(), and it will help solving
the `mode` memory leaks.
Acked-by: Maxime Ripard <mripard@kernel.org>
Suggested-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241030023504.530425-2-ruanjinjie@huawei.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
|
|
Some page flags (page->flags) were converted to page types
(page->page_types). A recent example is PG_hugetlb.
From the exclusive writer's perspective, e.g., a thread doing
__folio_set_hugetlb(), there is a difference between the page flag and
type APIs: the former allows the same non-atomic operation to be repeated
whereas the latter does not. For example, calling __folio_set_hugetlb()
twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type
(PG_hugetlb) not to be set previously.
Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in
the following error-handling path. And when that happens, it triggers the
aforementioned VM_BUG_ON_FOLIO().
if (folio_test_hugetlb(folio)) {
rc = hugetlb_vmemmap_restore_folio(h, folio);
if (rc) {
spin_lock_irq(&hugetlb_lock);
add_hugetlb_folio(h, folio, false);
...
It is possible to make hugeTLB comply with the new requirements from the
page type API. However, a straightforward fix would be to just allow the
same page type to be set or cleared again inside the API, to avoid any
changes to its callers.
Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com
Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType")
Signed-off-by: Yu Zhao <yuzhao@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When running low on usable slots, cluster allocator will try to reclaim
the full clusters aggressively to reclaim HAS_CACHE slots. This
guarantees that as long as there are any usable slots, HAS_CACHE or not,
the swap device will be usable and workload won't go OOM early.
Before the cluster allocator, swap allocator fails easily if device is
filled up with reclaimable HAS_CACHE slots. Which can be easily
reproduced with following simple program:
#include <stdio.h>
#include <string.h>
#include <linux/mman.h>
#include <sys/mman.h>
#define SIZE 8192UL * 1024UL * 1024UL
int main(int argc, char **argv) {
long tmp;
char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
memset(p, 0, SIZE);
madvise(p, SIZE, MADV_PAGEOUT);
for (unsigned long i = 0; i < SIZE; ++i)
tmp += p[i];
getchar(); /* Pause */
return 0;
}
Setup an 8G non ramdisk swap, the first run of the program will swapout 8G
ram successfully. But run same program again after the first run paused,
the second run can't swapout all 8G memory as now half of the swap device
is pinned by HAS_CACHE. There was a random scan in the old allocator that
may reclaim part of the HAS_CACHE by luck, but it's unreliable.
The new allocator's added reclaim of full clusters when device is low on
usable slots. But when multiple CPUs are seeing the device is low on
usable slots at the same time, they ran into a thundering herd problem.
This is an observable problem on large machine with mass parallel
workload, as full cluster reclaim is slower on large swap device and
higher number of CPUs will also make things worse.
Testing using a 128G ZRAM on a 48c96t system. When the swap device is
very close to full (eg. 124G / 128G), running build linux kernel with
make -j96 in a 1G memory cgroup will hung (not a softlockup though)
spinning in full cluster reclaim for about ~5min before go OOM.
To solve this, split the full reclaim into two parts:
- Instead of do a synchronous aggressively reclaim when device is low,
do only one aggressively reclaim when device is strictly full with a
kworker. This still ensures in worst case the device won't be unusable
because of HAS_CACHE slots.
- To avoid allocation (especially higher order) suffer from HAS_CACHE
filling up clusters and kworker not responsive enough, do one synchronous
scan every time the free list is drained, and only scan one cluster. This
is kind of similar to the random reclaim before, keeps the full clusters
rotated and has a minimal latency. This should provide a fair reclaim
strategy suitable for most workloads.
Link: https://lkml.kernel.org/r/20241022175512.10398-1-ryncsn@gmail.com
Fixes: 2cacbdfdee65 ("mm: swap: add a adaptive full cluster cache reclaim")
Signed-off-by: Kairui Song <kasong@tencent.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: Chris Li <chrisl@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When we compile and load lib/slub_kunit.c,it will cause a panic.
The root cause is that __kmalloc_cache_noprof was directly called instead
of kmem_cache_alloc,which resulted in no alloc_tag being allocated.This
caused current->alloc_tag to be null,leading to a null pointer dereference
in alloc_tag_ref_set.
Despite the fact that my colleague Pei Xiao will later fix the code in
slub_kunit.c,we still need fix null pointer check logic for ref and tag to
avoid panic caused by a null pointer dereference.
Here is the log for the panic:
[ 74.779373][ T2158] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[ 74.780130][ T2158] Mem abort info:
[ 74.780406][ T2158] ESR = 0x0000000096000004
[ 74.780756][ T2158] EC = 0x25: DABT (current EL), IL = 32 bits
[ 74.781225][ T2158] SET = 0, FnV = 0
[ 74.781529][ T2158] EA = 0, S1PTW = 0
[ 74.781836][ T2158] FSC = 0x04: level 0 translation fault
[ 74.782288][ T2158] Data abort info:
[ 74.782577][ T2158] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 74.783068][ T2158] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 74.783533][ T2158] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 74.784010][ T2158] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000105f34000
[ 74.784586][ T2158] [0000000000000020] pgd=0000000000000000, p4d=0000000000000000
[ 74.785293][ T2158] Internal error: Oops: 0000000096000004 [#1] SMP
[ 74.785805][ T2158] Modules linked in: slub_kunit kunit ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle 4
[ 74.790661][ T2158] CPU: 0 UID: 0 PID: 2158 Comm: kunit_try_catch Kdump: loaded Tainted: G W N 6.12.0-rc3+ #2
[ 74.791535][ T2158] Tainted: [W]=WARN, [N]=TEST
[ 74.791889][ T2158] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 74.792479][ T2158] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 74.793101][ T2158] pc : alloc_tagging_slab_alloc_hook+0x120/0x270
[ 74.793607][ T2158] lr : alloc_tagging_slab_alloc_hook+0x120/0x270
[ 74.794095][ T2158] sp : ffff800084d33cd0
[ 74.794418][ T2158] x29: ffff800084d33cd0 x28: 0000000000000000 x27: 0000000000000000
[ 74.795095][ T2158] x26: 0000000000000000 x25: 0000000000000012 x24: ffff80007b30e314
[ 74.795822][ T2158] x23: ffff000390ff6f10 x22: 0000000000000000 x21: 0000000000000088
[ 74.796555][ T2158] x20: ffff000390285840 x19: fffffd7fc3ef7830 x18: ffffffffffffffff
[ 74.797283][ T2158] x17: ffff8000800e63b4 x16: ffff80007b33afc4 x15: ffff800081654c00
[ 74.798011][ T2158] x14: 0000000000000000 x13: 205d383531325420 x12: 5b5d383734363537
[ 74.798744][ T2158] x11: ffff800084d337e0 x10: 000000000000005d x9 : 00000000ffffffd0
[ 74.799476][ T2158] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008219d188 x6 : c0000000ffff7fff
[ 74.800206][ T2158] x5 : ffff0003fdbc9208 x4 : ffff800081edd188 x3 : 0000000000000001
[ 74.800932][ T2158] x2 : 0beaa6dee1ac5a00 x1 : 0beaa6dee1ac5a00 x0 : ffff80037c2cb000
[ 74.801656][ T2158] Call trace:
[ 74.801954][ T2158] alloc_tagging_slab_alloc_hook+0x120/0x270
[ 74.802494][ T2158] __kmalloc_cache_noprof+0x148/0x33c
[ 74.802976][ T2158] test_kmalloc_redzone_access+0x4c/0x104 [slub_kunit]
[ 74.803607][ T2158] kunit_try_run_case+0x70/0x17c [kunit]
[ 74.804124][ T2158] kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit]
[ 74.804768][ T2158] kthread+0x10c/0x118
[ 74.805141][ T2158] ret_from_fork+0x10/0x20
[ 74.805540][ T2158] Code: b9400a80 11000400 b9000a80 97ffd858 (f94012d3)
[ 74.806176][ T2158] SMP: stopping secondary CPUs
[ 74.808130][ T2158] Starting crashdump kernel...
Link: https://lkml.kernel.org/r/20241020070819.307944-1-hao.ge@linux.dev
Fixes: e0a955bf7f61 ("mm/codetag: add pgalloc_tag_copy()")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|