summaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2024-11-02timekeeping: Remove CONFIG_DEBUG_TIMEKEEPINGThomas Gleixner
Since 135225a363ae timekeeping_cycles_to_ns() handles large offsets which would lead to 64bit multiplication overflows correctly. It's also protected against negative motion of the clocksource unconditionally, which was exclusive to x86 before. timekeeping_advance() handles large offsets already correctly. That means the value of CONFIG_DEBUG_TIMEKEEPING which analyzed these cases is very close to zero. Remove all of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20241031120328.536010148@linutronix.de
2024-11-01Merge tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds
Pull drm fixes from Dave Airlie: "Regular fixes pull, nothing too out of the ordinary, the mediatek fixes came in a batch that I might have preferred a bit earlier but all seem fine, otherwise regular xe/amdgpu and a few misc ones. xe: - Fix missing HPD interrupt enabling, bringing one PM refactor with it - Workaround LNL GGTT invalidation not being visible to GuC - Avoid getting jobs stuck without a protecting timeout ivpu: - Fix firewall IRQ handling panthor: - Fix firmware initialization wrt page sizes - Fix handling and reporting of dead job groups sched: - Guarantee forward progress via WC_MEM_RECLAIM tests: - Fix memory leak in drm_display_mode_from_cea_vic() amdgpu: - DCN 3.5 fix - Vangogh SMU KASAN fix - SMU 13 profile reporting fix mediatek: - Fix degradation problem of alpha blending - Fix color format MACROs in OVL - Fix get efuse issue for MT8188 DPTX - Fix potential NULL dereference in mtk_crtc_destroy() - Correct dpi power-domains property - Add split subschema property constraints" * tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel: (27 commits) drm/xe: Don't short circuit TDR on jobs not started drm/xe: Add mmio read before GGTT invalidate drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic() drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic() drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic() drm/panthor: Report group as timedout when we fail to properly suspend drm/panthor: Fail job creation when the group is dead drm/panthor: Fix firmware initialization on systems with a page size > 4k accel/ivpu: Fix NOC firewall interrupt handling drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling drm/xe: Remove runtime argument from display s/r functions drm/amdgpu/smu13: fix profile reporting drm/amd/pm: Vangogh: Fix kernel memory out of bounds write Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM drm/tegra: Fix NULL vs IS_ERR() check in probe() dt-bindings: display: mediatek: split: add subschema property constraints dt-bindings: display: mediatek: dpi: correct power-domains property drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy() ...
2024-11-01Merge tag 'cxl-fixes-6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull cxl fixes from Ira Weiny: "The bulk of these fixes center around an initialization order bug reported by Gregory Price and some additional fall out from the debugging effort. In summary, cxl_acpi and cxl_mem race and previously worked because of a bus_rescan_devices() while testing without modules built in. Unfortunately with modules built in the rescan would fail due to the cxl_port driver being registered late via the build order. Furthermore it was found bus_rescan_devices() did not guarantee a probe barrier which CXL was expecting. Additional fixes to cxl-test and decoder allocation came along as they were found in this debugging effort. The other fixes are pretty minor but one affects trace point data seen by user space. Summary: - Fix crashes when running with cxl-test code - Fix Trace DRAM Event Record field decodes - Fix module/built in initialization order errors - Fix use after free on decoder shutdowns - Fix out of order decoder allocations - Improve cxl-test to better reflect real world systems" * tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/test: Improve init-order fidelity relative to real-world systems cxl/port: Prevent out-of-order decoder allocation cxl/port: Fix use-after-free, permit out-of-order decoder shutdown cxl/acpi: Ensure ports ready at cxl_acpi_probe() return cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices() cxl/port: Fix CXL port initialization order when the subsystem is built-in cxl/events: Fix Trace DRAM Event Record cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
2024-11-01Merge tag 'acpi-6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Make the ACPI CPPC library use a raw spinlock for operations carried out in scheduler context via the schedutil governor and the ACPI CPPC cpufreq driver (Pierre Gondois)" * tag 'acpi-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: CPPC: Make rmw_lock a raw_spin_lock
2024-11-01bpf: decouple BPF link/attach hook and BPF program sleepable semanticsAndrii Nakryiko
BPF link's lifecycle protection scheme depends on both BPF hook and BPF program. If *either* of those require RCU Tasks Trace GP, then we need to go through a chain of GPs before putting BPF program refcount and deallocating BPF link memory. This patch adds bpf_link-specific sleepable flag, which can be set to true even if underlying BPF program is not sleepable itself. If either link->sleepable or link->prog->sleepable is true, we'll go through a chain of RCU Tasks Trace GP and RCU GP before putting BPF program and freeing memory. This will be used to protect BPF link for sleepable (faultable) raw tracepoints in the next patch. Link: https://lore.kernel.org/20241101181754.782341-2-andrii@kernel.org Tested-by: Jordan Rife <jrife@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-01tracing: Add might_fault() check in __DECLARE_TRACE_SYSCALLMathieu Desnoyers
Catch incorrect use of syscall tracepoints even if no probes are registered by adding a might_fault() check in trace_##name() emitted by __DECLARE_TRACE_SYSCALL. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jordan Rife <jrife@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> Cc: linux-trace-kernel@vger.kernel.org Link: https://lore.kernel.org/20241031152056.744137-5-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-01tracing: Introduce tracepoint_is_faultable()Mathieu Desnoyers
Introduce a "faultable" flag within the extended structure to know whether a tracepoint needs rcu tasks trace grace period before reclaim. This can be queried using tracepoint_is_faultable(). Acked-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Jordan Rife <jrife@google.com> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> Cc: linux-trace-kernel@vger.kernel.org Link: https://lore.kernel.org/20241031152056.744137-3-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-01tracing: Introduce tracepoint extended structureMathieu Desnoyers
Shrink the struct tracepoint size from 80 bytes to 72 bytes on x86-64 by moving the (typically NULL) regfunc/unregfunc pointers to an extended structure. Tested-by: Jordan Rife <jrife@google.com> Cc: Michael Jeanson <mjeanson@efficios.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: bpf@vger.kernel.org Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Jordan Rife <jrife@google.com> Cc: linux-trace-kernel@vger.kernel.org Link: https://lore.kernel.org/20241031152056.744137-2-mathieu.desnoyers@efficios.com Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-01tracing: Remove TRACE_FLAG_IRQS_NOSUPPORTSebastian Andrzej Siewior
It was possible to enable tracing with no IRQ tracing support. The tracing infrastructure would then record TRACE_FLAG_IRQS_NOSUPPORT as the only tracing flag and show an 'X' in the output. The last user of this feature was PPC32 which managed to implement it during PowerPC merge in 2009. Since then, it was unused and the PPC32 dependency was finally removed in commit 0ea5ee035133a ("tracing: Remove PPC32 wart from config TRACING_SUPPORT"). Since the PowerPC merge the code behind !CONFIG_TRACE_IRQFLAGS_SUPPORT with TRACING enabled can no longer be selected used and the 'X' is not displayed or recorded. Remove the CONFIG_TRACE_IRQFLAGS_SUPPORT from the tracing code. Remove TRACE_FLAG_IRQS_NOSUPPORT. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://lore.kernel.org/20241022110112.XJI8I9T2@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-11-01Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "The important one is a change to the way in which we handle protection keys around signal delivery so that we're more closely aligned with the x86 behaviour, however there is also a revert of the previous fix to disable software tag-based KASAN with GCC, since a workaround materialised shortly afterwards. I'd love to say we're done with 6.12, but we're aware of some longstanding fpsimd register corruption issues that we're almost at the bottom of resolving. Summary: - Fix handling of POR_EL0 during signal delivery so that pushing the signal context doesn't fail based on the pkey configuration of the interrupted context and align our user-visible behaviour with that of x86. - Fix a bogus pointer being passed to the CPU hotplug code from the Arm SDEI driver. - Re-enable software tag-based KASAN with GCC by using an alternative implementation of '__no_sanitize_address'" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: signal: Improve POR_EL0 handling to avoid uaccess failures firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state() Revert "kasan: Disable Software Tag-Based KASAN with GCC" kasan: Fix Software Tag-Based KASAN with GCC
2024-11-01Merge tag 'vfs-6.12-rc6.iomap' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull iomap fixes from Christian Brauner: "Fixes for iomap to prevent data corruption bugs in the fallocate unshare range implementation of fsdax and a small cleanup to turn iomap_want_unshare_iter() into an inline function" * tag 'vfs-6.12-rc6.iomap' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iomap: turn iomap_want_unshare_iter into an inline function fsdax: dax_unshare_iter needs to copy entire blocks fsdax: remove zeroing code from dax_unshare_iter iomap: share iomap_unshare_iter predicate code with fsdax xfs: don't allocate COW extents when unsharing a hole
2024-11-01Merge tag 'vfs-6.12-rc6.fixes' of ↵Linus Torvalds
gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs Pull filesystem fixes from Christian Brauner: "VFS: - Fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP=y is set - Add a get_tree_bdev_flags() helper that allows to modify e.g., whether errors are logged into the filesystem context during superblock creation. This is used by erofs to fix a userspace regression where an error is currently logged when its used on a regular file which is an new allowed mode in erofs. netfs: - Fix the sysfs debug path in the documentation. - Fix iov_iter_get_pages*() for folio queues by skipping the page extracation if we're at the end of a folio. afs: - Fix moving subdirectories to different parent directory. autofs: - Fix handling of AUTOFS_DEV_IOCTL_TIMEOUT_CMD ioctl in validate_dev_ioctl(). The actual ioctl number, not the ioctl command needs to be checked for autofs" * tag 'vfs-6.12-rc6.fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: iov_iter: fix copy_page_from_iter_atomic() if KMAP_LOCAL_FORCE_MAP autofs: fix thinko in validate_dev_ioctl() iov_iter: Fix iov_iter_get_pages*() for folio_queue afs: Fix missing subdir edit when renamed between parent dirs doc: correcting the debug path for cachefiles erofs: use get_tree_bdev_flags() to avoid misleading messages fs/super.c: introduce get_tree_bdev_flags()
2024-11-01dt-bindings: mfd: aspeed: Support for AST2700Ryan Chen
Add reset, clk dt bindings headers, and update compatible support for AST2700 clk, silicon-id in yaml. Signed-off-by: Ryan Chen <ryan_chen@aspeedtech.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://lore.kernel.org/r/20241023090153.1395220-2-ryan_chen@aspeedtech.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-11-01iio: backend: extend featuresAngelo Dureghello
Extend backend features with new calls needed later on this patchset from axi version of ad3552r. The follwoing calls are added: iio_backend_ddr_enable() enable ddr bus transfer iio_backend_ddr_disable() disable ddr bus transfer iio_backend_data_stream_enable() enable data stream over bus interface iio_backend_data_stream_disable() disable data stream over bus interface iio_backend_data_transfer_addr() define the target register address where the DAC sample will be written. Reviewed-by: Nuno Sa <nuno.sa@analog.com> Signed-off-by: Angelo Dureghello <adureghello@baylibre.com> Reviewed-by: David Lechner <dlechner@baylibre.com> Link: https://patch.msgid.link/20241028-wip-bl-ad3552r-axi-v0-iio-testing-v9-3-f6960b4f9719@kernel-space.org Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-11-01ASoC: cleanup function parameter for rtd and its idKuninori Morimoto
some functions had parameter like below xxx(..., rtd, ..., id); This "id" is rtd->id. We don't need to have "id" on each functions because we can get it from "rtd". Let's cleanup it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/87plnqb84p.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-01ASoC: remove rtd->numKuninori Morimoto
No one is using rtd->num. Let's remove it. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/87sesmb852.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-01ASoC: rename rtd->num to rtd->idKuninori Morimoto
Current rtd has "num". It sounds/looks like size of rtd or something, but it will be mainly used at snd_pcm_new() as "device index". This naming is confusable. Let's rename it to "id" Some drivers are using rtd->num, so let's keep it so far, and remove it if all user was switched. Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Link: https://patch.msgid.link/87zfmub85z.wl-kuninori.morimoto.gx@renesas.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-01ASoC: sdw_utils/intel/amd: refactor dai link init logicVijendar Mukunda
Add 'no_pcm' as parameter for asoc_sdw_init_dai_link() so that same function can be used for SOF and legacy(No DSP) stack. Pass 'no_pcm' as 1 for Intel and AMD SOF based machine drivers. Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com> Reviewed-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com> Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com> Link: https://patch.msgid.link/20241101020802.1103181-2-Vijendar.Mukunda@amd.com Signed-off-by: Mark Brown <broonie@kernel.org>
2024-11-01Merge tag 'drm-misc-next-2024-10-31' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.13: All of the previous pull request, with MORE! Core Changes: - Update documentation for scheduler start/stop and job init. - Add dedede and sm8350-hdk hardware to ci runs. Driver Changes: - Small fixes and cleanups to panfrost, omap, nouveau, ivpu, zynqmp, v3d, panthor docs, and leadtek-ltk050h3146w. - Crashdump support for qaic. - Support DP compliance in zynqmp. - Add Samsung S6E88A0-AMS427AP24 panel. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/deeef745-f3fb-4e85-a9d0-e8d38d43c1cf@linux.intel.com
2024-10-31dql: annotate data-races around dql->last_obj_cntEric Dumazet
dql->last_obj_cnt is read/written from different contexts, without any lock synchronization. Use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <jdamato@fastly.com> Link: https://patch.msgid.link/20241029191425.2519085-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31netlink: add NLA_POLICY_MAX_LEN macroAntonio Quartulli
Similarly to NLA_POLICY_MIN_LEN, NLA_POLICY_MAX_LEN defines a policy with a maximum length value. The netlink generator for YAML specs has been extended accordingly. Signed-off-by: Antonio Quartulli <antonio@openvpn.net> Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20241029-b4-ovpn-v11-1-de4698c73a25@openvpn.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31netlabel: document doi_remove field of struct netlbl_calipso_opsGeorge Guo
Add documentation of doi_remove field to Kernel doc for struct netlbl_calipso_ops. Flagged by ./scripts/kernel-doc -none. Signed-off-by: George Guo <guodongtai@kylinos.cn> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://patch.msgid.link/20241028123435.3495916-1-dongtai.guo@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-01f2fs: introduce device aliasing fileDaeho Jeong
F2FS should understand how the device aliasing file works and support deleting the file after use. A device aliasing file can be created by mkfs.f2fs tool and it can map the whole device with an extent, not using node blocks. The file space should be pinned and normally used for read-only usages. Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Chao Yu <chao@kernel.org> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
Cross-merge networking fixes after downstream PR (net-6.12-rc6). Conflicts: drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c cbe84e9ad5e2 ("wifi: iwlwifi: mvm: really send iwl_txpower_constraints_cmd") 188a1bf89432 ("wifi: mac80211: re-order assigning channel in activate links") https://lore.kernel.org/all/20241028123621.7bbb131b@canb.auug.org.au/ net/mac80211/cfg.c c4382d5ca1af ("wifi: mac80211: update the right link for tx power") 8dd0498983ee ("wifi: mac80211: Fix setting txpower with emulate_chanctx") drivers/net/ethernet/intel/ice/ice_ptp_hw.h 6e58c3310622 ("ice: fix crash on probe for DPLL enabled E810 LOM") e4291b64e118 ("ice: Align E810T GPIO to other products") ebb2693f8fbd ("ice: Read SDP section from NVM for pin definitions") ac532f4f4251 ("ice: Cleanup unused declarations") https://lore.kernel.org/all/20241030120524.1ee1af18@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-31Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds
Pull bpf fixes from Daniel Borkmann: - Fix BPF verifier to force a checkpoint when the program's jump history becomes too long (Eduard Zingerman) - Add several fixes to the BPF bits iterator addressing issues like memory leaks and overflow problems (Hou Tao) - Fix an out-of-bounds write in trie_get_next_key (Byeonguk Jeong) - Fix BPF test infra's LIVE_FRAME frame update after a page has been recycled (Toke Høiland-Jørgensen) - Fix BPF verifier and undo the 40-bytes extra stack space for bpf_fastcall patterns due to various bugs (Eduard Zingerman) - Fix a BPF sockmap race condition which could trigger a NULL pointer dereference in sock_map_link_update_prog (Cong Wang) - Fix tcp_bpf_recvmsg_parser to retrieve seq_copied from tcp_sk under the socket lock (Jiayuan Chen) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, test_run: Fix LIVE_FRAME frame update after a page has been recycled selftests/bpf: Add three test cases for bits_iter bpf: Use __u64 to save the bits in bits iterator bpf: Check the validity of nr_words in bpf_iter_bits_new() bpf: Add bpf_mem_alloc_check_size() helper bpf: Free dynamically allocated bits in bpf_iter_bits_destroy() bpf: disallow 40-bytes extra stack for bpf_fastcall patterns selftests/bpf: Add test for trie_get_next_key() bpf: Fix out-of-bounds write in trie_get_next_key() selftests/bpf: Test with a very short loop bpf: Force checkpoint when jmp history is too long bpf: fix filed access without lock sock_map: fix a NULL pointer dereference in sock_map_link_update_prog()
2024-10-31i3c: master: Extend address status bit to 4 and add I3C_ADDR_SLOT_EXT_DESIREDFrank Li
Extend the address status bit to 4 and introduce the I3C_ADDR_SLOT_EXT_DESIRED macro to indicate that a device prefers a specific address. This is generally set by the 'assigned-address' in the device tree source (dts) file. ┌────┬─────────────┬───┬─────────┬───┐ │S/Sr│ 7'h7E RnW=0 │ACK│ ENTDAA │ T ├────┐ └────┴─────────────┴───┴─────────┴───┘ │ ┌─────────────────────────────────────────┘ │ ┌──┬─────────────┬───┬─────────────────┬────────────────┬───┬─────────┐ └─►│Sr│7'h7E RnW=1 │ACK│48bit UID BCR DCR│Assign 7bit Addr│PAR│ ACK/NACK│ └──┴─────────────┴───┴─────────────────┴────────────────┴───┴─────────┘ Some master controllers (such as HCI) need to prepare the entire above transaction before sending it out to the I3C bus. This means that a 7-bit dynamic address needs to be allocated before knowing the target device's UID information. However, some I3C targets may request specific addresses (called as "init_dyn_addr"), which is typically specified by the DT-'s assigned-address property. Lower addresses having higher IBI priority. If it is available, i3c_bus_get_free_addr() preferably return a free address that is not in the list of desired addresses (called as "init_dyn_addr"). This allows the device with the "init_dyn_addr" to switch to its "init_dyn_addr" when it hot-joins the I3C bus. Otherwise, if the "init_dyn_addr" is already in use by another I3C device, the target device will not be able to switch to its desired address. If the previous step fails, fallback returning one of the remaining unassigned address, regardless of its state in the desired list. Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20241021-i3c_dts_assign-v8-2-4098b8bde01e@nxp.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-10-31i3c: master: Replace hard code 2 with macro I3C_ADDR_SLOT_STATUS_BITSFrank Li
Replace the hardcoded value 2, which indicates 2 bits for I3C address status, with the predefined macro I3C_ADDR_SLOT_STATUS_BITS. Improve maintainability and extensibility of the code. Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20241021-i3c_dts_assign-v8-1-4098b8bde01e@nxp.com Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
2024-10-31Merge tag 'net-6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from WiFi, bluetooth and netfilter. No known new regressions outstanding. Current release - regressions: - wifi: mt76: do not increase mcu skb refcount if retry is not supported Current release - new code bugs: - wifi: - rtw88: fix the RX aggregation in USB 3 mode - mac80211: fix memory corruption bug in struct ieee80211_chanctx Previous releases - regressions: - sched: - stop qdisc_tree_reduce_backlog on TC_H_ROOT - sch_api: fix xa_insert() error path in tcf_block_get_ext() - wifi: - revert "wifi: iwlwifi: remove retry loops in start" - cfg80211: clear wdev->cqm_config pointer on free - netfilter: fix potential crash in nf_send_reset6() - ip_tunnel: fix suspicious RCU usage warning in ip_tunnel_find() - bluetooth: fix null-ptr-deref in hci_read_supported_codecs - eth: mlxsw: add missing verification before pushing Tx header - eth: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue Previous releases - always broken: - wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower - netfilter: sanitize offset and length before calling skb_checksum() - core: - fix crash when config small gso_max_size/gso_ipv4_max_size - skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension - mptcp: protect sched with rcu_read_lock - eth: ice: fix crash on probe for DPLL enabled E810 LOM - eth: macsec: fix use-after-free while sending the offloading packet - eth: stmmac: fix unbalanced DMA map/unmap for non-paged SKB data - eth: hns3: fix kernel crash when 1588 is sent on HIP08 devices - eth: mtk_wed: fix path of MT7988 WO firmware" * tag 'net-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (70 commits) net: hns3: fix kernel crash when 1588 is sent on HIP08 devices net: hns3: fixed hclge_fetch_pf_reg accesses bar space out of bounds issue net: hns3: initialize reset_timer before hclgevf_misc_irq_init() net: hns3: don't auto enable misc vector net: hns3: Resolved the issue that the debugfs query result is inconsistent. net: hns3: fix missing features due to dev->features configuration too early net: hns3: fixed reset failure issues caused by the incorrect reset type net: hns3: add sync command to sync io-pgtable net: hns3: default enable tx bounce buffer when smmu enabled netfilter: nft_payload: sanitize offset and length before calling skb_checksum() net: ethernet: mtk_wed: fix path of MT7988 WO firmware selftests: forwarding: Add IPv6 GRE remote change tests mlxsw: spectrum_ipip: Fix memory leak when changing remote IPv6 address mlxsw: pci: Sync Rx buffers for device mlxsw: pci: Sync Rx buffers for CPU mlxsw: spectrum_ptp: Add missing verification before pushing Tx header net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension Bluetooth: hci: fix null-ptr-deref in hci_read_supported_codecs netfilter: nf_reject_ipv6: fix potential crash in nf_send_reset6() netfilter: Fix use-after-free in get_info() ...
2024-11-01Merge tag 'drm-misc-fixes-2024-10-31' of ↵Dave Airlie
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes Short summary of fixes pull: ivpu: - Fix firewall IRQ handling panthor: - Fix firmware initialization wrt page sizes - Fix handling and reporting of dead job groups sched: - Guarantee forward progress via WC_MEM_RECLAIM tests: - Fix memory leak in drm_display_mode_from_cea_vic() Signed-off-by: Dave Airlie <airlied@redhat.com> From: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20241031144348.GA7826@linux-2.fritz.box
2024-10-31KVM: arm64: nv: Reprogram PMU events affected by nested transitionOliver Upton
Start reprogramming PMU events at nested boundaries now that everything is in place to handle the EL2 event filter. Only repaint events where the filter differs between EL1 and EL2 as a slight optimization. PMU now 'works' for nested VMs, albeit slow. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182559.3364829-1-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Adjust range of accessible PMCs according to HPMNOliver Upton
The value of MDCR_EL2.HPMN controls the number of event counters made visible to EL0 and EL1. This means it is possible for the guest hypervisor to allow direct access to event counters to the L2. Rework KVM's PMU register emulation to take the effects of HPMN into account when handling a trap. For bitmask-style registers, writes only affect accessible registers. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-14-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: Rename kvm_pmu_valid_counter_mask()Oliver Upton
Nested PMU support requires dynamically changing the visible range of PMU counters based on the exception level and value of MDCR_EL2.HPMN. At the same time, the PMU emulation code needs to know the absolute number of implemented counters, regardless of context. Rename the existing helper to make it obvious that it returns the number of implemented counters and not anything else. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-13-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31KVM: arm64: nv: Describe trap behaviour of MDCR_EL2.HPMNOliver Upton
MDCR_EL2.HPMN splits the PMU event counters into two ranges: the first range is accessible from all ELs, and the second range is accessible only to EL2/3. Supposing the guest hypervisor allows direct access to the PMU counters from the L2, KVM needs to locally handle those accesses. Add a new complex trap configuration for HPMN that checks if the counter index is accessible to the current context. As written, the architecture suggests HPMN only causes PMEVCNTR<n>_EL0 to trap, though intuition (and the pseudocode) suggest that the trap applies to PMEVTYPER<n>_EL0 as well. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241025182354.3364124-11-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-31block: remove bio_add_zone_append_pageChristoph Hellwig
This is only used by the nvmet zns passthrough code, which can trivially just use bio_add_pc_page and do the sanity check for the max zone append limit itself. All future zoned file systems should follow the btrfs lead and let the upper layers fill up bios unlimited by hardware constraints and split them to the limits in the I/O submission handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20241030051859.280923-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-10-31mfd: mt6397: Add initial support for MT6328Yassine Oudjana
The MT6328 PMIC is commonly used with the MT6735 SoC. Add initial support for this PMIC. Signed-off-by: Yassine Oudjana <y.oudjana@protonmail.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20241018081050.23592-5-y.oudjana@protonmail.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-10-31mfd: axp20x: Add support for AXP323Andre Przywara
The X-Powers AXP323 is a very close sibling of the AXP313A. The only difference seems to be the ability to dual-phase the first two DC/DC converter, which adds another register. Add the required boilerplate to introduce a new PMIC to the AXP MFD driver. Where possible, this just maps into the existing structs defined for the AXP313A, only deviating where needed. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Link: https://lore.kernel.org/r/20241007001408.27249-5-andre.przywara@arm.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-10-31mfd: axp20x: Ensure relationship between IDs and model namesAndre Przywara
At the moment there is an implicit relationship between the AXP model IDs and the order of the strings in the axp20x_model_names[] array. This is fragile, and makes adding IDs in the middle error prone. Make this relationship official by changing the ID type to the actual enum used, and using indexed initialisers for the string list. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Chen-Yu Tsai <wens@csie.org> Link: https://lore.kernel.org/r/20241007001408.27249-3-andre.przywara@arm.com Signed-off-by: Lee Jones <lee@kernel.org>
2024-10-31Merge tag 'ath-next-20241030' of ↵Kalle Valo
git://git.kernel.org/pub/scm/linux/kernel/git/ath/ath ath.git patches for v6.13 This development cycle featured phase 1 of patches to ath12k to support the new 802.11be MLO feature, along with other ath12k feature patches. In older drivers, support for some additional devices were added. And there was the usual set of bug fixes and cleanups across most drivers. Per-driver highlights: ath12k * Switch to using wiphy_lock() and remove ar->conf_mutex * Convert struct ath12k_sta::update_wk to use struct wiphy_work * Add phase 1 of 802.11be MLO support * Add firmware coredump collection support * Add debugfs support for a multitude of statistics * Fix host representation of multiple hal_rx structs * Fix use-after-free in ath12k_dp_cc_cleanup() * Skip Rx TID cleanup for self peer * Fix warning and crash when unloading in a VM * Convert CE interrupt handling from tasklet to BH workqueue * Fix A-MSDU indication in monitor mode ath11k * Fix double free issue during SRNG deinit * Enable firmware diagnostic events for WCN6750 * Fix CE offset address calculation for WCN6750 during SSR * Fix stack frame size warning in ath11k_vif_wow_set_wakeups() * Document the inputs for ath11k on WCN6855 ath10k * Fix multiple stack frame size warnings * Fix invalid VHT parameters in supported_vht_mcs_rate_nss* structs * Avoid NULL pointer error during SDIO remove ath5k * Add support for Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A
2024-10-31clockevents: Shutdown and unregister current clockevents at CPUHP_AP_TICK_DYINGFrederic Weisbecker
The way the clockevent devices are finally stopped while a CPU is offlining is currently chaotic. The layout being by order: 1) tick_sched_timer_dying() stops the tick and the underlying clockevent but only for oneshot case. The periodic tick and its related clockevent still runs. 2) tick_broadcast_offline() detaches and stops the per-cpu oneshot broadcast and append it to the released list. 3) Some individual clockevent drivers stop the clockevents (a second time if the tick is oneshot) 4) Once the CPU is dead, a control CPU remotely detaches and stops (a 3rd time if oneshot mode) the CPU clockevent and adds it to the released list. 5) The released list containing the broadcast device released on step 2) and the remotely detached clockevent from step 4) are unregistered. These random events can be factorized if the current clockevent is detached and stopped by the dying CPU at the generic layer, that is from the dying CPU: a) Stop the tick b) Stop/detach the underlying per-cpu oneshot broadcast clockevent c) Stop/detach the underlying clockevent d) Release / unregister the clockevents from b) and c) e) Release / unregister the remaining clockevents from the dying CPU. This part could be performed by the dying CPU This way the drivers and the tick layer don't need to care about clockevent operations during cpuhotplug down. This also unifies the tick behaviour on offline CPUs between oneshot and periodic modes, avoiding offline ticks altogether for sanity. Adopt the simplification. [ tglx: Remove the WARN_ON() in clockevents_register_device() as that is called from an upcoming CPU before the CPU is marked online ] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241029125451.54574-3-frederic@kernel.org
2024-10-31x86/MCE/AMD: Add support for new MCA_SYND{1,2} registersAvadhut Naik
Starting with Zen4, AMD's Scalable MCA systems incorporate two new registers: MCA_SYND1 and MCA_SYND2. These registers will include supplemental error information in addition to the existing MCA_SYND register. The data within these registers is considered valid if MCA_STATUS[SyndV] is set. Userspace error decoding tools like rasdaemon gather related hardware error information through the tracepoints. Therefore, export these two registers through the mce_record tracepoint so that tools like rasdaemon can parse them and output the supplemental error information like FRU text contained in them. [ bp: Massage. ] Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Link: https://lore.kernel.org/r/20241022194158.110073-4-avadhut.naik@amd.com
2024-10-31drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic()Jinjie Ruan
As Maxime suggested, add a new helper drm_kunit_display_mode_from_cea_vic(), it can replace the direct call of drm_display_mode_from_cea_vic(), and it will help solving the `mode` memory leaks. Acked-by: Maxime Ripard <mripard@kernel.org> Suggested-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241030023504.530425-2-ruanjinjie@huawei.com Signed-off-by: Maxime Ripard <mripard@kernel.org>
2024-10-30mm: allow set/clear page_type againYu Zhao
Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30mm, swap: avoid over reclaim of full clustersKairui Song
When running low on usable slots, cluster allocator will try to reclaim the full clusters aggressively to reclaim HAS_CACHE slots. This guarantees that as long as there are any usable slots, HAS_CACHE or not, the swap device will be usable and workload won't go OOM early. Before the cluster allocator, swap allocator fails easily if device is filled up with reclaimable HAS_CACHE slots. Which can be easily reproduced with following simple program: #include <stdio.h> #include <string.h> #include <linux/mman.h> #include <sys/mman.h> #define SIZE 8192UL * 1024UL * 1024UL int main(int argc, char **argv) { long tmp; char *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); memset(p, 0, SIZE); madvise(p, SIZE, MADV_PAGEOUT); for (unsigned long i = 0; i < SIZE; ++i) tmp += p[i]; getchar(); /* Pause */ return 0; } Setup an 8G non ramdisk swap, the first run of the program will swapout 8G ram successfully. But run same program again after the first run paused, the second run can't swapout all 8G memory as now half of the swap device is pinned by HAS_CACHE. There was a random scan in the old allocator that may reclaim part of the HAS_CACHE by luck, but it's unreliable. The new allocator's added reclaim of full clusters when device is low on usable slots. But when multiple CPUs are seeing the device is low on usable slots at the same time, they ran into a thundering herd problem. This is an observable problem on large machine with mass parallel workload, as full cluster reclaim is slower on large swap device and higher number of CPUs will also make things worse. Testing using a 128G ZRAM on a 48c96t system. When the swap device is very close to full (eg. 124G / 128G), running build linux kernel with make -j96 in a 1G memory cgroup will hung (not a softlockup though) spinning in full cluster reclaim for about ~5min before go OOM. To solve this, split the full reclaim into two parts: - Instead of do a synchronous aggressively reclaim when device is low, do only one aggressively reclaim when device is strictly full with a kworker. This still ensures in worst case the device won't be unusable because of HAS_CACHE slots. - To avoid allocation (especially higher order) suffer from HAS_CACHE filling up clusters and kworker not responsive enough, do one synchronous scan every time the free list is drained, and only scan one cluster. This is kind of similar to the random reclaim before, keeps the full clusters rotated and has a minimal latency. This should provide a fair reclaim strategy suitable for most workloads. Link: https://lkml.kernel.org/r/20241022175512.10398-1-ryncsn@gmail.com Fixes: 2cacbdfdee65 ("mm: swap: add a adaptive full cluster cache reclaim") Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chris Li <chrisl@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-30mm/codetag: fix null pointer check logic for ref and tagHao Ge
When we compile and load lib/slub_kunit.c,it will cause a panic. The root cause is that __kmalloc_cache_noprof was directly called instead of kmem_cache_alloc,which resulted in no alloc_tag being allocated.This caused current->alloc_tag to be null,leading to a null pointer dereference in alloc_tag_ref_set. Despite the fact that my colleague Pei Xiao will later fix the code in slub_kunit.c,we still need fix null pointer check logic for ref and tag to avoid panic caused by a null pointer dereference. Here is the log for the panic: [ 74.779373][ T2158] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 [ 74.780130][ T2158] Mem abort info: [ 74.780406][ T2158] ESR = 0x0000000096000004 [ 74.780756][ T2158] EC = 0x25: DABT (current EL), IL = 32 bits [ 74.781225][ T2158] SET = 0, FnV = 0 [ 74.781529][ T2158] EA = 0, S1PTW = 0 [ 74.781836][ T2158] FSC = 0x04: level 0 translation fault [ 74.782288][ T2158] Data abort info: [ 74.782577][ T2158] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 74.783068][ T2158] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 74.783533][ T2158] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 74.784010][ T2158] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000105f34000 [ 74.784586][ T2158] [0000000000000020] pgd=0000000000000000, p4d=0000000000000000 [ 74.785293][ T2158] Internal error: Oops: 0000000096000004 [#1] SMP [ 74.785805][ T2158] Modules linked in: slub_kunit kunit ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle 4 [ 74.790661][ T2158] CPU: 0 UID: 0 PID: 2158 Comm: kunit_try_catch Kdump: loaded Tainted: G W N 6.12.0-rc3+ #2 [ 74.791535][ T2158] Tainted: [W]=WARN, [N]=TEST [ 74.791889][ T2158] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 [ 74.792479][ T2158] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 74.793101][ T2158] pc : alloc_tagging_slab_alloc_hook+0x120/0x270 [ 74.793607][ T2158] lr : alloc_tagging_slab_alloc_hook+0x120/0x270 [ 74.794095][ T2158] sp : ffff800084d33cd0 [ 74.794418][ T2158] x29: ffff800084d33cd0 x28: 0000000000000000 x27: 0000000000000000 [ 74.795095][ T2158] x26: 0000000000000000 x25: 0000000000000012 x24: ffff80007b30e314 [ 74.795822][ T2158] x23: ffff000390ff6f10 x22: 0000000000000000 x21: 0000000000000088 [ 74.796555][ T2158] x20: ffff000390285840 x19: fffffd7fc3ef7830 x18: ffffffffffffffff [ 74.797283][ T2158] x17: ffff8000800e63b4 x16: ffff80007b33afc4 x15: ffff800081654c00 [ 74.798011][ T2158] x14: 0000000000000000 x13: 205d383531325420 x12: 5b5d383734363537 [ 74.798744][ T2158] x11: ffff800084d337e0 x10: 000000000000005d x9 : 00000000ffffffd0 [ 74.799476][ T2158] x8 : 7f7f7f7f7f7f7f7f x7 : ffff80008219d188 x6 : c0000000ffff7fff [ 74.800206][ T2158] x5 : ffff0003fdbc9208 x4 : ffff800081edd188 x3 : 0000000000000001 [ 74.800932][ T2158] x2 : 0beaa6dee1ac5a00 x1 : 0beaa6dee1ac5a00 x0 : ffff80037c2cb000 [ 74.801656][ T2158] Call trace: [ 74.801954][ T2158] alloc_tagging_slab_alloc_hook+0x120/0x270 [ 74.802494][ T2158] __kmalloc_cache_noprof+0x148/0x33c [ 74.802976][ T2158] test_kmalloc_redzone_access+0x4c/0x104 [slub_kunit] [ 74.803607][ T2158] kunit_try_run_case+0x70/0x17c [kunit] [ 74.804124][ T2158] kunit_generic_run_threadfn_adapter+0x2c/0x4c [kunit] [ 74.804768][ T2158] kthread+0x10c/0x118 [ 74.805141][ T2158] ret_from_fork+0x10/0x20 [ 74.805540][ T2158] Code: b9400a80 11000400 b9000a80 97ffd858 (f94012d3) [ 74.806176][ T2158] SMP: stopping secondary CPUs [ 74.808130][ T2158] Starting crashdump kernel... Link: https://lkml.kernel.org/r/20241020070819.307944-1-hao.ge@linux.dev Fixes: e0a955bf7f61 ("mm/codetag: add pgalloc_tag_copy()") Signed-off-by: Hao Ge <gehao@kylinos.cn> Acked-by: Suren Baghdasaryan <surenb@google.com> Suggested-by: Suren Baghdasaryan <surenb@google.com> Acked-by: Yu Zhao <yuzhao@google.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-10-31KVM: arm64: nv: Handle CNTHCTL_EL2 speciallyMarc Zyngier
Accessing CNTHCTL_EL2 is fraught with danger if running with HCR_EL2.E2H=1: half of the bits are held in CNTKCTL_EL1, and thus can be changed behind our back, while the rest lives in the CNTHCTL_EL2 shadow copy that is memory-based. Yes, this is a lot of fun! Make sure that we merge the two on read access, while we can write to CNTKCTL_EL1 in a more straightforward manner. Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20241023145345.1613824-7-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2024-10-30net: sched: propagate "skip_sw" flag to struct flow_cls_common_offloadVladimir Oltean
Background: switchdev ports offload the Linux bridge, and most of the packets they handle will never see the CPU. The ports between which there exists no hardware data path are considered 'foreign' to switchdev. These can either be normal physical NICs without switchdev offload, or incompatible switchdev ports, or virtual interfaces like veth/dummy/etc. In some cases, an offloaded filter can only do half the work, and the rest must be handled by software. Redirecting/mirroring from the ingress of a switchdev port towards a foreign interface is one example of combined hardware/software data path. The most that the switchdev port can do is to extract the matching packets from its offloaded data path and send them to the CPU. From there on, the software filter runs (a second time, after the first run in hardware) on the packet and performs the mirred action. It makes sense for switchdev drivers which allow this kind of "half offloading" to sense the "skip_sw" flag of the filter/action pair, and deny attempts from the user to install a filter that does not run in software, because that simply won't work. In fact, a mirred action on a switchdev port towards a dummy interface appears to be a valid way of (selectively) monitoring offloaded traffic that flows through it. IFF_PROMISC was also discussed years ago, but (despite initial disagreement) there seems to be consensus that this flag should not affect the destination taken by packets, but merely whether or not the NIC discards packets with unknown MAC DA for local processing. [1] https://lore.kernel.org/netdev/20190830092637.7f83d162@ceranb/ [2] https://lore.kernel.org/netdev/20191002233750.13566-1-olteanv@gmail.com/ Suggested-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/netdev/ZxUo0Dc0M5Y6l9qF@shredder.mtl.com/ Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20241023135251.1752488-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30s390/time: Add clocksource id to TOD clockSven Schnelle
To allow specifying the clock source in the upcoming PtP driver, add a clocksource ID to the s390 TOD clock. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Link: https://patch.msgid.link/20241023065601.449586-2-svens@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-30uprobes: SRCU-protect uretprobe lifetime (with timeout)Andrii Nakryiko
Avoid taking refcount on uprobe in prepare_uretprobe(), instead take uretprobe-specific SRCU lock and keep it active as kernel transfers control back to user space. Given we can't rely on user space returning from traced function within reasonable time period, we need to make sure not to keep SRCU lock active for too long, though. To that effect, we employ a timer callback which is meant to terminate SRCU lock region after predefined timeout (currently set to 100ms), and instead transfer underlying struct uprobe's lifetime protection to refcounting. This fallback to less scalable refcounting after 100ms is a fine tradeoff from uretprobe's scalability and performance perspective, because uretprobing *long running* user functions inherently doesn't run into scalability issues (there is just not enough frequency to cause noticeable issues with either performance or scalability). The overall trick is in ensuring synchronization between current thread and timer's callback fired on some other thread. To cope with that with minimal logic complications, we add hprobe wrapper which is used to contain all the synchronization related issues behind a small number of basic helpers: hprobe_expire() for "downgrading" uprobe from SRCU-protected state to refcounted state, and a hprobe_consume() and hprobe_finalize() pair of single-use consuming helpers. Other than that, whatever current thread's logic is there stays the same, as timer thread cannot modify return_instance state (or add new/remove old return_instances). It only takes care of SRCU unlock and uprobe refcounting, which is hidden from the higher-level uretprobe handling logic. We use atomic xchg() in hprobe_consume(), which is called from performance critical handle_uretprobe_chain() function run in the current context. When uncontended, this xchg() doesn't seem to hurt performance as there are no other competing CPUs fighting for the same cache line. We also mark struct return_instance as ____cacheline_aligned to ensure no false sharing can happen. Another technical moment. We need to make sure that the list of return instances can be safely traversed under RCU from timer callback, so we delay return_instance freeing with kfree_rcu() and make sure that list modifications use RCU-aware operations. Also, given SRCU lock survives transition from kernel to user space and back we need to use lower-level __srcu_read_lock() and __srcu_read_unlock() to avoid lockdep complaining. Just to give an impression of a kind of performance improvements this change brings, below are benchmarking results with and without these SRCU changes, assuming other uprobe optimizations (mainly RCU Tasks Trace for entry uprobes, lockless RB-tree lookup, and lockless VMA to uprobe lookup) are left intact: WITHOUT SRCU for uretprobes =========================== uretprobe-nop ( 1 cpus): 2.197 ± 0.002M/s ( 2.197M/s/cpu) uretprobe-nop ( 2 cpus): 3.325 ± 0.001M/s ( 1.662M/s/cpu) uretprobe-nop ( 3 cpus): 4.129 ± 0.002M/s ( 1.376M/s/cpu) uretprobe-nop ( 4 cpus): 6.180 ± 0.003M/s ( 1.545M/s/cpu) uretprobe-nop ( 8 cpus): 7.323 ± 0.005M/s ( 0.915M/s/cpu) uretprobe-nop (16 cpus): 6.943 ± 0.005M/s ( 0.434M/s/cpu) uretprobe-nop (32 cpus): 5.931 ± 0.014M/s ( 0.185M/s/cpu) uretprobe-nop (64 cpus): 5.145 ± 0.003M/s ( 0.080M/s/cpu) uretprobe-nop (80 cpus): 4.925 ± 0.005M/s ( 0.062M/s/cpu) WITH SRCU for uretprobes ======================== uretprobe-nop ( 1 cpus): 1.968 ± 0.001M/s ( 1.968M/s/cpu) uretprobe-nop ( 2 cpus): 3.739 ± 0.003M/s ( 1.869M/s/cpu) uretprobe-nop ( 3 cpus): 5.616 ± 0.003M/s ( 1.872M/s/cpu) uretprobe-nop ( 4 cpus): 7.286 ± 0.002M/s ( 1.822M/s/cpu) uretprobe-nop ( 8 cpus): 13.657 ± 0.007M/s ( 1.707M/s/cpu) uretprobe-nop (32 cpus): 45.305 ± 0.066M/s ( 1.416M/s/cpu) uretprobe-nop (64 cpus): 42.390 ± 0.922M/s ( 0.662M/s/cpu) uretprobe-nop (80 cpus): 47.554 ± 2.411M/s ( 0.594M/s/cpu) Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20241024044159.3156646-3-andrii@kernel.org
2024-10-30perf/x86/rapl: Clean up cpumask and hotplugKan Liang
The rapl pmu is die scope, which is supported by the generic perf_event subsystem now. Set the scope for the rapl PMU and remove all the cpumask and hotplug codes. Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Oliver Sang <oliver.sang@intel.com> Tested-by: Dhananjay Ugwekar <dhananjay.ugwekar@amd.com> Link: https://lore.kernel.org/r/20241010142604.770192-2-kan.liang@linux.intel.com
2024-10-30KVM: Protect vCPU's "last run PID" with rwlock, not RCUSean Christopherson
To avoid jitter on KVM_RUN due to synchronize_rcu(), use a rwlock instead of RCU to protect vcpu->pid, a.k.a. the pid of the task last used to a vCPU. When userspace is doing M:N scheduling of tasks to vCPUs, e.g. to run SEV migration helper vCPUs during post-copy, the synchronize_rcu() needed to change the PID associated with the vCPU can stall for hundreds of milliseconds, which is problematic for latency sensitive post-copy operations. In the directed yield path, do not acquire the lock if it's contended, i.e. if the associated PID is changing, as that means the vCPU's task is already running. Reported-by: Steve Rutherford <srutherford@google.com> Reviewed-by: Steve Rutherford <srutherford@google.com> Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20240802200136.329973-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>