summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
AgeCommit message (Collapse)Author
2024-09-05media: cec: extron-da-hd-4k-plus: add the Extron DA HD 4K Plus CEC driverHans Verkuil
Add support for the Extron DA HD 4K Plus series of 4K HDMI Distrubution Amplifiers (aka HDMI Splitters). These devices support CEC and this driver adds support for the CEC protocol for both the input and all outputs (2, 4 or 6 outputs, depending on the model). It also exports the EDID from the outputs and allows reading and setting the EDID of the input. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
2024-09-05Merge tag 'hwmon-for-v6.11-rc7' into review-hansHans de Goede
Merge "hwmon fixes for v6.11-rc7" into review-hans to bring in commit a54da9df75cd ("hwmon: (hp-wmi-sensors) Check if WMI event data exists"). This is a dependency for a set of WMI event data refactoring changes.
2024-09-04Documentation: admin-guide: pm: Add efficiency vs. latency tradeoff to ↵Tero Kristo
uncore documentation Added documentation about the functionality of efficiency vs. latency tradeoff control in intel Xeon processors, and how this is configured via sysfs. Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20240828153657.1296410-2-tero.kristo@linux.intel.com Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2024-09-04KVM: Add a module param to allow enabling virtualization when KVM is loadedSean Christopherson
Add an on-by-default module param, enable_virt_at_load, to let userspace force virtualization to be enabled in hardware when KVM is initialized, i.e. just before /dev/kvm is exposed to userspace. Enabling virtualization during KVM initialization allows userspace to avoid the additional latency when creating/destroying the first/last VM (or more specifically, on the 0=>1 and 1=>0 edges of creation/destruction). Now that KVM uses the cpuhp framework to do per-CPU enabling, the latency could be non-trivial as the cpuhup bringup/teardown is serialized across CPUs, e.g. the latency could be problematic for use case that need to spin up VMs quickly. Prior to commit 10474ae8945c ("KVM: Activate Virtualization On Demand"), KVM _unconditionally_ enabled virtualization during load, i.e. there's no fundamental reason KVM needs to dynamically toggle virtualization. These days, the only known argument for not enabling virtualization is to allow KVM to be autoloaded without blocking other out-of-tree hypervisors, and such use cases can simply change the module param, e.g. via command line. Note, the aforementioned commit also mentioned that enabling SVM (AMD's virtualization extensions) can result in "using invalid TLB entries". It's not clear whether the changelog was referring to a KVM bug, a CPU bug, or something else entirely. Regardless, leaving virtualization off by default is not a robust "fix", as any protection provided is lost the instant userspace creates the first VM. Reviewed-by: Chao Gao <chao.gao@intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Reviewed-by: Kai Huang <kai.huang@intel.com> Tested-by: Farrah Chen <farrah.chen@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20240830043600.127750-8-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-03Documentation/cgroup-v2: clarify that zswap.writeback is ignored if zswap is ↵Mike Yuan
disabled As discussed in [1], zswap-related settings natually lose their effect when zswap is disabled, specifically zswap.writeback here. Be explicit about this behavior. [1] https://lore.kernel.org/linux-kernel/CAKEwX=Mhbwhh-=xxCU-RjMXS_n=RpV3Gtznb2m_3JgL+jzz++g@mail.gmail.com/ [akpm@linux-foundation.org: fix/simplify text] Link: https://lkml.kernel.org/r/20240823162506.12117-3-me@yhndnzj.com Signed-off-by: Mike Yuan <me@yhndnzj.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03mm,memcg: provide per-cgroup counters for NUMA balancing operationsKaiyang Zhao
The ability to observe the demotion and promotion decisions made by the kernel on a per-cgroup basis is important for monitoring and tuning containerized workloads on machines equipped with tiered memory. Different containers in the system may experience drastically different memory tiering actions that cannot be distinguished from the global counters alone. For example, a container running a workload that has a much hotter memory accesses will likely see more promotions and fewer demotions, potentially depriving a colocated container of top tier memory to such an extent that its performance degrades unacceptably. For another example, some containers may exhibit longer periods between data reuse, causing much more numa_hint_faults than numa_pages_migrated. In this case, tuning hot_threshold_ms may be appropriate, but the signal can easily be lost if only global counters are available. In the long term, we hope to introduce per-cgroup control of promotion and demotion actions to implement memory placement policies in tiering. This patch set adds seven counters to memory.stat in a cgroup: numa_pages_migrated, numa_pte_updates, numa_hint_faults, pgdemote_kswapd, pgdemote_khugepaged, pgdemote_direct and pgpromote_success. pgdemote_* and pgpromote_success are also available in memory.numa_stat. count_memcg_events_mm() is added to count multiple event occurrences at once, and get_mem_cgroup_from_folio() is added because we need to get a reference to the memcg of a folio before it's migrated to track numa_pages_migrated. The accounting of PGDEMOTE_* is moved to shrink_inactive_list() before being changed to per-cgroup. [kaiyang2@cs.cmu.edu: add documentation of the memcg counters in cgroup-v2.rst] Link: https://lkml.kernel.org/r/20240814235122.252309-1-kaiyang2@cs.cmu.edu Link: https://lkml.kernel.org/r/20240814174227.30639-1-kaiyang2@cs.cmu.edu Signed-off-by: Kaiyang Zhao <kaiyang2@cs.cmu.edu> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Wei Xu <weixugc@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-03docs: move numa=fake description to kernel-parameters.txtMike Rapoport (Microsoft)
NUMA emulation can be now enabled on arm64 and riscv in addition to x86. Move description of numa=fake parameters from x86 documentation of admin-guide/kernel-parameters.txt Link: https://lkml.kernel.org/r/20240807064110.1003856-27-rppt@kernel.org Suggested-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01Document/kexec: generalize crash hotplug descriptionSourabh Jain
Commit 79365026f869 ("crash: add a new kexec flag for hotplug support") generalizes the crash hotplug support to allow architectures to update multiple kexec segments on CPU/Memory hotplug and not just elfcorehdr. Therefore, update the relevant kernel documentation to reflect the same. No functional change. Link: https://lkml.kernel.org/r/20240812041651.703156-1-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Reviewed-by: Petr Tesarik <ptesarik@suse.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Petr Tesarik <petr@tesarici.cz> Cc: Sourabh Jain <sourabhjain@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01memcg: initiate deprecation of pressure_levelShakeel Butt
The pressure_level in memcg v1 provides memory pressure notifications to the user space. At the moment it provides notifications for three levels of memory pressure i.e. low, medium and critical, which are defined based on internal memory reclaim implementation details. More specifically the ratio of scanned and reclaimed pages during a memory reclaim. However this is not robust as there are workloads with mostly unreclaimable user memory or kernel memory. For v2, the users can use PSI for memory pressure status of the system or the cgroup. Let's start the deprecation process for pressure_level and add warnings to gather the info on how the current users are using this interface and how they can be used to PSI. Link: https://lkml.kernel.org/r/20240814220021.3208384-5-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: T.J. Mercier <tjmercier@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01memcg: initiate deprecation of oom_controlShakeel Butt
The oom_control provides functionality to disable memcg oom-killer, notifications on oom-kill and reading the stats regarding oom-kills. This interface was mainly introduced to provide functionality for userspace oom-killers. However it is not robust enough and only supports OOM handling in the page fault path. For v2, the users can use the combination of memory.events notifications, memory.high and PSI to provide userspace OOM-killing functionality. Actually LMKD in Android and OOMd in systemd and Meta infrastructure already use PSI in combination with other stats to implement userspace OOM-killing. Let's start the deprecation process for v1 and gather the info on how the current users are using this interface and work on providing a more robust functionality in v2. Link: https://lkml.kernel.org/r/20240814220021.3208384-4-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: T.J. Mercier <tjmercier@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01memcg: initiate deprecation of v1 soft limitShakeel Butt
Memcg v1 provides soft limit functionality for the best effort memory sharing between multiple workloads on a system. It is usually triggered through kswapd and at the moment does not reclaim kernel memory. Memcg v2 provides more straightforward best effort (memory.low) and hard protection (memory.min) functionalities. Let's initiate the deprecation of soft limit from v1 and gather if v2 needs something more to move the existing v1 users to v2 regarding soft limit. Link: https://lkml.kernel.org/r/20240814220021.3208384-3-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: T.J. Mercier <tjmercier@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01memcg: initiate deprecation of v1 tcp accountingShakeel Butt
Patch series "memcg: initiate deprecation of v1 features", v2. Start the deprecation process of the memcg v1 features which we discussed during LSFMMBPF 2024 [1]. For now add the warnings to collect the information on how the current users are using these features. Next we will work on providing better alternatives in v2 (if needed) and fully deprecate these features. Link: https://lwn.net/Articles/974575 [1] This patch (of 4): Memcg v1 provides opt-in TCP memory accounting feature. However it is mostly unused due to its performance impact on the network traffic. In v2, the TCP memory is accounted in the regular memory usage and is transparent to the users but they can observe the TCP memory usage through memcg stats. Let's initiate the deprecation process of memcg v1's tcp accounting functionality and add warnings to gather if there are any users and if there are, collect how they are using it and plan to provide them better alternative in v2. Link: https://lkml.kernel.org/r/20240814220021.3208384-1-shakeel.butt@linux.dev Link: https://lkml.kernel.org/r/20240814220021.3208384-2-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: T.J. Mercier <tjmercier@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm: override mTHP "enabled" defaults at kernel cmdlineRyan Roberts
Add thp_anon= cmdline parameter to allow specifying the default enablement of each supported anon THP size. The parameter accepts the following format and can be provided multiple times to configure each size: thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value> An example: thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never See Documentation/admin-guide/mm/transhuge.rst for more details. Configuring the defaults at boot time is useful to allow early user space to take advantage of mTHP before its been configured through sysfs. [v-songbaohua@oppo.com: use get_oder() and check size is is_power_of_2] Link: https://lkml.kernel.org/r/20240814224635.43272-1-21cnbao@gmail.com [ryan.roberts@arm.com: some minor cleanup according to David's comments] Link: https://lkml.kernel.org/r/20240820105244.62703-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240814020247.67297-1-21cnbao@gmail.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Co-developed-by: Barry Song <v-songbaohua@oppo.com> Signed-off-by: Barry Song <v-songbaohua@oppo.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm, memcg: cg2 memory{.swap,}.peak write handlersDavid Finkel
Patch series "mm, memcg: cg2 memory{.swap,}.peak write handlers", v7. This patch (of 2): Other mechanisms for querying the peak memory usage of either a process or v1 memory cgroup allow for resetting the high watermark. Restore parity with those mechanisms, but with a less racy API. For example: - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets the high watermark. - writing "5" to the clear_refs pseudo-file in a processes's proc directory resets the peak RSS. This change is an evolution of a previous patch, which mostly copied the cgroup v1 behavior, however, there were concerns about races/ownership issues with a global reset, so instead this change makes the reset filedescriptor-local. Writing any non-empty string to the memory.peak and memory.swap.peak pseudo-files reset the high watermark to the current usage for subsequent reads through that same FD. Notably, following Johannes's suggestion, this implementation moves the O(FDs that have written) behavior onto the FD write(2) path. Instead, on the page-allocation path, we simply add one additional watermark to conditionally bump per-hierarchy level in the page-counter. Additionally, this takes Longman's suggestion of nesting the page-charging-path checks for the two watermarks to reduce the number of common-case comparisons. This behavior is particularly useful for work scheduling systems that need to track memory usage of worker processes/cgroups per-work-item. Since memory can't be squeezed like CPU can (the OOM-killer has opinions), these systems need to track the peak memory usage to compute system/container fullness when binpacking workitems. Most notably, Vimeo's use-case involves a system that's doing global binpacking across many Kubernetes pods/containers, and while we can use PSI for some local decisions about overload, we strive to avoid packing workloads too tightly in the first place. To facilitate this, we track the peak memory usage. However, since we run with long-lived workers (to amortize startup costs) we need a way to track the high watermark while a work-item is executing. Polling runs the risk of missing short spikes that last for timescales below the polling interval, and peak memory tracking at the cgroup level is otherwise perfect for this use-case. As this data is used to ensure that binpacked work ends up with sufficient headroom, this use-case mostly avoids the inaccuracies surrounding reclaimable memory. Link: https://lkml.kernel.org/r/20240730231304.761942-1-davidf@vimeo.com Link: https://lkml.kernel.org/r/20240729143743.34236-1-davidf@vimeo.com Link: https://lkml.kernel.org/r/20240729143743.34236-2-davidf@vimeo.com Signed-off-by: David Finkel <davidf@vimeo.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Suggested-by: Waiman Long <longman@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Michal Koutný <mkoutny@suse.com> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Shuah Khan <shuah@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-09-01mm/memcontrol: respect zswap.writeback setting from parent cg tooMike Yuan
Currently, the behavior of zswap.writeback wrt. the cgroup hierarchy seems a bit odd. Unlike zswap.max, it doesn't honor the value from parent cgroups. This surfaced when people tried to globally disable zswap writeback, i.e. reserve physical swap space only for hibernation [1] - disabling zswap.writeback only for the root cgroup results in subcgroups with zswap.writeback=1 still performing writeback. The inconsistency became more noticeable after I introduced the MemoryZSwapWriteback= systemd unit setting [2] for controlling the knob. The patch assumed that the kernel would enforce the value of parent cgroups. It could probably be workarounded from systemd's side, by going up the slice unit tree and inheriting the value. Yet I think it's more sensible to make it behave consistently with zswap.max and friends. [1] https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate#Disable_zswap_writeback_to_use_the_swap_space_only_for_hibernation [2] https://github.com/systemd/systemd/pull/31734 Link: https://lkml.kernel.org/r/20240823162506.12117-1-me@yhndnzj.com Fixes: 501a06fe8e4c ("zswap: memcontrol: implement zswap writeback disabling") Signed-off-by: Mike Yuan <me@yhndnzj.com> Reviewed-by: Nhat Pham <nphamcs@gmail.com> Acked-by: Yosry Ahmed <yosryahmed@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-08-30drivers/perf: hisi_pcie: Export supported Root Ports [bdf_min, bdf_max]Yicong Yang
Currently users can get the Root Ports supported by the PCIe PMU by "bus" sysfs attributes which indicates the PCIe bus number where Root Ports are located. This maybe insufficient since Root Ports supported by different PCIe PMUs may be located on the same PCIe bus. So export the BDF range the Root Ports additionally. Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/20240829090332.28756-4-yangyicong@huawei.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30proc: add config & param to block forcing mem writesAdrian Ratiu
This adds a Kconfig option and boot param to allow removing the FOLL_FORCE flag from /proc/pid/mem write calls because it can be abused. The traditional forcing behavior is kept as default because it can break GDB and some other use cases. Previously we tried a more sophisticated approach allowing distributions to fine-tune /proc/pid/mem behavior, however that got NAK-ed by Linus [1], who prefers this simpler approach with semantics also easier to understand for users. Link: https://lore.kernel.org/lkml/CAHk-=wiGWLChxYmUA5HrT5aopZrB7_2VTa0NLZcxORgkUe5tEQ@mail.gmail.com/ [1] Cc: Doug Anderson <dianders@chromium.org> Cc: Jeff Xu <jeffxu@google.com> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com> Link: https://lore.kernel.org/r/20240802080225.89408-1-adrian.ratiu@collabora.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-27Documentation/srso: Document a method for checking safe RET operates properlyBorislav Petkov (AMD)
Add a method to quickly verify whether safe RET operates properly on a given system using perf tool. Also, add a selftest which does the same thing. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240731160531.28640-1-bp@kernel.org
2024-08-26Documentation: ext4.rst: remove obsolete descriptions of noacl/nouser_xattr ↵Stefan Tauner
options These have been deprecated for a decade[1] and removed two years ago[2]. 1: f70486055ee351158bd6999f3965ad378b52c694 2: 2d544ec923dbe5fbed64a7f43dccf527218380bc Signed-off-by: Stefan Tauner <stefan.tauner@gmx.at> Link: https://patch.msgid.link/20240728003433.2566649-1-stefan.tauner@gmx.at Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26Documentation: admin-guide: direct people to bug trackers, if specifiedJani Nikula
Update bug reporting info in bug-hunting.rst to direct people to driver/subsystem bug trackers, if explicitly specified with the MAINTAINERS "B:" entry. Use the new get_maintainer.pl --bug option to print the info. Cc: Joe Perches <joe@perches.com> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240815113450.3397499-2-jani.nikula@intel.com
2024-08-26docs: bug-bisect: rewrite to better match the other bisecting textThorsten Leemhuis
Rewrite the short document on bisecting kernel bugs. The new text improves .config handling, brings a mention of 'git bisect skip', and explains what to do after the bisection finished -- including trying a revert to verify the result. The rewrite at the same time removes the unrelated and outdated section on 'Devices not appearing' and replaces some sentences about bug reporting with a pointer to the document covering that topic in detail. This overall brings the approach close to the one in the recently added text Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst. As those two texts serve a similar purpose for different audiences, mention that document in the head of this one and outline when the other might be the better one to follow. Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info> Reviewed-by: Petr Tesarik <petr@tesarici.cz> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/74dc0137dcc3e2c05648e885a7bc31ffd39a0890.1724312119.git.linux@leemhuis.info
2024-08-26tracing/Documentation: Start a document on how to debug with tracingSteven Rostedt
Add a new document Documentation/trace/debugging.rst that will hold various ways to debug tracing. This initial version mentions trace_printk and how to create persistent buffers that can last across bootups. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexander Aring <aahringo@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Tomas Glozar <tglozar@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Jonathan Corbet" <corbet@lwn.net> Link: https://lore.kernel.org/20240823014019.702433486@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-26tracing: Have trace_printk not use binary prints if boot bufferSteven Rostedt
If the persistent boot mapped ring buffer is used for trace_printk(), force it to not use the binary versions. trace_printk() by default uses bin_printf() that only saves the pointer to the format and not the format itself inside the ring buffer. But for a persistent buffer that is read after reboot, the pointers to the format strings may not be the same, or worse, not even exist! Instead, just force the more robust, but slower, version that does the formatting before saving into the ring buffer. The boot mapped buffer can now be used for trace_printk and friends! Using the trace_printk() and the persistent buffer was used to debug the issue with the osnoise tracer: Link: https://lore.kernel.org/all/20240822103443.6a6ae051@gandalf.local.home/ Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexander Aring <aahringo@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Tomas Glozar <tglozar@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Jonathan Corbet" <corbet@lwn.net> Link: https://lore.kernel.org/20240823014019.386925800@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-26tracing: Allow trace_printk() to go to other instance buffersSteven Rostedt
Currently, trace_printk() just goes to the top level ring buffer. But there may be times that it should go to one of the instances created by the kernel command line. Add a new trace_instance flag: traceprintk (also can use "printk" or "trace_printk" as people tend to forget the actual flag name). trace_instance=foo^traceprintk Will assign the trace_printk to this buffer at boot up. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexander Aring <aahringo@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Tomas Glozar <tglozar@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Jonathan Corbet" <corbet@lwn.net> Link: https://lore.kernel.org/20240823014019.226694946@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-26tracing: Add "traceoff" flag to boot time tracing instancesSteven Rostedt
Add a "flags" delimiter (^) to the "trace_instance" kernel command line parameter, and add the "traceoff" flag. The format is: trace_instance=<name>[^<flag1>[^<flag2>]][@<memory>][,<events>] The code allows for more than one flag to be added, but currently only "traceoff" is done so. The motivation for this change came from debugging with the persistent ring buffer and having trace_printk() writing to it. The trace_printk calls are always enabled, and the boot after the crash was having the unwanted trace_printks from the current boot inject into the ring buffer with the trace_printks of the crash kernel, making the output very confusing. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vineeth Pillai <vineeth@bitbyteword.org> Cc: Beau Belgrave <beaub@linux.microsoft.com> Cc: Alexander Graf <graf@amazon.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Ross Zwisler <zwisler@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexander Aring <aahringo@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Tomas Glozar <tglozar@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Jonathan Corbet" <corbet@lwn.net> Link: https://lore.kernel.org/20240823014019.053229958@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-23Documentation: dwc_pcie_pmu: Update bdf to sbdfKrishna chaitanya chundru
Update document to reflect the driver change to use sbdf instead of bdf alone. Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com> Reviewed-by: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240816-dwc_pmu_fix-v2-2-198b8ab1077c@quicinc.com Signed-off-by: Will Deacon <will@kernel.org>
2024-08-21dm-crypt: Allow to specify the integrity key size as optionIngo Franzki
For the MAC based integrity operation, the integrity key size (i.e. key_mac_size) is currently set to the digest size of the used digest. For wrapped key HMAC algorithms, the key size is independent of the cryptographic key size. So there is no known size of the mac key in such cases. The desired key size can optionally be specified as argument when the dm-crypt device is configured via 'integrity_key_size:%u'. If no integrity_key_size argument is specified, the mac key size is still set to the digest size, as before. Increase version number to 1.28.0 so that support for the new argument can be detected by user space (i.e. cryptsetup). Signed-off-by: Ingo Franzki <ifranzki@linux.ibm.com> Reviewed-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-08-21dm delay: enhance kernel documentationHeinz Mauelshagen
This commit improves documentation of the dm-delay target. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-08-21dm vdo: add dmsetup message for returning configuration infoBruce Johnston
Add a new dmsetup message called config, which will return useful configuration information for the vdo volume and the uds index associated with it. The output is a YAML string, and contains a version number to allow future additions to the content. Signed-off-by: Bruce Johnston <bjohnsto@redhat.com> Signed-off-by: Matthew Sakai <msakai@redhat.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
2024-08-20documentation: add IPE documentationDeven Bowers
Add IPE's admin and developer documentation to the kernel tree. Co-developed-by: Fan Wu <wufan@linux.microsoft.com> Signed-off-by: Deven Bowers <deven.desai@linux.microsoft.com> Signed-off-by: Fan Wu <wufan@linux.microsoft.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-08-19cgroup: update some statememt about delegationChen Ridong
The comment in cgroup_file_write is missing some interfaces, such as 'cgroup.threads'. All delegatable files are listed in '/sys/kernel/cgroup/delegate', so update the comment in cgroup_file_write. Besides, add a statement that files outside the namespace shouldn't be visible from inside the delegated namespace. tj: Reflowed text for consistency. Signed-off-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-15tracing: Allow boot instances to use reserve_mem boot memorySteven Rostedt (Google)
Allow boot instances to use memory reserved by the reserve_mem boot option. reserve_mem=12M:4096:trace trace_instance=boot_mapped@trace The above will allocate 12 megs with 4096 alignment and label it "trace". The second parameter will create a "boot_mapped" instance and use the memory reserved and labeled as "trace" as the memory for the ring buffer. That will create an instance called "boot_mapped": /sys/kernel/tracing/instances/boot_mapped Note, because the ring buffer is using a defined memory ranged, it will act just like a memory mapped ring buffer. It will not have a snapshot buffer, as it can't swap out the buffer. The snapshot files as well as any tracers that uses a snapshot will not be present in the boot_mapped instance. Also note that reserve_mem is not reliable in acquiring the same physical memory at each soft reboot. It is possible that KALSR could map the kernel at the previous boot memory location forcing the reserve_mem to return a different memory location. In this case, the previous ring buffer will be lost. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ross Zwisler <zwisler@google.com> Cc: Vincent Donnefort <vdonnefort@google.com> Link: https://lore.kernel.org/20240815082811.669f7d8c@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-14Merge tag 'v6.11-rc3' into trace/ring-buffer/coreSteven Rostedt
The "reserve_mem" kernel command line parameter has been pulled into v6.11. Merge the latest -rc3 to allow the persistent ring buffer memory to be able to be mapped at the address specified by the "reserve_mem" command line parameter. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2024-08-15rcu: Summarize RCU CPU stall warnings during CSD-lock stallsPaul E. McKenney
During CSD-lock stalls, the additional information output by RCU CPU stall warnings is usually redundant, flooding the console for not good reason. However, this has been the way things work for a few years. This commit therefore adds an rcutree.csd_lock_suppress_rcu_stall kernel boot parameter that causes RCU CPU stall warnings to be abbreviated to a single line when there is at least one CPU that has been stuck waiting for CSD lock for more than five seconds. To make this abbreviated message happen with decent probability: tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 8 \ --configs "2*TREE01" --kconfig "CONFIG_CSD_LOCK_WAIT_DEBUG=y" \ --bootargs "csdlock_debug=1 rcutorture.stall_cpu=200 \ rcutorture.stall_cpu_holdoff=120 rcutorture.stall_cpu_irqsoff=1 \ rcutree.csd_lock_suppress_rcu_stall=1 \ rcupdate.rcu_exp_cpu_stall_timeout=5000" --trust-make [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14Merge tag 'next-media-rkisp1-20240814' of ↵Hans Verkuil
git://git.kernel.org/pub/scm/linux/kernel/git/pinchartl/linux.git Extensible parameters support for the rkisp1 driver. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-08-14doc: Remove RCU Tasks Rude asynchronous APIsPaul E. McKenney
The call_rcu_tasks_rude() and rcu_barrier_tasks_rude() APIs are no longer. This commit therefore removes them from the documentation. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14rcutorture: Add a stall_cpu_repeat module parameterPaul E. McKenney
This commit adds an stall_cpu_repeat kernel, which is also the rcutorture.stall_cpu_repeat boot parameter, to test repeated CPU stalls. Note that only the first stall will pay attention to the stall_cpu_irqsoff module parameter. For the second and subsequent stalls, interrupts will be enabled. This is helpful when testing the interaction between RCU CPU stall warnings and CSD-lock stall warnings. Reported-by: Rik van Riel <riel@surriel.com> Signed-off-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-08-14media: admin-guide: mgb4: Outputs DV timings documentation updateMartin Tůma
Properly document the function of the mgb4 output "frame_rate" sysfs parameter and update the default DV timings values according to the latest code changes. Signed-off-by: Martin Tůma <martin.tuma@digiteqautomotive.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-08-13Documentation: dm-crypt.rst warning + error fixDaniel Yang
While building kernel documention using make htmldocs command, I was getting unexpected indentation error. Single description was given for two module parameters with wrong indentation. So, I corrected the indentation of both parameters and the description. Signed-off-by: Shibu kumar <shibukumar.bit@gmail.com> Signed-off-by: Daniel Yang <danielyangkang@gmail.com> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: 0d815e3400e6 ("dm-crypt: limit the size of encryption requests")
2024-08-12media: uapi: videodev2: Add V4L2_META_FMT_RK_ISP1_EXT_PARAMSJacopo Mondi
The rkisp1 driver stores ISP configuration parameters in the fixed rkisp1_params_cfg structure. As the members of the structure are part of the userspace API, the structure layout is immutable and cannot be extended further. Introducing new parameters or modifying the existing ones would change the buffer layout and cause breakages in existing applications. The allow for future extensions to the ISP parameters, introduce a new extensible parameters format, with a new format 4CC. Document usage of the new format in the rkisp1 admin guide. Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com> Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Reviewed-by: Paul Elder <paul.elder@ideasonboard.com> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com> Acked-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
2024-08-09Documentation: media: vivid.rst: update TODO listHans Verkuil
The vivid driver supports media controller support for quite a long time now, so drop that from the list. Since commit 4c4dacb052d4 ("media: vivid: loopback based on 'Connected To' controls") making EDID changes causes correct signaling to happen, but what is still missing is the 100 ms delay required before signaling that there is an HPD. Modify this TODO item accordingly. Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
2024-08-08pstore/ramoops: Fix typo as there is no "reserver"Steven Rostedt
For some reason my finger always hits the 'r' after typing "reserve". Fix the typo in the Documentation example. Fixes: d9d814eebb1ae ("pstore/ramoops: Add ramoops.mem_name= command line option") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://lore.kernel.org/r/20240807170029.3c1ff651@gandalf.local.home Signed-off-by: Kees Cook <kees@kernel.org>
2024-08-08smb3: fix setting SecurityFlags when encryption is requiredSteve French
Setting encryption as required in security flags was broken. For example (to require all mounts to be encrypted by setting): "echo 0x400c5 > /proc/fs/cifs/SecurityFlags" Would return "Invalid argument" and log "Unsupported security flags" This patch fixes that (e.g. allowing overriding the default for SecurityFlags 0x00c5, including 0x40000 to require seal, ie SMB3.1.1 encryption) so now that works and forces encryption on subsequent mounts. Acked-by: Bharath SM <bharathsm@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2024-08-07docs: dm-crypt: Removal of unexpected indentation errorShibu Kumar
Add the required indentation to fix this docs build error: Documentation/admin-guide/device-mapper/dm-crypt.rst:167: ERROR: Unexpected indentation. Also split the documentation for read and write into separate blocks. Signed-off-by: Shibu kumar shibukumar.bit@gmail.com [jc: rewrote changelog] Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240803183306.32425-1-shibukumar.bit@gmail.com
2024-08-07Documentation: kernel-parameters: add workqueue.panic_on_stallSangmoon Kim
The workqueue.panic_on_stall kernel parameter was added in commit 073107b39e55 ("workqueue: add cmdline parameter workqueue.panic_on_stall") but not listed in the kernel-parameters doc. Add it there. Signed-off-by: Sangmoon Kim <sangmoon.kim@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2024-08-04profiling: remove profile=sleep supportTetsuo Handa
The kernel sleep profile is no longer working due to a recursive locking bug introduced by commit 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked") Booting with the 'profile=sleep' kernel command line option added or executing # echo -n sleep > /sys/kernel/profiling after boot causes the system to lock up. Lockdep reports kthreadd/3 is trying to acquire lock: ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: get_wchan+0x32/0x70 but task is already holding lock: ffff93ac82e08d58 (&p->pi_lock){....}-{2:2}, at: try_to_wake_up+0x53/0x370 with the call trace being lock_acquire+0xc8/0x2f0 get_wchan+0x32/0x70 __update_stats_enqueue_sleeper+0x151/0x430 enqueue_entity+0x4b0/0x520 enqueue_task_fair+0x92/0x6b0 ttwu_do_activate+0x73/0x140 try_to_wake_up+0x213/0x370 swake_up_locked+0x20/0x50 complete+0x2f/0x40 kthread+0xfb/0x180 However, since nobody noticed this regression for more than two years, let's remove 'profile=sleep' support based on the assumption that nobody needs this functionality. Fixes: 42a20f86dc19 ("sched: Add wrapper for get_wchan() to keep task blocked") Cc: stable@vger.kernel.org # v5.16+ Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2024-07-31cgroup: Show # of subsystem CSSes in cgroup.statWaiman Long
Cgroup subsystem state (CSS) is an abstraction in the cgroup layer to help manage different structures in various cgroup subsystems by being an embedded element inside a larger structure like cpuset or mem_cgroup. The /proc/cgroups file shows the number of cgroups for each of the subsystems. With cgroup v1, the number of CSSes is the same as the number of cgroups. That is not the case anymore with cgroup v2. The /proc/cgroups file cannot show the actual number of CSSes for the subsystems that are bound to cgroup v2. So if a v2 cgroup subsystem is leaking cgroups (usually memory cgroup), we can't tell by looking at /proc/cgroups which cgroup subsystems may be responsible. As cgroup v2 had deprecated the use of /proc/cgroups, the hierarchical cgroup.stat file is now being extended to show the number of live and dying CSSes associated with all the non-inhibited cgroup subsystems that have been bound to cgroup v2. The number includes CSSes in the current cgroup as well as in all the descendants underneath it. This will help us pinpoint which subsystems are responsible for the increasing number of dying (nr_dying_descendants) cgroups. The CSSes dying counts are stored in the cgroup structure itself instead of inside the CSS as suggested by Johannes. This will allow us to accurately track dying counts of cgroup subsystems that have recently been disabled in a cgroup. It is now possible that a zero subsystem number is coupled with a non-zero dying subsystem number. The cgroup-v2.rst file is updated to discuss this new behavior. With this patch applied, a sample output from root cgroup.stat file was shown below. nr_descendants 56 nr_subsys_cpuset 1 nr_subsys_cpu 43 nr_subsys_io 43 nr_subsys_memory 56 nr_subsys_perf_event 57 nr_subsys_hugetlb 1 nr_subsys_pids 56 nr_subsys_rdma 1 nr_subsys_misc 1 nr_dying_descendants 30 nr_dying_subsys_cpuset 0 nr_dying_subsys_cpu 0 nr_dying_subsys_io 0 nr_dying_subsys_memory 30 nr_dying_subsys_perf_event 0 nr_dying_subsys_hugetlb 0 nr_dying_subsys_pids 0 nr_dying_subsys_rdma 0 nr_dying_subsys_misc 0 Another sample output from system.slice/cgroup.stat was: nr_descendants 34 nr_subsys_cpuset 0 nr_subsys_cpu 32 nr_subsys_io 32 nr_subsys_memory 34 nr_subsys_perf_event 35 nr_subsys_hugetlb 0 nr_subsys_pids 34 nr_subsys_rdma 0 nr_subsys_misc 0 nr_dying_descendants 30 nr_dying_subsys_cpuset 0 nr_dying_subsys_cpu 0 nr_dying_subsys_io 0 nr_dying_subsys_memory 30 nr_dying_subsys_perf_event 0 nr_dying_subsys_hugetlb 0 nr_dying_subsys_pids 0 nr_dying_subsys_rdma 0 nr_dying_subsys_misc 0 Note that 'debug' controller wasn't used to provide this information because the controller is not recommended in productions kernels, also many of them won't enable CONFIG_CGROUP_DEBUG by default. Similar information could be retrieved with debuggers like drgn but that's also not always available (e.g. lockdown) and the additional cost of runtime tracking here is deemed marginal. tj: Added Michal's paragraphs on why this is not added the debug controller to the commit message. Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com> Cc: Michal Koutný <mkoutny@suse.com> Link: http://lkml.kernel.org/r/20240715150034.2583772-1-longman@redhat.com Signed-off-by: Tejun Heo <tj@kernel.org>
2024-07-30Documentation: Add detailed explanation for 'N' taint flagBenjamin Poirier
Every taint flag has an entry in the "More detailed explanation" section except for the 'N' flag. That omission was probably just an oversight so add an entry for that flag. Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: David Gow <davidgow@google.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20240717203521.514348-1-bpoirier@nvidia.com
2024-07-26Merge tag 's390-6.11-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Vasily Gorbik: - Fix KMSAN build breakage caused by the conflict between s390 and mm-stable trees - Add KMSAN page markers for ptdump - Add runtime constant support - Fix __pa/__va for modules under non-GPL licenses by exporting necessary vm_layout struct with EXPORT_SYMBOL to prevent linkage problems - Fix an endless loop in the CF_DIAG event stop in the CPU Measurement Counter Facility code when the counter set size is zero - Remove the PROTECTED_VIRTUALIZATION_GUEST config option and enable its functionality by default - Support allocation of multiple MSI interrupts per device and improve logging of architecture-specific limitations - Add support for lowcore relocation as a debugging feature to catch all null ptr dereferences in the kernel address space, improving detection beyond the current implementation's limited write access protection - Clean up and rework CPU alternatives to allow for callbacks and early patching for the lowcore relocation * tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits) s390: Remove protvirt and kvm config guards for uv code s390/boot: Add cmdline option to relocate lowcore s390/kdump: Make kdump ready for lowcore relocation s390/entry: Make system_call() ready for lowcore relocation s390/entry: Make ret_from_fork() ready for lowcore relocation s390/entry: Make __switch_to() ready for lowcore relocation s390/entry: Make restart_int_handler() ready for lowcore relocation s390/entry: Make mchk_int_handler() ready for lowcore relocation s390/entry: Make int handlers ready for lowcore relocation s390/entry: Make pgm_check_handler() ready for lowcore relocation s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro s390/entry: Add base register to SIEEXIT macro s390/entry: Add base register to MBEAR macro s390/entry: Make __sie64a() ready for lowcore relocation s390/head64: Make startup code ready for lowcore relocation s390: Add infrastructure to patch lowcore accesses s390/atomic_ops: Disable flag outputs constraint for GCC versions below 14.2.0 s390/entry: Move SIE indicator flag to thread info s390/nmi: Simplify ptregs setup s390/alternatives: Remove alternative facility list ...
2024-07-25Merge tag 'parisc-for-6.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: "The gettimeofday() and clock_gettime() syscalls are now available as vDSO functions, and Dave added a patch which allows to use NVMe cards in the PCI slots as fast and easy alternative to SCSI discs. Summary: - add gettimeofday() and clock_gettime() vDSO functions - enable PCI_MSI_ARCH_FALLBACKS to allow PCI to PCIe bridge adaptor with PCIe NVME card to function in parisc machines - allow users to reduce kernel unaligned runtime warnings - minor code cleanups" * tag 'parisc-for-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Add support for CONFIG_SYSCTL_ARCH_UNALIGN_NO_WARN parisc: Use max() to calculate parisc_tlb_flush_threshold parisc: Fix warning at drivers/pci/msi/msi.h:121 parisc: Add 64-bit gettimeofday() and clock_gettime() vDSO functions parisc: Add 32-bit gettimeofday() and clock_gettime() vDSO functions parisc: Clean up unistd.h file