summaryrefslogtreecommitdiff
path: root/Documentation/admin-guide
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/admin-guide')
-rw-r--r--Documentation/admin-guide/cgroup-v1/memory.rst2
-rw-r--r--Documentation/admin-guide/cgroup-v2.rst31
-rw-r--r--Documentation/admin-guide/cifs/changes.rst4
-rw-r--r--Documentation/admin-guide/cifs/usage.rst8
-rw-r--r--Documentation/admin-guide/kernel-parameters.txt267
-rw-r--r--Documentation/admin-guide/mm/damon/start.rst10
-rw-r--r--Documentation/admin-guide/mm/damon/usage.rst146
-rw-r--r--Documentation/admin-guide/perf/hisi-pmu.rst40
-rw-r--r--Documentation/admin-guide/quickly-build-trimmed-linux.rst49
-rw-r--r--Documentation/admin-guide/sysctl/kernel.rst2
-rw-r--r--Documentation/admin-guide/sysctl/net.rst4
11 files changed, 306 insertions, 257 deletions
diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst
index 47d1d7d932a8..fabaad3fd9c2 100644
--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -297,7 +297,7 @@ Lock order is as follows::
Page lock (PG_locked bit of page->flags)
mm->page_table_lock or split pte_lock
- lock_page_memcg (memcg->move_lock)
+ folio_memcg_lock (memcg->move_lock)
mapping->i_pages lock
lruvec->lru_lock.
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 7544ce00e0cb..4ef890191196 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1213,23 +1213,25 @@ PAGE_SIZE multiple when read back.
A read-write single value file which exists on non-root
cgroups. The default is "max".
- Memory usage throttle limit. This is the main mechanism to
- control memory usage of a cgroup. If a cgroup's usage goes
+ Memory usage throttle limit. If a cgroup's usage goes
over the high boundary, the processes of the cgroup are
throttled and put under heavy reclaim pressure.
Going over the high limit never invokes the OOM killer and
- under extreme conditions the limit may be breached.
+ under extreme conditions the limit may be breached. The high
+ limit should be used in scenarios where an external process
+ monitors the limited cgroup to alleviate heavy reclaim
+ pressure.
memory.max
A read-write single value file which exists on non-root
cgroups. The default is "max".
- Memory usage hard limit. This is the final protection
- mechanism. If a cgroup's memory usage reaches this limit and
- can't be reduced, the OOM killer is invoked in the cgroup.
- Under certain circumstances, the usage may go over the limit
- temporarily.
+ Memory usage hard limit. This is the main mechanism to limit
+ memory usage of a cgroup. If a cgroup's memory usage reaches
+ this limit and can't be reduced, the OOM killer is invoked in
+ the cgroup. Under certain circumstances, the usage may go
+ over the limit temporarily.
In default configuration regular 0-order allocations always
succeed unless OOM killer chooses current task as a victim.
@@ -1238,10 +1240,6 @@ PAGE_SIZE multiple when read back.
Caller could retry them differently, return into userspace
as -ENOMEM or silently ignore in cases like disk readahead.
- This is the ultimate protection mechanism. As long as the
- high limit is used and monitored properly, this limit's
- utility is limited to providing the final safety net.
-
memory.reclaim
A write-only nested-keyed file which exists for all cgroups.
@@ -1582,6 +1580,13 @@ PAGE_SIZE multiple when read back.
Healthy workloads are not expected to reach this limit.
+ memory.swap.peak
+ A read-only single value file which exists on non-root
+ cgroups.
+
+ The max swap usage recorded for the cgroup and its
+ descendants since the creation of the cgroup.
+
memory.swap.max
A read-write single value file which exists on non-root
cgroups. The default is "max".
@@ -2445,7 +2450,7 @@ Miscellaneous controller provides 3 interface files. If two misc resources (res_
res_b 10
misc.current
- A read-only flat-keyed file shown in the non-root cgroups. It shows
+ A read-only flat-keyed file shown in the all cgroups. It shows
the current usage of the resources in the cgroup and its children.::
$ cat misc.current
diff --git a/Documentation/admin-guide/cifs/changes.rst b/Documentation/admin-guide/cifs/changes.rst
index 3147bbae9c43..8c42c4de510b 100644
--- a/Documentation/admin-guide/cifs/changes.rst
+++ b/Documentation/admin-guide/cifs/changes.rst
@@ -5,5 +5,5 @@ Changes
See https://wiki.samba.org/index.php/LinuxCIFSKernel for summary
information about fixes/improvements to CIFS/SMB2/SMB3 support (changes
to cifs.ko module) by kernel version (and cifs internal module version).
-This may be easier to read than parsing the output of "git log fs/cifs"
-by release.
+This may be easier to read than parsing the output of
+"git log fs/smb/client" by release.
diff --git a/Documentation/admin-guide/cifs/usage.rst b/Documentation/admin-guide/cifs/usage.rst
index 2e151cd8c2e4..5f936b4b6018 100644
--- a/Documentation/admin-guide/cifs/usage.rst
+++ b/Documentation/admin-guide/cifs/usage.rst
@@ -45,7 +45,7 @@ Installation instructions
If you have built the CIFS vfs as module (successfully) simply
type ``make modules_install`` (or if you prefer, manually copy the file to
-the modules directory e.g. /lib/modules/2.4.10-4GB/kernel/fs/cifs/cifs.ko).
+the modules directory e.g. /lib/modules/6.3.0-060300-generic/kernel/fs/smb/client/cifs.ko).
If you have built the CIFS vfs into the kernel itself, follow the instructions
for your distribution on how to install a new kernel (usually you
@@ -66,15 +66,15 @@ If cifs is built as a module, then the size and number of network buffers
and maximum number of simultaneous requests to one server can be configured.
Changing these from their defaults is not recommended. By executing modinfo::
- modinfo kernel/fs/cifs/cifs.ko
+ modinfo <path to cifs.ko>
-on kernel/fs/cifs/cifs.ko the list of configuration changes that can be made
+on kernel/fs/smb/client/cifs.ko the list of configuration changes that can be made
at module initialization time (by running insmod cifs.ko) can be seen.
Recommendations
===============
-To improve security the SMB2.1 dialect or later (usually will get SMB3) is now
+To improve security the SMB2.1 dialect or later (usually will get SMB3.1.1) is now
the new default. To use old dialects (e.g. to mount Windows XP) use "vers=1.0"
on mount (or vers=2.0 for Windows Vista). Note that the CIFS (vers=1.0) is
much older and less secure than the default dialect SMB3 which includes
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 911c54829c7c..85fb0fa5d091 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1,17 +1,17 @@
- acpi= [HW,ACPI,X86,ARM64]
+ acpi= [HW,ACPI,X86,ARM64,RISCV64]
Advanced Configuration and Power Interface
Format: { force | on | off | strict | noirq | rsdt |
copy_dsdt }
force -- enable ACPI if default was off
- on -- enable ACPI but allow fallback to DT [arm64]
+ on -- enable ACPI but allow fallback to DT [arm64,riscv64]
off -- disable ACPI if default was on
noirq -- do not use ACPI for IRQ routing
strict -- Be less tolerant of platforms that are not
strictly ACPI specification compliant.
rsdt -- prefer RSDT over (default) XSDT
copy_dsdt -- copy DSDT to memory
- For ARM64, ONLY "acpi=off", "acpi=on" or "acpi=force"
- are available
+ For ARM64 and RISCV64, ONLY "acpi=off", "acpi=on" or
+ "acpi=force" are available
See also Documentation/power/runtime_pm.rst, pci=noacpi
@@ -304,7 +304,7 @@
EL0 is indicated by /sys/devices/system/cpu/aarch32_el0
and hot-unplug operations may be restricted.
- See Documentation/arm64/asymmetric-32bit.rst for more
+ See Documentation/arch/arm64/asymmetric-32bit.rst for more
information.
amd_iommu= [HW,X86-64]
@@ -323,6 +323,7 @@
option with care.
pgtbl_v1 - Use v1 page table for DMA-API (Default).
pgtbl_v2 - Use v2 page table for DMA-API.
+ irtcachedis - Disable Interrupt Remapping Table (IRT) caching.
amd_iommu_dump= [HW,X86-64]
Enable AMD IOMMU driver option to dump the ACPI table
@@ -429,6 +430,9 @@
arm64.nosme [ARM64] Unconditionally disable Scalable Matrix
Extension support
+ arm64.nomops [ARM64] Unconditionally disable Memory Copy and Memory
+ Set instructions support
+
ataflop= [HW,M68k]
atarimouse= [HW,MOUSE] Atari Mouse
@@ -818,20 +822,6 @@
Format:
<first_slot>,<last_slot>,<port>,<enum_bit>[,<debug>]
- cpu0_hotplug [X86] Turn on CPU0 hotplug feature when
- CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
- Some features depend on CPU0. Known dependencies are:
- 1. Resume from suspend/hibernate depends on CPU0.
- Suspend/hibernate will fail if CPU0 is offline and you
- need to online CPU0 before suspend/hibernate.
- 2. PIC interrupts also depend on CPU0. CPU0 can't be
- removed if a PIC interrupt is detected.
- It's said poweroff/reboot may depend on CPU0 on some
- machines although I haven't seen such issues so far
- after CPU0 is offline on a few tested machines.
- If the dependencies are under your control, you can
- turn on cpu0_hotplug.
-
cpuidle.off=1 [CPU_IDLE]
disable the cpuidle sub-system
@@ -852,6 +842,12 @@
on every CPU online, such as boot, and resume from suspend.
Default: 10000
+ cpuhp.parallel=
+ [SMP] Enable/disable parallel bringup of secondary CPUs
+ Format: <bool>
+ Default is enabled if CONFIG_HOTPLUG_PARALLEL=y. Otherwise
+ the parameter has no effect.
+
crash_kexec_post_notifiers
Run kdump after running panic-notifiers and dumping
kmsg. This only for the users who doubt kdump always
@@ -2117,6 +2113,16 @@
disable
Do not enable intel_pstate as the default
scaling driver for the supported processors
+ active
+ Use intel_pstate driver to bypass the scaling
+ governors layer of cpufreq and provides it own
+ algorithms for p-state selection. There are two
+ P-state selection algorithms provided by
+ intel_pstate in the active mode: powersave and
+ performance. The way they both operate depends
+ on whether or not the hardware managed P-states
+ (HWP) feature has been enabled in the processor
+ and possibly on the processor model.
passive
Use intel_pstate as a scaling driver, but configure it
to work with generic cpufreq governors (instead of
@@ -2551,12 +2557,13 @@
If the value is 0 (the default), KVM will pick a period based
on the ratio, such that a page is zapped after 1 hour on average.
- kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM.
- Default is 1 (enabled)
+ kvm-amd.nested= [KVM,AMD] Control nested virtualization feature in
+ KVM/SVM. Default is 1 (enabled).
- kvm-amd.npt= [KVM,AMD] Disable nested paging (virtualized MMU)
- for all guests.
- Default is 1 (enabled) if in 64-bit or 32-bit PAE mode.
+ kvm-amd.npt= [KVM,AMD] Control KVM's use of Nested Page Tables,
+ a.k.a. Two-Dimensional Page Tables. Default is 1
+ (enabled). Disable by KVM if hardware lacks support
+ for NPT.
kvm-arm.mode=
[KVM,ARM] Select one of KVM/arm64's modes of operation.
@@ -2602,30 +2609,33 @@
Format: <integer>
Default: 5
- kvm-intel.ept= [KVM,Intel] Disable extended page tables
- (virtualized MMU) support on capable Intel chips.
- Default is 1 (enabled)
+ kvm-intel.ept= [KVM,Intel] Control KVM's use of Extended Page Tables,
+ a.k.a. Two-Dimensional Page Tables. Default is 1
+ (enabled). Disable by KVM if hardware lacks support
+ for EPT.
kvm-intel.emulate_invalid_guest_state=
- [KVM,Intel] Disable emulation of invalid guest state.
- Ignored if kvm-intel.enable_unrestricted_guest=1, as
- guest state is never invalid for unrestricted guests.
- This param doesn't apply to nested guests (L2), as KVM
- never emulates invalid L2 guest state.
- Default is 1 (enabled)
+ [KVM,Intel] Control whether to emulate invalid guest
+ state. Ignored if kvm-intel.enable_unrestricted_guest=1,
+ as guest state is never invalid for unrestricted
+ guests. This param doesn't apply to nested guests (L2),
+ as KVM never emulates invalid L2 guest state.
+ Default is 1 (enabled).
kvm-intel.flexpriority=
- [KVM,Intel] Disable FlexPriority feature (TPR shadow).
- Default is 1 (enabled)
+ [KVM,Intel] Control KVM's use of FlexPriority feature
+ (TPR shadow). Default is 1 (enabled). Disalbe by KVM if
+ hardware lacks support for it.
kvm-intel.nested=
- [KVM,Intel] Enable VMX nesting (nVMX).
- Default is 0 (disabled)
+ [KVM,Intel] Control nested virtualization feature in
+ KVM/VMX. Default is 1 (enabled).
kvm-intel.unrestricted_guest=
- [KVM,Intel] Disable unrestricted guest feature
- (virtualized real and unpaged mode) on capable
- Intel chips. Default is 1 (enabled)
+ [KVM,Intel] Control KVM's use of unrestricted guest
+ feature (virtualized real and unpaged mode). Default
+ is 1 (enabled). Disable by KVM if EPT is disabled or
+ hardware lacks support for it.
kvm-intel.vmentry_l1d_flush=[KVM,Intel] Mitigation for L1 Terminal Fault
CVE-2018-3620.
@@ -2639,9 +2649,10 @@
Default is cond (do L1 cache flush in specific instances)
- kvm-intel.vpid= [KVM,Intel] Disable Virtual Processor Identification
- feature (tagged TLBs) on capable Intel chips.
- Default is 1 (enabled)
+ kvm-intel.vpid= [KVM,Intel] Control KVM's use of Virtual Processor
+ Identification feature (tagged TLBs). Default is 1
+ (enabled). Disable by KVM if hardware lacks support
+ for it.
l1d_flush= [X86,INTEL]
Control mitigation for L1D based snooping vulnerability.
@@ -3423,6 +3434,10 @@
[HW] Make the MicroTouch USB driver use raw coordinates
('y', default) or cooked coordinates ('n')
+ mtrr=debug [X86]
+ Enable printing debug information related to MTRR
+ registers at boot time.
+
mtrr_chunk_size=nn[KMG] [X86]
used for mtrr cleanup. It is largest continuous chunk
that could hold holes aka. UC entries.
@@ -3702,8 +3717,8 @@
nohibernate [HIBERNATION] Disable hibernation and resume.
- nohlt [ARM,ARM64,MICROBLAZE,SH] Forces the kernel to busy wait
- in do_idle() and not use the arch_cpu_idle()
+ nohlt [ARM,ARM64,MICROBLAZE,MIPS,SH] Forces the kernel to
+ busy wait in do_idle() and not use the arch_cpu_idle()
implementation; requires CONFIG_GENERIC_IDLE_POLL_SETUP
to be effective. This is useful on platforms where the
sleep(SH) or wfi(ARM,ARM64) instructions do not work
@@ -3838,7 +3853,7 @@
nosmp [SMP] Tells an SMP kernel to act as a UP kernel,
and disable the IO APIC. legacy for "maxcpus=0".
- nosmt [KNL,S390] Disable symmetric multithreading (SMT).
+ nosmt [KNL,MIPS,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
[KNL,X86] Disable symmetric multithreading (SMT).
@@ -4736,43 +4751,6 @@
the propagation of recent CPU-hotplug changes up
the rcu_node combining tree.
- rcutree.use_softirq= [KNL]
- If set to zero, move all RCU_SOFTIRQ processing to
- per-CPU rcuc kthreads. Defaults to a non-zero
- value, meaning that RCU_SOFTIRQ is used by default.
- Specify rcutree.use_softirq=0 to use rcuc kthreads.
-
- But note that CONFIG_PREEMPT_RT=y kernels disable
- this kernel boot parameter, forcibly setting it
- to zero.
-
- rcutree.rcu_fanout_exact= [KNL]
- Disable autobalancing of the rcu_node combining
- tree. This is used by rcutorture, and might
- possibly be useful for architectures having high
- cache-to-cache transfer latencies.
-
- rcutree.rcu_fanout_leaf= [KNL]
- Change the number of CPUs assigned to each
- leaf rcu_node structure. Useful for very
- large systems, which will choose the value 64,
- and for NUMA systems with large remote-access
- latencies, which will choose a value aligned
- with the appropriate hardware boundaries.
-
- rcutree.rcu_min_cached_objs= [KNL]
- Minimum number of objects which are cached and
- maintained per one CPU. Object size is equal
- to PAGE_SIZE. The cache allows to reduce the
- pressure to page allocator, also it makes the
- whole algorithm to behave better in low memory
- condition.
-
- rcutree.rcu_delay_page_cache_fill_msec= [KNL]
- Set the page-cache refill delay (in milliseconds)
- in response to low-memory conditions. The range
- of permitted values is in the range 0:100000.
-
rcutree.jiffies_till_first_fqs= [KNL]
Set delay from grace-period initialization to
first attempt to force quiescent states.
@@ -4811,21 +4789,6 @@
When RCU_NOCB_CPU is set, also adjust the
priority of NOCB callback kthreads.
- rcutree.rcu_divisor= [KNL]
- Set the shift-right count to use to compute
- the callback-invocation batch limit bl from
- the number of callbacks queued on this CPU.
- The result will be bounded below by the value of
- the rcutree.blimit kernel parameter. Every bl
- callbacks, the softirq handler will exit in
- order to allow the CPU to do other work.
-
- Please note that this callback-invocation batch
- limit applies only to non-offloaded callback
- invocation. Offloaded callbacks are instead
- invoked in the context of an rcuoc kthread, which
- scheduler will preempt as it does any other task.
-
rcutree.nocb_nobypass_lim_per_jiffy= [KNL]
On callback-offloaded (rcu_nocbs) CPUs,
RCU reduces the lock contention that would
@@ -4839,14 +4802,6 @@
the ->nocb_bypass queue. The definition of "too
many" is supplied by this kernel boot parameter.
- rcutree.rcu_nocb_gp_stride= [KNL]
- Set the number of NOCB callback kthreads in
- each group, which defaults to the square root
- of the number of CPUs. Larger numbers reduce
- the wakeup overhead on the global grace-period
- kthread, but increases that same overhead on
- each group's NOCB grace-period kthread.
-
rcutree.qhimark= [KNL]
Set threshold of queued RCU callbacks beyond which
batch limiting is disabled.
@@ -4864,6 +4819,56 @@
on rcutree.qhimark at boot time and to zero to
disable more aggressive help enlistment.
+ rcutree.rcu_delay_page_cache_fill_msec= [KNL]
+ Set the page-cache refill delay (in milliseconds)
+ in response to low-memory conditions. The range
+ of permitted values is in the range 0:100000.
+
+ rcutree.rcu_divisor= [KNL]
+ Set the shift-right count to use to compute
+ the callback-invocation batch limit bl from
+ the number of callbacks queued on this CPU.
+ The result will be bounded below by the value of
+ the rcutree.blimit kernel parameter. Every bl
+ callbacks, the softirq handler will exit in
+ order to allow the CPU to do other work.
+
+ Please note that this callback-invocation batch
+ limit applies only to non-offloaded callback
+ invocation. Offloaded callbacks are instead
+ invoked in the context of an rcuoc kthread, which
+ scheduler will preempt as it does any other task.
+
+ rcutree.rcu_fanout_exact= [KNL]
+ Disable autobalancing of the rcu_node combining
+ tree. This is used by rcutorture, and might
+ possibly be useful for architectures having high
+ cache-to-cache transfer latencies.
+
+ rcutree.rcu_fanout_leaf= [KNL]
+ Change the number of CPUs assigned to each
+ leaf rcu_node structure. Useful for very
+ large systems, which will choose the value 64,
+ and for NUMA systems with large remote-access
+ latencies, which will choose a value aligned
+ with the appropriate hardware boundaries.
+
+ rcutree.rcu_min_cached_objs= [KNL]
+ Minimum number of objects which are cached and
+ maintained per one CPU. Object size is equal
+ to PAGE_SIZE. The cache allows to reduce the
+ pressure to page allocator, also it makes the
+ whole algorithm to behave better in low memory
+ condition.
+
+ rcutree.rcu_nocb_gp_stride= [KNL]
+ Set the number of NOCB callback kthreads in
+ each group, which defaults to the square root
+ of the number of CPUs. Larger numbers reduce
+ the wakeup overhead on the global grace-period
+ kthread, but increases that same overhead on
+ each group's NOCB grace-period kthread.
+
rcutree.rcu_kick_kthreads= [KNL]
Cause the grace-period kthread to get an extra
wake_up() if it sleeps three times longer than
@@ -4871,6 +4876,13 @@
This wake_up() will be accompanied by a
WARN_ONCE() splat and an ftrace_dump().
+ rcutree.rcu_resched_ns= [KNL]
+ Limit the time spend invoking a batch of RCU
+ callbacks to the specified number of nanoseconds.
+ By default, this limit is checked only once
+ every 32 callbacks in order to limit the pain
+ inflicted by local_clock() overhead.
+
rcutree.rcu_unlock_delay= [KNL]
In CONFIG_RCU_STRICT_GRACE_PERIOD=y kernels,
this specifies an rcu_read_unlock()-time delay
@@ -4885,6 +4897,16 @@
rcu_node tree with an eye towards determining
why a new grace period has not yet started.
+ rcutree.use_softirq= [KNL]
+ If set to zero, move all RCU_SOFTIRQ processing to
+ per-CPU rcuc kthreads. Defaults to a non-zero
+ value, meaning that RCU_SOFTIRQ is used by default.
+ Specify rcutree.use_softirq=0 to use rcuc kthreads.
+
+ But note that CONFIG_PREEMPT_RT=y kernels disable
+ this kernel boot parameter, forcibly setting it
+ to zero.
+
rcuscale.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
@@ -5087,8 +5109,17 @@
rcutorture.stall_cpu_block= [KNL]
Sleep while stalling if set. This will result
- in warnings from preemptible RCU in addition
- to any other stall-related activity.
+ in warnings from preemptible RCU in addition to
+ any other stall-related activity. Note that
+ in kernels built with CONFIG_PREEMPTION=n and
+ CONFIG_PREEMPT_COUNT=y, this parameter will
+ cause the CPU to pass through a quiescent state.
+ Given CONFIG_PREEMPTION=n, this will suppress
+ RCU CPU stall warnings, but will instead result
+ in scheduling-while-atomic splats.
+
+ Use of this module parameter results in splats.
+
rcutorture.stall_cpu_holdoff= [KNL]
Time to wait (s) after boot before inducing stall.
@@ -5740,7 +5771,7 @@
1: Fast pin select (default)
2: ATC IRMode
- smt= [KNL,S390] Set the maximum number of threads (logical
+ smt= [KNL,MIPS,S390] Set the maximum number of threads (logical
CPUs) to use per physical CPU on systems capable of
symmetric multithreading (SMT). Will be capped to the
actual hardware limit.
@@ -6568,6 +6599,12 @@
unknown_nmi_panic
[X86] Cause panic on unknown NMI.
+ unwind_debug [X86-64]
+ Enable unwinder debug output. This can be
+ useful for debugging certain unwinder error
+ conditions, including corrupt stacks and
+ bad/missing unwinder metadata.
+
usbcore.authorized_default=
[USB] Default USB device authorization:
(default -1 = authorized except for wireless USB,
@@ -6936,6 +6973,18 @@
it can be updated at runtime by writing to the
corresponding sysfs file.
+ workqueue.cpu_intensive_thresh_us=
+ Per-cpu work items which run for longer than this
+ threshold are automatically considered CPU intensive
+ and excluded from concurrency management to prevent
+ them from noticeably delaying other per-cpu work
+ items. Default is 10000 (10ms).
+
+ If CONFIG_WQ_CPU_INTENSIVE_REPORT is set, the kernel
+ will report the work functions which violate this
+ threshold repeatedly. They are likely good
+ candidates for using WQ_UNBOUND workqueues instead.
+
workqueue.disable_numa
By default, all work items queued to unbound
workqueues are affine to the NUMA nodes they're
diff --git a/Documentation/admin-guide/mm/damon/start.rst b/Documentation/admin-guide/mm/damon/start.rst
index 9f88afc734da..7aa0071ff1c3 100644
--- a/Documentation/admin-guide/mm/damon/start.rst
+++ b/Documentation/admin-guide/mm/damon/start.rst
@@ -119,9 +119,9 @@ set size has chronologically changed.::
Data Access Pattern Aware Memory Management
===========================================
-Below three commands make every memory region of size >=4K that doesn't
-accessed for >=60 seconds in your workload to be swapped out. ::
+Below command makes every memory region of size >=4K that has not accessed for
+>=60 seconds in your workload to be swapped out. ::
- $ echo "#min-size max-size min-acc max-acc min-age max-age action" > test_scheme
- $ echo "4K max 0 0 60s max pageout" >> test_scheme
- $ damo schemes -c test_scheme <pid of your workload>
+ $ sudo damo schemes --damos_access_rate 0 0 --damos_sz_region 4K max \
+ --damos_age 60s max --damos_action pageout \
+ <pid of your workload>
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index 9b823fec974d..2d495fa85a0e 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -10,9 +10,8 @@ DAMON provides below interfaces for different users.
`This <https://github.com/awslabs/damo>`_ is for privileged people such as
system administrators who want a just-working human-friendly interface.
Using this, users can use the DAMON’s major features in a human-friendly way.
- It may not be highly tuned for special cases, though. It supports both
- virtual and physical address spaces monitoring. For more detail, please
- refer to its `usage document
+ It may not be highly tuned for special cases, though. For more detail,
+ please refer to its `usage document
<https://github.com/awslabs/damo/blob/next/USAGE.md>`_.
- *sysfs interface.*
:ref:`This <sysfs_interface>` is for privileged user space programmers who
@@ -20,11 +19,7 @@ DAMON provides below interfaces for different users.
features by reading from and writing to special sysfs files. Therefore,
you can write and use your personalized DAMON sysfs wrapper programs that
reads/writes the sysfs files instead of you. The `DAMON user space tool
- <https://github.com/awslabs/damo>`_ is one example of such programs. It
- supports both virtual and physical address spaces monitoring. Note that this
- interface provides only simple :ref:`statistics <damos_stats>` for the
- monitoring results. For detailed monitoring results, DAMON provides a
- :ref:`tracepoint <tracepoint>`.
+ <https://github.com/awslabs/damo>`_ is one example of such programs.
- *debugfs interface. (DEPRECATED!)*
:ref:`This <debugfs_interface>` is almost identical to :ref:`sysfs interface
<sysfs_interface>`. This is deprecated, so users should move to the
@@ -139,7 +134,7 @@ scheme of the kdamond. Writing ``clear_schemes_tried_regions`` to ``state``
file clears the DAMON-based operating scheme action tried regions directory for
each DAMON-based operation scheme of the kdamond. For details of the
DAMON-based operation scheme action tried regions directory, please refer to
-:ref:tried_regions section <sysfs_schemes_tried_regions>`.
+:ref:`tried_regions section <sysfs_schemes_tried_regions>`.
If the state is ``on``, reading ``pid`` shows the pid of the kdamond thread.
@@ -259,12 +254,9 @@ be equal or smaller than ``start`` of directory ``N+1``.
contexts/<N>/schemes/
---------------------
-For usual DAMON-based data access aware memory management optimizations, users
-would normally want the system to apply a memory management action to a memory
-region of a specific access pattern. DAMON receives such formalized operation
-schemes from the user and applies those to the target memory regions. Users
-can get and set the schemes by reading from and writing to files under this
-directory.
+The directory for DAMON-based Operation Schemes (:ref:`DAMOS
+<damon_design_damos>`). Users can get and set the schemes by reading from and
+writing to files under this directory.
In the beginning, this directory has only one file, ``nr_schemes``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
@@ -277,12 +269,12 @@ In each scheme directory, five directories (``access_pattern``, ``quotas``,
``watermarks``, ``filters``, ``stats``, and ``tried_regions``) and one file
(``action``) exist.
-The ``action`` file is for setting and getting what action you want to apply to
-memory regions having specific access pattern of the interest. The keywords
-that can be written to and read from the file and their meaning are as below.
+The ``action`` file is for setting and getting the scheme's :ref:`action
+<damon_design_damos_action>`. The keywords that can be written to and read
+from the file and their meaning are as below.
Note that support of each action depends on the running DAMON operations set
-`implementation <sysfs_contexts>`.
+:ref:`implementation <sysfs_contexts>`.
- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
Supported by ``vaddr`` and ``fvaddr`` operations set.
@@ -304,32 +296,21 @@ Note that support of each action depends on the running DAMON operations set
schemes/<N>/access_pattern/
---------------------------
-The target access pattern of each DAMON-based operation scheme is constructed
-with three ranges including the size of the region in bytes, number of
-monitored accesses per aggregate interval, and number of aggregated intervals
-for the age of the region.
+The directory for the target access :ref:`pattern
+<damon_design_damos_access_pattern>` of the given DAMON-based operation scheme.
Under the ``access_pattern`` directory, three directories (``sz``,
``nr_accesses``, and ``age``) each having two files (``min`` and ``max``)
exist. You can set and get the access pattern for the given scheme by writing
to and reading from the ``min`` and ``max`` files under ``sz``,
-``nr_accesses``, and ``age`` directories, respectively.
+``nr_accesses``, and ``age`` directories, respectively. Note that the ``min``
+and the ``max`` form a closed interval.
schemes/<N>/quotas/
-------------------
-Optimal ``target access pattern`` for each ``action`` is workload dependent, so
-not easy to find. Worse yet, setting a scheme of some action too aggressive
-can cause severe overhead. To avoid such overhead, users can limit time and
-size quota for each scheme. In detail, users can ask DAMON to try to use only
-up to specific time (``time quota``) for applying the action, and to apply the
-action to only up to specific amount (``size quota``) of memory regions having
-the target access pattern within a given time interval (``reset interval``).
-
-When the quota limit is expected to be exceeded, DAMON prioritizes found memory
-regions of the ``target access pattern`` based on their size, access frequency,
-and age. For personalized prioritization, users can set the weights for the
-three properties.
+The directory for the :ref:`quotas <damon_design_damos_quotas>` of the given
+DAMON-based operation scheme.
Under ``quotas`` directory, three files (``ms``, ``bytes``,
``reset_interval_ms``) and one directory (``weights``) having three files
@@ -337,23 +318,26 @@ Under ``quotas`` directory, three files (``ms``, ``bytes``,
You can set the ``time quota`` in milliseconds, ``size quota`` in bytes, and
``reset interval`` in milliseconds by writing the values to the three files,
-respectively. You can also set the prioritization weights for size, access
-frequency, and age in per-thousand unit by writing the values to the three
-files under the ``weights`` directory.
+respectively. Then, DAMON tries to use only up to ``time quota`` milliseconds
+for applying the ``action`` to memory regions of the ``access_pattern``, and to
+apply the action to only up to ``bytes`` bytes of memory regions within the
+``reset_interval_ms``. Setting both ``ms`` and ``bytes`` zero disables the
+quota limits.
+
+You can also set the :ref:`prioritization weights
+<damon_design_damos_quotas_prioritization>` for size, access frequency, and age
+in per-thousand unit by writing the values to the three files under the
+``weights`` directory.
schemes/<N>/watermarks/
-----------------------
-To allow easy activation and deactivation of each scheme based on system
-status, DAMON provides a feature called watermarks. The feature receives five
-values called ``metric``, ``interval``, ``high``, ``mid``, and ``low``. The
-``metric`` is the system metric such as free memory ratio that can be measured.
-If the metric value of the system is higher than the value in ``high`` or lower
-than ``low`` at the memoent, the scheme is deactivated. If the value is lower
-than ``mid``, the scheme is activated.
+The directory for the :ref:`watermarks <damon_design_damos_watermarks>` of the
+given DAMON-based operation scheme.
Under the watermarks directory, five files (``metric``, ``interval_us``,
-``high``, ``mid``, and ``low``) for setting each value exist. You can set and
+``high``, ``mid``, and ``low``) for setting the metric, the time interval
+between check of the metric, and the three watermarks exist. You can set and
get the five values by writing to the files, respectively.
Keywords and meanings of those that can be written to the ``metric`` file are
@@ -367,12 +351,8 @@ The ``interval`` should written in microseconds unit.
schemes/<N>/filters/
--------------------
-Users could know something more than the kernel for specific types of memory.
-In the case, users could do their own management for the memory and hence
-doesn't want DAMOS bothers that. Users could limit DAMOS by setting the access
-pattern of the scheme and/or the monitoring regions for the purpose, but that
-can be inefficient in some cases. In such cases, users could set non-access
-pattern driven filters using files in this directory.
+The directory for the :ref:`filters <damon_design_damos_filters>` of the given
+DAMON-based operation scheme.
In the beginning, this directory has only one file, ``nr_filters``. Writing a
number (``N``) to the file creates the number of child directories named ``0``
@@ -432,13 +412,17 @@ starting from ``0`` under this directory. Each directory contains files
exposing detailed information about each of the memory region that the
corresponding scheme's ``action`` has tried to be applied under this directory,
during next :ref:`aggregation interval <sysfs_monitoring_attrs>`. The
-information includes address range, ``nr_accesses``, , and ``age`` of the
-region.
+information includes address range, ``nr_accesses``, and ``age`` of the region.
The directories will be removed when another special keyword,
``clear_schemes_tried_regions``, is written to the relevant
``kdamonds/<N>/state`` file.
+The expected usage of this directory is investigations of schemes' behaviors,
+and query-like efficient data access monitoring results retrievals. For the
+latter use case, in particular, users can set the ``action`` as ``stat`` and
+set the ``access pattern`` as their interested pattern that they want to query.
+
tried_regions/<N>/
------------------
@@ -600,15 +584,10 @@ update.
Schemes
-------
-For usual DAMON-based data access aware memory management optimizations, users
-would simply want the system to apply a memory management action to a memory
-region of a specific access pattern. DAMON receives such formalized operation
-schemes from the user and applies those to the target processes.
-
-Users can get and set the schemes by reading from and writing to ``schemes``
-debugfs file. Reading the file also shows the statistics of each scheme. To
-the file, each of the schemes should be represented in each line in below
-form::
+Users can get and set the DAMON-based operation :ref:`schemes
+<damon_design_damos>` by reading from and writing to ``schemes`` debugfs file.
+Reading the file also shows the statistics of each scheme. To the file, each
+of the schemes should be represented in each line in below form::
<target access pattern> <action> <quota> <watermarks>
@@ -617,8 +596,9 @@ You can disable schemes by simply writing an empty string to the file.
Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~
-The ``<target access pattern>`` is constructed with three ranges in below
-form::
+The target access :ref:`pattern <damon_design_damos_access_pattern>` of the
+scheme. The ``<target access pattern>`` is constructed with three ranges in
+below form::
min-size max-size min-acc max-acc min-age max-age
@@ -631,9 +611,9 @@ closed interval.
Action
~~~~~~
-The ``<action>`` is a predefined integer for memory management actions, which
-DAMON will apply to the regions having the target access pattern. The
-supported numbers and their meanings are as below.
+The ``<action>`` is a predefined integer for memory management :ref:`actions
+<damon_design_damos_action>`. The supported numbers and their meanings are as
+below.
- 0: Call ``madvise()`` for the region with ``MADV_WILLNEED``. Ignored if
``target`` is ``paddr``.
@@ -649,10 +629,8 @@ supported numbers and their meanings are as below.
Quota
~~~~~
-Optimal ``target access pattern`` for each ``action`` is workload dependent, so
-not easy to find. Worse yet, setting a scheme of some action too aggressive
-can cause severe overhead. To avoid such overhead, users can limit time and
-size quota for the scheme via the ``<quota>`` in below form::
+Users can set the :ref:`quotas <damon_design_damos_quotas>` of the given scheme
+via the ``<quota>`` in below form::
<ms> <sz> <reset interval> <priority weights>
@@ -662,19 +640,17 @@ the action to memory regions of the ``target access pattern`` within the
``<sz>`` bytes of memory regions within the ``<reset interval>``. Setting both
``<ms>`` and ``<sz>`` zero disables the quota limits.
-When the quota limit is expected to be exceeded, DAMON prioritizes found memory
-regions of the ``target access pattern`` based on their size, access frequency,
-and age. For personalized prioritization, users can set the weights for the
-three properties in ``<priority weights>`` in below form::
+For the :ref:`prioritization <damon_design_damos_quotas_prioritization>`, users
+can set the weights for the three properties in ``<priority weights>`` in below
+form::
<size weight> <access frequency weight> <age weight>
Watermarks
~~~~~~~~~~
-Some schemes would need to run based on current value of the system's specific
-metrics like free memory ratio. For such cases, users can specify watermarks
-for the condition.::
+Users can specify :ref:`watermarks <damon_design_damos_watermarks>` of the
+given scheme via ``<watermarks>`` in below form::
<metric> <check interval> <high mark> <middle mark> <low mark>
@@ -797,10 +773,12 @@ root directory only.
Tracepoint for Monitoring Results
=================================
-DAMON provides the monitoring results via a tracepoint,
-``damon:damon_aggregated``. While the monitoring is turned on, you could
-record the tracepoint events and show results using tracepoint supporting tools
-like ``perf``. For example::
+Users can get the monitoring results via the :ref:`tried_regions
+<sysfs_schemes_tried_regions>` or a tracepoint, ``damon:damon_aggregated``.
+While the tried regions directory is useful for getting a snapshot, the
+tracepoint is useful for getting a full record of the results. While the
+monitoring is turned on, you could record the tracepoint events and show
+results using tracepoint supporting tools like ``perf``. For example::
# echo on > monitor_on
# perf record -e damon:damon_aggregated &
diff --git a/Documentation/admin-guide/perf/hisi-pmu.rst b/Documentation/admin-guide/perf/hisi-pmu.rst
index 546979360513..e0174d20809a 100644
--- a/Documentation/admin-guide/perf/hisi-pmu.rst
+++ b/Documentation/admin-guide/perf/hisi-pmu.rst
@@ -56,14 +56,14 @@ Example usage of perf::
For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.
-(a) L3C PMU supports filtering by core/thread within the cluster which can be
+1. L3C PMU supports filtering by core/thread within the cluster which can be
specified as a bitmap::
$# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5
This will only count the operations from core/thread 0 and 1 in this cluster.
-(b) Tracetag allow the user to chose to count only read, write or atomic
+2. Tracetag allow the user to chose to count only read, write or atomic
operations via the tt_req parameeter in perf. The default value counts all
operations. tt_req is 3bits, 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
@@ -73,14 +73,16 @@ represents write operations, 3'b110 represents atomic store operations and
This will only count the read operations in this cluster.
-(c) Datasrc allows the user to check where the data comes from. It is 5 bits.
+3. Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:
-5'b00001: comes from L3C in this die;
-5'b01000: comes from L3C in the cross-die;
-5'b01001: comes from L3C which is in another socket;
-5'b01110: comes from the local DDR;
-5'b01111: comes from the cross-die DDR;
-5'b10000: comes from cross-socket DDR;
+
+- 5'b00001: comes from L3C in this die;
+- 5'b01000: comes from L3C in the cross-die;
+- 5'b01001: comes from L3C which is in another socket;
+- 5'b01110: comes from the local DDR;
+- 5'b01111: comes from the cross-die DDR;
+- 5'b10000: comes from cross-socket DDR;
+
etc, it is mainly helpful to find that the data source is nearest from the CPU
cores. If datasrc_cfg is used in the multi-chips, the datasrc_skt shall be
configured in perf command::
@@ -88,15 +90,25 @@ configured in perf command::
$# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5
-(d)Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
+4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11bits, include a 6-bit SCCL-ID and 5-bit
CCL/ICL-ID. For I/O die, the ICL-ID is followed by:
-5'b00000: I/O_MGMT_ICL;
-5'b00001: Network_ICL;
-5'b00011: HAC_ICL;
-5'b10000: PCIe_ICL;
+
+- 5'b00000: I/O_MGMT_ICL;
+- 5'b00001: Network_ICL;
+- 5'b00011: HAC_ICL;
+- 5'b10000: PCIe_ICL;
+
+5. uring_channel: UC PMU events 0x47~0x59 supports filtering by tx request
+uring channel. It is 2 bits. Some important codes are as follows:
+
+- 2'b11: count the events which sent to the uring_ext (MATA) channel;
+- 2'b01: is the same as 2'b11;
+- 2'b10: count the events which sent to the uring (non-MATA) channel;
+- 2'b00: default value, count the events which sent to the both uring and
+ uring_ext channel;
Users could configure IDs to count data come from specific CCL/ICL, by setting
srcid_cmd & srcid_msk, and data desitined for specific CCL/ICL by setting
diff --git a/Documentation/admin-guide/quickly-build-trimmed-linux.rst b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
index ff4f4cc8522b..f08149bc53f8 100644
--- a/Documentation/admin-guide/quickly-build-trimmed-linux.rst
+++ b/Documentation/admin-guide/quickly-build-trimmed-linux.rst
@@ -215,12 +215,14 @@ again.
reduce the compile time enormously, especially if you are running an
universal kernel from a commodity Linux distribution.
- There is a catch: the make target 'localmodconfig' will disable kernel
- features you have not directly or indirectly through some program utilized
- since you booted the system. You can reduce or nearly eliminate that risk by
- using tricks outlined in the reference section; for quick testing purposes
- that risk is often negligible, but it is an aspect you want to keep in mind
- in case your kernel behaves oddly.
+ There is a catch: 'localmodconfig' is likely to disable kernel features you
+ did not use since you booted your Linux -- like drivers for currently
+ disconnected peripherals or a virtualization software not haven't used yet.
+ You can reduce or nearly eliminate that risk with tricks the reference
+ section outlines; but when building a kernel just for quick testing purposes
+ it is often negligible if such features are missing. But you should keep that
+ aspect in mind when using a kernel built with this make target, as it might
+ be the reason why something you only use occasionally stopped working.
[:ref:`details<configuration>`]
@@ -271,6 +273,9 @@ again.
does nothing at all; in that case you have to manually install your kernel,
as outlined in the reference section.
+ If you are running a immutable Linux distribution, check its documentation
+ and the web to find out how to install your own kernel there.
+
[:ref:`details<install>`]
.. _another_sbs:
@@ -291,29 +296,29 @@ again.
version you care about, as git otherwise might retrieve the entire commit
history::
- git fetch --shallow-exclude=v6.1 origin
-
- If you modified the sources (for example by applying a patch), you now need
- to discard those modifications; that's because git otherwise will not be able
- to switch to the sources of another version due to potential conflicting
- changes::
-
- git reset --hard
+ git fetch --shallow-exclude=v6.0 origin
- Now checkout the version you are interested in, as explained above::
+ Now switch to the version you are interested in -- but be aware the command
+ used here will discard any modifications you performed, as they would
+ conflict with the sources you want to checkout::
- git checkout --detach origin/master
+ git checkout --force --detach origin/master
At this point you might want to patch the sources again or set/modify a build
- tag, as explained earlier; afterwards adjust the build configuration to the
- new codebase and build your next kernel::
+ tag, as explained earlier. Afterwards adjust the build configuration to the
+ new codebase using olddefconfig, which will now adjust the configuration file
+ you prepared earlier using localmodconfig (~/linux/.config) for your next
+ kernel::
# reminder: if you want to apply patches, do it at this point
# reminder: you might want to update your build tag at this point
make olddefconfig
+
+ Now build your kernel::
+
make -j $(nproc --all)
- Install the kernel as outlined above::
+ Afterwards install the kernel as outlined above::
command -v installkernel && sudo make modules_install install
@@ -584,11 +589,11 @@ versions and individual commits at hand at any time::
curl -L \
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/clone.bundle \
-o linux-stable.git.bundle
- git clone clone.bundle ~/linux/
+ git clone linux-stable.git.bundle ~/linux/
rm linux-stable.git.bundle
cd ~/linux/
- git remote set-url origin
- https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
+ git remote set-url origin \
+ https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
git fetch origin
git checkout --detach origin/master
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index d85d90f5d000..3800fab1619b 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -949,7 +949,7 @@ user space can read performance monitor counter registers directly.
The default value is 0 (access disabled).
-See Documentation/arm64/perf.rst for more information.
+See Documentation/arch/arm64/perf.rst for more information.
pid_max
diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 466c560b0c30..4877563241f3 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -386,8 +386,8 @@ Default : 0 (for compatibility reasons)
txrehash
--------
-Controls default hash rethink behaviour on listening socket when SO_TXREHASH
-option is set to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
+Controls default hash rethink behaviour on socket when SO_TXREHASH option is set
+to SOCK_TXREHASH_DEFAULT (i. e. not overridden by setsockopt).
If set to 1 (default), hash rethink is performed on listening socket.
If set to 0, hash rethink is not performed.