diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-12-15 12:53:37 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-12-15 12:53:37 -0800 |
commit | ac73e3dc8acd0a3be292755db30388c3580f5674 (patch) | |
tree | 5abef6cb82b205b5dbbb69dca950b8a5aae716de /Documentation | |
parent | 148842c98a24e508aecb929718818fbf4c2a6ff3 (diff) | |
parent | dfefd226b0bf7c435a58d75a0ce2f9273b9825f6 (diff) |
Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
- a few random little subsystems
- almost all of the MM patches which are staged ahead of linux-next
material. I'll trickle to post-linux-next work in as the dependents
get merged up.
Subsystems affected by this patch series: kthread, kbuild, ide, ntfs,
ocfs2, arch, and mm (slab-generic, slab, slub, dax, debug, pagecache,
gup, swap, shmem, memcg, pagemap, mremap, hmm, vmalloc, documentation,
kasan, pagealloc, memory-failure, hugetlb, vmscan, z3fold, compaction,
oom-kill, migration, cma, page-poison, userfaultfd, zswap, zsmalloc,
uaccess, zram, and cleanups).
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (200 commits)
mm: cleanup kstrto*() usage
mm: fix fall-through warnings for Clang
mm: slub: convert sysfs sprintf family to sysfs_emit/sysfs_emit_at
mm: shmem: convert shmem_enabled_show to use sysfs_emit_at
mm:backing-dev: use sysfs_emit in macro defining functions
mm: huge_memory: convert remaining use of sprintf to sysfs_emit and neatening
mm: use sysfs_emit for struct kobject * uses
mm: fix kernel-doc markups
zram: break the strict dependency from lzo
zram: add stat to gather incompressible pages since zram set up
zram: support page writeback
mm/process_vm_access: remove redundant initialization of iov_r
mm/zsmalloc.c: rework the list_add code in insert_zspage()
mm/zswap: move to use crypto_acomp API for hardware acceleration
mm/zswap: fix passing zero to 'PTR_ERR' warning
mm/zswap: make struct kernel_param_ops definitions const
userfaultfd/selftests: hint the test runner on required privilege
userfaultfd/selftests: fix retval check for userfaultfd_open()
userfaultfd/selftests: always dump something in modes
userfaultfd: selftests: make __{s,u}64 format specifiers portable
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/admin-guide/blockdev/zram.rst | 6 | ||||
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/memcg_test.rst | 8 | ||||
-rw-r--r-- | Documentation/admin-guide/cgroup-v1/memory.rst | 40 | ||||
-rw-r--r-- | Documentation/admin-guide/cgroup-v2.rst | 11 | ||||
-rw-r--r-- | Documentation/admin-guide/mm/transhuge.rst | 15 | ||||
-rw-r--r-- | Documentation/admin-guide/sysctl/vm.rst | 15 | ||||
-rw-r--r-- | Documentation/core-api/memory-allocation.rst | 4 | ||||
-rw-r--r-- | Documentation/core-api/pin_user_pages.rst | 6 | ||||
-rw-r--r-- | Documentation/dev-tools/kasan.rst | 5 | ||||
-rw-r--r-- | Documentation/filesystems/tmpfs.rst | 8 | ||||
-rw-r--r-- | Documentation/vm/memory-model.rst | 3 | ||||
-rw-r--r-- | Documentation/vm/page_owner.rst | 12 |
12 files changed, 64 insertions, 69 deletions
diff --git a/Documentation/admin-guide/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst index 9093228cf18b..700329d25f57 100644 --- a/Documentation/admin-guide/blockdev/zram.rst +++ b/Documentation/admin-guide/blockdev/zram.rst @@ -266,6 +266,7 @@ line of text and contains the following stats separated by whitespace: No memory is allocated for such pages. pages_compacted the number of pages freed during compaction huge_pages the number of incompressible pages + huge_pages_since the number of incompressible pages since zram set up ================ ============================================================= File /sys/block/zram<id>/bd_stat @@ -334,6 +335,11 @@ Admin can request writeback of those idle pages at right timing via:: With the command, zram writeback idle pages from memory to the storage. +If admin want to write a specific page in zram device to backing device, +they could write a page index into the interface. + + echo "page_index=1251" > /sys/block/zramX/writeback + If there are lots of write IO with flash device, potentially, it has flash wearout problem so that admin needs to design write limitation to guarantee storage health for entire product life. diff --git a/Documentation/admin-guide/cgroup-v1/memcg_test.rst b/Documentation/admin-guide/cgroup-v1/memcg_test.rst index 3f7115e07b5d..4f83de2dab6e 100644 --- a/Documentation/admin-guide/cgroup-v1/memcg_test.rst +++ b/Documentation/admin-guide/cgroup-v1/memcg_test.rst @@ -219,13 +219,11 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. This is an easy way to test page migration, too. -9.5 mkdir/rmdir ---------------- +9.5 nested cgroups +------------------ - When using hierarchy, mkdir/rmdir test should be done. - Use tests like the following:: + Use tests like the following for testing nested cgroups:: - echo 1 >/opt/cgroup/01/memory/use_hierarchy mkdir /opt/cgroup/01/child_a mkdir /opt/cgroup/01/child_b diff --git a/Documentation/admin-guide/cgroup-v1/memory.rst b/Documentation/admin-guide/cgroup-v1/memory.rst index 12757e63b26c..a44cd467d218 100644 --- a/Documentation/admin-guide/cgroup-v1/memory.rst +++ b/Documentation/admin-guide/cgroup-v1/memory.rst @@ -77,6 +77,8 @@ Brief summary of control files. memory.soft_limit_in_bytes set/show soft limit of memory usage memory.stat show various statistics memory.use_hierarchy set/show hierarchical account enabled + This knob is deprecated and shouldn't be + used. memory.force_empty trigger forced page reclaim memory.pressure_level set memory pressure notifications memory.swappiness set/show swappiness parameter of vmscan @@ -495,16 +497,13 @@ cgroup might have some charge associated with it, even though all tasks have migrated away from it. (because we charge against pages, not against tasks.) -We move the stats to root (if use_hierarchy==0) or parent (if -use_hierarchy==1), and no change on the charge except uncharging +We move the stats to parent, and no change on the charge except uncharging from the child. Charges recorded in swap information is not updated at removal of cgroup. Recorded information is discarded and a cgroup which uses swap (swapcache) will be charged as a new owner of it. -About use_hierarchy, see Section 6. - 5. Misc. interfaces =================== @@ -527,8 +526,6 @@ About use_hierarchy, see Section 6. write will still return success. In this case, it is expected that memory.kmem.usage_in_bytes == memory.usage_in_bytes. - About use_hierarchy, see Section 6. - 5.2 stat file ------------- @@ -675,31 +672,20 @@ hierarchy:: d e In the diagram above, with hierarchical accounting enabled, all memory -usage of e, is accounted to its ancestors up until the root (i.e, c and root), -that has memory.use_hierarchy enabled. If one of the ancestors goes over its -limit, the reclaim algorithm reclaims from the tasks in the ancestor and the -children of the ancestor. - -6.1 Enabling hierarchical accounting and reclaim ------------------------------------------------- - -A memory cgroup by default disables the hierarchy feature. Support -can be enabled by writing 1 to memory.use_hierarchy file of the root cgroup:: +usage of e, is accounted to its ancestors up until the root (i.e, c and root). +If one of the ancestors goes over its limit, the reclaim algorithm reclaims +from the tasks in the ancestor and the children of the ancestor. - # echo 1 > memory.use_hierarchy - -The feature can be disabled by:: +6.1 Hierarchical accounting and reclaim +--------------------------------------- - # echo 0 > memory.use_hierarchy +Hierarchical accounting is enabled by default. Disabling the hierarchical +accounting is deprecated. An attempt to do it will result in a failure +and a warning printed to dmesg. -NOTE1: - Enabling/disabling will fail if either the cgroup already has other - cgroups created below it, or if the parent cgroup has use_hierarchy - enabled. +For compatibility reasons writing 1 to memory.use_hierarchy will always pass:: -NOTE2: - When panic_on_oom is set to "2", the whole system will panic in - case of an OOM event in any cgroup. + # echo 1 > memory.use_hierarchy 7. Soft limits ============== diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 608d7c279396..63521cd36ce5 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1274,6 +1274,9 @@ PAGE_SIZE multiple when read back. kernel_stack Amount of memory allocated to kernel stacks. + pagetables + Amount of memory allocated for page tables. + percpu(npn) Amount of memory used for storing per-cpu kernel data structures. @@ -1300,6 +1303,14 @@ PAGE_SIZE multiple when read back. Amount of memory used in anonymous mappings backed by transparent hugepages + file_thp + Amount of cached filesystem data backed by transparent + hugepages + + shmem_thp + Amount of shm, tmpfs, shared anonymous mmap()s backed by + transparent hugepages + inactive_anon, active_anon, inactive_file, active_file, unevictable Amount of memory, swap-backed and filesystem-backed, on the internal memory management lists used by the diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst index b2acd0d395ca..3b8a336511a4 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -401,21 +401,6 @@ compact_fail is incremented if the system tries to compact memory but failed. -compact_pages_moved - is incremented each time a page is moved. If - this value is increasing rapidly, it implies that the system - is copying a lot of data to satisfy the huge page allocation. - It is possible that the cost of copying exceeds any savings - from reduced TLB misses. - -compact_pagemigrate_failed - is incremented when the underlying mechanism - for moving a page failed. - -compact_blocks_moved - is incremented each time memory compaction examines - a huge page aligned range of pages. - It is possible to establish how long the stalls were using the function tracer to record how long was spent in __alloc_pages_nodemask and using the mm_page_alloc tracepoint to identify which allocations were diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index e0cf17ad2ae5..e972caa43bd4 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -873,12 +873,17 @@ file-backed pages is less than the high watermark in a zone. unprivileged_userfaultfd ======================== -This flag controls whether unprivileged users can use the userfaultfd -system calls. Set this to 1 to allow unprivileged users to use the -userfaultfd system calls, or set this to 0 to restrict userfaultfd to only -privileged users (with SYS_CAP_PTRACE capability). +This flag controls the mode in which unprivileged users can use the +userfaultfd system calls. Set this to 0 to restrict unprivileged users +to handle page faults in user mode only. In this case, users without +SYS_CAP_PTRACE must pass UFFD_USER_MODE_ONLY in order for userfaultfd to +succeed. Prohibiting use of userfaultfd for handling faults from kernel +mode may make certain vulnerabilities more difficult to exploit. -The default value is 1. +Set this to 1 to allow unprivileged users to use the userfaultfd system +calls without any restrictions. + +The default value is 0. user_reserve_kbytes diff --git a/Documentation/core-api/memory-allocation.rst b/Documentation/core-api/memory-allocation.rst index 4446a1ac36cc..5954ddf6ee13 100644 --- a/Documentation/core-api/memory-allocation.rst +++ b/Documentation/core-api/memory-allocation.rst @@ -147,6 +147,10 @@ The address of a chunk allocated with `kmalloc` is aligned to at least ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the alignment is also guaranteed to be at least the respective size. +Chunks allocated with kmalloc() can be resized with krealloc(). Similarly +to kmalloc_array(): a helper for resizing arrays is provided in the form of +krealloc_array(). + For large allocations you can use vmalloc() and vzalloc(), or directly request pages from the page allocator. The memory allocated by `vmalloc` and related functions is not physically contiguous. diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst index 7ca8c7bac650..fcf605be43d0 100644 --- a/Documentation/core-api/pin_user_pages.rst +++ b/Documentation/core-api/pin_user_pages.rst @@ -221,12 +221,12 @@ Unit testing ============ This file:: - tools/testing/selftests/vm/gup_benchmark.c + tools/testing/selftests/vm/gup_test.c has the following new calls to exercise the new pin*() wrapper functions: -* PIN_FAST_BENCHMARK (./gup_benchmark -a) -* PIN_BENCHMARK (./gup_benchmark -b) +* PIN_FAST_BENCHMARK (./gup_test -a) +* PIN_BASIC_TEST (./gup_test -b) You can monitor how many total dma-pinned pages have been acquired and released since the system was booted, via two new /proc/vmstat entries: :: diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index 0bd6d0e1ca6b..6b752a45a936 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -190,8 +190,9 @@ function calls GCC directly inserts the code to check the shadow memory. This option significantly enlarges kernel but it gives x1.1-x2 performance boost over outline instrumented kernel. -Generic KASAN prints up to 2 call_rcu() call stacks in reports, the last one -and the second to last. +Generic KASAN also reports the last 2 call stacks to creation of work that +potentially has access to an object. Call stacks for the following are shown: +call_rcu() and workqueue queuing. Software tag-based KASAN ~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystems/tmpfs.rst index c44f8b1d3cab..0408c245785e 100644 --- a/Documentation/filesystems/tmpfs.rst +++ b/Documentation/filesystems/tmpfs.rst @@ -4,7 +4,7 @@ Tmpfs ===== -Tmpfs is a file system which keeps all files in virtual memory. +Tmpfs is a file system which keeps all of its files in virtual memory. Everything in tmpfs is temporary in the sense that no files will be @@ -35,7 +35,7 @@ tmpfs has the following uses: memory. This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not - set, the user visible part of tmpfs is not build. But the internal + set, the user visible part of tmpfs is not built. But the internal mechanisms are always present. 2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for @@ -50,7 +50,7 @@ tmpfs has the following uses: This mount is _not_ needed for SYSV shared memory. The internal mount is used for that. (In the 2.3 kernel versions it was necessary to mount the predecessor of tmpfs (shm fs) to use SYSV - shared memory) + shared memory.) 3) Some people (including me) find it very convenient to mount it e.g. on /tmp and /var/tmp and have a big swap partition. And now @@ -83,7 +83,7 @@ If nr_blocks=0 (or size=0), blocks will not be limited in that instance; if nr_inodes=0, inodes will not be limited. It is generally unwise to mount with such options, since it allows any user with write access to use up all the memory on the machine; but enhances the scalability of -that instance in a system with many cpus making intensive use of it. +that instance in a system with many CPUs making intensive use of it. tmpfs has a mount option to set the NUMA memory allocation policy for diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst index 9daadf9faba1..ce398a7dc6cd 100644 --- a/Documentation/vm/memory-model.rst +++ b/Documentation/vm/memory-model.rst @@ -51,8 +51,7 @@ call :c:func:`free_area_init` function. Yet, the mappings array is not usable until the call to :c:func:`memblock_free_all` that hands all the memory to the page allocator. -If an architecture enables `CONFIG_ARCH_HAS_HOLES_MEMORYMODEL` option, -it may free parts of the `mem_map` array that do not cover the +An architecture may free parts of the `mem_map` array that do not cover the actual physical pages. In such case, the architecture specific :c:func:`pfn_valid` implementation should take the holes in the `mem_map` into account. diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst index 02deac76673f..4e67c2e9bbed 100644 --- a/Documentation/vm/page_owner.rst +++ b/Documentation/vm/page_owner.rst @@ -41,17 +41,17 @@ size change due to this facility. - Without page owner:: text data bss dec hex filename - 40662 1493 644 42799 a72f mm/page_alloc.o + 48392 2333 644 51369 c8a9 mm/page_alloc.o - With page owner:: text data bss dec hex filename - 40892 1493 644 43029 a815 mm/page_alloc.o - 1427 24 8 1459 5b3 mm/page_ext.o - 2722 50 0 2772 ad4 mm/page_owner.o + 48800 2445 644 51889 cab1 mm/page_alloc.o + 6574 108 29 6711 1a37 mm/page_owner.o + 1025 8 8 1041 411 mm/page_ext.o -Although, roughly, 4 KB code is added in total, page_alloc.o increase by -230 bytes and only half of it is in hotpath. Building the kernel with +Although, roughly, 8 KB code is added in total, page_alloc.o increase by +520 bytes and less than half of it is in hotpath. Building the kernel with page owner and turning it on if needed would be great option to debug kernel memory problem. |