summaryrefslogtreecommitdiff
path: root/kernel
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2024-09-21 07:29:05 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2024-09-21 07:29:05 -0700
commit617a814f14b8914271f7a70366d72c6196d17663 (patch)
tree31d32f73bef107862101ded103a76b314cea3705 /kernel
parent1868f9d0260e9afaf7c6436d14923ae12eaea465 (diff)
parent684826f8271ad97580b138b9ffd462005e470b99 (diff)
Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton: "Along with the usual shower of singleton patches, notable patch series in this pull request are: - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. - "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. - "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. - "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". - "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. - "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. - "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. - "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. - "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). - "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. - "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. - "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. - "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. - "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. - "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. - "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. - "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). - "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! - "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. - "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. - "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. - "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. - "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. - "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy. This was overprovisioning THPs in sparsely accessed memory areas. - "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. - "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. - "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. - "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios" * tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits) zram: free secondary algorithms names uprobes: turn xol_area->pages[2] into xol_area->page uprobes: introduce the global struct vm_special_mapping xol_mapping Revert "uprobes: use vm_special_mapping close() functionality" mm: support large folios swap-in for sync io devices mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios mm: fix swap_read_folio_zeromap() for large folios with partial zeromap mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries set_memory: add __must_check to generic stubs mm/vma: return the exact errno in vms_gather_munmap_vmas() memcg: cleanup with !CONFIG_MEMCG_V1 mm/show_mem.c: report alloc tags in human readable units mm: support poison recovery from copy_present_page() mm: support poison recovery from do_cow_fault() resource, kunit: add test case for region_intersects() resource: make alloc_free_mem_region() works for iomem_resource mm: z3fold: deprecate CONFIG_Z3FOLD vfio/pci: implement huge_fault support mm/arm64: support large pfn mappings mm/x86: support large pfn mappings ...
Diffstat (limited to 'kernel')
-rw-r--r--kernel/Makefile1
-rw-r--r--kernel/cgroup/cgroup-internal.h2
-rw-r--r--kernel/cgroup/cgroup.c23
-rw-r--r--kernel/events/uprobes.c35
-rw-r--r--kernel/exit.c57
-rw-r--r--kernel/fork.c4
-rw-r--r--kernel/numa.c26
-rw-r--r--kernel/resource.c13
-rw-r--r--kernel/resource_kunit.c143
-rw-r--r--kernel/sched/core.c4
-rw-r--r--kernel/sched/fair.c14
-rw-r--r--kernel/vmcore_info.c8
12 files changed, 267 insertions, 63 deletions
diff --git a/kernel/Makefile b/kernel/Makefile
index 3c13240dfc9f..87866b037fbe 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -116,7 +116,6 @@ obj-$(CONFIG_SHADOW_CALL_STACK) += scs.o
obj-$(CONFIG_HAVE_STATIC_CALL) += static_call.o
obj-$(CONFIG_HAVE_STATIC_CALL_INLINE) += static_call_inline.o
obj-$(CONFIG_CFI_CLANG) += cfi.o
-obj-$(CONFIG_NUMA) += numa.o
obj-$(CONFIG_PERF_EVENTS) += events/
diff --git a/kernel/cgroup/cgroup-internal.h b/kernel/cgroup/cgroup-internal.h
index 520b90dd97ec..c964dd7ff967 100644
--- a/kernel/cgroup/cgroup-internal.h
+++ b/kernel/cgroup/cgroup-internal.h
@@ -81,6 +81,8 @@ struct cgroup_file_ctx {
struct {
struct cgroup_pidlist *pidlist;
} procs1;
+
+ struct cgroup_of_peak peak;
};
/*
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 90e50d6d3cf3..f38f61bc068a 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1972,6 +1972,13 @@ static int cgroup2_parse_param(struct fs_context *fc, struct fs_parameter *param
return -EINVAL;
}
+struct cgroup_of_peak *of_peak(struct kernfs_open_file *of)
+{
+ struct cgroup_file_ctx *ctx = of->priv;
+
+ return &ctx->peak;
+}
+
static void apply_cgroup_root_flags(unsigned int root_flags)
{
if (current->nsproxy->cgroup_ns == &init_cgroup_ns) {
@@ -4623,8 +4630,9 @@ struct cgroup_subsys_state *css_next_child(struct cgroup_subsys_state *pos,
*
* While this function requires cgroup_mutex or RCU read locking, it
* doesn't require the whole traversal to be contained in a single critical
- * section. This function will return the correct next descendant as long
- * as both @pos and @root are accessible and @pos is a descendant of @root.
+ * section. Additionally, it isn't necessary to hold onto a reference to @pos.
+ * This function will return the correct next descendant as long as both @pos
+ * and @root are accessible and @pos is a descendant of @root.
*
* If a subsystem synchronizes ->css_online() and the start of iteration, a
* css which finished ->css_online() is guaranteed to be visible in the
@@ -4672,8 +4680,9 @@ EXPORT_SYMBOL_GPL(css_next_descendant_pre);
*
* While this function requires cgroup_mutex or RCU read locking, it
* doesn't require the whole traversal to be contained in a single critical
- * section. This function will return the correct rightmost descendant as
- * long as @pos is accessible.
+ * section. Additionally, it isn't necessary to hold onto a reference to @pos.
+ * This function will return the correct rightmost descendant as long as @pos
+ * is accessible.
*/
struct cgroup_subsys_state *
css_rightmost_descendant(struct cgroup_subsys_state *pos)
@@ -4717,9 +4726,9 @@ css_leftmost_descendant(struct cgroup_subsys_state *pos)
*
* While this function requires cgroup_mutex or RCU read locking, it
* doesn't require the whole traversal to be contained in a single critical
- * section. This function will return the correct next descendant as long
- * as both @pos and @cgroup are accessible and @pos is a descendant of
- * @cgroup.
+ * section. Additionally, it isn't necessary to hold onto a reference to @pos.
+ * This function will return the correct next descendant as long as both @pos
+ * and @cgroup are accessible and @pos is a descendant of @cgroup.
*
* If a subsystem synchronizes ->css_online() and the start of iteration, a
* css which finished ->css_online() is guaranteed to be visible in the
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4b7e590dc428..2ec796e2f055 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -103,8 +103,7 @@ struct xol_area {
atomic_t slot_count; /* number of in-use slots */
unsigned long *bitmap; /* 0 = free slot */
- struct vm_special_mapping xol_mapping;
- struct page *pages[2];
+ struct page *page;
/*
* We keep the vma's vm_start rather than a pointer to the vma
* itself. The probed process or a naughty kernel module could make
@@ -1466,6 +1465,21 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned lon
set_bit(MMF_RECALC_UPROBES, &vma->vm_mm->flags);
}
+static vm_fault_t xol_fault(const struct vm_special_mapping *sm,
+ struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+ struct xol_area *area = vma->vm_mm->uprobes_state.xol_area;
+
+ vmf->page = area->page;
+ get_page(vmf->page);
+ return 0;
+}
+
+static const struct vm_special_mapping xol_mapping = {
+ .name = "[uprobes]",
+ .fault = xol_fault,
+};
+
/* Slot allocation for XOL */
static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
{
@@ -1492,7 +1506,7 @@ static int xol_add_vma(struct mm_struct *mm, struct xol_area *area)
vma = _install_special_mapping(mm, area->vaddr, PAGE_SIZE,
VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO,
- &area->xol_mapping);
+ &xol_mapping);
if (IS_ERR(vma)) {
ret = PTR_ERR(vma);
goto fail;
@@ -1531,12 +1545,9 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
if (!area->bitmap)
goto free_area;
- area->xol_mapping.name = "[uprobes]";
- area->xol_mapping.pages = area->pages;
- area->pages[0] = alloc_page(GFP_HIGHUSER);
- if (!area->pages[0])
+ area->page = alloc_page(GFP_HIGHUSER);
+ if (!area->page)
goto free_bitmap;
- area->pages[1] = NULL;
area->vaddr = vaddr;
init_waitqueue_head(&area->wq);
@@ -1544,12 +1555,12 @@ static struct xol_area *__create_xol_area(unsigned long vaddr)
set_bit(0, area->bitmap);
atomic_set(&area->slot_count, 1);
insns = arch_uprobe_trampoline(&insns_size);
- arch_uprobe_copy_ixol(area->pages[0], 0, insns, insns_size);
+ arch_uprobe_copy_ixol(area->page, 0, insns, insns_size);
if (!xol_add_vma(mm, area))
return area;
- __free_page(area->pages[0]);
+ __free_page(area->page);
free_bitmap:
kfree(area->bitmap);
free_area:
@@ -1591,7 +1602,7 @@ void uprobe_clear_state(struct mm_struct *mm)
if (!area)
return;
- put_page(area->pages[0]);
+ put_page(area->page);
kfree(area->bitmap);
kfree(area);
}
@@ -1658,7 +1669,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
if (unlikely(!xol_vaddr))
return 0;
- arch_uprobe_copy_ixol(area->pages[0], xol_vaddr,
+ arch_uprobe_copy_ixol(area->page, xol_vaddr,
&uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
return xol_vaddr;
diff --git a/kernel/exit.c b/kernel/exit.c
index 0d62a53605df..619f0014c33b 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -778,6 +778,62 @@ static void exit_notify(struct task_struct *tsk, int group_dead)
}
#ifdef CONFIG_DEBUG_STACK_USAGE
+unsigned long stack_not_used(struct task_struct *p)
+{
+ unsigned long *n = end_of_stack(p);
+
+ do { /* Skip over canary */
+# ifdef CONFIG_STACK_GROWSUP
+ n--;
+# else
+ n++;
+# endif
+ } while (!*n);
+
+# ifdef CONFIG_STACK_GROWSUP
+ return (unsigned long)end_of_stack(p) - (unsigned long)n;
+# else
+ return (unsigned long)n - (unsigned long)end_of_stack(p);
+# endif
+}
+
+/* Count the maximum pages reached in kernel stacks */
+static inline void kstack_histogram(unsigned long used_stack)
+{
+#ifdef CONFIG_VM_EVENT_COUNTERS
+ if (used_stack <= 1024)
+ count_vm_event(KSTACK_1K);
+#if THREAD_SIZE > 1024
+ else if (used_stack <= 2048)
+ count_vm_event(KSTACK_2K);
+#endif
+#if THREAD_SIZE > 2048
+ else if (used_stack <= 4096)
+ count_vm_event(KSTACK_4K);
+#endif
+#if THREAD_SIZE > 4096
+ else if (used_stack <= 8192)
+ count_vm_event(KSTACK_8K);
+#endif
+#if THREAD_SIZE > 8192
+ else if (used_stack <= 16384)
+ count_vm_event(KSTACK_16K);
+#endif
+#if THREAD_SIZE > 16384
+ else if (used_stack <= 32768)
+ count_vm_event(KSTACK_32K);
+#endif
+#if THREAD_SIZE > 32768
+ else if (used_stack <= 65536)
+ count_vm_event(KSTACK_64K);
+#endif
+#if THREAD_SIZE > 65536
+ else
+ count_vm_event(KSTACK_REST);
+#endif
+#endif /* CONFIG_VM_EVENT_COUNTERS */
+}
+
static void check_stack_usage(void)
{
static DEFINE_SPINLOCK(low_water_lock);
@@ -785,6 +841,7 @@ static void check_stack_usage(void)
unsigned long free;
free = stack_not_used(current);
+ kstack_histogram(THREAD_SIZE - free);
if (free >= lowest_to_date)
return;
diff --git a/kernel/fork.c b/kernel/fork.c
index d4b2d543f48c..5ac41db8aa69 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -832,7 +832,7 @@ static void check_mm(struct mm_struct *mm)
pr_alert("BUG: non-zero pgtables_bytes on freeing mm: %ld\n",
mm_pgtables_bytes(mm));
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(CONFIG_SPLIT_PMD_PTLOCKS)
VM_BUG_ON_MM(mm->pmd_huge_pte, mm);
#endif
}
@@ -1276,7 +1276,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
-#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !defined(CONFIG_SPLIT_PMD_PTLOCKS)
mm->pmd_huge_pte = NULL;
#endif
mm_init_uprobes_state(mm);
diff --git a/kernel/numa.c b/kernel/numa.c
deleted file mode 100644
index 67ca6b8585c0..000000000000
--- a/kernel/numa.c
+++ /dev/null
@@ -1,26 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-
-#include <linux/printk.h>
-#include <linux/numa.h>
-
-/* Stub functions: */
-
-#ifndef memory_add_physaddr_to_nid
-int memory_add_physaddr_to_nid(u64 start)
-{
- pr_info_once("Unknown online node for memory at 0x%llx, assuming node 0\n",
- start);
- return 0;
-}
-EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
-#endif
-
-#ifndef phys_to_target_node
-int phys_to_target_node(u64 start)
-{
- pr_info_once("Unknown target node for memory at 0x%llx, assuming node 0\n",
- start);
- return 0;
-}
-EXPORT_SYMBOL_GPL(phys_to_target_node);
-#endif
diff --git a/kernel/resource.c b/kernel/resource.c
index 1681ab5012e1..b730bd28b422 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -450,8 +450,7 @@ int walk_system_ram_res_rev(u64 start, u64 end, void *arg,
/* re-alloc */
struct resource *rams_new;
- rams_new = kvrealloc(rams, rams_size * sizeof(struct resource),
- (rams_size + 16) * sizeof(struct resource),
+ rams_new = kvrealloc(rams, (rams_size + 16) * sizeof(struct resource),
GFP_KERNEL);
if (!rams_new)
goto out;
@@ -1860,7 +1859,11 @@ EXPORT_SYMBOL(resource_list_free);
#ifdef CONFIG_GET_FREE_REGION
#define GFR_DESCENDING (1UL << 0)
#define GFR_REQUEST_REGION (1UL << 1)
-#define GFR_DEFAULT_ALIGN (1UL << PA_SECTION_SHIFT)
+#ifdef PA_SECTION_SHIFT
+#define GFR_DEFAULT_ALIGN (1UL << PA_SECTION_SHIFT)
+#else
+#define GFR_DEFAULT_ALIGN PAGE_SIZE
+#endif
static resource_size_t gfr_start(struct resource *base, resource_size_t size,
resource_size_t align, unsigned long flags)
@@ -1872,7 +1875,7 @@ static resource_size_t gfr_start(struct resource *base, resource_size_t size,
return end - size + 1;
}
- return ALIGN(base->start, align);
+ return ALIGN(max(base->start, align), align);
}
static bool gfr_continue(struct resource *base, resource_size_t addr,
@@ -2046,7 +2049,7 @@ struct resource *alloc_free_mem_region(struct resource *base,
return get_free_mem_region(NULL, base, size, align, name,
IORES_DESC_NONE, flags);
}
-EXPORT_SYMBOL_NS_GPL(alloc_free_mem_region, CXL);
+EXPORT_SYMBOL_GPL(alloc_free_mem_region);
#endif /* CONFIG_GET_FREE_REGION */
static int __init strict_iomem(char *str)
diff --git a/kernel/resource_kunit.c b/kernel/resource_kunit.c
index 0e509985a44a..42d2d8d20f5d 100644
--- a/kernel/resource_kunit.c
+++ b/kernel/resource_kunit.c
@@ -7,6 +7,8 @@
#include <linux/ioport.h>
#include <linux/kernel.h>
#include <linux/string.h>
+#include <linux/sizes.h>
+#include <linux/mm.h>
#define R0_START 0x0000
#define R0_END 0xffff
@@ -137,9 +139,150 @@ static void resource_test_intersection(struct kunit *test)
} while (++i < ARRAY_SIZE(results_for_intersection));
}
+/*
+ * The test resource tree for region_intersects() test:
+ *
+ * BASE-BASE+1M-1 : Test System RAM 0
+ * # hole 0 (BASE+1M-BASE+2M)
+ * BASE+2M-BASE+3M-1 : Test CXL Window 0
+ * BASE+3M-BASE+4M-1 : Test System RAM 1
+ * BASE+4M-BASE+7M-1 : Test CXL Window 1
+ * BASE+4M-BASE+5M-1 : Test System RAM 2
+ * BASE+4M+128K-BASE+4M+256K-1: Test Code
+ * BASE+5M-BASE+6M-1 : Test System RAM 3
+ */
+#define RES_TEST_RAM0_OFFSET 0
+#define RES_TEST_RAM0_SIZE SZ_1M
+#define RES_TEST_HOLE0_OFFSET (RES_TEST_RAM0_OFFSET + RES_TEST_RAM0_SIZE)
+#define RES_TEST_HOLE0_SIZE SZ_1M
+#define RES_TEST_WIN0_OFFSET (RES_TEST_HOLE0_OFFSET + RES_TEST_HOLE0_SIZE)
+#define RES_TEST_WIN0_SIZE SZ_1M
+#define RES_TEST_RAM1_OFFSET (RES_TEST_WIN0_OFFSET + RES_TEST_WIN0_SIZE)
+#define RES_TEST_RAM1_SIZE SZ_1M
+#define RES_TEST_WIN1_OFFSET (RES_TEST_RAM1_OFFSET + RES_TEST_RAM1_SIZE)
+#define RES_TEST_WIN1_SIZE (SZ_1M * 3)
+#define RES_TEST_RAM2_OFFSET RES_TEST_WIN1_OFFSET
+#define RES_TEST_RAM2_SIZE SZ_1M
+#define RES_TEST_CODE_OFFSET (RES_TEST_RAM2_OFFSET + SZ_128K)
+#define RES_TEST_CODE_SIZE SZ_128K
+#define RES_TEST_RAM3_OFFSET (RES_TEST_RAM2_OFFSET + RES_TEST_RAM2_SIZE)
+#define RES_TEST_RAM3_SIZE SZ_1M
+#define RES_TEST_TOTAL_SIZE ((RES_TEST_WIN1_OFFSET + RES_TEST_WIN1_SIZE))
+
+static void remove_free_resource(void *ctx)
+{
+ struct resource *res = (struct resource *)ctx;
+
+ remove_resource(res);
+ kfree(res);
+}
+
+static void resource_test_request_region(struct kunit *test, struct resource *parent,
+ resource_size_t start, resource_size_t size,
+ const char *name, unsigned long flags)
+{
+ struct resource *res;
+
+ res = __request_region(parent, start, size, name, flags);
+ KUNIT_ASSERT_NOT_NULL(test, res);
+ kunit_add_action_or_reset(test, remove_free_resource, res);
+}
+
+static void resource_test_insert_resource(struct kunit *test, struct resource *parent,
+ resource_size_t start, resource_size_t size,
+ const char *name, unsigned long flags)
+{
+ struct resource *res;
+
+ res = kzalloc(sizeof(*res), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_NULL(test, res);
+
+ res->name = name;
+ res->start = start;
+ res->end = start + size - 1;
+ res->flags = flags;
+ if (insert_resource(parent, res)) {
+ kfree(res);
+ KUNIT_FAIL_AND_ABORT(test, "Fail to insert resource %pR\n", res);
+ }
+
+ kunit_add_action_or_reset(test, remove_free_resource, res);
+}
+
+static void resource_test_region_intersects(struct kunit *test)
+{
+ unsigned long flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+ struct resource *parent;
+ resource_size_t start;
+
+ /* Find an iomem_resource hole to hold test resources */
+ parent = alloc_free_mem_region(&iomem_resource, RES_TEST_TOTAL_SIZE, SZ_1M,
+ "test resources");
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, parent);
+ start = parent->start;
+ kunit_add_action_or_reset(test, remove_free_resource, parent);
+
+ resource_test_request_region(test, parent, start + RES_TEST_RAM0_OFFSET,
+ RES_TEST_RAM0_SIZE, "Test System RAM 0", flags);
+ resource_test_insert_resource(test, parent, start + RES_TEST_WIN0_OFFSET,
+ RES_TEST_WIN0_SIZE, "Test CXL Window 0",
+ IORESOURCE_MEM);
+ resource_test_request_region(test, parent, start + RES_TEST_RAM1_OFFSET,
+ RES_TEST_RAM1_SIZE, "Test System RAM 1", flags);
+ resource_test_insert_resource(test, parent, start + RES_TEST_WIN1_OFFSET,
+ RES_TEST_WIN1_SIZE, "Test CXL Window 1",
+ IORESOURCE_MEM);
+ resource_test_request_region(test, parent, start + RES_TEST_RAM2_OFFSET,
+ RES_TEST_RAM2_SIZE, "Test System RAM 2", flags);
+ resource_test_insert_resource(test, parent, start + RES_TEST_CODE_OFFSET,
+ RES_TEST_CODE_SIZE, "Test Code", flags);
+ resource_test_request_region(test, parent, start + RES_TEST_RAM3_OFFSET,
+ RES_TEST_RAM3_SIZE, "Test System RAM 3", flags);
+ kunit_release_action(test, remove_free_resource, parent);
+
+ KUNIT_EXPECT_EQ(test, REGION_INTERSECTS,
+ region_intersects(start + RES_TEST_RAM0_OFFSET, PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+ KUNIT_EXPECT_EQ(test, REGION_INTERSECTS,
+ region_intersects(start + RES_TEST_RAM0_OFFSET +
+ RES_TEST_RAM0_SIZE - PAGE_SIZE, 2 * PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+ KUNIT_EXPECT_EQ(test, REGION_DISJOINT,
+ region_intersects(start + RES_TEST_HOLE0_OFFSET, PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+ KUNIT_EXPECT_EQ(test, REGION_DISJOINT,
+ region_intersects(start + RES_TEST_HOLE0_OFFSET +
+ RES_TEST_HOLE0_SIZE - PAGE_SIZE, 2 * PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+ KUNIT_EXPECT_EQ(test, REGION_MIXED,
+ region_intersects(start + RES_TEST_WIN0_OFFSET +
+ RES_TEST_WIN0_SIZE - PAGE_SIZE, 2 * PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+ KUNIT_EXPECT_EQ(test, REGION_INTERSECTS,
+ region_intersects(start + RES_TEST_RAM1_OFFSET +
+ RES_TEST_RAM1_SIZE - PAGE_SIZE, 2 * PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+ KUNIT_EXPECT_EQ(test, REGION_INTERSECTS,
+ region_intersects(start + RES_TEST_RAM2_OFFSET +
+ RES_TEST_RAM2_SIZE - PAGE_SIZE, 2 * PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+ KUNIT_EXPECT_EQ(test, REGION_INTERSECTS,
+ region_intersects(start + RES_TEST_CODE_OFFSET, PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+ KUNIT_EXPECT_EQ(test, REGION_INTERSECTS,
+ region_intersects(start + RES_TEST_RAM2_OFFSET,
+ RES_TEST_RAM2_SIZE + PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+ KUNIT_EXPECT_EQ(test, REGION_MIXED,
+ region_intersects(start + RES_TEST_RAM3_OFFSET,
+ RES_TEST_RAM3_SIZE + PAGE_SIZE,
+ IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE));
+}
+
static struct kunit_case resource_test_cases[] = {
KUNIT_CASE(resource_test_union),
KUNIT_CASE(resource_test_intersection),
+ KUNIT_CASE(resource_test_region_intersects),
{}
};
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a7af49b3a337..c1a08035ea43 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7483,7 +7483,7 @@ EXPORT_SYMBOL(io_schedule);
void sched_show_task(struct task_struct *p)
{
- unsigned long free = 0;
+ unsigned long free;
int ppid;
if (!try_get_task_stack(p))
@@ -7493,9 +7493,7 @@ void sched_show_task(struct task_struct *p)
if (task_is_running(p))
pr_cont(" running task ");
-#ifdef CONFIG_DEBUG_STACK_USAGE
free = stack_not_used(p);
-#endif
ppid = 0;
rcu_read_lock();
if (pid_alive(p))
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b9784e13e6b6..03d7c5e59a18 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1823,7 +1823,7 @@ static bool pgdat_free_space_enough(struct pglist_data *pgdat)
continue;
if (zone_watermark_ok(zone, 0,
- wmark_pages(zone, WMARK_PROMO) + enough_wmark,
+ promo_wmark_pages(zone) + enough_wmark,
ZONE_MOVABLE, 0))
return true;
}
@@ -1921,8 +1921,7 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
* The pages in slow memory node should be migrated according
* to hot/cold instead of private/shared.
*/
- if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING &&
- !node_is_toptier(src_nid)) {
+ if (folio_use_access_time(folio)) {
struct pglist_data *pgdat;
unsigned long rate_limit;
unsigned int latency, th, def_th;
@@ -3269,6 +3268,15 @@ static bool vma_is_accessed(struct mm_struct *mm, struct vm_area_struct *vma)
return true;
}
+ /*
+ * This vma has not been accessed for a while, and if the number
+ * the threads in the same process is low, which means no other
+ * threads can help scan this vma, force a vma scan.
+ */
+ if (READ_ONCE(mm->numa_scan_seq) >
+ (vma->numab_state->prev_scan_seq + get_nr_threads(current)))
+ return true;
+
return false;
}
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index 8b4f8cc2e0ec..1fec61603ef3 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -198,17 +198,17 @@ static int __init crash_save_vmcoreinfo_init(void)
VMCOREINFO_NUMBER(PG_private);
VMCOREINFO_NUMBER(PG_swapcache);
VMCOREINFO_NUMBER(PG_swapbacked);
-#define PAGE_SLAB_MAPCOUNT_VALUE (~PG_slab)
+#define PAGE_SLAB_MAPCOUNT_VALUE (PGTY_slab << 24)
VMCOREINFO_NUMBER(PAGE_SLAB_MAPCOUNT_VALUE);
#ifdef CONFIG_MEMORY_FAILURE
VMCOREINFO_NUMBER(PG_hwpoison);
#endif
VMCOREINFO_NUMBER(PG_head_mask);
-#define PAGE_BUDDY_MAPCOUNT_VALUE (~PG_buddy)
+#define PAGE_BUDDY_MAPCOUNT_VALUE (PGTY_buddy << 24)
VMCOREINFO_NUMBER(PAGE_BUDDY_MAPCOUNT_VALUE);
-#define PAGE_HUGETLB_MAPCOUNT_VALUE (~PG_hugetlb)
+#define PAGE_HUGETLB_MAPCOUNT_VALUE (PGTY_hugetlb << 24)
VMCOREINFO_NUMBER(PAGE_HUGETLB_MAPCOUNT_VALUE);
-#define PAGE_OFFLINE_MAPCOUNT_VALUE (~PG_offline)
+#define PAGE_OFFLINE_MAPCOUNT_VALUE (PGTY_offline << 24)
VMCOREINFO_NUMBER(PAGE_OFFLINE_MAPCOUNT_VALUE);
#ifdef CONFIG_KALLSYMS