summaryrefslogtreecommitdiff
path: root/mm
AgeCommit message (Collapse)Author
2022-08-02mm/folio-compat: Remove migration compatibility functionsMatthew Wilcox (Oracle)
migrate_page_move_mapping(), migrate_page_copy() and migrate_page_states() are all now unused after converting all the filesystems from aops->migratepage() to aops->migrate_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-08-02fs: Remove aops->migratepage()Matthew Wilcox (Oracle)
With all users converted to migrate_folio(), remove this operation. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-08-02secretmem: Convert to migrate_folioMatthew Wilcox (Oracle)
This is little more than changing the types over; there's no real work being done in this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-08-02hugetlb: Convert to migrate_folioMatthew Wilcox (Oracle)
This involves converting migrate_huge_page_move_mapping(). We also need a folio variant of hugetlb_set_page_subpool(), but that's for a later patch. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
2022-08-02mm/migrate: Add filemap_migrate_folio()Matthew Wilcox (Oracle)
There is nothing iomap-specific about iomap_migratepage(), and it fits a pattern used by several other filesystems, so move it to mm/migrate.c, convert it to be filemap_migrate_folio() and convert the iomap filesystems to use it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2022-08-02mm/migrate: Convert migrate_page() to migrate_folio()Matthew Wilcox (Oracle)
Convert all callers to pass a folio. Most have the folio already available. Switch all users from aops->migratepage to aops->migrate_folio. Also turn the documentation into kerneldoc. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com>
2022-08-02mm/migrate: Convert expected_page_refs() to folio_expected_refs()Matthew Wilcox (Oracle)
Now that both callers have a folio, convert this function to take a folio & rename it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-08-02mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio()Matthew Wilcox (Oracle)
Use a folio throughout __buffer_migrate_folio(), add kernel-doc for buffer_migrate_folio() and buffer_migrate_folio_norefs(), move their declarations to buffer.h and switch all filesystems that have wired them up. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-08-02mm/migrate: Convert writeout() to take a folioMatthew Wilcox (Oracle)
Use a folio throughout this function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-08-02mm/migrate: Convert fallback_migrate_page() to fallback_migrate_folio()Matthew Wilcox (Oracle)
Use a folio throughout. migrate_page() will be converted to migrate_folio() later. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-08-02fs: Add aops->migrate_folioMatthew Wilcox (Oracle)
Provide a folio-based replacement for aops->migratepage. Update the documentation to document migrate_folio instead of migratepage. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-08-02mm: Convert all PageMovable users to movable_operationsMatthew Wilcox (Oracle)
These drivers are rather uncomfortably hammered into the address_space_operations hole. They aren't filesystems and don't behave like filesystems. They just need their own movable_operations structure, which we can point to directly from page->mapping. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-08-02secretmem: Remove isolate_pageMatthew Wilcox (Oracle)
The isolate_page operation is never called for filesystems, only for device drivers which call SetPageMovable. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com>
2022-06-29filemap: Use filemap_read_folio() in do_read_cache_folio()Matthew Wilcox (Oracle)
By passing ->read_folio to filemap_read_folio(), we can use filemap_read_folio() in do_read_cache_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-29filemap: Handle AOP_TRUNCATED_PAGE in do_read_cache_folio()Matthew Wilcox (Oracle)
If the call to filler() returns AOP_TRUNCATED_PAGE, we need to retry the page cache lookup. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-29filemap: Move 'filler' case to the end of do_read_cache_folio()Matthew Wilcox (Oracle)
No functionality change intended; this simply moves code around to disentangle the function a little. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-29filemap: Remove find_get_pages_range() and associated functionsMatthew Wilcox (Oracle)
All callers of find_get_pages_range(), pagevec_lookup_range() and pagevec_lookup() have now been removed. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-29shmem: Convert shmem_unlock_mapping() to use filemap_get_folios()Matthew Wilcox (Oracle)
This is a straightforward conversion. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-29vmscan: Add check_move_unevictable_folios()Matthew Wilcox (Oracle)
Change the guts of check_move_unevictable_pages() over to use folios and add check_move_unevictable_pages() as a wrapper. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-29filemap: Add filemap_get_folios()Matthew Wilcox (Oracle)
This is the equivalent of find_get_pages() but fills a folio_batch instead of an array of pages. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-29filemap: Remove add_to_page_cache() and add_to_page_cache_locked()Matthew Wilcox (Oracle)
These functions have no more users, so delete them. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
2022-06-29hugetlb: Convert huge_add_to_page_cache() to use a folioMatthew Wilcox (Oracle)
Remove the last caller of add_to_page_cache() Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
2022-06-29mm: Remove __delete_from_page_cache()Matthew Wilcox (Oracle)
This wrapper is no longer used. Remove it and all references to it. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-29mm: Account dirty folios properly during splitsMatthew Wilcox (Oracle)
If the last folio in a file is split as a result of truncation, we simply clear the dirty bits for the pages we're discarding. That causes NR_FILE_DIRTY (among other counters) to be thrown off and eventually Linux will hang in balance_dirty_pages_ratelimited() Reported-by: Dave Chinner <dchinner@redhat.com> Tested-by: Dave Chinner <dchinner@redhat.com> Tested-by: Darrick J. Wong <djwong@kernel.org> Fixes: d68eccad3706 ("mm/filemap: Allow large folios to be added to the page cache") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-23filemap: Fix serialization adding transparent huge pages to page cacheAlistair Popple
Commit 793917d997df ("mm/readahead: Add large folio readahead") introduced support for using large folios for filebacked pages if the filesystem supports it. page_cache_ra_order() was introduced to allocate and add these large folios to the page cache. However adding pages to the page cache should be serialized against truncation and hole punching by taking invalidate_lock. Not doing so can lead to data races resulting in stale data getting added to the page cache and marked up-to-date. See commit 730633f0b7f9 ("mm: Protect operations adding pages to page cache with invalidate_lock") for more details. This issue was found by inspection but a testcase revealed it was possible to observe in practice on XFS. Fix this by taking invalidate_lock in page_cache_ra_order(), to mirror what is done for the non-thp case in page_cache_ra_unbounded(). Signed-off-by: Alistair Popple <apopple@nvidia.com> Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-23mm: Clear page->private when splitting or migrating a pageMatthew Wilcox (Oracle)
In our efforts to remove uses of PG_private, we have found folios with the private flag clear and folio->private not-NULL. That is the root cause behind 642d51fb0775 ("ceph: check folio PG_private bit instead of folio->private"). It can also affect a few other filesystems that haven't yet reported a problem. compaction_alloc() can return a page with uninitialised page->private, and rather than checking all the callers of migrate_pages(), just zero page->private after calling get_new_page(). Similarly, the tail pages from split_huge_page() may also have an uninitialised page->private. Reported-by: Xiubo Li <xiubli@redhat.com> Tested-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-20filemap: Handle sibling entries in filemap_get_read_batch()Matthew Wilcox (Oracle)
If a read races with an invalidation followed by another read, it is possible for a folio to be replaced with a higher-order folio. If that happens, we'll see a sibling entry for the new folio in the next iteration of the loop. This manifests as a NULL pointer dereference while holding the RCU read lock. Handle this by simply returning. The next call will find the new folio and handle it correctly. The other ways of handling this rare race are more complex and it's just not worth it. Reported-by: Dave Chinner <david@fromorbit.com> Reported-by: Brian Foster <bfoster@redhat.com> Debugged-by: Brian Foster <bfoster@redhat.com> Tested-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Fixes: cbd59c48ae2b ("mm/filemap: use head pages in generic_file_buffered_read") Cc: stable@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-20filemap: Correct the conditions for marking a folio as accessedMatthew Wilcox (Oracle)
We had an off-by-one error which meant that we never marked the first page in a read as accessed. This was visible as a slowdown when re-reading a file as pages were being evicted from cache too soon. In reviewing this code, we noticed a second bug where a multi-page folio would be marked as accessed multiple times when doing reads that were less than the size of the folio. Abstract the comparison of whether two file positions are in the same folio into a new function, fixing both of these bugs. Reported-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-20Merge tag 'slab-for-5.19-fixup' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fixes from Vlastimil Babka: - A slub fix for PREEMPT_RT locking semantics from Sebastian. - A slub fix for state corruption due to a possible race scenario from Jann. * tag 'slab-for-5.19-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slub: add missing TID updates on slab deactivation mm/slub: Move the stackdepot related allocation out of IRQ-off section.
2022-06-17Merge tag 'fs_for_v5.19-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull writeback and ext2 fixes from Jan Kara: "A fix for writeback bug which prevented machines with kdevtmpfs from booting and also one small ext2 bugfix in IO error handling" * tag 'fs_for_v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: init: Initialize noop_backing_dev_info early ext2: fix fs corruption when trying to remove a non-empty directory with IO error
2022-06-16init: Initialize noop_backing_dev_info earlyJan Kara
noop_backing_dev_info is used by superblocks of various pseudofilesystems such as kdevtmpfs. After commit 10e14073107d ("writeback: Fix inode->i_io_list not be protected by inode->i_lock error") this broke because __mark_inode_dirty() started to access more fields from noop_backing_dev_info and this led to crashes inside locked_inode_to_wb_and_lock_list() called from __mark_inode_dirty(). Fix the problem by initializing noop_backing_dev_info before the filesystems get mounted. Fixes: 10e14073107d ("writeback: Fix inode->i_io_list not be protected by inode->i_lock error") Reported-and-tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reported-and-tested-by: Alexandru Elisei <alexandru.elisei@arm.com> Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
2022-06-13usercopy: Make usercopy resilient against ridiculously large copiesMatthew Wilcox (Oracle)
If 'n' is so large that it's negative, we might wrap around and mistakenly think that the copy is OK when it's not. Such a copy would probably crash, but just doing the arithmetic in a more simple way lets us detect and refuse this case. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220612213227.3881769-4-willy@infradead.org
2022-06-13usercopy: Cast pointer to an integer onceMatthew Wilcox (Oracle)
Get rid of a lot of annoying casts by setting 'addr' once at the top of the function. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220612213227.3881769-3-willy@infradead.org
2022-06-13usercopy: Handle vm_map_ram() areasMatthew Wilcox (Oracle)
vmalloc does not allocate a vm_struct for vm_map_ram() areas. That causes us to deny usercopies from those areas. This affects XFS which uses vm_map_ram() for its directories. Fix this by calling find_vmap_area() instead of find_vm_area(). Fixes: 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220612213227.3881769-2-willy@infradead.org
2022-06-13mm/slub: add missing TID updates on slab deactivationJann Horn
The fastpath in slab_alloc_node() assumes that c->slab is stable as long as the TID stays the same. However, two places in __slab_alloc() currently don't update the TID when deactivating the CPU slab. If multiple operations race the right way, this could lead to an object getting lost; or, in an even more unlikely situation, it could even lead to an object being freed onto the wrong slab's freelist, messing up the `inuse` counter and eventually causing a page to be freed to the page allocator while it still contains slab objects. (I haven't actually tested these cases though, this is just based on looking at the code. Writing testcases for this stuff seems like it'd be a pain...) The race leading to state inconsistency is (all operations on the same CPU and kmem_cache): - task A: begin do_slab_free(): - read TID - read pcpu freelist (==NULL) - check `slab == c->slab` (true) - [PREEMPT A->B] - task B: begin slab_alloc_node(): - fastpath fails (`c->freelist` is NULL) - enter __slab_alloc() - slub_get_cpu_ptr() (disables preemption) - enter ___slab_alloc() - take local_lock_irqsave() - read c->freelist as NULL - get_freelist() returns NULL - write `c->slab = NULL` - drop local_unlock_irqrestore() - goto new_slab - slub_percpu_partial() is NULL - get_partial() returns NULL - slub_put_cpu_ptr() (enables preemption) - [PREEMPT B->A] - task A: finish do_slab_free(): - this_cpu_cmpxchg_double() succeeds() - [CORRUPT STATE: c->slab==NULL, c->freelist!=NULL] From there, the object on c->freelist will get lost if task B is allowed to continue from here: It will proceed to the retry_load_slab label, set c->slab, then jump to load_freelist, which clobbers c->freelist. But if we instead continue as follows, we get worse corruption: - task A: run __slab_free() on object from other struct slab: - CPU_PARTIAL_FREE case (slab was on no list, is now on pcpu partial) - task A: run slab_alloc_node() with NUMA node constraint: - fastpath fails (c->slab is NULL) - call __slab_alloc() - slub_get_cpu_ptr() (disables preemption) - enter ___slab_alloc() - c->slab is NULL: goto new_slab - slub_percpu_partial() is non-NULL - set c->slab to slub_percpu_partial(c) - [CORRUPT STATE: c->slab points to slab-1, c->freelist has objects from slab-2] - goto redo - node_match() fails - goto deactivate_slab - existing c->freelist is passed into deactivate_slab() - inuse count of slab-1 is decremented to account for object from slab-2 At this point, the inuse count of slab-1 is 1 lower than it should be. This means that if we free all allocated objects in slab-1 except for one, SLUB will think that slab-1 is completely unused, and may free its page, leading to use-after-free. Fixes: c17dda40a6a4e ("slub: Separate out kmem_cache_cpu processing from deactivate_slab") Fixes: 03e404af26dc2 ("slub: fast release on full slab") Cc: stable@vger.kernel.org Signed-off-by: Jann Horn <jannh@google.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/20220608182205.2945720-1-jannh@google.com
2022-06-13mm/slub: Move the stackdepot related allocation out of IRQ-off section.Sebastian Andrzej Siewior
The set_track() invocation in free_debug_processing() is invoked with acquired slab_lock(). The lock disables interrupts on PREEMPT_RT and this forbids to allocate memory which is done in stack_depot_save(). Split set_track() into two parts: set_track_prepare() which allocate memory and set_track_update() which only performs the assignment of the trace data structure. Use set_track_prepare() before disabling interrupts. [ vbabka@suse.cz: make set_track() call set_track_update() instead of open-coded assignments ] Fixes: 5cf909c553e9e ("mm/slub: use stackdepot to save stack trace in objects") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Link: https://lore.kernel.org/r/Yp9sqoUi4fVa5ExF@linutronix.de
2022-06-09mm/huge_memory: Fix xarray node memory leakMatthew Wilcox (Oracle)
If xas_split_alloc() fails to allocate the necessary nodes to complete the xarray entry split, it sets the xa_state to -ENOMEM, which xas_nomem() then interprets as "Please allocate more memory", not as "Please free any unnecessary memory" (which was the intended outcome). It's confusing to use xas_nomem() to free memory in this context, so call xas_destroy() instead. Reported-by: syzbot+9e27a75a8c24f3fe75c1@syzkaller.appspotmail.com Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Cc: stable@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-09filemap: Cache the value of vm_flagsMatthew Wilcox (Oracle)
After we have unlocked the mmap_lock for I/O, the file is pinned, but the VMA is not. Checking this flag after that can be a use-after-free. It's not a terribly interesting use-after-free as it can only read one bit, and it's used to decide whether to read 2MB or 4MB. But it upsets the automated tools and it's generally bad practice anyway, so let's fix it. Reported-by: syzbot+5b96d55e5b54924c77ad@syzkaller.appspotmail.com Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings") Cc: stable@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-09filemap: Don't release a locked folioMatthew Wilcox (Oracle)
We must hold a reference over the call to filemap_release_folio(), otherwise the page cache will put the last reference to the folio before we unlock it, leading to splats like this: BUG: Bad page state in process u8:5 pfn:1ab1f4 page:ffffea0006ac7d00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x28b1de pfn:0x1ab1f4 flags: 0x17ff80000040001(locked|reclaim|node=0|zone=2|lastcpupid=0xfff) raw: 017ff80000040001 dead000000000100 dead000000000122 0000000000000000 raw: 000000000028b1de 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set It's an error path, so it doesn't see much testing. Reported-by: Darrick J. Wong <djwong@kernel.org> Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-05Merge tag 'mm-hotfixes-stable-2022-06-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm hotfixes from Andrew Morton: "Fixups for various recently-added and longer-term issues and a few minor tweaks: - fixes for material merged during this merge window - cc:stable fixes for more longstanding issues - minor mailmap and MAINTAINERS updates" * tag 'mm-hotfixes-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/oom_kill.c: fix vm_oom_kill_table[] ifdeffery x86/kexec: fix memory leak of elf header buffer mm/memremap: fix missing call to untrack_pfn() in pagemap_range() mm: page_isolation: use compound_nr() correctly in isolate_single_pageblock() mm: hugetlb_vmemmap: fix CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON MAINTAINERS: add maintainer information for z3fold mailmap: update Josh Poimboeuf's email
2022-06-05Merge tag 'mm-nonmm-stable-2022-06-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull delay-accounting update from Andrew Morton: "A single featurette for delay accounting. Delayed a bit because, unusually, it had dependencies on both the mm-stable and mm-nonmm-stable queues" * tag 'mm-nonmm-stable-2022-06-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: delayacct: track delays from write-protect copy
2022-06-04Merge tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linuxLinus Torvalds
Pull bitmap updates from Yury Norov: - bitmap: optimize bitmap_weight() usage, from me - lib/bitmap.c make bitmap_print_bitmask_to_buf parseable, from Mauro Carvalho Chehab - include/linux/find: Fix documentation, from Anna-Maria Behnsen - bitmap: fix conversion from/to fix-sized arrays, from me - bitmap: Fix return values to be unsigned, from Kees Cook It has been in linux-next for at least a week with no problems. * tag 'bitmap-for-5.19-rc1' of https://github.com/norov/linux: (31 commits) nodemask: Fix return values to be unsigned bitmap: Fix return values to be unsigned KVM: x86: hyper-v: replace bitmap_weight() with hweight64() KVM: x86: hyper-v: fix type of valid_bank_mask ia64: cleanup remove_siblinginfo() drm/amd/pm: use bitmap_{from,to}_arr32 where appropriate KVM: s390: replace bitmap_copy with bitmap_{from,to}_arr64 where appropriate lib/bitmap: add test for bitmap_{from,to}_arr64 lib: add bitmap_{from,to}_arr64 lib/bitmap: extend comment for bitmap_(from,to)_arr32() include/linux/find: Fix documentation lib/bitmap.c make bitmap_print_bitmask_to_buf parseable MAINTAINERS: add cpumask and nodemask files to BITMAP_API arch/x86: replace nodes_weight with nodes_empty where appropriate mm/vmstat: replace cpumask_weight with cpumask_empty where appropriate clocksource: replace cpumask_weight with cpumask_empty in clocksource.c genirq/affinity: replace cpumask_weight with cpumask_empty where appropriate irq: mips: replace cpumask_weight with cpumask_empty where appropriate drm/i915/pmu: replace cpumask_weight with cpumask_empty where appropriate arch/x86: replace cpumask_weight with cpumask_empty where appropriate ...
2022-06-03mm/vmstat: replace cpumask_weight with cpumask_empty where appropriateYury Norov
mm/vmstat.c code calls cpumask_weight() to check if any bit of a given cpumask is set. We can do it more efficiently with cpumask_empty() because cpumask_empty() stops traversing the cpumask as soon as it finds first set bit, while cpumask_weight() counts all bits unconditionally. Signed-off-by: Yury Norov <yury.norov@gmail.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com>
2022-06-01mm/oom_kill.c: fix vm_oom_kill_table[] ifdefferyAndrew Morton
arm allnoconfig: mm/oom_kill.c:60:25: warning: 'vm_oom_kill_table' defined but not used [-Wunused-variable] 60 | static struct ctl_table vm_oom_kill_table[] = { | ^~~~~~~~~~~~~~~~~ Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-01mm/memremap: fix missing call to untrack_pfn() in pagemap_range()Miaohe Lin
We forget to call untrack_pfn() to pair with track_pfn_remap() when range is not allowed to hotplug. Fix it by jump err_kasan. Link: https://lkml.kernel.org/r/20220531122643.25249-1-linmiaohe@huawei.com Fixes: bca3feaa0764 ("mm/memory_hotplug: prevalidate the address range being added with platform") Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: David Hildenbrand <david@redhat.com> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-01mm: page_isolation: use compound_nr() correctly in isolate_single_pageblock()Zi Yan
When compound_nr(page) was used, page was not guaranteed to be the head of the compound page and it could cause an infinite loop. Fix it by calling it on the head page. Link: https://lkml.kernel.org/r/20220531024450.2498431-1-zi.yan@sent.com Fixes: b2c9e2fbba32 ("mm: make alloc_contig_range work at pageblock granularity") Signed-off-by: Zi Yan <ziy@nvidia.com> Reported-by: Anshuman Khandual <anshuman.khandual@arm.com> Link: https://lore.kernel.org/linux-mm/20220530115027.123341-1-anshuman.khandual@arm.com/ Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Muchun Song <songmuchun@bytedance.com> Cc: David Hildenbrand <david@redhat.com> Cc: Qian Cai <quic_qiancai@quicinc.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Eric Ren <renzhengeek@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-01mm: hugetlb_vmemmap: fix CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ONMuchun Song
The following: commit 47010c040dec ("mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*") forgot to update CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON used in vmemmap_optimize_mode to CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON. The result is we cannot enable hugetlb_optimize_vmemmap at boot time when we configure CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON. Fix it. Link: https://lkml.kernel.org/r/20220527081948.68832-1-songmuchun@bytedance.com Fixes: 47010c040dec ("mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP*") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Reported-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-01delayacct: track delays from write-protect copyYang Yang
Delay accounting does not track the delay of write-protect copy. When tasks trigger many write-protect copys(include COW and unsharing of anonymous pages[1]), it may spend a amount of time waiting for them. To get the delay of tasks in write-protect copy, could help users to evaluate the impact of using KSM or fork() or GUP. Also update tools/accounting/getdelays.c: / # ./getdelays -dl -p 231 print delayacct stats ON listen forever PID 231 CPU count real total virtual total delay total delay average 6247 1859000000 2154070021 1674255063 0.268ms IO count delay total delay average 0 0 0ms SWAP count delay total delay average 0 0 0ms RECLAIM count delay total delay average 0 0 0ms THRASHING count delay total delay average 0 0 0ms COMPACT count delay total delay average 3 72758 0ms WPCOPY count delay total delay average 3635 271567604 0ms [1] commit 31cc5bc4af70("mm: support GUP-triggered unsharing of anonymous pages") Link: https://lkml.kernel.org/r/20220409014342.2505532-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang <yang.yang29@zte.com.cn> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Reviewed-by: wangyong <wang.yong12@zte.com.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-31Merge tag 'riscv-for-linus-5.19-mw0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux Pull RISC-V updates from Palmer Dabbelt: - Support for the Svpbmt extension, which allows memory attributes to be encoded in pages - Support for the Allwinner D1's implementation of page-based memory attributes - Support for running rv32 binaries on rv64 systems, via the compat subsystem - Support for kexec_file() - Support for the new generic ticket-based spinlocks, which allows us to also move to qrwlock. These should have already gone in through the asm-geneic tree as well - A handful of cleanups and fixes, include some larger ones around atomics and XIP * tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits) RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add] riscv: compat: Using seperated vdso_maps for compat_vdso_info RISC-V: Fix the XIP build RISC-V: Split out the XIP fixups into their own file RISC-V: ignore xipImage RISC-V: Avoid empty create_*_mapping definitions riscv: Don't output a bogus mmu-type on a no MMU kernel riscv: atomic: Add custom conditional atomic operation implementation riscv: atomic: Optimize dec_if_positive functions riscv: atomic: Cleanup unnecessary definition RISC-V: Load purgatory in kexec_file RISC-V: Add purgatory RISC-V: Support for kexec_file on panic RISC-V: Add kexec_file support RISC-V: use memcpy for kexec_file mode kexec_file: Fix kexec_file.c build error for riscv platform riscv: compat: Add COMPAT Kbuild skeletal support riscv: compat: ptrace: Add compat_arch_ptrace implement riscv: compat: signal: Add rt_frame implementation riscv: add memory-type errata for T-Head ...
2022-05-28Merge tag 'powerpc-5.19-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Convert to the generic mmap support (ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT) - Add support for outline-only KASAN with 64-bit Radix MMU (P9 or later) - Increase SIGSTKSZ and MINSIGSTKSZ and add support for AT_MINSIGSTKSZ - Enable the DAWR (Data Address Watchpoint) on POWER9 DD2.3 or later - Drop support for system call instruction emulation - Many other small features and fixes Thanks to Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Bagas Sanjaya, Bjorn Helgaas, Bo Liu, Chen Huang, Christophe Leroy, Colin Ian King, Daniel Axtens, Dwaipayan Ray, Fabiano Rosas, Finn Thain, Frank Rowand, Fuqian Huang, Guilherme G. Piccoli, Hangyu Hua, Haowen Bai, Haren Myneni, Hari Bathini, He Ying, Jason Wang, Jiapeng Chong, Jing Yangyang, Joel Stanley, Julia Lawall, Kajol Jain, Kevin Hao, Krzysztof Kozlowski, Laurent Dufour, Lv Ruyi, Madhavan Srinivasan, Magali Lemes, Miaoqian Lin, Minghao Chi, Nathan Chancellor, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Oscar Salvador, Pali Rohár, Paul Mackerras, Peng Wu, Qing Wang, Randy Dunlap, Reza Arbab, Russell Currey, Sohaib Mohamed, Vaibhav Jain, Vasant Hegde, Wang Qing, Wang Wensheng, Xiang wangx, Xiaomeng Tong, Xu Wang, Yang Guang, Yang Li, Ye Bin, YueHaibing, Yu Kuai, Zheng Bin, Zou Wei, and Zucheng Zheng. * tag 'powerpc-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits) powerpc/64: Include cache.h directly in paca.h powerpc/64s: Only set HAVE_ARCH_UNMAPPED_AREA when CONFIG_PPC_64S_HASH_MMU is set powerpc/xics: Include missing header powerpc/powernv/pci: Drop VF MPS fixup powerpc/fsl_book3e: Don't set rodata RO too early powerpc/microwatt: Add mmu bits to device tree powerpc/powernv/flash: Check OPAL flash calls exist before using powerpc/powermac: constify device_node in of_irq_parse_oldworld() powerpc/powermac: add missing g5_phy_disable_cpu1() declaration selftests/powerpc/pmu: fix spelling mistake "mis-match" -> "mismatch" powerpc: Enable the DAWR on POWER9 DD2.3 and above powerpc/64s: Add CPU_FTRS_POWER10 to ALWAYS mask powerpc/64s: Add CPU_FTRS_POWER9_DD2_2 to CPU_FTRS_ALWAYS mask powerpc: Fix all occurences of "the the" selftests/powerpc/pmu/ebb: remove fixed_instruction.S powerpc/platforms/83xx: Use of_device_get_match_data() powerpc/eeh: Drop redundant spinlock initialization powerpc/iommu: Add missing of_node_put in iommu_init_early_dart powerpc/pseries/vas: Call misc_deregister if sysfs init fails powerpc/papr_scm: Fix leaking nvdimm_events_map elements ...