summaryrefslogtreecommitdiff
path: root/fs/nfs/file.c
AgeCommit message (Collapse)Author
2024-07-21Merge tag 'mm-stable-2024-07-21-14-50' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page*(), vm_map_pages*() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-17nfs: pass explicit offset/count to trace eventsChristoph Hellwig
nfs_folio_length is unsafe to use without having the folio locked and a check for a NULL ->f_mapping that protects against truncations and can lead to kernel crashes. E.g. when running xfstests generic/065 with all nfs trace points enabled. Follow the model of the XFS trace points and pass in an explіcit offset and length. This has the additional benefit that these values can be more accurate as some of the users touch partial folio ranges. Fixes: eb5654b3b89d ("NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio()") Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08nfs: remove dead code for the old swap over NFS implementationChristoph Hellwig
Remove the code testing folio_test_swapcache either explicitly or implicitly in pagemap.h headers, as is now handled using the direct I/O path and not the buffered I/O path that these helpers are located in. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08NFSv4: Add a flags argument to the 'have_delegation' callbackTrond Myklebust
This argument will be used to allow the caller to specify whether or not they need to know that this is an attribute delegation. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08nfs: add support for large foliosChristoph Hellwig
NFS already is void of folio size assumption, so just pass the chunk size to __filemap_get_folio and set the large folio address_space flag for all regular files. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-03nfs: drop usage of folio_file_posKairui Song
folio_file_pos is only needed for mixed usage of page cache and swap cache, for pure page cache usage, the caller can just use folio_pos instead. After commit e1209d3a7a67 ("mm: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space"), swap cache should never be exposed to nfs. So remove the usage of folio_file_pos in following NFS functions / helpers: - nfs_vm_page_mkwrite It's only used by nfs_file_vm_ops.page_mkwrite - trace event helper: nfs_folio_event - trace event helper: nfs_folio_event_done These two are used through DEFINE_NFS_FOLIO_EVENT and DEFINE_NFS_FOLIO_EVENT_DONE, which defined following events: - trace_nfs_aop_readpage{_done}: only called by nfs_read_folio - trace_nfs_writeback_folio: only called by nfs_wb_folio - trace_nfs_invalidate_folio: only called by nfs_invalidate_folio - trace_nfs_launder_folio_done: only called by nfs_launder_folio None of them could possibly be used on swap cache folio, nfs_read_folio only called by: .write_begin -> nfs_read_folio .read_folio nfs_wb_folio only called by nfs mapping: .release_folio -> nfs_wb_folio .launder_folio -> nfs_wb_folio .write_begin -> nfs_read_folio -> nfs_wb_folio .read_folio -> nfs_wb_folio .write_end -> nfs_update_folio -> nfs_writepage_setup -> nfs_setup_write_request -> nfs_try_to_update_request -> nfs_wb_folio .page_mkwrite -> nfs_update_folio -> nfs_writepage_setup -> nfs_setup_write_request -> nfs_try_to_update_request -> nfs_wb_folio .write_begin -> nfs_flush_incompatible -> nfs_wb_folio .page_mkwrite -> nfs_vm_page_mkwrite -> nfs_flush_incompatible -> nfs_wb_folio nfs_invalidate_folio is only called by .invalidate_folio. nfs_launder_folio is only called by .launder_folio - nfs_grow_file - nfs_update_folio nfs_grow_file is only called by nfs_update_folio, and all possible callers of them are: .write_end -> nfs_update_folio .page_mkwrite -> nfs_update_folio - nfs_wb_folio_cancel .invalidate_folio -> nfs_wb_folio_cancel Also, seeing from the swap side, swap_rw is now the only interface calling into fs, the offset info is always in iocb.ki_pos now. So we can remove all these folio_file_pos call safely. Link: https://lkml.kernel.org/r/20240521175854.96038-8-ryncsn@gmail.com Signed-off-by: Kairui Song <kasong@tencent.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Barry Song <v-songbaohua@oppo.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Minchan Kim <minchan@kernel.org> Cc: NeilBrown <neilb@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Xiubo Li <xiubli@redhat.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-29mm: Remove the PG_fscache alias for PG_private_2David Howells
Remove the PG_fscache alias for PG_private_2 and use the latter directly. Use of this flag for marking pages undergoing writing to the cache should be considered deprecated and the folios should be marked dirty instead and the write done in ->writepages(). Note that PG_private_2 itself should be considered deprecated and up for future removal by the MM folks too. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: Bharath SM <bharathsm@microsoft.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna@kernel.org> cc: netfs@lists.linux.dev cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-02-05nfs: adapt to breakup of struct file_lockJeff Layton
Most of the existing APIs have remained the same, but subsystems that access file_lock fields directly need to reach into struct file_lock_core now. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-41-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: split common fields into struct file_lock_coreJeff Layton
In a future patch, we're going to split file leases into their own structure. Since a lot of the underlying machinery uses the same fields move those into a new file_lock_core, and embed that inside struct file_lock. For now, add some macros to ensure that we can continue to build while the conversion is in progress. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-17-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05nfs: convert to using new filelock helpersJeff Layton
Convert to using the new file locking helper functions. Also, in later patches we're going to introduce some temporary macros with names that clash with the variable name in nfs4_proc_unlck. Rename it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-11-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-10Merge tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull nfs client updates from Anna Schumaker: "New Features: - Always ask for type with READDIR - Remove nfs_writepage() Bugfixes: - Fix a suspicious RCU usage warning - Fix a blocklayoutdriver reference leak - Fix the block driver's calculation of layoutget size - Fix handling NFS4ERR_RETURNCONFLICT - Fix _xprt_switch_find_current_entry() - Fix v4.1 backchannel request timeouts - Don't add zero-length pnfs block devices - Use the parent cred in nfs_access_login_time() Cleanups: - A few improvements when dealing with referring calls from the server - Clean up various unused variables, struct fields, and function calls - Various tracepoint improvements" * tag 'nfs-for-6.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (21 commits) NFSv4.1: Use the nfs_client's rpc timeouts for backchannel SUNRPC: Fixup v4.1 backchannel request timeouts rpc_pipefs: Replace one label in bl_resolve_deviceid() nfs: Remove writepage NFS: drop unused nfs_direct_req bytes_left pNFS: Fix the pnfs block driver's calculation of layoutget size nfs: print fileid in lookup tracepoints nfs: rename the nfs_async_rename_done tracepoint nfs: add new tracepoint at nfs4 revalidate entry point SUNRPC: fix _xprt_switch_find_current_entry logic NFSv4.1/pnfs: Ensure we handle the error NFS4ERR_RETURNCONFLICT NFSv4.1: if referring calls are complete, trust the stateid argument NFSv4: Track the number of referring calls in struct cb_process_state NFS: Use parent's objective cred in nfs_access_login_time() NFSv4: Always ask for type with READDIR pnfs/blocklayout: Don't add zero-length pnfs_block_dev blocklayoutdriver: Fix reference leak of pnfs_device_node SUNRPC: Fix a suspicious RCU usage warning SUNRPC: Create a helper function for accessing the rpc_clnt's xprt_switch SUNRPC: Remove unused function rpc_clnt_xprt_switch_put() ...
2024-01-04nfs: Remove writepageMatthew Wilcox (Oracle)
NFS already has writepages and migrate_folio, so it does not need to implement writepage. The writepage operation is deprecated as it leads to worse performance under high memory pressure due to folios being written out in LRU order rather than sequentially within a file. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-12-10fs: convert error_remove_page to error_remove_folioMatthew Wilcox (Oracle)
There were already assertions that we were not passing a tail page to error_remove_page(), so make the compiler enforce that by converting everything to pass and use a folio. Link: https://lkml.kernel.org/r/20231117161447.2461643-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-24filemap: Fix errors in file.chuzhi001@208suo.com
The following checkpatch errors are removed: ERROR: "foo * bar" should be "foo *bar" "foo * bar" should be "foo *bar" Signed-off-by: ZhiHu <huzhi001@208suo.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-06-28Merge tag 'mm-stable-2023-06-24-19-15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-09filemap: update ki_pos in generic_perform_writeChristoph Hellwig
All callers of generic_perform_write need to updated ki_pos, move it into common code. Link: https://lkml.kernel.org/r/20230601145904.1385409-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Theodore Ts'o <tytso@mit.edu> Acked-by: Darrick J. Wong <djwong@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09backing_dev: remove current->backing_dev_infoChristoph Hellwig
Patch series "cleanup the filemap / direct I/O interaction", v4. This series cleans up some of the generic write helper calling conventions and the page cache writeback / invalidation for direct I/O. This is a spinoff from the no-bufferhead kernel project, for which we'll want to an use iomap based buffered write path in the block layer. This patch (of 12): The last user of current->backing_dev_info disappeared in commit b9b1335e6403 ("remove bdi_congested() and wb_congested() and related functions"). Remove the field and all assignments to it. Link: https://lkml.kernel.org/r/20230601145904.1385409-1-hch@lst.de Link: https://lkml.kernel.org/r/20230601145904.1385409-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-05-24nfs: Provide a splice-read wrapperDavid Howells
Provide a splice_read wrapper for NFS. This locks the inode around filemap_splice_read() and revalidates the mapping. Splicing from direct I/O is handled by the caller. Signed-off-by: David Howells <dhowells@redhat.com> cc: Christoph Hellwig <hch@lst.de> cc: Al Viro <viro@zeniv.linux.org.uk> cc: Jens Axboe <axboe@kernel.dk> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna@kernel.org> cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-block@vger.kernel.org cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/20230522135018.2742245-21-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-04-27Merge tag 'mm-stable-2023-04-27-15-30' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of switching from a user process to a kernel thread. - More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav. - zsmalloc performance improvements from Sergey Senozhatsky. - Yue Zhao has found and fixed some data race issues around the alteration of memcg userspace tunables. - VFS rationalizations from Christoph Hellwig: - removal of most of the callers of write_one_page() - make __filemap_get_folio()'s return value more useful - Luis Chamberlain has changed tmpfs so it no longer requires swap backing. Use `mount -o noswap'. - Qi Zheng has made the slab shrinkers operate locklessly, providing some scalability benefits. - Keith Busch has improved dmapool's performance, making part of its operations O(1) rather than O(n). - Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd, permitting userspace to wr-protect anon memory unpopulated ptes. - Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather than exclusive, and has fixed a bunch of errors which were caused by its unintuitive meaning. - Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature, which causes minor faults to install a write-protected pte. - Vlastimil Babka has done some maintenance work on vma_merge(): cleanups to the kernel code and improvements to our userspace test harness. - Cleanups to do_fault_around() by Lorenzo Stoakes. - Mike Rapoport has moved a lot of initialization code out of various mm/ files and into mm/mm_init.c. - Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for DRM, but DRM doesn't use it any more. - Lorenzo has also coverted read_kcore() and vread() to use iterators and has thereby removed the use of bounce buffers in some cases. - Lorenzo has also contributed further cleanups of vma_merge(). - Chaitanya Prakash provides some fixes to the mmap selftesting code. - Matthew Wilcox changes xfs and afs so they no longer take sleeping locks in ->map_page(), a step towards RCUification of pagefaults. - Suren Baghdasaryan has improved mmap_lock scalability by switching to per-VMA locking. - Frederic Weisbecker has reworked the percpu cache draining so that it no longer causes latency glitches on cpu isolated workloads. - Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig logic. - Liu Shixin has changed zswap's initialization so we no longer waste a chunk of memory if zswap is not being used. - Yosry Ahmed has improved the performance of memcg statistics flushing. - David Stevens has fixed several issues involving khugepaged, userfaultfd and shmem. - Christoph Hellwig has provided some cleanup work to zram's IO-related code paths. - David Hildenbrand has fixed up some issues in the selftest code's testing of our pte state changing. - Pankaj Raghav has made page_endio() unneeded and has removed it. - Peter Xu contributed some rationalizations of the userfaultfd selftests. - Yosry Ahmed has fixed an issue around memcg's page recalim accounting. - Chaitanya Prakash has fixed some arm-related issues in the selftests/mm code. - Longlong Xia has improved the way in which KSM handles hwpoisoned pages. - Peter Xu fixes a few issues with uffd-wp at fork() time. - Stefan Roesch has changed KSM so that it may now be used on a per-process and per-cgroup basis. * tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm,unmap: avoid flushing TLB in batch if PTE is inaccessible shmem: restrict noswap option to initial user namespace mm/khugepaged: fix conflicting mods to collapse_file() sparse: remove unnecessary 0 values from rc mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area() hugetlb: pte_alloc_huge() to replace huge pte_alloc_map() maple_tree: fix allocation in mas_sparse_area() mm: do not increment pgfault stats when page fault handler retries zsmalloc: allow only one active pool compaction context selftests/mm: add new selftests for KSM mm: add new KSM process and sysfs knobs mm: add new api to enable ksm per process mm: shrinkers: fix debugfs file permissions mm: don't check VMA write permissions if the PTE/PMD indicates write permissions migrate_pages_batch: fix statistics for longterm pin retry userfaultfd: use helper function range_in_vma() lib/show_mem.c: use for_each_populated_zone() simplify code mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list() fs/buffer: convert create_page_buffers to folio_create_buffers fs/buffer: add folio_create_empty_buffers helper ...
2023-04-06fs: Add FGP_WRITEBEGINMatthew Wilcox
This particular combination of flags is used by most filesystems in their ->write_begin method, although it does find use in a few other places. Before folios, it warranted its own function (grab_cache_page_write_begin()), but I think that just having specialised flags is enough. It certainly helps the few places that have been converted from grab_cache_page_write_begin() to __filemap_get_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://lore.kernel.org/r/20230324180129.1220691-2-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-04-05mm: return an ERR_PTR from __filemap_get_folioChristoph Hellwig
Instead of returning NULL for all errors, distinguish between: - no entry found and not asked to allocated (-ENOENT) - failed to allocate memory (-ENOMEM) - would block (-EAGAIN) so that callers don't have to guess the error based on the passed in flags. Also pass through the error through the direct callers: filemap_get_folio, filemap_lock_folio filemap_grab_folio and filemap_get_incore_folio. [hch@lst.de: fix null-pointer deref] Link: https://lkml.kernel.org/r/20230310070023.GA13563@lst.de Link: https://lkml.kernel.org/r/20230310043137.GA1624890@u2004 Link: https://lkml.kernel.org/r/20230307143410.28031-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs2] Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-22Merge tag 'nfs-for-6.3-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Convert the read and write paths to use folios Bugfixes and Cleanups: - Fix tracepoint state manager flag printing - Fix disabling swap files - Fix NFSv4 client identifier sysfs path in the documentation - Don't clear NFS_CAP_COPY if server returns NFS4ERR_OFFLOAD_DENIED - Treat GETDEVICEINFO errors as a layout failure - Replace kmap_atomic() calls with kmap_local_page() - Constify sunrpc sysfs kobj_type structures" * tag 'nfs-for-6.3-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (25 commits) fs/nfs: Replace kmap_atomic() with kmap_local_page() in dir.c pNFS/filelayout: treat GETDEVICEINFO errors as layout failure Documentation: Fix sysfs path for the NFSv4 client identifier nfs42: do not fail with EIO if ssc returns NFS4ERR_OFFLOAD_DENIED NFS: fix disabling of swap SUNRPC: make kobj_type structures constant nfs4trace: fix state manager flag printing NFS: Remove unnecessary check in nfs_read_folio() NFS: Improve tracing of nfs_wb_folio() NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio() NFS: fix up nfs_release_folio() to try to release the page NFS: Clean up O_DIRECT request allocation NFS: Fix up nfs_vm_page_mkwrite() for folios NFS: Convert nfs_write_begin/end to use folios NFS: Remove unused function nfs_wb_page() NFS: Convert buffered writes to use folios NFS: Convert the function nfs_wb_page() to use folios NFS: Convert buffered reads to use folios NFS: Add a helper nfs_wb_folio() NFS: Convert the remaining pagelist helper functions to support folios ...
2023-02-14NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio()Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-02-14NFS: fix up nfs_release_folio() to try to release the pageTrond Myklebust
If the gfp context allows it, and we're not kswapd, then try to write out the folio that has private data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-02-14NFS: Fix up nfs_vm_page_mkwrite() for foliosTrond Myklebust
Mechanical conversion of struct page and functions into the folio equivalents. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-02-14NFS: Convert nfs_write_begin/end to use foliosTrond Myklebust
Add a helper nfs_folio_grab_cache_write_begin() that can call __filemap_get_folio() directly with the appropriate parameters. Since write_begin()/write_end() take struct page arguments, just pass the folio->page back for now. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-02-14NFS: Convert buffered writes to use foliosTrond Myklebust
Mostly mechanical conversion of struct page and functions into struct folio equivalents. The lack of support for folios in write_cache_pages(), means we still only support order 0 folio allocations. However the rest of the writeback code should now be ready for order n > 0. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-02-14NFS: Add a helper nfs_wb_folio()Trond Myklebust
...and use it in nfs_launder_folio(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-01-11filelock: move file locking definitions to separate header fileJeff Layton
The file locking definitions have lived in fs.h since the dawn of time, but they are only used by a small subset of the source files that include it. Move the file locking definitions to a new header file, and add the appropriate #include directives to the source files that need them. By doing this we trim down fs.h a bit and limit the amount of rebuilding that has to be done when we make changes to the file locking APIs. Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Howells <dhowells@redhat.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Acked-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Steve French <stfrench@microsoft.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Jeff Layton <jlayton@kernel.org>
2022-10-13Merge tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Add NFSv4.2 xattr tracepoints - Replace xprtiod WQ in rpcrdma - Flexfiles cancels I/O on layout recall or revoke Bugfixes and Cleanups: - Directly use ida_alloc() / ida_free() - Don't open-code max_t() - Prefer using strscpy over strlcpy - Remove unused forward declarations - Always return layout states on flexfiles layout return - Have LISTXATTR treat NFS4ERR_NOXATTR as an empty reply instead of error - Allow more xprtrdma memory allocations to fail without triggering a reclaim - Various other xprtrdma clean ups - Fix rpc_killall_tasks() races" * tag 'nfs-for-6.1-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits) NFSv4/flexfiles: Cancel I/O if the layout is recalled or revoked SUNRPC: Add API to force the client to disconnect SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls SUNRPC: Fix races with rpc_killall_tasks() xprtrdma: Fix uninitialized variable xprtrdma: Prevent memory allocations from driving a reclaim xprtrdma: Memory allocation should be allowed to fail during connect xprtrdma: MR-related memory allocation should be allowed to fail xprtrdma: Clean up synopsis of rpcrdma_regbuf_alloc() xprtrdma: Clean up synopsis of rpcrdma_req_create() svcrdma: Clean up RPCRDMA_DEF_GFP SUNRPC: Replace the use of the xprtiod WQ in rpcrdma NFSv4.2: Add a tracepoint for listxattr NFSv4.2: Add tracepoints for getxattr, setxattr, and removexattr NFSv4.2: Move TRACE_DEFINE_ENUM(NFS4_CONTENT_*) under CONFIG_NFS_V4_2 NFSv4.2: Add special handling for LISTXATTR receiving NFS4ERR_NOXATTR nfs: remove nfs_wait_atomic_killable() and nfs_write_prepare() declaration NFSv4: remove nfs4_renewd_prepare_shutdown() declaration fs/nfs/pnfs_nfs.c: fix spelling typo and syntax error in comment NFSv4/pNFS: Always return layout stats on layout return for flexfiles ...
2022-10-10Merge tag 'sched-core-2022-10-07' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "Debuggability: - Change most occurances of BUG_ON() to WARN_ON_ONCE() - Reorganize & fix TASK_ state comparisons, turn it into a bitmap - Update/fix misc scheduler debugging facilities Load-balancing & regular scheduling: - Improve the behavior of the scheduler in presence of lot of SCHED_IDLE tasks - in particular they should not impact other scheduling classes. - Optimize task load tracking, cleanups & fixes - Clean up & simplify misc load-balancing code Freezer: - Rewrite the core freezer to behave better wrt thawing and be simpler in general, by replacing PF_FROZEN with TASK_FROZEN & fixing/adjusting all the fallout. Deadline scheduler: - Fix the DL capacity-aware code - Factor out dl_task_is_earliest_deadline() & replenish_dl_new_period() - Relax/optimize locking in task_non_contending() Cleanups: - Factor out the update_current_exec_runtime() helper - Various cleanups, simplifications" * tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits) sched: Fix more TASK_state comparisons sched: Fix TASK_state comparisons sched/fair: Move call to list_last_entry() in detach_tasks sched/fair: Cleanup loop_max and loop_break sched/fair: Make sure to try to detach at least one movable task sched: Show PF_flag holes freezer,sched: Rewrite core freezer logic sched: Widen TAKS_state literals sched/wait: Add wait_event_state() sched/completion: Add wait_for_completion_state() sched: Add TASK_ANY for wait_task_inactive() sched: Change wait_task_inactive()s match_state freezer,umh: Clean up freezer/initrd interaction freezer: Have {,un}lock_system_sleep() save/restore flags sched: Rename task_running() to task_on_cpu() sched/fair: Cleanup for SIS_PROP sched/fair: Default to false in test_idle_cores() sched/fair: Remove useless check in select_idle_core() sched/fair: Avoid double search on same cpu sched/fair: Remove redundant check in select_idle_smt() ...
2022-10-03NFS: clean up a needless assignment in nfs_file_write()Lukas Bulwahn
Commit 064109db53ec ("NFS: remove redundant code in nfs_file_write()") identifies that filemap_fdatawait_range() will always return 0 and removes a dead error-handling case in nfs_file_write(). With this change however, assigning the return of filemap_fdatawait_range() to the result variable is a dead store. Remove this needless assignment. No functional change. No change in object code. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-09-07freezer,sched: Rewrite core freezer logicPeter Zijlstra
Rewrite the core freezer to behave better wrt thawing and be simpler in general. By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensured frozen tasks stay frozen until thawed and don't randomly wake up early, as is currently possible. As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up two PF_flags (yay!). Specifically; the current scheme works a little like: freezer_do_not_count(); schedule(); freezer_count(); And either the task is blocked, or it lands in try_to_freezer() through freezer_count(). Now, when it is blocked, the freezer considers it frozen and continues. However, on thawing, once pm_freezing is cleared, freezer_count() stops working, and any random/spurious wakeup will let a task run before its time. That is, thawing tries to thaw things in explicit order; kernel threads and workqueues before doing bringing SMP back before userspace etc.. However due to the above mentioned races it is entirely possible for userspace tasks to thaw (by accident) before SMP is back. This can be a fatal problem in asymmetric ISA architectures (eg ARMv9) where the userspace task requires a special CPU to run. As said; replace this with a special task state TASK_FROZEN and add the following state transitions: TASK_FREEZABLE -> TASK_FROZEN __TASK_STOPPED -> TASK_FROZEN __TASK_TRACED -> TASK_FROZEN The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL (IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state is already required to deal with spurious wakeups and the freezer causes one such when thawing the task (since the original state is lost). The special __TASK_{STOPPED,TRACED} states *can* be restored since their canonical state is in ->jobctl. With this, frozen tasks need an explicit TASK_FROZEN wakeup and are free of undue (early / spurious) wakeups. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org
2022-08-22Merge tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client fixes from Trond Myklebust: "Stable fixes: - NFS: Fix another fsync() issue after a server reboot Bugfixes: - NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT - NFS: Fix missing unlock in nfs_unlink() - Add sanity checking of the file type used by __nfs42_ssc_open - Fix a case where we're failing to set task->tk_rpc_status Cleanups: - Remove the NFS_CONTEXT_RESEND_WRITES flag that got obsoleted by the fsync() fix" * tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: RPC level errors should set task->tk_rpc_status NFSv4.2 fix problems with __nfs42_ssc_open NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT NFS: Cleanup to remove unused flag NFS_CONTEXT_RESEND_WRITES NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds NFS: Fix another fsync() issue after a server reboot NFS: Fix missing unlock in nfs_unlink()
2022-08-13NFS: Fix another fsync() issue after a server rebootTrond Myklebust
Currently, when the writeback code detects a server reboot, it redirties any pages that were not committed to disk, and it sets the flag NFS_CONTEXT_RESEND_WRITES in the nfs_open_context of the file descriptor that dirtied the file. While this allows the file descriptor in question to redrive its own writes, it violates the fsync() requirement that we should be synchronising all writes to disk. While the problem is infrequent, we do see corner cases where an untimely server reboot causes the fsync() call to abandon its attempt to sync data to disk and causing data corruption issues due to missed error conditions or similar. In order to tighted up the client's ability to deal with this situation without introducing livelocks, add a counter that records the number of times pages are redirtied due to a server reboot-like condition, and use that in fsync() to redrive the sync to disk. Fixes: 2197e9b06c22 ("NFS: Fix up fsync() when the server rebooted") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-08-10Merge tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - pNFS/flexfiles: Fix infinite looping when the RDMA connection errors out Bugfixes: - NFS: fix port value parsing - SUNRPC: Reinitialise the backchannel request buffers before reuse - SUNRPC: fix expiry of auth creds - NFSv4: Fix races in the legacy idmapper upcall - NFS: O_DIRECT fixes from Jeff Layton - NFSv4.1: Fix OP_SEQUENCE error handling - SUNRPC: Fix an RPC/RDMA performance regression - NFS: Fix case insensitive renames - NFSv4/pnfs: Fix a use-after-free bug in open - NFSv4.1: RECLAIM_COMPLETE must handle EACCES Features: - NFSv4.1: session trunking enhancements - NFSv4.2: READ_PLUS performance optimisations - NFS: relax the rules for rsize/wsize mount options - NFS: don't unhash dentry during unlink/rename - SUNRPC: Fail faster on bad verifier - NFS/SUNRPC: Various tracing improvements" * tag 'nfs-for-5.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (46 commits) NFS: Improve readpage/writepage tracing NFS: Improve O_DIRECT tracing NFS: Improve write error tracing NFS: don't unhash dentry during unlink/rename NFSv4/pnfs: Fix a use-after-free bug in open NFS: nfs_async_write_reschedule_io must not recurse into the writeback code SUNRPC: Don't reuse bvec on retransmission of the request SUNRPC: Reinitialise the backchannel request buffers before reuse NFSv4.1: RECLAIM_COMPLETE must handle EACCES NFSv4.1 probe offline transports for trunking on session creation SUNRPC create a function that probes only offline transports SUNRPC export xprt_iter_rewind function SUNRPC restructure rpc_clnt_setup_test_and_add_xprt NFSv4.1 remove xprt from xprt_switch if session trunking test fails SUNRPC create an rpc function that allows xprt removal from rpc_clnt SUNRPC enable back offline transports in trunking discovery SUNRPC create an iterator to list only OFFLINE xprts NFSv4.1 offline trunkable transports on DESTROY_SESSION SUNRPC add function to offline remove trunkable transports SUNRPC expose functions for offline remote xprt functionality ...
2022-08-02nfs: Convert to migrate_folioMatthew Wilcox (Oracle)
Use a folio throughout this function. migrate_page() will be converted later. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-07-12NFS: remove redundant code in nfs_file_write()ChenXiaoSong
filemap_fdatawait_range() will always return 0, after patch 6c984083ec24 ("NFS: Use of mapping_set_error() results in spurious errors"), it will not save the wb err in struct address_space->flags: result = filemap_fdatawait_range(file->f_mapping, ...) = 0 filemap_check_errors(mapping) = 0 test_bit(..., &mapping->flags) // flags is 0 Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-05-31Merge tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Add support for 'dacl' and 'sacl' attributes Bugfixes and Cleanups: - Fixes for reporting mapping errors - Fixes for memory allocation errors - Improve warning message when locks are lost - Update documentation for the nfs4_unique_id parameter - Add an explanation of NFSv4 client identifiers - Ensure the i_size attribute is written to the fscache storage - Fix freeing uninitialized nfs4_labels - Better handling when xprtrdma bc_serv is NULL - Mark qualified async operations as MOVEABLE tasks" * tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: NFSv4.1 mark qualified async operations as MOVEABLE tasks xprtrdma: treat all calls not a bcall when bc_serv is NULL NFSv4: Fix free of uninitialized nfs4_label on referral lookup. NFS: Pass i_size to fscache_unuse_cookie() when a file is released Documentation: Add an explanation of NFSv4 client identifiers NFS: update documentation for the nfs4_unique_id parameter NFS: Improve warning message when locks are lost. NFSv4.1: Enable access to the NFSv4.1 'dacl' and 'sacl' attributes NFSv4: Add encoders/decoders for the NFSv4.1 dacl and sacl attributes NFSv4: Specify the type of ACL to cache NFSv4: Don't hold the layoutget locks across multiple RPC calls pNFS/files: Fall back to I/O through the MDS on non-fatal layout errors NFS: Further fixes to the writeback error handling NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout NFS: Memory allocation failures are not server fatal errors NFS: Don't report errors from nfs_pageio_complete() more than once NFS: Do not report flush errors in nfs_write_end() NFS: Don't report ENOSPC write errors twice NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYS NFS: Do not report EINTR/ERESTARTSYS as mapping errors
2022-05-26Merge tag 'mm-stable-2022-05-25' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Almost all of MM here. A few things are still getting finished off, reviewed, etc. - Yang Shi has improved the behaviour of khugepaged collapsing of readonly file-backed transparent hugepages. - Johannes Weiner has arranged for zswap memory use to be tracked and managed on a per-cgroup basis. - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime enablement of the recent huge page vmemmap optimization feature. - Baolin Wang contributes a series to fix some issues around hugetlb pagetable invalidation. - Zhenwei Pi has fixed some interactions between hwpoisoned pages and virtualization. - Tong Tiangen has enabled the use of the presently x86-only page_table_check debugging feature on arm64 and riscv. - David Vernet has done some fixup work on the memcg selftests. - Peter Xu has taught userfaultfd to handle write protection faults against shmem- and hugetlbfs-backed files. - More DAMON development from SeongJae Park - adding online tuning of the feature and support for monitoring of fixed virtual address ranges. Also easier discovery of which monitoring operations are available. - Nadav Amit has done some optimization of TLB flushing during mprotect(). - Neil Brown continues to labor away at improving our swap-over-NFS support. - David Hildenbrand has some fixes to anon page COWing versus get_user_pages(). - Peng Liu fixed some errors in the core hugetlb code. - Joao Martins has reduced the amount of memory consumed by device-dax's compound devmaps. - Some cleanups of the arch-specific pagemap code from Anshuman Khandual. - Muchun Song has found and fixed some errors in the TLB flushing of transparent hugepages. - Roman Gushchin has done more work on the memcg selftests. ... and, of course, many smaller fixes and cleanups. Notably, the customary million cleanup serieses from Miaohe Lin" * tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits) mm: kfence: use PAGE_ALIGNED helper selftests: vm: add the "settings" file with timeout variable selftests: vm: add "test_hmm.sh" to TEST_FILES selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests selftests: vm: add migration to the .gitignore selftests/vm/pkeys: fix typo in comment ksm: fix typo in comment selftests: vm: add process_mrelease tests Revert "mm/vmscan: never demote for memcg reclaim" mm/kfence: print disabling or re-enabling message include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace" include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion" mm: fix a potential infinite loop in start_isolate_page_range() MAINTAINERS: add Muchun as co-maintainer for HugeTLB zram: fix Kconfig dependency warning mm/shmem: fix shmem folio swapoff hang cgroup: fix an error handling path in alloc_pagecache_max_30M() mm: damon: use HPAGE_PMD_SIZE tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate nodemask.h: fix compilation error with GCC12 ...
2022-05-17NFS: Do not report flush errors in nfs_write_end()Trond Myklebust
If we do flush cached writebacks in nfs_write_end() due to the imminent expiration of an RPCSEC_GSS session, then we should defer reporting any resulting errors until the calls to file_check_and_advance_wb_err() in nfs_file_write() and nfs_file_fsync(). Fixes: 6fbda89b257f ("NFS: Replace custom error reporting mechanism with generic one") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-05-17NFS: Don't report ENOSPC write errors twiceTrond Myklebust
Any errors reported by the write() system call need to be cleared from the file descriptor's error tracking. The current call to nfs_wb_all() causes the error to be reported, but since it doesn't call file_check_and_advance_wb_err(), we can end up reporting the same error a second time when the application calls fsync(). Note that since Linux 4.13, the rule is that EIO may be reported for write(), but it must be reported by a subsequent fsync(), so let's just drop reporting it in write. The check for nfs_ctx_key_to_expire() is just a duplicate to the one already in nfs_write_end(), so let's drop that too. Reported-by: ChenXiaoSong <chenxiaosong2@huawei.com> Fixes: ce368536dd61 ("nfs: nfs_file_write() should check for writeback errors") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-05-17NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYSTrond Myklebust
If the commit to disk is interrupted, we should still first check for filesystem errors so that we can report them in preference to the error due to the signal. Fixes: 2197e9b06c22 ("NFS: Fix up fsync() when the server rebooted") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-05-09nfs: Convert to release_folioMatthew Wilcox (Oracle)
Use folios throughout the release_folio paths. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-05-09VFS: add FMODE_CAN_ODIRECT file flagNeilBrown
Currently various places test if direct IO is possible on a file by checking for the existence of the direct_IO address space operation. This is a poor choice, as the direct_IO operation may not be used - it is only used if the generic_file_*_iter functions are called for direct IO and some filesystems - particularly NFS - don't do this. Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the various places to check this (avoiding pointer dereferences). do_dentry_open() will set this flag if ->direct_IO is present, so filesystems do not need to be changed. NFS *is* changed, to set the flag explicitly and discard the direct_IO entry in the address_space_operations for files. Other filesystems which currently use noop_direct_IO could usefully be changed to set this flag instead. Link: https://lkml.kernel.org/r/164859778128.29473.15189737957277399416.stgit@noble.brown Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09nfs: rename nfs_direct_IO and use as ->swap_rwNeilBrown
The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a while. We now need a ->swap_rw function which behaves slightly differently, returning zero for success rather than a byte count. So modify nfs_direct_IO accordingly, rename it, and use it as the ->swap_rw function. Link: https://lkml.kernel.org/r/165119301493.15698.7491285551903597618.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> (on Renesas RSK+RZA1 with 32 MiB of SDRAM) Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09mm: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-spaceNeilBrown
swap currently uses ->readpage to read swap pages. This can only request one page at a time from the filesystem, which is not most efficient. swap uses ->direct_IO for writes which while this is adequate is an inappropriate over-loading. ->direct_IO may need to had handle allocate space for holes or other details that are not relevant for swap. So this patch introduces a new address_space operation: ->swap_rw. In this patch it is used for reads, and a subsequent patch will switch writes to use it. No filesystem yet supports ->swap_rw, but that is not a problem because no filesystem actually works with filesystem-based swap. Only two filesystems set SWP_FS_OPS: - cifs sets the flag, but ->direct_IO always fails so swap cannot work. - nfs sets the flag, but ->direct_IO calls generic_write_checks() which has failed on swap files for several releases. To ensure that a NULL ->swap_rw isn't called, ->activate_swap() for both NFS and cifs are changed to fail if ->swap_rw is not set. This can be removed if/when the function is added. Future patches will restore swap-over-NFS functionality. To submit an async read with ->swap_rw() we need to allocate a structure to hold the kiocb and other details. swap_readpage() cannot handle transient failure, so we create a mempool to provide the structures. Link: https://lkml.kernel.org/r/164859778125.29473.13430559328221330589.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09mm: move responsibility for setting SWP_FS_OPS to ->swap_activateNeilBrown
If a filesystem wishes to handle all swap IO itself (via ->direct_IO and ->readpage), rather than just providing devices addresses for submit_bio(), SWP_FS_OPS must be set. Currently the protocol for setting this it to have ->swap_activate return zero. In that case SWP_FS_OPS is set, and add_swap_extent() is called for the entire file. This is a little clumsy as different return values for ->swap_activate have quite different meanings, and it makes it hard to search for which filesystems require SWP_FS_OPS to be set. So remove the special meaning of a zero return, and require the filesystem to set SWP_FS_OPS if it so desires, and to always call add_swap_extent() as required. Currently only NFS and CIFS return zero for add_swap_extent(). Link: https://lkml.kernel.org/r/164859778123.29473.17908205846599043598.stgit@noble.brown Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: David Howells <dhowells@redhat.com> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09nfs: Convert nfs to read_folioMatthew Wilcox (Oracle)
This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-05-08fs: Convert is_dirty_writeback() to take a folioMatthew Wilcox (Oracle)
Pass a folio instead of a page to aops->is_dirty_writeback(). Convert both implementations and the caller. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>