summaryrefslogtreecommitdiff
path: root/fs/netfs/main.c
AgeCommit message (Collapse)Author
2024-09-16Merge tag 'vfs-6.12.netfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull netfs updates from Christian Brauner: "This contains the work to improve read/write performance for the new netfs library. The main performance enhancing changes are: - Define a structure, struct folio_queue, and a new iterator type, ITER_FOLIOQ, to hold a buffer as a replacement for ITER_XARRAY. See that patch for questions about naming and form. ITER_FOLIOQ is provided as a replacement for ITER_XARRAY. The problem with an xarray is that accessing it requires the use of a lock (typically the RCU read lock) - and this means that we can't supply iterate_and_advance() with a step function that might sleep (crypto for example) without having to drop the lock between pages. ITER_FOLIOQ is the iterator for a chain of folio_queue structs, where each folio_queue holds a small list of folios. A folio_queue struct is a simpler structure than xarray and is not subject to concurrent manipulation by the VM. folio_queue is used rather than a bvec[] as it can form lists of indefinite size, adding to one end and removing from the other on the fly. - Provide a copy_folio_from_iter() wrapper. - Make cifs RDMA support ITER_FOLIOQ. - Use folio queues in the write-side helpers instead of xarrays. - Add a function to reset the iterator in a subrequest. - Simplify the write-side helpers to use sheaves to skip gaps rather than trying to work out where gaps are. - In afs, make the read subrequests asynchronous, putting them into work items to allow the next patch to do progressive unlocking/reading. - Overhaul the read-side helpers to improve performance. - Fix the caching of a partial block at the end of a file. - Allow a store to be cancelled. Then some changes for cifs to make it use folio queues instead of xarrays for crypto bufferage: - Use raw iteration functions rather than manually coding iteration when hashing data. - Switch to using folio_queue for crypto buffers. - Remove the xarray bits. Make some adjustments to the /proc/fs/netfs/stats file such that: - All the netfs stats lines begin 'Netfs:' but change this to something a bit more useful. - Add a couple of stats counters to track the numbers of skips and waits on the per-inode writeback serialisation lock to make it easier to check for this as a source of performance loss. Miscellaneous work: - Ensure that the sb_writers lock is taken around vfs_{set,remove}xattr() in the cachefiles code. - Reduce the number of conditional branches in netfs_perform_write(). - Move the CIFS_INO_MODIFIED_ATTR flag to the netfs_inode struct and remove cifs_post_modify(). - Move the max_len/max_nr_segs members from netfs_io_subrequest to netfs_io_request as they're only needed for one subreq at a time. - Add an 'unknown' source value for tracing purposes. - Remove NETFS_COPY_TO_CACHE as it's no longer used. - Set the request work function up front at allocation time. - Use bh-disabling spinlocks for rreq->lock as cachefiles completion may be run from block-filesystem DIO completion in softirq context. - Remove fs/netfs/io.c" * tag 'vfs-6.12.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (25 commits) docs: filesystems: corrected grammar of netfs page cifs: Don't support ITER_XARRAY cifs: Switch crypto buffer to use a folio_queue rather than an xarray cifs: Use iterate_and_advance*() routines directly for hashing netfs: Cancel dirty folios that have no storage destination cachefiles, netfs: Fix write to partial block at EOF netfs: Remove fs/netfs/io.c netfs: Speed up buffered reading afs: Make read subreqs async netfs: Simplify the writeback code netfs: Provide an iterator-reset function netfs: Use new folio_queue data type and iterator instead of xarray iter cifs: Provide the capability to extract from ITER_FOLIOQ to RDMA SGEs iov_iter: Provide copy_folio_from_iter() mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios netfs: Use bh-disabling spinlocks for rreq->lock netfs: Set the request work function upon allocation netfs: Remove NETFS_COPY_TO_CACHE netfs: Reserve netfs_sreq_source 0 as unset/unknown netfs: Move max_len/max_nr_segs from netfs_io_subrequest to netfs_io_stream ...
2024-09-12netfs: Speed up buffered readingDavid Howells
Improve the efficiency of buffered reads in a number of ways: (1) Overhaul the algorithm in general so that it's a lot more compact and split the read submission code between buffered and unbuffered versions. The unbuffered version can be vastly simplified. (2) Read-result collection is handed off to a work queue rather than being done in the I/O thread. Multiple subrequests can be processes simultaneously. (3) When a subrequest is collected, any folios it fully spans are collected and "spare" data on either side is donated to either the previous or the next subrequest in the sequence. Notes: (*) Readahead expansion is massively slows down fio, presumably because it causes a load of extra allocations, both folio and xarray, up front before RPC requests can be transmitted. (*) RDMA with cifs does appear to work, both with SIW and RXE. (*) PG_private_2-based reading and copy-to-cache is split out into its own file and altered to use folio_queue. Note that the copy to the cache now creates a new write transaction against the cache and adds the folios to be copied into it. This allows it to use part of the writeback I/O code. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-20-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-05netfs: Remove NETFS_COPY_TO_CACHEDavid Howells
Remove NETFS_COPY_TO_CACHE as it isn't used anymore. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20240814203850.2240469-10-dhowells@redhat.com/ # v2 Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-08-30netfs: Delete subtree of 'fs/netfs' when netfs module exitsBaokun Li
In netfs_init() or fscache_proc_init(), we create dentry under 'fs/netfs', but in netfs_exit(), we only delete the proc entry of 'fs/netfs' without deleting its subtree. This triggers the following WARNING: ================================================================== remove_proc_entry: removing non-empty directory 'fs/netfs', leaking at least 'requests' WARNING: CPU: 4 PID: 566 at fs/proc/generic.c:717 remove_proc_entry+0x160/0x1c0 Modules linked in: netfs(-) CPU: 4 UID: 0 PID: 566 Comm: rmmod Not tainted 6.11.0-rc3 #860 RIP: 0010:remove_proc_entry+0x160/0x1c0 Call Trace: <TASK> netfs_exit+0x12/0x620 [netfs] __do_sys_delete_module.isra.0+0x14c/0x2e0 do_syscall_64+0x4b/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e ================================================================== Therefore use remove_proc_subtree() instead of remove_proc_entry() to fix the above problem. Fixes: 7eb5b3e3a0a5 ("netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a symlink") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240826113404.3214786-1-libaokun@huaweicloud.com Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-07-24netfs: Revert "netfs: Switch debug logging to pr_debug()"David Howells
Revert commit 163eae0fb0d4c610c59a8de38040f8e12f89fd43 to get back the original operation of the debugging macros. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240608151352.22860-2-ukleinek@kernel.org Link: https://lore.kernel.org/r/1410685.1721333252@warthog.procyon.org.uk cc: Uwe Kleine-König <ukleinek@kernel.org> cc: Christian Brauner <brauner@kernel.org> cc: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-06-12netfs: Switch debug logging to pr_debug()Uwe Kleine-König
Instead of inventing a custom way to conditionally enable debugging, just make use of pr_debug(), which also has dynamic debugging facilities and is more likely known to someone who hunts a problem in the netfs code. Also drop the module parameter netfs_debug which didn't have any effect without further source changes. (The variable netfs_debug was only used in #ifdef blocks for cpp vars that don't exist; Note that CONFIG_NETFS_DEBUG isn't settable via kconfig, a variable with that name never existed in the mainline and is probably just taken over (and renamed) from similar custom debug logging implementations.) Signed-off-by: Uwe Kleine-König <ukleinek@kernel.org> Link: https://lore.kernel.org/r/20240608151352.22860-2-ukleinek@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-01netfs: Switch to using unsigned long long rather than loff_tDavid Howells
Switch to using unsigned long long rather than loff_t in netfslib to avoid problems with the sign flipping in the maths when we're dealing with the byte at position 0x7fffffffffffffff. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: netfs@lists.linux.dev cc: ceph-devel@vger.kernel.org cc: linux-fsdevel@vger.kernel.org
2024-05-01netfs: Use mempools for allocating requests and subrequestsDavid Howells
Use mempools for allocating requests and subrequests in an effort to make sure that allocation always succeeds so that when performing writeback we can always make progress. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2024-05-01netfs: Remove ->launder_folio() supportDavid Howells
Remove support for ->launder_folio() from netfslib and expect filesystems to use filemap_invalidate_inode() instead. netfs_launder_folio() can then be got rid of. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: David Howells <dhowells@redhat.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Steve French <sfrench@samba.org> cc: Matthew Wilcox <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-fsdevel@vger.kernel.org cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: devel@lists.orangefs.org
2024-04-29netfs: Replace PG_fscache by setting folio->private and marking dirtyDavid Howells
When dirty data is being written to the cache, setting/waiting on/clearing the fscache flag is always done in tandem with setting/waiting on/clearing the writeback flag. The netfslib buffered write routines wait on and set both flags and the write request cleanup clears both flags, so the fscache flag is almost superfluous. The reason it isn't superfluous is because the fscache flag is also used to indicate that data just read from the server is being written to the cache. The flag is used to prevent a race involving overlapping direct-I/O writes to the cache. Change this to indicate that a page is in need of being copied to the cache by placing a magic value in folio->private and marking the folios dirty. Then when the writeback code sees a folio marked in this way, it only writes it to the cache and not to the server. If a folio that has this magic value set is modified, the value is just replaced and the folio will then be uplodaded too. With this, PG_fscache is no longer required by the netfslib core, 9p and afs. Ceph and nfs, however, still need to use the old PG_fscache-based tracking. To deal with this, a flag, NETFS_ICTX_USE_PGPRIV2, now has to be set on the flags in the netfs_inode struct for those filesystems. This reenables the use of PG_fscache in that inode. 9p and afs use the netfslib write helpers so get switched over; cifs, for the moment, does page-by-page manual access to the cache, so doesn't use PG_fscache and is unaffected. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Matthew Wilcox (Oracle) <willy@infradead.org> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: Marc Dionne <marc.dionne@auristor.com> cc: Ilya Dryomov <idryomov@gmail.com> cc: Xiubo Li <xiubli@redhat.com> cc: Steve French <sfrench@samba.org> cc: Paulo Alcantara <pc@manguebit.com> cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> cc: Shyam Prasad N <sprasad@microsoft.com> cc: Tom Talpey <tom@talpey.com> cc: Bharath SM <bharathsm@microsoft.com> cc: Trond Myklebust <trond.myklebust@hammerspace.com> cc: Anna Schumaker <anna@kernel.org> cc: netfs@lists.linux.dev cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Export the netfs_sreq tracepointDavid Howells
Export the netfs_sreq tracepoint so that it can be called directly from client filesystems/cache backend modules. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Implement a write-through caching optionDavid Howells
Provide a flag whereby a filesystem may request that cifs_perform_write() perform write-through caching. This involves putting pages directly into writeback rather than dirty and attaching them to a write operation as we go. Further, the writes being made are limited to the byte range being written rather than whole folios being written. This can be used by cifs, for example, to deal with strict byte-range locking. This can't be used with content encryption as that may require expansion of the write RPC beyond the write being made. This doesn't affect writes via mmap - those are written back in the normal way; similarly failed writethrough writes are marked dirty and left to writeback to retry. Another option would be to simply invalidate them, but the contents can be simultaneously accessed by read() and through mmap. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Provide a launder_folio implementationDavid Howells
Provide a launder_folio implementation for netfslib. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Implement unbuffered/DIO write supportDavid Howells
Implement support for unbuffered writes and direct I/O writes. If the write is misaligned with respect to the fscrypt block size, then RMW cycles are performed if necessary. DIO writes are a special case of unbuffered writes with extra restriction imposed, such as block size alignment requirements. Also provide a field that can tell the code to add some extra space onto the bounce buffer for use by the filesystem in the case of a content-encrypted file. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Implement unbuffered/DIO read supportDavid Howells
Implement support for unbuffered and DIO reads in the netfs library, utilising the existing read helper code to do block splitting and individual queuing. The code also handles extraction of the destination buffer from the supplied iterator, allowing async unbuffered reads to take place. The read will be split up according to the rsize setting and, if supplied, the ->clamp_length() method. Note that the next subrequest will be issued as soon as issue_op returns, without waiting for previous ones to finish. The network filesystem needs to pause or handle queuing them if it doesn't want to fire them all at the server simultaneously. Once all the subrequests have finished, the state will be assessed and the amount of data to be indicated as having being obtained will be determined. As the subrequests may finish in any order, if an intermediate subrequest is short, any further subrequests may be copied into the buffer and then abandoned. In the future, this will also take care of doing an unbuffered read from encrypted content, with the decryption being done by the library. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-28netfs: Extend the netfs_io_*request structs to handle writesDavid Howells
Modify the netfs_io_request struct to act as a point around which writes can be coordinated. It represents and pins a range of pages that need writing and a list of regions of dirty data in that range of pages. If RMW is required, the original data can be downloaded into the bounce buffer, decrypted if necessary, the modifications made, then the modified data can be reencrypted/recompressed and sent back to the server. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs: Add a procfile to list in-progress requestsDavid Howells
Add a procfile, /proc/fs/netfs/requests, to list in-progress netfslib I/O requests. Signed-off-by: David Howells <dhowells@redhat.com> cc: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org
2023-12-24netfs, fscache: Move /proc/fs/fscache to /proc/fs/netfs and put in a symlinkDavid Howells
Rename /proc/fs/fscache to "netfs" and make a symlink from fscache to that. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com
2023-12-24netfs, fscache: Combine fscache with netfsDavid Howells
Now that the fscache code is moved to be colocated with the netfslib code so that they combined into one module, do the combining. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com cc: linux-nfs@vger.kernel.org, cc: linux-erofs@lists.ozlabs.org
2023-12-24netfs, fscache: Move fs/fscache/* into fs/netfs/David Howells
There's a problem with dependencies between netfslib and fscache as each wants to access some functions of the other. Deal with this by moving fs/fscache/* into fs/netfs/ and renaming those files to begin with "fscache-". For the moment, the moved files are changed as little as possible and an fscache module is still built. A subsequent patch will integrate them. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: Christian Brauner <christian@brauner.io> cc: linux-fsdevel@vger.kernel.org cc: linux-cachefs@redhat.com
2022-03-18netfs: Split some core bits out into their own fileDavid Howells
Split some core bits out into their own file. More bits will be added to this file later. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164623006934.3564931.17932680017894039748.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678218407.1200972.1731208226140990280.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692920944.2099075.11990502173226013856.stgit@warthog.procyon.org.uk/ # v3