diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-03-31 15:49:36 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-03-31 15:49:36 -0700 |
commit | f008b1d6e1e06bb61e9402aa8a1cfa681510e375 (patch) | |
tree | 5c21a4ed2333592eec1ab4d25a6189e4bfe62866 | |
parent | 478f74a3d8085076dfcb481aa9361b808a6aae94 (diff) | |
parent | ab487a4cdfca3d1ef12795a49eafe1144967e617 (diff) |
Merge tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull netfs updates from David Howells:
"Netfs prep for write helpers.
Having had a go at implementing write helpers and content encryption
support in netfslib, it seems that the netfs_read_{,sub}request
structs and the equivalent write request structs were almost the same
and so should be merged, thereby requiring only one set of
alloc/get/put functions and a common set of tracepoints.
Merging the structs also has the advantage that if a bounce buffer is
added to the request struct, a read operation can be performed to fill
the bounce buffer, the contents of the buffer can be modified and then
a write operation can be performed on it to send the data wherever it
needs to go using the same request structure all the way through. The
I/O handlers would then transparently perform any required crypto.
This should make it easier to perform RMW cycles if needed.
The potentially common functions and structs, however, by their names
all proclaim themselves to be associated with the read side of things.
The bulk of these changes alter this in the following ways:
- Rename struct netfs_read_{,sub}request to netfs_io_{,sub}request.
- Rename some enums, members and flags to make them more appropriate.
- Adjust some comments to match.
- Drop "read"/"rreq" from the names of common functions. For
instance, netfs_get_read_request() becomes netfs_get_request().
- The ->init_rreq() and ->issue_op() methods become ->init_request()
and ->issue_read(). I've kept the latter as a read-specific
function and in another branch added an ->issue_write() method.
The driver source is then reorganised into a number of files:
fs/netfs/buffered_read.c Create read reqs to the pagecache
fs/netfs/io.c Dispatchers for read and write reqs
fs/netfs/main.c Some general miscellaneous bits
fs/netfs/objects.c Alloc, get and put functions
fs/netfs/stats.c Optional procfs statistics.
and future development can be fitted into this scheme, e.g.:
fs/netfs/buffered_write.c Modify the pagecache
fs/netfs/buffered_flush.c Writeback from the pagecache
fs/netfs/direct_read.c DIO read support
fs/netfs/direct_write.c DIO write support
fs/netfs/unbuffered_write.c Write modifications directly back
Beyond the above changes, there are also some changes that affect how
things work:
- Make fscache_end_operation() generally available.
- In the netfs tracing header, generate enums from the symbol ->
string mapping tables rather than manually coding them.
- Add a struct for filesystems that uses netfslib to put into their
inode wrapper structs to hold extra state that netfslib is
interested in, such as the fscache cookie. This allows netfslib
functions to be set in filesystem operation tables and jumped to
directly without having to have a filesystem wrapper.
- Add a member to the struct added above to track the remote inode
length as that may differ if local modifications are buffered. We
may need to supply an appropriate EOF pointer when storing data (in
AFS for example).
- Pass extra information to netfs_alloc_request() so that the
->init_request() hook can access it and retain information to
indicate the origin of the operation.
- Make the ->init_request() hook return an error, thereby allowing a
filesystem that isn't allowed to cache an inode (ceph or cifs, for
example) to skip readahead.
- Switch to using refcount_t for subrequests and add tracepoints to
log refcount changes for the request and subrequest structs.
- Add a function to consolidate dispatching a read request. Similar
code is used in three places and another couple are likely to be
added in the future"
Link: https://lore.kernel.org/all/2639515.1648483225@warthog.procyon.org.uk/
* tag 'netfs-prep-20220318' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Maintain netfs_i_context::remote_i_size
netfs: Keep track of the actual remote file size
netfs: Split some core bits out into their own file
netfs: Split fs/netfs/read_helper.c
netfs: Rename read_helper.c to io.c
netfs: Prepare to split read_helper.c
netfs: Add a function to consolidate beginning a read
netfs: Add a netfs inode context
ceph: Make ceph_init_request() check caps on readahead
netfs: Change ->init_request() to return an error code
netfs: Refactor arguments for netfs_alloc_read_request
netfs: Adjust the netfs_failure tracepoint to indicate non-subreq lines
netfs: Trace refcounting on the netfs_io_subrequest struct
netfs: Trace refcounting on the netfs_io_request struct
netfs: Adjust the netfs_rreq tracepoint slightly
netfs: Split netfs_io_* object handling out
netfs: Finish off rename of netfs_read_request to netfs_io_request
netfs: Rename netfs_read_*request to netfs_io_*request
netfs: Generate enums from trace symbol mapping lists
fscache: export fscache_end_operation()
35 files changed, 1868 insertions, 1628 deletions
diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst index 4f373a8ec47b..69f00179fdfe 100644 --- a/Documentation/filesystems/netfs_library.rst +++ b/Documentation/filesystems/netfs_library.rst @@ -7,6 +7,8 @@ Network Filesystem Helper Library .. Contents: - Overview. + - Per-inode context. + - Inode context helper functions. - Buffered read helpers. - Read helper functions. - Read helper structures. @@ -28,6 +30,69 @@ Note that the library module doesn't link against local caching directly, so access must be provided by the netfs. +Per-Inode Context +================= + +The network filesystem helper library needs a place to store a bit of state for +its use on each netfs inode it is helping to manage. To this end, a context +structure is defined:: + + struct netfs_i_context { + const struct netfs_request_ops *ops; + struct fscache_cookie *cache; + }; + +A network filesystem that wants to use netfs lib must place one of these +directly after the VFS ``struct inode`` it allocates, usually as part of its +own struct. This can be done in a way similar to the following:: + + struct my_inode { + struct { + /* These must be contiguous */ + struct inode vfs_inode; + struct netfs_i_context netfs_ctx; + }; + ... + }; + +This allows netfslib to find its state by simple offset from the inode pointer, +thereby allowing the netfslib helper functions to be pointed to directly by the +VFS/VM operation tables. + +The structure contains the following fields: + + * ``ops`` + + The set of operations provided by the network filesystem to netfslib. + + * ``cache`` + + Local caching cookie, or NULL if no caching is enabled. This field does not + exist if fscache is disabled. + + +Inode Context Helper Functions +------------------------------ + +To help deal with the per-inode context, a number helper functions are +provided. Firstly, a function to perform basic initialisation on a context and +set the operations table pointer:: + + void netfs_i_context_init(struct inode *inode, + const struct netfs_request_ops *ops); + +then two functions to cast between the VFS inode structure and the netfs +context:: + + struct netfs_i_context *netfs_i_context(struct inode *inode); + struct inode *netfs_inode(struct netfs_i_context *ctx); + +and finally, a function to get the cache cookie pointer from the context +attached to an inode (or NULL if fscache is disabled):: + + struct fscache_cookie *netfs_i_cookie(struct inode *inode); + + Buffered Read Helpers ===================== @@ -70,38 +135,22 @@ Read Helper Functions Three read helpers are provided:: - void netfs_readahead(struct readahead_control *ractl, - const struct netfs_read_request_ops *ops, - void *netfs_priv); + void netfs_readahead(struct readahead_control *ractl); int netfs_readpage(struct file *file, - struct folio *folio, - const struct netfs_read_request_ops *ops, - void *netfs_priv); + struct page *page); int netfs_write_begin(struct file *file, struct address_space *mapping, loff_t pos, unsigned int len, unsigned int flags, struct folio **_folio, - void **_fsdata, - const struct netfs_read_request_ops *ops, - void *netfs_priv); - -Each corresponds to a VM operation, with the addition of a couple of parameters -for the use of the read helpers: + void **_fsdata); - * ``ops`` - - A table of operations through which the helpers can talk to the filesystem. - - * ``netfs_priv`` +Each corresponds to a VM address space operation. These operations use the +state in the per-inode context. - Filesystem private data (can be NULL). - -Both of these values will be stored into the read request structure. - -For ->readahead() and ->readpage(), the network filesystem should just jump -into the corresponding read helper; whereas for ->write_begin(), it may be a +For ->readahead() and ->readpage(), the network filesystem just point directly +at the corresponding read helper; whereas for ->write_begin(), it may be a little more complicated as the network filesystem might want to flush conflicting writes or track dirty data and needs to put the acquired folio if an error occurs after calling the helper. @@ -116,7 +165,7 @@ occurs, the request will get partially completed if sufficient data is read. Additionally, there is:: - * void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, + * void netfs_subreq_terminated(struct netfs_io_subrequest *subreq, ssize_t transferred_or_error, bool was_async); @@ -132,7 +181,7 @@ Read Helper Structures The read helpers make use of a couple of structures to maintain the state of the read. The first is a structure that manages a read request as a whole:: - struct netfs_read_request { + struct netfs_io_request { struct inode *inode; struct address_space *mapping; struct netfs_cache_resources cache_resources; @@ -140,7 +189,7 @@ the read. The first is a structure that manages a read request as a whole:: loff_t start; size_t len; loff_t i_size; - const struct netfs_read_request_ops *netfs_ops; + const struct netfs_request_ops *netfs_ops; unsigned int debug_id; ... }; @@ -187,8 +236,8 @@ The above fields are the ones the netfs can use. They are: The second structure is used to manage individual slices of the overall read request:: - struct netfs_read_subrequest { - struct netfs_read_request *rreq; + struct netfs_io_subrequest { + struct netfs_io_request *rreq; loff_t start; size_t len; size_t transferred; @@ -244,32 +293,26 @@ Read Helper Operations The network filesystem must provide the read helpers with a table of operations through which it can issue requests and negotiate:: - struct netfs_read_request_ops { - void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); - bool (*is_cache_enabled)(struct inode *inode); - int (*begin_cache_operation)(struct netfs_read_request *rreq); - void (*expand_readahead)(struct netfs_read_request *rreq); - bool (*clamp_length)(struct netfs_read_subrequest *subreq); - void (*issue_op)(struct netfs_read_subrequest *subreq); - bool (*is_still_valid)(struct netfs_read_request *rreq); + struct netfs_request_ops { + void (*init_request)(struct netfs_io_request *rreq, struct file *file); + int (*begin_cache_operation)(struct netfs_io_request *rreq); + void (*expand_readahead)(struct netfs_io_request *rreq); + bool (*clamp_length)(struct netfs_io_subrequest *subreq); + void (*issue_read)(struct netfs_io_subrequest *subreq); + bool (*is_still_valid)(struct netfs_io_request *rreq); int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, struct folio *folio, void **_fsdata); - void (*done)(struct netfs_read_request *rreq); + void (*done)(struct netfs_io_request *rreq); void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; The operations are as follows: - * ``init_rreq()`` + * ``init_request()`` [Optional] This is called to initialise the request structure. It is given the file for reference and can modify the ->netfs_priv value. - * ``is_cache_enabled()`` - - [Required] This is called by netfs_write_begin() to ask if the file is being - cached. It should return true if it is being cached and false otherwise. - * ``begin_cache_operation()`` [Optional] This is called to ask the network filesystem to call into the @@ -305,7 +348,7 @@ The operations are as follows: This should return 0 on success and an error code on error. - * ``issue_op()`` + * ``issue_read()`` [Required] The helpers use this to dispatch a subrequest to the server for reading. In the subrequest, ->start, ->len and ->transferred indicate what @@ -420,12 +463,12 @@ The network filesystem's ->begin_cache_operation() method is called to set up a cache and this must call into the cache to do the work. If using fscache, for example, the cache would call:: - int fscache_begin_read_operation(struct netfs_read_request *rreq, + int fscache_begin_read_operation(struct netfs_io_request *rreq, struct fscache_cookie *cookie); passing in the request pointer and the cookie corresponding to the file. -The netfs_read_request object contains a place for the cache to hang its +The netfs_io_request object contains a place for the cache to hang its state:: struct netfs_cache_resources { @@ -443,7 +486,7 @@ operation table looks like the following:: void (*expand_readahead)(struct netfs_cache_resources *cres, loff_t *_start, size_t *_len, loff_t i_size); - enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq, + enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq, loff_t i_size); int (*read)(struct netfs_cache_resources *cres, @@ -562,4 +605,5 @@ API Function Reference ====================== .. kernel-doc:: include/linux/netfs.h -.. kernel-doc:: fs/netfs/read_helper.c +.. kernel-doc:: fs/netfs/buffered_read.c +.. kernel-doc:: fs/netfs/io.c diff --git a/fs/9p/cache.c b/fs/9p/cache.c index 55e108e5e133..1c8dc696d516 100644 --- a/fs/9p/cache.c +++ b/fs/9p/cache.c @@ -49,22 +49,20 @@ int v9fs_cache_session_get_cookie(struct v9fs_session_info *v9ses, void v9fs_cache_inode_get_cookie(struct inode *inode) { - struct v9fs_inode *v9inode; + struct v9fs_inode *v9inode = V9FS_I(inode); struct v9fs_session_info *v9ses; __le32 version; __le64 path; if (!S_ISREG(inode->i_mode)) return; - - v9inode = V9FS_I(inode); - if (WARN_ON(v9inode->fscache)) + if (WARN_ON(v9fs_inode_cookie(v9inode))) return; version = cpu_to_le32(v9inode->qid.version); path = cpu_to_le64(v9inode->qid.path); v9ses = v9fs_inode2v9ses(inode); - v9inode->fscache = + v9inode->netfs_ctx.cache = fscache_acquire_cookie(v9fs_session_cache(v9ses), 0, &path, sizeof(path), @@ -72,5 +70,5 @@ void v9fs_cache_inode_get_cookie(struct inode *inode) i_size_read(&v9inode->vfs_inode)); p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n", - inode, v9inode->fscache); + inode, v9fs_inode_cookie(v9inode)); } diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c index 08f65c40af4f..e28ddf763b3b 100644 --- a/fs/9p/v9fs.c +++ b/fs/9p/v9fs.c @@ -623,9 +623,7 @@ static void v9fs_sysfs_cleanup(void) static void v9fs_inode_init_once(void *foo) { struct v9fs_inode *v9inode = (struct v9fs_inode *)foo; -#ifdef CONFIG_9P_FSCACHE - v9inode->fscache = NULL; -#endif + memset(&v9inode->qid, 0, sizeof(v9inode->qid)); inode_init_once(&v9inode->vfs_inode); } diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h index bc8b30205d36..ec0e8df3b2eb 100644 --- a/fs/9p/v9fs.h +++ b/fs/9p/v9fs.h @@ -9,6 +9,7 @@ #define FS_9P_V9FS_H #include <linux/backing-dev.h> +#include <linux/netfs.h> /** * enum p9_session_flags - option flags for each 9P session @@ -108,14 +109,15 @@ struct v9fs_session_info { #define V9FS_INO_INVALID_ATTR 0x01 struct v9fs_inode { -#ifdef CONFIG_9P_FSCACHE - struct fscache_cookie *fscache; -#endif + struct { + /* These must be contiguous */ + struct inode vfs_inode; /* the VFS's inode record */ + struct netfs_i_context netfs_ctx; /* Netfslib context */ + }; struct p9_qid qid; unsigned int cache_validity; struct p9_fid *writeback_fid; struct mutex v_mutex; - struct inode vfs_inode; }; static inline struct v9fs_inode *V9FS_I(const struct inode *inode) @@ -126,7 +128,7 @@ static inline struct v9fs_inode *V9FS_I(const struct inode *inode) static inline struct fscache_cookie *v9fs_inode_cookie(struct v9fs_inode *v9inode) { #ifdef CONFIG_9P_FSCACHE - return v9inode->fscache; + return netfs_i_cookie(&v9inode->vfs_inode); #else return NULL; #endif @@ -163,6 +165,7 @@ extern struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses, extern const struct inode_operations v9fs_dir_inode_operations_dotl; extern const struct inode_operations v9fs_file_inode_operations_dotl; extern const struct inode_operations v9fs_symlink_inode_operations_dotl; +extern const struct netfs_request_ops v9fs_req_ops; extern struct inode *v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses, struct p9_fid *fid, struct super_block *sb, int new); diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 76956c9d2af9..501128188343 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -28,12 +28,12 @@ #include "fid.h" /** - * v9fs_req_issue_op - Issue a read from 9P + * v9fs_issue_read - Issue a read from 9P * @subreq: The read to make */ -static void v9fs_req_issue_op(struct netfs_read_subrequest *subreq) +static void v9fs_issue_read(struct netfs_io_subrequest *subreq) { - struct netfs_read_request *rreq = subreq->rreq; + struct netfs_io_request *rreq = subreq->rreq; struct p9_fid *fid = rreq->netfs_priv; struct iov_iter to; loff_t pos = subreq->start + subreq->transferred; @@ -52,20 +52,21 @@ static void v9fs_req_issue_op(struct netfs_read_subrequest *subreq) } /** - * v9fs_init_rreq - Initialise a read request + * v9fs_init_request - Initialise a read request * @rreq: The read request * @file: The file being read from */ -static void v9fs_init_rreq(struct netfs_read_request *rreq, struct file *file) +static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) { struct p9_fid *fid = file->private_data; refcount_inc(&fid->count); rreq->netfs_priv = fid; + return 0; } /** - * v9fs_req_cleanup - Cleanup request initialized by v9fs_init_rreq + * v9fs_req_cleanup - Cleanup request initialized by v9fs_init_request * @mapping: unused mapping of request to cleanup * @priv: private data to cleanup, a fid, guaranted non-null. */ @@ -77,21 +78,10 @@ static void v9fs_req_cleanup(struct address_space *mapping, void *priv) } /** - * v9fs_is_cache_enabled - Determine if caching is enabled for an inode - * @inode: The inode to check - */ -static bool v9fs_is_cache_enabled(struct inode *inode) -{ - struct fscache_cookie *cookie = v9fs_inode_cookie(V9FS_I(inode)); - - return fscache_cookie_enabled(cookie) && cookie->cache_priv; -} - -/** * v9fs_begin_cache_operation - Begin a cache operation for a read * @rreq: The read request */ -static int v9fs_begin_cache_operation(struct netfs_read_request *rreq) +static int v9fs_begin_cache_operation(struct netfs_io_request *rreq) { #ifdef CONFIG_9P_FSCACHE struct fscache_cookie *cookie = v9fs_inode_cookie(V9FS_I(rreq->inode)); @@ -102,37 +92,14 @@ static int v9fs_begin_cache_operation(struct netfs_read_request *rreq) #endif } -static const struct netfs_read_request_ops v9fs_req_ops = { - .init_rreq = v9fs_init_rreq, - .is_cache_enabled = v9fs_is_cache_enabled, +const struct netfs_request_ops v9fs_req_ops = { + .init_request = v9fs_init_request, .begin_cache_operation = v9fs_begin_cache_operation, - .issue_op = v9fs_req_issue_op, + .issue_read = v9fs_issue_read, .cleanup = v9fs_req_cleanup, }; /** - * v9fs_vfs_readpage - read an entire page in from 9P - * @file: file being read - * @page: structure to page - * - */ -static int v9fs_vfs_readpage(struct file *file, struct page *page) -{ - struct folio *folio = page_folio(page); - - return netfs_readpage(file, folio, &v9fs_req_ops, NULL); -} - -/** - * v9fs_vfs_readahead - read a set of pages from 9P - * @ractl: The readahead parameters - */ -static void v9fs_vfs_readahead(struct readahead_control *ractl) -{ - netfs_readahead(ractl, &v9fs_req_ops, NULL); -} - -/** * v9fs_release_page - release the private state associated with a page * @page: The page to be released * @gfp: The caller's allocation restrictions @@ -308,8 +275,7 @@ static int v9fs_write_begin(struct file *filp, struct address_space *mapping, * file. We need to do this before we get a lock on the page in case * there's more than one writer competing for the same cache block. */ - retval = netfs_write_begin(filp, mapping, pos, len, flags, &folio, fsdata, - &v9fs_req_ops, NULL); + retval = netfs_write_begin(filp, mapping, pos, len, flags, &folio, fsdata); if (retval < 0) return retval; @@ -370,8 +336,8 @@ static bool v9fs_dirty_folio(struct address_space *mapping, struct folio *folio) #endif const struct address_space_operations v9fs_addr_operations = { - .readpage = v9fs_vfs_readpage, - .readahead = v9fs_vfs_readahead, + .readpage = netfs_readpage, + .readahead = netfs_readahead, .dirty_folio = v9fs_dirty_folio, .writepage = v9fs_vfs_writepage, .write_begin = v9fs_write_begin, diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c index 84c3cf7dffa5..55367ecb9442 100644 --- a/fs/9p/vfs_inode.c +++ b/fs/9p/vfs_inode.c @@ -231,9 +231,6 @@ struct inode *v9fs_alloc_inode(struct super_block *sb) v9inode = alloc_inode_sb(sb, v9fs_inode_cache, GFP_KERNEL); if (!v9inode) return NULL; -#ifdef CONFIG_9P_FSCACHE - v9inode->fscache = NULL; -#endif v9inode->writeback_fid = NULL; v9inode->cache_validity = 0; mutex_init(&v9inode->v_mutex); @@ -250,6 +247,14 @@ void v9fs_free_inode(struct inode *inode) kmem_cache_free(v9fs_inode_cache, V9FS_I(inode)); } +/* + * Set parameters for the netfs library + */ +static void v9fs_set_netfs_context(struct inode *inode) +{ + netfs_i_context_init(inode, &v9fs_req_ops); +} + int v9fs_init_inode(struct v9fs_session_info *v9ses, struct inode *inode, umode_t mode, dev_t rdev) { @@ -338,6 +343,8 @@ int v9fs_init_inode(struct v9fs_session_info *v9ses, err = -EINVAL; goto error; } + + v9fs_set_netfs_context(inode); error: return err; diff --git a/fs/afs/dynroot.c b/fs/afs/dynroot.c index db832cc931c8..f120bcb8bf73 100644 --- a/fs/afs/dynroot.c +++ b/fs/afs/dynroot.c @@ -76,6 +76,7 @@ struct inode *afs_iget_pseudo_dir(struct super_block *sb, bool root) /* there shouldn't be an existing inode */ BUG_ON(!(inode->i_state & I_NEW)); + netfs_i_context_init(inode, NULL); inode->i_size = 0; inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO; if (root) { diff --git a/fs/afs/file.c b/fs/afs/file.c index 0f9fdb284a20..26292a110a8f 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -19,13 +19,11 @@ #include "internal.h" static int afs_file_mmap(struct file *file, struct vm_area_struct *vma); -static int afs_readpage(struct file *file, struct page *page); static int afs_symlink_readpage(struct file *file, struct page *page); static void afs_invalidate_folio(struct folio *folio, size_t offset, size_t length); static int afs_releasepage(struct page *page, gfp_t gfp_flags); -static void afs_readahead(struct readahead_control *ractl); static ssize_t afs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter); static void afs_vm_open(struct vm_area_struct *area); static void afs_vm_close(struct vm_area_struct *area); @@ -52,8 +50,8 @@ const struct inode_operations afs_file_inode_operations = { }; const struct address_space_operations afs_file_aops = { - .readpage = afs_readpage, - .readahead = afs_readahead, + .readpage = netfs_readpage, + .readahead = netfs_readahead, .dirty_folio = afs_dirty_folio, .launder_folio = afs_launder_folio, .releasepage = afs_releasepage, @@ -240,7 +238,7 @@ void afs_put_read(struct afs_read *req) static void afs_fetch_data_notify(struct afs_operation *op) { struct afs_read *req = op->fetch.req; - struct netfs_read_subrequest *subreq = req->subreq; + struct netfs_io_subrequest *subreq = req->subreq; int error = op->error; if (error == -ECONNABORTED) @@ -310,7 +308,7 @@ int afs_fetch_data(struct afs_vnode *vnode, struct afs_read *req) return afs_do_sync_operation(op); } -static void afs_req_issue_op(struct netfs_read_subrequest *subreq) +static void afs_issue_read(struct netfs_io_subrequest *subreq) { struct afs_vnode *vnode = AFS_FS_I(subreq->rreq->inode); struct afs_read *fsreq; @@ -359,19 +357,13 @@ static int afs_symlink_readpage(struct file *file, struct page *page) return ret; } -static void afs_init_rreq(struct netfs_read_request *rreq, struct file *file) +static int afs_init_request(struct netfs_io_request *rreq, struct file *file) { rreq->netfs_priv = key_get(afs_file_key(file)); + return 0; } -static bool afs_is_cache_enabled(struct inode *inode) -{ - struct fscache_cookie *cookie = afs_vnode_cache(AFS_FS_I(inode)); - - return fscache_cookie_enabled(cookie) && cookie->cache_priv; -} - -static int afs_begin_cache_operation(struct netfs_read_request *rreq) +static int afs_begin_cache_operation(struct netfs_io_request *rreq) { #ifdef CONFIG_AFS_FSCACHE struct afs_vnode *vnode = AFS_FS_I(rreq->inode); @@ -396,27 +388,14 @@ static void afs_priv_cleanup(struct address_space *mapping, void *netfs_priv) key_put(netfs_priv); } -const struct netfs_read_request_ops afs_req_ops = { - .init_rreq = afs_init_rreq, - .is_cache_enabled = afs_is_cache_enabled, +const struct netfs_request_ops afs_req_ops = { + .init_request = afs_init_request, .begin_cache_operation = afs_begin_cache_operation, .check_write_begin = afs_check_write_begin, - .issue_op = afs_req_issue_op, + .issue_read = afs_issue_read, .cleanup = afs_priv_cleanup, }; -static int afs_readpage(struct file *file, struct page *page) -{ - struct folio *folio = page_folio(page); - - return netfs_readpage(file, folio, &afs_req_ops, NULL); -} - -static void afs_readahead(struct readahead_control *ractl) -{ - netfs_readahead(ractl, &afs_req_ops, NULL); -} - int afs_write_inode(struct inode *inode, struct writeback_control *wbc) { fscache_unpin_writeback(wbc, afs_vnode_cache(AFS_FS_I(inode))); diff --git a/fs/afs/inode.c b/fs/afs/inode.c index 5964f8aee090..2fe402483ad5 100644 --- a/fs/afs/inode.c +++ b/fs/afs/inode.c @@ -54,6 +54,14 @@ static noinline void dump_vnode(struct afs_vnode *vnode, struct afs_vnode *paren } /* + * Set parameters for the netfs library + */ +static void afs_set_netfs_context(struct afs_vnode *vnode) +{ + netfs_i_context_init(&vnode->vfs_inode, &afs_req_ops); +} + +/* * Initialise an inode from the vnode status. */ static int afs_inode_init_from_status(struct afs_operation *op, @@ -128,6 +136,7 @@ static int afs_inode_init_from_status(struct afs_operation *op, } afs_set_i_size(vnode, status->size); + afs_set_netfs_context(vnode); vnode->invalid_before = status->data_version; inode_set_iversion_raw(&vnode->vfs_inode, status->data_version); @@ -237,6 +246,7 @@ static void afs_apply_status(struct afs_operation *op, * idea of what the size should be that's not the same as * what's on the server. */ + vnode->netfs_ctx.remote_i_size = status->size; if (change_size) { afs_set_i_size(vnode, status->size); inode->i_ctime = t; @@ -420,7 +430,7 @@ static void afs_get_inode_cache(struct afs_vnode *vnode) struct afs_vnode_cache_aux aux; if (vnode->status.type != AFS_FTYPE_FILE) { - vnode->cache = NULL; + vnode->netfs_ctx.cache = NULL; return; } @@ -430,12 +440,14 @@ static void afs_get_inode_cache(struct afs_vnode *vnode) key.vnode_id_ext[1] = htonl(vnode->fid.vnode_hi); afs_set_cache_aux(vnode, &aux); - vnode->cache = fscache_acquire_cookie( - vnode->volume->cache, - vnode->status.type == AFS_FTYPE_FILE ? 0 : FSCACHE_ADV_SINGLE_CHUNK, - &key, sizeof(key), - &aux, sizeof(aux), - vnode->status.size); + afs_vnode_set_cache(vnode, + fscache_acquire_cookie( + vnode->volume->cache, + vnode->status.type == AFS_FTYPE_FILE ? + 0 : FSCACHE_ADV_SINGLE_CHUNK, + &key, sizeof(key), + &aux, sizeof(aux), + vnode->status.size)); #endif } @@ -528,6 +540,7 @@ struct inode *afs_root_iget(struct super_block *sb, struct key *key) vnode = AFS_FS_I(inode); vnode->cb_v_break = as->volume->cb_v_break, + afs_set_netfs_context(vnode); op = afs_alloc_operation(key, as->volume); if (IS_ERR(op)) { @@ -786,11 +799,8 @@ void afs_evict_inode(struct inode *inode) afs_put_wb_key(wbk); } -#ifdef CONFIG_AFS_FSCACHE - fscache_relinquish_cookie(vnode->cache, + fscache_relinquish_cookie(afs_vnode_cache(vnode), test_bit(AFS_VNODE_DELETED, &vnode->flags)); - vnode->cache = NULL; -#endif afs_prune_wb_keys(vnode); afs_put_permits(rcu_access_pointer(vnode->permit_cache)); diff --git a/fs/afs/internal.h b/fs/afs/internal.h index dc5032e10244..7b7ef945dc78 100644 --- a/fs/afs/internal.h +++ b/fs/afs/internal.h @@ -207,7 +207,7 @@ struct afs_read { loff_t file_size; /* File size returned by server */ struct key *key; /* The key to use to reissue the read */ struct afs_vnode *vnode; /* The file being read into. */ - struct netfs_read_subrequest *subreq; /* Fscache helper read request this belongs to */ + struct netfs_io_subrequest *subreq; /* Fscache helper read request this belongs to */ afs_dataversion_t data_version; /* Version number returned by server */ refcount_t usage; unsigned int call_debug_id; @@ -619,15 +619,16 @@ enum afs_lock_state { * leak from one inode to another. */ struct afs_vnode { - struct inode vfs_inode; /* the VFS's inode record */ + struct { + /* These must be contiguous */ + struct inode vfs_inode; /* the VFS's inode record */ + struct netfs_i_context netfs_ctx; /* Netfslib context */ + }; struct afs_volume *volume; /* volume on which vnode resides */ struct afs_fid fid; /* the file identifier for this inode */ struct afs_file_status status; /* AFS status info for this file */ afs_dataversion_t invalid_before; /* Child dentries are invalid before this */ -#ifdef CONFIG_AFS_FSCACHE - struct fscache_cookie *cache; /* caching cookie */ -#endif struct afs_permits __rcu *permit_cache; /* cache of permits so far obtained */ struct mutex io_lock; /* Lock for serialising I/O on this mutex */ struct rw_semaphore validate_lock; /* lock for validating this vnode */ @@ -674,12 +675,20 @@ struct afs_vnode { static inline struct fscache_cookie *afs_vnode_cache(struct afs_vnode *vnode) { #ifdef CONFIG_AFS_FSCACHE - return vnode->cache; + return netfs_i_cookie(&vnode->vfs_inode); #else return NULL; #endif } +static inline void afs_vnode_set_cache(struct afs_vnode *vnode, + struct fscache_cookie *cookie) +{ +#ifdef CONFIG_AFS_FSCACHE + vnode->netfs_ctx.cache = cookie; +#endif +} + /* * cached security record for one user's attempt to access a vnode */ @@ -1063,7 +1072,7 @@ extern const struct address_space_operations afs_file_aops; extern const struct address_space_operations afs_symlink_aops; extern const struct inode_operations afs_file_inode_operations; extern const struct file_operations afs_file_operations; -extern const struct netfs_read_request_ops afs_req_ops; +extern const struct netfs_request_ops afs_req_ops; extern int afs_cache_wb_key(struct afs_vnode *, struct afs_file *); extern void afs_put_wb_key(struct afs_wb_key *); diff --git a/fs/afs/super.c b/fs/afs/super.c index 7592c0f469f1..1fea195b0b27 100644 --- a/fs/afs/super.c +++ b/fs/afs/super.c @@ -688,13 +688,11 @@ static struct inode *afs_alloc_inode(struct super_block *sb) /* Reset anything that shouldn't leak from one inode to the next. */ memset(&vnode->fid, 0, sizeof(vnode->fid)); memset(&vnode->status, 0, sizeof(vnode->status)); + afs_vnode_set_cache(vnode, NULL); vnode->volume = NULL; vnode->lock_key = NULL; vnode->permit_cache = NULL; -#ifdef CONFIG_AFS_FSCACHE - vnode->cache = NULL; -#endif vnode->flags = 1 << AFS_VNODE_UNSET; vnode->lock_state = AFS_VNODE_LOCK_NONE; diff --git a/fs/afs/write.c b/fs/afs/write.c index e1c17081d18e..6bcf1475511b 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -60,8 +60,7 @@ int afs_write_begin(struct file *file, struct address_space *mapping, * file. We need to do this before we get a lock on the page in case * there's more than one writer competing for the same cache block. */ - ret = netfs_write_begin(file, mapping, pos, len, flags, &folio, fsdata, - &afs_req_ops, NULL); + ret = netfs_write_begin(file, mapping, pos, len, flags, &folio, fsdata); if (ret < 0) return ret; @@ -355,9 +354,10 @@ static const struct afs_operation_ops afs_store_data_operation = { static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t pos, bool laundering) { + struct netfs_i_context *ictx = &vnode->netfs_ctx; struct afs_operation *op; struct afs_wb_key *wbk = NULL; - loff_t size = iov_iter_count(iter), i_size; + loff_t size = iov_iter_count(iter); int ret = -ENOKEY; _enter("%s{%llx:%llu.%u},%llx,%llx", @@ -379,15 +379,13 @@ static int afs_store_data(struct afs_vnode *vnode, struct iov_iter *iter, loff_t return -ENOMEM; } - i_size = i_size_read(&vnode->vfs_inode); - afs_op_set_vnode(op, 0, vnode); op->file[0].dv_delta = 1; op->file[0].modification = true; op->store.write_iter = iter; op->store.pos = pos; op->store.size = size; - op->store.i_size = max(pos + size, i_size); + op->store.i_size = max(pos + size, ictx->remote_i_size); op->store.laundering = laundering; op->mtime = vnode->vfs_inode.i_mtime; op->flags |= AFS_OPERATION_UNINTR; diff --git a/fs/cachefiles/io.c b/fs/cachefiles/io.c index bc7c7a7d9260..9dc81e781f2b 100644 --- a/fs/cachefiles/io.c +++ b/fs/cachefiles/io.c @@ -380,18 +380,18 @@ presubmission_error: * Prepare a read operation, shortening it to a cached/uncached * boundary as appropriate. */ -static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subrequest *subreq, +static enum netfs_io_source cachefiles_prepare_read(struct netfs_io_subrequest *subreq, loff_t i_size) { enum cachefiles_prepare_read_trace why; - struct netfs_read_request *rreq = subreq->rreq; + struct netfs_io_request *rreq = subreq->rreq; struct netfs_cache_resources *cres = &rreq->cache_resources; struct cachefiles_object *object; struct cachefiles_cache *cache; struct fscache_cookie *cookie = fscache_cres_cookie(cres); const struct cred *saved_cred; struct file *file = cachefiles_cres_file(cres); - enum netfs_read_source ret = NETFS_DOWNLOAD_FROM_SERVER; + enum netfs_io_source ret = NETFS_DOWNLOAD_FROM_SERVER; loff_t off, to; ino_t ino = file ? file_inode(file)->i_ino : 0; @@ -404,7 +404,7 @@ static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subreque } if (test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags)) { - __set_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); + __set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); why = cachefiles_trace_read_no_data; goto out_no_object; } @@ -473,7 +473,7 @@ static enum netfs_read_source cachefiles_prepare_read(struct netfs_read_subreque goto out; download_and_store: - __set_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); + __set_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); out: cachefiles_end_secure(cache, saved_cred); out_no_object: diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index c7a0ab0d298b..aa25bffd4823 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -182,7 +182,7 @@ static int ceph_releasepage(struct page *page, gfp_t gfp) return 1; } -static void ceph_netfs_expand_readahead(struct netfs_read_request *rreq) +static void ceph_netfs_expand_readahead(struct netfs_io_request *rreq) { struct inode *inode = rreq->inode; struct ceph_inode_info *ci = ceph_inode(inode); @@ -199,7 +199,7 @@ static void ceph_netfs_expand_readahead(struct netfs_read_request *rreq) rreq->len = roundup(rreq->len, lo->stripe_unit); } -static bool ceph_netfs_clamp_length(struct netfs_read_subrequest *subreq) +static bool ceph_netfs_clamp_length(struct netfs_io_subrequest *subreq) { struct inode *inode = subreq->rreq->inode; struct ceph_fs_client *fsc = ceph_inode_to_client(inode); @@ -218,7 +218,7 @@ static void finish_netfs_read(struct ceph_osd_request *req) { struct ceph_fs_client *fsc = ceph_inode_to_client(req->r_inode); struct ceph_osd_data *osd_data = osd_req_op_extent_osd_data(req, 0); - struct netfs_read_subrequest *subreq = req->r_priv; + struct netfs_io_subrequest *subreq = req->r_priv; int num_pages; int err = req->r_result; @@ -244,9 +244,9 @@ static void finish_netfs_read(struct ceph_osd_request *req) iput(req->r_inode); } -static bool ceph_netfs_issue_op_inline(struct netfs_read_subrequest *subreq) +static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq) { - struct netfs_read_request *rreq = subreq->rreq; + struct netfs_io_request *rreq = subreq->rreq; struct inode *inode = rreq->inode; struct ceph_mds_reply_info_parsed *rinfo; struct ceph_mds_reply_info_in *iinfo; @@ -258,7 +258,7 @@ static bool ceph_netfs_issue_op_inline(struct netfs_read_subrequest *subreq) size_t len; __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); - __clear_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); + __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); if (subreq->start >= inode->i_size) goto out; @@ -297,9 +297,9 @@ out: return true; } -static void ceph_netfs_issue_op(struct netfs_read_subrequest *subreq) +static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) { - struct netfs_read_request *rreq = subreq->rreq; + struct netfs_io_request *rreq = subreq->rreq; struct inode *inode = rreq->inode; struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); @@ -353,6 +353,45 @@ out: dout("%s: result %d\n", __func__, err); } +static int ceph_init_request(struct netfs_io_request *rreq, struct file *file) +{ + struct inode *inode = rreq->inode; + int got = 0, want = CEPH_CAP_FILE_CACHE; + int ret = 0; + + if (rreq->origin != NETFS_READAHEAD) + return 0; + + if (file) { + struct ceph_rw_context *rw_ctx; + struct ceph_file_info *fi = file->private_data; + + rw_ctx = ceph_find_rw_context(fi); + if (rw_ctx) + return 0; + } + + /* + * readahead callers do not necessarily hold Fcb caps + * (e.g. fadvise, madvise). + */ + ret = ceph_try_get_caps(inode, CEPH_CAP_FILE_RD, want, true, &got); + if (ret < 0) { + dout("start_read %p, error getting cap\n", inode); + return ret; + } + + if (!(got & want)) { + dout("start_read %p, no cache cap\n", inode); + return -EACCES; + } + if (ret == 0) + return -EACCES; + + rreq->netfs_priv = (void *)(uintptr_t)got; + return 0; +} + static void ceph_readahead_cleanup(struct address_space *mapping, void *priv) { struct inode *inode = mapping->host; @@ -363,64 +402,16 @@ static void ceph_readahead_cleanup(struct address_space *mapping, void *priv) ceph_put_cap_refs(ci, got); } -static const struct netfs_read_request_ops ceph_netfs_read_ops = { - .is_cache_enabled = ceph_is_cache_enabled, +const struct netfs_request_ops ceph_netfs_ops = { + .init_request = ceph_init_request, .begin_cache_operation = ceph_begin_cache_operation, - .issue_op = ceph_netfs_issue_op, + .issue_read = ceph_netfs_issue_read, .expand_readahead = ceph_netfs_expand_readahead, .clamp_length = ceph_netfs_clamp_length, .check_write_begin = ceph_netfs_check_write_begin, .cleanup = ceph_readahead_cleanup, }; -/* read a single page, without unlocking it. */ -static int ceph_readpage(struct file *file, struct page *subpage) -{ - struct folio *folio = page_folio(subpage); - struct inode *inode = file_inode(file); - struct ceph_inode_info *ci = ceph_inode(inode); - struct ceph_vino vino = ceph_vino(inode); - size_t len = folio_size(folio); - u64 off = folio_file_pos(folio); - - dout("readpage ino %llx.%llx file %p off %llu len %zu folio %p index %lu\n inline %d", - vino.ino, vino.snap, file, off, len, folio, folio_index(folio), - ci->i_inline_version != CEPH_INLINE_NONE); - - return netfs_readpage(file, folio, &ceph_netfs_read_ops, NULL); -} - -static void ceph_readahead(struct readahead_control *ractl) -{ - struct inode *inode = file_inode(ractl->file); - struct ceph_file_info *fi = ractl->file->private_data; - struct ceph_rw_context *rw_ctx; - int got = 0; - int ret = 0; - - if (ceph_inode(inode)->i_inline_version != CEPH_INLINE_NONE) - return; - - rw_ctx = ceph_find_rw_context(fi); - if (!rw_ctx) { - /* - * readahead callers do not necessarily hold Fcb caps - * (e.g. fadvise, madvise). - */ - int want = CEPH_CAP_FILE_CACHE; - - ret = ceph_try_get_caps(inode, CEPH_CAP_FILE_RD, want, true, &got); - if (ret < 0) - dout("start_read %p, error getting cap\n", inode); - else if (!(got & want)) - dout("start_read %p, no cache cap\n", inode); - - if (ret <= 0) - return; - } - netfs_readahead(ractl, &ceph_netfs_read_ops, (void *)(uintptr_t)got); -} - #ifdef CONFIG_CEPH_FSCACHE static void ceph_set_page_fscache(struct page *page) { @@ -1327,8 +1318,7 @@ static int ceph_write_begin(struct file *file, struct address_space *mapping, struct folio *folio = NULL; int r; - r = netfs_write_begin(file, inode->i_mapping, pos, len, 0, &folio, NULL, - &ceph_netfs_read_ops, NULL); + r = netfs_write_begin(file, inode->i_mapping, pos, len, 0, &folio, NULL); if (r == 0) folio_wait_fscache(folio); if (r < 0) { @@ -1382,8 +1372,8 @@ out: } const struct address_space_operations ceph_aops = { - .readpage = ceph_readpage, - .readahead = ceph_readahead, + .readpage = netfs_readpage, + .readahead = netfs_readahead, .writepage = ceph_writepage, .writepages = ceph_writepages_start, .write_begin = ceph_write_begin, diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c index 7d22850623ef..ddea99922073 100644 --- a/fs/ceph/cache.c +++ b/fs/ceph/cache.c @@ -29,26 +29,25 @@ void ceph_fscache_register_inode_cookie(struct inode *inode) if (!(inode->i_state & I_NEW)) return; - WARN_ON_ONCE(ci->fscache); + WARN_ON_ONCE(ci->netfs_ctx.cache); - ci->fscache = fscache_acquire_cookie(fsc->fscache, 0, - &ci->i_vino, sizeof(ci->i_vino), - &ci->i_version, sizeof(ci->i_version), - i_size_read(inode)); + ci->netfs_ctx.cache = + fscache_acquire_cookie(fsc->fscache, 0, + &ci->i_vino, sizeof(ci->i_vino), + &ci->i_version, sizeof(ci->i_version), + i_size_read(inode)); } -void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info* ci) +void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info *ci) { - struct fscache_cookie *cookie = ci->fscache; - - fscache_relinquish_cookie(cookie, false); + fscache_relinquish_cookie(ceph_fscache_cookie(ci), false); } void ceph_fscache_use_cookie(struct inode *inode, bool will_modify) { struct ceph_inode_info *ci = ceph_inode(inode); - fscache_use_cookie(ci->fscache, will_modify); + fscache_use_cookie(ceph_fscache_cookie(ci), will_modify); } void ceph_fscache_unuse_cookie(struct inode *inode, bool update) @@ -58,9 +57,10 @@ void ceph_fscache_unuse_cookie(struct inode *inode, bool update) if (update) { loff_t i_size = i_size_read(inode); - fscache_unuse_cookie(ci->fscache, &ci->i_version, &i_size); + fscache_unuse_cookie(ceph_fscache_cookie(ci), + &ci->i_version, &i_size); } else { - fscache_unuse_cookie(ci->fscache, NULL, NULL); + fscache_unuse_cookie(ceph_fscache_cookie(ci), NULL, NULL); } } @@ -69,14 +69,14 @@ void ceph_fscache_update(struct inode *inode) struct ceph_inode_info *ci = ceph_inode(inode); loff_t i_size = i_size_read(inode); - fscache_update_cookie(ci->fscache, &ci->i_version, &i_size); + fscache_update_cookie(ceph_fscache_cookie(ci), &ci->i_version, &i_size); } void ceph_fscache_invalidate(struct inode *inode, bool dio_write) { struct ceph_inode_info *ci = ceph_inode(inode); - fscache_invalidate(ceph_inode(inode)->fscache, + fscache_invalidate(ceph_fscache_cookie(ci), &ci->i_version, i_size_read(inode), dio_write ? FSCACHE_INVAL_DIO_WRITE : 0); } diff --git a/fs/ceph/cache.h b/fs/ceph/cache.h index b90f3016994d..7255b790a4c1 100644 --- a/fs/ceph/cache.h +++ b/fs/ceph/cache.h @@ -26,14 +26,9 @@ void ceph_fscache_unuse_cookie(struct inode *inode, bool update); void ceph_fscache_update(struct inode *inode); void ceph_fscache_invalidate(struct inode *inode, bool dio_write); -static inline void ceph_fscache_inode_init(struct ceph_inode_info *ci) -{ - ci->fscache = NULL; -} - static inline struct fscache_cookie *ceph_fscache_cookie(struct ceph_inode_info *ci) { - return ci->fscache; + return netfs_i_cookie(&ci->vfs_inode); } static inline void ceph_fscache_resize(struct inode *inode, loff_t to) @@ -62,7 +57,7 @@ static inline int ceph_fscache_dirty_folio(struct address_space *mapping, return fscache_dirty_folio(mapping, folio, ceph_fscache_cookie(ci)); } -static inline int ceph_begin_cache_operation(struct netfs_read_request *rreq) +static inline int ceph_begin_cache_operation(struct netfs_io_request *rreq) { struct fscache_cookie *cookie = ceph_fscache_cookie(ceph_inode(rreq->inode)); @@ -91,10 +86,6 @@ static inline void ceph_fscache_unregister_fs(struct ceph_fs_client* fsc) { } -static inline void ceph_fscache_inode_init(struct ceph_inode_info *ci) -{ -} - static inline void ceph_fscache_register_inode_cookie(struct inode *inode) { } @@ -144,7 +135,7 @@ static inline bool ceph_is_cache_enabled(struct inode *inode) return false; } -static inline int ceph_begin_cache_operation(struct netfs_read_request *rreq) +static inline int ceph_begin_cache_operation(struct netfs_io_request *rreq) { return -ENOBUFS; } diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c index d80911dc91c2..63113e2a4890 100644 --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@ -459,6 +459,9 @@ struct inode *ceph_alloc_inode(struct super_block *sb) dout("alloc_inode %p\n", &ci->vfs_inode); + /* Set parameters for the netfs library */ + netfs_i_context_init(&ci->vfs_inode, &ceph_netfs_ops); + spin_lock_init(&ci->i_ceph_lock); ci->i_version = 0; @@ -544,9 +547,6 @@ struct inode *ceph_alloc_inode(struct super_block *sb) INIT_WORK(&ci->i_work, ceph_inode_work); ci->i_work_mask = 0; memset(&ci->i_btime, '\0', sizeof(ci->i_btime)); - - ceph_fscache_inode_init(ci); - return &ci->vfs_inode; } diff --git a/fs/ceph/super.h b/fs/ceph/super.h index a1ecc410a495..20ceab74e871 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -17,13 +17,11 @@ #include <linux/posix_acl.h> #include <linux/refcount.h> #include <linux/security.h> +#include <linux/netfs.h> +#include <linux/fscache.h> #include <linux/ceph/libceph.h> -#ifdef CONFIG_CEPH_FSCACHE -#include <linux/fscache.h> -#endif - /* large granularity for statfs utilization stats to facilitate * large volume sizes on 32-bit machines. */ #define CEPH_BLOCK_SHIFT 22 /* 4 MB */ @@ -318,6 +316,11 @@ struct ceph_inode_xattrs_info { * Ceph inode. */ struct ceph_inode_info { + struct { + /* These must be contiguous */ + struct inode vfs_inode; + struct netfs_i_context netfs_ctx; /* Netfslib context */ + }; struct ceph_vino i_vino; /* ceph ino + snap */ spinlock_t i_ceph_lock; @@ -428,11 +431,6 @@ struct ceph_inode_info { struct work_struct i_work; unsigned long i_work_mask; - -#ifdef CONFIG_CEPH_FSCACHE - struct fscache_cookie *fscache; -#endif - struct inode vfs_inode; /* at end */ }; static inline struct ceph_inode_info * @@ -1216,6 +1214,7 @@ extern void __ceph_touch_fmode(struct ceph_inode_info *ci, /* addr.c */ extern const struct address_space_operations ceph_aops; +extern const struct netfs_request_ops ceph_netfs_ops; extern int ceph_mmap(struct file *file, struct vm_area_struct *vma); extern int ceph_uninline_data(struct file *file); extern int ceph_pool_perm_check(struct inode *inode, int need); diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 48b343d03430..0a4085ced40f 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -16,6 +16,7 @@ #include <linux/mempool.h> #include <linux/workqueue.h> #include <linux/utsname.h> +#include <linux/netfs.h> #include "cifs_fs_sb.h" #include "cifsacl.h" #include <crypto/internal/hash.h> @@ -1402,6 +1403,11 @@ void cifsFileInfo_put(struct cifsFileInfo *cifs_file); */ struct cifsInodeInfo { + struct { + /* These must be contiguous */ + struct inode vfs_inode; /* the VFS's inode record */ + struct netfs_i_context netfs_ctx; /* Netfslib context */ + }; bool can_cache_brlcks; struct list_head llist; /* locks helb by this inode */ /* @@ -1432,10 +1438,6 @@ struct cifsInodeInfo { u64 uniqueid; /* server inode number */ u64 createtime; /* creation time on server */ __u8 lease_key[SMB2_LEASE_KEY_SIZE]; /* lease key for this inode */ -#ifdef CONFIG_CIFS_FSCACHE - struct fscache_cookie *fscache; -#endif - struct inode vfs_inode; struct list_head deferred_closes; /* list of deferred closes */ spinlock_t deferred_lock; /* protection on deferred list */ bool lease_granted; /* Flag to indicate whether lease or oplock is granted. */ diff --git a/fs/cifs/fscache.c b/fs/cifs/fscache.c index 33af72e0ac0c..a638b29e9062 100644 --- a/fs/cifs/fscache.c +++ b/fs/cifs/fscache.c @@ -103,7 +103,7 @@ void cifs_fscache_get_inode_cookie(struct inode *inode) cifs_fscache_fill_coherency(&cifsi->vfs_inode, &cd); - cifsi->fscache = + cifsi->netfs_ctx.cache = fscache_acquire_cookie(tcon->fscache, 0, &cifsi->uniqueid, sizeof(cifsi->uniqueid), &cd, sizeof(cd), @@ -126,22 +126,15 @@ void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update) void cifs_fscache_release_inode_cookie(struct inode *inode) { struct cifsInodeInfo *cifsi = CIFS_I(inode); + struct fscache_cookie *cookie = cifs_inode_cookie(inode); - if (cifsi->fscache) { - cifs_dbg(FYI, "%s: (0x%p)\n", __func__, cifsi->fscache); - fscache_relinquish_cookie(cifsi->fscache, false); - cifsi->fscache = NULL; + if (cookie) { + cifs_dbg(FYI, "%s: (0x%p)\n", __func__, cookie); + fscache_relinquish_cookie(cookie, false); + cifsi->netfs_ctx.cache = NULL; } } -static inline void fscache_end_operation(struct netfs_cache_resources *cres) -{ - const struct netfs_cache_ops *ops = fscache_operation_valid(cres); - - if (ops) - ops->end_operation(cres); -} - /* * Fallback page reading interface. */ diff --git a/fs/cifs/fscache.h b/fs/cifs/fscache.h index 55129908e2c1..52355c0912ae 100644 --- a/fs/cifs/fscache.h +++ b/fs/cifs/fscache.h @@ -61,7 +61,7 @@ void cifs_fscache_fill_coherency(struct inode *inode, static inline struct fscache_cookie *cifs_inode_cookie(struct inode *inode) { - return CIFS_I(inode)->fscache; + return netfs_i_cookie(inode); } static inline void cifs_invalidate_cache(struct inode *inode, unsigned int flags) diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h index f121c21590dc..ed1c9ed737f2 100644 --- a/fs/fscache/internal.h +++ b/fs/fscache/internal.h @@ -71,17 +71,6 @@ static inline void fscache_see_cookie(struct fscache_cookie *cookie, } /* - * io.c - */ -static inline void fscache_end_operation(struct netfs_cache_resources *cres) -{ - const struct netfs_cache_ops *ops = fscache_operation_valid(cres); - - if (ops) - ops->end_operation(cres); -} - -/* * main.c */ extern unsigned fscache_debug; diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index c15bfc966d96..f684c0cd1ec5 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -1,5 +1,11 @@ # SPDX-License-Identifier: GPL-2.0 -netfs-y := read_helper.o stats.o +netfs-y := \ + buffered_read.o \ + io.o \ + main.o \ + objects.o + +netfs-$(CONFIG_NETFS_STATS) += stats.o obj-$(CONFIG_NETFS_SUPPORT) := netfs.o diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c new file mode 100644 index 000000000000..281a88a5b8dc --- /dev/null +++ b/fs/netfs/buffered_read.c @@ -0,0 +1,428 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Network filesystem high-level buffered read support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/export.h> +#include <linux/task_io_accounting_ops.h> +#include "internal.h" + +/* + * Unlock the folios in a read operation. We need to set PG_fscache on any + * folios we're going to write back before we unlock them. + */ +void netfs_rreq_unlock_folios(struct netfs_io_request *rreq) +{ + struct netfs_io_subrequest *subreq; + struct folio *folio; + unsigned int iopos, account = 0; + pgoff_t start_page = rreq->start / PAGE_SIZE; + pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1; + bool subreq_failed = false; + + XA_STATE(xas, &rreq->mapping->i_pages, start_page); + + if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) { + __clear_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags); + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags); + } + } + + /* Walk through the pagecache and the I/O request lists simultaneously. + * We may have a mixture of cached and uncached sections and we only + * really want to write out the uncached sections. This is slightly + * complicated by the possibility that we might have huge pages with a + * mixture inside. + */ + subreq = list_first_entry(&rreq->subrequests, + struct netfs_io_subrequest, rreq_link); + iopos = 0; + subreq_failed = (subreq->error < 0); + + trace_netfs_rreq(rreq, netfs_rreq_trace_unlock); + + rcu_read_lock(); + xas_for_each(&xas, folio, last_page) { + unsigned int pgpos = (folio_index(folio) - start_page) * PAGE_SIZE; + unsigned int pgend = pgpos + folio_size(folio); + bool pg_failed = false; + + for (;;) { + if (!subreq) { + pg_failed = true; + break; + } + if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) + folio_start_fscache(folio); + pg_failed |= subreq_failed; + if (pgend < iopos + subreq->len) + break; + + account += subreq->transferred; + iopos += subreq->len; + if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + subreq = list_next_entry(subreq, rreq_link); + subreq_failed = (subreq->error < 0); + } else { + subreq = NULL; + subreq_failed = false; + } + if (pgend == iopos) + break; + } + + if (!pg_failed) { + flush_dcache_folio(folio); + folio_mark_uptodate(folio); + } + + if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) { + if (folio_index(folio) == rreq->no_unlock_folio && + test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags)) + _debug("no unlock"); + else + folio_unlock(folio); + } + } + rcu_read_unlock(); + + task_io_account_read(account); + if (rreq->netfs_ops->done) + rreq->netfs_ops->done(rreq); +} + +static void netfs_cache_expand_readahead(struct netfs_io_request *rreq, + loff_t *_start, size_t *_len, loff_t i_size) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + + if (cres->ops && cres->ops->expand_readahead) + cres->ops->expand_readahead(cres, _start, _len, i_size); +} + +static void netfs_rreq_expand(struct netfs_io_request *rreq, + struct readahead_control *ractl) +{ + /* Give the cache a chance to change the request parameters. The + * resultant request must contain the original region. + */ + netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size); + + /* Give the netfs a chance to change the request parameters. The + * resultant request must contain the original region. + */ + if (rreq->netfs_ops->expand_readahead) + rreq->netfs_ops->expand_readahead(rreq); + + /* Expand the request if the cache wants it to start earlier. Note + * that the expansion may get further extended if the VM wishes to + * insert THPs and the preferred start and/or end wind up in the middle + * of THPs. + * + * If this is the case, however, the THP size should be an integer + * multiple of the cache granule size, so we get a whole number of + * granules to deal with. + */ + if (rreq->start != readahead_pos(ractl) || + rreq->len != readahead_length(ractl)) { + readahead_expand(ractl, rreq->start, rreq->len); + rreq->start = readahead_pos(ractl); + rreq->len = readahead_length(ractl); + + trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), + netfs_read_trace_expanded); + } +} + +/** + * netfs_readahead - Helper to manage a read request + * @ractl: The description of the readahead request + * + * Fulfil a readahead request by drawing data from the cache if possible, or + * the netfs if not. Space beyond the EOF is zero-filled. Multiple I/O + * requests from different sources will get munged together. If necessary, the + * readahead window can be expanded in either direction to a more convenient + * alighment for RPC efficiency or to make storage in the cache feasible. + * + * The calling netfs must initialise a netfs context contiguous to the vfs + * inode before calling this. + * + * This is usable whether or not caching is enabled. + */ +void netfs_readahead(struct readahead_control *ractl) +{ + struct netfs_io_request *rreq; + struct netfs_i_context *ctx = netfs_i_context(ractl->mapping->host); + int ret; + + _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); + + if (readahead_count(ractl) == 0) + return; + + rreq = netfs_alloc_request(ractl->mapping, ractl->file, + readahead_pos(ractl), + readahead_length(ractl), + NETFS_READAHEAD); + if (IS_ERR(rreq)) + return; + + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto cleanup_free; + } + + netfs_stat(&netfs_n_rh_readahead); + trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), + netfs_read_trace_readahead); + + netfs_rreq_expand(rreq, ractl); + + /* Drop the refs on the folios here rather than in the cache or + * filesystem. The locks will be dropped in netfs_rreq_unlock(). + */ + while (readahead_folio(ractl)) + ; + + netfs_begin_read(rreq, false); + return; + +cleanup_free: + netfs_put_request(rreq, false, netfs_rreq_trace_put_failed); + return; +} +EXPORT_SYMBOL(netfs_readahead); + +/** + * netfs_readpage - Helper to manage a readpage request + * @file: The file to read from + * @subpage: A subpage of the folio to read + * + * Fulfil a readpage request by drawing data from the cache if possible, or the + * netfs if not. Space beyond the EOF is zero-filled. Multiple I/O requests + * from different sources will get munged together. + * + * The calling netfs must initialise a netfs context contiguous to the vfs + * inode before calling this. + * + * This is usable whether or not caching is enabled. + */ +int netfs_readpage(struct file *file, struct page *subpage) +{ + struct folio *folio = page_folio(subpage); + struct address_space *mapping = folio_file_mapping(folio); + struct netfs_io_request *rreq; + struct netfs_i_context *ctx = netfs_i_context(mapping->host); + int ret; + + _enter("%lx", folio_index(folio)); + + rreq = netfs_alloc_request(mapping, file, + folio_file_pos(folio), folio_size(folio), + NETFS_READPAGE); + if (IS_ERR(rreq)) { + ret = PTR_ERR(rreq); + goto alloc_error; + } + + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto discard; + } + + netfs_stat(&netfs_n_rh_readpage); + trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); + return netfs_begin_read(rreq, true); + +discard: + netfs_put_request(rreq, false, netfs_rreq_trace_put_discard); +alloc_error: + folio_unlock(folio); + return ret; +} +EXPORT_SYMBOL(netfs_readpage); + +/* + * Prepare a folio for writing without reading first + * @folio: The folio being prepared + * @pos: starting position for the write + * @len: length of write + * @always_fill: T if the folio should always be completely filled/cleared + * + * In some cases, write_begin doesn't need to read at all: + * - full folio write + * - write that lies in a folio that is completely beyond EOF + * - write that covers the folio from start to EOF or beyond it + * + * If any of these criteria are met, then zero out the unwritten parts + * of the folio and return true. Otherwise, return false. + */ +static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len, + bool always_fill) +{ + struct inode *inode = folio_inode(folio); + loff_t i_size = i_size_read(inode); + size_t offset = offset_in_folio(folio, pos); + size_t plen = folio_size(folio); + + if (unlikely(always_fill)) { + if (pos - offset + len <= i_size) + return false; /* Page entirely before EOF */ + zero_user_segment(&folio->page, 0, plen); + folio_mark_uptodate(folio); + return true; + } + + /* Full folio write */ + if (offset == 0 && len >= plen) + return true; + + /* Page entirely beyond the end of the file */ + if (pos - offset >= i_size) + goto zero_out; + + /* Write that covers from the start of the folio to EOF or beyond */ + if (offset == 0 && (pos + len) >= i_size) + goto zero_out; + + return false; +zero_out: + zero_user_segments(&folio->page, 0, offset, offset + len, plen); + return true; +} + +/** + * netfs_write_begin - Helper to prepare for writing + * @file: The file to read from + * @mapping: The mapping to read from + * @pos: File position at which the write will begin + * @len: The length of the write (may extend beyond the end of the folio chosen) + * @aop_flags: AOP_* flags + * @_folio: Where to put the resultant folio + * @_fsdata: Place for the netfs to store a cookie + * + * Pre-read data for a write-begin request by drawing data from the cache if + * possible, or the netfs if not. Space beyond the EOF is zero-filled. + * Multiple I/O requests from different sources will get munged together. If + * necessary, the readahead window can be expanded in either direction to a + * more convenient alighment for RPC efficiency or to make storage in the cache + * feasible. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. + * + * The check_write_begin() operation can be provided to check for and flush + * conflicting writes once the folio is grabbed and locked. It is passed a + * pointer to the fsdata cookie that gets returned to the VM to be passed to + * write_end. It is permitted to sleep. It should return 0 if the request + * should go ahead; unlock the folio and return -EAGAIN to cause the folio to + * be regot; or return an error. + * + * The calling netfs must initialise a netfs context contiguous to the vfs + * inode before calling this. + * + * This is usable whether or not caching is enabled. + */ +int netfs_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned int len, unsigned int aop_flags, + struct folio **_folio, void **_fsdata) +{ + struct netfs_io_request *rreq; + struct netfs_i_context *ctx = netfs_i_context(file_inode(file )); + struct folio *folio; + unsigned int fgp_flags; + pgoff_t index = pos >> PAGE_SHIFT; + int ret; + + DEFINE_READAHEAD(ractl, file, NULL, mapping, index); + +retry: + fgp_flags = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE; + if (aop_flags & AOP_FLAG_NOFS) + fgp_flags |= FGP_NOFS; + folio = __filemap_get_folio(mapping, index, fgp_flags, + mapping_gfp_mask(mapping)); + if (!folio) + return -ENOMEM; + + if (ctx->ops->check_write_begin) { + /* Allow the netfs (eg. ceph) to flush conflicts. */ + ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata); + if (ret < 0) { + trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin); + if (ret == -EAGAIN) + goto retry; + goto error; + } + } + + if (folio_test_uptodate(folio)) + goto have_folio; + + /* If the page is beyond the EOF, we want to clear it - unless it's + * within the cache granule containing the EOF, in which case we need + * to preload the granule. + */ + if (!netfs_is_cache_enabled(ctx) && + netfs_skip_folio_read(folio, pos, len, false)) { + netfs_stat(&netfs_n_rh_write_zskip); + goto have_folio_no_wait; + } + + rreq = netfs_alloc_request(mapping, file, + folio_file_pos(folio), folio_size(folio), + NETFS_READ_FOR_WRITE); + if (IS_ERR(rreq)) { + ret = PTR_ERR(rreq); + goto error; + } + rreq->no_unlock_folio = folio_index(folio); + __set_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags); + + if (ctx->ops->begin_cache_operation) { + ret = ctx->ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto error_put; + } + + netfs_stat(&netfs_n_rh_write_begin); + trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin); + + /* Expand the request to meet caching requirements and download + * preferences. + */ + ractl._nr_pages = folio_nr_pages(folio); + netfs_rreq_expand(rreq, &ractl); + + /* We hold the folio locks, so we can drop the references */ + folio_get(folio); + while (readahead_folio(&ractl)) + ; + + ret = netfs_begin_read(rreq, true); + if (ret < 0) + goto error; + +have_folio: + ret = folio_wait_fscache_killable(folio); + if (ret < 0) + goto error; +have_folio_no_wait: + *_folio = folio; + _leave(" = 0"); + return 0; + +error_put: + netfs_put_request(rreq, false, netfs_rreq_trace_put_failed); +error: + folio_unlock(folio); + folio_put(folio); + _leave(" = %d", ret); + return ret; +} +EXPORT_SYMBOL(netfs_write_begin); diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index b7f2c4459f33..b7b0e3d18d9e 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -5,6 +5,10 @@ * Written by David Howells (dhowells@redhat.com) */ +#include <linux/netfs.h> +#include <linux/fscache.h> +#include <trace/events/netfs.h> + #ifdef pr_fmt #undef pr_fmt #endif @@ -12,11 +16,40 @@ #define pr_fmt(fmt) "netfs: " fmt /* - * read_helper.c + * buffered_read.c + */ +void netfs_rreq_unlock_folios(struct netfs_io_request *rreq); + +/* + * io.c + */ +int netfs_begin_read(struct netfs_io_request *rreq, bool sync); + +/* + * main.c */ extern unsigned int netfs_debug; /* + * objects.c + */ +struct netfs_io_request *netfs_alloc_request(struct address_space *mapping, + struct file *file, + loff_t start, size_t len, + enum netfs_io_origin origin); +void netfs_get_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace what); +void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async); +void netfs_put_request(struct netfs_io_request *rreq, bool was_async, + enum netfs_rreq_ref_trace what); +struct netfs_io_subrequest *netfs_alloc_subrequest(struct netfs_io_request *rreq); + +static inline void netfs_see_request(struct netfs_io_request *rreq, + enum netfs_rreq_ref_trace what) +{ + trace_netfs_rreq_ref(rreq->debug_id, refcount_read(&rreq->ref), what); +} + +/* * stats.c */ #ifdef CONFIG_NETFS_STATS @@ -55,6 +88,21 @@ static inline void netfs_stat_d(atomic_t *stat) #define netfs_stat_d(x) do {} while(0) #endif +/* + * Miscellaneous functions. + */ +static inline bool netfs_is_cache_enabled(struct netfs_i_context *ctx) +{ +#if IS_ENABLED(CONFIG_FSCACHE) + struct fscache_cookie *cookie = ctx->cache; + + return fscache_cookie_valid(cookie) && cookie->cache_priv && + fscache_cookie_enabled(cookie); +#else + return false; +#endif +} + /*****************************************************************************/ /* * debug tracing diff --git a/fs/netfs/io.c b/fs/netfs/io.c new file mode 100644 index 000000000000..428925899282 --- /dev/null +++ b/fs/netfs/io.c @@ -0,0 +1,657 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Network filesystem high-level read support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/module.h> +#include <linux/export.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/pagemap.h> +#include <linux/slab.h> +#include <linux/uio.h> +#include <linux/sched/mm.h> +#include <linux/task_io_accounting_ops.h> +#include "internal.h" + +/* + * Clear the unread part of an I/O request. + */ +static void netfs_clear_unread(struct netfs_io_subrequest *subreq) +{ + struct iov_iter iter; + + iov_iter_xarray(&iter, READ, &subreq->rreq->mapping->i_pages, + subreq->start + subreq->transferred, + subreq->len - subreq->transferred); + iov_iter_zero(iov_iter_count(&iter), &iter); +} + +static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_io_subrequest *subreq = priv; + + netfs_subreq_terminated(subreq, transferred_or_error, was_async); +} + +/* + * Issue a read against the cache. + * - Eats the caller's ref on subreq. + */ +static void netfs_read_from_cache(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq, + enum netfs_read_from_hole read_hole) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + struct iov_iter iter; + + netfs_stat(&netfs_n_rh_read); + iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, + subreq->start + subreq->transferred, + subreq->len - subreq->transferred); + + cres->ops->read(cres, subreq->start, &iter, read_hole, + netfs_cache_read_terminated, subreq); +} + +/* + * Fill a subrequest region with zeroes. + */ +static void netfs_fill_with_zeroes(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq) +{ + netfs_stat(&netfs_n_rh_zero); + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + netfs_subreq_terminated(subreq, 0, false); +} + +/* + * Ask the netfs to issue a read request to the server for us. + * + * The netfs is expected to read from subreq->pos + subreq->transferred to + * subreq->pos + subreq->len - 1. It may not backtrack and write data into the + * buffer prior to the transferred point as it might clobber dirty data + * obtained from the cache. + * + * Alternatively, the netfs is allowed to indicate one of two things: + * + * - NETFS_SREQ_SHORT_READ: A short read - it will get called again to try and + * make progress. + * + * - NETFS_SREQ_CLEAR_TAIL: A short read - the rest of the buffer will be + * cleared. + */ +static void netfs_read_from_server(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq) +{ + netfs_stat(&netfs_n_rh_download); + rreq->netfs_ops->issue_read(subreq); +} + +/* + * Release those waiting. + */ +static void netfs_rreq_completed(struct netfs_io_request *rreq, bool was_async) +{ + trace_netfs_rreq(rreq, netfs_rreq_trace_done); + netfs_clear_subrequests(rreq, was_async); + netfs_put_request(rreq, was_async, netfs_rreq_trace_put_complete); +} + +/* + * Deal with the completion of writing the data to the cache. We have to clear + * the PG_fscache bits on the folios involved and release the caller's ref. + * + * May be called in softirq mode and we inherit a ref from the caller. + */ +static void netfs_rreq_unmark_after_write(struct netfs_io_request *rreq, + bool was_async) +{ + struct netfs_io_subrequest *subreq; + struct folio *folio; + pgoff_t unlocked = 0; + bool have_unlocked = false; + + rcu_read_lock(); + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE); + + xas_for_each(&xas, folio, (subreq->start + subreq->len - 1) / PAGE_SIZE) { + /* We might have multiple writes from the same huge + * folio, but we mustn't unlock a folio more than once. + */ + if (have_unlocked && folio_index(folio) <= unlocked) + continue; + unlocked = folio_index(folio); + folio_end_fscache(folio); + have_unlocked = true; + } + } + + rcu_read_unlock(); + netfs_rreq_completed(rreq, was_async); +} + +static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_io_subrequest *subreq = priv; + struct netfs_io_request *rreq = subreq->rreq; + + if (IS_ERR_VALUE(transferred_or_error)) { + netfs_stat(&netfs_n_rh_write_failed); + trace_netfs_failure(rreq, subreq, transferred_or_error, + netfs_fail_copy_to_cache); + } else { + netfs_stat(&netfs_n_rh_write_done); + } + + trace_netfs_sreq(subreq, netfs_sreq_trace_write_term); + + /* If we decrement nr_copy_ops to 0, the ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_copy_ops)) + netfs_rreq_unmark_after_write(rreq, was_async); + + netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated); +} + +/* + * Perform any outstanding writes to the cache. We inherit a ref from the + * caller. + */ +static void netfs_rreq_do_write_to_cache(struct netfs_io_request *rreq) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + struct netfs_io_subrequest *subreq, *next, *p; + struct iov_iter iter; + int ret; + + trace_netfs_rreq(rreq, netfs_rreq_trace_copy); + + /* We don't want terminating writes trying to wake us up whilst we're + * still going through the list. + */ + atomic_inc(&rreq->nr_copy_ops); + + list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { + if (!test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) { + list_del_init(&subreq->rreq_link); + netfs_put_subrequest(subreq, false, + netfs_sreq_trace_put_no_copy); + } + } + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + /* Amalgamate adjacent writes */ + while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + next = list_next_entry(subreq, rreq_link); + if (next->start != subreq->start + subreq->len) + break; + subreq->len += next->len; + list_del_init(&next->rreq_link); + netfs_put_subrequest(next, false, + netfs_sreq_trace_put_merged); + } + + ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len, + rreq->i_size, true); + if (ret < 0) { + trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write); + trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip); + continue; + } + + iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages, + subreq->start, subreq->len); + + atomic_inc(&rreq->nr_copy_ops); + netfs_stat(&netfs_n_rh_write); + netfs_get_subrequest(subreq, netfs_sreq_trace_get_copy_to_cache); + trace_netfs_sreq(subreq, netfs_sreq_trace_write); + cres->ops->write(cres, subreq->start, &iter, + netfs_rreq_copy_terminated, subreq); + } + + /* If we decrement nr_copy_ops to 0, the usage ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_copy_ops)) + netfs_rreq_unmark_after_write(rreq, false); +} + +static void netfs_rreq_write_to_cache_work(struct work_struct *work) +{ + struct netfs_io_request *rreq = + container_of(work, struct netfs_io_request, work); + + netfs_rreq_do_write_to_cache(rreq); +} + +static void netfs_rreq_write_to_cache(struct netfs_io_request *rreq) +{ + rreq->work.func = netfs_rreq_write_to_cache_work; + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); +} + +/* + * Handle a short read. + */ +static void netfs_rreq_short_read(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq) +{ + __clear_bit(NETFS_SREQ_SHORT_IO, &subreq->flags); + __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + + netfs_stat(&netfs_n_rh_short_read); + trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short); + + netfs_get_subrequest(subreq, netfs_sreq_trace_get_short_read); + atomic_inc(&rreq->nr_outstanding); + if (subreq->source == NETFS_READ_FROM_CACHE) + netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_CLEAR); + else + netfs_read_from_server(rreq, subreq); +} + +/* + * Resubmit any short or failed operations. Returns true if we got the rreq + * ref back. + */ +static bool netfs_rreq_perform_resubmissions(struct netfs_io_request *rreq) +{ + struct netfs_io_subrequest *subreq; + + WARN_ON(in_interrupt()); + + trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); + + /* We don't want terminating submissions trying to wake us up whilst + * we're still going through the list. + */ + atomic_inc(&rreq->nr_outstanding); + + __clear_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + if (subreq->error) { + if (subreq->source != NETFS_READ_FROM_CACHE) + break; + subreq->source = NETFS_DOWNLOAD_FROM_SERVER; + subreq->error = 0; + netfs_stat(&netfs_n_rh_download_instead); + trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead); + netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit); + atomic_inc(&rreq->nr_outstanding); + netfs_read_from_server(rreq, subreq); + } else if (test_bit(NETFS_SREQ_SHORT_IO, &subreq->flags)) { + netfs_rreq_short_read(rreq, subreq); + } + } + + /* If we decrement nr_outstanding to 0, the usage ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_outstanding)) + return true; + + wake_up_var(&rreq->nr_outstanding); + return false; +} + +/* + * Check to see if the data read is still valid. + */ +static void netfs_rreq_is_still_valid(struct netfs_io_request *rreq) +{ + struct netfs_io_subrequest *subreq; + + if (!rreq->netfs_ops->is_still_valid || + rreq->netfs_ops->is_still_valid(rreq)) + return; + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + if (subreq->source == NETFS_READ_FROM_CACHE) { + subreq->error = -ESTALE; + __set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + } + } +} + +/* + * Assess the state of a read request and decide what to do next. + * + * Note that we could be in an ordinary kernel thread, on a workqueue or in + * softirq context at this point. We inherit a ref from the caller. + */ +static void netfs_rreq_assess(struct netfs_io_request *rreq, bool was_async) +{ + trace_netfs_rreq(rreq, netfs_rreq_trace_assess); + +again: + netfs_rreq_is_still_valid(rreq); + + if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && + test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { + if (netfs_rreq_perform_resubmissions(rreq)) + goto again; + return; + } + + netfs_rreq_unlock_folios(rreq); + + clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); + wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); + + if (test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags)) + return netfs_rreq_write_to_cache(rreq); + + netfs_rreq_completed(rreq, was_async); +} + +static void netfs_rreq_work(struct work_struct *work) +{ + struct netfs_io_request *rreq = + container_of(work, struct netfs_io_request, work); + netfs_rreq_assess(rreq, false); +} + +/* + * Handle the completion of all outstanding I/O operations on a read request. + * We inherit a ref from the caller. + */ +static void netfs_rreq_terminated(struct netfs_io_request *rreq, + bool was_async) +{ + if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) && + was_async) { + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_rreq_assess(rreq, was_async); + } +} + +/** + * netfs_subreq_terminated - Note the termination of an I/O operation. + * @subreq: The I/O request that has terminated. + * @transferred_or_error: The amount of data transferred or an error code. + * @was_async: The termination was asynchronous + * + * This tells the read helper that a contributory I/O operation has terminated, + * one way or another, and that it should integrate the results. + * + * The caller indicates in @transferred_or_error the outcome of the operation, + * supplying a positive value to indicate the number of bytes transferred, 0 to + * indicate a failure to transfer anything that should be retried or a negative + * error code. The helper will look after reissuing I/O operations as + * appropriate and writing downloaded data to the cache. + * + * If @was_async is true, the caller might be running in softirq or interrupt + * context and we can't sleep. + */ +void netfs_subreq_terminated(struct netfs_io_subrequest *subreq, + ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_io_request *rreq = subreq->rreq; + int u; + + _enter("[%u]{%llx,%lx},%zd", + subreq->debug_index, subreq->start, subreq->flags, + transferred_or_error); + + switch (subreq->source) { + case NETFS_READ_FROM_CACHE: + netfs_stat(&netfs_n_rh_read_done); + break; + case NETFS_DOWNLOAD_FROM_SERVER: + netfs_stat(&netfs_n_rh_download_done); + break; + default: + break; + } + + if (IS_ERR_VALUE(transferred_or_error)) { + subreq->error = transferred_or_error; + trace_netfs_failure(rreq, subreq, transferred_or_error, + netfs_fail_read); + goto failed; + } + + if (WARN(transferred_or_error > subreq->len - subreq->transferred, + "Subreq overread: R%x[%x] %zd > %zu - %zu", + rreq->debug_id, subreq->debug_index, + transferred_or_error, subreq->len, subreq->transferred)) + transferred_or_error = subreq->len - subreq->transferred; + + subreq->error = 0; + subreq->transferred += transferred_or_error; + if (subreq->transferred < subreq->len) + goto incomplete; + +complete: + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + if (test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) + set_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags); + +out: + trace_netfs_sreq(subreq, netfs_sreq_trace_terminated); + + /* If we decrement nr_outstanding to 0, the ref belongs to us. */ + u = atomic_dec_return(&rreq->nr_outstanding); + if (u == 0) + netfs_rreq_terminated(rreq, was_async); + else if (u == 1) + wake_up_var(&rreq->nr_outstanding); + + netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated); + return; + +incomplete: + if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) { + netfs_clear_unread(subreq); + subreq->transferred = subreq->len; + goto complete; + } + + if (transferred_or_error == 0) { + if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { + subreq->error = -ENODATA; + goto failed; + } + } else { + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + } + + __set_bit(NETFS_SREQ_SHORT_IO, &subreq->flags); + set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + goto out; + +failed: + if (subreq->source == NETFS_READ_FROM_CACHE) { + netfs_stat(&netfs_n_rh_read_failed); + set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + } else { + netfs_stat(&netfs_n_rh_download_failed); + set_bit(NETFS_RREQ_FAILED, &rreq->flags); + rreq->error = subreq->error; + } + goto out; +} +EXPORT_SYMBOL(netfs_subreq_terminated); + +static enum netfs_io_source netfs_cache_prepare_read(struct netfs_io_subrequest *subreq, + loff_t i_size) +{ + struct netfs_io_request *rreq = subreq->rreq; + struct netfs_cache_resources *cres = &rreq->cache_resources; + + if (cres->ops) + return cres->ops->prepare_read(subreq, i_size); + if (subreq->start >= rreq->i_size) + return NETFS_FILL_WITH_ZEROES; + return NETFS_DOWNLOAD_FROM_SERVER; +} + +/* + * Work out what sort of subrequest the next one will be. + */ +static enum netfs_io_source +netfs_rreq_prepare_read(struct netfs_io_request *rreq, + struct netfs_io_subrequest *subreq) +{ + enum netfs_io_source source; + + _enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size); + + source = netfs_cache_prepare_read(subreq, rreq->i_size); + if (source == NETFS_INVALID_READ) + goto out; + + if (source == NETFS_DOWNLOAD_FROM_SERVER) { + /* Call out to the netfs to let it shrink the request to fit + * its own I/O sizes and boundaries. If it shinks it here, it + * will be called again to make simultaneous calls; if it wants + * to make serial calls, it can indicate a short read and then + * we will call it again. + */ + if (subreq->len > rreq->i_size - subreq->start) + subreq->len = rreq->i_size - subreq->start; + + if (rreq->netfs_ops->clamp_length && + !rreq->netfs_ops->clamp_length(subreq)) { + source = NETFS_INVALID_READ; + goto out; + } + } + + if (WARN_ON(subreq->len == 0)) + source = NETFS_INVALID_READ; + +out: + subreq->source = source; + trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); + return source; +} + +/* + * Slice off a piece of a read request and submit an I/O request for it. + */ +static bool netfs_rreq_submit_slice(struct netfs_io_request *rreq, + unsigned int *_debug_index) +{ + struct netfs_io_subrequest *subreq; + enum netfs_io_source source; + + subreq = netfs_alloc_subrequest(rreq); + if (!subreq) + return false; + + subreq->debug_index = (*_debug_index)++; + subreq->start = rreq->start + rreq->submitted; + subreq->len = rreq->len - rreq->submitted; + + _debug("slice %llx,%zx,%zx", subreq->start, subreq->len, rreq->submitted); + list_add_tail(&subreq->rreq_link, &rreq->subrequests); + + /* Call out to the cache to find out what it can do with the remaining + * subset. It tells us in subreq->flags what it decided should be done + * and adjusts subreq->len down if the subset crosses a cache boundary. + * + * Then when we hand the subset, it can choose to take a subset of that + * (the starts must coincide), in which case, we go around the loop + * again and ask it to download the next piece. + */ + source = netfs_rreq_prepare_read(rreq, subreq); + if (source == NETFS_INVALID_READ) + goto subreq_failed; + + atomic_inc(&rreq->nr_outstanding); + + rreq->submitted += subreq->len; + + trace_netfs_sreq(subreq, netfs_sreq_trace_submit); + switch (source) { + case NETFS_FILL_WITH_ZEROES: + netfs_fill_with_zeroes(rreq, subreq); + break; + case NETFS_DOWNLOAD_FROM_SERVER: + netfs_read_from_server(rreq, subreq); + break; + case NETFS_READ_FROM_CACHE: + netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_IGNORE); + break; + default: + BUG(); + } + + return true; + +subreq_failed: + rreq->error = subreq->error; + netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_failed); + return false; +} + +/* + * Begin the process of reading in a chunk of data, where that data may be + * stitched together from multiple sources, including multiple servers and the + * local cache. + */ +int netfs_begin_read(struct netfs_io_request *rreq, bool sync) +{ + unsigned int debug_index = 0; + int ret; + + _enter("R=%x %llx-%llx", + rreq->debug_id, rreq->start, rreq->start + rreq->len - 1); + + if (rreq->len == 0) { + pr_err("Zero-sized read [R=%x]\n", rreq->debug_id); + netfs_put_request(rreq, false, netfs_rreq_trace_put_zero_len); + return -EIO; + } + + INIT_WORK(&rreq->work, netfs_rreq_work); + + if (sync) + netfs_get_request(rreq, netfs_rreq_trace_get_hold); + + /* Chop the read into slices according to what the cache and the netfs + * want and submit each one. + */ + atomic_set(&rreq->nr_outstanding, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + if (sync) { + /* Keep nr_outstanding incremented so that the ref always belongs to + * us, and the service code isn't punted off to a random thread pool to + * process. + */ + for (;;) { + wait_var_event(&rreq->nr_outstanding, + atomic_read(&rreq->nr_outstanding) == 1); + netfs_rreq_assess(rreq, false); + if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) + break; + cond_resched(); + } + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) { + trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_read); + ret = -EIO; + } + netfs_put_request(rreq, false, netfs_rreq_trace_put_hold); + } else { + /* If we decrement nr_outstanding to 0, the ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_outstanding)) + netfs_rreq_assess(rreq, false); + ret = 0; + } + return ret; +} diff --git a/fs/netfs/main.c b/fs/netfs/main.c new file mode 100644 index 000000000000..068568702957 --- /dev/null +++ b/fs/netfs/main.c @@ -0,0 +1,20 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Miscellaneous bits for the netfs support library. + * + * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/module.h> +#include <linux/export.h> +#include "internal.h" +#define CREATE_TRACE_POINTS +#include <trace/events/netfs.h> + +MODULE_DESCRIPTION("Network fs support"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +unsigned netfs_debug; +module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask"); diff --git a/fs/netfs/objects.c b/fs/netfs/objects.c new file mode 100644 index 000000000000..e86107b30ba4 --- /dev/null +++ b/fs/netfs/objects.c @@ -0,0 +1,160 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Object lifetime handling and tracing. + * + * Copyright (C) 2022 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include <linux/slab.h> +#include "internal.h" + +/* + * Allocate an I/O request and initialise it. + */ +struct netfs_io_request *netfs_alloc_request(struct address_space *mapping, + struct file *file, + loff_t start, size_t len, + enum netfs_io_origin origin) +{ + static atomic_t debug_ids; + struct inode *inode = file ? file_inode(file) : mapping->host; + struct netfs_i_context *ctx = netfs_i_context(inode); + struct netfs_io_request *rreq; + int ret; + + rreq = kzalloc(sizeof(struct netfs_io_request), GFP_KERNEL); + if (!rreq) + return ERR_PTR(-ENOMEM); + + rreq->start = start; + rreq->len = len; + rreq->origin = origin; + rreq->netfs_ops = ctx->ops; + rreq->mapping = mapping; + rreq->inode = inode; + rreq->i_size = i_size_read(inode); + rreq->debug_id = atomic_inc_return(&debug_ids); + INIT_LIST_HEAD(&rreq->subrequests); + refcount_set(&rreq->ref, 1); + __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); + if (rreq->netfs_ops->init_request) { + ret = rreq->netfs_ops->init_request(rreq, file); + if (ret < 0) { + kfree(rreq); + return ERR_PTR(ret); + } + } + + netfs_stat(&netfs_n_rh_rreq); + return rreq; +} + +void netfs_get_request(struct netfs_io_request *rreq, enum netfs_rreq_ref_trace what) +{ + int r; + + __refcount_inc(&rreq->ref, &r); + trace_netfs_rreq_ref(rreq->debug_id, r + 1, what); +} + +void netfs_clear_subrequests(struct netfs_io_request *rreq, bool was_async) +{ + struct netfs_io_subrequest *subreq; + + while (!list_empty(&rreq->subrequests)) { + subreq = list_first_entry(&rreq->subrequests, + struct netfs_io_subrequest, rreq_link); + list_del(&subreq->rreq_link); + netfs_put_subrequest(subreq, was_async, + netfs_sreq_trace_put_clear); + } +} + +static void netfs_free_request(struct work_struct *work) +{ + struct netfs_io_request *rreq = + container_of(work, struct netfs_io_request, work); + + netfs_clear_subrequests(rreq, false); + if (rreq->netfs_priv) + rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); + trace_netfs_rreq(rreq, netfs_rreq_trace_free); + if (rreq->cache_resources.ops) + rreq->cache_resources.ops->end_operation(&rreq->cache_resources); + kfree(rreq); + netfs_stat_d(&netfs_n_rh_rreq); +} + +void netfs_put_request(struct netfs_io_request *rreq, bool was_async, + enum netfs_rreq_ref_trace what) +{ + unsigned int debug_id = rreq->debug_id; + bool dead; + int r; + + dead = __refcount_dec_and_test(&rreq->ref, &r); + trace_netfs_rreq_ref(debug_id, r - 1, what); + if (dead) { + if (was_async) { + rreq->work.func = netfs_free_request; + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_free_request(&rreq->work); + } + } +} + +/* + * Allocate and partially initialise an I/O request structure. + */ +struct netfs_io_subrequest *netfs_alloc_subrequest(struct netfs_io_request *rreq) +{ + struct netfs_io_subrequest *subreq; + + subreq = kzalloc(sizeof(struct netfs_io_subrequest), GFP_KERNEL); + if (subreq) { + INIT_LIST_HEAD(&subreq->rreq_link); + refcount_set(&subreq->ref, 2); + subreq->rreq = rreq; + netfs_get_request(rreq, netfs_rreq_trace_get_subreq); + netfs_stat(&netfs_n_rh_sreq); + } + + return subreq; +} + +void netfs_get_subrequest(struct netfs_io_subrequest *subreq, + enum netfs_sreq_ref_trace what) +{ + int r; + + __refcount_inc(&subreq->ref, &r); + trace_netfs_sreq_ref(subreq->rreq->debug_id, subreq->debug_index, r + 1, + what); +} + +static void netfs_free_subrequest(struct netfs_io_subrequest *subreq, + bool was_async) +{ + struct netfs_io_request *rreq = subreq->rreq; + + trace_netfs_sreq(subreq, netfs_sreq_trace_free); + kfree(subreq); + netfs_stat_d(&netfs_n_rh_sreq); + netfs_put_request(rreq, was_async, netfs_rreq_trace_put_subreq); +} + +void netfs_put_subrequest(struct netfs_io_subrequest *subreq, bool was_async, + enum netfs_sreq_ref_trace what) +{ + unsigned int debug_index = subreq->debug_index; + unsigned int debug_id = subreq->rreq->debug_id; + bool dead; + int r; + + dead = __refcount_dec_and_test(&subreq->ref, &r); + trace_netfs_sreq_ref(debug_id, debug_index, r - 1, what); + if (dead) + netfs_free_subrequest(subreq, was_async); +} diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c deleted file mode 100644 index 501da990c259..000000000000 --- a/fs/netfs/read_helper.c +++ /dev/null @@ -1,1205 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0-or-later -/* Network filesystem high-level read support. - * - * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. - * Written by David Howells (dhowells@redhat.com) - */ - -#include <linux/module.h> -#include <linux/export.h> -#include <linux/fs.h> -#include <linux/mm.h> -#include <linux/pagemap.h> -#include <linux/slab.h> -#include <linux/uio.h> -#include <linux/sched/mm.h> -#include <linux/task_io_accounting_ops.h> -#include <linux/netfs.h> -#include "internal.h" -#define CREATE_TRACE_POINTS -#include <trace/events/netfs.h> - -MODULE_DESCRIPTION("Network fs support"); -MODULE_AUTHOR("Red Hat, Inc."); -MODULE_LICENSE("GPL"); - -unsigned netfs_debug; -module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO); -MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask"); - -static void netfs_rreq_work(struct work_struct *); -static void __netfs_put_subrequest(struct netfs_read_subrequest *, bool); - -static void netfs_put_subrequest(struct netfs_read_subrequest *subreq, - bool was_async) -{ - if (refcount_dec_and_test(&subreq->usage)) - __netfs_put_subrequest(subreq, was_async); -} - -static struct netfs_read_request *netfs_alloc_read_request( - const struct netfs_read_request_ops *ops, void *netfs_priv, - struct file *file) -{ - static atomic_t debug_ids; - struct netfs_read_request *rreq; - - rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL); - if (rreq) { - rreq->netfs_ops = ops; - rreq->netfs_priv = netfs_priv; - rreq->inode = file_inode(file); - rreq->i_size = i_size_read(rreq->inode); - rreq->debug_id = atomic_inc_return(&debug_ids); - INIT_LIST_HEAD(&rreq->subrequests); - INIT_WORK(&rreq->work, netfs_rreq_work); - refcount_set(&rreq->usage, 1); - __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); - if (ops->init_rreq) - ops->init_rreq(rreq, file); - netfs_stat(&netfs_n_rh_rreq); - } - - return rreq; -} - -static void netfs_get_read_request(struct netfs_read_request *rreq) -{ - refcount_inc(&rreq->usage); -} - -static void netfs_rreq_clear_subreqs(struct netfs_read_request *rreq, - bool was_async) -{ - struct netfs_read_subrequest *subreq; - - while (!list_empty(&rreq->subrequests)) { - subreq = list_first_entry(&rreq->subrequests, - struct netfs_read_subrequest, rreq_link); - list_del(&subreq->rreq_link); - netfs_put_subrequest(subreq, was_async); - } -} - -static void netfs_free_read_request(struct work_struct *work) -{ - struct netfs_read_request *rreq = - container_of(work, struct netfs_read_request, work); - netfs_rreq_clear_subreqs(rreq, false); - if (rreq->netfs_priv) - rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); - trace_netfs_rreq(rreq, netfs_rreq_trace_free); - if (rreq->cache_resources.ops) - rreq->cache_resources.ops->end_operation(&rreq->cache_resources); - kfree(rreq); - netfs_stat_d(&netfs_n_rh_rreq); -} - -static void netfs_put_read_request(struct netfs_read_request *rreq, bool was_async) -{ - if (refcount_dec_and_test(&rreq->usage)) { - if (was_async) { - rreq->work.func = netfs_free_read_request; - if (!queue_work(system_unbound_wq, &rreq->work)) - BUG(); - } else { - netfs_free_read_request(&rreq->work); - } - } -} - -/* - * Allocate and partially initialise an I/O request structure. - */ -static struct netfs_read_subrequest *netfs_alloc_subrequest( - struct netfs_read_request *rreq) -{ - struct netfs_read_subrequest *subreq; - - subreq = kzalloc(sizeof(struct netfs_read_subrequest), GFP_KERNEL); - if (subreq) { - INIT_LIST_HEAD(&subreq->rreq_link); - refcount_set(&subreq->usage, 2); - subreq->rreq = rreq; - netfs_get_read_request(rreq); - netfs_stat(&netfs_n_rh_sreq); - } - - return subreq; -} - -static void netfs_get_read_subrequest(struct netfs_read_subrequest *subreq) -{ - refcount_inc(&subreq->usage); -} - -static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, - bool was_async) -{ - struct netfs_read_request *rreq = subreq->rreq; - - trace_netfs_sreq(subreq, netfs_sreq_trace_free); - kfree(subreq); - netfs_stat_d(&netfs_n_rh_sreq); - netfs_put_read_request(rreq, was_async); -} - -/* - * Clear the unread part of an I/O request. - */ -static void netfs_clear_unread(struct netfs_read_subrequest *subreq) -{ - struct iov_iter iter; - - iov_iter_xarray(&iter, READ, &subreq->rreq->mapping->i_pages, - subreq->start + subreq->transferred, - subreq->len - subreq->transferred); - iov_iter_zero(iov_iter_count(&iter), &iter); -} - -static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, - bool was_async) -{ - struct netfs_read_subrequest *subreq = priv; - - netfs_subreq_terminated(subreq, transferred_or_error, was_async); -} - -/* - * Issue a read against the cache. - * - Eats the caller's ref on subreq. - */ -static void netfs_read_from_cache(struct netfs_read_request *rreq, - struct netfs_read_subrequest *subreq, - enum netfs_read_from_hole read_hole) -{ - struct netfs_cache_resources *cres = &rreq->cache_resources; - struct iov_iter iter; - - netfs_stat(&netfs_n_rh_read); - iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, - subreq->start + subreq->transferred, - subreq->len - subreq->transferred); - - cres->ops->read(cres, subreq->start, &iter, read_hole, - netfs_cache_read_terminated, subreq); -} - -/* - * Fill a subrequest region with zeroes. - */ -static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, - struct netfs_read_subrequest *subreq) -{ - netfs_stat(&netfs_n_rh_zero); - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); - netfs_subreq_terminated(subreq, 0, false); -} - -/* - * Ask the netfs to issue a read request to the server for us. - * - * The netfs is expected to read from subreq->pos + subreq->transferred to - * subreq->pos + subreq->len - 1. It may not backtrack and write data into the - * buffer prior to the transferred point as it might clobber dirty data - * obtained from the cache. - * - * Alternatively, the netfs is allowed to indicate one of two things: - * - * - NETFS_SREQ_SHORT_READ: A short read - it will get called again to try and - * make progress. - * - * - NETFS_SREQ_CLEAR_TAIL: A short read - the rest of the buffer will be - * cleared. - */ -static void netfs_read_from_server(struct netfs_read_request *rreq, - struct netfs_read_subrequest *subreq) -{ - netfs_stat(&netfs_n_rh_download); - rreq->netfs_ops->issue_op(subreq); -} - -/* - * Release those waiting. - */ -static void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async) -{ - trace_netfs_rreq(rreq, netfs_rreq_trace_done); - netfs_rreq_clear_subreqs(rreq, was_async); - netfs_put_read_request(rreq, was_async); -} - -/* - * Deal with the completion of writing the data to the cache. We have to clear - * the PG_fscache bits on the folios involved and release the caller's ref. - * - * May be called in softirq mode and we inherit a ref from the caller. - */ -static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq, - bool was_async) -{ - struct netfs_read_subrequest *subreq; - struct folio *folio; - pgoff_t unlocked = 0; - bool have_unlocked = false; - - rcu_read_lock(); - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE); - - xas_for_each(&xas, folio, (subreq->start + subreq->len - 1) / PAGE_SIZE) { - /* We might have multiple writes from the same huge - * folio, but we mustn't unlock a folio more than once. - */ - if (have_unlocked && folio_index(folio) <= unlocked) - continue; - unlocked = folio_index(folio); - folio_end_fscache(folio); - have_unlocked = true; - } - } - - rcu_read_unlock(); - netfs_rreq_completed(rreq, was_async); -} - -static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error, - bool was_async) -{ - struct netfs_read_subrequest *subreq = priv; - struct netfs_read_request *rreq = subreq->rreq; - - if (IS_ERR_VALUE(transferred_or_error)) { - netfs_stat(&netfs_n_rh_write_failed); - trace_netfs_failure(rreq, subreq, transferred_or_error, - netfs_fail_copy_to_cache); - } else { - netfs_stat(&netfs_n_rh_write_done); - } - - trace_netfs_sreq(subreq, netfs_sreq_trace_write_term); - - /* If we decrement nr_wr_ops to 0, the ref belongs to us. */ - if (atomic_dec_and_test(&rreq->nr_wr_ops)) - netfs_rreq_unmark_after_write(rreq, was_async); - - netfs_put_subrequest(subreq, was_async); -} - -/* - * Perform any outstanding writes to the cache. We inherit a ref from the - * caller. - */ -static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq) -{ - struct netfs_cache_resources *cres = &rreq->cache_resources; - struct netfs_read_subrequest *subreq, *next, *p; - struct iov_iter iter; - int ret; - - trace_netfs_rreq(rreq, netfs_rreq_trace_write); - - /* We don't want terminating writes trying to wake us up whilst we're - * still going through the list. - */ - atomic_inc(&rreq->nr_wr_ops); - - list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { - if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { - list_del_init(&subreq->rreq_link); - netfs_put_subrequest(subreq, false); - } - } - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - /* Amalgamate adjacent writes */ - while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { - next = list_next_entry(subreq, rreq_link); - if (next->start != subreq->start + subreq->len) - break; - subreq->len += next->len; - list_del_init(&next->rreq_link); - netfs_put_subrequest(next, false); - } - - ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len, - rreq->i_size, true); - if (ret < 0) { - trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write); - trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip); - continue; - } - - iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages, - subreq->start, subreq->len); - - atomic_inc(&rreq->nr_wr_ops); - netfs_stat(&netfs_n_rh_write); - netfs_get_read_subrequest(subreq); - trace_netfs_sreq(subreq, netfs_sreq_trace_write); - cres->ops->write(cres, subreq->start, &iter, - netfs_rreq_copy_terminated, subreq); - } - - /* If we decrement nr_wr_ops to 0, the usage ref belongs to us. */ - if (atomic_dec_and_test(&rreq->nr_wr_ops)) - netfs_rreq_unmark_after_write(rreq, false); -} - -static void netfs_rreq_write_to_cache_work(struct work_struct *work) -{ - struct netfs_read_request *rreq = - container_of(work, struct netfs_read_request, work); - - netfs_rreq_do_write_to_cache(rreq); -} - -static void netfs_rreq_write_to_cache(struct netfs_read_request *rreq) -{ - rreq->work.func = netfs_rreq_write_to_cache_work; - if (!queue_work(system_unbound_wq, &rreq->work)) - BUG(); -} - -/* - * Unlock the folios in a read operation. We need to set PG_fscache on any - * folios we're going to write back before we unlock them. - */ -static void netfs_rreq_unlock(struct netfs_read_request *rreq) -{ - struct netfs_read_subrequest *subreq; - struct folio *folio; - unsigned int iopos, account = 0; - pgoff_t start_page = rreq->start / PAGE_SIZE; - pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1; - bool subreq_failed = false; - - XA_STATE(xas, &rreq->mapping->i_pages, start_page); - - if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) { - __clear_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - __clear_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); - } - } - - /* Walk through the pagecache and the I/O request lists simultaneously. - * We may have a mixture of cached and uncached sections and we only - * really want to write out the uncached sections. This is slightly - * complicated by the possibility that we might have huge pages with a - * mixture inside. - */ - subreq = list_first_entry(&rreq->subrequests, - struct netfs_read_subrequest, rreq_link); - iopos = 0; - subreq_failed = (subreq->error < 0); - - trace_netfs_rreq(rreq, netfs_rreq_trace_unlock); - - rcu_read_lock(); - xas_for_each(&xas, folio, last_page) { - unsigned int pgpos = (folio_index(folio) - start_page) * PAGE_SIZE; - unsigned int pgend = pgpos + folio_size(folio); - bool pg_failed = false; - - for (;;) { - if (!subreq) { - pg_failed = true; - break; - } - if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) - folio_start_fscache(folio); - pg_failed |= subreq_failed; - if (pgend < iopos + subreq->len) - break; - - account += subreq->transferred; - iopos += subreq->len; - if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { - subreq = list_next_entry(subreq, rreq_link); - subreq_failed = (subreq->error < 0); - } else { - subreq = NULL; - subreq_failed = false; - } - if (pgend == iopos) - break; - } - - if (!pg_failed) { - flush_dcache_folio(folio); - folio_mark_uptodate(folio); - } - - if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) { - if (folio_index(folio) == rreq->no_unlock_folio && - test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags)) - _debug("no unlock"); - else - folio_unlock(folio); - } - } - rcu_read_unlock(); - - task_io_account_read(account); - if (rreq->netfs_ops->done) - rreq->netfs_ops->done(rreq); -} - -/* - * Handle a short read. - */ -static void netfs_rreq_short_read(struct netfs_read_request *rreq, - struct netfs_read_subrequest *subreq) -{ - __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); - __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); - - netfs_stat(&netfs_n_rh_short_read); - trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short); - - netfs_get_read_subrequest(subreq); - atomic_inc(&rreq->nr_rd_ops); - if (subreq->source == NETFS_READ_FROM_CACHE) - netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_CLEAR); - else - netfs_read_from_server(rreq, subreq); -} - -/* - * Resubmit any short or failed operations. Returns true if we got the rreq - * ref back. - */ -static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) -{ - struct netfs_read_subrequest *subreq; - - WARN_ON(in_interrupt()); - - trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); - - /* We don't want terminating submissions trying to wake us up whilst - * we're still going through the list. - */ - atomic_inc(&rreq->nr_rd_ops); - - __clear_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - if (subreq->error) { - if (subreq->source != NETFS_READ_FROM_CACHE) - break; - subreq->source = NETFS_DOWNLOAD_FROM_SERVER; - subreq->error = 0; - netfs_stat(&netfs_n_rh_download_instead); - trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead); - netfs_get_read_subrequest(subreq); - atomic_inc(&rreq->nr_rd_ops); - netfs_read_from_server(rreq, subreq); - } else if (test_bit(NETFS_SREQ_SHORT_READ, &subreq->flags)) { - netfs_rreq_short_read(rreq, subreq); - } - } - - /* If we decrement nr_rd_ops to 0, the usage ref belongs to us. */ - if (atomic_dec_and_test(&rreq->nr_rd_ops)) - return true; - - wake_up_var(&rreq->nr_rd_ops); - return false; -} - -/* - * Check to see if the data read is still valid. - */ -static void netfs_rreq_is_still_valid(struct netfs_read_request *rreq) -{ - struct netfs_read_subrequest *subreq; - - if (!rreq->netfs_ops->is_still_valid || - rreq->netfs_ops->is_still_valid(rreq)) - return; - - list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { - if (subreq->source == NETFS_READ_FROM_CACHE) { - subreq->error = -ESTALE; - __set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); - } - } -} - -/* - * Assess the state of a read request and decide what to do next. - * - * Note that we could be in an ordinary kernel thread, on a workqueue or in - * softirq context at this point. We inherit a ref from the caller. - */ -static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async) -{ - trace_netfs_rreq(rreq, netfs_rreq_trace_assess); - -again: - netfs_rreq_is_still_valid(rreq); - - if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && - test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { - if (netfs_rreq_perform_resubmissions(rreq)) - goto again; - return; - } - - netfs_rreq_unlock(rreq); - - clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); - wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); - - if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags)) - return netfs_rreq_write_to_cache(rreq); - - netfs_rreq_completed(rreq, was_async); -} - -static void netfs_rreq_work(struct work_struct *work) -{ - struct netfs_read_request *rreq = - container_of(work, struct netfs_read_request, work); - netfs_rreq_assess(rreq, false); -} - -/* - * Handle the completion of all outstanding I/O operations on a read request. - * We inherit a ref from the caller. - */ -static void netfs_rreq_terminated(struct netfs_read_request *rreq, - bool was_async) -{ - if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) && - was_async) { - if (!queue_work(system_unbound_wq, &rreq->work)) - BUG(); - } else { - netfs_rreq_assess(rreq, was_async); - } -} - -/** - * netfs_subreq_terminated - Note the termination of an I/O operation. - * @subreq: The I/O request that has terminated. - * @transferred_or_error: The amount of data transferred or an error code. - * @was_async: The termination was asynchronous - * - * This tells the read helper that a contributory I/O operation has terminated, - * one way or another, and that it should integrate the results. - * - * The caller indicates in @transferred_or_error the outcome of the operation, - * supplying a positive value to indicate the number of bytes transferred, 0 to - * indicate a failure to transfer anything that should be retried or a negative - * error code. The helper will look after reissuing I/O operations as - * appropriate and writing downloaded data to the cache. - * - * If @was_async is true, the caller might be running in softirq or interrupt - * context and we can't sleep. - */ -void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, - ssize_t transferred_or_error, - bool was_async) -{ - struct netfs_read_request *rreq = subreq->rreq; - int u; - - _enter("[%u]{%llx,%lx},%zd", - subreq->debug_index, subreq->start, subreq->flags, - transferred_or_error); - - switch (subreq->source) { - case NETFS_READ_FROM_CACHE: - netfs_stat(&netfs_n_rh_read_done); - break; - case NETFS_DOWNLOAD_FROM_SERVER: - netfs_stat(&netfs_n_rh_download_done); - break; - default: - break; - } - - if (IS_ERR_VALUE(transferred_or_error)) { - subreq->error = transferred_or_error; - trace_netfs_failure(rreq, subreq, transferred_or_error, - netfs_fail_read); - goto failed; - } - - if (WARN(transferred_or_error > subreq->len - subreq->transferred, - "Subreq overread: R%x[%x] %zd > %zu - %zu", - rreq->debug_id, subreq->debug_index, - transferred_or_error, subreq->len, subreq->transferred)) - transferred_or_error = subreq->len - subreq->transferred; - - subreq->error = 0; - subreq->transferred += transferred_or_error; - if (subreq->transferred < subreq->len) - goto incomplete; - -complete: - __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); - if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) - set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); - -out: - trace_netfs_sreq(subreq, netfs_sreq_trace_terminated); - - /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ - u = atomic_dec_return(&rreq->nr_rd_ops); - if (u == 0) - netfs_rreq_terminated(rreq, was_async); - else if (u == 1) - wake_up_var(&rreq->nr_rd_ops); - - netfs_put_subrequest(subreq, was_async); - return; - -incomplete: - if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) { - netfs_clear_unread(subreq); - subreq->transferred = subreq->len; - goto complete; - } - - if (transferred_or_error == 0) { - if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { - subreq->error = -ENODATA; - goto failed; - } - } else { - __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); - } - - __set_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); - set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); - goto out; - -failed: - if (subreq->source == NETFS_READ_FROM_CACHE) { - netfs_stat(&netfs_n_rh_read_failed); - set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); - } else { - netfs_stat(&netfs_n_rh_download_failed); - set_bit(NETFS_RREQ_FAILED, &rreq->flags); - rreq->error = subreq->error; - } - goto out; -} -EXPORT_SYMBOL(netfs_subreq_terminated); - -static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequest *subreq, - loff_t i_size) -{ - struct netfs_read_request *rreq = subreq->rreq; - struct netfs_cache_resources *cres = &rreq->cache_resources; - - if (cres->ops) - return cres->ops->prepare_read(subreq, i_size); - if (subreq->start >= rreq->i_size) - return NETFS_FILL_WITH_ZEROES; - return NETFS_DOWNLOAD_FROM_SERVER; -} - -/* - * Work out what sort of subrequest the next one will be. - */ -static enum netfs_read_source -netfs_rreq_prepare_read(struct netfs_read_request *rreq, - struct netfs_read_subrequest *subreq) -{ - enum netfs_read_source source; - - _enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size); - - source = netfs_cache_prepare_read(subreq, rreq->i_size); - if (source == NETFS_INVALID_READ) - goto out; - - if (source == NETFS_DOWNLOAD_FROM_SERVER) { - /* Call out to the netfs to let it shrink the request to fit - * its own I/O sizes and boundaries. If it shinks it here, it - * will be called again to make simultaneous calls; if it wants - * to make serial calls, it can indicate a short read and then - * we will call it again. - */ - if (subreq->len > rreq->i_size - subreq->start) - subreq->len = rreq->i_size - subreq->start; - - if (rreq->netfs_ops->clamp_length && - !rreq->netfs_ops->clamp_length(subreq)) { - source = NETFS_INVALID_READ; - goto out; - } - } - - if (WARN_ON(subreq->len == 0)) - source = NETFS_INVALID_READ; - -out: - subreq->source = source; - trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); - return source; -} - -/* - * Slice off a piece of a read request and submit an I/O request for it. - */ -static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, - unsigned int *_debug_index) -{ - struct netfs_read_subrequest *subreq; - enum netfs_read_source source; - - subreq = netfs_alloc_subrequest(rreq); - if (!subreq) - return false; - - subreq->debug_index = (*_debug_index)++; - subreq->start = rreq->start + rreq->submitted; - subreq->len = rreq->len - rreq->submitted; - - _debug("slice %llx,%zx,%zx", subreq->start, subreq->len, rreq->submitted); - list_add_tail(&subreq->rreq_link, &rreq->subrequests); - - /* Call out to the cache to find out what it can do with the remaining - * subset. It tells us in subreq->flags what it decided should be done - * and adjusts subreq->len down if the subset crosses a cache boundary. - * - * Then when we hand the subset, it can choose to take a subset of that - * (the starts must coincide), in which case, we go around the loop - * again and ask it to download the next piece. - */ - source = netfs_rreq_prepare_read(rreq, subreq); - if (source == NETFS_INVALID_READ) - goto subreq_failed; - - atomic_inc(&rreq->nr_rd_ops); - - rreq->submitted += subreq->len; - - trace_netfs_sreq(subreq, netfs_sreq_trace_submit); - switch (source) { - case NETFS_FILL_WITH_ZEROES: - netfs_fill_with_zeroes(rreq, subreq); - break; - case NETFS_DOWNLOAD_FROM_SERVER: - netfs_read_from_server(rreq, subreq); - break; - case NETFS_READ_FROM_CACHE: - netfs_read_from_cache(rreq, subreq, NETFS_READ_HOLE_IGNORE); - break; - default: - BUG(); - } - - return true; - -subreq_failed: - rreq->error = subreq->error; - netfs_put_subrequest(subreq, false); - return false; -} - -static void netfs_cache_expand_readahead(struct netfs_read_request *rreq, - loff_t *_start, size_t *_len, loff_t i_size) -{ - struct netfs_cache_resources *cres = &rreq->cache_resources; - - if (cres->ops && cres->ops->expand_readahead) - cres->ops->expand_readahead(cres, _start, _len, i_size); -} - -static void netfs_rreq_expand(struct netfs_read_request *rreq, - struct readahead_control *ractl) -{ - /* Give the cache a chance to change the request parameters. The - * resultant request must contain the original region. - */ - netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size); - - /* Give the netfs a chance to change the request parameters. The - * resultant request must contain the original region. - */ - if (rreq->netfs_ops->expand_readahead) - rreq->netfs_ops->expand_readahead(rreq); - - /* Expand the request if the cache wants it to start earlier. Note - * that the expansion may get further extended if the VM wishes to - * insert THPs and the preferred start and/or end wind up in the middle - * of THPs. - * - * If this is the case, however, the THP size should be an integer - * multiple of the cache granule size, so we get a whole number of - * granules to deal with. - */ - if (rreq->start != readahead_pos(ractl) || - rreq->len != readahead_length(ractl)) { - readahead_expand(ractl, rreq->start, rreq->len); - rreq->start = readahead_pos(ractl); - rreq->len = readahead_length(ractl); - - trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), - netfs_read_trace_expanded); - } -} - -/** - * netfs_readahead - Helper to manage a read request - * @ractl: The description of the readahead request - * @ops: The network filesystem's operations for the helper to use - * @netfs_priv: Private netfs data to be retained in the request - * - * Fulfil a readahead request by drawing data from the cache if possible, or - * the netfs if not. Space beyond the EOF is zero-filled. Multiple I/O - * requests from different sources will get munged together. If necessary, the - * readahead window can be expanded in either direction to a more convenient - * alighment for RPC efficiency or to make storage in the cache feasible. - * - * The calling netfs must provide a table of operations, only one of which, - * issue_op, is mandatory. It may also be passed a private token, which will - * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). - * - * This is usable whether or not caching is enabled. - */ -void netfs_readahead(struct readahead_control *ractl, - const struct netfs_read_request_ops *ops, - void *netfs_priv) -{ - struct netfs_read_request *rreq; - unsigned int debug_index = 0; - int ret; - - _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); - - if (readahead_count(ractl) == 0) - goto cleanup; - - rreq = netfs_alloc_read_request(ops, netfs_priv, ractl->file); - if (!rreq) - goto cleanup; - rreq->mapping = ractl->mapping; - rreq->start = readahead_pos(ractl); - rreq->len = readahead_length(ractl); - - if (ops->begin_cache_operation) { - ret = ops->begin_cache_operation(rreq); - if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) - goto cleanup_free; - } - - netfs_stat(&netfs_n_rh_readahead); - trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), - netfs_read_trace_readahead); - - netfs_rreq_expand(rreq, ractl); - - atomic_set(&rreq->nr_rd_ops, 1); - do { - if (!netfs_rreq_submit_slice(rreq, &debug_index)) - break; - - } while (rreq->submitted < rreq->len); - - /* Drop the refs on the folios here rather than in the cache or - * filesystem. The locks will be dropped in netfs_rreq_unlock(). - */ - while (readahead_folio(ractl)) - ; - - /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ - if (atomic_dec_and_test(&rreq->nr_rd_ops)) - netfs_rreq_assess(rreq, false); - return; - -cleanup_free: - netfs_put_read_request(rreq, false); - return; -cleanup: - if (netfs_priv) - ops->cleanup(ractl->mapping, netfs_priv); - return; -} -EXPORT_SYMBOL(netfs_readahead); - -/** - * netfs_readpage - Helper to manage a readpage request - * @file: The file to read from - * @folio: The folio to read - * @ops: The network filesystem's operations for the helper to use - * @netfs_priv: Private netfs data to be retained in the request - * - * Fulfil a readpage request by drawing data from the cache if possible, or the - * netfs if not. Space beyond the EOF is zero-filled. Multiple I/O requests - * from different sources will get munged together. - * - * The calling netfs must provide a table of operations, only one of which, - * issue_op, is mandatory. It may also be passed a private token, which will - * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). - * - * This is usable whether or not caching is enabled. - */ -int netfs_readpage(struct file *file, - struct folio *folio, - const struct netfs_read_request_ops *ops, - void *netfs_priv) -{ - struct netfs_read_request *rreq; - unsigned int debug_index = 0; - int ret; - - _enter("%lx", folio_index(folio)); - - rreq = netfs_alloc_read_request(ops, netfs_priv, file); - if (!rreq) { - if (netfs_priv) - ops->cleanup(folio_file_mapping(folio), netfs_priv); - folio_unlock(folio); - return -ENOMEM; - } - rreq->mapping = folio_file_mapping(folio); - rreq->start = folio_file_pos(folio); - rreq->len = folio_size(folio); - - if (ops->begin_cache_operation) { - ret = ops->begin_cache_operation(rreq); - if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) { - folio_unlock(folio); - goto out; - } - } - - netfs_stat(&netfs_n_rh_readpage); - trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); - - netfs_get_read_request(rreq); - - atomic_set(&rreq->nr_rd_ops, 1); - do { - if (!netfs_rreq_submit_slice(rreq, &debug_index)) - break; - - } while (rreq->submitted < rreq->len); - - /* Keep nr_rd_ops incremented so that the ref always belongs to us, and - * the service code isn't punted off to a random thread pool to - * process. - */ - do { - wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); - netfs_rreq_assess(rreq, false); - } while (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)); - - ret = rreq->error; - if (ret == 0 && rreq->submitted < rreq->len) { - trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_readpage); - ret = -EIO; - } -out: - netfs_put_read_request(rreq, false); - return ret; -} -EXPORT_SYMBOL(netfs_readpage); - -/* - * Prepare a folio for writing without reading first - * @folio: The folio being prepared - * @pos: starting position for the write - * @len: length of write - * - * In some cases, write_begin doesn't need to read at all: - * - full folio write - * - write that lies in a folio that is completely beyond EOF - * - write that covers the folio from start to EOF or beyond it - * - * If any of these criteria are met, then zero out the unwritten parts - * of the folio and return true. Otherwise, return false. - */ -static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len) -{ - struct inode *inode = folio_inode(folio); - loff_t i_size = i_size_read(inode); - size_t offset = offset_in_folio(folio, pos); - - /* Full folio write */ - if (offset == 0 && len >= folio_size(folio)) - return true; - - /* pos beyond last folio in the file */ - if (pos - offset >= i_size) - goto zero_out; - - /* Write that covers from the start of the folio to EOF or beyond */ - if (offset == 0 && (pos + len) >= i_size) - goto zero_out; - - return false; -zero_out: - zero_user_segments(&folio->page, 0, offset, offset + len, folio_size(folio)); - return true; -} - -/** - * netfs_write_begin - Helper to prepare for writing - * @file: The file to read from - * @mapping: The mapping to read from - * @pos: File position at which the write will begin - * @len: The length of the write (may extend beyond the end of the folio chosen) - * @aop_flags: AOP_* flags - * @_folio: Where to put the resultant folio - * @_fsdata: Place for the netfs to store a cookie - * @ops: The network filesystem's operations for the helper to use - * @netfs_priv: Private netfs data to be retained in the request - * - * Pre-read data for a write-begin request by drawing data from the cache if - * possible, or the netfs if not. Space beyond the EOF is zero-filled. - * Multiple I/O requests from different sources will get munged together. If - * necessary, the readahead window can be expanded in either direction to a - * more convenient alighment for RPC efficiency or to make storage in the cache - * feasible. - * - * The calling netfs must provide a table of operations, only one of which, - * issue_op, is mandatory. - * - * The check_write_begin() operation can be provided to check for and flush - * conflicting writes once the folio is grabbed and locked. It is passed a - * pointer to the fsdata cookie that gets returned to the VM to be passed to - * write_end. It is permitted to sleep. It should return 0 if the request - * should go ahead; unlock the folio and return -EAGAIN to cause the folio to - * be regot; or return an error. - * - * This is usable whether or not caching is enabled. - */ -int netfs_write_begin(struct file *file, struct address_space *mapping, - loff_t pos, unsigned int len, unsigned int aop_flags, - struct folio **_folio, void **_fsdata, - const struct netfs_read_request_ops *ops, - void *netfs_priv) -{ - struct netfs_read_request *rreq; - struct folio *folio; - struct inode *inode = file_inode(file); - unsigned int debug_index = 0, fgp_flags; - pgoff_t index = pos >> PAGE_SHIFT; - int ret; - - DEFINE_READAHEAD(ractl, file, NULL, mapping, index); - -retry: - fgp_flags = FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE; - if (aop_flags & AOP_FLAG_NOFS) - fgp_flags |= FGP_NOFS; - folio = __filemap_get_folio(mapping, index, fgp_flags, - mapping_gfp_mask(mapping)); - if (!folio) - return -ENOMEM; - - if (ops->check_write_begin) { - /* Allow the netfs (eg. ceph) to flush conflicts. */ - ret = ops->check_write_begin(file, pos, len, folio, _fsdata); - if (ret < 0) { - trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin); - if (ret == -EAGAIN) - goto retry; - goto error; - } - } - - if (folio_test_uptodate(folio)) - goto have_folio; - - /* If the page is beyond the EOF, we want to clear it - unless it's - * within the cache granule containing the EOF, in which case we need - * to preload the granule. - */ - if (!ops->is_cache_enabled(inode) && - netfs_skip_folio_read(folio, pos, len)) { - netfs_stat(&netfs_n_rh_write_zskip); - goto have_folio_no_wait; - } - - ret = -ENOMEM; - rreq = netfs_alloc_read_request(ops, netfs_priv, file); - if (!rreq) - goto error; - rreq->mapping = folio_file_mapping(folio); - rreq->start = folio_file_pos(folio); - rreq->len = folio_size(folio); - rreq->no_unlock_folio = folio_index(folio); - __set_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags); - netfs_priv = NULL; - - if (ops->begin_cache_operation) { - ret = ops->begin_cache_operation(rreq); - if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) - goto error_put; - } - - netfs_stat(&netfs_n_rh_write_begin); - trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin); - - /* Expand the request to meet caching requirements and download - * preferences. - */ - ractl._nr_pages = folio_nr_pages(folio); - netfs_rreq_expand(rreq, &ractl); - netfs_get_read_request(rreq); - - /* We hold the folio locks, so we can drop the references */ - folio_get(folio); - while (readahead_folio(&ractl)) - ; - - atomic_set(&rreq->nr_rd_ops, 1); - do { - if (!netfs_rreq_submit_slice(rreq, &debug_index)) - break; - - } while (rreq->submitted < rreq->len); - - /* Keep nr_rd_ops incremented so that the ref always belongs to us, and - * the service code isn't punted off to a random thread pool to - * process. - */ - for (;;) { - wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); - netfs_rreq_assess(rreq, false); - if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) - break; - cond_resched(); - } - - ret = rreq->error; - if (ret == 0 && rreq->submitted < rreq->len) { - trace_netfs_failure(rreq, NULL, ret, netfs_fail_short_write_begin); - ret = -EIO; - } - netfs_put_read_request(rreq, false); - if (ret < 0) - goto error; - -have_folio: - ret = folio_wait_fscache_killable(folio); - if (ret < 0) - goto error; -have_folio_no_wait: - if (netfs_priv) - ops->cleanup(mapping, netfs_priv); - *_folio = folio; - _leave(" = 0"); - return 0; - -error_put: - netfs_put_read_request(rreq, false); -error: - folio_unlock(folio); - folio_put(folio); - if (netfs_priv) - ops->cleanup(mapping, netfs_priv); - _leave(" = %d", ret); - return ret; -} -EXPORT_SYMBOL(netfs_write_begin); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index 9ae538c85378..5510a7a14a40 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -7,7 +7,6 @@ #include <linux/export.h> #include <linux/seq_file.h> -#include <linux/netfs.h> #include "internal.h" atomic_t netfs_n_rh_readahead; diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c index 4dee53ceb941..f73c09a9cf0a 100644 --- a/fs/nfs/fscache.c +++ b/fs/nfs/fscache.c @@ -238,14 +238,6 @@ void nfs_fscache_release_file(struct inode *inode, struct file *filp) } } -static inline void fscache_end_operation(struct netfs_cache_resources *cres) -{ - const struct netfs_cache_ops *ops = fscache_operation_valid(cres); - - if (ops) - ops->end_operation(cres); -} - /* * Fallback page reading interface. */ diff --git a/include/linux/fscache.h b/include/linux/fscache.h index d44ff747a657..6727fb0db619 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -457,6 +457,20 @@ int fscache_begin_read_operation(struct netfs_cache_resources *cres, } /** + * fscache_end_operation - End the read operation for the netfs lib + * @cres: The cache resources for the read operation + * + * Clean up the resources at the end of the read request. + */ +static inline void fscache_end_operation(struct netfs_cache_resources *cres) +{ + const struct netfs_cache_ops *ops = fscache_operation_valid(cres); + + if (ops) + ops->end_operation(cres); +} + +/** * fscache_read - Start a read from the cache. * @cres: The cache resources to use * @start_pos: The beginning file offset in the cache file diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 614f22213e21..c7bf1eaf51d5 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -18,6 +18,8 @@ #include <linux/fs.h> #include <linux/pagemap.h> +enum netfs_sreq_ref_trace; + /* * Overload PG_private_2 to give us PG_fscache - this is used to indicate that * a page is currently backed by a local disk cache @@ -106,7 +108,7 @@ static inline int wait_on_page_fscache_killable(struct page *page) return folio_wait_private_2_killable(page_folio(page)); } -enum netfs_read_source { +enum netfs_io_source { NETFS_FILL_WITH_ZEROES, NETFS_DOWNLOAD_FROM_SERVER, NETFS_READ_FROM_CACHE, @@ -117,6 +119,17 @@ typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error, bool was_async); /* + * Per-inode description. This must be directly after the inode struct. + */ +struct netfs_i_context { + const struct netfs_request_ops *ops; +#if IS_ENABLED(CONFIG_FSCACHE) + struct fscache_cookie *cache; +#endif + loff_t remote_i_size; /* Size of the remote file */ +}; + +/* * Resources required to do operations on a cache. */ struct netfs_cache_resources { @@ -130,69 +143,75 @@ struct netfs_cache_resources { /* * Descriptor for a single component subrequest. */ -struct netfs_read_subrequest { - struct netfs_read_request *rreq; /* Supervising read request */ +struct netfs_io_subrequest { + struct netfs_io_request *rreq; /* Supervising I/O request */ struct list_head rreq_link; /* Link in rreq->subrequests */ loff_t start; /* Where to start the I/O */ size_t len; /* Size of the I/O */ size_t transferred; /* Amount of data transferred */ - refcount_t usage; + refcount_t ref; short error; /* 0 or error that occurred */ unsigned short debug_index; /* Index in list (for debugging output) */ - enum netfs_read_source source; /* Where to read from */ + enum netfs_io_source source; /* Where to read from/write to */ unsigned long flags; -#define NETFS_SREQ_WRITE_TO_CACHE 0 /* Set if should write to cache */ +#define NETFS_SREQ_COPY_TO_CACHE 0 /* Set if should copy the data to the cache */ #define NETFS_SREQ_CLEAR_TAIL 1 /* Set if the rest of the read should be cleared */ -#define NETFS_SREQ_SHORT_READ 2 /* Set if there was a short read from the cache */ +#define NETFS_SREQ_SHORT_IO 2 /* Set if the I/O was short */ #define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */ #define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */ }; +enum netfs_io_origin { + NETFS_READAHEAD, /* This read was triggered by readahead */ + NETFS_READPAGE, /* This read is a synchronous read */ + NETFS_READ_FOR_WRITE, /* This read is to prepare a write */ +} __mode(byte); + /* - * Descriptor for a read helper request. This is used to make multiple I/O - * requests on a variety of sources and then stitch the result together. + * Descriptor for an I/O helper request. This is used to make multiple I/O + * operations to a variety of data stores and then stitch the result together. */ -struct netfs_read_request { +struct netfs_io_request { struct work_struct work; struct inode *inode; /* The file being accessed */ struct address_space *mapping; /* The mapping being accessed */ struct netfs_cache_resources cache_resources; - struct list_head subrequests; /* Requests to fetch I/O from disk or net */ + struct list_head subrequests; /* Contributory I/O operations */ void *netfs_priv; /* Private data for the netfs */ unsigned int debug_id; - atomic_t nr_rd_ops; /* Number of read ops in progress */ - atomic_t nr_wr_ops; /* Number of write ops in progress */ + atomic_t nr_outstanding; /* Number of ops in progress */ + atomic_t nr_copy_ops; /* Number of copy-to-cache ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ short error; /* 0 or error that occurred */ + enum netfs_io_origin origin; /* Origin of the request */ loff_t i_size; /* Size of the file */ loff_t start; /* Start position */ pgoff_t no_unlock_folio; /* Don't unlock this folio after read */ - refcount_t usage; + refcount_t ref; unsigned long flags; #define NETFS_RREQ_INCOMPLETE_IO 0 /* Some ioreqs terminated short or with error */ -#define NETFS_RREQ_WRITE_TO_CACHE 1 /* Need to write to the cache */ +#define NETFS_RREQ_COPY_TO_CACHE 1 /* Need to write to the cache */ #define NETFS_RREQ_NO_UNLOCK_FOLIO 2 /* Don't unlock no_unlock_folio on completion */ #define NETFS_RREQ_DONT_UNLOCK_FOLIOS 3 /* Don't unlock the folios on completion */ #define NETFS_RREQ_FAILED 4 /* The request failed */ #define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */ - const struct netfs_read_request_ops *netfs_ops; + const struct netfs_request_ops *netfs_ops; }; /* * Operations the network filesystem can/must provide to the helpers. */ -struct netfs_read_request_ops { - bool (*is_cache_enabled)(struct inode *inode); - void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); - int (*begin_cache_operation)(struct netfs_read_request *rreq); - void (*expand_readahead)(struct netfs_read_request *rreq); - bool (*clamp_length)(struct netfs_read_subrequest *subreq); - void (*issue_op)(struct netfs_read_subrequest *subreq); - bool (*is_still_valid)(struct netfs_read_request *rreq); +struct netfs_request_ops { + int (*init_request)(struct netfs_io_request *rreq, struct file *file); + int (*begin_cache_operation)(struct netfs_io_request *rreq); + void (*expand_readahead)(struct netfs_io_request *rreq); + bool (*clamp_length)(struct netfs_io_subrequest *subreq); + void (*issue_read)(struct netfs_io_subrequest *subreq); + bool (*is_still_valid)(struct netfs_io_request *rreq); int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, struct folio *folio, void **_fsdata); - void (*done)(struct netfs_read_request *rreq); + void (*done)(struct netfs_io_request *rreq); void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; @@ -235,7 +254,7 @@ struct netfs_cache_ops { /* Prepare a read operation, shortening it to a cached/uncached * boundary as appropriate. */ - enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq, + enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq, loff_t i_size); /* Prepare a write operation, working out what part of the write we can @@ -254,20 +273,89 @@ struct netfs_cache_ops { }; struct readahead_control; -extern void netfs_readahead(struct readahead_control *, - const struct netfs_read_request_ops *, - void *); -extern int netfs_readpage(struct file *, - struct folio *, - const struct netfs_read_request_ops *, - void *); +extern void netfs_readahead(struct readahead_control *); +extern int netfs_readpage(struct file *, struct page *); extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, unsigned int, struct folio **, - void **, - const struct netfs_read_request_ops *, - void *); + void **); -extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); +extern void netfs_subreq_terminated(struct netfs_io_subrequest *, ssize_t, bool); +extern void netfs_get_subrequest(struct netfs_io_subrequest *subreq, + enum netfs_sreq_ref_trace what); +extern void netfs_put_subrequest(struct netfs_io_subrequest *subreq, + bool was_async, enum netfs_sreq_ref_trace what); extern void netfs_stats_show(struct seq_file *); +/** + * netfs_i_context - Get the netfs inode context from the inode + * @inode: The inode to query + * + * Get the netfs lib inode context from the network filesystem's inode. The + * context struct is expected to directly follow on from the VFS inode struct. + */ +static inline struct netfs_i_context *netfs_i_context(struct inode *inode) +{ + return (struct netfs_i_context *)(inode + 1); +} + +/** + * netfs_inode - Get the netfs inode from the inode context + * @ctx: The context to query + * + * Get the netfs inode from the netfs library's inode context. The VFS inode + * is expected to directly precede the context struct. + */ +static inline struct inode *netfs_inode(struct netfs_i_context *ctx) +{ + return ((struct inode *)ctx) - 1; +} + +/** + * netfs_i_context_init - Initialise a netfs lib context + * @inode: The inode with which the context is associated + * @ops: The netfs's operations list + * + * Initialise the netfs library context struct. This is expected to follow on + * directly from the VFS inode struct. + */ +static inline void netfs_i_context_init(struct inode *inode, + const struct netfs_request_ops *ops) +{ + struct netfs_i_context *ctx = netfs_i_context(inode); + + memset(ctx, 0, sizeof(*ctx)); + ctx->ops = ops; + ctx->remote_i_size = i_size_read(inode); +} + +/** + * netfs_resize_file - Note that a file got resized + * @inode: The inode being resized + * @new_i_size: The new file size + * + * Inform the netfs lib that a file got resized so that it can adjust its state. + */ +static inline void netfs_resize_file(struct inode *inode, loff_t new_i_size) +{ + struct netfs_i_context *ctx = netfs_i_context(inode); + + ctx->remote_i_size = new_i_size; +} + +/** + * netfs_i_cookie - Get the cache cookie from the inode + * @inode: The inode to query + * + * Get the caching cookie (if enabled) from the network filesystem's inode. + */ +static inline struct fscache_cookie *netfs_i_cookie(struct inode *inode) +{ +#if IS_ENABLED(CONFIG_FSCACHE) + struct netfs_i_context *ctx = netfs_i_context(inode); + return ctx->cache; +#else + return NULL; +#endif +} + #endif /* _LINUX_NETFS_H */ diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h index 2c530637e10a..311c14a20e70 100644 --- a/include/trace/events/cachefiles.h +++ b/include/trace/events/cachefiles.h @@ -426,8 +426,8 @@ TRACE_EVENT(cachefiles_vol_coherency, ); TRACE_EVENT(cachefiles_prep_read, - TP_PROTO(struct netfs_read_subrequest *sreq, - enum netfs_read_source source, + TP_PROTO(struct netfs_io_subrequest *sreq, + enum netfs_io_source source, enum cachefiles_prepare_read_trace why, ino_t cache_inode), @@ -437,7 +437,7 @@ TRACE_EVENT(cachefiles_prep_read, __field(unsigned int, rreq ) __field(unsigned short, index ) __field(unsigned short, flags ) - __field(enum netfs_read_source, source ) + __field(enum netfs_io_source, source ) __field(enum cachefiles_prepare_read_trace, why ) __field(size_t, len ) __field(loff_t, start ) diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index e6f4ebbb4c69..beec534cbaab 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -15,63 +15,25 @@ /* * Define enums for tracing information. */ -#ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY -#define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY - -enum netfs_read_trace { - netfs_read_trace_expanded, - netfs_read_trace_readahead, - netfs_read_trace_readpage, - netfs_read_trace_write_begin, -}; - -enum netfs_rreq_trace { - netfs_rreq_trace_assess, - netfs_rreq_trace_done, - netfs_rreq_trace_free, - netfs_rreq_trace_resubmit, - netfs_rreq_trace_unlock, - netfs_rreq_trace_unmark, - netfs_rreq_trace_write, -}; - -enum netfs_sreq_trace { - netfs_sreq_trace_download_instead, - netfs_sreq_trace_free, - netfs_sreq_trace_prepare, - netfs_sreq_trace_resubmit_short, - netfs_sreq_trace_submit, - netfs_sreq_trace_terminated, - netfs_sreq_trace_write, - netfs_sreq_trace_write_skip, - netfs_sreq_trace_write_term, -}; - -enum netfs_failure { - netfs_fail_check_write_begin, - netfs_fail_copy_to_cache, - netfs_fail_read, - netfs_fail_short_readpage, - netfs_fail_short_write_begin, - netfs_fail_prepare_write, -}; - -#endif - #define netfs_read_traces \ EM(netfs_read_trace_expanded, "EXPANDED ") \ EM(netfs_read_trace_readahead, "READAHEAD") \ EM(netfs_read_trace_readpage, "READPAGE ") \ E_(netfs_read_trace_write_begin, "WRITEBEGN") +#define netfs_rreq_origins \ + EM(NETFS_READAHEAD, "RA") \ + EM(NETFS_READPAGE, "RP") \ + E_(NETFS_READ_FOR_WRITE, "RW") + #define netfs_rreq_traces \ - EM(netfs_rreq_trace_assess, "ASSESS") \ - EM(netfs_rreq_trace_done, "DONE ") \ - EM(netfs_rreq_trace_free, "FREE ") \ - EM(netfs_rreq_trace_resubmit, "RESUBM") \ - EM(netfs_rreq_trace_unlock, "UNLOCK") \ - EM(netfs_rreq_trace_unmark, "UNMARK") \ - E_(netfs_rreq_trace_write, "WRITE ") + EM(netfs_rreq_trace_assess, "ASSESS ") \ + EM(netfs_rreq_trace_copy, "COPY ") \ + EM(netfs_rreq_trace_done, "DONE ") \ + EM(netfs_rreq_trace_free, "FREE ") \ + EM(netfs_rreq_trace_resubmit, "RESUBMT") \ + EM(netfs_rreq_trace_unlock, "UNLOCK ") \ + E_(netfs_rreq_trace_unmark, "UNMARK ") #define netfs_sreq_sources \ EM(NETFS_FILL_WITH_ZEROES, "ZERO") \ @@ -94,10 +56,47 @@ enum netfs_failure { EM(netfs_fail_check_write_begin, "check-write-begin") \ EM(netfs_fail_copy_to_cache, "copy-to-cache") \ EM(netfs_fail_read, "read") \ - EM(netfs_fail_short_readpage, "short-readpage") \ - EM(netfs_fail_short_write_begin, "short-write-begin") \ + EM(netfs_fail_short_read, "short-read") \ E_(netfs_fail_prepare_write, "prep-write") +#define netfs_rreq_ref_traces \ + EM(netfs_rreq_trace_get_hold, "GET HOLD ") \ + EM(netfs_rreq_trace_get_subreq, "GET SUBREQ ") \ + EM(netfs_rreq_trace_put_complete, "PUT COMPLT ") \ + EM(netfs_rreq_trace_put_discard, "PUT DISCARD") \ + EM(netfs_rreq_trace_put_failed, "PUT FAILED ") \ + EM(netfs_rreq_trace_put_hold, "PUT HOLD ") \ + EM(netfs_rreq_trace_put_subreq, "PUT SUBREQ ") \ + EM(netfs_rreq_trace_put_zero_len, "PUT ZEROLEN") \ + E_(netfs_rreq_trace_new, "NEW ") + +#define netfs_sreq_ref_traces \ + EM(netfs_sreq_trace_get_copy_to_cache, "GET COPY2C ") \ + EM(netfs_sreq_trace_get_resubmit, "GET RESUBMIT") \ + EM(netfs_sreq_trace_get_short_read, "GET SHORTRD") \ + EM(netfs_sreq_trace_new, "NEW ") \ + EM(netfs_sreq_trace_put_clear, "PUT CLEAR ") \ + EM(netfs_sreq_trace_put_failed, "PUT FAILED ") \ + EM(netfs_sreq_trace_put_merged, "PUT MERGED ") \ + EM(netfs_sreq_trace_put_no_copy, "PUT NO COPY") \ + E_(netfs_sreq_trace_put_terminated, "PUT TERM ") + +#ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY +#define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY + +#undef EM +#undef E_ +#define EM(a, b) a, +#define E_(a, b) a + +enum netfs_read_trace { netfs_read_traces } __mode(byte); +enum netfs_rreq_trace { netfs_rreq_traces } __mode(byte); +enum netfs_sreq_trace { netfs_sreq_traces } __mode(byte); +enum netfs_failure { netfs_failures } __mode(byte); +enum netfs_rreq_ref_trace { netfs_rreq_ref_traces } __mode(byte); +enum netfs_sreq_ref_trace { netfs_sreq_ref_traces } __mode(byte); + +#endif /* * Export enum symbols via userspace. @@ -108,10 +107,13 @@ enum netfs_failure { #define E_(a, b) TRACE_DEFINE_ENUM(a); netfs_read_traces; +netfs_rreq_origins; netfs_rreq_traces; netfs_sreq_sources; netfs_sreq_traces; netfs_failures; +netfs_rreq_ref_traces; +netfs_sreq_ref_traces; /* * Now redefine the EM() and E_() macros to map the enums to the strings that @@ -123,7 +125,7 @@ netfs_failures; #define E_(a, b) { a, b } TRACE_EVENT(netfs_read, - TP_PROTO(struct netfs_read_request *rreq, + TP_PROTO(struct netfs_io_request *rreq, loff_t start, size_t len, enum netfs_read_trace what), @@ -156,31 +158,34 @@ TRACE_EVENT(netfs_read, ); TRACE_EVENT(netfs_rreq, - TP_PROTO(struct netfs_read_request *rreq, + TP_PROTO(struct netfs_io_request *rreq, enum netfs_rreq_trace what), TP_ARGS(rreq, what), TP_STRUCT__entry( __field(unsigned int, rreq ) - __field(unsigned short, flags ) + __field(unsigned int, flags ) + __field(enum netfs_io_origin, origin ) __field(enum netfs_rreq_trace, what ) ), TP_fast_assign( __entry->rreq = rreq->debug_id; __entry->flags = rreq->flags; + __entry->origin = rreq->origin; __entry->what = what; ), - TP_printk("R=%08x %s f=%02x", + TP_printk("R=%08x %s %s f=%02x", __entry->rreq, + __print_symbolic(__entry->origin, netfs_rreq_origins), __print_symbolic(__entry->what, netfs_rreq_traces), __entry->flags) ); TRACE_EVENT(netfs_sreq, - TP_PROTO(struct netfs_read_subrequest *sreq, + TP_PROTO(struct netfs_io_subrequest *sreq, enum netfs_sreq_trace what), TP_ARGS(sreq, what), @@ -190,7 +195,7 @@ TRACE_EVENT(netfs_sreq, __field(unsigned short, index ) __field(short, error ) __field(unsigned short, flags ) - __field(enum netfs_read_source, source ) + __field(enum netfs_io_source, source ) __field(enum netfs_sreq_trace, what ) __field(size_t, len ) __field(size_t, transferred ) @@ -211,26 +216,26 @@ TRACE_EVENT(netfs_sreq, TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx/%zx e=%d", __entry->rreq, __entry->index, - __print_symbolic(__entry->what, netfs_sreq_traces), __print_symbolic(__entry->source, netfs_sreq_sources), + __print_symbolic(__entry->what, netfs_sreq_traces), __entry->flags, __entry->start, __entry->transferred, __entry->len, __entry->error) ); TRACE_EVENT(netfs_failure, - TP_PROTO(struct netfs_read_request *rreq, - struct netfs_read_subrequest *sreq, + TP_PROTO(struct netfs_io_request *rreq, + struct netfs_io_subrequest *sreq, int error, enum netfs_failure what), TP_ARGS(rreq, sreq, error, what), TP_STRUCT__entry( __field(unsigned int, rreq ) - __field(unsigned short, index ) + __field(short, index ) __field(short, error ) __field(unsigned short, flags ) - __field(enum netfs_read_source, source ) + __field(enum netfs_io_source, source ) __field(enum netfs_failure, what ) __field(size_t, len ) __field(size_t, transferred ) @@ -239,17 +244,17 @@ TRACE_EVENT(netfs_failure, TP_fast_assign( __entry->rreq = rreq->debug_id; - __entry->index = sreq ? sreq->debug_index : 0; + __entry->index = sreq ? sreq->debug_index : -1; __entry->error = error; __entry->flags = sreq ? sreq->flags : 0; __entry->source = sreq ? sreq->source : NETFS_INVALID_READ; __entry->what = what; - __entry->len = sreq ? sreq->len : 0; + __entry->len = sreq ? sreq->len : rreq->len; __entry->transferred = sreq ? sreq->transferred : 0; __entry->start = sreq ? sreq->start : 0; ), - TP_printk("R=%08x[%u] %s f=%02x s=%llx %zx/%zx %s e=%d", + TP_printk("R=%08x[%d] %s f=%02x s=%llx %zx/%zx %s e=%d", __entry->rreq, __entry->index, __print_symbolic(__entry->source, netfs_sreq_sources), __entry->flags, @@ -258,6 +263,59 @@ TRACE_EVENT(netfs_failure, __entry->error) ); +TRACE_EVENT(netfs_rreq_ref, + TP_PROTO(unsigned int rreq_debug_id, int ref, + enum netfs_rreq_ref_trace what), + + TP_ARGS(rreq_debug_id, ref, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(int, ref ) + __field(enum netfs_rreq_ref_trace, what ) + ), + + TP_fast_assign( + __entry->rreq = rreq_debug_id; + __entry->ref = ref; + __entry->what = what; + ), + + TP_printk("R=%08x %s r=%u", + __entry->rreq, + __print_symbolic(__entry->what, netfs_rreq_ref_traces), + __entry->ref) + ); + +TRACE_EVENT(netfs_sreq_ref, + TP_PROTO(unsigned int rreq_debug_id, unsigned int subreq_debug_index, + int ref, enum netfs_sreq_ref_trace what), + + TP_ARGS(rreq_debug_id, subreq_debug_index, ref, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned int, subreq ) + __field(int, ref ) + __field(enum netfs_sreq_ref_trace, what ) + ), + + TP_fast_assign( + __entry->rreq = rreq_debug_id; + __entry->subreq = subreq_debug_index; + __entry->ref = ref; + __entry->what = what; + ), + + TP_printk("R=%08x[%x] %s r=%u", + __entry->rreq, + __entry->subreq, + __print_symbolic(__entry->what, netfs_sreq_ref_traces), + __entry->ref) + ); + +#undef EM +#undef E_ #endif /* _TRACE_NETFS_H */ /* This part must be outside protection */ |