author | Christian Brauner <brauner@kernel.org> | 2024-12-20 22:34:18 +0100 |
---|---|---|
committer | Christian Brauner <brauner@kernel.org> | 2024-12-20 22:34:18 +0100 |
commit | 7a47db23a9f003614e15c687d2a5425c175a9ca8 (patch) | |
tree | 38292db9d7f2b020dc647d0871a5c938c7a82cc0 /include/linux/netfs.h | |
parent | 5fe85a5c513344161cde33b79f8badc81b8aa8d3 (diff) | |
parent | 794d8cf3a87a6b958d520a7c32d142f7ec30cb92 (diff) |
Merge patch series "netfs: Read performance improvements and "single-blob" support"
David Howells <dhowells@redhat.com> says:
This set of patches is primarily about two things: improving read
performance and supporting monolithic single-blob objects that have to be
read/written as such (e.g. AFS directory contents). The implementation of
the two parts is interwoven as each makes the other possible.
READ PERFORMANCE
================
The read performance improvements are intended to recover some loss of
performance detected in cifs and, to a lesser extent, in afs.  The problem is
that we queue too many work items during the collection of read results:
each individual subrequest is collected by its own work item, and then they
have to interact with each other when a series of subrequests don't exactly
align with the pattern of folios that are being read by the overall
request.
Whilst the processing of the pages covered by individual subrequests as
they complete potentially allows folios to be woken in parallel and with
minimum delay, it can shuffle wakeups for sequential reads out of order -
and that is the most common I/O pattern.
The final assessment and cleanup of an operation is then held up until the
last I/O completes - and for a synchronous sequential operation, this means
the bouncing around of work items just adds latency.
Two changes have been made to make this work:
(1) All collection is now done in a single "work item" that works
progressively through the subrequests as they complete (and also
dispatches retries as necessary).
(2) For readahead and AIO, this work item is run on a workqueue and can
run in parallel with the ultimate consumer of the data; for
synchronous direct or unbuffered reads, the collection is run in the
application thread and not offloaded.
Functions such as smb2_readv_callback() then just tell netfslib that the
subrequest has terminated; netfslib does a minimal bit of processing on the
spot - stat counting and tracing mostly - and then queues/wakes up the
worker. This simplifies the logic as the collector just walks sequentially
through the subrequests as they complete and walks through the folios, if
buffered, unlocking them as it goes. It also keeps to a minimum the amount
of latency injected into the filesystem's low-level I/O handling.
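The single-collector scheme described above can be sketched in user space
as follows.  This is only a model under stated assumptions: struct subreq,
struct request, subreq_terminated() and collect() are hypothetical names,
folios are given a fixed size, and the retry, error and locking handling
that the real netfslib code needs is left out.  It shows completions
arriving out of order while folios are still released strictly in file
order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kernel's structures. */
struct subreq {
	size_t	start;		/* Offset of this subrequest in the request */
	size_t	len;		/* Number of bytes it covers */
	bool	done;		/* Set by the I/O completion callback */
};

struct request {
	struct subreq	*subreqs;	/* Sorted by start and contiguous */
	size_t		nr_subreqs;
	size_t		next;		/* First subrequest not yet collected */
	size_t		collected_to;	/* All data below this is collected */
	size_t		folio_size;	/* Fixed folio size in this model */
	size_t		unlocked_folios; /* Folios released so far */
};

/* Completion side: note the result and leave the rest to the collector. */
static void subreq_terminated(struct request *rreq, size_t index)
{
	assert(index < rreq->nr_subreqs);
	rreq->subreqs[index].done = true;
	/* A real implementation would queue or wake the collector here. */
}

/*
 * Collector: advance over the completed prefix only, so that folios are
 * released in file order.  Returns the number of folios newly released.
 */
static size_t collect(struct request *rreq)
{
	size_t before = rreq->unlocked_folios;

	while (rreq->next < rreq->nr_subreqs &&
	       rreq->subreqs[rreq->next].done) {
		struct subreq *s = &rreq->subreqs[rreq->next++];

		rreq->collected_to = s->start + s->len;
	}

	/* A folio can be released once collection has passed its end. */
	rreq->unlocked_folios = rreq->collected_to / rreq->folio_size;
	return rreq->unlocked_folios - before;
}
```

In this model, a subrequest that completes ahead of its predecessors
releases nothing until the gap in front of it has been filled, which is
the ordering property the single work item is meant to provide for
sequential reads.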
The way netfs supports filesystems using the deprecated PG_private_2 flag
is changed: folios are flagged and added to a write request as they
complete and that takes care of scheduling the writes to the cache. The
originating read request can then just unlock the pages whatever happens.
SINGLE-BLOB OBJECT SUPPORT
==========================
Single-blob objects are files for which the content of the file must be
read from or written to the server in a single operation because reading
them in parts may yield inconsistent results. AFS directories are an
example of this as there exists the possibility that the contents are
generated on the fly and would differ between reads or might change due to
third party interference.
Such objects will be written to and retrieved from the cache if one is
present, though we allow, and may need, multiple subrequests to do so.
The important part is that the read from/write to the *server* is monolithic.
Single blob reading is, for the moment, fully synchronous and does result
collection in the application thread and, also for the moment, the API is
supplied the buffer in the form of a folio_queue chain rather than using
the pagecache.
AFS CHANGES
===========
This series makes a number of changes to the kafs filesystem, primarily in
the area of directory handling:
(1) AFS's FetchData RPC reply processing is made partially asynchronous
which allows the netfs_io_request's outstanding operation counter to
be removed as part of reducing the collection to a single work item.
(2) Directory and symlink reading are plumbed through netfslib using the
single-blob object API and are now cacheable with fscache. This also
allows the afs_read struct to be eliminated and netfs_io_subrequest to
be used directly instead.
(3) Directory and symlink content are now stored in a folio_queue buffer
rather than in the pagecache. This means we don't require the RCU
read lock and xarray iteration to access it, and folios won't randomly
disappear under us because the VM wants them back.
There are some downsides to this, though: the storage folios are no
longer known to the VM, drop_caches can't flush them and the folios are
not migratable.  The inode must also be marked dirty manually to get
the data written to the cache in the background.
(4) The vnode operation lock is changed from a mutex struct to a private
lock implementation. The problem is that the lock now needs to be
dropped in a separate thread and mutexes don't permit that.
(5) When a new directory or symlink is created, we now initialise it
locally and mark it valid rather than downloading it (we know what
it's likely to look like).
(6) We now use the in-directory hashtable to reduce the number of entries
we need to scan when doing a lookup. The edit routines have to
maintain the hash chains.
(7) Cancellation (e.g. by signal) of an async call after the rxrpc_call
has been set up is now offloaded to the worker thread as there will be
a notification from rxrpc upon completion. This avoids a double
cleanup.
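As an illustration of item (6), the following is a minimal user-space
sketch of a lookup that follows a hash chain stored alongside the
directory entries.  Everything in it is an assumption made for the
example: the bucket count, the entry layout, the toy hash function and the
names struct directory, dir_add() and dir_lookup() are hypothetical and do
not reflect the AFS directory format or the kafs code.

```c
#include <assert.h>
#include <string.h>

#define NR_BUCKETS	8	/* Illustrative; not the AFS value */
#define MAX_ENTRIES	32
#define NAME_MAX_LEN	16

struct dir_entry {
	char		name[NAME_MAX_LEN];
	unsigned int	vnode;		/* What the name resolves to */
	unsigned short	hash_next;	/* Next in chain, 1-based; 0 ends it */
};

struct directory {
	unsigned short	buckets[NR_BUCKETS];	/* Chain heads; 0 = empty */
	struct dir_entry entries[MAX_ENTRIES];
	unsigned short	nr_entries;
};

/* Toy string hash; an assumption for this sketch only. */
static unsigned int name_hash(const char *name)
{
	unsigned int h = 0;

	while (*name)
		h = h * 31 + (unsigned char)*name++;
	return h % NR_BUCKETS;
}

/* Edit path: add an entry and link it at the head of its hash chain. */
static int dir_add(struct directory *dir, const char *name, unsigned int vnode)
{
	struct dir_entry *e;
	unsigned int b = name_hash(name);

	if (dir->nr_entries >= MAX_ENTRIES || strlen(name) >= NAME_MAX_LEN)
		return -1;

	e = &dir->entries[dir->nr_entries++];
	strcpy(e->name, name);
	e->vnode = vnode;
	e->hash_next = dir->buckets[b];
	dir->buckets[b] = dir->nr_entries;	/* 1-based index of new entry */
	return 0;
}

/*
 * Lookup path: compare only the entries on the chain for the name's
 * bucket.  Returns the vnode or 0 if absent; *scanned, if given, is set
 * to the number of entries compared.
 */
static unsigned int dir_lookup(const struct directory *dir, const char *name,
			       unsigned int *scanned)
{
	unsigned int n = 0, found = 0;
	unsigned short i;

	for (i = dir->buckets[name_hash(name)]; i;
	     i = dir->entries[i - 1].hash_next) {
		n++;
		if (strcmp(dir->entries[i - 1].name, name) == 0) {
			found = dir->entries[i - 1].vnode;
			break;
		}
	}
	if (scanned)
		*scanned = n;
	return found;
}
```

The point of the sketch is the division of labour: the lookup touches one
chain rather than every entry, and in exchange every edit has to keep the
chain links consistent, as dir_add() does here.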
SUPPORTING CHANGES
==================
To support the above some other changes are also made:
(1) A "rolling buffer" implementation is created to abstract out the two
separate folio_queue chaining implementations I had (one for read and
one for write).
(2) Functions are provided to create/extend a buffer in a folio_queue
chain and tear it down again. This is used to handle AFS directories,
but could also be used to create bounce buffers for content crypto and
transport crypto.
(3) The was_async argument is dropped from netfs_read_subreq_terminated().
Instead we wake the read collection work item by either queuing it or
waking up the app thread.
(4) We don't need to use BH-excluding locks when communicating between the
issuing thread and the collection thread as neither of them now runs in
BH context.
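The idea behind the "rolling buffer" in item (1) can be sketched in user
space as a chain of fixed-size segments that a producer extends at the
tail while a consumer advances from the head and discards segments it has
used up.  This is a hedged model, not the kernel implementation: struct
segment, struct rolling_buf and the rb_*() functions are hypothetical
names, the slots hold plain integers rather than folios, and no
concurrency control is shown.

```c
#include <assert.h>
#include <stdlib.h>

#define SLOTS_PER_SEG	4	/* Illustrative segment capacity */

struct segment {
	struct segment	*next;
	int		slots[SLOTS_PER_SEG];	/* Stand-in for folio pointers */
};

struct rolling_buf {
	struct segment	*head, *tail;
	unsigned int	head_slot;	/* Next slot to consume in head */
	unsigned int	tail_slot;	/* Next free slot in tail */
};

static int rb_init(struct rolling_buf *rb)
{
	rb->head = rb->tail = calloc(1, sizeof(struct segment));
	rb->head_slot = rb->tail_slot = 0;
	return rb->head ? 0 : -1;
}

/* Producer: append an item, adding a segment when the tail is full. */
static int rb_append(struct rolling_buf *rb, int item)
{
	if (rb->tail_slot == SLOTS_PER_SEG) {
		struct segment *seg = calloc(1, sizeof(*seg));

		if (!seg)
			return -1;
		rb->tail->next = seg;
		rb->tail = seg;
		rb->tail_slot = 0;
	}
	rb->tail->slots[rb->tail_slot++] = item;
	return 0;
}

/*
 * Consumer: take the oldest item, freeing any segment that has been
 * fully consumed.  Returns 0 on success or -1 if the buffer is empty.
 */
static int rb_consume(struct rolling_buf *rb, int *item)
{
	if (rb->head_slot == SLOTS_PER_SEG && rb->head != rb->tail) {
		struct segment *spent = rb->head;

		rb->head = spent->next;
		rb->head_slot = 0;
		free(spent);
	}
	if (rb->head == rb->tail && rb->head_slot == rb->tail_slot)
		return -1;
	*item = rb->head->slots[rb->head_slot++];
	return 0;
}

/* Tear down whatever segments remain. */
static void rb_destroy(struct rolling_buf *rb)
{
	while (rb->head) {
		struct segment *seg = rb->head;

		rb->head = seg->next;
		free(seg);
	}
	rb->tail = NULL;
}
```

Keeping the head and tail cursors in one structure is what lets a single
abstraction serve both the read side and the write side in this model:
each side only differs in which end it drives.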
MISCELLANY
==========
Also included are a number of new tracepoints and a split of the netfslib
write collection code to put retrying into its own file (it gets more
complicated with content encryption).
There are also some minor AFS fixes, including fixing the AFS
directory format struct layout, reducing some directory over-invalidation
and making afs_rmdir() translate EEXIST to ENOTEMPTY (which is not available
on all systems the servers support).
Finally, there's a patch to try and detect entry into the folio unlock
function with no folio_queue structs in the buffer (which isn't allowed in
the cases that can get there). This is a debugging patch, but should be
minimal overhead.
* patches from https://lore.kernel.org/r/20241216204124.3752367-1-dhowells@redhat.com: (31 commits)
netfs: Report on NULL folioq in netfs_writeback_unlock_folios()
afs: Add a tracepoint for afs_read_receive()
afs: Locally initialise the contents of a new symlink on creation
afs: Use the contained hashtable to search a directory
afs: Make afs_mkdir() locally initialise a new directory's content
netfs: Change the read result collector to only use one work item
afs: Make {Y,}FS.FetchData an asynchronous operation
afs: Fix cleanup of immediately failed async calls
afs: Eliminate afs_read
afs: Use netfslib for symlinks, allowing them to be cached
afs: Use netfslib for directories
afs: Make afs_init_request() get a key if not given a file
netfs: Add support for caching single monolithic objects such as AFS dirs
netfs: Add functions to build/clean a buffer in a folio_queue
afs: Add more tracepoints to do with tracking validity
cachefiles: Add auxiliary data trace
cachefiles: Add some subrequest tracepoints
netfs: Remove some extraneous directory invalidations
afs: Fix directory format encoding struct
afs: Fix EEXIST error returned from afs_rmdir() to be ENOTEMPTY
...
Link: https://lore.kernel.org/r/20241216204124.3752367-1-dhowells@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
Diffstat (limited to 'include/linux/netfs.h')
-rw-r--r-- | include/linux/netfs.h | 54 |
1 files changed, 35 insertions, 19 deletions
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index ecdd5ced16a8..071d05d81d38 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -18,9 +18,11 @@
 #include <linux/fs.h>
 #include <linux/pagemap.h>
 #include <linux/uio.h>
+#include <linux/rolling_buffer.h>
 
 enum netfs_sreq_ref_trace;
 typedef struct mempool_s mempool_t;
+struct folio_queue;
 
 /**
  * folio_start_private_2 - Start an fscache write on a folio.  [DEPRECATED]
@@ -71,6 +73,7 @@ struct netfs_inode {
 #define NETFS_ICTX_UNBUFFERED	1		/* I/O should not use the pagecache */
 #define NETFS_ICTX_WRITETHROUGH	2		/* Write-through caching */
 #define NETFS_ICTX_MODIFIED_ATTR 3		/* Indicate change in mtime/ctime */
+#define NETFS_ICTX_SINGLE_NO_UPLOAD 4		/* Monolithic payload, cache but no upload */
 };
 
 /*
@@ -178,9 +181,6 @@ struct netfs_io_subrequest {
 	unsigned long long	start;		/* Where to start the I/O */
 	size_t			len;		/* Size of the I/O */
 	size_t			transferred;	/* Amount of data transferred */
-	size_t			consumed;	/* Amount of read data consumed */
-	size_t			prev_donated;	/* Amount of data donated from previous subreq */
-	size_t			next_donated;	/* Amount of data donated from next subreq */
 	refcount_t		ref;
 	short			error;		/* 0 or error that occurred */
 	unsigned short		debug_index;	/* Index in list (for debugging output) */
@@ -188,9 +188,6 @@ struct netfs_io_subrequest {
 	u8			retry_count;	/* The number of retries (0 on initial pass) */
 	enum netfs_io_source	source;		/* Where to read from/write to */
 	unsigned char		stream_nr;	/* I/O stream this belongs to */
-	unsigned char		curr_folioq_slot; /* Folio currently being read */
-	unsigned char		curr_folio_order; /* Order of folio */
-	struct folio_queue	*curr_folioq;	/* Queue segment in which current folio resides */
 	unsigned long		flags;
 #define NETFS_SREQ_COPY_TO_CACHE	0	/* Set if should copy the data to the cache */
 #define NETFS_SREQ_CLEAR_TAIL		1	/* Set if the rest of the read should be cleared */
@@ -208,9 +205,11 @@ enum netfs_io_origin {
 	NETFS_READAHEAD,		/* This read was triggered by readahead */
 	NETFS_READPAGE,			/* This read is a synchronous read */
 	NETFS_READ_GAPS,		/* This read is a synchronous read to fill gaps */
+	NETFS_READ_SINGLE,		/* This read should be treated as a single object */
 	NETFS_READ_FOR_WRITE,		/* This read is to prepare a write */
 	NETFS_DIO_READ,			/* This is a direct I/O read */
 	NETFS_WRITEBACK,		/* This write was triggered by writepages */
+	NETFS_WRITEBACK_SINGLE,		/* This monolithic write was triggered by writepages */
 	NETFS_WRITETHROUGH,		/* This write was made by netfs_perform_write() */
 	NETFS_UNBUFFERED_WRITE,		/* This is an unbuffered write */
 	NETFS_DIO_WRITE,		/* This is a direct I/O write */
@@ -231,16 +230,16 @@ struct netfs_io_request {
 	struct address_space	*mapping;	/* The mapping being accessed */
 	struct kiocb		*iocb;		/* AIO completion vector */
 	struct netfs_cache_resources cache_resources;
+	struct netfs_io_request	*copy_to_cache;	/* Request to write just-read data to the cache */
 	struct readahead_control *ractl;	/* Readahead descriptor */
 	struct list_head	proc_link;	/* Link in netfs_iorequests */
-	struct list_head	subrequests;	/* Contributory I/O operations */
 	struct netfs_io_stream	io_streams[2];	/* Streams of parallel I/O operations */
 #define NR_IO_STREAMS 2 //wreq->nr_io_streams
 	struct netfs_group	*group;		/* Writeback group being written back */
-	struct folio_queue	*buffer;	/* Head of I/O buffer */
-	struct folio_queue	*buffer_tail;	/* Tail of I/O buffer */
-	struct iov_iter		iter;		/* Unencrypted-side iterator */
-	struct iov_iter		io_iter;	/* I/O (Encrypted-side) iterator */
+	struct rolling_buffer	buffer;		/* Unencrypted buffer */
+#define NETFS_ROLLBUF_PUT_MARK		ROLLBUF_MARK_1
+#define NETFS_ROLLBUF_PAGECACHE_MARK	ROLLBUF_MARK_2
+	wait_queue_head_t	waitq;		/* Processor waiter */
 	void			*netfs_priv;	/* Private data for the netfs */
 	void			*netfs_priv2;	/* Private data for the netfs */
 	struct bio_vec		*direct_bv;	/* DIO buffer list (when handling iovec-iter) */
@@ -251,28 +250,28 @@ struct netfs_io_request {
 	atomic_t		subreq_counter;	/* Next subreq->debug_index */
 	unsigned int		nr_group_rel;	/* Number of refs to release on ->group */
 	spinlock_t		lock;		/* Lock for queuing subreqs */
-	atomic_t		nr_outstanding;	/* Number of ops in progress */
 	unsigned long long	submitted;	/* Amount submitted for I/O so far */
 	unsigned long long	len;		/* Length of the request */
 	size_t			transferred;	/* Amount to be indicated as transferred */
 	long			error;		/* 0 or error that occurred */
 	enum netfs_io_origin	origin;		/* Origin of the request */
 	bool			direct_bv_unpin; /* T if direct_bv[] must be unpinned */
-	u8			buffer_head_slot; /* First slot in ->buffer */
-	u8			buffer_tail_slot; /* Next slot in ->buffer_tail */
 	unsigned long long	i_size;		/* Size of the file */
 	unsigned long long	start;		/* Start position */
 	atomic64_t		issued_to;	/* Write issuer folio cursor */
 	unsigned long long	collected_to;	/* Point we've collected to */
 	unsigned long long	cleaned_to;	/* Position we've cleaned folios to */
+	unsigned long long	abandon_to;	/* Position to abandon folios to */
 	pgoff_t			no_unlock_folio; /* Don't unlock this folio after read */
-	size_t			prev_donated;	/* Fallback for subreq->prev_donated */
+	unsigned char		front_folio_order; /* Order (size) of front folio */
 	refcount_t		ref;
 	unsigned long		flags;
+#define NETFS_RREQ_OFFLOAD_COLLECTION	0	/* Offload collection to workqueue */
 #define NETFS_RREQ_NO_UNLOCK_FOLIO	2	/* Don't unlock no_unlock_folio on completion */
 #define NETFS_RREQ_DONT_UNLOCK_FOLIOS	3	/* Don't unlock the folios on completion */
 #define NETFS_RREQ_FAILED		4	/* The request failed */
 #define NETFS_RREQ_IN_PROGRESS		5	/* Unlocked when the request completes */
+#define NETFS_RREQ_FOLIO_COPY_TO_CACHE	6	/* Copy current folio to cache from read */
 #define NETFS_RREQ_UPLOAD_TO_SERVER	8	/* Need to write to the server */
 #define NETFS_RREQ_NONBLOCK		9	/* Don't block if possible (O_NONBLOCK) */
 #define NETFS_RREQ_BLOCKED		10	/* We blocked */
@@ -409,6 +408,13 @@ ssize_t netfs_unbuffered_write_iter_locked(struct kiocb *iocb, struct iov_iter *
 					   struct netfs_group *netfs_group);
 ssize_t netfs_file_write_iter(struct kiocb *iocb, struct iov_iter *from);
 
+/* Single, monolithic object read/write API. */
+void netfs_single_mark_inode_dirty(struct inode *inode);
+ssize_t netfs_read_single(struct inode *inode, struct file *file, struct iov_iter *iter);
+int netfs_writeback_single(struct address_space *mapping,
+			   struct writeback_control *wbc,
+			   struct iov_iter *iter);
+
 /* Address operations API */
 struct readahead_control;
 void netfs_readahead(struct readahead_control *);
@@ -428,10 +434,8 @@ bool netfs_release_folio(struct folio *folio, gfp_t gfp);
 vm_fault_t netfs_page_mkwrite(struct vm_fault *vmf, struct netfs_group *netfs_group);
 
 /* (Sub)request management API. */
-void netfs_read_subreq_progress(struct netfs_io_subrequest *subreq,
-				bool was_async);
-void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq,
-				  int error, bool was_async);
+void netfs_read_subreq_progress(struct netfs_io_subrequest *subreq);
+void netfs_read_subreq_terminated(struct netfs_io_subrequest *subreq);
 void netfs_get_subrequest(struct netfs_io_subrequest *subreq,
 			  enum netfs_sreq_ref_trace what);
 void netfs_put_subrequest(struct netfs_io_subrequest *subreq,
@@ -453,6 +457,18 @@ void netfs_end_io_write(struct inode *inode);
 int netfs_start_io_direct(struct inode *inode);
 void netfs_end_io_direct(struct inode *inode);
 
+/* Miscellaneous APIs. */
+struct folio_queue *netfs_folioq_alloc(unsigned int rreq_id, gfp_t gfp,
+				       unsigned int trace /*enum netfs_folioq_trace*/);
+void netfs_folioq_free(struct folio_queue *folioq,
+		       unsigned int trace /*enum netfs_trace_folioq*/);
+
+/* Buffer wrangling helpers API. */
+int netfs_alloc_folioq_buffer(struct address_space *mapping,
+			      struct folio_queue **_buffer,
+			      size_t *_cur_size, ssize_t size, gfp_t gfp);
+void netfs_free_folioq_buffer(struct folio_queue *fq);
+
 /**
  * netfs_inode - Get the netfs inode context from the inode
  * @inode: The inode to query