path: root/fs
2022-01-07  vfs, fscache: Implement pinning of cache usage for writeback  (David Howells)
Cachefiles has a problem in that it needs to keep the backing file for a cookie open whilst there are local modifications pending that need to be written to it. However, we don't want to keep the file open indefinitely, as that causes EMFILE/ENFILE/ENOMEM problems. Reopening the cache file, however, is a problem if this is being done due to writeback triggered by exit(). Some filesystems will oops if we try to open a file in that context because they want to access current->fs or other resources that have already been dismantled. To get around this, I added the following: (1) An inode flag, I_PINNING_FSCACHE_WB, to be set on a network filesystem inode to indicate that we have a usage count on the cookie caching that inode. (2) A flag in struct writeback_control, unpinned_fscache_wb, that is set when __writeback_single_inode() clears the last dirty page from i_pages - at which point it clears I_PINNING_FSCACHE_WB and sets this flag. This has to be done here so that clearing I_PINNING_FSCACHE_WB can be done atomically with the check of PAGECACHE_TAG_DIRTY that clears I_DIRTY_PAGES. (3) A function, fscache_set_page_dirty(), which, if I_PINNING_FSCACHE_WB is not already set, sets it and calls fscache_use_cookie() to pin the cache resources. (4) A function, fscache_unpin_writeback(), to be called by ->write_inode() to unuse the cookie. (5) A function, fscache_clear_inode_writeback(), to be called when the inode is evicted, before clear_inode() is called. This cleans up any lingering I_PINNING_FSCACHE_WB. The network filesystem can then use these tools to make sure that fscache_write_to_cache() can write locally modified data to the cache as well as to the server. For the future, I'm working on write helpers for netfs lib that should allow this facility to be removed by keeping track of the dirty regions separately - but that's incomplete at the moment and is also going to be affected by folios, one way or another, since it deals with pages. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819615157.215744.17623791756928043114.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906917856.143852.8224898306177154573.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967124567.1823006.14188359004568060298.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021524705.640689.17824932021727663017.stgit@warthog.procyon.org.uk/ # v4
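For illustration only (not from the patch): a network filesystem might wire these helpers up roughly as below. The 'myfs' names, the myfs_inode_cookie() accessor and the exact argument order of the fscache_* helpers are assumptions; only the flag and function names above come from the commit message.

    static int myfs_set_page_dirty(struct page *page)
    {
            struct inode *inode = page->mapping->host;

            /* Sets I_PINNING_FSCACHE_WB and pins the cookie on first dirtying. */
            return fscache_set_page_dirty(page, myfs_inode_cookie(inode));
    }

    static int myfs_write_inode(struct inode *inode, struct writeback_control *wbc)
    {
            /* Drops the pin once writeback has cleared the last dirty page. */
            fscache_unpin_writeback(wbc, myfs_inode_cookie(inode));
            return 0;
    }

    static void myfs_evict_inode(struct inode *inode)
    {
            truncate_inode_pages_final(&inode->i_data);
            /* Cleans up any lingering I_PINNING_FSCACHE_WB before clear_inode(). */
            fscache_clear_inode_writeback(myfs_inode_cookie(inode), inode, NULL);
            clear_inode(inode);
    }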
2022-01-07  fscache: Implement higher-level write I/O interface  (David Howells)
Provide a higher-level function than fscache_write() to perform a write from an inode's pagecache to the cache, whilst fending off concurrent writes by means of the PG_fscache mark on a page: void fscache_write_to_cache(struct fscache_cookie *cookie, struct address_space *mapping, loff_t start, size_t len, loff_t i_size, netfs_io_terminated_t term_func, void *term_func_priv, bool caching); If caching is false, this function does nothing except call (*term_func)() if given. It assumes that, in such a case, PG_fscache will not have been set on the pages. Otherwise, if caching is true, this function requires the source pages to have had PG_fscache set on them before calling. start and len define the region of the file to be modified and i_size indicates the new file size. The source pages are extracted from the mapping. term_func and term_func_priv work as for fscache_write(). The PG_fscache marks will be cleared at the end of the operation, before term_func is called or the function otherwise returns. There is an additional helper function to clear the PG_fscache bits from a range of pages: void fscache_clear_page_bits(struct fscache_cookie *cookie, struct address_space *mapping, loff_t start, size_t len, bool caching); If caching is true, the pages to be managed are expected to be located on mapping in the range defined by start and len. If caching is false, it does nothing. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819614155.215744.5528123235123721230.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906916346.143852.15632773570362489926.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967123599.1823006.12946816026724657428.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021522672.640689.4381958316198807813.stgit@warthog.procyon.org.uk/ # v4
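As a usage sketch (not part of the commit message): a filesystem's writeback path might call this as below, once PG_fscache has been set on the pages in the region. The termination-callback signature and the 'myfs' names are assumptions.

    static void myfs_write_done(void *priv, ssize_t transferred_or_error,
                                bool was_async)
    {
            if (IS_ERR_VALUE(transferred_or_error))
                    pr_warn("myfs: write to cache failed: %zd\n",
                            transferred_or_error);
    }

    static void myfs_write_region_to_cache(struct inode *inode, loff_t start,
                                           size_t len, bool caching)
    {
            /* If caching, the pages in [start, start + len) already have
             * PG_fscache set; the marks are cleared when this completes. */
            fscache_write_to_cache(myfs_inode_cookie(inode), inode->i_mapping,
                                   start, len, i_size_read(inode),
                                   myfs_write_done, NULL, caching);
    }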
2022-01-07  netfs: Pass more information on how to deal with a hole in the cache  (David Howells)
Pass more information to the cache on how to deal with a hole if it encounters one when trying to read from the cache. Three options are provided: (1) NETFS_READ_HOLE_IGNORE. Read the hole along with the data, assuming it to be a punched-out extent by the backing filesystem. (2) NETFS_READ_HOLE_CLEAR. If there's a hole, erase the requested region of the cache and clear the read buffer. (3) NETFS_READ_HOLE_FAIL. Fail the read if a hole is detected. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819612321.215744.9738308885948264476.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906914460.143852.6284247083607910189.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967119923.1823006.15637375885194297582.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021519762.640689.16994364383313159319.stgit@warthog.procyon.org.uk/ # v4
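For illustration, a cache backend might act on the chosen policy roughly as follows. This is a hedged sketch: the enum type name and the mycache_* helper are assumptions, only the NETFS_READ_HOLE_* constants come from the commit message.

    static int mycache_handle_hole(struct mycache_object *object, loff_t start,
                                   size_t len, struct iov_iter *iter,
                                   enum netfs_read_from_hole read_hole)
    {
            switch (read_hole) {
            case NETFS_READ_HOLE_IGNORE:
                    /* Treat the hole as data: the punched-out extent reads back as zeroes. */
                    return 0;
            case NETFS_READ_HOLE_CLEAR:
                    mycache_erase_region(object, start, len);   /* hypothetical helper */
                    iov_iter_zero(len, iter);                   /* clear the read buffer */
                    return 0;
            case NETFS_READ_HOLE_FAIL:
            default:
                    return -ENODATA;                            /* fail the read */
            }
    }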
2022-01-07  fscache: Provide read/write stat counters for the cache  (David Howells)
Provide read/write stat counters for the cache backend to use. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819609532.215744.10821082637727410554.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906912598.143852.12960327989649429069.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967113830.1823006.3222957649202368162.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021517502.640689.6077928311710357342.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  fscache: Count data storage objects in a cache  (David Howells)
Count the data storage objects that are currently allocated in a cache. This is used to pin certain cache structures until cache withdrawal is complete. Three helpers are provided to manage and make use of the count: (1) void fscache_count_object(struct fscache_cache *cache); This should be called by the cache backend to note that an object has been allocated and attached to the cache. (2) void fscache_uncount_object(struct fscache_cache *cache); This should be called by the backend to note that an object has been destroyed. This sends a wakeup event that allows cache withdrawal to proceed if it was waiting for that object. (3) void fscache_wait_for_objects(struct fscache_cache *cache); This can be used by the backend to wait for all outstanding cache objects to be destroyed. Each cache's counter is displayed as part of /proc/fs/fscache/caches. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819608594.215744.1812706538117388252.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906911646.143852.168184059935530127.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967111846.1823006.9868154941573671255.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021516219.640689.4934796654308958158.stgit@warthog.procyon.org.uk/ # v4
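An illustrative backend-side sketch of the three helpers above; the mycache_* scaffolding is hypothetical, only the fscache_* calls come from the patch.

    struct mycache_object {
            struct list_head cache_link;    /* backend's own bookkeeping */
    };

    static struct mycache_object *mycache_alloc_object(struct fscache_cache *cache)
    {
            struct mycache_object *object = kzalloc(sizeof(*object), GFP_KERNEL);

            if (object)
                    fscache_count_object(cache);    /* new object attached to the cache */
            return object;
    }

    static void mycache_put_object(struct fscache_cache *cache,
                                   struct mycache_object *object)
    {
            kfree(object);
            fscache_uncount_object(cache);          /* may wake a pending withdrawal */
    }

    static void mycache_withdraw(struct fscache_cache *cache)
    {
            /* ... detach all cookies bound to this cache ... */
            fscache_wait_for_objects(cache);        /* block until the count hits zero */
    }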
2022-01-07  fscache: Provide a means to begin an operation  (David Howells)
Provide a function to begin a read operation: int fscache_begin_read_operation( struct netfs_cache_resources *cres, struct fscache_cookie *cookie) This is primarily intended to be called by network filesystems on behalf of netfslib, but may also be called to use the I/O access functions directly. It attaches the resources required by the cache to the cres struct from the supplied cookie. This holds access to the cache behind the cookie for the duration of the operation and forces cache withdrawal and cookie invalidation to perform synchronisation on the operation. cres->inval_counter is set from the cookie at this point so that it can be compared at the end of the operation. Note that this does not guarantee that the cache state is fully set up and able to perform I/O immediately; lookup and creation may still be in progress in the background. The operations intended to be called by the network filesystem, such as reading and writing, are expected to wait for the cookie to move to the correct state. This will, however, potentially sleep, waiting for a certain minimum state to be set or for operations such as invalidate to advance far enough that I/O can resume. Also provide a function for the cache to call to wait for the cache object to get to a state where it can be used for certain things: bool fscache_wait_for_operation(struct netfs_cache_resources *cres, enum fscache_want_stage stage); This looks at the cache resources provided by the begin function and waits for them to get to an appropriate stage. There's a choice of wanting just some parameters (FSCACHE_WANT_PARAM) or the ability to do I/O (FSCACHE_WANT_READ or FSCACHE_WANT_WRITE). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819603692.215744.146724961588817028.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906910672.143852.13856103384424986357.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967110245.1823006.2239170567540431836.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021513617.640689.16627329360866150606.stgit@warthog.procyon.org.uk/ # v4
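A minimal usage sketch, assuming a hypothetical 'myfs' cookie accessor and the companion fscache_end_operation() helper from the wider series (both assumptions, not quoted from this patch):

    static int myfs_begin_cache_read(struct inode *inode,
                                     struct netfs_cache_resources *cres)
    {
            int ret = fscache_begin_read_operation(cres, myfs_inode_cookie(inode));

            if (ret < 0)
                    return ret;             /* e.g. no cache is attached to the cookie */

            /* Lookup/creation may still be running; wait until reads can proceed. */
            if (!fscache_wait_for_operation(cres, FSCACHE_WANT_READ)) {
                    fscache_end_operation(cres);    /* assumed companion helper */
                    return -ENOBUFS;
            }
            return 0;
    }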
2022-01-07  fscache: Implement cookie invalidation  (David Howells)
Add a function to invalidate the cache behind a cookie: void fscache_invalidate(struct fscache_cookie *cookie, const void *aux_data, loff_t size, unsigned int flags) This causes any cached data for the specified cookie to be discarded. If the cookie is marked as being in use, a new cache object will be created if possible and future I/O will use that instead. In-flight I/O should be abandoned (writes) or reconsidered (reads). Each time it is called cookie->inval_counter is incremented and this can be used to detect invalidation at the end of an I/O operation. The coherency data attached to the cookie can be updated and the cookie size should be reset. One flag is available, FSCACHE_INVAL_DIO_WRITE, which should be used to indicate invalidation due to a DIO write on a file. This will temporarily disable caching for this cookie. Changes ======= ver #2: - Should only change to inval state if can get access to cache. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819602231.215744.11206598147269491575.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906909707.143852.18056070560477964891.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967107447.1823006.5945029409592119962.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021512640.640689.11418616313147754172.stgit@warthog.procyon.org.uk/ # v4
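A minimal usage sketch (the 'myfs' inode structure and its fields are hypothetical; the fscache_invalidate() signature is as given above):

    static void myfs_invalidate_cache(struct inode *inode, unsigned int flags)
    {
            struct myfs_inode *mi = myfs_i(inode);

            /* Discard cached data; coherency data and size are refreshed too. */
            fscache_invalidate(mi->cookie, &mi->coherency_data,
                               i_size_read(inode), flags);
    }

    /* e.g. myfs_invalidate_cache(inode, FSCACHE_INVAL_DIO_WRITE) before a DIO write. */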
2022-01-07  fscache: Implement cookie user counting and resource pinning  (David Howells)
Provide a pair of functions to count the number of users of a cookie (open files, writeback, invalidation, resizing, reads, writes), to obtain and pin resources for the cookie and to prevent culling whilst there are users. The first function marks a cookie as being in use: void fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify); The caller should indicate the cookie to use and whether or not the caller is in a context that may modify the cookie (e.g. a file open O_RDWR). If the cookie is not already resourced, fscache will ask the cache backend in the background to do whatever it needs to look up, create or otherwise obtain the resources necessary to access data. This is pinned to the cookie and may not be culled, though it may be withdrawn if the cache as a whole is withdrawn. The second function removes the in-use mark from a cookie and, optionally, updates the coherency data: void fscache_unuse_cookie(struct fscache_cookie *cookie, const void *aux_data, const loff_t *object_size); If non-NULL, the aux_data buffer and/or the object_size will be saved into the cookie and will be set on the backing store when the object is committed. If this removes the last usage on a cookie, the cookie is placed onto an LRU list from which it will be removed and closed after a couple of seconds if it doesn't get reused. This prevents resource overload in the cache - in particular it prevents it from holding too many files open. Changes ======= ver #2: - Fix fscache_unuse_cookie() to use atomic_dec_and_lock() to avoid a potential race if the cookie gets reused before the unuse completes. - Added missing transition to LRU_DISCARDING state. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819600612.215744.13678350304176542741.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906907567.143852.16979631199380722019.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967106467.1823006.6790864931048582667.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021511674.640689.10084988363699111860.stgit@warthog.procyon.org.uk/ # v4
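Sketched usage across an open/close cycle; the 'myfs' structure and field names are hypothetical, the two fscache calls use the signatures given above:

    static int myfs_open(struct inode *inode, struct file *file)
    {
            /* Pin cache resources; note whether this user may modify the data. */
            fscache_use_cookie(myfs_inode_cookie(inode),
                               file->f_mode & FMODE_WRITE);
            return generic_file_open(inode, file);
    }

    static int myfs_release(struct inode *inode, struct file *file)
    {
            loff_t i_size = i_size_read(inode);

            /* Update coherency data and object size as the last user goes away. */
            fscache_unuse_cookie(myfs_inode_cookie(inode),
                                 &myfs_i(inode)->coherency_data, &i_size);
            return 0;
    }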
2022-01-07  fscache: Implement simple cookie state machine  (David Howells)
Implement a very simple cookie state machine to handle lookup, invalidation, withdrawal, relinquishment and, to be added later, commit on LRU discard. Three cache methods are provided: ->lookup_cookie() to look up and, if necessary, create a data storage object; ->withdraw_cookie() to free the resources associated with that object and potentially delete it; and ->prepare_to_write(), to prepare for changes to the cached data to be made locally. Changes ======= ver #3: - Fix a race between LRU discard and relinquishment whereby the former would override the latter and thus the latter would never happen[1]. ver #2: - Don't hold n_accesses elevated whilst cache is bound to a cookie, but rather add a flag that prevents the state machine from being queued when n_accesses reaches 0. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/599331.1639410068@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/163819599657.215744.15799615296912341745.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906903925.143852.1805855338154353867.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967105456.1823006.14730395299835841776.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021510706.640689.7961423370243272583.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  fscache: Add a function for a cache backend to note an I/O error  (David Howells)
Add a function to the backend API to note an I/O error in a cache. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819598741.215744.891281275151382095.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906901316.143852.15225412215771586528.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967100721.1823006.16435671567428949398.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021508840.640689.11902836226570620424.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  fscache: Provide and use cache methods to lookup/create/free a volume  (David Howells)
Add cache methods to lookup, create and remove a volume. Looking up or creating the volume requires pinning the cache for access; freeing the volume requires pinning the volume for access. The ->acquire_volume() method is used to ask the cache backend to lookup and, if necessary, create a volume; the ->free_volume() method is used to free the resources for a volume. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819597821.215744.5225318658134989949.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906898645.143852.8537799955945956818.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967099771.1823006.1455197910571061835.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021507345.640689.4073511598838843040.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  fscache: Implement functions to add/remove a cache  (David Howells)
Implement functions to allow the cache backend to add or remove a cache: (1) Declare a cache to be live: int fscache_add_cache(struct fscache_cache *cache, const struct fscache_cache_ops *ops, void *cache_priv); Take a previously acquired cache cookie, set the operations table and private data and mark the cache open for access. (2) Withdraw a cache from service: void fscache_withdraw_cache(struct fscache_cache *cache); This marks the cache as withdrawn and thus prevents further cache-level and volume-level accesses. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819596022.215744.8799712491432238827.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906896599.143852.17049208999019262884.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967097870.1823006.3470041000971522030.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021505541.640689.1819714759326331054.stgit@warthog.procyon.org.uk/ # v4
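An illustrative bring-up/tear-down sequence for a backend. The 'mycache' names are hypothetical; fscache_acquire_cache()/fscache_relinquish_cache() come from the cache registration patch further down this log, and the ERR_PTR-style error handling is an assumption.

    static const struct fscache_cache_ops mycache_ops;     /* methods elided */

    static struct fscache_cache *mycache_start(void *cache_priv)
    {
            struct fscache_cache *cache = fscache_acquire_cache("mycache");
            int ret;

            if (IS_ERR(cache))
                    return cache;
            ret = fscache_add_cache(cache, &mycache_ops, cache_priv);
            if (ret < 0) {
                    fscache_relinquish_cache(cache);
                    return ERR_PTR(ret);
            }
            return cache;           /* now live: volumes and cookies may bind to it */
    }

    static void mycache_stop(struct fscache_cache *cache)
    {
            fscache_withdraw_cache(cache);  /* bars new cache- and volume-level access */
            /* ... withdraw cookies, wait for outstanding objects ... */
            fscache_relinquish_cache(cache);
    }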
2022-01-07  fscache: Implement cookie-level access helpers  (David Howells)
Add a number of helper functions to manage access to a cookie, pinning the cache object in place for the duration to prevent cache withdrawal from removing it: (1) void fscache_init_access_gate(struct fscache_cookie *cookie); This function initialises the access count when a cache binds to a cookie. An extra ref is taken on the access count to prevent wakeups while the cache is active. We're only interested in the wakeup when a cookie is being withdrawn and we're waiting for it to quiesce - at which point the counter will be decremented before the wait. The FSCACHE_COOKIE_NACC_ELEVATED flag is set on the cookie to keep track of the extra ref in order to handle a race between relinquishment and withdrawal both trying to drop the extra ref. (2) bool fscache_begin_cookie_access(struct fscache_cookie *cookie, enum fscache_access_trace why); This function attempts to begin access upon a cookie, pinning it in place if it's cached. If successful, it returns true and leaves the access count incremented. (3) void fscache_end_cookie_access(struct fscache_cookie *cookie, enum fscache_access_trace why); This function drops the access count obtained by (2), permitting object withdrawal to take place when it reaches zero. A tracepoint is provided to track changes to the access counter on a cookie. Changes ======= ver #2: - Don't hold n_accesses elevated whilst cache is bound to a cookie, but rather add a flag that prevents the state machine from being queued when n_accesses reaches 0. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819595085.215744.1706073049250505427.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906895313.143852.10141619544149102193.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967095980.1823006.1133648159424418877.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021503063.640689.8870918985269528670.stgit@warthog.procyon.org.uk/ # v4
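Sketched usage around a cache I/O submission; 'why' is one of the fscache_access_trace reasons used for tracing, and the surrounding function is hypothetical:

    static int myfs_do_cache_io(struct fscache_cookie *cookie,
                                enum fscache_access_trace why)
    {
            if (!fscache_begin_cookie_access(cookie, why))
                    return -ENOBUFS;        /* not cached: fall back to the server */

            /* ... issue the I/O; the cache object cannot be withdrawn under us ... */

            fscache_end_cookie_access(cookie, why);
            return 0;
    }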
2022-01-07  fscache: Implement volume-level access helpers  (David Howells)
Add a pair of helper functions to manage access to a volume, pinning the volume in place for the duration to prevent cache withdrawal from removing it: bool fscache_begin_volume_access(struct fscache_volume *volume, enum fscache_access_trace why); void fscache_end_volume_access(struct fscache_volume *volume, enum fscache_access_trace why); The way the access gate on the volume works/will work is: (1) If the cache tests as not live (state is not FSCACHE_CACHE_IS_ACTIVE), then we return false to indicate access was not permitted. (2) If the cache tests as live, then we increment the volume's n_accesses count and then recheck the cache liveness, ending the access if it ceased to be live. (3) When we end the access, we decrement the volume's n_accesses and wake up any waiters if it reaches 0. (4) Whilst the cache is caching, the volume's n_accesses is kept artificially incremented to prevent wakeups from happening. (5) When the cache is taken offline, the state is changed to prevent new accesses, the volume's n_accesses is decremented and we wait for it to become 0. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819594158.215744.8285859817391683254.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906894315.143852.5454793807544710479.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967095028.1823006.9173132503876627466.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021501546.640689.9631510472149608443.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  fscache: Implement cache-level access helpers  (David Howells)
Add a pair of functions to pin/unpin a cache that we're wanting to do a high-level access to (such as creating or removing a volume): bool fscache_begin_cache_access(struct fscache_cache *cache, enum fscache_access_trace why); void fscache_end_cache_access(struct fscache_cache *cache, enum fscache_access_trace why); The way the access gate works/will work is: (1) If the cache tests as not live (state is not FSCACHE_CACHE_IS_ACTIVE), then we return false to indicate access was not permitted. (2) If the cache tests as live, then we increment the n_accesses count and then recheck the liveness, ending the access if it ceased to be live. (3) When we end the access, we decrement n_accesses and wake up any waiters if it reaches 0. (4) Whilst the cache is caching, n_accesses is kept artificially incremented to prevent wakeups from happening. (5) When the cache is taken offline, the state is changed to prevent new accesses, n_accesses is decremented and we wait for n_accesses to become 0. Note that some of this is implemented in a later patch. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819593239.215744.7537428720603638088.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906893368.143852.14164004598465617981.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967093977.1823006.6967886507023056409.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021499995.640689.18286203753480287850.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  fscache: Implement cookie registration  (David Howells)
Add functions to the fscache API to allow data file cookies to be acquired and relinquished by the network filesystem. It is intended that the filesystem will create such cookies per-inode under a volume. To request a cookie, the filesystem should call: struct fscache_cookie * fscache_acquire_cookie(struct fscache_volume *volume, u8 advice, const void *index_key, size_t index_key_len, const void *aux_data, size_t aux_data_len, loff_t object_size) The filesystem must first have created a volume cookie, which is passed in here. If it passes in NULL then the function will just return a NULL cookie. A binary key should be passed in index_key and is of size index_key_len. This is saved in the cookie and is used to locate the associated data in the cache. A coherency data buffer of size aux_data_len will be allocated and initialised from the buffer pointed to by aux_data. This is used to validate cache objects when they're opened and is stored on disk with them when they're committed. The data is stored in the cookie and will be updateable by various functions in later patches. The object_size must also be given. This is also used to perform a coherency check and to size the backing storage appropriately. This function disallows a cookie from being acquired twice in parallel, though it will cause the second user to wait if the first is busy relinquishing its cookie. When a network filesystem has finished with a cookie, it should call: void fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire) If retire is true, any backing data will be discarded immediately. Changes ======= ver #3: - fscache_hash()'s size parameter is now in bytes. Use __le32 as the unit to round up to. - When comparing cookies, simply see if the attributes are the same rather than subtracting them to produce a strcmp-style return[1]. - Add a check to see if the cookie is still hashed at the point of freeing. ver #2: - Don't hold n_accesses elevated whilst cache is bound to a cookie, but rather add a flag that prevents the state machine from being queued when n_accesses reaches 0. - Remove the unused cookie pointer field from the fscache_acquire tracepoint. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/163819590658.215744.14934902514281054323.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906891983.143852.6219772337558577395.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967088507.1823006.12659006350221417165.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021498432.640689.12743483856927722772.stgit@warthog.procyon.org.uk/ # v4
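A sketch of per-inode cookie setup and teardown; the 'myfs' inode layout below is hypothetical, the two fscache calls use the signatures described above:

    struct myfs_inode {
            struct inode vfs_inode;
            struct fscache_cookie *cookie;
            u64 fid;                        /* unique index key for this file */
            u32 coherency_data;             /* e.g. a change/version counter */
    };
    #define myfs_i(inode) container_of(inode, struct myfs_inode, vfs_inode)

    static void myfs_cache_inode(struct inode *inode, struct fscache_volume *volume)
    {
            struct myfs_inode *mi = myfs_i(inode);

            mi->cookie = fscache_acquire_cookie(volume, 0 /* advice */,
                                                &mi->fid, sizeof(mi->fid),
                                                &mi->coherency_data,
                                                sizeof(mi->coherency_data),
                                                i_size_read(inode));
    }

    static void myfs_uncache_inode(struct inode *inode, bool retire)
    {
            fscache_relinquish_cookie(myfs_i(inode)->cookie, retire);
    }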
2022-01-07  fscache: Implement volume registration  (David Howells)
Add functions to the fscache API to allow volumes to be acquired and relinquished by the network filesystem. A volume is an index of data storage cache objects. A volume is represented by a volume cookie in the API. A filesystem would typically create a volume for a superblock and then create per-inode cookies within it. To request a volume, the filesystem calls: struct fscache_volume * fscache_acquire_volume(const char *volume_key, const char *cache_name, const void *coherency_data, size_t coherency_len) The volume_key is a printable string used to match the volume in the cache. It should not contain any '/' characters. For AFS, for example, this would be "afs,<cellname>,<volume_id>", e.g. "afs,example.com,523001". The cache_name can be NULL, but if not it should be a string indicating the name of the cache to use if there's more than one available. The coherency data, if given, is an arbitrarily-sized blob that's attached to the volume and is compared when the volume is looked up. If it doesn't match, the old volume is judged to be out of date and it and everything within it is discarded. Acquiring a volume twice concurrently is disallowed, though the function will wait if an old volume cookie is being relinquished. When a network filesystem has finished with a volume, it should return the volume cookie by calling: void fscache_relinquish_volume(struct fscache_volume *volume, const void *coherency_data, bool invalidate) If invalidate is true, the entire volume will be discarded; if false, the volume will be synced and the coherency data will be updated. Changes ======= ver #4: - Removed an extraneous param from kdoc on fscache_relinquish_volume()[3]. ver #3: - fscache_hash()'s size parameter is now in bytes. Use __le32 as the unit to round up to. - When comparing cookies, simply see if the attributes are the same rather than subtracting them to produce a strcmp-style return[2]. - Make the coherency data an arbitrary blob rather than a u64, but don't store it for the moment. ver #2: - Fix error check[1]. - Make fscache_acquire_volume() return errors, including EBUSY if a conflicting volume cookie already exists. No error is printed now - that's left to the netfs. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/20211203095608.GC2480@kili/ [1] Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com/ [2] Link: https://lore.kernel.org/r/20211220224646.30e8205c@canb.auug.org.au/ [3] Link: https://lore.kernel.org/r/163819588944.215744.1629085755564865996.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906890630.143852.13972180614535611154.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967086836.1823006.8191672796841981763.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021495816.640689.4403156093668590217.stgit@warthog.procyon.org.uk/ # v4
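A mount-time usage sketch; 'myfs' naming is hypothetical, and per the ver #2 note above the function can hand back an error pointer:

    struct myfs_sb_info { struct fscache_volume *volume; };

    static int myfs_cache_super(struct myfs_sb_info *sbi, const char *cell,
                                const char *volname)
    {
            struct fscache_volume *volume;
            char *key = kasprintf(GFP_KERNEL, "myfs,%s,%s", cell, volname);

            if (!key)
                    return -ENOMEM;
            volume = fscache_acquire_volume(key, NULL /* any cache */, NULL, 0);
            kfree(key);
            if (IS_ERR(volume))
                    return PTR_ERR(volume); /* e.g. -EBUSY: conflicting volume cookie */
            sbi->volume = volume;
            return 0;
    }

    /* At unmount: fscache_relinquish_volume(sbi->volume, NULL, false); */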
2022-01-07  fscache: Implement cache registration  (David Howells)
Implement a register of caches and provide functions to manage it. Two functions are provided for the cache backend to use: (1) Acquire a cache cookie: struct fscache_cache *fscache_acquire_cache(const char *name) This gets the cache cookie for a cache of the specified name and moves it to the preparation state. If a nameless cache cookie exists, that will be given this name and used. (2) Relinquish a cache cookie: void fscache_relinquish_cache(struct fscache_cache *cache); This relinquishes a cache cookie, cleans it and makes it available if it's still referenced by a network filesystem. Note that network filesystems don't deal with cache cookies directly, but rather go straight to the volume registration. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819587157.215744.13523139317322503286.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906889665.143852.10378009165231294456.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967085081.1823006.2218944206363626210.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021494847.640689.10109692261640524343.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  fscache: Implement a hash function  (David Howells)
Implement a function to generate hashes. It needs to be stable over time and endianness-independent as the hashes will appear on disk in future patches. It can assume that its input is a multiple of four bytes in size and alignment. This is borrowed from the VFS and simplified. le32_to_cpu() is added to make it endianness-independent. Changes ======= ver #3: - Read the data being hashed in an endianness-independent way[1]. - Change the size parameter to be in bytes rather than words. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/CAHk-=whtkzB446+hX0zdLsdcUJsJ=8_-0S1mE_R+YurThfUbLA@mail.gmail.com [1] Link: https://lore.kernel.org/r/163819586113.215744.1699465806130102367.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906888735.143852.10944614318596881429.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967082342.1823006.8915671045444488742.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021493624.640689.9990442668811178628.stgit@warthog.procyon.org.uk/ # v4
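Not the patch's implementation, but an illustration of the stated constraints (stable over time, endianness-independent, input a multiple of four bytes); the mixing constants are arbitrary:

    static unsigned int example_hash(unsigned int salt, const void *data, size_t len)
    {
            const __le32 *p = data;
            unsigned int x = salt, y = 0;

            for (; len >= sizeof(__le32); len -= sizeof(__le32)) {
                    unsigned int a = le32_to_cpu(*p++);     /* same value on BE and LE */

                    x = (x ^ a) * 0x61C88647;               /* illustrative mixing only */
                    y = (y << 5) - y + x;
            }
            return x ^ y;
    }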
2022-01-07  fscache: Introduce new driver  (David Howells)
Introduce basic skeleton of the new, rewritten fscache driver. Changes ======= ver #3: - Use remove_proc_subtree(), not remove_proc_entry() to remove a populated dir. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819584034.215744.4290533472390439030.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906887770.143852.3577888294989185666.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967080039.1823006.5702921801104057922.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021491014.640689.4292699878317589512.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  netfs: Pass a flag to ->prepare_write() to say if there's no alloc'd space  (David Howells)
Pass a flag to ->prepare_write() to indicate if there's definitely no space allocated in the cache yet (for instance if we've already checked as we were asked to do a read). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819583123.215744.12783808230464471417.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906886835.143852.6689886781122679769.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967079100.1823006.12889542712309574359.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021489334.640689.3131206613015409076.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  fscache: Remove the contents of the fscache driver, pending rewrite  (David Howells)
Remove the code that comprises the fscache driver as it's going to be substantially rewritten, with the majority of the code being erased in the rewrite. A small piece of linux/fscache.h is left as that is #included by a bunch of network filesystems. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819578724.215744.18210619052245724238.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906884814.143852.6727245089843862889.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967077097.1823006.1377665951499979089.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021485548.640689.13876080567388696162.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  cachefiles: Delete the cachefiles driver pending rewrite  (David Howells)
Delete the code from the cachefiles driver to make it easier to rewrite and resubmit in a logical manner. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819577641.215744.12718114397770666596.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906883770.143852.4149714614981373410.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967076066.1823006.7175712134577687753.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021483619.640689.7586546280515844702.stgit@warthog.procyon.org.uk/ # v4
2022-01-07  fscache, cachefiles: Disable configuration  (David Howells)
Disable fscache and cachefiles in Kconfig whilst it is rewritten. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/163819576672.215744.12444272479560406780.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/163906882835.143852.11073015983885872901.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/163967075113.1823006.277316290062782998.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/164021481179.640689.2004199594774033658.stgit@warthog.procyon.org.uk/ # v4
2021-12-07  netfs: fix parameter of cleanup()  (Jeffle Xu)
The order of these two parameters is just reversed. gcc didn't warn on that, probably because 'void *' can be converted from or to other pointer types without warning. Cc: stable@vger.kernel.org Fixes: 3d3c95046742 ("netfs: Provide readahead and readpage netfs helpers") Fixes: e1b1240c1ff5 ("netfs: Add write_begin helper") Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Link: https://lore.kernel.org/r/20211207031449.100510-1-jefflexu@linux.alibaba.com/ # v1
2021-12-07  netfs: Fix lockdep warning from taking sb_writers whilst holding mmap_lock  (David Howells)
Taking sb_writers whilst holding mmap_lock isn't allowed and will result in a lockdep warning like that below. The problem comes from cachefiles needing to take the sb_writers lock in order to do a write to the cache, but being asked to do this by netfslib called from readpage, readahead or write_begin[1]. Fix this by always offloading the write to the cache off to a worker thread. The main thread doesn't need to wait for it, so deadlock can be avoided. This can be tested by running the quick xfstests on something like afs or ceph with lockdep enabled. WARNING: possible circular locking dependency detected 5.15.0-rc1-build2+ #292 Not tainted ------------------------------------------------------ holetest/65517 is trying to acquire lock: ffff88810c81d730 (mapping.invalidate_lock#3){.+.+}-{3:3}, at: filemap_fault+0x276/0x7a5 but task is already holding lock: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&mm->mmap_lock#2){++++}-{3:3}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b __might_fault+0x87/0xb1 strncpy_from_user+0x25/0x18c removexattr+0x7c/0xe5 __do_sys_fremovexattr+0x73/0x96 do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (sb_writers#10){.+.+}-{0:0}: validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b cachefiles_write+0x2b3/0x4bb netfs_rreq_do_write_to_cache+0x3b5/0x432 netfs_readpage+0x2de/0x39d filemap_read_page+0x51/0x94 filemap_get_pages+0x26f/0x413 filemap_read+0x182/0x427 new_sync_read+0xf0/0x161 vfs_read+0x118/0x16e ksys_read+0xb8/0x12e do_syscall_64+0x67/0x7a entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (mapping.invalidate_lock#3){.+.+}-{3:3}: check_noncircular+0xe4/0x129 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b down_read+0x40/0x4a filemap_fault+0x276/0x7a5 __do_fault+0x96/0xbf do_fault+0x262/0x35a __handle_mm_fault+0x171/0x1b5 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c exc_page_fault+0x85/0xa5 asm_exc_page_fault+0x1e/0x30 other info that might help us debug this: Chain exists of: mapping.invalidate_lock#3 --> sb_writers#10 --> &mm->mmap_lock#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_lock#2); lock(sb_writers#10); lock(&mm->mmap_lock#2); lock(mapping.invalidate_lock#3); *** DEADLOCK *** 1 lock held by holetest/65517: #0: ffff8881595b53e8 (&mm->mmap_lock#2){++++}-{3:3}, at: do_user_addr_fault+0x28d/0x59c stack backtrace: CPU: 0 PID: 65517 Comm: holetest Not tainted 5.15.0-rc1-build2+ #292 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 Call Trace: dump_stack_lvl+0x45/0x59 check_noncircular+0xe4/0x129 ? print_circular_bug+0x207/0x207 ? validate_chain+0x461/0x4a8 ? add_chain_block+0x88/0xd9 ? hlist_add_head_rcu+0x49/0x53 check_prev_add+0x16b/0x3a4 validate_chain+0x3c4/0x4a8 ? check_prev_add+0x3a4/0x3a4 ? mark_lock+0xa5/0x1c6 __lock_acquire+0x89d/0x949 lock_acquire+0x2dc/0x34b ? filemap_fault+0x276/0x7a5 ? rcu_read_unlock+0x59/0x59 ? add_to_page_cache_lru+0x13c/0x13c ? lock_is_held_type+0x7b/0xd3 down_read+0x40/0x4a ? filemap_fault+0x276/0x7a5 filemap_fault+0x276/0x7a5 ? pagecache_get_page+0x2dd/0x2dd ? __lock_acquire+0x8bc/0x949 ? pte_offset_kernel.isra.0+0x6d/0xc3 __do_fault+0x96/0xbf ? do_fault+0x124/0x35a do_fault+0x262/0x35a ? handle_pte_fault+0x1c1/0x20d __handle_mm_fault+0x171/0x1b5 ? handle_pte_fault+0x20d/0x20d ? 
__lock_release+0x151/0x254 ? mark_held_locks+0x1f/0x78 ? rcu_read_unlock+0x3a/0x59 handle_mm_fault+0x12a/0x233 do_user_addr_fault+0x3d2/0x59c ? pgtable_bad+0x70/0x70 ? rcu_read_lock_bh_held+0xab/0xab exc_page_fault+0x85/0xa5 ? asm_exc_page_fault+0x8/0x30 asm_exc_page_fault+0x1e/0x30 RIP: 0033:0x40192f Code: ff 48 89 c3 48 8b 05 50 28 00 00 48 85 ed 7e 23 31 d2 4b 8d 0c 2f eb 0a 0f 1f 00 48 8b 05 39 28 00 00 48 0f af c2 48 83 c2 01 <48> 89 1c 01 48 39 d5 7f e8 8b 0d f2 27 00 00 31 c0 85 c9 74 0e 8b RSP: 002b:00007f9931867eb0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: 00007f9931868700 RCX: 00007f993206ac00 RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00007ffc13e06ee0 RBP: 0000000000000100 R08: 0000000000000000 R09: 00007f9931868700 R10: 00007f99318689d0 R11: 0000000000000202 R12: 00007ffc13e06ee0 R13: 0000000000000c00 R14: 00007ffc13e06e00 R15: 00007f993206a000 Fixes: 726218fdc22c ("netfs: Define an interface to talk to a cache") Signed-off-by: David Howells <dhowells@redhat.com> Tested-by: Jeff Layton <jlayton@kernel.org> cc: Jan Kara <jack@suse.cz> cc: linux-cachefs@redhat.com cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210922110420.GA21576@quack2.suse.cz/ [1] Link: https://lore.kernel.org/r/163887597541.1596626.2668163316598972956.stgit@warthog.procyon.org.uk/ # v1
2021-12-04  Merge tag 'xfs-5.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux  (Linus Torvalds)
Pull xfs fix from Darrick Wong: "Remove an unnecessary (and backwards) rename flags check that duplicates a VFS level check" * tag 'xfs-5.16-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove incorrect ASSERT in xfs_rename
2021-12-04  Merge tag '5.16-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6  (Linus Torvalds)
Pull cifs fixes from Steve French: "Three SMB3 multichannel/fscache fixes and a DFS fix. In testing multichannel reconnect scenarios recently various problems with the cifs.ko implementation of fscache were found (e.g. incorrect initialization of fscache cookies in some cases)" * tag '5.16-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: avoid use of dstaddr as key for fscache client cookie cifs: add server conn_id to fscache client cookie cifs: wait for tcon resource_id before getting fscache super cifs: fix missed refcounting of ipc tcon
2021-12-04  Merge tag 'io_uring-5.16-2021-12-03' of git://git.kernel.dk/linux-block  (Linus Torvalds)
Pull io_uring fix from Jens Axboe: "Just a single fix preventing repeated retries of task_work based io-wq thread creation, fixing a regression from when io-wq was made more (a bit too much) resilient against signals" * tag 'io_uring-5.16-2021-12-03' of git://git.kernel.dk/linux-block: io-wq: don't retry task_work creation failure on fatal conditions
2021-12-04  Merge tag 'gfs2-v5.16-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2  (Linus Torvalds)
Pull gfs2 fixes from Andreas Gruenbacher: - Since commit 486408d690e1 ("gfs2: Cancel remote delete work asynchronously"), inode create and lookup-by-number can overlap more easily and we can end up with temporary duplicate inodes. Fix the code to prevent that. - Fix a BUG demoting weak glock holders from a remote node. * tag 'gfs2-v5.16-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: gfs2_create_inode rework gfs2: gfs2_inode_lookup rework gfs2: gfs2_inode_lookup cleanup gfs2: Fix remote demote of weak glock holders
2021-12-03  cifs: avoid use of dstaddr as key for fscache client cookie  (Shyam Prasad N)
server->dstaddr can change when the DNS mapping for the server hostname changes. But conn_id is a u64 counter that is incremented each time a new TCP connection is set up. So use only that as a key. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-03  cifs: add server conn_id to fscache client cookie  (Shyam Prasad N)
The fscache client cookie uses the server address (and port) as the cookie key. This is a problem when nosharesock is used. Two different connections will use duplicate cookies. Avoid this by adding server->conn_id to the key, so that it's guaranteed that cookie will not be duplicated. Also, for secondary channels of a session, copy the fscache pointer from the primary channel. The primary channel is guaranteed not to go away as long as secondary channels are in use. Also addresses minor problem found by kernel test robot. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-03  cifs: wait for tcon resource_id before getting fscache super  (Shyam Prasad N)
The logic for initializing tcon->resource_id is done inside cifs_root_iget. fscache super cookie relies on this for aux data. So we need to push the fscache initialization to this later point during mount. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-03  cifs: fix missed refcounting of ipc tcon  (Paulo Alcantara)
Fix missed refcounting of IPC tcon used for getting domain-based DFS root referrals. We want to keep it alive as long as mount is active and can be refreshed. For standalone DFS root referrals it wouldn't be a problem as the client ends up having an IPC tcon for both mount and cache. Fixes: c88f7dcd6d64 ("cifs: support nested dfs links over reconnect") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2021-12-03  fget: check that the fd still exists after getting a ref to it  (Linus Torvalds)
Jann Horn points out that there is another possible race wrt Unix domain socket garbage collection, somewhat reminiscent of the one fixed in commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK"). See the extended comment about the garbage collection requirements added to unix_peek_fds() by that commit for details. The race comes from how we can locklessly look up a file descriptor just as it is in the process of being closed, and with the right artificial timing (Jann added a few strategic 'mdelay(500)' calls to do that), the Unix domain socket garbage collector could see the reference count decrement of the close() happen before fget() took its reference to the file and the file was attached onto a new file descriptor. This is all (intentionally) correct on the 'struct file *' side, with RCU lookups and lockless reference counting very much part of the design. Getting that reference count out of order isn't a problem per se. But the garbage collector can get confused by seeing this situation of having seen a file not having any remaining external references and then seeing it being attached to an fd. In commit cbcf01128d0a ("af_unix: fix garbage collect vs MSG_PEEK") the fix was to serialize the file descriptor install with the garbage collector by taking and releasing the unix_gc_lock. That's not really an option here, but since this all happens when we are in the process of looking up a file descriptor, we can instead simply just re-check that the file hasn't been closed in the meantime, and just re-do the lookup if we raced with a concurrent close() of the same file descriptor. Reported-and-tested-by: Jann Horn <jannh@google.com> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
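An illustrative shape of the fix (a sketch, not the exact kernel code): after taking a reference under RCU, re-check that the descriptor table still points at the same file, otherwise drop the reference and redo the lookup.

    static struct file *example_fget_rcu(struct files_struct *files, unsigned int fd)
    {
            struct file *file;

            rcu_read_lock();
            for (;;) {
                    file = files_lookup_fd_rcu(files, fd);
                    if (!file)
                            break;                  /* no such descriptor */
                    if (!get_file_rcu(file))
                            continue;               /* f_count hit zero; look up again */
                    if (files_lookup_fd_rcu(files, fd) == file)
                            break;                  /* still installed: safe to return */
                    fput(file);                     /* lost a race with close(); retry */
            }
            rcu_read_unlock();
            return file;
    }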
2021-12-03  io-wq: don't retry task_work creation failure on fatal conditions  (Jens Axboe)
We don't want to be retrying task_work creation failure if there's an actual signal pending for the parent task. If we do, then we can enter an infinite loop of perpetually retrying and each retry failing with -ERESTARTNOINTR because a signal is pending. Fixes: 3146cba99aa2 ("io-wq: make worker creation resilient against signals") Reported-by: Florian Fischer <florian.fl.fischer@fau.de> Link: https://lore.kernel.org/io-uring/20211202165606.mqryio4yzubl7ms5@pasture/ Tested-by: Florian Fischer <florian.fl.fischer@fau.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-12-02  gfs2: gfs2_create_inode rework  (Andreas Gruenbacher)
When gfs2_lookup_by_inum() calls gfs2_inode_lookup() for an uncached inode, gfs2_inode_lookup() will place a new tentative inode into the inode cache before verifying that there is a valid inode at the given address. This can race with gfs2_create_inode() which doesn't check for duplicate inodes. gfs2_create_inode() will try to assign the new inode to the corresponding inode glock, and glock_set_object() will complain that the glock is still in use by gfs2_inode_lookup's tentative inode. We noticed this bug after adding commit 486408d690e1 ("gfs2: Cancel remote delete work asynchronously") which allowed delete_work_func() to race with gfs2_create_inode(), but the same race exists for open-by-handle. Fix that by switching from insert_inode_hash() to insert_inode_locked4(), which does check for duplicate inodes. We know we've just managed to allocate the new inode, so an inode tentatively created by gfs2_inode_lookup() will eventually go away and insert_inode_locked4() will always succeed. In addition, don't flush the inode glock work anymore (this can now only make things worse) and clean up glock_{set,clear}_object for the inode glock somewhat. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-02  gfs2: gfs2_inode_lookup rework  (Andreas Gruenbacher)
Rework gfs2_inode_lookup() to only set up the new inode's glocks after verifying that the new inode is valid. There is no need for flushing the inode glock work queue anymore now, so remove that as well. While at it, get rid of the useless wrapper around iget5_locked() and its unnecessary is_bad_inode() check. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-02  gfs2: gfs2_inode_lookup cleanup  (Andreas Gruenbacher)
In gfs2_inode_lookup, once the inode has been looked up, we check if the inode generation (no_formal_ino) is the one we're looking for. If it isn't and the inode wasn't in the inode cache, we discard the newly looked up inode. This is unnecessary, complicates the code, and makes future changes to gfs2_inode_lookup harder, so change the code to retain newly looked up inodes instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-02  gfs2: Fix remote demote of weak glock holders  (Andreas Gruenbacher)
When we mock up a temporary holder in gfs2_glock_cb to demote weak holders in response to a remote locking conflict, we don't set the HIF_HOLDER flag. This causes function may_grant to BUG. Fix by setting the missing HIF_HOLDER flag in the mock glock holder. In addition, define the mock glock holder where it is used. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-12-01  xfs: remove incorrect ASSERT in xfs_rename  (Eric Sandeen)
This ASSERT in xfs_rename is a) incorrect, because (RENAME_WHITEOUT|RENAME_NOREPLACE) is a valid combination, and b) unnecessary, because actual invalid flag combinations are already handled at the vfs level in do_renameat2() before we get called. So, remove it. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-11-29  netfs: Adjust docs after foliation  (David Howells)
Adjust the netfslib docs in light of the foliation changes. Also un-kdoc-mark netfs_skip_folio_read() since it's internal and isn't part of the API. Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> cc: Matthew Wilcox <willy@infradead.org> cc: linux-cachefs@redhat.com cc: linux-mm@kvack.org Link: https://lore.kernel.org/r/163706992597.3179783.18360472879717076435.stgit@warthog.procyon.org.uk/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-27  Merge tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd  (Linus Torvalds)
Pull ksmbd fixes from Steve French: "Five ksmbd server fixes, four of them for stable: - memleak fix - fix for default data stream on filesystems that don't support xattr - error logging fix - session setup fix - minor doc cleanup" * tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: fix memleak in get_file_stream_info() ksmbd: contain default data stream even if xattr is empty ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec() docs: filesystem: cifs: ksmbd: Fix small layout issues ksmbd: Fix an error handling path in 'smb2_sess_setup()'
2021-11-27  fs: ntfs: Limit NTFS_RW to page sizes smaller than 64k  (Guenter Roeck)
NTFS_RW code allocates page size dependent arrays on the stack. This results in build failures if the page size is 64k or larger. fs/ntfs/aops.c: In function 'ntfs_write_mst_block': fs/ntfs/aops.c:1311:1: error: the frame size of 2240 bytes is larger than 2048 bytes Since commit f22969a66041 ("powerpc/64s: Default to 64K pages for 64 bit book3s") this affects ppc:allmodconfig builds, but other architectures supporting page sizes of 64k or larger are also affected. Increasing the maximum frame size for affected architectures just to silence this error does not really help. The frame size would have to be set to a really large value for 256k pages. Also, a large frame size could potentially result in stack overruns in this code and elsewhere and is therefore not desirable. Make NTFS_RW dependent on page sizes smaller than 64k instead. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Cc: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-27  Merge tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux  (Linus Torvalds)
Pull xfs fixes from Darrick Wong: "Fixes for a resource leak and a build robot complaint about totally dead code: - Fix buffer resource leak that could lead to livelock on corrupt fs. - Remove unused function xfs_inew_wait to shut up the build robots" * tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove xfs_inew_wait xfs: Fix the free logic of state in xfs_attr_node_hasname
2021-11-27  Merge tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux  (Linus Torvalds)
Pull iomap fixes from Darrick Wong: "A single iomap bug fix and a cleanup for 5.16-rc2. The bug fix changes how iomap deals with reading from an inline data region -- whereas the current code (incorrectly) lets the iomap read iter try for more bytes after reading the inline region (which zeroes the rest of the page!) and hopes the next iteration terminates, we surveyed the inlinedata implementations and realized that all inlinedata implementations also require that the inlinedata region end at EOF, so we can simply terminate the read. The second patch documents these assumptions in the code so that they're not subtle implications anymore, and cleans up some of the grosser parts of that function. Summary: - Fix an accounting problem where unaligned inline data reads can run off the end of the read iomap iterator. iomap has historically required that inline data mappings only exist at the end of a file, though this wasn't documented anywhere. - Document iomap_read_inline_data and change its return type to be appropriate for the information that it's actually returning" * tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: iomap_read_inline_data cleanup iomap: Fix inline extent handling in iomap_readpage
2021-11-27  Merge tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block  (Linus Torvalds)
Pull more io_uring fixes from Jens Axboe: "The locking fixup that was applied earlier this rc has both a deadlock and IRQ safety issue, let's get that ironed out before -rc3. This contains: - Link traversal locking fix (Pavel) - Cancelation fix (Pavel) - Relocate cond_resched() for huge buffer chain freeing, avoiding a softlockup warning (Ye) - Fix timespec validation (Ye)" * tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block: io_uring: Fix undefined-behaviour in io_issue_sqe io_uring: fix soft lockup when call __io_remove_buffers io_uring: fix link traversal locking io_uring: fail cancellation for EXITING tasks
2021-11-27  Merge tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs  (Linus Torvalds)
Pull NFS client fixes from Trond Myklebust: "Highlights include: Stable fixes: - NFSv42: Fix pagecache invalidation after COPY/CLONE Bugfixes: - NFSv42: Don't fail clone() just because the server failed to return post-op attributes - SUNRPC: use different lockdep keys for INET6 and LOCAL - NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION - SUNRPC: fix header include guard in trace header" * tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: use different lock keys for INET6 and LOCAL sunrpc: fix header include guard in trace header NFSv4.1: handle NFS4ERR_NOSPC by CREATE_SESSION NFSv42: Fix pagecache invalidation after COPY/CLONE NFS: Add a tracepoint to show the results of nfs_set_cache_invalid() NFSv42: Don't fail clone() unless the OP_CLONE operation failed
2021-11-27  Merge tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs  (Linus Torvalds)
Pull erofs fix from Gao Xiang: "Fix an ABBA deadlock introduced by XArray conversion" * tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix deadlock when shrink erofs slab
2021-11-27  io_uring: Fix undefined-behaviour in io_issue_sqe  (Ye Bin)
We got the following issue: ================================================================================ UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14 signed integer overflow: -4966321760114568020 * 1000000000 cannot be represented in type 'long long int' CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x170/0x1dc lib/dump_stack.c:118 ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161 handle_overflow+0x188/0x1dc lib/ubsan.c:192 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213 ktime_set include/linux/ktime.h:42 [inline] timespec64_to_ktime include/linux/ktime.h:78 [inline] io_timeout fs/io_uring.c:5153 [inline] io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599 __io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988 io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067 io_submit_sqe fs/io_uring.c:6137 [inline] io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331 __do_sys_io_uring_enter fs/io_uring.c:8170 [inline] __se_sys_io_uring_enter fs/io_uring.c:8129 [inline] __arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129 invoke_syscall arch/arm64/kernel/syscall.c:53 [inline] el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121 el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190 el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017 ================================================================================ ktime_set() only checks whether 'secs' is bigger than KTIME_SEC_MAX, so passing a negative value can lead to overflow. To address this issue, we must check whether 'sec' is negative. Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
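Illustrative only (not the exact patch): the shape of the validation is to reject a user-supplied timespec with negative fields before it is converted, since ktime_set() multiplies tv_sec by NSEC_PER_SEC.

    /* Fragment: 'sqe' is the submission queue entry being prepped. */
    struct timespec64 ts;

    if (get_timespec64(&ts, u64_to_user_ptr(READ_ONCE(sqe->addr))))
            return -EFAULT;
    if (ts.tv_sec < 0 || ts.tv_nsec < 0)
            return -EINVAL;         /* negative values overflow in timespec64_to_ktime() */
    /* only now convert: timespec64_to_ktime(ts) */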