summaryrefslogtreecommitdiff
path: root/fs/nfs/dir.c
AgeCommit message (Collapse)Author
2024-01-04nfs: add new tracepoint at nfs4 revalidate entry pointJeff Layton
Add a call to the v4 d_revalidate entrypoint, just like the v3 one. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-01-04NFS: Use parent's objective cred in nfs_access_login_time()Scott Mayhew
The subjective cred (task->cred) can potentially be overridden and subsquently freed in non-RCU context, which could lead to a panic if we try to use it in cred_fscmp(). Use __task_cred(), which returns the objective cred (task->real_cred) instead. Fixes: 0eb43812c027 ("NFS: Clear the file access cache upon login") Fixes: 5e9a7b9c2ea1 ("NFS: Fix up a sparse warning") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-11-01nfs: Convert nfs_symlink() to use a folioMatthew Wilcox (Oracle)
Use the folio APIs, saving about four calls to compound_head(). Convert back to a page in each of the individual protocol implementations. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-08-24nfs: fix redundant readdir request after get eofKinglong Mee
When a directory contains 17 files (except . and ..), nfs client sends a redundant readdir request after get eof. A simple reproduce, At NFS server, create a directory with 17 files under exported directory. # mkdir test # cd test # for i in {0..16} ; do touch $i; done At NFS client, no matter mounting through nfsv3 or nfsv4, does ls (or ll) at the created test directory. A tshark output likes following (for nfsv4), # tshark -i eth0 tcp port 2049 -Tfields -e ip.src -e ip.dst -e nfs -e nfs.cookie4 srcip dstip SEQUENCE, PUTFH, READDIR 0 dstip srcip SEQUENCE PUTFH READDIR 909539109313539306,2108391201987888856,2305312124304486544,2566335452463141496,2978225129081509984,4263037479923412583,4304697173036510679,4666703455469210097,4759208201298769007,4776701232145978803,5338408478512081262,5949498658935544804,5971526429894832903,6294060338267709855,6528840566229532529,8600463293536422524,9223372036854775807 srcip dstip srcip dstip SEQUENCE, PUTFH, READDIR 9223372036854775807 dstip srcip SEQUENCE PUTFH READDIR The READDIR with cookie 9223372036854775807(0x7FFFFFFFFFFFFFFF) is redundant. Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-19NFS: Convert kmap_atomic() to kmap_local_folio()Fabio M. De Francesco
kmap_atomic() is deprecated in favor of kmap_local_{folio,page}(). Therefore, replace kmap_atomic() with kmap_local_folio() in nfs_readdir_folio_array_append(). kmap_atomic() disables page-faults and preemption (the latter only for !PREEMPT_RT kernels), However, the code within the mapping/un-mapping in nfs_readdir_folio_array_append() does not depend on the above-mentioned side effects. Therefore, a mere replacement of the old API with the new one is all that is required (i.e., there is no need to explicitly add any calls to pagefault_disable() and/or preempt_disable()). Tested with (x)fstests in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Fixes: ec108d3cc766 ("NFS: Convert readdir page array functions to use a folio") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-05-09nfs: fix another case of NULL/IS_ERR confusion wrt folio pointersLinus Torvalds
Dan has been improving on the smatch error pointer checks, and pointed at another case where the __filemap_get_folio() conversion to error pointers had been overlooked. This time because it was hidden behind the filemap_grab_folio() helper function that is a wrapper around it. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Anna Schumaker <anna@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-05-06nfs: fix mis-merged __filemap_get_folio() error checkLinus Torvalds
Fix another case of an incorrect check for the returned 'folio' value from __filemap_get_folio(). The failure case used to return NULL, but was changed by commit 66dabbb65d67 ("mm: return an ERR_PTR from __filemap_get_folio"). But in the meantime, commit ec108d3cc766 ("NFS: Convert readdir page array functions to use a folio") added a new user of that function. And my merge of the two did not fix this up correctly. The ext4 merge had the same issue, but that one had been caught in linux-next and got properly fixed while merging. Fixes: 0127f25b5dfc ("Merge tag 'nfs-for-6.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs") Cc: Anna Schumaker <anna@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-04-12NFS: Convert readdir page array functions to use a folioAnna Schumaker
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-04-12NFS: Convert the readdir array-of-pages into an array-of-foliosAnna Schumaker
This patch only converts the actual array, but doesn't touch the individual nfs_cache_array pages and related functions (that will be done in the next patch). I also adjust the names of the fields in the nfs_readdir_descriptor to say "folio" instead of "page". Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-14NFS: Correct timing for assigning access cache timestampChengen Du
When the user's login time is newer than the cache's timestamp, the original entry in the RB-tree will be replaced by a new entry. Currently, the timestamp is only set if the entry is not found in the RB-tree, which can cause the timestamp to be undefined when the entry exists. This may result in a significant increase in ACCESS operations if the timestamp is set to zero. Signed-off-by: Chengen Du <chengen.du@canonical.com> Fixes: 0eb43812c027 ("NFS: Clear the file access cache upon login”) Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-02-22Merge tag 'nfs-for-6.3-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "New Features: - Convert the read and write paths to use folios Bugfixes and Cleanups: - Fix tracepoint state manager flag printing - Fix disabling swap files - Fix NFSv4 client identifier sysfs path in the documentation - Don't clear NFS_CAP_COPY if server returns NFS4ERR_OFFLOAD_DENIED - Treat GETDEVICEINFO errors as a layout failure - Replace kmap_atomic() calls with kmap_local_page() - Constify sunrpc sysfs kobj_type structures" * tag 'nfs-for-6.3-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (25 commits) fs/nfs: Replace kmap_atomic() with kmap_local_page() in dir.c pNFS/filelayout: treat GETDEVICEINFO errors as layout failure Documentation: Fix sysfs path for the NFSv4 client identifier nfs42: do not fail with EIO if ssc returns NFS4ERR_OFFLOAD_DENIED NFS: fix disabling of swap SUNRPC: make kobj_type structures constant nfs4trace: fix state manager flag printing NFS: Remove unnecessary check in nfs_read_folio() NFS: Improve tracing of nfs_wb_folio() NFS: Enable tracing of nfs_invalidate_folio() and nfs_launder_folio() NFS: fix up nfs_release_folio() to try to release the page NFS: Clean up O_DIRECT request allocation NFS: Fix up nfs_vm_page_mkwrite() for folios NFS: Convert nfs_write_begin/end to use folios NFS: Remove unused function nfs_wb_page() NFS: Convert buffered writes to use folios NFS: Convert the function nfs_wb_page() to use folios NFS: Convert buffered reads to use folios NFS: Add a helper nfs_wb_folio() NFS: Convert the remaining pagelist helper functions to support folios ...
2023-02-20Merge tag 'fs.idmapped.v6.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs idmapping updates from Christian Brauner: - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastucture in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message this converts everything to rely on struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for non-vfs developers without detailed knowledge in this area this was a potential source for bugs. This finishes the conversion. Instead of passing the plain namespace around this updates all places that currently take a pointer to a mnt_userns with a pointer to struct mnt_idmap. Now that the conversion is done all helpers down to the really low-level helpers only accept a struct mnt_idmap argument instead of two namespace arguments. Conflating mount and other idmappings will now cause the compiler to complain loudly thus eliminating the possibility of any bugs. This makes it impossible for filesystem developers to mix up mount and filesystem idmappings as they are two distinct types and require distinct helpers that cannot be used interchangeably. Everything associated with struct mnt_idmap is moved into a single separate file. With that change no code can poke around in struct mnt_idmap. It can only be interacted with through dedicated helpers. That means all filesystems are and all of the vfs is completely oblivious to the actual implementation of idmappings. We are now also able to extend struct mnt_idmap as we see fit. For example, we can decouple it completely from namespaces for users that don't require or don't want to use them at all. We can also extend the concept of idmappings so we can cover filesystem specific requirements. In combination with the vfs{g,u}id_t work we finished in v6.2 this makes this feature substantially more robust and thus difficult to implement wrong by a given filesystem and also protects the vfs. - Enable idmapped mounts for tmpfs and fulfill a longstanding request. A long-standing request from users had been to make it possible to create idmapped mounts for tmpfs. For example, to share the host's tmpfs mount between multiple sandboxes. This is a prerequisite for some advanced Kubernetes cases. Systemd also has a range of use-cases to increase service isolation. And there are more users of this. However, with all of the other work going on this was way down on the priority list but luckily someone other than ourselves picked this up. As usual the patch is tiny as all the infrastructure work had been done multiple kernel releases ago. In addition to all the tests that we already have I requested that Rodrigo add a dedicated tmpfs testsuite for idmapped mounts to xfstests. It is to be included into xfstests during the v6.3 development cycle. This should add a slew of additional tests. * tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits) shmem: support idmapped mounts for tmpfs fs: move mnt_idmap fs: port vfs{g,u}id helpers to mnt_idmap fs: port fs{g,u}id helpers to mnt_idmap fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap fs: port i_{g,u}id_{needs_}update() to mnt_idmap quota: port to mnt_idmap fs: port privilege checking helpers to mnt_idmap fs: port inode_owner_or_capable() to mnt_idmap fs: port inode_init_owner() to mnt_idmap fs: port acl to mnt_idmap fs: port xattr to mnt_idmap fs: port ->permission() to pass mnt_idmap fs: port ->fileattr_set() to pass mnt_idmap fs: port ->set_acl() to pass mnt_idmap fs: port ->get_acl() to pass mnt_idmap fs: port ->tmpfile() to pass mnt_idmap fs: port ->rename() to pass mnt_idmap fs: port ->mknod() to pass mnt_idmap fs: port ->mkdir() to pass mnt_idmap ...
2023-02-15fs/nfs: Replace kmap_atomic() with kmap_local_page() in dir.cFabio M. De Francesco
kmap_atomic() is deprecated in favor of kmap_local_page(). With kmap_local_page() the mappings are per thread, CPU local, can take page-faults, and can be called from any context (including interrupts). Furthermore, the tasks can be preempted and, when they are scheduled to run again, the kernel virtual addresses are restored and still valid. kmap_atomic() is implemented like a kmap_local_page() which also disables page-faults and preemption (the latter only for !PREEMPT_RT kernels, otherwise it only disables migration). The code within the mappings/un-mappings in the functions of dir.c don't depend on the above-mentioned side effects of kmap_atomic(), so that mere replacements of the old API with the new one is all that is required (i.e., there is no need to explicitly add calls to pagefault_disable() and/or preempt_disable()). Therefore, replace kmap_atomic() with kmap_local_page() in fs/nfs/dir.c. Tested in a QEMU/KVM x86_32 VM, 6GB RAM, booting a kernel with HIGHMEM64GB enabled. Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-01-19fs: port ->permission() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->rename() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mknod() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->mkdir() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->symlink() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-19fs: port ->create() to pass mnt_idmapChristian Brauner
Convert to struct mnt_idmap. Last cycle we merged the necessary infrastructure in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). This is just the conversion to struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevent on the mount level. Especially for non-vfs developers without detailed knowledge in this area this can be a potential source for bugs. Once the conversion to struct mnt_idmap is done all helpers down to the really low-level helpers will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two eliminating the possibility of any bugs. All of the vfs and all filesystems only operate on struct mnt_idmap. Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-01NFS: Fix up a sparse warningTrond Myklebust
sparse is warning about an incorrect RCU dereference. fs/nfs/dir.c:2965:56: warning: incorrect type in argument 1 (different address spaces) fs/nfs/dir.c:2965:56: expected struct cred const * fs/nfs/dir.c:2965:56: got struct cred const [noderef] __rcu *const cred Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-01-01NFS: Judge the file access cache's timestamp in rcu pathChengen Du
If the user's login time is newer than the cache's timestamp, we expect the cache may be stale and need to clear. The stale cache will remain in the list's tail if no other users operate on that inode. Once the user accesses the inode, the stale cache will be returned in rcu path. Signed-off-by: Chengen Du <chengen.du@canonical.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-11-27NFS: Trigger the "ls -l" readdir heuristic soonerBenjamin Coddington
Since commit 1a34c8c9a49e ("NFS: Support larger readdir buffers") has updated dtsize, and with recent improvements to the READDIRPLUS helper heuristic, the heuristic may not trigger until many dentries are emitted to userspace. This will cause many thousands of GETATTR calls for "ls -l" when the directory's pagecache has already been populated. This manifests as poor performance for long directory listings after an initially fast "ls -l". Fix this by emitting only 17 entries for any first pass through the NFS directory's ->iterate_shared(), which allows userpace to prime the counters for the heuristic. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-11-27NFS: Clear the file access cache upon loginTrond Myklebust
POSIX typically only refreshes the user's supplementary group information upon login. Since NFS servers may often refresh their concept of the user supplementary group membership at their own cadence, it is possible for the NFS client's access cache to become stale due to the user's group membership changing on the server after the user has already logged in on the client. While it is reasonable to expect that such group membership changes are rare, and that we do not want to optimise the cache to accommodate them, it is also not unreasonable for the user to expect that if they log out and log back in again, that the staleness would clear up. Reviewed-by: Benjamin Coddington <bcodding@redhat.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-10-27nfs: Remove redundant null checks before kfreeYushan Zhou
Fix the following coccicheck warning: fs/nfs/dir.c:2494:2-7: WARNING: NULL check before some freeing functions is not needed. Signed-off-by: Yushan Zhou <katrinzhou@tencent.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-06Merge tag 'pull-file_inode' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull file_inode() updates from Al Vrio: "whack-a-mole: cropped up open-coded file_inode() uses..." * tag 'pull-file_inode' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: orangefs: use ->f_mapping _nfs42_proc_copy(): use ->f_mapping instead of file_inode()->i_mapping dma_buf: no need to bother with file_inode()->i_mapping nfs_finish_open(): don't open-code file_inode() bprm_fill_uid(): don't open-code file_inode() sgx: use ->f_mapping... exfat_iterate(): don't open-code file_inode(file) ibmvmc: don't open-code file_inode()
2022-09-01nfs_finish_open(): don't open-code file_inode()Al Viro
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-08-19NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENTNeilBrown
nfs_unlink() calls d_delete() twice if it receives ENOENT from the server - once in nfs_dentry_handle_enoent() from nfs_safe_remove and once in nfs_dentry_remove_handle_error(). nfs_rmddir() also calls it twice - the nfs_dentry_handle_enoent() call is direct and inside a region locked with ->rmdir_sem It is safe to call d_delete() twice if the refcount > 1 as the dentry is simply unhashed. If the refcount is 1, the first call sets d_inode to NULL and the second call crashes. This patch guards the d_delete() call from nfs_dentry_handle_enoent() leaving the one under ->remdir_sem in case that is important. In mainline it would be safe to remove the d_delete() call. However in older kernels to which this might be backported, that would change the behaviour of nfs_unlink(). nfs_unlink() used to unhash the dentry which resulted in nfs_dentry_handle_enoent() not calling d_delete(). So in older kernels we need the d_delete() in nfs_dentry_remove_handle_error() when called from nfs_unlink() but not when called from nfs_rmdir(). To make the code work correctly for old and new kernels, and from both nfs_unlink() and nfs_rmdir(), we protect the d_delete() call with simple_positive(). This ensures it is never called in a circumstance where it could crash. Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename") Fixes: 9019fb391de0 ("NFS: Label the dentry with a verifier in nfs_rmdir() and nfs_unlink()") Signed-off-by: NeilBrown <neilb@suse.de> Tested-by: Olga Kornievskaia <aglo@umich.edu> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-08-13NFS: Fix missing unlock in nfs_unlink()Sun Ke
Add the missing unlock before goto. Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename") Signed-off-by: Sun Ke <sunke32@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-08-08NFS: don't unhash dentry during unlink/renameNeilBrown
NFS unlink() (and rename over existing target) must determine if the file is open, and must perform a "silly rename" instead of an unlink (or before rename) if it is. Otherwise the client might hold a file open which has been removed on the server. Consequently if it determines that the file isn't open, it must block any subsequent opens until the unlink/rename has been completed on the server. This is currently achieved by unhashing the dentry. This forces any open attempt to the slow-path for lookup which will block on i_rwsem on the directory until the unlink/rename completes. A future patch will change the VFS to only get a shared lock on i_rwsem for unlink, so this will no longer work. Instead we introduce an explicit interlock. A special value is stored in dentry->d_fsdata while the unlink/rename is running and ->d_revalidate blocks while that value is present. When ->d_revalidate unblocks, the dentry will be invalid. This closes the race without requiring exclusion on i_rwsem. d_fsdata is already used in two different ways. 1/ an IS_ROOT directory dentry might have a "devname" stored in d_fsdata. Such a dentry doesn't have a name and so cannot be the target of unlink or rename. For safety we check if an old devname is still stored, and remove it if it is. 2/ a dentry with DCACHE_NFSFS_RENAMED set will have a 'struct nfs_unlinkdata' stored in d_fsdata. While this is set maydelete() will fail, so an unlink or rename will never proceed on such a dentry. Neither of these can be in effect when a dentry is the target of unlink or rename. So we can expect d_fsdata to be NULL, and store a special value ((void*)1) which is given the name NFS_FSDATA_BLOCKED to indicate that any lookup will be blocked. The d_count() is incremented under d_lock() when a lookup finds the dentry, so we check d_count() is low, and set NFS_FSDATA_BLOCKED under the same lock to avoid any races. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-12nfs: Replace kmap() with kmap_local_page()Fabio M. De Francesco
The use of kmap() is being deprecated in favor of kmap_local_page(). With kmap_local_page(), the mapping is per thread, CPU local and not globally visible. Furthermore, the mapping can be acquired from any context (including interrupts). Therefore, use kmap_local_page() in nfs_do_filldir() because this mapping is per thread, CPU local, and not globally visible. Suggested-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-10NFS: Fix case insensitive renamesTrond Myklebust
For filesystems that are case insensitive and case preserving, we need to be able to rename from one case folded variant of the filename to another. Currently, if we have looked up the target filename before the call to rename, then we may have a hashed dentry with that target name in the dcache, causing the vfs to optimise away the rename. To avoid that, let's drop the target dentry, and leave it to the server to optimise away the rename if that is the correct thing to do. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-06-15NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x fileDave Wysochanski
Commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag") added the FMODE_CAN_ODIRECT flag for NFSv3 but neglected to add it for NFSv4.x. This causes direct io on NFSv4.x to fail open with EINVAL: mount -o vers=4.2 127.0.0.1:/export /mnt/nfs4 dd if=/dev/zero of=/mnt/nfs4/file.bin bs=128k count=1 oflag=direct dd: failed to open '/mnt/nfs4/file.bin': Invalid argument dd of=/dev/null if=/mnt/nfs4/file.bin bs=128k count=1 iflag=direct dd: failed to open '/mnt/dir1/file1.bin': Invalid argument Fixes: a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag") Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-05-09nfs: Convert to free_folioMatthew Wilcox (Oracle)
Add a wrapper that converts back from the folio to the page. This entire file needs to be converted to use folios, but that's a task for a different set of patches. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-04-07NFS: Replace readdir's use of xxhash() with hash_64()Trond Myklebust
Both xxhash() and hash_64() appear to give similarly low collision rates with a standard linearly increasing readdir offset. They both give similarly higher collision rates when applied to ext4's offsets. So switch to using the standard hash_64(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-29NFSv4: fix open failure with O_ACCMODE flagChenXiaoSong
open() with O_ACCMODE|O_DIRECT flags secondly will fail. Reproducer: 1. mount -t nfs -o vers=4.2 $server_ip:/ /mnt/ 2. fd = open("/mnt/file", O_ACCMODE|O_DIRECT|O_CREAT) 3. close(fd) 4. fd = open("/mnt/file", O_ACCMODE|O_DIRECT) Server nfsd4_decode_share_access() will fail with error nfserr_bad_xdr when client use incorrect share access mode of 0. Fix this by using NFS4_SHARE_ACCESS_BOTH share access mode in client, just like firstly opening. Fixes: ce4ef7c0a8a05 ("NFS: Split out NFS v4 file operations") Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22NFS: Fix revalidation of empty readdir pagesTrond Myklebust
If the page is empty, we need to check the array->last_cookie instead of the first entry. Add a helper for the cases where we care. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-22NFS: Don't deadlock when cookie hashes collideTrond Myklebust
In the very rare case where the readdir reply contains multiple cookies that map to the same hash value, we can end up deadlocking waiting for a page lock that we already hold. In this case we should fail the page lock by using grab_cache_page_nowait(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Cache all entries in the readdirplus replyTrond Myklebust
Even if we're not able to cache all the entries in the readdir buffer, let's ensure that we do prime the dcache. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Optimise away the previous cookie fieldTrond Myklebust
Replace the 'previous cookie' field in struct nfs_entry with the array->last_cookie. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Fix up forced readdirplusTrond Myklebust
Avoid clearing the entire readdir page cache if we're just doing forced readdirplus for the 'ls -l' heuristic. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Convert readdir page cache to use a cookie based indexTrond Myklebust
Instead of using a linear index to address the pages, use the cookie of the first entry, since that is what we use to match the page anyway. This allows us to avoid re-reading the entire cache on a seekdir() type of operation. The latter is very common when re-exporting NFS, and is a major performance drain. The change does affect our duplicate cookie detection, since we can no longer rely on the page index as a linear offset for detecting whether we looped backwards. However since we no longer do a linear search through all the pages on each call to nfs_readdir(), this is less of a concern than it was previously. The other downside is that invalidate_mapping_pages() no longer can use the page index to avoid clearing pages that have been read. A subsequent patch will restore the functionality this provides to the 'ls -l' heuristic. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Clean up page array initialisation/freeTrond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Trace effects of the readdirplus heuristicTrond Myklebust
Enable tracking of when the readdirplus heuristic causes a page cache invalidation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Trace effects of readdirplus on the dcacheTrond Myklebust
Trace the effects of readdirplus on attribute and dentry revalidation. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Add basic readdir tracingTrond Myklebust
Add tracing to track how often the client goes to the server for updated readdir information. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Don't request readdirplus when revalidation was forcedTrond Myklebust
If the revalidation was forced, due to the presence of a LOOKUP_EXCL or a LOOKUP_REVAL flag, then readdirplus won't help. It also can't help when we're doing a path component lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Readdirplus can't help lookup for case insensitive filesystemsTrond Myklebust
If the filesystem is case insensitive, then readdirplus can't help with cache misses, since it won't return case folded variants of the filename. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Improve heuristic for readdirplusTrond Myklebust
The heuristic for readdirplus is designed to try to detect 'ls -l' and similar patterns. It does so by looking for cache hit/miss patterns in both the attribute cache and in the dcache of the files in a given directory, and then sets a flag for the readdirplus code to interpret. The problem with this approach is that a single attribute or dcache miss can cause the NFS code to force a refresh of the attributes for the entire set of files contained in the directory. To be able to make a more nuanced decision, let's sample the number of hits and misses in the set of open directory descriptors. That allows us to set thresholds at which we start preferring READDIRPLUS over regular READDIR, or at which we start to force a re-read of the remaining readdir cache using READDIRPLUS. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Reduce use of uncached readdirTrond Myklebust
When reading a very large directory, we want to try to keep the page cache up to date if doing so is inexpensive. With the change to allow readdir to continue reading even when the cache is incomplete, we no longer need to fall back to uncached readdir in order to scale to large directories. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-02NFS: Simplify nfs_readdir_xdr_to_array()Trond Myklebust
Recent changes to readdir mean that we can cope with partially filled page cache entries, so we no longer need to rely on looping in nfs_readdir_xdr_to_array(). Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>