summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-11-28xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delayChristoph Hellwig
xfs_bmap_add_extent_hole_delay works entirely on delalloc extents, for which xfs_bmap_same_rtgroup doesn't make sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-11-28xfs: Use xchg() in xlog_cil_insert_pcp_aggregate()Uros Bizjak
try_cmpxchg() loop with constant "new" value can be substituted with just xchg() to atomically get and clear the location. The code on x86_64 improves from: 1e7f: 48 89 4c 24 10 mov %rcx,0x10(%rsp) 1e84: 48 03 14 c5 00 00 00 add 0x0(,%rax,8),%rdx 1e8b: 00 1e88: R_X86_64_32S __per_cpu_offset 1e8c: 8b 02 mov (%rdx),%eax 1e8e: 41 89 c5 mov %eax,%r13d 1e91: 31 c9 xor %ecx,%ecx 1e93: f0 0f b1 0a lock cmpxchg %ecx,(%rdx) 1e97: 75 f5 jne 1e8e <xlog_cil_commit+0x84e> 1e99: 48 8b 4c 24 10 mov 0x10(%rsp),%rcx 1e9e: 45 01 e9 add %r13d,%r9d to just: 1e7f: 48 03 14 cd 00 00 00 add 0x0(,%rcx,8),%rdx 1e86: 00 1e83: R_X86_64_32S __per_cpu_offset 1e87: 31 c9 xor %ecx,%ecx 1e89: 87 0a xchg %ecx,(%rdx) 1e8b: 41 01 cb add %ecx,%r11d No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <dchinner@redhat.com> Reviewed-by: Alex Elder <elder@riscstar.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-11-27Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "A small number of improvements all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_vdpa: remove redundant check on desc virtio_fs: store actual queue index in mq_map virtio_fs: add informative log for new tag discovery virtio: Make vring_new_virtqueue support packed vring virtio_pmem: Add freeze/restore callbacks vdpa/mlx5: Fix suboptimal range on iotlb iteration
2024-11-27Merge tag 'vfs-6.13-rc1.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix a few iomap bugs - Fix a wrong argument in backing file callback - Fix security mount option retrieval in statmount() - Cleanup how statmount() handles unescaped options - Add a missing inode_owner_or_capable() check for setting write hints - Clear the return value in read_kcore_iter() after a successful iov_iter_zero() - Fix a mount_setattr() selftest - Fix function signature in mount api documentation - Remove duplicate include header in the fscache code * tag 'vfs-6.13-rc1.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs/backing_file: fix wrong argument in callback fs_parser: update mount_api doc to match function signature fs: require inode_owner_or_capable for F_SET_RW_HINT fs/proc/kcore.c: Clear ret value in read_kcore_iter after successful iov_iter_zero statmount: fix security option retrieval statmount: clean up unescaped option handling fscache: Remove duplicate included header iomap: elide flush from partial eof zero range iomap: lift zeroed mapping handling into iomap_zero_range() iomap: reset per-iter state on non-error iter advances iomap: warn on zero range of a post-eof folio selftests/mount_setattr: Fix failures on 64K PAGE_SIZE kernels
2024-11-27Merge tag 'vfs-6.13.exec.deny_write_access.revert' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull deny_write_access revert from Christian Brauner: "It turns out that the mold linker relies on the deny_write_access() mechanism for executables. The mold linker tries to open a file for writing and if ETXTBSY is returned mold falls back to creating a new file" * tag 'vfs-6.13.exec.deny_write_access.revert' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: Revert "fs: don't block i_writecount during exec"
2024-11-27Revert "fs: don't block i_writecount during exec"Christian Brauner
This reverts commit 2a010c41285345da60cece35575b4e0af7e7bf44. Rui Ueyama <rui314@gmail.com> writes: > I'm the creator and the maintainer of the mold linker > (https://github.com/rui314/mold). Recently, we discovered that mold > started causing process crashes in certain situations due to a change > in the Linux kernel. Here are the details: > > - In general, overwriting an existing file is much faster than > creating an empty file and writing to it on Linux, so mold attempts to > reuse an existing executable file if it exists. > > - If a program is running, opening the executable file for writing > previously failed with ETXTBSY. If that happens, mold falls back to > creating a new file. > > - However, the Linux kernel recently changed the behavior so that > writing to an executable file is now always permitted > (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a010c412853). > > That caused mold to write to an executable file even if there's a > process running that file. Since changes to mmap'ed files are > immediately visible to other processes, any processes running that > file would almost certainly crash in a very mysterious way. > Identifying the cause of these random crashes took us a few days. > > Rejecting writes to an executable file that is currently running is a > well-known behavior, and Linux had operated that way for a very long > time. So, I don’t believe relying on this behavior was our mistake; > rather, I see this as a regression in the Linux kernel. Quoting myself from commit 2a010c412853 ("fs: don't block i_writecount during exec") > Yes, someone in userspace could potentially be relying on this. It's not > completely out of the realm of possibility but let's find out if that's > actually the case and not guess. It seems we found out that someone is relying on this obscure behavior. So revert the change. Link: https://github.com/rui314/mold/issues/1361 Link: https://lore.kernel.org/r/4a2bc207-76be-4715-8e12-7fc45a76a125@leemhuis.info Cc: <stable@vger.kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-26smb: Initialize cfid->tcon before performing network opsPaul Aurich
Avoid leaking a tcon ref when a lease break races with opening the cached directory. Processing the leak break might take a reference to the tcon in cached_dir_lease_break() and then fail to release the ref in cached_dir_offload_close, since cfid->tcon is still NULL. Fixes: ebe98f1447bb ("cifs: enable caching of directories for which a lease is held") Signed-off-by: Paul Aurich <paul@darkrain42.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-26smb: During unmount, ensure all cached dir instances drop their dentryPaul Aurich
The unmount process (cifs_kill_sb() calling close_all_cached_dirs()) can race with various cached directory operations, which ultimately results in dentries not being dropped and these kernel BUGs: BUG: Dentry ffff88814f37e358{i=1000000000080,n=/} still in use (2) [unmount of cifs cifs] VFS: Busy inodes after unmount of cifs (cifs) ------------[ cut here ]------------ kernel BUG at fs/super.c:661! This happens when a cfid is in the process of being cleaned up when, and has been removed from the cfids->entries list, including: - Receiving a lease break from the server - Server reconnection triggers invalidate_all_cached_dirs(), which removes all the cfids from the list - The laundromat thread decides to expire an old cfid. To solve these problems, dropping the dentry is done in queued work done in a newly-added cfid_put_wq workqueue, and close_all_cached_dirs() flushes that workqueue after it drops all the dentries of which it's aware. This is a global workqueue (rather than scoped to a mount), but the queued work is minimal. The final cleanup work for cleaning up a cfid is performed via work queued in the serverclose_wq workqueue; this is done separate from dropping the dentries so that close_all_cached_dirs() doesn't block on any server operations. Both of these queued works expect to invoked with a cfid reference and a tcon reference to avoid those objects from being freed while the work is ongoing. While we're here, add proper locking to close_all_cached_dirs(), and locking around the freeing of cfid->dentry. Fixes: ebe98f1447bb ("cifs: enable caching of directories for which a lease is held") Cc: stable@vger.kernel.org Signed-off-by: Paul Aurich <paul@darkrain42.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-26smb: client: fix noisy message when mounting sharesPaulo Alcantara
When the client unconditionally attempts to get an DFS referral to check if share is DFS, some servers may return different errors that aren't handled in smb2_get_dfs_refer(), so the following will be logged in dmesg: CIFS: VFS: \\srv\IPC$ smb2_get_dfs_refer: ioctl error... which can confuse some users while mounting an SMB share. Fix this by logging such error with FYI. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-26smb: client: don't try following DFS links in cifs_tree_connect()Paulo Alcantara
We can't properly support chasing DFS links in cifs_tree_connect() because (1) We don't support creating new sessions while we're reconnecting, which would be required for DFS interlinks. (2) ->is_path_accessible() can't be called from cifs_tree_connect() as it would deadlock with smb2_reconnect(). This is required for checking if new DFS target is a nested DFS link. By unconditionally trying to get an DFS referral from new DFS target isn't correct because if the new DFS target (interlink) is an DFS standalone namespace, then we would end up getting -ELOOP and then potentially leaving tcon disconnected. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-26smb: client: allow reconnect when sending ioctlPaulo Alcantara
cifs_tree_connect() no longer uses ioctl, so allow sessions to be reconnected when sending ioctls. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-26smb: client: get rid of @nlsc param in cifs_tree_connect()Paulo Alcantara
We can access local_nls directly from @tcon->ses, so there is no need to pass it as parameter in cifs_tree_connect(). Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-26smb: client: allow more DFS referrals to be cachedPaulo Alcantara
In some DFS setups, a single DFS share may contain hundreds of DFS links and increasing the DFS cache to allow more referrals to be cached improves DFS failover as the client will likely find a cached DFS referral when reconnecting and then avoiding unnecessary remounts. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-26udf: Verify inode link counts before performing renameJan Kara
During rename, we are updating link counts of various inodes either when rename deletes target or when moving directory across directories. Verify involved link counts are sane so that we don't trip warnings in VFS. Reported-by: syzbot+3ff7365dc04a6bcafa66@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz>
2024-11-26udf: Skip parent dir link count update if corruptedJan Kara
If the parent directory link count is too low (likely directory inode corruption), just skip updating its link count as if it goes to 0 too early it can cause unexpected issues. Signed-off-by: Jan Kara <jack@suse.cz>
2024-11-26quota: flush quota_release_work upon quota writebackOjaswin Mujoo
One of the paths quota writeback is called from is: freeze_super() sync_filesystem() ext4_sync_fs() dquot_writeback_dquots() Since we currently don't always flush the quota_release_work queue in this path, we can end up with the following race: 1. dquot are added to releasing_dquots list during regular operations. 2. FS Freeze starts, however, this does not flush the quota_release_work queue. 3. Freeze completes. 4. Kernel eventually tries to flush the workqueue while FS is frozen which hits a WARN_ON since transaction gets started during frozen state: ext4_journal_check_start+0x28/0x110 [ext4] (unreliable) __ext4_journal_start_sb+0x64/0x1c0 [ext4] ext4_release_dquot+0x90/0x1d0 [ext4] quota_release_workfn+0x43c/0x4d0 Which is the following line: WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE); Which ultimately results in generic/390 failing due to dmesg noise. This was detected on powerpc machine 15 cores. To avoid this, make sure to flush the workqueue during dquot_writeback_dquots() so we dont have any pending workitems after freeze. Reported-by: Disha Goel <disgoel@linux.ibm.com> CC: stable@vger.kernel.org Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") Reviewed-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241121123855.645335-2-ojaswin@linux.ibm.com
2024-11-26Merge tag 'vfs-6.13.ecryptfs.mount.api' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull ecryptfs mount api conversion from Christian Brauner: "Convert ecryptfs to the new mount api" * tag 'vfs-6.13.ecryptfs.mount.api' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ecryptfs: Fix spelling mistake "validationg" -> "validating" ecryptfs: Convert ecryptfs to use the new mount API ecryptfs: Factor out mount option validation
2024-11-26Merge tag 'vfs-6.13.exportfs' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs exportfs updates from Christian Brauner: "This contains work to bring NFS connectable file handles to userspace servers. The name_to_handle_at() system call is extended to encode connectable file handles. Such file handles can be resolved to an open file with a connected path. So far userspace NFS servers couldn't make use of this functionality even though the kernel does already support it. This is achieved by introducing a new flag for name_to_handle_at(). Similarly, the open_by_handle_at() system call is tought to understand connectable file handles explicitly created via name_to_handle_at()" * tag 'vfs-6.13.exportfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: open_by_handle_at() support for decoding "explicit connectable" file handles fs: name_to_handle_at() support for "explicit connectable" file handles fs: prepare for "explicit connectable" file handles
2024-11-26Merge tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: "Jeff Layton contributed a scalability improvement to NFSD's NFSv4 backchannel session implementation. This improvement is intended to increase the rate at which NFSD can safely recall NFSv4 delegations from clients, to avoid the need to revoke them. Revoking requires a slow state recovery process. A wide variety of bug fixes and other incremental improvements make up the bulk of commits in this series. As always I am grateful to the NFSD contributors, reviewers, testers, and bug reporters who participated during this cycle" * tag 'nfsd-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (72 commits) nfsd: allow for up to 32 callback session slots nfs_common: must not hold RCU while calling nfsd_file_put_local nfsd: get rid of include ../internal.h nfsd: fix nfs4_openowner leak when concurrent nfsd4_open occur NFSD: Add nfsd4_copy time-to-live NFSD: Add a laundromat reaper for async copy state NFSD: Block DESTROY_CLIENTID only when there are ongoing async COPY operations NFSD: Handle an NFS4ERR_DELAY response to CB_OFFLOAD NFSD: Free async copy information in nfsd4_cb_offload_release() NFSD: Fix nfsd4_shutdown_copy() NFSD: Add a tracepoint to record canceled async COPY operations nfsd: make nfsd4_session->se_flags a bool nfsd: remove nfsd4_session->se_bchannel nfsd: make use of warning provided by refcount_t nfsd: Don't fail OP_SETCLIENTID when there are too many clients. svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init() xdrgen: Remove program_stat_to_errno() call sites xdrgen: Update the files included in client-side source code xdrgen: Remove check for "nfs_ok" in C templates xdrgen: Remove tracepoint call site ...
2024-11-26Merge tag 'f2fs-for-6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "This series introduces a device aliasing feature where user can carve out partitions but reclaim the space back by deleting aliased file in root dir. In addition to that, there're numerous minor bug fixes in zoned device support, checkpoint=disable, extent cache management, fiemap, and lazytime mount option. The full list of noticeable changes can be found below. Enhancements: - introduce device aliasing file - add stats in debugfs to show multiple devices - add a sysfs node to limit max read extent count per-inode - modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable - decrease spare area for pinned files for zoned devices Fixes: - Revert "f2fs: remove unreachable lazytime mount option parsing" - adjust unusable cap before checkpoint=disable mode - fix to drop all discards after creating snapshot on lvm device - fix to shrink read extent node in batches - fix changing cursegs if recovery fails on zoned device - fix to adjust appropriate length for fiemap - fix fiemap failure issue when page size is 16KB - fix to avoid forcing direct write to use buffered IO on inline_data inode - fix to map blocks correctly for direct write - fix to account dirty data in __get_secs_required() - fix null-ptr-deref in f2fs_submit_page_bio() - fix inconsistent update of i_blocks in release_compress_blocks and reserve_compress_blocks" * tag 'f2fs-for-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits) f2fs: fix to drop all discards after creating snapshot on lvm device f2fs: add a sysfs node to limit max read extent count per-inode f2fs: fix to shrink read extent node in batches f2fs: print message if fscorrupted was found in f2fs_new_node_page() f2fs: clear SBI_POR_DOING before initing inmem curseg f2fs: fix changing cursegs if recovery fails on zoned device f2fs: adjust unusable cap before checkpoint=disable mode f2fs: fix to requery extent which cross boundary of inquiry f2fs: fix to adjust appropriate length for fiemap f2fs: clean up w/ F2FS_{BLK_TO_BYTES,BTYES_TO_BLK} f2fs: fix to do cast in F2FS_{BLK_TO_BYTES, BTYES_TO_BLK} to avoid overflow f2fs: replace deprecated strcpy with strscpy Revert "f2fs: remove unreachable lazytime mount option parsing" f2fs: fix to avoid forcing direct write to use buffered IO on inline_data inode f2fs: fix to map blocks correctly for direct write f2fs: fix race in concurrent f2fs_stop_gc_thread f2fs: fix fiemap failure issue when page size is 16KB f2fs: remove redundant atomic file check in defragment f2fs: fix to convert log type to segment data type correctly f2fs: clean up the unused variable additional_reserved_segments ...
2024-11-26Merge tag 'fuse-update-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Add page -> folio conversions (Joanne Koong, Josef Bacik) - Allow max size of fuse requests to be configurable with a sysctl (Joanne Koong) - Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun) - Fix large kernel reads (like a module load) in virtio_fs (Hou Tao) - Fix attribute inconsistency in case readdirplus (and plain lookup in corner cases) is racing with inode eviction (Zhang Tianci) - Fix a WARN_ON triggered by virtio_fs (Asahi Lina) * tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits) virtiofs: dax: remove ->writepages() callback fuse: check attributes staleness on fuse_iget() fuse: remove pages for requests and exclusively use folios fuse: convert direct io to use folios mm/writeback: add folio_mark_dirty_lock() fuse: convert writebacks to use folios fuse: convert retrieves to use folios fuse: convert ioctls to use folios fuse: convert writes (non-writeback) to use folios fuse: convert reads to use folios fuse: convert readdir to use folios fuse: convert readlink to use folios fuse: convert cuse to use folios fuse: add support in virtio for requests using folios fuse: support folios in struct fuse_args_pages and fuse_copy_pages() fuse: convert fuse_notify_store to use folios fuse: convert fuse_retrieve to use folios fuse: use the folio based vmstat helpers fuse: convert fuse_writepage_need_send to take a folio fuse: convert fuse_do_readpage to use folios ...
2024-11-26Merge tag 'gfs2-for-6.13' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix the code that cleans up left-over unlinked files. Various fixes and minor improvements in deleting files cached or held open remotely. - Simplify the use of dlm's DLM_LKF_QUECVT flag. - A few other minor cleanups. * tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (21 commits) gfs2: Prevent inode creation race gfs2: Only defer deletes when we have an iopen glock gfs2: Simplify DLM_LKF_QUECVT use gfs2: gfs2_evict_inode clarification gfs2: Make gfs2_inode_refresh static gfs2: Use get_random_u32 in gfs2_orlov_skip gfs2: Randomize GLF_VERIFY_DELETE work delay gfs2: Use mod_delayed_work in gfs2_queue_try_to_evict gfs2: Update to the evict / remote delete documentation gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode gfs2: Clean up delete work processing gfs2: Minor delete_work_func cleanup gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glock gfs2: Rename dinode_demise to evict_behavior gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETE gfs2: Faster gfs2_upgrade_iopen_glock wakeups KMSAN: uninit-value in inode_go_dump (5) gfs2: Fix unlinked inode cleanup gfs2: Allow immediate GLF_VERIFY_DELETE work gfs2: Initialize gl_no_formal_ino earlier ...
2024-11-26Merge branch 'ovl.fixes'Christian Brauner
Bring in an overlayfs fix for v6.13-rc1 that fixes a bug introduced by the overlayfs changes merged for v6.13. Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-26fs/backing_file: fix wrong argument in callbackAmir Goldstein
Commit 48b50624aec4 ("backing-file: clean up the API") unintentionally changed the argument in the ->accessed() callback from the user file to the backing file. Fixes: 48b50624aec4 ("backing-file: clean up the API") Reported-by: syzbot+8d1206605b05ca9a0e6a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-unionfs/67447b3c.050a0220.1cc393.0085.GAE@google.com/ Tested-by: syzbot+8d1206605b05ca9a0e6a@syzkaller.appspotmail.com Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://lore.kernel.org/r/20241126145342.364869-1-amir73il@gmail.com Acked-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-25ksmbd: fix use-after-free in SMB request handlingYunseong Kim
A race condition exists between SMB request handling in `ksmbd_conn_handler_loop()` and the freeing of `ksmbd_conn` in the workqueue handler `handle_ksmbd_work()`. This leads to a UAF. - KASAN: slab-use-after-free Read in handle_ksmbd_work - KASAN: slab-use-after-free in rtlock_slowlock_locked This race condition arises as follows: - `ksmbd_conn_handler_loop()` waits for `conn->r_count` to reach zero: `wait_event(conn->r_count_q, atomic_read(&conn->r_count) == 0);` - Meanwhile, `handle_ksmbd_work()` decrements `conn->r_count` using `atomic_dec_return(&conn->r_count)`, and if it reaches zero, calls `ksmbd_conn_free()`, which frees `conn`. - However, after `handle_ksmbd_work()` decrements `conn->r_count`, it may still access `conn->r_count_q` in the following line: `waitqueue_active(&conn->r_count_q)` or `wake_up(&conn->r_count_q)` This results in a UAF, as `conn` has already been freed. The discovery of this UAF can be referenced in the following PR for syzkaller's support for SMB requests. Link: https://github.com/google/syzkaller/pull/5524 Fixes: ee426bfb9d09 ("ksmbd: add refcnt to ksmbd_conn struct") Cc: linux-cifs@vger.kernel.org Cc: stable@vger.kernel.org # v6.6.55+, v6.10.14+, v6.11.3+ Cc: syzkaller@googlegroups.com Signed-off-by: Yunseong Kim <yskelg@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25ksmbd: add debug print for pending request during server shutdownNamjae Jeon
We need to know how many pending requests are left at the end of server shutdown. That means we need to know how long the server will wait to process pending requests in case of a server shutdown. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25ksmbd: add netdev-up/down event debug printNamjae Jeon
Add netdev-up/down event debug print to find what netdev is connected or disconnected. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25ksmbd: add debug prints to know what smb2 requests were receivedNamjae Jeon
Add debug prints to know what smb2 requests were received. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25ksmbd: add debug print for rdma capableNamjae Jeon
Add debug print to know if netdevice is RDMA-capable network adapter. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25ksmbd: use msleep instaed of schedule_timeout_interruptible()Namjae Jeon
use msleep instaed of schedule_timeout_interruptible() to guarantee the task delays as expected. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25ksmbd: use __GFP_RETRY_MAYFAILNamjae Jeon
Prefer to report ENOMEM rather than incur the oom for allocations in ksmbd. __GFP_NORETRY could not achieve that, It would fail the allocations just too easily. __GFP_RETRY_MAYFAIL will keep retrying the allocation until there is no more progress and fail the allocation instead go OOM and let the caller to deal with it. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25Merge tag 'mm-nonmm-stable-2024-11-24-02-05' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - The series "resource: A couple of cleanups" from Andy Shevchenko performs some cleanups in the resource management code - The series "Improve the copy of task comm" from Yafang Shao addresses possible race-induced overflows in the management of task_struct.comm[] - The series "Remove unnecessary header includes from {tools/}lib/list_sort.c" from Kuan-Wei Chiu adds some cleanups and a small fix to the list_sort library code and to its selftest - The series "Enhance min heap API with non-inline functions and optimizations" also from Kuan-Wei Chiu optimizes and cleans up the min_heap library code - The series "nilfs2: Finish folio conversion" from Ryusuke Konishi finishes off nilfs2's folioification - The series "add detect count for hung tasks" from Lance Yang adds more userspace visibility into the hung-task detector's activity - Apart from that, singelton patches in many places - please see the individual changelogs for details * tag 'mm-nonmm-stable-2024-11-24-02-05' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (71 commits) gdb: lx-symbols: do not error out on monolithic build kernel/reboot: replace sprintf() with sysfs_emit() lib: util_macros_kunit: add kunit test for util_macros.h util_macros.h: fix/rework find_closest() macros Improve consistency of '#error' directive messages ocfs2: fix uninitialized value in ocfs2_file_read_iter() hung_task: add docs for hung_task_detect_count hung_task: add detect count for hung tasks dma-buf: use atomic64_inc_return() in dma_buf_getfile() fs/proc/kcore.c: fix coccinelle reported ERROR instances resource: avoid unnecessary resource tree walking in __region_intersects() ocfs2: remove unused errmsg function and table ocfs2: cluster: fix a typo lib/scatterlist: use sg_phys() helper checkpatch: always parse orig_commit in fixes tag nilfs2: convert metadata aops from writepage to writepages nilfs2: convert nilfs_recovery_copy_block() to take a folio nilfs2: convert nilfs_page_count_clean_buffers() to take a folio nilfs2: remove nilfs_writepage nilfs2: convert checkpoint file to be folio-based ...
2024-11-25cifs: Fix parsing reparse point with native symlink in SMB1 non-UNICODE sessionPali Rohár
SMB1 NT_TRANSACT_IOCTL/FSCTL_GET_REPARSE_POINT even in non-UNICODE mode returns reparse buffer in UNICODE/UTF-16 format. This is because FSCTL_GET_REPARSE_POINT is NT-based IOCTL which does not distinguish between 8-bit non-UNICODE and 16-bit UNICODE modes and its path buffers are always encoded in UTF-16. This change fixes reading of native symlinks in SMB1 when UNICODE session is not active. Fixes: ed3e0a149b58 ("smb: client: implement ->query_reparse_point() for SMB1") Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25cifs: Validate content of WSL reparse point buffersPali Rohár
WSL socket, fifo, char and block devices have empty reparse buffer. Validate the length of the reparse buffer. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25cifs: Improve guard for excluding $LXDEV xattrPali Rohár
$LXDEV xattr is for storing block/char device's major and minor number. Change guard which excludes storing $LXDEV xattr to explicitly filter everything except block and char device. Current guard is opposite, which is currently correct but is less-safe. This change is required for adding support for creating WSL-style symlinks as symlinks also do not use device's major and minor numbers. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25cifs: Add support for parsing WSL-style symlinksPali Rohár
Linux CIFS client currently does not implement readlink() for WSL-style symlinks. It is only able to detect that file is of WSL-style symlink, but is not able to read target symlink location. Add this missing functionality and implement support for parsing content of WSL-style symlink. The important note is that symlink target location stored for WSL symlink reparse point (IO_REPARSE_TAG_LX_SYMLINK) is in UTF-8 encoding instead of UTF-16 (which is used in whole SMB protocol and also in all other symlink styles). So for proper locale/cp support it is needed to do conversion from UTF-8 to local_nls. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25cifs: Validate content of native symlinkPali Rohár
Check that path buffer has correct length (it is non-zero and in UNICODE mode it has even number of bytes) and check that buffer does not contain null character (UTF-16 null codepoint in UNICODE mode or null byte in non-unicode mode) because Linux cannot process symlink with null byte. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25cifs: Fix parsing native symlinks relative to the exportPali Rohár
SMB symlink which has SYMLINK_FLAG_RELATIVE set is relative (as opposite of the absolute) and it can be relative either to the current directory (where is the symlink stored) or relative to the top level export path. To what it is relative depends on the first character of the symlink target path. If the first character is path separator then symlink is relative to the export, otherwise to the current directory. Linux (and generally POSIX systems) supports only symlink paths relative to the current directory where is symlink stored. Currently if Linux SMB client reads relative SMB symlink with first character as path separator (slash), it let as is. Which means that Linux interpret it as absolute symlink pointing from the root (/). But this location is different than the top level directory of SMB export (unless SMB export was mounted to the root) and thefore SMB symlinks relative to the export are interpreted wrongly by Linux SMB client. Fix this problem. As Linux does not have equivalent of the path relative to the top of the mount point, convert such symlink target path relative to the current directory. Do this by prepending "../" pattern N times before the SMB target path, where N is the number of path separators found in SMB symlink path. So for example, if SMB share is mounted to Linux path /mnt/share/, symlink is stored in file /mnt/share/test/folder1/symlink (so SMB symlink path is test\folder1\symlink) and SMB symlink target points to \test\folder2\file, then convert symlink target path to Linux path ../../test/folder2/file. Deduplicate code for parsing SMB symlinks in native form from functions smb2_parse_symlink_response() and parse_reparse_native_symlink() into new function smb2_parse_native_symlink() and pass into this new function a new full_path parameter from callers, which specify SMB full path where is symlink stored. This change fixes resolving of the native Windows symlinks relative to the top level directory of the SMB share. Signed-off-by: Pali Rohár <pali@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25smb: client: fix NULL ptr deref in crypto_aead_setkey()Paulo Alcantara
Neither SMB3.0 or SMB3.02 supports encryption negotiate context, so when SMB2_GLOBAL_CAP_ENCRYPTION flag is set in the negotiate response, the client uses AES-128-CCM as the default cipher. See MS-SMB2 3.3.5.4. Commit b0abcd65ec54 ("smb: client: fix UAF in async decryption") added a @server->cipher_type check to conditionally call smb3_crypto_aead_allocate(), but that check would always be false as @server->cipher_type is unset for SMB3.02. Fix the following KASAN splat by setting @server->cipher_type for SMB3.02 as well. mount.cifs //srv/share /mnt -o vers=3.02,seal,... BUG: KASAN: null-ptr-deref in crypto_aead_setkey+0x2c/0x130 Read of size 8 at addr 0000000000000020 by task mount.cifs/1095 CPU: 1 UID: 0 PID: 1095 Comm: mount.cifs Not tainted 6.12.0 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 ? crypto_aead_setkey+0x2c/0x130 kasan_report+0xda/0x110 ? crypto_aead_setkey+0x2c/0x130 crypto_aead_setkey+0x2c/0x130 crypt_message+0x258/0xec0 [cifs] ? __asan_memset+0x23/0x50 ? __pfx_crypt_message+0x10/0x10 [cifs] ? mark_lock+0xb0/0x6a0 ? hlock_class+0x32/0xb0 ? mark_lock+0xb0/0x6a0 smb3_init_transform_rq+0x352/0x3f0 [cifs] ? lock_acquire.part.0+0xf4/0x2a0 smb_send_rqst+0x144/0x230 [cifs] ? __pfx_smb_send_rqst+0x10/0x10 [cifs] ? hlock_class+0x32/0xb0 ? smb2_setup_request+0x225/0x3a0 [cifs] ? __pfx_cifs_compound_last_callback+0x10/0x10 [cifs] compound_send_recv+0x59b/0x1140 [cifs] ? __pfx_compound_send_recv+0x10/0x10 [cifs] ? __create_object+0x5e/0x90 ? hlock_class+0x32/0xb0 ? do_raw_spin_unlock+0x9a/0xf0 cifs_send_recv+0x23/0x30 [cifs] SMB2_tcon+0x3ec/0xb30 [cifs] ? __pfx_SMB2_tcon+0x10/0x10 [cifs] ? lock_acquire.part.0+0xf4/0x2a0 ? __pfx_lock_release+0x10/0x10 ? do_raw_spin_trylock+0xc6/0x120 ? lock_acquire+0x3f/0x90 ? _get_xid+0x16/0xd0 [cifs] ? __pfx_SMB2_tcon+0x10/0x10 [cifs] ? cifs_get_smb_ses+0xcdd/0x10a0 [cifs] cifs_get_smb_ses+0xcdd/0x10a0 [cifs] ? __pfx_cifs_get_smb_ses+0x10/0x10 [cifs] ? cifs_get_tcp_session+0xaa0/0xca0 [cifs] cifs_mount_get_session+0x8a/0x210 [cifs] dfs_mount_share+0x1b0/0x11d0 [cifs] ? __pfx___lock_acquire+0x10/0x10 ? __pfx_dfs_mount_share+0x10/0x10 [cifs] ? lock_acquire.part.0+0xf4/0x2a0 ? find_held_lock+0x8a/0xa0 ? hlock_class+0x32/0xb0 ? lock_release+0x203/0x5d0 cifs_mount+0xb3/0x3d0 [cifs] ? do_raw_spin_trylock+0xc6/0x120 ? __pfx_cifs_mount+0x10/0x10 [cifs] ? lock_acquire+0x3f/0x90 ? find_nls+0x16/0xa0 ? smb3_update_mnt_flags+0x372/0x3b0 [cifs] cifs_smb3_do_mount+0x1e2/0xc80 [cifs] ? __pfx_vfs_parse_fs_string+0x10/0x10 ? __pfx_cifs_smb3_do_mount+0x10/0x10 [cifs] smb3_get_tree+0x1bf/0x330 [cifs] vfs_get_tree+0x4a/0x160 path_mount+0x3c1/0xfb0 ? kasan_quarantine_put+0xc7/0x1d0 ? __pfx_path_mount+0x10/0x10 ? kmem_cache_free+0x118/0x3e0 ? user_path_at+0x74/0xa0 __x64_sys_mount+0x1a6/0x1e0 ? __pfx___x64_sys_mount+0x10/0x10 ? mark_held_locks+0x1a/0x90 do_syscall_64+0xbb/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Cc: Tom Talpey <tom@talpey.com> Reported-by: Jianhong Yin <jiyin@redhat.com> Cc: stable@vger.kernel.org # v6.12 Fixes: b0abcd65ec54 ("smb: client: fix UAF in async decryption") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25Update misleading comment in cifs_chan_update_ifaceMarco Crivellari
Since commit 8da33fd11c05 ("cifs: avoid deadlocks while updating iface") cifs_chan_update_iface now takes the chan_lock itself, so update the comment accordingly. Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25smb: client: change return value in open_cached_dir_by_dentry() if !cfidsHenrique Carvalho
Change return value from -ENOENT to -EOPNOTSUPP to maintain consistency with the return value of open_cached_dir() for the same case. This change is safe as the only calling function does not differentiate between these return values. Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25smb: client: disable directory caching when dir_cache_timeout is zeroHenrique Carvalho
Setting dir_cache_timeout to zero should disable the caching of directory contents. Currently, even when dir_cache_timeout is zero, some caching related functions are still invoked, which is unintended behavior. Fix the issue by setting tcon->nohandlecache to true when dir_cache_timeout is zero, ensuring that directory handle caching is properly disabled. Fixes: 238b351d0935 ("smb3: allow controlling length of time directory entries are cached with dir leases") Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25smb: client: remove unnecessary checks in open_cached_dir()Henrique Carvalho
Checks inside open_cached_dir() can be removed because if dir caching is disabled then tcon->cfids is necessarily NULL. Therefore, all other checks are redundant. Signed-off-by: Henrique Carvalho <henrique.carvalho@suse.com> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.com> Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-11-25ceph: fix cred leak in ceph_mds_check_access()Max Kellermann
get_current_cred() increments the reference counter, but the put_cred() call was missing. Cc: stable@vger.kernel.org Fixes: 596afb0b8933 ("ceph: add ceph_mds_check_access() helper") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-11-25ceph: pass cred pointer to ceph_mds_auth_match()Max Kellermann
This eliminates a redundant get_current_cred() call, because ceph_mds_check_access() has already obtained this pointer. As a side effect, this also fixes a reference leak in ceph_mds_auth_match(): by omitting the get_current_cred() call, no additional cred reference is taken. Cc: stable@vger.kernel.org Fixes: 596afb0b8933 ("ceph: add ceph_mds_check_access() helper") Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2024-11-25fs: require inode_owner_or_capable for F_SET_RW_HINTChristoph Hellwig
F_SET_RW_HINT controls data placement in the file system and / or device and should not be available to everyone who can read a given file. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20241122122931.90408-2-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-25exfat: reduce FAT chain traversalYuezhang Mo
Before this commit, ->dir and ->entry of exfat_inode_info record the first cluster of the parent directory and the directory entry index starting from this cluster. The directory entry set will be gotten during write-back-inode/rmdir/ unlink/rename. If the clusters of the parent directory are not continuous, the FAT chain will be traversed from the first cluster of the parent directory to find the cluster where ->entry is located. After this commit, ->dir records the cluster where the first directory entry in the directory entry set is located, and ->entry records the directory entry index in the cluster, so that there is almost no need to access the FAT when getting the directory entry set. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: code cleanup for exfat_readdir()Yuezhang Mo
For the root directory and other directories, the clusters allocated to them can be obtained from exfat_inode_info, and there is no need to distinguish them. And there is no need to initialize atime/ctime/mtime/size in exfat_readdir(), because exfat_iterate() does not use them. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: remove argument 'p_dir' from exfat_add_entry()Yuezhang Mo
The output of argument 'p_dir' of exfat_add_entry() is not used in either exfat_mkdir() or exfat_create(), remove the argument. Code refinement, no functional changes. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2024-11-25exfat: move exfat_chain_set() out of __exfat_resolve_path()Yuezhang Mo
__exfat_resolve_path() mixes two functions. The first one is to resolve and check if the path is valid. The second one is to output the cluster assigned to the directory. The second one is only needed when need to traverse the directory entries, and calling exfat_chain_set() so early causes p_dir to be passed as an argument multiple times, increasing the complexity of the code. This commit moves the call to exfat_chain_set() before traversing directory entries. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>