summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-11-07posix-timers: Embed sigqueue in struct k_itimerThomas Gleixner
To cure the SIG_IGN handling for posix interval timers, the preallocated sigqueue needs to be embedded into struct k_itimer to prevent life time races of all sorts. Now that the prerequisites are in place, embed the sigqueue into struct k_itimer and fixup the relevant usage sites. Aside of preparing for proper SIG_IGN handling, this spares an extra allocation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20241105064213.719695194@linutronix.de
2024-11-07btrfs: fix the length of reserved qgroup to freeHaisu Wang
The dealloc flag may be cleared and the extent won't reach the disk in cow_file_range when errors path. The reserved qgroup space is freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range"). However, the length of untouched region to free needs to be adjusted with the correct remaining region size. Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range") CC: stable@vger.kernel.org # 6.11+ Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Haisu Wang <haisuwang@tencent.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-07btrfs: reinitialize delayed ref list after deleting it from the listFilipe Manana
At insert_delayed_ref() if we need to update the action of an existing ref to BTRFS_DROP_DELAYED_REF, we delete the ref from its ref head's ref_add_list using list_del(), which leaves the ref's add_list member not reinitialized, as list_del() sets the next and prev members of the list to LIST_POISON1 and LIST_POISON2, respectively. If later we end up calling drop_delayed_ref() against the ref, which can happen during merging or when destroying delayed refs due to a transaction abort, we can trigger a crash since at drop_delayed_ref() we call list_empty() against the ref's add_list, which returns false since the list was not reinitialized after the list_del() and as a consequence we call list_del() again at drop_delayed_ref(). This results in an invalid list access since the next and prev members are set to poison pointers, resulting in a splat if CONFIG_LIST_HARDENED and CONFIG_DEBUG_LIST are set or invalid poison pointer dereferences otherwise. So fix this by deleting from the list with list_del_init() instead. Fixes: 1d57ee941692 ("btrfs: improve delayed refs iterations") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-07btrfs: fix per-subvolume RO/RW flags with new mount APIQu Wenruo
[BUG] With util-linux 2.40.2, the 'mount' utility is already utilizing the new mount API. e.g: # strace mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test/ ... fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0 fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0 fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0 fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0 fsmount(3, FSMOUNT_CLOEXEC, 0) = 4 mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0 move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0 But this leads to a new problem, that per-subvolume RO/RW mount no longer works, if the initial mount is RO: # mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test # mount -o rw,subvol=subv2 /dev/test/scratch1 /mnt/scratch # mount | grep mnt /dev/mapper/test-scratch1 on /mnt/test type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=256,subvol=/subv1) /dev/mapper/test-scratch1 on /mnt/scratch type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=257,subvol=/subv2) # touch /mnt/scratch/foobar touch: cannot touch '/mnt/scratch/foobar': Read-only file system This is a common use cases on distros. [CAUSE] We have a workaround for remount to handle the RO->RW change, but if the mount is using the new mount API, we do not do that, and rely on the mount tool NOT to set the ro flag. But that's not how the mount tool is doing for the new API: fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0 fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0 fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0 <<<< Setting RO flag for super block fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0 fsmount(3, FSMOUNT_CLOEXEC, 0) = 4 mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0 move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0 This means we will set the super block RO at the first mount. Later RW mount will not try to reconfigure the fs to RW because the mount tool is already using the new API. This totally breaks the per-subvolume RO/RW mount behavior. [FIX] Do not skip the reconfiguration even if using the new API. The old comments are just expecting any mount tool to properly skip the RO flag set even if we specify "ro", which is not the reality. Update the comments regarding the backward compatibility on the kernel level so it works with old and new mount utilities. CC: stable@vger.kernel.org # 6.8+ Fixes: f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes") Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2024-11-06Merge tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client fixes from Anna Schumaker: "These are mostly fixes that came up during the nfs bakeathon the other week. Stable Fixes: - Fix KMSAN warning in decode_getfattr_attrs() Other Bugfixes: - Handle -ENOTCONN in xs_tcp_setup_socked() - NFSv3: only use NFS timeout for MOUNT when protocols are compatible - Fix attribute delegation behavior on exclusive create and a/mtime changes - Fix localio to cope with racing nfs_local_probe() - Avoid i_lock contention in fs_clear_invalid_mapping()" * tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: nfs: avoid i_lock contention in nfs_clear_invalid_mapping nfs_common: fix localio to cope with racing nfs_local_probe() NFS: Further fixes to attribute delegation a/mtime changes NFS: Fix attribute delegation behaviour on exclusive create nfs: Fix KMSAN warning in decode_getfattr_attrs() NFSv3: only use NFS timeout for MOUNT when protocols are compatible sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()
2024-11-06fs/proc/kcore.c: fix coccinelle reported ERROR instancesMirsad Todorovac
Coccinelle complains about the nested reuse of the pointer `iter' with different pointer type: ./fs/proc/kcore.c:515:26-30: ERROR: invalid reference to the index variable of the iterator on line 499 ./fs/proc/kcore.c:534:23-27: ERROR: invalid reference to the index variable of the iterator on line 499 ./fs/proc/kcore.c:550:40-44: ERROR: invalid reference to the index variable of the iterator on line 499 ./fs/proc/kcore.c:568:27-31: ERROR: invalid reference to the index variable of the iterator on line 499 ./fs/proc/kcore.c:581:28-32: ERROR: invalid reference to the index variable of the iterator on line 499 ./fs/proc/kcore.c:599:27-31: ERROR: invalid reference to the index variable of the iterator on line 499 ./fs/proc/kcore.c:607:38-42: ERROR: invalid reference to the index variable of the iterator on line 499 ./fs/proc/kcore.c:614:26-30: ERROR: invalid reference to the index variable of the iterator on line 499 Replacing `struct kcore_list *iter' with `struct kcore_list *tmp' doesn't change the scope and the functionality is the same and coccinelle seems happy. NOTE: There was an issue with using `struct kcore_list *pos' as the nested iterator. The build did not work! [akpm@linux-foundation.org: s/tmp/pos/] Link: https://lkml.kernel.org/r/20241029054651.86356-2-mtodorovac69@gmail.com Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/20220331223700.902556-1-jakobkoschel@gmail.com Fixes: 04d168c6d42d ("fs/proc/kcore.c: remove check of list iterator against head past the loop body") Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com> Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Brian Johannesmeyer" <bjohannesmeyer@gmail.com> Cc: Cristiano Giuffrida <c.giuffrida@vu.nl> Cc: "Bos, H.J." <h.j.bos@vu.nl> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Hari Bathini <hbathini@linux.ibm.com> Cc: Yan Zhen <yanzhen@vivo.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-06isofs: avoid memory leak in iocharsetHao Ge
A memleak was found as below: unreferenced object 0xffff0000d10164d8 (size 8): comm "pool-udisksd", pid 108217, jiffies 4295408555 hex dump (first 8 bytes): 75 74 66 38 00 cc cc cc utf8.... backtrace (crc de430d31): [<ffff800081046e6c>] kmemleak_alloc+0xb8/0xc8 [<ffff8000803e6c3c>] __kmalloc_node_track_caller_noprof+0x380/0x474 [<ffff800080363b74>] kstrdup+0x70/0xfc [<ffff80007bb3c6a4>] isofs_parse_param+0x228/0x2c0 [isofs] [<ffff8000804d7f68>] vfs_parse_fs_param+0xf4/0x164 [<ffff8000804d8064>] vfs_parse_fs_string+0x8c/0xd4 [<ffff8000804d815c>] vfs_parse_monolithic_sep+0xb0/0xfc [<ffff8000804d81d8>] generic_parse_monolithic+0x30/0x3c [<ffff8000804d8bfc>] parse_monolithic_mount_data+0x40/0x4c [<ffff8000804b6a64>] path_mount+0x6c4/0x9ec [<ffff8000804b6e38>] do_mount+0xac/0xc4 [<ffff8000804b7494>] __arm64_sys_mount+0x16c/0x2b0 [<ffff80008002b8dc>] invoke_syscall+0x7c/0x104 [<ffff80008002ba44>] el0_svc_common.constprop.1+0xe0/0x104 [<ffff80008002ba94>] do_el0_svc+0x2c/0x38 [<ffff800081041108>] el0_svc+0x3c/0x1b8 The opt->iocharset is freed inside the isofs_fill_super function, But there may be situations where it's not possible to enter this function. For example, in the get_tree_bdev_flags function,when encountering the situation where "Can't mount, would change RO state," In such a case, isofs_fill_super will not have the opportunity to be called,which means that opt->iocharset will not have the chance to be freed,ultimately leading to a memory leak. Let's move the memory freeing of opt->iocharset into isofs_free_fc function. Fixes: 1b17a46c9243 ("isofs: convert isofs to use the new mount API") Signed-off-by: Hao Ge <gehao@kylinos.cn> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241106082841.51773-1-hao.ge@linux.dev
2024-11-06Merge tag 'tracefs-v6.12-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracefs fixes from Steven Rostedt: "Fix tracefs mount options. Commit 78ff64081949 ("vfs: Convert tracefs to use the new mount API") broke the gid setting when set by fstab or other mount utility. It is ignored when it is set. Fix the code so that it recognises the option again and will honor the settings on mount at boot up. Update the internal documentation and create a selftest to make sure it doesn't break again in the future" * tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/selftests: Add tracefs mount options test tracing: Document tracefs gid mount option tracing: Fix tracefs mount options
2024-11-06xattr: remove redundant check on variable errColin Ian King
Curretly in function generic_listxattr the for_each_xattr_handler loop checks err and will return out of the function if err is non-zero. It's impossible for err to be non-zero at the end of the function where err is checked again for a non-zero value. The final non-zero check is therefore redundant and can be removed. Also move the declaration of err into the loop. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-06fs/xattr: add *at family syscallsChristian Göttsche
Add the four syscalls setxattrat(), getxattrat(), listxattrat() and removexattrat(). Those can be used to operate on extended attributes, especially security related ones, either relative to a pinned directory or on a file descriptor without read access, avoiding a /proc/<pid>/fd/<fd> detour, requiring a mounted procfs. One use case will be setfiles(8) setting SELinux file contexts ("security.selinux") without race conditions and without a file descriptor opened with read access requiring SELinux read permission. Use the do_{name}at() pattern from fs/open.c. Pass the value of the extended attribute, its length, and for setxattrat(2) the command (XATTR_CREATE or XATTR_REPLACE) via an added struct xattr_args to not exceed six syscall arguments and not merging the AT_* and XATTR_* flags. [AV: fixes by Christian Brauner folded in, the entire thing rebased on top of {filename,file}_...xattr() primitives, treatment of empty pathnames regularized. As the result, AT_EMPTY_PATH+NULL handling is cheap, so f...(2) can use it] Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Link: https://lore.kernel.org/r/20240426162042.191916-1-cgoettsche@seltendoof.de Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christian Brauner <brauner@kernel.org> CC: x86@kernel.org CC: linux-alpha@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-arm-kernel@lists.infradead.org CC: linux-ia64@vger.kernel.org CC: linux-m68k@lists.linux-m68k.org CC: linux-mips@vger.kernel.org CC: linux-parisc@vger.kernel.org CC: linuxppc-dev@lists.ozlabs.org CC: linux-s390@vger.kernel.org CC: linux-sh@vger.kernel.org CC: sparclinux@vger.kernel.org CC: linux-fsdevel@vger.kernel.org CC: audit@vger.kernel.org CC: linux-arch@vger.kernel.org CC: linux-api@vger.kernel.org CC: linux-security-module@vger.kernel.org CC: selinux@vger.kernel.org [brauner: slight tweaks] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-06new helpers: file_removexattr(), filename_removexattr()Al Viro
switch path_removexattrat() and fremovexattr(2) to those Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-06new helpers: file_listxattr(), filename_listxattr()Al Viro
switch path_listxattr() and flistxattr(2) to those Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-06replace do_getxattr() with saner helpers.Al Viro
similar to do_setxattr() in the previous commit... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-06replace do_setxattr() with saner helpers.Al Viro
io_uring setxattr logics duplicates stuff from fs/xattr.c; provide saner helpers (filename_setxattr() and file_setxattr() resp.) and use them. NB: putname(ERR_PTR()) is a no-op Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-06new helper: import_xattr_name()Al Viro
common logics for marshalling xattr names. Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-06fs: rename struct xattr_ctx to kernel_xattr_ctxChristian Göttsche
Rename the struct xattr_ctx to increase distinction with the about to be added user API struct xattr_args. No functional change. Suggested-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Link: https://lore.kernel.org/r/20240426162042.191916-2-cgoettsche@seltendoof.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-11-06freevxfs: Replace one-element array with flexible array memberThorsten Blum
Replace the deprecated one-element array with a modern flexible array member in the struct vxfs_dirblk. Link: https://github.com/KSPP/linux/issues/79 Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20241103121707.102838-3-thorsten.blum@linux.dev Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-11-05ocfs2: remove unused errmsg function and tableDr. David Alan Gilbert
dlm_errmsg() has been unused since 2010's commit 0016eedc4185 ("ocfs2_dlmfs: Use the stackglue.") Remove dlm_errmsg() and the message table it indexes. Link: https://lkml.kernel.org/r/20241022002543.302606-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05ocfs2: cluster: fix a typoAndrew Kreimer
Fix a typo: panicing -> panicking. Via codespell. Link: https://lkml.kernel.org/r/20241027133540.22090-1-algonell@gmail.com Signed-off-by: Andrew Kreimer <algonell@gmail.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert metadata aops from writepage to writepagesMatthew Wilcox (Oracle)
By implementing ->writepages instead of ->writepage, we remove a layer of indirect function calls from the writeback path and the last use of struct page in nilfs2. [konishi.ryusuke@gmail.com: fixed panic by using buffer_migrate_folio_norefs] Link: https://lkml.kernel.org/r/20241002150036.1339475-5-willy@infradead.org Link: https://lkml.kernel.org/r/20241024092602.13395-13-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert nilfs_recovery_copy_block() to take a folioMatthew Wilcox (Oracle)
Use memcpy_to_folio() instead of open-coding it, and use offset_in_folio() in case anybody wants to use nilfs2 on a device with large blocks. [konishi.ryusuke@gmail.com: added label name change] Link: https://lkml.kernel.org/r/20241002150036.1339475-4-willy@infradead.org Link: https://lkml.kernel.org/r/20241024092602.13395-12-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert nilfs_page_count_clean_buffers() to take a folioMatthew Wilcox (Oracle)
Both callers have a folio, so pass it in and use it directly. [konishi.ryusuke@gmail.com: fixed a checkpatch warning about function declaration] Link: https://lkml.kernel.org/r/20241002150036.1339475-3-willy@infradead.org Link: https://lkml.kernel.org/r/20241024092602.13395-11-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: remove nilfs_writepageMatthew Wilcox (Oracle)
Since nilfs2 has a ->writepages operation already, ->writepage is only called by the migration code. If we add a ->migrate_folio operation, it won't even be used for that and so it can be deleted. [konishi.ryusuke@gmail.com: fixed panic by using buffer_migrate_folio_norefs] Link: https://lkml.kernel.org/r/20241002150036.1339475-2-willy@infradead.org Link: https://lkml.kernel.org/r/20241024092602.13395-10-konishi.ryusuke@gmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert checkpoint file to be folio-basedRyusuke Konishi
Regarding the cpfile, a metadata file that manages checkpoints, convert the page-based implementation to a folio-based implementation. This change involves some helper functions to calculate byte offsets on folios and removing a few helper functions that are no longer needed. Link: https://lkml.kernel.org/r/20241024092602.13395-9-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: remove nilfs_palloc_block_get_entry()Ryusuke Konishi
All calls to nilfs_palloc_block_get_entry() are now gone, so remove it. Link: https://lkml.kernel.org/r/20241024092602.13395-8-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert DAT file to be folio-basedRyusuke Konishi
Regarding the DAT, a metadata file that manages virtual block addresses, convert the page-based implementation to a folio-based implementation. Link: https://lkml.kernel.org/r/20241024092602.13395-7-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert inode file to be folio-basedRyusuke Konishi
Convert the page-based implementation of ifile, a metadata file that manages inodes, to folio-based. Link: https://lkml.kernel.org/r/20241024092602.13395-6-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert persistent object allocator to be folio-basedRyusuke Konishi
Regarding the persistent oject allocator, a common mechanism for allocating objects in metadata files such as inodes and DAT entries, convert the page-based implementation to a folio-based implementation. In this conversion, helper functions nilfs_palloc_group_desc_offset() and nilfs_palloc_bitmap_offset() are added and used to calculate the byte offset within a folio of a group descriptor structure and bitmap, respectively, to replace kmap_local_page with kmap_local_folio. In addition, a helper function called nilfs_palloc_entry_offset() is provided to facilitate common calculation of the byte offset within a folio of metadata file entries managed in the persistent object allocator format. Link: https://lkml.kernel.org/r/20241024092602.13395-5-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert segment usage file to be folio-basedRyusuke Konishi
For the sufile, which is a metadata file that holds information about managing segments, convert the page-based implementation to a folio-based implementation. kmap_local_page() is changed to use kmap_local_folio(), and where offsets within a page are calculated using bh_offset(), are replaced with calculations using offset_in_folio() with an additional helper function nilfs_sufile_segment_usage_offset(). Link: https://lkml.kernel.org/r/20241024092602.13395-4-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert common metadata file code to be folio-basedRyusuke Konishi
In the common routines for metadata files, nilfs_mdt_insert_new_block(), which inserts a new block buffer into the cache, is still page-based, and there are two places where bh_offset() is used. Convert these to page-based. Link: https://lkml.kernel.org/r/20241024092602.13395-3-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05nilfs2: convert segment buffer to be folio-basedRyusuke Konishi
Patch series "nilfs2: Finish folio conversion". This series converts all remaining page structure references in nilfs2 to folio-based, except for nilfs_copy_buffer function, which was converted to use folios in advance for cross-fs page flags cleanup. This prioritizes folio conversion, and does not include buffer head reference reduction, nor does it support for block sizes larger than the system page size. The first eight patches in this series mainly convert each of the nilfs2-specific metadata implementations to use folios. The last four patches, by Matthew Wilcox, eliminate aops writepage callbacks and convert the remaining page structure references to folio-based. This part reflects some corrections to the patch series posted by Matthew. This patch (of 12): In the segment buffer (log buffer) implementation, two parts of the block buffer, CRC calculation and bio preparation, are still page-based, so convert them to folio-based. Link: https://lkml.kernel.org/r/20241024092602.13395-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20241024092602.13395-2-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05bcachefs: update min_heap_callbacks to use default builtin swapKuan-Wei Chiu
Replace the swp function pointer in the min_heap_callbacks of bcachefs with NULL, allowing direct usage of the default builtin swap implementation. This modification simplifies the code and improves performance by removing unnecessary function indirection. Link: https://lkml.kernel.org/r/20241020040200.939973-10-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Coly Li <colyli@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Sakai <msakai@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05bcachefs: clean up duplicate min_heap_callbacks declarationsKuan-Wei Chiu
Refactor the bcachefs code to remove multiple redundant declarations of min_heap_callbacks, ensuring that each unique declaration appears only once. Link: https://lore.kernel.org/20241017095520.GV16066@noisy.programming.kicks-ass.net Link: https://lkml.kernel.org/r/20241020040200.939973-9-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Suggested-by: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Coly Li <colyli@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Sakai <msakai@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05lib/min_heap: introduce non-inline versions of min heap API functionsKuan-Wei Chiu
Patch series "Enhance min heap API with non-inline functions and optimizations", v2. Add non-inline versions of the min heap API functions in lib/min_heap.c and updates all users outside of kernel/events/core.c to use these non-inline versions. To mitigate the performance impact of indirect function calls caused by the non-inline versions of the swap and compare functions, a builtin swap has been introduced that swaps elements based on their size. Additionally, it micro-optimizes the efficiency of the min heap by pre-scaling the counter, following the same approach as in lib/sort.c. Documentation for the min heap API has also been added to the core-api section. This patch (of 10): All current min heap API functions are marked with '__always_inline'. However, as the number of users increases, inlining these functions everywhere leads to a increase in kernel size. In performance-critical paths, such as when perf events are enabled and min heap functions are called on every context switch, it is important to retain the inline versions for optimal performance. To balance this, the original inline functions are kept, and additional non-inline versions of the functions have been added in lib/min_heap.c. Link: https://lkml.kernel.org/r/20241020040200.939973-1-visitorckw@gmail.com Link: https://lore.kernel.org/20240522161048.8d8bbc7b153b4ecd92c50666@linux-foundation.org Link: https://lkml.kernel.org/r/20241020040200.939973-2-visitorckw@gmail.com Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw> Cc: Coly Li <colyli@suse.de> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Kuan-Wei Chiu <visitorckw@gmail.com> Cc: "Liang, Kan" <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Sakai <msakai@redhat.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05get rid of __get_task_comm()Yafang Shao
Patch series "Improve the copy of task comm", v8. Using {memcpy,strncpy,strcpy,kstrdup} to copy the task comm relies on the length of task comm. Changes in the task comm could result in a destination string that is overflow. Therefore, we should explicitly ensure the destination string is always NUL-terminated, regardless of the task comm. This approach will facilitate future extensions to the task comm. As suggested by Linus [0], we can identify all relevant code with the following git grep command: git grep 'memcpy.*->comm\>' git grep 'kstrdup.*->comm\>' git grep 'strncpy.*->comm\>' git grep 'strcpy.*->comm\>' PATCH #2~#4: memcpy PATCH #5~#6: kstrdup PATCH #7: strcpy Please note that strncpy() is not included in this series as it is being tracked by another effort. [1] This patch (of 7): We want to eliminate the use of __get_task_comm() for the following reasons: - The task_lock() is unnecessary Quoted from Linus [0]: : Since user space can randomly change their names anyway, using locking : was always wrong for readers (for writers it probably does make sense : to have some lock - although practically speaking nobody cares there : either, but at least for a writer some kind of race could have : long-term mixed results Link: https://lkml.kernel.org/r/20241007144911.27693-1-laoar.shao@gmail.com Link: https://lkml.kernel.org/r/20241007144911.27693-2-laoar.shao@gmail.com Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0] Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/ Link: https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/ [0] Link: https://github.com/KSPP/linux/issues/90 [1] Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com> Cc: Matus Jokay <matus.jokay@stuba.sk> Cc: Alejandro Colomar <alx@kernel.org> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Airlie <airlied@gmail.com> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maxime Ripard <mripard@kernel.org> Cc: Ondrej Mosnacek <omosnace@redhat.com> Cc: Paul Moore <paul@paul-moore.com> Cc: Quentin Monnet <qmo@kernel.org> Cc: Simon Horman <horms@kernel.org> Cc: Stephen Smalley <stephen.smalley.work@gmail.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05ocfs2: fix typo in commentMohammed Anees
Fix "Allcate" -> "Allocate" Link: https://lkml.kernel.org/r/20240917185156.10580-1-pvmohammedanees2003@gmail.com Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05ocfs2: remove unused declaration in header fileZhang Zekun
The definition of ocfs2_global_read_dquot() has been removed since commit fb8dd8d78014 ("ocfs2: Fix quota locking"). Let's remove the empty declartion Link: https://lkml.kernel.org/r/20240906055742.105024-1-zhangzekun11@huawei.com Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05mm: refactor mm_access() to not return NULLLorenzo Stoakes
mm_access() can return NULL if the mm is not found, but this is handled the same as an error in all callers, with some translating this into an -ESRCH error. Only proc_mem_open() returns NULL if no mm is found, however in this case it is clearer and makes more sense to explicitly handle the error. Additionally we take the opportunity to refactor the function to eliminate unnecessary nesting. Simplify things by simply returning -ESRCH if no mm is found - this both eliminates confusing use of the IS_ERR_OR_NULL() macro, and simplifies callers which would return -ESRCH by returning this error directly. [lorenzo.stoakes@oracle.com: prefer neater pointer error comparison] Link: https://lkml.kernel.org/r/2fae1834-749a-45e1-8594-5e5979cf7103@lucifer.local Link: https://lkml.kernel.org/r/20240924201023.193135-1-lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-11-05ext4: Do not fallback to buffered-io for DIO atomic writeRitesh Harjani (IBM)
atomic writes is currently only supported for single fsblock and only for direct-io. We should not return -ENOTBLK for atomic writes since we want the atomic write request to either complete fully or fail otherwise. Hence, we should never fallback to buffered-io in case of DIO atomic write requests. Let's also catch if this ever happens by adding some WARN_ON_ONCE before buffered-io handling for direct-io atomic writes. More details of the discussion [1]. While at it let's add an inline helper ext4_want_directio_fallback() which simplifies the logic checks and inherently fixes condition on when to return -ENOTBLK which otherwise was always returning true for any write or directio in ext4_iomap_end(). It was ok since ext4 only supports direct-io via iomap. [1]: https://lore.kernel.org/linux-xfs/cover.1729825985.git.ritesh.list@gmail.com/T/#m9dbecc11bed713ed0d7a486432c56b105b555f04 Suggested-by: Darrick J. Wong <djwong@kernel.org> # inline helper Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz>
2024-11-05ext4: Support setting FMODE_CAN_ATOMIC_WRITERitesh Harjani (IBM)
FS needs to add the fmode capability in order to support atomic writes during file open (refer kiocb_set_rw_flags()). Set this capability on a regular file if ext4 can do atomic write. Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz>
2024-11-05ext4: Check for atomic writes support in write iterRitesh Harjani (IBM)
Let's validate the given constraints for atomic write request. Otherwise it will fail with -EINVAL. Currently atomic write is only supported on DIO, so for buffered-io it will return -EOPNOTSUPP. Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz>
2024-11-05ext4: Add statx support for atomic writesRitesh Harjani (IBM)
This patch adds base support for atomic writes via statx getattr. On bs < ps systems, we can create FS with say bs of 16k. That means both atomic write min and max unit can be set to 16k for supporting atomic writes. Co-developed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz>
2024-11-05xfs: port ondisk structure checks from xfs/122 to the kernelDarrick J. Wong
Check this with every kernel and userspace build, so we can drop the nonsense in xfs/122. Roughly drafted with: sed -e 's/^offsetof/\tXFS_CHECK_OFFSET/g' \ -e 's/^sizeof/\tXFS_CHECK_STRUCT_SIZE/g' \ -e 's/ = \([0-9]*\)/,\t\t\t\1);/g' \ -e 's/xfs_sb_t/struct xfs_dsb/g' \ -e 's/),/,/g' \ -e 's/xfs_\([a-z0-9_]*\)_t,/struct xfs_\1,/g' \ < tests/xfs/122.out | sort and then manual fixups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: separate space btree structures in xfs_ondisk.hDarrick J. Wong
Create a separate section for space management btrees so that they're not mixed in with file structures. Ignore the dsb stuff sprinkled around for now, because we'll deal with that in a subsequent patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: convert struct typedefs in xfs_ondisk.hDarrick J. Wong
Replace xfs_foo_t with struct xfs_foo where appropriate. The next patch will import more checks from xfs/122, and it's easier to automate deduplication if we don't have to reason about typedefs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: enable metadata directory featureDarrick J. Wong
Enable the metadata directory feature. With this feature, all metadata inodes are placed in the metadata directory, and the only inumbers in the superblock are the roots of the two directory trees. The RT device is now sharded into a number of rtgroups, where 0 rtgroups mean that no RT extents are supported, and the traditional XFS stub RT bitmap and summary inodes don't exist. A single rtgroup gives roughly identical behavior to the traditional RT setup, but now with checksummed and self identifying free space metadata. For quota, the quota options are read from the superblock unless explicitly overridden via mount options. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: enable realtime quota againDarrick J. Wong
Enable quotas for the realtime device. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: update sb field checks when metadir is turned onDarrick J. Wong
When metadir is enabled, we want to check the two new rtgroups fields, and we don't want to check the old inumbers that are now in the metadir. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: reserve quota for realtime files correctlyDarrick J. Wong
Fix xfs_quota_reserve_blkres to reserve rt block quota whenever we're dealing with a realtime file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: create quota preallocation watermarks for realtime quotaDarrick J. Wong
Refactor the quota preallocation watermarking code so that it'll work for realtime quota too. Convert the do_div calls into div_u64 for compactness. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>