Age | Commit message (Collapse) | Author |
|
To cure the SIG_IGN handling for posix interval timers, the preallocated
sigqueue needs to be embedded into struct k_itimer to prevent life time
races of all sorts.
Now that the prerequisites are in place, embed the sigqueue into struct
k_itimer and fixup the relevant usage sites.
Aside of preparing for proper SIG_IGN handling, this spares an extra
allocation.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20241105064213.719695194@linutronix.de
|
|
The dealloc flag may be cleared and the extent won't reach the disk in
cow_file_range when errors path. The reserved qgroup space is freed in
commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in
cow_file_range"). However, the length of untouched region to free needs
to be adjusted with the correct remaining region size.
Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range")
CC: stable@vger.kernel.org # 6.11+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Haisu Wang <haisuwang@tencent.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
At insert_delayed_ref() if we need to update the action of an existing
ref to BTRFS_DROP_DELAYED_REF, we delete the ref from its ref head's
ref_add_list using list_del(), which leaves the ref's add_list member
not reinitialized, as list_del() sets the next and prev members of the
list to LIST_POISON1 and LIST_POISON2, respectively.
If later we end up calling drop_delayed_ref() against the ref, which can
happen during merging or when destroying delayed refs due to a transaction
abort, we can trigger a crash since at drop_delayed_ref() we call
list_empty() against the ref's add_list, which returns false since
the list was not reinitialized after the list_del() and as a consequence
we call list_del() again at drop_delayed_ref(). This results in an
invalid list access since the next and prev members are set to poison
pointers, resulting in a splat if CONFIG_LIST_HARDENED and
CONFIG_DEBUG_LIST are set or invalid poison pointer dereferences
otherwise.
So fix this by deleting from the list with list_del_init() instead.
Fixes: 1d57ee941692 ("btrfs: improve delayed refs iterations")
CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[BUG]
With util-linux 2.40.2, the 'mount' utility is already utilizing the new
mount API. e.g:
# strace mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test/
...
fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0
fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0
fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0
fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
fsmount(3, FSMOUNT_CLOEXEC, 0) = 4
mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0
move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0
But this leads to a new problem, that per-subvolume RO/RW mount no
longer works, if the initial mount is RO:
# mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test
# mount -o rw,subvol=subv2 /dev/test/scratch1 /mnt/scratch
# mount | grep mnt
/dev/mapper/test-scratch1 on /mnt/test type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=256,subvol=/subv1)
/dev/mapper/test-scratch1 on /mnt/scratch type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=257,subvol=/subv2)
# touch /mnt/scratch/foobar
touch: cannot touch '/mnt/scratch/foobar': Read-only file system
This is a common use cases on distros.
[CAUSE]
We have a workaround for remount to handle the RO->RW change, but if the
mount is using the new mount API, we do not do that, and rely on the
mount tool NOT to set the ro flag.
But that's not how the mount tool is doing for the new API:
fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0
fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0
fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0 <<<< Setting RO flag for super block
fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
fsmount(3, FSMOUNT_CLOEXEC, 0) = 4
mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0
move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0
This means we will set the super block RO at the first mount.
Later RW mount will not try to reconfigure the fs to RW because the
mount tool is already using the new API.
This totally breaks the per-subvolume RO/RW mount behavior.
[FIX]
Do not skip the reconfiguration even if using the new API. The old
comments are just expecting any mount tool to properly skip the RO flag
set even if we specify "ro", which is not the reality.
Update the comments regarding the backward compatibility on the kernel
level so it works with old and new mount utilities.
CC: stable@vger.kernel.org # 6.8+
Fixes: f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Pull NFS client fixes from Anna Schumaker:
"These are mostly fixes that came up during the nfs bakeathon the other
week.
Stable Fixes:
- Fix KMSAN warning in decode_getfattr_attrs()
Other Bugfixes:
- Handle -ENOTCONN in xs_tcp_setup_socked()
- NFSv3: only use NFS timeout for MOUNT when protocols are compatible
- Fix attribute delegation behavior on exclusive create and a/mtime
changes
- Fix localio to cope with racing nfs_local_probe()
- Avoid i_lock contention in fs_clear_invalid_mapping()"
* tag 'nfs-for-6.12-3' of git://git.linux-nfs.org/projects/anna/linux-nfs:
nfs: avoid i_lock contention in nfs_clear_invalid_mapping
nfs_common: fix localio to cope with racing nfs_local_probe()
NFS: Further fixes to attribute delegation a/mtime changes
NFS: Fix attribute delegation behaviour on exclusive create
nfs: Fix KMSAN warning in decode_getfattr_attrs()
NFSv3: only use NFS timeout for MOUNT when protocols are compatible
sunrpc: handle -ENOTCONN in xs_tcp_setup_socket()
|
|
Coccinelle complains about the nested reuse of the pointer `iter' with
different pointer type:
./fs/proc/kcore.c:515:26-30: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:534:23-27: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:550:40-44: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:568:27-31: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:581:28-32: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:599:27-31: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:607:38-42: ERROR: invalid reference to the index variable of the iterator on line 499
./fs/proc/kcore.c:614:26-30: ERROR: invalid reference to the index variable of the iterator on line 499
Replacing `struct kcore_list *iter' with `struct kcore_list *tmp' doesn't change the
scope and the functionality is the same and coccinelle seems happy.
NOTE: There was an issue with using `struct kcore_list *pos' as the nested iterator.
The build did not work!
[akpm@linux-foundation.org: s/tmp/pos/]
Link: https://lkml.kernel.org/r/20241029054651.86356-2-mtodorovac69@gmail.com
Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Link: https://lkml.kernel.org/r/20220331223700.902556-1-jakobkoschel@gmail.com
Fixes: 04d168c6d42d ("fs/proc/kcore.c: remove check of list iterator against head past the loop body")
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Brian Johannesmeyer" <bjohannesmeyer@gmail.com>
Cc: Cristiano Giuffrida <c.giuffrida@vu.nl>
Cc: "Bos, H.J." <h.j.bos@vu.nl>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Hari Bathini <hbathini@linux.ibm.com>
Cc: Yan Zhen <yanzhen@vivo.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
A memleak was found as below:
unreferenced object 0xffff0000d10164d8 (size 8):
comm "pool-udisksd", pid 108217, jiffies 4295408555
hex dump (first 8 bytes):
75 74 66 38 00 cc cc cc utf8....
backtrace (crc de430d31):
[<ffff800081046e6c>] kmemleak_alloc+0xb8/0xc8
[<ffff8000803e6c3c>] __kmalloc_node_track_caller_noprof+0x380/0x474
[<ffff800080363b74>] kstrdup+0x70/0xfc
[<ffff80007bb3c6a4>] isofs_parse_param+0x228/0x2c0 [isofs]
[<ffff8000804d7f68>] vfs_parse_fs_param+0xf4/0x164
[<ffff8000804d8064>] vfs_parse_fs_string+0x8c/0xd4
[<ffff8000804d815c>] vfs_parse_monolithic_sep+0xb0/0xfc
[<ffff8000804d81d8>] generic_parse_monolithic+0x30/0x3c
[<ffff8000804d8bfc>] parse_monolithic_mount_data+0x40/0x4c
[<ffff8000804b6a64>] path_mount+0x6c4/0x9ec
[<ffff8000804b6e38>] do_mount+0xac/0xc4
[<ffff8000804b7494>] __arm64_sys_mount+0x16c/0x2b0
[<ffff80008002b8dc>] invoke_syscall+0x7c/0x104
[<ffff80008002ba44>] el0_svc_common.constprop.1+0xe0/0x104
[<ffff80008002ba94>] do_el0_svc+0x2c/0x38
[<ffff800081041108>] el0_svc+0x3c/0x1b8
The opt->iocharset is freed inside the isofs_fill_super function,
But there may be situations where it's not possible to
enter this function.
For example, in the get_tree_bdev_flags function,when
encountering the situation where "Can't mount, would change RO state,"
In such a case, isofs_fill_super will not have the opportunity
to be called,which means that opt->iocharset will not have the chance
to be freed,ultimately leading to a memory leak.
Let's move the memory freeing of opt->iocharset into
isofs_free_fc function.
Fixes: 1b17a46c9243 ("isofs: convert isofs to use the new mount API")
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20241106082841.51773-1-hao.ge@linux.dev
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracefs fixes from Steven Rostedt:
"Fix tracefs mount options.
Commit 78ff64081949 ("vfs: Convert tracefs to use the new mount API")
broke the gid setting when set by fstab or other mount utility. It is
ignored when it is set. Fix the code so that it recognises the option
again and will honor the settings on mount at boot up.
Update the internal documentation and create a selftest to make sure
it doesn't break again in the future"
* tag 'tracefs-v6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing/selftests: Add tracefs mount options test
tracing: Document tracefs gid mount option
tracing: Fix tracefs mount options
|
|
Curretly in function generic_listxattr the for_each_xattr_handler loop
checks err and will return out of the function if err is non-zero.
It's impossible for err to be non-zero at the end of the function where
err is checked again for a non-zero value. The final non-zero check is
therefore redundant and can be removed. Also move the declaration of
err into the loop.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Add the four syscalls setxattrat(), getxattrat(), listxattrat() and
removexattrat(). Those can be used to operate on extended attributes,
especially security related ones, either relative to a pinned directory
or on a file descriptor without read access, avoiding a
/proc/<pid>/fd/<fd> detour, requiring a mounted procfs.
One use case will be setfiles(8) setting SELinux file contexts
("security.selinux") without race conditions and without a file
descriptor opened with read access requiring SELinux read permission.
Use the do_{name}at() pattern from fs/open.c.
Pass the value of the extended attribute, its length, and for
setxattrat(2) the command (XATTR_CREATE or XATTR_REPLACE) via an added
struct xattr_args to not exceed six syscall arguments and not
merging the AT_* and XATTR_* flags.
[AV: fixes by Christian Brauner folded in, the entire thing rebased on
top of {filename,file}_...xattr() primitives, treatment of empty
pathnames regularized. As the result, AT_EMPTY_PATH+NULL handling
is cheap, so f...(2) can use it]
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Link: https://lore.kernel.org/r/20240426162042.191916-1-cgoettsche@seltendoof.de
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
CC: x86@kernel.org
CC: linux-alpha@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-arm-kernel@lists.infradead.org
CC: linux-ia64@vger.kernel.org
CC: linux-m68k@lists.linux-m68k.org
CC: linux-mips@vger.kernel.org
CC: linux-parisc@vger.kernel.org
CC: linuxppc-dev@lists.ozlabs.org
CC: linux-s390@vger.kernel.org
CC: linux-sh@vger.kernel.org
CC: sparclinux@vger.kernel.org
CC: linux-fsdevel@vger.kernel.org
CC: audit@vger.kernel.org
CC: linux-arch@vger.kernel.org
CC: linux-api@vger.kernel.org
CC: linux-security-module@vger.kernel.org
CC: selinux@vger.kernel.org
[brauner: slight tweaks]
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
switch path_removexattrat() and fremovexattr(2) to those
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
switch path_listxattr() and flistxattr(2) to those
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
similar to do_setxattr() in the previous commit...
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
io_uring setxattr logics duplicates stuff from fs/xattr.c; provide
saner helpers (filename_setxattr() and file_setxattr() resp.) and
use them.
NB: putname(ERR_PTR()) is a no-op
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
common logics for marshalling xattr names.
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Rename the struct xattr_ctx to increase distinction with the about to be
added user API struct xattr_args.
No functional change.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Christian Göttsche <cgzones@googlemail.com>
Link: https://lore.kernel.org/r/20240426162042.191916-2-cgoettsche@seltendoof.de
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Replace the deprecated one-element array with a modern flexible array
member in the struct vxfs_dirblk.
Link: https://github.com/KSPP/linux/issues/79
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20241103121707.102838-3-thorsten.blum@linux.dev
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
dlm_errmsg() has been unused since 2010's commit 0016eedc4185
("ocfs2_dlmfs: Use the stackglue.")
Remove dlm_errmsg() and the message table it indexes.
Link: https://lkml.kernel.org/r/20241022002543.302606-1-linux@treblig.org
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fix a typo: panicing -> panicking.
Via codespell.
Link: https://lkml.kernel.org/r/20241027133540.22090-1-algonell@gmail.com
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
By implementing ->writepages instead of ->writepage, we remove a layer of
indirect function calls from the writeback path and the last use of struct
page in nilfs2.
[konishi.ryusuke@gmail.com: fixed panic by using buffer_migrate_folio_norefs]
Link: https://lkml.kernel.org/r/20241002150036.1339475-5-willy@infradead.org
Link: https://lkml.kernel.org/r/20241024092602.13395-13-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Use memcpy_to_folio() instead of open-coding it, and use offset_in_folio()
in case anybody wants to use nilfs2 on a device with large blocks.
[konishi.ryusuke@gmail.com: added label name change]
Link: https://lkml.kernel.org/r/20241002150036.1339475-4-willy@infradead.org
Link: https://lkml.kernel.org/r/20241024092602.13395-12-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Both callers have a folio, so pass it in and use it directly.
[konishi.ryusuke@gmail.com: fixed a checkpatch warning about function declaration]
Link: https://lkml.kernel.org/r/20241002150036.1339475-3-willy@infradead.org
Link: https://lkml.kernel.org/r/20241024092602.13395-11-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Since nilfs2 has a ->writepages operation already, ->writepage is only
called by the migration code. If we add a ->migrate_folio operation, it
won't even be used for that and so it can be deleted.
[konishi.ryusuke@gmail.com: fixed panic by using buffer_migrate_folio_norefs]
Link: https://lkml.kernel.org/r/20241002150036.1339475-2-willy@infradead.org
Link: https://lkml.kernel.org/r/20241024092602.13395-10-konishi.ryusuke@gmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Regarding the cpfile, a metadata file that manages checkpoints, convert
the page-based implementation to a folio-based implementation.
This change involves some helper functions to calculate byte offsets on
folios and removing a few helper functions that are no longer needed.
Link: https://lkml.kernel.org/r/20241024092602.13395-9-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
All calls to nilfs_palloc_block_get_entry() are now gone, so remove it.
Link: https://lkml.kernel.org/r/20241024092602.13395-8-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Regarding the DAT, a metadata file that manages virtual block addresses,
convert the page-based implementation to a folio-based implementation.
Link: https://lkml.kernel.org/r/20241024092602.13395-7-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Convert the page-based implementation of ifile, a metadata file that
manages inodes, to folio-based.
Link: https://lkml.kernel.org/r/20241024092602.13395-6-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Regarding the persistent oject allocator, a common mechanism for
allocating objects in metadata files such as inodes and DAT entries,
convert the page-based implementation to a folio-based implementation.
In this conversion, helper functions nilfs_palloc_group_desc_offset() and
nilfs_palloc_bitmap_offset() are added and used to calculate the byte
offset within a folio of a group descriptor structure and bitmap,
respectively, to replace kmap_local_page with kmap_local_folio.
In addition, a helper function called nilfs_palloc_entry_offset() is
provided to facilitate common calculation of the byte offset within a
folio of metadata file entries managed in the persistent object allocator
format.
Link: https://lkml.kernel.org/r/20241024092602.13395-5-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
For the sufile, which is a metadata file that holds information about
managing segments, convert the page-based implementation to a folio-based
implementation.
kmap_local_page() is changed to use kmap_local_folio(), and where offsets
within a page are calculated using bh_offset(), are replaced with
calculations using offset_in_folio() with an additional helper function
nilfs_sufile_segment_usage_offset().
Link: https://lkml.kernel.org/r/20241024092602.13395-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In the common routines for metadata files, nilfs_mdt_insert_new_block(),
which inserts a new block buffer into the cache, is still page-based, and
there are two places where bh_offset() is used. Convert these to
page-based.
Link: https://lkml.kernel.org/r/20241024092602.13395-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "nilfs2: Finish folio conversion".
This series converts all remaining page structure references in nilfs2 to
folio-based, except for nilfs_copy_buffer function, which was converted to
use folios in advance for cross-fs page flags cleanup.
This prioritizes folio conversion, and does not include buffer head
reference reduction, nor does it support for block sizes larger than the
system page size.
The first eight patches in this series mainly convert each of the
nilfs2-specific metadata implementations to use folios. The last four
patches, by Matthew Wilcox, eliminate aops writepage callbacks and convert
the remaining page structure references to folio-based. This part
reflects some corrections to the patch series posted by Matthew.
This patch (of 12):
In the segment buffer (log buffer) implementation, two parts of the block
buffer, CRC calculation and bio preparation, are still page-based, so
convert them to folio-based.
Link: https://lkml.kernel.org/r/20241024092602.13395-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20241024092602.13395-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Replace the swp function pointer in the min_heap_callbacks of bcachefs
with NULL, allowing direct usage of the default builtin swap
implementation. This modification simplifies the code and improves
performance by removing unnecessary function indirection.
Link: https://lkml.kernel.org/r/20241020040200.939973-10-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Refactor the bcachefs code to remove multiple redundant declarations of
min_heap_callbacks, ensuring that each unique declaration appears only
once.
Link: https://lore.kernel.org/20241017095520.GV16066@noisy.programming.kicks-ass.net
Link: https://lkml.kernel.org/r/20241020040200.939973-9-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Enhance min heap API with non-inline functions and
optimizations", v2.
Add non-inline versions of the min heap API functions in lib/min_heap.c
and updates all users outside of kernel/events/core.c to use these
non-inline versions. To mitigate the performance impact of indirect
function calls caused by the non-inline versions of the swap and compare
functions, a builtin swap has been introduced that swaps elements based on
their size. Additionally, it micro-optimizes the efficiency of the min
heap by pre-scaling the counter, following the same approach as in
lib/sort.c. Documentation for the min heap API has also been added to the
core-api section.
This patch (of 10):
All current min heap API functions are marked with '__always_inline'.
However, as the number of users increases, inlining these functions
everywhere leads to a increase in kernel size.
In performance-critical paths, such as when perf events are enabled and
min heap functions are called on every context switch, it is important to
retain the inline versions for optimal performance. To balance this, the
original inline functions are kept, and additional non-inline versions of
the functions have been added in lib/min_heap.c.
Link: https://lkml.kernel.org/r/20241020040200.939973-1-visitorckw@gmail.com
Link: https://lore.kernel.org/20240522161048.8d8bbc7b153b4ecd92c50666@linux-foundation.org
Link: https://lkml.kernel.org/r/20241020040200.939973-2-visitorckw@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
Cc: Coly Li <colyli@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Kuan-Wei Chiu <visitorckw@gmail.com>
Cc: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Sakai <msakai@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "Improve the copy of task comm", v8.
Using {memcpy,strncpy,strcpy,kstrdup} to copy the task comm relies on the
length of task comm. Changes in the task comm could result in a
destination string that is overflow. Therefore, we should explicitly
ensure the destination string is always NUL-terminated, regardless of the
task comm. This approach will facilitate future extensions to the task
comm.
As suggested by Linus [0], we can identify all relevant code with the
following git grep command:
git grep 'memcpy.*->comm\>'
git grep 'kstrdup.*->comm\>'
git grep 'strncpy.*->comm\>'
git grep 'strcpy.*->comm\>'
PATCH #2~#4: memcpy
PATCH #5~#6: kstrdup
PATCH #7: strcpy
Please note that strncpy() is not included in this series as it is being
tracked by another effort. [1]
This patch (of 7):
We want to eliminate the use of __get_task_comm() for the following
reasons:
- The task_lock() is unnecessary
Quoted from Linus [0]:
: Since user space can randomly change their names anyway, using locking
: was always wrong for readers (for writers it probably does make sense
: to have some lock - although practically speaking nobody cares there
: either, but at least for a writer some kind of race could have
: long-term mixed results
Link: https://lkml.kernel.org/r/20241007144911.27693-1-laoar.shao@gmail.com
Link: https://lkml.kernel.org/r/20241007144911.27693-2-laoar.shao@gmail.com
Link: https://lore.kernel.org/all/CAHk-=wivfrF0_zvf+oj6==Sh=-npJooP8chLPEfaFV0oNYTTBA@mail.gmail.com [0]
Link: https://lore.kernel.org/all/CAHk-=whWtUC-AjmGJveAETKOMeMFSTwKwu99v7+b6AyHMmaDFA@mail.gmail.com/
Link: https://lore.kernel.org/all/CAHk-=wjAmmHUg6vho1KjzQi2=psR30+CogFd4aXrThr2gsiS4g@mail.gmail.com/ [0]
Link: https://github.com/KSPP/linux/issues/90 [1]
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Matus Jokay <matus.jokay@stuba.sk>
Cc: Alejandro Colomar <alx@kernel.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Justin Stitt <justinstitt@google.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@gmail.com>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Ondrej Mosnacek <omosnace@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Quentin Monnet <qmo@kernel.org>
Cc: Simon Horman <horms@kernel.org>
Cc: Stephen Smalley <stephen.smalley.work@gmail.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fix "Allcate" -> "Allocate"
Link: https://lkml.kernel.org/r/20240917185156.10580-1-pvmohammedanees2003@gmail.com
Signed-off-by: Mohammed Anees <pvmohammedanees2003@gmail.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The definition of ocfs2_global_read_dquot() has been removed since commit
fb8dd8d78014 ("ocfs2: Fix quota locking"). Let's remove the empty
declartion
Link: https://lkml.kernel.org/r/20240906055742.105024-1-zhangzekun11@huawei.com
Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
mm_access() can return NULL if the mm is not found, but this is handled
the same as an error in all callers, with some translating this into an
-ESRCH error.
Only proc_mem_open() returns NULL if no mm is found, however in this case
it is clearer and makes more sense to explicitly handle the error.
Additionally we take the opportunity to refactor the function to eliminate
unnecessary nesting.
Simplify things by simply returning -ESRCH if no mm is found - this both
eliminates confusing use of the IS_ERR_OR_NULL() macro, and simplifies
callers which would return -ESRCH by returning this error directly.
[lorenzo.stoakes@oracle.com: prefer neater pointer error comparison]
Link: https://lkml.kernel.org/r/2fae1834-749a-45e1-8594-5e5979cf7103@lucifer.local
Link: https://lkml.kernel.org/r/20240924201023.193135-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
atomic writes is currently only supported for single fsblock and only
for direct-io. We should not return -ENOTBLK for atomic writes since we
want the atomic write request to either complete fully or fail
otherwise. Hence, we should never fallback to buffered-io in case of
DIO atomic write requests.
Let's also catch if this ever happens by adding some WARN_ON_ONCE before
buffered-io handling for direct-io atomic writes. More details of the
discussion [1].
While at it let's add an inline helper ext4_want_directio_fallback() which
simplifies the logic checks and inherently fixes condition on when to return
-ENOTBLK which otherwise was always returning true for any write or directio in
ext4_iomap_end(). It was ok since ext4 only supports direct-io via iomap.
[1]: https://lore.kernel.org/linux-xfs/cover.1729825985.git.ritesh.list@gmail.com/T/#m9dbecc11bed713ed0d7a486432c56b105b555f04
Suggested-by: Darrick J. Wong <djwong@kernel.org> # inline helper
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
FS needs to add the fmode capability in order to support atomic writes
during file open (refer kiocb_set_rw_flags()). Set this capability on
a regular file if ext4 can do atomic write.
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
Let's validate the given constraints for atomic write request.
Otherwise it will fail with -EINVAL. Currently atomic write is only
supported on DIO, so for buffered-io it will return -EOPNOTSUPP.
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
This patch adds base support for atomic writes via statx getattr.
On bs < ps systems, we can create FS with say bs of 16k. That means
both atomic write min and max unit can be set to 16k for supporting
atomic writes.
Co-developed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
|
|
Check this with every kernel and userspace build, so we can drop the
nonsense in xfs/122. Roughly drafted with:
sed -e 's/^offsetof/\tXFS_CHECK_OFFSET/g' \
-e 's/^sizeof/\tXFS_CHECK_STRUCT_SIZE/g' \
-e 's/ = \([0-9]*\)/,\t\t\t\1);/g' \
-e 's/xfs_sb_t/struct xfs_dsb/g' \
-e 's/),/,/g' \
-e 's/xfs_\([a-z0-9_]*\)_t,/struct xfs_\1,/g' \
< tests/xfs/122.out | sort
and then manual fixups.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Create a separate section for space management btrees so that they're
not mixed in with file structures. Ignore the dsb stuff sprinkled
around for now, because we'll deal with that in a subsequent patch.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Replace xfs_foo_t with struct xfs_foo where appropriate. The next patch
will import more checks from xfs/122, and it's easier to automate
deduplication if we don't have to reason about typedefs.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Enable the metadata directory feature. With this feature, all metadata
inodes are placed in the metadata directory, and the only inumbers in
the superblock are the roots of the two directory trees.
The RT device is now sharded into a number of rtgroups, where 0 rtgroups
mean that no RT extents are supported, and the traditional XFS stub RT
bitmap and summary inodes don't exist. A single rtgroup gives roughly
identical behavior to the traditional RT setup, but now with checksummed
and self identifying free space metadata.
For quota, the quota options are read from the superblock unless
explicitly overridden via mount options.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Enable quotas for the realtime device.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
When metadir is enabled, we want to check the two new rtgroups fields,
and we don't want to check the old inumbers that are now in the metadir.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Fix xfs_quota_reserve_blkres to reserve rt block quota whenever we're
dealing with a realtime file.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|
|
Refactor the quota preallocation watermarking code so that it'll work
for realtime quota too. Convert the do_div calls into div_u64 for
compactness.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
|