summaryrefslogtreecommitdiff
path: root/fs/quota
AgeCommit message (Collapse)Author
2024-11-26quota: flush quota_release_work upon quota writebackOjaswin Mujoo
One of the paths quota writeback is called from is: freeze_super() sync_filesystem() ext4_sync_fs() dquot_writeback_dquots() Since we currently don't always flush the quota_release_work queue in this path, we can end up with the following race: 1. dquot are added to releasing_dquots list during regular operations. 2. FS Freeze starts, however, this does not flush the quota_release_work queue. 3. Freeze completes. 4. Kernel eventually tries to flush the workqueue while FS is frozen which hits a WARN_ON since transaction gets started during frozen state: ext4_journal_check_start+0x28/0x110 [ext4] (unreliable) __ext4_journal_start_sb+0x64/0x1c0 [ext4] ext4_release_dquot+0x90/0x1d0 [ext4] quota_release_workfn+0x43c/0x4d0 Which is the following line: WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE); Which ultimately results in generic/390 failing due to dmesg noise. This was detected on powerpc machine 15 cores. To avoid this, make sure to flush the workqueue during dquot_writeback_dquots() so we dont have any pending workitems after freeze. Reported-by: Disha Goel <disgoel@linux.ibm.com> CC: stable@vger.kernel.org Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") Reviewed-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241121123855.645335-2-ojaswin@linux.ibm.com
2024-11-21Merge tag 'reiserfs_delete' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull reiserfs removal from Jan Kara: "The deprecation period of reiserfs is ending at the end of this year so it is time to remove it" * tag 'reiserfs_delete' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: The last commit
2024-11-21Merge tag 'for_v6.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and isofs updates from Jan Kara: "Fix a memory leak in isofs and a cleanup of includes in quota" * tag 'for_v6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: dquot.c: get rid of include ../internal.h isofs: avoid memory leak in iocharset
2024-11-13dquot.c: get rid of include ../internal.hAl Viro
Ugh, indeed - and not needed nearly a decade. It had been added for the sake of inode_sb_list_lock and that spinlock had become a per-superblock (->s_inode_list_lock) in March 2015... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241112213842.GC3387508@ZenIV
2024-11-03fdget_raw() users: switch to CLASS(fd_raw)Al Viro
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-10-21reiserfs: The last commitJan Kara
Deprecation period of reiserfs ends with the end of this year so it is time to remove it from the kernel. Acked-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>
2024-09-23Merge tag 'fs_for_v6.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and isofs updates from Jan Kara: "A few small cleanups in quota and isofs" * tag 'fs_for_v6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: isofs: Annotate struct SL_component with __counted_by() quota: remove unnecessary error code translation in dquot_quota_enable quota: remove redundant return at end of void function quota: remove unneeded return value of register_quota_format quota: avoid missing put_quota_format when DQUOT_SUSPENDED is passed
2024-08-12introduce fd_file(), convert all accessors to it.Al Viro
For any changes of struct fd representation we need to turn existing accesses to fields into calls of wrappers. Accesses to struct fd::flags are very few (3 in linux/file.h, 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in explicit initializers). Those can be dealt with in the commit converting to new layout; accesses to struct fd::file are too many for that. This commit converts (almost) all of f.file to fd_file(f). It's not entirely mechanical ('file' is used as a member name more than just in struct fd) and it does not even attempt to distinguish the uses in pointer context from those in boolean context; the latter will be eventually turned into a separate helper (fd_empty()). NOTE: mass conversion to fd_empty(), tempting as it might be, is a bad idea; better do that piecewise in commit that convert from fdget...() to CLASS(...). [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c caught by git; fs/stat.c one got caught by git grep] [fs/xattr.c conflict] Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-07-24sysctl: treewide: constify the ctl_table argument of proc_handlersJoel Granados
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *); @r4@ identifier func, ctl; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos); ``` * Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-22quota: remove unnecessary error code translation in dquot_quota_enableKemeng Shi
Simply set error code to -EEXIST when quota limit is already enabled in dquot_quota_enable to remove unnecessary error code translation. Link: https://patch.msgid.link/20240715130534.2112678-5-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jan Kara <jack@suse.cz>
2024-07-22quota: remove redundant return at end of void functionKemeng Shi
Function dquot_claim_space_nodirty nad dquot_reclaim_space_nodirty have no return value, just remove redundant return at end of the functions. Link: https://patch.msgid.link/20240715130534.2112678-4-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Signed-off-by: Jan Kara <jack@suse.cz>
2024-07-22quota: remove unneeded return value of register_quota_formatKemeng Shi
The register_quota_format always returns 0, simply remove unneeded return value. Link: https://patch.msgid.link/20240715130534.2112678-3-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Jan Kara <jack@suse.cz>
2024-07-22quota: avoid missing put_quota_format when DQUOT_SUSPENDED is passedKemeng Shi
Avoid missing put_quota_format when DQUOT_SUSPENDED is passed to dquot_load_quota_sb. Link: https://patch.msgid.link/20240715130534.2112678-2-shikemeng@huaweicloud.com Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Fixes: d44c57663723 ("quota: Remove BUG_ON in dquot_load_quota_sb()") Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Jan Kara <jack@suse.cz>
2024-06-03vfs: replace WARN(down_read_trylock, ...) abuse with proper assertsMateusz Guzik
Note the macro used here works regardless of LOCKDEP. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/r/20240602123720.775702-1-mjguzik@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-12quota: fix to propagate error of mark_dquot_dirty() to callerChao Yu
in order to let caller be aware of failure of mark_dquot_dirty(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240412094942.2131243-1-chao@kernel.org>
2024-03-25fs: quota: use group allocation of per-cpu counters APIKefeng Wang
Use group allocation of per-cpu counters api to accelerate dquot_init() and simplify code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240325041240.53537-1-wangkefeng.wang@huawei.com>
2024-03-05quota: remove SLAB_MEM_SPREAD flag usageChengming Zhou
The SLAB_MEM_SPREAD flag is already a no-op after removal of SLAB allocator and in [1] it was fully deprecated. Remove its usage so we can delete it from slab. No functional change. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/all/20240223-slab-cleanup-flags-v2-1-02f1753e8303@suse.cz/ Message-Id: <20240224135118.830073-1-chengming.zhou@linux.dev>
2024-02-12quota: Detect loops in quota treeJan Kara
Syzbot has found that when it creates corrupted quota files where the quota tree contains a loop, we will deadlock when tryling to insert a dquot. Add loop detection into functions traversing the quota tree. Signed-off-by: Jan Kara <jack@suse.cz>
2024-02-08quota: Properly annotate i_dquot arrays with __rcuJan Kara
Dquots pointed to from i_dquot arrays in inodes are protected by dquot_srcu. Annotate them as such and change .get_dquots callback to return properly annotated pointer to make sparse happy. Fixes: b9ba6f94b238 ("quota: remove dqptr_sem") Signed-off-by: Jan Kara <jack@suse.cz>
2024-02-08quota: Fix rcu annotations of inode dquot pointersJan Kara
Dquot pointers in i_dquot array in the inode are protected by dquot_srcu. Annotate the array pointers with __rcu, perform the locked dereferences with srcu_dereference_check() instead of plain reads, and set the array elements with rcu_assign_pointer(). Fixes: b9ba6f94b238 ("quota: remove dqptr_sem") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402061900.rTuYDlo6-lkp@intel.com/ Signed-off-by: Jan Kara <jack@suse.cz>
2024-02-05quota: Fix potential NULL pointer dereferenceWang Jianjian
Below race may cause NULL pointer dereference P1 P2 dquot_free_inode quota_off drop_dquot_ref remove_dquot_ref dquots = i_dquot(inode) dquots = i_dquot(inode) srcu_read_lock dquots[cnt]) != NULL (1) dquots[type] = NULL (2) spin_lock(&dquots[cnt]->dq_dqb_lock) (3) .... If dquot_free_inode(or other routines) checks inode's quota pointers (1) before quota_off sets it to NULL(2) and use it (3) after that, NULL pointer dereference will be triggered. So let's fix it by using a temporary pointer to avoid this issue. Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20240202081852.2514092-1-wangjianjian3@huawei.com>
2024-01-23quota: Drop GFP_NOFS instances under dquot->dq_lock and dqio_semJan Kara
Quota code acquires dquot->dq_lock whenever reading / writing dquot. When reading / writing quota info we hold dqio_sem. Since these locks can be acquired during inode reclaim (through dquot_drop() -> dqput() -> dquot_release()) we are setting nofs allocation context whenever acquiring these locks. Hence there's no need to use GFP_NOFS allocations in quota code doing IO. Just switch it to GFP_KERNEL. Signed-off-by: Jan Kara <jack@suse.cz>
2024-01-23quota: Set nofs allocation context when acquiring dqio_semJan Kara
dqio_sem can be acquired during inode reclaim through dquot_drop() -> dqput() -> dquot_release() -> write_file_info() context. In some cases (most notably through dquot_get_next_id()) it can be acquired without holding dquot->dq_lock (which already sets nofs allocation context). So we need to set nofs allocation context when acquiring it as well. Signed-off-by: Jan Kara <jack@suse.cz>
2024-01-22quota: Remove BUG_ON from dqget()Jan Kara
dqget() checks whether dquot->dq_sb is set when returning it using BUG_ON. Firstly this doesn't work as an invalidation check for quite some time (we release dquot with dq_sb set these days), secondly using BUG_ON is quite harsh. Use WARN_ON_ONCE and check whether dquot is still hashed instead. Signed-off-by: Jan Kara <jack@suse.cz>
2024-01-22quota: Remove BUG_ON in dquot_load_quota_sb()Jan Kara
The case when someone passes DQUOT_SUSPENDED flag to dquot_load_quota_sb() is easy to handle. So just WARN_ON_ONCE and bail with error if that happens instead of calling BUG_ON which is likely to lockup the machine. Signed-off-by: Jan Kara <jack@suse.cz>
2024-01-22quota: Replace BUG_ON in dqput()Jan Kara
The BUG_ON in dqput() will likely take the whole machine down when it hits. Replace it with WARN_ON_ONCE() instead and stop hiding that behind CONFIG_QUOTA_DEBUG. Signed-off-by: Jan Kara <jack@suse.cz>
2024-01-10Merge tag 'sysctl-6.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "To help make the move of sysctls out of kernel/sysctl.c not incur a size penalty sysctl has been changed to allow us to not require the sentinel, the final empty element on the sysctl array. Joel Granados has been doing all this work. In the v6.6 kernel we got the major infrastructure changes required to support this. For v6.7 we had all arch/ and drivers/ modified to remove the sentinel. For v6.8-rc1 we get a few more updates for fs/ directory only. The kernel/ directory is left but we'll save that for v6.9-rc1 as those patches are still being reviewed. After that we then can expect also the removal of the no longer needed check for procname == NULL. Let us recap the purpose of this work: - this helps reduce the overall build time size of the kernel and run time memory consumed by the kernel by about ~64 bytes per array - the extra 64-byte penalty is no longer inncurred now when we move sysctls out from kernel/sysctl.c to their own files Thomas Weißschuh also sent a few cleanups, for v6.9-rc1 we expect to see further work by Thomas Weißschuh with the constificatin of the struct ctl_table. Due to Joel Granados's work, and to help bring in new blood, I have suggested for him to become a maintainer and he's accepted. So for v6.9-rc1 I look forward to seeing him sent you a pull request for further sysctl changes. This also removes Iurii Zaikin as a maintainer as he has moved on to other projects and has had no time to help at all" * tag 'sysctl-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: sysctl: remove struct ctl_path sysctl: delete unused define SYSCTL_PERM_EMPTY_DIR coda: Remove the now superfluous sentinel elements from ctl_table array sysctl: Remove the now superfluous sentinel elements from ctl_table array fs: Remove the now superfluous sentinel elements from ctl_table array cachefiles: Remove the now superfluous sentinel element from ctl_table array sysclt: Clarify the results of selftest run sysctl: Add a selftest for handling empty dirs sysctl: Fix out of bounds access for empty sysctl registers MAINTAINERS: Add Joel Granados as co-maintainer for proc sysctl MAINTAINERS: remove Iurii Zaikin from proc sysctl
2023-12-28fs: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel elements ctl_table struct. Special attention was placed in making sure that an empty directory for fs/verity was created when CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the register sysctl call that expects a size. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-12-11quota: convert dquot_claim_space_nodirty() to return voidChao Yu
dquot_claim_space_nodirty() always return zero, let's convert it to return void, then, its caller can get rid of handling failure case. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20231210025028.3262900-1-chao@kernel.org>
2023-11-02Merge tag 'mm-stable-2023-11-01-14-33' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Many singleton patches against the MM code. The patch series which are included in this merge do the following: - Kemeng Shi has contributed some compation maintenance work in the series 'Fixes and cleanups to compaction' - Joel Fernandes has a patchset ('Optimize mremap during mutual alignment within PMD') which fixes an obscure issue with mremap()'s pagetable handling during a subsequent exec(), based upon an implementation which Linus suggested - More DAMON/DAMOS maintenance and feature work from SeongJae Park i the following patch series: mm/damon: misc fixups for documents, comments and its tracepoint mm/damon: add a tracepoint for damos apply target regions mm/damon: provide pseudo-moving sum based access rate mm/damon: implement DAMOS apply intervals mm/damon/core-test: Fix memory leaks in core-test mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval - In the series 'Do not try to access unaccepted memory' Adrian Hunter provides some fixups for the recently-added 'unaccepted memory' feature. To increase the feature's checking coverage. 'Plug a few gaps where RAM is exposed without checking if it is unaccepted memory' - In the series 'cleanups for lockless slab shrink' Qi Zheng has done some maintenance work which is preparation for the lockless slab shrinking code - Qi Zheng has redone the earlier (and reverted) attempt to make slab shrinking lockless in the series 'use refcount+RCU method to implement lockless slab shrink' - David Hildenbrand contributes some maintenance work for the rmap code in the series 'Anon rmap cleanups' - Kefeng Wang does more folio conversions and some maintenance work in the migration code. Series 'mm: migrate: more folio conversion and unification' - Matthew Wilcox has fixed an issue in the buffer_head code which was causing long stalls under some heavy memory/IO loads. Some cleanups were added on the way. Series 'Add and use bdev_getblk()' - In the series 'Use nth_page() in place of direct struct page manipulation' Zi Yan has fixed a potential issue with the direct manipulation of hugetlb page frames - In the series 'mm: hugetlb: Skip initialization of gigantic tail struct pages if freed by HVO' has improved our handling of gigantic pages in the hugetlb vmmemmep optimizaton code. This provides significant boot time improvements when significant amounts of gigantic pages are in use - Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code rationalization and folio conversions in the hugetlb code - Yin Fengwei has improved mlock()'s handling of large folios in the series 'support large folio for mlock' - In the series 'Expose swapcache stat for memcg v1' Liu Shixin has added statistics for memcg v1 users which are available (and useful) under memcg v2 - Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable) prctl so that userspace may direct the kernel to not automatically propagate the denial to child processes. The series is named 'MDWE without inheritance' - Kefeng Wang has provided the series 'mm: convert numa balancing functions to use a folio' which does what it says - In the series 'mm/ksm: add fork-exec support for prctl' Stefan Roesch makes is possible for a process to propagate KSM treatment across exec() - Huang Ying has enhanced memory tiering's calculation of memory distances. This is used to permit the dax/kmem driver to use 'high bandwidth memory' in addition to Optane Data Center Persistent Memory Modules (DCPMM). The series is named 'memory tiering: calculate abstract distance based on ACPI HMAT' - In the series 'Smart scanning mode for KSM' Stefan Roesch has optimized KSM by teaching it to retain and use some historical information from previous scans - Yosry Ahmed has fixed some inconsistencies in memcg statistics in the series 'mm: memcg: fix tracking of pending stats updates values' - In the series 'Implement IOCTL to get and optionally clear info about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits us to atomically read-then-clear page softdirty state. This is mainly used by CRIU - Hugh Dickins contributed the series 'shmem,tmpfs: general maintenance', a bunch of relatively minor maintenance tweaks to this code - Matthew Wilcox has increased the use of the VMA lock over file-backed page faults in the series 'Handle more faults under the VMA lock'. Some rationalizations of the fault path became possible as a result - In the series 'mm/rmap: convert page_move_anon_rmap() to folio_move_anon_rmap()' David Hildenbrand has implemented some cleanups and folio conversions - In the series 'various improvements to the GUP interface' Lorenzo Stoakes has simplified and improved the GUP interface with an eye to providing groundwork for future improvements - Andrey Konovalov has sent along the series 'kasan: assorted fixes and improvements' which does those things - Some page allocator maintenance work from Kemeng Shi in the series 'Two minor cleanups to break_down_buddy_pages' - In thes series 'New selftest for mm' Breno Leitao has developed another MM self test which tickles a race we had between madvise() and page faults - In the series 'Add folio_end_read' Matthew Wilcox provides cleanups and an optimization to the core pagecache code - Nhat Pham has added memcg accounting for hugetlb memory in the series 'hugetlb memcg accounting' - Cleanups and rationalizations to the pagemap code from Lorenzo Stoakes, in the series 'Abstract vma_merge() and split_vma()' - Audra Mitchell has fixed issues in the procfs page_owner code's new timestamping feature which was causing some misbehaviours. In the series 'Fix page_owner's use of free timestamps' - Lorenzo Stoakes has fixed the handling of new mappings of sealed files in the series 'permit write-sealed memfd read-only shared mappings' - Mike Kravetz has optimized the hugetlb vmemmap optimization in the series 'Batch hugetlb vmemmap modification operations' - Some buffer_head folio conversions and cleanups from Matthew Wilcox in the series 'Finish the create_empty_buffers() transition' - As a page allocator performance optimization Huang Ying has added automatic tuning to the allocator's per-cpu-pages feature, in the series 'mm: PCP high auto-tuning' - Roman Gushchin has contributed the patchset 'mm: improve performance of accounted kernel memory allocations' which improves their performance by ~30% as measured by a micro-benchmark - folio conversions from Kefeng Wang in the series 'mm: convert page cpupid functions to folios' - Some kmemleak fixups in Liu Shixin's series 'Some bugfix about kmemleak' - Qi Zheng has improved our handling of memoryless nodes by keeping them off the allocation fallback list. This is done in the series 'handle memoryless nodes more appropriately' - khugepaged conversions from Vishal Moola in the series 'Some khugepaged folio conversions'" [ bcachefs conflicts with the dynamically allocated shrinkers have been resolved as per Stephen Rothwell in https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/ with help from Qi Zheng. The clone3 test filtering conflict was half-arsed by yours truly ] * tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits) mm/damon/sysfs: update monitoring target regions for online input commit mm/damon/sysfs: remove requested targets when online-commit inputs selftests: add a sanity check for zswap Documentation: maple_tree: fix word spelling error mm/vmalloc: fix the unchecked dereference warning in vread_iter() zswap: export compression failure stats Documentation: ubsan: drop "the" from article title mempolicy: migration attempt to match interleave nodes mempolicy: mmap_lock is not needed while migrating folios mempolicy: alloc_pages_mpol() for NUMA policy without vma mm: add page_rmappable_folio() wrapper mempolicy: remove confusing MPOL_MF_LAZY dead code mempolicy: mpol_shared_policy_init() without pseudo-vma mempolicy trivia: use pgoff_t in shared mempolicy tree mempolicy trivia: slightly more consistent naming mempolicy trivia: delete those ancient pr_debug()s mempolicy: fix migrate_pages(2) syscall return nr_failed kernfs: drop shared NUMA mempolicy hooks hugetlbfs: drop shared NUMA mempolicy pretence mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets() ...
2023-11-02Merge tag 'fs_for_v6.7-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, udf, and quota updates from Jan Kara: - conversion of ext2 directory code to use folios - cleanups in UDF declarations - bugfix for quota interaction with file encryption * tag 'fs_for_v6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Convert ext2_prepare_chunk and ext2_commit_chunk to folios ext2: Convert ext2_make_empty() to use a folio ext2: Convert ext2_unlink() and ext2_rename() to use folios ext2: Convert ext2_delete_entry() to use folios ext2: Convert ext2_empty_dir() to use a folio ext2: Convert ext2_add_link() to use a folio ext2: Convert ext2_readdir to use a folio ext2: Add ext2_get_folio() ext2: Convert ext2_check_page to ext2_check_folio highmem: Add folio_release_kmap() udf: Avoid unneeded variable length array in struct fileIdentDesc udf: Annotate struct udf_bitmap with __counted_by quota: explicitly forbid quota files from being encrypted
2023-10-06quota: Fix slow quotaoffJan Kara
Eric has reported that commit dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") heavily increases runtime of generic/270 xfstest for ext4 in nojournal mode. The reason for this is that ext4 in nojournal mode leaves dquots dirty until the last dqput() and thus the cleanup done in quota_release_workfn() has to write them all. Due to the way quota_release_workfn() is written this results in synchronize_srcu() call for each dirty dquot which makes the dquot cleanup when turning quotas off extremely slow. To be able to avoid synchronize_srcu() for each dirty dquot we need to rework how we track dquots to be cleaned up. Instead of keeping the last dquot reference while it is on releasing_dquots list, we drop it right away and mark the dquot with new DQ_RELEASING_B bit instead. This way we can we can remove dquot from releasing_dquots list when new reference to it is acquired and thus there's no need to call synchronize_srcu() each time we drop dq_list_lock. References: https://lore.kernel.org/all/ZRytn6CxFK2oECUt@debian-BULLSEYE-live-builder-AMD64 Reported-by: Eric Whitney <enwlinux@gmail.com> Fixes: dabc8b207566 ("quota: fix dqput() to follow the guarantees dquot_srcu should provide") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2023-10-04quota: dynamically allocate the dquota-cache shrinkerQi Zheng
Use new APIs to dynamically allocate the dquota-cache shrinker. Link: https://lkml.kernel.org/r/20230911094444.68966-14-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Jan Kara <jack@suse.cz> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-09-05quota: explicitly forbid quota files from being encryptedEric Biggers
Since commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key"), xfstest generic/270 causes a WARNING when run on f2fs with test_dummy_encryption in the mount options: $ kvm-xfstests -c f2fs/encrypt generic/270 [...] WARNING: CPU: 1 PID: 2453 at fs/crypto/keyring.c:240 fscrypt_destroy_keyring+0x1f5/0x260 The cause of the WARNING is that not all encrypted inodes have been evicted before fscrypt_destroy_keyring() is called, which violates an assumption. This happens because the test uses an external quota file, which gets automatically encrypted due to test_dummy_encryption. Encryption of quota files has never really been supported. On ext4, ext4_quota_read() does not decrypt the data, so encrypted quota files are always considered invalid on ext4. On f2fs, f2fs_quota_read() uses the pagecache, so trying to use an encrypted quota file gets farther, resulting in the issue described above being possible. But this was never intended to be possible, and there is no use case for it. Therefore, make the quota support layer explicitly reject using IS_ENCRYPTED inodes when quotaon is attempted. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230905003227.326998-1-ebiggers@kernel.org>
2023-08-30Merge tag 'for_v6.6-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, quota, and udf updates from Jan Kara: - fixes for possible use-after-free issues with quota when racing with chown - fixes for ext2 crashing when xattr allocation races with another block allocation to the same file from page writeback code - fix for block number overflow in ext2 - marking of reiserfs as obsolete in MAINTAINERS - assorted minor cleanups * tag 'for_v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: Fix kernel-doc warnings ext2: improve consistency of ext2_fsblk_t datatype usage ext2: dump current reservation window info ext2: fix race between setxattr and write back ext2: introduce new flags argument for ext2_new_blocks() ext2: remove ext2_new_block() ext2: fix datatype of block number in ext2_xattr_set2() udf: Drop pointless aops assignment quota: use lockdep_assert_held_write in dquot_load_quota_sb MAINTAINERS: change reiserfs status to obsolete udf: Fix -Wstringop-overflow warnings quota: simplify drop_dquot_ref() quota: fix dqput() to follow the guarantees dquot_srcu should provide quota: add new helper dquot_active() quota: rename dquot_active() to inode_quota_active() quota: factor out dquot_write_dquot() ext2: remove redundant assignment to variable desc and variable best_desc
2023-08-09quota: Check presence of quota operation structures instead of ->quota_read ↵Jan Kara
and ->quota_write callbacks Currently we check whether superblock has ->quota_read and ->quota_write operations to check whether filesystem supports quotas. However for example for shmfs we will not read or write dquots so check whether quota operations are set in the superblock instead. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Message-Id: <20230725144510.253763-4-cem@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-08-03quota: use lockdep_assert_held_write in dquot_load_quota_sbChristoph Hellwig
Use lockdep_assert_held_write to assert and self-document the locking state in dquot_load_quota_sb instead of hand-crafting it with a trylock. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230802115439.2145212-2-hch@lst.de>
2023-07-03quota: simplify drop_dquot_ref()Baokun Li
As Honza said, remove_inode_dquot_ref() currently does not release the last dquot reference but instead adds the dquot to tofree_head list. This is because dqput() can sleep while dropping of the last dquot reference (writing back the dquot and calling ->release_dquot()) and that must not happen under dq_list_lock. Now that dqput() queues the final dquot cleanup into a workqueue, remove_inode_dquot_ref() can call dqput() unconditionally and we can significantly simplify it. Here we open code the simplified code of remove_inode_dquot_ref() into remove_dquot_ref() and remove the function put_dquot_list() which is no longer used. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-6-libaokun1@huawei.com>
2023-07-03quota: fix dqput() to follow the guarantees dquot_srcu should provideBaokun Li
The dquot_mark_dquot_dirty() using dquot references from the inode should be protected by dquot_srcu. quota_off code takes care to call synchronize_srcu(&dquot_srcu) to not drop dquot references while they are used by other users. But dquot_transfer() breaks this assumption. We call dquot_transfer() to drop the last reference of dquot and add it to free_dquots, but there may still be other users using the dquot at this time, as shown in the function graph below: cpu1 cpu2 _________________|_________________ wb_do_writeback CHOWN(1) ... ext4_da_update_reserve_space dquot_claim_block ... dquot_mark_dquot_dirty // try to dirty old quota test_bit(DQ_ACTIVE_B, &dquot->dq_flags) // still ACTIVE if (test_bit(DQ_MOD_B, &dquot->dq_flags)) // test no dirty, wait dq_list_lock ... dquot_transfer __dquot_transfer dqput_all(transfer_from) // rls old dquot dqput // last dqput dquot_release clear_bit(DQ_ACTIVE_B, &dquot->dq_flags) atomic_dec(&dquot->dq_count) put_dquot_last(dquot) list_add_tail(&dquot->dq_free, &free_dquots) // add the dquot to free_dquots if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags)) add dqi_dirty_list // add released dquot to dirty_list This can cause various issues, such as dquot being destroyed by dqcache_shrink_scan() after being added to free_dquots, which can trigger a UAF in dquot_mark_dquot_dirty(); or after dquot is added to free_dquots and then to dirty_list, it is added to free_dquots again after dquot_writeback_dquots() is executed, which causes the free_dquots list to be corrupted and triggers a UAF when dqcache_shrink_scan() is called for freeing dquot twice. As Honza said, we need to fix dquot_transfer() to follow the guarantees dquot_srcu should provide. But calling synchronize_srcu() directly from dquot_transfer() is too expensive (and mostly unnecessary). So we add dquot whose last reference should be dropped to the new global dquot list releasing_dquots, and then queue work item which would call synchronize_srcu() and after that perform the final cleanup of all the dquots on releasing_dquots. Fixes: 4580b30ea887 ("quota: Do not dirty bad dquots") Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-5-libaokun1@huawei.com>
2023-07-03quota: add new helper dquot_active()Baokun Li
Add new helper function dquot_active() to make the code more concise. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-4-libaokun1@huawei.com>
2023-07-03quota: rename dquot_active() to inode_quota_active()Baokun Li
Now we have a helper function dquot_dirty() to determine if dquot has DQ_MOD_B bit. dquot_active() can easily be misunderstood as a helper function to determine if dquot has DQ_ACTIVE_B bit. So we avoid this by renaming it to inode_quota_active() and later on we will add the helper function dquot_active() to determine if dquot has DQ_ACTIVE_B bit. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-3-libaokun1@huawei.com>
2023-07-03quota: factor out dquot_write_dquot()Baokun Li
Refactor out dquot_write_dquot() to reduce duplicate code. Signed-off-by: Baokun Li <libaokun1@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230630110822.3881712-2-libaokun1@huawei.com>
2023-06-05quota: fix warning in dqgrab()Ye Bin
There's issue as follows when do fault injection: WARNING: CPU: 1 PID: 14870 at include/linux/quotaops.h:51 dquot_disable+0x13b7/0x18c0 Modules linked in: CPU: 1 PID: 14870 Comm: fsconfig Not tainted 6.3.0-next-20230505-00006-g5107a9c821af-dirty #541 RIP: 0010:dquot_disable+0x13b7/0x18c0 RSP: 0018:ffffc9000acc79e0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88825e41b980 RDX: 0000000000000000 RSI: ffff88825e41b980 RDI: 0000000000000002 RBP: ffff888179f68000 R08: ffffffff82087ca7 R09: 0000000000000000 R10: 0000000000000001 R11: ffffed102f3ed026 R12: ffff888179f68130 R13: ffff888179f68110 R14: dffffc0000000000 R15: ffff888179f68118 FS: 00007f450a073740(0000) GS:ffff88882fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe96f2efd8 CR3: 000000025c8ad000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> dquot_load_quota_sb+0xd53/0x1060 dquot_resume+0x172/0x230 ext4_reconfigure+0x1dc6/0x27b0 reconfigure_super+0x515/0xa90 __x64_sys_fsconfig+0xb19/0xd20 do_syscall_64+0x39/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd Above issue may happens as follows: ProcessA ProcessB ProcessC sys_fsconfig vfs_fsconfig_locked reconfigure_super ext4_remount dquot_suspend -> suspend all type quota sys_fsconfig vfs_fsconfig_locked reconfigure_super ext4_remount dquot_resume ret = dquot_load_quota_sb add_dquot_ref do_open -> open file O_RDWR vfs_open do_dentry_open get_write_access atomic_inc_unless_negative(&inode->i_writecount) ext4_file_open dquot_file_open dquot_initialize __dquot_initialize dqget atomic_inc(&dquot->dq_count); __dquot_initialize __dquot_initialize dqget if (!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) ext4_acquire_dquot -> Return error DQ_ACTIVE_B flag isn't set dquot_disable invalidate_dquots if (atomic_read(&dquot->dq_count)) dqgrab WARN_ON_ONCE(!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) -> Trigger warning In the above scenario, 'dquot->dq_flags' has no DQ_ACTIVE_B is normal when dqgrab(). To solve above issue just replace the dqgrab() use in invalidate_dquots() with atomic_inc(&dquot->dq_count). Signed-off-by: Ye Bin <yebin10@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230605140731.2427629-3-yebin10@huawei.com>
2023-06-05quota: Properly disable quotas when add_dquot_ref() failsJan Kara
When add_dquot_ref() fails (usually due to IO error or ENOMEM), we want to disable quotas we are trying to enable. However dquot_disable() call was passed just the flags we are enabling so in case flags == DQUOT_USAGE_ENABLED dquot_disable() call will just fail with EINVAL instead of properly disabling quotas. Fix the problem by always passing DQUOT_LIMITS_ENABLED | DQUOT_USAGE_ENABLED to dquot_disable() in this case. Reported-and-tested-by: Ye Bin <yebin10@huawei.com> Reported-by: syzbot+e633c79ceaecbf479854@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230605140731.2427629-2-yebin10@huawei.com>
2023-05-30fs: Drop wait_unfrozen wait queueJan Kara
wait_unfrozen waitqueue is used only in quota code to wait for filesystem to become unfrozen. In that place we can just use sb_start_write() - sb_end_write() pair to achieve the same. So just remove the waitqueue. Reviewed-by: Christian Brauner <brauner@kernel.org> Message-Id: <20230525141710.7595-1-jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
2023-04-14quota: mark PRINT_QUOTA_WARNING as BROKENYangtao Li
PRINT_QUOTA_WARNING is deprecated since commit 8e8934695dfd ("quota: send messages via netlink") merged in 2007. Users should rather be using notification over netlink socket if they are interested about explicit notification in addition to plain EDQUOT error. Since printing to console from deep inside filesystem code is problematic, mark the feature as BROKEN now and see who complains. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230413163833.43913-1-frank.li@vivo.com>
2023-04-13quota: update Kconfig commentYangtao Li
f2fs support quota since commit 0abd675e97e6 ("f2fs: support plain user/group quota"), let's document it. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230413151412.30059-1-frank.li@vivo.com>
2023-03-29quota: Use register_sysctl_init() for registering fs_dqstats_tableJan Kara
register_sysctl_init() also prints information that may be useful for further debugging when we fail to register sysctl table. Use it when registering fs_dqstats_table. Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz>
2023-03-20quota: make dquot_set_dqinfo return errors from ->write_infoYangtao Li
dquot_set_dqinfo() ignores the return code from the ->write_info call, which means that quotacalls like Q_SETINFO never see the error. This doesn't seem right, so fix that. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230227120216.31306-2-frank.li@vivo.com>
2023-03-20quota: fixup *_write_file_info() to return proper error codeYangtao Li
For v1_write_file_info function, when quota_write() returns 0, it should be considered an EIO error. And for v2_write_file_info(), fix to proper error return code instead of raw number. Signed-off-by: Yangtao Li <frank.li@vivo.com> Signed-off-by: Jan Kara <jack@suse.cz> Message-Id: <20230227120216.31306-1-frank.li@vivo.com>