summaryrefslogtreecommitdiff
path: root/fs/ext4
AgeCommit message (Collapse)Author
2021-12-23ext4: use ext4_journal_start/stop for fast commit transactionsHarshad Shirwadkar
This patch drops all calls to ext4_fc_start_update() and ext4_fc_stop_update(). To ensure that there are no ongoing journal updates during fast commit, we also make jbd2_fc_begin_commit() lock journal for updates. This way we don't have to maintain two different transaction start stop APIs for fast commit and full commit. This patch doesn't remove the functions altogether since in future we want to have inode level locking for fast commits. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211223202140.2061101-2-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23ext4: fix i_version handling on remountLukas Czerner
i_version mount option is getting lost on remount. This is because the 'i_version' mount option differs from the util-linux mount option 'iversion', but it has exactly the same functionality. We have to specifically notify the vfs that this is what we want by setting appropriate flag in fc->sb_flags. Fix it and as a result we can remove *flags argument from __ext4_remount(); do the same for __ext4_fill_super(). In addition set out to deprecate ext4 specific 'i_version' mount option in favor or 'iversion' by kernel version 5.20. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Fixes: cebe85d570cf ("ext4: switch to the new mount api") Link: https://lore.kernel.org/r/20211222104517.11187-2-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23ext4: remove lazytime/nolazytime mount options handled by MS_LAZYTIMELukas Czerner
The lazytime and nolazytime mount options were added temporarily back in 2015 with commit a26f49926da9 ("ext4: add optimization for the lazytime mount option"). It think it has been enough time for the util-linux with lazytime support to get widely used. Remove the mount options. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Link: https://lore.kernel.org/r/20211222104517.11187-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-23ext4: don't fail remount if journalling mode didn't changeLukas Czerner
Switching to the new mount api introduced inconsistency in how the journalling mode mount option (data=) is handled during a remount. Ext4 always prevented changing the journalling mode during the remount, however the new code always fails the remount when the journalling mode is specified, even if it remains unchanged. Fix it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Fixes: cebe85d570cf ("ext4: switch to the new mount api") Link: https://lore.kernel.org/r/20211220152657.101599-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Remove unused match_table_t tokensLukas Czerner
Remove unused match_table_t, slim down mount_opts structure by removing unnecessary definitions, remove redundant MOPT_ flags and clean up ext4_parse_param() by converting the most of the if/else branching to switch except for the MOPT_SET/MOPT_CEAR handling. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-14-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: switch to the new mount apiLukas Czerner
Add the necessary functions for the fs_context_operations. Convert and rename ext4_remount() and ext4_fill_super() to ext4_get_tree() and ext4_reconfigure() respectively and switch the ext4 to use the new api. One user facing change is the fact that we no longer have access to the entire string of mount options provided by mount(2) since the mount api does not store it anywhere. As a result we can't print the options to the log as we did in the past after the successful mount. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-13-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: change token2str() to use ext4_param_specsLukas Czerner
Change token2str() to use ext4_param_specs instead of tokens so that we can get rid of tokens entirely. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-12-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: clean up return values in handle_mount_opt()Lukas Czerner
Clean up return values in handle_mount_opt() and rename the function to ext4_parse_param() Now we can use it in fs_context_operations as .parse_param. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-11-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Completely separate options parsing and sb setupLukas Czerner
The new mount api separates option parsing and super block setup into two distinct steps and so we need to separate the options parsing out of the ext4_fill_super() and ext4_remount(). In order to achieve this we have to create new ext4_fill_super() and ext4_remount() functions which will serve its purpose only until we actually do convert to the new api (as such they are only temporary for this patch series) and move the option parsing out of the old function which will now be renamed to __ext4_fill_super() and __ext4_remount(). There is a small complication in the fact that while the mount option parsing is going to happen before we get to __ext4_fill_super(), the mount options stored in the super block itself needs to be applied first, before the user specified mount options. So with this patch we're going through the following sequence: - parse user provided options (including sb block) - initialize sbi and store s_sb_block if provided - in __ext4_fill_super() - read the super block - parse and apply options specified in s_mount_opts - check and apply user provided options stored in ctx - continue with the regular ext4_fill_super operation It's not exactly the most elegant solution, but if we still want to support s_mount_opts we have to do it in this order. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-10-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: get rid of super block and sbi from handle_mount_ops()Lukas Czerner
At the parsing phase of mount in the new mount api sb will not be available. We've already removed some uses of sb and sbi, but now we need to get rid of the rest of it. Use ext4_fs_context to store all of the configuration specification so that it can be later applied to the super block and sbi. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-9-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: check ext2/3 compatibility outside handle_mount_opt()Lukas Czerner
At the parsing phase of mount in the new mount api sb will not be available so move ext2/3 compatibility check outside handle_mount_opt(). Unfortunately we will lose the ability to show exactly which option is not compatible. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-8-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: move quota configuration out of handle_mount_opt()Lukas Czerner
At the parsing phase of mount in the new mount api sb will not be available so move quota confiquration out of handle_mount_opt() by noting the quota file names in the ext4_fs_context structure to be able to apply it later. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-7-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Allow sb to be NULL in ext4_msg()Lukas Czerner
At the parsing phase of mount in the new mount api sb will not be available so allow sb to be NULL in ext4_msg and use that in handle_mount_opt(). Also change return value to appropriate -EINVAL where needed. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-6-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Change handle_mount_opt() to use fs_parameterLukas Czerner
Use the new mount option specifications to parse the options in handle_mount_opt(). However we're still using the old API to get the options string. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-5-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: move option validation to a separate functionLukas Czerner
Move option validation out of parse_options() into a separate function ext4_validate_options(). Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-4-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-09ext4: Add fs parameter specifications for mount optionsLukas Czerner
Create an array of fs_parameter_spec called ext4_param_specs to hold the mount option specifications we're going to be using with the new mount api. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Link: https://lore.kernel.org/r/20211027141857.33657-3-lczerner@redhat.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-12-04fsdax: shift partition offset handling into the file systemsChristoph Hellwig
Remove the last user of ->bdev in dax.c by requiring the file system to pass in an address that already includes the DAX offset. As part of the only set ->bdev or ->daxdev when actually required in the ->iomap_begin methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-27-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: return the partition offset from fs_dax_get_by_bdevChristoph Hellwig
Prepare for the removal of the block_device from the DAX I/O path by returning the partition offset from fs_dax_get_by_bdev so that the file systems have it at hand for use during I/O. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-26-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04iomap: add a IOMAP_DAX flagChristoph Hellwig
Add a flag so that the file system can easily detect DAX operations based just on the iomap operation requested instead of looking at inode state using IS_DAX. This will be needed to apply the to be added partition offset only for operations that actually use DAX, but not things like fiemap that are based on the block device. In the long run it should also allow turning the bdev, dax_dev and inline_data into a union. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-25-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04ext4: cleanup the dax handling in ext4_fill_superChristoph Hellwig
Only call fs_dax_get_by_bdev once the sbi has been allocated and remove the need for the dax_dev local variable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-21-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04fsdax: decouple zeroing from the iomap buffered I/O codeChristoph Hellwig
Unshare the DAX and iomap buffered I/O page zeroing code. This code previously did a IS_DAX check deep inside the iomap code, which in fact was the only DAX check in the code. Instead move these checks into the callers. Most callers already have DAX special casing anyway and XFS will need it for reflink support as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-19-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-12-04dax: remove dax_capableChristoph Hellwig
Just open code the block size and dax_dev == NULL checks in the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> [erofs] Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://lore.kernel.org/r/20211129102203.2243509-9-hch@lst.de Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2021-11-10Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Only bug fixes and cleanups for ext4 this merge window. Of note are fixes for the combination of the inline_data and fast_commit fixes, and more accurately calculating when to schedule additional lazy inode table init, especially when CONFIG_HZ is 100HZ" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix error code saved on super block during file system abort ext4: inline data inode fast commit replay fixes ext4: commit inline data during fast commit ext4: scope ret locally in ext4_try_to_trim_range() ext4: remove an unused variable warning with CONFIG_QUOTA=n ext4: fix boolreturn.cocci warnings in fs/ext4/name.c ext4: prevent getting empty inode buffer ext4: move ext4_fill_raw_inode() related functions ext4: factor out ext4_fill_raw_inode() ext4: prevent partial update of the extent blocks ext4: check for inconsistent extents between index and leaf block ext4: check for out-of-order index extents in ext4_valid_extent_entries() ext4: convert from atomic_t to refcount_t on ext4_io_end->count ext4: refresh the ext4_ext_path struct after dropping i_data_sem. ext4: ensure enough credits in ext4_ext_shift_path_extents ext4: correct the left/middle/right debug message for binsearch ext4: fix lazy initialization next schedule time computation in more granular unit Revert "ext4: enforce buffer head state assertion in ext4_da_map_blocks"
2021-11-06Merge tag 'fsnotify_for_v5.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify updates from Jan Kara: "Support for reporting filesystem errors through fanotify so that system health monitoring daemons can watch for these and act instead of scraping system logs" * tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (34 commits) samples: remove duplicate include in fs-monitor.c samples: Fix warning in fsnotify sample docs: Fix formatting of literal sections in fanotify docs samples: Make fs-monitor depend on libc and headers docs: Document the FAN_FS_ERROR event samples: Add fs error monitoring example ext4: Send notifications on error fanotify: Allow users to request FAN_FS_ERROR events fanotify: Emit generic error info for error event fanotify: Report fid info for file related file system errors fanotify: WARN_ON against too large file handles fanotify: Add helpers to decide whether to report FID/DFID fanotify: Wrap object_fh inline space in a creator macro fanotify: Support merging of error events fanotify: Support enqueueing of error events fanotify: Pre-allocate pool of error events fanotify: Reserve UAPI bits for FAN_FS_ERROR fsnotify: Support FS_ERROR event type fanotify: Require fid_mode for any non-fd event fanotify: Encode empty file handle when no inode is provided ...
2021-11-04ext4: fix error code saved on super block during file system abortGabriel Krisman Bertazi
ext4_abort will eventually call ext4_errno_to_code, which translates the errno to an EXT4_ERR specific error. This means that ext4_abort expects an errno. By using EXT4_ERR_ here, it gets misinterpreted (as an errno), and ends up saving EXT4_ERR_EBUSY on the superblock during an abort, which makes no sense. ESHUTDOWN will get properly translated to EXT4_ERR_SHUTDOWN, so use that instead. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Link: https://lore.kernel.org/r/20211026173302.84000-1-krisman@collabora.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: inline data inode fast commit replay fixesHarshad Shirwadkar
Since there are no blocks in an inline data inode, there's no point in fixing iblocks field in fast commit replay path for this inode. Similarly, there's no point in fixing any block bitmaps / global block counters with respect to such an inode. Just bail out from these functions if an inline data inode is encountered. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211015182513.395917-2-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: commit inline data during fast commitHarshad Shirwadkar
During the commit phase in fast commits if an inode with inline data is being committed, also commit the inline data along with inode. Since recovery code just blindly copies entire content found in inode TLV, there is no change needed on the recovery path. Thus, this change is backward compatiable. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20211015182513.395917-1-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: scope ret locally in ext4_try_to_trim_range()Lukas Bulwahn
As commit 6920b3913235 ("ext4: add new helper interface ext4_try_to_trim_range()") moves some code into the separate function ext4_try_to_trim_range(), the use of the variable ret within that function is more limited and can be adjusted as well. Scope the use of the variable ret locally and drop dead assignments. No functional change. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20210820120853.23134-1-lukas.bulwahn@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: remove an unused variable warning with CONFIG_QUOTA=nAustin Kim
The 'enable_quota' variable is only used in an CONFIG_QUOTA. With CONFIG_QUOTA=n, compiler causes a harmless warning: fs/ext4/super.c: In function ‘ext4_remount’: fs/ext4/super.c:5840:6: warning: variable ‘enable_quota’ set but not used [-Wunused-but-set-variable] int enable_quota = 0; ^~~~~ Move 'enable_quota' into the same #ifdef CONFIG_QUOTA block to remove an unused variable warning. Signed-off-by: Austin Kim <austindh.kim@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210824034929.GA13415@raspberrypi Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: fix boolreturn.cocci warnings in fs/ext4/name.cJing Yangyang
Return statements in functions returning bool should use true/false instead of 1/0. ./fs/ext4/namei.c:1441:12-13:WARNING:return of 0/1 in function 'ext4_match' with return type bool Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Jing Yangyang <jing.yangyang@zte.com.cn> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210824055543.58718-1-deng.changcheng@zte.com.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: prevent getting empty inode bufferZhang Yi
In ext4_get_inode_loc(), we may skip IO and get an zero && uptodate inode buffer when the inode monopolize an inode block for performance reason. For most cases, ext4_mark_iloc_dirty() will fill the inode buffer to make it fine, but we could miss this call if something bad happened. Finally, __ext4_get_inode_loc_noinmem() may probably get an empty inode buffer and trigger ext4 error. For example, if we remove a nonexistent xattr on inode A, ext4_xattr_set_handle() will return ENODATA before invoking ext4_mark_iloc_dirty(), it will left an uptodate but zero buffer. We will get checksum error message in ext4_iget() when getting inode again. EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid Even worse, if we allocate another inode B at the same inode block, it will corrupt the inode A on disk when write back inode B. So this patch initialize the inode buffer by filling the in-mem inode contents if we skip read I/O, ensure that the buffer is really uptodate. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210901020955.1657340-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: move ext4_fill_raw_inode() related functionsZhang Yi
In preparation for calling ext4_fill_raw_inode() in __ext4_get_inode_loc(), move three related functions before __ext4_get_inode_loc(), no logical change. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210901020955.1657340-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: factor out ext4_fill_raw_inode()Zhang Yi
Factor out ext4_fill_raw_inode() from ext4_do_update_inode(), which is use to fill the in-mem inode contents into the inode table buffer, in preparation for initializing the exclusive inode buffer without reading the block in __ext4_get_inode_loc(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210901020955.1657340-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: prevent partial update of the extent blocksZhang Yi
In the most error path of current extents updating operations are not roll back partial updates properly when some bad things happens(.e.g in ext4_ext_insert_extent()). So we may get an inconsistent extents tree if journal has been aborted due to IO error, which may probability lead to BUGON later when we accessing these extent entries in errors=continue mode. This patch drop extent buffer's verify flag before updatng the contents in ext4_ext_get_access(), and reset it after updating in __ext4_ext_dirty(). After this patch we could force to check the extent buffer if extents tree updating was break off, make sure the extents are consistent. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20210908120850.4012324-4-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: check for inconsistent extents between index and leaf blockZhang Yi
Now that we can check out overlapping extents in leaf block and out-of-order index extents in index block. But the .ee_block in the first extent of one leaf block should equal to the .ei_block in it's parent index extent entry. This patch add a check to verify such inconsistent between the index and leaf block. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20210908120850.4012324-3-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: check for out-of-order index extents in ext4_valid_extent_entries()Zhang Yi
After commit 5946d089379a ("ext4: check for overlapping extents in ext4_valid_extent_entries()"), we can check out the overlapping extent entry in leaf extent blocks. But the out-of-order extent entry in index extent blocks could also trigger bad things if the filesystem is inconsistent. So this patch add a check to figure out the out-of-order index extents and return error. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20210908120850.4012324-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: convert from atomic_t to refcount_t on ext4_io_end->countXiyu Yang
refcount_t type and corresponding API can protect refcounters from accidental underflow and overflow and further use-after-free situations. Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/1626674355-55795-1-git-send-email-xiyuyang19@fudan.edu.cn Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: refresh the ext4_ext_path struct after dropping i_data_sem.yangerkun
After we drop i_data sem, we need to reload the ext4_ext_path structure since the extent tree can change once i_data_sem is released. This addresses the BUG: [52117.465187] ------------[ cut here ]------------ [52117.465686] kernel BUG at fs/ext4/extents.c:1756! ... [52117.478306] Call Trace: [52117.478565] ext4_ext_shift_extents+0x3ee/0x710 [52117.479020] ext4_fallocate+0x139c/0x1b40 [52117.479405] ? __do_sys_newfstat+0x6b/0x80 [52117.479805] vfs_fallocate+0x151/0x4b0 [52117.480177] ksys_fallocate+0x4a/0xa0 [52117.480533] __x64_sys_fallocate+0x22/0x30 [52117.480930] do_syscall_64+0x35/0x80 [52117.481277] entry_SYSCALL_64_after_hwframe+0x44/0xae [52117.481769] RIP: 0033:0x7fa062f855ca Cc: stable@kernel.org Link: https://lore.kernel.org/r/20210903062748.4118886-4-yangerkun@huawei.com Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: ensure enough credits in ext4_ext_shift_path_extentsyangerkun
Like ext4_ext_rm_leaf, we can ensure that there are enough credits before every call that will consume credits. As part of this fix we fold the functionality of ext4_access_path() into ext4_ext_shift_path_extents(). This change is needed as a preparation for the next bugfix patch. Cc: stable@kernel.org Link: https://lore.kernel.org/r/20210903062748.4118886-3-yangerkun@huawei.com Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: correct the left/middle/right debug message for binsearchyangerkun
The debuginfo for binsearch want to show the left/middle/right extent while the process search for the goal block. However we show this info after we change right or left. Link: https://lore.kernel.org/r/20210903062748.4118886-2-yangerkun@huawei.com Signed-off-by: yangerkun <yangerkun@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04ext4: fix lazy initialization next schedule time computation in more ↵Shaoying Xu
granular unit Ext4 file system has default lazy inode table initialization setup once it is mounted. However, it has issue on computing the next schedule time that makes the timeout same amount in jiffies but different real time in secs if with various HZ values. Therefore, fix by measuring the current time in a more granular unit nanoseconds and make the next schedule time independent of the HZ value. Fixes: bfff68738f1c ("ext4: add support for lazy inode table initialization") Signed-off-by: Shaoying Xu <shaoyi@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20210902164412.9994-2-shaoyi@amazon.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2021-11-04Revert "ext4: enforce buffer head state assertion in ext4_da_map_blocks"Eric Whitney
This reverts commit 948ca5f30e1df0c11eb5b0f410b9ceb97fa77ad9. Two crash reports from users running variations on 5.15-rc4 kernels suggest that it is premature to enforce the state assertion in the original commit. Both crashes were triggered by BUG calls in that code, indicating that under some rare circumstance the buffer head state did not match a delayed allocated block at the time the block was written out. No reproducer is available. Resolving this problem will require more time than remains in the current release cycle, so reverting the original patch for the time being is necessary to avoid any instability it may cause. Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20211012171901.5352-1-enwlinux@gmail.com Fixes: 948ca5f30e1d ("ext4: enforce buffer head state assertion in ext4_da_map_blocks") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2021-11-02Merge tag 'gfs2-v5.15-rc5-mmap-fault' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 mmap + page fault deadlocks fixes from Andreas Gruenbacher: "Functions gfs2_file_read_iter and gfs2_file_write_iter are both accessing the user buffer to write to or read from while holding the inode glock. In the most basic deadlock scenario, that buffer will not be resident and it will be mapped to the same file. Accessing the buffer will trigger a page fault, and gfs2 will deadlock trying to take the same inode glock again while trying to handle that fault. Fix that and similar, more complex scenarios by disabling page faults while accessing user buffers. To make this work, introduce a small amount of new infrastructure and fix some bugs that didn't trigger so far, with page faults enabled" * tag 'gfs2-v5.15-rc5-mmap-fault' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix mmap + page fault deadlocks for direct I/O iov_iter: Introduce nofault flag to disable page faults gup: Introduce FOLL_NOFAULT flag to disable page faults iomap: Add done_before argument to iomap_dio_rw iomap: Support partial direct I/O on user copy failures iomap: Fix iomap_dio_rw return value for user copies gfs2: Fix mmap + page fault deadlocks for buffered I/O gfs2: Eliminate ip->i_gh gfs2: Move the inode glock locking to gfs2_file_buffered_write gfs2: Introduce flag for glock holder auto-demotion gfs2: Clean up function may_grant gfs2: Add wrapper for iomap_file_buffered_write iov_iter: Introduce fault_in_iov_iter_writeable iov_iter: Turn iov_iter_fault_in_readable into fault_in_iov_iter_readable gup: Turn fault_in_pages_{readable,writeable} into fault_in_{readable,writeable} powerpc/kvm: Fix kvm_use_magic_page iov_iter: Fix iov_iter_get_pages{,_alloc} page fault return value
2021-11-01Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscryptLinus Torvalds
Pull fscrypt updates from Eric Biggers: "Some cleanups for fs/crypto/: - Allow 256-bit master keys with AES-256-XTS - Improve documentation and comments - Remove unneeded field fscrypt_operations::max_namelen" * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: improve a few comments fscrypt: allow 256-bit master keys with AES-256-XTS fscrypt: improve documentation for inline encryption fscrypt: clean up comments in bio.c fscrypt: remove fscrypt_operations::max_namelen
2021-10-27ext4: Send notifications on errorGabriel Krisman Bertazi
Send a FS_ERROR message via fsnotify to a userspace monitoring tool whenever a ext4 error condition is triggered. This follows the existing error conditions in ext4, so it is hooked to the ext4_error* functions. Link: https://lore.kernel.org/r/20211025192746.66445-30-krisman@collabora.com Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-24iomap: Add done_before argument to iomap_dio_rwAndreas Gruenbacher
Add a done_before argument to iomap_dio_rw that indicates how much of the request has already been transferred. When the request succeeds, we report that done_before additional bytes were tranferred. This is useful for finishing a request asynchronously when part of the request has already been completed synchronously. We'll use that to allow iomap_dio_rw to be used with page faults disabled: when a page fault occurs while submitting a request, we synchronously complete the part of the request that has already been submitted. The caller can then take care of the page fault and call iomap_dio_rw again for the rest of the request, passing in the number of bytes already tranferred. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
2021-10-18ext4: use sb_bdev_nr_blocksChristoph Hellwig
Use the sb_bdev_nr_blocks helper instead of open coding it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20211018101130.1838532-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-18block: switch polling to be bio basedChristoph Hellwig
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Mark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-11unicode: pass a UNICODE_AGE() tripple to utf8_loadChristoph Hellwig
Don't bother with pointless string parsing when the caller can just pass the version in the format that the core expects. Also remove the fallback to the latest version that none of the callers actually uses. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2021-10-11ext4: simplify ext4_sb_read_encodingChristoph Hellwig
Return the encoding table as the return value instead of as an argument, and don't bother with the encoding flags as the caller can handle that trivially. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Acked-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>