path: root/fs/jbd2
Age | Commit message | Author
2024-11-13 | jbd2: Fix comment describing journal_init_common() | Daniel Martín Gómez
The code indicates that journal_init_common() fills the journal_t object it returns while the comment incorrectly states that only a few fields are initialised. Also, the comment claims that journal structures could be created from scratch which isn't possible as journal_init_common() calls journal_load_superblock() which loads and checks journal superblock from disk. Signed-off-by: Daniel Martín Gómez <dalme@riseup.net> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241107144538.3544-1-dalme@riseup.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12 | jbd2: make b_frozen_data allocation always succeed | Zhihao Cheng
The b_frozen_data allocation must not fail during the journal commit process, otherwise jbd2 will abort. Since commit 490c1b444ce653d ("jbd2: do not fail journal because of frozen_buffer allocation failure") already added the '__GFP_NOFAIL' flag in do_get_write_access(), just add the '__GFP_NOFAIL' flag for all allocations in jbd2_journal_write_metadata_buffer(), as the 'new_bh' allocation already does. Besides, remove all error handling branches for do_get_write_access(). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241012085530.2147846-1-chengzhihao@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12 | jbd2: remove the 'success' parameter from the jbd2_do_replay() function | Ye Bin
Keep 'success' internally to track if any error happened and then return it at the end in do_one_pass(). If jbd2_do_replay() returns -ENOMEM, stop replaying the journal. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240930005942.626942-7-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12 | jbd2: remove useless 'block_error' variable | Ye Bin
The condition 'if (block_error && success == 0)' is never true. Just remove the useless 'block_error' variable. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240930005942.626942-6-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12 | jbd2: factor out jbd2_do_replay() | Ye Bin
Factor out jbd2_do_replay(). No functional change. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240930005942.626942-5-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12 | jbd2: refactor JBD2_COMMIT_BLOCK process in do_one_pass() | Ye Bin
Make the JBD2_COMMIT_BLOCK handling cleaner. No functional change. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240930005942.626942-4-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12 | jbd2: unified release of buffer_head in do_one_pass() | Ye Bin
The release of buffer_head is currently very fragmented in do_one_pass(). Unify the release of buffer_head in do_one_pass(). Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240930005942.626942-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-11-12 | jbd2: remove redundant judgments for check v1 checksum | Ye Bin
'need_check_commit_time' is only used by the v2/v3 checksum, so there is no need to add a 'need_check_commit_time' check to the v1 checksum logic. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240930005942.626942-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26 | jbd2: remove unneeded check of ret in jbd2_fc_get_buf | Kemeng Shi
Simply return -EINVAL if j_fc_off is invalid to avoid repeated check of ret. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-9-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26 | jbd2: correct comment jbd2_mark_journal_empty | Kemeng Shi
After jbd2_mark_journal_empty, journal log is supposed to be empty. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-8-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26 | jbd2: move escape handle to futher improve jbd2_journal_write_metadata_buffer | Kemeng Shi
Move the escape handling to further improve code readability and remove some repeated checks. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26 | jbd2: remove unneeded done_copy_out variable in jbd2_journal_write_metadata_buffer | Kemeng Shi
It's more intuitive to use jh_in->b_frozen_data directly instead of done_copy_out variable. Simply remove unneeded done_copy_out variable and use b_frozen_data instead. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240801013815.2393869-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26 | jbd2: remove unneeded kmap for jh_in->b_frozen_data in jbd2_journal_write_metadata_buffer | Kemeng Shi
Remove kmap for page of b_frozen_data from jbd2_alloc() which always provides an address from the direct kernel mapping. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26 | jbd2: remove unused return value of jbd2_fc_release_bufs | Kemeng Shi
Remove unused return value of jbd2_fc_release_bufs. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26 | jbd2: remove dead check in journal_alloc_journal_head | Kemeng Shi
We allocate journal_head with __GFP_NOFAIL anyway, so testing for failure is pointless. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-08-26 | jbd2: correctly compare tids with tid_geq function in jbd2_fc_begin_commit | Kemeng Shi
Use tid_geq to compare tids to work over sequence number wraps. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Cc: stable@kernel.org Link: https://patch.msgid.link/20240801013815.2393869-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
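The wraparound-safe comparison mentioned above can be sketched in user-space C. This is a minimal sketch of the signed-difference idiom, not a copy of the kernel header: the `tid_t` typedef here is a local stand-in, and the sketch assumes the usual two's-complement conversion from `uint32_t` to `int32_t`.

```c
#include <stdint.h>

/* Local stand-in for the kernel's tid_t (assumed 32-bit sequence number). */
typedef uint32_t tid_t;

/* x is "after" y when the signed 32-bit difference is positive. */
static inline int tid_gt(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference > 0;
}

/* x is "at or after" y when the signed 32-bit difference is non-negative. */
static inline int tid_geq(tid_t x, tid_t y)
{
	int32_t difference = (int32_t)(x - y);

	return difference >= 0;
}
```

A plain `x >= y` gives the wrong answer once the sequence number wraps: tid 2 issued shortly after tid 0xFFFFFFFE compares as smaller numerically, while `tid_geq(2, 0xFFFFFFFE)` is true.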
2024-08-26 | ext4: fix incorrect tid assumption in jbd2_journal_shrink_checkpoint_list() | Luis Henriques (SUSE)
Function jbd2_journal_shrink_checkpoint_list() assumes that '0' is not a valid value for transaction IDs, which is incorrect. Don't assume that and use two extra boolean variables to control the loop iterations and keep track of the first and last tid. Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240724161119.13448-4-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
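The general idea of not using 0 as an "unset" marker can be illustrated with a small sketch. The names below (`struct tid_range`, `scan_tids`) are hypothetical and do not appear in the patch; the sketch only shows an explicit boolean replacing a sentinel value when recording the first and last tid seen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tid_t;  /* local stand-in for the kernel type */

/* Hypothetical result type: 'valid' says whether first/last were set at all. */
struct tid_range {
	bool  valid;
	tid_t first;
	tid_t last;
};

/* Hypothetical helper: record the first and last tid in visiting order. */
static struct tid_range scan_tids(const tid_t *tids, size_t n)
{
	struct tid_range r = { .valid = false, .first = 0, .last = 0 };

	for (size_t i = 0; i < n; i++) {
		if (!r.valid) {          /* not "if (r.first == 0)": 0 is a valid tid */
			r.first = tids[i];
			r.valid = true;
		}
		r.last = tids[i];
	}
	return r;
}
```

With a sentinel check, a list starting at tid 0 would have its first entry overwritten by the next one; the boolean keeps tid 0 distinguishable from "nothing seen yet".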
2024-08-26 | ext4: fix incorrect tid assumption in __jbd2_log_wait_for_space() | Luis Henriques (SUSE)
Function __jbd2_log_wait_for_space() assumes that '0' is not a valid value for transaction IDs, which is incorrect. Don't assume that and invoke jbd2_log_wait_commit() if the journal had a committing transaction instead. Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240724161119.13448-3-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2024-08-26 | jbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns error | Baokun Li
In __jbd2_log_wait_for_space(), we might call jbd2_cleanup_journal_tail() to recover some journal space. But if an error occurs while executing jbd2_cleanup_journal_tail() (e.g., an EIO), we don't stop waiting for free space right away; we try other branches, and if j_committing_transaction is NULL (i.e., the tid is 0), we get the following complaint:

============================================
JBD2: I/O error when updating journal superblock for sdd-8.
__jbd2_log_wait_for_space: needed 256 blocks and only had 217 space available
__jbd2_log_wait_for_space: no way to get more journal space in sdd-8
------------[ cut here ]------------
WARNING: CPU: 2 PID: 139804 at fs/jbd2/checkpoint.c:109 __jbd2_log_wait_for_space+0x251/0x2e0
Modules linked in:
CPU: 2 PID: 139804 Comm: kworker/u8:3 Not tainted 6.6.0+ #1
RIP: 0010:__jbd2_log_wait_for_space+0x251/0x2e0
Call Trace:
 <TASK>
 add_transaction_credits+0x5d1/0x5e0
 start_this_handle+0x1ef/0x6a0
 jbd2__journal_start+0x18b/0x340
 ext4_dirty_inode+0x5d/0xb0
 __mark_inode_dirty+0xe4/0x5d0
 generic_update_time+0x60/0x70
[...]
============================================

So only if jbd2_cleanup_journal_tail() returns 1, i.e., there is nothing to clean up at the moment, continue to try to reclaim free space in other ways. Note that this fix relies on commit 6f6a6fda2945 ("jbd2: fix ocfs2 corrupt when updating journal superblock fails") to make jbd2_cleanup_journal_tail return the correct error code.

Fixes: 8c3f25d8950c ("jbd2: don't give up looking for space so easily in __jbd2_log_wait_for_space") Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240718115336.2554501-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
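The branch selection described in this message can be modelled as a small decision function. This is a sketch under stated assumptions, not the kernel code: `enum wait_action` and `next_action()` are hypothetical names, and the return convention for the cleanup call (negative on error, 0 when space was reclaimed, 1 when there was nothing to clean up) is assumed from the message's description of the value 1.

```c
#include <stdbool.h>

/* Hypothetical outcomes for one pass of the wait-for-space loop. */
enum wait_action {
	ACT_RECHECK,      /* skip the other branches; caller rechecks space/abort state */
	ACT_WAIT_COMMIT,  /* wait for the committing transaction to finish */
	ACT_GIVE_UP,      /* no way to get more journal space */
};

/*
 * cleanup_ret models the result of the journal tail cleanup:
 * <0 error, 0 space reclaimed, 1 nothing to clean up (assumed convention).
 */
static enum wait_action next_action(int cleanup_ret, bool has_committing_txn)
{
	if (cleanup_ret <= 0)
		return ACT_RECHECK;       /* success or error: do not fall through */
	if (has_committing_txn)
		return ACT_WAIT_COMMIT;
	return ACT_GIVE_UP;
}
```

In this model an I/O error no longer reaches the "no way to get more journal space" branch; only a return value of 1 lets the remaining fallbacks run.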
2024-08-26 | ext4: fix fast commit inode enqueueing during a full journal commit | Luis Henriques (SUSE)
When a full journal commit is on-going, any fast commit has to be enqueued into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing is done only once, i.e. if an inode is already queued in a previous fast commit entry it won't be enqueued again. However, if a full commit starts _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to be done into FC_Q_STAGING. And this is not being done in function ext4_fc_track_template(). This patch fixes the issue by re-enqueuing an inode into the STAGING queue during the fast commit clean-up callback when doing a full commit. However, to prevent a race with a fast-commit, the clean-up callback has to be called with the journal locked. This bug was found using fstest generic/047. This test creates several 32k-byte files, syncing each of them after its creation, and then shuts down the filesystem. Some data may be lost in this operation; for example, a file may have its size truncated to zero. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240717172220.14201-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2024-07-08 | jbd2: increase maximum transaction size | Jan Kara
Originally, we were quite conservative in limiting maximum transaction size to a quarter of the journal because we were not accounting transaction descriptor and revoke blocks. These days we do properly account them and reserve space for them from the total transaction credits. Thus there's no need to be so conservative and we can increase the maximum transaction size to one third of the journal (even half should work fine in principle but the performance will likely suffer in that case). This also fixes failures to grow filesystems with tiny journals. Link: CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240701132800.7158-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 | jbd2: drop pointless shrinker batch initialization | Jan Kara
In jbd2_journal_init_common() we set batch size of a shrinker shrinking checkpointed buffers to journal->j_max_transaction_buffers. But that is guaranteed to be 0 at that point so we effectively stay with the default shrinker batch size of 128. It has been like this since introduction of jbd2 shrinkers so just drop the pointless initialization. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 | jbd2: avoid infinite transaction commit loop | Jan Kara
Commit 9f356e5a4f12 ("jbd2: Account descriptor blocks into t_outstanding_credits") started to account descriptor blocks into transactions outstanding credits. However it didn't appropriately decrease the maximum amount of credits available to userspace. Thus if the filesystem requests a transaction smaller than j_max_transaction_buffers but large enough that when descriptor blocks are added the size exceeds j_max_transaction_buffers, we confuse add_transaction_credits() into thinking previous handles have grown the transaction too much and enter infinite journal commit loop in start_this_handle() -> add_transaction_credits() trying to create transaction with enough credits available. Fix the problem by properly accounting for transaction space reserved for descriptor blocks when verifying requested transaction handle size. CC: stable@vger.kernel.org Fixes: 9f356e5a4f12 ("jbd2: Account descriptor blocks into t_outstanding_credits") Reported-by: Alexander Coffin <alex.coffin@maticrobots.com> Link: https://lore.kernel.org/all/CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
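The accounting gap described above can be shown with a toy model. Everything here is hypothetical (`struct journal_limits`, `handle_fits_naive()`, `handle_fits()`); it only illustrates why a request must be checked against the limit minus the space reserved for descriptor blocks, and is not the kernel's actual check.

```c
#include <stdbool.h>

/* Hypothetical model of the two quantities involved. */
struct journal_limits {
	int max_txn_buffers;       /* overall transaction size limit */
	int reserved_descriptors;  /* space set aside for descriptor blocks */
};

/* Pre-fix idea: the descriptor reservation is ignored when validating. */
static bool handle_fits_naive(const struct journal_limits *j, int requested)
{
	return requested <= j->max_txn_buffers;
}

/* Post-fix idea: only the space left after the reservation is available. */
static bool handle_fits(const struct journal_limits *j, int requested)
{
	return requested <= j->max_txn_buffers - j->reserved_descriptors;
}
```

With a limit of 100 and 10 reserved blocks, a request for 95 passes the naive check but can never be satisfied once descriptor blocks are added, which is the situation that led to the endless commit loop; the stricter check rejects it up front.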
2024-07-08 | jbd2: precompute number of transaction descriptor blocks | Jan Kara
Instead of computing the number of descriptor blocks a transaction can have each time we need it (which is currently when starting each transaction but will become more frequent later) precompute the number once during journal initialization together with maximum transaction size. We perform the precomputation whenever journal feature set is updated similarly as for computation of journal->j_revoke_records_per_block. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 | jbd2: make jbd2_journal_get_max_txn_bufs() internal | Jan Kara
There's no reason to have jbd2_journal_get_max_txn_bufs() public function. Currently all users are internal and can use journal->j_max_transaction_buffers instead. This saves some unnecessary recomputations of the limit as a bonus which becomes important as this function gets more complex in the following patch. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-08 | jbd2: avoid mount failed when commit block is partial submitted | Ye Bin
We encountered a problem where the file system could not be mounted after a power-off. Analysis of the file system image shows that only part of the data was written to the last commit block. The valid data of the commit block is concentrated in the first sector. However, the data of the entire block is involved in the checksum calculation. For different hardware, the minimum atomic unit may be different. If the checksum of a commit block is incorrect, clear the data except the 'commit_header' and then recalculate the checksum. If the checksum is then correct, the block is considered partially committed, and journal replay continues. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240620072405.3533701-1-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
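The retry logic described above (verify the whole block, and on mismatch verify again with everything after the header zeroed) can be sketched as follows. This is an illustrative user-space model, not the jbd2 recovery code: `check_commit_block()`, `enum commit_check`, and `MAX_BLOCK` are hypothetical, the checksum is passed in separately rather than read from the header, and the bitwise CRC32C here merely stands in for whatever checksum the journal uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_BLOCK 4096  /* assumed upper bound on the block size for this sketch */

enum commit_check { COMMIT_OK, COMMIT_PARTIAL, COMMIT_BAD };

/* Plain bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(uint32_t crc, const uint8_t *p, size_t len)
{
	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & -(crc & 1u));
	}
	return ~crc;
}

/*
 * Verify a commit block against 'expected'. If the full-block checksum
 * fails, zero everything after the first hdr_len bytes in a scratch copy
 * and try again; a match then means only the header reached the disk.
 */
static enum commit_check check_commit_block(const uint8_t *block, size_t size,
					    size_t hdr_len, uint32_t expected)
{
	uint8_t tmp[MAX_BLOCK];

	if (size > MAX_BLOCK || hdr_len > size)
		return COMMIT_BAD;
	if (crc32c(0, block, size) == expected)
		return COMMIT_OK;

	memcpy(tmp, block, hdr_len);
	memset(tmp + hdr_len, 0, size - hdr_len);
	if (crc32c(0, tmp, size) == expected)
		return COMMIT_PARTIAL;
	return COMMIT_BAD;
}
```

The sketch relies on the same observation as the message: the meaningful bytes sit in the header, so stale data left in the rest of the block by a torn write can be discarded before re-verifying.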
2024-07-05 | jbd2: add missing MODULE_DESCRIPTION() | Jeff Johnson
Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/jbd2/jbd2.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://patch.msgid.link/20240526-md-fs-jbd2-v1-1-7bba6665327d@quicinc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-07-05 | jbd2: speed up jbd2_transaction_committed() | Zhang Yi
jbd2_transaction_committed() is used to check whether a transaction with the given tid has already committed. It holds j_state_lock in read mode and checks the tids of the current running transaction and the committing transaction, but holding j_state_lock is expensive. We already store the sequence number of the most recently committed transaction in journal->j_commit_sequence, so we can do this check by comparing it with the given tid instead. If the given tid isn't larger than j_commit_sequence, we can be sure that the given transaction has been committed. That way we can drop the expensive lock and achieve about 10% ~ 20% performance gains in concurrent DIOs on my virtual machine with a 100G ramdisk.

fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
    -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
    -group_reporting

Before:
  overwrite       IOPS=88.2k, BW=344MiB/s
  read            IOPS=95.7k, BW=374MiB/s
  rand overwrite  IOPS=98.7k, BW=386MiB/s
  randread        IOPS=102k, BW=397MiB/s

After:
  overwrite       IOPS=105k, BW=410MiB/s
  read            IOPS=112k, BW=436MiB/s
  rand overwrite  IOPS=104k, BW=404MiB/s
  randread        IOPS=111k, BW=432MiB/s

CC: Dave Chinner <david@fromorbit.com> Suggested-by: Dave Chinner <david@fromorbit.com> Link: https://lore.kernel.org/linux-ext4/ZjILCPNZRHeazSqV@dread.disaster.area/ Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/20240520131831.2910790-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove unnecessary "should_sleep" in kjournald2 | Kemeng Shi
We only need to sleep if no running transaction is expired. Simply remove unnecessary "should_sleep". Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-10-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove dead check of JBD2_UNMOUNT in kjournald2 | Kemeng Shi
We always set JBD2_UNMOUNT with j_state_lock held in journal_kill_thread. In kjournald2, we check JBD2_UNMOUNT flag two times under the same j_state_lock. Then the second check is unnecessary. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-9-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove dead equality check of j_commit_[sequence/request] in kjournald2 | Kemeng Shi
The j_commit_[sequence/request] are updated with j_state_lock held during runtime. In kjournald2, two equality checks of j_commit_[sequence/request] are under the same j_state_lock, then the second check is unnecessary. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-8-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: use bh_in instead of jh2bh(jh_in) to simplify code | Kemeng Shi
We save jh2bh(jh_in) to bh_in, so use bh_in directly instead of jh2bh(jh_in) to simplify the code. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove unneeded kmap to do escape in jbd2_journal_write_metadata_buffer | Kemeng Shi
The data to do escape could be accessed directly from b_frozen_data, just remove unneeded kmap. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: jump to new copy_done tag when b_frozen_data is created concurrently | Kemeng Shi
If b_frozen_data is created concurrently, we can update new_folio and new_offset with b_frozen_data and then move forward Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove unnedded "need_copy_out" in jbd2_journal_write_metadata_buffer | Kemeng Shi
As we only need to copy out when we should do escape, need_copy_out could be simply replaced by "do_escape". Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: remove unused return info from jbd2_journal_write_metadata_buffer | Kemeng Shi
The done_copy_out info from jbd2_journal_write_metadata_buffer is not used. Simply remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: avoid memleak in jbd2_journal_write_metadata_buffer | Kemeng Shi
The new_bh is from alloc_buffer_head, we should call free_buffer_head to free it in error case. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-06-27 | jbd2: use str_plural() to fix Coccinelle warning | Thorsten Blum
Fixes the following Coccinelle/coccicheck warning reported by string_choices.cocci: opportunity for str_plural(dropped) Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Link: https://patch.msgid.link/20240402105157.254389-2-thorsten.blum@toblux.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-21 | Merge tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds
Pull bdev bd_inode updates from Al Viro: "Replacement of bdev->bd_inode with sane(r) set of primitives by me and Yu Kuai"

* tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RIP ->bd_inode
  dasd_format(): killing the last remaining user of ->bd_inode
  nilfs_attach_log_writer(): use ->bd_mapping->host instead of ->bd_inode
  block/bdev.c: use the knowledge of inode/bdev coallocation
  gfs2: more obvious initializations of mapping->host
  fs/buffer.c: massage the remaining users of ->bd_inode to ->bd_mapping
  blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here...
  grow_dev_folio(): we only want ->bd_inode->i_mapping there
  use ->bd_mapping instead of ->bd_inode->i_mapping
  block_device: add a pointer to struct address_space (page cache of bdev)
  missing helpers: bdev_unhash(), bdev_drop()
  block: move two helpers into bdev.c
  block2mtd: prevent direct access of bd_inode
  dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode)
  blkdev_write_iter(): saner way to get inode and bdev
  bcachefs: remove dead function bdev_sectors()
  ext4: remove block_device_ejected()
  erofs_buf: store address_space instead of inode
  erofs: switch erofs_bread() to passing offset instead of block number
2024-05-09 | jbd2: add prefix 'jbd2' for 'shrink_type' | Ye Bin
As 'shrink_type' is exported, add the module prefix 'jbd2' to distinguish it from memory reclamation. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240407065355.1528580-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-09 | jbd2: use shrink_type type instead of bool type for __jbd2_journal_clean_checkpoint_list() | Ye Bin
"enum shrink_type" can clearly express the meaning of the parameter of __jbd2_journal_clean_checkpoint_list(), and there is no need to use the bool type. Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://lore.kernel.org/r/20240407065355.1528580-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-07 | jbd2: remove redundant assignement to variable err | Colin Ian King
The variable err is being assigned a value that is never read, it is being re-assigned inside the following while loop and also after the while loop. The assignment is redundant and can be removed. Cleans up clang scan build warning: fs/jbd2/commit.c:574:2: warning: Value stored to 'err' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240410112803.232993-1-colin.i.king@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-05-03 | use ->bd_mapping instead of ->bd_inode->i_mapping | Al Viro
Just the low-hanging fruit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240411145346.2516848-2-viro@zeniv.linux.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-01-04 | jbd2: abort journal when detecting metadata writeback error of fs dev | Zhihao Cheng
This is a replacement solution of commit bc71726c725767 ("ext4: abort the filesystem if failed to async write metadata buffer"), JBD2 can detect metadata writeback error of fs dev by 'j_fs_dev_wb_err'. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-5-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-04 | jbd2: remove unused 'JBD2_CHECKPOINT_IO_ERROR' and 'j_atomic_flags' | Zhihao Cheng
Since 'JBD2_CHECKPOINT_IO_ERROR' and 'j_atomic_flags' are not useful anymore after fs dev's errseq is imported into jbd2, just remove them. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-4-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-04 | jbd2: replace journal state flag by checking errseq | Zhihao Cheng
Now JBD2 detects metadata writeback error of fs dev according to errseq. Replace journal state flag by checking errseq. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-3-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2024-01-04 | jbd2: add errseq to detect client fs's bdev writeback error | Zhihao Cheng
Add errseq in journal, so that JBD2 can detect whether metadata is successfully written to fs bdev. This patch adds detection in the recovery process to replace the original solution (using the local variable wb_err). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-2-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-12-12 | jbd2: fix soft lockup in journal_finish_inode_data_buffers() | Ye Bin
There's an issue when doing an I/O test:

WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170]
CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G OE
Call trace:
 dump_backtrace+0x0/0x1a0
 show_stack+0x24/0x30
 dump_stack+0xb0/0x100
 watchdog_timer_fn+0x254/0x3f8
 __hrtimer_run_queues+0x11c/0x380
 hrtimer_interrupt+0xfc/0x2f8
 arch_timer_handler_phys+0x38/0x58
 handle_percpu_devid_irq+0x90/0x248
 generic_handle_irq+0x3c/0x58
 __handle_domain_irq+0x68/0xc0
 gic_handle_irq+0x90/0x320
 el1_irq+0xcc/0x180
 queued_spin_lock_slowpath+0x1d8/0x320
 jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2]
 kjournald2+0xec/0x2f0 [jbd2]
 kthread+0x134/0x138
 ret_from_fork+0x10/0x18

Analysis of the vmcore shows the following:
(1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list';
(2) The 855th jbd2_inode is currently being processed;
(3) The JBD2 task has the TIF_NEED_RESCHED flag set;
(4) There are no pages in the address_space around the 855th jbd2_inode;
(5) Some processes are dropping caches;
(6) Mounted with the 'nodioread_nolock' option;
(7) 128 CPUs;

According to the vmcore, competition for the 'journal->j_list_lock' spin lock is fierce, so journal_finish_inode_data_buffers() may proceed slowly. Theoretically, there is a scheduling point in filemap_fdatawait_range_keep_errors(). However, if the inode's address_space has no pages tagged with PAGECACHE_TAG_WRITEBACK, cond_resched() is not called, which may lead to a soft lockup.

journal_finish_inode_data_buffers
  filemap_fdatawait_range_keep_errors
    __filemap_fdatawait_range
      while (index <= end)
        nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK);
        if (!nr_pages)
          break;    --> If 'nr_pages' is zero, we break out and never call cond_resched()
        for (i = 0; i < nr_pages; i++)
          wait_on_page_writeback(page);
        cond_resched();

To solve the above issue, add a scheduling point in journal_finish_inode_data_buffers(). Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231211112544.3879780-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-11-30 | jbd2: increase the journal IO's priority | Zhang Yi
Currently jbd2 only adds REQ_SYNC for the descriptor block, metadata log buffer, commit buffer and superblock buffer, so the submitted IO could be throttled by writeback throttling in the block layer, which could lead to priority inversion in some cases. The log IO is a kind of high-priority metadata IO, so it should not be throttled by WBT-like QoS policies in the block layer. Add REQ_SYNC | REQ_IDLE to exempt it from writeback throttling, and also add REQ_META to indicate that it is metadata IO. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231129114740.2686201-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
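The flag combination described above can be illustrated with a simplified model. The bit values, the `JOURNAL_REQ_FLAGS` name, and `write_is_throttled()` are all hypothetical; the sketch assumes a throttle that exempts writes carrying both REQ_SYNC and REQ_IDLE, which is how the message describes the exemption, and it is not the block layer's implementation.

```c
#include <stdbool.h>

/* Hypothetical bit values; the kernel defines the real flags elsewhere. */
#define REQ_SYNC (1u << 0)
#define REQ_META (1u << 1)
#define REQ_IDLE (1u << 2)

/* Combined flag set for journal writes, as described in the message. */
#define JOURNAL_REQ_FLAGS (REQ_META | REQ_SYNC | REQ_IDLE)

/*
 * Simplified throttle model: a write escapes throttling only when both
 * REQ_SYNC and REQ_IDLE are set (assumption of this sketch).
 */
static bool write_is_throttled(unsigned int opf)
{
	const unsigned int exempt = REQ_SYNC | REQ_IDLE;

	return (opf & exempt) != exempt;
}
```

Under this model a write tagged with REQ_SYNC alone is still subject to throttling, while one tagged with the combined set is not, which matches the motivation given for adding REQ_IDLE.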
2023-11-30 | jbd2: correct the printing of write_flags in jbd2_write_superblock() | Zhang Yi
The write_flags printed in the trace of jbd2_write_superblock() do not reflect the final flags, so move the modification before the trace. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231129114740.2686201-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>