diff options
author | Darrick J. Wong <djwong@kernel.org> | 2021-08-06 11:05:43 -0700 |
---|---|---|
committer | Darrick J. Wong <djwong@kernel.org> | 2021-08-09 11:13:17 -0700 |
commit | 40b1de007aca4f9ec4ee4322c29f026ebb60ac96 (patch) | |
tree | 0933ecaa5f4f262b63e94f1a8da9bf60e2810ab8 /fs/xfs/xfs_mount.c | |
parent | a6343e4d9278b3919c809fab9945c4d8f04fadf5 (diff) |
xfs: throttle inode inactivation queuing on memory reclaim
Now that we defer inode inactivation, we've decoupled the process of
unlinking or closing an inode from the process of inactivating it. In
theory this should lead to better throughput since we now inactivate the
queued inodes in batches instead of one at a time.
Unfortunately, one of the primary risks with this decoupling is the loss
of rate control feedback between the frontend and background threads.
In other words, a rm -rf /* thread can run the system out of memory if
it can queue inodes for inactivation and jump to a new CPU faster than
the background threads can actually clear the deferred work. The
workers can get scheduled off the CPU if they have to do IO, etc.
To solve this problem, we configure a shrinker so that it will activate
the /second/ time the shrinkers are called. The custom shrinker will
queue all percpu deferred inactivation workers immediately and set a
flag to force frontend callers who are releasing a vfs inode to wait for
the inactivation workers.
On my test VM with 560M of RAM and a 2TB filesystem, this seems to solve
most of the OOMing problem when deleting 10 million inodes.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Diffstat (limited to 'fs/xfs/xfs_mount.c')
-rw-r--r-- | fs/xfs/xfs_mount.c | 9 |
1 files changed, 8 insertions, 1 deletions
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index b81f2fc734bd..ff08192d8d2a 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -769,6 +769,10 @@ xfs_mountfs( goto out_free_perag; } + error = xfs_inodegc_register_shrinker(mp); + if (error) + goto out_fail_wait; + /* * Log's mount-time initialization. The first part of recovery can place * some items on the AIL, to be handled when recovery is finished or @@ -779,7 +783,7 @@ xfs_mountfs( XFS_FSB_TO_BB(mp, sbp->sb_logblocks)); if (error) { xfs_warn(mp, "log mount failed"); - goto out_fail_wait; + goto out_inodegc_shrinker; } /* Make sure the summary counts are ok. */ @@ -974,6 +978,8 @@ xfs_mountfs( xfs_unmount_flush_inodes(mp); out_log_dealloc: xfs_log_mount_cancel(mp); + out_inodegc_shrinker: + unregister_shrinker(&mp->m_inodegc_shrinker); out_fail_wait: if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp) xfs_buftarg_drain(mp->m_logdev_targp); @@ -1054,6 +1060,7 @@ xfs_unmountfs( #if defined(DEBUG) xfs_errortag_clearall(mp); #endif + unregister_shrinker(&mp->m_inodegc_shrinker); xfs_free_perag(mp); xfs_errortag_del(mp); |