diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-08-28 11:04:18 -0700 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-08-28 11:04:18 -0700 |
commit | 511fb5bafed197ff76d9adf5448de67f1d0558ae (patch) | |
tree | 6683ae0e7b62caa9488040d71705768a306f37dd /block | |
parent | de16588a7737b12e63ec646d72b45befb2b1f8f7 (diff) | |
parent | cd4284cfd3e11c7a49e4808f76f53284d47d04dd (diff) |
Merge tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull superblock updates from Christian Brauner:
"This contains the super rework that was ready for this cycle. The
first part changes the order of how we open block devices and allocate
superblocks, contains various cleanups, simplifications, and a new
mechanism to wait on superblock state changes.
This unblocks work to ultimately limit the number of writers to a
block device. Jan has already scheduled follow-up work that will be
ready for v6.7 and allows us to restrict the number of writers to a
given block device. That series builds on this work right here.
The second part contains filesystem freezing updates.
Overview:
The generic superblock changes are rougly organized as follows
(ignoring additional minor cleanups):
(1) Removal of the bd_super member from struct block_device.
This was a very odd back pointer to struct super_block with
unclear rules. For all relevant places we have other means to get
the same information so just get rid of this.
(2) Simplify rules for superblock cleanup.
Roughly, everything that is allocated during fs_context
initialization and that's stored in fs_context->s_fs_info needs
to be cleaned up by the fs_context->free() implementation before
the superblock allocation function has been called successfully.
After sget_fc() returned fs_context->s_fs_info has been
transferred to sb->s_fs_info at which point sb->kill_sb() if
fully responsible for cleanup. Adhering to these rules means that
cleanup of sb->s_fs_info in fill_super() is to be avoided as it's
brittle and inconsistent.
Cleanup shouldn't be duplicated between sb->put_super() as
sb->put_super() is only called if sb->s_root has been set aka
when the filesystem has been successfully born (SB_BORN). That
complexity should be avoided.
This also means that block devices are to be closed in
sb->kill_sb() instead of sb->put_super(). More details in the
lower section.
(3) Make it possible to lookup or create a superblock before opening
block devices
There's a subtle dependency on (2) as some filesystems did rely
on fill_super() to be called in order to correctly clean up
sb->s_fs_info. All these filesystems have been fixed.
(4) Switch most filesystem to follow the same logic as the generic
mount code now does as outlined in (3).
(5) Use the superblock as the holder of the block device. We can now
easily go back from block device to owning superblock.
(6) Export and extend the generic fs_holder_ops and use them as
holder ops everywhere and remove the filesystem specific holder
ops.
(7) Call from the block layer up into the filesystem layer when the
block device is removed, allowing to shut down the filesystem
without risk of deadlocks.
(8) Get rid of get_super().
We can now easily go back from the block device to owning
superblock and can call up from the block layer into the
filesystem layer when the device is removed. So no need to wade
through all registered superblock to find the owning superblock
anymore"
Link: https://lore.kernel.org/lkml/20230824-prall-intakt-95dbffdee4a0@brauner/
* tag 'v6.6-vfs.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (47 commits)
super: use higher-level helper for {freeze,thaw}
super: wait until we passed kill super
super: wait for nascent superblocks
super: make locking naming consistent
super: use locking helpers
fs: simplify invalidate_inodes
fs: remove get_super
block: call into the file system for ioctl BLKFLSBUF
block: call into the file system for bdev_mark_dead
block: consolidate __invalidate_device and fsync_bdev
block: drop the "busy inodes on changed media" log message
dasd: also call __invalidate_device when setting the device offline
amiflop: don't call fsync_bdev in FDFMTBEG
floppy: call disk_force_media_change when changing the format
block: simplify the disk_force_media_change interface
nbd: call blk_mark_disk_dead in nbd_clear_sock_ioctl
xfs use fs_holder_ops for the log and RT devices
xfs: drop s_umount over opening the log and RT devices
ext4: use fs_holder_ops for the log device
ext4: drop s_umount over opening the log device
...
Diffstat (limited to 'block')
-rw-r--r-- | block/bdev.c | 69 | ||||
-rw-r--r-- | block/disk-events.c | 23 | ||||
-rw-r--r-- | block/genhd.c | 45 | ||||
-rw-r--r-- | block/ioctl.c | 9 | ||||
-rw-r--r-- | block/partitions/core.c | 5 |
5 files changed, 71 insertions, 80 deletions
diff --git a/block/bdev.c b/block/bdev.c index 979e28a46b98..f3b13aa1b7d4 100644 --- a/block/bdev.c +++ b/block/bdev.c @@ -206,23 +206,6 @@ int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend) } EXPORT_SYMBOL(sync_blockdev_range); -/* - * Write out and wait upon all dirty data associated with this - * device. Filesystem data as well as the underlying block - * device. Takes the superblock lock. - */ -int fsync_bdev(struct block_device *bdev) -{ - struct super_block *sb = get_super(bdev); - if (sb) { - int res = sync_filesystem(sb); - drop_super(sb); - return res; - } - return sync_blockdev(bdev); -} -EXPORT_SYMBOL(fsync_bdev); - /** * freeze_bdev - lock a filesystem and force it into a consistent state * @bdev: blockdevice to lock @@ -248,9 +231,9 @@ int freeze_bdev(struct block_device *bdev) if (!sb) goto sync; if (sb->s_op->freeze_super) - error = sb->s_op->freeze_super(sb); + error = sb->s_op->freeze_super(sb, FREEZE_HOLDER_USERSPACE); else - error = freeze_super(sb); + error = freeze_super(sb, FREEZE_HOLDER_USERSPACE); deactivate_super(sb); if (error) { @@ -291,9 +274,9 @@ int thaw_bdev(struct block_device *bdev) goto out; if (sb->s_op->thaw_super) - error = sb->s_op->thaw_super(sb); + error = sb->s_op->thaw_super(sb, FREEZE_HOLDER_USERSPACE); else - error = thaw_super(sb); + error = thaw_super(sb, FREEZE_HOLDER_USERSPACE); if (error) bdev->bd_fsfreeze_count++; else @@ -960,26 +943,38 @@ out_path_put: } EXPORT_SYMBOL(lookup_bdev); -int __invalidate_device(struct block_device *bdev, bool kill_dirty) +/** + * bdev_mark_dead - mark a block device as dead + * @bdev: block device to operate on + * @surprise: indicate a surprise removal + * + * Tell the file system that this devices or media is dead. If @surprise is set + * to %true the device or media is already gone, if not we are preparing for an + * orderly removal. + * + * This calls into the file system, which then typicall syncs out all dirty data + * and writes back inodes and then invalidates any cached data in the inodes on + * the file system. In addition we also invalidate the block device mapping. + */ +void bdev_mark_dead(struct block_device *bdev, bool surprise) { - struct super_block *sb = get_super(bdev); - int res = 0; + mutex_lock(&bdev->bd_holder_lock); + if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead) + bdev->bd_holder_ops->mark_dead(bdev, surprise); + else + sync_blockdev(bdev); + mutex_unlock(&bdev->bd_holder_lock); - if (sb) { - /* - * no need to lock the super, get_super holds the - * read mutex so the filesystem cannot go away - * under us (->put_super runs with the write lock - * hold). - */ - shrink_dcache_sb(sb); - res = invalidate_inodes(sb, kill_dirty); - drop_super(sb); - } invalidate_bdev(bdev); - return res; } -EXPORT_SYMBOL(__invalidate_device); +#ifdef CONFIG_DASD_MODULE +/* + * Drivers should not use this directly, but the DASD driver has historically + * had a shutdown to offline mode that doesn't actually remove the gendisk + * that otherwise looks a lot like a safe device removal. + */ +EXPORT_SYMBOL_GPL(bdev_mark_dead); +#endif void sync_bdevs(bool wait) { diff --git a/block/disk-events.c b/block/disk-events.c index 0cfac464e6d1..422db8292d09 100644 --- a/block/disk-events.c +++ b/block/disk-events.c @@ -281,9 +281,7 @@ bool disk_check_media_change(struct gendisk *disk) if (!(events & DISK_EVENT_MEDIA_CHANGE)) return false; - if (__invalidate_device(disk->part0, true)) - pr_warn("VFS: busy inodes on changed media %s\n", - disk->disk_name); + bdev_mark_dead(disk->part0, true); set_bit(GD_NEED_PART_SCAN, &disk->state); return true; } @@ -294,25 +292,16 @@ EXPORT_SYMBOL(disk_check_media_change); * @disk: the disk which will raise the event * @events: the events to raise * - * Generate uevents for the disk. If DISK_EVENT_MEDIA_CHANGE is present, - * attempt to free all dentries and inodes and invalidates all block + * Should be called when the media changes for @disk. Generates a uevent + * and attempts to free all dentries and inodes and invalidates all block * device page cache entries in that case. - * - * Returns %true if DISK_EVENT_MEDIA_CHANGE was raised, or %false if not. */ -bool disk_force_media_change(struct gendisk *disk, unsigned int events) +void disk_force_media_change(struct gendisk *disk) { - disk_event_uevent(disk, events); - - if (!(events & DISK_EVENT_MEDIA_CHANGE)) - return false; - + disk_event_uevent(disk, DISK_EVENT_MEDIA_CHANGE); inc_diskseq(disk); - if (__invalidate_device(disk->part0, true)) - pr_warn("VFS: busy inodes on changed media %s\n", - disk->disk_name); + bdev_mark_dead(disk->part0, true); set_bit(GD_NEED_PART_SCAN, &disk->state); - return true; } EXPORT_SYMBOL_GPL(disk_force_media_change); diff --git a/block/genhd.c b/block/genhd.c index 3d287b32d50d..cc32a0c704eb 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -554,7 +554,7 @@ out_exit_elevator: } EXPORT_SYMBOL(device_add_disk); -static void blk_report_disk_dead(struct gendisk *disk) +static void blk_report_disk_dead(struct gendisk *disk, bool surprise) { struct block_device *bdev; unsigned long idx; @@ -565,10 +565,7 @@ static void blk_report_disk_dead(struct gendisk *disk) continue; rcu_read_unlock(); - mutex_lock(&bdev->bd_holder_lock); - if (bdev->bd_holder_ops && bdev->bd_holder_ops->mark_dead) - bdev->bd_holder_ops->mark_dead(bdev); - mutex_unlock(&bdev->bd_holder_lock); + bdev_mark_dead(bdev, surprise); put_device(&bdev->bd_device); rcu_read_lock(); @@ -576,14 +573,7 @@ static void blk_report_disk_dead(struct gendisk *disk) rcu_read_unlock(); } -/** - * blk_mark_disk_dead - mark a disk as dead - * @disk: disk to mark as dead - * - * Mark as disk as dead (e.g. surprise removed) and don't accept any new I/O - * to this disk. - */ -void blk_mark_disk_dead(struct gendisk *disk) +static void __blk_mark_disk_dead(struct gendisk *disk) { /* * Fail any new I/O. @@ -603,8 +593,19 @@ void blk_mark_disk_dead(struct gendisk *disk) * Prevent new I/O from crossing bio_queue_enter(). */ blk_queue_start_drain(disk->queue); +} - blk_report_disk_dead(disk); +/** + * blk_mark_disk_dead - mark a disk as dead + * @disk: disk to mark as dead + * + * Mark as disk as dead (e.g. surprise removed) and don't accept any new I/O + * to this disk. + */ +void blk_mark_disk_dead(struct gendisk *disk) +{ + __blk_mark_disk_dead(disk); + blk_report_disk_dead(disk, true); } EXPORT_SYMBOL_GPL(blk_mark_disk_dead); @@ -641,18 +642,20 @@ void del_gendisk(struct gendisk *disk) disk_del_events(disk); /* - * Prevent new openers by unlinked the bdev inode, and write out - * dirty data before marking the disk dead and stopping all I/O. + * Prevent new openers by unlinked the bdev inode. */ mutex_lock(&disk->open_mutex); - xa_for_each(&disk->part_tbl, idx, part) { + xa_for_each(&disk->part_tbl, idx, part) remove_inode_hash(part->bd_inode); - fsync_bdev(part); - __invalidate_device(part, true); - } mutex_unlock(&disk->open_mutex); - blk_mark_disk_dead(disk); + /* + * Tell the file system to write back all dirty data and shut down if + * it hasn't been notified earlier. + */ + if (!test_bit(GD_DEAD, &disk->state)) + blk_report_disk_dead(disk, false); + __blk_mark_disk_dead(disk); /* * Drop all partitions now that the disk is marked dead. diff --git a/block/ioctl.c b/block/ioctl.c index 3be11941fb2d..648670ddb164 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -364,7 +364,14 @@ static int blkdev_flushbuf(struct block_device *bdev, unsigned cmd, { if (!capable(CAP_SYS_ADMIN)) return -EACCES; - fsync_bdev(bdev); + + mutex_lock(&bdev->bd_holder_lock); + if (bdev->bd_holder_ops && bdev->bd_holder_ops->sync) + bdev->bd_holder_ops->sync(bdev); + else + sync_blockdev(bdev); + mutex_unlock(&bdev->bd_holder_lock); + invalidate_bdev(bdev); return 0; } diff --git a/block/partitions/core.c b/block/partitions/core.c index 13a7341299a9..e137a87f4db0 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -281,10 +281,7 @@ static void delete_partition(struct block_device *part) * looked up any more even when openers still hold references. */ remove_inode_hash(part->bd_inode); - - fsync_bdev(part); - __invalidate_device(part, true); - + bdev_mark_dead(part, false); drop_partition(part); } |