summaryrefslogtreecommitdiff
path: root/fs/xfs/libxfs
AgeCommit message (Collapse)Author
2024-11-28xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delayChristoph Hellwig
xfs_bmap_add_extent_hole_delay works entirely on delalloc extents, for which xfs_bmap_same_rtgroup doesn't make sense. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-11-22xfs: fix sparse inode limits on runt AGDave Chinner
The runt AG at the end of a filesystem is almost always smaller than the mp->m_sb.sb_agblocks. Unfortunately, when setting the max_agbno limit for the inode chunk allocation, we do not take this into account. This means we can allocate a sparse inode chunk that overlaps beyond the end of an AG. When we go to allocate an inode from that sparse chunk, the irec fails validation because the agbno of the start of the irec is beyond valid limits for the runt AG. Prevent this from happening by taking into account the size of the runt AG when allocating inode chunks. Also convert the various checks for valid inode chunk agbnos to use xfs_ag_block_count() so that they will also catch such issues in the future. Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-11-22xfs: remove unknown compat feature check in superblock write validationLong Li
Compat features are new features that older kernels can safely ignore, allowing read-write mounts without issues. The current sb write validation implementation returns -EFSCORRUPTED for unknown compat features, preventing filesystem write operations and contradicting the feature's definition. Additionally, if the mounted image is unclean, the log recovery may need to write to the superblock. Returning an error for unknown compat features during sb write validation can cause mount failures. Although XFS currently does not use compat feature flags, this issue affects current kernels' ability to mount images that may use compat feature flags in the future. Since superblock read validation already warns about unknown compat features, it's unnecessary to repeat this warning during write validation. Therefore, the relevant code in write validation is being removed. Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by: Long Li <leo.lilong@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2024-11-21Merge tag 'xfs-6.13-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Carlos Maiolino: "The bulk of this pull request is a major rework that Darrick and Christoph have been doing on XFS's real-time volume, coupled with a few features to support this rework. It does also includes some bug fixes. - convert perag to use xarrays - create a new generic allocation group structure - add metadata inode dir trees - create in-core rt allocation groups - shard the RT section into allocation groups - persist quota options with the enw metadata dir tree - enable quota for RT volumes - enable metadata directory trees - some bugfixes" * tag 'xfs-6.13-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (146 commits) xfs: port ondisk structure checks from xfs/122 to the kernel xfs: separate space btree structures in xfs_ondisk.h xfs: convert struct typedefs in xfs_ondisk.h xfs: enable metadata directory feature xfs: enable realtime quota again xfs: update sb field checks when metadir is turned on xfs: reserve quota for realtime files correctly xfs: create quota preallocation watermarks for realtime quota xfs: report realtime block quota limits on realtime directories xfs: persist quota flags with metadir xfs: advertise realtime quota support in the xqm stat files xfs: scrub quota file metapaths xfs: fix chown with rt quota xfs: use metadir for quota inodes xfs: refactor xfs_qm_destroy_quotainos xfs: use rtgroup busy extent list for FITRIM xfs: implement busy extent tracking for rtgroups xfs: port the perag discard code to handle generic groups xfs: move the min and max group block numbers to xfs_group xfs: adjust min_block usage in xfs_verify_agbno ...
2024-11-18Merge tag 'vfs-6.13.mgtime' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs multigrain timestamps from Christian Brauner: "This is another try at implementing multigrain timestamps. This time with significant help from the timekeeping maintainers to reduce the performance impact. Thomas provided a base branch that contains the required timekeeping interfaces for the VFS. It serves as the base for the multi-grain timestamp work: - Multigrain timestamps allow the kernel to use fine-grained timestamps when an inode's attributes is being actively observed via ->getattr(). With this support, it's possible for a file to get a fine-grained timestamp, and another modified after it to get a coarse-grained stamp that is earlier than the fine-grained time. If this happens then the files can appear to have been modified in reverse order, which breaks VFS ordering guarantees. To prevent this, a floor value is maintained for multigrain timestamps. Whenever a fine-grained timestamp is handed out, record it, and when later coarse-grained stamps are handed out, ensure they are not earlier than that value. If the coarse-grained timestamp is earlier than the fine-grained floor, return the floor value instead. The timekeeper changes add a static singleton atomic64_t into timekeeper.c that is used to keep track of the latest fine-grained time ever handed out. This is tracked as a monotonic ktime_t value to ensure that it isn't affected by clock jumps. Because it is updated at different times than the rest of the timekeeper object, the floor value is managed independently of the timekeeper via a cmpxchg() operation, and sits on its own cacheline. Two new public timekeeper interfaces are added: (1) ktime_get_coarse_real_ts64_mg() fills a timespec64 with the later of the coarse-grained clock and the floor time (2) ktime_get_real_ts64_mg() gets the fine-grained clock value, and tries to swap it into the floor. A timespec64 is filled with the result. - The VFS has always used coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide when to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This adds a way to only use fine-grained timestamps when they are being actively queried. Use the (unused) top bit in inode->i_ctime_nsec as a flag that indicates whether the current timestamps have been queried via stat() or the like. When it's set, we allow the kernel to use a fine-grained timestamp iff it's necessary to make the ctime show a different value. This solves the problem of being able to distinguish the timestamp between updates, but introduces a new problem: it's now possible for a file being changed to get a fine-grained timestamp. A file that is altered just a bit later can then get a coarse-grained one that appears older than the earlier fine-grained time. This violates timestamp ordering guarantees. This is where the earlier mentioned timkeeping interfaces help. A global monotonic atomic64_t value is kept that acts as a timestamp floor. When we go to stamp a file, we first get the latter of the current floor value and the current coarse-grained time. If the inode ctime hasn't been queried then we just attempt to stamp it with that value. If it has been queried, then first see whether the current coarse time is later than the existing ctime. If it is, then we accept that value. If it isn't, then we get a fine-grained time and try to swap that into the global floor. Whether that succeeds or fails, we take the resulting floor time, convert it to realtime and try to swap that into the ctime. We take the result of the ctime swap whether it succeeds or fails, since either is just as valid. Filesystems can opt into this by setting the FS_MGTIME fstype flag. Others should be unaffected (other than being subject to the same floor value as multigrain filesystems)" * tag 'vfs-6.13.mgtime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: reduce pointer chasing in is_mgtime() test tmpfs: add support for multigrain timestamps btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps Documentation: add a new file documenting multigrain timestamps fs: add percpu counters for significant multigrain timestamp events fs: tracepoints around multigrain timestamp events fs: handle delegated timestamps in setattr_copy_mgtime timekeeping: Add percpu counter for tracking floor swap events timekeeping: Add interfaces for handling timestamps with a floor value fs: have setattr_copy handle multigrain timestamps appropriately fs: add infrastructure for multigrain timestamps
2024-11-12Merge tag 'better-ondisk-6.13_2024-11-05' of ↵Carlos Maiolino
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: improve ondisk structure checks [v5.5 10/10] Reorganize xfs_ondisk.h to group the build checks by type, then add a bunch of missing checks that were in xfs/122 but not the build system. With this, we can get rid of xfs/122. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-12Merge tag 'metadir-6.13_2024-11-05' of ↵Carlos Maiolino
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: enable metadir [v5.5 09/10] Actually enable this very large feature, which adds metadata directory trees, allocation groups on the realtime volume, persistent quota options, and quota for realtime files. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-12Merge tag 'metadir-quotas-6.13_2024-11-05' of ↵Carlos Maiolino
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: persist quota options with metadir [v5.5 07/10] Store the quota files in the metadata directory tree instead of the superblock. Since we're introducing a new incompat feature flag, let's also make the mount process bring up quotas in whatever state they were when the filesystem was last unmounted, instead of requiring sysadmins to remember that themselves. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-12Merge tag 'realtime-groups-6.13_2024-11-05' of ↵Carlos Maiolino
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: shard the realtime section [v5.5 06/10] Right now, the realtime section uses a single pair of metadata inodes to store the free space information. This presents a scalability problem since every thread trying to allocate or free rt extents have to lock these files. Solve this problem by sharding the realtime section into separate realtime allocation groups. While we're at it, define a superblock to be stamped into the start of the rt section. This enables utilities such as blkid to identify block devices containing realtime sections, and avoids the situation where anything written into block 0 of the realtime extent can be misinterpreted as file data. The best advantage for rtgroups will become evident later when we get to adding rmap and reflink to the realtime volume, since the geometry constraints are the same for rt groups and AGs. Hence we can reuse all that code directly. This is a very large patchset, but it catches us up with 20 years of technical debt that have accumulated. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-12Merge tag 'incore-rtgroups-6.13_2024-11-05' of ↵Carlos Maiolino
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: create incore rt allocation groups [v5.5 04/10] Add in-memory data structures for sharding the realtime volume into independent allocation groups. For existing filesystems, the entire rt volume is modelled as having a single large group, with (potentially) a number of rt extents exceeding 2^32 blocks, though these are not likely to exist because the codebase has been a bit broken for decades. The next series fills in the ondisk format and other supporting structures. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-12Merge tag 'metadata-directory-tree-6.13_2024-11-05' of ↵Carlos Maiolino
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: metadata inode directory trees [v5.5 03/10] This series delivers a new feature -- metadata inode directories. This is a separate directory tree (rooted in the superblock) that contains only inodes that contain filesystem metadata. Different metadata objects can be looked up with regular paths. Start by creating xfs_imeta{dir,file}* functions to mediate access to the metadata directory tree. By the end of this mega series, all existing metadata inodes (rt+quota) will use this directory tree instead of the superblock. Next, define the metadir on-disk format, which consists of marking inodes with a new iflag that says they're metadata. This prevents bulkstat and friends from ever getting their hands on fs metadata files. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-12Merge tag 'generic-groups-6.13_2024-11-05' of ↵Carlos Maiolino
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: create a generic allocation group structure [v5.5 02/10] Soon we'll be sharding the realtime volume into separate allocation groups. These rt groups will /mostly/ behave the same as the ones on the data device, but since rt groups don't have quite the same set of struct fields as perags, let's hoist the parts that will be shared by both into a common xfs_group object. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-12Merge tag 'perag-xarray-6.13_2024-11-05' of ↵Carlos Maiolino
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into staging-merge xfs: convert perag to use xarrays [v5.5 01/10] Convert the xfs_mount perag tree to use an xarray instead of a radix tree. There should be no functional changes here. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: port ondisk structure checks from xfs/122 to the kernelDarrick J. Wong
Check this with every kernel and userspace build, so we can drop the nonsense in xfs/122. Roughly drafted with: sed -e 's/^offsetof/\tXFS_CHECK_OFFSET/g' \ -e 's/^sizeof/\tXFS_CHECK_STRUCT_SIZE/g' \ -e 's/ = \([0-9]*\)/,\t\t\t\1);/g' \ -e 's/xfs_sb_t/struct xfs_dsb/g' \ -e 's/),/,/g' \ -e 's/xfs_\([a-z0-9_]*\)_t,/struct xfs_\1,/g' \ < tests/xfs/122.out | sort and then manual fixups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: separate space btree structures in xfs_ondisk.hDarrick J. Wong
Create a separate section for space management btrees so that they're not mixed in with file structures. Ignore the dsb stuff sprinkled around for now, because we'll deal with that in a subsequent patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: convert struct typedefs in xfs_ondisk.hDarrick J. Wong
Replace xfs_foo_t with struct xfs_foo where appropriate. The next patch will import more checks from xfs/122, and it's easier to automate deduplication if we don't have to reason about typedefs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: enable metadata directory featureDarrick J. Wong
Enable the metadata directory feature. With this feature, all metadata inodes are placed in the metadata directory, and the only inumbers in the superblock are the roots of the two directory trees. The RT device is now sharded into a number of rtgroups, where 0 rtgroups mean that no RT extents are supported, and the traditional XFS stub RT bitmap and summary inodes don't exist. A single rtgroup gives roughly identical behavior to the traditional RT setup, but now with checksummed and self identifying free space metadata. For quota, the quota options are read from the superblock unless explicitly overridden via mount options. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: scrub quota file metapathsDarrick J. Wong
Enable online fsck for quota file metadata directory paths. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: use metadir for quota inodesDarrick J. Wong
Store the quota inodes in the /quota metadata directory if metadir is enabled. This enables us to stop using the sb_[ugp]uotino fields in the superblock. From this point on, all metadata files will be children of the metadata directory tree root. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: implement busy extent tracking for rtgroupsDarrick J. Wong
For rtgroups filesystems, track newly freed (rt) space through the log until the rt EFIs have been committed to disk. This way we ensure that space cannot be reused until all traces of the old owner are gone. As a fringe benefit, we now support -o discard on the realtime device. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: move the min and max group block numbers to xfs_groupDarrick J. Wong
Move the min and max agblock numbers to the generic xfs_group structure so that we can start building validators for extents within an rtgroup. While we're at it, use check_add_overflow for the extent length computation because that has much better overflow checking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: adjust min_block usage in xfs_verify_agbnoDarrick J. Wong
There's some weird logic in xfs_verify_agbno -- min_block ought to be the first agblock number in the AG that can be used by non-static metadata. However, we initialize it to the last agblock of the static metadata, which works due to the <= check, even though this isn't technically correct. Change the check to < and set min_block to the next agblock past the static metadata. This hasn't been an issue up to now, but we're going to move these things into the generic group struct, and this will cause problems with rtgroups, where min_block can be zero for an rtgroup that doesn't have a rt superblock. Note that there's no user-visible impact with the old logic, so this isn't a bug fix. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_tDarrick J. Wong
Now that we've finished adding allocation groups to the realtime volume, let's make the file block mapping address (xfs_rtblock_t) a segmented value just like we do on the data device. This means that group number and block number conversions can be done with shifting and masking instead of integer division. While in theory we could continue caching the rgno shift value in m_rgblklog, the fact that we now always use the shift value means that we have an opportunity to increase the redundancy of the rt geometry by storing it in the ondisk superblock and adding more sb verifier code. Extend the sueprblock to store the rgblklog value. Now that we have segmented addresses, set the correct values in m_groups[XG_TYPE_RTG] so that the xfs_group helpers work correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundariesDarrick J. Wong
We're about to segment xfs_rtblock_t addresses, so we must create type-specific helpers to do rt extent rounding of file mapping block lengths because the rtb helpers soon will not do the right thing there. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundariesDarrick J. Wong
We're about to segment xfs_rtblock_t addresses, so we must create type-specific helpers to do rt extent rounding of file block offsets because the rtb helpers soon will not do the right thing there. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: mask off the rtbitmap and summary inodes when metadir in useDarrick J. Wong
Set the rtbitmap and summary file inumbers to NULLFSINO in the superblock and make sure they're zeroed whenever we write the superblock to disk, to mimic mkfs behavior. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: scrub metadir paths for rtgroup metadataDarrick J. Wong
Add the code we need to scan the metadata directory paths of rt group metadata files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: scrub the realtime group superblockDarrick J. Wong
Enable scrubbing of realtime group superblocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: make the RT allocator rtgroup awareChristoph Hellwig
Make the allocator rtgroup aware by either picking a specific group if there is a hint, or loop over all groups otherwise. A simple rotor is provided to pick the placement for initial allocations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: don't merge ioends across RTGsDarrick J. Wong
Unlike AGs, RTGs don't always have metadata in their first blocks, and thus we don't get automatic protection from merging I/O completions across RTG boundaries. Add code to set the IOMAP_F_BOUNDARY flag for ioends that start at the first block of a RTG so that they never get merged into the previous ioend. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: use realtime EFI to free extents when rtgroups are enabledDarrick J. Wong
When rmap is enabled, XFS expects a certain order of operations, which is: 1) remove the file mapping, 2) remove the reverse mapping, and then 3) free the blocks. When reflink is enabled, XFS replaces (3) with a deferred refcount decrement operation that can schedule freeing the blocks if that was the last refcount. For realtime files, xfs_bmap_del_extent_real tries to do 1 and 3 in the same transaction, which will break both rmap and reflink unless we switch it to use realtime EFIs. Both rmap and reflink depend on the rtgroups feature, so let's turn on EFIs for all rtgroups filesystems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: support error injection when freeing rt extentsDarrick J. Wong
A handful of fstests expect to be able to test what happens when extent free intents fail to actually free the extent. Now that we're supporting EFIs for realtime extents, add to xfs_rtfree_extent the same injection point that exists in the regular extent freeing code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: support logging EFIs for realtime extentsDarrick J. Wong
Teach the EFI mechanism how to free realtime extents. We're going to need this to enforce proper ordering of operations when we enable realtime rmap. Declare a new log intent item type (XFS_LI_EFI_RT) and a separate defer ops for rt extents. This keeps the ondisk artifacts and processing code completely separate between the rt and non-rt cases. Hopefully this will make it easier to debug filesystem problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: grow the realtime section when realtime groups are enabledDarrick J. Wong
Enable growing the rt section when realtime groups are enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: encode the rtsummary in big endian formatDarrick J. Wong
Currently, the ondisk realtime summary file counters are accessed in units of 32-bit words. There's no endian translation of the contents of this file, which means that the Bad Things Happen(tm) if you go from (say) x86 to powerpc. Since we have a new feature flag, let's take the opportunity to enforce an endianness on the file. Encode the summary information in big endian format, like most of the rest of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: encode the rtbitmap in big endian formatDarrick J. Wong
Currently, the ondisk realtime bitmap file is accessed in units of 32-bit words. There's no endian translation of the contents of this file, which means that the Bad Things Happen(tm) if you go from (say) x86 to powerpc. Since we have a new feature flag, let's take the opportunity to enforce an endianness on the file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: add block headers to realtime bitmap and summary blocksDarrick J. Wong
Upgrade rtbitmap and rtsummary blocks to have self describing metadata like most every other thing in XFS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: export the geometry of realtime groups to userspaceDarrick J. Wong
Create an ioctl so that the kernel can report the status of realtime groups to userspace. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: record rt group metadata errors in the health systemDarrick J. Wong
Record the state of per-rtgroup metadata sickness in the rtgroup structure for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: add frextents to the lazysbcounters when rtgroups enabledDarrick J. Wong
Make the free rt extent count a part of the lazy sb counters when the realtime groups feature is enabled. This is possible because the patch to recompute frextents from the rtbitmap during log recovery predates the code adding rtgroup support, hence we know that the value will always be correct during runtime. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: add a helper to prevent bmap merges across rtgroup boundariesChristoph Hellwig
Except for the rt superblock, realtime groups do not store any metadata at the start (or end) of the group. There is nothing to prevent the bmap code from merging allocations from multiple groups into a single bmap record. Add a helper to check for this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> [djwong: massage the commit message after pulling this into rtgroups] Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: check that rtblock extents do not break rtsupers or rtgroupsDarrick J. Wong
Check that rt block pointers do not point to the realtime superblock and that allocated rt space extents do not cross rtgroup boundaries. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: export realtime group geometry via XFS_FSOP_GEOMDarrick J. Wong
Export the realtime geometry information so that userspace can query it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: update realtime super every time we update the primary fs superDarrick J. Wong
Every time we update parts of the primary filesystem superblock that are echoed in the rt superblock, we must update the rt super. Avoid changing the log to support logging to the rt device by using ordered buffers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: define the format of rt groupsDarrick J. Wong
Define the ondisk format of realtime group metadata, and a superblock for realtime volumes. rt supers are conditionally enabled by a predicate function so that they can be disabled if we ever implement zoned storage support for the realtime volume. For rt group enabled file systems there is a separate bitmap and summary file for each group and thus the number of bitmap and summary blocks needs to be calculated differently. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2024-11-05xfs: make RT extent numbers relative to the rtgroupChristoph Hellwig
To prepare for adding per-rtgroup bitmap files, make the xfs_rtxnum_t type encode the RT extent number relative to the rtgroup. The biggest part of this to clearly distinguish between the relative extent number that gets masked when converting from a global block number and length values that just have a factor applied to them when converting from file system blocks. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: refactor xfs_rtsummary_blockcountChristoph Hellwig
Make xfs_rtsummary_blockcount take all the required information from the mount structure and return the number of summary levels from it as well. This cleans up many of the callers and prepares for making the rtsummary files per-rtgroup where they need to look at different value. This means we recalculate some values in some callers, but as all these calculations are outside the fast path and cheap, which seems like a price worth paying. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: refactor xfs_rtbitmap_blockcountChristoph Hellwig
Rename the existing xfs_rtbitmap_blockcount to xfs_rtbitmap_blockcount_len and add a new xfs_rtbitmap_blockcount wrapper around it that takes the number of extents from the mount structure. This will simplify the move to per-rtgroup bitmaps as those will need to pass in the number of extents per rtgroup instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: support creating per-RTG files in growfsChristoph Hellwig
To support adding new RT groups in growfs, we need to be able to create the per-RT group files. Add a new xfs_rtginode_create helper to create a given per-RTG file. Most of the code for that is shared, but the details of the actual file are abstracted out using a new create method in struct xfs_rtginode_ops. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2024-11-05xfs: move RT bitmap and summary information to the rtgroupChristoph Hellwig
Move the pointers to the RT bitmap and summary inodes as well as the summary cache to the rtgroups structure to prepare for having a separate bitmap and summary inodes for each rtgroup. Code using the inodes now needs to operate on a rtgroup. Where easily possible such code is converted to iterate over all rtgroups, else rtgroup 0 (the only one that can currently exist) is hardcoded. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>