btrfs: track data relocation with simple quota

Relocation data allocations are quite tricky for simple quotas. The basic data relocation sequence is (ignoring details that aren't relevant to this fix): - create a fake relocation data fs root - create a fake relocation inode in that root - for each data extent: - preallocate a data extent on behalf of the fake inode - copy over the data - for each extent - swap the refs so that the original file extent now refers to the new extent item - drop the fake root, dropping its refs on the old extents, which lets us delete them. Done naively, this results in storing an extent item in the extent tree whose owner_ref points at the relocation data root and a no-op squota recording, since the reloc root is not a legit fstree. So far, that's OK. The problem comes when you do the swap, and leave an extent item owned by this bogus root as the real permanent extents of the file. If the file then drops that ref, we free it and no-op account that against the fake relocation root. Essentially, this means that relocation is simple quota "extent laundering", since we re-own the extents into a fake root. Simple quotas very intentionally doesn't have a mechanism for transferring ownership of extents, as that is exactly the complicated thing we are trying to avoid with the new design. Further, it cannot be correctly done in this case, since at the time you create the new "real" refs, there is no way to know which was the original owner before relocation unless we track it. Therefore, it makes more sense to trick the preallocation to handle relocation as a special case and note the proper owner ref from the beginning. That way, we never write out an extent item without the correct owner ref that it will eventually have. This could be done by wiring a special root parameter all the way through the allocation code path, but to avoid that special case touching all the code, take advantage of the serial nature of relocation to store the src root on the relocation root object. Then when we finish the prealloc, if it happens to be this case, prepare the delayed ref appropriately. We must also add logic to handle relocating adjacent extents with different owning roots. Those cannot be preallocated together in a cluster as it would lose the separate ownership information. This is obviously a smelly bit of code, but I think it is the best solution to the problem, given the relocation implementation. Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
author: Boris Burkov <boris@bur.io> 2023-06-28 14:00:09 -0700
committer: David Sterba <dsterba@suse.com> 2023-10-12 16:44:12 +0200
commit: 2672a051e3847401b80ed4e89c7af5be6b3f1be0 (patch)
tree: ae6c118e1e8ac8834b3c82182856930d9a23b5e0 /fs/btrfs/ctree.h
parent: 60ea105a0f9fd359a5d0f1a8a2fead7cfe0528d1 (diff)
1 files changed, 3 insertions, 0 deletions
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 0c059f20533d..2758fbae7e39 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -298,6 +298,9 @@ struct btrfs_root {
 	/* Used only by log trees, when logging csum items */
 	struct extent_io_tree log_csum_range;
 
+	/* Used in simple quotas, track root during relocation. */
+	u64 relocation_src_root;
+
 #ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
 	u64 alloc_bytenr;
 #endif
author	Boris Burkov <boris@bur.io>	2023-06-28 14:00:09 -0700
committer	David Sterba <dsterba@suse.com>	2023-10-12 16:44:12 +0200
commit	2672a051e3847401b80ed4e89c7af5be6b3f1be0 (patch)
tree	ae6c118e1e8ac8834b3c82182856930d9a23b5e0 /fs/btrfs/ctree.h
parent	60ea105a0f9fd359a5d0f1a8a2fead7cfe0528d1 (diff)