summaryrefslogtreecommitdiff
path: root/fs/bcachefs/btree_io.c
AgeCommit message (Collapse)Author
2024-04-03bcachefs: Flag btrees with missing dataKent Overstreet
We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-04-01bcachefs: BCH_WATERMARK_interior_updatesKent Overstreet
This adds a new watermark, higher priority than BCH_WATERMARK_reclaim, for interior btree updates. We've seen a deadlock where journal replay triggers a ton of btree node merges, and these use up all available open buckets and then interior updates get stuck. One cause of this is that we're currently lacking btree node merging on write buffer btrees - that needs to be fixed as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Fix btree node keys accounting in topology repair pathKent Overstreet
When dropping keys now outside a now because we're changing the node min/max, we need to redo the node's accounting as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-31bcachefs: Improved topology repair checksKent Overstreet
Consolidate bch2_gc_check_topology() and btree_node_interior_verify(), and replace them with an improved version, bch2_btree_node_check_topology(). This checks that children of an interior node correctly span the full range of the parent node with no overlaps. Also, ensure that topology repairs at runtime are always a fatal error; in particular, this adds a check in btree_iter_down() - if we don't find a key while walking down the btree that's indicative of a topology error and should be flagged as such, not a null ptr deref. Some checks in btree_update_interior.c remaining BUG_ONS(), because we already checked the node for topology errors when starting the update, and the assertions indicate that we _just_ corrupted the btree node - i.e. the problem can't be that existing on disk corruption, they indicate an actual algorithmic bug. In the future, we'll be annotating the fsck errors list with which recovery pass corrects them; the open coded "run explicit recovery pass or fatal error" in bch2_btree_node_check_topology() will in the future be done for every fsck_err() call. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-18bcachefs: Improve bch2_fatal_error()Kent Overstreet
error messages should always include __func__ Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-17bcachefs: Don't corrupt journal keys gap buffer when dropping alloc infoKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: Kill more -EIO error codesKent Overstreet
This converts -EIOs related to btree node errors to private error codes, which will help with some ongoing debugging by giving us better error messages. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-13bcachefs: kill kvpmalloc()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-03-10bcachefs: Correctly validate k->u64s in btree node read pathKent Overstreet
validate_bset_keys() never properly validated k->u64s; it checked if it was 0, but not if it was smaller than keys for the given packed format; this fixes that small oversight. This patch was backported, so it's adding quite a few error enums so that they don't get renumbered and we don't have confusing gaps. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-21bcachefs: Prep work for variable size btree node buffersKent Overstreet
bcachefs btree nodes are big - typically 256k - and btree roots are pinned in memory. As we're now up to 18 btrees, we now have significant memory overhead in mostly empty btree roots. And in the future we're going to start enforcing that certain btree node boundaries exist, to solve lock contention issues - analagous to XFS's AGIs. Thus, we need to start allocating smaller btree node buffers when we can. This patch changes code that refers to the filesystem constant c->opts.btree_node_size to refer to the btree node buffer size - btree_buf_bytes() - where appropriate. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: improve checksum error messagesKent Overstreet
new helpers: - bch2_csum_to_text() - bch2_csum_err_msg() standardize our checksum error messages a bit, and print out the checksums a bit more nicely. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: improve validate_bset_keys()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: add missing bch2_latency_acct() callKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-05bcachefs: add time_stats for btree_node_read_done()Kent Overstreet
Seeing weird latency issues in the btree node read path - add one bch2_btree_node_read_done(). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: bkey_for_each_ptr() now declares loop iterKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: better error message in btree_node_write_work()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Improve error message when finding wrong btree nodeKent Overstreet
single_device.merge_torture_flakey is, very rarely, finding a btree node that doesn't match the key that points to it: this patch improves the error message to print out more fields from the btree node header, so that we can see what else does or does not match the key. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Include btree_trans in more tracepointsKent Overstreet
This gives us more context information - e.g. which codepath is invoking btree node reads. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2024-01-01bcachefs: Rename BTREE_INSERT flagsKent Overstreet
BTREE_INSERT flags are actually transaction commit flags - rename them for clarity. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-12-03bcachefs: Don't drop journal pins in exit pathKent Overstreet
There's no need to drop journal pins in our exit paths - the code was trying to have everything cleaned up on any shutdown, but better to just tweak the assertions a bit. This fixes a bug where calling into journal reclaim in the exit path would cass a null ptr deref. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-24closures: CLOSURE_CALLBACK() to fix type punningKent Overstreet
Control flow integrity is now checking that type signatures match on indirect function calls. That breaks closures, which embed a work_struct in a closure in such a way that a closure_fn may also be used as a workqueue fn by the underlying closure code. So we have to change closure fns to take a work_struct as their argument - but that results in a loss of clarity, as closure fns have different semantics from normal workqueue functions (they run owning a ref on the closure, which must be released with continue_at() or closure_return()). Thus, this patc introduces CLOSURE_CALLBACK() and closure_type() macros as suggested by Kees, to smooth things over a bit. Suggested-by: Kees Cook <keescook@chromium.org> Cc: Coly Li <colyli@suse.de> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-05bcachefs: bkey_copy() is no longer a macroKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: Enumerate fsck errorsKent Overstreet
This patch adds a superblock error counter for every distinct fsck error; this means that when analyzing filesystems out in the wild we'll be able to see what sorts of inconsistencies are being found and repair, and hence what bugs to look for. Errors validating bkeys are not yet considered distinct fsck errors, but this patch adds a new helper, bkey_fsck_err(), in order to add distinct error types for them as well. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-11-01bcachefs: Add IO error counts to bch_memberKent Overstreet
We now track IO errors per device since filesystem creation. IO error counts can be viewed in sysfs, or with the 'bcachefs show-super' command. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-31bcachefs: bch2_btree_id_str()Kent Overstreet
Since we can run with unknown btree IDs, we can't directly index btree IDs into fixed size arrays. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Heap allocate btree_transKent Overstreet
We're using more stack than we'd like in a number of functions, and btree_trans is the biggest object that we stack allocate. But we have to do a heap allocatation to initialize it anyways, so there's no real downside to heap allocating the entire thing. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix W=12 build errorsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Break up io.cKent Overstreet
More reorganization, this splits up io.c into - io_read.c - io_misc.c - fallocate, fpunch, truncate - io_write.c Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Array bounds fixesKent Overstreet
It's no longer legal to use a zero size array as a flexible array member - this causes UBSAN to complain. This patch switches our zero size arrays to normal flexible array members when possible, and inserts casts in other places (e.g. where we use the zero size array as a marker partway through an array). Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BCH_COMPAT_bformat_overflow_done no longer requiredKent Overstreet
Awhile back, we changed bkey_format generation to ensure that the packed representation could never represent fields larger than the unpacked representation. This was to ensure that bkey_packed_successor() always gave a sensible result, but in the current code bkey_packed_successor() is only used in a debug assertion - not for anything important. This kills the requirement that we've gotten rid of those weird bkey formats, and instead changes the assertion to check if we're dealing with an old weird bkey format. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Convert btree_err_type to normal error codesKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix btree_err() macroKent Overstreet
Error code wasn't being propagated correctly, change it to match fsck_err() Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Log a message when running an explicit recovery passKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bkey_format helper improvementsKent Overstreet
- add a to_text() method for bkey_format - convert bch2_bkey_format_validate() to modern error message style, where we pass a printbuf for the error string instead of returning a static string Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Make topology repair a normal recovery passKent Overstreet
This adds bch2_run_explicit_recovery_pass(), for rewinding recovery and explicitly running a specific recovery pass - this is a more general replacement for how we were running topology repair before. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bcachefs_metadata_version_major_minorKent Overstreet
This introduces major/minor versioning to the superblock version number. Major version number changes indicate incompatible releases; we can move forward to a new major version number, but not backwards. Minor version numbers indicate compatible changes - these add features, but can still be mounted and used by old versions. With the recent patches that make it possible to roll out new btrees and key types without breaking compatibility, we should be able to roll out most new features without incompatible changes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Assorted sparse fixesKent Overstreet
- endianness fixes - mark some things static - fix a few __percpu annotations - fix silent enum conversions Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Allow for unknown btree IDsKent Overstreet
We need to allow filesystems with metadata from newer versions to be mountable and usable by older versions. This patch enables us to roll out new btrees without a new major version number; we can now handle btree roots for unknown btree types. The unknown btree roots will be retained, and fsck (including backpointers) will check them, the same as other btree types. We add a dynamic array for the extra, unknown btree roots, in addition to the fixed size btree root array, and add new helpers for looking up btree roots. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: bch2_version_compatible()Kent Overstreet
This adds a new helper for checking if an on-disk version is compatible with the running version of bcachefs - prep work for introducing major:minor version numbers. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Kill BTREE_INSERT_USE_RESERVEKent Overstreet
Now that we have journal watermarks and alloc watermarks unified, BTREE_INSERT_USE_RESERVE is redundant and can be deleted. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Fix btree node write error messageKent Overstreet
Error messages should include the error code, when available. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: GFP_NOIO -> GFP_NOFSKent Overstreet
GFP_NOIO dates from the bcache days, when we operated under the block layer. Now, GFP_NOFS is more appropriate, so switch all GFP_NOIO uses to GFP_NOFS. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22six locks: Kill six_lock_state unionKent Overstreet
As suggested by Linus, this drops the six_lock_state union in favor of raw bitmasks. On the one hand, bitfields give more type-level structure to the code. However, a significant amount of the code was working with six_lock_state as a u64/atomic64_t, and the conversions from the bitfields to the u64 were deemed a bit too out-there. More significantly, because bitfield order is poorly defined (#ifdef __LITTLE_ENDIAN_BITFIELD can be used, but is gross), incrementing the sequence number would overflow into the rest of the bitfield if the compiler didn't put the sequence number at the high end of the word. The new code is a bit saner when we're on an architecture without real atomic64_t support - all accesses to lock->state now go through atomic64_*() operations. On architectures with real atomic64_t support, we additionally use atomic bit ops for setting/clearing individual bits. Text size: 7467 bytes -> 4649 bytes - compilers still suck at bitfields. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Drop a redundant error messageKent Overstreet
When we're already read-only, we don't need to print out errors from writing btree nodes. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Private error codes: ENOMEMKent Overstreet
This adds private error codes for most (but not all) of our ENOMEM uses, which makes it easier to track down assorted allocation failures. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Drop some anonymous structs, unionsKent Overstreet
Rust bindgen doesn't cope well with anonymous structs and unions. This patch drops the fancy anonymous structs & unions in bkey_i that let us use the same helpers for bkey_i and bkey_packed; since bkey_packed is an internal type that's never exposed to outside code, it's only a minor inconvenienc. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: BKEY_PADDED_ONSTACK()Kent Overstreet
Rust bindgen doesn't do anonymous structs very nicely: BKEY_PADDED() only needs the anonymous struct when it's used on the stack, to guarantee layout, not when it's embedded in another struct. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Centralize btree node lock initializationKent Overstreet
This fixes some confusion in the lockdep code due to initializing btree node/key cache locks with the same lockdep key, but different names. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Plumb btree_trans through btree cache codeKent Overstreet
Soon, __bch2_btree_node_write() is going to require a btree_trans: zoned device support is going to require a new allocation for every btree node write. This is a bit of prep work. Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
2023-10-22bcachefs: Add some logging for btree node rewrites due to errorsKent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>