summaryrefslogtreecommitdiff
path: root/fs/verity
AgeCommit message (Collapse)Author
2024-06-12bpf: treewide: Align kfunc signatures to prog point-of-viewDaniel Xu
Previously, kfunc declarations in bpf_kfuncs.h (and others) used "user facing" types for kfuncs prototypes while the actual kfunc definitions used "kernel facing" types. More specifically: bpf_dynptr vs bpf_dynptr_kern, __sk_buff vs sk_buff, and xdp_md vs xdp_buff. It wasn't an issue before, as the verifier allows aliased types. However, since we are now generating kfunc prototypes in vmlinux.h (in addition to keeping bpf_kfuncs.h around), this conflict creates compilation errors. Fix this conflict by using "user facing" types in kfunc definitions. This results in more casts, but otherwise has no additional runtime cost. Note, similar to 5b268d1ebcdc ("bpf: Have bpf_rdonly_cast() take a const pointer"), we also make kfuncs take const arguments where appropriate in order to make the kfunc more permissive. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/b58346a63a0e66bc9b7504da751b526b0b189a67.1718207789.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-05-03fsverity: use register_sysctl_init() to avoid kmemleak warningEric Biggers
Since the fsverity sysctl registration runs as a builtin initcall, there is no corresponding sysctl deregistration and the resulting struct ctl_table_header is not used. This can cause a kmemleak warning just after the system boots up. (A pointer to the ctl_table_header is stored in the fsverity_sysctl_header static variable, which kmemleak should detect; however, the compiler can optimize out that variable.) Avoid the kmemleak warning by using register_sysctl_init() which is intended for use by builtin initcalls and uses kmemleak_not_leak(). Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/r/CAHj4cs8DTSvR698UE040rs_pX1k-WVe7aR6N2OoXXuhXJPDC-w@mail.gmail.com Cc: stable@vger.kernel.org Reviewed-by: Joel Granados <j.granados@samsung.com> Link: https://lore.kernel.org/r/20240501025331.594183-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-03-12Merge tag 'net-next-6.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code: - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter: - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF: - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless: - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API: - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc: - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers: - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro" * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits) nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y nexthop: Fix out-of-bounds access during attribute validation nexthop: Only parse NHA_OP_FLAGS for dump messages that require it nexthop: Only parse NHA_OP_FLAGS for get messages that require it bpf: move sleepable flag from bpf_prog_aux to bpf_prog bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes() selftests/bpf: Add kprobe multi triggering benchmarks ptp: Move from simple ida to xarray vxlan: Remove generic .ndo_get_stats64 vxlan: Do not alloc tstats manually devlink: Add comments to use netlink gen tool nfp: flower: handle acti_netdevs allocation failure net/packet: Add getsockopt support for PACKET_COPY_THRESH net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID selftests/bpf: Add bpf_arena_htab test. selftests/bpf: Add bpf_arena_list test. selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages bpf: Add helper macro bpf_addr_space_cast() libbpf: Recognize __arena global variables. bpftool: Recognize arena map type ...
2024-02-01fsverity: remove hash page spin lockAndrey Albershteyn
The spin lock is not necessary here as it can be replaced with memory barrier which should be better performance-wise. When Merkle tree block size differs from page size, in is_hash_block_verified() two things are modified during check - a bitmap and PG_checked flag of the page. Each bit in the bitmap represent verification status of the Merkle tree blocks. PG_checked flag tells if page was just re-instantiated or was in pagecache. Both of this states are shared between verification threads. Page which was re-instantiated can not have already verified blocks (bit set in bitmap). The spin lock was used to allow only one thread to modify both of these states and keep order of operations. The only requirement here is that PG_Checked is set strictly after bitmap is updated. This way other threads which see that PG_Checked=1 (page cached) knows that bitmap is up-to-date. Otherwise, if PG_Checked is set before bitmap is cleared, other threads can see bit=1 and therefore will not perform verification of that Merkle tree block. However, there's still the case when one thread is setting a bit in verify_data_block() and other thread is clearing it in is_hash_block_verified(). This can happen if two threads get to !PageChecked branch and one of the threads is rescheduled before resetting the bitmap. This is fine as at worst blocks are re-verified in each thread. Signed-off-by: Andrey Albershteyn <aalbersh@redhat.com> [ebiggers: improved the comment and removed the 'verified' variable] Link: https://lore.kernel.org/r/20240201052813.68380-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2024-01-31bpf: treewide: Annotate BPF kfuncs in BTFDaniel Xu
This commit marks kfuncs as such inside the .BTF_ids section. The upshot of these annotations is that we'll be able to automatically generate kfunc prototypes for downstream users. The process is as follows: 1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs 2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for each function inside BTF_KFUNCS sets 3. At runtime, vmlinux or module BTF is made available in sysfs 4. At runtime, bpftool (or similar) can look at provided BTF and generate appropriate prototypes for functions with "bpf_kfunc" tag To ensure future kfunc are similarly tagged, we now also return error inside kfunc registration for untagged kfuncs. For vmlinux kfuncs, we also WARN(), as initcall machinery does not handle errors. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Acked-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-11Merge tag 'net-next-6.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "The most interesting thing is probably the networking structs reorganization and a significant amount of changes is around self-tests. Core & protocols: - Analyze and reorganize core networking structs (socks, netdev, netns, mibs) to optimize cacheline consumption and set up build time warnings to safeguard against future header changes This improves TCP performances with many concurrent connections up to 40% - Add page-pool netlink-based introspection, exposing the memory usage and recycling stats. This helps indentify bad PP users and possible leaks - Refine TCP/DCCP source port selection to no longer favor even source port at connect() time when IP_LOCAL_PORT_RANGE is set. This lowers the time taken by connect() for hosts having many active connections to the same destination - Refactor the TCP bind conflict code, shrinking related socket structs - Refactor TCP SYN-Cookie handling, as a preparation step to allow arbitrary SYN-Cookie processing via eBPF - Tune optmem_max for 0-copy usage, increasing the default value to 128KB and namespecifying it - Allow coalescing for cloned skbs coming from page pools, improving RX performances with some common configurations - Reduce extension header parsing overhead at GRO time - Add bridge MDB bulk deletion support, allowing user-space to request the deletion of matching entries - Reorder nftables struct members, to keep data accessed by the datapath first - Introduce TC block ports tracking and use. This allows supporting multicast-like behavior at the TC layer - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and classifiers (RSVP and tcindex) - More data-race annotations - Extend the diag interface to dump TCP bound-only sockets - Conditional notification of events for TC qdisc class and actions - Support for WPAN dynamic associations with nearby devices, to form a sub-network using a specific PAN ID - Implement SMCv2.1 virtual ISM device support - Add support for Batman-avd mulicast packet type BPF: - Tons of verifier improvements: - BPF register bounds logic and range support along with a large test suite - log improvements - complete precision tracking support for register spills - track aligned STACK_ZERO cases as imprecise spilled registers. This improves the verifier "instructions processed" metric from single digit to 50-60% for some programs - support for user's global BPF subprogram arguments with few commonly requested annotations for a better developer experience - support tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like - several fixes - Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload - Fix kCFI bugs in BPF all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y - Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques - Support uid/gid options when mounting bpffs - Add a new kfunc which acquires the associated cgroup of a task within a specific cgroup v1 hierarchy where the latter is identified by its id - Extend verifier to allow bpf_refcount_acquire() of a map value field obtained via direct load which is a use-case needed in sched_ext - Add BPF link_info support for uprobe multi link along with bpftool integration for the latter - Support for VLAN tag in XDP hints - Remove deprecated bpfilter kernel leftovers given the project is developed in user-space (https://github.com/facebook/bpfilter) Misc: - Support for parellel TC self-tests execution - Increase MPTCP self-tests coverage - Updated the bridge documentation, including several so-far undocumented features - Convert all the net self-tests to run in unique netns, to avoid random failures due to conflict and allow concurrent runs - Add TCP-AO self-tests - Add kunit tests for both cfg80211 and mac80211 - Autogenerate Netlink families documentation from YAML spec - Add yml-gen support for fixed headers and recursive nests, the tool can now generate user-space code for all genetlink families for which we have specs - A bunch of additional module descriptions fixes - Catch incorrect freeing of pages belonging to a page pool Driver API: - Rust abstractions for network PHY drivers; do not cover yet the full C API, but already allow implementing functional PHY drivers in rust - Introduce queue and NAPI support in the netdev Netlink interface, allowing complete access to the device <> NAPIs <> queues relationship - Introduce notifications filtering for devlink to allow control application scale to thousands of instances - Improve PHY validation, requesting rate matching information for each ethtool link mode supported by both the PHY and host - Add support for ethtool symmetric-xor RSS hash - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD platform - Expose pin fractional frequency offset value over new DPLL generic netlink attribute - Convert older drivers to platform remove callback returning void - Add support for PHY package MMD read/write New hardware / drivers: - Ethernet: - Octeon CN10K devices - Broadcom 5760X P7 - Qualcomm SM8550 SoC - Texas Instrument DP83TG720S PHY - Bluetooth: - IMC Networks Bluetooth radio Removed: - WiFi: - libertas 16-bit PCMCIA support - Atmel at76c50x drivers - HostAP ISA/PCMCIA style 802.11b driver - zd1201 802.11b USB dongles - Orinoco ISA/PCMCIA 802.11b driver - Aviator/Raytheon driver - Planet WL3501 driver - RNDIS USB 802.11b driver Driver updates: - Ethernet high-speed NICs: - Intel (100G, ice, idpf): - allow one by one port representors creation and removal - add temperature and clock information reporting - add get/set for ethtool's header split ringparam - add again FW logging - adds support switchdev hardware packet mirroring - iavf: implement symmetric-xor RSS hash - igc: add support for concurrent physical and free-running timers - i40e: increase the allowable descriptors - nVidia/Mellanox: - Preparation for Socket-Direct multi-dev netdev. That will allow in future releases combining multiple PFs devices attached to different NUMA nodes under the same netdev - Broadcom (bnxt): - TX completion handling improvements - add basic ntuple filter support - reduce MSIX vectors usage for MQPRIO offload - add VXLAN support, USO offload and TX coalesce completion for P7 - Marvell Octeon EP: - xmit-more support - add PF-VF mailbox support and use it for FW notifications for VFs - Wangxun (ngbe/txgbe): - implement ethtool functions to operate pause param, ring param, coalesce channel number and msglevel - Netronome/Corigine (nfp): - add flow-steering support - support UDP segmentation offload - Ethernet NICs embedded, slower, virtual: - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver - stmmac: add support for HW-accelerated VLAN stripping - TI AM654x sw: add mqprio, frame preemption & coalescing - gve: add support for non-4k page sizes. - virtio-net: support dynamic coalescing moderation - nVidia/Mellanox Ethernet datacenter switches: - allow firmware upgrade without a reboot - more flexible support for bridge flooding via the compressed FID flooding mode - Ethernet embedded switches: - Microchip: - fine-tune flow control and speed configurations in KSZ8xxx - KSZ88X3: enable setting rmii reference - Renesas: - add jumbo frames support - Marvell: - 88E6xxx: add "eth-mac" and "rmon" stats support - Ethernet PHYs: - aquantia: add firmware load support - at803x: refactor the driver to simplify adding support for more chip variants - NXP C45 TJA11xx: Add MACsec offload support - Wifi: - MediaTek (mt76): - NVMEM EEPROM improvements - mt7996 Extremely High Throughput (EHT) improvements - mt7996 Wireless Ethernet Dispatcher (WED) support - mt7996 36-bit DMA support - Qualcomm (ath12k): - support for a single MSI vector - WCN7850: support AP mode - Intel (iwlwifi): - new debugfs file fw_dbg_clear - allow concurrent P2P operation on DFS channels - Bluetooth: - QCA2066: support HFP offload - ISO: more broadcast-related improvements - NXP: better recovery in case receiver/transmitter get out of sync" * tag 'net-next-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1714 commits) lan78xx: remove redundant statement in lan78xx_get_eee lan743x: remove redundant statement in lan743x_ethtool_get_eee bnxt_en: Fix RCU locking for ntuple filters in bnxt_rx_flow_steer() bnxt_en: Fix RCU locking for ntuple filters in bnxt_srxclsrldel() bnxt_en: Remove unneeded variable in bnxt_hwrm_clear_vnic_filter() tcp: Revert no longer abort SYN_SENT when receiving some ICMP Revert "mlx5 updates 2023-12-20" Revert "net: stmmac: Enable Per DMA Channel interrupt" ipvlan: Remove usage of the deprecated ida_simple_xx() API ipvlan: Fix a typo in a comment net/sched: Remove ipt action tests net: stmmac: Use interrupt mode INTM=1 for per channel irq net: stmmac: Add support for TX/RX channel interrupt net: stmmac: Make MSI interrupt routine generic dt-bindings: net: snps,dwmac: per channel irq net: phy: at803x: make read_status more generic net: phy: at803x: add support for cdt cross short test for qca808x net: phy: at803x: refactor qca808x cable test get status function net: phy: at803x: generalize cdt fault length function net: ethernet: cortina: Drop TSO support ...
2023-12-28fs: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel elements ctl_table struct. Special attention was placed in making sure that an empty directory for fs/verity was created when CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the register sysctl call that expects a size. Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-12-01bpf, fsverity: Add kfunc bpf_get_fsverity_digestSong Liu
fsverity provides fast and reliable hash of files, namely fsverity_digest. The digest can be used by security solutions to verify file contents. Add new kfunc bpf_get_fsverity_digest() so that we can access fsverity from BPF LSM programs. This kfunc is added to fs/verity/measure.c because some data structure used in the function is private to fsverity (fs/verity/fsverity_private.h). To avoid recursion, bpf_get_fsverity_digest is only allowed in BPF LSM programs. Signed-off-by: Song Liu <song@kernel.org> Acked-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20231129234417.856536-3-song@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-08-20fsverity: skip PKCS#7 parser when keyring is emptyEric Biggers
If an fsverity builtin signature is given for a file but the ".fs-verity" keyring is empty, there's no real reason to run the PKCS#7 parser. Skip this to avoid the PKCS#7 attack surface when builtin signature support is configured into the kernel but is not being used. This is a hardening improvement, not a fix per se, but I've added Fixes and Cc stable to get it out to more users. Fixes: 432434c9f8e1 ("fs-verity: support builtin file signatures") Cc: stable@vger.kernel.org Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lore.kernel.org/r/20230820173237.2579-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-07-11fsverity: move sysctl registration out of signature.cEric Biggers
Currently the registration of the fsverity sysctls happens in signature.c, which couples it to CONFIG_FS_VERITY_BUILTIN_SIGNATURES. This makes it hard to add new sysctls unrelated to builtin signatures. Also, some users have started checking whether the directory /proc/sys/fs/verity exists as a way to tell whether fsverity is supported. This isn't the intended method; instead, the existence of /sys/fs/$fstype/features/verity should be checked, or users should just try to use the fsverity ioctls. Regardless, it should be made to work as expected without a dependency on CONFIG_FS_VERITY_BUILTIN_SIGNATURES. Therefore, move the sysctl registration into init.c. With CONFIG_FS_VERITY_BUILTIN_SIGNATURES, nothing changes. Without it, but with CONFIG_FS_VERITY, an empty list of sysctls is now registered. Link: https://lore.kernel.org/r/20230705212743.42180-3-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-07-11fsverity: simplify handling of errors during initcallEric Biggers
Since CONFIG_FS_VERITY is a bool, not a tristate, fs/verity/ can only be builtin or absent entirely; it can't be a loadable module. Therefore, the error code that gets returned from the fsverity_init() initcall is never used. If any part of the initcall does fail, which should never happen, the kernel will be left in a bad state. Following the usual convention for builtin code, just panic the kernel if any of part of the initcall fails. Link: https://lore.kernel.org/r/20230705212743.42180-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-07-11fsverity: explicitly check that there is no algorithm 0Eric Biggers
Since libfsverity and some other code would break if 0 is ever allocated as an FS_VERITY_HASH_ALG_* value, make fsverity_check_hash_algs() explicitly check that there is no algorithm 0. Link: https://lore.kernel.org/r/20230705211719.37713-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-20fsverity: improve documentation for builtin signature supportEric Biggers
fsverity builtin signatures (CONFIG_FS_VERITY_BUILTIN_SIGNATURES) aren't the only way to do signatures with fsverity, and they have some major limitations. Yet, more users have tried to use them, e.g. recently by https://github.com/ostreedev/ostree/pull/2640. In most cases this seems to be because users aren't sufficiently familiar with the limitations of this feature and what the alternatives are. Therefore, make some updates to the documentation to try to clarify the properties of this feature and nudge users in the right direction. Note that the Integrity Policy Enforcement (IPE) LSM, which is not yet upstream, is planned to use the builtin signatures. (This differs from IMA, which uses its own signature mechanism.) For that reason, my earlier patch "fsverity: mark builtin signatures as deprecated" (https://lore.kernel.org/r/20221208033548.122704-1-ebiggers@kernel.org), which marked builtin signatures as "deprecated", was controversial. This patch therefore stops short of marking the feature as deprecated. I've also revised the language to focus on better explaining the feature and what its alternatives are. Link: https://lore.kernel.org/r/20230620041937.5809-1-ebiggers@kernel.org Reviewed-by: Colin Walters <walters@verbum.org> Reviewed-by: Luca Boccassi <bluca@debian.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-14fsverity: rework fsverity_get_digest() againEric Biggers
Address several issues with the calling convention and documentation of fsverity_get_digest(): - Make it provide the hash algorithm as either a FS_VERITY_HASH_ALG_* value or HASH_ALGO_* value, at the caller's choice, rather than only a HASH_ALGO_* value as it did before. This allows callers to work with the fsverity native algorithm numbers if they want to. HASH_ALGO_* is what IMA uses, but other users (e.g. overlayfs) should use FS_VERITY_HASH_ALG_* to match fsverity-utils and the fsverity UAPI. - Make it return the digest size so that it doesn't need to be looked up separately. Use the return value for this, since 0 works nicely for the "file doesn't have fsverity enabled" case. This also makes it clear that no other errors are possible. - Rename the 'digest' parameter to 'raw_digest' and clearly document that it is only useful in combination with the algorithm ID. This hopefully clears up a point of confusion. - Export it to modules, since overlayfs will need it for checking the fsverity digests of lowerdata files (https://lore.kernel.org/r/dd294a44e8f401e6b5140029d8355f88748cd8fd.1686565330.git.alexl@redhat.com). Acked-by: Mimi Zohar <zohar@linux.ibm.com> # for the IMA piece Link: https://lore.kernel.org/r/20230612190047.59755-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-04fsverity: simplify error handling in verify_data_block()Eric Biggers
Clean up the error handling in verify_data_block() to (a) eliminate the 'err' variable which has caused some confusion because the function actually returns a bool, (b) reduce the compiled code size slightly, and (c) execute one fewer branch in the success case. Link: https://lore.kernel.org/r/20230604022312.48532-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-04fsverity: don't use bio_first_page_all() in fsverity_verify_bio()Eric Biggers
bio_first_page_all(bio)->mapping->host is not compatible with large folios, since the first page of the bio is not necessarily the head page of the folio, and therefore it might not have the mapping pointer set. Therefore, move the dereference of ->mapping->host into verify_data_blocks(), which works with a folio. (Like the commit that this Fixes, this hasn't actually been tested with large folios yet, since the filesystems that use fs/verity/ don't support that yet. But based on code review, I think this is needed.) Fixes: 5d0f0e57ed90 ("fsverity: support verifying data from large folios") Link: https://lore.kernel.org/r/20230604022101.48342-1-ebiggers@kernel.org Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-04fsverity: constify fsverity_hash_algEric Biggers
Now that fsverity_hash_alg doesn't have an embedded mempool, it can be 'const' almost everywhere. Add it. Link: https://lore.kernel.org/r/20230604022348.48658-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-06-04fsverity: use shash API instead of ahash APIEric Biggers
The "ahash" API, like the other scatterlist-based crypto APIs such as "skcipher", comes with some well-known limitations. First, it can't easily be used with vmalloc addresses. Second, the request struct can't be allocated on the stack. This adds complexity and a possible failure point that needs to be worked around, e.g. using a mempool. The only benefit of ahash over "shash" is that ahash is needed to access traditional memory-to-memory crypto accelerators, i.e. drivers/crypto/. However, this style of crypto acceleration has largely fallen out of favor and been superseded by CPU-based acceleration or inline crypto engines. Also, ahash needs to be used asynchronously to take full advantage of such hardware, but fs/verity/ has never done this. On all systems that aren't actually using one of these ahash-only crypto accelerators, ahash just adds unnecessary overhead as it sits between the user and the underlying shash algorithms. Also, XFS is planned to cache fsverity Merkle tree blocks in the existing XFS buffer cache. As a result, it will be possible for a single Merkle tree block to be split across discontiguous pages (https://lore.kernel.org/r/20230405233753.GU3223426@dread.disaster.area). This data will need to be hashed. It is easiest to work with a vmapped address in this case. However, ahash is incompatible with this. Therefore, let's convert fs/verity/ from ahash to shash. This simplifies the code, and it should also slightly improve performance for everyone who wasn't actually using one of these ahash-only crypto accelerators, i.e. almost everyone (or maybe even everyone)! Link: https://lore.kernel.org/r/20230516052306.99600-1-ebiggers@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-04-11fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fdsEric Biggers
Commit 56124d6c87fd ("fsverity: support enabling with tree block size < PAGE_SIZE") changed FS_IOC_ENABLE_VERITY to use __kernel_read() to read the file's data, instead of direct pagecache accesses. An unintended consequence of this is that the 'WARN_ON_ONCE(!(file->f_mode & FMODE_READ))' in __kernel_read() became reachable by fuzz tests. This happens if FS_IOC_ENABLE_VERITY is called on a fd opened with access mode 3, which means "ioctl access only". Arguably, FS_IOC_ENABLE_VERITY should work on ioctl-only fds. But ioctl-only fds are a weird Linux extension that is rarely used and that few people even know about. (The documentation for FS_IOC_ENABLE_VERITY even specifically says it requires O_RDONLY.) It's probably not worthwhile to make the ioctl internally open a new fd just to handle this case. Thus, just reject the ioctl on such fds for now. Fixes: 56124d6c87fd ("fsverity: support enabling with tree block size < PAGE_SIZE") Reported-by: syzbot+51177e4144d764827c45@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=2281afcbbfa8fdb92f9887479cc0e4180f1c6b28 Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230406215106.235829-1-ebiggers@kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-04-11fsverity: explicitly check for buffer overflow in build_merkle_tree()Eric Biggers
The new Merkle tree construction algorithm is a bit fragile in that it may overflow the 'root_hash' array if the tree actually generated does not match the calculated tree parameters. This should never happen unless there is a filesystem bug that allows the file size to change despite deny_write_access(), or a bug in the Merkle tree logic itself. Regardless, it's fairly easy to check for buffer overflow here, so let's do so. This is a robustness improvement only; this case is not currently known to be reachable. I've added a Fixes tag anyway, since I recommend that this be included in kernels that have the mentioned commit. Fixes: 56124d6c87fd ("fsverity: support enabling with tree block size < PAGE_SIZE") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230328041505.110162-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-04-11fsverity: use WARN_ON_ONCE instead of WARN_ONEric Biggers
As per Linus's suggestion (https://lore.kernel.org/r/CAHk-=whefxRGyNGzCzG6BVeM=5vnvgb-XhSeFJVxJyAxAF8XRA@mail.gmail.com), use WARN_ON_ONCE instead of WARN_ON. This barely adds any extra overhead, and it makes it so that if any of these ever becomes reachable (they shouldn't, but that's the point), the logs can't be flooded. Link: https://lore.kernel.org/r/20230406181542.38894-1-ebiggers@kernel.org Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-27fs-verity: simplify sysctls with register_sysctl()Luis Chamberlain
register_sysctl_paths() is only needed if your child (directories) have entries but this does not so just use register_sysctl() so to do away with the path specification. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20230302202826.776286-10-mcgrof@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-15fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITYEric Biggers
The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing performance problems and is hindering adoption of fsverity. It was intended to solve a race condition where unverified pages might be left in the pagecache. But actually it doesn't solve it fully. Since the incomplete solution for this race condition has too much performance impact for it to be worth it, let's remove it for now. Fixes: 3fda4c617e84 ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl") Cc: stable@vger.kernel.org Reviewed-by: Victor Hsieh <victorhsieh@google.com> Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-14fsverity: Remove WQ_UNBOUND from fsverity read workqueueNathan Huckleberry
WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This is problematic for latency sensitive workloads, like I/O post-processing. Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related scheduler latency and improves app cold startup times by ~30ms. WQ_UNBOUND was also removed from the dm-verity workqueue for the same reason [1]. This code was tested by running Android app startup benchmarks and measuring how long the fsverity workqueue spent in the runnable state. Before Total workqueue scheduler latency: 553800us After Total workqueue scheduler latency: 18962us [1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/ Signed-off-by: Nathan Huckleberry <nhuck@google.com> Fixes: 8a1d0f9cacc9 ("fs-verity: add data verification hooks for ->readpages()") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230310193325.620493-1-nhuck@google.com Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-01-27fsverity: support verifying data from large foliosEric Biggers
Try to make fs/verity/verify.c aware of large folios. This includes making fsverity_verify_bio() support the case where the bio contains large folios, and adding a function fsverity_verify_folio() which is the equivalent of fsverity_verify_page(). There's no way to actually test this with large folios yet, but I've tested that this doesn't cause any regressions. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230127221529.299560-1-ebiggers@kernel.org
2023-01-09fsverity: support enabling with tree block size < PAGE_SIZEEric Biggers
Make FS_IOC_ENABLE_VERITY support values of fsverity_enable_arg::block_size other than PAGE_SIZE. To make this possible, rework build_merkle_tree(), which was reading data and hash pages from the file and assuming that they were the same thing as "blocks". For reading the data blocks, just replace the direct pagecache access with __kernel_read(), to naturally read one block at a time. (A disadvantage of the above is that we lose the two optimizations of hashing the pagecache pages in-place and forcing the maximum readahead. That shouldn't be very important, though.) The hash block reads are a bit more difficult to handle, as the only way to do them is through fsverity_operations::read_merkle_tree_page(). Instead, let's switch to the single-pass tree construction algorithm that fsverity-utils uses. This eliminates the need to read back any hash blocks while the tree is being built, at the small cost of an extra block-sized memory buffer per Merkle tree level. This is probably what I should have done originally. Taken together, the above two changes result in page-size independent code that is also a bit simpler than what we had before. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-8-ebiggers@kernel.org
2023-01-09fsverity: support verification with tree block size < PAGE_SIZEEric Biggers
Add support for verifying data from verity files whose Merkle tree block size is less than the page size. The main use case for this is to allow a single Merkle tree block size to be used across all systems, so that only one set of fsverity file digests and signatures is needed. To do this, eliminate various assumptions that the Merkle tree block size and the page size are the same: - Make fsverity_verify_page() a wrapper around a new function fsverity_verify_blocks() which verifies one or more blocks in a page. - When a Merkle tree block is needed, get the corresponding page and only verify and use the needed portion. (The Merkle tree continues to be read and cached in page-sized chunks; that doesn't need to change.) - When the Merkle tree block size and page size differ, use a bitmap fsverity_info::hash_block_verified to keep track of which Merkle tree blocks have been verified, as PageChecked cannot be used directly. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-7-ebiggers@kernel.org
2023-01-09fsverity: replace fsverity_hash_page() with fsverity_hash_block()Eric Biggers
In preparation for allowing the Merkle tree block size to differ from PAGE_SIZE, replace fsverity_hash_page() with fsverity_hash_block(). The new function is similar to the old one, but it operates on the block at the given offset in the page instead of on the full page. (For now, all callers still pass a full page.) Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-6-ebiggers@kernel.org
2023-01-09fsverity: use EFBIG for file too large to enable verityEric Biggers
Currently, there is an implementation limit where files can't have more than 8 Merkle tree levels. With SHA-256 and 4K blocks, this limit is never reached, since a file would need to be larger than 2**64 bytes to need 9 levels. However, with SHA-512, 9 levels are needed for files larger than about 1.15 EB, which is possible on btrfs. Therefore, this limit technically became reachable when btrfs added fsverity support. Meanwhile, support for merkle_tree_block_size < PAGE_SIZE will introduce another implementation limit on file size, resulting from the use of an in-memory bitmap to track which Merkle tree blocks have been verified. In any case, currently FS_IOC_ENABLE_VERITY fails with EINVAL when the file is too large. This is undocumented, and also ambiguous since EINVAL can mean other things too. Let's change the error code to EFBIG, which is much clearer, and document it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-5-ebiggers@kernel.org
2023-01-09fsverity: store log2(digest_size) precomputedEric Biggers
Add log_digestsize to struct merkle_tree_params so that it can be used in verify.c. Also save memory by using u8 for all the log_* fields. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-4-ebiggers@kernel.org
2023-01-09fsverity: simplify Merkle tree readahead size calculationEric Biggers
First, calculate max_ra_pages more efficiently by using the bio size. Second, calculate the number of readahead pages from the hash page index, instead of calculating it ahead of time using the data page index. This ends up being a bit simpler, especially since level 0 is last in the tree, so we can just limit the readahead to the tree size. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-3-ebiggers@kernel.org
2023-01-09fsverity: use unsigned long for level_startEric Biggers
fs/verity/ isn't consistent with whether Merkle tree block indices are 'unsigned long' or 'u64'. There's no real point to using u64 for them, though, since (a) a Merkle tree with over ULONG_MAX blocks would only be needed for a file larger than MAX_LFS_FILESIZE, and (b) for reads, the status of all Merkle tree blocks has to be tracked in memory. Therefore, let's make things a bit more efficient on 32-bit systems by using 'unsigned long[]' for merkle_tree_params::level_start, instead of 'u64[]'. Also, to be extra safe, explicitly check that there aren't more than ULONG_MAX Merkle tree blocks. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Andrey Albershteyn <aalbersh@redhat.com> Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://lore.kernel.org/r/20221223203638.41293-2-ebiggers@kernel.org
2023-01-01fsverity: remove debug messages and CONFIG_FS_VERITY_DEBUGEric Biggers
I've gotten very little use out of these debug messages, and I'm not aware of anyone else having used them. Indeed, sprinkling pr_debug around is not really a best practice these days, especially for filesystem code. Tracepoints are used instead. Let's just remove these and start from a clean slate. This change does not affect info, warning, and error messages. Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221215060420.60692-1-ebiggers@kernel.org
2023-01-01fsverity: pass pos and size to ->write_merkle_tree_blockEric Biggers
fsverity_operations::write_merkle_tree_block is passed the index of the block to write and the log base 2 of the block size. However, all implementations of it use these parameters only to calculate the position and the size of the block, in bytes. Therefore, make ->write_merkle_tree_block take 'pos' and 'size' parameters instead of 'index' and 'log_blocksize'. Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20221214224304.145712-5-ebiggers@kernel.org
2023-01-01fsverity: optimize fsverity_cleanup_inode() on non-verity filesEric Biggers
Make fsverity_cleanup_inode() an inline function that checks for non-NULL ->i_verity_info, then (if needed) calls __fsverity_cleanup_inode() to do the real work. This reduces the overhead on non-verity files. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20221214224304.145712-4-ebiggers@kernel.org
2023-01-01fsverity: optimize fsverity_prepare_setattr() on non-verity filesEric Biggers
Make fsverity_prepare_setattr() an inline function that does the IS_VERITY() check, then (if needed) calls __fsverity_prepare_setattr() to do the real work. This reduces the overhead on non-verity files. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20221214224304.145712-3-ebiggers@kernel.org
2023-01-01fsverity: optimize fsverity_file_open() on non-verity filesEric Biggers
Make fsverity_file_open() an inline function that does the IS_VERITY() check, then (if needed) calls __fsverity_file_open() to do the real work. This reduces the overhead on non-verity files. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Dave Chinner <dchinner@redhat.com> Link: https://lore.kernel.org/r/20221214224304.145712-2-ebiggers@kernel.org
2022-11-29fsverity: simplify fsverity_get_digest()Eric Biggers
Instead of looking up the algorithm by name in hash_algo_name[] to get its hash_algo ID, just store the hash_algo ID in the fsverity_hash_alg struct. Verify at boot time that every fsverity_hash_alg has a valid hash_algo ID with matching digest size. Remove an unnecessary memset() of the whole digest array to 0 before the digest is copied into it. Finally, remove the pr_debug statement. There is already a pr_debug for the fsverity digest when the file is opened. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> Link: https://lore.kernel.org/r/20221129045139.69803-1-ebiggers@kernel.org
2022-11-28fsverity: stop using PG_error to track error statusEric Biggers
As a step towards freeing the PG_error flag for other uses, change ext4 and f2fs to stop using PG_error to track verity errors. Instead, if a verity error occurs, just mark the whole bio as failed. The coarser granularity isn't really a problem since it isn't any worse than what the block layer provides, and errors from a multi-page readahead aren't reported to applications unless a single-page read fails too. f2fs supports compression, which makes the f2fs changes a bit more complicated than desired, but the basic premise still works. Note: there are still a few uses of PageError in f2fs, but they are on the write path, so they are unrelated and this patch doesn't touch them. Reviewed-by: Chao Yu <chao@kernel.org> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20221129070401.156114-1-ebiggers@kernel.org
2022-10-06Merge tag 'for-6.1-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "There's a bunch of performance improvements, most notably the FIEMAP speedup, the new block group tree to speed up mount on large filesystems, more io_uring integration, some sysfs exports and the usual fixes and core updates. Summary: Performance: - outstanding FIEMAP speed improvement - algorithmic change how extents are enumerated leads to orders of magnitude speed boost (uncached and cached) - extent sharing check speedup (2.2x uncached, 3x cached) - add more cancellation points, allowing to interrupt seeking in files with large number of extents - more efficient hole and data seeking (4x uncached, 1.3x cached) - sample results: 256M, 32K extents: 4s -> 29ms (~150x) 512M, 64K extents: 30s -> 59ms (~550x) 1G, 128K extents: 225s -> 120ms (~1800x) - improved inode logging, especially for directories (on dbench workload throughput +25%, max latency -21%) - improved buffered IO, remove redundant extent state tracking, lowering memory consumption and avoiding rb tree traversal - add sysfs tunable to let qgroup temporarily skip exact accounting when deleting snapshot, leading to a speedup but requiring a rescan after that, will be used by snapper - support io_uring and buffered writes, until now it was just for direct IO, with the no-wait semantics implemented in the buffered write path it now works and leads to speed improvement in IOPS (2x), throughput (2.2x), latency (depends, 2x to 150x) - small performance improvements when dropping and searching for extent maps as well as when flushing delalloc in COW mode (throughput +5MB/s) User visible changes: - new incompatible feature block-group-tree adding a dedicated tree for tracking block groups, this allows a much faster load during mount and avoids seeking unlike when it's scattered in the extent tree items - this reduces mount time for many-terabyte sized filesystems - conversion tool will be provided so existing filesystem can also be updated in place - to reduce test matrix and feature combinations requires no-holes and free-space-tree (mkfs defaults since 5.15) - improved reporting of super block corruption detected by scrub - scrub also tries to repair super block and does not wait until next commit - discard stats and tunables are exported in sysfs (/sys/fs/btrfs/FSID/discard) - qgroup status is exported in sysfs (/sys/sys/fs/btrfs/FSID/qgroups/) - verify that super block was not modified when thawing filesystem Fixes: - FIEMAP fixes - fix extent sharing status, does not depend on the cached status where merged - flush delalloc so compressed extents are reported correctly - fix alignment of VMA for memory mapped files on THP - send: fix failures when processing inodes with no links (orphan files and directories) - fix race between quota enable and quota rescan ioctl - handle more corner cases for read-only compat feature verification - fix missed extent on fsync after dropping extent maps Core: - lockdep annotations to validate various transactions states and state transitions - preliminary support for fs-verity in send - more effective memory use in scrub for subpage where sector is smaller than page - block group caching progress logic has been removed, load is now synchronous - simplify end IO callbacks and bio handling, use chained bios instead of own tracking - add no-wait semantics to several functions (tree search, nocow, flushing, buffered write - cleanups and refactoring MM changes: - export balance_dirty_pages_ratelimited_flags" * tag 'for-6.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (177 commits) btrfs: set generation before calling btrfs_clean_tree_block in btrfs_init_new_buffer btrfs: drop extent map range more efficiently btrfs: avoid pointless extent map tree search when flushing delalloc btrfs: remove unnecessary next extent map search btrfs: remove unnecessary NULL pointer checks when searching extent maps btrfs: assert tree is locked when clearing extent map from logging btrfs: remove unnecessary extent map initializations btrfs: remove the refcount warning/check at free_extent_map() btrfs: add helper to replace extent map range with a new extent map btrfs: move open coded extent map tree deletion out of inode eviction btrfs: use cond_resched_rwlock_write() during inode eviction btrfs: use extent_map_end() at btrfs_drop_extent_map_range() btrfs: move btrfs_drop_extent_cache() to extent_map.c btrfs: fix missed extent on fsync after dropping extent maps btrfs: remove stale prototype of btrfs_write_inode btrfs: enable nowait async buffered writes btrfs: assert nowait mode is not used for some btree search functions btrfs: make btrfs_buffered_write nowait compatible btrfs: plumb NOWAIT through the write path btrfs: make lock_and_cleanup_extent_if_need nowait compatible ...
2022-09-26btrfs: send: add support for fs-verityBoris Burkov
Preserve the fs-verity status of a btrfs file across send/recv. There is no facility for installing the Merkle tree contents directly on the receiving filesystem, so we package up the parameters used to enable verity found in the verity descriptor. This gives the receive side enough information to properly enable verity again. Note that this means that receive will have to re-compute the whole Merkle tree, similar to how compression worked before encoded_write. Since the file becomes read-only after verity is enabled, it is important that verity is added to the send stream after any file writes. Therefore, when we process a verity item, merely note that it happened, then actually create the command in the send stream during 'finish_inode_if_needed'. This also creates V3 of the send stream format, without any format changes besides adding the new commands and attributes. Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-19fs-verity: use kmap_local_page() instead of kmap()Eric Biggers
Convert the use of kmap() to its recommended replacement kmap_local_page(). This avoids the overhead of doing a non-local mapping, which is unnecessary in this case. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220818224010.43778-1-ebiggers@kernel.org
2022-08-19fs-verity: use memcpy_from_page()Eric Biggers
Replace extract_hash() with the memcpy_from_page() helper function. This is simpler, and it has the side effect of replacing the use of kmap_atomic() with its recommended replacement kmap_local_page(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Link: https://lore.kernel.org/r/20220818223903.43710-1-ebiggers@kernel.org
2022-07-15fs-verity: mention btrfs supportEric Biggers
btrfs supports fs-verity since Linux v5.15. Document this. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: David Sterba <dsterba@suse.com> Link: https://lore.kernel.org/r/20220610000616.18225-1-ebiggers@kernel.org
2022-05-24Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecacheLinus Torvalds
Pull page cache updates from Matthew Wilcox: - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits) nilfs2: Fix some kernel-doc comments Appoint myself page cache maintainer fs: Remove aops->freepage secretmem: Convert to free_folio nfs: Convert to free_folio orangefs: Convert to free_folio fs: Add free_folio address space operation fs: Convert drop_buffers() to use a folio fs: Change try_to_free_buffers() to take a folio jbd2: Convert release_buffer_page() to use a folio jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio reiserfs: Convert release_buffer_page() to use a folio fs: Remove last vestiges of releasepage ubifs: Convert to release_folio reiserfs: Convert to release_folio orangefs: Convert to release_folio ocfs2: Convert to release_folio nilfs2: Remove comment about releasepage nfs: Convert to release_folio jfs: Convert to release_folio ...
2022-05-24Merge tag 'integrity-v5.19' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull IMA updates from Mimi Zohar: "New is IMA support for including fs-verity file digests and signatures in the IMA measurement list as well as verifying the fs-verity file digest based signatures, both based on policy. In addition, are two bug fixes: - avoid reading UEFI variables, which cause a page fault, on Apple Macs with T2 chips. - remove the original "ima" template Kconfig option to address a boot command line ordering issue. The rest is a mixture of code/documentation cleanup" * tag 'integrity-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: integrity: Fix sparse warnings in keyring_handler evm: Clean up some variables evm: Return INTEGRITY_PASS for enum integrity_status value '0' efi: Do not import certificates from UEFI Secure Boot for T2 Macs fsverity: update the documentation ima: support fs-verity file digest based version 3 signatures ima: permit fsverity's file digests in the IMA measurement list ima: define a new template field named 'd-ngv2' and templates fs-verity: define a function to return the integrity protected file digest ima: use IMA default hash algorithm for integrity violations ima: fix 'd-ng' comments and documentation ima: remove the IMA_TEMPLATE Kconfig option ima: remove redundant initialization of pointer 'file'.
2022-05-19fs-verity: Use struct_size() helper in enable_verity()Zhang Jianhua
Follow the best practice for allocating a variable-sized structure. Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com> [ebiggers: adjusted commit message] Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220519022450.2434483-1-chris.zjh@huawei.com
2022-05-18fs-verity: remove unused parameter desc_size in fsverity_create_info()Zhang Jianhua
The parameter desc_size in fsverity_create_info() is useless and it is not referenced anywhere. The greatest meaning of desc_size here is to indecate the size of struct fsverity_descriptor and futher calculate the size of signature. However, the desc->sig_size can do it also and it is indeed, so remove it. Therefore, it is no need to acquire desc_size by fsverity_get_descriptor() in ensure_verity_info(), so remove the parameter desc_ret in fsverity_get_descriptor() too. Signed-off-by: Zhang Jianhua <chris.zjh@huawei.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20220518132256.2297655-1-chris.zjh@huawei.com
2022-05-08mm/readahead: Convert page_cache_async_readahead to take a folioMatthew Wilcox (Oracle)
Removes a couple of calls to compound_head and saves a few bytes. Also convert verity's read_file_data_page() to be folio-based. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-05-01fs-verity: define a function to return the integrity protected file digestMimi Zohar
Define a function named fsverity_get_digest() to return the verity file digest and the associated hash algorithm (enum hash_algo). This assumes that before calling fsverity_get_digest() the file must have been opened, which is even true for the IMA measure/appraise on file open policy rule use case (func=FILE_CHECK). do_open() calls vfs_open() immediately prior to ima_file_check(). Acked-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>