Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Unify where the struct request handling code is located in the blk-mq code (Christoph)
 - Header cleanups (Christoph)
 - Clean up the io_context handling code (Christoph, me)
 - Get rid of ->rq_disk in struct request (Christoph)
 - Error handling fix for add_disk() (Christoph)
 - request allocation cleanups (Christoph)
 - Documentation updates (Eric, Matthew)
 - Remove trivial crypto unregister helper (Eric)
 - Reduce shared tag overhead (John)
 - Reduce poll_stats memory overhead (me)
 - Known indirect function call for dio (me)
 - Use atomic references for struct request (me)
 - Support request list issue for block and NVMe (me)
 - Improve queue dispatch pinning (Ming)
 - Improve the direct list issue code (Keith)
 - BFQ improvements (Jan)
 - Direct completion helper and use it in mmc block (Sebastian)
 - Use raw spinlock for the blktrace code (Wander)
 - fsync error handling fix (Ye)
 - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me)

* tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits)
  MAINTAINERS: add entries for block layer documentation
  docs: block: remove queue-sysfs.rst
  docs: sysfs-block: document virt_boundary_mask
  docs: sysfs-block: document stable_writes
  docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
  docs: sysfs-block: add contact for nomerges
  docs: sysfs-block: sort alphabetically
  docs: sysfs-block: move to stable directory
  block: don't protect submit_bio_checks by q_usage_counter
  block: fix old-style declaration
  nvme-pci: fix queue_rqs list splitting
  block: introduce rq_list_move
  block: introduce rq_list_for_each_safe macro
  block: move rq_list macros to blk-mq.h
  block: drop needless assignment in set_task_ioprio()
  block: remove unnecessary trailing '\'
  bio.h: fix kernel-doc warnings
  block: check minor range in device_add_disk()
  block: use "unsigned long" for blk_validate_block_size().
  block: fix error unwinding in device_add_disk
  ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2022-01-12 10:26:52 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2022-01-12 10:26:52 -0800
commit: d3c810803576d867265277df8e94eee386351c9d
tree: 2f40646e0bbcbe64e86d16a7800f1b19e8592d6b /Documentation/ABI
parent: 42a7b4ed45e7667836fae4fb0e1ac6340588b1b0
parent: f029cedb9bb5bab7f1bb3042be348f2dac0ee66e
2 files changed, 676 insertions, 346 deletions
diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
new file mode 100644
index 000000000000..8dd3e84a8aad
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-block
@@ -0,0 +1,676 @@
+What:		/sys/block/<disk>/alignment_offset
+Date:		April 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Storage devices may report a physical block size that is
+		bigger than the logical block size (for instance a drive
+		with 4KB physical sectors exposing 512-byte logical
+		blocks to the operating system).  This parameter
+		indicates how many bytes the beginning of the device is
+		offset from the disk's natural alignment.
+
+
+What:		/sys/block/<disk>/discard_alignment
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Devices that support discard functionality may
+		internally allocate space in units that are bigger than
+		the exported logical block size. The discard_alignment
+		parameter indicates how many bytes the beginning of the
+		device is offset from the internal allocation unit's
+		natural alignment.
+
+
+What:		/sys/block/<disk>/diskseq
+Date:		February 2021
+Contact:	Matteo Croce <mcroce@microsoft.com>
+Description:
+		The /sys/block/<disk>/diskseq file reports the disk
+		sequence number, which is a monotonically increasing
+		number assigned to every drive.
+		Some devices, like the loop device, refresh this number
+		every time the backing file is changed.
+		The value type is 64 bit unsigned.
+
+
+What:		/sys/block/<disk>/inflight
+Date:		October 2009
+Contact:	Jens Axboe <axboe@kernel.dk>, Nikanth Karthikesan <knikanth@suse.de>
+Description:
+		Reports the number of I/O requests currently in progress
+		(pending / in flight) in a device driver. This can be less
+		than the number of requests queued in the block device queue.
+		The report contains 2 fields: one for read requests
+		and one for write requests.
+		The value type is unsigned int.
+		Cf. Documentation/block/stat.rst which contains a single value for
+		requests in flight.
+		This is related to /sys/block/<disk>/queue/nr_requests
+		and, for SCSI devices, also to queue_depth.
+
+
+What:		/sys/block/<disk>/integrity/device_is_integrity_capable
+Date:		July 2014
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Indicates whether a storage device is capable of storing
+		integrity metadata. Set if the device is T10 PI-capable.
+
+
+What:		/sys/block/<disk>/integrity/format
+Date:		June 2008
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Metadata format for integrity capable block device.
+		E.g. T10-DIF-TYPE1-CRC.
+
+
+What:		/sys/block/<disk>/integrity/protection_interval_bytes
+Date:		July 2015
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Describes the number of data bytes which are protected
+		by one integrity tuple. Typically the device's logical
+		block size.
+
+
+What:		/sys/block/<disk>/integrity/read_verify
+Date:		June 2008
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Indicates whether the block layer should verify the
+		integrity of read requests serviced by devices that
+		support sending integrity metadata.
+
+
+What:		/sys/block/<disk>/integrity/tag_size
+Date:		June 2008
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Number of bytes of integrity tag space available per
+		512 bytes of data.
+
+
+What:		/sys/block/<disk>/integrity/write_generate
+Date:		June 2008
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Indicates whether the block layer should automatically
+		generate checksums for write requests bound for
+		devices that support receiving integrity metadata.
+
+
+What:		/sys/block/<disk>/<partition>/alignment_offset
+Date:		April 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Storage devices may report a physical block size that is
+		bigger than the logical block size (for instance a drive
+		with 4KB physical sectors exposing 512-byte logical
+		blocks to the operating system).  This parameter
+		indicates how many bytes the beginning of the partition
+		is offset from the disk's natural alignment.
+
+
+What:		/sys/block/<disk>/<partition>/discard_alignment
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Devices that support discard functionality may
+		internally allocate space in units that are bigger than
+		the exported logical block size. The discard_alignment
+		parameter indicates how many bytes the beginning of the
+		partition is offset from the internal allocation unit's
+		natural alignment.
+
+
+What:		/sys/block/<disk>/<partition>/stat
+Date:		February 2008
+Contact:	Jerome Marchand <jmarchan@redhat.com>
+Description:
+		The /sys/block/<disk>/<partition>/stat files display the
+		I/O statistics of partition <partition>. The format is the
+		same as the format of /sys/block/<disk>/stat.
+
+
+What:		/sys/block/<disk>/queue/add_random
+Date:		June 2010
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] This file allows the disk entropy contribution to be
+		turned off.  The default value of this file is '1' (on).
+
+
+What:		/sys/block/<disk>/queue/chunk_sectors
+Date:		September 2016
+Contact:	Hannes Reinecke <hare@suse.com>
+Description:
+		[RO] chunk_sectors has a different meaning depending on the type
+		of the disk. For a RAID device (dm-raid), chunk_sectors
+		indicates the size in 512B sectors of the RAID volume stripe
+		segment. For a zoned block device, either host-aware or
+		host-managed, chunk_sectors indicates the size in 512B sectors
+		of the zones of the device, with the possible exception of the
+		last zone of the device which may be smaller.
+
+
+What:		/sys/block/<disk>/queue/dax
+Date:		June 2016
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] This file indicates whether the device supports Direct
+		Access (DAX), used by CPU-addressable storage to bypass the
+		pagecache.  It shows '1' if true, '0' if not.
+
+
+What:		/sys/block/<disk>/queue/discard_granularity
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		[RO] Devices that support discard functionality may internally
+		allocate space using units that are bigger than the logical
+		block size. The discard_granularity parameter indicates the size
+		of the internal allocation unit in bytes if reported by the
+		device. Otherwise the discard_granularity will be set to match
+		the device's physical block size. A discard_granularity of 0
+		means that the device does not support discard functionality.
+
+
+What:		/sys/block/<disk>/queue/discard_max_bytes
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		[RW] While discard_max_hw_bytes is the hardware limit for the
+		device, this setting is the software limit. Some devices exhibit
+		large latencies when large discards are issued; setting this
+		value lower will make Linux issue smaller discards and
+		potentially help reduce latencies induced by large discard
+		operations.
+
+
+What:		/sys/block/<disk>/queue/discard_max_hw_bytes
+Date:		July 2015
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Devices that support discard functionality may have
+		internal limits on the number of bytes that can be trimmed or
+		unmapped in a single operation.  The `discard_max_hw_bytes`
+		parameter is set by the device driver to the maximum number of
+		bytes that can be discarded in a single operation.  Discard
+		requests issued to the device must not exceed this limit.  A
+		`discard_max_hw_bytes` value of 0 means that the device does not
+		support discard functionality.
+
+
+What:		/sys/block/<disk>/queue/discard_zeroes_data
+Date:		May 2011
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		[RO] Will always return 0.  Don't rely on any specific behavior
+		for discards, and don't read this file.
+
+
+What:		/sys/block/<disk>/queue/fua
+Date:		May 2018
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Whether or not the block driver supports the FUA flag for
+		write requests.  FUA stands for Force Unit Access. If the FUA
+		flag is set that means that write requests must bypass the
+		volatile cache of the storage device.
+
+
+What:		/sys/block/<disk>/queue/hw_sector_size
+Date:		January 2008
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] This is the hardware sector size of the device, in bytes.
+
+
+What:		/sys/block/<disk>/queue/independent_access_ranges/
+Date:		October 2021
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] The presence of this sub-directory of the
+		/sys/block/<disk>/queue/ directory indicates that the device is
+		capable of executing requests targeting different sector ranges
+		in parallel. For instance, single LUN multi-actuator hard-disks
+		will have an independent_access_ranges directory if the device
+		correctly advertises the sector ranges of its actuators.
+
+		The independent_access_ranges directory contains one directory
+		per access range, with each range described using the sector
+		(RO) attribute file to indicate the first sector of the range
+		and the nr_sectors (RO) attribute file to indicate the total
+		number of sectors in the range starting from the first sector of
+		the range.  For example, a dual-actuator hard-disk will have the
+		following independent_access_ranges entries::
+
+			$ tree /sys/block/<disk>/queue/independent_access_ranges/
+			/sys/block/<disk>/queue/independent_access_ranges/
+			|-- 0
+			|   |-- nr_sectors
+			|   `-- sector
+			`-- 1
+			    |-- nr_sectors
+			    `-- sector
+
+		The sector and nr_sectors attributes use 512B sector units,
+		regardless of the actual block size of the device. Independent
+		access ranges do not overlap and include all sectors within the
+		device capacity. The access ranges are numbered in increasing
+		order of the range start sector, that is, the sector attribute
+		of range 0 always has the value 0.
+
+
+What:		/sys/block/<disk>/queue/io_poll
+Date:		November 2015
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] When read, this file shows whether polling is enabled (1)
+		or disabled (0).  Writing '0' to this file will disable polling
+		for this device.  Writing any non-zero value will enable this
+		feature.
+
+
+What:		/sys/block/<disk>/queue/io_poll_delay
+Date:		November 2016
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] If polling is enabled, this controls what kind of polling
+		will be performed. It defaults to -1, which is classic polling.
+		In this mode, the CPU will repeatedly ask for completions
+		without giving up any time.  If set to 0, a hybrid polling mode
+		is used, where the kernel will attempt to make an educated guess
+		at when the IO will complete. Based on this guess, the kernel
+		will put the process issuing IO to sleep for an amount of time,
+		before entering a classic poll loop. This mode might be a little
+		slower than pure classic polling, but it will be more efficient.
+		If set to a value larger than 0, the kernel will put the process
+		issuing IO to sleep for this number of microseconds before
+		entering classic polling.
+
+
+What:		/sys/block/<disk>/queue/io_timeout
+Date:		November 2018
+Contact:	Weiping Zhang <zhangweiping@didiglobal.com>
+Description:
+		[RW] io_timeout is the request timeout in milliseconds. If a
+		request does not complete in this time then the block driver
+		timeout handler is invoked. That timeout handler can decide to
+		retry the request, to fail it or to start a device recovery
+		strategy.
+
+
+What:		/sys/block/<disk>/queue/iostats
+Date:		January 2009
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] This file is used to control (on/off) the iostats
+		accounting of the disk.
+
+
+What:		/sys/block/<disk>/queue/logical_block_size
+Date:		May 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		[RO] This is the smallest unit the storage device can address.
+		It is typically 512 bytes.
+
+
+What:		/sys/block/<disk>/queue/max_active_zones
+Date:		July 2020
+Contact:	Niklas Cassel <niklas.cassel@wdc.com>
+Description:
+		[RO] For zoned block devices (zoned attribute indicating
+		"host-managed" or "host-aware"), the sum of zones belonging to
+		any of the zone states: EXPLICIT OPEN, IMPLICIT OPEN or CLOSED,
+		is limited by this value. If this value is 0, there is no limit.
+
+		If the host attempts to exceed this limit, the driver should
+		report this error with BLK_STS_ZONE_ACTIVE_RESOURCE, which user
+		space may see as the EOVERFLOW errno.
+
+
+What:		/sys/block/<disk>/queue/max_discard_segments
+Date:		February 2017
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] The maximum number of DMA scatter/gather entries in a
+		discard request.
+
+
+What:		/sys/block/<disk>/queue/max_hw_sectors_kb
+Date:		September 2004
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] This is the maximum number of kilobytes supported in a
+		single data transfer.
+
+
+What:		/sys/block/<disk>/queue/max_integrity_segments
+Date:		September 2010
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Maximum number of elements in a DMA scatter/gather list
+		with integrity data that will be submitted by the block layer
+		core to the associated block driver.
+
+
+What:		/sys/block/<disk>/queue/max_open_zones
+Date:		July 2020
+Contact:	Niklas Cassel <niklas.cassel@wdc.com>
+Description:
+		[RO] For zoned block devices (zoned attribute indicating
+		"host-managed" or "host-aware"), the sum of zones belonging to
+		any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN, is
+		limited by this value. If this value is 0, there is no limit.
+
+
+What:		/sys/block/<disk>/queue/max_sectors_kb
+Date:		September 2004
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] This is the maximum number of kilobytes that the block
+		layer will allow for a filesystem request. Must be smaller than
+		or equal to the maximum size allowed by the hardware.
+
+
+What:		/sys/block/<disk>/queue/max_segment_size
+Date:		March 2010
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Maximum size in bytes of a single element in a DMA
+		scatter/gather list.
+
+
+What:		/sys/block/<disk>/queue/max_segments
+Date:		March 2010
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] Maximum number of elements in a DMA scatter/gather list
+		that is submitted to the associated block driver.
+
+
+What:		/sys/block/<disk>/queue/minimum_io_size
+Date:		April 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		[RO] Storage devices may report a granularity or preferred
+		minimum I/O size which is the smallest request the device can
+		perform without incurring a performance penalty.  For disk
+		drives this is often the physical block size.  For RAID arrays
+		it is often the stripe chunk size.  A properly aligned multiple
+		of minimum_io_size is the preferred request size for workloads
+		where a high number of I/O operations is desired.
+
+
+What:		/sys/block/<disk>/queue/nomerges
+Date:		January 2010
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] Standard I/O elevator operations include attempts to merge
+		contiguous I/Os. For known random I/O loads these attempts will
+		always fail and result in extra cycles being spent in the
+		kernel. This allows one to turn off this behavior in one of two
+		ways: When set to 1, complex merge checks are disabled, but the
+		simple one-shot merges with the previous I/O request are
+		enabled. When set to 2, all merge tries are disabled. The
+		default value is 0 - which enables all types of merge tries.
+
+
+What:		/sys/block/<disk>/queue/nr_requests
+Date:		July 2003
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] This controls how many requests may be allocated in the
+		block layer for read or write requests. Note that the total
+		allocated number may be twice this amount, since it applies only
+		to reads or writes (not the accumulated sum).
+
+		To avoid priority inversion through request starvation, a
+		request queue maintains a separate request pool per each cgroup
+		when CONFIG_BLK_CGROUP is enabled, and this parameter applies to
+		each such per-block-cgroup request pool.  IOW, if there are N
+		block cgroups, each request queue may have up to N request
+		pools, each independently regulated by nr_requests.
+
+
+What:		/sys/block/<disk>/queue/nr_zones
+Date:		November 2018
+Contact:	Damien Le Moal <damien.lemoal@wdc.com>
+Description:
+		[RO] nr_zones indicates the total number of zones of a zoned
+		block device ("host-aware" or "host-managed" zone model). For
+		regular block devices, the value is always 0.
+
+
+What:		/sys/block/<disk>/queue/optimal_io_size
+Date:		April 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		[RO] Storage devices may report an optimal I/O size, which is
+		the device's preferred unit for sustained I/O.  This is rarely
+		reported for disk drives.  For RAID arrays it is usually the
+		stripe width or the internal track size.  A properly aligned
+		multiple of optimal_io_size is the preferred request size for
+		workloads where sustained throughput is desired.  If no optimal
+		I/O size is reported this file contains 0.
+
+
+What:		/sys/block/<disk>/queue/physical_block_size
+Date:		May 2009
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		[RO] This is the smallest unit a physical storage device can
+		write atomically.  It is usually the same as the logical block
+		size but may be bigger.  One example is SATA drives with 4KB
+		sectors that expose a 512-byte logical block size to the
+		operating system.  For stacked block devices the
+		physical_block_size variable contains the maximum
+		physical_block_size of the component devices.
+
+
+What:		/sys/block/<disk>/queue/read_ahead_kb
+Date:		May 2004
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] Maximum number of kilobytes to read-ahead for filesystems
+		on this block device.
+
+
+What:		/sys/block/<disk>/queue/rotational
+Date:		January 2009
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] This file is used to state whether the device is of
+		rotational type or non-rotational type.
+
+
+What:		/sys/block/<disk>/queue/rq_affinity
+Date:		September 2008
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] If this option is '1', the block layer will migrate request
+		completions to the cpu "group" that originally submitted the
+		request. For some workloads this provides a significant
+		reduction in CPU cycles due to caching effects.
+
+		For storage configurations that need to maximize distribution of
+		completion processing, setting this option to '2' forces the
+		completion to run on the requesting cpu (bypassing the "group"
+		aggregation logic).
+
+
+What:		/sys/block/<disk>/queue/scheduler
+Date:		October 2004
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] When read, this file will display the current and available
+		IO schedulers for this block device. The currently active IO
+		scheduler will be enclosed in [] brackets. Writing an IO
+		scheduler name to this file will switch control of this block
+		device to that new IO scheduler. Note that writing an IO
+		scheduler name to this file will attempt to load that IO
+		scheduler module, if it isn't already present in the system.
+
+
+What:		/sys/block/<disk>/queue/stable_writes
+Date:		September 2020
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] This file will contain '1' if memory must not be modified
+		while it is being used in a write request to this device.  When
+		this is the case and the kernel is performing writeback of a
+		page, the kernel will wait for writeback to complete before
+		allowing the page to be modified again, rather than allowing
+		immediate modification as is normally the case.  This
+		restriction arises when the device accesses the memory multiple
+		times where the same data must be seen every time -- for
+		example, once to calculate a checksum and once to actually write
+		the data.  If no such restriction exists, this file will contain
+		'0'.  This file is writable for testing purposes.
+
+
+What:		/sys/block/<disk>/queue/throttle_sample_time
+Date:		March 2017
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] This is the time window over which blk-throttle samples
+		data, in milliseconds.  blk-throttle makes decisions based on
+		the samples.  A shorter window gives cgroups smoother
+		throughput, but at higher CPU overhead.  This exists only when
+		CONFIG_BLK_DEV_THROTTLING_LOW is enabled.
+
+
+What:		/sys/block/<disk>/queue/virt_boundary_mask
+Date:		April 2021
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] This file shows the I/O segment memory alignment mask for
+		the block device.  I/O requests to this device will be split
+		between segments wherever either the memory address of the end
+		of the previous segment or the memory address of the beginning
+		of the current segment is not aligned to virt_boundary_mask + 1
+		bytes.
+
+
+What:		/sys/block/<disk>/queue/wbt_lat_usec
+Date:		November 2016
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] If the device is registered for writeback throttling, then
+		this file shows the target minimum read latency. If this latency
+		is exceeded in a given window of time (see wb_window_usec), then
+		the writeback throttling will start scaling back writes. Writing
+		a value of '0' to this file disables the feature. Writing a
+		value of '-1' to this file resets the value to the default
+		setting.
+
+
+What:		/sys/block/<disk>/queue/write_cache
+Date:		April 2016
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RW] When read, this file will display whether the device has
+		write back caching enabled or not. It will return "write back"
+		for the former case, and "write through" for the latter. Writing
+		to this file can change the kernel's view of the device, but it
+		doesn't alter the device state. This means that it might not be
+		safe to toggle the setting from "write back" to "write through",
+		since that will also eliminate cache flushes issued by the
+		kernel.
+
+
+What:		/sys/block/<disk>/queue/write_same_max_bytes
+Date:		January 2012
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		[RO] Some devices support a write same operation in which a
+		single data block can be written to a range of several
+		contiguous blocks on storage. This can be used to wipe areas on
+		disk or to initialize drives in a RAID configuration.
+		write_same_max_bytes indicates how many bytes can be written in
+		a single write same command. If write_same_max_bytes is 0, write
+		same is not supported by the device.
+
+
+What:		/sys/block/<disk>/queue/write_zeroes_max_bytes
+Date:		November 2016
+Contact:	Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
+Description:
+		[RO] Some devices support a write zeroes operation in which a
+		single request can be issued to zero out a range of contiguous
+		blocks on storage without having any payload in the request.
+		This can be used to optimize writing zeroes to the devices.
+		write_zeroes_max_bytes indicates how many bytes can be written
+		in a single write zeroes command. If write_zeroes_max_bytes is
+		0, write zeroes is not supported by the device.
+
+
+What:		/sys/block/<disk>/queue/zone_append_max_bytes
+Date:		May 2020
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] This is the maximum number of bytes that can be written to
+		a sequential zone of a zoned block device using a zone append
+		write operation (REQ_OP_ZONE_APPEND). This value is always 0 for
+		regular block devices.
+
+
+What:		/sys/block/<disk>/queue/zone_write_granularity
+Date:		January 2021
+Contact:	linux-block@vger.kernel.org
+Description:
+		[RO] This indicates the alignment constraint, in bytes, for
+		write operations in sequential zones of zoned block devices
+		(devices with a zoned attribute that reports "host-managed" or
+		"host-aware"). This value is always 0 for regular block devices.
+
+
+What:		/sys/block/<disk>/queue/zoned
+Date:		September 2016
+Contact:	Damien Le Moal <damien.lemoal@wdc.com>
+Description:
+		[RO] zoned indicates if the device is a zoned block device and
+		the zone model of the device if it is indeed zoned.  The
+		possible values indicated by zoned are "none" for regular block
+		devices and "host-aware" or "host-managed" for zoned block
+		devices. The characteristics of host-aware and host-managed
+		zoned block devices are described in the ZBC (Zoned Block
+		Commands) and ZAC (Zoned Device ATA Command Set) standards.
+		These standards also define the "drive-managed" zone model.
+		However, since drive-managed zoned block devices do not support
+		zone commands, they will be treated as regular block devices and
+		zoned will report "none".
+
+
+What:		/sys/block/<disk>/stat
+Date:		February 2008
+Contact:	Jerome Marchand <jmarchan@redhat.com>
+Description:
+		The /sys/block/<disk>/stat file displays the I/O
+		statistics of disk <disk>. It contains 17 fields:
+
+		==  ==============================================
+		 1  reads completed successfully
+		 2  reads merged
+		 3  sectors read
+		 4  time spent reading (ms)
+		 5  writes completed
+		 6  writes merged
+		 7  sectors written
+		 8  time spent writing (ms)
+		 9  I/Os currently in progress
+		10  time spent doing I/Os (ms)
+		11  weighted time spent doing I/Os (ms)
+		12  discards completed
+		13  discards merged
+		14  sectors discarded
+		15  time spent discarding (ms)
+		16  flush requests completed
+		17  time spent flushing (ms)
+		==  ==============================================
+
+		For more details refer to Documentation/admin-guide/iostats.rst
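As an illustration of the stat entry above (not part of the commit), the following is a minimal Python sketch of how user space might read the whitespace-separated fields of /sys/block/<disk>/stat into named values. The key names and the helper functions `parse_block_stat` and `read_block_stat` are assumptions made for this example; only the field order is taken from the table in the documentation.

```python
from pathlib import Path

# Field order follows the table in the stat entry. The key names themselves
# are illustrative and are not defined by the kernel documentation.
STAT_FIELDS = (
    "reads_completed", "reads_merged", "sectors_read", "read_ms",
    "writes_completed", "writes_merged", "sectors_written", "write_ms",
    "in_progress", "io_ms", "weighted_io_ms",
    "discards_completed", "discards_merged", "sectors_discarded", "discard_ms",
    "flushes_completed", "flush_ms",
)


def parse_block_stat(text):
    """Map the whitespace-separated integers of a stat line to field names."""
    values = [int(token) for token in text.split()]
    # zip() stops at the shorter input, so a line with fewer fields
    # simply yields fewer keys instead of raising an error.
    return dict(zip(STAT_FIELDS, values))


def read_block_stat(disk, sysfs_root="/sys/block"):
    """Read and parse <sysfs_root>/<disk>/stat (hypothetical helper)."""
    return parse_block_stat(Path(sysfs_root, disk, "stat").read_text())
```

Keeping `parse_block_stat` separate from the file read lets the parsing be checked against a sample string without access to a real device; a call such as `read_block_stat("sda")` assumes that a disk of that name exists on the system.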
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
deleted file mode 100644
index b16b0c45a272..000000000000
--- a/Documentation/ABI/testing/sysfs-block
+++ /dev/null
@@ -1,346 +0,0 @@
-What:		/sys/block/<disk>/stat
-Date:		February 2008
-Contact:	Jerome Marchand <jmarchan@redhat.com>
-Description:
-		The /sys/block/<disk>/stat files displays the I/O
-		statistics of disk <disk>. They contain 11 fields:
-
-		==  ==============================================
-		 1  reads completed successfully
-		 2  reads merged
-		 3  sectors read
-		 4  time spent reading (ms)
-		 5  writes completed
-		 6  writes merged
-		 7  sectors written
-		 8  time spent writing (ms)
-		 9  I/Os currently in progress
-		10  time spent doing I/Os (ms)
-		11  weighted time spent doing I/Os (ms)
-		12  discards completed
-		13  discards merged
-		14  sectors discarded
-		15  time spent discarding (ms)
-		16  flush requests completed
-		17  time spent flushing (ms)
-		==  ==============================================
-
-		For more details refer Documentation/admin-guide/iostats.rst
-
-
-What:		/sys/block/<disk>/inflight
-Date:		October 2009
-Contact:	Jens Axboe <axboe@kernel.dk>, Nikanth Karthikesan <knikanth@suse.de>
-Description:
-		Reports the number of I/O requests currently in progress
-		(pending / in flight) in a device driver. This can be less
-		than the number of requests queued in the block device queue.
-		The report contains 2 fields: one for read requests
-		and one for write requests.
-		The value type is unsigned int.
-		Cf. Documentation/block/stat.rst which contains a single value for
-		requests in flight.
-		This is related to nr_requests in Documentation/block/queue-sysfs.rst
-		and for SCSI device also its queue_depth.
-
-
-What:		/sys/block/<disk>/diskseq
-Date:		February 2021
-Contact:	Matteo Croce <mcroce@microsoft.com>
-Description:
-		The /sys/block/<disk>/diskseq files reports the disk
-		sequence number, which is a monotonically increasing
-		number assigned to every drive.
-		Some devices, like the loop device, refresh such number
-		every time the backing file is changed.
-		The value type is 64 bit unsigned.
-
-
-What:		/sys/block/<disk>/<part>/stat
-Date:		February 2008
-Contact:	Jerome Marchand <jmarchan@redhat.com>
-Description:
-		The /sys/block/<disk>/<part>/stat files display the
-		I/O statistics of partition <part>. The format is the
-		same as the above-written /sys/block/<disk>/stat
-		format.
-
-
-What:		/sys/block/<disk>/integrity/format
-Date:		June 2008
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Metadata format for integrity capable block device.
-		E.g. T10-DIF-TYPE1-CRC.
-
-
-What:		/sys/block/<disk>/integrity/read_verify
-Date:		June 2008
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Indicates whether the block layer should verify the
-		integrity of read requests serviced by devices that
-		support sending integrity metadata.
-
-
-What:		/sys/block/<disk>/integrity/tag_size
-Date:		June 2008
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Number of bytes of integrity tag space available per
-		512 bytes of data.
-
-
-What:		/sys/block/<disk>/integrity/device_is_integrity_capable
-Date:		July 2014
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Indicates whether a storage device is capable of storing
-		integrity metadata. Set if the device is T10 PI-capable.
-
-What:		/sys/block/<disk>/integrity/protection_interval_bytes
-Date:		July 2015
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Describes the number of data bytes which are protected
-		by one integrity tuple. Typically the device's logical
-		block size.
-
-What:		/sys/block/<disk>/integrity/write_generate
-Date:		June 2008
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Indicates whether the block layer should automatically
-		generate checksums for write requests bound for
-		devices that support receiving integrity metadata.
-
-What:		/sys/block/<disk>/alignment_offset
-Date:		April 2009
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Storage devices may report a physical block size that is
-		bigger than the logical block size (for instance a drive
-		with 4KB physical sectors exposing 512-byte logical
-		blocks to the operating system).  This parameter
-		indicates how many bytes the beginning of the device is
-		offset from the disk's natural alignment.
-
-What:		/sys/block/<disk>/<partition>/alignment_offset
-Date:		April 2009
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Storage devices may report a physical block size that is
-		bigger than the logical block size (for instance a drive
-		with 4KB physical sectors exposing 512-byte logical
-		blocks to the operating system).  This parameter
-		indicates how many bytes the beginning of the partition
-		is offset from the disk's natural alignment.
-
-What:		/sys/block/<disk>/queue/logical_block_size
-Date:		May 2009
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		This is the smallest unit the storage device can
-		address.  It is typically 512 bytes.
-
-What:		/sys/block/<disk>/queue/physical_block_size
-Date:		May 2009
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		This is the smallest unit a physical storage device can
-		write atomically.  It is usually the same as the logical
-		block size but may be bigger.  One example is SATA
-		drives with 4KB sectors that expose a 512-byte logical
-		block size to the operating system.  For stacked block
-		devices the physical_block_size variable contains the
-		maximum physical_block_size of the component devices.
-
-What:		/sys/block/<disk>/queue/minimum_io_size
-Date:		April 2009
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Storage devices may report a granularity or preferred
-		minimum I/O size which is the smallest request the
-		device can perform without incurring a performance
-		penalty.  For disk drives this is often the physical
-		block size.  For RAID arrays it is often the stripe
-		chunk size.  A properly aligned multiple of
-		minimum_io_size is the preferred request size for
-		workloads where a high number of I/O operations is
-		desired.
-
-What:		/sys/block/<disk>/queue/optimal_io_size
-Date:		April 2009
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Storage devices may report an optimal I/O size, which is
-		the device's preferred unit for sustained I/O.  This is
-		rarely reported for disk drives.  For RAID arrays it is
-		usually the stripe width or the internal track size.  A
-		properly aligned multiple of optimal_io_size is the
-		preferred request size for workloads where sustained
-		throughput is desired.  If no optimal I/O size is
-		reported this file contains 0.
-
-What:		/sys/block/<disk>/queue/nomerges
-Date:		January 2010
-Contact:
-Description:
-		Standard I/O elevator operations include attempts to
-		merge contiguous I/Os. For known random I/O loads these
-		attempts will always fail and result in extra cycles
-		being spent in the kernel. This allows one to turn off
-		this behavior in one of two ways: When set to 1, complex
-		merge checks are disabled, but the simple one-shot merges
-		with the previous I/O request are enabled. When set to 2,
-		all merge tries are disabled. The default value is 0 -
-		which enables all types of merge tries.
-
-What:		/sys/block/<disk>/discard_alignment
-Date:		May 2011
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Devices that support discard functionality may
-		internally allocate space in units that are bigger than
-		the exported logical block size. The discard_alignment
-		parameter indicates how many bytes the beginning of the
-		device is offset from the internal allocation unit's
-		natural alignment.
-
-What:		/sys/block/<disk>/<partition>/discard_alignment
-Date:		May 2011
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Devices that support discard functionality may
-		internally allocate space in units that are bigger than
-		the exported logical block size. The discard_alignment
-		parameter indicates how many bytes the beginning of the
-		partition is offset from the internal allocation unit's
-		natural alignment.
-
-What:		/sys/block/<disk>/queue/discard_granularity
-Date:		May 2011
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Devices that support discard functionality may
-		internally allocate space using units that are bigger
-		than the logical block size. The discard_granularity
-		parameter indicates the size of the internal allocation
-		unit in bytes if reported by the device. Otherwise the
-		discard_granularity will be set to match the device's
-		physical block size. A discard_granularity of 0 means
-		that the device does not support discard functionality.
-
-What:		/sys/block/<disk>/queue/discard_max_bytes
-Date:		May 2011
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Devices that support discard functionality may have
-		internal limits on the number of bytes that can be
-		trimmed or unmapped in a single operation. Some storage
-		protocols also have inherent limits on the number of
-		blocks that can be described in a single command. The
-		discard_max_bytes parameter is set by the device driver
-		to the maximum number of bytes that can be discarded in
-		a single operation. Discard requests issued to the
-		device must not exceed this limit. A discard_max_bytes
-		value of 0 means that the device does not support
-		discard functionality.
-
-What:		/sys/block/<disk>/queue/discard_zeroes_data
-Date:		May 2011
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Will always return 0.  Don't rely on any specific behavior
-		for discards, and don't read this file.
-
-What:		/sys/block/<disk>/queue/write_same_max_bytes
-Date:		January 2012
-Contact:	Martin K. Petersen <martin.petersen@oracle.com>
-Description:
-		Some devices support a write same operation in which a
-		single data block can be written to a range of several
-		contiguous blocks on storage. This can be used to wipe
-		areas on disk or to initialize drives in a RAID
-		configuration. write_same_max_bytes indicates how many
-		bytes can be written in a single write same command. If
-		write_same_max_bytes is 0, write same is not supported
-		by the device.
-
-What:		/sys/block/<disk>/queue/write_zeroes_max_bytes
-Date:		November 2016
-Contact:	Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
-Description:
-		Some devices support a write zeroes operation in which a
-		single request can be issued to zero out the range of
-		contiguous blocks on storage without having any payload
-		in the request. This can be used to optimize writing zeroes
-		to the devices. write_zeroes_max_bytes indicates how many
-		bytes can be written in a single write zeroes command. If
-		write_zeroes_max_bytes is 0, write zeroes is not supported
-		by the device.
-
-What:		/sys/block/<disk>/queue/zoned
-Date:		September 2016
-Contact:	Damien Le Moal <damien.lemoal@wdc.com>
-Description:
-		zoned indicates if the device is a zoned block device
-		and the zone model of the device if it is indeed zoned.
-		The possible values indicated by zoned are "none" for
-		regular block devices and "host-aware" or "host-managed"
-		for zoned block devices. The characteristics of
-		host-aware and host-managed zoned block devices are
-		described in the ZBC (Zoned Block Commands) and ZAC
-		(Zoned Device ATA Command Set) standards. These standards
-		also define the "drive-managed" zone model. However,
-		since drive-managed zoned block devices do not support
-		zone commands, they will be treated as regular block
-		devices and zoned will report "none".
-
-What:		/sys/block/<disk>/queue/nr_zones
-Date:		November 2018
-Contact:	Damien Le Moal <damien.lemoal@wdc.com>
-Description:
-		nr_zones indicates the total number of zones of a zoned block
-		device ("host-aware" or "host-managed" zone model). For regular
-		block devices, the value is always 0.
-
-What:		/sys/block/<disk>/queue/max_active_zones
-Date:		July 2020
-Contact:	Niklas Cassel <niklas.cassel@wdc.com>
-Description:
-		For zoned block devices (zoned attribute indicating
-		"host-managed" or "host-aware"), the sum of zones belonging to
-		any of the zone states: EXPLICIT OPEN, IMPLICIT OPEN or CLOSED,
-		is limited by this value. If this value is 0, there is no limit.
-
-What:		/sys/block/<disk>/queue/max_open_zones
-Date:		July 2020
-Contact:	Niklas Cassel <niklas.cassel@wdc.com>
-Description:
-		For zoned block devices (zoned attribute indicating
-		"host-managed" or "host-aware"), the sum of zones belonging to
-		any of the zone states: EXPLICIT OPEN or IMPLICIT OPEN,
-		is limited by this value. If this value is 0, there is no limit.
-
-What:		/sys/block/<disk>/queue/chunk_sectors
-Date:		September 2016
-Contact:	Hannes Reinecke <hare@suse.com>
-Description:
-		chunk_sectors has a different meaning depending on the type
-		of the disk. For a RAID device (dm-raid), chunk_sectors
-		indicates the size in 512B sectors of the RAID volume
-		stripe segment. For a zoned block device, either
-		host-aware or host-managed, chunk_sectors indicates the
-		size in 512B sectors of the zones of the device, with
-		the possible exception of the last zone of the device
-		which may be smaller.
-
-What:		/sys/block/<disk>/queue/io_timeout
-Date:		November 2018
-Contact:	Weiping Zhang <zhangweiping@didiglobal.com>
-Description:
-		io_timeout is the request timeout in milliseconds. If a request
-		does not complete in this time then the block driver timeout
-		handler is invoked. That timeout handler can decide to retry
-		the request, to fail it or to start a device recovery strategy.
author	Linus Torvalds <torvalds@linux-foundation.org>	2022-01-12 10:26:52 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2022-01-12 10:26:52 -0800
commit	d3c810803576d867265277df8e94eee386351c9d
tree	2f40646e0bbcbe64e86d16a7800f1b19e8592d6b /Documentation/ABI
parent	42a7b4ed45e7667836fae4fb0e1ac6340588b1b0
parent	f029cedb9bb5bab7f1bb3042be348f2dac0ee66e
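
Several of the queue attributes described in the removed text carry a "0 means unsupported or unreported" convention (discard_max_bytes, write_zeroes_max_bytes, optimal_io_size), and zoned reports a model string. The following is only a minimal sketch of how a userspace tool might read and interpret them. It is not part of this commit. The function names and the sysfs_root parameter are hypothetical, chosen for illustration, and treating a missing file as "not supported" is an assumption of the sketch.

```python
from pathlib import Path


def read_queue_attr(disk, attr, sysfs_root="/sys/block"):
    """Return the stripped contents of <sysfs_root>/<disk>/queue/<attr>, or None if unreadable."""
    try:
        return (Path(sysfs_root) / disk / "queue" / attr).read_text().strip()
    except OSError:
        return None


def read_queue_int(disk, attr, sysfs_root="/sys/block"):
    """Parse a numeric queue attribute; None if the file is absent or not an integer."""
    text = read_queue_attr(disk, attr, sysfs_root)
    try:
        return int(text) if text is not None else None
    except ValueError:
        return None


def summarize_queue(disk, sysfs_root="/sys/block"):
    """Interpret a few queue attributes using the rules stated in their descriptions."""
    discard_max = read_queue_int(disk, "discard_max_bytes", sysfs_root)
    write_zeroes_max = read_queue_int(disk, "write_zeroes_max_bytes", sysfs_root)
    optimal = read_queue_int(disk, "optimal_io_size", sysfs_root)
    zoned = read_queue_attr(disk, "zoned", sysfs_root)
    return {
        "logical_block_size": read_queue_int(disk, "logical_block_size", sysfs_root),
        "physical_block_size": read_queue_int(disk, "physical_block_size", sysfs_root),
        # A value of 0 means the operation is not supported by the device.
        # Assumption of this sketch: a missing file is also treated as unsupported.
        "discard_supported": discard_max is not None and discard_max > 0,
        "write_zeroes_supported": write_zeroes_max is not None and write_zeroes_max > 0,
        # 0 means no optimal I/O size was reported; map that to None.
        "optimal_io_size": optimal if optimal else None,
        # Drive-managed devices report "none" and are treated as regular devices.
        "is_zoned": zoned in ("host-aware", "host-managed"),
        "io_timeout_ms": read_queue_int(disk, "io_timeout", sysfs_root),
    }
```

On a Linux system, summarize_queue("sda") would read from /sys/block/sda/queue. Passing a different sysfs_root allows the same code to run against a directory of fixture files.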