<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/drivers/nvme, branch rust-6.5</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=rust-6.5</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=rust-6.5'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2023-05-26T15:21:50Z</updated>
<entry>
<title>NVMe: Add MAXIO 1602 to bogus nid list.</title>
<updated>2023-05-26T15:21:50Z</updated>
<author>
<name>Tatsuki Sugiura</name>
<email>sugi@nemui.org</email>
</author>
<published>2023-05-20T12:23:50Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=a3a9d63dcd15535e7fdf4c7c1b32bfaed762973a'/>
<id>urn:sha1:a3a9d63dcd15535e7fdf4c7c1b32bfaed762973a</id>
<content type='text'>
HIKSEMI FUTURE M.2 SSDs report the same dummy nguid and eui64
across devices; I confirmed this with my two drives.

This patch marks the controller as NVME_QUIRK_BOGUS_NID.
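
For illustration, such a quirk is typically wired up as a single entry
in the nvme_id_table in drivers/nvme/host/pci.c. A minimal sketch,
assuming the 0x1e4b vendor ID from the output below and a device ID of
0x1602 inferred from the commit title (not taken from the patch itself):

  { PCI_DEVICE(0x1e4b, 0x1602),   /* HS-SSD-FUTURE 2048G */
          .driver_data = NVME_QUIRK_BOGUS_NID, },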

---------------------------------------------------------
sugi@tempest:~% sudo nvme id-ctrl /dev/nvme0
NVME Identify Controller:
vid       : 0x1e4b
ssvid     : 0x1e4b
sn        : 30096022612
mn        : HS-SSD-FUTURE 2048G
fr        : SN10542
rab       : 0
ieee      : 000000
cmic      : 0
mdts      : 7
cntlid    : 0
ver       : 0x10400
rtd3r     : 0x7a120
rtd3e     : 0x1e8480
oaes      : 0x200
ctratt    : 0x2
rrls      : 0
cntrltype : 1
fguid     : 00000000-0000-0000-0000-000000000000
&lt;snip...&gt;
---------------------------------------------------------

---------------------------------------------------------
sugi@tempest:~% sudo nvme id-ns /dev/nvme0n1
NVME Identify Namespace 1:
&lt;snip...&gt;
nguid   : 00000000000000000000000000000000
eui64   : 0000000000000002
lbaf  0 : ms:0   lbads:9  rp:0 (in use)
---------------------------------------------------------

Signed-off-by: Tatsuki Sugiura &lt;sugi@nemui.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'nvme-6.4-2023-05-18' of git://git.infradead.org/nvme into block-6.4</title>
<updated>2023-05-19T01:46:42Z</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2023-05-19T01:46:42Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=1878b736e88e0d30f67d7fb577816395727ef040'/>
<id>urn:sha1:1878b736e88e0d30f67d7fb577816395727ef040</id>
<content type='text'>
Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.4

 - More device quirks (Sagi, Hristo, Adrian, Daniel)
 - Controller delete race (Maurizio)
 - Multipath cleanup fix (Christoph)"

* tag 'nvme-6.4-2023-05-18' of git://git.infradead.org/nvme:
  nvme-pci: Add quirk for Teamgroup MP33 SSD
  nvme: do not let the user delete a ctrl before a complete initialization
  nvme-multipath: don't call blk_mark_disk_dead in nvme_mpath_remove_disk
  nvme-pci: clamp max_hw_sectors based on DMA optimized limitation
  nvme-pci: add quirk for missing secondary temperature thresholds
  nvme-pci: add NVME_QUIRK_BOGUS_NID for HS-SSD-FUTURE 2048G
</content>
</entry>
<entry>
<title>nvme-pci: Add quirk for Teamgroup MP33 SSD</title>
<updated>2023-05-19T00:53:55Z</updated>
<author>
<name>Daniel Smith</name>
<email>dansmith@ds.gy</email>
</author>
<published>2023-05-17T21:32:32Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=0649728123cf6a5518e154b4e1735fc85ea4f55c'/>
<id>urn:sha1:0649728123cf6a5518e154b4e1735fc85ea4f55c</id>
<content type='text'>
Add a quirk for the Teamgroup MP33, which reports duplicate IDs for its disk.

Signed-off-by: Daniel Smith &lt;dansmith@ds.gy&gt;
[kch: patch formatting]
Signed-off-by: Chaitanya Kulkarni &lt;kch@nvidia.com&gt;
Tested-by: Daniel Smith &lt;dansmith@ds.gy&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvme: do not let the user delete a ctrl before a complete initialization</title>
<updated>2023-05-17T14:36:17Z</updated>
<author>
<name>Maurizio Lombardi</name>
<email>mlombard@redhat.com</email>
</author>
<published>2023-05-11T11:07:41Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=2eb94dd56a4a4e3fe286def3e2ba207804a37345'/>
<id>urn:sha1:2eb94dd56a4a4e3fe286def3e2ba207804a37345</id>
<content type='text'>
If a userspace application performs a "delete_controller" command
early during the ctrl initialization, the delete operation
may race against the init code and crash the kernel.

nvme nvme5: Connect command failed: host path error
nvme nvme5: failed to connect queue: 0 ret=880
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
 blk_mq_quiesce_queue+0x18/0x90
 nvme_tcp_delete_ctrl+0x24/0x40 [nvme_tcp]
 nvme_do_delete_ctrl+0x7f/0x8b [nvme_core]
 nvme_sysfs_delete.cold+0x8/0xd [nvme_core]
 kernfs_fop_write_iter+0x124/0x1b0
 new_sync_write+0xff/0x190
 vfs_write+0x1ef/0x280

Fix the crash by checking the NVME_CTRL_STARTED_ONCE bit;
if it is not set, the nvme controller is still being
initialized and the kernel returns an -EBUSY error to
userspace. Set the NVME_CTRL_STARTED_ONCE bit later, in
nvme_start_ctrl(), after the controller start operation
has completed.
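
A minimal sketch of the shape of this fix, assuming the flag lives in
the ctrl-&gt;flags bitmask like the other NVME_CTRL_* flags (context
abridged):

  /* drivers/nvme/host/core.c, nvme_sysfs_delete() -- sketch */
  if (!test_bit(NVME_CTRL_STARTED_ONCE, &amp;ctrl-&gt;flags))
          return -EBUSY;

  /* drivers/nvme/host/core.c, late in nvme_start_ctrl() -- sketch */
  set_bit(NVME_CTRL_STARTED_ONCE, &amp;ctrl-&gt;flags);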

Signed-off-by: Maurizio Lombardi &lt;mlombard@redhat.com&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>nvme-multipath: don't call blk_mark_disk_dead in nvme_mpath_remove_disk</title>
<updated>2023-05-17T14:32:42Z</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2023-05-17T07:53:45Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=1743e5f6000901a11f4e1cd741bfa9136f3ec9b1'/>
<id>urn:sha1:1743e5f6000901a11f4e1cd741bfa9136f3ec9b1</id>
<content type='text'>
nvme_mpath_remove_disk is called after del_gendisk, at which point a
blk_mark_disk_dead call doesn't make any sense.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Keith Busch &lt;kbusch@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux</title>
<updated>2023-05-07T17:00:09Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-05-07T17:00:09Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=03e5cb7b50feb687508946a702febaba24c77f0b'/>
<id>urn:sha1:03e5cb7b50feb687508946a702febaba24c77f0b</id>
<content type='text'>
Pull more io_uring updates from Jens Axboe:
 "Nothing major in here, just two different parts:

   - A small series from Breno that enables passing the full SQE down
     for -&gt;uring_cmd().

     This is a prerequisite for enabling full network socket operations.
     Queued up a bit late because of some stylistic concerns that got
     resolved; it would be nice to have this in 6.4-rc1 so the dependent
     work will be easier to handle for 6.5.

   - Fix for the huge page coalescing, which was a regression introduced
     in the 6.3 kernel release (Tobias)"

* tag 'for-6.4/io_uring-2023-05-07' of git://git.kernel.dk/linux:
  io_uring: Remove unnecessary BUILD_BUG_ON
  io_uring: Pass whole sqe to commands
  io_uring: Create a helper to return the SQE size
  io_uring/rsrc: check for nonconsecutive pages
</content>
</entry>
<entry>
<title>io_uring: Pass whole sqe to commands</title>
<updated>2023-05-04T14:19:05Z</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2023-05-04T12:18:55Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=fd9b8547bc5c34186dc42ea05fb4380d21695374'/>
<id>urn:sha1:fd9b8547bc5c34186dc42ea05fb4380d21695374</id>
<content type='text'>
Currently the uring CMD operation relies on having large SQEs, but future
operations might want to use normal SQEs.

The io_uring_cmd currently only saves the payload (cmd) part of the SQE,
but, for commands that use the normal SQE size, it might be necessary to
access the initial SQE fields outside of the payload/cmd block. So, save
the whole SQE rather than just the PDU.

This slightly changes how io_uring_cmd works, since the cmd
structures and callbacks are no longer opaque to io_uring. That is, the
callbacks can look at the SQE fields, not only at the cmd structure.

The main advantage is that we don't need to create custom structures for
simple commands.

Create io_uring_sqe_cmd(), which returns the cmd private data as a void
pointer and avoids casting on the callee side.
Also, make most of ublk_drv's sqe-&gt;cmd private structures const, and use
io_uring_sqe_cmd() to get the private structure, removing the unwanted
cast. (There is one case where the cast is still needed since the
header-&gt;{len,addr} fields are updated in the private structure.)
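
For illustration, a minimal sketch of the new accessor and a
hypothetical call site (exact definitions may differ from the patch):

  /* include/linux/io_uring.h -- sketch of the accessor */
  static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
  {
          return sqe-&gt;cmd;
  }

  /* callee side: no cast needed any more -- illustrative only */
  const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd-&gt;sqe);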

Suggested-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Link: https://lore.kernel.org/r/20230504121856.904491-3-leitao@debian.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
</entry>
<entry>
<title>nvme-pci: clamp max_hw_sectors based on DMA optimized limitation</title>
<updated>2023-05-03T16:15:44Z</updated>
<author>
<name>Adrian Huang</name>
<email>ahuang12@lenovo.com</email>
</author>
<published>2023-04-21T08:08:00Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=3710e2b056cb92ad816e4d79fa54a6a5b6ad8cbd'/>
<id>urn:sha1:3710e2b056cb92ad816e4d79fa54a6a5b6ad8cbd</id>
<content type='text'>
When running a fio test on a 448-core AMD server with an NVMe disk,
a soft-lockup or hard-lockup call trace is shown:

[soft lockup]
watchdog: BUG: soft lockup - CPU#126 stuck for 23s! [swapper/126:0]
RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x50
...
Call Trace:
 &lt;IRQ&gt;
 fq_flush_timeout+0x7d/0xd0
 ? __pfx_fq_flush_timeout+0x10/0x10
 call_timer_fn+0x2e/0x150
 run_timer_softirq+0x48a/0x560
 ? __pfx_fq_flush_timeout+0x10/0x10
 ? clockevents_program_event+0xaf/0x130
 __do_softirq+0xf1/0x335
 irq_exit_rcu+0x9f/0xd0
 sysvec_apic_timer_interrupt+0xb4/0xd0
 &lt;/IRQ&gt;
 &lt;TASK&gt;
 asm_sysvec_apic_timer_interrupt+0x1f/0x30
...

Obviously, fq_flush_timeout() spends over 20 seconds. Here is the ftrace log:

               |  fq_flush_timeout() {
               |    fq_ring_free() {
               |      put_pages_list() {
   0.170 us    |        free_unref_page_list();
   0.810 us    |      }
               |      free_iova_fast() {
               |        free_iova() {
 * 85622.66 us |          _raw_spin_lock_irqsave();
   2.860 us    |          remove_iova();
   0.600 us    |          _raw_spin_unlock_irqrestore();
   0.470 us    |          lock_info_report();
   2.420 us    |          free_iova_mem.part.0();
 * 85638.27 us |        }
 * 85638.84 us |      }
               |      put_pages_list() {
   0.230 us    |        free_unref_page_list();
   0.470 us    |      }
   ...            ...
 $ 31017069 us |  }

Most of the cores are under lock contention acquiring iova_rbtree_lock
due to the IOVA flush queue mechanism.

[hard lockup]
NMI watchdog: Watchdog detected hard LOCKUP on cpu 351
RIP: 0010:native_queued_spin_lock_slowpath+0x2d8/0x330

Call Trace:
 &lt;IRQ&gt;
 _raw_spin_lock_irqsave+0x4f/0x60
 free_iova+0x27/0xd0
 free_iova_fast+0x4d/0x1d0
 fq_ring_free+0x9b/0x150
 iommu_dma_free_iova+0xb4/0x2e0
 __iommu_dma_unmap+0x10b/0x140
 iommu_dma_unmap_sg+0x90/0x110
 dma_unmap_sg_attrs+0x4a/0x50
 nvme_unmap_data+0x5d/0x120 [nvme]
 nvme_pci_complete_batch+0x77/0xc0 [nvme]
 nvme_irq+0x2ee/0x350 [nvme]
 ? __pfx_nvme_pci_complete_batch+0x10/0x10 [nvme]
 __handle_irq_event_percpu+0x53/0x1a0
 handle_irq_event_percpu+0x19/0x60
 handle_irq_event+0x3d/0x60
 handle_edge_irq+0xb3/0x210
 __common_interrupt+0x7f/0x150
 common_interrupt+0xc5/0xf0
 &lt;/IRQ&gt;
 &lt;TASK&gt;
 asm_common_interrupt+0x2b/0x40
...

ftrace shows fq_ring_free() spends over 10 seconds [1]. Again, most of
the cores are under lock contention acquiring iova_rbtree_lock due to
the IOVA flush queue mechanism.

[Root Cause]
The root cause is that the NVMe disk's max_hw_sectors_kb is 4096kb
(mdts=10), so streaming DMA mappings cannot benefit from the scalable
IOVA mechanism introduced by commit 9257b4a206fc ("iommu/iova:
introduce per-cpu caching to iova allocation"), which does not apply
to mappings larger than 128kb.

To fix the lock contention issue, clamp max_hw_sectors based on the
DMA-optimized limit in order to leverage the scalable IOVA mechanism,
as sketched below.
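
A minimal sketch of the clamp in nvme_probe() (drivers/nvme/host/pci.c),
assuming the pre-existing NVME_MAX_KB_SZ bound; the change is swapping
dma_max_mapping_size() for dma_opt_mapping_size():

  /* Clamp to the DMA-optimized mapping size so large requests
   * do not bypass the per-CPU IOVA caches -- sketch. */
  dev-&gt;ctrl.max_hw_sectors = min_t(u32, NVME_MAX_KB_SZ &lt;&lt; 1,
                  dma_opt_mapping_size(&amp;pdev-&gt;dev) &gt;&gt; 9);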

Note: the issue does not happen with another NVMe disk
(mdts = 5 and max_hw_sectors_kb = 128).

[1] https://gist.github.com/AdrianHuang/bf8ec7338204837631fbdaed25d19cc4

Suggested-by: Keith Busch &lt;kbusch@kernel.org&gt;
Reported-and-tested-by: Jiwei Sun &lt;sunjw10@lenovo.com&gt;
Signed-off-by: Adrian Huang &lt;ahuang12@lenovo.com&gt;
Reviewed-by: Keith Busch &lt;kbusch@kernel.org&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-pci: add quirk for missing secondary temperature thresholds</title>
<updated>2023-05-03T16:11:43Z</updated>
<author>
<name>Hristo Venev</name>
<email>hristo@venev.name</email>
</author>
<published>2023-04-25T19:58:54Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=bd375feeaf3408ed00e08c3bc918d6be15f691ad'/>
<id>urn:sha1:bd375feeaf3408ed00e08c3bc918d6be15f691ad</id>
<content type='text'>
On the Kingston KC3000 and Kingston FURY Renegade (both have the same
PCI IDs), accessing temp3_{min,max} fails with an invalid field error
(note that there is no problem setting the thresholds for temp1).

This contradicts the NVM Express Base Specification 2.0b, page 292:

  The over temperature threshold and under temperature threshold
  features shall be implemented for all implemented temperature sensors
  (i.e., all Temperature Sensor fields that report a non-zero value).

Define NVME_QUIRK_NO_SECONDARY_TEMP_THRESH, which disables the
thresholds for all but the composite temperature, and set it for this
device.
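
A minimal sketch of how such a gate can look in nvme_hwmon_is_visible()
(drivers/nvme/host/hwmon.c); the surrounding condition is abridged and
partly assumed:

  case hwmon_temp_max:
  case hwmon_temp_min:
          /* Hide per-sensor (channel != 0) thresholds when the
           * controller carries the quirk -- sketch. */
          if (channel &amp;&amp;
              (data-&gt;ctrl-&gt;quirks &amp; NVME_QUIRK_NO_SECONDARY_TEMP_THRESH))
                  return 0;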

Signed-off-by: Hristo Venev &lt;hristo@venev.name&gt;
Reviewed-by: Guenter Roeck &lt;linux@roeck-us.net&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
<entry>
<title>nvme-pci: add NVME_QUIRK_BOGUS_NID for HS-SSD-FUTURE 2048G</title>
<updated>2023-05-03T16:10:01Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagi@grimberg.me</email>
</author>
<published>2023-05-03T15:57:33Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=1616d6c3717bae9041a4240d381ec56ccdaafedc'/>
<id>urn:sha1:1616d6c3717bae9041a4240d381ec56ccdaafedc</id>
<content type='text'>
Add a quirk to fix HS-SSD-FUTURE 2048G SSD drives reporting duplicate
namespace identifiers.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217384
Reported-by: Andrey God &lt;andreygod83@protonmail.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagi@grimberg.me&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
</content>
</entry>
</feed>
