<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/fs/nfs, branch master</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.</subtitle>
<id>https://git.kobert.dev/pm24.git/atom/fs/nfs?h=master</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom/fs/nfs?h=master'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2024-11-30T18:17:53Z</updated>
<entry>
<title>Merge tag 'nfs-for-6.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs</title>
<updated>2024-11-30T18:17:53Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-30T18:17:53Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=baf67f6aa9d29512809f1b1fbab624fce57fd16d'/>
<id>urn:sha1:baf67f6aa9d29512809f1b1fbab624fce57fd16d</id>
<content type='text'>
Pull NFS client updates from Trond Myklebust:
 "Bugfixes:
   - nfs/localio: fix for a memory corruption in nfs_local_read_done
   - Revert "nfs: don't reuse partially completed requests in
     nfs_lock_and_join_requests"
   - nfsv4:
       - ignore SB_RDONLY when mounting nfs
       - Fix a use-after-free problem in open()
   - sunrpc:
       - clear XPRT_SOCK_UPD_TIMEOUT when reseting the transport
       - timeout and cancel TLS handshake with -ETIMEDOUT
       - fix one UAF issue caused by sunrpc kernel tcp socket
       - Fix a hang in TLS sock_close if sk_write_pending
   - pNFS/blocklayout: Fix device registration issues

  Features and cleanups:
   - localio cleanups from Mike Snitzer
   - Clean up refcounting on the nfs version modules
   - __counted_by() annotations
   - nfs: make processes that are waiting for an I/O lock killable"

* tag 'nfs-for-6.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
  fs/nfs/io: make nfs_start_io_*() killable
  nfs/blocklayout: Limit repeat device registration on failure
  nfs/blocklayout: Don't attempt unregister for invalid block device
  sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket
  SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT
  sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport
  nfs: ignore SB_RDONLY when mounting nfs
  Revert "nfs: don't reuse partially completed requests in nfs_lock_and_join_requests"
  Revert "fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private"
  nfs/localio: must clear res.replen in nfs_local_read_done
  NFSv4.0: Fix a use-after-free problem in the asynchronous open()
  NFSv4.0: Fix the wake up of the next waiter in nfs_release_seqid()
  SUNRPC: Fix a hang in TLS sock_close if sk_write_pending
  sunrpc: remove newlines from tracepoints
  nfs: Annotate struct pnfs_commit_array with __counted_by()
  nfs/localio: eliminate need for nfs_local_fsync_work forward declaration
  nfs/localio: remove extra indirect nfs_to call to check {read,write}_iter
  nfs/localio: eliminate unnecessary kref in nfs_local_fsync_ctx
  nfs/localio: remove redundant suid/sgid handling
  NFS: Implement get_nfs_version()
  ...
</content>
</entry>
<entry>
<title>fs/nfs/io: make nfs_start_io_*() killable</title>
<updated>2024-11-28T17:55:33Z</updated>
<author>
<name>Max Kellermann</name>
<email>max.kellermann@ionos.com</email>
</author>
<published>2024-11-21T13:53:51Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=38a125b31504f91bf6fdd3cfc3a3e9a721e6c97a'/>
<id>urn:sha1:38a125b31504f91bf6fdd3cfc3a3e9a721e6c97a</id>
<content type='text'>
This allows killing processes that wait for a lock when one process is
stuck waiting for the NFS server.  This aims to complete the coverage
of NFS operations being killable, like nfs_direct_wait() does, for
example.

Signed-off-by: Max Kellermann &lt;max.kellermann@ionos.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>nfs/blocklayout: Limit repeat device registration on failure</title>
<updated>2024-11-28T17:55:32Z</updated>
<author>
<name>Benjamin Coddington</name>
<email>bcodding@redhat.com</email>
</author>
<published>2024-11-22T15:11:12Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=614733f9441ed53bb442d4734112ec1e24bd6da7'/>
<id>urn:sha1:614733f9441ed53bb442d4734112ec1e24bd6da7</id>
<content type='text'>
Every pNFS SCSI IO wants to do LAYOUTGET, then within the layout find the
device which can drive GETDEVINFO, then finally may need to prep the device
with a reservation.  This slow work makes a mess of IO latencies if one of
the later steps is going to fail for awhile.

If we're unable to register a SCSI device, ensure we mark the device as
unavailable so that it will timeout and be re-added via GETDEVINFO.  This
avoids repeated doomed attempts to register a device in the IO path.

Add some clarifying comments as well.

Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration")
Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>nfs/blocklayout: Don't attempt unregister for invalid block device</title>
<updated>2024-11-28T17:55:32Z</updated>
<author>
<name>Benjamin Coddington</name>
<email>bcodding@redhat.com</email>
</author>
<published>2024-11-22T15:11:11Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=3a4ce14d9a6b868e0787e4582420b721c04ee41e'/>
<id>urn:sha1:3a4ce14d9a6b868e0787e4582420b721c04ee41e</id>
<content type='text'>
Since commit d869da91cccb ("nfs/blocklayout: Fix premature PR key
unregistration") an unmount of a pNFS SCSI layout-enabled NFS may
dereference a NULL block_device in:

  bl_unregister_scsi+0x16/0xe0 [blocklayoutdriver]
  bl_free_device+0x70/0x80 [blocklayoutdriver]
  bl_free_deviceid_node+0x12/0x30 [blocklayoutdriver]
  nfs4_put_deviceid_node+0x60/0xc0 [nfsv4]
  nfs4_deviceid_purge_client+0x132/0x190 [nfsv4]
  unset_pnfs_layoutdriver+0x59/0x60 [nfsv4]
  nfs4_destroy_server+0x36/0x70 [nfsv4]
  nfs_free_server+0x23/0xe0 [nfs]
  deactivate_locked_super+0x30/0xb0
  cleanup_mnt+0xba/0x150
  task_work_run+0x59/0x90
  syscall_exit_to_user_mode+0x217/0x220
  do_syscall_64+0x8e/0x160

This happens because even though we were able to create the
nfs4_deviceid_node, the lookup for the device was unable to attach the
block device to the pnfs_block_dev.

If we never found a block device to register, we can avoid this case with
the PNFS_BDEV_REGISTERED flag.  Move the deref behind the test for the
flag.

Fixes: d869da91cccb ("nfs/blocklayout: Fix premature PR key unregistration")
Signed-off-by: Benjamin Coddington &lt;bcodding@redhat.com&gt;
Reviewed-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: Chuck Lever &lt;chuck.lever@oracle.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>nfs: ignore SB_RDONLY when mounting nfs</title>
<updated>2024-11-28T17:55:32Z</updated>
<author>
<name>Li Lingfeng</name>
<email>lilingfeng3@huawei.com</email>
</author>
<published>2024-11-14T04:53:03Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=52cb7f8f177878b4f22397b9c4d2c8f743766be3'/>
<id>urn:sha1:52cb7f8f177878b4f22397b9c4d2c8f743766be3</id>
<content type='text'>
When exporting only one file system with fsid=0 on the server side, the
client alternately uses the ro/rw mount options to perform the mount
operation, and a new vfsmount is generated each time.

It can be reproduced as follows:
[root@localhost ~]# mount /dev/sda /mnt2
[root@localhost ~]# echo "/mnt2 *(rw,no_root_squash,fsid=0)" &gt;/etc/exports
[root@localhost ~]# systemctl restart nfs-server
[root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount -t nfs -o ro,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount -t nfs -o rw,vers=4 127.0.0.1:/ /mnt/sdaa
[root@localhost ~]# mount | grep nfs4
127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,...
127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,...
127.0.0.1:/ on /mnt/sdaa type nfs4 (ro,relatime,vers=4.2,rsize=1048576,...
127.0.0.1:/ on /mnt/sdaa type nfs4 (rw,relatime,vers=4.2,rsize=1048576,...
[root@localhost ~]#

We expected that after mounting with the ro option, using the rw option to
mount again would return EBUSY, but the actual situation was not the case.

As shown above, when mounting for the first time, a superblock with the ro
flag will be generated, and at the same time, in do_new_mount_fc --&gt;
do_add_mount, it detects that the superblock corresponding to the current
target directory is inconsistent with the currently generated one
(path-&gt;mnt-&gt;mnt_sb != newmnt-&gt;mnt.mnt_sb), and a new vfsmount will be
generated.

When mounting with the rw option for the second time, since no matching
superblock can be found in the fs_supers list, a new superblock with the
rw flag will be generated again. The superblock in use (ro) is different
from the newly generated superblock (rw), and a new vfsmount will be
generated again.

When mounting with the ro option for the third time, the superblock (ro)
is found in fs_supers, the superblock in use (rw) is different from the
found superblock (ro), and a new vfsmount will be generated again.

We can switch between ro/rw through remount, and only one superblock needs
to be generated, thus avoiding the problem of repeated generation of
vfsmount caused by switching superblocks.

Furthermore, This can also resolve the issue described in the link.

Fixes: 275a5d24bf56 ("NFS: Error when mounting the same filesystem with different options")
Link: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/
Signed-off-by: Li Lingfeng &lt;lilingfeng3@huawei.com&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2024-11-23T17:58:07Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-11-23T17:58:07Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=5c00ff742bf5caf85f60e1c73999f99376fb865d'/>
<id>urn:sha1:5c00ff742bf5caf85f60e1c73999f99376fb865d</id>
<content type='text'>
Pull MM updates from Andrew Morton:

 - The series "zram: optimal post-processing target selection" from
   Sergey Senozhatsky improves zram's post-processing selection
   algorithm. This leads to improved memory savings.

 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
	- "refine mas_mab_cp()"
	- "Reduce the space to be cleared for maple_big_node"
	- "maple_tree: simplify mas_push_node()"
	- "Following cleanup after introduce mas_wr_store_type()"
	- "refine storing null"

 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.

 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationaizations and cleanups in the page mapping
   code.

 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of
   shadow entries.

 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.

 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in
   the hugetlb code.

 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page
   into small pages. Instead we replace the whole thing with a THP. More
   consistent cleaner and potentiall saves a large number of pagefaults.

 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built in percpu test code.

 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to
   do.

 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio
   size rather than as individual pages. A 20% speedup was observed.

 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
   splitting.

 - The series "memcg-v1: fully deprecate charge moving" from Shakeel
   Butt removes the long-deprecated memcgv2 charge moving feature.

 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleanup up some of the mmap() error handling and
   addresses some potential performance issues.

 - The series "x86/module: use large ROX pages for text allocations"
   from Mike Rapoport teaches x86 to use large pages for
   read-only-execute module text.

 - The series "page allocation tag compression" from Suren Baghdasaryan
   is followon maintenance work for the new page allocation profiling
   feature.

 - The series "page-&gt;index removals in mm" from Matthew Wilcox remove
   most references to page-&gt;index in mm/. A slow march towards shrinking
   struct page.

 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.

 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression. It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.

 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
   tests over to the KUnit framework.

 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userapace to place fault-generating guard pages within a
   single VMA, rather than requiring that multiple VMAs be created for
   this. Improved efficiencies for userspace memory allocators are
   expected.

 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.

 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.

 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configuremultisize THP
   from the kernel boot command line.

 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.

 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep
   is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...
</content>
</entry>
<entry>
<title>Revert "nfs: don't reuse partially completed requests in nfs_lock_and_join_requests"</title>
<updated>2024-11-18T14:42:03Z</updated>
<author>
<name>Trond Myklebust</name>
<email>trond.myklebust@hammerspace.com</email>
</author>
<published>2024-11-05T02:09:21Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=66f9dac9077c9c063552e465212abeb8f97d28a7'/>
<id>urn:sha1:66f9dac9077c9c063552e465212abeb8f97d28a7</id>
<content type='text'>
This reverts commit b571cfcb9dcac187c6d967987792d37cb0688610.

This patch appears to assume that if one request is complete, then the
others will complete too before unlocking. That is not a valid
assumption, since other requests could hit a non-fatal error or a short
write that would cause them not to complete.

Reported-by: Igor Raits &lt;igor@gooddata.com&gt;
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219508
Fixes: b571cfcb9dca ("nfs: don't reuse partially completed requests in nfs_lock_and_join_requests")
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>mm/list_lru: simplify the list_lru walk callback function</title>
<updated>2024-11-12T01:22:26Z</updated>
<author>
<name>Kairui Song</name>
<email>kasong@tencent.com</email>
</author>
<published>2024-11-04T17:52:57Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=da0c02516c501b43bd39ad4aca5779c86153d143'/>
<id>urn:sha1:da0c02516c501b43bd39ad4aca5779c86153d143</id>
<content type='text'>
Now isolation no longer takes the list_lru global node lock, only use the
per-cgroup lock instead.  And this lock is inside the list_lru_one being
walked, no longer needed to pass the lock explicitly.

Link: https://lkml.kernel.org/r/20241104175257.60853-7-ryncsn@gmail.com
Signed-off-by: Kairui Song &lt;kasong@tencent.com&gt;
Cc: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Matthew Wilcox (Oracle) &lt;willy@infradead.org&gt;
Cc: Michal Hocko &lt;mhocko@suse.com&gt;
Cc: Muchun Song &lt;muchun.song@linux.dev&gt;
Cc: Qi Zheng &lt;zhengqi.arch@bytedance.com&gt;
Cc: Roman Gushchin &lt;roman.gushchin@linux.dev&gt;
Cc: Shakeel Butt &lt;shakeel.butt@linux.dev&gt;
Cc: Waiman Long &lt;longman@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>Revert "fs: nfs: fix missing refcnt by replacing folio_set_private by folio_attach_private"</title>
<updated>2024-11-10T02:38:39Z</updated>
<author>
<name>Trond Myklebust</name>
<email>trond.myklebust@hammerspace.com</email>
</author>
<published>2024-11-05T02:01:43Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=8f52caf9d231e77412766b48e5630a647e5ef774'/>
<id>urn:sha1:8f52caf9d231e77412766b48e5630a647e5ef774</id>
<content type='text'>
This reverts commit 03e02b94171b1985dd0aa184296fe94425b855a3.

As was pointed out during code review, there is no need to use
folio_attach_private()/folio_detach_private() when a refcount to the
folio is already carried by the struct nfs_page.

Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
<entry>
<title>nfs/localio: must clear res.replen in nfs_local_read_done</title>
<updated>2024-11-10T02:22:00Z</updated>
<author>
<name>NeilBrown</name>
<email>neilb@suse.de</email>
</author>
<published>2024-11-08T23:39:44Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=650703bc4ed3edf841e851c99ab8e7ba9e5262a3'/>
<id>urn:sha1:650703bc4ed3edf841e851c99ab8e7ba9e5262a3</id>
<content type='text'>
Otherwise memory corruption can occur due to NFSv3 LOCALIO reads
leaving garbage in res.replen:
- nfs3_read_done() copies that into server-&gt;read_hdrsize; from there
  nfs3_proc_read_setup() copies it to args.replen in new requests.
- nfs3_xdr_enc_read3args() passes that to rpc_prepare_reply_pages()
  which includes it in hdrsize for xdr_init_pages, so that rq_rcv_buf
  contains a ridiculous len.
- This is copied to rq_private_buf and xs_read_stream_request()
  eventually passes the kvec to sock_recvmsg() which receives incoming
  data into entirely the wrong place.

This is easily reproduced with NFSv3 LOCALIO that is servicing reads
when it is made to pivot back to using normal RPC.  This switch back
to using normal NFSv3 with RPC can occur for a few reasons but this
issue was exposed with a test that stops and then restarts the NFSv3
server while LOCALIO is performing heavy read IO.

Fixes: 70ba381e1a43 ("nfs: add LOCALIO support")
Reported-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: NeilBrown &lt;neilb@suse.de&gt;
Co-developed-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Mike Snitzer &lt;snitzer@kernel.org&gt;
Signed-off-by: Trond Myklebust &lt;trond.myklebust@hammerspace.com&gt;
</content>
</entry>
</feed>
