Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dave Jiang:
- ACPI_NFIT Kconfig documetation fix
- Make nvdimm_bus_type const
- Make dax_bus_type const
- remove SLAB_MEM_SPREAD flag usage in DAX
* tag 'libnvdimm-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: remove SLAB_MEM_SPREAD flag usage
device-dax: make dax_bus_type const
nvdimm: make nvdimm_bus_type const
libnvdimm: Fix ACPI_NFIT in BLK_DEV_PMEM help
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
|
|
In preparation for checking whether the architecture has data cache
aliasing within alloc_dax(), modify the error handling of nvdimm/pmem
pmem_attach_disk() to treat alloc_dax() -EOPNOTSUPP failure as non-fatal.
[ Based on commit "nvdimm/pmem: Fix leak on dax_add_host() failure". ]
Link: https://lkml.kernel.org/r/20240215144633.96437-4-mathieu.desnoyers@efficios.com
Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing caches")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Sclafani <dm-devel@lists.linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Fix a leak on dax_add_host() error, where "goto out_cleanup_dax" is done
before setting pmem->dax_dev, which therefore issues the two following
calls on NULL pointers:
out_cleanup_dax:
kill_dax(pmem->dax_dev);
put_dax(pmem->dax_dev);
Link: https://lkml.kernel.org/r/20240208184913.484340-1-mathieu.desnoyers@efficios.com
Link: https://lkml.kernel.org/r/20240208184913.484340-2-mathieu.desnoyers@efficios.com
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pass the queue limits directly to blk_alloc_disk instead of setting them
one at a time.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pass a queue_limits to blk_alloc_disk and apply it if non-NULL. This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.
Also change blk_alloc_disk to return an ERR_PTR instead of just NULL
which can't distinguish errors.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20240215071055.2201424-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Now that the driver core can properly handle constant struct bus_type,
move the nvdimm_bus_type variable to be a constant structure as well,
placing it into read-only memory which can not be modified at runtime.
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20240204-bus_cleanup-nvdimm-v1-1-77ae19fa3e3b@marliere.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The ACPI_NFIT config option is described incorrectly as the
inverse NFIT_ACPI, which doesn't exist, so update the help
to the actual config option.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Link: https://lore.kernel.org/r/20240212123716.795996-1-pbrobinson@gmail.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Pull virtio updates from Michael Tsirkin:
- vdpa/mlx5: support for resumable vqs
- virtio_scsi: mq_poll support
- 3virtio_pmem: support SHMEM_REGION
- virtio_balloon: stay awake while adjusting balloon
- virtio: support for no-reset virtio PCI PM
- Fixes, cleanups
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Add mkey leak detection
vdpa/mlx5: Introduce reference counting to mrs
vdpa/mlx5: Use vq suspend/resume during .set_map
vdpa/mlx5: Mark vq state for modification in hw vq
vdpa/mlx5: Mark vq addrs for modification in hw vq
vdpa/mlx5: Introduce per vq and device resume
vdpa/mlx5: Allow modifying multiple vq fields in one modify command
vdpa/mlx5: Expose resumable vq capability
vdpa: Block vq property changes in DRIVER_OK
vdpa: Track device suspended state
scsi: virtio_scsi: Add mq_poll support
virtio_pmem: support feature SHMEM_REGION
virtio_balloon: stay awake while adjusting balloon
vdpa: Remove usage of the deprecated ida_simple_xx() API
virtio: Add support for no-reset virtio PCI PM
virtio_net: fix missing dma unmap for resize
vhost-vdpa: account iommu allocations
vdpa: Fix an error handling path in eni_vdpa_probe()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Ira Weiny:
"A mix of bug fixes and updates to interfaces used by nvdimm:
- Updates to interfaces include:
Use the new scope based management
Remove deprecated ida interfaces
Update to sysfs_emit()
- Fixup kdoc comments"
* tag 'libnvdimm-for-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
acpi/nfit: Use sysfs_emit() for all attributes
nvdimm/namespace: fix kernel-doc for function params
nvdimm/dimm_devs: fix kernel-doc for function params
nvdimm/btt: fix btt_blk_cleanup() kernel-doc
nvdimm-btt: simplify code with the scope based resource management
nvdimm: Remove usage of the deprecated ida_simple_xx() API
ACPI: NFIT: Use cleanup.h helpers instead of devm_*()
|
|
This patch adds the support for feature VIRTIO_PMEM_F_SHMEM_REGION
(virtio spec v1.2 section 5.19.5.2 [1]).
During feature negotiation, if VIRTIO_PMEM_F_SHMEM_REGION is offered
by the device, the driver looks for a shared memory region of id 0.
If it is found, this feature is understood. Otherwise, this feature
bit is cleared.
During probe, if VIRTIO_PMEM_F_SHMEM_REGION has been negotiated,
virtio pmem ignores the `start` and `size` fields in device config
and uses the physical address range of shared memory region 0.
[1] https://docs.oasis-open.org/virtio/virtio/v1.2/csd01/virtio-v1.2-csd01.html#x1-6480002
Signed-off-by: Changyuan Lyu <changyuanl@google.com>
Message-Id: <20231220204906.566922-1-changyuanl@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
|
|
Adjust kernel-doc notation to prevent warnings when using -Wall.
namespace_devs.c:76: warning: No description found for return value of 'nd_is_uuid_unique'
namespace_devs.c:343: warning: No description found for return value of 'shrink_dpa_allocation'
namespace_devs.c:668: warning: No description found for return value of 'grow_dpa_allocation'
namespace_devs.c:958: warning: No description found for return value of 'namespace_update_uuid'
namespace_devs.c:1665: warning: Function parameter or member 'nd_mapping' not described in 'create_namespace_pmem'
namespace_devs.c:1665: warning: Excess function parameter 'nspm' description in 'create_namespace_pmem'
namespace_devs.c:1665: warning: No description found for return value of 'create_namespace_pmem'
[iweiny: s/-errno/ERR_PTR(-errno)/]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: <nvdimm@lists.linux.dev>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231207210545.24056-3-rdunlap@infradead.org
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
Adjust kernel-doc notation to prevent warnings when using -Wall.
dimm_devs.c:59: warning: Function parameter or member 'ndd' not described in 'nvdimm_init_nsarea'
dimm_devs.c:59: warning: Excess function parameter 'nvdimm' description in 'nvdimm_init_nsarea'
dimm_devs.c:59: warning: No description found for return value of 'nvdimm_init_nsarea'
dimm_devs.c:728: warning: No description found for return value of 'nd_pmem_max_contiguous_dpa'
dimm_devs.c:773: warning: No description found for return value of 'nd_pmem_available_dpa'
dimm_devs.c:844: warning: Function parameter or member 'ndd' not described in 'nvdimm_allocated_dpa'
dimm_devs.c:844: warning: Excess function parameter 'nvdimm' description in 'nvdimm_allocated_dpa'
dimm_devs.c:844: warning: No description found for return value of 'nvdimm_allocated_dpa'
[iweiny: drop ND_CMD_* status code]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: <nvdimm@lists.linux.dev>
Link: https://lore.kernel.org/r/20231207210545.24056-2-rdunlap@infradead.org
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
Correct the function parameters to prevent kernel-doc warnings:
btt.c:1567: warning: Function parameter or member 'nd_region' not described in 'btt_init'
btt.c:1567: warning: Excess function parameter 'maxlane' description in 'btt_init'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: <nvdimm@lists.linux.dev>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20231207210545.24056-1-rdunlap@infradead.org
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
Use the scope based resource management (defined in
linux/cleanup.h) to automate resource lifetime
control on struct btt_sb *super in discover_arenas().
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20231214083919.22218-1-dinghao.liu@zju.edu.cn
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
ida_alloc() and ida_free() should be preferred to the deprecated
ida_simple_get() and ida_simple_remove().
This is less verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/50719568e4108f65f3b989ba05c1563e17afba3f.1702228319.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
Found with grep.
strncpy() is deprecated for use on NUL-terminated destination strings
[1] and as such we should prefer more robust and less ambiguous string
interfaces.
We expect super->signature to be NUL-terminated based on its usage with
memcmp against a NUL-term'd buffer:
btt_devs.c:
253 | if (memcmp(super->signature, BTT_SIG, BTT_SIG_LEN) != 0)
btt.h:
13 | #define BTT_SIG "BTT_ARENA_INFO\0"
NUL-padding is not required as `super` is already zero-allocated:
btt.c:
985 | super = kzalloc(sizeof(struct btt_sb), GFP_NOIO);
... rendering any additional NUL-padding superfluous.
Considering the above, a suitable replacement is `strscpy` [2] due to
the fact that it guarantees NUL-termination on the destination buffer
without unnecessarily NUL-padding.
Let's also use the more idiomatic strscpy usage of (dest, src,
sizeof(dest)) instead of (dest, src, XYZ_LEN) for buffers that the
compiler can determine the size of. This more tightly correlates the
destination buffer to the amount of bytes copied.
Side note, this pattern of memcmp() on two NUL-terminated strings should
really be changed to just a strncmp(), if i'm not mistaken? I see
multiple instances of this pattern in this system:
| if (memcmp(super->signature, BTT_SIG, BTT_SIG_LEN) != 0)
| return false;
where BIT_SIG is defined (weirdly) as a double NUL-terminated string:
| #define BTT_SIG "BTT_ARENA_INFO\0"
Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1]
Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2]
Link: https://github.com/KSPP/linux/issues/90
Cc: linux-hardening@vger.kernel.org
Signed-off-by: Justin Stitt <justinstitt@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20231019-strncpy-drivers-nvdimm-btt-c-v2-1-366993878cf0@google.com
Signed-off-by: Kees Cook <keescook@chromium.org>
|
|
Remove kernel-doc warnings:
drivers/nvdimm/badrange.c:271: warning: Function parameter or member
'nd_region' not described in 'nvdimm_badblocks_populate'
drivers/nvdimm/badrange.c:271: warning: Function parameter or member
'range' not described in 'nvdimm_badblocks_populate'
drivers/nvdimm/badrange.c:271: warning: Excess function parameter 'region'
description in 'nvdimm_badblocks_populate'
drivers/nvdimm/badrange.c:271: warning: Excess function parameter 'res'
description in 'nvdimm_badblocks_populate'
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230731112942.215135-1-wangzhu9@huawei.com
Tested-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct nd_region.
Additionally, since the element count member must be set before accessing
the annotated flexible array member, move its initialization earlier.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: nvdimm@lists.linux.dev
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
nd_region_acquire_lane uses get_cpu, which disables preemption. This is
an issue on PREEMPT_RT kernels, since btt_write_pg and also
nd_region_acquire_lane itself take a spin lock, resulting in BUG:
sleeping function called from invalid context.
Fix the issue by replacing get_cpu with smp_process_id and
migrate_disable when needed. This makes BTT operations preemptible, thus
permitting the use of spin_lock.
BUG example occurring when running ndctl tests on PREEMPT_RT kernel:
BUG: sleeping function called from invalid context at
kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 4903, name:
libndctl
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
Preemption disabled at:
[<ffffffffc1313db5>] nd_region_acquire_lane+0x15/0x90 [libnvdimm]
Call Trace:
<TASK>
dump_stack_lvl+0x8e/0xb0
__might_resched+0x19b/0x250
rt_spin_lock+0x4c/0x100
? btt_write_pg+0x2d7/0x500 [nd_btt]
btt_write_pg+0x2d7/0x500 [nd_btt]
? local_clock_noinstr+0x9/0xc0
btt_submit_bio+0x16d/0x270 [nd_btt]
__submit_bio+0x48/0x80
__submit_bio_noacct+0x7e/0x1e0
submit_bio_wait+0x58/0xb0
__blkdev_direct_IO_simple+0x107/0x240
? inode_set_ctime_current+0x51/0x110
? __pfx_submit_bio_wait_endio+0x10/0x10
blkdev_write_iter+0x1d8/0x290
vfs_write+0x237/0x330
...
</TASK>
Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
value
Use devm_kstrdup() instead of kstrdup() and check its return value to
avoid memory leak.
Fixes: 49bddc73d15c ("libnvdimm/of_pmem: Provide a unique name for bus provider")
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull nvdimm updates from Dave Jiang:
"This is mostly small cleanups, fixes, and with a change to prevent
zero-sized namespace exposed to user for nvdimm.
Summary:
- kstrtobool() conversion for nvdimm
- Add REQ_OP_WRITE for virtio_pmem
- Header files update for of_pmem
- Restrict zero-sized namespace from being exposed to user
- Avoid unnecessary endian conversion
- Fix mem leak in nvdimm pmu
- Fix dereference after free in nvdimm pmu"
* tag 'libnvdimm-for-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm: Fix dereference after free in register_nvdimm_pmu()
nvdimm: Fix memleak of pmu attr_groups in unregister_nvdimm_pmu()
nvdimm/pfn_dev: Avoid unnecessary endian conversion
nvdimm/pfn_dev: Prevent the creation of zero-sized namespaces
nvdimm: Explicitly include correct DT includes
virtio_pmem: add the missing REQ_OP_WRITE for flush bio
nvdimm: Use kstrtobool() instead of strtobool()
|
|
support
Patch series "Add support for DAX vmemmap optimization for ppc64", v6.
This patch series implements changes required to support DAX vmemmap
optimization for ppc64. The vmemmap optimization is only enabled with
radix MMU translation and 1GB PUD mapping with 64K page size.
The patch series also splits the hugetlb vmemmap optimization as a
separate Kconfig variable so that architectures can enable DAX vmemmap
optimization without enabling hugetlb vmemmap optimization. This should
enable architectures like arm64 to enable DAX vmemmap optimization while
they can't enable hugetlb vmemmap optimization. More details of the same
are in patch "mm/vmemmap optimization: Split hugetlb and devdax vmemmap
optimization".
With 64K page size for 16384 pages added (1G) we save 14 pages
With 4K page size for 262144 pages added (1G) we save 4094 pages
With 4K page size for 512 pages added (2M) we save 6 pages
This patch (of 13):
Architectures like powerpc would like to enable transparent huge page pud
support only with radix translation. To support that add
has_transparent_pud_hugepage() helper that architectures can override.
[aneesh.kumar@linux.ibm.com: use the new has_transparent_pud_hugepage()]
Link: https://lkml.kernel.org/r/87tttrvtaj.fsf@linux.ibm.com
Link: https://lkml.kernel.org/r/20230724190759.483013-1-aneesh.kumar@linux.ibm.com
Link: https://lkml.kernel.org/r/20230724190759.483013-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
'nd_pmu->pmu.attr_groups' is dereferenced in function
'nvdimm_pmu_free_hotplug_memory' call after it has been freed. Because in
function 'nvdimm_pmu_free_hotplug_memory' memory pointed by the fields of
'nd_pmu->pmu.attr_groups' is deallocated it is necessary to call 'kfree'
after 'nvdimm_pmu_free_hotplug_memory'.
Fixes: 0fab1ba6ad6b ("drivers/nvdimm: Add perf interface to expose nvdimm performance stats")
Co-developed-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com>
Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/20230817114103.754977-1-konstantin.meskhidze@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
Memory pointed by 'nd_pmu->pmu.attr_groups' is allocated in function
'register_nvdimm_pmu' and is lost after 'kfree(nd_pmu)' call in function
'unregister_nvdimm_pmu'.
Fixes: 0fab1ba6ad6b ("drivers/nvdimm: Add perf interface to expose nvdimm performance stats")
Co-developed-by: Ivanov Mikhail <ivanov.mikhail1@huawei-partners.com>
Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze@huawei.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/20230817115945.771826-1-konstantin.meskhidze@huawei.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
use the local variable that already have the converted values.
No functional change in this patch.
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Link: https://lore.kernel.org/r/20230809053512.350660-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
On architectures that have different page size values used for kernel
direct mapping and userspace mappings, the user can end up creating zero-sized
namespaces as shown below
:/sys/bus/nd/devices/region1# cat align
0x1000000
/sys/bus/nd/devices/region1# echo 0x200000 > align
/sys/bus/nd/devices/region1/dax1.0# cat supported_alignments
65536 16777216
$ ndctl create-namespace -r region1 -m devdax -s 18M --align 64K
{
"dev":"namespace1.0",
"mode":"devdax",
"map":"dev",
"size":0,
"uuid":"3094329a-0c66-4905-847e-357223e56ab0",
"daxregion":{
"id":1,
"size":0,
"align":65536
},
"align":65536
}
similarily for fsdax
$ ndctl create-namespace -r region1 -m fsdax -s 18M --align 64K
{
"dev":"namespace1.0",
"mode":"fsdax",
"map":"dev",
"size":0,
"uuid":"45538a6f-dec7-405d-b1da-2a4075e06232",
"sector_size":512,
"align":65536,
"blockdev":"pmem1"
}
In commit 9ffc1d19fc4a ("mm/memremap_pages: Introduce memremap_compat_align()")
memremap_compat_align was added to make sure the kernel always creates
namespaces with 16MB alignment. But the user can still override the
region alignment and no input validation is done in the region alignment
values to retain the flexibility user had before. However, the kernel
ensures that only part of the namespace that can be mapped via kernel
direct mapping page size is enabled. This is achieved by tracking the
unmapped part of the namespace in pfn_sb->end_trunc. The kernel also
ensures that the start address of the namespace is also aligned to the
kernel direct mapping page size.
Depending on the user request, the kernel implements userspace mapping
alignment by updating pfn device alignment attribute and this value is
used to adjust the start address for userspace mappings. This is tracked
in pfn_sb->dataoff. Hence the available size for userspace mapping is:
usermapping_size = size of the namespace - pfn_sb->end_trun - pfn_sb->dataoff
If the kernel finds the user mapping size zero then don't allow the
creation of namespace.
After fix:
$ ndctl create-namespace -f -r region1 -m devdax -s 18M --align 64K
libndctl: ndctl_dax_enable: dax1.1: failed to enable
Error: namespace1.2: failed to enable
failed to create namespace: No such device or address
And existing zero sized namespace will be marked disabled.
root@ltczz75-lp2:/home/kvaneesh# ndctl list -N -r region1 -i
[
{
"dev":"namespace1.0",
"mode":"raw",
"size":18874368,
"uuid":"94a90fb0-8e78-4fb6-a759-ffc62f9fa181",
"sector_size":512,
"state":"disabled"
},
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Link: https://lore.kernel.org/r/20230809053512.350660-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it as merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
When doing mkfs.xfs on a pmem device, the following warning was
reported:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 384 at block/blk-core.c:751 submit_bio_noacct
Modules linked in:
CPU: 2 PID: 384 Comm: mkfs.xfs Not tainted 6.4.0-rc7+ #154
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
RIP: 0010:submit_bio_noacct+0x340/0x520
......
Call Trace:
<TASK>
? submit_bio_noacct+0xd5/0x520
submit_bio+0x37/0x60
async_pmem_flush+0x79/0xa0
nvdimm_flush+0x17/0x40
pmem_submit_bio+0x370/0x390
__submit_bio+0xbc/0x190
submit_bio_noacct_nocheck+0x14d/0x370
submit_bio_noacct+0x1ef/0x520
submit_bio+0x55/0x60
submit_bio_wait+0x5a/0xc0
blkdev_issue_flush+0x44/0x60
The root cause is that submit_bio_noacct() needs bio_op() is either
WRITE or ZONE_APPEND for flush bio and async_pmem_flush() doesn't assign
REQ_OP_WRITE when allocating flush bio, so submit_bio_noacct just fail
the flush bio.
Simply fix it by adding the missing REQ_OP_WRITE for flush bio. And we
could fix the flush order issue and do flush optimization later.
Cc: stable@vger.kernel.org # 6.3+
Fixes: b4a6bb3a67aa ("block: add a sanity check for non-write flush/fua bios")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Tested-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.
In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.
While at it, include the corresponding header file (<linux/kstrtox.h>)
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
|
When multiple processes mmap() a dax file, then at some point,
a process issues a 'load' and consumes a hwpoison, the process
receives a SIGBUS with si_code = BUS_MCEERR_AR and with si_lsb
set for the poison scope. Soon after, any other process issues
a 'load' to the poisoned page (that is unmapped from the kernel
side by memory_failure), it receives a SIGBUS with
si_code = BUS_ADRERR and without valid si_lsb.
This is confusing to user, and is different from page fault due
to poison in RAM memory, also some helpful information is lost.
Channel dax backend driver's poison detection to the filesystem
such that instead of reporting VM_FAULT_SIGBUS, it could report
VM_FAULT_HWPOISON.
If user level block IO syscalls fail due to poison, the errno will
be converted to EIO to maintain block API consistency.
Signed-off-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/20230615181325.1327259-2-jane.chu@oracle.com
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
|
The security_show function is not used outside of drivers/nvdimm/dimm_devs.c
and the attribute it is for is also already static. Silence the sparse
warning for this not being declared by making it static. Fixes:
drivers/nvdimm/dimm_devs.c:352:9: warning: symbol 'security_show' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://lore.kernel.org/r/20230616160925.17687-1-ben.dooks@codethink.co.uk
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
|
The nd_class is not used outside of drivers/nvdimm/bus.c and thus sparse
is generating the following warning. Remove this by making it static:
drivers/nvdimm/bus.c:28:14: warning: symbol 'nd_class' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Link: https://lore.kernel.org/r/20230616160628.11801-1-ben.dooks@codethink.co.uk
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
|
|
The security_show() function was made global and __weak at some
point to allow overriding it. The override was removed later, but
it remains global, which causes a warning about the missing
declaration:
drivers/nvdimm/dimm_devs.c:352:9: error: no previous prototype for 'security_show'
This is also not an appropriate name for a global symbol in the
kernel, so just make it static again.
Fixes: 15a8348707ff ("libnvdimm: Introduce CONFIG_NVDIMM_SECURITY_TEST flag")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/20230516201415.556858-3-arnd@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The module pointer in class_create() never actually did anything, and it
shouldn't have been requred to be set as a parameter even if it did
something. So just remove it and fix up all callers of the function in
the kernel tree at the same time.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Link: https://lore.kernel.org/r/20230313181843.1207845-4-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Pull virtio updates from Michael Tsirkin:
- device feature provisioning in ifcvf, mlx5
- new SolidNET driver
- support for zoned block device in virtio blk
- numa support in virtio pmem
- VIRTIO_F_RING_RESET support in vhost-net
- more debugfs entries in mlx5
- resume support in vdpa
- completion batching in virtio blk
- cleanup of dma api use in vdpa
- now simulating more features in vdpa-sim
- documentation, features, fixes all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits)
vdpa/mlx5: support device features provisioning
vdpa/mlx5: make MTU/STATUS presence conditional on feature bits
vdpa: validate device feature provisioning against supported class
vdpa: validate provisioned device features against specified attribute
vdpa: conditionally read STATUS in config space
vdpa: fix improper error message when adding vdpa dev
vdpa/mlx5: Initialize CVQ iotlb spinlock
vdpa/mlx5: Don't clear mr struct on destroy MR
vdpa/mlx5: Directly assign memory key
tools/virtio: enable to build with retpoline
vringh: fix a typo in comments for vringh_kiov
vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails
scsi: virtio_scsi: fix handling of kmalloc failure
vdpa: Fix a couple of spelling mistakes in some messages
vhost-net: support VIRTIO_F_RING_RESET
vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit
vdpa: mlx5: support per virtqueue dma device
vdpa: set dma mask for vDPA device
virtio-vdpa: support per vq dma device
vdpa: introduce get_vq_dma_device()
...
|
|
Pull Compute Express Link (CXL) updates from Dan Williams:
"To date Linux has been dependent on platform-firmware to map CXL RAM
regions and handle events / errors from devices. With this update we
can now parse / update the CXL memory layout, and report events /
errors from devices. This is a precursor for the CXL subsystem to
handle the end-to-end "RAS" flow for CXL memory. i.e. the flow that
for DDR-attached-DRAM is handled by the EDAC driver where it maps
system physical address events to a field-replaceable-unit (FRU /
endpoint device). In general, CXL has the potential to standardize
what has historically been a pile of memory-controller-specific error
handling logic.
Another change of note is the default policy for handling RAM-backed
device-dax instances. Previously the default access mode was "device",
mmap(2) a device special file to access memory. The new default is
"kmem" where the address range is assigned to the core-mm via
add_memory_driver_managed(). This saves typical users from wondering
why their platform memory is not visible via free(1) and stuck behind
a device-file. At the same time it allows expert users to deploy
policy to, for example, get dedicated access to high performance
memory, or hide low performance memory from general purpose kernel
allocations. This affects not only CXL, but also systems with
high-bandwidth-memory that platform-firmware tags with the
EFI_MEMORY_SP (special purpose) designation.
Summary:
- CXL RAM region enumeration: instantiate 'struct cxl_region' objects
for platform firmware created memory regions
- CXL RAM region provisioning: complement the existing PMEM region
creation support with RAM region support
- "Soft Reservation" policy change: Online (memory hot-add)
soft-reserved memory (EFI_MEMORY_SP) by default, but still allow
for setting aside such memory for dedicated access via device-dax.
- CXL Events and Interrupts: Takeover CXL event handling from
platform-firmware (ACPI calls this CXL Memory Error Reporting) and
export CXL Events via Linux Trace Events.
- Convey CXL _OSC results to drivers: Similar to PCI, let the CXL
subsystem interrogate the result of CXL _OSC negotiation.
- Emulate CXL DVSEC Range Registers as "decoders": Allow for
first-generation devices that pre-date the definition of the CXL
HDM Decoder Capability to translate the CXL DVSEC Range Registers
into 'struct cxl_decoder' objects.
- Set timestamp: Per spec, set the device timestamp in case of
hotplug, or if platform-firwmare failed to set it.
- General fixups: linux-next build issues, non-urgent fixes for
pre-production hardware, unit test fixes, spelling and debug
message improvements"
* tag 'cxl-for-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (66 commits)
dax/kmem: Fix leak of memory-hotplug resources
cxl/mem: Add kdoc param for event log driver state
cxl/trace: Add serial number to trace points
cxl/trace: Add host output to trace points
cxl/trace: Standardize device information output
cxl/pci: Remove locked check for dvsec_range_allowed()
cxl/hdm: Add emulation when HDM decoders are not committed
cxl/hdm: Create emulated cxl_hdm for devices that do not have HDM decoders
cxl/hdm: Emulate HDM decoder from DVSEC range registers
cxl/pci: Refactor cxl_hdm_decode_init()
cxl/port: Export cxl_dvsec_rr_decode() to cxl_port
cxl/pci: Break out range register decoding from cxl_hdm_decode_init()
cxl: add RAS status unmasking for CXL
cxl: remove unnecessary calling of pci_enable_pcie_error_reporting()
dax/hmem: build hmem device support as module if possible
dax: cxl: add CXL_REGION dependency
cxl: avoid returning uninitialized error code
cxl/pmem: Fix nvdimm registration races
cxl/mem: Fix UAPI command comment
cxl/uapi: Tag commands from cxl_query_cmd()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.3-rc1.
There's a lot of changes this development cycle, most of the work
falls into two different categories:
- fw_devlink fixes and updates. This has gone through numerous review
cycles and lots of review and testing by lots of different devices.
Hopefully all should be good now, and Saravana will be keeping a
watch for any potential regression on odd embedded systems.
- driver core changes to work to make struct bus_type able to be
moved into read-only memory (i.e. const) The recent work with Rust
has pointed out a number of areas in the driver core where we are
passing around and working with structures that really do not have
to be dynamic at all, and they should be able to be read-only
making things safer overall. This is the contuation of that work
(started last release with kobject changes) in moving struct
bus_type to be constant. We didn't quite make it for this release,
but the remaining patches will be finished up for the release after
this one, but the groundwork has been laid for this effort.
Other than that we have in here:
- debugfs memory leak fixes in some subsystems
- error path cleanups and fixes for some never-able-to-be-hit
codepaths.
- cacheinfo rework and fixes
- Other tiny fixes, full details are in the shortlog
All of these have been in linux-next for a while with no reported
problems"
[ Geert Uytterhoeven points out that that last sentence isn't true, and
that there's a pending report that has a fix that is queued up - Linus ]
* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
OPP: fix error checking in opp_migrate_dentry()
debugfs: update comment of debugfs_rename()
i3c: fix device.h kernel-doc warnings
dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
Revert "driver core: add error handling for devtmpfs_create_node()"
Revert "devtmpfs: add debug info to handle()"
Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
driver core: cpu: don't hand-override the uevent bus_type callback.
devtmpfs: remove return value of devtmpfs_delete_node()
devtmpfs: add debug info to handle()
driver core: add error handling for devtmpfs_create_node()
driver core: bus: update my copyright notice
driver core: bus: add bus_get_dev_root() function
driver core: bus: constify bus_unregister()
driver core: bus: constify some internal functions
driver core: bus: constify bus_get_kset()
driver core: bus: constify bus_register/unregister_notifier()
driver core: remove private pointer from struct bus_type
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X
bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()")
which does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter".
These filters provide users with finer-grained control over DAMOS's
actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series
"mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with
his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings.
The previous BPF-based approach had shortcomings. See "mm: In-kernel
support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a
per-node basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage
during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in
ths series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier
functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's
series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest
of the kernel in the series "Simplify the external interface for
GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the
series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
include/linux/migrate.h: remove unneeded externs
mm/memory_hotplug: cleanup return value handing in do_migrate_range()
mm/uffd: fix comment in handling pte markers
mm: change to return bool for isolate_movable_page()
mm: hugetlb: change to return bool for isolate_hugetlb()
mm: change to return bool for isolate_lru_page()
mm: change to return bool for folio_isolate_lru()
objtool: add UACCESS exceptions for __tsan_volatile_read/write
kmsan: disable ftrace in kmsan core code
kasan: mark addr_has_metadata __always_inline
mm: memcontrol: rename memcg_kmem_enabled()
sh: initialize max_mapnr
m68k/nommu: add missing definition of ARCH_PFN_OFFSET
mm: percpu: fix incorrect size in pcpu_obj_full_size()
maple_tree: reduce stack usage with gcc-9 and earlier
mm: page_alloc: call panic() when memoryless node allocation fails
mm: multi-gen LRU: avoid futile retries
migrate_pages: move THP/hugetlb migration support check to simplify code
migrate_pages: batch flushing TLB
migrate_pages: share more code between _unmap and _move
...
|
|
Compute the numa information for a virtio_pmem device from the memory
range of the device. Previously, the target_node was always 0 since
the ndr_desc.target_node field was never explicitly set. The code for
computing the numa node is taken from cxl_pmem_region_probe in
drivers/cxl/pmem.c.
Signed-off-by: Michael Sammler <sammler@google.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Tested-by: Mina Almasry <almasrymina@google.com>
Message-Id: <20221115214036.1571015-1-sammler@google.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
|
|
A loop of the form:
while true; do modprobe cxl_pci; modprobe -r cxl_pci; done
...fails with the following crash signature:
BUG: kernel NULL pointer dereference, address: 0000000000000040
[..]
RIP: 0010:cxl_internal_send_cmd+0x5/0xb0 [cxl_core]
[..]
Call Trace:
<TASK>
cxl_pmem_ctl+0x121/0x240 [cxl_pmem]
nvdimm_get_config_data+0xd6/0x1a0 [libnvdimm]
nd_label_data_init+0x135/0x7e0 [libnvdimm]
nvdimm_probe+0xd6/0x1c0 [libnvdimm]
nvdimm_bus_probe+0x7a/0x1e0 [libnvdimm]
really_probe+0xde/0x380
__driver_probe_device+0x78/0x170
driver_probe_device+0x1f/0x90
__device_attach_driver+0x85/0x110
bus_for_each_drv+0x7d/0xc0
__device_attach+0xb4/0x1e0
bus_probe_device+0x9f/0xc0
device_add+0x445/0x9c0
nd_async_device_register+0xe/0x40 [libnvdimm]
async_run_entry_fn+0x30/0x130
...namely that the bottom half of async nvdimm device registration runs
after the CXL has already torn down the context that cxl_pmem_ctl()
needs. Unlike the ACPI NFIT case that benefits from launching multiple
nvdimm device registrations in parallel from those listed in the table,
CXL is already marked PROBE_PREFER_ASYNCHRONOUS. So provide for a
synchronous registration path to preclude this scenario.
Fixes: 21083f51521f ("cxl/pmem: Register 'pmem' / cxl_nvdimm devices")
Cc: <stable@vger.kernel.org>
Reported-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The ->rw_page method is a special purpose bypass of the usual bio handling
path that is limited to single-page reads and writes and synchronous which
causes a lot of extra code in the drivers, callers and the block layer.
The only remaining user is the MM swap code. Switch that swap code to
simply submit a single-vec on-stack bio an synchronously wait on it based
on a newly added QUEUE_FLAG_SYNCHRONOUS flag set by the drivers that
currently implement ->rw_page instead. While this touches one extra cache
line and executes extra code, it simplifies the block layer and drivers
and ensures that all feastures are properly supported by all drivers, e.g.
right now ->rw_page bypassed cgroup writeback entirely.
[akpm@linux-foundation.org: fix comment typo, per Dan]
Link: https://lkml.kernel.org/r/20230125133436.447864-8-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
...updated MAX_STRUCT_PAGE_SIZE to account for sizeof(struct page)
potentially doubling in the case of CONFIG_KMSAN=y. Unfortunately this
doubles the amount of capacity stolen from user addressable capacity for
everyone, regardless of whether they are using the debug option. Revert
that change, mandate that MAX_STRUCT_PAGE_SIZE never exceed 64, but
allow for debug scenarios to proceed with creating debug sized page maps
with a compile option to support debug scenarios.
Note that this only applies to cases where the page map is permanent,
i.e. stored in a reservation of the pmem itself ("--map=dev" in "ndctl
create-namespace" terms). For the "--map=mem" case, since the allocation
is ephemeral for the lifespan of the namespace, there are no explicit
restriction. However, the implicit restriction, of having enough
available "System RAM" to store the page map for the typically large
pmem, still applies.
Fixes: 6e9f05dc66f9 ("libnvdimm/pfn_dev: increase MAX_STRUCT_PAGE_SIZE")
Cc: <stable@vger.kernel.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Reported-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Link: https://lore.kernel.org/r/167467815773.463042.7022545814443036382.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The uevent() callback in struct bus_type should not be modifying the
device that is passed into it, so mark it as a const * and propagate the
function signature changes out into all relevant subsystems that use
this callback.
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Link: https://lore.kernel.org/r/20230111113018.459199-16-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Now that cpu_cache_invalidate_memregion() is generically available, use
it to centralize CPU cache management in the nvdimm region driver.
This trades off removing redundant per-dimm CPU cache flushing with an
opportunistic flush on every region disable event to cover the case of
sensitive dirty data in the cache being written back to media after a
secure erase / overwrite event.
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166993221550.1995348.16843505129579060258.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
nfit_test overrode the security_show() sysfs attribute function in nvdimm
dimm_devs in order to allow testing of security unlock. With the
introduction of CXL security commands, the trick to override
security_show() becomes significantly more complicated. By introdcing a
security flag CONFIG_NVDIMM_SECURITY_TEST, libnvdimm can just toggle the
check via a compile option. In addition the original override can can be
removed from tools/testing/nvdimm/.
The flag will also be used to bypass cpu_cache_invalidate_memregion() when
set in a different commit. This allows testing on QEMU with nfit_test or
cxl_test since cpu_cache_has_invalidate_memregion() checks whether
X86_FEATURE_HYPERVISOR cpu feature flag is set on x86.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983618758.2734609.18031639517065867138.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
The original nvdimm_security_ops ->disable() only supports user passphrase
for security disable. The CXL spec introduced the disabling of master
passphrase. Add a ->disable_master() callback to support this new operation
and leaving the old ->disable() mechanism alone. A "disable_master" command
is added for the sysfs attribute in order to allow command to be issued
from userspace. ndctl will need enabling in order to utilize this new
operation.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983616454.2734609.14204031148234398086.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull nvdimm updates from Dan Williams:
"Some small cleanups and fixes in and around the nvdimm subsystem. The
most significant change is a regression fix for nvdimm namespace
(volume) creation when the namespace size is smaller than 2MB/
Summary:
- Fix nvdimm namespace creation on platforms that do not publish
associated 'DIMM' metadata for a persistent memory region.
- Miscellaneous fixes and cleanups"
* tag 'libnvdimm-for-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
ACPI: HMAT: Release platform device in case of platform_device_add_data() fails
dax: Remove usage of the deprecated ida_simple_xxx API
libnvdimm/region: Allow setting align attribute on regions without mappings
nvdimm/namespace: Fix comment typo
nvdimm: make __nvdimm_security_overwrite_query static
nvdimm/region: Fix kernel-doc
nvdimm/namespace: drop unneeded temporary variable in size_store()
nvdimm/namespace: return uuid_null only once in nd_dev_to_uuid()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
linux-next for a couple of months without, to my knowledge, any
negative reports (or any positive ones, come to that).
- Also the Maple Tree from Liam Howlett. An overlapping range-based
tree for vmas. It it apparently slightly more efficient in its own
right, but is mainly targeted at enabling work to reduce mmap_lock
contention.
Liam has identified a number of other tree users in the kernel which
could be beneficially onverted to mapletrees.
Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
at [1]. This has yet to be addressed due to Liam's unfortunately
timed vacation. He is now back and we'll get this fixed up.
- Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
clang-generated instrumentation to detect used-unintialized bugs down
to the single bit level.
KMSAN keeps finding bugs. New ones, as well as the legacy ones.
- Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
memory into THPs.
- Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
support file/shmem-backed pages.
- userfaultfd updates from Axel Rasmussen
- zsmalloc cleanups from Alexey Romanov
- cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
memory-failure
- Huang Ying adds enhancements to NUMA balancing memory tiering mode's
page promotion, with a new way of detecting hot pages.
- memcg updates from Shakeel Butt: charging optimizations and reduced
memory consumption.
- memcg cleanups from Kairui Song.
- memcg fixes and cleanups from Johannes Weiner.
- Vishal Moola provides more folio conversions
- Zhang Yi removed ll_rw_block() :(
- migration enhancements from Peter Xu
- migration error-path bugfixes from Huang Ying
- Aneesh Kumar added ability for a device driver to alter the memory
tiering promotion paths. For optimizations by PMEM drivers, DRM
drivers, etc.
- vma merging improvements from Jakub Matěn.
- NUMA hinting cleanups from David Hildenbrand.
- xu xin added aditional userspace visibility into KSM merging
activity.
- THP & KSM code consolidation from Qi Zheng.
- more folio work from Matthew Wilcox.
- KASAN updates from Andrey Konovalov.
- DAMON cleanups from Kaixu Xia.
- DAMON work from SeongJae Park: fixes, cleanups.
- hugetlb sysfs cleanups from Muchun Song.
- Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.
Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]
* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
hugetlb: allocate vma lock for all sharable vmas
hugetlb: take hugetlb vma_lock when clearing vma_lock->vma pointer
hugetlb: fix vma lock handling during split vma and range unmapping
mglru: mm/vmscan.c: fix imprecise comments
mm/mglru: don't sync disk for each aging cycle
mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
mm: memcontrol: use do_memsw_account() in a few more places
mm: memcontrol: deprecate swapaccounting=0 mode
mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
mm/secretmem: remove reduntant return value
mm/hugetlb: add available_huge_pages() func
mm: remove unused inline functions from include/linux/mm_inline.h
selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
selftests/vm: add thp collapse shmem testing
selftests/vm: add thp collapse file and tmpfs testing
selftests/vm: modularize thp collapse memory operations
selftests/vm: dedup THP helpers
mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
mm/madvise: add file and shmem support to MADV_COLLAPSE
...
|