Age | Commit message (Collapse) | Author |
|
Made a minor edit in the comments for 'struct zswap_entry' to delete the
description of the 'value' member that was deleted in commit 20a5532ffa53
("mm: remove code to handle same filled pages").
Link: https://lkml.kernel.org/r/20241002194213.30041-1-kanchana.p.sridhar@intel.com
Signed-off-by: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Fixes: 20a5532ffa53 ("mm: remove code to handle same filled pages")
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Usama Arif <usamaarif642@gmail.com>
Cc: Chengming Zhou <chengming.zhou@linux.dev>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Wajdi Feghali <wajdi.k.feghali@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Re-sort few misplaced entries in the CREDITS file.
Link: https://lkml.kernel.org/r/20241002111932.46012-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Return -ENOSYS from memfd_secret() syscall if !can_set_direct_map(). This
is the case for example on some arm64 configurations, where marking 4k
PTEs in the direct map not present can only be done if the direct map is
set up at 4k granularity in the first place (as ARM's break-before-make
semantics do not easily allow breaking apart large/gigantic pages).
More precisely, on arm64 systems with !can_set_direct_map(),
set_direct_map_invalid_noflush() is a no-op, however it returns success
(0) instead of an error. This means that memfd_secret will seemingly
"work" (e.g. syscall succeeds, you can mmap the fd and fault in pages),
but it does not actually achieve its goal of removing its memory from the
direct map.
Note that with this patch, memfd_secret() will start erroring on systems
where can_set_direct_map() returns false (arm64 with
CONFIG_RODATA_FULL_DEFAULT_ENABLED=n, CONFIG_DEBUG_PAGEALLOC=n and
CONFIG_KFENCE=n), but that still seems better than the current silent
failure. Since CONFIG_RODATA_FULL_DEFAULT_ENABLED defaults to 'y', most
arm64 systems actually have a working memfd_secret() and aren't be
affected.
From going through the iterations of the original memfd_secret patch
series, it seems that disabling the syscall in these scenarios was the
intended behavior [1] (preferred over having
set_direct_map_invalid_noflush return an error as that would result in
SIGBUSes at page-fault time), however the check for it got dropped between
v16 [2] and v17 [3], when secretmem moved away from CMA allocations.
[1]: https://lore.kernel.org/lkml/20201124164930.GK8537@kernel.org/
[2]: https://lore.kernel.org/lkml/20210121122723.3446-11-rppt@kernel.org/#t
[3]: https://lore.kernel.org/lkml/20201125092208.12544-10-rppt@kernel.org/
Link: https://lkml.kernel.org/r/20241001080056.784735-1-roypat@amazon.co.uk
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Alexander Graf <graf@amazon.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: James Gowans <jgowans@amazon.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
I'm leaving Google.
Link: https://lkml.kernel.org/r/20240927192912.31532-1-i@maskray.me
Signed-off-by: Fangrui Song <i@maskray.me>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
We should only check for pmd_special() after we made sure that we have a
present PMD. For example, if we have a migration PMD, pmd_special() might
indicate that we have a special PMD although we really don't.
This fixes confusing migration entries as PFN mappings, and not doing what
we are supposed to do in the "is_swap_pmd()" case further down in the
function -- including messing up COW, page table handling and accounting.
Link: https://lkml.kernel.org/r/20240926154234.2247217-1-david@redhat.com
Fixes: bc02afbd4d73 ("mm/fork: accept huge pfnmap entries")
Signed-off-by: David Hildenbrand <david@redhat.com>
Reported-by: syzbot+bf2c35fa302ebe3c7471@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/66f15c8d.050a0220.c23dd.000f.GAE@google.com/
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
In resource_test_insert_resource(), the pointer is used in error message
after kfree(). This is user-after-free. To fix this, we need to call
kunit_add_action_or_reset() to schedule memory freeing after usage. But
kunit_add_action_or_reset() itself may fail and free the memory. So, its
return value should be checked and abort the test for failure. Then, we
found that other usage of kunit_add_action_or_reset() in
resource_test_region_intersects() needs to be fixed too. We fix all these
user-after-free bugs in this patch.
Link: https://lkml.kernel.org/r/20240930070611.353338-1-ying.huang@intel.com
Fixes: 99185c10d5d9 ("resource, kunit: add test case for region_intersects()")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: Kees Bakker <kees@ijzerbout.nl>
Closes: https://lore.kernel.org/lkml/87ldzaotcg.fsf@yhuang6-desk2.ccr.corp.intel.com/
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When /proc/kcore is read an attempt to read the first two pages results in
HW-specific page swap on s390 and another (so called prefix) pages are
accessed instead. That leads to a wrong read.
Allow architecture-specific translation of memory addresses using
kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks similarily
to /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks. That
way an architecture can deal with specific physical memory ranges.
Re-use the existing /dev/mem callback implementation on s390, which
handles the described prefix pages swapping correctly.
For other architectures the default callback is basically NOP. It is
expected the condition (vaddr == __va(__pa(vaddr))) always holds true for
KCORE_RAM memory type.
Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The hmm2 double_map test was failing due to an incorrect buffer->mirror
size. The buffer->mirror size was 6, while buffer->ptr size was 6 *
PAGE_SIZE. The test failed because the kernel's copy_to_user function was
attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror. Since the
size of buffer->mirror was incorrect, copy_to_user failed.
This patch corrects the buffer->mirror size to 6 * PAGE_SIZE.
Test Result without this patch
==============================
# RUN hmm2.hmm2_device_private.double_map ...
# hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0)
# double_map: Test terminated by assertion
# FAIL hmm2.hmm2_device_private.double_map
not ok 53 hmm2.hmm2_device_private.double_map
Test Result with this patch
===========================
# RUN hmm2.hmm2_device_private.double_map ...
# OK hmm2.hmm2_device_private.double_map
ok 53 hmm2.hmm2_device_private.double_map
Link: https://lkml.kernel.org/r/20240927050752.51066-1-donettom@linux.ibm.com
Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM")
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
pgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise,
vmf->address not aligned to fault_size will be aligned to the next
alignment, that can result in memory failure getting the wrong address.
It's a subtle situation that only can be observed in
page_mapped_in_vma() after the page is page fault handled by
dev_dax_huge_fault. Generally, there is little chance to perform
page_mapped_in_vma in dev-dax's page unless in specific error injection
to the dax device to trigger an MCE - memory-failure. In that case,
page_mapped_in_vma() will be triggered to determine which task is
accessing the failure address and kill that task in the end.
We used self-developed dax device (which is 2M aligned mapping) , to
perform error injection to random address. It turned out that error
injected to non-2M-aligned address was causing endless MCE until panic.
Because page_mapped_in_vma() kept resulting wrong address and the task
accessing the failure address was never killed properly:
[ 3783.719419] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.049006] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.049190] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.448042] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.448186] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3784.792026] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3784.792179] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.162502] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.162633] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.461116] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.461247] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3785.764730] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3785.764859] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.042128] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.042259] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.464293] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.464423] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3786.818090] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3786.818217] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
[ 3787.085297] mce: Uncorrected hardware memory error in user-access at
200c9742380
[ 3787.085424] Memory failure: 0x200c9742: recovery action for dax page:
Recovered
It took us several weeks to pinpoint this problem, but we eventually
used bpftrace to trace the page fault and mce address and successfully
identified the issue.
Joao added:
; Likely we never reproduce in production because we always pin
: device-dax regions in the region align they provide (Qemu does
: similarly with prealloc in hugetlb/file backed memory). I think this
: bug requires that we touch *unpinned* device-dax regions unaligned to
: the device-dax selected alignment (page size i.e. 4K/2M/1G)
Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.1727421694.git.llfl@linux.alibaba.com
Fixes: b9b5777f09be ("device-dax: use ALIGN() for determining pgoff")
Signed-off-by: Kun(llfl) <llfl@linux.alibaba.com>
Tested-by: JianXiong Zhao <zhaojianxiong.zjx@alibaba-inc.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Calling into kthread unparking unconditionally is mostly harmless when
the kthread is already unparked. The wake up is then simply ignored
because the target is not in TASK_PARKED state.
However if the kthread is per CPU, the wake up is preceded by a call
to kthread_bind() which expects the task to be inactive and in
TASK_PARKED state, which obviously isn't the case if it is unparked.
As a result, calling kthread_stop() on an unparked per-cpu kthread
triggers such a warning:
WARNING: CPU: 0 PID: 11 at kernel/kthread.c:525 __kthread_bind_mask kernel/kthread.c:525
<TASK>
kthread_stop+0x17a/0x630 kernel/kthread.c:707
destroy_workqueue+0x136/0xc40 kernel/workqueue.c:5810
wg_destruct+0x1e2/0x2e0 drivers/net/wireguard/device.c:257
netdev_run_todo+0xe1a/0x1000 net/core/dev.c:10693
default_device_exit_batch+0xa14/0xa90 net/core/dev.c:11769
ops_exit_list net/core/net_namespace.c:178 [inline]
cleanup_net+0x89d/0xcc0 net/core/net_namespace.c:640
process_one_work kernel/workqueue.c:3231 [inline]
process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
worker_thread+0x86d/0xd70 kernel/workqueue.c:3393
kthread+0x2f0/0x390 kernel/kthread.c:389
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Fix this with skipping unecessary unparking while stopping a kthread.
Link: https://lkml.kernel.org/r/20240913214634.12557-1-frederic@kernel.org
Fixes: 5c25b5ff89f0 ("workqueue: Tag bound workers with KTHREAD_IS_PER_CPU")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reported-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com
Tested-by: syzbot+943d34fa3cf2191e3068@syzkaller.appspotmail.com
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit eab0af905bfc3e9c05da2ca163d76a1513159aa4.
There is no existing user of those flags. PF_MEMALLOC_NOWARN is dangerous
because a nested allocation context can use GFP_NOFAIL which could cause
unexpected failure. Such a code would be hard to maintain because it
could be deeper in the call chain.
PF_MEMALLOC_NORECLAIM has been added even when it was pointed out [1] that
such a allocation contex is inherently unsafe if the context doesn't fully
control all allocations called from this context.
While PF_MEMALLOC_NOWARN is not dangerous the way PF_MEMALLOC_NORECLAIM is
it doesn't have any user and as Matthew has pointed out we are running out
of those flags so better reclaim it without any real users.
[1] https://lore.kernel.org/all/ZcM0xtlKbAOFjv5n@tiehlicka/
Link: https://lkml.kernel.org/r/20240926172940.167084-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "remove PF_MEMALLOC_NORECLAIM" v3.
This patch (of 2):
bch2_new_inode relies on PF_MEMALLOC_NORECLAIM to try to allocate a new
inode to achieve GFP_NOWAIT semantic while holding locks. If this
allocation fails it will drop locks and use GFP_NOFS allocation context.
We would like to drop PF_MEMALLOC_NORECLAIM because it is really
dangerous to use if the caller doesn't control the full call chain with
this flag set. E.g. if any of the function down the chain needed
GFP_NOFAIL request the PF_MEMALLOC_NORECLAIM would override this and
cause unexpected failure.
While this is not the case in this particular case using the scoped gfp
semantic is not really needed bacause we can easily pus the allocation
context down the chain without too much clutter.
[akpm@linux-foundation.org: fix kerneldoc warnings]
Link: https://lkml.kernel.org/r/20240926172940.167084-1-mhocko@kernel.org
Link: https://lkml.kernel.org/r/20240926172940.167084-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz> # For vfs changes
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: James Morris <jmorris@namei.org>
Cc: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Yafang Shao <laoar.shao@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Disabling preemption in the GRU driver is unnecessary, and clashes with
sleeping locks in several code paths. Remove preempt_disable and
preempt_enable from the GRU driver.
Signed-off-by: Dimitri Sivanich <sivanich@hpe.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
After the delegation is returned to the NFS server remove it
from the server's delegations list to reduce the time it takes
to scan this list.
Network trace captured while running the below script shows the
time taken to service the CB_RECALL increases gradually due to
the overhead of traversing the delegation list in
nfs_delegation_find_inode_server.
The NFS server in this test is a Solaris server which issues
CB_RECALL when receiving the all-zero stateid in the SETATTR.
mount=/mnt/data
for i in $(seq 1 20)
do
echo $i
mkdir $mount/testtarfile$i
time tar -C $mount/testtarfile$i -xf 5000_files.tar
done
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode
Pull unicode fix from Gabriel Krisman Bertazi:
- Handle code-points with the Ignorable property as regular character
instead of treating them as an empty string (me)
* tag 'unicode-fixes-6.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
unicode: Don't special case ignorable code points
|
|
We don't need to handle them separately. Instead, just let them
decompose/casefold to themselves.
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
|
|
The sched_ext selftests is missing proper cross-compilation support, a
proper target entry, and out-of-tree build support.
When building the kselftest suite, e.g.:
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- \
TARGETS=sched_ext SKIP_TARGETS="" O=/output/foo \
-C tools/testing/selftests install
or:
make ARCH=arm64 LLVM=1 TARGETS=sched_ext SKIP_TARGETS="" \
O=/output/foo -C tools/testing/selftests install
The expectation is that the sched_ext is included, cross-built, the
correct toolchain is picked up, and placed into /output/foo.
In contrast to the BPF selftests, the sched_ext suite does not use
bpftool at test run-time, so it is sufficient to build bpftool for the
build host only.
Add ARCH, CROSS_COMPILE, OUTPUT, and TARGETS support to the sched_ext
selftest. Also, remove some variables that were unused by the
Makefile.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: David Vernet <void@manifault.com>
Tested-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Using the device-managed version allows to simplify clean-up in probe()
error path.
Additionally, this device-managed ensures proper cleanup, which helps to
resolve memory errors, page faults, btrfs going read-only, and btrfs
disk corruption.
Fixes: 4b2c53d93a4b ("SFH:Transport Driver to add support of AMD Sensor Fusion Hub (SFH)")
Tested-by: Chris Hixon <linux-kernel-bugs@hixontech.com>
Tested-by: Richard <hobbes1069@gmail.com>
Tested-by: Skyler <skpu@pm.me>
Reported-by: Chris Hixon <linux-kernel-bugs@hixontech.com>
Closes: https://lore.kernel.org/all/3b129b1f-8636-456a-80b4-0f6cce0eef63@hixontech.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219331
Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@amd.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
|
|
A user reported that commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi
device manage_system_start_stop") introduced a spin down + immediate spin
up of the disk both when entering and when resuming from hibernation.
This behavior was not there before, and causes an increased latency both
when entering and when resuming from hibernation.
Hibernation is done by three consecutive PM events, in the following order:
1) PM_EVENT_FREEZE
2) PM_EVENT_THAW
3) PM_EVENT_HIBERNATE
Commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi device
manage_system_start_stop") modified ata_eh_handle_port_suspend() to call
ata_dev_power_set_standby() (which spins down the disk), for both event
PM_EVENT_FREEZE and event PM_EVENT_HIBERNATE.
Documentation/driver-api/pm/devices.rst, section "Entering Hibernation",
explicitly mentions that PM_EVENT_FREEZE does not have to be put the device
in a low-power state, and actually recommends not doing so. Thus, let's not
spin down the disk on PM_EVENT_FREEZE. (The disk will instead be spun down
during the subsequent PM_EVENT_HIBERNATE event.)
This way, PM_EVENT_FREEZE will behave as it did before commit aa3998dbeb3a
("ata: libata-scsi: Disable scsi device manage_system_start_stop"), while
PM_EVENT_HIBERNATE will continue to spin down the disk.
This will avoid the superfluous spin down + spin up when entering and
resuming from hibernation, while still making sure that the disk is spun
down before actually entering hibernation.
Cc: stable@vger.kernel.org # v6.6+
Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20241008135843.1266244-2-cassel@kernel.org
Signed-off-by: Niklas Cassel <cassel@kernel.org>
|
|
The boot mapped ring buffer has its buffer mapped at a fixed location
found at boot up. It is not dynamic. It cannot grow or be expanded when
new CPUs come online.
Do not hook fixed memory mapped ring buffers to the CPU hotplug callback,
otherwise it can cause a crash when it tries to add the buffer to the
memory that is already fully occupied.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://lore.kernel.org/20241008143242.25e20801@gandalf.local.home
Fixes: be68d63a139bd ("ring-buffer: Add ring_buffer_alloc_range()")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Yisen Zhuang has left the company in September.
Jian Shen will be responsible for maintaining the
hns3/hns driver's code in the future,
so add Jian Shen to the hns3/hns driver's matainer list.
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
If hashing fails in sctp_listen_start(), the socket remains in the
LISTENING state, even though it was not added to the hash table.
This can lead to a scenario where a socket appears to be listening
without actually being accessible.
This patch ensures that if the hashing operation fails, the sk_state
is set back to CLOSED before returning an error.
Note that there is no need to undo the autobind operation if hashing
fails, as the bind port can still be used for next listen() call on
the same socket.
Fixes: 76c6d988aeb3 ("sctp: add sock_reuseport for the sock in __sctp_hash_endpoint")
Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Currently this driver prints this line with what looks like
a rogue format specifier when the device is probed:
[ 2.840000] eth%d: MVME147 at 0xfffe1800, irq 12, Hardware Address xx:xx:xx:xx:xx:xx
Change the printk() for netdev_info() and move it after the
registration has completed so it prints out the name of the
interface properly.
Signed-off-by: Daniel Palmer <daniel@0x0f.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
All MMD reads return 0 for the RTL8126A-integrated PHY. Therefore phylib
assumes it doesn't support EEE, what results in higher power consumption,
and a significantly higher chip temperature in my case.
To fix this split out the PHY driver for the RTL8126A-integrated PHY
and set the read_mmd/write_mmd callbacks to read from vendor-specific
registers.
Fixes: 5befa3728b85 ("net: phy: realtek: add support for RTL8126A-integrated 5Gbps PHY")
Cc: stable@vger.kernel.org
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This commit is a replay of commit 6252690f7e1b ("btrfs: fix invalid
mapping of extent xarray state"). We need to call
btrfs_folio_clear_dirty() before btrfs_set_range_writeback(), so that
xarray DIRTY tag is cleared.
With a refactoring commit 8189197425e7 ("btrfs: refactor
__extent_writepage_io() to do sector-by-sector submission"), it screwed
up and the order is reversed and causing the same hang. Fix the ordering
now in submit_one_sector().
Fixes: 8189197425e7 ("btrfs: refactor __extent_writepage_io() to do sector-by-sector submission")
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
At btrfs_load_zone_info() we have an error path that is dereferencing
the name of a device which is a RCU string but we are not holding a RCU
read lock, which is incorrect.
Fix this by using btrfs_err_in_rcu() instead of btrfs_err().
The problem is there since commit 08e11a3db098 ("btrfs: zoned: load zone's
allocation offset"), back then at btrfs_load_block_group_zone_info() but
then later on that code was factored out into the helper
btrfs_load_zone_info() by commit 09a46725cc84 ("btrfs: zoned: factor out
per-zone logic from btrfs_load_block_group_zone_info").
Fixes: 08e11a3db098 ("btrfs: zoned: load zone's allocation offset")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The VLAN table is a shared memory between the two ports/slices
in a ICSSG cluster and this may lead to race condition when the
common code paths for both ports are executed in different CPUs.
Fix the race condition access by locking the shared memory access
Fixes: 487f7323f39a ("net: ti: icssg-prueth: Add helper functions to configure FDB")
Signed-off-by: MD Danish Anwar <danishanwar@ti.com>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Fix a typo in comments.
Signed-off-by: Andrew Kreimer <algonell@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
fallocate unshare mode explicitly breaks extent sharing. When a
command completes, it checks the data fork for any remaining shared
extents to determine whether the reflink inode flag and COW fork
preallocation can be removed. This logic doesn't consider in-core
pagecache and I/O state, however, which means we can unsafely remove
COW fork blocks that are still needed under certain conditions.
For example, consider the following command sequence:
xfs_io -fc "pwrite 0 1k" -c "reflink <file> 0 256k 1k" \
-c "pwrite 0 32k" -c "funshare 0 1k" <file>
This allocates a data block at offset 0, shares it, and then
overwrites it with a larger buffered write. The overwrite triggers
COW fork preallocation, 32 blocks by default, which maps the entire
32k write to delalloc in the COW fork. All but the shared block at
offset 0 remains hole mapped in the data fork. The unshare command
redirties and flushes the folio at offset 0, removing the only
shared extent from the inode. Since the inode no longer maps shared
extents, unshare purges the COW fork before the remaining 28k may
have written back.
This leaves dirty pagecache backed by holes, which writeback quietly
skips, thus leaving clean, non-zeroed pagecache over holes in the
file. To verify, fiemap shows holes in the first 32k of the file and
reads return different data across a remount:
$ xfs_io -c "fiemap -v" <file>
<file>:
EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS
...
1: [8..511]: hole 504
...
$ xfs_io -c "pread -v 4k 8" <file>
00001000: cd cd cd cd cd cd cd cd ........
$ umount <mnt>; mount <dev> <mnt>
$ xfs_io -c "pread -v 4k 8" <file>
00001000: 00 00 00 00 00 00 00 00 ........
To avoid this problem, make unshare follow the same rules used for
background cowblock scanning and never purge the COW fork for inodes
with dirty pagecache or in-flight I/O.
Fixes: 46afb0628b86347 ("xfs: only flush the unshared range in xfs_reflink_unshare")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
|
|
When CONFIG_NET_9P_USBG=y but CONFIG_USB_LIBCOMPOSITE=m and
CONFIG_CONFIGFS_FS=m, the following build error occurs:
riscv64-unknown-linux-gnu-ld: net/9p/trans_usbg.o: in function `usb9pfs_free_func':
trans_usbg.c:(.text+0x124): undefined reference to `usb_free_all_descriptors'
riscv64-unknown-linux-gnu-ld: net/9p/trans_usbg.o: in function `usb9pfs_rx_complete':
trans_usbg.c:(.text+0x2d8): undefined reference to `usb_interface_id'
riscv64-unknown-linux-gnu-ld: trans_usbg.c:(.text+0x2f6): undefined reference to `usb_string_id'
riscv64-unknown-linux-gnu-ld: net/9p/trans_usbg.o: in function `usb9pfs_func_bind':
trans_usbg.c:(.text+0x31c): undefined reference to `usb_ep_autoconfig'
riscv64-unknown-linux-gnu-ld: trans_usbg.c:(.text+0x336): undefined reference to `usb_ep_autoconfig'
riscv64-unknown-linux-gnu-ld: trans_usbg.c:(.text+0x378): undefined reference to `usb_assign_descriptors'
riscv64-unknown-linux-gnu-ld: net/9p/trans_usbg.o: in function `f_usb9pfs_opts_buflen_store':
trans_usbg.c:(.text+0x49e): undefined reference to `usb_put_function_instance'
riscv64-unknown-linux-gnu-ld: net/9p/trans_usbg.o: in function `usb9pfs_alloc_instance':
trans_usbg.c:(.text+0x5fe): undefined reference to `config_group_init_type_name'
riscv64-unknown-linux-gnu-ld: net/9p/trans_usbg.o: in function `usb9pfs_alloc':
trans_usbg.c:(.text+0x7aa): undefined reference to `config_ep_by_speed'
riscv64-unknown-linux-gnu-ld: trans_usbg.c:(.text+0x7ea): undefined reference to `config_ep_by_speed'
riscv64-unknown-linux-gnu-ld: net/9p/trans_usbg.o: in function `usb9pfs_set_alt':
trans_usbg.c:(.text+0x828): undefined reference to `alloc_ep_req'
riscv64-unknown-linux-gnu-ld: net/9p/trans_usbg.o: in function `usb9pfs_modexit':
trans_usbg.c:(.exit.text+0x10): undefined reference to `usb_function_unregister'
riscv64-unknown-linux-gnu-ld: net/9p/trans_usbg.o: in function `usb9pfs_modinit':
trans_usbg.c:(.init.text+0x1e): undefined reference to `usb_function_register'
Select the config for NET_9P_USBG to fix it.
Fixes: a3be076dc174 ("net/9p/usbg: Add new usb gadget function transport")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Tested-by: Kexy Biscuit <kexybiscuit@aosc.io>
Link: https://lore.kernel.org/r/20240930081520.2371424-1-ruanjinjie@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.12-2024-10-08:
amdgpu:
- Fix invalid UBSAN warnings
- Fix artifacts in MPO transitions
- Hibernation fix
amdkfd:
- Fix an eviction fence leak
radeon:
- Add late register for connectors
- Always set GEM function pointers
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241008142831.3739244-1-alexander.deucher@amd.com
|
|
dcr_map is called in the previous if and therefore needs to be unmapped.
Fixes: 1ff0fcfcb1a6 ("ibm_newemac: Fix new MAL feature handling")
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Link: https://patch.msgid.link/20241007235711.5714-1-rosenp@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The intent of this debugfs entry is to allow modification of wedging
behavior, either from IGT tests or during manual debug; it should be
marked as writable to properly reflect this. In practice this hasn't
caused a problem because we always access wedged_mode as root, which
ignores file permissions, but it's still misleading to have the entry
incorrectly marked as RO.
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Fixes: 6b8ef44cc0a9 ("drm/xe: Introduce the wedged_mode debugfs")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241002230620.1249258-2-matthew.d.roper@intel.com
(cherry picked from commit 93d93813422758f6c99289de446b19184019ef5a)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
As part of a Wa_22019338487, ensure that GT freq is restored
even when GSC reload is not successful.
Fixes: 3b1592fb7835 ("drm/xe/lnl: Apply Wa_22019338487")
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240925204918.1989574-1-vinay.belgaumkar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 491418a258322bbd7f045e36884d2849b673f23d)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Looks like we are meant to use xa_err() to extract the error encoded in
the ptr.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-7-matthew.auld@intel.com
(cherry picked from commit f040327238b1a8311598c40ac94464e77fff368c)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Looks like we are meant to use xa_err() to extract the error encoded in
the ptr.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-6-matthew.auld@intel.com
(cherry picked from commit 1aa4b7864707886fa40d959483591f3d3937fa28)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Ensure we serialize with completion side to prevent UAF with fence going
out of scope on the stack, since we have no clue if it will fire after
the timeout before we can erase from the xa. Also we have some dependent
loads and stores for which we need the correct ordering, and we lack the
needed barriers. Fix this by grabbing the ct->lock after the wait, which
is also held by the completion side.
v2 (Badal):
- Also print done after acquiring the lock and seeing timeout.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241001084346.98516-5-matthew.auld@intel.com
(cherry picked from commit 52789ce35c55ccd30c4b67b9cc5b2af55e0122ea)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
|
Most qdiscs maintain their backlog using qdisc_pkt_len(skb)
on the assumption it is invariant between the enqueue()
and dequeue() handlers.
Unfortunately syzbot can crash a host rather easily using
a TBF + SFQ combination, with an STAB on SFQ [1]
We can't support TCA_STAB on arbitrary level, this would
require to maintain per-qdisc storage.
[1]
[ 88.796496] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 88.798611] #PF: supervisor read access in kernel mode
[ 88.799014] #PF: error_code(0x0000) - not-present page
[ 88.799506] PGD 0 P4D 0
[ 88.799829] Oops: Oops: 0000 [#1] SMP NOPTI
[ 88.800569] CPU: 14 UID: 0 PID: 2053 Comm: b371744477 Not tainted 6.12.0-rc1-virtme #1117
[ 88.801107] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 88.801779] RIP: 0010:sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
[ 88.802544] Code: 0f b7 50 12 48 8d 04 d5 00 00 00 00 48 89 d6 48 29 d0 48 8b 91 c0 01 00 00 48 c1 e0 03 48 01 c2 66 83 7a 1a 00 7e c0 48 8b 3a <4c> 8b 07 4c 89 02 49 89 50 08 48 c7 47 08 00 00 00 00 48 c7 07 00
All code
========
0: 0f b7 50 12 movzwl 0x12(%rax),%edx
4: 48 8d 04 d5 00 00 00 lea 0x0(,%rdx,8),%rax
b: 00
c: 48 89 d6 mov %rdx,%rsi
f: 48 29 d0 sub %rdx,%rax
12: 48 8b 91 c0 01 00 00 mov 0x1c0(%rcx),%rdx
19: 48 c1 e0 03 shl $0x3,%rax
1d: 48 01 c2 add %rax,%rdx
20: 66 83 7a 1a 00 cmpw $0x0,0x1a(%rdx)
25: 7e c0 jle 0xffffffffffffffe7
27: 48 8b 3a mov (%rdx),%rdi
2a:* 4c 8b 07 mov (%rdi),%r8 <-- trapping instruction
2d: 4c 89 02 mov %r8,(%rdx)
30: 49 89 50 08 mov %rdx,0x8(%r8)
34: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi)
3b: 00
3c: 48 rex.W
3d: c7 .byte 0xc7
3e: 07 (bad)
...
Code starting with the faulting instruction
===========================================
0: 4c 8b 07 mov (%rdi),%r8
3: 4c 89 02 mov %r8,(%rdx)
6: 49 89 50 08 mov %rdx,0x8(%r8)
a: 48 c7 47 08 00 00 00 movq $0x0,0x8(%rdi)
11: 00
12: 48 rex.W
13: c7 .byte 0xc7
14: 07 (bad)
...
[ 88.803721] RSP: 0018:ffff9a1f892b7d58 EFLAGS: 00000206
[ 88.804032] RAX: 0000000000000000 RBX: ffff9a1f8420c800 RCX: ffff9a1f8420c800
[ 88.804560] RDX: ffff9a1f81bc1440 RSI: 0000000000000000 RDI: 0000000000000000
[ 88.805056] RBP: ffffffffc04bb0e0 R08: 0000000000000001 R09: 00000000ff7f9a1f
[ 88.805473] R10: 000000000001001b R11: 0000000000009a1f R12: 0000000000000140
[ 88.806194] R13: 0000000000000001 R14: ffff9a1f886df400 R15: ffff9a1f886df4ac
[ 88.806734] FS: 00007f445601a740(0000) GS:ffff9a2e7fd80000(0000) knlGS:0000000000000000
[ 88.807225] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 88.807672] CR2: 0000000000000000 CR3: 000000050cc46000 CR4: 00000000000006f0
[ 88.808165] Call Trace:
[ 88.808459] <TASK>
[ 88.808710] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434)
[ 88.809261] ? page_fault_oops (arch/x86/mm/fault.c:715)
[ 88.809561] ? exc_page_fault (./arch/x86/include/asm/irqflags.h:26 ./arch/x86/include/asm/irqflags.h:87 ./arch/x86/include/asm/irqflags.h:147 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
[ 88.809806] ? asm_exc_page_fault (./arch/x86/include/asm/idtentry.h:623)
[ 88.810074] ? sfq_dequeue (net/sched/sch_sfq.c:272 net/sched/sch_sfq.c:499) sch_sfq
[ 88.810411] sfq_reset (net/sched/sch_sfq.c:525) sch_sfq
[ 88.810671] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
[ 88.810950] tbf_reset (./include/linux/timekeeping.h:169 net/sched/sch_tbf.c:334) sch_tbf
[ 88.811208] qdisc_reset (./include/linux/skbuff.h:2135 ./include/linux/skbuff.h:2441 ./include/linux/skbuff.h:3304 ./include/linux/skbuff.h:3310 net/sched/sch_generic.c:1036)
[ 88.811484] netif_set_real_num_tx_queues (./include/linux/spinlock.h:396 ./include/net/sch_generic.h:768 net/core/dev.c:2958)
[ 88.811870] __tun_detach (drivers/net/tun.c:590 drivers/net/tun.c:673)
[ 88.812271] tun_chr_close (drivers/net/tun.c:702 drivers/net/tun.c:3517)
[ 88.812505] __fput (fs/file_table.c:432 (discriminator 1))
[ 88.812735] task_work_run (kernel/task_work.c:230)
[ 88.813016] do_exit (kernel/exit.c:940)
[ 88.813372] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:58 (discriminator 4))
[ 88.813639] ? handle_mm_fault (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/memcontrol.h:1022 ./include/linux/memcontrol.h:1045 ./include/linux/memcontrol.h:1052 mm/memory.c:5928 mm/memory.c:6088)
[ 88.813867] do_group_exit (kernel/exit.c:1070)
[ 88.814138] __x64_sys_exit_group (kernel/exit.c:1099)
[ 88.814490] x64_sys_call (??:?)
[ 88.814791] do_syscall_64 (arch/x86/entry/common.c:52 (discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
[ 88.815012] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 88.815495] RIP: 0033:0x7f44560f1975
Fixes: 175f9c1bba9b ("net_sched: Add size table for qdiscs")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20241007184130.3960565-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The previous commit introduced the use of CLONE_NEWTIME without including
<sched.h> which contains its definition.
Add an explicit include of <sched.h> to ensure that CLONE_NEWTIME
is correctly defined before it is used.
Fixes: 2aec90036dcd ("selftests: vDSO: ensure vgetrandom works in a time namespace")
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Sporadic issues, such as PHY access loss, have been observed on I219 (19)
devices. It was found that these devices have hardware more closely
related to ADP than MTP and the issues were caused by taking MTP-specific
flows.
Change the MAC and board types of these devices from MTP to ADP to
correctly reflect the LAN hardware, and flows, of these devices.
Fixes: db2d737d63c5 ("e1000e: Separate MTP board type from ADP")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com>
Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Commit 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
changed igb_io_error_detected() to ignore non-fatal pcie errors in order
to avoid hung task that can happen when igb_down() is called multiple
times. This caused an issue when processing transient non-fatal errors.
igb_io_resume(), which is called after igb_io_error_detected(), assumes
that device is brought down by igb_io_error_detected() if the interface
is up. This resulted in panic with stacktrace below.
[ T3256] igb 0000:09:00.0 haeth0: igb: haeth0 NIC Link is Down
[ T292] pcieport 0000:00:1c.5: AER: Uncorrected (Non-Fatal) error received: 0000:09:00.0
[ T292] igb 0000:09:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
[ T292] igb 0000:09:00.0: device [8086:1537] error status/mask=00004000/00000000
[ T292] igb 0000:09:00.0: [14] CmpltTO [ 200.105524,009][ T292] igb 0000:09:00.0: AER: TLP Header: 00000000 00000000 00000000 00000000
[ T292] pcieport 0000:00:1c.5: AER: broadcast error_detected message
[ T292] igb 0000:09:00.0: Non-correctable non-fatal error reported.
[ T292] pcieport 0000:00:1c.5: AER: broadcast mmio_enabled message
[ T292] pcieport 0000:00:1c.5: AER: broadcast resume message
[ T292] ------------[ cut here ]------------
[ T292] kernel BUG at net/core/dev.c:6539!
[ T292] invalid opcode: 0000 [#1] PREEMPT SMP
[ T292] RIP: 0010:napi_enable+0x37/0x40
[ T292] Call Trace:
[ T292] <TASK>
[ T292] ? die+0x33/0x90
[ T292] ? do_trap+0xdc/0x110
[ T292] ? napi_enable+0x37/0x40
[ T292] ? do_error_trap+0x70/0xb0
[ T292] ? napi_enable+0x37/0x40
[ T292] ? napi_enable+0x37/0x40
[ T292] ? exc_invalid_op+0x4e/0x70
[ T292] ? napi_enable+0x37/0x40
[ T292] ? asm_exc_invalid_op+0x16/0x20
[ T292] ? napi_enable+0x37/0x40
[ T292] igb_up+0x41/0x150
[ T292] igb_io_resume+0x25/0x70
[ T292] report_resume+0x54/0x70
[ T292] ? report_frozen_detected+0x20/0x20
[ T292] pci_walk_bus+0x6c/0x90
[ T292] ? aer_print_port_info+0xa0/0xa0
[ T292] pcie_do_recovery+0x22f/0x380
[ T292] aer_process_err_devices+0x110/0x160
[ T292] aer_isr+0x1c1/0x1e0
[ T292] ? disable_irq_nosync+0x10/0x10
[ T292] irq_thread_fn+0x1a/0x60
[ T292] irq_thread+0xe3/0x1a0
[ T292] ? irq_set_affinity_notifier+0x120/0x120
[ T292] ? irq_affinity_notify+0x100/0x100
[ T292] kthread+0xe2/0x110
[ T292] ? kthread_complete_and_exit+0x20/0x20
[ T292] ret_from_fork+0x2d/0x50
[ T292] ? kthread_complete_and_exit+0x20/0x20
[ T292] ret_from_fork_asm+0x11/0x20
[ T292] </TASK>
To fix this issue igb_io_resume() checks if the interface is running and
the device is not down this means igb_io_error_detected() did not bring
the device down and there is no need to bring it up.
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Reviewed-by: Yuanyuan Zhong <yzhong@purestorage.com>
Fixes: 004d25060c78 ("igb: Fix igb_down hung on surprise removal")
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
This patch addresses a macvlan leak issue in the i40e driver caused by
concurrent access to vsi->mac_filter_hash. The leak occurs when multiple
threads attempt to modify the mac_filter_hash simultaneously, leading to
inconsistent state and potential memory leaks.
To fix this, we now wrap the calls to i40e_del_mac_filter() and zeroing
vf->default_lan_addr.addr with spin_lock/unlock_bh(&vsi->mac_filter_hash_lock),
ensuring atomic operations and preventing concurrent access.
Additionally, we add lockdep_assert_held(&vsi->mac_filter_hash_lock) in
i40e_add_mac_filter() to help catch similar issues in the future.
Reproduction steps:
1. Spawn VFs and configure port vlan on them.
2. Trigger concurrent macvlan operations (e.g., adding and deleting
portvlan and/or mac filters).
3. Observe the potential memory leak and inconsistent state in the
mac_filter_hash.
This synchronization ensures the integrity of the mac_filter_hash and prevents
the described leak.
Fixes: fed0d9f13266 ("i40e: Fix VF's MAC Address change on VM")
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Improve the error and skip condition messages to let the developer know
precisely where a test has failed. Also make better use of the ksft api
for this.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Rather than building on supported archs, build on all archs, and then
use the presence of the symbol in the vDSO to either skip the test or
move forward with it.
Note that this means that this test no longer checks whether the symbol
was correctly added to the kernel. But hopefully this will be clear
enough to developers and we'll cross our fingers that symbols aren't
removed by accident and not caught after this change.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Rather than using symlinks to find the vgetrandom-chacha.S file for each
arch, store this in a file that uses the compiler to determine
architecture, and then make use of weak symbols to skip the test on
architectures that don't provide the code.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
|
|
Increasing MSI-X value on a VF leads to invalid memory operations. This
is caused by not reallocating some arrays.
Reproducer:
modprobe ice
echo 0 > /sys/bus/pci/devices/$PF_PCI/sriov_drivers_autoprobe
echo 1 > /sys/bus/pci/devices/$PF_PCI/sriov_numvfs
echo 17 > /sys/bus/pci/devices/$VF0_PCI/sriov_vf_msix_count
Default MSI-X is 16, so 17 and above triggers this issue.
KASAN reports:
BUG: KASAN: slab-out-of-bounds in ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
Read of size 8 at addr ffff8888b937d180 by task bash/28433
(...)
Call Trace:
(...)
? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
kasan_report+0xed/0x120
? ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
ice_vsi_alloc_ring_stats+0x38d/0x4b0 [ice]
ice_vsi_cfg_def+0x3360/0x4770 [ice]
? mutex_unlock+0x83/0xd0
? __pfx_ice_vsi_cfg_def+0x10/0x10 [ice]
? __pfx_ice_remove_vsi_lkup_fltr+0x10/0x10 [ice]
ice_vsi_cfg+0x7f/0x3b0 [ice]
ice_vf_reconfig_vsi+0x114/0x210 [ice]
ice_sriov_set_msix_vec_count+0x3d0/0x960 [ice]
sriov_vf_msix_count_store+0x21c/0x300
(...)
Allocated by task 28201:
(...)
ice_vsi_cfg_def+0x1c8e/0x4770 [ice]
ice_vsi_cfg+0x7f/0x3b0 [ice]
ice_vsi_setup+0x179/0xa30 [ice]
ice_sriov_configure+0xcaa/0x1520 [ice]
sriov_numvfs_store+0x212/0x390
(...)
To fix it, use ice_vsi_rebuild() instead of ice_vf_reconfig_vsi(). This
causes the required arrays to be reallocated taking the new queue count
into account (ice_vsi_realloc_stat_arrays()). Set req_txq and req_rxq
before ice_vsi_rebuild(), so that realloc uses the newly set queue
count.
Additionally, ice_vsi_rebuild() does not remove VSI filters
(ice_fltr_remove_all()), so ice_vf_init_host_cfg() is no longer
necessary.
Reported-by: Jacob Keller <jacob.e.keller@intel.com>
Fixes: 2a2cb4c6c181 ("ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()")
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
Triggering the reset while in switchdev mode causes
errors[1]. Rules are already removed by this time
because switch content is flushed in case of the reset.
This means that rules were deleted from HW but SW
still thinks they exist so when we get
SWITCHDEV_FDB_DEL_TO_DEVICE notification we try to
delete not existing rule.
We can avoid these errors by clearing the rules
early in the reset flow before they are removed from HW.
Switchdev API will get notified that the rule was removed
so we won't get SWITCHDEV_FDB_DEL_TO_DEVICE notification.
Remove unnecessary ice_clear_sw_switch_recipes.
[1]
ice 0000:01:00.0: Failed to delete FDB forward rule, err: -2
ice 0000:01:00.0: Failed to delete FDB guard rule, err: -2
Fixes: 7c945a1a8e5f ("ice: Switchdev FDB events support")
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
netif_is_ice() works by checking the pointer to netdev ops. However, it
only checks for the default ice_netdev_ops, not ice_netdev_safe_mode_ops,
so in Safe Mode it always returns false, which is unintuitive. While it
doesn't look like netif_is_ice() is currently being called anywhere in Safe
Mode, this could change and potentially lead to unexpected behaviour.
Fixes: df006dd4b1dc ("ice: Add initial support framework for LAG")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
If DDP package is missing or corrupted, the driver should enter Safe Mode.
Instead, an error is returned and probe fails.
To fix this, don't exit init if ice_init_ddp_config() returns an error.
Repro:
* Remove or rename DDP package (/lib/firmware/intel/ice/ddp/ice.pkg)
* Load ice
Fixes: cc5776fe1832 ("ice: Enable switching default Tx scheduler topology")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- ops.enqueue() didn't have a way to tell whether select_task_rq_scx()
and thus ops.select() were skipped. Some schedulers were incorrectly
using SCX_ENQ_WAKEUP. Add SCX_ENQ_CPU_SELECTED and fix scx_qmap using
it.
- Remove a spurious WARN_ON_ONCE() in scx_cgroup_exit()
- Fix error information clobbering during load
- Add missing __weak markers to BPF helper declarations
- Doc update
* tag 'sched_ext-for-6.12-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Documentation: Update instructions for running example schedulers
sched_ext, scx_qmap: Add and use SCX_ENQ_CPU_SELECTED
sched/core: Add ENQUEUE_RQ_SELECTED to indicate whether ->select_task_rq() was called
sched/core: Make select_task_rq() take the pointer to wake_flags instead of value
sched_ext: scx_cgroup_exit() may be called without successful scx_cgroup_init()
sched_ext: Improve error reporting during loading
sched_ext: Add __weak markers to BPF helper function decalarations
|