Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
"This contains the work to extend move_mount() to allow adding a mount
beneath the topmost mount of a mount stack.
There are two LWN articles about this. One covers the original patch
series in [1]. The other in [2] summarizes the session and roughly the
discussion between Al and me at LSFMM. The second article also goes
into some good questions from attendees.
Since all details are found in the relevant commit with a technical
dive into semantics and locking at the end I'm only adding the
motivation and core functionality for this from commit message and
leave out the invasive details. The code is also heavily commented and
annotated as well which was explicitly requested.
TL;DR:
> mount -t ext4 /dev/sda /mnt
|
└─/mnt /dev/sda ext4
> mount --beneath -t xfs /dev/sdb /mnt
|
└─/mnt /dev/sdb xfs
└─/mnt /dev/sda ext4
> umount /mnt
|
└─/mnt /dev/sdb xfs
The longer motivation is that various distributions are adding or are
in the process of adding support for system extensions and in the
future configuration extensions through various tools. A more detailed
explanation on system and configuration extensions can be found on the
manpage which is listed below at [3].
System extension images may – dynamically at runtime — extend the
/usr/ and /opt/ directory hierarchies with additional files. This is
particularly useful on immutable system images where a /usr/ and/or
/opt/ hierarchy residing on a read-only file system shall be extended
temporarily at runtime without making any persistent modifications.
When one or more system extension images are activated, their /usr/
and /opt/ hierarchies are combined via overlayfs with the same
hierarchies of the host OS, and the host /usr/ and /opt/ overmounted
with it ("merging"). When they are deactivated, the mount point is
disassembled — again revealing the unmodified original host version of
the hierarchy ("unmerging"). Merging thus makes the extension's
resources suddenly appear below the /usr/ and /opt/ hierarchies as if
they were included in the base OS image itself. Unmerging makes them
disappear again, leaving in place only the files that were shipped
with the base OS image itself.
System configuration images are similar but operate on directories
containing system or service configuration.
On nearly all modern distributions mount propagation plays a crucial
role and the rootfs of the OS is a shared mount in a peer group
(usually with peer group id 1):
TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID
/ / ext4 shared:1 29 1
On such systems all services and containers run in a separate mount
namespace and are pivot_root()ed into their rootfs. A separate mount
namespace is almost always used as it is the minimal isolation
mechanism services have. But usually they are even much more isolated
up to the point where they almost become indistinguishable from
containers.
Mount propagation again plays a crucial role here. The rootfs of all
these services is a slave mount to the peer group of the host rootfs.
This is done so the service will receive mount propagation events from
the host when certain files or directories are updated.
In addition, the rootfs of each service, container, and sandbox is
also a shared mount in its separate peer group:
TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID
/ / ext4 shared:24 master:1 71 47
For people not too familiar with mount propagation, the master:1 means
that this is a slave mount to peer group 1. Which as one can see is
the host rootfs as indicated by shared:1 above. The shared:24
indicates that the service rootfs is a shared mount in a separate peer
group with peer group id 24.
A service may run other services. Such nested services will also have
a rootfs mount that is a slave to the peer group of the outer service
rootfs mount.
For containers things are just slighly different. A container's rootfs
isn't a slave to the service's or host rootfs' peer group. The rootfs
mount of a container is simply a shared mount in its own peer group:
TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID
/home/ubuntu/debian-tree / ext4 shared:99 61 60
So whereas services are isolated OS components a container is treated
like a separate world and mount propagation into it is restricted to a
single well known mount that is a slave to the peer group of the
shared mount /run on the host:
TARGET SOURCE FSTYPE PROPAGATION MNT_ID PARENT_ID
/propagate/debian-tree /run/host/incoming tmpfs master:5 71 68
Here, the master:5 indicates that this mount is a slave to the peer
group with peer group id 5. This allows to propagate mounts into the
container and served as a workaround for not being able to insert
mounts into mount namespaces directly. But the new mount api does
support inserting mounts directly. For the interested reader the
blogpost in [4] might be worth reading where I explain the old and the
new approach to inserting mounts into mount namespaces.
Containers of course, can themselves be run as services. They often
run full systems themselves which means they again run services and
containers with the exact same propagation settings explained above.
The whole system is designed so that it can be easily updated,
including all services in various fine-grained ways without having to
enter every single service's mount namespace which would be
prohibitively expensive. The mount propagation layout has been
carefully chosen so it is possible to propagate updates for system
extensions and configurations from the host into all services.
The simplest model to update the whole system is to mount on top of
/usr, /opt, or /etc on the host. The new mount on /usr, /opt, or /etc
will then propagate into every service. This works cleanly the first
time. However, when the system is updated multiple times it becomes
necessary to unmount the first update on /opt, /usr, /etc and then
propagate the new update. But this means, there's an interval where
the old base system is accessible. This has to be avoided to protect
against downgrade attacks.
The vfs already exposes a mechanism to userspace whereby mounts can be
mounted beneath an existing mount. Such mounts are internally referred
to as "tucked". The patch series exposes the ability to mount beneath
a top mount through the new MOVE_MOUNT_BENEATH flag for the
move_mount() system call. This allows userspace to seamlessly upgrade
mounts. After this series the only thing that will have changed is
that mounting beneath an existing mount can be done explicitly instead
of just implicitly.
The crux is that the proposed mechanism already exists and that it is
so powerful as to cover cases where mounts are supposed to be updated
with new versions. Crucially, it offers an important flexibility.
Namely that updates to a system may either be forced or can be delayed
and the umount of the top mount be left to a service if it is a
cooperative one"
Link: https://lwn.net/Articles/927491 [1]
Link: https://lwn.net/Articles/934094 [2]
Link: https://man7.org/linux/man-pages/man8/systemd-sysext.8.html [3]
Link: https://brauner.io/2023/02/28/mounting-into-mount-namespaces.html [4]
Link: https://github.com/flatcar/sysext-bakery
Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_1
Link: https://fedoraproject.org/wiki/Changes/Unified_Kernel_Support_Phase_2
Link: https://github.com/systemd/systemd/pull/26013
* tag 'v6.5/vfs.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: allow to mount beneath top mount
fs: use a for loop when locking a mount
fs: properly document __lookup_mnt()
fs: add path_mounted()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs file handling updates from Christian Brauner:
"This contains Amir's work to fix a long-standing problem where an
unprivileged overlayfs mount can be used to avoid fanotify permission
events that were requested for an inode or superblock on the
underlying filesystem.
Some background about files opened in overlayfs. If a file is opened
in overlayfs @file->f_path will refer to a "fake" path. What this
means is that while @file->f_inode will refer to inode of the
underlying layer, @file->f_path refers to an overlayfs
{dentry,vfsmount} pair. The reasons for doing this are out of scope
here but it is the reason why the vfs has been providing the
open_with_fake_path() helper for overlayfs for very long time now. So
nothing new here.
This is for sure not very elegant and everyone including the overlayfs
maintainers agree. Improving this significantly would involve more
fragile and potentially rather invasive changes.
In various codepaths access to the path of the underlying filesystem
is needed for such hybrid file. The best example is fsnotify where
this becomes security relevant. Passing the overlayfs
@file->f_path->dentry will cause fsnotify to skip generating fsnotify
events registered on the underlying inode or superblock.
To fix this we extend the vfs provided open_with_fake_path() concept
for overlayfs to create a backing file container that holds the real
path and to expose a helper that can be used by relevant callers to
get access to the path of the underlying filesystem through the new
file_real_path() helper. This pattern is similar to what we do in
d_real() and d_real_inode().
The first beneficiary is fsnotify and fixes the security sensitive
problem mentioned above.
There's a couple of nice cleanups included as well.
Over time, the old open_with_fake_path() helper added specifically for
overlayfs a long time ago started to get used in other places such as
cachefiles. Even though cachefiles have nothing to do with hybrid
files.
The only reason cachefiles used that concept was that files opened
with open_with_fake_path() aren't charged against the caller's open
file limit by raising FMODE_NOACCOUNT. It's just mere coincidence that
both overlayfs and cachefiles need to ensure to not overcharge the
caller for their internal open calls.
So this work disentangles FMODE_NOACCOUNT use cases and backing file
use-cases by adding the FMODE_BACKING flag which indicates that the
file can be used to retrieve the backing file of another filesystem.
(Fyi, Jens will be sending you a really nice cleanup from Christoph
that gets rid of 3 FMODE_* flags otherwise this would be the last
fmode_t bit we'd be using.)
So now overlayfs becomes the sole user of the renamed
open_with_fake_path() helper which is now named backing_file_open().
For internal kernel users such as cachefiles that are only interested
in FMODE_NOACCOUNT but not in FMODE_BACKING we add a new
kernel_file_open() helper which opens a file without being charged
against the caller's open file limit. All new helpers are properly
documented and clearly annotated to mention their special uses.
We also rename vfs_tmpfile_open() to kernel_tmpfile_open() to clearly
distinguish it from vfs_tmpfile() and align it the other kernel_*()
internal helpers"
* tag 'v6.5/vfs.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ovl: enable fsnotify events on underlying real files
fs: use backing_file container for internal files with "fake" f_path
fs: move kmem_cache_zalloc() into alloc_empty_file*() helpers
fs: use a helper for opening kernel internal files
fs: rename {vfs,kernel}_tmpfile_open()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs rename locking updates from Christian Brauner:
"This contains the work from Jan to fix problems with cross-directory
renames originally reported in [1].
To quickly sum it up some filesystems (so far we know at least about
ext4, udf, f2fs, ocfs2, likely also reiserfs, gfs2 and others) need to
lock the directory when it is being renamed into another directory.
This is because we need to update the parent pointer in the directory
in that case and if that races with other operations on the directory,
in particular a conversion from one directory format into another, bad
things can happen.
So far we've done the locking in the filesystem code but recently
Darrick pointed out in [2] that the RENAME_EXCHANGE case was missing.
That one is particularly nasty because RENAME_EXCHANGE can arbitrarily
mix regular files and directories and proper lock ordering is not
achievable in the filesystems alone.
This patch set adds locking into vfs_rename() so that not only parent
directories but also moved inodes, regardless of whether they are
directories or not, are locked when calling into the filesystem.
This means establishing a locking order for unrelated directories. New
helpers are added for this purpose and our documentation is updated to
cover this in detail.
The locking is now actually easier to follow as we now always lock
source and target. We've always locked the target independent of
whether it was a directory or file and we've always locked source if
it was a regular file. The exact details for why this came about can
be found in [3] and [4]"
Link: https://lore.kernel.org/all/20230117123735.un7wbamlbdihninm@quack3 [1]
Link: https://lore.kernel.org/all/20230517045836.GA11594@frogsfrogsfrogs [2]
Link: https://lore.kernel.org/all/20230526-schrebergarten-vortag-9cd89694517e@brauner [3]
Link: https://lore.kernel.org/all/20230530-seenotrettung-allrad-44f4b00139d4@brauner [4]
* tag 'v6.5/vfs.rename.locking' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: Restrict lock_two_nondirectories() to non-directory inodes
fs: Lock moved directories
fs: Establish locking order for unrelated directories
Revert "f2fs: fix potential corruption when moving a directory"
Revert "udf: Protect rename against modification of moved directory"
ext4: Remove ext4 locking of moved directory
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"Miscellaneous features, cleanups, and fixes for vfs and individual fs
Features:
- Use mode 0600 for file created by cachefilesd so it can be run by
unprivileged users. This aligns them with directories which are
already created with mode 0700 by cachefilesd
- Reorder a few members in struct file to prevent some false sharing
scenarios
- Indicate that an eventfd is used a semaphore in the eventfd's
fdinfo procfs file
- Add a missing uapi header for eventfd exposing relevant uapi
defines
- Let the VFS protect transitions of a superblock from read-only to
read-write in addition to the protection it already provides for
transitions from read-write to read-only. Protecting read-only to
read-write transitions allows filesystems such as ext4 to perform
internal writes, keeping writers away until the transition is
completed
Cleanups:
- Arnd removed the architecture specific arch_report_meminfo()
prototypes and added a generic one into procfs.h. Note, we got a
report about a warning in amdpgpu codepaths that suggested this was
bisectable to this change but we concluded it was a false positive
- Remove unused parameters from split_fs_names()
- Rename put_and_unmap_page() to unmap_and_put_page() to let the name
reflect the order of the cleanup operation that has to unmap before
the actual put
- Unexport buffer_check_dirty_writeback() as it is not used outside
of block device aops
- Stop allocating aio rings from highmem
- Protecting read-{only,write} transitions in the VFS used open-coded
barriers in various places. Replace them with proper little helpers
and document both the helpers and all barrier interactions involved
when transitioning between read-{only,write} states
- Use flexible array members in old readdir codepaths
Fixes:
- Use the correct type __poll_t for epoll and eventfd
- Replace all deprecated strlcpy() invocations, whose return value
isn't checked with an equivalent strscpy() call
- Fix some kernel-doc warnings in fs/open.c
- Reduce the stack usage in jffs2's xattr codepaths finally getting
rid of this: fs/jffs2/xattr.c:887:1: error: the frame size of 1088
bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
royally annoying compilation warning
- Use __FMODE_NONOTIFY instead of FMODE_NONOTIFY where an int and not
fmode_t is required to avoid fmode_t to integer degradation
warnings
- Create coredumps with O_WRONLY instead of O_RDWR. There's a long
explanation in that commit how O_RDWR is actually a bug which we
found out with the help of Linus and git archeology
- Fix "no previous prototype" warnings in the pipe codepaths
- Add overflow calculations for remap_verify_area() as a signed
addition overflow could be triggered in xfstests
- Fix a null pointer dereference in sysv
- Use an unsigned variable for length calculations in jfs avoiding
compilation warnings with gcc 13
- Fix a dangling pipe pointer in the watch queue codepath
- The legacy mount option parser provided as a fallback by the VFS
for filesystems not yet converted to the new mount api did prefix
the generated mount option string with a leading ',' causing issues
for some filesystems
- Fix a repeated word in a comment in fs.h
- autofs: Update the ctime when mtime is updated as mandated by
POSIX"
* tag 'v6.5/vfs.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (27 commits)
readdir: Replace one-element arrays with flexible-array members
fs: Provide helpers for manipulating sb->s_readonly_remount
fs: Protect reconfiguration of sb read-write from racing writes
eventfd: add a uapi header for eventfd userspace APIs
autofs: set ctime as well when mtime changes on a dir
eventfd: show the EFD_SEMAPHORE flag in fdinfo
fs/aio: Stop allocating aio rings from HIGHMEM
fs: Fix comment typo
fs: unexport buffer_check_dirty_writeback
fs: avoid empty option when generating legacy mount string
watch_queue: prevent dangling pipe pointer
fs.h: Optimize file struct to prevent false sharing
highmem: Rename put_and_unmap_page() to unmap_and_put_page()
cachefiles: Allow the cache to be non-root
init: remove unused names parameter in split_fs_names()
jfs: Use unsigned variable for length calculations
fs/sysv: Null check to prevent null-ptr-deref bug
fs: use UB-safe check for signed addition overflow in remap_verify_area
procfs: consolidate arch_report_meminfo declaration
fs: pipe: reveal missing function protoypes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull ntfs updates from Christian Brauner:
"A pile of various smaller fixes for ntfs"
* tag 'v6.5/fs.ntfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
ntfs: do not dereference a null ctx on error
ntfs: Remove unneeded semicolon
ntfs: Correct spelling
ntfs: remove redundant initialization to pointer cb_sb_start
|
|
Pull auxdisplay update from Miguel Ojeda:
"A single cleanup for i2c drivers to switch them back to use
'.probe()'"
* tag 'auxdisplay-6.5' of https://github.com/ojeda/linux:
auxdisplay: Switch i2c drivers back to use .probe()
|
|
Pull rust updates from Miguel Ojeda:
"A fairly small one in terms of feature additions. Most of the changes
in terms of lines come from the upgrade to the new version of the
toolchain (which in turn is big due to the vendored 'alloc' crate).
Upgrade to Rust 1.68.2:
- This is the first such upgrade, and we will try to update it often
from now on, in order to remain close to the latest release, until
a minimum version (which is "in the future") can be established.
The upgrade brings the stabilization of 4 features we used (and 2
more that we used in our old 'rust' branch).
Commit 3ed03f4da06e ("rust: upgrade to Rust 1.68.2") contains the
details and rationale.
pin-init API:
- Several internal improvements and fixes to the pin-init API, e.g.
allowing to use 'Self' in a struct definition with '#[pin_data]'.
'error' module:
- New 'name()' method for the 'Error' type (with 'errname()'
integration), used to implement the 'Debug' trait for 'Error'.
- Add error codes from 'include/linux/errno.h' to the list of Rust
'Error' constants.
- Allow specifying error type on the 'Result' type (with the default
still being our usual 'Error' type).
'str' module:
- 'TryFrom' implementation for 'CStr', and new 'to_cstring()' method
based on it.
'sync' module:
- Implement 'AsRef' trait for 'Arc', allowing to use 'Arc' in code
that is generic over smart pointer types.
- Add 'ptr_eq' method to 'Arc' for easier, less error prone
comparison between two 'Arc' pointers.
- Reword the 'Send' safety comment for 'Arc', and avoid referencing
it from the 'Sync' one.
'task' module:
- Implement 'Send' marker for 'Task'.
'types' module:
- Implement 'Send' and 'Sync' markers for 'ARef<T>' when 'T' is
'AlwaysRefCounted', 'Send' and 'Sync'.
Other changes:
- Documentation improvements and '.gitattributes' change to start
using the Rust diff driver"
* tag 'rust-6.5' of https://github.com/Rust-for-Linux/linux:
rust: error: `impl Debug` for `Error` with `errname()` integration
rust: task: add `Send` marker to `Task`
rust: specify when `ARef` is thread safe
rust: sync: reword the `Arc` safety comment for `Sync`
rust: sync: reword the `Arc` safety comment for `Send`
rust: sync: implement `AsRef<T>` for `Arc<T>`
rust: sync: add `Arc::ptr_eq`
rust: error: add missing error codes
rust: str: add conversion from `CStr` to `CString`
rust: error: allow specifying error type on `Result`
rust: init: update macro expansion example in docs
rust: macros: replace Self with the concrete type in #[pin_data]
rust: macros: refactor generics parsing of `#[pin_data]` into its own function
rust: macros: fix usage of `#[allow]` in `quote!`
docs: rust: point directly to the standalone installers
.gitattributes: set diff driver for Rust source code files
rust: upgrade to Rust 1.68.2
rust: arc: fix intra-doc link in `Arc<T>::init`
rust: alloc: clarify what is the upstream version
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Alexander Gordeev:
- Use correct type for size of memory allocated for ELF core header on
kernel crash.
- Fix insecure W+X mapping warning when KASAN shadow memory range is
not aligned on page boundary.
- Avoid allocation of short by one page KASAN shadow memory when the
original memory range is less than (PAGE_SIZE << 3).
- Fix virtual vs physical address confusion in physical memory
enumerator. It is not a real issue, since virtual and physical
addresses are currently the same.
- Set CONFIG_NET_TC_SKB_EXT=y in s390 config files as it is required
for offloading TC as well as bridges on switchdev capable ConnectX
devices.
* tag 's390-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/defconfigs: set CONFIG_NET_TC_SKB_EXT=y
s390/boot: fix physmem_info virtual vs physical address confusion
s390/kasan: avoid short by one page shadow memory
s390/kasan: fix insecure W+X mapping warning
s390/crash: use the correct type for memory allocation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux
Pull nios2 updates from Dinh Nguyen:
- Convert pgtable constructor/destructors to ptdesc
- Replace strlcpy with strscpy
* tag 'nios2_updates_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: Replace all non-returning strlcpy with strscpy
nios2: Convert __pte_free_tlb() to use ptdescs
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"Nothing fancy. Two driver and one DT binding fix"
* tag 'i2c-for-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle
i2c: qup: Add missing unwind goto in qup_i2c_probe()
dt-bindings: i2c: opencores: Add missing type for "regstep"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Borislav Petkov:
- Drop the __weak attribute from a function prototype as it otherwise
leads to the function getting replaced by a dummy stub
- Fix the umask value setup of the frontend event as former is
different on two Intel cores
* tag 'perf_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL
perf/core: Drop __weak attribute from arch_perf_update_userpage() prototype
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fix from Borislav Petkov:
- Add a ORC format hash to vmlinux and modules in order for other tools
which use it, to detect changes to it and adapt accordingly
* tag 'objtool_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Add ELF section with ORC version identifier
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Do not use set_pgd() when updating the KASLR trampoline pgd entry
because that updates the user PGD too on KPTI builds, resulting in
memory corruption
- Prevent a panic in the IO-APIC setup code due to conflicting command
line parameters
* tag 'x86_urgent_for_v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys
x86/mm: Avoid using set_pgd() outside of real PGD pages
|
|
Pull drm fixes from Dave Airlie:
"Very quiet last week, just two misc fixes, one dp-mst and one qaic:
qaic:
- dma-buf import fix
dp-mst:
- fix NULL ptr deref"
[ It turns out it was a quiet week because Alex Deucher hadn't sent in
his pending AMD changes. So they are coming next - Linus ]
* tag 'drm-fixes-2023-06-23' of git://anongit.freedesktop.org/drm/drm:
drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
accel/qaic: Call DRM helper function to destroy prime GEM
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Arnd Bergmann:
"The final bug fixes for Qualcomm and Rockchips came in, all of them
for devicetree files:
- Devices on Qualcomm SC7180/SC7280 that are cache coherent are now
marked so correctly to fix a regression after a change in kernel
behavior
- Rockchips has a few minor changes for correctness of regulator and
cache properties, as well as fixes for incorrect behavior of the
RK3568 PCI controller and reset pins on two boards"
* tag 'arm-fixes-6.4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent
arm64: dts: rockchip: Fix rk356x PCIe register and range mappings
arm64: dts: rockchip: fix button reset pin for nanopi r5c
arm64: dts: rockchip: fix nEXTRST on SOQuartz
arm64: dts: rockchip: add missing cache properties
arm64: dts: rockchip: fix USB regulator on ROCK64
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
"Unfortunately the recent u32 overflow fix was not complete, there was
one conversion left, assertion not triggered by my tests but caught by
Qu's fstests case.
The "cleanup for later" has been promoted to a proper fix and wraps
all uses of the stripe left shift so the diffstat has grown but leaves
no potentially problematic uses.
We should have done it that way before, sorry"
* tag 'for-6.4-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix remaining u32 overflows when left shifting stripe_nr
|
|
Pull block fix from Jens Axboe:
"It's apparently the week of 'fixup something from last week', because
the same is true for this block pull request.
Fix up a lock grab that needs to be IRQ saving, rather than just IRQ
disabling, in the block cgroup code"
* tag 'block-6.4-2023-06-23' of git://git.kernel.dk/linux:
block: make sure local irq is disabled when calling __blkcg_rstat_flush
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull iommu fix from Joerg Roedel:
- Fix potential memory leak in AMD IOMMU domain allocation path
* tag 'iommu-fix-v6.4-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix possible memory leak of 'domain'
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Three oneliner fixes: one for a thinko in SOF SoundWire code and two
HD-audio quirks for ASUS laptops. All device-specific and should be
safe to apply"
* tag 'sound-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Add quirk for ASUS ROG GV601V
ALSA: hda/realtek: Add quirk for ASUS ROG G634Z
ASoC: intel: sof_sdw: Fixup typo in device link checking
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix IRQ initialization in gpiochip_irqchip_add_domain()
- add a missing return value check for platform_get_irq() in
gpio-sifive
- don't free irq_domains which GPIOLIB does not manage
* tag 'gpio-fixes-for-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpiolib: Fix irq_domain resource tracking for gpiochip_irqchip_add_domain()
gpio: sifive: add missing check for platform_get_irq
gpiolib: Fix GPIO chip IRQ initialization restriction
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes
One last Qualcomm ARM64 DeviceTree fix for v6.4
Changes related to cache management for DMA memory caused WiFi to stop
work on SC7180 and SC7280 based products, using TF-A. These changes
marks the relevant device dma-coherent to correct the behavior.
* tag 'qcom-arm64-fixes-for-6.4-2' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sc7280: Mark SCM as dma-coherent for chrome devices
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for trogdor
arm64: dts: qcom: sc7180: Mark SCM as dma-coherent for IDP
dt-bindings: firmware: qcom,scm: Document that SCM can be dma-coherent
Link: https://lore.kernel.org/r/20230622203248.106422-1-andersson@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Dave Airlie reports that gcc-13.1.1 has started complaining about some
of the workqueue code in 32-bit arm builds:
kernel/workqueue.c: In function ‘get_work_pwq’:
kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
| ^
[ ... a couple of other cases ... ]
and while it's not immediately clear exactly why gcc started complaining
about it now, I suspect it's some C23-induced enum type handlign fixup in
gcc-13 is the cause.
Whatever the reason for starting to complain, the code and data types
are indeed disgusting enough that the complaint is warranted.
The wq code ends up creating various "helper constants" (like that
WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of
confused. The mask needs to be 'unsigned long', not some unspecified
enum type.
To make matters worse, the actual "mask and cast to a pointer" is
repeated a couple of times, and the cast isn't even always done to the
right pointer, but - as the error case above - to a 'void *' with then
the compiler finishing the job.
That's now how we roll in the kernel.
So create the masks using the proper types rather than some ambiguous
enumeration, and use a nice helper that actually does the type
conversion in one well-defined place.
Incidentally, this magically makes clang generate better code. That,
admittedly, is really just a sign of clang having been seriously
confused before, and cleaning up the typing unconfuses the compiler too.
Reported-by: Dave Airlie <airlied@gmail.com>
Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Claim clkhi and clklo as integer type to avoid possible calculation
errors caused by data overflow.
Fixes: a55fa9d0e42e ("i2c: imx-lpi2c: add low power i2c bus driver")
Signed-off-by: Clark Wang <xiaoning.wang@nxp.com>
Signed-off-by: Carlos Song <carlos.song@nxp.com>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Smatch Warns:
drivers/i2c/busses/i2c-qup.c:1784 qup_i2c_probe()
warn: missing unwind goto?
The goto label "fail_runtime" and "fail" will disable qup->pclk,
but here qup->pclk failed to obtain, in order to be consistent,
change the direct return to goto label "fail_dma".
Fixes: 9cedf3b2f099 ("i2c: qup: Add bam dma capabilities")
Signed-off-by: Shuai Jiang <d202180596@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Cc: <stable@vger.kernel.org> # v4.6+
|
|
"regstep" may be deprecated, but it still needs a type.
Fixes: 8ad69f490516 ("dt-bindings: i2c: convert ocores binding to yaml")
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Peter Korsgaard <peter@korsgaard.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Andi Shyti <andi.shyti@kernel.org>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v6.4:
- Qaic imported dma-buf fix.
- Fix null pointer deref when printing a dp-mst message.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <dev@lankhorst.se>
Link: https://patchwork.freedesktop.org/patch/msgid/e96b1965-ba67-7cc5-2358-826eb5b9b998@lankhorst.se
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from ipsec, bpf, mptcp and netfilter.
Current release - regressions:
- netfilter: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
- eth: mlx5e:
- fix scheduling of IPsec ASO query while in atomic
- free IRQ rmap and notifier on kernel shutdown
Current release - new code bugs:
- phy: manual remove LEDs to ensure correct ordering
Previous releases - regressions:
- mptcp: fix possible divide by zero in recvmsg()
- dsa: revert "net: phy: dp83867: perform soft reset and retain
established link"
Previous releases - always broken:
- sched: netem: acquire qdisc lock in netem_change()
- bpf:
- fix verifier id tracking of scalars on spill
- fix NULL dereference on exceptions
- accept function names that contain dots
- netfilter: disallow element updates of bound anonymous sets
- mptcp: ensure listener is unhashed before updating the sk status
- xfrm:
- add missed call to delete offloaded policies
- fix inbound ipv4/udp/esp packets to UDPv6 dualstack sockets
- selftests: fixes for FIPS mode
- dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling
- eth: sfc: use budget for TX completions
Misc:
- wifi: iwlwifi: add support for SO-F device with PCI id 0x7AF0"
* tag 'net-6.4-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (74 commits)
revert "net: align SO_RCVMARK required privileges with SO_MARK"
net: wwan: iosm: Convert single instance struct member to flexible array
sch_netem: acquire qdisc lock in netem_change()
selftests: forwarding: Fix race condition in mirror installation
wifi: mac80211: report all unusable beacon frames
mptcp: ensure listener is unhashed before updating the sk status
mptcp: drop legacy code around RX EOF
mptcp: consolidate fallback and non fallback state machine
mptcp: fix possible list corruption on passive MPJ
mptcp: fix possible divide by zero in recvmsg()
mptcp: handle correctly disconnect() failures
bpf: Force kprobe multi expected_attach_type for kprobe_multi link
bpf/btf: Accept function names that contain dots
Revert "net: phy: dp83867: perform soft reset and retain established link"
net: mdio: fix the wrong parameters
netfilter: nf_tables: Fix for deleting base chains with payload
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: drop module reference after updating chain
netfilter: nf_tables: disallow timeout for anonymous sets
netfilter: nf_tables: disallow updates of anonymous sets
...
|
|
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Correctly save/restore PMUSERNR_EL0 when host userspace is using
PMU counters directly
- Fix GICv2 emulation on GICv3 after the locking rework
- Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
document why
Generic:
- Avoid setting page table entries pointing to a deleted memslot if a
host page table entry is changed concurrently with the deletion"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Avoid illegal stage2 mapping on invalid memory slot
KVM: arm64: Use raw_smp_processor_id() in kvm_pmu_probe_armpmu()
KVM: arm64: Restore GICv2-on-GICv3 functionality
KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded
KVM: arm64: PMU: Restore the host's PMUSERENR_EL0
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fix from Michael Ellerman:
- Disable IRQs when switching mm in exit_lazy_flush_tlb() called from
exit_mmap()
Thanks to Nicholas Piggin and Sachin Sant.
* tag 'powerpc-6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64s/radix: Fix exit lazy tlb mm switch with irqs enabled
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fix from Bjorn Helgaas:
- Transfer Intel LGM GW PCIe maintenance from Rahul Tanwar to Chuanhua
Lei (Zhu YiXin)
* tag 'pci-v6.4-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Chuanhua Lei as Intel LGM GW PCIe maintainer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- Fix support for deferred probing for several host drivers
- litex_mmc: Use async probe as it's common for all mmc hosts
- meson-gx: Fix bug when scheduling while atomic
- mmci_stm32: Fix max busy timeout calculation
- sdhci-msm: Disable broken 64-bit DMA on MSM8916
* tag 'mmc-v6.4-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: usdhi60rol0: fix deferred probing
mmc: sunxi: fix deferred probing
mmc: sh_mmcif: fix deferred probing
mmc: sdhci-spear: fix deferred probing
mmc: sdhci-acpi: fix deferred probing
mmc: owl: fix deferred probing
mmc: omap_hsmmc: fix deferred probing
mmc: omap: fix deferred probing
mmc: mvsdio: fix deferred probing
mmc: mtk-sd: fix deferred probing
mmc: meson-gx: fix deferred probing
mmc: bcm2835: fix deferred probing
mmc: litex_mmc: set PROBE_PREFER_ASYNCHRONOUS
mmc: meson-gx: remove redundant mmc_request_done() call from irq context
mmc: mmci: stm32: fix max busy timeout calculation
mmc: sdhci-msm: Disable broken 64-bit DMA on MSM8916
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fix from Hans de Goede:
"One small fix for an AMD PMF driver issue which is causing issues for
users of just released AMD laptop models"
* tag 'platform-drivers-x86-v6.4-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86/amd/pmf: Register notify handler only if SPS is enabled
|
|
Pull io_uring fixes from Jens Axboe:
"A fix for a race condition with poll removal and linked timeouts, and
then a few followup fixes/tweaks for the msg_control patch from last
week.
Not super important, particularly the sparse fixup, as it was broken
before that recent commit. But let's get it sorted for real for this
release, rather than just have it broken a bit differently"
* tag 'io_uring-6.4-2023-06-21' of git://git.kernel.dk/linux:
io_uring/net: use the correct msghdr union member in io_sendmsg_copy_hdr
io_uring/net: disable partial retries for recvmsg with cmsg
io_uring/net: clear msg_controllen on partial sendmsg retry
io_uring/poll: serialize poll linked timer start with poll removal
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"It's late but here are two bug fixes. Both fix problems which can be
severe but are very confined in scope. The risk to most use cases
should be minimal.
- Fix for an old bug which triggers if a cgroup subsystem is
remounted to a different hierarchy while someone is reading its
cgroup.procs/tasks file. The risk is pretty low given how seldom
cgroup subsystems are moved across hierarchies.
- We moved cpus_read_lock() outside of cgroup internal locks a while
ago but forgot to update the legacy_freezer leading to lockdep
triggers. Fixed"
* tag 'cgroup-for-6.4-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Do not corrupt task iteration when rebinding subsystem
cgroup,freezer: hold cpu_hotplug_lock before freezer_mutex in freezer_css_{online,offline}()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 6.4, take #4
- Correctly save/restore PMUSERNR_EL0 when host userspace is using
PMU counters directly
- Fix GICv2 emulation on GICv3 after the locking rework
- Don't use smp_processor_id() in kvm_pmu_probe_armpmu(), and
document why...
|
|
Just like for sc7180 devices using the Chrome bootflow (AKA trogdor
and IDP), sc7280 devices using the Chrome bootflow also need their
firmware marked dma-coherent. On sc7280 this wasn't causing WiFi to
fail to startup, since WiFi works differently there. However, on
sc7280 devices we were still getting the message at bootup after
commit 7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache
invalidation from arch_dma_prep_coherent()"""):
qcom_scm firmware:scm: Assign memory protection call failed -22
qcom_rmtfs_mem 9c900000.memory: assign memory failed
qcom_rmtfs_mem: probe of 9c900000.memory failed with error -22
We should mark SCM properly just like we did for trogdor.
Fixes: 7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
Fixes: 7a1f4e7f740d ("arm64: dts: qcom: sc7280: Add basic dts/dtsi files for sc7280 soc")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230616081440.v2.4.I21dc14a63327bf81c6bb58fe8ed91dbdc9849ee2@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Trogdor devices use firmware backed by TF-A instead of Qualcomm's
normal TZ. On TF-A we end up mapping memory as cacheable.
Specifically, you can see in Trogdor's TF-A code [1] in
qti_sip_mem_assign() that we call qti_mmap_add_dynamic_region() with
MT_RO_DATA. This translates down to MT_MEMORY instead of
MT_NON_CACHEABLE or MT_DEVICE. Apparently Qualcomm's normal TZ
implementation maps the memory as non-cacheable.
Let's add the "dma-coherent" attribute to the SCM for trogdor.
Adding "dma-coherent" like this fixes WiFi on sc7180-trogdor
devices. WiFi was broken as of commit 7bd6680b47fa ("Revert "Revert
"arm64: dma: Drop cache invalidation from
arch_dma_prep_coherent()"""). Specifically at bootup we'd get:
qcom_scm firmware:scm: Assign memory protection call failed -22
qcom_rmtfs_mem 94600000.memory: assign memory failed
qcom_rmtfs_mem: probe of 94600000.memory failed with error -22
From discussion on the mailing lists [2] and over IRC [3], it was
determined that we should always have been tagging the SCM as
dma-coherent on trogdor but that the old "invalidate" happened to make
things work most of the time. Tagging it properly like this is a much
more robust solution.
[1] https://chromium.googlesource.com/chromiumos/third_party/arm-trusted-firmware/+/refs/heads/firmware-trogdor-13577.B/plat/qti/common/src/qti_syscall.c
[2] https://lore.kernel.org/r/20230614165904.1.I279773c37e2c1ed8fbb622ca6d1397aea0023526@changeid
[3] https://oftc.irclog.whitequark.org/linux-msm/2023-06-15
Fixes: 7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
Fixes: 7ec3e67307f8 ("arm64: dts: qcom: sc7180-trogdor: add initial trogdor and lazor dt")
Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230616081440.v2.3.Ic62daa649b47b656b313551d646c4de9a7da4bd4@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
sc7180-idp is, for most intents and purposes, a trogdor device.
Specifically, sc7180-idp is designed to run the same style of firmware
as trogdor devices. This can be seen from the fact that IDP has the
same "Reserved memory changes" in its device tree that trogdor has.
Recently it was realized that we need to mark SCM as dma-coherent to
match what trogdor's style of firmware (based on TF-A) does [1]. That
means we need this dma-coherent tag on IDP as well.
Without this, on newer versions of Linux, specifically those with
commit 7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache
invalidation from arch_dma_prep_coherent()"""), WiFi will fail to
work. At bootup you'll see:
qcom_scm firmware:scm: Assign memory protection call failed -22
qcom_rmtfs_mem 94600000.memory: assign memory failed
qcom_rmtfs_mem: probe of 94600000.memory failed with error -22
[1] https://lore.kernel.org/r/20230615145253.1.Ic62daa649b47b656b313551d646c4de9a7da4bd4@changeid
Fixes: 7bd6680b47fa ("Revert "Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()""")
Fixes: f5ab220d162c ("arm64: dts: qcom: sc7180: Add remoteproc enablers")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230616081440.v2.2.I3c17d546d553378aa8a0c68c3fe04bccea7cba17@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
Trogdor devices use firmware backed by TF-A instead of Qualcomm's
normal TZ. On TF-A we end up mapping memory as cacheable. Specifically,
you can see in Trogdor's TF-A code [1] in qti_sip_mem_assign() that we
call qti_mmap_add_dynamic_region() with MT_RO_DATA. This translates
down to MT_MEMORY instead of MT_NON_CACHEABLE or MT_DEVICE.
Let's allow devices like trogdor to be described properly by allowing
"dma-coherent" in the SCM node.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20230616081440.v2.1.Ie79b5f0ed45739695c9970df121e11d724909157@changeid
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
We run into guest hang in edk2 firmware when KSM is kept as running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
to zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page sharing by KSM
kvm_mmu_notifier_invalidate_range_start
:
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range
kvm_set_spte_gfn (C)
:
kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is cd4c71835228 ("
KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit d5d8184d35c9
("KVM: ARM: Memory virtualization setup").
Cc: stable@vger.kernel.org # v3.9+
Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reported-by: Shuai Hu <hshuai@redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
Message-Id: <20230615054259.14911-1-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There was regression caused by a97699d1d610 ("btrfs: replace
map_lookup->stripe_len by BTRFS_STRIPE_LEN") and supposedly fixed by
a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr").
To avoid code churn the fix was open coding the type casts but
unfortunately missed one which was still possible to hit [1].
The missing place was assignment of bioc->full_stripe_logical inside
btrfs_map_block().
Fix it by adding a helper that does the safe calculation of the offset
and use it everywhere even though it may not be strictly necessary due
to already using u64 types. This replaces all remaining
"<< BTRFS_STRIPE_LEN_SHIFT" calls.
[1] https://lore.kernel.org/linux-btrfs/20230622065438.86402-1-wqu@suse.com/
Fixes: a7299a18a179 ("btrfs: fix u32 overflows when left shifting stripe_nr")
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
When __blkcg_rstat_flush() is called from cgroup_rstat_flush*() code
path, interrupt is always disabled.
When we start to flush blkcg per-cpu stats list in __blkg_release()
for avoiding to leak blkcg_gq's reference in commit 20cb1c2fb756
("blk-cgroup: Flush stats before releasing blkcg_gq"), local irq
isn't disabled yet, then lockdep warning may be triggered because
the dependent cgroup locks may be acquired from irq(soft irq) handler.
Fix the issue by disabling local irq always.
Fixes: 20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-block/pz2wzwnmn5tk3pwpskmjhli6g3qly7eoknilb26of376c7kwxy@qydzpvt6zpis/T/#u
Cc: stable@vger.kernel.org
Cc: Jay Shin <jaeshin@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Link: https://lore.kernel.org/r/20230622084249.1208005-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
As made explicit by commit 03a283cdc8c8 ("net/mlx5: Kconfig: Make tc
offload depend on tc skb extension") tc skb extension is required for
offloading tc as well as bridges on switchdev capable ConnectX devices.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
This is v3, including a crash fix for patch 01/14.
The following patchset contains Netfilter/IPVS fixes for net:
1) Fix UDP segmentation with IPVS tunneled traffic, from Terin Stock.
2) Fix chain binding transaction logic, add a bound flag to rule
transactions. Remove incorrect logic in nft_data_hold() and
nft_data_release().
3) Add a NFT_TRANS_PREPARE_ERROR deactivate state to deal with releasing
the set/chain as a follow up to 1240eb93f061 ("netfilter: nf_tables:
incorrect error path handling with NFT_MSG_NEWRULE")
4) Drop map element references from preparation phase instead of
set destroy path, otherwise bogus EBUSY with transactions such as:
flush chain ip x y
delete chain ip x w
where chain ip x y contains jump/goto from set elements.
5) Pipapo set type does not regard generation mask from the walk
iteration.
6) Fix reference count underflow in set element reference to
stateful object.
7) Several patches to tighten the nf_tables API:
- disallow set element updates of bound anonymous set
- disallow unbound anonymous set/chain at the end of transaction.
- disallow updates of anonymous set.
- disallow timeout configuration for anonymous sets.
8) Fix module reference leak in chain updates.
9) Fix nfnetlink_osf module autoload.
10) Fix deletion of basechain when NFTA_CHAIN_HOOK is specified as
in iptables-nft.
This Netfilter batch is larger than usual at this stage, I am aware we
are fairly late in the -rc cycle, if you prefer to route them through
net-next, please let me know.
netfilter pull request 23-06-21
* tag 'nf-23-06-21' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: Fix for deleting base chains with payload
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: drop module reference after updating chain
netfilter: nf_tables: disallow timeout for anonymous sets
netfilter: nf_tables: disallow updates of anonymous sets
netfilter: nf_tables: reject unbound chain set before commit phase
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: disallow element updates of bound anonymous sets
netfilter: nf_tables: fix underflow in object reference counter
netfilter: nft_set_pipapo: .walk does not deal with generations
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: fix chain binding transaction logic
ipvs: align inner_mac_header for encapsulation
====================
Link: https://lore.kernel.org/r/20230621100731.68068-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This reverts commit 1f86123b9749 ("net: align SO_RCVMARK required
privileges with SO_MARK") because the reasoning in the commit message
is not really correct:
SO_RCVMARK is used for 'reading' incoming skb mark (via cmsg), as such
it is more equivalent to 'getsockopt(SO_MARK)' which has no priv check
and retrieves the socket mark, rather than 'setsockopt(SO_MARK) which
sets the socket mark and does require privs.
Additionally incoming skb->mark may already be visible if
sysctl_fwmark_reflect and/or sysctl_tcp_fwmark_accept are enabled.
Furthermore, it is easier to block the getsockopt via bpf
(either cgroup setsockopt hook, or via syscall filters)
then to unblock it if it requires CAP_NET_RAW/ADMIN.
On Android the socket mark is (among other things) used to store
the network identifier a socket is bound to. Setting it is privileged,
but retrieving it is not. We'd like unprivileged userspace to be able
to read the network id of incoming packets (where mark is set via
iptables [to be moved to bpf])...
An alternative would be to add another sysctl to control whether
setting SO_RCVMARK is privilged or not.
(or even a MASK of which bits in the mark can be exposed)
But this seems like over-engineering...
Note: This is a non-trivial revert, due to later merged commit e42c7beee71d
("bpf: net: Consider has_current_bpf_ctx() when testing capable() in sk_setsockopt()")
which changed both 'ns_capable' into 'sockopt_ns_capable' calls.
Fixes: 1f86123b9749 ("net: align SO_RCVMARK required privileges with SO_MARK")
Cc: Larysa Zaremba <larysa.zaremba@intel.com>
Cc: Simon Horman <simon.horman@corigine.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Eyal Birger <eyal.birger@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Patrick Rohr <prohr@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230618103130.51628-1-maze@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
struct mux_adth actually ends with multiple struct mux_adth_dg members.
This is seen both in the comments about the member:
/**
* struct mux_adth - Structure of the Aggregated Datagram Table Header.
...
* @dg: datagramm table with variable length
*/
and in the preparation for populating it:
adth_dg_size = offsetof(struct mux_adth, dg) +
ul_adb->dg_count[i] * sizeof(*dg);
...
adth_dg_size -= offsetof(struct mux_adth, dg);
memcpy(&adth->dg, ul_adb->dg[i], adth_dg_size);
This was reported as a run-time false positive warning:
memcpy: detected field-spanning write (size 16) of single field "&adth->dg" at drivers/net/wwan/iosm/iosm_ipc_mux_codec.c:852 (size 8)
Adjust the struct mux_adth definition and associated sizeof() math; no binary
output differences are observed in the resulting object file.
Reported-by: Florian Klink <flokli@flokli.de>
Closes: https://lore.kernel.org/lkml/dbfa25f5-64c8-5574-4f5d-0151ba95d232@gmail.com/
Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
Cc: M Chetan Kumar <m.chetan.kumar@intel.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Intel Corporation <linuxwwan@intel.com>
Cc: Loic Poulain <loic.poulain@linaro.org>
Cc: Sergey Ryazanov <ryazanov.s.a@gmail.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620194234.never.023-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
syzbot managed to trigger a divide error [1] in netem.
It could happen if q->rate changes while netem_enqueue()
is running, since q->rate is read twice.
It turns out netem_change() always lacked proper synchronization.
[1]
divide error: 0000 [#1] SMP KASAN
CPU: 1 PID: 7867 Comm: syz-executor.1 Not tainted 6.1.30-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/25/2023
RIP: 0010:div64_u64 include/linux/math64.h:69 [inline]
RIP: 0010:packet_time_ns net/sched/sch_netem.c:357 [inline]
RIP: 0010:netem_enqueue+0x2067/0x36d0 net/sched/sch_netem.c:576
Code: 89 e2 48 69 da 00 ca 9a 3b 42 80 3c 28 00 4c 8b a4 24 88 00 00 00 74 0d 4c 89 e7 e8 c3 4f 3b fd 48 8b 4c 24 18 48 89 d8 31 d2 <49> f7 34 24 49 01 c7 4c 8b 64 24 48 4d 01 f7 4c 89 e3 48 c1 eb 03
RSP: 0018:ffffc9000dccea60 EFLAGS: 00010246
RAX: 000001a442624200 RBX: 000001a442624200 RCX: ffff888108a4f000
RDX: 0000000000000000 RSI: 000000000000070d RDI: 000000000000070d
RBP: ffffc9000dcceb90 R08: ffffffff849c5e26 R09: fffffbfff10e1297
R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888108a4f358
R13: dffffc0000000000 R14: 0000001a8cd9a7ec R15: 0000000000000000
FS: 00007fa73fe18700(0000) GS:ffff8881f6b00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa73fdf7718 CR3: 000000011d36e000 CR4: 0000000000350ee0
Call Trace:
<TASK>
[<ffffffff84714385>] __dev_xmit_skb net/core/dev.c:3931 [inline]
[<ffffffff84714385>] __dev_queue_xmit+0xcf5/0x3370 net/core/dev.c:4290
[<ffffffff84d22df2>] dev_queue_xmit include/linux/netdevice.h:3030 [inline]
[<ffffffff84d22df2>] neigh_hh_output include/net/neighbour.h:531 [inline]
[<ffffffff84d22df2>] neigh_output include/net/neighbour.h:545 [inline]
[<ffffffff84d22df2>] ip_finish_output2+0xb92/0x10d0 net/ipv4/ip_output.c:235
[<ffffffff84d21e63>] __ip_finish_output+0xc3/0x2b0
[<ffffffff84d10a81>] ip_finish_output+0x31/0x2a0 net/ipv4/ip_output.c:323
[<ffffffff84d10f14>] NF_HOOK_COND include/linux/netfilter.h:298 [inline]
[<ffffffff84d10f14>] ip_output+0x224/0x2a0 net/ipv4/ip_output.c:437
[<ffffffff84d123b5>] dst_output include/net/dst.h:444 [inline]
[<ffffffff84d123b5>] ip_local_out net/ipv4/ip_output.c:127 [inline]
[<ffffffff84d123b5>] __ip_queue_xmit+0x1425/0x2000 net/ipv4/ip_output.c:542
[<ffffffff84d12fdc>] ip_queue_xmit+0x4c/0x70 net/ipv4/ip_output.c:556
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230620184425.1179809-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Power source notify handler is getting registered even when none of the
PMF feature in enabled leading to a crash.
...
[ 22.592162] Call Trace:
[ 22.592164] <TASK>
[ 22.592164] ? rcu_note_context_switch+0x5e0/0x660
[ 22.592166] ? __warn+0x81/0x130
[ 22.592171] ? rcu_note_context_switch+0x5e0/0x660
[ 22.592172] ? report_bug+0x171/0x1a0
[ 22.592175] ? prb_read_valid+0x1b/0x30
[ 22.592177] ? handle_bug+0x3c/0x80
[ 22.592178] ? exc_invalid_op+0x17/0x70
[ 22.592179] ? asm_exc_invalid_op+0x1a/0x20
[ 22.592182] ? rcu_note_context_switch+0x5e0/0x660
[ 22.592183] ? acpi_ut_delete_object_desc+0x86/0xb0
[ 22.592186] ? acpi_ut_update_ref_count.part.0+0x22d/0x930
[ 22.592187] __schedule+0xc0/0x1410
[ 22.592189] ? ktime_get+0x3c/0xa0
[ 22.592191] ? lapic_next_event+0x1d/0x30
[ 22.592193] ? hrtimer_start_range_ns+0x25b/0x350
[ 22.592196] schedule+0x5e/0xd0
[ 22.592197] schedule_hrtimeout_range_clock+0xbe/0x140
[ 22.592199] ? __pfx_hrtimer_wakeup+0x10/0x10
[ 22.592200] usleep_range_state+0x64/0x90
[ 22.592203] amd_pmf_send_cmd+0x106/0x2a0 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616]
[ 22.592207] amd_pmf_update_slider+0x56/0x1b0 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616]
[ 22.592210] amd_pmf_set_sps_power_limits+0x72/0x80 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616]
[ 22.592213] amd_pmf_pwr_src_notify_call+0x49/0x90 [amd_pmf bddfe0fe3712aaa99acce3d5487405c5213c6616]
[ 22.592216] notifier_call_chain+0x5a/0xd0
[ 22.592218] atomic_notifier_call_chain+0x32/0x50
...
Fix this by moving the registration of source change notify handler only
when SPS(Static Slider) is advertised as supported.
Reported-by: Allen Zhong <allen@atr.me>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217571
Fixes: 4c71ae414474 ("platform/x86/amd/pmf: Add support SPS PMF feature")
Tested-by: Patil Rajesh Reddy <Patil.Reddy@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Link: https://lore.kernel.org/r/20230622060309.310001-1-Shyam-sundar.S-k@amd.com
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
|
|
When mirroring to a gretap in hardware the device expects to be
programmed with the egress port and all the encapsulating headers. This
requires the driver to resolve the path the packet will take in the
software data path and program the device accordingly.
If the path cannot be resolved (in this case because of an unresolved
neighbor), then mirror installation fails until the path is resolved.
This results in a race that causes the test to sometimes fail.
Fix this by setting the neighbor's state to permanent in a couple of
tests, so that it is always valid.
Fixes: 35c31d5c323f ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d")
Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q")
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/268816ac729cb6028c7a34d4dda6f4ec7af55333.1687264607.git.petrm@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|