Age | Commit message (Collapse) | Author |
|
Currently we use struct per_cpu_nodestat to cache the vmstat counters,
which leads to inaccurate statistics especially THP vmstat counters. In
the systems with hundreds of processors it can be GBs of memory. For
example, for a 96 CPUs system, the threshold is the maximum number of 125.
And the per cpu counters can cache 23.4375 GB in total.
The THP page is already a form of batched addition (it will add 512 worth
of memory in one go) so skipping the batching seems like sensible.
Although every THP stats update overflows the per-cpu counter, resorting
to atomic global updates. But it can make the statistics more accuracy
for the THP vmstat counters.
So we convert the NR_ANON_THPS account to pages. This patch is consistent
with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival").
Doing this also can make the unit of vmstat counters more unified.
Finally, the unit of the vmstat counters are pages, kB and bytes. The
B/KB suffix can tell us that the unit is bytes or kB. The rest which is
without suffix are pages.
Link: https://lkml.kernel.org/r/20201228164110.2838-3-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Rafael. J. Wysocki <rafael@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
clear_buffer_new() is used to clear buffer new stat. When PAGE_SIZE is
64K, most buffer heads in the list are not needed to clear.
clear_buffer_new() has an enpensive atomic modification operation, Let's
add checking buffer head before clear it as __block_write_begin_int does
which is good for performance.
Link: https://lkml.kernel.org/r/1612332890-57918-1-git-send-email-zhangshaokun@hisilicon.com
Signed-off-by: Yang Guo <guoyang2@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Rename generic_file_buffered_read to match the naming of filemap_fault,
also update the written parameter to a more descriptive name and improve
the kerneldoc comment.
Link: https://lkml.kernel.org/r/20210122160140.223228-18-willy@infradead.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
replace_page_cache_page()
Since commit 74d609585d8b ("page cache: Add and replace pages using the
XArray") was merged, the replace_page_cache_page() can not fail and always
return 0, we can remove the redundant return value and void it. Moreover
remove the unused gfp_mask.
Link: https://lkml.kernel.org/r/609c30e5274ba15d8b90c872fd0d8ac437a9b2bb.1610071401.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
[akpm@linux-foundation.org: update inode_operations.tmpfile]
Link: http://lkml.kernel.org/r/20190206073349.GA15311@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Delete duplicate words in fs/*.c.
The doubled words that are being dropped are:
that, be, the, in, and, for
Link: https://lkml.kernel.org/r/20201224052810.25315-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the following coccicheck warnings:
fs/ocfs2/refcounttree.c:981:16-18: WARNING !A || A && B is equivalent to !A || B.
Link: https://lkml.kernel.org/r/1612235424-80367-1-git-send-email-jiapeng.chong@linux.alibaba.com
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The error handling in this function frees "reg" but it is still on the
"o2hb_all_regions" list so it will lead to a use after freew. Joseph Qi
points out that we need to clear the bit in the "o2hb_region_bitmap" as
well
Link: https://lkml.kernel.org/r/YBk4M6HUG8jB/jc7@mwanda
Fixes: 1cf257f51191 ("ocfs2: fix memory leak")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There are some definitions which is not used anymore in OCFS2 module, so
as to be removed.
Link: https://lkml.kernel.org/r/2021011916182284700534@chinatelecom.cn
Signed-off-by: Guozhonghua <guozh88@chinatelecom.cn>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
iput handles NULL pointers gracefully, so there's no need to check the
pointer before the call.
Link: https://lkml.kernel.org/r/20201231040535.4091761-1-yili@winhong.com
Signed-off-by: Yi Li <yili@winhong.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Mounting a corrupted filesystem with NTFS resulted in a kernel crash.
We should check for valid STANDARD_INFORMATION attribute offset and length
before trying to access it
Link: https://lkml.kernel.org/r/20210217155930.1506815-1-rkovhaev@gmail.com
Link: https://syzkaller.appspot.com/bug?extid=c584225dabdea2f71969
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Reported-by: syzbot+c584225dabdea2f71969@syzkaller.appspotmail.com
Tested-by: syzbot+c584225dabdea2f71969@syzkaller.appspotmail.com
Acked-by: Anton Altaparmakov <anton@tuxera.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Drop the repeated words "the" and "in" in comments.
Link: https://lkml.kernel.org/r/20210125194937.24627-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Log space and revoke accounting rework to fix some failed asserts.
- Local resource group glock sharing for better local performance.
- Add support for version 1802 filesystems: trusted xattr support and
'-o rgrplvb' mounts by default.
- Actually synchronize on the inode glock's FREEING bit during withdraw
("gfs2: fix glock confusion in function signal_our_withdraw").
- Fix parallel recovery of multiple journals ("gfs2: keep bios separate
for each journal").
- Various other bug fixes.
* tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (49 commits)
gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush
gfs2: Per-revoke accounting in transactions
gfs2: Rework the log space allocation logic
gfs2: Minor calc_reserved cleanup
gfs2: Use resource group glock sharing
gfs2: Allow node-wide exclusive glock sharing
gfs2: Add local resource group locking
gfs2: Add per-reservation reserved block accounting
gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested}
gfs2: Check for active reservation in gfs2_release
gfs2: Don't search for unreserved space twice
gfs2: Only pass reservation down to gfs2_rbm_find
gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt
gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end
gfs2: Add trusted xattr support
gfs2: Enable rgrplvb for sb_fs_format 1802
gfs2: Don't skip dlm unlock if glock has an lvb
gfs2: Lock imbalance on error path in gfs2_recover_one
gfs2: Move function gfs2_ail_empty_tr
gfs2: Get rid of current_tail()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull idmapped mounts from Christian Brauner:
"This introduces idmapped mounts which has been in the making for some
time. Simply put, different mounts can expose the same file or
directory with different ownership. This initial implementation comes
with ports for fat, ext4 and with Christoph's port for xfs with more
filesystems being actively worked on by independent people and
maintainers.
Idmapping mounts handle a wide range of long standing use-cases. Here
are just a few:
- Idmapped mounts make it possible to easily share files between
multiple users or multiple machines especially in complex
scenarios. For example, idmapped mounts will be used in the
implementation of portable home directories in
systemd-homed.service(8) where they allow users to move their home
directory to an external storage device and use it on multiple
computers where they are assigned different uids and gids. This
effectively makes it possible to assign random uids and gids at
login time.
- It is possible to share files from the host with unprivileged
containers without having to change ownership permanently through
chown(2).
- It is possible to idmap a container's rootfs and without having to
mangle every file. For example, Chromebooks use it to share the
user's Download folder with their unprivileged containers in their
Linux subsystem.
- It is possible to share files between containers with
non-overlapping idmappings.
- Filesystem that lack a proper concept of ownership such as fat can
use idmapped mounts to implement discretionary access (DAC)
permission checking.
- They allow users to efficiently changing ownership on a per-mount
basis without having to (recursively) chown(2) all files. In
contrast to chown (2) changing ownership of large sets of files is
instantenous with idmapped mounts. This is especially useful when
ownership of a whole root filesystem of a virtual machine or
container is changed. With idmapped mounts a single syscall
mount_setattr syscall will be sufficient to change the ownership of
all files.
- Idmapped mounts always take the current ownership into account as
idmappings specify what a given uid or gid is supposed to be mapped
to. This contrasts with the chown(2) syscall which cannot by itself
take the current ownership of the files it changes into account. It
simply changes the ownership to the specified uid and gid. This is
especially problematic when recursively chown(2)ing a large set of
files which is commong with the aforementioned portable home
directory and container and vm scenario.
- Idmapped mounts allow to change ownership locally, restricting it
to specific mounts, and temporarily as the ownership changes only
apply as long as the mount exists.
Several userspace projects have either already put up patches and
pull-requests for this feature or will do so should you decide to pull
this:
- systemd: In a wide variety of scenarios but especially right away
in their implementation of portable home directories.
https://systemd.io/HOME_DIRECTORY/
- container runtimes: containerd, runC, LXD:To share data between
host and unprivileged containers, unprivileged and privileged
containers, etc. The pull request for idmapped mounts support in
containerd, the default Kubernetes runtime is already up for quite
a while now: https://github.com/containerd/containerd/pull/4734
- The virtio-fs developers and several users have expressed interest
in using this feature with virtual machines once virtio-fs is
ported.
- ChromeOS: Sharing host-directories with unprivileged containers.
I've tightly synced with all those projects and all of those listed
here have also expressed their need/desire for this feature on the
mailing list. For more info on how people use this there's a bunch of
talks about this too. Here's just two recent ones:
https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf
https://fosdem.org/2021/schedule/event/containers_idmap/
This comes with an extensive xfstests suite covering both ext4 and
xfs:
https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts
It covers truncation, creation, opening, xattrs, vfscaps, setid
execution, setgid inheritance and more both with idmapped and
non-idmapped mounts. It already helped to discover an unrelated xfs
setgid inheritance bug which has since been fixed in mainline. It will
be sent for inclusion with the xfstests project should you decide to
merge this.
In order to support per-mount idmappings vfsmounts are marked with
user namespaces. The idmapping of the user namespace will be used to
map the ids of vfs objects when they are accessed through that mount.
By default all vfsmounts are marked with the initial user namespace.
The initial user namespace is used to indicate that a mount is not
idmapped. All operations behave as before and this is verified in the
testsuite.
Based on prior discussions we want to attach the whole user namespace
and not just a dedicated idmapping struct. This allows us to reuse all
the helpers that already exist for dealing with idmappings instead of
introducing a whole new range of helpers. In addition, if we decide in
the future that we are confident enough to enable unprivileged users
to setup idmapped mounts the permission checking can take into account
whether the caller is privileged in the user namespace the mount is
currently marked with.
The user namespace the mount will be marked with can be specified by
passing a file descriptor refering to the user namespace as an
argument to the new mount_setattr() syscall together with the new
MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern
of extensibility.
The following conditions must be met in order to create an idmapped
mount:
- The caller must currently have the CAP_SYS_ADMIN capability in the
user namespace the underlying filesystem has been mounted in.
- The underlying filesystem must support idmapped mounts.
- The mount must not already be idmapped. This also implies that the
idmapping of a mount cannot be altered once it has been idmapped.
- The mount must be a detached/anonymous mount, i.e. it must have
been created by calling open_tree() with the OPEN_TREE_CLONE flag
and it must not already have been visible in the filesystem.
The last two points guarantee easier semantics for userspace and the
kernel and make the implementation significantly simpler.
By default vfsmounts are marked with the initial user namespace and no
behavioral or performance changes are observed.
The manpage with a detailed description can be found here:
https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8
In order to support idmapped mounts, filesystems need to be changed
and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The
patches to convert individual filesystem are not very large or
complicated overall as can be seen from the included fat, ext4, and
xfs ports. Patches for other filesystems are actively worked on and
will be sent out separately. The xfstestsuite can be used to verify
that port has been done correctly.
The mount_setattr() syscall is motivated independent of the idmapped
mounts patches and it's been around since July 2019. One of the most
valuable features of the new mount api is the ability to perform
mounts based on file descriptors only.
Together with the lookup restrictions available in the openat2()
RESOLVE_* flag namespace which we added in v5.6 this is the first time
we are close to hardened and race-free (e.g. symlinks) mounting and
path resolution.
While userspace has started porting to the new mount api to mount
proper filesystems and create new bind-mounts it is currently not
possible to change mount options of an already existing bind mount in
the new mount api since the mount_setattr() syscall is missing.
With the addition of the mount_setattr() syscall we remove this last
restriction and userspace can now fully port to the new mount api,
covering every use-case the old mount api could. We also add the
crucial ability to recursively change mount options for a whole mount
tree, both removing and adding mount options at the same time. This
syscall has been requested multiple times by various people and
projects.
There is a simple tool available at
https://github.com/brauner/mount-idmapped
that allows to create idmapped mounts so people can play with this
patch series. I'll add support for the regular mount binary should you
decide to pull this in the following weeks:
Here's an example to a simple idmapped mount of another user's home
directory:
u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt
u1001@f2-vm:/$ ls -al /home/ubuntu/
total 28
drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 .
drwxr-xr-x 4 root root 4096 Oct 28 04:00 ..
-rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile
-rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ ls -al /mnt/
total 28
drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 .
drwxr-xr-x 29 root root 4096 Oct 28 22:01 ..
-rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history
-rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout
-rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc
-rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile
-rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful
-rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo
u1001@f2-vm:/$ touch /mnt/my-file
u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file
u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file
u1001@f2-vm:/$ ls -al /mnt/my-file
-rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file
u1001@f2-vm:/$ ls -al /home/ubuntu/my-file
-rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file
u1001@f2-vm:/$ getfacl /mnt/my-file
getfacl: Removing leading '/' from absolute path names
# file: mnt/my-file
# owner: u1001
# group: u1001
user::rw-
user:u1001:rwx
group::rw-
mask::rwx
other::r--
u1001@f2-vm:/$ getfacl /home/ubuntu/my-file
getfacl: Removing leading '/' from absolute path names
# file: home/ubuntu/my-file
# owner: ubuntu
# group: ubuntu
user::rw-
user:ubuntu:rwx
group::rw-
mask::rwx
other::r--"
* tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits)
xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl
xfs: support idmapped mounts
ext4: support idmapped mounts
fat: handle idmapped mounts
tests: add mount_setattr() selftests
fs: introduce MOUNT_ATTR_IDMAP
fs: add mount_setattr()
fs: add attr_flags_to_mnt_flags helper
fs: split out functions to hold writers
namespace: only take read lock in do_reconfigure_mnt()
mount: make {lock,unlock}_mount_hash() static
namespace: take lock_mount_hash() directly when changing flags
nfs: do not export idmapped mounts
overlayfs: do not mount on top of idmapped mounts
ecryptfs: do not mount on top of idmapped mounts
ima: handle idmapped mounts
apparmor: handle idmapped mounts
fs: make helpers idmap mount aware
exec: handle idmapped mounts
would_dump: handle idmapped mounts
...
|
|
In gfs2_ail1_flush, we're using I/O plugging to give the block layer a
better chance of merging I/O requests. If we're too aggressive here, we
can end up waiting on I/O to complete while still plugged. Fix that in
a way similar to writeback_sb_inodes, except that we can't use
blk_flush_plug because blk_flush_plug_list is not exported.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git
Merge the resource group glock sharing feature and the revoke accounting rework.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
git://anongit.freedesktop.org/drm/drm
Pull follow_pfn() updates from Daniel Vetter:
"Fixes around VM_FPNMAP and follow_pfn:
- replace mm/frame_vector.c by get_user_pages in misc/habana and
drm/exynos drivers, then move that into media as it's sole user
- close race in generic_access_phys
- s390 pci ioctl fix of this series landed in 5.11 already
- properly revoke iomem mappings (/dev/mem, pci files)"
* tag 'topic/iomem-mmap-vs-gup-2021-02-22' of git://anongit.freedesktop.org/drm/drm:
PCI: Revoke mappings like devmem
PCI: Also set up legacy files only after sysfs init
sysfs: Support zapping of binary attr mmaps
resource: Move devmem revoke code to resource framework
/dev/mem: Only set filp->f_mapping
PCI: Obey iomem restrictions for procfs mmap
mm: Close race in generic_access_phys
media: videobuf2: Move frame_vector into media subsystem
mm/frame-vector: Use FOLL_LONGTERM
misc/habana: Use FOLL_LONGTERM for userptr
misc/habana: Stop using frame_vector helpers
drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
drm/exynos: Stop using frame_vector helpers
|
|
git://anongit.freedesktop.org/drm/drm
Pull kcmp kconfig update from Daniel Vetter:
"Make the kcmp syscall available independently of checkpoint/restore.
drm userspaces uses this, systemd uses this, so makes sense to pull it
out from the checkpoint-restore bundle.
Kees reviewed this from security pov and is happy with the final
version"
Link: https://lwn.net/Articles/845448/
* tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm:
kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull more nfsd updates from Chuck Lever:
"Here are a few additional NFSD commits for the merge window:
Optimization:
- Cork the socket while there are queued replies
Fixes:
- DRC shutdown ordering
- svc_rdma_accept() lockdep splat"
* tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Further clean up svc_tcp_sendmsg()
SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg()
SUNRPC: Use TCP_CORK to optimise send performance on the server
svcrdma: Hold private mutex while invoking rdma_accept()
nfsd: register pernet ops last, unregister first
|
|
Pull ceph updates from Ilya Dryomov:
"With netfs helper library and fscache rework delayed, just a few cap
handling improvements to avoid grabbing mmap_lock in some code paths
and deal with capsnaps better and a mount option cleanup"
* tag 'ceph-for-5.12-rc1' of git://github.com/ceph/ceph-client:
ceph: defer flushing the capsnap if the Fb is used
libceph: remove osdtimeout option entirely
libceph: deprecate [no]cephx_require_signatures options
ceph: allow queueing cap/snap handling after putting cap references
ceph: clean up inode work queueing
ceph: fix flush_snap logic after putting caps
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull isofs, udf, and quota updates from Jan Kara:
"Several udf, isofs, and quota fixes"
* tag 'fs_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
parser: Fix kernel-doc markups
udf: handle large user and group ID
isofs: handle large user and group ID
parser: add unsigned int parser
udf: fix silent AED tagLocation corruption
isofs: release buffer head before return
quota: Fix memory leak when handling corrupted quota file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify update from Jan Kara:
"Make inotify groups be charged against appropriate memcgs"
* tag 'fsnotify_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
inotify, memcg: account inotify instances to kmemcg
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull lazytime updates from Jan Kara:
"Cleanups of the lazytime handling in the writeback code making rules
for calling ->dirty_inode() filesystem handlers saner"
* tag 'lazytime_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext4: simplify i_state checks in __ext4_update_other_inode_time()
gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync()
fs: improve comments for writeback_single_inode()
fs: drop redundant check from __writeback_single_inode()
fs: clean up __mark_inode_dirty() a bit
fs: pass only I_DIRTY_INODE flags to ->dirty_inode
fs: don't call ->dirty_inode for lazytime timestamp updates
fat: only specify I_DIRTY_TIME when needed in fat_update_time()
fs: only specify I_DIRTY_TIME when needed in generic_update_time()
fs: correctly document the inode dirty flags
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat updates from Namjae Jeon:
- improve file deletion performance with dirsync mount option
- fix shift-out-of-bounds in exfat_fill_super() reported by syzkaller
* tag 'exfat-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: improve performance of exfat_free_cluster when using dirsync mount option
exfat: fix shift-out-of-bounds in exfat_fill_super()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs
Pull zonefs updates from Damien Le Moal:
"Two changes:
- A fix that did not make it in time for 5.11, to correct the file
size initialization of full sequential zone, from Shin'ichiro
- Add file operation tracepoints to help with debugging, from
Johannes"
* tag 'zonefs-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: Fix file size of zones in full condition
zonefs: add tracepoints for file operations
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull RCU-safe common_lsm_audit() from Al Viro:
"Make common_lsm_audit() non-blocking and usable from RCU pathwalk
context.
We don't really need to grab/drop dentry in there - rcu_read_lock() is
enough. There's a couple of followups using that to simplify the
logics in selinux, but those hadn't soaked in -next yet, so they'll
have to go in next window"
* 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
make dump_common_audit_data() safe to be called from RCU pathwalk
new helper: d_find_alias_rcu()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull d_name whack-a-mole from Al Viro:
"A bunch of places that play with ->d_name in printks instead of using
proper formats..."
* 'work.d_name' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
orangefs_file_mmap(): use %pD
cifs_debug: use %pd instead of messing with ->d_name
erofs: use %pd instead of messing with ->d_name
cramfs: use %pD instead of messing with file_dentry()->d_name
|
|
In the log, revokes are stored as a revoke descriptor (struct
gfs2_log_descriptor), followed by zero or more additional revoke blocks
(struct gfs2_meta_header). On filesystems with a blocksize of 4k, the
revoke descriptor contains up to 503 revokes, and the metadata blocks
contain up to 509 revokes each. We've so far been reserving space for
revokes in transactions in block granularity, so a lot more space than
necessary was being allocated and then released again.
This patch switches to assigning revokes to transactions individually
instead. Initially, space for the revoke descriptor is reserved and
handed out to transactions. When more revokes than that are reserved,
additional revoke blocks are added. When the log is flushed, the space
for the additional revoke blocks is released, but we keep the space for
the revoke descriptor block allocated.
Transactions may still reserve more revokes than they will actually need
in the end, but now we won't overshoot the target as much, and by only
returning the space for excess revokes at log flush time, we further
reduce the amount of contention between processes.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The current log space allocation logic is hard to understand or extend.
The principle it that when the log is flushed, we may or may not have a
transaction active that has space allocated in the log. To deal with
that, we set aside a magical number of blocks to be used in case we
don't have an active transaction. It isn't clear that the pool will
always be big enough. In addition, we can't return unused log space at
the end of a transaction, so the number of blocks allocated must exactly
match the number of blocks used.
Simplify this as follows:
* When transactions are allocated or merged, always reserve enough
blocks to flush the transaction (err on the safe side).
* In gfs2_log_flush, return any allocated blocks that haven't been used.
* Maintain a pool of spare blocks big enough to do one log flush, as
before.
* In gfs2_log_flush, when we have no active transaction, allocate a
suitable number of blocks. For that, use the spare pool when
called from logd, and leave the pool alone otherwise. This means
that when the log is almost full, logd will still be able to do one
more log flush, which will result in more log space becoming
available.
This will make the log space allocator code easier to work with in
the future.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
No functional change.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Pull documentation updates from Jonathan Corbet:
"It has been a relatively quiet cycle in docsland.
- As promised, the minimum Sphinx version to build the docs is now
1.7, and we have dropped support for Python 2 entirely. That
allowed the removal of a bunch of compatibility code.
- A set of treewide warning fixups from Mauro that I applied after it
became clear nobody else was going to deal with them.
- The automarkup mechanism can now create cross-references from
relative paths to RST files.
- More translations, typo fixes, and warning fixes"
* tag 'docs-5.12' of git://git.lwn.net/linux: (75 commits)
docs: kernel-hacking: be more civil
docs: Remove the Microsoft rhetoric
Documentation/admin-guide: kernel-parameters: Update nohlt section
doc/admin-guide: fix spelling mistake: "perfomance" -> "performance"
docs: Document cross-referencing using relative path
docs: Enable usage of relative paths to docs on automarkup
docs: thermal: fix spelling mistakes
Documentation: admin-guide: Update kvm/xen config option
docs: Make syscalls' helpers naming consistent
coding-style.rst: Avoid comma statements
Documentation: /proc/loadavg: add 3 more field descriptions
Documentation/submitting-patches: Add blurb about backtraces in commit messages
Docs: drop Python 2 support
Move our minimum Sphinx version to 1.7
Documentation: input: define ABS_PRESSURE/ABS_MT_PRESSURE resolution as grams
scripts/kernel-doc: add internal hyperlink to DOC: sections
Update Documentation/admin-guide/sysctl/fs.rst
docs: Update DTB format references
docs: zh_CN: add iio index.rst translation
docs/zh_CN: add iio ep93xx_adc.rst translation
...
|
|
There are stressful update of cluster allocation bitmap when using
dirsync mount option which is doing sync buffer on every cluster bit
clearing. This could result in performance degradation when deleting
big size file.
Fix to update only when the bitmap buffer index is changed would make
less disk access, improving performance especially for truncate operation.
Testing with Samsung 256GB sdcard, mounted with dirsync option
(mount -t exfat /dev/block/mmcblk0p1 /temp/mount -o dirsync)
Remove 4GB file, blktrace result.
[Before] : 39 secs.
Total (blktrace):
Reads Queued: 0, 0KiB Writes Queued: 32775, 16387KiB
Read Dispatches: 0, 0KiB Write Dispatches: 32775, 16387KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 32775, 16387KiB
Read Merges: 0, 0KiB Write Merges: 0, 0KiB
IO unplugs: 2 Timer unplugs: 0
[After] : 1 sec.
Total (blktrace):
Reads Queued: 0, 0KiB Writes Queued: 13, 6KiB
Read Dispatches: 0, 0KiB Write Dispatches: 13, 6KiB
Reads Requeued: 0 Writes Requeued: 0
Reads Completed: 0, 0KiB Writes Completed: 13, 6KiB
Read Merges: 0, 0KiB Write Merges: 0, 0KiB
IO unplugs: 1 Timer unplugs: 0
Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com>
Acked-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
syzbot reported a warning which could cause shift-out-of-bounds issue.
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x183/0x22e lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:148 [inline]
__ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
exfat_read_boot_sector fs/exfat/super.c:471 [inline]
__exfat_fill_super fs/exfat/super.c:556 [inline]
exfat_fill_super+0x2acb/0x2d00 fs/exfat/super.c:624
get_tree_bdev+0x406/0x630 fs/super.c:1291
vfs_get_tree+0x86/0x270 fs/super.c:1496
do_new_mount fs/namespace.c:2881 [inline]
path_mount+0x1937/0x2c50 fs/namespace.c:3211
do_mount fs/namespace.c:3224 [inline]
__do_sys_mount fs/namespace.c:3432 [inline]
__se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3409
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
exfat specification describe sect_per_clus_bits field of boot sector
could be at most 25 - sect_size_bits and at least 0. And sect_size_bits
can also affect this calculation, It also needs validation.
This patch add validation for sect_per_clus_bits and sect_size_bits
field of boot sector.
Fixes: 719c1e182916 ("exfat: add super block operations")
Cc: stable@vger.kernel.org # v5.9+
Reported-by: syzbot+da4fe66aaadd3c2e2d1c@syzkaller.appspotmail.com
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
"We've got a good handful of patches for SELinux this time around; with
everything passing the selinux-testsuite and applying cleanly to your
tree as of a few minutes ago. The highlights are:
- Add support for labeling anonymous inodes, and extend this new
support to userfaultfd.
- Fallback to SELinux genfs file labeling if the filesystem does not
have xattr support. This is useful for virtiofs which can vary in
its xattr support depending on the backing filesystem.
- Classify and handle MPTCP the same as TCP in SELinux.
- Ensure consistent behavior between inode_getxattr and
inode_listsecurity when the SELinux policy is not loaded. This
fixes a known problem with overlayfs.
- A couple of patches to prune some unused variables from the SELinux
code, mark private variables as static, and mark other variables as
__ro_after_init or __read_mostly"
* tag 'selinux-pr-20210215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
fs: anon_inodes: rephrase to appropriate kernel-doc
userfaultfd: use secure anon inodes for userfaultfd
selinux: teach SELinux about anonymous inodes
fs: add LSM-supporting anon-inode interface
security: add inode_init_security_anon() LSM hook
selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support
selinux: mark selinux_xfrm_refcount as __read_mostly
selinux: mark some global variables __ro_after_init
selinux: make selinuxfs_mount static
selinux: drop the unnecessary aurule_callback variable
selinux: remove unused global variables
selinux: fix inconsistency between inode_getxattr and inode_listsecurity
selinux: handle MPTCP consistently with TCP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs
Pull JFFS2/UBIFS and UBI updates from Richard Weinberger:
"JFFS2:
- Fix for use-after-free in jffs2_sum_write_data()
- Fix for out-of-bounds access in jffs2_zlib_compress()
UBI:
- Remove dead/useless code
UBIFS:
- Fix for a memory leak in ubifs_init_authentication()
- Fix for high stack usage
- Fix for a off-by-one error in xattrs code"
* tag 'for-linus-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
ubifs: Fix error return code in alloc_wbufs()
jffs2: check the validity of dstlen in jffs2_zlib_compress()
ubifs: Fix off-by-one error
ubifs: replay: Fix high stack usage, again
ubifs: Fix memleak in ubifs_init_authentication
jffs2: fix use after free in jffs2_sum_write_data()
ubi: eba: Delete useless kfree code
ubi: remove dead code in validate_vid_hdr()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Many cleanups and fixes for our virtio code
- Add support for a pseudo RTC
- Fix for a possible jailbreak
- Minor fixes (spelling, header files)
* tag 'for-linux-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: irq.h: include <asm-generic/irq.h>
um: io.h: include <linux/types.h>
um: add a pseudo RTC
um: remove process stub VMA
um: rework userspace stubs to not hard-code stub location
um: separate child and parent errors in clone stub
um: defer killing userspace on page table update failures
um: mm: check more comprehensively for stub changes
um: print register names in wait_for_stub
um: hostfs: use a kmem cache for inodes
mm: Remove arch_remap() and mm-arch-hooks.h
um: fix spelling mistake in Kconfig "privleges" -> "privileges"
um: virtio: allow devices to be configured for wakeup
um: time-travel: rework interrupt handling in ext mode
um: virtio: disable VQs during suspend
um: virtio: fix handling of messages without payload
um: virtio: clean up a comment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Convert to using the generic entry infrastructure.
- Add vdso time namespace support.
- Switch s390 and alpha to 64-bit ino_t. As discussed at
https://lore.kernel.org/linux-mm/YCV7QiyoweJwvN+m@osiris/
- Get rid of expensive stck (store clock) usages where possible.
Utilize cpu alternatives to patch stckf when supported.
- Make tod_clock usage less error prone by converting it to a union and
rework code which is using it.
- Machine check handler fixes and cleanups.
- Drop couple of minor inline asm optimizations to fix clang build.
- Default configs changes notably to make libvirt happy.
- Various changes to rework and improve qdio code.
- Other small various fixes and improvements all over the code.
* tag 's390-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (68 commits)
s390/qdio: remove 'merge_pending' mechanism
s390/qdio: improve handling of PENDING buffers for QEBSM devices
s390/qdio: rework q->qdio_error indication
s390/qdio: inline qdio_kick_handler()
s390/time: remove get_tod_clock_ext()
s390/crypto: use store_tod_clock_ext()
s390/hypfs: use store_tod_clock_ext()
s390/debug: use union tod_clock
s390/kvm: use union tod_clock
s390/vdso: use union tod_clock
s390/time: convert tod_clock_base to union
s390/time: introduce new store_tod_clock_ext()
s390/time: rename store_tod_clock_ext() and use union tod_clock
s390/time: introduce union tod_clock
s390,alpha: switch to 64-bit ino_t
s390: split cleanup_sie
s390: use r13 in cleanup_sie as temp register
s390: fix kernel asce loading when sie is interrupted
s390: add stack for machine check handler
s390: use WRITE_ONCE when re-allocating async stack
...
|
|
Pull KVM updates from Paolo Bonzini:
"x86:
- Support for userspace to emulate Xen hypercalls
- Raise the maximum number of user memslots
- Scalability improvements for the new MMU.
Instead of the complex "fast page fault" logic that is used in
mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent,
but the code that can run against page faults is limited. Right now
only page faults take the lock for reading; in the future this will
be extended to some cases of page table destruction. I hope to
switch the default MMU around 5.12-rc3 (some testing was delayed
due to Chinese New Year).
- Cleanups for MAXPHYADDR checks
- Use static calls for vendor-specific callbacks
- On AMD, use VMLOAD/VMSAVE to save and restore host state
- Stop using deprecated jump label APIs
- Workaround for AMD erratum that made nested virtualization
unreliable
- Support for LBR emulation in the guest
- Support for communicating bus lock vmexits to userspace
- Add support for SEV attestation command
- Miscellaneous cleanups
PPC:
- Support for second data watchpoint on POWER10
- Remove some complex workarounds for buggy early versions of POWER9
- Guest entry/exit fixes
ARM64:
- Make the nVHE EL2 object relocatable
- Cleanups for concurrent translation faults hitting the same page
- Support for the standard TRNG hypervisor call
- A bunch of small PMU/Debug fixes
- Simplification of the early init hypercall handling
Non-KVM changes (with acks):
- Detection of contended rwlocks (implemented only for qrwlocks,
because KVM only needs it for x86)
- Allow __DISABLE_EXPORTS from assembly code
- Provide a saner follow_pfn replacements for modules"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits)
KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes
KVM: selftests: Don't bother mapping GVA for Xen shinfo test
KVM: selftests: Fix hex vs. decimal snafu in Xen test
KVM: selftests: Fix size of memslots created by Xen tests
KVM: selftests: Ignore recently added Xen tests' build output
KVM: selftests: Add missing header file needed by xAPIC IPI tests
KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c
KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static
locking/arch: Move qrwlock.h include after qspinlock.h
KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests
KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries
KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
KVM: PPC: remove unneeded semicolon
KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB
KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest
KVM: PPC: Book3S HV: Fix radix guest SLB side channel
KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support
KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR
KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
- Optimize parisc page table locks by using the existing
page_table_lock
- Export argv0-preserve flag in binfmt_misc for usage in qemu-user
- Fix interrupt table (IVT) checksum so firmware will call crash
handler (HPMC)
- Increase IRQ stack to 64kb on 64-bit kernel
- Switch to common devmem_is_allowed() implementation
- Minor fix to get_whan()
* 'parisc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
binfmt_misc: pass binfmt_misc flags to the interpreter
parisc: Optimize per-pagetable spinlocks
parisc: Replace test_ti_thread_flag() with test_tsk_thread_flag()
parisc: Bump 64-bit IRQ stack size to 64 KB
parisc: Fix IVT checksum calculation wrt HPMC
parisc: Use the generic devmem_is_allowed()
parisc: Drop out of get_whan() if task is running again
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
- vDSO build improvements including support for building with BSD.
- Cleanup to the AMU support code and initialisation rework to support
cpufreq drivers built as modules.
- Removal of synthetic frame record from exception stack when entering
the kernel from EL0.
- Add support for the TRNG firmware call introduced by Arm spec
DEN0098.
- Cleanup and refactoring across the board.
- Avoid calling arch_get_random_seed_long() from
add_interrupt_randomness()
- Perf and PMU updates including support for Cortex-A78 and the v8.3
SPE extensions.
- Significant steps along the road to leaving the MMU enabled during
kexec relocation.
- Faultaround changes to initialise prefaulted PTEs as 'old' when
hardware access-flag updates are supported, which drastically
improves vmscan performance.
- CPU errata updates for Cortex-A76 (#1463225) and Cortex-A55
(#1024718)
- Preparatory work for yielding the vector unit at a finer granularity
in the crypto code, which in turn will one day allow us to defer
softirq processing when it is in use.
- Support for overriding CPU ID register fields on the command-line.
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (85 commits)
drivers/perf: Replace spin_lock_irqsave to spin_lock
mm: filemap: Fix microblaze build failure with 'mmu_defconfig'
arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+
arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line
arm64: Defer enabling pointer authentication on boot core
arm64: cpufeatures: Allow disabling of BTI from the command-line
arm64: Move "nokaslr" over to the early cpufeature infrastructure
KVM: arm64: Document HVC_VHE_RESTART stub hypercall
arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0
arm64: Add an aliasing facility for the idreg override
arm64: Honor VHE being disabled from the command-line
arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line
arm64: cpufeature: Add an early command-line cpufeature override facility
arm64: Extract early FDT mapping from kaslr_early_init()
arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding()
arm64: cpufeature: Add global feature override facility
arm64: Move SCTLR_EL1 initialisation to EL-agnostic code
arm64: Simplify init_el2_state to be non-VHE only
arm64: Move VHE-specific SPE setup to mutate_to_vhe()
arm64: Drop early setting of MDSCR_EL2.TPMS
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull tlb gather updates from Ingo Molnar:
"Theses fix MM (soft-)dirty bit management in the procfs code & clean
up the TLB gather API"
* tag 'core-mm-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ldt: Use tlb_gather_mmu_fullmm() when freeing LDT page-tables
tlb: arch: Remove empty __tlb_remove_tlb_entry() stubs
tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu()
tlb: mmu_gather: Introduce tlb_gather_mmu_fullmm()
tlb: mmu_gather: Remove unused start/end arguments from tlb_finish_mmu()
mm: proc: Invalidate TLB after clearing soft-dirty page state
|
|
Pull io_uring updates from Jens Axboe:
"Highlights from this cycles are things like request recycling and
task_work optimizations, which net us anywhere from 10-20% of speedups
on workloads that mostly are inline.
This work was originally done to put io_uring under memcg, which adds
considerable overhead. But it's a really nice win as well. Also worth
highlighting is the LOOKUP_CACHED work in the VFS, and using it in
io_uring. Greatly speeds up the fast path for file opens.
Summary:
- Put io_uring under memcg protection. We accounted just the rings
themselves under rlimit memlock before, now we account everything.
- Request cache recycling, persistent across invocations (Pavel, me)
- First part of a cleanup/improvement to buffer registration (Bijan)
- SQPOLL fixes (Hao)
- File registration NULL pointer fixup (Dan)
- LOOKUP_CACHED support for io_uring
- Disable /proc/thread-self/ for io_uring, like we do for /proc/self
- Add Pavel to the io_uring MAINTAINERS entry
- Tons of code cleanups and optimizations (Pavel)
- Support for skip entries in file registration (Noah)"
* tag 'for-5.12/io_uring-2021-02-17' of git://git.kernel.dk/linux-block: (103 commits)
io_uring: tctx->task_lock should be IRQ safe
proc: don't allow async path resolution of /proc/thread-self components
io_uring: kill cached requests from exiting task closing the ring
io_uring: add helper to free all request caches
io_uring: allow task match to be passed to io_req_cache_free()
io-wq: clear out worker ->fs and ->files
io_uring: optimise io_init_req() flags setting
io_uring: clean io_req_find_next() fast check
io_uring: don't check PF_EXITING from syscall
io_uring: don't split out consume out of SQE get
io_uring: save ctx put/get for task_work submit
io_uring: don't duplicate io_req_task_queue()
io_uring: optimise SQPOLL mm/files grabbing
io_uring: optimise out unlikely link queue
io_uring: take compl state from submit state
io_uring: inline io_complete_rw_common()
io_uring: move res check out of io_rw_reissue()
io_uring: simplify iopoll reissuing
io_uring: clean up io_req_free_batch_finish()
io_uring: move submit side state closer in the ring
...
|
|
Pull core block updates from Jens Axboe:
"Another nice round of removing more code than what is added, mostly
due to Christoph's relentless pursuit of tech debt removal/cleanups.
This pull request contains:
- Two series of BFQ improvements (Paolo, Jan, Jia)
- Block iov_iter improvements (Pavel)
- bsg error path fix (Pan)
- blk-mq scheduler improvements (Jan)
- -EBUSY discard fix (Jan)
- bvec allocation improvements (Ming, Christoph)
- bio allocation and init improvements (Christoph)
- Store bdev pointer in bio instead of gendisk + partno (Christoph)
- Block trace point cleanups (Christoph)
- hard read-only vs read-only split (Christoph)
- Block based swap cleanups (Christoph)
- Zoned write granularity support (Damien)
- Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"
* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
mm: simplify swapdev_block
sd_zbc: clear zone resources for non-zoned case
block: introduce blk_queue_clear_zone_settings()
zonefs: use zone write granularity as block size
block: introduce zone_write_granularity limit
block: use blk_queue_set_zoned in add_partition()
nullb: use blk_queue_set_zoned() to setup zoned devices
nvme: cleanup zone information initialization
block: document zone_append_max_bytes attribute
block: use bi_max_vecs to find the bvec pool
md/raid10: remove dead code in reshape_request
block: mark the bio as cloned in bio_iov_bvec_set
block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
block: remove a layer of indentation in bio_iov_iter_get_pages
block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
block: remove the 1 and 4 vec bvec_slabs entries
block: streamline bvec_alloc
block: factor out a bvec_alloc_gfp helper
block: move struct biovec_slab to bio.c
block: reuse BIO_INLINE_VECS for integrity bvecs
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux
Pull oprofile and dcookies removal from Viresh Kumar:
"Remove oprofile and dcookies support
The 'oprofile' user-space tools don't use the kernel OPROFILE support
any more, and haven't in a long time. User-space has been converted to
the perf interfaces.
The dcookies stuff is only used by the oprofile code. Now that
oprofile's support is getting removed from the kernel, there is no
need for dcookies as well.
Remove kernel's old oprofile and dcookies support"
* tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux:
fs: Remove dcookies support
drivers: Remove CONFIG_OPROFILE support
arch: xtensa: Remove CONFIG_OPROFILE support
arch: x86: Remove CONFIG_OPROFILE support
arch: sparc: Remove CONFIG_OPROFILE support
arch: sh: Remove CONFIG_OPROFILE support
arch: s390: Remove CONFIG_OPROFILE support
arch: powerpc: Remove oprofile
arch: powerpc: Stop building and using oprofile
arch: parisc: Remove CONFIG_OPROFILE support
arch: mips: Remove CONFIG_OPROFILE support
arch: microblaze: Remove CONFIG_OPROFILE support
arch: ia64: Remove rest of perfmon support
arch: ia64: Remove CONFIG_OPROFILE support
arch: hexagon: Don't select HAVE_OPROFILE
arch: arc: Remove CONFIG_OPROFILE support
arch: arm: Remove CONFIG_OPROFILE support
arch: alpha: Remove CONFIG_OPROFILE support
|
|
Pull xfs updates from Darrick Wong:
"There's a lot going on this time, which seems about right for this
drama-filled year.
Community developers added some code to speed up freezing when
read-only workloads are still running, refactored the logging code,
added checks to prevent file extent counter overflow, reduced iolock
cycling to speed up fsync and gc scans, and started the slow march
towards supporting filesystem shrinking.
There's a huge refactoring of the internal speculative preallocation
garbage collection code which fixes a bunch of bugs, makes the gc
scheduling per-AG and hence multithreaded, and standardizes the retry
logic when we try to reserve space or quota, can't, and want to
trigger a gc scan. We also enable multithreaded quotacheck to reduce
mount times further. This is also preparation for background file gc,
which may or may not land for 5.13.
We also fixed some deadlocks in the rename code, fixed a quota
accounting leak when FSSETXATTR fails, restored the behavior that
write faults to an mmap'd region actually cause a SIGBUS, fixed a bug
where sgid directory inheritance wasn't quite working properly, and
fixed a bug where symlinks weren't working properly in ecryptfs. We
also now advertise the inode btree counters feature that was
introduced two cycles ago.
Summary:
- Fix an ABBA deadlock when renaming files on overlayfs.
- Make sure that we can't overflow the inode extent counters when
adding to or removing extents from a file.
- Make directory sgid inheritance work the same way as all the other
filesystems.
- Don't drain the buffer cache on freeze and ro remount, which should
reduce the amount of time if read-only workloads are continuing
during the freeze.
- Fix a bug where symlink size isn't reported to the vfs in ecryptfs.
- Disentangle log cleaning from log covering. This refactoring sets
us up for future changes to the log, though for now it simply means
that we can use covering for freezes, and cleaning becomes
something we only do at unmount.
- Speed up file fsyncs by reducing iolock cycling.
- Fix delalloc blocks leaking when changing the project id fails
because of input validation errors in FSSETXATTR.
- Fix oversized quota reservation when converting unwritten extents
during a DAX write.
- Create a transaction allocation helper function to standardize the
idiom of allocating a transaction, reserving blocks, locking
inodes, and reserving quota. Replace all the open-coded logic for
file creation, file ownership changes, and file modifications to
use them.
- Actually shut down the fs if the incore quota reservations get
corrupted.
- Fix background block garbage collection scans to not block and to
actually clean out CoW staging extents properly.
- Run block gc scans when we run low on project quota.
- Use the standardized transaction allocation helpers to make it so
that ENOSPC and EDQUOT errors during reservation will back out,
invoke the block gc scanner, and try again. This is preparation for
introducing background inode garbage collection in the next cycle.
- Combine speculative post-EOF block garbage collection with
speculative copy on write block garbage collection.
- Enable multithreaded quotacheck.
- Allow sysadmins to tweak the CPU affinities and maximum concurrency
levels of quotacheck and background blockgc worker pools.
- Expose the inode btree counter feature in the fs geometry ioctl.
- Cleanups of the growfs code in preparation for starting work on
filesystem shrinking.
- Fix all the bloody gcc warnings that the maintainer knows about. :P
- Fix a RST syntax error.
- Don't trigger bmbt corruption assertions after the fs shuts down.
- Restore behavior of forcing SIGBUS on a shut down filesystem when
someone triggers a mmap write fault (or really, any buffered
write)"
* tag 'xfs-5.12-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (85 commits)
xfs: consider shutdown in bmapbt cursor delete assert
xfs: fix boolreturn.cocci warnings
xfs: restore shutdown check in mapped write fault path
xfs: fix rst syntax error in admin guide
xfs: fix incorrect root dquot corruption error when switching group/project quota types
xfs: get rid of xfs_growfs_{data,log}_t
xfs: rename `new' to `delta' in xfs_growfs_data_private()
libxfs: expose inobtcount in xfs geometry
xfs: don't bounce the iolock between free_{eof,cow}blocks
xfs: expose the blockgc workqueue knobs publicly
xfs: parallelize block preallocation garbage collection
xfs: rename block gc start and stop functions
xfs: only walk the incore inode tree once per blockgc scan
xfs: consolidate the eofblocks and cowblocks workers
xfs: consolidate incore inode radix tree posteof/cowblocks tags
xfs: remove trivial eof/cowblocks functions
xfs: hide xfs_icache_free_cowblocks
xfs: hide xfs_icache_free_eofblocks
xfs: relocate the eofb/cowb workqueue functions
xfs: set WQ_SYSFS on all workqueues in debug mode
...
|
|
Pull iomap updates from Darrick Wong:
"The big change in this cycle is some new code to make it possible for
XFS to try unaligned directio overwrites without taking locks. If the
block is fully written and within EOF (i.e. doesn't require any
further fs intervention) then we can let the unlocked write proceed.
If not, we fall back to synchronizing direct writes.
Summary:
- Adjust the final parameter of iomap_dio_rw.
- Add a new flag to request that iomap directio writes return EAGAIN
if the write is not a pure overwrite within EOF; this will be used
to reduce lock contention with unaligned direct writes on XFS.
- Amend XFS' directio code to eliminate exclusive locking for
unaligned direct writes if the circumstances permit"
* tag 'iomap-5.12-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reduce exclusive locking on unaligned dio
xfs: split the unaligned DIO write code out
xfs: improve the reflink_bounce_dio_write tracepoint
xfs: simplify the read/write tracepoints
xfs: remove the buffered I/O fallback assert
xfs: cleanup the read/write helper naming
xfs: make xfs_file_aio_write_checks IOCB_NOWAIT-aware
xfs: factor out a xfs_ilock_iocb helper
iomap: add a IOMAP_DIO_OVERWRITE_ONLY flag
iomap: pass a flags argument to iomap_dio_rw
iomap: rename the flags variable in __iomap_dio_rw
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore fix from Kees Cook:
"Fix a CONFIG typo (Jiri Bohac)"
* tag 'pstore-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore: Fix typo in compression option name
|
|
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity updates from Eric Biggers:
"Add an ioctl which allows reading fs-verity metadata from a file.
This is useful when a file with fs-verity enabled needs to be served
somewhere, and the other end wants to do its own fs-verity compatible
verification of the file. See the commit messages for details.
This new ioctl has been tested using new xfstests I've written for it"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fs-verity: support reading signature with ioctl
fs-verity: support reading descriptor with ioctl
fs-verity: support reading Merkle tree with ioctl
fs-verity: add FS_IOC_READ_VERITY_METADATA ioctl
fs-verity: don't pass whole descriptor to fsverity_verify_signature()
fs-verity: factor out fsverity_get_descriptor()
|
|
Pull nfsd updates from Chuck Lever:
- Update NFSv2 and NFSv3 XDR decoding functions
- Further improve support for re-exporting NFS mounts
- Convert NFSD stats to per-CPU counters
- Add batch Receive posting to the server's RPC/RDMA transport
* tag 'nfsd-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (65 commits)
nfsd: skip some unnecessary stats in the v4 case
nfs: use change attribute for NFS re-exports
NFSv4_2: SSC helper should use its own config.
nfsd: cstate->session->se_client -> cstate->clp
nfsd: simplify nfsd4_check_open_reclaim
nfsd: remove unused set_client argument
nfsd: find_cpntf_state cleanup
nfsd: refactor set_client
nfsd: rename lookup_clientid->set_client
nfsd: simplify nfsd_renew
nfsd: simplify process_lock
nfsd4: simplify process_lookup1
SUNRPC: Correct a comment
svcrdma: DMA-sync the receive buffer in svc_rdma_recvfrom()
svcrdma: Reduce Receive doorbell rate
svcrdma: Deprecate stat variables that are no longer used
svcrdma: Restore read and write stats
svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter
svcrdma: Convert rdma_stat_recv to a per-CPU counter
svcrdma: Refactor svc_rdma_init() and svc_rdma_clean_up()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"This contains a somewhat important but rarely reproduced fix reported
month ago for platforms which have weak memory model (e.g. arm64).
The root cause is that test_bit/set_bit atomic operations are actually
implemented in relaxed forms, and uninitialized fields governed by an
atomic bit could be observed in advance due to memory reordering thus
memory barrier pairs should be used.
There is also a trivial fix of crafted blkszbits generated by
syzkaller.
Summary:
- fix shift-out-of-bounds of crafted blkszbits generated by syzkaller
- ensure initialized fields can only be observed after bit is set"
* tag 'erofs-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: initialized fields can only be observed after bit is set
erofs: fix shift-out-of-bounds of blkszbits
|