summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2021-02-24mm: memcontrol: convert NR_ANON_THPS account to pagesMuchun Song
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_ANON_THPS account to pages. This patch is consistent with 8f182270dfec ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-3-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Feng Tang <feng.tang@intel.com> Cc: NeilBrown <neilb@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24fs/buffer.c: add checking buffer head stat before clearYang Guo
clear_buffer_new() is used to clear buffer new stat. When PAGE_SIZE is 64K, most buffer heads in the list are not needed to clear. clear_buffer_new() has an enpensive atomic modification operation, Let's add checking buffer head before clear it as __block_write_begin_int does which is good for performance. Link: https://lkml.kernel.org/r/1612332890-57918-1-git-send-email-zhangshaokun@hisilicon.com Signed-off-by: Yang Guo <guoyang2@huawei.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm/filemap: rename generic_file_buffered_read to filemap_readChristoph Hellwig
Rename generic_file_buffered_read to match the naming of filemap_fault, also update the written parameter to a more descriptive name and improve the kerneldoc comment. Link: https://lkml.kernel.org/r/20210122160140.223228-18-willy@infradead.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Kent Overstreet <kent.overstreet@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24mm/filemap: remove unused parameter and change to void type for ↵Baolin Wang
replace_page_cache_page() Since commit 74d609585d8b ("page cache: Add and replace pages using the XArray") was merged, the replace_page_cache_page() can not fail and always return 0, we can remove the redundant return value and void it. Moreover remove the unused gfp_mask. Link: https://lkml.kernel.org/r/609c30e5274ba15d8b90c872fd0d8ac437a9b2bb.1610071401.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24ramfs: support O_TMPFILEAlexey Dobriyan
[akpm@linux-foundation.org: update inode_operations.tmpfile] Link: http://lkml.kernel.org/r/20190206073349.GA15311@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24fs: delete repeated words in commentsRandy Dunlap
Delete duplicate words in fs/*.c. The doubled words that are being dropped are: that, be, the, in, and, for Link: https://lkml.kernel.org/r/20201224052810.25315-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24ocfs2: simplify the calculation of variablesJiapeng Chong
Fix the following coccicheck warnings: fs/ocfs2/refcounttree.c:981:16-18: WARNING !A || A && B is equivalent to !A || B. Link: https://lkml.kernel.org/r/1612235424-80367-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24ocfs2: fix a use after free on errorDan Carpenter
The error handling in this function frees "reg" but it is still on the "o2hb_all_regions" list so it will lead to a use after freew. Joseph Qi points out that we need to clear the bit in the "o2hb_region_bitmap" as well Link: https://lkml.kernel.org/r/YBk4M6HUG8jB/jc7@mwanda Fixes: 1cf257f51191 ("ocfs2: fix memory leak") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24ocfs2: clean up some definitions which are not used any moreguozh
There are some definitions which is not used anymore in OCFS2 module, so as to be removed. Link: https://lkml.kernel.org/r/2021011916182284700534@chinatelecom.cn Signed-off-by: Guozhonghua <guozh88@chinatelecom.cn> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24ocfs2: remove redundant conditional before iputYi Li
iput handles NULL pointers gracefully, so there's no need to check the pointer before the call. Link: https://lkml.kernel.org/r/20201231040535.4091761-1-yili@winhong.com Signed-off-by: Yi Li <yili@winhong.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24ntfs: check for valid standard information attributeRustam Kovhaev
Mounting a corrupted filesystem with NTFS resulted in a kernel crash. We should check for valid STANDARD_INFORMATION attribute offset and length before trying to access it Link: https://lkml.kernel.org/r/20210217155930.1506815-1-rkovhaev@gmail.com Link: https://syzkaller.appspot.com/bug?extid=c584225dabdea2f71969 Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Reported-by: syzbot+c584225dabdea2f71969@syzkaller.appspotmail.com Tested-by: syzbot+c584225dabdea2f71969@syzkaller.appspotmail.com Acked-by: Anton Altaparmakov <anton@tuxera.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-24ntfs: layout.h: delete duplicated wordsRandy Dunlap
Drop the repeated words "the" and "in" in comments. Link: https://lkml.kernel.org/r/20210125194937.24627-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Anton Altaparmakov <anton@tuxera.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-02-23Merge tag 'gfs2-for-5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Log space and revoke accounting rework to fix some failed asserts. - Local resource group glock sharing for better local performance. - Add support for version 1802 filesystems: trusted xattr support and '-o rgrplvb' mounts by default. - Actually synchronize on the inode glock's FREEING bit during withdraw ("gfs2: fix glock confusion in function signal_our_withdraw"). - Fix parallel recovery of multiple journals ("gfs2: keep bios separate for each journal"). - Various other bug fixes. * tag 'gfs2-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (49 commits) gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flush gfs2: Per-revoke accounting in transactions gfs2: Rework the log space allocation logic gfs2: Minor calc_reserved cleanup gfs2: Use resource group glock sharing gfs2: Allow node-wide exclusive glock sharing gfs2: Add local resource group locking gfs2: Add per-reservation reserved block accounting gfs2: Rename rs_{free -> requested} and rd_{reserved -> requested} gfs2: Check for active reservation in gfs2_release gfs2: Don't search for unreserved space twice gfs2: Only pass reservation down to gfs2_rbm_find gfs2: Also reflect single-block allocations in rgd->rd_extfail_pt gfs2: Recursive gfs2_quota_hold in gfs2_iomap_end gfs2: Add trusted xattr support gfs2: Enable rgrplvb for sb_fs_format 1802 gfs2: Don't skip dlm unlock if glock has an lvb gfs2: Lock imbalance on error path in gfs2_recover_one gfs2: Move function gfs2_ail_empty_tr gfs2: Get rid of current_tail() ...
2021-02-23Merge tag 'idmapped-mounts-v5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: https://git.kernel.org/brauner/man-pages/c/1d7b902e2875a1ff342e036a9f866a995640aea8 In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...
2021-02-23gfs2: Don't get stuck with I/O plugged in gfs2_ail1_flushBob Peterson
In gfs2_ail1_flush, we're using I/O plugging to give the block layer a better chance of merging I/O requests. If we're too aggressive here, we can end up waiting on I/O to complete while still plugged. Fix that in a way similar to writeback_sb_inodes, except that we can't use blk_flush_plug because blk_flush_plug_list is not exported. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-23Merge branches 'rgrp-glock-sharing' and 'gfs2-revoke' from ↵Andreas Gruenbacher
https://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2.git Merge the resource group glock sharing feature and the revoke accounting rework. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22Merge tag 'topic/iomem-mmap-vs-gup-2021-02-22' of ↵Linus Torvalds
git://anongit.freedesktop.org/drm/drm Pull follow_pfn() updates from Daniel Vetter: "Fixes around VM_FPNMAP and follow_pfn: - replace mm/frame_vector.c by get_user_pages in misc/habana and drm/exynos drivers, then move that into media as it's sole user - close race in generic_access_phys - s390 pci ioctl fix of this series landed in 5.11 already - properly revoke iomem mappings (/dev/mem, pci files)" * tag 'topic/iomem-mmap-vs-gup-2021-02-22' of git://anongit.freedesktop.org/drm/drm: PCI: Revoke mappings like devmem PCI: Also set up legacy files only after sysfs init sysfs: Support zapping of binary attr mmaps resource: Move devmem revoke code to resource framework /dev/mem: Only set filp->f_mapping PCI: Obey iomem restrictions for procfs mmap mm: Close race in generic_access_phys media: videobuf2: Move frame_vector into media subsystem mm/frame-vector: Use FOLL_LONGTERM misc/habana: Use FOLL_LONGTERM for userptr misc/habana: Stop using frame_vector helpers drm/exynos: Use FOLL_LONGTERM for g2d cmdlists drm/exynos: Stop using frame_vector helpers
2021-02-22Merge tag 'topic/kcmp-kconfig-2021-02-22' of ↵Linus Torvalds
git://anongit.freedesktop.org/drm/drm Pull kcmp kconfig update from Daniel Vetter: "Make the kcmp syscall available independently of checkpoint/restore. drm userspaces uses this, systemd uses this, so makes sense to pull it out from the checkpoint-restore bundle. Kees reviewed this from security pov and is happy with the final version" Link: https://lwn.net/Articles/845448/ * tag 'topic/kcmp-kconfig-2021-02-22' of git://anongit.freedesktop.org/drm/drm: kcmp: Support selection of SYS_kcmp without CHECKPOINT_RESTORE
2021-02-22Merge tag 'nfsd-5.12-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Here are a few additional NFSD commits for the merge window: Optimization: - Cork the socket while there are queued replies Fixes: - DRC shutdown ordering - svc_rdma_accept() lockdep splat" * tag 'nfsd-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Further clean up svc_tcp_sendmsg() SUNRPC: Remove redundant socket flags from svc_tcp_sendmsg() SUNRPC: Use TCP_CORK to optimise send performance on the server svcrdma: Hold private mutex while invoking rdma_accept() nfsd: register pernet ops last, unregister first
2021-02-22Merge tag 'ceph-for-5.12-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds
Pull ceph updates from Ilya Dryomov: "With netfs helper library and fscache rework delayed, just a few cap handling improvements to avoid grabbing mmap_lock in some code paths and deal with capsnaps better and a mount option cleanup" * tag 'ceph-for-5.12-rc1' of git://github.com/ceph/ceph-client: ceph: defer flushing the capsnap if the Fb is used libceph: remove osdtimeout option entirely libceph: deprecate [no]cephx_require_signatures options ceph: allow queueing cap/snap handling after putting cap references ceph: clean up inode work queueing ceph: fix flush_snap logic after putting caps
2021-02-22Merge tag 'fs_for_v5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull isofs, udf, and quota updates from Jan Kara: "Several udf, isofs, and quota fixes" * tag 'fs_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: parser: Fix kernel-doc markups udf: handle large user and group ID isofs: handle large user and group ID parser: add unsigned int parser udf: fix silent AED tagLocation corruption isofs: release buffer head before return quota: Fix memory leak when handling corrupted quota file
2021-02-22Merge tag 'fsnotify_for_v5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify update from Jan Kara: "Make inotify groups be charged against appropriate memcgs" * tag 'fsnotify_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: inotify, memcg: account inotify instances to kmemcg
2021-02-22Merge tag 'lazytime_for_v5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull lazytime updates from Jan Kara: "Cleanups of the lazytime handling in the writeback code making rules for calling ->dirty_inode() filesystem handlers saner" * tag 'lazytime_for_v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext4: simplify i_state checks in __ext4_update_other_inode_time() gfs2: don't worry about I_DIRTY_TIME in gfs2_fsync() fs: improve comments for writeback_single_inode() fs: drop redundant check from __writeback_single_inode() fs: clean up __mark_inode_dirty() a bit fs: pass only I_DIRTY_INODE flags to ->dirty_inode fs: don't call ->dirty_inode for lazytime timestamp updates fat: only specify I_DIRTY_TIME when needed in fat_update_time() fs: only specify I_DIRTY_TIME when needed in generic_update_time() fs: correctly document the inode dirty flags
2021-02-22Merge tag 'exfat-for-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - improve file deletion performance with dirsync mount option - fix shift-out-of-bounds in exfat_fill_super() reported by syzkaller * tag 'exfat-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: improve performance of exfat_free_cluster when using dirsync mount option exfat: fix shift-out-of-bounds in exfat_fill_super()
2021-02-22Merge tag 'zonefs-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs updates from Damien Le Moal: "Two changes: - A fix that did not make it in time for 5.11, to correct the file size initialization of full sequential zone, from Shin'ichiro - Add file operation tracepoints to help with debugging, from Johannes" * tag 'zonefs-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: Fix file size of zones in full condition zonefs: add tracepoints for file operations
2021-02-22Merge branch 'work.audit' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull RCU-safe common_lsm_audit() from Al Viro: "Make common_lsm_audit() non-blocking and usable from RCU pathwalk context. We don't really need to grab/drop dentry in there - rcu_read_lock() is enough. There's a couple of followups using that to simplify the logics in selinux, but those hadn't soaked in -next yet, so they'll have to go in next window" * 'work.audit' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: make dump_common_audit_data() safe to be called from RCU pathwalk new helper: d_find_alias_rcu()
2021-02-22Merge branch 'work.d_name' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull d_name whack-a-mole from Al Viro: "A bunch of places that play with ->d_name in printks instead of using proper formats..." * 'work.d_name' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: orangefs_file_mmap(): use %pD cifs_debug: use %pd instead of messing with ->d_name erofs: use %pd instead of messing with ->d_name cramfs: use %pD instead of messing with file_dentry()->d_name
2021-02-22gfs2: Per-revoke accounting in transactionsAndreas Gruenbacher
In the log, revokes are stored as a revoke descriptor (struct gfs2_log_descriptor), followed by zero or more additional revoke blocks (struct gfs2_meta_header). On filesystems with a blocksize of 4k, the revoke descriptor contains up to 503 revokes, and the metadata blocks contain up to 509 revokes each. We've so far been reserving space for revokes in transactions in block granularity, so a lot more space than necessary was being allocated and then released again. This patch switches to assigning revokes to transactions individually instead. Initially, space for the revoke descriptor is reserved and handed out to transactions. When more revokes than that are reserved, additional revoke blocks are added. When the log is flushed, the space for the additional revoke blocks is released, but we keep the space for the revoke descriptor block allocated. Transactions may still reserve more revokes than they will actually need in the end, but now we won't overshoot the target as much, and by only returning the space for excess revokes at log flush time, we further reduce the amount of contention between processes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22gfs2: Rework the log space allocation logicAndreas Gruenbacher
The current log space allocation logic is hard to understand or extend. The principle it that when the log is flushed, we may or may not have a transaction active that has space allocated in the log. To deal with that, we set aside a magical number of blocks to be used in case we don't have an active transaction. It isn't clear that the pool will always be big enough. In addition, we can't return unused log space at the end of a transaction, so the number of blocks allocated must exactly match the number of blocks used. Simplify this as follows: * When transactions are allocated or merged, always reserve enough blocks to flush the transaction (err on the safe side). * In gfs2_log_flush, return any allocated blocks that haven't been used. * Maintain a pool of spare blocks big enough to do one log flush, as before. * In gfs2_log_flush, when we have no active transaction, allocate a suitable number of blocks. For that, use the spare pool when called from logd, and leave the pool alone otherwise. This means that when the log is almost full, logd will still be able to do one more log flush, which will result in more log space becoming available. This will make the log space allocator code easier to work with in the future. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22gfs2: Minor calc_reserved cleanupAndreas Gruenbacher
No functional change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2021-02-22Merge tag 'docs-5.12' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "It has been a relatively quiet cycle in docsland. - As promised, the minimum Sphinx version to build the docs is now 1.7, and we have dropped support for Python 2 entirely. That allowed the removal of a bunch of compatibility code. - A set of treewide warning fixups from Mauro that I applied after it became clear nobody else was going to deal with them. - The automarkup mechanism can now create cross-references from relative paths to RST files. - More translations, typo fixes, and warning fixes" * tag 'docs-5.12' of git://git.lwn.net/linux: (75 commits) docs: kernel-hacking: be more civil docs: Remove the Microsoft rhetoric Documentation/admin-guide: kernel-parameters: Update nohlt section doc/admin-guide: fix spelling mistake: "perfomance" -> "performance" docs: Document cross-referencing using relative path docs: Enable usage of relative paths to docs on automarkup docs: thermal: fix spelling mistakes Documentation: admin-guide: Update kvm/xen config option docs: Make syscalls' helpers naming consistent coding-style.rst: Avoid comma statements Documentation: /proc/loadavg: add 3 more field descriptions Documentation/submitting-patches: Add blurb about backtraces in commit messages Docs: drop Python 2 support Move our minimum Sphinx version to 1.7 Documentation: input: define ABS_PRESSURE/ABS_MT_PRESSURE resolution as grams scripts/kernel-doc: add internal hyperlink to DOC: sections Update Documentation/admin-guide/sysctl/fs.rst docs: Update DTB format references docs: zh_CN: add iio index.rst translation docs/zh_CN: add iio ep93xx_adc.rst translation ...
2021-02-22exfat: improve performance of exfat_free_cluster when using dirsync mount optionHyeongseok Kim
There are stressful update of cluster allocation bitmap when using dirsync mount option which is doing sync buffer on every cluster bit clearing. This could result in performance degradation when deleting big size file. Fix to update only when the bitmap buffer index is changed would make less disk access, improving performance especially for truncate operation. Testing with Samsung 256GB sdcard, mounted with dirsync option (mount -t exfat /dev/block/mmcblk0p1 /temp/mount -o dirsync) Remove 4GB file, blktrace result. [Before] : 39 secs. Total (blktrace): Reads Queued: 0, 0KiB Writes Queued: 32775, 16387KiB Read Dispatches: 0, 0KiB Write Dispatches: 32775, 16387KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 0, 0KiB Writes Completed: 32775, 16387KiB Read Merges: 0, 0KiB Write Merges: 0, 0KiB IO unplugs: 2 Timer unplugs: 0 [After] : 1 sec. Total (blktrace): Reads Queued: 0, 0KiB Writes Queued: 13, 6KiB Read Dispatches: 0, 0KiB Write Dispatches: 13, 6KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 0, 0KiB Writes Completed: 13, 6KiB Read Merges: 0, 0KiB Write Merges: 0, 0KiB IO unplugs: 1 Timer unplugs: 0 Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2021-02-22exfat: fix shift-out-of-bounds in exfat_fill_super()Namjae Jeon
syzbot reported a warning which could cause shift-out-of-bounds issue. Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x183/0x22e lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:148 [inline] __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395 exfat_read_boot_sector fs/exfat/super.c:471 [inline] __exfat_fill_super fs/exfat/super.c:556 [inline] exfat_fill_super+0x2acb/0x2d00 fs/exfat/super.c:624 get_tree_bdev+0x406/0x630 fs/super.c:1291 vfs_get_tree+0x86/0x270 fs/super.c:1496 do_new_mount fs/namespace.c:2881 [inline] path_mount+0x1937/0x2c50 fs/namespace.c:3211 do_mount fs/namespace.c:3224 [inline] __do_sys_mount fs/namespace.c:3432 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3409 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 exfat specification describe sect_per_clus_bits field of boot sector could be at most 25 - sect_size_bits and at least 0. And sect_size_bits can also affect this calculation, It also needs validation. This patch add validation for sect_per_clus_bits and sect_size_bits field of boot sector. Fixes: 719c1e182916 ("exfat: add super block operations") Cc: stable@vger.kernel.org # v5.9+ Reported-by: syzbot+da4fe66aaadd3c2e2d1c@syzkaller.appspotmail.com Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
2021-02-21Merge tag 'selinux-pr-20210215' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: "We've got a good handful of patches for SELinux this time around; with everything passing the selinux-testsuite and applying cleanly to your tree as of a few minutes ago. The highlights are: - Add support for labeling anonymous inodes, and extend this new support to userfaultfd. - Fallback to SELinux genfs file labeling if the filesystem does not have xattr support. This is useful for virtiofs which can vary in its xattr support depending on the backing filesystem. - Classify and handle MPTCP the same as TCP in SELinux. - Ensure consistent behavior between inode_getxattr and inode_listsecurity when the SELinux policy is not loaded. This fixes a known problem with overlayfs. - A couple of patches to prune some unused variables from the SELinux code, mark private variables as static, and mark other variables as __ro_after_init or __read_mostly" * tag 'selinux-pr-20210215' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: fs: anon_inodes: rephrase to appropriate kernel-doc userfaultfd: use secure anon inodes for userfaultfd selinux: teach SELinux about anonymous inodes fs: add LSM-supporting anon-inode interface security: add inode_init_security_anon() LSM hook selinux: fall back to SECURITY_FS_USE_GENFS if no xattr support selinux: mark selinux_xfrm_refcount as __read_mostly selinux: mark some global variables __ro_after_init selinux: make selinuxfs_mount static selinux: drop the unnecessary aurule_callback variable selinux: remove unused global variables selinux: fix inconsistency between inode_getxattr and inode_listsecurity selinux: handle MPTCP consistently with TCP
2021-02-21Merge tag 'for-linus-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull JFFS2/UBIFS and UBI updates from Richard Weinberger: "JFFS2: - Fix for use-after-free in jffs2_sum_write_data() - Fix for out-of-bounds access in jffs2_zlib_compress() UBI: - Remove dead/useless code UBIFS: - Fix for a memory leak in ubifs_init_authentication() - Fix for high stack usage - Fix for a off-by-one error in xattrs code" * tag 'for-linus-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: Fix error return code in alloc_wbufs() jffs2: check the validity of dstlen in jffs2_zlib_compress() ubifs: Fix off-by-one error ubifs: replay: Fix high stack usage, again ubifs: Fix memleak in ubifs_init_authentication jffs2: fix use after free in jffs2_sum_write_data() ubi: eba: Delete useless kfree code ubi: remove dead code in validate_vid_hdr()
2021-02-21Merge tag 'for-linux-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Many cleanups and fixes for our virtio code - Add support for a pseudo RTC - Fix for a possible jailbreak - Minor fixes (spelling, header files) * tag 'for-linux-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: irq.h: include <asm-generic/irq.h> um: io.h: include <linux/types.h> um: add a pseudo RTC um: remove process stub VMA um: rework userspace stubs to not hard-code stub location um: separate child and parent errors in clone stub um: defer killing userspace on page table update failures um: mm: check more comprehensively for stub changes um: print register names in wait_for_stub um: hostfs: use a kmem cache for inodes mm: Remove arch_remap() and mm-arch-hooks.h um: fix spelling mistake in Kconfig "privleges" -> "privileges" um: virtio: allow devices to be configured for wakeup um: time-travel: rework interrupt handling in ext mode um: virtio: disable VQs during suspend um: virtio: fix handling of messages without payload um: virtio: clean up a comment
2021-02-21Merge tag 's390-5.12-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Convert to using the generic entry infrastructure. - Add vdso time namespace support. - Switch s390 and alpha to 64-bit ino_t. As discussed at https://lore.kernel.org/linux-mm/YCV7QiyoweJwvN+m@osiris/ - Get rid of expensive stck (store clock) usages where possible. Utilize cpu alternatives to patch stckf when supported. - Make tod_clock usage less error prone by converting it to a union and rework code which is using it. - Machine check handler fixes and cleanups. - Drop couple of minor inline asm optimizations to fix clang build. - Default configs changes notably to make libvirt happy. - Various changes to rework and improve qdio code. - Other small various fixes and improvements all over the code. * tag 's390-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (68 commits) s390/qdio: remove 'merge_pending' mechanism s390/qdio: improve handling of PENDING buffers for QEBSM devices s390/qdio: rework q->qdio_error indication s390/qdio: inline qdio_kick_handler() s390/time: remove get_tod_clock_ext() s390/crypto: use store_tod_clock_ext() s390/hypfs: use store_tod_clock_ext() s390/debug: use union tod_clock s390/kvm: use union tod_clock s390/vdso: use union tod_clock s390/time: convert tod_clock_base to union s390/time: introduce new store_tod_clock_ext() s390/time: rename store_tod_clock_ext() and use union tod_clock s390/time: introduce union tod_clock s390,alpha: switch to 64-bit ino_t s390: split cleanup_sie s390: use r13 in cleanup_sie as temp register s390: fix kernel asce loading when sie is interrupted s390: add stack for machine check handler s390: use WRITE_ONCE when re-allocating async stack ...
2021-02-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "x86: - Support for userspace to emulate Xen hypercalls - Raise the maximum number of user memslots - Scalability improvements for the new MMU. Instead of the complex "fast page fault" logic that is used in mmu.c, tdp_mmu.c uses an rwlock so that page faults are concurrent, but the code that can run against page faults is limited. Right now only page faults take the lock for reading; in the future this will be extended to some cases of page table destruction. I hope to switch the default MMU around 5.12-rc3 (some testing was delayed due to Chinese New Year). - Cleanups for MAXPHYADDR checks - Use static calls for vendor-specific callbacks - On AMD, use VMLOAD/VMSAVE to save and restore host state - Stop using deprecated jump label APIs - Workaround for AMD erratum that made nested virtualization unreliable - Support for LBR emulation in the guest - Support for communicating bus lock vmexits to userspace - Add support for SEV attestation command - Miscellaneous cleanups PPC: - Support for second data watchpoint on POWER10 - Remove some complex workarounds for buggy early versions of POWER9 - Guest entry/exit fixes ARM64: - Make the nVHE EL2 object relocatable - Cleanups for concurrent translation faults hitting the same page - Support for the standard TRNG hypervisor call - A bunch of small PMU/Debug fixes - Simplification of the early init hypercall handling Non-KVM changes (with acks): - Detection of contended rwlocks (implemented only for qrwlocks, because KVM only needs it for x86) - Allow __DISABLE_EXPORTS from assembly code - Provide a saner follow_pfn replacements for modules" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (192 commits) KVM: x86/xen: Explicitly pad struct compat_vcpu_info to 64 bytes KVM: selftests: Don't bother mapping GVA for Xen shinfo test KVM: selftests: Fix hex vs. decimal snafu in Xen test KVM: selftests: Fix size of memslots created by Xen tests KVM: selftests: Ignore recently added Xen tests' build output KVM: selftests: Add missing header file needed by xAPIC IPI tests KVM: selftests: Add operand to vmsave/vmload/vmrun in svm.c KVM: SVM: Make symbol 'svm_gp_erratum_intercept' static locking/arch: Move qrwlock.h include after qspinlock.h KVM: PPC: Book3S HV: Fix host radix SLB optimisation with hash guests KVM: PPC: Book3S HV: Ensure radix guest has no SLB entries KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2 KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path KVM: PPC: remove unneeded semicolon KVM: PPC: Book3S HV: Use POWER9 SLBIA IH=6 variant to clear SLB KVM: PPC: Book3S HV: No need to clear radix host SLB before loading HPT guest KVM: PPC: Book3S HV: Fix radix guest SLB side channel KVM: PPC: Book3S HV: Remove support for running HPT guest on RPT host without mixed mode support KVM: PPC: Book3S HV: Introduce new capability for 2nd DAWR KVM: PPC: Book3S HV: Add infrastructure to support 2nd DAWR ...
2021-02-21Merge branch 'parisc-5.12-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc updates from Helge Deller: - Optimize parisc page table locks by using the existing page_table_lock - Export argv0-preserve flag in binfmt_misc for usage in qemu-user - Fix interrupt table (IVT) checksum so firmware will call crash handler (HPMC) - Increase IRQ stack to 64kb on 64-bit kernel - Switch to common devmem_is_allowed() implementation - Minor fix to get_whan() * 'parisc-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: binfmt_misc: pass binfmt_misc flags to the interpreter parisc: Optimize per-pagetable spinlocks parisc: Replace test_ti_thread_flag() with test_tsk_thread_flag() parisc: Bump 64-bit IRQ stack size to 64 KB parisc: Fix IVT checksum calculation wrt HPMC parisc: Use the generic devmem_is_allowed() parisc: Drop out of get_whan() if task is running again
2021-02-21Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: - vDSO build improvements including support for building with BSD. - Cleanup to the AMU support code and initialisation rework to support cpufreq drivers built as modules. - Removal of synthetic frame record from exception stack when entering the kernel from EL0. - Add support for the TRNG firmware call introduced by Arm spec DEN0098. - Cleanup and refactoring across the board. - Avoid calling arch_get_random_seed_long() from add_interrupt_randomness() - Perf and PMU updates including support for Cortex-A78 and the v8.3 SPE extensions. - Significant steps along the road to leaving the MMU enabled during kexec relocation. - Faultaround changes to initialise prefaulted PTEs as 'old' when hardware access-flag updates are supported, which drastically improves vmscan performance. - CPU errata updates for Cortex-A76 (#1463225) and Cortex-A55 (#1024718) - Preparatory work for yielding the vector unit at a finer granularity in the crypto code, which in turn will one day allow us to defer softirq processing when it is in use. - Support for overriding CPU ID register fields on the command-line. * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (85 commits) drivers/perf: Replace spin_lock_irqsave to spin_lock mm: filemap: Fix microblaze build failure with 'mmu_defconfig' arm64: Make CPU_BIG_ENDIAN depend on ld.bfd or ld.lld 13.0.0+ arm64: cpufeatures: Allow disabling of Pointer Auth from the command-line arm64: Defer enabling pointer authentication on boot core arm64: cpufeatures: Allow disabling of BTI from the command-line arm64: Move "nokaslr" over to the early cpufeature infrastructure KVM: arm64: Document HVC_VHE_RESTART stub hypercall arm64: Make kvm-arm.mode={nvhe, protected} an alias of id_aa64mmfr1.vh=0 arm64: Add an aliasing facility for the idreg override arm64: Honor VHE being disabled from the command-line arm64: Allow ID_AA64MMFR1_EL1.VH to be overridden from the command line arm64: cpufeature: Add an early command-line cpufeature override facility arm64: Extract early FDT mapping from kaslr_early_init() arm64: cpufeature: Use IDreg override in __read_sysreg_by_encoding() arm64: cpufeature: Add global feature override facility arm64: Move SCTLR_EL1 initialisation to EL-agnostic code arm64: Simplify init_el2_state to be non-VHE only arm64: Move VHE-specific SPE setup to mutate_to_vhe() arm64: Drop early setting of MDSCR_EL2.TPMS ...
2021-02-21Merge tag 'core-mm-2021-02-17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull tlb gather updates from Ingo Molnar: "Theses fix MM (soft-)dirty bit management in the procfs code & clean up the TLB gather API" * tag 'core-mm-2021-02-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ldt: Use tlb_gather_mmu_fullmm() when freeing LDT page-tables tlb: arch: Remove empty __tlb_remove_tlb_entry() stubs tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu() tlb: mmu_gather: Introduce tlb_gather_mmu_fullmm() tlb: mmu_gather: Remove unused start/end arguments from tlb_finish_mmu() mm: proc: Invalidate TLB after clearing soft-dirty page state
2021-02-21Merge tag 'for-5.12/io_uring-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring updates from Jens Axboe: "Highlights from this cycles are things like request recycling and task_work optimizations, which net us anywhere from 10-20% of speedups on workloads that mostly are inline. This work was originally done to put io_uring under memcg, which adds considerable overhead. But it's a really nice win as well. Also worth highlighting is the LOOKUP_CACHED work in the VFS, and using it in io_uring. Greatly speeds up the fast path for file opens. Summary: - Put io_uring under memcg protection. We accounted just the rings themselves under rlimit memlock before, now we account everything. - Request cache recycling, persistent across invocations (Pavel, me) - First part of a cleanup/improvement to buffer registration (Bijan) - SQPOLL fixes (Hao) - File registration NULL pointer fixup (Dan) - LOOKUP_CACHED support for io_uring - Disable /proc/thread-self/ for io_uring, like we do for /proc/self - Add Pavel to the io_uring MAINTAINERS entry - Tons of code cleanups and optimizations (Pavel) - Support for skip entries in file registration (Noah)" * tag 'for-5.12/io_uring-2021-02-17' of git://git.kernel.dk/linux-block: (103 commits) io_uring: tctx->task_lock should be IRQ safe proc: don't allow async path resolution of /proc/thread-self components io_uring: kill cached requests from exiting task closing the ring io_uring: add helper to free all request caches io_uring: allow task match to be passed to io_req_cache_free() io-wq: clear out worker ->fs and ->files io_uring: optimise io_init_req() flags setting io_uring: clean io_req_find_next() fast check io_uring: don't check PF_EXITING from syscall io_uring: don't split out consume out of SQE get io_uring: save ctx put/get for task_work submit io_uring: don't duplicate io_req_task_queue() io_uring: optimise SQPOLL mm/files grabbing io_uring: optimise out unlikely link queue io_uring: take compl state from submit state io_uring: inline io_complete_rw_common() io_uring: move res check out of io_rw_reissue() io_uring: simplify iopoll reissuing io_uring: clean up io_req_free_batch_finish() io_uring: move submit side state closer in the ring ...
2021-02-21Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull core block updates from Jens Axboe: "Another nice round of removing more code than what is added, mostly due to Christoph's relentless pursuit of tech debt removal/cleanups. This pull request contains: - Two series of BFQ improvements (Paolo, Jan, Jia) - Block iov_iter improvements (Pavel) - bsg error path fix (Pan) - blk-mq scheduler improvements (Jan) - -EBUSY discard fix (Jan) - bvec allocation improvements (Ming, Christoph) - bio allocation and init improvements (Christoph) - Store bdev pointer in bio instead of gendisk + partno (Christoph) - Block trace point cleanups (Christoph) - hard read-only vs read-only split (Christoph) - Block based swap cleanups (Christoph) - Zoned write granularity support (Damien) - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)" * tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits) mm: simplify swapdev_block sd_zbc: clear zone resources for non-zoned case block: introduce blk_queue_clear_zone_settings() zonefs: use zone write granularity as block size block: introduce zone_write_granularity limit block: use blk_queue_set_zoned in add_partition() nullb: use blk_queue_set_zoned() to setup zoned devices nvme: cleanup zone information initialization block: document zone_append_max_bytes attribute block: use bi_max_vecs to find the bvec pool md/raid10: remove dead code in reshape_request block: mark the bio as cloned in bio_iov_bvec_set block: set BIO_NO_PAGE_REF in bio_iov_bvec_set block: remove a layer of indentation in bio_iov_iter_get_pages block: turn the nr_iovecs argument to bio_alloc* into an unsigned short block: remove the 1 and 4 vec bvec_slabs entries block: streamline bvec_alloc block: factor out a bvec_alloc_gfp helper block: move struct biovec_slab to bio.c block: reuse BIO_INLINE_VECS for integrity bvecs ...
2021-02-21Merge tag 'oprofile-removal-5.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux Pull oprofile and dcookies removal from Viresh Kumar: "Remove oprofile and dcookies support The 'oprofile' user-space tools don't use the kernel OPROFILE support any more, and haven't in a long time. User-space has been converted to the perf interfaces. The dcookies stuff is only used by the oprofile code. Now that oprofile's support is getting removed from the kernel, there is no need for dcookies as well. Remove kernel's old oprofile and dcookies support" * tag 'oprofile-removal-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/linux: fs: Remove dcookies support drivers: Remove CONFIG_OPROFILE support arch: xtensa: Remove CONFIG_OPROFILE support arch: x86: Remove CONFIG_OPROFILE support arch: sparc: Remove CONFIG_OPROFILE support arch: sh: Remove CONFIG_OPROFILE support arch: s390: Remove CONFIG_OPROFILE support arch: powerpc: Remove oprofile arch: powerpc: Stop building and using oprofile arch: parisc: Remove CONFIG_OPROFILE support arch: mips: Remove CONFIG_OPROFILE support arch: microblaze: Remove CONFIG_OPROFILE support arch: ia64: Remove rest of perfmon support arch: ia64: Remove CONFIG_OPROFILE support arch: hexagon: Don't select HAVE_OPROFILE arch: arc: Remove CONFIG_OPROFILE support arch: arm: Remove CONFIG_OPROFILE support arch: alpha: Remove CONFIG_OPROFILE support
2021-02-21Merge tag 'xfs-5.12-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "There's a lot going on this time, which seems about right for this drama-filled year. Community developers added some code to speed up freezing when read-only workloads are still running, refactored the logging code, added checks to prevent file extent counter overflow, reduced iolock cycling to speed up fsync and gc scans, and started the slow march towards supporting filesystem shrinking. There's a huge refactoring of the internal speculative preallocation garbage collection code which fixes a bunch of bugs, makes the gc scheduling per-AG and hence multithreaded, and standardizes the retry logic when we try to reserve space or quota, can't, and want to trigger a gc scan. We also enable multithreaded quotacheck to reduce mount times further. This is also preparation for background file gc, which may or may not land for 5.13. We also fixed some deadlocks in the rename code, fixed a quota accounting leak when FSSETXATTR fails, restored the behavior that write faults to an mmap'd region actually cause a SIGBUS, fixed a bug where sgid directory inheritance wasn't quite working properly, and fixed a bug where symlinks weren't working properly in ecryptfs. We also now advertise the inode btree counters feature that was introduced two cycles ago. Summary: - Fix an ABBA deadlock when renaming files on overlayfs. - Make sure that we can't overflow the inode extent counters when adding to or removing extents from a file. - Make directory sgid inheritance work the same way as all the other filesystems. - Don't drain the buffer cache on freeze and ro remount, which should reduce the amount of time if read-only workloads are continuing during the freeze. - Fix a bug where symlink size isn't reported to the vfs in ecryptfs. - Disentangle log cleaning from log covering. This refactoring sets us up for future changes to the log, though for now it simply means that we can use covering for freezes, and cleaning becomes something we only do at unmount. - Speed up file fsyncs by reducing iolock cycling. - Fix delalloc blocks leaking when changing the project id fails because of input validation errors in FSSETXATTR. - Fix oversized quota reservation when converting unwritten extents during a DAX write. - Create a transaction allocation helper function to standardize the idiom of allocating a transaction, reserving blocks, locking inodes, and reserving quota. Replace all the open-coded logic for file creation, file ownership changes, and file modifications to use them. - Actually shut down the fs if the incore quota reservations get corrupted. - Fix background block garbage collection scans to not block and to actually clean out CoW staging extents properly. - Run block gc scans when we run low on project quota. - Use the standardized transaction allocation helpers to make it so that ENOSPC and EDQUOT errors during reservation will back out, invoke the block gc scanner, and try again. This is preparation for introducing background inode garbage collection in the next cycle. - Combine speculative post-EOF block garbage collection with speculative copy on write block garbage collection. - Enable multithreaded quotacheck. - Allow sysadmins to tweak the CPU affinities and maximum concurrency levels of quotacheck and background blockgc worker pools. - Expose the inode btree counter feature in the fs geometry ioctl. - Cleanups of the growfs code in preparation for starting work on filesystem shrinking. - Fix all the bloody gcc warnings that the maintainer knows about. :P - Fix a RST syntax error. - Don't trigger bmbt corruption assertions after the fs shuts down. - Restore behavior of forcing SIGBUS on a shut down filesystem when someone triggers a mmap write fault (or really, any buffered write)" * tag 'xfs-5.12-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (85 commits) xfs: consider shutdown in bmapbt cursor delete assert xfs: fix boolreturn.cocci warnings xfs: restore shutdown check in mapped write fault path xfs: fix rst syntax error in admin guide xfs: fix incorrect root dquot corruption error when switching group/project quota types xfs: get rid of xfs_growfs_{data,log}_t xfs: rename `new' to `delta' in xfs_growfs_data_private() libxfs: expose inobtcount in xfs geometry xfs: don't bounce the iolock between free_{eof,cow}blocks xfs: expose the blockgc workqueue knobs publicly xfs: parallelize block preallocation garbage collection xfs: rename block gc start and stop functions xfs: only walk the incore inode tree once per blockgc scan xfs: consolidate the eofblocks and cowblocks workers xfs: consolidate incore inode radix tree posteof/cowblocks tags xfs: remove trivial eof/cowblocks functions xfs: hide xfs_icache_free_cowblocks xfs: hide xfs_icache_free_eofblocks xfs: relocate the eofb/cowb workqueue functions xfs: set WQ_SYSFS on all workqueues in debug mode ...
2021-02-21Merge tag 'iomap-5.12-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull iomap updates from Darrick Wong: "The big change in this cycle is some new code to make it possible for XFS to try unaligned directio overwrites without taking locks. If the block is fully written and within EOF (i.e. doesn't require any further fs intervention) then we can let the unlocked write proceed. If not, we fall back to synchronizing direct writes. Summary: - Adjust the final parameter of iomap_dio_rw. - Add a new flag to request that iomap directio writes return EAGAIN if the write is not a pure overwrite within EOF; this will be used to reduce lock contention with unaligned direct writes on XFS. - Amend XFS' directio code to eliminate exclusive locking for unaligned direct writes if the circumstances permit" * tag 'iomap-5.12-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: reduce exclusive locking on unaligned dio xfs: split the unaligned DIO write code out xfs: improve the reflink_bounce_dio_write tracepoint xfs: simplify the read/write tracepoints xfs: remove the buffered I/O fallback assert xfs: cleanup the read/write helper naming xfs: make xfs_file_aio_write_checks IOCB_NOWAIT-aware xfs: factor out a xfs_ilock_iocb helper iomap: add a IOMAP_DIO_OVERWRITE_ONLY flag iomap: pass a flags argument to iomap_dio_rw iomap: rename the flags variable in __iomap_dio_rw
2021-02-21Merge tag 'pstore-v5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore fix from Kees Cook: "Fix a CONFIG typo (Jiri Bohac)" * tag 'pstore-v5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore: Fix typo in compression option name
2021-02-21Merge tag 'fsverity-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt Pull fsverity updates from Eric Biggers: "Add an ioctl which allows reading fs-verity metadata from a file. This is useful when a file with fs-verity enabled needs to be served somewhere, and the other end wants to do its own fs-verity compatible verification of the file. See the commit messages for details. This new ioctl has been tested using new xfstests I've written for it" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fs-verity: support reading signature with ioctl fs-verity: support reading descriptor with ioctl fs-verity: support reading Merkle tree with ioctl fs-verity: add FS_IOC_READ_VERITY_METADATA ioctl fs-verity: don't pass whole descriptor to fsverity_verify_signature() fs-verity: factor out fsverity_get_descriptor()
2021-02-21Merge tag 'nfsd-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds
Pull nfsd updates from Chuck Lever: - Update NFSv2 and NFSv3 XDR decoding functions - Further improve support for re-exporting NFS mounts - Convert NFSD stats to per-CPU counters - Add batch Receive posting to the server's RPC/RDMA transport * tag 'nfsd-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (65 commits) nfsd: skip some unnecessary stats in the v4 case nfs: use change attribute for NFS re-exports NFSv4_2: SSC helper should use its own config. nfsd: cstate->session->se_client -> cstate->clp nfsd: simplify nfsd4_check_open_reclaim nfsd: remove unused set_client argument nfsd: find_cpntf_state cleanup nfsd: refactor set_client nfsd: rename lookup_clientid->set_client nfsd: simplify nfsd_renew nfsd: simplify process_lock nfsd4: simplify process_lookup1 SUNRPC: Correct a comment svcrdma: DMA-sync the receive buffer in svc_rdma_recvfrom() svcrdma: Reduce Receive doorbell rate svcrdma: Deprecate stat variables that are no longer used svcrdma: Restore read and write stats svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter svcrdma: Convert rdma_stat_recv to a per-CPU counter svcrdma: Refactor svc_rdma_init() and svc_rdma_clean_up() ...
2021-02-21Merge tag 'erofs-for-5.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: "This contains a somewhat important but rarely reproduced fix reported month ago for platforms which have weak memory model (e.g. arm64). The root cause is that test_bit/set_bit atomic operations are actually implemented in relaxed forms, and uninitialized fields governed by an atomic bit could be observed in advance due to memory reordering thus memory barrier pairs should be used. There is also a trivial fix of crafted blkszbits generated by syzkaller. Summary: - fix shift-out-of-bounds of crafted blkszbits generated by syzkaller - ensure initialized fields can only be observed after bit is set" * tag 'erofs-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: initialized fields can only be observed after bit is set erofs: fix shift-out-of-bounds of blkszbits