Age | Commit message (Collapse) | Author |
|
In fuse_writepage_end() the old writepages entry needs to be removed from
the rbtree before inserting the new one, otherwise tree_insert() would
fail. This is a very rare codepath and no reproducer exists.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
We used to do this before 3453d5708b33, but this was changed to better
handle the NFS4ERR_SEQ_MISORDERED error code. This commit fixed the slot
re-use case when the server doesn't receive the interrupted operation,
but if the server does receive the operation then it could still end up
replying to the client with mis-matched operations from the reply cache.
We can fix this by sending a SEQUENCE to the server while recovering from
a SEQ_MISORDERED error when we detect that we are in an interrupted slot
situation.
Fixes: 3453d5708b33 (NFSv4.1: Avoid false retries when RPC calls are interrupted)
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|
Pull io_uring fixes from Jens Axboe:
"Two late fixes again:
- Fix missing msg_name assignment in certain cases (Pavel)
- Correct a previous fix for full coverage (Pavel)"
* tag 'io_uring-5.8-2020-07-12' of git://git.kernel.dk/linux-block:
io_uring: fix not initialised work->flags
io_uring: fix missing msg_name assignment
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"Two refcounting fixes and one prepartory patch for upcoming splice
cleanup:
- fix double put of block group with nodatacow
- fix missing block group put when remounting with discard=async
- explicitly set splice callback (no functional change), to ease
integrating splice cleanup patches"
* tag 'for-5.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: wire up iter_file_splice_write
btrfs: fix double put of block group with nocow
btrfs: discard: add missing put when grabbing block group from unused list
|
|
59960b9deb535 ("io_uring: fix lazy work init") tried to fix missing
io_req_init_async(), but left out work.flags and hash. Do it earlier.
Fixes: 7cdaf587de7c ("io_uring: avoid whole io_wq_work copy for requests completed inline")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Ensure to set msg.msg_name for the async portion of send/recvmsg,
as the header copy will copy to/from it.
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull cifs fixes from Steve French:
"Four cifs/smb3 fixes: the three for stable fix problems found recently
with change notification including a reference count leak"
* tag '5.8-rc4-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
cifs: fix reference leak for tlink
smb3: fix unneeded error message on change notify
cifs: remove the retry in cifs_poxis_lock_set
smb3: fix access denied on change notify request to some servers
|
|
Pull io_uring fixes from Jens Axboe:
- Fix memleak for error path in registered files (Yang)
- Export CQ overflow state in flags, necessary to fix a case where
liburing doesn't know if it needs to enter the kernel (Xiaoguang)
- Fix for a regression in when user memory is accounted freed, causing
issues with back-to-back ring exit + init if the ulimit -l setting is
very tight.
* tag 'io_uring-5.8-2020-07-10' of git://git.kernel.dk/linux-block:
io_uring: account user memory freed when exit has been queued
io_uring: fix memleak in io_sqe_files_register()
io_uring: fix memleak in __io_sqe_files_update()
io_uring: export cq overflow status to userspace
|
|
Pull in-kernel read and write op cleanups from Christoph Hellwig:
"Cleanup in-kernel read and write operations
Reshuffle the (__)kernel_read and (__)kernel_write helpers, and ensure
all users of in-kernel file I/O use them if they don't use iov_iter
based methods already.
The new WARN_ONs in combination with syzcaller already found a missing
input validation in 9p. The fix should be on your way through the
maintainer ASAP".
[ This is prep-work for the real changes coming 5.9 ]
* tag 'cleanup-kernel_read_write' of git://git.infradead.org/users/hch/misc:
fs: remove __vfs_read
fs: implement kernel_read using __kernel_read
integrity/ima: switch to using __kernel_read
fs: add a __kernel_read helper
fs: remove __vfs_write
fs: implement kernel_write using __kernel_write
fs: check FMODE_WRITE in __kernel_write
fs: unexport __kernel_write
bpfilter: switch to kernel_write
autofs: switch to kernel_write
cachefiles: switch to kernel_write
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
"Fix gfs2 readahead deadlocks by adding a IOCB_NOIO flag that allows
gfs2 to use the generic fiel read iterator functions without having to
worry about being called back while holding locks".
* tag 'gfs2-v5.8-rc4.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Rework read and page fault locking
fs: Add IOCB_NOIO flag for generic_file_read_iter
|
|
We currently account the memory after the exit work has been run, but
that leaves a gap where a process has closed its ring and until the
memory has been accounted as freed. If the memlocked ulimit is
borderline, then that can introduce spurious setup errors returning
-ENOMEM because the free work hasn't been run yet.
Account this as freed when we close the ring, as not to expose a tiny
gap where setting up a new ring can fail.
Fixes: 85faa7b8346e ("io_uring: punt final io_ring_ctx wait-and-free to workqueue")
Cc: stable@vger.kernel.org # v5.7
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
I got a memleak report when doing some fuzz test:
BUG: memory leak
unreferenced object 0x607eeac06e78 (size 8):
comm "test", pid 295, jiffies 4294735835 (age 31.745s)
hex dump (first 8 bytes):
00 00 00 00 00 00 00 00 ........
backtrace:
[<00000000932632e6>] percpu_ref_init+0x2a/0x1b0
[<0000000092ddb796>] __io_uring_register+0x111d/0x22a0
[<00000000eadd6c77>] __x64_sys_io_uring_register+0x17b/0x480
[<00000000591b89a6>] do_syscall_64+0x56/0xa0
[<00000000864a281d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Call percpu_ref_exit() on error path to avoid
refcount memleak.
Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
btrfs implements the iter_write op and thus can use the more efficient
iov_iter based splice implementation. For now falling back to the less
efficient default is pretty harmless, but I have a pending series that
removes the default, and thus would cause btrfs to not support splice
at all.
Reported-by: Andy Lavr <andy.lavr@gmail.com>
Tested-by: Andy Lavr <andy.lavr@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
While debugging a patch that I wrote I was hitting use-after-free panics
when accessing block groups on unmount. This turned out to be because
in the nocow case if we bail out of doing the nocow for whatever reason
we need to call btrfs_dec_nocow_writers() if we called the inc. This
puts our block group, but a few error cases does
if (nocow) {
btrfs_dec_nocow_writers();
goto error;
}
unfortunately, error is
error:
if (nocow)
btrfs_dec_nocow_writers();
so we get a double put on our block group. Fix this by dropping the
error cases calling of btrfs_dec_nocow_writers(), as it's handled at the
error label now.
Fixes: 762bf09893b4 ("btrfs: improve error handling in run_delalloc_nocow")
CC: stable@vger.kernel.org # 5.4+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
To 2.28
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Don't leak a reference to tlink during the NOTIFY ioctl
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org> # v5.6+
|
|
Commit
bf67fad19e493b ("efi: Use more granular check for availability for variable services")
introduced a check into the efivarfs, efi-pstore and other drivers that
aborts loading of the module if not all three variable runtime services
(GetVariable, SetVariable and GetNextVariable) are supported. However, this
results in efivarfs being unavailable entirely if only SetVariable support
is missing, which is only needed if you want to make any modifications.
Also, efi-pstore and the sysfs EFI variable interface could be backed by
another implementation of the 'efivars' abstraction, in which case it is
completely irrelevant which services are supported by the EFI firmware.
So make the generic 'efivars' abstraction dependent on the availibility of
the GetVariable and GetNextVariable EFI runtime services, and add a helper
'efivar_supports_writes()' to find out whether the currently active efivars
abstraction supports writes (and wire it up to the availability of
SetVariable for the generic one).
Then, use the efivar_supports_writes() helper to decide whether to permit
efivarfs to be mounted read-write, and whether to enable efi-pstore or the
sysfs EFI variable interface altogether.
Fixes: bf67fad19e493b ("efi: Use more granular check for availability for variable services")
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Tested-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
I got a memleak report when doing some fuzz test:
BUG: memory leak
unreferenced object 0xffff888113e02300 (size 488):
comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
a0 a4 ce 19 81 88 ff ff 60 ce 09 0d 81 88 ff ff ........`.......
backtrace:
[<00000000129a84ec>] kmem_cache_zalloc include/linux/slab.h:659 [inline]
[<00000000129a84ec>] __alloc_file+0x25/0x310 fs/file_table.c:101
[<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151
[<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193
[<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233
[<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
[<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74
[<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720
[<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
BUG: memory leak
unreferenced object 0xffff8881152dd5e0 (size 16):
comm "syz-executor401", pid 356, jiffies 4294809529 (age 11.954s)
hex dump (first 16 bytes):
01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<0000000074caa794>] kmem_cache_zalloc include/linux/slab.h:659 [inline]
[<0000000074caa794>] lsm_file_alloc security/security.c:567 [inline]
[<0000000074caa794>] security_file_alloc+0x32/0x160 security/security.c:1440
[<00000000c6745ea3>] __alloc_file+0xba/0x310 fs/file_table.c:106
[<000000003050ad84>] alloc_empty_file+0x4f/0x120 fs/file_table.c:151
[<000000004d0a41a3>] alloc_file+0x5e/0x550 fs/file_table.c:193
[<000000002cb242f0>] alloc_file_pseudo+0x16a/0x240 fs/file_table.c:233
[<00000000046a4baa>] anon_inode_getfile fs/anon_inodes.c:91 [inline]
[<00000000046a4baa>] anon_inode_getfile+0xac/0x1c0 fs/anon_inodes.c:74
[<0000000035beb745>] __do_sys_perf_event_open+0xd4a/0x2680 kernel/events/core.c:11720
[<0000000049009dc7>] do_syscall_64+0x56/0xa0 arch/x86/entry/common.c:359
[<00000000353731ca>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
If io_sqe_file_register() failed, we need put the file that get by fget()
to avoid the memleak.
Fixes: c3a31e605620 ("io_uring: add support for IORING_REGISTER_FILES_UPDATE")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
For those applications which are not willing to use io_uring_enter()
to reap and handle cqes, they may completely rely on liburing's
io_uring_peek_cqe(), but if cq ring has overflowed, currently because
io_uring_peek_cqe() is not aware of this overflow, it won't enter
kernel to flush cqes, below test program can reveal this bug:
static void test_cq_overflow(struct io_uring *ring)
{
struct io_uring_cqe *cqe;
struct io_uring_sqe *sqe;
int issued = 0;
int ret = 0;
do {
sqe = io_uring_get_sqe(ring);
if (!sqe) {
fprintf(stderr, "get sqe failed\n");
break;;
}
ret = io_uring_submit(ring);
if (ret <= 0) {
if (ret != -EBUSY)
fprintf(stderr, "sqe submit failed: %d\n", ret);
break;
}
issued++;
} while (ret > 0);
assert(ret == -EBUSY);
printf("issued requests: %d\n", issued);
while (issued) {
ret = io_uring_peek_cqe(ring, &cqe);
if (ret) {
if (ret != -EAGAIN) {
fprintf(stderr, "peek completion failed: %s\n",
strerror(ret));
break;
}
printf("left requets: %d\n", issued);
continue;
}
io_uring_cqe_seen(ring, cqe);
issued--;
printf("left requets: %d\n", issued);
}
}
int main(int argc, char *argv[])
{
int ret;
struct io_uring ring;
ret = io_uring_queue_init(16, &ring, 0);
if (ret) {
fprintf(stderr, "ring setup failed: %d\n", ret);
return 1;
}
test_cq_overflow(&ring);
return 0;
}
To fix this issue, export cq overflow status to userspace by adding new
IORING_SQ_CQ_OVERFLOW flag, then helper functions() in liburing, such as
io_uring_peek_cqe, can be aware of this cq overflow and do flush accordingly.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
We should not be logging a warning repeatedly on change notify.
CC: Stable <stable@vger.kernel.org> # v5.6+
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
Fold it into the two callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Consolidate the two in-kernel read helpers to make upcoming changes
easier. The only difference are the missing call to rw_verify_area
in kernel_read, and an access_ok check that doesn't make sense for
kernel buffers to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This is the counterpart to __kernel_write, and skip the rw_verify_area
call compared to kernel_read.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Fold it into the two callers.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Consolidate the two in-kernel write helpers to make upcoming changes
easier. The only difference are the missing call to rw_verify_area
in kernel_write, and an access_ok check that doesn't make sense for
kernel buffers to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Add a WARN_ON_ONCE if the file isn't actually open for write. This
matches the check done in vfs_write, but actually warn warns as a
kernel user calling write on a file not opened for writing is a pretty
obvious programming error.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
This is a very special interface that skips sb_writes protection, and not
used by modules anymore.
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
While pipes don't really need sb_writers projection, __kernel_write is an
interface better kept private, and the additional rw_verify_area does not
hurt here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ian Kent <raven@themaw.net>
|
|
__kernel_write doesn't take a sb_writers references, which we need here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
|
|
The caller of cifs_posix_lock_set will do retry(like
fcntl_setlk64->do_lock_file_wait) if we will wait for any file_lock.
So the retry in cifs_poxis_lock_set seems duplicated, remove it to
make a cleanup.
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.de>
|
|
read permission, not just read attributes permission, is required
on the directory.
See MS-SMB2 (protocol specification) section 3.3.5.19.
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org> # v5.6+
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
|
|
So far, gfs2 has taken the inode glocks inside the ->readpage and
->readahead address space operations. Since commit d4388340ae0b ("fs:
convert mpage_readpages to mpage_readahead"), gfs2_readahead is passed
the pages to read ahead locked. With that, the current holder of the
inode glock may be trying to lock one of those pages while
gfs2_readahead is trying to take the inode glock, resulting in a
deadlock.
Fix that by moving the lock taking to the higher-level ->read_iter file
and ->fault vm operations. This also gets rid of an ugly lock inversion
workaround in gfs2_readpage.
The cache consistency model of filesystems like gfs2 is such that if
data is found in the page cache, the data is up to date and can be used
without taking any filesystem locks. If a page is not cached,
filesystem locks must be taken before populating the page cache.
To avoid taking the inode glock when the data is already cached,
gfs2_file_read_iter first tries to read the data with the IOCB_NOIO flag
set. If that fails, the inode glock is taken and the operation is
retried with the IOCB_NOIO flag cleared.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- regression fix of a leak in global block reserve accounting
- fix a (hard to hit) race of readahead vs releasepage that could lead
to crash
- convert all remaining uses of comment fall through annotations to the
pseudo keyword
- fix crash when mounting a fuzzed image with -o recovery
* tag 'for-5.8-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: reset tree root pointer after error in init_tree_roots
btrfs: fix reclaim_size counter leak after stealing from global reserve
btrfs: fix fatal extent_buffer readahead vs releasepage race
btrfs: convert comments to fallthrough annotations
|
|
[BUG]
The following small test script can trigger ASSERT() at unmount time:
mkfs.btrfs -f $dev
mount $dev $mnt
mount -o remount,discard=async $mnt
umount $mnt
The call trace:
assertion failed: atomic_read(&block_group->count) == 1, in fs/btrfs/block-group.c:3431
------------[ cut here ]------------
kernel BUG at fs/btrfs/ctree.h:3204!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 4 PID: 10389 Comm: umount Tainted: G O 5.8.0-rc3-custom+ #68
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
Call Trace:
btrfs_free_block_groups.cold+0x22/0x55 [btrfs]
close_ctree+0x2cb/0x323 [btrfs]
btrfs_put_super+0x15/0x17 [btrfs]
generic_shutdown_super+0x72/0x110
kill_anon_super+0x18/0x30
btrfs_kill_super+0x17/0x30 [btrfs]
deactivate_locked_super+0x3b/0xa0
deactivate_super+0x40/0x50
cleanup_mnt+0x135/0x190
__cleanup_mnt+0x12/0x20
task_work_run+0x64/0xb0
__prepare_exit_to_usermode+0x1bc/0x1c0
__syscall_return_slowpath+0x47/0x230
do_syscall_64+0x64/0xb0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The code:
ASSERT(atomic_read(&block_group->count) == 1);
btrfs_put_block_group(block_group);
[CAUSE]
Obviously it's some btrfs_get_block_group() call doesn't get its put
call.
The offending btrfs_get_block_group() happens here:
void btrfs_mark_bg_unused(struct btrfs_block_group *bg)
{
if (list_empty(&bg->bg_list)) {
btrfs_get_block_group(bg);
list_add_tail(&bg->bg_list, &fs_info->unused_bgs);
}
}
So every call sites removing the block group from unused_bgs list should
reduce the ref count of that block group.
However for async discard, it didn't follow the call convention:
void btrfs_discard_punt_unused_bgs_list(struct btrfs_fs_info *fs_info)
{
list_for_each_entry_safe(block_group, next, &fs_info->unused_bgs,
bg_list) {
list_del_init(&block_group->bg_list);
btrfs_discard_queue_work(&fs_info->discard_ctl, block_group);
}
}
And in btrfs_discard_queue_work(), it doesn't call
btrfs_put_block_group() either.
[FIX]
Fix the problem by reducing the reference count when we grab the block
group from unused_bgs list.
Reported-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Fixes: 6e80d4f8c422 ("btrfs: handle empty block_group removal for async discard")
CC: stable@vger.kernel.org # 5.6+
Tested-by: Marcos Paulo de Souza <mpdesouza@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Pull io_uring fix from Jens Axboe:
"Andres reported a regression with the fix that was merged earlier this
week, where his setup of using signals to interrupt io_uring CQ waits
no longer worked correctly.
Fix this, and also limit our use of TWA_SIGNAL to the case where we
need it, and continue using TWA_RESUME for task_work as before.
Since the original is marked for 5.7 stable, let's flush this one out
early"
* tag 'io_uring-5.8-2020-07-05' of git://git.kernel.dk/linux-block:
io_uring: fix regression with always ignoring signals in io_cqring_wait()
|
|
When switching to TWA_SIGNAL for task_work notifications, we also made
any signal based condition in io_cqring_wait() return -ERESTARTSYS.
This breaks applications that rely on using signals to abort someone
waiting for events.
Check if we have a signal pending because of queued task_work, and
repeat the signal check once we've run the task_work. This provides a
reliable way of telling the two apart.
Additionally, only use TWA_SIGNAL if we are using an eventfd. If not,
we don't have the dependency situation described in the original commit,
and we can get by with just using TWA_RESUME like we previously did.
Fixes: ce593a6c480a ("io_uring: use signal based task_work running")
Cc: stable@vger.kernel.org # v5.7
Reported-by: Andres Freund <andres@anarazel.de>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull sysctl fix from Al Viro:
"Another regression fix for sysctl changes this cycle..."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Call sysctl_head_finish on error
|
|
Pull cifs fixes from Steve French:
"Eight cifs/smb3 fixes, most when specifying the multiuser mount flag.
Five of the fixes are for stable"
* tag '5.8-rc3-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: prevent truncation from long to int in wait_for_free_credits
cifs: Fix the target file was deleted when rename failed.
SMB3: Honor 'posix' flag for multiuser mounts
SMB3: Honor 'handletimeout' flag for multiuser mounts
SMB3: Honor lease disabling for multiuser mounts
SMB3: Honor persistent/resilient handle flags for multiuser mounts
SMB3: Honor 'seal' flag for multiuser mounts
cifs: Display local UID details for SMB sessions in DebugData
|
|
Pull xfs fix from Darrick Wong:
"Fix a use-after-free bug when the fs shuts down"
* tag 'xfs-5.8-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix use-after-free on CIL context on shutdown
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fixes from Andreas Gruenbacher:
"Various gfs2 fixes"
* tag 'gfs2-v5.8-rc3.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: The freeze glock should never be frozen
gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHE
gfs2: read-only mounts should grab the sd_freeze_gl glock
gfs2: freeze should work on read-only mounts
gfs2: eliminate GIF_ORDERED in favor of list_empty
gfs2: Don't sleep during glock hash walk
gfs2: fix trans slab error when withdraw occurs inside log_flush
gfs2: Don't return NULL from gfs2_inode_lookup
|
|
This error path returned directly instead of calling sysctl_head_finish().
Fixes: ef9d965bc8b6 ("sysctl: reject gigantic reads/write to sysctl files")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Before this patch, some gfs2 code locked the freeze glock with LM_FLAG_NOEXP
(Do not freeze) flag, and some did not. We never want to freeze the freeze
glock, so this patch makes it consistently use LM_FLAG_NOEXP always.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, the freeze code in gfs2 specified GL_NOCACHE in
several places. That's wrong because we always want to know the state
of whether the file system is frozen.
There was also a problem with freeze/thaw transitioning the glock from
frozen (EX) to thawed (SH) because gfs2 will normally grant glocks in EX
to processes that request it in SH mode, unless GL_EXACT is specified.
Therefore, the freeze/thaw code, which tried to reacquire the glock in
SH mode would get the glock in EX mode, and miss the transition from EX
to SH. That made it think the thaw had completed normally, but since the
glock was still cached in EX, other nodes could not freeze again.
This patch removes the GL_NOCACHE flag to allow the freeze glock to be
cached. It also adds the GL_EXACT flag so the glock is fully transitioned
from EX to SH, thereby allowing future freeze operations.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, only read-write mounts would grab the freeze
glock in read-only mode, as part of gfs2_make_fs_rw. So the freeze
glock was never initialized. That meant requests to freeze, which
request the glock in EX, were granted without any state transition.
That meant you could mount a gfs2 file system, which is currently
frozen on a different cluster node, in read-only mode.
This patch makes read-only mounts lock the freeze glock in SH mode,
which will block for file systems that are frozen on another node.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, function freeze_go_sync, called when promoting
the freeze glock, was testing for the SDF_JOURNAL_LIVE superblock flag.
That's only set for read-write mounts. Read-only mounts don't use a
journal, so the bit is never set, so the freeze never happened.
This patch removes the check for SDF_JOURNAL_LIVE for freeze requests
but still checks it when deciding whether to flush a journal.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
In several places, we used the GIF_ORDERED inode flag to determine
if an inode was on the ordered writes list. However, since we always
held the sd_ordered_lock spin_lock during the manipulation, we can
just as easily check list_empty(&ip->i_ordered) instead.
This allows us to keep more than one ordered writes list to make
journal writing improvements.
This patch eliminates GIF_ORDERED in favor of checking list_empty.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Pull nfsd fixes from Bruce Fields:
"Fixes for a umask bug on exported filesystems lacking ACL support, a
leak and a module unloading bug in the /proc/fs/nfsd/clients/ code,
and a compile warning"
* tag 'nfsd-5.8-1' of git://linux-nfs.org/~bfields/linux:
SUNRPC: Add missing definition of ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE
nfsd: fix nfsdfs inode reference count leak
nfsd4: fix nfsdfs reference count loop
nfsd: apply umask on fs without ACL support
|
|
Pull io_uring fixes from Jens Axboe:
"One fix in here, for a regression in 5.7 where a task is waiting in
the kernel for a condition, but that condition won't become true until
task_work is run. And the task_work can't be run exactly because the
task is waiting in the kernel, so we'll never make any progress.
One example of that is registering an eventfd and queueing io_uring
work, and then the task goes and waits in eventfd read with the
expectation that it'll get woken (and read an event) when the io_uring
request completes. The io_uring request is finished through task_work,
which won't get run while the task is looping in eventfd read"
* tag 'io_uring-5.8-2020-07-01' of git://git.kernel.dk/linux-block:
io_uring: use signal based task_work running
task_work: teach task_work_add() to do signal_wake_up()
|
|
Eric reported an issue where mounting -o recovery with a fuzzed fs
resulted in a kernel panic. This is because we tried to free the tree
node, except it was an error from the read. Fix this by properly
resetting the tree_root->node == NULL in this case. The panic was the
following
BTRFS warning (device loop0): failed to read tree root
BUG: kernel NULL pointer dereference, address: 000000000000001f
RIP: 0010:free_extent_buffer+0xe/0x90 [btrfs]
Call Trace:
free_root_extent_buffers.part.0+0x11/0x30 [btrfs]
free_root_pointers+0x1a/0xa2 [btrfs]
open_ctree+0x1776/0x18a5 [btrfs]
btrfs_mount_root.cold+0x13/0xfa [btrfs]
? selinux_fs_context_parse_param+0x37/0x80
legacy_get_tree+0x27/0x40
vfs_get_tree+0x25/0xb0
fc_mount+0xe/0x30
vfs_kern_mount.part.0+0x71/0x90
btrfs_mount+0x147/0x3e0 [btrfs]
? cred_has_capability+0x7c/0x120
? legacy_get_tree+0x27/0x40
legacy_get_tree+0x27/0x40
vfs_get_tree+0x25/0xb0
do_mount+0x735/0xa40
__x64_sys_mount+0x8e/0xd0
do_syscall_64+0x4d/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nik says: this is problematic only if we fail on the last iteration of
the loop as this results in init_tree_roots returning err value with
tree_root->node = -ERR. Subsequently the caller does: fail_tree_roots
which calls free_root_pointers on the bogus value.
Reported-by: Eric Sandeen <sandeen@redhat.com>
Fixes: b8522a1e5f42 ("btrfs: Factor out tree roots initialization during mount")
CC: stable@vger.kernel.org # 5.5+
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add details how the pointer gets dereferenced ]
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Commit 7f9fe614407692 ("btrfs: improve global reserve stealing logic"),
added in the 5.8 merge window, introduced another leak for the space_info's
reclaim_size counter. This is very often triggered by the test cases
generic/269 and generic/416 from fstests, producing a stack trace like the
following during unmount:
[37079.155499] ------------[ cut here ]------------
[37079.156844] WARNING: CPU: 2 PID: 2000423 at fs/btrfs/block-group.c:3422 btrfs_free_block_groups+0x2eb/0x300 [btrfs]
[37079.158090] Modules linked in: dm_snapshot btrfs dm_thin_pool (...)
[37079.164440] CPU: 2 PID: 2000423 Comm: umount Tainted: G W 5.7.0-rc7-btrfs-next-62 #1
[37079.165422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), (...)
[37079.167384] RIP: 0010:btrfs_free_block_groups+0x2eb/0x300 [btrfs]
[37079.168375] Code: bd 58 ff ff ff 00 4c 8d (...)
[37079.170199] RSP: 0018:ffffaa53875c7de0 EFLAGS: 00010206
[37079.171120] RAX: ffff98099e701cf8 RBX: ffff98099e2d4000 RCX: 0000000000000000
[37079.172057] RDX: 0000000000000001 RSI: ffffffffc0acc5b1 RDI: 00000000ffffffff
[37079.173002] RBP: ffff98099e701cf8 R08: 0000000000000000 R09: 0000000000000000
[37079.173886] R10: 0000000000000000 R11: 0000000000000000 R12: ffff98099e701c00
[37079.174730] R13: ffff98099e2d5100 R14: dead000000000122 R15: dead000000000100
[37079.175578] FS: 00007f4d7d0a5840(0000) GS:ffff9809ec600000(0000) knlGS:0000000000000000
[37079.176434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[37079.177289] CR2: 0000559224dcc000 CR3: 000000012207a004 CR4: 00000000003606e0
[37079.178152] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[37079.178935] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[37079.179675] Call Trace:
[37079.180419] close_ctree+0x291/0x2d1 [btrfs]
[37079.181162] generic_shutdown_super+0x6c/0x100
[37079.181898] kill_anon_super+0x14/0x30
[37079.182641] btrfs_kill_super+0x12/0x20 [btrfs]
[37079.183371] deactivate_locked_super+0x31/0x70
[37079.184012] cleanup_mnt+0x100/0x160
[37079.184650] task_work_run+0x68/0xb0
[37079.185284] exit_to_usermode_loop+0xf9/0x100
[37079.185920] do_syscall_64+0x20d/0x260
[37079.186556] entry_SYSCALL_64_after_hwframe+0x49/0xb3
[37079.187197] RIP: 0033:0x7f4d7d2d9357
[37079.187836] Code: eb 0b 00 f7 d8 64 89 01 48 (...)
[37079.189180] RSP: 002b:00007ffee4e0d368 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[37079.189845] RAX: 0000000000000000 RBX: 00007f4d7d3fb224 RCX: 00007f4d7d2d9357
[37079.190515] RDX: ffffffffffffff78 RSI: 0000000000000000 RDI: 0000559224dc5c90
[37079.191173] RBP: 0000559224dc1970 R08: 0000000000000000 R09: 00007ffee4e0c0e0
[37079.191815] R10: 0000559224dc7b00 R11: 0000000000000246 R12: 0000000000000000
[37079.192451] R13: 0000559224dc5c90 R14: 0000559224dc1a80 R15: 0000559224dc1ba0
[37079.193096] irq event stamp: 0
[37079.193729] hardirqs last enabled at (0): [<0000000000000000>] 0x0
[37079.194379] hardirqs last disabled at (0): [<ffffffff97ab8935>] copy_process+0x755/0x1ea0
[37079.195033] softirqs last enabled at (0): [<ffffffff97ab8935>] copy_process+0x755/0x1ea0
[37079.195700] softirqs last disabled at (0): [<0000000000000000>] 0x0
[37079.196318] ---[ end trace b32710d864dea887 ]---
In the past commit d611add48b717a ("btrfs: fix reclaim counter leak of
space_info objects") fixed similar cases. That commit however has a date
more recent (April 7 2020) then the commit mentioned before (March 13
2020), however it was merged in kernel 5.7 while the older commit, which
introduces a new leak, was merged only in the 5.8 merge window. So the
leak sneaked in unnoticed.
Fix this by making steal_from_global_rsv() remove the ticket using the
helper remove_ticket(), which decrements the reclaim_size counter of the
space_info object.
Fixes: 7f9fe614407692 ("btrfs: improve global reserve stealing logic")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|