From 4547f4d8ffd63ba4ac129f9136027bd14b729101 Mon Sep 17 00:00:00 2001 From: Liu Bo Date: Fri, 23 Sep 2016 14:05:04 -0700 Subject: Btrfs: kill BUG_ON in do_relocation While updating btree, we try to push items between sibling nodes/leaves in order to keep height as low as possible. But we don't memset the original places with zero when pushing items so that we could end up leaving stale content in nodes/leaves. One may read the above stale content by increasing btree blocks' @nritems. One case I've come across is that in fs tree, a leaf has two parent nodes, hence running balance ends up with processing this leaf with two parent nodes, but it can only reach the valid parent node through btrfs_search_slot, so it'd be like, do_relocation for P in all parent nodes of block A: if !P->eb: btrfs_search_slot(key); --> get path from P to A. if lowest: BUG_ON(A->bytenr != bytenr of A recorded in P); btrfs_cow_block(P, A); --> change A's bytenr in P. After btrfs_cow_block, P has the new bytenr of A, but with the same @key, we get the same path again, and get panic by BUG_ON. Note that this is only happening in a corrupted fs, for a regular fs in which we have correct @nritems so that we won't read stale content in any case. Reviewed-by: Josef Bacik Signed-off-by: Liu Bo Signed-off-by: David Sterba --- fs/btrfs/relocation.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 0ec8ffa37ab0..c4af0cdb783d 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -2728,7 +2728,14 @@ static int do_relocation(struct btrfs_trans_handle *trans, bytenr = btrfs_node_blockptr(upper->eb, slot); if (lowest) { - BUG_ON(bytenr != node->bytenr); + if (bytenr != node->bytenr) { + btrfs_err(root->fs_info, + "lowest leaf/node mismatch: bytenr %llu node->bytenr %llu slot %d upper %llu", + bytenr, node->bytenr, slot, + upper->eb->start); + err = -EIO; + goto next; + } } else { if (node->eb->start == bytenr) goto next; -- cgit v1.2.3-70-g09d2 From b77428b12b55437b28deae738d9ce8b2e0663b55 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Mon, 24 Oct 2016 14:21:18 +1100 Subject: xfs: defer should abort intent items if the trans roll fails If the deferred ops transaction roll fails, we need to abort the intent items if we haven't already logged a done item for it, regardless of whether or not the deferred ops has had a transaction committed. Dave found this while running generic/388. Move the tracepoint to make it easier to track object lifetimes. Reported-by: Dave Chinner Signed-off-by: Darrick J. Wong Reviewed-by: Dave Chinner Signed-off-by: Dave Chinner --- fs/xfs/libxfs/xfs_defer.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) (limited to 'fs') diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index 613c5cf19436..5c2929f94bd3 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -199,9 +199,9 @@ xfs_defer_intake_work( struct xfs_defer_pending *dfp; list_for_each_entry(dfp, &dop->dop_intake, dfp_list) { - trace_xfs_defer_intake_work(tp->t_mountp, dfp); dfp->dfp_intent = dfp->dfp_type->create_intent(tp, dfp->dfp_count); + trace_xfs_defer_intake_work(tp->t_mountp, dfp); list_sort(tp->t_mountp, &dfp->dfp_work, dfp->dfp_type->diff_items); list_for_each(li, &dfp->dfp_work) @@ -221,21 +221,14 @@ xfs_defer_trans_abort( struct xfs_defer_pending *dfp; trace_xfs_defer_trans_abort(tp->t_mountp, dop); - /* - * If the transaction was committed, drop the intent reference - * since we're bailing out of here. The other reference is - * dropped when the intent hits the AIL. If the transaction - * was not committed, the intent is freed by the intent item - * unlock handler on abort. - */ - if (!dop->dop_committed) - return; - /* Abort intent items. */ + /* Abort intent items that don't have a done item. */ list_for_each_entry(dfp, &dop->dop_pending, dfp_list) { trace_xfs_defer_pending_abort(tp->t_mountp, dfp); - if (!dfp->dfp_done) + if (dfp->dfp_intent && !dfp->dfp_done) { dfp->dfp_type->abort_intent(dfp->dfp_intent); + dfp->dfp_intent = NULL; + } } /* Shut down FS. */ -- cgit v1.2.3-70-g09d2 From 86a6c211d676add579a75b7e172a72bb3e2c21f8 Mon Sep 17 00:00:00 2001 From: Benjamin Coddington Date: Wed, 15 Jun 2016 15:02:55 -0400 Subject: NFS: Trim extra slash in v4 nfs_path A NFSv4 mount of a subdirectory will show an extra slash (as in 'server://path') in proc's mountinfo which will not match the device name and path. This can cause problems for programs searching for the mount. Fix this by checking for a leading slash in the dentry path, if so trim away any trailing slashes in the device name. Signed-off-by: Benjamin Coddington Signed-off-by: Anna Schumaker --- fs/nfs/namespace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c index c8162c660c44..5551e8ef67fd 100644 --- a/fs/nfs/namespace.c +++ b/fs/nfs/namespace.c @@ -98,7 +98,7 @@ rename_retry: return end; } namelen = strlen(base); - if (flags & NFS_PATH_CANONICAL) { + if (*end == '/') { /* Strip off excess slashes in base string */ while (namelen > 0 && base[namelen - 1] == '/') namelen--; -- cgit v1.2.3-70-g09d2 From 0b34c261e235a5c74dcf78bd305845bd15fe2b42 Mon Sep 17 00:00:00 2001 From: Goldwyn Rodrigues Date: Fri, 30 Sep 2016 10:40:52 -0500 Subject: btrfs: qgroup: Prevent qgroup->reserved from going subzero While free'ing qgroup->reserved resources, we much check if the page has not been invalidated by a truncate operation by checking if the page is still dirty before reducing the qgroup resources. Resources in such a case are free'd when the entire extent is released by delayed_ref. This fixes a double accounting while releasing resources in case of truncating a file, reproduced by the following testcase. SCRATCH_DEV=/dev/vdb SCRATCH_MNT=/mnt mkfs.btrfs -f $SCRATCH_DEV mount -t btrfs $SCRATCH_DEV $SCRATCH_MNT cd $SCRATCH_MNT btrfs quota enable $SCRATCH_MNT btrfs subvolume create a btrfs qgroup limit 500m a $SCRATCH_MNT sync for c in {1..15}; do dd if=/dev/zero bs=1M count=40 of=$SCRATCH_MNT/a/file; done sleep 10 sync sleep 5 touch $SCRATCH_MNT/a/newfile echo "Removing file" rm $SCRATCH_MNT/a/file Fixes: b9d0b38928 ("btrfs: Add handler for invalidate page") Signed-off-by: Goldwyn Rodrigues Reviewed-by: Qu Wenruo Signed-off-by: David Sterba --- fs/btrfs/inode.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 50ba4ca167e7..6677674d0505 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -8930,9 +8930,14 @@ again: * So even we call qgroup_free_data(), it won't decrease reserved * space. * 2) Not written to disk - * This means the reserved space should be freed here. + * This means the reserved space should be freed here. However, + * if a truncate invalidates the page (by clearing PageDirty) + * and the page is accounted for while allocating extent + * in btrfs_check_data_free_space() we let delayed_ref to + * free the entire extent. */ - btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE); + if (PageDirty(page)) + btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE); if (!inode_evicting) { clear_extent_bit(tree, page_start, page_end, EXTENT_LOCKED | EXTENT_DIRTY | -- cgit v1.2.3-70-g09d2 From 69ae5e4459e43e56f03d0987e865fbac2b05af2a Mon Sep 17 00:00:00 2001 From: Wang Xiaoguang Date: Thu, 13 Oct 2016 09:23:39 +0800 Subject: btrfs: make file clone aware of fatal signals Indeed this just make the behavior similar to xfs when process has fatal signals pending, and it'll make fstests/generic/298 happy. Signed-off-by: Wang Xiaoguang Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/ioctl.c | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'fs') diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index af69129d7e0e..71634570943c 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3814,6 +3814,11 @@ process_slot: } btrfs_release_path(path); key.offset = next_key_min_offset; + + if (fatal_signal_pending(current)) { + ret = -EINTR; + goto out; + } } ret = 0; -- cgit v1.2.3-70-g09d2 From dd4b857aabbce57518f2d32d9805365d3ff34fc6 Mon Sep 17 00:00:00 2001 From: Wang Xiaoguang Date: Tue, 18 Oct 2016 15:56:13 +0800 Subject: btrfs: pass correct args to btrfs_async_run_delayed_refs() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In btrfs_truncate_inode_items()->btrfs_async_run_delayed_refs(), we swap the arg2 and arg3 wrongly, fix this. This bug just impacts asynchronous delayed refs handle when we truncate inodes. In delayed_ref_async_start(), there is such codes: trans = btrfs_join_transaction(async->root); if (trans->transid > async->transid) goto end; ret = btrfs_run_delayed_refs(trans, async->root, async->count); From this codes, we can see that this just influence whether can we handle delayed refs or the number of delayed refs to handle, this may impact performance, but will not result in missing delayed refs, all delayed refs will be handled in btrfs_commit_transaction(). Signed-off-by: Wang Xiaoguang Reviewed-by: Holger Hoffstätte Signed-off-by: David Sterba --- fs/btrfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 6677674d0505..1f980efbb2e8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4605,8 +4605,8 @@ delete: BUG_ON(ret); if (btrfs_should_throttle_delayed_refs(trans, root)) btrfs_async_run_delayed_refs(root, - trans->transid, - trans->delayed_ref_updates * 2, 0); + trans->delayed_ref_updates * 2, + trans->transid, 0); if (be_nice) { if (truncate_space_check(trans, root, extent_num_bytes)) { -- cgit v1.2.3-70-g09d2 From 9c894696f56f5d84fb5766af81bcda4a7cb9a842 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 12 Oct 2016 11:33:21 +0300 Subject: Btrfs: remove some no-op casts We cast 0 to a u8 but then because of type promotion, it's immediately cast to int back to int before we do a bitwise negate. The cast doesn't matter in this case, the code works as intended. It causes a static checker warning though so let's remove it. Signed-off-by: Dan Carpenter Reviewed-by: David Sterba Signed-off-by: David Sterba --- fs/btrfs/extent_io.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 66a755150056..8ed05d95584a 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -5569,7 +5569,7 @@ void le_bitmap_set(u8 *map, unsigned int start, int len) *p |= mask_to_set; len -= bits_to_set; bits_to_set = BITS_PER_BYTE; - mask_to_set = ~(u8)0; + mask_to_set = ~0; p++; } if (len) { @@ -5589,7 +5589,7 @@ void le_bitmap_clear(u8 *map, unsigned int start, int len) *p &= ~mask_to_clear; len -= bits_to_clear; bits_to_clear = BITS_PER_BYTE; - mask_to_clear = ~(u8)0; + mask_to_clear = ~0; p++; } if (len) { @@ -5679,7 +5679,7 @@ void extent_buffer_bitmap_set(struct extent_buffer *eb, unsigned long start, kaddr[offset] |= mask_to_set; len -= bits_to_set; bits_to_set = BITS_PER_BYTE; - mask_to_set = ~(u8)0; + mask_to_set = ~0; if (++offset >= PAGE_SIZE && len > 0) { offset = 0; page = eb->pages[++i]; @@ -5721,7 +5721,7 @@ void extent_buffer_bitmap_clear(struct extent_buffer *eb, unsigned long start, kaddr[offset] &= ~mask_to_clear; len -= bits_to_clear; bits_to_clear = BITS_PER_BYTE; - mask_to_clear = ~(u8)0; + mask_to_clear = ~0; if (++offset >= PAGE_SIZE && len > 0) { offset = 0; page = eb->pages[++i]; -- cgit v1.2.3-70-g09d2 From 9d1032cc49a8a1065e79ee323de66bcb4fdbd535 Mon Sep 17 00:00:00 2001 From: Wang Xiaoguang Date: Mon, 20 Jun 2016 09:18:52 +0800 Subject: btrfs: fix WARNING in btrfs_select_ref_head() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This issue was found when testing in-band dedupe enospc behaviour, sometimes run_one_delayed_ref() may fail for enospc reason, then __btrfs_run_delayed_refs()will return, but forget to add num_heads_read back, which will trigger "WARN_ON(delayed_refs->num_heads_ready == 0)" in btrfs_select_ref_head(). Signed-off-by: Wang Xiaoguang Signed-off-by: David Sterba --- fs/btrfs/extent-tree.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'fs') diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 210c94ac8818..4607af38c72e 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -2647,7 +2647,10 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans, btrfs_free_delayed_extent_op(extent_op); if (ret) { + spin_lock(&delayed_refs->lock); locked_ref->processing = 0; + delayed_refs->num_heads_ready++; + spin_unlock(&delayed_refs->lock); btrfs_delayed_ref_unlock(locked_ref); btrfs_put_delayed_ref(ref); btrfs_debug(fs_info, "run_one_delayed_ref returned %d", -- cgit v1.2.3-70-g09d2 From 68a564006a21ae59c7c51b4359e2e8efa42ae4af Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 18 Oct 2016 00:05:35 +0200 Subject: NFSv4.1: work around -Wmaybe-uninitialized warning A bugfix introduced a harmless gcc warning in nfs4_slot_seqid_in_use if we enable -Wmaybe-uninitialized again: fs/nfs/nfs4session.c:203:54: error: 'cur_seq' may be used uninitialized in this function [-Werror=maybe-uninitialized] gcc is not smart enough to conclude that the IS_ERR/PTR_ERR pair results in a nonzero return value here. Using PTR_ERR_OR_ZERO() instead makes this clear to the compiler. The warning originally did not appear in v4.8 as it was globally disabled, but the bugfix that introduced the warning got backported to stable kernels which again enable it, and this is now the only warning in the v4.7 builds. Fixes: e09c978aae5b ("NFSv4.1: Fix Oopsable condition in server callback races") Signed-off-by: Arnd Bergmann Cc: Trond Myklebust Signed-off-by: Anna Schumaker --- fs/nfs/nfs4session.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/nfs/nfs4session.c b/fs/nfs/nfs4session.c index b62973045a3e..150c5a1879bf 100644 --- a/fs/nfs/nfs4session.c +++ b/fs/nfs/nfs4session.c @@ -178,12 +178,14 @@ static int nfs4_slot_get_seqid(struct nfs4_slot_table *tbl, u32 slotid, __must_hold(&tbl->slot_tbl_lock) { struct nfs4_slot *slot; + int ret; slot = nfs4_lookup_slot(tbl, slotid); - if (IS_ERR(slot)) - return PTR_ERR(slot); - *seq_nr = slot->seq_nr; - return 0; + ret = PTR_ERR_OR_ZERO(slot); + if (!ret) + *seq_nr = slot->seq_nr; + + return ret; } /* -- cgit v1.2.3-70-g09d2 From 0cc11a61b80a1ab1d12f1597b27b8b45ef8bac4a Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 20 Oct 2016 09:34:31 -0400 Subject: nfsd: move blocked lock handling under a dedicated spinlock Bruce was hitting some lockdep warnings in testing, showing that we could hit a deadlock with the new CB_NOTIFY_LOCK handling, involving a rather complex situation involving four different spinlocks. The crux of the matter is that we end up taking the nn->client_lock in the lm_notify handler. The simplest fix is to just declare a new per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures. Signed-off-by: Jeff Layton Signed-off-by: J. Bruce Fields --- fs/nfsd/netns.h | 5 +++++ fs/nfsd/nfs4state.c | 28 +++++++++++++++------------- 2 files changed, 20 insertions(+), 13 deletions(-) (limited to 'fs') diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index b10d557f9c9e..ee36efd5aece 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -84,6 +84,8 @@ struct nfsd_net { struct list_head client_lru; struct list_head close_lru; struct list_head del_recall_lru; + + /* protected by blocked_locks_lock */ struct list_head blocked_locks_lru; struct delayed_work laundromat_work; @@ -91,6 +93,9 @@ struct nfsd_net { /* client_lock protects the client lru list and session hash table */ spinlock_t client_lock; + /* protects blocked_locks_lru */ + spinlock_t blocked_locks_lock; + struct file *rec_file; bool in_grace; const struct nfsd4_client_tracking_ops *client_tracking_ops; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 9752beb78659..95474196a17e 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -217,7 +217,7 @@ find_blocked_lock(struct nfs4_lockowner *lo, struct knfsd_fh *fh, { struct nfsd4_blocked_lock *cur, *found = NULL; - spin_lock(&nn->client_lock); + spin_lock(&nn->blocked_locks_lock); list_for_each_entry(cur, &lo->lo_blocked, nbl_list) { if (fh_match(fh, &cur->nbl_fh)) { list_del_init(&cur->nbl_list); @@ -226,7 +226,7 @@ find_blocked_lock(struct nfs4_lockowner *lo, struct knfsd_fh *fh, break; } } - spin_unlock(&nn->client_lock); + spin_unlock(&nn->blocked_locks_lock); if (found) posix_unblock_lock(&found->nbl_lock); return found; @@ -4665,7 +4665,7 @@ nfs4_laundromat(struct nfsd_net *nn) * indefinitely once the lock does become free. */ BUG_ON(!list_empty(&reaplist)); - spin_lock(&nn->client_lock); + spin_lock(&nn->blocked_locks_lock); while (!list_empty(&nn->blocked_locks_lru)) { nbl = list_first_entry(&nn->blocked_locks_lru, struct nfsd4_blocked_lock, nbl_lru); @@ -4678,7 +4678,7 @@ nfs4_laundromat(struct nfsd_net *nn) list_move(&nbl->nbl_lru, &reaplist); list_del_init(&nbl->nbl_list); } - spin_unlock(&nn->client_lock); + spin_unlock(&nn->blocked_locks_lock); while (!list_empty(&reaplist)) { nbl = list_first_entry(&nn->blocked_locks_lru, @@ -5439,13 +5439,13 @@ nfsd4_lm_notify(struct file_lock *fl) bool queue = false; /* An empty list means that something else is going to be using it */ - spin_lock(&nn->client_lock); + spin_lock(&nn->blocked_locks_lock); if (!list_empty(&nbl->nbl_list)) { list_del_init(&nbl->nbl_list); list_del_init(&nbl->nbl_lru); queue = true; } - spin_unlock(&nn->client_lock); + spin_unlock(&nn->blocked_locks_lock); if (queue) nfsd4_run_cb(&nbl->nbl_cb); @@ -5868,10 +5868,10 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate, if (fl_flags & FL_SLEEP) { nbl->nbl_time = jiffies; - spin_lock(&nn->client_lock); + spin_lock(&nn->blocked_locks_lock); list_add_tail(&nbl->nbl_list, &lock_sop->lo_blocked); list_add_tail(&nbl->nbl_lru, &nn->blocked_locks_lru); - spin_unlock(&nn->client_lock); + spin_unlock(&nn->blocked_locks_lock); } err = vfs_lock_file(filp, F_SETLK, file_lock, conflock); @@ -5900,10 +5900,10 @@ out: if (nbl) { /* dequeue it if we queued it before */ if (fl_flags & FL_SLEEP) { - spin_lock(&nn->client_lock); + spin_lock(&nn->blocked_locks_lock); list_del_init(&nbl->nbl_list); list_del_init(&nbl->nbl_lru); - spin_unlock(&nn->client_lock); + spin_unlock(&nn->blocked_locks_lock); } free_blocked_lock(nbl); } @@ -6943,9 +6943,11 @@ static int nfs4_state_create_net(struct net *net) INIT_LIST_HEAD(&nn->client_lru); INIT_LIST_HEAD(&nn->close_lru); INIT_LIST_HEAD(&nn->del_recall_lru); - INIT_LIST_HEAD(&nn->blocked_locks_lru); spin_lock_init(&nn->client_lock); + spin_lock_init(&nn->blocked_locks_lock); + INIT_LIST_HEAD(&nn->blocked_locks_lru); + INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main); get_net(net); @@ -7063,14 +7065,14 @@ nfs4_state_shutdown_net(struct net *net) } BUG_ON(!list_empty(&reaplist)); - spin_lock(&nn->client_lock); + spin_lock(&nn->blocked_locks_lock); while (!list_empty(&nn->blocked_locks_lru)) { nbl = list_first_entry(&nn->blocked_locks_lru, struct nfsd4_blocked_lock, nbl_lru); list_move(&nbl->nbl_lru, &reaplist); list_del_init(&nbl->nbl_list); } - spin_unlock(&nn->client_lock); + spin_unlock(&nn->blocked_locks_lock); while (!list_empty(&reaplist)) { nbl = list_first_entry(&nn->blocked_locks_lru, -- cgit v1.2.3-70-g09d2 From 0b944d3a4bba6b25f43aed530f4fa85c04d162a6 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Sun, 30 Oct 2016 11:42:01 -0500 Subject: aio: hold an extra file reference over AIO read/write operations Otherwise we might dereference an already freed file and/or inode when aio_complete is called before we return from the read_iter or write_iter method. Signed-off-by: Christoph Hellwig Signed-off-by: Al Viro --- fs/aio.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/aio.c b/fs/aio.c index 1157e13a36d6..0aa71d338c04 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1460,6 +1460,7 @@ rw_common: return ret; } + get_file(file); if (rw == WRITE) file_start_write(file); @@ -1467,6 +1468,7 @@ rw_common: if (rw == WRITE) file_end_write(file); + fput(file); kfree(iovec); break; -- cgit v1.2.3-70-g09d2 From 723c038475b78edc9327eb952f95f9881cc9d79d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Sun, 30 Oct 2016 11:42:02 -0500 Subject: fs: remove the never implemented aio_fsync file operation Signed-off-by: Christoph Hellwig Signed-off-by: Al Viro --- Documentation/filesystems/Locking | 1 - Documentation/filesystems/vfs.txt | 1 - fs/aio.c | 14 -------------- fs/ntfs/dir.c | 2 -- include/linux/fs.h | 1 - 5 files changed, 19 deletions(-) (limited to 'fs') diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 14cdc101d165..1b5f15653b1b 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -447,7 +447,6 @@ prototypes: int (*flush) (struct file *); int (*release) (struct inode *, struct file *); int (*fsync) (struct file *, loff_t start, loff_t end, int datasync); - int (*aio_fsync) (struct kiocb *, int datasync); int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index d619c8d71966..b5039a00caaf 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -828,7 +828,6 @@ struct file_operations { int (*flush) (struct file *, fl_owner_t id); int (*release) (struct inode *, struct file *); int (*fsync) (struct file *, loff_t, loff_t, int datasync); - int (*aio_fsync) (struct kiocb *, int datasync); int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); diff --git a/fs/aio.c b/fs/aio.c index 0aa71d338c04..2a6030af6ba5 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1472,20 +1472,6 @@ rw_common: kfree(iovec); break; - case IOCB_CMD_FDSYNC: - if (!file->f_op->aio_fsync) - return -EINVAL; - - ret = file->f_op->aio_fsync(req, 1); - break; - - case IOCB_CMD_FSYNC: - if (!file->f_op->aio_fsync) - return -EINVAL; - - ret = file->f_op->aio_fsync(req, 0); - break; - default: pr_debug("EINVAL: no operation provided\n"); return -EINVAL; diff --git a/fs/ntfs/dir.c b/fs/ntfs/dir.c index a18613579001..0ee19ecc982d 100644 --- a/fs/ntfs/dir.c +++ b/fs/ntfs/dir.c @@ -1544,8 +1544,6 @@ const struct file_operations ntfs_dir_ops = { .iterate = ntfs_readdir, /* Read directory contents. */ #ifdef NTFS_RW .fsync = ntfs_dir_fsync, /* Sync a directory to disk. */ - /*.aio_fsync = ,*/ /* Sync all outstanding async - i/o operations on a kiocb. */ #endif /* NTFS_RW */ /*.ioctl = ,*/ /* Perform function on the mounted filesystem. */ diff --git a/include/linux/fs.h b/include/linux/fs.h index 16d2b6e874d6..ff7bcd9e8398 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1709,7 +1709,6 @@ struct file_operations { int (*flush) (struct file *, fl_owner_t id); int (*release) (struct inode *, struct file *); int (*fsync) (struct file *, loff_t, loff_t, int datasync); - int (*aio_fsync) (struct kiocb *, int datasync); int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); -- cgit v1.2.3-70-g09d2 From 89319d31d2d097da8e27fb0e0ae9d532f4f16827 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Sun, 30 Oct 2016 11:42:03 -0500 Subject: fs: remove aio_run_iocb Pass the ABI iocb structure to aio_setup_rw and let it handle the non-vectored I/O case as well. With that and a new helper for the AIO return value handling we can now define new aio_read and aio_write helpers that implement reads and writes in a self-contained way without duplicating too much code. Signed-off-by: Christoph Hellwig Signed-off-by: Al Viro --- fs/aio.c | 182 +++++++++++++++++++++++++++++++++------------------------------ 1 file changed, 94 insertions(+), 88 deletions(-) (limited to 'fs') diff --git a/fs/aio.c b/fs/aio.c index 2a6030af6ba5..c19755187ca5 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1392,110 +1392,100 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx) return -EINVAL; } -typedef ssize_t (rw_iter_op)(struct kiocb *, struct iov_iter *); - -static int aio_setup_vectored_rw(int rw, char __user *buf, size_t len, - struct iovec **iovec, - bool compat, - struct iov_iter *iter) +static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec, + bool vectored, bool compat, struct iov_iter *iter) { + void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf; + size_t len = iocb->aio_nbytes; + + if (!vectored) { + ssize_t ret = import_single_range(rw, buf, len, *iovec, iter); + *iovec = NULL; + return ret; + } #ifdef CONFIG_COMPAT if (compat) - return compat_import_iovec(rw, - (struct compat_iovec __user *)buf, - len, UIO_FASTIOV, iovec, iter); + return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec, + iter); #endif - return import_iovec(rw, (struct iovec __user *)buf, - len, UIO_FASTIOV, iovec, iter); + return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter); } -/* - * aio_run_iocb: - * Performs the initial checks and io submission. - */ -static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode, - char __user *buf, size_t len, bool compat) +static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret) +{ + switch (ret) { + case -EIOCBQUEUED: + return ret; + case -ERESTARTSYS: + case -ERESTARTNOINTR: + case -ERESTARTNOHAND: + case -ERESTART_RESTARTBLOCK: + /* + * There's no easy way to restart the syscall since other AIO's + * may be already running. Just fail this IO with EINTR. + */ + ret = -EINTR; + /*FALLTHRU*/ + default: + aio_complete(req, ret, 0); + return 0; + } +} + +static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored, + bool compat) { struct file *file = req->ki_filp; - ssize_t ret; - int rw; - fmode_t mode; - rw_iter_op *iter_op; struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; struct iov_iter iter; + ssize_t ret; - switch (opcode) { - case IOCB_CMD_PREAD: - case IOCB_CMD_PREADV: - mode = FMODE_READ; - rw = READ; - iter_op = file->f_op->read_iter; - goto rw_common; - - case IOCB_CMD_PWRITE: - case IOCB_CMD_PWRITEV: - mode = FMODE_WRITE; - rw = WRITE; - iter_op = file->f_op->write_iter; - goto rw_common; -rw_common: - if (unlikely(!(file->f_mode & mode))) - return -EBADF; - - if (!iter_op) - return -EINVAL; - - if (opcode == IOCB_CMD_PREADV || opcode == IOCB_CMD_PWRITEV) - ret = aio_setup_vectored_rw(rw, buf, len, - &iovec, compat, &iter); - else { - ret = import_single_range(rw, buf, len, iovec, &iter); - iovec = NULL; - } - if (!ret) - ret = rw_verify_area(rw, file, &req->ki_pos, - iov_iter_count(&iter)); - if (ret < 0) { - kfree(iovec); - return ret; - } - - get_file(file); - if (rw == WRITE) - file_start_write(file); + if (unlikely(!(file->f_mode & FMODE_READ))) + return -EBADF; + if (unlikely(!file->f_op->read_iter)) + return -EINVAL; - ret = iter_op(req, &iter); + ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter); + if (ret) + return ret; + ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter)); + if (!ret) + ret = aio_ret(req, file->f_op->read_iter(req, &iter)); + kfree(iovec); + return ret; +} - if (rw == WRITE) - file_end_write(file); - fput(file); - kfree(iovec); - break; +static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored, + bool compat) +{ + struct file *file = req->ki_filp; + struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs; + struct iov_iter iter; + ssize_t ret; - default: - pr_debug("EINVAL: no operation provided\n"); + if (unlikely(!(file->f_mode & FMODE_WRITE))) + return -EBADF; + if (unlikely(!file->f_op->write_iter)) return -EINVAL; - } - if (ret != -EIOCBQUEUED) { - /* - * There's no easy way to restart the syscall since other AIO's - * may be already running. Just fail this IO with EINTR. - */ - if (unlikely(ret == -ERESTARTSYS || ret == -ERESTARTNOINTR || - ret == -ERESTARTNOHAND || - ret == -ERESTART_RESTARTBLOCK)) - ret = -EINTR; - aio_complete(req, ret, 0); + ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter); + if (ret) + return ret; + ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter)); + if (!ret) { + file_start_write(file); + ret = aio_ret(req, file->f_op->write_iter(req, &iter)); + file_end_write(file); } - - return 0; + kfree(iovec); + return ret; } static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, struct iocb *iocb, bool compat) { struct aio_kiocb *req; + struct file *file; ssize_t ret; /* enforce forwards compatibility on users */ @@ -1518,7 +1508,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, if (unlikely(!req)) return -EAGAIN; - req->common.ki_filp = fget(iocb->aio_fildes); + req->common.ki_filp = file = fget(iocb->aio_fildes); if (unlikely(!req->common.ki_filp)) { ret = -EBADF; goto out_put_req; @@ -1553,13 +1543,29 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb, req->ki_user_iocb = user_iocb; req->ki_user_data = iocb->aio_data; - ret = aio_run_iocb(&req->common, iocb->aio_lio_opcode, - (char __user *)(unsigned long)iocb->aio_buf, - iocb->aio_nbytes, - compat); - if (ret) - goto out_put_req; + get_file(file); + switch (iocb->aio_lio_opcode) { + case IOCB_CMD_PREAD: + ret = aio_read(&req->common, iocb, false, compat); + break; + case IOCB_CMD_PWRITE: + ret = aio_write(&req->common, iocb, false, compat); + break; + case IOCB_CMD_PREADV: + ret = aio_read(&req->common, iocb, true, compat); + break; + case IOCB_CMD_PWRITEV: + ret = aio_write(&req->common, iocb, true, compat); + break; + default: + pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode); + ret = -EINVAL; + break; + } + fput(file); + if (ret && ret != -EIOCBQUEUED) + goto out_put_req; return 0; out_put_req: put_reqs_available(ctx, 1); -- cgit v1.2.3-70-g09d2 From 70fe2f48152e60664809e2fed76bbb50c9fa2aa3 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Sun, 30 Oct 2016 11:42:04 -0500 Subject: aio: fix freeze protection of aio writes Currently we dropped freeze protection of aio writes just after IO was submitted. Thus aio write could be in flight while the filesystem was frozen and that could result in unexpected situation like aio completion wanting to convert extent type on frozen filesystem. Testcase from Dmitry triggering this is like: for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done & fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \ --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite Fix the problem by dropping freeze protection only once IO is completed in aio_complete(). Reported-by: Dmitry Monakhov Signed-off-by: Jan Kara [hch: forward ported on top of various VFS and aio changes] Signed-off-by: Christoph Hellwig Signed-off-by: Al Viro --- fs/aio.c | 19 ++++++++++++++++++- include/linux/fs.h | 1 + 2 files changed, 19 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/aio.c b/fs/aio.c index c19755187ca5..428484f2f841 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1078,6 +1078,17 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2) unsigned tail, pos, head; unsigned long flags; + if (kiocb->ki_flags & IOCB_WRITE) { + struct file *file = kiocb->ki_filp; + + /* + * Tell lockdep we inherited freeze protection from submission + * thread. + */ + __sb_writers_acquired(file_inode(file)->i_sb, SB_FREEZE_WRITE); + file_end_write(file); + } + /* * Special case handling for sync iocbs: * - events go directly into the iocb for fast handling @@ -1473,9 +1484,15 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored, return ret; ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) { + req->ki_flags |= IOCB_WRITE; file_start_write(file); ret = aio_ret(req, file->f_op->write_iter(req, &iter)); - file_end_write(file); + /* + * We release freeze protection in aio_complete(). Fool lockdep + * by telling it the lock got released so that it doesn't + * complain about held lock when we return to userspace. + */ + __sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE); } kfree(iovec); return ret; diff --git a/include/linux/fs.h b/include/linux/fs.h index ff7bcd9e8398..dc0478c07b2a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -321,6 +321,7 @@ struct writeback_control; #define IOCB_HIPRI (1 << 3) #define IOCB_DSYNC (1 << 4) #define IOCB_SYNC (1 << 5) +#define IOCB_WRITE (1 << 6) struct kiocb { struct file *ki_filp; -- cgit v1.2.3-70-g09d2 From fd3220d37b1f6f0cab6142d98b0e6c4082e63299 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 31 Oct 2016 14:42:14 +0100 Subject: ovl: update S_ISGID when setting posix ACLs This change fixes xfstest generic/375, which failed to clear the setgid bit in the following test case on overlayfs: touch $testfile chown 100:100 $testfile chmod 2755 $testfile _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile Reported-by: Amir Goldstein Signed-off-by: Miklos Szeredi Tested-by: Amir Goldstein Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting") Cc: # v4.8 --- fs/overlayfs/super.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'fs') diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index bcf3965be819..edd46a0e951d 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -1037,6 +1037,21 @@ ovl_posix_acl_xattr_set(const struct xattr_handler *handler, posix_acl_release(acl); + /* + * Check if sgid bit needs to be cleared (actual setacl operation will + * be done with mounter's capabilities and so that won't do it for us). + */ + if (unlikely(inode->i_mode & S_ISGID) && + handler->flags == ACL_TYPE_ACCESS && + !in_group_p(inode->i_gid) && + !capable_wrt_inode_uidgid(inode, CAP_FSETID)) { + struct iattr iattr = { .ia_valid = ATTR_KILL_SGID }; + + err = ovl_setattr(dentry, &iattr); + if (err) + return err; + } + err = ovl_xattr_set(dentry, handler->name, value, size, flags); if (!err) ovl_copyattr(ovl_inode_real(inode, NULL), inode); -- cgit v1.2.3-70-g09d2 From b93d4a0eb308d4400b84c8b24c1b80e09a9497d0 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 31 Oct 2016 14:42:14 +0100 Subject: ovl: fix get_acl() on tmpfs tmpfs doesn't have ->get_acl() because it only uses cached acls. This fixes the acl tests in pjdfstest when tmpfs is used as the upper layer of the overlay. Reported-by: Amir Goldstein Signed-off-by: Miklos Szeredi Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes") Cc: # v4.8 --- fs/overlayfs/inode.c | 3 --- 1 file changed, 3 deletions(-) (limited to 'fs') diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index c58f01babf30..7fb53d055537 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -270,9 +270,6 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type) if (!IS_ENABLED(CONFIG_FS_POSIX_ACL) || !IS_POSIXACL(realinode)) return NULL; - if (!realinode->i_op->get_acl) - return NULL; - old_cred = ovl_override_creds(inode->i_sb); acl = get_acl(realinode, type); revert_creds(old_cred); -- cgit v1.2.3-70-g09d2 From 641089c1549d8d3df0b047b5de7e9a111362cdce Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 31 Oct 2016 14:42:14 +0100 Subject: ovl: fsync after copy-up Make sure the copied up file hits the disk before renaming to the final destination. If this is not done then the copy-up may corrupt the data in the file in case of a crash. Signed-off-by: Miklos Szeredi Cc: --- fs/overlayfs/copy_up.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c index aeb60f791418..36795eed40b0 100644 --- a/fs/overlayfs/copy_up.c +++ b/fs/overlayfs/copy_up.c @@ -178,6 +178,8 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len) len -= bytes; } + if (!error) + error = vfs_fsync(new_file, 0); fput(new_file); out_fput: fput(old_file); -- cgit v1.2.3-70-g09d2 From f46c445b79906a9da55c13e0a6f6b6a006b892fe Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Sat, 29 Oct 2016 18:19:03 -0400 Subject: nfsd: Fix general protection fault in release_lock_stateid() When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example), I get this crash on the server: Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8 Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000 Oct 28 22:04:30 klimt kernel: RIP: 0010:[] [] release_lock_stateid+0x1f/0x60 [nfsd] Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0 EFLAGS: 00010246 Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000 Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020 Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000 Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548 Oct 28 22:04:30 klimt kernel: FS: 0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000 Oct 28 22:04:30 klimt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0 Oct 28 22:04:30 klimt kernel: Stack: Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20 Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0 Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870 Oct 28 22:04:30 klimt kernel: Call Trace: Oct 28 22:04:30 klimt kernel: [] nfsd4_free_stateid+0xfd/0x1b0 [nfsd] Oct 28 22:04:30 klimt kernel: [] nfsd4_proc_compound+0x40d/0x690 [nfsd] Oct 28 22:04:30 klimt kernel: [] nfsd_dispatch+0xd4/0x1d0 [nfsd] Oct 28 22:04:30 klimt kernel: [] svc_process_common+0x3d9/0x700 [sunrpc] Oct 28 22:04:30 klimt kernel: [] svc_process+0xf4/0x330 [sunrpc] Oct 28 22:04:30 klimt kernel: [] nfsd+0xfa/0x160 [nfsd] Oct 28 22:04:30 klimt kernel: [] ? nfsd_destroy+0x170/0x170 [nfsd] Oct 28 22:04:30 klimt kernel: [] kthread+0x10b/0x120 Oct 28 22:04:30 klimt kernel: [] ? kthread_stop+0x280/0x280 Oct 28 22:04:30 klimt kernel: [] ret_from_fork+0x2a/0x40 Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8 Oct 28 22:04:30 klimt kernel: RIP [] release_lock_stateid+0x1f/0x60 [nfsd] Oct 28 22:04:30 klimt kernel: RSP Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]--- Jeff Layton says: > Hm...now that I look though, this is a little suspicious: > > struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner); > > I wonder if it's possible for the openstateid to have already been > destroyed at this point. > > We might be better off doing something like this to get the client pointer: > > stp->st_stid.sc_client; > > ...which should be more direct and less dependent on other stateids > staying valid. With the suggested change, I am no longer able to reproduce the above oops. v2: Fix unhash_lock_stateid() as well Fix-suggested-by: Jeff Layton Fixes: 42691398be08 ('nfsd: Fix race between FREE_STATEID and LOCK') Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields --- fs/nfsd/nfs4state.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 95474196a17e..4b4beaaa4eaa 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -1227,9 +1227,7 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp, static bool unhash_lock_stateid(struct nfs4_ol_stateid *stp) { - struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner); - - lockdep_assert_held(&oo->oo_owner.so_client->cl_lock); + lockdep_assert_held(&stp->st_stid.sc_client->cl_lock); list_del_init(&stp->st_locks); nfs4_unhash_stid(&stp->st_stid); @@ -1238,12 +1236,12 @@ static bool unhash_lock_stateid(struct nfs4_ol_stateid *stp) static void release_lock_stateid(struct nfs4_ol_stateid *stp) { - struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner); + struct nfs4_client *clp = stp->st_stid.sc_client; bool unhashed; - spin_lock(&oo->oo_owner.so_client->cl_lock); + spin_lock(&clp->cl_lock); unhashed = unhash_lock_stateid(stp); - spin_unlock(&oo->oo_owner.so_client->cl_lock); + spin_unlock(&clp->cl_lock); if (unhashed) nfs4_put_stid(&stp->st_stid); } -- cgit v1.2.3-70-g09d2 From dc0336214eb07ee9de2a41dd4c81c744ffa419ac Mon Sep 17 00:00:00 2001 From: Mike Marshall Date: Fri, 4 Nov 2016 16:32:25 -0400 Subject: orangefs: clean up debugfs We recently refactored the Orangefs debugfs code. The refactor seemed to trigger dan.carpenter@oracle.com's static tester to find a possible double-free in the code. While designing the fix we saw a condition under which the buffer being freed could also be overflowed. We also realized how to rebuild the related debugfs file's "contents" (a string) without deleting and re-creating the file. This fix should eliminate the possible double-free, the potential overflow and improve code readability. Signed-off-by: Mike Marshall Signed-off-by: Martin Brandenburg --- fs/orangefs/orangefs-debugfs.c | 147 ++++++++++++++++++----------------------- fs/orangefs/orangefs-mod.c | 6 +- 2 files changed, 68 insertions(+), 85 deletions(-) (limited to 'fs') diff --git a/fs/orangefs/orangefs-debugfs.c b/fs/orangefs/orangefs-debugfs.c index eb09aa026723..d484068ca716 100644 --- a/fs/orangefs/orangefs-debugfs.c +++ b/fs/orangefs/orangefs-debugfs.c @@ -141,6 +141,9 @@ static struct client_debug_mask client_debug_mask; */ static DEFINE_MUTEX(orangefs_debug_lock); +/* Used to protect data in ORANGEFS_KMOD_DEBUG_HELP_FILE */ +static DEFINE_MUTEX(orangefs_help_file_lock); + /* * initialize kmod debug operations, create orangefs debugfs dir and * ORANGEFS_KMOD_DEBUG_HELP_FILE. @@ -289,6 +292,8 @@ static void *help_start(struct seq_file *m, loff_t *pos) gossip_debug(GOSSIP_DEBUGFS_DEBUG, "help_start: start\n"); + mutex_lock(&orangefs_help_file_lock); + if (*pos == 0) payload = m->private; @@ -305,6 +310,7 @@ static void *help_next(struct seq_file *m, void *v, loff_t *pos) static void help_stop(struct seq_file *m, void *p) { gossip_debug(GOSSIP_DEBUGFS_DEBUG, "help_stop: start\n"); + mutex_unlock(&orangefs_help_file_lock); } static int help_show(struct seq_file *m, void *v) @@ -610,32 +616,54 @@ out: * /sys/kernel/debug/orangefs/debug-help can be catted to * see all the available kernel and client debug keywords. * - * When the kernel boots, we have no idea what keywords the + * When orangefs.ko initializes, we have no idea what keywords the * client supports, nor their associated masks. * - * We pass through this function once at boot and stamp a + * We pass through this function once at module-load and stamp a * boilerplate "we don't know" message for the client in the * debug-help file. We pass through here again when the client * starts and then we can fill out the debug-help file fully. * * The client might be restarted any number of times between - * reboots, we only build the debug-help file the first time. + * module reloads, we only build the debug-help file the first time. */ int orangefs_prepare_debugfs_help_string(int at_boot) { - int rc = -EINVAL; - int i; - int byte_count = 0; char *client_title = "Client Debug Keywords:\n"; char *kernel_title = "Kernel Debug Keywords:\n"; + size_t string_size = DEBUG_HELP_STRING_SIZE; + size_t result_size; + size_t i; + char *new; + int rc = -EINVAL; gossip_debug(GOSSIP_UTILS_DEBUG, "%s: start\n", __func__); - if (at_boot) { - byte_count += strlen(HELP_STRING_UNINITIALIZED); + if (at_boot) client_title = HELP_STRING_UNINITIALIZED; - } else { - /* + + /* build a new debug_help_string. */ + new = kzalloc(DEBUG_HELP_STRING_SIZE, GFP_KERNEL); + if (!new) { + rc = -ENOMEM; + goto out; + } + + /* + * strlcat(dst, src, size) will append at most + * "size - strlen(dst) - 1" bytes of src onto dst, + * null terminating the result, and return the total + * length of the string it tried to create. + * + * We'll just plow through here building our new debug + * help string and let strlcat take care of assuring that + * dst doesn't overflow. + */ + strlcat(new, client_title, string_size); + + if (!at_boot) { + + /* * fill the client keyword/mask array and remember * how many elements there were. */ @@ -644,64 +672,40 @@ int orangefs_prepare_debugfs_help_string(int at_boot) if (cdm_element_count <= 0) goto out; - /* Count the bytes destined for debug_help_string. */ - byte_count += strlen(client_title); - for (i = 0; i < cdm_element_count; i++) { - byte_count += strlen(cdm_array[i].keyword + 2); - if (byte_count >= DEBUG_HELP_STRING_SIZE) { - pr_info("%s: overflow 1!\n", __func__); - goto out; - } + strlcat(new, "\t", string_size); + strlcat(new, cdm_array[i].keyword, string_size); + strlcat(new, "\n", string_size); } - - gossip_debug(GOSSIP_UTILS_DEBUG, - "%s: cdm_element_count:%d:\n", - __func__, - cdm_element_count); } - byte_count += strlen(kernel_title); + strlcat(new, "\n", string_size); + strlcat(new, kernel_title, string_size); + for (i = 0; i < num_kmod_keyword_mask_map; i++) { - byte_count += - strlen(s_kmod_keyword_mask_map[i].keyword + 2); - if (byte_count >= DEBUG_HELP_STRING_SIZE) { - pr_info("%s: overflow 2!\n", __func__); - goto out; - } + strlcat(new, "\t", string_size); + strlcat(new, s_kmod_keyword_mask_map[i].keyword, string_size); + result_size = strlcat(new, "\n", string_size); } - /* build debug_help_string. */ - debug_help_string = kzalloc(DEBUG_HELP_STRING_SIZE, GFP_KERNEL); - if (!debug_help_string) { - rc = -ENOMEM; + /* See if we tried to put too many bytes into "new"... */ + if (result_size >= string_size) { + kfree(new); goto out; } - strcat(debug_help_string, client_title); - - if (!at_boot) { - for (i = 0; i < cdm_element_count; i++) { - strcat(debug_help_string, "\t"); - strcat(debug_help_string, cdm_array[i].keyword); - strcat(debug_help_string, "\n"); - } - } - - strcat(debug_help_string, "\n"); - strcat(debug_help_string, kernel_title); - - for (i = 0; i < num_kmod_keyword_mask_map; i++) { - strcat(debug_help_string, "\t"); - strcat(debug_help_string, s_kmod_keyword_mask_map[i].keyword); - strcat(debug_help_string, "\n"); + if (at_boot) { + debug_help_string = new; + } else { + mutex_lock(&orangefs_help_file_lock); + memset(debug_help_string, 0, DEBUG_HELP_STRING_SIZE); + strlcat(debug_help_string, new, string_size); + mutex_unlock(&orangefs_help_file_lock); } rc = 0; -out: - - return rc; +out: return rc; } @@ -959,8 +963,12 @@ int orangefs_debugfs_new_client_string(void __user *arg) ret = copy_from_user(&client_debug_array_string, (void __user *)arg, ORANGEFS_MAX_DEBUG_STRING_LEN); - if (ret != 0) + + if (ret != 0) { + pr_info("%s: CLIENT_STRING: copy_from_user failed\n", + __func__); return -EIO; + } /* * The real client-core makes an effort to ensure @@ -975,45 +983,18 @@ int orangefs_debugfs_new_client_string(void __user *arg) client_debug_array_string[ORANGEFS_MAX_DEBUG_STRING_LEN - 1] = '\0'; - if (ret != 0) { - pr_info("%s: CLIENT_STRING: copy_from_user failed\n", - __func__); - return -EIO; - } - pr_info("%s: client debug array string has been received.\n", __func__); if (!help_string_initialized) { - /* Free the "we don't know yet" default string... */ - kfree(debug_help_string); - - /* build a proper debug help string */ + /* Build a proper debug help string. */ if (orangefs_prepare_debugfs_help_string(0)) { gossip_err("%s: no debug help string \n", __func__); return -EIO; } - /* Replace the boilerplate boot-time debug-help file. */ - debugfs_remove(help_file_dentry); - - help_file_dentry = - debugfs_create_file( - ORANGEFS_KMOD_DEBUG_HELP_FILE, - 0444, - debug_dir, - debug_help_string, - &debug_help_fops); - - if (!help_file_dentry) { - gossip_err("%s: debugfs_create_file failed for" - " :%s:!\n", - __func__, - ORANGEFS_KMOD_DEBUG_HELP_FILE); - return -EIO; - } } debug_mask_to_string(&client_debug_mask, 1); diff --git a/fs/orangefs/orangefs-mod.c b/fs/orangefs/orangefs-mod.c index 2e5b03065f34..4113eb0495bf 100644 --- a/fs/orangefs/orangefs-mod.c +++ b/fs/orangefs/orangefs-mod.c @@ -124,7 +124,7 @@ static int __init orangefs_init(void) * unknown at boot time. * * orangefs_prepare_debugfs_help_string will be used again - * later to rebuild the debug-help file after the client starts + * later to rebuild the debug-help-string after the client starts * and passes along the needed info. The argument signifies * which time orangefs_prepare_debugfs_help_string is being * called. @@ -152,7 +152,9 @@ static int __init orangefs_init(void) ret = register_filesystem(&orangefs_fs_type); if (ret == 0) { - pr_info("orangefs: module version %s loaded\n", ORANGEFS_VERSION); + pr_info("%s: module version %s loaded\n", + __func__, + ORANGEFS_VERSION); ret = 0; goto out; } -- cgit v1.2.3-70-g09d2 From 8ef3295530ddc969ea9a3f307d94df97fcbc0629 Mon Sep 17 00:00:00 2001 From: Petr Vandrovec Date: Mon, 7 Nov 2016 12:11:29 -0800 Subject: NFS: Ignore connections that have cl_rpcclient uninitialized cl_rpcclient starts as ERR_PTR(-EINVAL), and connections like that are floating freely through the system. Most places check whether pointer is valid before dereferencing it, but newly added code in nfs_match_client does not. Which causes crashes when more than one NFS mount point is present. Signed-off-by: Petr Vandrovec Signed-off-by: Anna Schumaker --- fs/nfs/client.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/client.c b/fs/nfs/client.c index 7555ba889d1f..ebecfb8fba06 100644 --- a/fs/nfs/client.c +++ b/fs/nfs/client.c @@ -314,7 +314,8 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat /* Match the full socket address */ if (!rpc_cmp_addr_port(sap, clap)) /* Match all xprt_switch full socket addresses */ - if (!rpc_clnt_xprt_switch_has_addr(clp->cl_rpcclient, + if (IS_ERR(clp->cl_rpcclient) || + !rpc_clnt_xprt_switch_has_addr(clp->cl_rpcclient, sap)) continue; -- cgit v1.2.3-70-g09d2 From 192747166a468dd8fb5d47ad9d5048c138c1fc25 Mon Sep 17 00:00:00 2001 From: Anna Schumaker Date: Wed, 26 Oct 2016 15:54:31 -0400 Subject: NFS: Don't print a pNFS error if we aren't using pNFS We used to check for a valid layout type id before verifying pNFS flags as an indicator for if we are using pNFS. This changed in 3132e49ece with the introduction of multiple layout types, since now we are passing an array of ids instead of just one. Since then, users have been seeing a KERN_ERR printk show up whenever mounting NFS v4 without pNFS. This patch restores the original behavior of exiting set_pnfs_layoutdriver() early if we aren't using pNFS. Fixes 3132e49ece ("pnfs: track multiple layout types in fsinfo structure") Reviewed-by: Jeff Layton Signed-off-by: Anna Schumaker --- fs/nfs/pnfs.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c index 56b2d96f9103..259ef85f435a 100644 --- a/fs/nfs/pnfs.c +++ b/fs/nfs/pnfs.c @@ -146,6 +146,8 @@ set_pnfs_layoutdriver(struct nfs_server *server, const struct nfs_fh *mntfh, u32 id; int i; + if (fsinfo->nlayouttypes == 0) + goto out_no_driver; if (!(server->nfs_client->cl_exchange_flags & (EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS))) { printk(KERN_ERR "NFS: %s: cl_exchange_flags 0x%x\n", -- cgit v1.2.3-70-g09d2 From 0ac84b72c0ed96ace1d8973a06f0120a3b905177 Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Mon, 7 Nov 2016 10:48:16 -0700 Subject: fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix the following warn: fs/nfs/nfs4session.c: In function ‘nfs4_slot_seqid_in_use’: fs/nfs/nfs4session.c:203:54: warning: ‘cur_seq’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (nfs4_slot_get_seqid(tbl, slotid, &cur_seq) == 0 && ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ cur_seq == seq_nr && test_bit(slotid, tbl->used_slots)) ~~~~~~~~~~~~~~~~~ Signed-off-by: Shuah Khan Signed-off-by: Anna Schumaker --- fs/nfs/nfs4session.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/nfs/nfs4session.c b/fs/nfs/nfs4session.c index 150c5a1879bf..a61350f75c74 100644 --- a/fs/nfs/nfs4session.c +++ b/fs/nfs/nfs4session.c @@ -198,7 +198,7 @@ static int nfs4_slot_get_seqid(struct nfs4_slot_table *tbl, u32 slotid, static bool nfs4_slot_seqid_in_use(struct nfs4_slot_table *tbl, u32 slotid, u32 seq_nr) { - u32 cur_seq; + u32 cur_seq = 0; bool ret = false; spin_lock(&tbl->slot_tbl_lock); -- cgit v1.2.3-70-g09d2 From 8a8d56176635515d607c520580c73d61f300b409 Mon Sep 17 00:00:00 2001 From: "Yan, Zheng" Date: Wed, 9 Nov 2016 16:47:54 +0800 Subject: ceph: use default file splice read callback Splice read/write implementation changed recently. When using generic_file_splice_read(), iov_iter with type == ITER_PIPE is passed to filesystem's read_iter callback. But ceph_sync_read() can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter expects pages from page cache). Fixing ceph_sync_read() requires a big patch. So use default splice read callback for now. Signed-off-by: Yan, Zheng Signed-off-by: Ilya Dryomov --- fs/ceph/file.c | 1 - 1 file changed, 1 deletion(-) (limited to 'fs') diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 18630e800208..f995e3528a33 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1770,7 +1770,6 @@ const struct file_operations ceph_file_fops = { .fsync = ceph_fsync, .lock = ceph_lock, .flock = ceph_flock, - .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, .unlocked_ioctl = ceph_ioctl, .compat_ioctl = ceph_ioctl, -- cgit v1.2.3-70-g09d2 From e519e7774784f3fa7728657d780863805ed1c983 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 10 Nov 2016 18:32:13 -0500 Subject: splice: remove detritus from generic_file_splice_read() i_size check is a leftover from the horrors that used to play with the page cache in that function. With the switch to ->read_iter(), it's neither needed nor correct - for gfs2 it ends up being buggy, since i_size is not guaranteed to be correct until later (inside ->read_iter()). Spotted-by: Abhi Das Signed-off-by: Al Viro --- fs/splice.c | 5 ----- 1 file changed, 5 deletions(-) (limited to 'fs') diff --git a/fs/splice.c b/fs/splice.c index 153d4f3bd441..dcaf185a5731 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -299,13 +299,8 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos, { struct iov_iter to; struct kiocb kiocb; - loff_t isize; int idx, ret; - isize = i_size_read(in->f_mapping->host); - if (unlikely(*ppos >= isize)) - return 0; - iov_iter_pipe(&to, ITER_PIPE | READ, pipe, len); idx = to.idx; init_sync_kiocb(&kiocb, in); -- cgit v1.2.3-70-g09d2 From d006c71f8ad2663dd47f81bf96bf655eeed428e2 Mon Sep 17 00:00:00 2001 From: Junxiao Bi Date: Thu, 10 Nov 2016 10:46:29 -0800 Subject: ocfs2: fix not enough credit panic The following panic was caught when run ocfs2 disconfig single test (block size 512 and cluster size 8192). ocfs2_journal_dirty() return -ENOSPC, that means credits were used up. The total credit should include 3 times of "num_dx_leaves" from ocfs2_dx_dir_rebalance(), because 2 times will be consumed in ocfs2_dx_dir_transfer_leaf() and 1 time will be consumed in ocfs2_dx_dir_new_cluster() -> __ocfs2_dx_dir_new_cluster() -> ocfs2_dx_dir_format_cluster(). But only two times is included in ocfs2_dx_dir_rebalance_credits(), fix it. This can cause read-only fs(v4.1+) or panic for mainline linux depending on mount option. ------------[ cut here ]------------ kernel BUG at fs/ocfs2/journal.c:775! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport acpi_cpufreq i2c_piix4 i2c_core pcspkr ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod CPU: 2 PID: 10601 Comm: dd Not tainted 4.1.12-71.el6uek.bug24939243.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016 task: ffff8800b6de6200 ti: ffff8800a7d48000 task.ti: ffff8800a7d48000 RIP: ocfs2_journal_dirty+0xa7/0xb0 [ocfs2] RSP: 0018:ffff8800a7d4b6d8 EFLAGS: 00010286 RAX: 00000000ffffffe4 RBX: 00000000814d0a9c RCX: 00000000000004f9 RDX: ffffffffa008e990 RSI: ffffffffa008f1ee RDI: ffff8800622b6460 RBP: ffff8800a7d4b6f8 R08: ffffffffa008f288 R09: ffff8800622b6460 R10: 0000000000000000 R11: 0000000000000282 R12: 0000000002c8421e R13: ffff88006d0cad00 R14: ffff880092beef60 R15: 0000000000000070 FS: 00007f9b83e92700(0000) GS:ffff8800be880000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fb2c0d1a000 CR3: 0000000008f80000 CR4: 00000000000406e0 Call Trace: ocfs2_dx_dir_transfer_leaf+0x159/0x1a0 [ocfs2] ocfs2_dx_dir_rebalance+0xd9b/0xea0 [ocfs2] ocfs2_find_dir_space_dx+0xd3/0x300 [ocfs2] ocfs2_prepare_dx_dir_for_insert+0x219/0x450 [ocfs2] ocfs2_prepare_dir_for_insert+0x1d6/0x580 [ocfs2] ocfs2_mknod+0x5a2/0x1400 [ocfs2] ocfs2_create+0x73/0x180 [ocfs2] vfs_create+0xd8/0x100 lookup_open+0x185/0x1c0 do_last+0x36d/0x780 path_openat+0x92/0x470 do_filp_open+0x4a/0xa0 do_sys_open+0x11a/0x230 SyS_open+0x1e/0x20 system_call_fastpath+0x12/0x71 Code: 1d 3f 29 09 00 48 85 db 74 1f 48 8b 03 0f 1f 80 00 00 00 00 48 8b 7b 08 48 83 c3 10 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 eb eb 90 <0f> 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 RIP ocfs2_journal_dirty+0xa7/0xb0 [ocfs2] ---[ end trace 91ac5312a6ee1288 ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled Link: http://lkml.kernel.org/r/1478248135-31963-1-git-send-email-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi Cc: Mark Fasheh Cc: Joel Becker Cc: Joseph Qi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ocfs2/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c index e7054e2ac922..3ecb9f337b7d 100644 --- a/fs/ocfs2/dir.c +++ b/fs/ocfs2/dir.c @@ -3699,7 +3699,7 @@ static void ocfs2_dx_dir_transfer_leaf(struct inode *dir, u32 split_hash, static int ocfs2_dx_dir_rebalance_credits(struct ocfs2_super *osb, struct ocfs2_dx_root_block *dx_root) { - int credits = ocfs2_clusters_to_blocks(osb->sb, 2); + int credits = ocfs2_clusters_to_blocks(osb->sb, 3); credits += ocfs2_calc_extend_credits(osb->sb, &dx_root->dr_list); credits += ocfs2_quota_trans_credits(osb->sb); -- cgit v1.2.3-70-g09d2 From 70d78fe7c8b640b5acfad56ad341985b3810998a Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Thu, 10 Nov 2016 10:46:38 -0800 Subject: coredump: fix unfreezable coredumping task It could be not possible to freeze coredumping task when it waits for 'core_state->startup' completion, because threads are frozen in get_signal() before they got a chance to complete 'core_state->startup'. Inability to freeze a task during suspend will cause suspend to fail. Also CRIU uses cgroup freezer during dump operation. So with an unfreezable task the CRIU dump will fail because it waits for a transition from 'FREEZING' to 'FROZEN' state which will never happen. Use freezer_do_not_count() to tell freezer to ignore coredumping task while it waits for core_state->startup completion. Link: http://lkml.kernel.org/r/1475225434-3753-1-git-send-email-aryabinin@virtuozzo.com Signed-off-by: Andrey Ryabinin Acked-by: Pavel Machek Acked-by: Oleg Nesterov Cc: Alexander Viro Cc: Tejun Heo Cc: "Rafael J. Wysocki" Cc: Michal Hocko Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/coredump.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'fs') diff --git a/fs/coredump.c b/fs/coredump.c index 281b768000e6..eb9c92c9b20f 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -1,6 +1,7 @@ #include #include #include +#include #include #include #include @@ -423,7 +424,9 @@ static int coredump_wait(int exit_code, struct core_state *core_state) if (core_waiters > 0) { struct core_thread *ptr; + freezer_do_not_count(); wait_for_completion(&core_state->startup); + freezer_count(); /* * Wait for all the threads to become inactive, so that * all the thread context (extended register state, like -- cgit v1.2.3-70-g09d2