From 4547f4d8ffd63ba4ac129f9136027bd14b729101 Mon Sep 17 00:00:00 2001
From: Liu Bo <bo.li.liu@oracle.com>
Date: Fri, 23 Sep 2016 14:05:04 -0700
Subject: Btrfs: kill BUG_ON in do_relocation

While updating btree, we try to push items between sibling
nodes/leaves in order to keep height as low as possible.
But we don't memset the original places with zero when
pushing items so that we could end up leaving stale content
in nodes/leaves.  One may read the above stale content by
increasing btree blocks' @nritems.

One case I've come across is that in fs tree, a leaf has two
parent nodes, hence running balance ends up with processing
this leaf with two parent nodes, but it can only reach the
valid parent node through btrfs_search_slot, so it'd be like,

do_relocation
    for P in all parent nodes of block A:
        if !P->eb:
            btrfs_search_slot(key);   --> get path from P to A.
        if lowest:
            BUG_ON(A->bytenr != bytenr of A recorded in P);
        btrfs_cow_block(P, A);   --> change A's bytenr in P.

After btrfs_cow_block, P has the new bytenr of A, but with the
same @key, we get the same path again, and get panic by BUG_ON.

Note that this is only happening in a corrupted fs, for a
regular fs in which we have correct @nritems so that we won't
read stale content in any case.

Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/relocation.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0ec8ffa37ab0..c4af0cdb783d 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2728,7 +2728,14 @@ static int do_relocation(struct btrfs_trans_handle *trans,
 
 		bytenr = btrfs_node_blockptr(upper->eb, slot);
 		if (lowest) {
-			BUG_ON(bytenr != node->bytenr);
+			if (bytenr != node->bytenr) {
+				btrfs_err(root->fs_info,
+		"lowest leaf/node mismatch: bytenr %llu node->bytenr %llu slot %d upper %llu",
+					  bytenr, node->bytenr, slot,
+					  upper->eb->start);
+				err = -EIO;
+				goto next;
+			}
 		} else {
 			if (node->eb->start == bytenr)
 				goto next;
-- 
cgit v1.2.3-70-g09d2


From b77428b12b55437b28deae738d9ce8b2e0663b55 Mon Sep 17 00:00:00 2001
From: "Darrick J. Wong" <darrick.wong@oracle.com>
Date: Mon, 24 Oct 2016 14:21:18 +1100
Subject: xfs: defer should abort intent items if the trans roll fails

If the deferred ops transaction roll fails, we need to abort the intent
items if we haven't already logged a done item for it, regardless of
whether or not the deferred ops has had a transaction committed.  Dave
found this while running generic/388.

Move the tracepoint to make it easier to track object lifetimes.

Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/libxfs/xfs_defer.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

(limited to 'fs')

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 613c5cf19436..5c2929f94bd3 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -199,9 +199,9 @@ xfs_defer_intake_work(
 	struct xfs_defer_pending	*dfp;
 
 	list_for_each_entry(dfp, &dop->dop_intake, dfp_list) {
-		trace_xfs_defer_intake_work(tp->t_mountp, dfp);
 		dfp->dfp_intent = dfp->dfp_type->create_intent(tp,
 				dfp->dfp_count);
+		trace_xfs_defer_intake_work(tp->t_mountp, dfp);
 		list_sort(tp->t_mountp, &dfp->dfp_work,
 				dfp->dfp_type->diff_items);
 		list_for_each(li, &dfp->dfp_work)
@@ -221,21 +221,14 @@ xfs_defer_trans_abort(
 	struct xfs_defer_pending	*dfp;
 
 	trace_xfs_defer_trans_abort(tp->t_mountp, dop);
-	/*
-	 * If the transaction was committed, drop the intent reference
-	 * since we're bailing out of here. The other reference is
-	 * dropped when the intent hits the AIL.  If the transaction
-	 * was not committed, the intent is freed by the intent item
-	 * unlock handler on abort.
-	 */
-	if (!dop->dop_committed)
-		return;
 
-	/* Abort intent items. */
+	/* Abort intent items that don't have a done item. */
 	list_for_each_entry(dfp, &dop->dop_pending, dfp_list) {
 		trace_xfs_defer_pending_abort(tp->t_mountp, dfp);
-		if (!dfp->dfp_done)
+		if (dfp->dfp_intent && !dfp->dfp_done) {
 			dfp->dfp_type->abort_intent(dfp->dfp_intent);
+			dfp->dfp_intent = NULL;
+		}
 	}
 
 	/* Shut down FS. */
-- 
cgit v1.2.3-70-g09d2


From 86a6c211d676add579a75b7e172a72bb3e2c21f8 Mon Sep 17 00:00:00 2001
From: Benjamin Coddington <bcodding@redhat.com>
Date: Wed, 15 Jun 2016 15:02:55 -0400
Subject: NFS: Trim extra slash in v4 nfs_path

A NFSv4 mount of a subdirectory will show an extra slash (as in
'server://path') in proc's mountinfo which will not match the device name
and path.  This can cause problems for programs searching for the mount.
Fix this by checking for a leading slash in the dentry path, if so trim
away any trailing slashes in the device name.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfs/namespace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/nfs/namespace.c b/fs/nfs/namespace.c
index c8162c660c44..5551e8ef67fd 100644
--- a/fs/nfs/namespace.c
+++ b/fs/nfs/namespace.c
@@ -98,7 +98,7 @@ rename_retry:
 		return end;
 	}
 	namelen = strlen(base);
-	if (flags & NFS_PATH_CANONICAL) {
+	if (*end == '/') {
 		/* Strip off excess slashes in base string */
 		while (namelen > 0 && base[namelen - 1] == '/')
 			namelen--;
-- 
cgit v1.2.3-70-g09d2


From 0b34c261e235a5c74dcf78bd305845bd15fe2b42 Mon Sep 17 00:00:00 2001
From: Goldwyn Rodrigues <rgoldwyn@suse.com>
Date: Fri, 30 Sep 2016 10:40:52 -0500
Subject: btrfs: qgroup: Prevent qgroup->reserved from going subzero

While free'ing qgroup->reserved resources, we much check if
the page has not been invalidated by a truncate operation
by checking if the page is still dirty before reducing the
qgroup resources. Resources in such a case are free'd when
the entire extent is released by delayed_ref.

This fixes a double accounting while releasing resources
in case of truncating a file, reproduced by the following testcase.

SCRATCH_DEV=/dev/vdb
SCRATCH_MNT=/mnt
mkfs.btrfs -f $SCRATCH_DEV
mount -t btrfs $SCRATCH_DEV $SCRATCH_MNT
cd $SCRATCH_MNT
btrfs quota enable $SCRATCH_MNT
btrfs subvolume create a
btrfs qgroup limit 500m a $SCRATCH_MNT
sync
for c in {1..15}; do
dd if=/dev/zero  bs=1M count=40 of=$SCRATCH_MNT/a/file;
done

sleep 10
sync
sleep 5

touch $SCRATCH_MNT/a/newfile

echo "Removing file"
rm $SCRATCH_MNT/a/file

Fixes: b9d0b38928 ("btrfs: Add handler for invalidate page")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/inode.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

(limited to 'fs')

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 50ba4ca167e7..6677674d0505 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -8930,9 +8930,14 @@ again:
 	 *    So even we call qgroup_free_data(), it won't decrease reserved
 	 *    space.
 	 * 2) Not written to disk
-	 *    This means the reserved space should be freed here.
+	 *    This means the reserved space should be freed here. However,
+	 *    if a truncate invalidates the page (by clearing PageDirty)
+	 *    and the page is accounted for while allocating extent
+	 *    in btrfs_check_data_free_space() we let delayed_ref to
+	 *    free the entire extent.
 	 */
-	btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE);
+	if (PageDirty(page))
+		btrfs_qgroup_free_data(inode, page_start, PAGE_SIZE);
 	if (!inode_evicting) {
 		clear_extent_bit(tree, page_start, page_end,
 				 EXTENT_LOCKED | EXTENT_DIRTY |
-- 
cgit v1.2.3-70-g09d2


From 69ae5e4459e43e56f03d0987e865fbac2b05af2a Mon Sep 17 00:00:00 2001
From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Date: Thu, 13 Oct 2016 09:23:39 +0800
Subject: btrfs: make file clone aware of fatal signals

Indeed this just make the behavior similar to xfs when process has
fatal signals pending, and it'll make fstests/generic/298 happy.

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/ioctl.c | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'fs')

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index af69129d7e0e..71634570943c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3814,6 +3814,11 @@ process_slot:
 		}
 		btrfs_release_path(path);
 		key.offset = next_key_min_offset;
+
+		if (fatal_signal_pending(current)) {
+			ret = -EINTR;
+			goto out;
+		}
 	}
 	ret = 0;
 
-- 
cgit v1.2.3-70-g09d2


From dd4b857aabbce57518f2d32d9805365d3ff34fc6 Mon Sep 17 00:00:00 2001
From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Date: Tue, 18 Oct 2016 15:56:13 +0800
Subject: btrfs: pass correct args to btrfs_async_run_delayed_refs()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

In btrfs_truncate_inode_items()->btrfs_async_run_delayed_refs(), we
swap the arg2 and arg3 wrongly, fix this.

This bug just impacts asynchronous delayed refs handle when we truncate inodes.
In delayed_ref_async_start(), there is such codes:

    trans = btrfs_join_transaction(async->root);
    if (trans->transid > async->transid)
        goto end;
    ret = btrfs_run_delayed_refs(trans, async->root, async->count);

From this codes, we can see that this just influence whether can we handle
delayed refs or the number of delayed refs to handle, this may impact
performance, but will not result in missing delayed refs, all delayed refs will
be handled in btrfs_commit_transaction().

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/inode.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'fs')

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 6677674d0505..1f980efbb2e8 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4605,8 +4605,8 @@ delete:
 			BUG_ON(ret);
 			if (btrfs_should_throttle_delayed_refs(trans, root))
 				btrfs_async_run_delayed_refs(root,
-							     trans->transid,
-					trans->delayed_ref_updates * 2, 0);
+					trans->delayed_ref_updates * 2,
+					trans->transid, 0);
 			if (be_nice) {
 				if (truncate_space_check(trans, root,
 							 extent_num_bytes)) {
-- 
cgit v1.2.3-70-g09d2


From 9c894696f56f5d84fb5766af81bcda4a7cb9a842 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Wed, 12 Oct 2016 11:33:21 +0300
Subject: Btrfs: remove some no-op casts

We cast 0 to a u8 but then because of type promotion, it's immediately
cast to int back to int before we do a bitwise negate.  The cast doesn't
matter in this case, the code works as intended.  It causes a static
checker warning though so let's remove it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent_io.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

(limited to 'fs')

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 66a755150056..8ed05d95584a 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -5569,7 +5569,7 @@ void le_bitmap_set(u8 *map, unsigned int start, int len)
 		*p |= mask_to_set;
 		len -= bits_to_set;
 		bits_to_set = BITS_PER_BYTE;
-		mask_to_set = ~(u8)0;
+		mask_to_set = ~0;
 		p++;
 	}
 	if (len) {
@@ -5589,7 +5589,7 @@ void le_bitmap_clear(u8 *map, unsigned int start, int len)
 		*p &= ~mask_to_clear;
 		len -= bits_to_clear;
 		bits_to_clear = BITS_PER_BYTE;
-		mask_to_clear = ~(u8)0;
+		mask_to_clear = ~0;
 		p++;
 	}
 	if (len) {
@@ -5679,7 +5679,7 @@ void extent_buffer_bitmap_set(struct extent_buffer *eb, unsigned long start,
 		kaddr[offset] |= mask_to_set;
 		len -= bits_to_set;
 		bits_to_set = BITS_PER_BYTE;
-		mask_to_set = ~(u8)0;
+		mask_to_set = ~0;
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
@@ -5721,7 +5721,7 @@ void extent_buffer_bitmap_clear(struct extent_buffer *eb, unsigned long start,
 		kaddr[offset] &= ~mask_to_clear;
 		len -= bits_to_clear;
 		bits_to_clear = BITS_PER_BYTE;
-		mask_to_clear = ~(u8)0;
+		mask_to_clear = ~0;
 		if (++offset >= PAGE_SIZE && len > 0) {
 			offset = 0;
 			page = eb->pages[++i];
-- 
cgit v1.2.3-70-g09d2


From 9d1032cc49a8a1065e79ee323de66bcb4fdbd535 Mon Sep 17 00:00:00 2001
From: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Date: Mon, 20 Jun 2016 09:18:52 +0800
Subject: btrfs: fix WARNING in btrfs_select_ref_head()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This issue was found when testing in-band dedupe enospc behaviour,
sometimes run_one_delayed_ref() may fail for enospc reason, then
__btrfs_run_delayed_refs（）will return, but forget to add num_heads_read
back, which will trigger "WARN_ON(delayed_refs->num_heads_ready == 0)" in
btrfs_select_ref_head().

Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
---
 fs/btrfs/extent-tree.c | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'fs')

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 210c94ac8818..4607af38c72e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -2647,7 +2647,10 @@ static noinline int __btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 
 		btrfs_free_delayed_extent_op(extent_op);
 		if (ret) {
+			spin_lock(&delayed_refs->lock);
 			locked_ref->processing = 0;
+			delayed_refs->num_heads_ready++;
+			spin_unlock(&delayed_refs->lock);
 			btrfs_delayed_ref_unlock(locked_ref);
 			btrfs_put_delayed_ref(ref);
 			btrfs_debug(fs_info, "run_one_delayed_ref returned %d",
-- 
cgit v1.2.3-70-g09d2


From 68a564006a21ae59c7c51b4359e2e8efa42ae4af Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd@arndb.de>
Date: Tue, 18 Oct 2016 00:05:35 +0200
Subject: NFSv4.1: work around -Wmaybe-uninitialized warning

A bugfix introduced a harmless gcc warning in nfs4_slot_seqid_in_use
if we enable -Wmaybe-uninitialized again:

fs/nfs/nfs4session.c:203:54: error: 'cur_seq' may be used uninitialized in this function [-Werror=maybe-uninitialized]

gcc is not smart enough to conclude that the IS_ERR/PTR_ERR pair
results in a nonzero return value here. Using PTR_ERR_OR_ZERO()
instead makes this clear to the compiler.

The warning originally did not appear in v4.8 as it was globally
disabled, but the bugfix that introduced the warning got backported
to stable kernels which again enable it, and this is now the only
warning in the v4.7 builds.

Fixes: e09c978aae5b ("NFSv4.1: Fix Oopsable condition in server callback races")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfs/nfs4session.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

(limited to 'fs')

diff --git a/fs/nfs/nfs4session.c b/fs/nfs/nfs4session.c
index b62973045a3e..150c5a1879bf 100644
--- a/fs/nfs/nfs4session.c
+++ b/fs/nfs/nfs4session.c
@@ -178,12 +178,14 @@ static int nfs4_slot_get_seqid(struct nfs4_slot_table  *tbl, u32 slotid,
 	__must_hold(&tbl->slot_tbl_lock)
 {
 	struct nfs4_slot *slot;
+	int ret;
 
 	slot = nfs4_lookup_slot(tbl, slotid);
-	if (IS_ERR(slot))
-		return PTR_ERR(slot);
-	*seq_nr = slot->seq_nr;
-	return 0;
+	ret = PTR_ERR_OR_ZERO(slot);
+	if (!ret)
+		*seq_nr = slot->seq_nr;
+
+	return ret;
 }
 
 /*
-- 
cgit v1.2.3-70-g09d2


From 0cc11a61b80a1ab1d12f1597b27b8b45ef8bac4a Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@redhat.com>
Date: Thu, 20 Oct 2016 09:34:31 -0400
Subject: nfsd: move blocked lock handling under a dedicated spinlock

Bruce was hitting some lockdep warnings in testing, showing that we
could hit a deadlock with the new CB_NOTIFY_LOCK handling, involving a
rather complex situation involving four different spinlocks.

The crux of the matter is that we end up taking the nn->client_lock in
the lm_notify handler. The simplest fix is to just declare a new
per-nfsd_net spinlock to protect the new CB_NOTIFY_LOCK structures.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/netns.h     |  5 +++++
 fs/nfsd/nfs4state.c | 28 +++++++++++++++-------------
 2 files changed, 20 insertions(+), 13 deletions(-)

(limited to 'fs')

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index b10d557f9c9e..ee36efd5aece 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -84,6 +84,8 @@ struct nfsd_net {
 	struct list_head client_lru;
 	struct list_head close_lru;
 	struct list_head del_recall_lru;
+
+	/* protected by blocked_locks_lock */
 	struct list_head blocked_locks_lru;
 
 	struct delayed_work laundromat_work;
@@ -91,6 +93,9 @@ struct nfsd_net {
 	/* client_lock protects the client lru list and session hash table */
 	spinlock_t client_lock;
 
+	/* protects blocked_locks_lru */
+	spinlock_t blocked_locks_lock;
+
 	struct file *rec_file;
 	bool in_grace;
 	const struct nfsd4_client_tracking_ops *client_tracking_ops;
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9752beb78659..95474196a17e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -217,7 +217,7 @@ find_blocked_lock(struct nfs4_lockowner *lo, struct knfsd_fh *fh,
 {
 	struct nfsd4_blocked_lock *cur, *found = NULL;
 
-	spin_lock(&nn->client_lock);
+	spin_lock(&nn->blocked_locks_lock);
 	list_for_each_entry(cur, &lo->lo_blocked, nbl_list) {
 		if (fh_match(fh, &cur->nbl_fh)) {
 			list_del_init(&cur->nbl_list);
@@ -226,7 +226,7 @@ find_blocked_lock(struct nfs4_lockowner *lo, struct knfsd_fh *fh,
 			break;
 		}
 	}
-	spin_unlock(&nn->client_lock);
+	spin_unlock(&nn->blocked_locks_lock);
 	if (found)
 		posix_unblock_lock(&found->nbl_lock);
 	return found;
@@ -4665,7 +4665,7 @@ nfs4_laundromat(struct nfsd_net *nn)
 	 * indefinitely once the lock does become free.
 	 */
 	BUG_ON(!list_empty(&reaplist));
-	spin_lock(&nn->client_lock);
+	spin_lock(&nn->blocked_locks_lock);
 	while (!list_empty(&nn->blocked_locks_lru)) {
 		nbl = list_first_entry(&nn->blocked_locks_lru,
 					struct nfsd4_blocked_lock, nbl_lru);
@@ -4678,7 +4678,7 @@ nfs4_laundromat(struct nfsd_net *nn)
 		list_move(&nbl->nbl_lru, &reaplist);
 		list_del_init(&nbl->nbl_list);
 	}
-	spin_unlock(&nn->client_lock);
+	spin_unlock(&nn->blocked_locks_lock);
 
 	while (!list_empty(&reaplist)) {
 		nbl = list_first_entry(&nn->blocked_locks_lru,
@@ -5439,13 +5439,13 @@ nfsd4_lm_notify(struct file_lock *fl)
 	bool queue = false;
 
 	/* An empty list means that something else is going to be using it */
-	spin_lock(&nn->client_lock);
+	spin_lock(&nn->blocked_locks_lock);
 	if (!list_empty(&nbl->nbl_list)) {
 		list_del_init(&nbl->nbl_list);
 		list_del_init(&nbl->nbl_lru);
 		queue = true;
 	}
-	spin_unlock(&nn->client_lock);
+	spin_unlock(&nn->blocked_locks_lock);
 
 	if (queue)
 		nfsd4_run_cb(&nbl->nbl_cb);
@@ -5868,10 +5868,10 @@ nfsd4_lock(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	if (fl_flags & FL_SLEEP) {
 		nbl->nbl_time = jiffies;
-		spin_lock(&nn->client_lock);
+		spin_lock(&nn->blocked_locks_lock);
 		list_add_tail(&nbl->nbl_list, &lock_sop->lo_blocked);
 		list_add_tail(&nbl->nbl_lru, &nn->blocked_locks_lru);
-		spin_unlock(&nn->client_lock);
+		spin_unlock(&nn->blocked_locks_lock);
 	}
 
 	err = vfs_lock_file(filp, F_SETLK, file_lock, conflock);
@@ -5900,10 +5900,10 @@ out:
 	if (nbl) {
 		/* dequeue it if we queued it before */
 		if (fl_flags & FL_SLEEP) {
-			spin_lock(&nn->client_lock);
+			spin_lock(&nn->blocked_locks_lock);
 			list_del_init(&nbl->nbl_list);
 			list_del_init(&nbl->nbl_lru);
-			spin_unlock(&nn->client_lock);
+			spin_unlock(&nn->blocked_locks_lock);
 		}
 		free_blocked_lock(nbl);
 	}
@@ -6943,9 +6943,11 @@ static int nfs4_state_create_net(struct net *net)
 	INIT_LIST_HEAD(&nn->client_lru);
 	INIT_LIST_HEAD(&nn->close_lru);
 	INIT_LIST_HEAD(&nn->del_recall_lru);
-	INIT_LIST_HEAD(&nn->blocked_locks_lru);
 	spin_lock_init(&nn->client_lock);
 
+	spin_lock_init(&nn->blocked_locks_lock);
+	INIT_LIST_HEAD(&nn->blocked_locks_lru);
+
 	INIT_DELAYED_WORK(&nn->laundromat_work, laundromat_main);
 	get_net(net);
 
@@ -7063,14 +7065,14 @@ nfs4_state_shutdown_net(struct net *net)
 	}
 
 	BUG_ON(!list_empty(&reaplist));
-	spin_lock(&nn->client_lock);
+	spin_lock(&nn->blocked_locks_lock);
 	while (!list_empty(&nn->blocked_locks_lru)) {
 		nbl = list_first_entry(&nn->blocked_locks_lru,
 					struct nfsd4_blocked_lock, nbl_lru);
 		list_move(&nbl->nbl_lru, &reaplist);
 		list_del_init(&nbl->nbl_list);
 	}
-	spin_unlock(&nn->client_lock);
+	spin_unlock(&nn->blocked_locks_lock);
 
 	while (!list_empty(&reaplist)) {
 		nbl = list_first_entry(&nn->blocked_locks_lru,
-- 
cgit v1.2.3-70-g09d2


From 0b944d3a4bba6b25f43aed530f4fa85c04d162a6 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sun, 30 Oct 2016 11:42:01 -0500
Subject: aio: hold an extra file reference over AIO read/write operations

Otherwise we might dereference an already freed file and/or inode
when aio_complete is called before we return from the read_iter or
write_iter method.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'fs')

diff --git a/fs/aio.c b/fs/aio.c
index 1157e13a36d6..0aa71d338c04 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1460,6 +1460,7 @@ rw_common:
 			return ret;
 		}
 
+		get_file(file);
 		if (rw == WRITE)
 			file_start_write(file);
 
@@ -1467,6 +1468,7 @@ rw_common:
 
 		if (rw == WRITE)
 			file_end_write(file);
+		fput(file);
 		kfree(iovec);
 		break;
 
-- 
cgit v1.2.3-70-g09d2


From 723c038475b78edc9327eb952f95f9881cc9d79d Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sun, 30 Oct 2016 11:42:02 -0500
Subject: fs: remove the never implemented aio_fsync file operation

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 Documentation/filesystems/Locking |  1 -
 Documentation/filesystems/vfs.txt |  1 -
 fs/aio.c                          | 14 --------------
 fs/ntfs/dir.c                     |  2 --
 include/linux/fs.h                |  1 -
 5 files changed, 19 deletions(-)

(limited to 'fs')

diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 14cdc101d165..1b5f15653b1b 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -447,7 +447,6 @@ prototypes:
 	int (*flush) (struct file *);
 	int (*release) (struct inode *, struct file *);
 	int (*fsync) (struct file *, loff_t start, loff_t end, int datasync);
-	int (*aio_fsync) (struct kiocb *, int datasync);
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
 	ssize_t (*readv) (struct file *, const struct iovec *, unsigned long,
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index d619c8d71966..b5039a00caaf 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -828,7 +828,6 @@ struct file_operations {
 	int (*flush) (struct file *, fl_owner_t id);
 	int (*release) (struct inode *, struct file *);
 	int (*fsync) (struct file *, loff_t, loff_t, int datasync);
-	int (*aio_fsync) (struct kiocb *, int datasync);
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
 	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
diff --git a/fs/aio.c b/fs/aio.c
index 0aa71d338c04..2a6030af6ba5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1472,20 +1472,6 @@ rw_common:
 		kfree(iovec);
 		break;
 
-	case IOCB_CMD_FDSYNC:
-		if (!file->f_op->aio_fsync)
-			return -EINVAL;
-
-		ret = file->f_op->aio_fsync(req, 1);
-		break;
-
-	case IOCB_CMD_FSYNC:
-		if (!file->f_op->aio_fsync)
-			return -EINVAL;
-
-		ret = file->f_op->aio_fsync(req, 0);
-		break;
-
 	default:
 		pr_debug("EINVAL: no operation provided\n");
 		return -EINVAL;
diff --git a/fs/ntfs/dir.c b/fs/ntfs/dir.c
index a18613579001..0ee19ecc982d 100644
--- a/fs/ntfs/dir.c
+++ b/fs/ntfs/dir.c
@@ -1544,8 +1544,6 @@ const struct file_operations ntfs_dir_ops = {
 	.iterate	= ntfs_readdir,		/* Read directory contents. */
 #ifdef NTFS_RW
 	.fsync		= ntfs_dir_fsync,	/* Sync a directory to disk. */
-	/*.aio_fsync	= ,*/			/* Sync all outstanding async
-						   i/o operations on a kiocb. */
 #endif /* NTFS_RW */
 	/*.ioctl	= ,*/			/* Perform function on the
 						   mounted filesystem. */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 16d2b6e874d6..ff7bcd9e8398 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1709,7 +1709,6 @@ struct file_operations {
 	int (*flush) (struct file *, fl_owner_t id);
 	int (*release) (struct inode *, struct file *);
 	int (*fsync) (struct file *, loff_t, loff_t, int datasync);
-	int (*aio_fsync) (struct kiocb *, int datasync);
 	int (*fasync) (int, struct file *, int);
 	int (*lock) (struct file *, int, struct file_lock *);
 	ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
-- 
cgit v1.2.3-70-g09d2


From 89319d31d2d097da8e27fb0e0ae9d532f4f16827 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Sun, 30 Oct 2016 11:42:03 -0500
Subject: fs: remove aio_run_iocb

Pass the ABI iocb structure to aio_setup_rw and let it handle the
non-vectored I/O case as well.  With that and a new helper for the AIO
return value handling we can now define new aio_read and aio_write
helpers that implement reads and writes in a self-contained way without
duplicating too much code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c | 182 +++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 94 insertions(+), 88 deletions(-)

(limited to 'fs')

diff --git a/fs/aio.c b/fs/aio.c
index 2a6030af6ba5..c19755187ca5 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1392,110 +1392,100 @@ SYSCALL_DEFINE1(io_destroy, aio_context_t, ctx)
 	return -EINVAL;
 }
 
-typedef ssize_t (rw_iter_op)(struct kiocb *, struct iov_iter *);
-
-static int aio_setup_vectored_rw(int rw, char __user *buf, size_t len,
-				 struct iovec **iovec,
-				 bool compat,
-				 struct iov_iter *iter)
+static int aio_setup_rw(int rw, struct iocb *iocb, struct iovec **iovec,
+		bool vectored, bool compat, struct iov_iter *iter)
 {
+	void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf;
+	size_t len = iocb->aio_nbytes;
+
+	if (!vectored) {
+		ssize_t ret = import_single_range(rw, buf, len, *iovec, iter);
+		*iovec = NULL;
+		return ret;
+	}
 #ifdef CONFIG_COMPAT
 	if (compat)
-		return compat_import_iovec(rw,
-				(struct compat_iovec __user *)buf,
-				len, UIO_FASTIOV, iovec, iter);
+		return compat_import_iovec(rw, buf, len, UIO_FASTIOV, iovec,
+				iter);
 #endif
-	return import_iovec(rw, (struct iovec __user *)buf,
-				len, UIO_FASTIOV, iovec, iter);
+	return import_iovec(rw, buf, len, UIO_FASTIOV, iovec, iter);
 }
 
-/*
- * aio_run_iocb:
- *	Performs the initial checks and io submission.
- */
-static ssize_t aio_run_iocb(struct kiocb *req, unsigned opcode,
-			    char __user *buf, size_t len, bool compat)
+static inline ssize_t aio_ret(struct kiocb *req, ssize_t ret)
+{
+	switch (ret) {
+	case -EIOCBQUEUED:
+		return ret;
+	case -ERESTARTSYS:
+	case -ERESTARTNOINTR:
+	case -ERESTARTNOHAND:
+	case -ERESTART_RESTARTBLOCK:
+		/*
+		 * There's no easy way to restart the syscall since other AIO's
+		 * may be already running. Just fail this IO with EINTR.
+		 */
+		ret = -EINTR;
+		/*FALLTHRU*/
+	default:
+		aio_complete(req, ret, 0);
+		return 0;
+	}
+}
+
+static ssize_t aio_read(struct kiocb *req, struct iocb *iocb, bool vectored,
+		bool compat)
 {
 	struct file *file = req->ki_filp;
-	ssize_t ret;
-	int rw;
-	fmode_t mode;
-	rw_iter_op *iter_op;
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	struct iov_iter iter;
+	ssize_t ret;
 
-	switch (opcode) {
-	case IOCB_CMD_PREAD:
-	case IOCB_CMD_PREADV:
-		mode	= FMODE_READ;
-		rw	= READ;
-		iter_op	= file->f_op->read_iter;
-		goto rw_common;
-
-	case IOCB_CMD_PWRITE:
-	case IOCB_CMD_PWRITEV:
-		mode	= FMODE_WRITE;
-		rw	= WRITE;
-		iter_op	= file->f_op->write_iter;
-		goto rw_common;
-rw_common:
-		if (unlikely(!(file->f_mode & mode)))
-			return -EBADF;
-
-		if (!iter_op)
-			return -EINVAL;
-
-		if (opcode == IOCB_CMD_PREADV || opcode == IOCB_CMD_PWRITEV)
-			ret = aio_setup_vectored_rw(rw, buf, len,
-						&iovec, compat, &iter);
-		else {
-			ret = import_single_range(rw, buf, len, iovec, &iter);
-			iovec = NULL;
-		}
-		if (!ret)
-			ret = rw_verify_area(rw, file, &req->ki_pos,
-					     iov_iter_count(&iter));
-		if (ret < 0) {
-			kfree(iovec);
-			return ret;
-		}
-
-		get_file(file);
-		if (rw == WRITE)
-			file_start_write(file);
+	if (unlikely(!(file->f_mode & FMODE_READ)))
+		return -EBADF;
+	if (unlikely(!file->f_op->read_iter))
+		return -EINVAL;
 
-		ret = iter_op(req, &iter);
+	ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter);
+	if (ret)
+		return ret;
+	ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter));
+	if (!ret)
+		ret = aio_ret(req, file->f_op->read_iter(req, &iter));
+	kfree(iovec);
+	return ret;
+}
 
-		if (rw == WRITE)
-			file_end_write(file);
-		fput(file);
-		kfree(iovec);
-		break;
+static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
+		bool compat)
+{
+	struct file *file = req->ki_filp;
+	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
+	struct iov_iter iter;
+	ssize_t ret;
 
-	default:
-		pr_debug("EINVAL: no operation provided\n");
+	if (unlikely(!(file->f_mode & FMODE_WRITE)))
+		return -EBADF;
+	if (unlikely(!file->f_op->write_iter))
 		return -EINVAL;
-	}
 
-	if (ret != -EIOCBQUEUED) {
-		/*
-		 * There's no easy way to restart the syscall since other AIO's
-		 * may be already running. Just fail this IO with EINTR.
-		 */
-		if (unlikely(ret == -ERESTARTSYS || ret == -ERESTARTNOINTR ||
-			     ret == -ERESTARTNOHAND ||
-			     ret == -ERESTART_RESTARTBLOCK))
-			ret = -EINTR;
-		aio_complete(req, ret, 0);
+	ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter);
+	if (ret)
+		return ret;
+	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
+	if (!ret) {
+		file_start_write(file);
+		ret = aio_ret(req, file->f_op->write_iter(req, &iter));
+		file_end_write(file);
 	}
-
-	return 0;
+	kfree(iovec);
+	return ret;
 }
 
 static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 			 struct iocb *iocb, bool compat)
 {
 	struct aio_kiocb *req;
+	struct file *file;
 	ssize_t ret;
 
 	/* enforce forwards compatibility on users */
@@ -1518,7 +1508,7 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	if (unlikely(!req))
 		return -EAGAIN;
 
-	req->common.ki_filp = fget(iocb->aio_fildes);
+	req->common.ki_filp = file = fget(iocb->aio_fildes);
 	if (unlikely(!req->common.ki_filp)) {
 		ret = -EBADF;
 		goto out_put_req;
@@ -1553,13 +1543,29 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
 	req->ki_user_iocb = user_iocb;
 	req->ki_user_data = iocb->aio_data;
 
-	ret = aio_run_iocb(&req->common, iocb->aio_lio_opcode,
-			   (char __user *)(unsigned long)iocb->aio_buf,
-			   iocb->aio_nbytes,
-			   compat);
-	if (ret)
-		goto out_put_req;
+	get_file(file);
+	switch (iocb->aio_lio_opcode) {
+	case IOCB_CMD_PREAD:
+		ret = aio_read(&req->common, iocb, false, compat);
+		break;
+	case IOCB_CMD_PWRITE:
+		ret = aio_write(&req->common, iocb, false, compat);
+		break;
+	case IOCB_CMD_PREADV:
+		ret = aio_read(&req->common, iocb, true, compat);
+		break;
+	case IOCB_CMD_PWRITEV:
+		ret = aio_write(&req->common, iocb, true, compat);
+		break;
+	default:
+		pr_debug("invalid aio operation %d\n", iocb->aio_lio_opcode);
+		ret = -EINVAL;
+		break;
+	}
+	fput(file);
 
+	if (ret && ret != -EIOCBQUEUED)
+		goto out_put_req;
 	return 0;
 out_put_req:
 	put_reqs_available(ctx, 1);
-- 
cgit v1.2.3-70-g09d2


From 70fe2f48152e60664809e2fed76bbb50c9fa2aa3 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Sun, 30 Oct 2016 11:42:04 -0500
Subject: aio: fix freeze protection of aio writes

Currently we dropped freeze protection of aio writes just after IO was
submitted. Thus aio write could be in flight while the filesystem was
frozen and that could result in unexpected situation like aio completion
wanting to convert extent type on frozen filesystem. Testcase from
Dmitry triggering this is like:

for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
    --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite

Fix the problem by dropping freeze protection only once IO is completed
in aio_complete().

Reported-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Jan Kara <jack@suse.cz>
[hch: forward ported on top of various VFS and aio changes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/aio.c           | 19 ++++++++++++++++++-
 include/linux/fs.h |  1 +
 2 files changed, 19 insertions(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/aio.c b/fs/aio.c
index c19755187ca5..428484f2f841 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1078,6 +1078,17 @@ static void aio_complete(struct kiocb *kiocb, long res, long res2)
 	unsigned tail, pos, head;
 	unsigned long	flags;
 
+	if (kiocb->ki_flags & IOCB_WRITE) {
+		struct file *file = kiocb->ki_filp;
+
+		/*
+		 * Tell lockdep we inherited freeze protection from submission
+		 * thread.
+		 */
+		__sb_writers_acquired(file_inode(file)->i_sb, SB_FREEZE_WRITE);
+		file_end_write(file);
+	}
+
 	/*
 	 * Special case handling for sync iocbs:
 	 *  - events go directly into the iocb for fast handling
@@ -1473,9 +1484,15 @@ static ssize_t aio_write(struct kiocb *req, struct iocb *iocb, bool vectored,
 		return ret;
 	ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter));
 	if (!ret) {
+		req->ki_flags |= IOCB_WRITE;
 		file_start_write(file);
 		ret = aio_ret(req, file->f_op->write_iter(req, &iter));
-		file_end_write(file);
+		/*
+		 * We release freeze protection in aio_complete().  Fool lockdep
+		 * by telling it the lock got released so that it doesn't
+		 * complain about held lock when we return to userspace.
+		 */
+		__sb_writers_release(file_inode(file)->i_sb, SB_FREEZE_WRITE);
 	}
 	kfree(iovec);
 	return ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ff7bcd9e8398..dc0478c07b2a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -321,6 +321,7 @@ struct writeback_control;
 #define IOCB_HIPRI		(1 << 3)
 #define IOCB_DSYNC		(1 << 4)
 #define IOCB_SYNC		(1 << 5)
+#define IOCB_WRITE		(1 << 6)
 
 struct kiocb {
 	struct file		*ki_filp;
-- 
cgit v1.2.3-70-g09d2


From fd3220d37b1f6f0cab6142d98b0e6c4082e63299 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi@redhat.com>
Date: Mon, 31 Oct 2016 14:42:14 +0100
Subject: ovl: update S_ISGID when setting posix ACLs

This change fixes xfstest generic/375, which failed to clear the
setgid bit in the following test case on overlayfs:

  touch $testfile
  chown 100:100 $testfile
  chmod 2755 $testfile
  _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile

Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting")
Cc: <stable@vger.kernel.org> # v4.8
---
 fs/overlayfs/super.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

(limited to 'fs')

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index bcf3965be819..edd46a0e951d 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1037,6 +1037,21 @@ ovl_posix_acl_xattr_set(const struct xattr_handler *handler,
 
 	posix_acl_release(acl);
 
+	/*
+	 * Check if sgid bit needs to be cleared (actual setacl operation will
+	 * be done with mounter's capabilities and so that won't do it for us).
+	 */
+	if (unlikely(inode->i_mode & S_ISGID) &&
+	    handler->flags == ACL_TYPE_ACCESS &&
+	    !in_group_p(inode->i_gid) &&
+	    !capable_wrt_inode_uidgid(inode, CAP_FSETID)) {
+		struct iattr iattr = { .ia_valid = ATTR_KILL_SGID };
+
+		err = ovl_setattr(dentry, &iattr);
+		if (err)
+			return err;
+	}
+
 	err = ovl_xattr_set(dentry, handler->name, value, size, flags);
 	if (!err)
 		ovl_copyattr(ovl_inode_real(inode, NULL), inode);
-- 
cgit v1.2.3-70-g09d2


From b93d4a0eb308d4400b84c8b24c1b80e09a9497d0 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi@redhat.com>
Date: Mon, 31 Oct 2016 14:42:14 +0100
Subject: ovl: fix get_acl() on tmpfs

tmpfs doesn't have ->get_acl() because it only uses cached acls.

This fixes the acl tests in pjdfstest when tmpfs is used as the upper layer
of the overlay.

Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes")
Cc: <stable@vger.kernel.org> # v4.8
---
 fs/overlayfs/inode.c | 3 ---
 1 file changed, 3 deletions(-)

(limited to 'fs')

diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c
index c58f01babf30..7fb53d055537 100644
--- a/fs/overlayfs/inode.c
+++ b/fs/overlayfs/inode.c
@@ -270,9 +270,6 @@ struct posix_acl *ovl_get_acl(struct inode *inode, int type)
 	if (!IS_ENABLED(CONFIG_FS_POSIX_ACL) || !IS_POSIXACL(realinode))
 		return NULL;
 
-	if (!realinode->i_op->get_acl)
-		return NULL;
-
 	old_cred = ovl_override_creds(inode->i_sb);
 	acl = get_acl(realinode, type);
 	revert_creds(old_cred);
-- 
cgit v1.2.3-70-g09d2


From 641089c1549d8d3df0b047b5de7e9a111362cdce Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi@redhat.com>
Date: Mon, 31 Oct 2016 14:42:14 +0100
Subject: ovl: fsync after copy-up

Make sure the copied up file hits the disk before renaming to the final
destination.  If this is not done then the copy-up may corrupt the data in
the file in case of a crash.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
---
 fs/overlayfs/copy_up.c | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'fs')

diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index aeb60f791418..36795eed40b0 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -178,6 +178,8 @@ static int ovl_copy_up_data(struct path *old, struct path *new, loff_t len)
 		len -= bytes;
 	}
 
+	if (!error)
+		error = vfs_fsync(new_file, 0);
 	fput(new_file);
 out_fput:
 	fput(old_file);
-- 
cgit v1.2.3-70-g09d2


From f46c445b79906a9da55c13e0a6f6b6a006b892fe Mon Sep 17 00:00:00 2001
From: Chuck Lever <chuck.lever@oracle.com>
Date: Sat, 29 Oct 2016 18:19:03 -0400
Subject: nfsd: Fix general protection fault in release_lock_stateid()

When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example),
I get this crash on the server:

Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8
Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000
Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>]  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0  EFLAGS: 00010246
Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000
Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020
Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000
Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b
Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548
Oct 28 22:04:30 klimt kernel: FS:  0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000
Oct 28 22:04:30 klimt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0
Oct 28 22:04:30 klimt kernel: Stack:
Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20
Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0
Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870
Oct 28 22:04:30 klimt kernel: Call Trace:
Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd]
Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120
Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280
Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40
Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8
Oct 28 22:04:30 klimt kernel: RIP  [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd]
Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0>
Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]---

Jeff Layton says:
> Hm...now that I look though, this is a little suspicious:
>
>    struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
>
> I wonder if it's possible for the openstateid to have already been
> destroyed at this point.
>
> We might be better off doing something like this to get the client pointer:
>
>    stp->st_stid.sc_client;
>
> ...which should be more direct and less dependent on other stateids
> staying valid.

With the suggested change, I am no longer able to reproduce the above oops.

v2: Fix unhash_lock_stateid() as well

Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Fixes: 42691398be08 ('nfsd: Fix race between FREE_STATEID and LOCK')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/nfs4state.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

(limited to 'fs')

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 95474196a17e..4b4beaaa4eaa 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1227,9 +1227,7 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
 
 static bool unhash_lock_stateid(struct nfs4_ol_stateid *stp)
 {
-	struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
-
-	lockdep_assert_held(&oo->oo_owner.so_client->cl_lock);
+	lockdep_assert_held(&stp->st_stid.sc_client->cl_lock);
 
 	list_del_init(&stp->st_locks);
 	nfs4_unhash_stid(&stp->st_stid);
@@ -1238,12 +1236,12 @@ static bool unhash_lock_stateid(struct nfs4_ol_stateid *stp)
 
 static void release_lock_stateid(struct nfs4_ol_stateid *stp)
 {
-	struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
+	struct nfs4_client *clp = stp->st_stid.sc_client;
 	bool unhashed;
 
-	spin_lock(&oo->oo_owner.so_client->cl_lock);
+	spin_lock(&clp->cl_lock);
 	unhashed = unhash_lock_stateid(stp);
-	spin_unlock(&oo->oo_owner.so_client->cl_lock);
+	spin_unlock(&clp->cl_lock);
 	if (unhashed)
 		nfs4_put_stid(&stp->st_stid);
 }
-- 
cgit v1.2.3-70-g09d2


From dc0336214eb07ee9de2a41dd4c81c744ffa419ac Mon Sep 17 00:00:00 2001
From: Mike Marshall <hubcap@omnibond.com>
Date: Fri, 4 Nov 2016 16:32:25 -0400
Subject: orangefs: clean up debugfs

We recently refactored the Orangefs debugfs code.
The refactor seemed to trigger dan.carpenter@oracle.com's
static tester to find a possible double-free in the code.

While designing the fix we saw a condition under which the
buffer being freed could also be overflowed.

We also realized how to rebuild the related debugfs file's
"contents" (a string) without deleting and re-creating the file.

This fix should eliminate the possible double-free, the
potential overflow and improve code readability.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
---
 fs/orangefs/orangefs-debugfs.c | 147 ++++++++++++++++++-----------------------
 fs/orangefs/orangefs-mod.c     |   6 +-
 2 files changed, 68 insertions(+), 85 deletions(-)

(limited to 'fs')

diff --git a/fs/orangefs/orangefs-debugfs.c b/fs/orangefs/orangefs-debugfs.c
index eb09aa026723..d484068ca716 100644
--- a/fs/orangefs/orangefs-debugfs.c
+++ b/fs/orangefs/orangefs-debugfs.c
@@ -141,6 +141,9 @@ static struct client_debug_mask client_debug_mask;
  */
 static DEFINE_MUTEX(orangefs_debug_lock);
 
+/* Used to protect data in ORANGEFS_KMOD_DEBUG_HELP_FILE */
+static DEFINE_MUTEX(orangefs_help_file_lock);
+
 /*
  * initialize kmod debug operations, create orangefs debugfs dir and
  * ORANGEFS_KMOD_DEBUG_HELP_FILE.
@@ -289,6 +292,8 @@ static void *help_start(struct seq_file *m, loff_t *pos)
 
 	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "help_start: start\n");
 
+	mutex_lock(&orangefs_help_file_lock);
+
 	if (*pos == 0)
 		payload = m->private;
 
@@ -305,6 +310,7 @@ static void *help_next(struct seq_file *m, void *v, loff_t *pos)
 static void help_stop(struct seq_file *m, void *p)
 {
 	gossip_debug(GOSSIP_DEBUGFS_DEBUG, "help_stop: start\n");
+	mutex_unlock(&orangefs_help_file_lock);
 }
 
 static int help_show(struct seq_file *m, void *v)
@@ -610,32 +616,54 @@ out:
  * /sys/kernel/debug/orangefs/debug-help can be catted to
  * see all the available kernel and client debug keywords.
  *
- * When the kernel boots, we have no idea what keywords the
+ * When orangefs.ko initializes, we have no idea what keywords the
  * client supports, nor their associated masks.
  *
- * We pass through this function once at boot and stamp a
+ * We pass through this function once at module-load and stamp a
  * boilerplate "we don't know" message for the client in the
  * debug-help file. We pass through here again when the client
  * starts and then we can fill out the debug-help file fully.
  *
  * The client might be restarted any number of times between
- * reboots, we only build the debug-help file the first time.
+ * module reloads, we only build the debug-help file the first time.
  */
 int orangefs_prepare_debugfs_help_string(int at_boot)
 {
-	int rc = -EINVAL;
-	int i;
-	int byte_count = 0;
 	char *client_title = "Client Debug Keywords:\n";
 	char *kernel_title = "Kernel Debug Keywords:\n";
+	size_t string_size =  DEBUG_HELP_STRING_SIZE;
+	size_t result_size;
+	size_t i;
+	char *new;
+	int rc = -EINVAL;
 
 	gossip_debug(GOSSIP_UTILS_DEBUG, "%s: start\n", __func__);
 
-	if (at_boot) {
-		byte_count += strlen(HELP_STRING_UNINITIALIZED);
+	if (at_boot)
 		client_title = HELP_STRING_UNINITIALIZED;
-	} else {
-		/*
+
+	/* build a new debug_help_string. */
+	new = kzalloc(DEBUG_HELP_STRING_SIZE, GFP_KERNEL);
+	if (!new) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	/*
+	 * strlcat(dst, src, size) will append at most
+	 * "size - strlen(dst) - 1" bytes of src onto dst,
+	 * null terminating the result, and return the total
+	 * length of the string it tried to create.
+	 *
+	 * We'll just plow through here building our new debug
+	 * help string and let strlcat take care of assuring that
+	 * dst doesn't overflow.
+	 */
+	strlcat(new, client_title, string_size);
+
+	if (!at_boot) {
+
+                /*
 		 * fill the client keyword/mask array and remember
 		 * how many elements there were.
 		 */
@@ -644,64 +672,40 @@ int orangefs_prepare_debugfs_help_string(int at_boot)
 		if (cdm_element_count <= 0)
 			goto out;
 
-		/* Count the bytes destined for debug_help_string. */
-		byte_count += strlen(client_title);
-
 		for (i = 0; i < cdm_element_count; i++) {
-			byte_count += strlen(cdm_array[i].keyword + 2);
-			if (byte_count >= DEBUG_HELP_STRING_SIZE) {
-				pr_info("%s: overflow 1!\n", __func__);
-				goto out;
-			}
+			strlcat(new, "\t", string_size);
+			strlcat(new, cdm_array[i].keyword, string_size);
+			strlcat(new, "\n", string_size);
 		}
-
-		gossip_debug(GOSSIP_UTILS_DEBUG,
-			     "%s: cdm_element_count:%d:\n",
-			     __func__,
-			     cdm_element_count);
 	}
 
-	byte_count += strlen(kernel_title);
+	strlcat(new, "\n", string_size);
+	strlcat(new, kernel_title, string_size);
+
 	for (i = 0; i < num_kmod_keyword_mask_map; i++) {
-		byte_count +=
-			strlen(s_kmod_keyword_mask_map[i].keyword + 2);
-		if (byte_count >= DEBUG_HELP_STRING_SIZE) {
-			pr_info("%s: overflow 2!\n", __func__);
-			goto out;
-		}
+		strlcat(new, "\t", string_size);
+		strlcat(new, s_kmod_keyword_mask_map[i].keyword, string_size);
+		result_size = strlcat(new, "\n", string_size);
 	}
 
-	/* build debug_help_string. */
-	debug_help_string = kzalloc(DEBUG_HELP_STRING_SIZE, GFP_KERNEL);
-	if (!debug_help_string) {
-		rc = -ENOMEM;
+	/* See if we tried to put too many bytes into "new"... */
+	if (result_size >= string_size) {
+		kfree(new);
 		goto out;
 	}
 
-	strcat(debug_help_string, client_title);
-
-	if (!at_boot) {
-		for (i = 0; i < cdm_element_count; i++) {
-			strcat(debug_help_string, "\t");
-			strcat(debug_help_string, cdm_array[i].keyword);
-			strcat(debug_help_string, "\n");
-		}
-	}
-
-	strcat(debug_help_string, "\n");
-	strcat(debug_help_string, kernel_title);
-
-	for (i = 0; i < num_kmod_keyword_mask_map; i++) {
-		strcat(debug_help_string, "\t");
-		strcat(debug_help_string, s_kmod_keyword_mask_map[i].keyword);
-		strcat(debug_help_string, "\n");
+	if (at_boot) {
+		debug_help_string = new;
+	} else {
+		mutex_lock(&orangefs_help_file_lock);
+		memset(debug_help_string, 0, DEBUG_HELP_STRING_SIZE);
+		strlcat(debug_help_string, new, string_size);
+		mutex_unlock(&orangefs_help_file_lock);
 	}
 
 	rc = 0;
 
-out:
-
-	return rc;
+out:	return rc;
 
 }
 
@@ -959,8 +963,12 @@ int orangefs_debugfs_new_client_string(void __user *arg)
 	ret = copy_from_user(&client_debug_array_string,
                                      (void __user *)arg,
                                      ORANGEFS_MAX_DEBUG_STRING_LEN);
-	if (ret != 0)
+
+	if (ret != 0) {
+		pr_info("%s: CLIENT_STRING: copy_from_user failed\n",
+			__func__);
 		return -EIO;
+	}
 
 	/*
 	 * The real client-core makes an effort to ensure
@@ -975,45 +983,18 @@ int orangefs_debugfs_new_client_string(void __user *arg)
 	client_debug_array_string[ORANGEFS_MAX_DEBUG_STRING_LEN - 1] =
 		'\0';
 	
-	if (ret != 0) {
-		pr_info("%s: CLIENT_STRING: copy_from_user failed\n",
-			__func__);
-		return -EIO;
-	}
-
 	pr_info("%s: client debug array string has been received.\n",
 		__func__);
 
 	if (!help_string_initialized) {
 
-		/* Free the "we don't know yet" default string... */
-		kfree(debug_help_string);
-
-		/* build a proper debug help string */
+		/* Build a proper debug help string. */
 		if (orangefs_prepare_debugfs_help_string(0)) {
 			gossip_err("%s: no debug help string \n",
 				   __func__);
 			return -EIO;
 		}
 
-		/* Replace the boilerplate boot-time debug-help file. */
-		debugfs_remove(help_file_dentry);
-
-		help_file_dentry =
-			debugfs_create_file(
-				ORANGEFS_KMOD_DEBUG_HELP_FILE,
-				0444,
-				debug_dir,
-				debug_help_string,
-				&debug_help_fops);
-
-		if (!help_file_dentry) {
-			gossip_err("%s: debugfs_create_file failed for"
-				   " :%s:!\n",
-				   __func__,
-				   ORANGEFS_KMOD_DEBUG_HELP_FILE);
-			return -EIO;
-		}
 	}
 
 	debug_mask_to_string(&client_debug_mask, 1);
diff --git a/fs/orangefs/orangefs-mod.c b/fs/orangefs/orangefs-mod.c
index 2e5b03065f34..4113eb0495bf 100644
--- a/fs/orangefs/orangefs-mod.c
+++ b/fs/orangefs/orangefs-mod.c
@@ -124,7 +124,7 @@ static int __init orangefs_init(void)
 	 * unknown at boot time.
 	 *
 	 * orangefs_prepare_debugfs_help_string will be used again
-	 * later to rebuild the debug-help file after the client starts
+	 * later to rebuild the debug-help-string after the client starts
 	 * and passes along the needed info. The argument signifies
 	 * which time orangefs_prepare_debugfs_help_string is being
 	 * called.
@@ -152,7 +152,9 @@ static int __init orangefs_init(void)
 
 	ret = register_filesystem(&orangefs_fs_type);
 	if (ret == 0) {
-		pr_info("orangefs: module version %s loaded\n", ORANGEFS_VERSION);
+		pr_info("%s: module version %s loaded\n",
+			__func__,
+			ORANGEFS_VERSION);
 		ret = 0;
 		goto out;
 	}
-- 
cgit v1.2.3-70-g09d2


From 8ef3295530ddc969ea9a3f307d94df97fcbc0629 Mon Sep 17 00:00:00 2001
From: Petr Vandrovec <petr@vandrovec.name>
Date: Mon, 7 Nov 2016 12:11:29 -0800
Subject: NFS: Ignore connections that have cl_rpcclient uninitialized

cl_rpcclient starts as ERR_PTR(-EINVAL), and connections like that
are floating freely through the system.  Most places check whether
pointer is valid before dereferencing it, but newly added code
in nfs_match_client does not.

Which causes crashes when more than one NFS mount point is present.

Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfs/client.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 7555ba889d1f..ebecfb8fba06 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -314,7 +314,8 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
 		/* Match the full socket address */
 		if (!rpc_cmp_addr_port(sap, clap))
 			/* Match all xprt_switch full socket addresses */
-			if (!rpc_clnt_xprt_switch_has_addr(clp->cl_rpcclient,
+			if (IS_ERR(clp->cl_rpcclient) ||
+                            !rpc_clnt_xprt_switch_has_addr(clp->cl_rpcclient,
 							   sap))
 				continue;
 
-- 
cgit v1.2.3-70-g09d2


From 192747166a468dd8fb5d47ad9d5048c138c1fc25 Mon Sep 17 00:00:00 2001
From: Anna Schumaker <Anna.Schumaker@Netapp.com>
Date: Wed, 26 Oct 2016 15:54:31 -0400
Subject: NFS: Don't print a pNFS error if we aren't using pNFS

We used to check for a valid layout type id before verifying pNFS flags
as an indicator for if we are using pNFS.  This changed in 3132e49ece
with the introduction of multiple layout types, since now we are passing
an array of ids instead of just one.  Since then, users have been seeing
a KERN_ERR printk show up whenever mounting NFS v4 without pNFS.  This
patch restores the original behavior of exiting set_pnfs_layoutdriver()
early if we aren't using pNFS.

Fixes 3132e49ece ("pnfs: track multiple layout types in fsinfo
structure")
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfs/pnfs.c | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'fs')

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 56b2d96f9103..259ef85f435a 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -146,6 +146,8 @@ set_pnfs_layoutdriver(struct nfs_server *server, const struct nfs_fh *mntfh,
 	u32 id;
 	int i;
 
+	if (fsinfo->nlayouttypes == 0)
+		goto out_no_driver;
 	if (!(server->nfs_client->cl_exchange_flags &
 		 (EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS))) {
 		printk(KERN_ERR "NFS: %s: cl_exchange_flags 0x%x\n",
-- 
cgit v1.2.3-70-g09d2


From 0ac84b72c0ed96ace1d8973a06f0120a3b905177 Mon Sep 17 00:00:00 2001
From: Shuah Khan <shuahkh@osg.samsung.com>
Date: Mon, 7 Nov 2016 10:48:16 -0700
Subject: fs/nfs: Fix used uninitialized warn in nfs4_slot_seqid_in_use()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fix the following warn:

fs/nfs/nfs4session.c: In function ‘nfs4_slot_seqid_in_use’:
fs/nfs/nfs4session.c:203:54: warning: ‘cur_seq’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (nfs4_slot_get_seqid(tbl, slotid, &cur_seq) == 0 &&
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~
      cur_seq == seq_nr && test_bit(slotid, tbl->used_slots))
      ~~~~~~~~~~~~~~~~~

Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfs/nfs4session.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/nfs/nfs4session.c b/fs/nfs/nfs4session.c
index 150c5a1879bf..a61350f75c74 100644
--- a/fs/nfs/nfs4session.c
+++ b/fs/nfs/nfs4session.c
@@ -198,7 +198,7 @@ static int nfs4_slot_get_seqid(struct nfs4_slot_table  *tbl, u32 slotid,
 static bool nfs4_slot_seqid_in_use(struct nfs4_slot_table *tbl,
 		u32 slotid, u32 seq_nr)
 {
-	u32 cur_seq;
+	u32 cur_seq = 0;
 	bool ret = false;
 
 	spin_lock(&tbl->slot_tbl_lock);
-- 
cgit v1.2.3-70-g09d2


From 8a8d56176635515d607c520580c73d61f300b409 Mon Sep 17 00:00:00 2001
From: "Yan, Zheng" <zyan@redhat.com>
Date: Wed, 9 Nov 2016 16:47:54 +0800
Subject: ceph: use default file splice read callback

Splice read/write implementation changed recently. When using
generic_file_splice_read(), iov_iter with type == ITER_PIPE is
passed to filesystem's read_iter callback. But ceph_sync_read()
can't serve ITER_PIPE iov_iter correctly (ITER_PIPE iov_iter
expects pages from page cache).

Fixing ceph_sync_read() requires a big patch. So use default
splice read callback for now.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
---
 fs/ceph/file.c | 1 -
 1 file changed, 1 deletion(-)

(limited to 'fs')

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 18630e800208..f995e3528a33 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -1770,7 +1770,6 @@ const struct file_operations ceph_file_fops = {
 	.fsync = ceph_fsync,
 	.lock = ceph_lock,
 	.flock = ceph_flock,
-	.splice_read = generic_file_splice_read,
 	.splice_write = iter_file_splice_write,
 	.unlocked_ioctl = ceph_ioctl,
 	.compat_ioctl	= ceph_ioctl,
-- 
cgit v1.2.3-70-g09d2


From e519e7774784f3fa7728657d780863805ed1c983 Mon Sep 17 00:00:00 2001
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Thu, 10 Nov 2016 18:32:13 -0500
Subject: splice: remove detritus from generic_file_splice_read()

i_size check is a leftover from the horrors that used to play with
the page cache in that function.  With the switch to ->read_iter(),
it's neither needed nor correct - for gfs2 it ends up being buggy,
since i_size is not guaranteed to be correct until later (inside
->read_iter()).

Spotted-by: Abhi Das <adas@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
 fs/splice.c | 5 -----
 1 file changed, 5 deletions(-)

(limited to 'fs')

diff --git a/fs/splice.c b/fs/splice.c
index 153d4f3bd441..dcaf185a5731 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -299,13 +299,8 @@ ssize_t generic_file_splice_read(struct file *in, loff_t *ppos,
 {
 	struct iov_iter to;
 	struct kiocb kiocb;
-	loff_t isize;
 	int idx, ret;
 
-	isize = i_size_read(in->f_mapping->host);
-	if (unlikely(*ppos >= isize))
-		return 0;
-
 	iov_iter_pipe(&to, ITER_PIPE | READ, pipe, len);
 	idx = to.idx;
 	init_sync_kiocb(&kiocb, in);
-- 
cgit v1.2.3-70-g09d2


From d006c71f8ad2663dd47f81bf96bf655eeed428e2 Mon Sep 17 00:00:00 2001
From: Junxiao Bi <junxiao.bi@oracle.com>
Date: Thu, 10 Nov 2016 10:46:29 -0800
Subject: ocfs2: fix not enough credit panic

The following panic was caught when run ocfs2 disconfig single test
(block size 512 and cluster size 8192).  ocfs2_journal_dirty() return
-ENOSPC, that means credits were used up.

The total credit should include 3 times of "num_dx_leaves" from
ocfs2_dx_dir_rebalance(), because 2 times will be consumed in
ocfs2_dx_dir_transfer_leaf() and 1 time will be consumed in
ocfs2_dx_dir_new_cluster() -> __ocfs2_dx_dir_new_cluster() ->
ocfs2_dx_dir_format_cluster().  But only two times is included in
ocfs2_dx_dir_rebalance_credits(), fix it.

This can cause read-only fs(v4.1+) or panic for mainline linux depending
on mount option.

  ------------[ cut here ]------------
  kernel BUG at fs/ocfs2/journal.c:775!
  invalid opcode: 0000 [#1] SMP
  Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport acpi_cpufreq i2c_piix4 i2c_core pcspkr ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
  CPU: 2 PID: 10601 Comm: dd Not tainted 4.1.12-71.el6uek.bug24939243.x86_64 #2
  Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
  task: ffff8800b6de6200 ti: ffff8800a7d48000 task.ti: ffff8800a7d48000
  RIP: ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
  RSP: 0018:ffff8800a7d4b6d8  EFLAGS: 00010286
  RAX: 00000000ffffffe4 RBX: 00000000814d0a9c RCX: 00000000000004f9
  RDX: ffffffffa008e990 RSI: ffffffffa008f1ee RDI: ffff8800622b6460
  RBP: ffff8800a7d4b6f8 R08: ffffffffa008f288 R09: ffff8800622b6460
  R10: 0000000000000000 R11: 0000000000000282 R12: 0000000002c8421e
  R13: ffff88006d0cad00 R14: ffff880092beef60 R15: 0000000000000070
  FS:  00007f9b83e92700(0000) GS:ffff8800be880000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fb2c0d1a000 CR3: 0000000008f80000 CR4: 00000000000406e0
  Call Trace:
    ocfs2_dx_dir_transfer_leaf+0x159/0x1a0 [ocfs2]
    ocfs2_dx_dir_rebalance+0xd9b/0xea0 [ocfs2]
    ocfs2_find_dir_space_dx+0xd3/0x300 [ocfs2]
    ocfs2_prepare_dx_dir_for_insert+0x219/0x450 [ocfs2]
    ocfs2_prepare_dir_for_insert+0x1d6/0x580 [ocfs2]
    ocfs2_mknod+0x5a2/0x1400 [ocfs2]
    ocfs2_create+0x73/0x180 [ocfs2]
    vfs_create+0xd8/0x100
    lookup_open+0x185/0x1c0
    do_last+0x36d/0x780
    path_openat+0x92/0x470
    do_filp_open+0x4a/0xa0
    do_sys_open+0x11a/0x230
    SyS_open+0x1e/0x20
    system_call_fastpath+0x12/0x71
  Code: 1d 3f 29 09 00 48 85 db 74 1f 48 8b 03 0f 1f 80 00 00 00 00 48 8b 7b 08 48 83 c3 10 4c 89 e6 ff d0 48 8b 03 48 85 c0 75 eb eb 90 <0f> 0b eb fe 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54
  RIP  ocfs2_journal_dirty+0xa7/0xb0 [ocfs2]
  ---[ end trace 91ac5312a6ee1288 ]---
  Kernel panic - not syncing: Fatal exception
  Kernel Offset: disabled

Link: http://lkml.kernel.org/r/1478248135-31963-1-git-send-email-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/ocfs2/dir.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'fs')

diff --git a/fs/ocfs2/dir.c b/fs/ocfs2/dir.c
index e7054e2ac922..3ecb9f337b7d 100644
--- a/fs/ocfs2/dir.c
+++ b/fs/ocfs2/dir.c
@@ -3699,7 +3699,7 @@ static void ocfs2_dx_dir_transfer_leaf(struct inode *dir, u32 split_hash,
 static int ocfs2_dx_dir_rebalance_credits(struct ocfs2_super *osb,
 					  struct ocfs2_dx_root_block *dx_root)
 {
-	int credits = ocfs2_clusters_to_blocks(osb->sb, 2);
+	int credits = ocfs2_clusters_to_blocks(osb->sb, 3);
 
 	credits += ocfs2_calc_extend_credits(osb->sb, &dx_root->dr_list);
 	credits += ocfs2_quota_trans_credits(osb->sb);
-- 
cgit v1.2.3-70-g09d2


From 70d78fe7c8b640b5acfad56ad341985b3810998a Mon Sep 17 00:00:00 2001
From: Andrey Ryabinin <aryabinin@virtuozzo.com>
Date: Thu, 10 Nov 2016 10:46:38 -0800
Subject: coredump: fix unfreezable coredumping task

It could be not possible to freeze coredumping task when it waits for
'core_state->startup' completion, because threads are frozen in
get_signal() before they got a chance to complete 'core_state->startup'.

Inability to freeze a task during suspend will cause suspend to fail.
Also CRIU uses cgroup freezer during dump operation.  So with an
unfreezable task the CRIU dump will fail because it waits for a
transition from 'FREEZING' to 'FROZEN' state which will never happen.

Use freezer_do_not_count() to tell freezer to ignore coredumping task
while it waits for core_state->startup completion.

Link: http://lkml.kernel.org/r/1475225434-3753-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tejun Heo <tj@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/coredump.c | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'fs')

diff --git a/fs/coredump.c b/fs/coredump.c
index 281b768000e6..eb9c92c9b20f 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -1,6 +1,7 @@
 #include <linux/slab.h>
 #include <linux/file.h>
 #include <linux/fdtable.h>
+#include <linux/freezer.h>
 #include <linux/mm.h>
 #include <linux/stat.h>
 #include <linux/fcntl.h>
@@ -423,7 +424,9 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
 	if (core_waiters > 0) {
 		struct core_thread *ptr;
 
+		freezer_do_not_count();
 		wait_for_completion(&core_state->startup);
+		freezer_count();
 		/*
 		 * Wait for all the threads to become inactive, so that
 		 * all the thread context (extended register state, like
-- 
cgit v1.2.3-70-g09d2