summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-12-01block: protect iterate_bdevs() against concurrent closeRabin Vincent
If a block device is closed while iterate_bdevs() is handling it, the following NULL pointer dereference occurs because bdev->b_disk is NULL in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in turn called by the mapping_cap_writeback_dirty() call in __filemap_fdatawrite_range()): BUG: unable to handle kernel NULL pointer dereference at 0000000000000508 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 PGD 9e62067 PUD 9ee8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000 RIP: 0010:[<ffffffff81314790>] [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38 FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0 Stack: ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001 ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f Call Trace: [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90 [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30 [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20 [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130 [<ffffffff811b2763>] sys_sync+0x63/0x90 [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d RIP [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP <ffff880009f5fe68> CR2: 0000000000000508 ---[ end trace 2487336ceb3de62d ]--- The crash is easily reproducible by running the following command, if an msleep(100) is inserted before the call to func() in iterate_devs(): while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done Fix it by holding the bd_mutex across the func() call and only calling func() if the bdev is opened. Cc: stable@vger.kernel.org Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices") Reported-and-tested-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: remove unnecesary checkMing Lei
The check on bio->bi_vcnt doesn't make sense in erase_end_io(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: use bio_add_page() in do_erase()Ming Lei
Also code gets simplified a bit. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: use bio_add_page() in __bdev_writeseg()Ming Lei
Also this patch simplify the code a bit. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22fs: logfs: convert to bio_add_page() in sync_request()Ming Lei
Always bio_add_page() is the standard and preferred way to do the task. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block: bio: pass bvec table to bio_init()Ming Lei
Some drivers often use external bvec table, so introduce this helper for this case. It is always safe to access the bio->bi_io_vec in this way for this case. After converting to this usage, it will becomes a bit easier to evaluate the remaining direct access to bio->bi_io_vec, so it can help to prepare for the following multipage bvec support. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Fixed up the new O_DIRECT cases. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block_dev: get rid of blksize bits calculationJens Axboe
We store the bits in the bdev sector size locally, but we don't use the calculation anymore. All we do with it is shift it back up to the bdev sector size. So let's just use that directly and kill the variable and bits calculation. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-22block_dev: Fixed direct I/O bio sector calculationDamien Le Moal
A direct I/O alignment must be always checked against the device blocks size, but the I/O offset (bio->bi_iter.bi_sector must always use 512B sector unit, and not the actual logical block size. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-17block: new direct I/O implementationChristoph Hellwig
Similar to the simple fast path, but we now need a dio structure to track multiple-bio completions. It's basically a cut-down version of the new iomap-based direct I/O code for filesystems, but without all the logic to call into the filesystem for extent lookup or allocation, and without the complex I/O completion workqueue handler for AIO - instead we just use the FUA bit on the bios to ensure data is flushed to stable storage. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-17block: make __blkdev_direct_IO_sync() support O_SYNC/DSYNCJens Axboe
Split the op setting code into a helper, use it in both places. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-17block: support a full bio worth of IO for simplified bdev direct-ioJens Axboe
Just alloc the bio_vec array if we exceed the inline limit. Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-17block: fast-path for small and simple direct I/O requestsChristoph Hellwig
This patch adds a small and simple fast patch for small direct I/O requests on block devices that don't use AIO. Between the neat bio_iov_iter_get_pages helper that avoids allocating a page array for get_user_pages and the on-stack bio and biovec this avoid memory allocations and atomic operations entirely in the direct I/O code (lower levels might still do memory allocations and will usually have at least some atomic operations, though). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com> Tested-By: Stephen Bates <sbates@raithlin.com> Reviewed-By: Stephen Bates <sbates@raithlin.com>
2016-11-11block: move poll code to blk-mqJens Axboe
The poll code is blk-mq specific, let's move it to blk-mq.c. This is a prep patch for improving the polling code. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-11-02writeback: add wbc_to_write_flags()Jens Axboe
Add wbc_to_write_flags(), which returns the write modifier flags to use, based on a struct writeback_control. No functional changes in this patch, but it prepares us for factoring other wbc fields for write type. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de>
2016-11-01mm: only include blk_types in swap.h if CONFIG_SWAP is enabledChristoph Hellwig
It's only needed for the CONFIG_SWAP-only use of bio_end_io_t. Because CONFIG_SWAP implies CONFIG_BLOCK this will allow to drop some ifdefs in blk_types.h. Instead we'll need to add a few explicit includes that were implicit before, though. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01block,fs: untangle fs.h and blk_types.hChristoph Hellwig
Nothing in fs.h should require blk_types.h to be included. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01block,fs: use REQ_* flags directlyChristoph Hellwig
Remove the WRITE_* and READ_SYNC wrappers, and just use the flags directly. Where applicable this also drops usage of the bio_set_op_attrs wrapper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-01btrfs: use op_is_sync to check for synchronous requestsChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-28block: better op and flags encodingChristoph Hellwig
Now that we don't need the common flags to overflow outside the range of a 32-bit type we can encode them the same way for both the bio and request fields. This in addition allows us to place the operation first (and make some room for more ops while we're at it) and to stop having to shift around the operation values. In addition this allows passing around only one value in the block layer instead of two (and eventuall also in the file systems, but we can do that later) and thus clean up a lot of code. Last but not least this allows decreasing the size of the cmd_flags field in struct request to 32-bits. Various functions passing this value could also be updated, but I'd like to avoid the churn for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-17btrfs: assign error values to the correct bio structsJunjie Mao
Fixes: 4246a0b63bd8 ("block: add a bi_error field to struct bio") Signed-off-by: Junjie Mao <junjie.mao@enight.me> Acked-by: David Sterba <dsterba@suse.cz> Cc: stable@vger.kernel.org # 4.3+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-15Merge tag 'befs-v4.9-rc1' of git://github.com/luisbg/linux-befsLinus Torvalds
Pull befs fixes from Luis de Bethencourt: "I recently took maintainership of the befs file system [0]. This is the first time I send you a git pull request, so please let me know if all the below is OK. Salah Triki and myself have been cleaning the code and fixing a few small bugs. Sorry I couldn't send this sooner in the merge window, I was waiting to have my GPG key signed by kernel members at ELCE in Berlin a few days ago." [0] https://lkml.org/lkml/2016/7/27/502 * tag 'befs-v4.9-rc1' of git://github.com/luisbg/linux-befs: (39 commits) befs: befs: fix style issues in datastream.c befs: improve documentation in datastream.c befs: fix typos in datastream.c befs: fix typos in btree.c befs: fix style issues in super.c befs: fix comment style befs: add check for ag_shift in superblock befs: dump inode_size superblock information befs: remove unnecessary initialization befs: fix typo in befs_sb_info befs: add flags field to validate superblock state befs: fix typo in befs_find_key befs: remove unused BEFS_BT_PARMATCH fs: befs: remove ret variable fs: befs: remove in vain variable assignment fs: befs: remove unnecessary *befs_sb variable fs: befs: remove useless initialization to zero fs: befs: remove in vain variable assignment fs: befs: Insert NULL inode to dentry fs: befs: Remove useless calls to brelse in befs_find_brun_dblindirect ...
2016-10-15Merge tag 'gcc-plugins-v4.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc plugins update from Kees Cook: "This adds a new gcc plugin named "latent_entropy". It is designed to extract as much possible uncertainty from a running system at boot time as possible, hoping to capitalize on any possible variation in CPU operation (due to runtime data differences, hardware differences, SMP ordering, thermal timing variation, cache behavior, etc). At the very least, this plugin is a much more comprehensive example for how to manipulate kernel code using the gcc plugin internals" * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: latent_entropy: Mark functions with __latent_entropy gcc-plugins: Add latent_entropy plugin
2016-10-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more misc uaccess and vfs updates from Al Viro: "The rest of the stuff from -next (more uaccess work) + assorted fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: score: traps: Add missing include file to fix build error fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths fs/super.c: fix race between freeze_super() and thaw_super() overlayfs: Fix setting IOP_XATTR flag iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector() blackfin: no access_ok() for __copy_{to,from}_user() arm64: don't zero in __copy_from_user{,_inatomic} arm: don't zero in __copy_from_user_inatomic()/__copy_from_user() arc: don't leak bits of kernel stack into coredump alpha: get rid of tail-zeroing in __copy_user()
2016-10-14Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Including: - nine bug fixes for stable. Some of these we found at the recent two weeks of SMB3 test events/plugfests. - significant improvements in reconnection (e.g. if server or network crashes) especially when mounted with "persistenthandles" or to server which advertises Continuous Availability on the share. - a new mount option "idsfromsid" which improves POSIX compatibility in some cases (when winbind not configured e.g.) by better (and faster) fetching uid/gid from acl (when "cifsacl" mount option is enabled). NB: we are almost complete work on "cifsacl" (querying mode/uid/gid from ACL) for SMB3, but SMB3 support for cifsacl is not included in this set. - improved handling for SMB3 "credits" (even if server is buggy) Still working on two sets of changes: - cifsacl enablement for SMB3 - cleanup of RFC1001 length calculation (so we can handle encryption and multichannel and RDMA) And a couple of new bugs were reported recently (unrelated to above) so will probably have another merge request next week" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (21 commits) CIFS: Retrieve uid and gid from special sid if enabled CIFS: Add new mount option to set owner uid and gid from special sids in acl CIFS: Reset read oplock to NONE if we have mandatory locks after reopen CIFS: Fix persistent handles re-opening on reconnect SMB2: Separate RawNTLMSSP authentication from SMB2_sess_setup SMB2: Separate Kerberos authentication from SMB2_sess_setup Expose cifs module parameters in sysfs Cleanup missing frees on some ioctls Enable previous version support Do not send SMB3 SET_INFO request if nothing is changing SMB3: Add mount parameter to allow user to override max credits fs/cifs: reopen persistent handles on reconnect Clarify locking of cifs file and tcon structures and make more granular Fix regression which breaks DFS mounting fs/cifs: keep guid when assigning fid to fileinfo SMB3: GUIDs should be constructed as random but valid uuids Set previous session id correctly on SMB3 reconnect cifs: Limit the overall credit acquired Display number of credits available Add way to query creation time of file via cifs xattr ...
2016-10-14Merge branch 'for-linus-4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "Some fixes from Omar and Dave Sterba for our new free space tree. This isn't heavily used yet, but as we move toward making it the new default we wanted to nail down an endian bug" * 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: tests: uninline member definitions in free_space_extent btrfs: tests: constify free space extent specs Btrfs: expand free space tree sanity tests to catch endianness bug Btrfs: fix extent buffer bitmap tests on big-endian systems Btrfs: catch invalid free space trees Btrfs: fix mount -o clear_cache,space_cache=v2 Btrfs: fix free space tree bitmaps on big-endian systems
2016-10-14fs/super.c: don't fool lockdep in freeze_super() and thaw_super() pathsOleg Nesterov
sb_wait_write()->percpu_rwsem_release() fools lockdep to avoid the false-positives. Now that xfs was fixed by Dave's commit dbad7c993053 ("xfs: stop holding ILOCK over filldir callbacks") we can remove it and change freeze_super() and thaw_super() to run with s_writers.rw_sem locks held; we add two trivial helpers for that, lockdep_sb_freeze_release() and lockdep_sb_freeze_acquire(). xfstests-dev/check `grep -il freeze tests/*/???` does not trigger any warning from lockdep. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-14Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This update contains fixes to the "use mounter's permission to access underlying layers" area, and miscellaneous other fixes and cleanups. No new features this time" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: use vfs_get_link() vfs: add vfs_get_link() helper ovl: use generic_readlink ovl: explain error values when removing acl from workdir ovl: Fix info leak in ovl_lookup_temp() ovl: during copy up, switch to mounter's creds early ovl: lookup: do getxattr with mounter's permission ovl: copy_up_xattr(): use strnlen
2016-10-14fs/super.c: fix race between freeze_super() and thaw_super()Oleg Nesterov
Change thaw_super() to check frozen != SB_FREEZE_COMPLETE rather than frozen == SB_UNFROZEN, otherwise it can race with freeze_super() which drops sb->s_umount after SB_FREEZE_WRITE to preserve the lock ordering. In this case thaw_super() will wrongly call s_op->unfreeze_fs() before it was actually frozen, and call sb_freeze_unlock() which leads to the unbalanced percpu_up_write(). Unfortunately lockdep can't detect this, so this triggers misc BUG_ON()'s in kernel/rcu/sync.c. Reported-and-tested-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-14overlayfs: Fix setting IOP_XATTR flagVivek Goyal
ovl_fill_super calls ovl_new_inode to create a root inode for the new superblock before initializing sb->s_xattr. This wrongly causes IOP_XATTR to be cleared in i_opflags of the new inode, causing SELinux to log the following message: SELinux: (dev overlay, type overlay) has no xattr support Fix this by initializing sb->s_xattr and similar fields before calling ovl_new_inode. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-14iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector()Vegard Nossum
Both import_iovec() and rw_copy_check_uvector() take an array (typically small and on-stack) which is used to hold an iovec array copy from userspace. This is to avoid an expensive memory allocation in the fast path (i.e. few iovec elements). The caller may have to check whether these functions actually used the provided buffer or allocated a new one -- but this differs between the too. Let's just add a kernel doc to clarify what the semantics are for each function. Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-10-14CIFS: Retrieve uid and gid from special sid if enabledSteve French
New mount option "idsfromsid" indicates to cifs.ko that it should try to retrieve the uid and gid owner fields from special sids. This patch adds the code to parse the owner sids in the ACL to see if they match, and if so populate the uid and/or gid from them. This is faster than upcalling for them and asking winbind, and is a fairly common case, and is also helpful when cifs.upcall and idmapping is not configured. Signed-off-by: Steve French <steve.french@primarydata.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-10-14CIFS: Add new mount option to set owner uid and gid from special sids in aclSteve French
Add "idsfromsid" mount option to indicate to cifs.ko that it should try to retrieve the uid and gid owner fields from special sids in the ACL if present. This first patch just adds the parsing for the mount option. Signed-off-by: Steve French <steve.french@primarydata.com> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-10-14Merge branch 'for-4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - tracepoints for basic cgroup management operations added - kernfs and cgroup path formatting functions updated to behave in the style of strlcpy() - non-critical bug fixes * 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent() cpuset: fix error handling regression in proc_cpuset_show() cgroup: add tracepoints for basic operations cgroup: make cgroup_path() and friends behave in the style of strlcpy() kernfs: remove kernfs_path_len() kernfs: make kernfs_path*() behave in the style of strlcpy() kernfs: add dummy implementation of kernfs_path_from_node()
2016-10-14ovl: use vfs_get_link()Miklos Szeredi
Resulting in a complete removal of a function basically implementing the inverse of vfs_readlink(). As a bonus, now the proper security hook is also called. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-14vfs: add vfs_get_link() helperMiklos Szeredi
This helper is for filesystems that want to read the symlink and are better off with the get_link() interface (returning a char *) rather than the readlink() interface (copy into a userspace buffer). Also call the LSM hook for readlink (not get_link) since this is for symlink reading not following. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-14ovl: use generic_readlinkMiklos Szeredi
All filesystems that are backers for overlayfs would also use generic_readlink(). Move this logic to the overlay itself, which is a nice cleanup. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-14ovl: explain error values when removing acl from workdirMiklos Szeredi
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-10-13Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Highlights include: Stable bugfixes: - sunrpc: fix writ espace race causing stalls - NFS: Fix inode corruption in nfs_prime_dcache() - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation() - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid - NFSv4: Open state recovery must account for file permission changes - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic Features: - Add support for tracking multiple layout types with an ordered list - Add support for using multiple backchannel threads on the client - Add support for pNFS file layout session trunking - Delay xprtrdma use of DMA API (for device driver removal) - Add support for xprtrdma remote invalidation - Add support for larger xprtrdma inline thresholds - Use a scatter/gather list for sending xprtrdma RPC calls - Add support for the CB_NOTIFY_LOCK callback - Improve hashing sunrpc auth_creds by using both uid and gid Bugfixes: - Fix xprtrdma use of DMA API - Validate filenames before adding to the dcache - Fix corruption of xdr->nwords in xdr_copy_to_scratch - Fix setting buffer length in xdr_set_next_buffer() - Don't deadlock the state manager on the SEQUENCE status flags - Various delegation and stateid related fixes - Retry operations if an interrupted slot receives EREMOTEIO - Make nfs boot time y2038 safe" * tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits) NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic fs: nfs: Make nfs boot time y2038 safe sunrpc: replace generic auth_cred hash with auth-specific function sunrpc: add RPCSEC_GSS hash_cred() function sunrpc: add auth_unix hash_cred() function sunrpc: add generic_auth hash_cred() function sunrpc: add hash_cred() function to rpc_authops struct Retry operation on EREMOTEIO on an interrupted slot pNFS: Fix atime updates on pNFS clients sunrpc: queue work on system_power_efficient_wq NFSv4.1: Even if the stateid is OK, we may need to recover the open modes NFSv4: If recovery failed for a specific open stateid, then don't retry NFSv4: Fix retry issues with nfs41_test/free_stateid NFSv4: Open state recovery must account for file permission changes NFSv4: Mark the lock and open stateids as invalid after freeing them NFSv4: Don't test open_stateid unless it is set NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation NFSv4: Fix a race when updating an open_stateid NFSv4: Fix a race in nfs_inode_reclaim_delegation() ...
2016-10-13Merge tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd updates from Bruce Fields: "Some RDMA work and some good bugfixes, and two new features that could benefit from user testing: - Anna Schumacker contributed a simple NFSv4.2 COPY implementation. COPY is already supported on the client side, so a call to copy_file_range() on a recent client should now result in a server-side copy that doesn't require all the data to make a round trip to the client and back. - Jeff Layton implemented callbacks to notify clients when contended locks become available, which should reduce latency on workloads with contended locks" * tag 'nfsd-4.9' of git://linux-nfs.org/~bfields/linux: NFSD: Implement the COPY call nfsd: handle EUCLEAN nfsd: only WARN once on unmapped errors exportfs: be careful to only return expected errors. nfsd4: setclientid_confirm with unmatched verifier should fail nfsd: randomize SETCLIENTID reply to help distinguish servers nfsd: set the MAY_NOTIFY_LOCK flag in OPEN replies nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant nfsd: add a LRU list for blocked locks nfsd: have nfsd4_lock use blocking locks for v4.1+ locks nfsd: plumb in a CB_NOTIFY_LOCK operation NFSD: fix corruption in notifier registration svcrdma: support Remote Invalidation svcrdma: Server-side support for rpcrdma_connect_private rpcrdma: RDMA/CM private message data structure svcrdma: Skip put_page() when send_reply() fails svcrdma: Tail iovec leaves an orphaned DMA mapping nfsd: fix dprintk in nfsd4_encode_getdeviceinfo nfsd: eliminate cb_minorversion field nfsd: don't set a FL_LAYOUT lease for flexfiles layouts
2016-10-13Merge tag 'xfs-reflink-for-linus-4.9-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs < XFS has gained super CoW powers! > ---------------------------------- \ ^__^ \ (oo)\_______ (__)\ )\/\ ||----w | || || Pull XFS support for shared data extents from Dave Chinner: "This is the second part of the XFS updates for this merge cycle. This pullreq contains the new shared data extents feature for XFS. Given the complexity and size of this change I am expecting - like the addition of reverse mapping last cycle - that there will be some follow-up bug fixes and cleanups around the -rc3 stage for issues that I'm sure will show up once the code hits a wider userbase. What it is: At the most basic level we are simply adding shared data extents to XFS - i.e. a single extent on disk can now have multiple owners. To do this we have to add new on-disk features to both track the shared extents and the number of times they've been shared. This is done by the new "refcount" btree that sits in every allocation group. When we share or unshare an extent, this tree gets updated. Along with this new tree, the reverse mapping tree needs to be updated to track each owner or a shared extent. This also needs to be updated ever share/unshare operation. These interactions at extent allocation and freeing time have complex ordering and recovery constraints, so there's a significant amount of new intent-based transaction code to ensure that operations are performed atomically from both the runtime and integrity/crash recovery perspectives. We also need to break sharing when writes hit a shared extent - this is where the new copy-on-write implementation comes in. We allocate new storage and copy the original data along with the overwrite data into the new location. We only do this for data as we don't share metadata at all - each inode has it's own metadata that tracks the shared data extents, the extents undergoing CoW and it's own private extents. Of course, being XFS, nothing is simple - we use delayed allocation for CoW similar to how we use it for normal writes. ENOSPC is a significant issue here - we build on the reservation code added in 4.8-rc1 with the reverse mapping feature to ensure we don't get spurious ENOSPC issues part way through a CoW operation. These mechanisms also help minimise fragmentation due to repeated CoW operations. To further reduce fragmentation overhead, we've also introduced a CoW extent size hint, which indicates how large a region we should allocate when we execute a CoW operation. With all this functionality in place, we can hook up .copy_file_range, .clone_file_range and .dedupe_file_range and we gain all the capabilities of reflink and other vfs provided functionality that enable manipulation to shared extents. We also added a fallocate mode that explicitly unshares a range of a file, which we implemented as an explicit CoW of all the shared extents in a file. As such, it's a huge chunk of new functionality with new on-disk format features and internal infrastructure. It warns at mount time as an experimental feature and that it may eat data (as we do with all new on-disk features until they stabilise). We have not released userspace suport for it yet - userspace support currently requires download from Darrick's xfsprogs repo and build from source, so the access to this feature is really developer/tester only at this point. Initial userspace support will be released at the same time the kernel with this code in it is released. The new code causes 5-6 new failures with xfstests - these aren't serious functional failures but things the output of tests changing slightly due to perturbations in layouts, space usage, etc. OTOH, we've added 150+ new tests to xfstests that specifically exercise this new functionality so it's got far better test coverage than any functionality we've previously added to XFS. Darrick has done a pretty amazing job getting us to this stage, and special mention also needs to go to Christoph (review, testing, improvements and bug fixes) and Brian (caught several intricate bugs during review) for the effort they've also put in. Summary: - unshare range (FALLOC_FL_UNSHARE) support for fallocate - copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr interface - shared extent support for XFS - copy-on-write support for shared extents - copy_file_range support - clone_file_range support (implements reflink) - dedupe_file_range support - defrag support for reverse mapping enabled filesystems" * tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits) xfs: convert COW blocks to real blocks before unwritten extent conversion xfs: rework refcount cow recovery error handling xfs: clear reflink flag if setting realtime flag xfs: fix error initialization xfs: fix label inaccuracies xfs: remove isize check from unshare operation xfs: reduce stack usage of _reflink_clear_inode_flag xfs: check inode reflink flag before calling reflink functions xfs: implement swapext for rmap filesystems xfs: refactor swapext code xfs: various swapext cleanups xfs: recognize the reflink feature bit xfs: simulate per-AG reservations being critically low xfs: don't mix reflink and DAX mode for now xfs: check for invalid inode reflink flags xfs: set a default CoW extent size of 32 blocks xfs: convert unwritten status of reverse mappings for shared files xfs: use interval query for rmap alloc operations on shared files xfs: add shared rmap map/unmap/convert log item types xfs: increase log reservations for reflink ...
2016-10-13CIFS: Reset read oplock to NONE if we have mandatory locks after reopenPavel Shilovsky
We are already doing the same thing for an ordinary open case: we can't keep read oplock on a file if we have mandatory byte-range locks because pagereading can conflict with these locks on a server. Fix it by setting oplock level to NONE. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-10-13CIFS: Fix persistent handles re-opening on reconnectPavel Shilovsky
openFileList of tcon can be changed while cifs_reopen_file() is called that can lead to an unexpected behavior when we return to the loop. Fix this by introducing a temp list for keeping all file handles that need to be reopen. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-10-13SMB2: Separate RawNTLMSSP authentication from SMB2_sess_setupSachin Prabhu
We split the rawntlmssp authentication into negotiate and authencate parts. We also clean up the code and add helpers. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-10-13SMB2: Separate Kerberos authentication from SMB2_sess_setupSachin Prabhu
Add helper functions and split Kerberos authentication off SMB2_sess_setup. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2016-10-13Expose cifs module parameters in sysfsGermano Percossi
/sys/module/cifs/parameters should display the three other module load time configuration settings for cifs.ko Signed-off-by: Germano Percossi <germano.percossi@citrix.com> Signed-off-by: Steve French <steve.french@primarydata.com>
2016-10-13Cleanup missing frees on some ioctlsSteve French
Cleanup some missing mem frees on some cifs ioctls, and clarify others to make more obvious that no data is returned. CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Sachin Prabhu <sprabhu@redhat.com>
2016-10-13Enable previous version supportSteve French
Add ioctl to query previous versions of file Allows listing snapshots on files on SMB3 mounts. Signed-off-by: Steve French <smfrench@gmail.com>
2016-10-13Do not send SMB3 SET_INFO request if nothing is changingSteve French
[CIFS] We had cases where we sent a SMB2/SMB3 setinfo request with all timestamp (and DOS attribute) fields marked as 0 (ie do not change) e.g. on chmod or chown. Signed-off-by: Steve French <steve.french@primarydata.com> CC: Stable <stable@vger.kernel.org>
2016-10-13Merge git://www.linux-watchdog.org/linux-watchdogLinus Torvalds
Pull watchdog updates from Wim Van Sebroeck: - a new watchdog pretimeout governor framework - support to upload the firmware on the ziirave_wdt - several fixes and cleanups * git://www.linux-watchdog.org/linux-watchdog: (26 commits) watchdog: imx2_wdt: add pretimeout function support watchdog: softdog: implement pretimeout support watchdog: pretimeout: add pretimeout_available_governors attribute watchdog: pretimeout: add option to select a pretimeout governor in runtime watchdog: pretimeout: add panic pretimeout governor watchdog: pretimeout: add noop pretimeout governor watchdog: add watchdog pretimeout governor framework watchdog: hpwdt: add support for iLO5 fs: compat_ioctl: add pretimeout functions for watchdogs watchdog: add pretimeout support to the core watchdog: imx2_wdt: use preferred BIT macro instead of open coded values watchdog: st_wdt: Remove support for obsolete platforms watchdog: bindings: Remove obsolete platforms from dt doc. watchdog: mt7621_wdt: Remove assignment of dev pointer watchdog: rt2880_wdt: Remove assignment of dev pointer watchdog: constify watchdog_ops structures watchdog: tegra: constify watchdog_ops structures watchdog: iTCO_wdt: constify iTCO_wdt_pm structure watchdog: cadence_wdt: Fix the suspend resume watchdog: txx9wdt: Add missing clock (un)prepare calls for CCF ...
2016-10-12Merge branch 'fst-fixes' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.9 Signed-off-by: Chris Mason <clm@fb.com>