summaryrefslogtreecommitdiff
path: root/include/linux/fs.h
AgeCommit message (Collapse)Author
2018-07-10drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()Al Viro
Failure of ->open() should *not* be followed by fput(). Fixed by using filp_clone_open(), which gets the cleanups right. Cc: stable@vger.kernel.org Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-10turn filp_clone_open() into inline wrapper for dentry_open()Al Viro
it's exactly the same thing as dentry_open(&file->f_path, file->f_flags, file->f_cred) ... and rename it to file_clone_open(), while we are at it. 'filp' naming convention is bogus; sure, it's "file pointer", but we generally don't do that kind of Hungarian notation. Some of the instances have too many callers to touch, but this one has only two, so let's sanitize it while we can... Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-08fs: shave 8 bytes off of struct inodeAmir Goldstein
Here is a link to Linus' reply to Jan's concern about making i_blkbibts byte addressable: https://marc.info/?l=linux-fsdevel&m=152882624707975&w=2 Here is a link to an lkp.org report about potential performance improvement in some workload, which could(?) be related to packing i_blkbits closer to i_bytes/i_lock: https://marc.info/?l=linux-fsdevel&m=153077048108198&w=2 Changes since v1: - Add links to relevant discussions Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-07-06vfs: dedupe: rationalize argsMiklos Szeredi
Clean up f_op->dedupe_file_range() interface. 1) Use loff_t for offsets and length instead of u64 2) Order the arguments the same way as {copy|clone}_file_range(). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-06vfs: dedupe: return intMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-06-28Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLLLinus Torvalds
The poll() changes were not well thought out, and completely unexplained. They also caused a huge performance regression, because "->poll()" was no longer a trivial file operation that just called down to the underlying file operations, but instead did at least two indirect calls. Indirect calls are sadly slow now with the Spectre mitigation, but the performance problem could at least be largely mitigated by changing the "->get_poll_head()" operation to just have a per-file-descriptor pointer to the poll head instead. That gets rid of one of the new indirections. But that doesn't fix the new complexity that is completely unwarranted for the regular case. The (undocumented) reason for the poll() changes was some alleged AIO poll race fixing, but we don't make the common case slower and more complex for some uncommon special case, so this all really needs way more explanations and most likely a fundamental redesign. [ This revert is a revert of about 30 different commits, not reverted individually because that would just be unnecessarily messy - Linus ] Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-18removed extra extern file_fdatawait_rangeVasily Averin
Jeff added this extern twice in commit a823e4589e68 Fixes: a823e4589e68 ("mm: add file_fdatawait_range and file_write_and_wait") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-06-15Merge tag 'vfs-timespec64' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground Pull inode timestamps conversion to timespec64 from Arnd Bergmann: "This is a late set of changes from Deepa Dinamani doing an automated treewide conversion of the inode and iattr structures from 'timespec' to 'timespec64', to push the conversion from the VFS layer into the individual file systems. As Deepa writes: 'The series aims to switch vfs timestamps to use struct timespec64. Currently vfs uses struct timespec, which is not y2038 safe. The series involves the following: 1. Add vfs helper functions for supporting struct timepec64 timestamps. 2. Cast prints of vfs timestamps to avoid warnings after the switch. 3. Simplify code using vfs timestamps so that the actual replacement becomes easy. 4. Convert vfs timestamps to use struct timespec64 using a script. This is a flag day patch. Next steps: 1. Convert APIs that can handle timespec64, instead of converting timestamps at the boundaries. 2. Update internal data structures to avoid timestamp conversions' Thomas Gleixner adds: 'I think there is no point to drag that out for the next merge window. The whole thing needs to be done in one go for the core changes which means that you're going to play that catchup game forever. Let's get over with it towards the end of the merge window'" * tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground: pstore: Remove bogus format string definition vfs: change inode times to use struct timespec64 pstore: Convert internal records to timespec64 udf: Simplify calls to udf_disk_stamp_to_time fs: nfs: get rid of memcpys for inode times ceph: make inode time prints to be long long lustre: Use long long type to print inode time fs: add timespec64_truncate()
2018-06-08Merge branch 'work.aio' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull aio iopriority support from Al Viro: "The rest of aio stuff for this cycle - Adam's aio ioprio series" * 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: aio ioprio use ioprio_check_cap ret val fs: aio ioprio add explicit block layer dependence fs: iomap dio set bio prio from kiocb prio fs: blkdev set bio prio from kiocb prio fs: Add aio iopriority support fs: Convert kiocb rw_hint from enum to u16 block: add ioprio_check_cap function
2018-06-07Merge tag 'ovl-fixes-4.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "This contains a fix for the vfs_mkdir() issue discovered by Al, as well as other fixes and cleanups" * tag 'ovl-fixes-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: use inode_insert5() to hash a newly created inode ovl: Pass argument to ovl_get_inode() in a structure vfs: factor out inode_insert5() ovl: clean up copy-up error paths ovl: return EIO on internal error ovl: make ovl_create_real() cope with vfs_mkdir() safely ovl: create helper ovl_create_temp() ovl: return dentry from ovl_create_real() ovl: struct cattr cleanups ovl: strip debug argument from ovl_do_ helpers ovl: remove WARN_ON() real inode attributes mismatch ovl: Kconfig documentation fixes ovl: update documentation for unionmount-testsuite
2018-06-05vfs: change inode times to use struct timespec64Deepa Dinamani
struct timespec is not y2038 safe. Transition vfs to use y2038 safe struct timespec64 instead. The change was made with the help of the following cocinelle script. This catches about 80% of the changes. All the header file and logic changes are included in the first 5 rules. The rest are trivial substitutions. I avoid changing any of the function signatures or any other filesystem specific data structures to keep the patch simple for review. The script can be a little shorter by combining different cases. But, this version was sufficient for my usecase. virtual patch @ depends on patch @ identifier now; @@ - struct timespec + struct timespec64 current_time ( ... ) { - struct timespec now = current_kernel_time(); + struct timespec64 now = current_kernel_time64(); ... - return timespec_trunc( + return timespec64_trunc( ... ); } @ depends on patch @ identifier xtime; @@ struct \( iattr \| inode \| kstat \) { ... - struct timespec xtime; + struct timespec64 xtime; ... } @ depends on patch @ identifier t; @@ struct inode_operations { ... int (*update_time) (..., - struct timespec t, + struct timespec64 t, ...); ... } @ depends on patch @ identifier t; identifier fn_update_time =~ "update_time$"; @@ fn_update_time (..., - struct timespec *t, + struct timespec64 *t, ...) { ... } @ depends on patch @ identifier t; @@ lease_get_mtime( ... , - struct timespec *t + struct timespec64 *t ) { ... } @te depends on patch forall@ identifier ts; local idexpression struct inode *inode_node; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn_update_time =~ "update_time$"; identifier fn; expression e, E3; local idexpression struct inode *node1; local idexpression struct inode *node2; local idexpression struct iattr *attr1; local idexpression struct iattr *attr2; local idexpression struct iattr attr; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; @@ ( ( - struct timespec ts; + struct timespec64 ts; | - struct timespec ts = current_time(inode_node); + struct timespec64 ts = current_time(inode_node); ) <+... when != ts ( - timespec_equal(&inode_node->i_xtime, &ts) + timespec64_equal(&inode_node->i_xtime, &ts) | - timespec_equal(&ts, &inode_node->i_xtime) + timespec64_equal(&ts, &inode_node->i_xtime) | - timespec_compare(&inode_node->i_xtime, &ts) + timespec64_compare(&inode_node->i_xtime, &ts) | - timespec_compare(&ts, &inode_node->i_xtime) + timespec64_compare(&ts, &inode_node->i_xtime) | ts = current_time(e) | fn_update_time(..., &ts,...) | inode_node->i_xtime = ts | node1->i_xtime = ts | ts = inode_node->i_xtime | <+... attr1->ia_xtime ...+> = ts | ts = attr1->ia_xtime | ts.tv_sec | ts.tv_nsec | btrfs_set_stack_timespec_sec(..., ts.tv_sec) | btrfs_set_stack_timespec_nsec(..., ts.tv_nsec) | - ts = timespec64_to_timespec( + ts = ... -) | - ts = ktime_to_timespec( + ts = ktime_to_timespec64( ...) | - ts = E3 + ts = timespec_to_timespec64(E3) | - ktime_get_real_ts(&ts) + ktime_get_real_ts64(&ts) | fn(..., - ts + timespec64_to_timespec(ts) ,...) ) ...+> ( <... when != ts - return ts; + return timespec64_to_timespec(ts); ...> ) | - timespec_equal(&node1->i_xtime1, &node2->i_xtime2) + timespec64_equal(&node1->i_xtime2, &node2->i_xtime2) | - timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2) + timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2) | - timespec_compare(&node1->i_xtime1, &node2->i_xtime2) + timespec64_compare(&node1->i_xtime1, &node2->i_xtime2) | node1->i_xtime1 = - timespec_trunc(attr1->ia_xtime1, + timespec64_trunc(attr1->ia_xtime1, ...) | - attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2, + attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2, ...) | - ktime_get_real_ts(&attr1->ia_xtime1) + ktime_get_real_ts64(&attr1->ia_xtime1) | - ktime_get_real_ts(&attr.ia_xtime1) + ktime_get_real_ts64(&attr.ia_xtime1) ) @ depends on patch @ struct inode *node; struct iattr *attr; identifier fn; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; expression e; @@ ( - fn(node->i_xtime); + fn(timespec64_to_timespec(node->i_xtime)); | fn(..., - node->i_xtime); + timespec64_to_timespec(node->i_xtime)); | - e = fn(attr->ia_xtime); + e = fn(timespec64_to_timespec(attr->ia_xtime)); ) @ depends on patch forall @ struct inode *node; struct iattr *attr; identifier i_xtime =~ "^i_[acm]time$"; identifier ia_xtime =~ "^ia_[acm]time$"; identifier fn; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); fn (..., - &attr->ia_xtime, + &ts, ...); ) ...+> } @ depends on patch forall @ struct inode *node; struct iattr *attr; struct kstat *stat; identifier ia_xtime =~ "^ia_[acm]time$"; identifier i_xtime =~ "^i_[acm]time$"; identifier xtime =~ "^[acm]time$"; identifier fn, ret; @@ { + struct timespec ts; <+... ( + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime, + &ts, ...); | + ts = timespec64_to_timespec(node->i_xtime); ret = fn (..., - &node->i_xtime); + &ts); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime, + &ts, ...); | + ts = timespec64_to_timespec(attr->ia_xtime); ret = fn (..., - &attr->ia_xtime); + &ts); | + ts = timespec64_to_timespec(stat->xtime); ret = fn (..., - &stat->xtime); + &ts); ) ...+> } @ depends on patch @ struct inode *node; struct inode *node2; identifier i_xtime1 =~ "^i_[acm]time$"; identifier i_xtime2 =~ "^i_[acm]time$"; identifier i_xtime3 =~ "^i_[acm]time$"; struct iattr *attrp; struct iattr *attrp2; struct iattr attr ; identifier ia_xtime1 =~ "^ia_[acm]time$"; identifier ia_xtime2 =~ "^ia_[acm]time$"; struct kstat *stat; struct kstat stat1; struct timespec64 ts; identifier xtime =~ "^[acmb]time$"; expression e; @@ ( ( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ; | node->i_xtime2 = \( node2->i_xtime1 \| timespec64_trunc(...) \); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | node->i_xtime1 = node->i_xtime3 = \(ts \| current_time(...) \); | stat->xtime = node2->i_xtime1; | stat1.xtime = node2->i_xtime1; | ( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ; | ( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2; | - e = node->i_xtime1; + e = timespec64_to_timespec( node->i_xtime1 ); | - e = attrp->ia_xtime1; + e = timespec64_to_timespec( attrp->ia_xtime1 ); | node->i_xtime1 = current_time(...); | node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | node->i_xtime1 = node->i_xtime3 = - e; + timespec_to_timespec64(e); | - node->i_xtime1 = e; + node->i_xtime1 = timespec_to_timespec64(e); ) Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <anton@tuxera.com> Cc: <balbi@kernel.org> Cc: <bfields@fieldses.org> Cc: <darrick.wong@oracle.com> Cc: <dhowells@redhat.com> Cc: <dsterba@suse.com> Cc: <dwmw2@infradead.org> Cc: <hch@lst.de> Cc: <hirofumi@mail.parknet.co.jp> Cc: <hubcap@omnibond.com> Cc: <jack@suse.com> Cc: <jaegeuk@kernel.org> Cc: <jaharkes@cs.cmu.edu> Cc: <jslaby@suse.com> Cc: <keescook@chromium.org> Cc: <mark@fasheh.com> Cc: <miklos@szeredi.hu> Cc: <nico@linaro.org> Cc: <reiserfs-devel@vger.kernel.org> Cc: <richard@nod.at> Cc: <sage@redhat.com> Cc: <sfrench@samba.org> Cc: <swhiteho@redhat.com> Cc: <tj@kernel.org> Cc: <trond.myklebust@primarydata.com> Cc: <tytso@mit.edu> Cc: <viro@zeniv.linux.org.uk>
2018-06-05Merge tag 'fscrypt_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Add bunch of cleanups, and add support for the Speck128/256 algorithms. Yes, Speck is contrversial, but the intention is to use them only for the lowest end Android devices, where the alternative *really* is no encryption at all for data stored at rest" * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: log the crypto algorithm implementations fscrypt: add Speck128/256 support fscrypt: only derive the needed portion of the key fscrypt: separate key lookup from key derivation fscrypt: use a common logging function fscrypt: remove internal key size constants fscrypt: remove unnecessary check for non-logon key type fscrypt: make fscrypt_operations.max_namelen an integer fscrypt: drop empty name check from fname_decrypt() fscrypt: drop max_namelen check from fname_decrypt() fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info() fscrypt: don't clear flags on crypto transform fscrypt: remove stale comment from fscrypt_d_revalidate() fscrypt: remove error messages for skcipher_request_alloc() failure fscrypt: remove unnecessary NULL check when allocating skcipher fscrypt: clean up after fscrypt_prepare_lookup() conversions fs, fscrypt: only define ->s_cop when FS_ENCRYPTION is enabled fscrypt: use unbound workqueue for decryption
2018-06-05Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A lot of cleanups and bug fixes, especially dealing with corrupted file systems" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits) ext4: fix fencepost error in check for inode count overflow during resize ext4: correctly handle a zero-length xattr with a non-zero e_value_offs ext4: bubble errors from ext4_find_inline_data_nolock() up to ext4_iget() ext4: do not allow external inodes for inline data ext4: report delalloc reserve as non-free in statfs for project quota ext4: remove NULL check before calling kmem_cache_destroy() jbd2: remove NULL check before calling kmem_cache_destroy() jbd2: remove bunch of empty lines with jbd2 debug ext4: handle errors on ext4_commit_super ext4: do not update s_last_mounted of a frozen fs ext4: factor out helper ext4_sample_last_mounted() vfs: add the sb_start_intwrite_trylock() helper ext4: update mtime in ext4_punch_hole even if no blocks are released ext4: add verifier check for symlink with append/immutable flags fs: ext4: add new return type vm_fault_t ext4: fix hole length detection in ext4_ind_map_blocks() ext4: mark block bitmap corrupted when found ext4: mark inode bitmap corrupted when found ext4: add new ext4_mark_group_bitmap_corrupted() helper ext4: fix wrong return value in ext4_read_inode_bitmap() ...
2018-06-04Merge branch 'work.aio-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull aio updates from Al Viro: "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly. The only thing I'm holding back for a day or so is Adam's aio ioprio - his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case), but let it sit in -next for decency sake..." * 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits) aio: sanitize the limit checking in io_submit(2) aio: fold do_io_submit() into callers aio: shift copyin of iocb into io_submit_one() aio_read_events_ring(): make a bit more readable aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way aio: take list removal to (some) callers of aio_complete() aio: add missing break for the IOCB_CMD_FDSYNC case random: convert to ->poll_mask timerfd: convert to ->poll_mask eventfd: switch to ->poll_mask pipe: convert to ->poll_mask crypto: af_alg: convert to ->poll_mask net/rxrpc: convert to ->poll_mask net/iucv: convert to ->poll_mask net/phonet: convert to ->poll_mask net/nfc: convert to ->poll_mask net/caif: convert to ->poll_mask net/bluetooth: convert to ->poll_mask net/sctp: convert to ->poll_mask net/tipc: convert to ->poll_mask ...
2018-06-04Merge tag 'locks-v4.18-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull fasync fix from Jeff Layton: "Just a single fix for a deadlock in the fasync handling code that Kirill observed while testing. The fix is to change the fa_lock to be rwlock_t, and use a read lock in kill_fasync_rcu" * tag 'locks-v4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fasync: Fix deadlock between task-context and interrupt-context kill_fasync()
2018-06-04Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Misc bits and pieces not fitting into anything more specific" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: delete unnecessary assignment in vfs_listxattr Documentation: filesystems: update filesystem locking documentation vfs: namei: use path_equal() in follow_dotdot() fs.h: fix outdated comment about file flags __inode_security_revalidate() never gets NULL opt_dentry make xattr_getsecurity() static vfat: simplify checks in vfat_lookup() get rid of dead code in d_find_alias() it's SB_BORN, not MS_BORN... msdos_rmdir(): kill BS comment remove rpc_rmdir() fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()
2018-05-31fs: Add aio iopriority supportAdam Manzanares
This is the per-I/O equivalent of the ioprio_set system call. When IOCB_FLAG_IOPRIO is set on the iocb aio_flags field, then we set the newly added kiocb ki_ioprio field to the value in the iocb aio_reqprio field. This patch depends on block: add ioprio_check_cap function. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-31fs: Convert kiocb rw_hint from enum to u16Adam Manzanares
In order to avoid kiocb bloat for per command iopriority support, rw_hint is converted from enum to a u16. Added a guard around ki_hint assignment. Signed-off-by: Adam Manzanares <adam.manzanares@wdc.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-31vfs: factor out inode_insert5()Miklos Szeredi
Split out common helper for race free insertion of an already allocated inode into the cache. Use this from iget5_locked() and insert_inode_locked4(). Make iget5_locked() use new_inode()/iput() instead of alloc_inode()/destroy_inode() directly. Also export to modules for use by filesystems which want to preallocate an inode before file/directory creation. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-05-29block: don't print a message when the device went awayChristoph Hellwig
The information about a size change in this case just creates confusion. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-26fs: introduce new ->get_poll_head and ->poll_mask methodsChristoph Hellwig
->get_poll_head returns the waitqueue that the poll operation is going to sleep on. Note that this means we can only use a single waitqueue for the poll, unlike some current drivers that use two waitqueues for different events. But now that we have keyed wakeups and heavily use those for poll there aren't that many good reason left to keep the multiple waitqueues, and if there are any ->poll is still around, the driver just won't support aio poll. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-05-25fs: add timespec64_truncate()Deepa Dinamani
As vfs moves to using struct timespec64 to represent times, update the argument to timespec_truncate() to use struct timespec64. Also change the name of the function. The rest of the implementation logic is the same. Move this to fs/inode.c instead of kernel/time/time.c as all the users of this api are filesystems. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Cc: <viro@zeniv.linux.org.uk>
2018-05-20fs, fscrypt: only define ->s_cop when FS_ENCRYPTION is enabledEric Biggers
Now that filesystems only set and use their fscrypt_operations when they are built with encryption support, we can remove ->s_cop from 'struct super_block' when FS_ENCRYPTION is disabled. This saves a few bytes on some kernels and also makes it consistent with ->i_crypt_info. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-05-17fs.h: fix outdated comment about file flagsLi Qiang
The __dentry_open function was removed in commit <2a027e7a18738>("fold __dentry_open() into its sole caller"). Signed-off-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-05-13vfs: add the sb_start_intwrite_trylock() helperAmir Goldstein
Needed by ext4 to test frozen fs before updating s_last_mounted. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2018-05-01fasync: Fix deadlock between task-context and interrupt-context kill_fasync()Kirill Tkhai
I observed the following deadlock between them: [task 1] [task 2] [task 3] kill_fasync() mm_update_next_owner() copy_process() spin_lock_irqsave(&fa->fa_lock) read_lock(&tasklist_lock) write_lock_irq(&tasklist_lock) send_sigio() <IRQ> ... read_lock(&fown->lock) kill_fasync() ... read_lock(&tasklist_lock) spin_lock_irqsave(&fa->fa_lock) ... Task 1 can't acquire read locked tasklist_lock, since there is already task 3 expressed its wish to take the lock exclusive. Task 2 holds the read locked lock, but it can't take the spin lock. Also, there is possible another deadlock (which I haven't observed): [task 1] [task 2] f_getown() kill_fasync() read_lock(&f_own->lock) spin_lock_irqsave(&fa->fa_lock,) <IRQ> send_sigio() write_lock_irq(&f_own->lock) kill_fasync() read_lock(&fown->lock) spin_lock_irqsave(&fa->fa_lock,) Actually, we do not need exclusive fa->fa_lock in kill_fasync_rcu(), as it guarantees fa->fa_file->f_owner integrity only. It may seem, that it used to give a task a small possibility to receive two sequential signals, if there are two parallel kill_fasync() callers, and task handles the first signal fastly, but the behaviour won't become different, since there is exclusive sighand lock in do_send_sig_info(). The patch converts fa_lock into rwlock_t, and this fixes two above deadlocks, as rwlock is allowed to be taken from interrupt handler by qrwlock design. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-04-12Merge branch 'work.thaw' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs thaw updates from Al Viro: "An ancient series that has fallen through the cracks in the previous cycle" * 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: buffer.c: call thaw_super during emergency thaw vfs: factor sb iteration out of do_emergency_remount
2018-04-12Merge branch 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull AFS updates from Al Viro: "The AFS series posted by dhowells depended upon lookup_one_len() rework; now that prereq is in the mainline, that series had been rebased on top of it and got some exposure and testing..." * 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Do better accretion of small writes on newly created content afs: Add stats for data transfer operations afs: Trace protocol errors afs: Locally edit directory data for mkdir/create/unlink/... afs: Adjust the directory XDR structures afs: Split the directory content defs into a header afs: Fix directory handling afs: Split the dynroot stuff out and give it its own ops tables afs: Keep track of invalid-before version for dentry coherency afs: Rearrange status mapping afs: Make it possible to get the data version in readpage afs: Init inode before accessing cache afs: Introduce a statistics proc file afs: Dump bad status record afs: Implement @cell substitution handling afs: Implement @sys substitution handling afs: Prospectively look up extra files when doing a single lookup afs: Don't over-increment the cell usage count when pinning it afs: Fix checker warnings vfs: Remove the const from dir_context::actor
2018-04-11page cache: use xa_lockMatthew Wilcox
Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree. [willy@infradead.org: fix nds32, fs/dax.c] Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-10Merge tag 'libnvdimm-for-4.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit d2c997c0f145 ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...
2018-04-09vfs: Remove the const from dir_context::actorDavid Howells
Remove the const marking from the actor function pointer in the dir_context struct. The const prevents the structure from being used as part of a kmalloc'd object as it makes the compiler require that the actor member be set at object initialisation time (or not at all), incuring something like the following error if you try and set it later: fs/afs/dir.c:556:20: error: assignment of read-only member 'actor' Marking the member const like this adds very little in the way of sanity checking as the type checking system is likely to provide sufficient - and if not, the kernel is very likely to oops repeatably in this case. Fixes: ac6614b76478 ("[readdir] constify ->actor") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-09Merge branch 'for-4.17/dax' into libnvdimm-for-nextDan Williams
2018-04-07Merge branch 'next-integrity' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull integrity updates from James Morris: "A mixture of bug fixes, code cleanup, and continues to close IMA-measurement, IMA-appraisal, and IMA-audit gaps. Also note the addition of a new cred_getsecid LSM hook by Matthew Garrett: For IMA purposes, we want to be able to obtain the prepared secid in the bprm structure before the credentials are committed. Add a cred_getsecid hook that makes this possible. which is used by a new CREDS_CHECK target in IMA: In ima_bprm_check(), check with both the existing process credentials and the credentials that will be committed when the new process is started. This will not change behaviour unless the system policy is extended to include CREDS_CHECK targets - BPRM_CHECK will continue to check the same credentials that it did previously" * 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: ima: Fallback to the builtin hash algorithm ima: Add smackfs to the default appraise/measure list evm: check for remount ro in progress before writing ima: Improvements in ima_appraise_measurement() ima: Simplify ima_eventsig_init() integrity: Remove unused macro IMA_ACTION_RULE_FLAGS ima: drop vla in ima_audit_measurement() ima: Fix Kconfig to select TPM 2.0 CRB interface evm: Constify *integrity_status_msg[] evm: Move evm_hmac and evm_hash from evm_main.c to evm_crypto.c fuse: define the filesystem as untrusted ima: fail signature verification based on policy ima: clear IMA_HASH ima: re-evaluate files on privileged mounted filesystems ima: fail file signature verification on non-init mounted filesystems IMA: Support using new creds in appraisal policy security: Add a cred_getsecid hook
2018-04-06Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "Assorted stuff, including Christoph's I_DIRTY patches" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: move I_DIRTY_INODE to fs.h ubifs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call ntfs: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) call gfs2: fix bogus __mark_inode_dirty(I_DIRTY_SYNC | I_DIRTY_DATASYNC) calls fs: fold open_check_o_direct into do_dentry_open vfs: Replace stray non-ASCII homoglyph characters with their ASCII equivalents vfs: make sure struct filename->iname is word-aligned get rid of pointless includes of fs_struct.h [poll] annotate SAA6588_CMD_POLL users
2018-04-05Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "It's a pretty quiet round this time, which is nice. This contains: - series from Bart, cleaning up the way we set/test/clear atomic queue flags. - series from Bart, fixing races between gendisk and queue registration and removal. - set of bcache fixes and improvements from various folks, by way of Michael Lyle. - set of lightnvm updates from Matias, most of it being the 1.2 to 2.0 transition. - removal of unused DIO flags from Nikolay. - blk-mq/sbitmap memory ordering fixes from Omar. - divide-by-zero fix for BFQ from Paolo. - minor documentation patches from Randy. - timeout fix from Tejun. - Alpha "can't write a char atomically" fix from Mikulas. - set of NVMe fixes by way of Keith. - bsg and bsg-lib improvements from Christoph. - a few sed-opal fixes from Jonas. - cdrom check-disk-change deadlock fix from Maurizio. - various little fixes, comment fixes, etc from various folks" * tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits) blk-mq: Directly schedule q->timeout_work when aborting a request blktrace: fix comment in blktrace_api.h lightnvm: remove function name in strings lightnvm: pblk: remove some unnecessary NULL checks lightnvm: pblk: don't recover unwritten lines lightnvm: pblk: implement 2.0 support lightnvm: pblk: implement get log report chunk lightnvm: pblk: rename ppaf* to addrf* lightnvm: pblk: check for supported version lightnvm: implement get log report chunk helpers lightnvm: make address conversions depend on generic device lightnvm: add support for 2.0 address format lightnvm: normalize geometry nomenclature lightnvm: complete geo structure with maxoc* lightnvm: add shorten OCSSD version in geo lightnvm: add minor version to generic geometry lightnvm: simplify geometry structure lightnvm: pblk: refactor init/exit sequences lightnvm: Avoid validation of default op value lightnvm: centralize permission check for lightnvm ioctl ...
2018-03-30fs, dax: prepare for dax-specific address_space_operationsDan Williams
In preparation for the dax implementation to start associating dax pages to inodes via page->mapping, we need to provide a 'struct address_space_operations' instance for dax. Define some generic VFS aops helpers for dax. These noop implementations are there in the dax case to prevent the VFS from falling back to operations with page-cache assumptions, dax_writeback_mapping_range() may not be referenced in the FS_DAX=n case. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Matthew Wilcox <mawilcox@microsoft.com> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-03-28fs: move I_DIRTY_INODE to fs.hChristoph Hellwig
And use it in a few more places rather than opencoding the values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-23ima: fail file signature verification on non-init mounted filesystemsMimi Zohar
FUSE can be mounted by unprivileged users either today with fusermount installed with setuid, or soon with the upcoming patches to allow FUSE mounts in a non-init user namespace. This patch addresses the new unprivileged non-init mounted filesystems, which are untrusted, by failing the signature verification. This patch defines two new flags SB_I_IMA_UNVERIFIABLE_SIGNATURE and SB_I_UNTRUSTED_MOUNTER. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Dongsu Park <dongsu@kinvolk.io> Cc: Alban Crequy <alban@kinvolk.io> Acked-by: Serge Hallyn <serge@hallyn.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-03-19buffer.c: call thaw_super during emergency thawMateusz Guzik
There are 2 distinct freezing mechanisms - one operates on block devices and another one directly on super blocks. Both end up with the same result, but thaw of only one of these does not thaw the other. In particular fsfreeze --freeze uses the ioctl variant going to the super block. Since prior to this patch emergency thaw was not doing a relevant thaw, filesystems frozen with this method remained unaffected. The patch is a hack which adds blind unfreezing. In order to keep the super block write-locked the whole time the code is shuffled around and the newly introduced __iterate_supers is employed. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-19vfs: make sure struct filename->iname is word-alignedRasmus Villemoes
I noticed that offsetof(struct filename, iname) is actually 28 on 64 bit platforms, so we always pass an unaligned pointer to strncpy_from_user. This is mostly a problem for those 64 bit platforms without HAVE_EFFICIENT_UNALIGNED_ACCESS, but even on x86_64, unaligned accesses carry a penalty. A user-space microbenchmark doing nothing but strncpy_from_user from the same (aligned) source string runs about 5% faster when the destination is aligned. That number increases to 20% when the string is long enough (~32 bytes) that we cross a cache line boundary - that's for example the case for about half the files a "git status" in a kernel tree ends up stat'ing. This won't make any real-life workloads 5%, or even 1%, faster, but path lookup is common enough that cutting even a few cycles should be worthwhile. So ensure we always pass an aligned destination pointer to strncpy_from_user. Instead of explicit padding, simply swap the refcnt and aname members, as suggested by Al Viro. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: - backport-friendly part of lock_parent() race fix - a fix for an assumption in the heurisic used by path_connected() that is not true on NFS - livelock fixes for d_alloc_parallel() * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs: Teach path_connected to handle nfs filesystems with multiple roots. fs: dcache: Use READ_ONCE when accessing i_dir_seq fs: dcache: Avoid livelock between d_alloc_parallel and __d_add lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
2018-03-15fs: Teach path_connected to handle nfs filesystems with multiple roots.Eric W. Biederman
On nfsv2 and nfsv3 the nfs server can export subsets of the same filesystem and report the same filesystem identifier, so that the nfs client can know they are the same filesystem. The subsets can be from disjoint directory trees. The nfsv2 and nfsv3 filesystems provides no way to find the common root of all directory trees exported form the server with the same filesystem identifier. The practical result is that in struct super s_root for nfs s_root is not necessarily the root of the filesystem. The nfs mount code sets s_root to the root of the first subset of the nfs filesystem that the kernel mounts. This effects the dcache invalidation code in generic_shutdown_super currently called shrunk_dcache_for_umount and that code for years has gone through an additional list of dentries that might be dentry trees that need to be freed to accomodate nfs. When I wrote path_connected I did not realize nfs was so special, and it's hueristic for avoiding calling is_subdir can fail. The practical case where this fails is when there is a move of a directory from the subtree exposed by one nfs mount to the subtree exposed by another nfs mount. This move can happen either locally or remotely. With the remote case requiring that the move directory be cached before the move and that after the move someone walks the path to where the move directory now exists and in so doing causes the already cached directory to be moved in the dcache through the magic of d_splice_alias. If someone whose working directory is in the move directory or a subdirectory and now starts calling .. from the initial mount of nfs (where s_root == mnt_root), then path_connected as a heuristic will not bother with the is_subdir check. As s_root really is not the root of the nfs filesystem this heuristic is wrong, and the path may actually not be connected and path_connected can fail. The is_subdir function might be cheap enough that we can call it unconditionally. Verifying that will take some benchmarking and the result may not be the same on all kernels this fix needs to be backported to. So I am avoiding that for now. Filesystems with snapshots such as nilfs and btrfs do something similar. But as the directory tree of the snapshots are disjoint from one another and from the main directory tree rename won't move things between them and this problem will not occur. Cc: stable@vger.kernel.org Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Fixes: 397d425dc26d ("vfs: Test for and handle paths that are unreachable from their mnt_root") Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-03-12direct-io: Remove unused DIO_SKIP_DIO_COUNT logicNikolay Borisov
This flag was added by fe0f07d08ee3 ("direct-io: only inc/deci inode->i_dio_count for file systems") as means to optimise the atomic modificaiton of the variable for blockdevices. However with the advent of 542ff7bf18c6 ("block: new direct I/O implementation") it became unused. So let's remove it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-12direct-io: Remove unused DIO_ASYNC_EXTEND flagNikolay Borisov
This flag was added by 6039257378e4 ("direct-io: add flag to allow aio writes beyond i_size") to support XFS. However, with the rework of XFS' DIO's path to use iomap in acdda3aae146 ("xfs: use iomap_dio_rw") it became redundant. So let's remove it. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-02-26dax: fix vma_is_fsdax() helperDan Williams
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on device-dax instances when those are meant to be explicitly allowed. Fixes: 2bb6d2837083 ("mm: introduce get_user_pages_longterm") Cc: <stable@vger.kernel.org> Reported-by: Gerd Rausch <gerd.rausch@oracle.com> Acked-by: Jane Chu <jane.chu@oracle.com> Reported-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2018-01-31Merge tag 'xfs-4.16-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "This merge cycle, we're again some substantive changes to XFS. Metadata verifiers have been restructured to provide more detail about which part of a metadata structure failed checks, and we've enhanced the new online fsck feature to cross-reference extent allocation information with the other metadata structures. With this pull, the metadata verification part of online fsck is more or less finished, though the feature is still experimental and still disabled by default. We're also preparing to remove the EXPERIMENTAL tag from a couple of features this cycle. This week we're committing a bunch of space accounting fixes for reflink and removing the EXPERIMENTAL tag from reflink; I anticipate that we'll be ready to do the same for the reverse mapping feature next week. (I don't have any pending fixes for rmap; however I wish to remove the tags one at a time.) This giant pile of patches has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported. Let me know if there's any merge problems -- git merge reported that one of our patches touched the same function as the i_version series, but it resolved things cleanly. Summary: - Log faulting code locations when verifiers fail, for improved diagnosis of corrupt filesystems. - Implement metadata verifiers for local format inode fork data. - Online scrub now cross-references metadata records with other metadata. - Refactor the fs geometry ioctl generation functions. - Harden various metadata verifiers. - Fix various accounting problems. - Fix uncancelled transactions leaking when xattr functions fail. - Prevent the copy-on-write speculative preallocation garbage collector from racing with writeback. - Emit log reservation type information as trace data so that we can compare against xfsprogs. - Fix some erroneous asserts in the online scrub code. - Clean up the transaction reservation calculations. - Fix various minor bugs in online scrub. - Log complaints about mixed dio/buffered writes once per day and less noisily than before. - Refactor buffer log item lists to use list_head. - Break PNFS leases before reflinking blocks. - Reduce lock contention on reflink source files. - Fix some quota accounting problems with reflink. - Fix a serious corruption problem in the direct cow write code where we fed bad iomaps to the vfs iomap consumers. - Various other refactorings. - Remove EXPERIMENTAL tag from reflink!" * tag 'xfs-4.16-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (94 commits) xfs: remove experimental tag for reflinks xfs: don't screw up direct writes when freesp is fragmented xfs: check reflink allocation mappings iomap: warn on zero-length mappings xfs: treat CoW fork operations as delalloc for quota accounting xfs: only grab shared inode locks for source file during reflink xfs: allow xfs_lock_two_inodes to take different EXCL/SHARED modes xfs: reflink should break pnfs leases before sharing blocks xfs: don't clobber inobt/finobt cursors when xref with rmap xfs: skip CoW writes past EOF when writeback races with truncate xfs: preserve i_rdev when recycling a reclaimable inode xfs: refactor accounting updates out of xfs_bmap_btalloc xfs: refactor inode verifier corruption error printing xfs: make tracepoint inode number format consistent xfs: always zero di_flags2 when we free the inode xfs: call xfs_qm_dqattach before performing reflink operations xfs: bmap code cleanup Use list_head infra-structure for buffer's log items list Split buffer's b_fspriv field Get rid of xfs_buf_log_item_t typedef ...
2018-01-31Merge branch 'work.misc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs updates from Al Viro: "All kinds of misc stuff, without any unifying topic, from various people. Neil's d_anon patch, several bugfixes, introduction of kvmalloc analogue of kmemdup_user(), extending bitfield.h to deal with fixed-endians, assorted cleanups all over the place..." * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits) alpha: osf_sys.c: use timespec64 where appropriate alpha: osf_sys.c: fix put_tv32 regression jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path dcache: delete unused d_hash_mask dcache: subtract d_hash_shift from 32 in advance fs/buffer.c: fold init_buffer() into init_page_buffers() fs: fold __inode_permission() into inode_permission() fs: add RWF_APPEND sctp: use vmemdup_user() rather than badly open-coding memdup_user() snd_ctl_elem_init_enum_names(): switch to vmemdup_user() replace_user_tlv(): switch to vmemdup_user() new primitive: vmemdup_user() memdup_user(): switch to GFP_USER eventfd: fold eventfd_ctx_get() into eventfd_ctx_fileget() eventfd: fold eventfd_ctx_read() into eventfd_read() eventfd: convert to use anon_inode_getfd() nfs4file: get rid of pointless include of btrfs.h uvc_v4l2: clean copyin/copyout up vme_user: don't use __copy_..._user() usx2y: don't bother with memdup_user() for 16-byte structure ...
2018-01-30Merge branch 'work.mqueue' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull mqueue/bpf vfs cleanups from Al Viro: "mqueue and bpf go through rather painful and similar contortions to create objects in their dentry trees. Provide a primitive for doing that without abusing ->mknod(), switch bpf and mqueue to it. Another mqueue-related thing that has ended up in that branch is on-demand creation of internal mount (based upon the work of Giuseppe Scrivano)" * 'work.mqueue' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: mqueue: switch to on-demand creation of internal mount tidy do_mq_open() up a bit mqueue: clean prepare_open() up do_mq_open(): move all work prior to dentry_open() into a helper mqueue: fold mq_attr_ok() into mqueue_get_inode() move dentry_open() calls up into do_mq_open() mqueue: switch to vfs_mkobj(), quit abusing ->d_fsdata bpf_obj_do_pin(): switch to vfs_mkobj(), quit abusing ->mknod() new primitive: vfs_mkobj()
2018-01-30Merge branch 'misc.poll' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull poll annotations from Al Viro: "This introduces a __bitwise type for POLL### bitmap, and propagates the annotations through the tree. Most of that stuff is as simple as 'make ->poll() instances return __poll_t and do the same to local variables used to hold the future return value'. Some of the obvious brainos found in process are fixed (e.g. POLLIN misspelled as POLL_IN). At that point the amount of sparse warnings is low and most of them are for genuine bugs - e.g. ->poll() instance deciding to return -EINVAL instead of a bitmap. I hadn't touched those in this series - it's large enough as it is. Another problem it has caught was eventpoll() ABI mess; select.c and eventpoll.c assumed that corresponding POLL### and EPOLL### were equal. That's true for some, but not all of them - EPOLL### are arch-independent, but POLL### are not. The last commit in this series separates userland POLL### values from the (now arch-independent) kernel-side ones, converting between them in the few places where they are copied to/from userland. AFAICS, this is the least disruptive fix preserving poll(2) ABI and making epoll() work on all architectures. As it is, it's simply broken on sparc - try to give it EPOLLWRNORM and it will trigger only on what would've triggered EPOLLWRBAND on other architectures. EPOLLWRBAND and EPOLLRDHUP, OTOH, are never triggered at all on sparc. With this patch they should work consistently on all architectures" * 'misc.poll' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (37 commits) make kernel-side POLL... arch-independent eventpoll: no need to mask the result of epi_item_poll() again eventpoll: constify struct epoll_event pointers debugging printk in sg_poll() uses %x to print POLL... bitmap annotate poll(2) guts 9p: untangle ->poll() mess ->si_band gets POLL... bitmap stored into a user-visible long field ring_buffer_poll_wait() return value used as return value of ->poll() the rest of drivers/*: annotate ->poll() instances media: annotate ->poll() instances fs: annotate ->poll() instances ipc, kernel, mm: annotate ->poll() instances net: annotate ->poll() instances apparmor: annotate ->poll() instances tomoyo: annotate ->poll() instances sound: annotate ->poll() instances acpi: annotate ->poll() instances crypto: annotate ->poll() instances block: annotate ->poll() instances x86: annotate ->poll() instances ...
2018-01-29xfs: only grab shared inode locks for source file during reflinkDarrick J. Wong
Reflink and dedupe operations remap blocks from a source file into a destination file. The destination file needs exclusive locks on all levels because we're updating its block map, but the source file isn't undergoing any block map changes so we can use a shared lock. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de>