summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2016-11-14ext4: fix stack memory corruption with 64k block sizeChandan Rajendra
The number of 'counters' elements needed in 'struct sg' is super_block->s_blocksize_bits + 2. Presently we have 16 'counters' elements in the array. This is insufficient for block sizes >= 32k. In such cases the memcpy operation performed in ext4_mb_seq_groups_show() would cause stack memory corruption. Fixes: c9de560ded61f Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
2016-11-14ext4: fix mballoc breakage with 64k block sizeChandan Rajendra
'border' variable is set to a value of 2 times the block size of the underlying filesystem. With 64k block size, the resulting value won't fit into a 16-bit variable. Hence this commit changes the data type of 'border' to 'unsigned int'. Fixes: c9de560ded61f Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org
2016-11-13ext4: don't lock buffer in ext4_commit_super if holding spinlockTheodore Ts'o
If there is an error reported in mballoc via ext4_grp_locked_error(), the code is holding a spinlock, so ext4_commit_super() must not try to lock the buffer head, or else it will trigger a BUG: BUG: sleeping function called from invalid context at ./include/linux/buffer_head.h:358 in_atomic(): 1, irqs_disabled(): 0, pid: 993, name: mount CPU: 0 PID: 993 Comm: mount Not tainted 4.9.0-rc1-clouder1 #62 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 ffff880006423548 ffffffff81318c89 ffffffff819ecdd0 0000000000000166 ffff880006423558 ffffffff810810b0 ffff880006423580 ffffffff81081153 ffff880006e5a1a0 ffff88000690e400 0000000000000000 ffff8800064235c0 Call Trace: [<ffffffff81318c89>] dump_stack+0x67/0x9e [<ffffffff810810b0>] ___might_sleep+0xf0/0x140 [<ffffffff81081153>] __might_sleep+0x53/0xb0 [<ffffffff8126c1dc>] ext4_commit_super+0x19c/0x290 [<ffffffff8126e61a>] __ext4_grp_locked_error+0x14a/0x230 [<ffffffff81081153>] ? __might_sleep+0x53/0xb0 [<ffffffff812822be>] ext4_mb_generate_buddy+0x1de/0x320 Since ext4_grp_locked_error() calls ext4_commit_super with sync == 0 (and it is the only caller which does so), avoid locking and unlocking the buffer in this case. This can result in races with ext4_commit_super() if there are other problems (which is what commit 4743f83990614 was trying to address), but a Warning is better than BUG. Fixes: 4743f83990614 Cc: stable@vger.kernel.org # 4.9 Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-13ext4: allow ext4_ext_truncate() to return an errorTheodore Ts'o
Return errors to the caller instead of declaring the file system corrupted. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-13ext4: allow ext4_truncate() to return an errorTheodore Ts'o
This allows us to properly propagate errors back up to ext4_truncate()'s callers. This also means we no longer have to silently ignore some errors (e.g., when trying to add the inode to the orphan inode list). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-13Merge branch 'fscrypt' into originTheodore Ts'o
2016-11-13Merge branch 'dax-4.10-iomap-pmd' into originTheodore Ts'o
2016-11-13fscrypto: don't use on-stack buffer for key derivationEric Biggers
With the new (in 4.9) option to use a virtually-mapped stack (CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for the scatterlist crypto API because they may not be directly mappable to struct page. get_crypt_info() was using a stack buffer to hold the output from the encryption operation used to derive the per-file key. Fix it by using a heap buffer. This bug could most easily be observed in a CONFIG_DEBUG_SG kernel because this allowed the BUG in sg_set_buf() to be triggered. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13fscrypto: don't use on-stack buffer for filename encryptionEric Biggers
With the new (in 4.9) option to use a virtually-mapped stack (CONFIG_VMAP_STACK), stack buffers cannot be used as input/output for the scatterlist crypto API because they may not be directly mappable to struct page. For short filenames, fname_encrypt() was encrypting a stack buffer holding the padded filename. Fix it by encrypting the filename in-place in the output buffer, thereby making the temporary buffer unnecessary. This bug could most easily be observed in a CONFIG_DEBUG_SG kernel because this allowed the BUG in sg_set_buf() to be triggered. Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13fscrypt: Let fs select encryption index/tweakDavid Gstir
Avoid re-use of page index as tweak for AES-XTS when multiple parts of same page are encrypted. This will happen on multiple (partial) calls of fscrypt_encrypt_page on same page. page->index is only valid for writeback pages. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13fscrypt: Constify struct inode pointerDavid Gstir
Some filesystems, such as UBIFS, maintain a const pointer for struct inode. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13fscrypt: Enable partial page encryptionDavid Gstir
Not all filesystems work on full pages, thus we should allow them to hand partial pages to fscrypt for en/decryption. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13fscrypt: Allow fscrypt_decrypt_page() to function with non-writeback pagesDavid Gstir
Some filesystem might pass pages which do not have page->mapping->host set to the encrypted inode. We want the caller to explicitly pass the corresponding inode. Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-13fscrypt: Add in-place encryption modeDavid Gstir
ext4 and f2fs require a bounce page when encrypting pages. However, not all filesystems will need that (eg. UBIFS). This is handled via a flag on fscrypt_operations where a fs implementation can select in-place encryption over using a bounce page (which is the default). Signed-off-by: David Gstir <david@sigma-star.at> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-10dax: Introduce IOMAP_FAULT flagJan Kara
Introduce a flag telling iomap operations whether they are handling a fault or other IO. That may influence behavior wrt inode size and similar things. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: remove "depends on BROKEN" from FS_DAX_PMDRoss Zwisler
Now that DAX PMD faults are once again working and are now participating in DAX's radix tree locking scheme, allow their config option to be enabled. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08xfs: use struct iomap based DAX PMD fault pathRoss Zwisler
Switch xfs_filemap_pmd_fault() from using dax_pmd_fault() to the new and improved dax_iomap_pmd_fault(). Also, now that it has no more users, remove xfs_get_blocks_dax_fault(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: add struct iomap based DAX PMD supportRoss Zwisler
DAX PMDs have been disabled since Jan Kara introduced DAX radix tree based locking. This patch allows DAX PMDs to participate in the DAX radix tree based locking scheme so that they can be re-enabled using the new struct iomap based fault handlers. There are currently three types of DAX 4k entries: 4k zero pages, 4k DAX mappings that have an associated block allocation, and 4k DAX empty entries. The empty entries exist to provide locking for the duration of a given page fault. This patch adds three equivalent 2MiB DAX entries: Huge Zero Page (HZP) entries, PMD DAX entries that have associated block allocations, and 2 MiB DAX empty entries. Unlike the 4k case where we insert a struct page* into the radix tree for 4k zero pages, for HZP we insert a DAX exceptional entry with the new RADIX_DAX_HZP flag set. This is because we use a single 2 MiB zero page in every 2MiB hole mapping, and it doesn't make sense to have that same struct page* with multiple entries in multiple trees. This would cause contention on the single page lock for the one Huge Zero Page, and it would break the page->index and page->mapping associations that are assumed to be valid in many other places in the kernel. One difficult use case is when one thread is trying to use 4k entries in radix tree for a given offset, and another thread is using 2 MiB entries for that same offset. The current code handles this by making the 2 MiB user fall back to 4k entries for most cases. This was done because it is the simplest solution, and because the use of 2MiB pages is already opportunistic. If we were to try to upgrade from 4k pages to 2MiB pages for a given range, we run into the problem of how we lock out 4k page faults for the entire 2MiB range while we clean out the radix tree so we can insert the 2MiB entry. We can solve this problem if we need to, but I think that the cases where both 2MiB entries and 4K entries are being used for the same range will be rare enough and the gain small enough that it probably won't be worth the complexity. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: move put_(un)locked_mapping_entry() in dax.cRoss Zwisler
No functional change. The static functions put_locked_mapping_entry() and put_unlocked_mapping_entry() will soon be used in error cases in grab_mapping_entry(), so move their definitions above this function. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: move RADIX_DAX_* defines to dax.hRoss Zwisler
The RADIX_DAX_* defines currently mostly live in fs/dax.c, with just RADIX_DAX_ENTRY_LOCK being in include/linux/dax.h so it can be used in mm/filemap.c. When we add PMD support, though, mm/filemap.c will also need access to the RADIX_DAX_PTE type so it can properly construct a 4k sized empty entry. Instead of shifting the defines between dax.c and dax.h as they are individually used in other code, just move them wholesale to dax.h so they'll be available when we need them. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: dax_iomap_fault() needs to call iomap_end()Ross Zwisler
Currently iomap_end() doesn't do anything for DAX page faults for both ext2 and XFS. ext2_iomap_end() just checks for a write underrun, and xfs_file_iomap_end() checks to see if it needs to finish a delayed allocation. However, in the future iomap_end() calls might be needed to make sure we have balanced allocations, locks, etc. So, add calls to iomap_end() with appropriate error handling to dax_iomap_fault(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: add dax_iomap_sector() helper functionRoss Zwisler
To be able to correctly calculate the sector from a file position and a struct iomap there is a complex little bit of logic that currently happens in both dax_iomap_actor() and dax_iomap_fault(). This will need to be repeated yet again in the DAX PMD fault handler when it is added, so break it out into a helper function. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: correct dax iomap code namespaceRoss Zwisler
The recently added DAX functions that use the new struct iomap data structure were named iomap_dax_rw(), iomap_dax_fault() and iomap_dax_actor(). These are actually defined in fs/dax.c, though, so should be part of the "dax" namespace and not the "iomap" namespace. Rename them to dax_iomap_rw(), dax_iomap_fault() and dax_iomap_actor() respectively. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Dave Chinner <david@fromorbit.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: remove dax_pmd_fault()Ross Zwisler
dax_pmd_fault() is the old struct buffer_head + get_block_t based 2 MiB DAX fault handler. This fault handler has been disabled for several kernel releases, and support for PMDs will be reintroduced using the struct iomap interface instead. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: coordinate locking for offsets in PMD rangeRoss Zwisler
DAX radix tree locking currently locks entries based on the unique combination of the 'mapping' pointer and the pgoff_t 'index' for the entry. This works for PTEs, but as we move to PMDs we will need to have all the offsets within the range covered by the PMD to map to the same bit lock. To accomplish this, for ranges covered by a PMD entry we will instead lock based on the page offset of the beginning of the PMD entry. The 'mapping' pointer is still used in the same way. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: consistent variable naming for DAX entriesRoss Zwisler
No functional change. Consistently use the variable name 'entry' instead of 'ret' for DAX radix tree entries. This was already happening in most of the code, so update get_unlocked_mapping_entry(), grab_mapping_entry() and dax_unlock_mapping_entry(). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: remove the last BUG_ON() from fs/dax.cRoss Zwisler
Don't take down the kernel if we get an invalid 'from' and 'length' argument pair. Just warn once and return an error. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: make 'wait_table' global variable staticRoss Zwisler
The global 'wait_table' variable is only used within fs/dax.c, and generates the following sparse warning: fs/dax.c:39:19: warning: symbol 'wait_table' was not declared. Should it be static? Make it static so it has scope local to fs/dax.c, and to make sparse happy. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08ext2: remove support for DAX PMD faultsRoss Zwisler
DAX PMD support was added via the following commit: commit e7b1ea2ad658 ("ext2: huge page fault support") I believe this path to be untested as ext2 doesn't reliably provide block allocations that are aligned to 2MiB. In my testing I've been unable to get ext2 to actually fault in a PMD. It always fails with a "pfn unaligned" message because the sector returned by ext2_get_block() isn't aligned. I've tried various settings for the "stride" and "stripe_width" extended options to mkfs.ext2, without any luck. Since we can't reliably get PMDs, remove support so that we don't have an untested code path that we may someday traverse when we happen to get an aligned block allocation. This should also make 4k DAX faults in ext2 a bit faster since they will no longer have to call the PMD fault handler only to get a response of VM_FAULT_FALLBACK. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08dax: remove buffer_size_valid()Ross Zwisler
Now that ext4 properly sets bh.b_size when we call get_block() for a hole, rely on that value and remove the buffer_size_valid() sanity check. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-08ext4: tell DAX the size of allocation holesRoss Zwisler
When DAX calls _ext4_get_block() and the file offset points to a hole we currently don't set bh->b_size. This is current worked around via buffer_size_valid() in fs/dax.c. _ext4_get_block() has the hole size information from ext4_map_blocks(), so populate bh->b_size so we can remove buffer_size_valid() in a later patch. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-11-04Merge tag 'nfsd-4.9-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd bugfixes from Bruce Fields: "Fixes for some recent regressions including fallout from the vmalloc'd stack change (after which we can no longer encrypt stuff on the stack)" * tag 'nfsd-4.9-1' of git://linux-nfs.org/~bfields/linux: nfsd: Fix general protection fault in release_lock_stateid() svcrdma: backchannel cannot share a page for send and rcv buffers sunrpc: fix some missing rq_rbuffer assignments sunrpc: don't pass on-stack memory to sg_set_buf nfsd: move blocked lock handling under a dedicated spinlock
2016-11-04Merge branch 'for-4.9-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from Chris Mason: "Some fixes that Dave Sterba collected. We held off on these last week because I was focused on the memory corruption testing" * 'for-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix WARNING in btrfs_select_ref_head() Btrfs: remove some no-op casts btrfs: pass correct args to btrfs_async_run_delayed_refs() btrfs: make file clone aware of fatal signals btrfs: qgroup: Prevent qgroup->reserved from going subzero Btrfs: kill BUG_ON in do_relocation
2016-11-04Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fixes from Miklos Szeredi: "Fix two more POSIX ACL bugs introduced in 4.8 and add a missing fsync during copy up to prevent possible data loss" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fsync after copy-up ovl: fix get_acl() on tmpfs ovl: update S_ISGID when setting posix ACLs
2016-11-01nfsd: Fix general protection fault in release_lock_stateid()Chuck Lever
When I push NFSv4.1 / RDMA hard, (xfstests generic/089, for example), I get this crash on the server: Oct 28 22:04:30 klimt kernel: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Oct 28 22:04:30 klimt kernel: Modules linked in: cts rpcsec_gss_krb5 iTCO_wdt iTCO_vendor_support sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm btrfs irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd xor pcspkr raid6_pq i2c_i801 i2c_smbus lpc_ich mfd_core sg mei_me mei ioatdma shpchp wmi ipmi_si ipmi_msghandler rpcrdma ib_ipoib rdma_ucm acpi_power_meter acpi_pad ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb ahci libahci ptp mlx4_core pps_core dca libata i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod Oct 28 22:04:30 klimt kernel: CPU: 7 PID: 1558 Comm: nfsd Not tainted 4.9.0-rc2-00005-g82cd754 #8 Oct 28 22:04:30 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015 Oct 28 22:04:30 klimt kernel: task: ffff880835c3a100 task.stack: ffff8808420d8000 Oct 28 22:04:30 klimt kernel: RIP: 0010:[<ffffffffa05a759f>] [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd] Oct 28 22:04:30 klimt kernel: RSP: 0018:ffff8808420dbce0 EFLAGS: 00010246 Oct 28 22:04:30 klimt kernel: RAX: ffff88084e6660f0 RBX: ffff88084e667020 RCX: 0000000000000000 Oct 28 22:04:30 klimt kernel: RDX: 0000000000000007 RSI: 0000000000000000 RDI: ffff88084e667020 Oct 28 22:04:30 klimt kernel: RBP: ffff8808420dbcf8 R08: 0000000000000001 R09: 0000000000000000 Oct 28 22:04:30 klimt kernel: R10: ffff880835c3a100 R11: ffff880835c3aca8 R12: 6b6b6b6b6b6b6b6b Oct 28 22:04:30 klimt kernel: R13: ffff88084e6670d8 R14: ffff880835f546f0 R15: ffff880835f1c548 Oct 28 22:04:30 klimt kernel: FS: 0000000000000000(0000) GS:ffff88087bdc0000(0000) knlGS:0000000000000000 Oct 28 22:04:30 klimt kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 28 22:04:30 klimt kernel: CR2: 00007ff020389000 CR3: 0000000001c06000 CR4: 00000000001406e0 Oct 28 22:04:30 klimt kernel: Stack: Oct 28 22:04:30 klimt kernel: ffff88084e667020 0000000000000000 ffff88084e6670d8 ffff8808420dbd20 Oct 28 22:04:30 klimt kernel: ffffffffa05ac80d ffff880835f54548 ffff88084e640008 ffff880835f545b0 Oct 28 22:04:30 klimt kernel: ffff8808420dbd70 ffffffffa059803d ffff880835f1c768 0000000000000870 Oct 28 22:04:30 klimt kernel: Call Trace: Oct 28 22:04:30 klimt kernel: [<ffffffffa05ac80d>] nfsd4_free_stateid+0xfd/0x1b0 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffffa059803d>] nfsd4_proc_compound+0x40d/0x690 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffffa0583114>] nfsd_dispatch+0xd4/0x1d0 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffffa047bbf9>] svc_process_common+0x3d9/0x700 [sunrpc] Oct 28 22:04:30 klimt kernel: [<ffffffffa047ca64>] svc_process+0xf4/0x330 [sunrpc] Oct 28 22:04:30 klimt kernel: [<ffffffffa05827ca>] nfsd+0xfa/0x160 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffffa05826d0>] ? nfsd_destroy+0x170/0x170 [nfsd] Oct 28 22:04:30 klimt kernel: [<ffffffff810b367b>] kthread+0x10b/0x120 Oct 28 22:04:30 klimt kernel: [<ffffffff810b3570>] ? kthread_stop+0x280/0x280 Oct 28 22:04:30 klimt kernel: [<ffffffff8174e8ba>] ret_from_fork+0x2a/0x40 Oct 28 22:04:30 klimt kernel: Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 8b 87 b0 00 00 00 48 89 fb 4c 8b a0 98 00 00 00 <49> 8b 44 24 20 48 8d b8 80 03 00 00 e8 10 66 1a e1 48 89 df e8 Oct 28 22:04:30 klimt kernel: RIP [<ffffffffa05a759f>] release_lock_stateid+0x1f/0x60 [nfsd] Oct 28 22:04:30 klimt kernel: RSP <ffff8808420dbce0> Oct 28 22:04:30 klimt kernel: ---[ end trace cf5d0b371973e167 ]--- Jeff Layton says: > Hm...now that I look though, this is a little suspicious: > > struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner); > > I wonder if it's possible for the openstateid to have already been > destroyed at this point. > > We might be better off doing something like this to get the client pointer: > > stp->st_stid.sc_client; > > ...which should be more direct and less dependent on other stateids > staying valid. With the suggested change, I am no longer able to reproduce the above oops. v2: Fix unhash_lock_stateid() as well Fix-suggested-by: Jeff Layton <jlayton@redhat.com> Fixes: 42691398be08 ('nfsd: Fix race between FREE_STATEID and LOCK') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-10-31ovl: fsync after copy-upMiklos Szeredi
Make sure the copied up file hits the disk before renaming to the final destination. If this is not done then the copy-up may corrupt the data in the file in case of a crash. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org>
2016-10-31ovl: fix get_acl() on tmpfsMiklos Szeredi
tmpfs doesn't have ->get_acl() because it only uses cached acls. This fixes the acl tests in pjdfstest when tmpfs is used as the upper layer of the overlay. Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 39a25b2b3762 ("ovl: define ->get_acl() for overlay inodes") Cc: <stable@vger.kernel.org> # v4.8
2016-10-31ovl: update S_ISGID when setting posix ACLsMiklos Szeredi
This change fixes xfstest generic/375, which failed to clear the setgid bit in the following test case on overlayfs: touch $testfile chown 100:100 $testfile chmod 2755 $testfile _runas -u 100 -g 101 -- setfacl -m u::rwx,g::rwx,o::rwx $testfile Reported-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Amir Goldstein <amir73il@gmail.com> Fixes: d837a49bd57f ("ovl: fix POSIX ACL setting") Cc: <stable@vger.kernel.org> # v4.8
2016-10-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Lots of fixes, mostly drivers as is usually the case. 1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey Khoroshilov. 2) Fix element timeouts in netfilter's nft_dynset, from Anders K. Pedersen. 3) Don't put aead_req crypto struct on the stack in mac80211, from Ard Biesheuvel. 4) Several uninitialized variable warning fixes from Arnd Bergmann. 5) Fix memory leak in cxgb4, from Colin Ian King. 6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann. 7) Several VRF semantic fixes from David Ahern. 8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper. 9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet. 10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev. 11) Fix stale link state during failover in NCSCI driver, from Gavin Shan. 12) Fix netdev lower adjacency list traversal, from Ido Schimmel. 13) Propvide proper handle when emitting notifications of filter deletes, from Jamal Hadi Salim. 14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen. 15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac. 16) Several routing offload fixes in mlxsw driver, from Jiri Pirko. 17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy. 18) Validate chunk len before using it in SCTP, from Marcelo Ricardo Leitner. 19) Revert a netns locking change that causes regressions, from Paul Moore. 20) Add recursion limit to GRO handling, from Sabrina Dubroca. 21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon. 22) Avoid accessing stale vxlan/geneve socket in data path, from Pravin Shelar" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits) geneve: avoid using stale geneve socket. vxlan: avoid using stale vxlan socket. qede: Fix out-of-bound fastpath memory access net: phy: dp83848: add dp83822 PHY support enic: fix rq disable tipc: fix broadcast link synchronization problem ibmvnic: Fix missing brackets in init_sub_crq_irqs ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context" arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold net/mlx4_en: Save slave ethtool stats command net/mlx4_en: Fix potential deadlock in port statistics flow net/mlx4: Fix firmware command timeout during interrupt test net/mlx4_core: Do not access comm channel if it has not yet been initialized net/mlx4_en: Fix panic during reboot net/mlx4_en: Process all completions in RX rings after port goes up net/mlx4_en: Resolve dividing by zero in 32-bit system net/mlx4_core: Change the default value of enable_qos net/mlx4_core: Avoid setting ports to auto when only one port type is supported net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec ...
2016-10-29Merge tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull ubi/ubifs fixes from Richard Weinberger: "This contains fixes for issues in both UBI and UBIFS: - A regression wrt overlayfs, introduced in -rc2. - An UBI issue, found by Dan Carpenter's static checker" * tag 'upstream-4.9-rc3' of git://git.infradead.org/linux-ubifs: ubifs: Fix regression in ubifs_readdir() ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()
2016-10-29Merge tag 'driver-core-4.9-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are two small driver core / kernfs fixes for 4.9-rc3. One makes the Kconfig entry for DEBUG_TEST_DRIVER_REMOVE a bit more explicit that this is a crazy thing to enable for a distro kernel (thanks for trying Fedora!), the other resolves an issue with vim opening kernfs files (sysfs, configfs, etc.) Both have been in linux-next with no reported issues" * tag 'driver-core-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: driver core: Make Kconfig text for DEBUG_TEST_DRIVER_REMOVE stronger kernfs: Add noop_fsync to supported kernfs_file_fops
2016-10-28Merge branch 'for-linus-4.9' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "My patch fixes the btrfs list_head abuse that we tracked down during Dave Jones' memory corruption investigation. With both Jens and my patches in place, I'm no longer able to trigger problems. Filipe is fixing a difficult old bug between snapshots, balance and send. Dave is cooking a few more for the next rc, but these are tested and ready" * 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix races on root_log_ctx lists btrfs: fix incremental send failure caused by balance
2016-10-28ubifs: Fix regression in ubifs_readdir()Richard Weinberger
Commit c83ed4c9dbb35 ("ubifs: Abort readdir upon error") broke overlayfs support because the fix exposed an internal error code to VFS. Reported-by: Peter Rosin <peda@axentia.se> Tested-by: Peter Rosin <peda@axentia.se> Reported-by: Ralph Sennhauser <ralph.sennhauser@gmail.com> Tested-by: Ralph Sennhauser <ralph.sennhauser@gmail.com> Fixes: c83ed4c9dbb35 ("ubifs: Abort readdir upon error") Cc: stable@vger.kernel.org Signed-off-by: Richard Weinberger <richard@nod.at>
2016-10-27Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "20 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: drivers/misc/sgi-gru/grumain.c: remove bogus 0x prefix from printk cris/arch-v32: cryptocop: print a hex number after a 0x prefix ipack: print a hex number after a 0x prefix block: DAC960: print a hex number after a 0x prefix fs: exofs: print a hex number after a 0x prefix lib/genalloc.c: start search from start of chunk mm: memcontrol: do not recurse in direct reclaim CREDITS: update credit information for Martin Kepplinger proc: fix NULL dereference when reading /proc/<pid>/auxv mm: kmemleak: ensure that the task stack is not freed during scanning lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB latent_entropy: raise CONFIG_FRAME_WARN by default kconfig.h: remove config_enabled() macro ipc: account for kmem usage on mqueue and msg mm/slab: improve performance of gathering slabinfo stats mm: page_alloc: use KERN_CONT where appropriate mm/list_lru.c: avoid error-path NULL pointer deref h8300: fix syscall restarting kcov: properly check if we are in an interrupt mm/slab: fix kmemcg cache creation delayed issue
2016-10-27fs: exofs: print a hex number after a 0x prefixUwe Kleine-König
It makes the message hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-2-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Boaz Harrosh <ooo@electrozaur.com> Cc: Benny Halevy <bhalevy@primarydata.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27proc: fix NULL dereference when reading /proc/<pid>/auxvLeon Yu
Reading auxv of any kernel thread results in NULL pointer dereferencing in auxv_read() where mm can be NULL. Fix that by checking for NULL mm and bailing out early. This is also the original behavior changed by recent commit c5317167854e ("proc: switch auxv to use of __mem_open()"). # cat /proc/2/auxv Unable to handle kernel NULL pointer dereference at virtual address 000000a8 Internal error: Oops: 17 [#1] PREEMPT SMP ARM CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1 Hardware name: BCM2709 task: ea3b0b00 task.stack: e99b2000 PC is at auxv_read+0x24/0x4c LR is at do_readv_writev+0x2fc/0x37c Process cat (pid: 113, stack limit = 0xe99b2210) Call chain: auxv_read do_readv_writev vfs_readv default_file_splice_read splice_direct_to_actor do_splice_direct do_sendfile SyS_sendfile64 ret_fast_syscall Fixes: c5317167854e ("proc: switch auxv to use of __mem_open()") Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com Signed-off-by: Leon Yu <chianglungyu@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Janis Danisevskis <jdanis@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-27Merge tag 'for-linus-4.9-rc2-ofs-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull oreangefs updates from Mike Marshall: "A couple of orangefs cleanups sent in by other developers: - use d_fsdata instead of d_time (Miklos Szeredi) - use file_inode(file) instead of file->f_path.dentry->d_inode (Amir Goldstein)" * tag 'for-linus-4.9-rc2-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: don't use d_time orangefs: user file_inode() where it is due
2016-10-27Merge tag 'xfs-fixes-for-linus-4.9-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "This update contains fixes for most of the outstanding regressions introduced with the 4.9-rc1 XFS merge. There is also a fix for an iomap bug, too. This is a quite a bit larger than I'd prefer for a -rc3, but most of the change comes from cleaning up the new reflink copy on write code; it's much simpler and easier to understand now. These changes fixed several bugs in the new code, and it wasn't clear that there was an easier/simpler way to fix them. The rest of the fixes are the usual size you'd expect at this stage. I've left the commits to soak in linux-next for a some extra time because of the size before asking you to pull, no new problems with them have been reported so I think it's all OK. Summary: - iomap page offset masking fix for page faults - add IOMAP_REPORT to distinguish between read and fiemap map requests - cleanups to new shared data extent code - fix mount active status on failed log recovery - fix broken dquots in a buffer calculation - fix locking order issues and merge xfs_reflink_remap_range and xfs_file_share_range - rework unmapping of CoW extents and remove now unused functions - clean state when CoW is done" * tag 'xfs-fixes-for-linus-4.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (25 commits) xfs: clear cowblocks tag when cow fork is emptied xfs: fix up inode cowblocks tracking tracepoints fs: Do to trim high file position bits in iomap_page_mkwrite_actor xfs: remove xfs_bunmapi_cow xfs: optimize xfs_reflink_end_cow xfs: optimize xfs_reflink_cancel_cow_blocks xfs: refactor xfs_bunmapi_cow xfs: optimize writes to reflink files xfs: don't bother looking at the refcount tree for reads xfs: handle "raw" delayed extents xfs_reflink_trim_around_shared xfs: add xfs_trim_extent iomap: add IOMAP_REPORT xfs: merge xfs_reflink_remap_range and xfs_file_share_range xfs: remove xfs_file_wait_for_io xfs: move inode locking from xfs_reflink_remap_range to xfs_file_share_range xfs: fix the same_inode check in xfs_file_share_range xfs: remove the same fs check from xfs_file_share_range libxfs: v3 inodes are only valid on crc-enabled filesystems libxfs: clean up _calc_dquots_per_chunk xfs: unset MS_ACTIVE if mount fails ...
2016-10-27btrfs: fix races on root_log_ctx listsChris Mason
btrfs_remove_all_log_ctxs takes a shortcut where it avoids walking the list because it knows all of the waiters are patiently waiting for the commit to finish. But, there's a small race where btrfs_sync_log can remove itself from the list if it finds a log commit is already done. Also, it uses list_del_init() to remove itself from the list, but there's no way to know if btrfs_remove_all_log_ctxs has already run, so we don't know for sure if it is safe to call list_del_init(). This gets rid of all the shortcuts for btrfs_remove_all_log_ctxs(), and just calls it with the proper locking. This is part two of the corruption fixed by cbd60aa7cd1. I should have done this in the first place, but convinced myself the optimizations were safe. A 12 hour run of dbench 2048 will eventually trigger a list debug WARN_ON for the list_del_init() in btrfs_sync_log(). Fixes: d1433debe7f4346cf9fc0dafc71c3137d2a97bc4 Reported-by: Dave Jones <davej@codemonkey.org.uk> cc: stable@vger.kernel.org # 3.15+ Signed-off-by: Chris Mason <clm@fb.com>
2016-10-27kernfs: Add noop_fsync to supported kernfs_file_fopsTony Luck
If you edit a kernfs backed file with vi(1), you see an ugly error message when you write the file because vi tries to fsync(2) the file after writing, which fails. We have noop_fsync() for this, use it. Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>