summaryrefslogtreecommitdiff
path: root/fs/ext4/namei.c
AgeCommit message (Collapse)Author
2012-12-27ext4: avoid hang when mounting non-journal filesystems with orphan listTheodore Ts'o
When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
2012-12-10ext4: Remove CONFIG_EXT4_FS_XATTRTao Ma
Ted has sent out a RFC about removing this feature. Eric and Jan confirmed that both RedHat and SUSE enable this feature in all their product. David also said that "As far as I know, it's enabled in all Android kernels that use ext4." So it seems OK for us. And what's more, as inline data depends its implementation on xattr, and to be frank, I don't run any test again inline data enabled while xattr disabled. So I think we should add inline data and remove this config option in the same release. [ The savings if you disable CONFIG_EXT4_FS_XATTR is only 27k, which isn't much in the grand scheme of things. Since no one seems to be testing this configuration except for some automated compile farms, on balance we are better removing this config option, and so that it is effectively always enabled. -- tytso ] Cc: David Brown <davidb@codeaurora.org> Cc: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: let ext4_rename handle inline dirTao Ma
In case we rename a directory, ext4_rename has to read the dir block and change its dotdot's information. The old ext4_rename encapsulated the dir_block read into itself. So this patch adds a new function ext4_get_first_dir_block() which gets the dir buffer information so the ext4_rename can handle it properly. As it will also change the parent inode number, we return the parent_de so that ext4_rename() can handle it more easily. ext4_find_entry is also changed so that the caller(rename) can tell whether the found entry is an inlined one or not and journaling the corresponding buffer head. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: let empty_dir handle inline dirTao Ma
empty_dir is used when deleting a dir. So it should handle inline dir properly. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: let ext4_delete_entry() handle inline dataTao Ma
Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: make ext4_delete_entry genericTao Ma
Currently ext4_delete_entry() is used only for dir entry removing from a dir block. So let us create a new function ext4_generic_delete_entry and this function takes a entry_buf and a buf_size so that it can be used for inline data. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: let ext4_find_entry handle inline dataTao Ma
Create a new function ext4_find_inline_entry() to handle the case of inline data. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: create a new function search_dirTao Ma
search_dirblock is used to search a dir block, but the code is almost the same for searching an inline dir. So create a new fuction search_dir and let search_dirblock call it. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: let add_dir_entry handle inline data properlyTao Ma
This patch let add_dir_entry handle the inline data case. So the dir is initialized as inline dir first and then we can try to add some files to it, when the inline space can't hold all the entries, a dir block will be created and the dir entry will be moved to it. Also for an inlined dir, "." and ".." are removed and we only use 4 bytes to store the parent inode number. These 2 entries will be added when we convert an inline dir to a block-based one. [ Folded in patch from Dan Carpenter to remove an unused variable. ] Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: create __ext4_insert_dentry for dir entry insertionTao Ma
The old add_dirent_to_buf handles all the work related to the work of adding dir entry to a dir block. Now we have inline data, so create 2 new function __ext4_find_dest_de and __ext4_insert_dentry that do the real work and let add_dirent_to_buf call them. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: refactor __ext4_check_dir_entry() to accept start and sizeTao Ma
The __ext4_check_dir_entry() function() is used to check whether the de is over the block boundary. Now with inline data, it could be within the block boundary while exceeds the inode size. So check this function to check the overflow more precisely. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-10ext4: make ext4_init_dot_dotdot for inline dir usageTao Ma
Currently, the initialization of dot and dotdot are encapsulated in ext4_mkdir and also bond with dir_block. So create a new function named ext4_init_new_dir and the initialization is moved to ext4_init_dot_dotdot. Now it will called either in the normal non-inline case(rec_len of ".." will cover the whole block) or when we converting an inline dir to a block(rec len of ".." will be the real length). The start of the next entry is also returned for inline dir usage. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-12ext4: don't verify checksums of dx non-leaf nodes during fallback scanDarrick J. Wong
During a directory entry lookup of a hashed directory, if the hash-based lookup functions fail and we fall back to a linear scan, don't try to verify the dirent checksum on the internal nodes of the hash tree because they don't store a checksum in a hidden dirent like the leaf nodes do. Reported-by: George Spelvin <linux@horizon.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-10ext4: do not use ext4_error() when there is no space in dir leaf for csumTheodore Ts'o
If there is no space for a checksum in a directory leaf node, previously we would use EXT4_ERROR_INODE() which would mark the file system as inconsistent. While it would be nice to use e2fsck -D, it certainly isn't required, so just print a warning using ext4_warning(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
2012-09-27ext4: ext4_bread usage auditCarlos Maiolino
When ext4_bread() returns NULL and err is set to zero, this means there is no phyical block mapped to the specified logical block number. (Previous to commit 90b0a97323, err was uninitialized in this case, which caused other problems.) The directory handling routines use ext4_bread() in many places, the fact that ext4_bread() now returns NULL with err set to zero could cause problems since a number of these functions will simply return the value of err if the result of ext4_bread() was the NULL pointer, causing the caller of the function to think that the function was successful. Since directories should never contain holes, this case can only happen if the file system is corrupted. This commit audits all of the callers of ext4_bread(), and makes sure they do the right thing if a hole in a directory is found by ext4_bread(). Some ext4_bread() callers did not need any changes either because they already had its own hole detector paths. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-26ext4: always set i_op in ext4_mknod()Bernd Schubert
ext4_special_inode_operations have their own ifdef CONFIG_EXT4_FS_XATTR to mask those methods. And ext4_iget also always sets it, so there is an inconsistency. Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
2012-09-18ext4: make orphan functions be no-op in no-journal modeAnatol Pomozov
Instead of checking whether the handle is valid, we check if journal is enabled. This avoids taking the s_orphan_lock mutex in all cases when there is no journal in use, including the error paths where ext4_orphan_del() is called with a handle set to NULL. Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-09-17ext4: fix possible non-initialized variable in htree_dirblock_to_tree()Carlos Maiolino
htree_dirblock_to_tree() declares a non-initialized 'err' variable, which is passed as a reference to another functions expecting them to set this variable with their error codes. It's passed to ext4_bread(), which then passes it to ext4_getblk(). If ext4_map_blocks() returns 0 due to a lookup failure, leaving the ext4_getblk() buffer_head uninitialized, it will make ext4_getblk() return to ext4_bread() without initialize the 'err' variable, and ext4_bread() will return to htree_dirblock_to_tree() with this variable still uninitialized. htree_dirblock_to_tree() will pass this variable with garbage back to ext4_htree_fill_tree(), which expects a number of directory entries added to the rb-tree. which, in case, might return a fake non-zero value due the garbage left in the 'err' variable, leading the kernel to an Oops in ext4_dx_readdir(), once this is expecting a filled rb-tree node, when in turn it will have a NULL-ed one, causing an invalid page request when trying to get a fname struct from this NULL-ed rb-tree node in this line: fname = rb_entry(info->curr_node, struct fname, rb_hash); The patch itself initializes the err variable in htree_dirblock_to_tree() to avoid usage mistakes by the called functions, and also fix ext4_getblk() to return a initialized 'err' variable when ext4_map_blocks() fails a lookup. Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-08-17ext4: add max_dir_size_kb mount optionTheodore Ts'o
Very large directories can cause significant performance problems, or perhaps even invoke the OOM killer, if the process is running in a highly constrained memory environment (whether it is VM's with a small amount of memory or in a small memory cgroup). So it is useful, in cloud server/data center environments, to be able to set a filesystem-wide cap on the maximum size of a directory, to ensure that directories never get larger than a sane size. We do this via a new mount option, max_dir_size_kb. If there is an attempt to grow the directory larger than max_dir_size_kb, the system call will return ENOSPC instead. Google-Bug-Id: 6863013 Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-07-27Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "The usual collection of bug fixes and optimizations. Perhaps of greatest note is a speed up for parallel, non-allocating DIO writes, since we no longer take the i_mutex lock in that case. For bug fixes, we fix an incorrect overhead calculation which caused slightly incorrect results for df(1) and statfs(2). We also fixed bugs in the metadata checksum feature." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits) ext4: undo ext4_calc_metadata_amount if we fail to claim space ext4: don't let i_reserved_meta_blocks go negative ext4: fix hole punch failure when depth is greater than 0 ext4: remove unnecessary argument from __ext4_handle_dirty_metadata() ext4: weed out ext4_write_super ext4: remove unnecessary superblock dirtying ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super() ext4: remove useless marking of superblock dirty ext4: fix ext4 mismerge back in January ext4: remove dynamic array size in ext4_chksum() ext4: remove unused variable in ext4_update_super() ext4: make quota as first class supported feature ext4: don't take the i_mutex lock when doing DIO overwrites ext4: add a new nolock flag in ext4_map_blocks ext4: split ext4_file_write into buffered IO and direct IO ext4: remove an unused statement in ext4_mb_get_buddy_page_lock() ext4: fix out-of-date comments in extents.c ext4: use s_csum_seed instead of i_csum_seed for xattr block ext4: use proper csum calculation in ext4_rename ext4: fix overhead calculation used by ext4_statfs() ...
2012-07-22ext4: remove unnecessary argument from __ext4_handle_dirty_metadata()Artem Bityutskiy
The '__ext4_handle_dirty_metadata()' does not need the 'now' argument anymore and we can kill it. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2012-07-23don't expose I_NEW inodes via dentry->d_inodeAl Viro
d_instantiate(dentry, inode); unlock_new_inode(inode); is a bad idea; do it the other way round... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14don't pass nameidata to ->create()Al Viro
boolean "does it have to be exclusive?" flag is passed instead; Local filesystem should just ignore it - the object is guaranteed not to be there yet. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-14stop passing nameidata to ->lookup()Al Viro
Just the flags; only NFS cares even about that, but there are legitimate uses for such argument. And getting rid of that completely would require splitting ->lookup() into a couple of methods (at least), so let's leave that alone for now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-09ext4: use proper csum calculation in ext4_renameTao Ma
In ext4_rename, when the old name is a dir, we need to change ".." to its new parent and journal the change, so with metadata_csum enabled, we have to re-calc the csum. As the first block of the dir can be either a htree root or a normal directory block and we have different csum calculation for these 2 types, we have to choose the right one in ext4_rename. btw, it is found by xfstests 013. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Darrick J. Wong <djwong@us.ibm.com>
2012-06-01Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull Ext4 updates from Theodore Ts'o: "The major new feature added in this update is Darrick J Wong's metadata checksum feature, which adds crc32 checksums to ext4's metadata fields. There is also the usual set of cleanups and bug fixes." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits) ext4: hole-punch use truncate_pagecache_range jbd2: use kmem_cache_zalloc wrapper instead of flag ext4: remove mb_groups before tearing down the buddy_cache ext4: add ext4_mb_unload_buddy in the error path ext4: don't trash state flags in EXT4_IOC_SETFLAGS ext4: let getattr report the right blocks in delalloc+bigalloc ext4: add missing save_error_info() to ext4_error() ext4: add debugging trigger for ext4_error() ext4: protect group inode free counting with group lock ext4: use consistent ssize_t type in ext4_file_write() ext4: fix format flag in ext4_ext_binsearch_idx() ext4: cleanup in ext4_discard_allocated_blocks() ext4: return ENOMEM when mounts fail due to lack of memory ext4: remove redundundant "(char *) bh->b_data" casts ext4: disallow hard-linked directory in ext4_lookup ext4: fix potential integer overflow in alloc_flex_gd() ext4: remove needs_recovery in ext4_mb_init() ext4: force ro mount if ext4_setup_super() fails ext4: fix potential NULL dereference in ext4_free_inodes_counts() ext4/jbd2: add metadata checksumming to the list of supported features ...
2012-05-28ext4: disallow hard-linked directory in ext4_lookupAndreas Dilger
A hard-linked directory to its parent can cause the VFS to deadlock, and is a sign of a corrupted file system. So detect this case in ext4_lookup(), before the rmdir() lockup scenario can take place. Signed-off-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2012-05-10vfs: make it possible to access the dentry hash/len as one 64-bit entryLinus Torvalds
This allows comparing hash and len in one operation on 64-bit architectures. Right now only __d_lookup_rcu() takes advantage of this, since that is the case we care most about. The use of anonymous struct/unions hides the alternate 64-bit approach from most users, the exception being a few cases where we initialize a 'struct qstr' with a static initializer. This makes the problematic cases use a new QSTR_INIT() helper function for that (but initializing just the name pointer with a "{ .name = xyzzy }" initializer remains valid, as does just copying another qstr structure). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-30ext4: remove unnecessary check in add_dirent_to_buf()Theodore Ts'o
None of this function callers ever pass in a NULL inode pointer, so this check is unnecessary, and the else clause is dead code. (This change should make the code coverage people a little happier. :-) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-04-29ext4: calculate and verify checksums of directory leaf blocksDarrick J. Wong
Calculate and verify the checksums for directory leaf blocks (i.e. blocks that only contain actual directory entries). The checksum lives in what looks to be an unused directory entry with a 0 name_len at the end of the block. This scheme is not used for internal htree nodes because the mechanism in place there only costs one dx_entry, whereas the "empty" directory entry would cost two dx_entries. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-04-29ext4: Calculate and verify checksums for htree nodesDarrick J. Wong
Calculate and verify the checksum for directory index tree (htree) node blocks. The checksum is stored in the last 4 bytes of the htree block and requires the dx_entry array to stop 1 dx_entry short of the end of the block. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-04-29ext4: calculate and verify superblock checksumDarrick J. Wong
Calculate and verify the superblock checksum. Since the UUID and block group number are embedded in each copy of the superblock, we need only checksum the entire block. Refactor some of the code to eliminate open-coding of the checksum update call. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-04-29ext4: change on-disk layout to support extended metadata checksummingDarrick J. Wong
Define flags and change structure definitions to allow checksumming of ext4 metadata. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-02-20ext4: format flag in dx_probe()Zheng Liu
Fix ext4_warning format flag in dx_probe(). CC: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-01-08ext[34]: avoid i_nlink warnings triggered by drop_nlink/inc_nlink kludge in ↵Al Viro
symlink() Both ext3 and ext4 put the half-created symlink inode into the orphan list for a while (see the comment in ext[34]_symlink() for gory details). Then, if everything went fine, they pull it out of the orphan list and bump the link count back to 1. The thing is, inc_nlink() is going to complain about seeing somebody changing i_nlink from 0 to 1. With a good reason, since normally something like that is a bug. Explicit set_nlink(inode, 1) does the same thing as inc_nlink() here, but it does *not* complain - exactly because it should be usable in strange situations like this one. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch ->mknod() to umode_tAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch ->create() to umode_tAl Viro
vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch vfs_mkdir() and ->mkdir() to umode_tAl Viro
vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-11-02Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: vfs: add d_prune dentry operation vfs: protect i_nlink filesystems: add set_nlink() filesystems: add missing nlink wrappers logfs: remove unnecessary nlink setting ocfs2: remove unnecessary nlink setting jfs: remove unnecessary nlink setting hypfs: remove unnecessary nlink setting vfs: ignore error on forced remount readlinkat: ensure we return ENOENT for the empty pathname for normal lookups vfs: fix dentry leak in simple_fill_super()
2011-11-02Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits) jbd2: Unify log messages in jbd2 code jbd/jbd2: validate sb->s_first in journal_get_superblock() ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled ext4: fix a typo in struct ext4_allocation_context ext4: Don't normalize an falloc request if it can fit in 1 extent. ext4: remove comments about extent mount option in ext4_new_inode() ext4: let ext4_discard_partial_buffers handle unaligned range correctly ext4: return ENOMEM if find_or_create_pages fails ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock() ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten ext4: optimize locking for end_io extent conversion ext4: remove unnecessary call to waitqueue_active() ext4: Use correct locking for ext4_end_io_nolock() ext4: fix race in xattr block allocation path ext4: trace punch_hole correctly in ext4_ext_map_blocks ext4: clean up AGGRESSIVE_TEST code ext4: move variables to their scope ext4: fix quota accounting during migration ext4: migrate cleanup ...
2011-11-02filesystems: add set_nlink()Miklos Szeredi
Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-11-02filesystems: add missing nlink wrappersMiklos Szeredi
Replace direct i_nlink updates with the respective updater function (inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2011-10-29ext4: fix quota accounting during migrationDmitry Monakhov
The tmp_inode should have same uid/gid as the original inode. Otherwise new metadata blocks will be accounted to wrong quota-id, which will result in a quota leak after the inode migration is completed. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-26ext4: avoid setting directory i_nlink to zeroAndreas Dilger
If a directory with more than EXT4_LINK_MAX subdirectories, the nlink count is set to 1. Subsequently, if any subdirectories are deleted, ext4_dec_count() decrements the i_nlink count, which may go to 0 temporarily before being incremented back to 1. While this is done under i_mutex, which prevents races for directory and inode operations that check i_nlink, the temporary i_nlink == 0 case is exposed to userspace via stat() and similar calls that do not hold i_mutex. Instead, change the code to not decrement i_nlink count for any directories that do not already have i_nlink larger than 2. Reported-by: Cliff White <cliffw@whamcloud.com> Reviewed-by: Johann Lombardi <johann@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-08-31ext4: call ext4_handle_dirty_metadata with correct inode in ext4_dx_add_entryTheodore Ts'o
ext4_dx_add_entry manipulates bh2 and frames[0].bh, which are two buffer_heads that point to directory blocks assigned to the directory inode. However, the function calls ext4_handle_dirty_metadata with the inode of the file that's being added to the directory, not the directory inode itself. Therefore, correct the code to dirty the directory buffers with the directory inode, not the file inode. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-31ext4: ext4_mkdir should dirty dir_block with newly created directory inodeDarrick J. Wong
ext4_mkdir calls ext4_handle_dirty_metadata with dir_block and the inode "dir". Unfortunately, dir_block belongs to the newly created directory (which is "inode"), not the parent directory (which is "dir"). Fix the incorrect association. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-31ext4: ext4_rename should dirty dir_bh with the correct directoryDarrick J. Wong
When ext4_rename performs a directory rename (move), dir_bh is a buffer that is modified to update the '..' link in the directory being moved (old_inode). However, ext4_handle_dirty_metadata is called with the old parent directory inode (old_dir) and dir_bh, which is incorrect because dir_bh does not belong to the parent inode. Fix this error. Signed-off-by: Darrick J. Wong <djwong@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
2011-08-23block: separate priority boosting from REQ_METAChristoph Hellwig
Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule, and lave REQ_META purely for marking requests as metadata in blktrace. All existing callers of REQ_META except for XFS are updated to also set REQ_PRIO for now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-23block: remove READ_META and WRITE_METAChristoph Hellwig
Replace all occurnanced of the undocumented READ_META with READ | REQ_META and remove the unused WRITE_META define. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-08-11ext4: Properly count journal credits for long symlinksEric Sandeen
Commit df5e6223407e ("ext4: fix deadlock in ext4_symlink() in ENOSPC conditions") recalculated the number of credits needed for a long symlink, in the process of splitting it into two transactions. However, the first credit calculation under-counted because if selinux is enabled, credits are needed to create the selinux xattr as well. Overrunning the reservation will result in an OOPS in jbd2_journal_dirty_metadata() due to this assert: J_ASSERT_JH(jh, handle->h_buffer_credits > 0); Fix this by increasing the reservation size. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>