summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2010-02-28exofs: RAID0 supportBoaz Harrosh
We now support striping over mirror devices. Including variable sized stripe_unit. Some limits: * stripe_unit must be a multiple of PAGE_SIZE * stripe_unit * stripe_count is maximum upto 32-bit (4Gb) Tested RAID0 over mirrors, RAID0 only, mirrors only. All check. Design notes: * I'm not using a vectored raid-engine mechanism yet. Following the pnfs-objects-layout data-map structure, "Mirror" is just a private case of "group_width" == 1, and RAID0 is a private case of "Mirrors" == 1. The performance lose of the general case over the particular special case optimization is totally negligible, also considering the extra code size. * In general I added a prepare_stripes() stage that divides the to-be-io pages to the participating devices, the previous exofs_ios_write/read, now becomes _write/read_mirrors and a new write/read upper layer loops on all devices calling _write/read_mirrors. Effectively the prepare_stripes stage is the all secret. Also truncate need fixing to accommodate for striping. * In a RAID0 arrangement, in a regular usage scenario, if all inode layouts will start at the same device, the small files fill up the first device and the later devices stay empty, the farther the device the emptier it is. To fix that, each inode will start at a different stripe_unit, according to it's obj_id modulus number-of-stripe-units. And will then span all stripe-units in the same incrementing order wrapping back to the beginning of the device table. We call it a stripe-units moving window. Special consideration was taken to keep all devices in a mirror arrangement identical. So a broken osd-device could just be cloned from one of the mirrors and no FS scrubbing is needed. (We do that by rotating stripe-unit at a time and not a single device at a time.) TODO: We no longer verify object_length == inode->i_size in exofs_iget. (since i_size is stripped on multiple objects now). I should introduce a multiple-device attribute reading, and use it in exofs_iget. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28exofs: Define on-disk per-inode optional layout attributeBoaz Harrosh
* Layouts describe the way a file is spread on multiple devices. The layout information is stored in the objects attribute introduced in this patch. * There can be multiple generating function for the layout. Currently defined: - No attribute present - use below moving-window on global device table, all devices. (This is the only one currently used in exofs) - an obj_id generated moving window - the obj_id is a randomizing factor in the otherwise global map layout. - An explicit layout stored, including a data_map and a device index list. - More might be defined in future ... * There are two attributes defined of the same structure: A-data-files-layout - This layout is used by data-files. If present at a directory, all files of that directory will be created with this layout. A-meta-data-layout - This layout is used by a directory and other meta-data information. Also inherited at creation of subdirectories. * At creation time inodes are created with the layout specified above. A usermode utility may change the creation layout on a give directory or file. Which in the case of directories, will also apply to newly created files/subdirectories, children of that directory. In the simple unaltered case of a newly created exofs, no layout attributes are present, and all layouts adhere to the layout specified at the device-table. * In case of a future file system loaded in an old exofs-driver. At iget(), the generating_function is inspected and if not supported will return an IO error to the application and the inode will not be loaded. So not to damage any data. Note: After this patch we do not yet support any type of layout only the RAID0 patch that enables striping at the super-block level will add support for RAID0 layouts above. This way we are past and future compatible and fully bisectable. * Access to the device table is done by an accessor since it will change according to above information. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28exofs: unindent exofs_sbi_readBoaz Harrosh
The original idea was that a mirror read can be sub-divided to multiple devices. But this has very little gain and only at very large IOes so it's not going to be implemented soon. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28exofs: Move layout related members to a layout structureBoaz Harrosh
* Abstract away those members in exofs_sb_info that are related/needed by a layout into a new exofs_layout structure. Embed it in exofs_sb_info. * At exofs_io_state receive/keep a pointer to an exofs_layout. No need for an exofs_sb_info pointer, all we need is at exofs_layout. * Change any usage of above exofs_sb_info members to their new name. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28exofs: Recover in the case of read-passed-end-of-fileBoaz Harrosh
In check_io, implement the case of reading passed end of file, by clearing the pages and recover with no error. In a raid arrangement this can become a legitimate situation in case of holes in the file. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28exofs: Micro-optimize exofs_i_infoBoaz Harrosh
optimize the exofs_i_info struct usage by moving the embedded vfs_inode to be first. A compiler might optimize away an "add" operation with constant zero. (Which it cannot with other constants) Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28exofs: debug print even lessBoaz Harrosh
* Last debug trimming left in some stupid print, remove them. Fixup some other prints * Shift printing from inode.c to ios.c * Add couple of prints when memory allocation fails. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-01-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: check total number of devices when removing missing Btrfs: check return value of open_bdev_exclusive properly Btrfs: do not mark the chunk as readonly if in degraded mode Btrfs: run orphan cleanup on default fs root Btrfs: fix a memory leak in btrfs_init_acl Btrfs: Use correct values when updating inode i_size on fallocate Btrfs: remove tree_search() in extent_map.c Btrfs: Add mount -o compress-force
2010-01-29Split 'flush_old_exec' into two functionsLinus Torvalds
'flush_old_exec()' is the point of no return when doing an execve(), and it is pretty badly misnamed. It doesn't just flush the old executable environment, it also starts up the new one. Which is very inconvenient for things like setting up the new personality, because we want the new personality to affect the starting of the new environment, but at the same time we do _not_ want the new personality to take effect if flushing the old one fails. As a result, the x86-64 '32-bit' personality is actually done using this insane "I'm going to change the ABI, but I haven't done it yet" bit (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the personality, but just the "pending" bit, so that "flush_thread()" can do the actual personality magic. This patch in no way changes any of that insanity, but it does split the 'flush_old_exec()' function up into a preparatory part that can fail (still called flush_old_exec()), and a new part that will actually set up the new exec environment (setup_new_exec()). All callers are changed to trivially comply with the new world order. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-28Btrfs: check total number of devices when removing missingJosef Bacik
If you have a disk failure in RAID1 and then add a new disk to the array, and then try to remove the missing volume, it will fail. The reason is the sanity check only looks at the total number of rw devices, which is just 2 because we have 2 good disks and 1 bad one. Instead check the total number of devices in the array to make sure we can actually remove the device. Tested this with a failed disk setup and with this test we can now run btrfs-vol -r missing /mount/point and it works fine. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: check return value of open_bdev_exclusive properlyJosef Bacik
Hit this problem while testing RAID1 failure stuff. open_bdev_exclusive returns ERR_PTR(), not NULL. So change the return value properly. This is important if you accidently specify a device that doesn't exist when trying to add a new device to an array, you will panic the box dereferencing bdev. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: do not mark the chunk as readonly if in degraded modeJosef Bacik
If a RAID setup has chunks that span multiple disks, and one of those disks has failed, btrfs_chunk_readonly will return 1 since one of the disks in that chunk's stripes is dead and therefore not writeable. So instead if we are in degraded mode, return 0 so we can go ahead and allocate stuff. Without this patch all of the block groups in a RAID1 setup will end up read-only, which will mean we can't add new disks to the array since we won't be able to make allocations. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: run orphan cleanup on default fs rootJosef Bacik
This patch revert's commit 6c090a11e1c403b727a6a8eff0b97d5fb9e95cb5 Since it introduces this problem where we can run orphan cleanup on a volume that can have orphan entries re-added. Instead of my original fix, Yan Zheng pointed out that we can just revert my original fix and then run the orphan cleanup in open_ctree after we look up the fs_root. I have tested this with all the tests that gave me problems and this patch fixes both problems. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: fix a memory leak in btrfs_init_aclYang Hongyang
In btrfs_init_acl() cloned acl is not released Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: Use correct values when updating inode i_size on fallocateAneesh Kumar K.V
commit f2bc9dd07e3424c4ec5f3949961fe053d47bc825 Author: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Date: Wed Jan 20 12:57:53 2010 +0530 Btrfs: Use correct values when updating inode i_size on fallocate Even though we allocate more, we should be updating inode i_size as per the arguments passed Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: remove tree_search() in extent_map.cMiao Xie
This patch removes tree_search() in extent_map.c because it is not called by anything. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-28Btrfs: Add mount -o compress-forceChris Mason
The default btrfs mount -o compress mode will quickly back off compressing a file if it notices that compression does not reduce the size of the data being written. This can save considerable CPU because all future writes to the file go through uncompressed. But some files are both very large and have mixed data stored in them. In that case, we want to add the ability to always try compressing data before writing it. This commit adds mount -o compress-force. A later commit will add a new inode flag that does the same thing. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-26fix oops in fs/9p late mount failureAl Viro
if 9P ->get_sb() fails late (at root inode or root dentry allocation), we'll hit its ->kill_sb() with NULL ->s_root Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26fix leak in romfs_fill_super()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26get rid of pointless checks after simple_pin_fs()Al Viro
if we'd just got success from it, vfsmount won't be NULL Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26Fix failure exits in bfs_fill_super()Al Viro
double iput(), leaks... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26fix affs parse_options()Al Viro
Error handling in that sucker got broken back in 2003. If function returns 0 on failure, it's not nice to add return -EINVAL into it. Adding return 1 on other failure exits is also not a good thing (and yes, original success exits with 1 and some of failure exits with 0 are still there; so's the original logics in callers). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26Fix remount races with symlink handling in affsAl Viro
A couple of fields in affs_sb_info is used in follow_link() and symlink() for handling AFFS "absolute" symlinks. Need locking against affs_remount() updates. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26Fix a leak in affs_fill_super()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-01-26fnctl: f_modown should call write_lock_irqsave/restoreGreg Kroah-Hartman
Commit 703625118069f9f8960d356676662d3db5a9d116 exposed that f_modown() should call write_lock_irqsave instead of just write_lock_irq so that because a caller could have a spinlock held and it would not be good to renable interrupts. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Tavis Ormandy <taviso@google.com> Cc: stable <stable@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-25Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Drop EXT4_GET_BLOCKS_UPDATE_RESERVE_SPACE flag ext4: Fix quota accounting error with fallocate ext4: Handle -EDQUOT error on write
2010-01-25eventfd - allow atomic read and waitqueue removeDavide Libenzi
KVM needs a wait to atomically remove themselves from the eventfd ->poll() wait queue head, in order to handle correctly their IRQfd deassign operation. This patch introduces such API, plus a way to read an eventfd from its context. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Avi Kivity <avi@redhat.com>
2010-01-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: tty: fix race in tty_fasync serial: serial_cs: oxsemi quirk breaks resume serial: imx: bit &/| confusion serial: Fix crash if the minimum rate of the device is > 9600 baud serial-core: resume serial hardware with no_console_suspend serial: 8250_pnp: use wildcard for serial Wacom tablets nozomi: quick fix for the close/close bug compat_ioctl: Supress "unknown cmd" message on serial /dev/console
2010-01-21Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-linus' of git://git.kernel.dk/linux-2.6-block: fs/bio.c: fix shadows sparse warning drbd: The kernel code is now equivalent to out of tree release 8.3.7 drbd: Allow online resizing of DRBD devices while peer not reachable (needs to be explicitly forced) drbd: Don't go into StandAlone mode when authentification failes because of network error drivers/block/drbd/drbd_receiver.c: correct NULL test cfq-iosched: Respect ioprio_class when preempting genhd: overlapping variable definition block: removed unused as_io_context DM: Fix device mapper topology stacking block: bdev_stack_limits wrapper block: Fix discard alignment calculation and printing block: Correct handling of bottom device misaligment drbd: check on CONFIG_LBDAF, not LBD drivers/block/drbd: Correct NULL test drbd: Silenced an assert that could triggered after changing write ordering method drbd: Kconfig fix drbd: Fix for a race between IO and a detach operation [Bugz 262] drbd: Use drbd_crypto_is_hash() instead of an open coded check
2010-01-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6: ecryptfs: use after free ecryptfs: Eliminate useless code ecryptfs: fix interpose/interpolate typos in comments ecryptfs: pass matching flags to interpose as defined and used there ecryptfs: remove unnecessary d_drop calls in ecryptfs_link ecryptfs: don't ignore return value from lock_rename ecryptfs: initialize private persistent file before dereferencing pointer eCryptfs: Remove mmap from directory operations eCryptfs: Add getattr function eCryptfs: Use notify_change for truncating lower inodes
2010-01-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstableLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix possible panic on unmount Btrfs: deal with NULL acl sent to btrfs_set_acl Btrfs: fix regression in orphan cleanup Btrfs: Fix race in btrfs_mark_extent_written Btrfs, fix memory leaks in error paths Btrfs: align offsets for btrfs_ordered_update_i_size btrfs: fix missing last-entry in readdir(3)
2010-01-20compat_ioctl: Supress "unknown cmd" message on serial /dev/consoleAtsushi Nemoto
After the commit fb07a5f8 ("compat_ioctl: remove all VT ioctl handling"), I got this error message on 64-bit mips kernel with 32-bit busybox userland: ioctl32(init:1): Unknown cmd fd(0) cmd(00005600){t:'V';sz:0} arg(7fd76480) on /dev/console The cmd 5600 is VT_OPENQRY. The busybox's init issues this ioctl to know vt-console or serial-console. If the console was serial console, VT ioctls are not handled by the serial driver. And by quick search, I found some programs using VT_GETMODE to check vt-console is available or not. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-01-19ecryptfs: use after freeDan Carpenter
The "full_alg_name" variable is used on a couple error paths, so we shouldn't free it until the end. Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: stable@kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19ecryptfs: Eliminate useless codeJulia Lawall
The variable lower_dentry is initialized twice to the same (side effect-free) expression. Drop one initialization. A simplified version of the semantic match that finds this problem is: (http://coccinelle.lip6.fr/) // <smpl> @forall@ idexpression *x; identifier f!=ERR_PTR; @@ x = f(...) ... when != x ( x = f(...,<+...x...+>,...) | * x = f(...) ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19ecryptfs: fix interpose/interpolate typos in commentsErez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Acked-by: Dustin Kirkland <kirkland@canonical.com> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19ecryptfs: pass matching flags to interpose as defined and used thereErez Zadok
ecryptfs_interpose checks if one of the flags passed is ECRYPTFS_INTERPOSE_FLAG_D_ADD, defined as 0x00000001 in ecryptfs_kernel.h. But the only user of ecryptfs_interpose to pass a non-zero flag to it, has hard-coded the value as "1". This could spell trouble if any of these values changes in the future. Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19ecryptfs: remove unnecessary d_drop calls in ecryptfs_linkErez Zadok
Unnecessary because it would unhash perfectly valid dentries, causing them to have to be re-looked up the next time they're needed, which presumably is right after. Signed-off-by: Aseem Rastogi <arastogi@cs.sunysb.edu> Signed-off-by: Shrikar archak <shrikar84@gmail.com> Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Cc: Saumitra Bhanage <sbhanage@cs.sunysb.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19ecryptfs: don't ignore return value from lock_renameErez Zadok
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19ecryptfs: initialize private persistent file before dereferencing pointerErez Zadok
Ecryptfs_open dereferences a pointer to the private lower file (the one stored in the ecryptfs inode), without checking if the pointer is NULL. Right afterward, it initializes that pointer if it is NULL. Swap order of statements to first initialize. Bug discovered by Duckjin Kang. Signed-off-by: Duckjin Kang <fromdj2k@gmail.com> Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu> Cc: Dustin Kirkland <kirkland@canonical.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@kernel.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19eCryptfs: Remove mmap from directory operationsTyler Hicks
Adrian reported that mkfontscale didn't work inside of eCryptfs mounts. Strace revealed the following: open("./", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3 fcntl64(3, F_GETFD) = 0x1 (flags FD_CLOEXEC) open("./fonts.scale", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 4 getdents(3, /* 80 entries */, 32768) = 2304 open("./.", O_RDONLY) = 5 fcntl64(5, F_SETFD, FD_CLOEXEC) = 0 fstat64(5, {st_mode=S_IFDIR|0755, st_size=16384, ...}) = 0 mmap2(NULL, 16384, PROT_READ, MAP_PRIVATE, 5, 0) = 0xb7fcf000 close(5) = 0 --- SIGBUS (Bus error) @ 0 (0) --- +++ killed by SIGBUS +++ The mmap2() on a directory was successful, resulting in a SIGBUS signal later. This patch removes mmap() from the list of possible ecryptfs_dir_fops so that mmap() isn't possible on eCryptfs directory files. https://bugs.launchpad.net/ecryptfs/+bug/400443 Reported-by: Adrian C. <anrxc@sysphere.org> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19eCryptfs: Add getattr functionTyler Hicks
The i_blocks field of an eCryptfs inode cannot be trusted, but generic_fillattr() uses it to instantiate the blocks field of a stat() syscall when a filesystem doesn't implement its own getattr(). Users have noticed that the output of du is incorrect on newly created files. This patch creates ecryptfs_getattr() which calls into the lower filesystem's getattr() so that eCryptfs can use its kstat.blocks value after calling generic_fillattr(). It is important to note that the block count includes the eCryptfs metadata stored in the beginning of the lower file plus any padding used to fill an extent before encryption. https://bugs.launchpad.net/ecryptfs/+bug/390833 Reported-by: Dominic Sacré <dominic.sacre@gmx.de> Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19eCryptfs: Use notify_change for truncating lower inodesTyler Hicks
When truncating inodes in the lower filesystem, eCryptfs directly invoked vmtruncate(). As Christoph Hellwig pointed out, vmtruncate() is a filesystem helper function, but filesystems may need to do more than just a call to vmtruncate(). This patch moves the lower inode truncation out of ecryptfs_truncate() and renames the function to truncate_upper(). truncate_upper() updates an iattr for the lower inode to indicate if the lower inode needs to be truncated upon return. ecryptfs_setattr() then calls notify_change(), using the updated iattr for the lower inode, to complete the truncation. For eCryptfs functions needing to truncate, ecryptfs_truncate() is reintroduced as a simple way to truncate the upper inode to a specified size and then truncate the lower inode accordingly. https://bugs.launchpad.net/bugs/451368 Reported-by: Christoph Hellwig <hch@lst.de> Acked-by: Dustin Kirkland <kirkland@canonical.com> Cc: ecryptfs-devel@lists.launchpad.net Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
2010-01-19fs/bio.c: fix shadows sparse warningThiago Farina
fs/bio.c:81:33: warning: symbol 'bslab' shadows an earlier one fs/bio.c:74:25: originally declared here Signed-off-by: Thiago Farina <tfransosi@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-01-18Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds
* 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: xfs_swap_extents needs to handle dynamic fork offsets xfs: fix missing error check in xfs_rtfree_range xfs: fix stale inode flush avoidance xfs: Remove inode iolock held check during allocation xfs: reclaim all inodes by background tree walks xfs: Avoid inodes in reclaim when flushing from inode cache xfs: reclaim inodes under a write lock
2010-01-17Btrfs: fix possible panic on unmountJosef Bacik
We can race with the unmount of an fs and the stopping of a kthread where we will free the block group before we're done using it. The reason for this is because we do not hold a reference on the block group while its caching, since the allocator drops its reference once it exits or moves on to the next block group. This patch fixes the problem by taking a reference to the block group before we start caching and dropping it when we're done to make sure all accesses to the block group are safe. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-17Btrfs: deal with NULL acl sent to btrfs_set_aclChris Mason
It is legal for btrfs_set_acl to be sent a NULL acl. This makes sure we don't dereference it. A similar patch was sent by Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-17Btrfs: fix regression in orphan cleanupJosef Bacik
Currently orphan cleanup only ever gets triggered if we cross subvolumes during a lookup, which means that if we just mount a plain jane fs that has orphans in it, they will never get cleaned up. This results in panic's like these http://www.kerneloops.org/oops.php?number=1109085 where adding an orphan entry results in -EEXIST being returned and we panic. In order to fix this, we check to see on lookup if our root has had the orphan cleanup done, and if not go ahead and do it. This is easily reproduceable by running this testcase #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <unistd.h> #include <stdio.h> int main(int argc, char **argv) { char data[4096]; char newdata[4096]; int fd1, fd2; memset(data, 'a', 4096); memset(newdata, 'b', 4096); while (1) { int i; fd1 = creat("file1", 0666); if (fd1 < 0) break; for (i = 0; i < 512; i++) write(fd1, data, 4096); fsync(fd1); close(fd1); fd2 = creat("file2", 0666); if (fd2 < 0) break; ftruncate(fd2, 4096 * 512); for (i = 0; i < 512; i++) write(fd2, newdata, 4096); close(fd2); i = rename("file2", "file1"); unlink("file1"); } return 0; } and then pulling the power on the box, and then trying to run that test again when the box comes back up. I've tested this locally and it fixes the problem. Thanks to Tomas Carnecky for helping me track this down initially. Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-17Btrfs: Fix race in btrfs_mark_extent_writtenYan, Zheng
Fix bug reported by Johannes Hirte. The reason of that bug is btrfs_del_items is called after btrfs_duplicate_item and btrfs_del_items triggers tree balance. The fix is check that case and call btrfs_search_slot when needed. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-17Btrfs, fix memory leaks in error pathsJiri Slaby
Stanse found 2 memory leaks in relocate_block_group and __btrfs_map_block. cluster and multi are not freed/assigned on all paths. Fix that. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: linux-btrfs@vger.kernel.org Signed-off-by: Chris Mason <chris.mason@oracle.com>
2010-01-17Btrfs: align offsets for btrfs_ordered_update_i_sizeYan, Zheng
Some callers of btrfs_ordered_update_i_size can now pass in a NULL for the ordered extent to update against. This makes sure we properly align the offset they pass in when deciding how much to bump the on disk i_size. Signed-off-by: Chris Mason <chris.mason@oracle.com>