summaryrefslogtreecommitdiff
path: root/block/blk-core.c
AgeCommit message (Collapse)Author
2011-08-03fault-injection: add ability to export fault_attr in arbitrary directoryAkinobu Mita
init_fault_attr_dentries() is used to export fault_attr via debugfs. But it can only export it in debugfs root directory. Per Forlin is working on mmc_fail_request which adds support to inject data errors after a completed host transfer in MMC subsystem. The fault_attr for mmc_fail_request should be defined per mmc host and export it in debugfs directory per mmc host like /sys/kernel/debug/mmc0/mmc_fail_request. init_fault_attr_dentries() doesn't help for mmc_fail_request. So this introduces fault_create_debugfs_attr() which is able to create a directory in the arbitrary directory and replace init_fault_attr_dentries(). [akpm@linux-foundation.org: extraneous semicolon, per Randy] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Tested-by: Per Forlin <per.forlin@linaro.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26fail_make_request: cleanup should_fail_requestAkinobu Mita
This changes should_fail_request() to more usable wrapper function of should_fail(). It can avoid putting #ifdef CONFIG_FAIL_MAKE_REQUEST in the middle of a function. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26block: fix warning with calling smp_processor_id() in preemptible sectionJens Axboe
After commit 5757a6d7 introduced an unsafe calling of smp_processor_id(), with preempt debuggin turned on we spew a lot of: BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/514 caller is __make_request+0x1b8/0x308 [<c0019f44>] (unwind_backtrace+0x0/0xe8) from [<c024b4cc>] (debug_smp_processor_id+0xbc/0xf0) [<c024b4cc>] (debug_smp_processor_id+0xbc/0xf0) from [<c0223d14>] (__make_request+0x1b8/0x308) [<c0223d14>] (__make_request+0x1b8/0x308) from [<c02215ac>] (generic_make_request+0x4dc/0x558) [<c02215ac>] (generic_make_request+0x4dc/0x558) from [<c022173c>] (submit_bio+0x114/0x138) [<c022173c>] (submit_bio+0x114/0x138) from [<c011f504>] (submit_bh+0x148/0x16c) [<c011f504>] (submit_bh+0x148/0x16c) from [<c0121ed8>] (__sync_dirty_buffer+0x88/0xd8) [<c0121ed8>] (__sync_dirty_buffer+0x88/0xd8) from [<c01aff78>] (journal_commit_transaction+0x1198/0x1688) [<c01aff78>] (journal_commit_transaction+0x1198/0x1688) from [<c01b4034>] (kjournald+0xb4/0x224) [<c01b4034>] (kjournald+0xb4/0x224) from [<c0069ea0>] (kthread+0x8c/0x94) [<c0069ea0>] (kthread+0x8c/0x94) from [<c00137f8>] (kernel_thread_exit+0x0/0x8) Fix this by just using raw_smp_processor_id(), it's just a hint after all. There's no pinning of the CPU or accessing per-cpu structures involved. Reported-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-25Merge branch 'for-3.1/core' of git://git.kernel.dk/linux-blockLinus Torvalds
* 'for-3.1/core' of git://git.kernel.dk/linux-block: (24 commits) block: strict rq_affinity backing-dev: use synchronize_rcu_expedited instead of synchronize_rcu block: fix patch import error in max_discard_sectors check block: reorder request_queue to remove 64 bit alignment padding CFQ: add think time check for group CFQ: add think time check for service tree CFQ: move think time check variables to a separate struct fixlet: Remove fs_excl from struct task. cfq: Remove special treatment for metadata rqs. block: document blk_plug list access block: avoid building too big plug list compat_ioctl: fix make headers_check regression block: eliminate potential for infinite loop in blkdev_issue_discard compat_ioctl: fix warning caused by qemu block: flush MEDIA_CHANGE from drivers on close(2) blk-throttle: Make total_nr_queued unsigned block: Add __attribute__((format(printf...) and fix fallout fs/partitions/check.c: make local symbols static block:remove some spare spaces in genhd.c block:fix the comment error in blkdev.h ...
2011-07-23block: strict rq_affinityDan Williams
Some systems benefit from completions always being steered to the strict requester cpu rather than the looser "per-socket" steering that blk_cpu_to_group() attempts by default. This is because the first CPU in the group mask ends up being completely overloaded with work, while the others (including the original submitter) has power left to spare. Allow the strict mode to be set by writing '2' to the sysfs control file. This is identical to the scheme used for the nomerges file, where '2' is a more aggressive setting than just being turned on. echo 2 > /sys/block/<bdev>/queue/rq_affinity Cc: Christoph Hellwig <hch@infradead.org> Cc: Roland Dreier <roland@purestorage.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-07-21[SCSI] fix crash in scsi_dispatch_cmd()James Bottomley
USB surprise removal of sr is triggering an oops in scsi_dispatch_command(). What seems to be happening is that USB is hanging on to a queue reference until the last close of the upper device, so the crash is caused by surprise remove of a mounted CD followed by attempted unmount. The problem is that USB doesn't issue its final commands as part of the SCSI teardown path, but on last close when the block queue is long gone. The long term fix is probably to make sr do the teardown in the same way as sd (so remove all the lower bits on ejection, but keep the upper disk alive until last close of user space). However, the current oops can be simply fixed by not allowing any commands to be sent to a dead queue. Cc: stable@kernel.org Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-07-08block: avoid building too big plug listShaohua Li
When I test fio script with big I/O depth, I found the total throughput drops compared to some relative small I/O depth. The reason is the thread accumulates big requests in its plug list and causes some delays (surely this depends on CPU speed). I thought we'd better have a threshold for requests. When a threshold reaches, this means there is no request merge and queue lock contention isn't severe when pushing per-task requests to queue, so the main advantages of blk plug don't exist. We can force a plug list flush in this case. With this, my test throughput actually increases and almost equals to small I/O depth. Another side effect is irq off time decreases in blk_flush_plug_list() for big I/O depth. The BLK_MAX_REQUEST_COUNT is choosen arbitarily, but 16 is efficiently to reduce lock contention to me. But I'm open here, 32 is ok in my test too. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-27block: export blk_{get,put}_queue()Jens Axboe
We need them in SCSI to fix a bug, but currently they are not exported to modules. Export them. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-26block: remove unused variable in bio_attempt_front_merge()Luca Tettamanti
sector is never read inside the function. Signed-off-by: Luca Tettamanti <kronos.it@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-23block: call elv_bio_merged() when mergedVivek Goyal
Commit 73c101011926 ("block: initial patch for on-stack per-task plugging") removed calls to elv_bio_merged() when @bio merged with @req. Re-add them. This in turn will update merged stats in associated group. That should be safe as long as request has got reference to the blkio_group. Signed-off-by: Namhyung Kim <namhyung@gmail.com> Cc: Divyesh Shah <dpshah@google.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-20block: get rid of on-stack plugging debug checksJens Axboe
We don't need them anymore, so kill: - REQ_ON_PLUG checks in various places - !rq_mergeable() check in plug merging Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-20blk-cgroup: Allow sleeping while dynamically allocating a groupVivek Goyal
Currently, all the cfq_group or throtl_group allocations happen while we are holding ->queue_lock and sleeping is not allowed. Soon, we will move to per cpu stats and also need to allocate the per group stats. As one can not call alloc_percpu() from atomic context as it can sleep, we need to drop ->queue_lock, allocate the group, retake the lock and continue processing. In throttling code, I check the queue DEAD flag again to make sure that driver did not call blk_cleanup_queue() in the mean time. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-18block: don't delay blk_run_queue_asyncShaohua Li
Let's check a scenario: 1. blk_delay_queue(q, SCSI_QUEUE_DELAY); 2. blk_run_queue_async(); the second one will became a noop, because q->delay_work already has WORK_STRUCT_PENDING_BIT set, so the delayed work will still run after SCSI_QUEUE_DELAY. But blk_run_queue_async actually hopes the delayed work runs immediately. Fix this by doing a cancel on potentially pending delayed work before queuing an immediate run of the workqueue. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-19block: remove stale kerneldoc member from __blk_run_queue()Jens Axboe
We don't pass in a 'force_kblockd' anymore, get rid of the stsale comment. Reported-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-19block: get rid of QUEUE_FLAG_REENTERJens Axboe
We are currently using this flag to check whether it's safe to call into ->request_fn(). If it is set, we punt to kblockd. But we get a lot of false positives and excessive punts to kblockd, which hurts performance. The only real abuser of this infrastructure is SCSI. So export the async queue run and convert SCSI over to use that. There's room for improvement in that SCSI need not always use the async call, but this fixes our performance issue and they can fix that up in due time. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18block: kill blk_flush_plug_list() exportJens Axboe
With all drivers and file systems converted, we only have in-core use of this function. So remove the export. Reporteed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18block: add blk_run_queue_asyncChristoph Hellwig
Instead of overloading __blk_run_queue to force an offload to kblockd add a new blk_run_queue_async helper to do it explicitly. I've kept the blk_queue_stopped check for now, but I suspect it's not needed as the check we do when the workqueue items runs should be enough. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18block: blk_delay_queue() should use kblockd workqueueJens Axboe
Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18block: drop queue lock before calling __blk_run_queue() for kblockd puntJens Axboe
If we know we are going to punt to kblockd, we can drop the queue lock before calling into __blk_run_queue() since it only does a safe bit test and a workqueue call. Since kblockd needs to grab this very lock as one of the first things it does, it's a good optimization to drop the lock before waking kblockd. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18Revert "block: add callback function for unplug notification"Jens Axboe
MD can't use this since it really requires us to be able to keep more than a single piece of state for the unplug. Commit 048c9374 added the required support for MD, so get rid of this now unused code. This reverts commit f75664570d8b75469cc468f23c2b27220984983b. Conflicts: block/blk-core.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18block: Enhance new plugging support to support general callbacksNeilBrown
md/raid requires an unplug callback, but as it does not uses requests the current code cannot provide one. So allow arbitrary callbacks to be attached to the blk_plug. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-16block: make unplug timer trace event correspond to the schedule() unplugJens Axboe
It's a pretty close match to what we had before - the timer triggering would mean that nobody unplugged the plug in due time, in the new scheme this matches very closely what the schedule() unplug now is. It's essentially the difference between an explicit unplug (IO unplug) or an implicit unplug (timer unplug, we scheduled with pending IO queued). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-15block: only force kblockd unplugging from the schedule() pathJens Axboe
For the explicit unplugging, we'd prefer to kick things off immediately and not pay the penalty of the latency to switch to kblockd. So let blk_finish_plug() do the run inline, while the implicit-on-schedule-out unplug will punt to kblockd. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-15block: cleanup the block plug helper functionsChristoph Hellwig
It's a bit of a mess currently. task->plug is being cleared and reset in __blk_finish_plug(), and blk_finish_plug() is testing for a NULL plug which cannot happen even from schedule() anymore since it uses blk_needs_flush_plug() to determine whether to call into this function at all. So get rid of some of the cruft. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12block: move queue run on unplug to kblockdJens Axboe
There are worries that we are now consuming a lot more stack in some cases, since we potentially call into IO dispatch from schedule() or io_schedule(). We can reduce this problem by moving the running of the queue to kblockd, like the old plugging scheme did as well. This may or may not be a good idea from a performance perspective, depending on how many tasks have queue plugs running at the same time. For even the slightly contended case, doing just a single queue run from kblockd instead of multiple runs directly from the unpluggers will be faster. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12block: kill queue_sync_plugs()Jens Axboe
The original use for this dates back to when we had to track write requests for serializing around barriers. That's not needed anymore, so kill it. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12block: readd plug trace eventJens Axboe
This was removed with the queue plug state. But we can easily readd by checking if this is the first request going to this queue. It's good information to have when tracing to see how effective the plugging is. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12block: add callback function for unplug notificationJens Axboe
MD would like to know when a queue is unplugged, so it can flush it's bitmap writes. Add such a callback. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12block: add comment on why we save and disable interrupts in flush_plug_list()Jens Axboe
It's done at the top to avoid doing it for every queue we unplug. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-12block: fixup block IO unplug trace callJens Axboe
It was removed with the on-stack plugging, readd it and track the depth of requests added when flushing the plug. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-11block: splice plug list to local contextNeilBrown
If the request_fn ends up blocking, we could be re-entering the plug flush. Since the list is protected by explicitly not allowing schedule events, this isn't a terribly good idea. Additionally, it can cause us to recurse. As request_fn called by __blk_run_queue is allowed to 'schedule()' (after dropping the queue lock of course), it is possible to get a recursive call: schedule -> blk_flush_plug -> __blk_finish_plug -> flush_plug_list -> __blk_run_queue -> request_fn -> schedule We must make sure that the second schedule does not call into blk_flush_plug again. So instead of leaving the list of requests on blk_plug->list, move them to a separate list leaving blk_plug->list empty. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-07Merge branch 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6Linus Torvalds
* 'for-linus2' of git://git.profusion.mobi/users/lucas/linux-2.6: Fix common misspellings
2011-04-05block: fix request sorting at unplugKonstantin Khlebnikov
Comparison function for list_sort() must be anticommutative, otherwise it is not sorting in ordinary meaning. But fortunately list_sort() always check ((*cmp)(priv, a, b) <= 0) it not distinguish negative and zero, so comparison function can implement only less-or-equal instead of full three-way comparison. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-05block: dump request state on seeing a corrupted request completionJens Axboe
Currently we just dump a non-informative 'request botched' message. Lets actually try and print something sane to help debug issues around this. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-31Fix common misspellingsLucas De Marchi
Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-25block: fix issue with calling blk_stop_queue() from the request_fn handlerJens Axboe
When the queue work handler was converted to delayed work, the stopping was inadvertently made sync as well. Change this back to being async stop, using __cancel_delayed_work() instead of cancel_delayed_work(). Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Reported-by: Chris Mason <chris.mason@oracle.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-25block: fix bug with inserting flush requests as sort/mergeJens Axboe
With the introduction of the on-stack plugging, we would assume that any request being inserted was a normal file system request. As flush/fua requires a special insert mode, this caused problems. Fix this up by checking for this in flush_plug_list() and use the appropriate insert mechanism. Big thanks goes to Markus Tripplesdorf for tirelessly testing patches, and to Sergey Senozhatsky for helping find the real issue. Reported-by: Markus Tripplesdorf <markus@trippelsdorf.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-24Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits) Documentation/iostats.txt: bit-size reference etc. cfq-iosched: removing unnecessary think time checking cfq-iosched: Don't clear queue stats when preempt. blk-throttle: Reset group slice when limits are changed blk-cgroup: Only give unaccounted_time under debug cfq-iosched: Don't set active queue in preempt block: fix non-atomic access to genhd inflight structures block: attempt to merge with existing requests on plug flush block: NULL dereference on error path in __blkdev_get() cfq-iosched: Don't update group weights when on service tree fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away block: Require subsystems to explicitly allocate bio_set integrity mempool jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging fs: make fsync_buffers_list() plug mm: make generic_writepages() use plugging blk-cgroup: Add unaccounted time to timeslice_used. block: fixup plugging stubs for !CONFIG_BLOCK block: remove obsolete comments for blkdev_issue_zeroout. blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Fix up conflicts in fs/{aio.c,super.c}
2011-03-21block: attempt to merge with existing requests on plug flushJens Axboe
One of the disadvantages of on-stack plugging is that we potentially lose out on merging since all pending IO isn't always visible to everybody. When we flush the on-stack plugs, right now we don't do any checks to see if potential merge candidates could be utilized. Correct this by adding a new insert variant, ELEVATOR_INSERT_SORT_MERGE. It works just ELEVATOR_INSERT_SORT, but first checks whether we can merge with an existing request before doing the insertion (if we fail merging). This fixes a regression with multiple processes issuing IO that can be merged. Thanks to Shaohua Li <shaohua.li@intel.com> for testing and fixing an accounting bug. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits) [SCSI] scsi_dh_rdac: Add MD36xxf into device list [SCSI] scsi_debug: add consecutive medium errors [SCSI] libsas: fix ata list corruption issue [SCSI] hpsa: export resettable host attribute [SCSI] hpsa: move device attributes to avoid forward declarations [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26) [SCSI] sd: Logical Block Provisioning update [SCSI] Include protection operation in SCSI command trace [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try) [SCSI] target: Fix volume size misreporting for volumes > 2TB [SCSI] bnx2fc: Broadcom FCoE offload driver [SCSI] fcoe: fix broken fcoe interface reset [SCSI] fcoe: precedence bug in fcoe_filter_frames() [SCSI] libfcoe: Remove stale fcoe-netdev entries [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out" [SCSI] libfc: Fixing a memory leak when destroying an interface [SCSI] megaraid_sas: Version and Changelog update ... Fix up trivial conflicts due to whitespace differences in drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}
2011-03-10Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/coreJens Axboe
Conflicts: block/blk-core.c block/blk-flush.c drivers/md/raid1.c drivers/md/raid10.c drivers/md/raid5.c fs/nilfs2/btnode.c fs/nilfs2/mdt.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: kill off REQ_UNPLUGJens Axboe
With the plugging now being explicitly controlled by the submitter, callers need not pass down unplugging hints to the block layer. If they want to unplug, it's because they manually plugged on their own - in which case, they should just unplug at will. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: remove per-queue pluggingJens Axboe
Code has been converted over to the new explicit on-stack plugging, and delay users have been converted to use the new API for that. So lets kill off the old plugging along with aops->sync_page(). Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: initial patch for on-stack per-task pluggingJens Axboe
This patch adds support for creating a queuing context outside of the queue itself. This enables us to batch up pieces of IO before grabbing the block device queue lock and submitting them to the IO scheduler. The context is created on the stack of the process and assigned in the task structure, so that we can auto-unplug it if we hit a schedule event. The current queue plugging happens implicitly if IO is submitted to an empty device, yet callers have to remember to unplug that IO when they are going to wait for it. This is an ugly API and has caused bugs in the past. Additionally, it requires hacks in the vm (->sync_page() callback) to handle that logic. By switching to an explicit plugging scheme we make the API a lot nicer and can get rid of the ->sync_page() hack in the vm. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10block: add API for delaying work/request_fn a little bitJens Axboe
Currently we use plugging for that, but as plugging is going away, we need an alternative mechanism. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-04Merge branch 'for-linus' of ../linux-2.6-block into block-for-2.6.39/coreTejun Heo
This merge creates two set of conflicts. One is simple context conflicts caused by removal of throtl_scheduled_delayed_work() in for-linus and removal of throtl_shutdown_timer_wq() in for-2.6.39/core. The other is caused by commit 255bb490c8 (block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()) in for-linus crashing with FLUSH reimplementation in for-2.6.39/core. The conflict isn't trivial but the resolution is straight-forward. * __blk_run_queue() calls in flush_end_io() and flush_data_end_io() should be called with @force_kblockd set to %true. * elv_insert() in blk_kick_flush() should use %ELEVATOR_INSERT_REQUEUE. Both changes are to avoid invoking ->request_fn() directly from request completion path and closely match the changes in the commit 255bb490c8. Signed-off-by: Tejun Heo <tj@kernel.org>
2011-03-02block: Move blk_throtl_exit() call to blk_cleanup_queue()Vivek Goyal
Move blk_throtl_exit() in blk_cleanup_queue() as blk_throtl_exit() is written in such a way that it needs queue lock. In blk_release_queue() there is no gurantee that ->queue_lock is still around. Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported one problem. https://lkml.org/lkml/2010/10/23/86 And a quick fix moved blk_throtl_exit() to blk_release_queue(). commit 7ad58c028652753814054f4e3ac58f925e7343f4 Author: Jens Axboe <jaxboe@fusionio.com> Date: Sat Oct 23 20:40:26 2010 +0200 block: fix use-after-free bug in blk throttle code This patch reverts above change and does not try to shutdown the throtl work in blk_sync_queue(). By avoiding call to throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid the problem reported by Ingo. blk_sync_queue() seems to be used only by md driver and it seems to be using it to make sure q->unplug_fn is not called as md registers its own unplug functions and it is about to free up the data structures used by unplug_fn(). Block throttle does not call back into unplug_fn() or into md. So there is no need to cancel blk throttle work. In fact I think cancelling block throttle work is bad because it might happen that some bios are throttled and scheduled to be dispatched later with the help of pending work and if work is cancelled, these bios might never be dispatched. Block layer also uses blk_sync_queue() during blk_cleanup_queue() and blk_release_queue() time. That should be safe as we are also calling blk_throtl_exit() which should make sure all the throttling related data structures are cleaned up. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02block: Initialize ->queue_lock to internal lock at queue allocation timeVivek Goyal
There does not seem to be a clear convention whether q->queue_lock is initialized or not when blk_cleanup_queue() is called. In the past it was not necessary but now blk_throtl_exit() takes up queue lock by default and needs queue lock to be available. In fact elevator_exit() code also has similar requirement just that it is less stringent in the sense that elevator_exit() is called only if elevator is initialized. Two problems have been noticed because of ambiguity about spin lock status. - If a driver calls blk_alloc_queue() and then soon calls blk_cleanup_queue() almost immediately, (because some other driver structure allocation failed or some other error happened) then blk_throtl_exit() will run into issues as queue lock is not initialized. Loop driver ran into this issue recently and I noticed error paths in md driver too. Similar error paths should exist in other drivers too. - If some driver provided external spin lock and zapped the lock before blk_cleanup_queue(), then it can lead to issues. So this patch initializes the default queue lock at queue allocation time. block throttling code is one of the users of queue lock and it is initialized at the queue allocation time, so it makes sense to initialize ->queue_lock also to internal lock. A driver can overide that lock later. This will take care of the issue where a driver does not have to worry about initializing the queue lock to default before calling blk_cleanup_queue() Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02block: add @force_kblockd to __blk_run_queue()Tejun Heo
__blk_run_queue() automatically either calls q->request_fn() directly or schedules kblockd depending on whether the function is recursed. blk-flush implementation needs to be able to explicitly choose kblockd. Add @force_kblockd. All the current users are converted to specify %false for the parameter and this patch doesn't introduce any behavior change. stable: This is prerequisite for fixing ide oops caused by the new blk-flush implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jan Beulich <JBeulich@novell.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-01Merge commit 'v2.6.38-rc6' into for-2.6.39/coreJens Axboe
Conflicts: block/cfq-iosched.c Signed-off-by: Jens Axboe <jaxboe@fusionio.com>