summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-09-25fix a typo in put_compat_shm_info()Al Viro
"uip" misspelled as "up"; unfortunately, the latter happens to be a function and gcc is happy to convert it to void *... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-25Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - Two sets of NVMe pull requests from Christoph: - Fixes for the Fibre Channel host/target to fix spec compliance - Allow a zero keep alive timeout - Make the debug printk for broken SGLs work better - Fix queue zeroing during initialization - Set of RDMA and FC fixes - Target div-by-zero fix - bsg double-free fix. - ndb unknown ioctl fix from Josef. - Buffered vs O_DIRECT page cache inconsistency fix. Has been floating around for a long time, well reviewed. From Lukas. - brd overflow fix from Mikulas. - Fix for a loop regression in this merge window, where using a union for two members of the loop_cmd turned out to be a really bad idea. From Omar. - Fix for an iostat regression fix in this series, using the wrong API to get at the block queue. From Shaohua. - Fix for a potential blktrace delection deadlock. From Waiman. * 'for-linus' of git://git.kernel.dk/linux-block: (30 commits) nvme-fcloop: fix port deletes and callbacks nvmet-fc: sync header templates with comments nvmet-fc: ensure target queue id within range. nvmet-fc: on port remove call put outside lock nvme-rdma: don't fully stop the controller in error recovery nvme-rdma: give up reconnect if state change fails nvme-core: Use nvme_wq to queue async events and fw activation nvme: fix sqhd reference when admin queue connect fails block: fix a crash caused by wrong API fs: Fix page cache inconsistency when mixing buffered and AIO DIO nvmet: implement valid sqhd values in completions nvme-fabrics: Allow 0 as KATO value nvme: allow timed-out ios to retry nvme: stop aer posting if controller state not live nvme-pci: Print invalid SGL only once nvme-pci: initialize queue memory before interrupts nvmet-fc: fix failing max io queue connections nvme-fc: use transport-specific sgl format nvme: add transport SGL definitions nvme.h: remove FC transport-specific error values ...
2017-09-25Merge tag 'gfs2-for-linus-4.14-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fix from Bob Peterson: "GFS2: Fix an old regression in GFS2's debugfs interface This fixes a regression introduced by commit 88ffbf3e037e ("GFS2: Use resizable hash table for glocks"). The regression caused the glock dump in debugfs to not report all the glocks, which makes debugging extremely difficult" * tag 'gfs2-for-linus-4.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix debugfs glocks dump
2017-09-25Merge tag 'microblaze-4.14-rc3' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds
Pull Microblaze fixes from Michal Simek: - Kbuild fix - use vma_pages - setup default little endians * tag 'microblaze-4.14-rc3' of git://git.monstr.eu/linux-2.6-microblaze: arch: change default endian for microblaze microblaze: Cocci spatch "vma_pages" microblaze: Add missing kvm_para.h to Kbuild
2017-09-25Merge tag 'trace-v4.14-rc1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Stack tracing and RCU has been having issues with each other and lockdep has been pointing out constant problems. The changes have been going into the stack tracer, but it has been discovered that the problem isn't with the stack tracer itself, but it is with calling save_stack_trace() from within the internals of RCU. The stack tracer is the one that can trigger the issue the easiest, but examining the problem further, it could also happen from a WARN() in the wrong place, or even if an NMI happened in this area and it did an rcu_read_lock(). The critical area is where RCU is not watching. Which can happen while going to and from idle, or bringing up or taking down a CPU. The final fix was to put the protection in kernel_text_address() as it is the one that requires RCU to be watching while doing the stack trace. To make this work properly, Paul had to allow rcu_irq_enter() happen after rcu_nmi_enter(). This should have been done anyway, since an NMI can page fault (reading vmalloc area), and a page fault triggers rcu_irq_enter(). One patch is just a consolidation of code so that the fix only needed to be done in one location" * tag 'trace-v4.14-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Remove RCU work arounds from stack tracer extable: Enable RCU if it is not watching in kernel_text_address() extable: Consolidate *kernel_text_address() functions rcu: Allow for page faults in NMI handlers
2017-09-25nvme-fcloop: fix port deletes and callbacksJames Smart
Now that there are potentially long delays between when a remoteport or targetport delete calls is made and when the callback occurs (dev_loss_tmo timeout), no longer block in the delete routines and move the final nport puts to the callbacks. Moved the fcloop_nport_get/put/free routines to avoid forward declarations. Ensure port_info structs used in registrations are nulled in case fields are not set (ex: devloss_tmo values). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvmet-fc: sync header templates with commentsJames Smart
Comments were incorrect: - defer_rcv was in host port template. moved to target port template - Added Mandatory statements for target port template items Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvmet-fc: ensure target queue id within range.James Smart
When searching for queue id's ensure they are within the expected range. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvmet-fc: on port remove call put outside lockJames Smart
Avoid calling the put routine, as it may traverse to free routines while holding the target lock. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme-rdma: don't fully stop the controller in error recoverySagi Grimberg
By calling nvme_stop_ctrl on a already failed controller will wait for the scan work to complete (only by identify timeout expiration which is 60 seconds). This is unnecessary when we already know that the controller has failed. Reported-by: Yi Zhang <yizhan@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme-rdma: give up reconnect if state change failsSagi Grimberg
If we failed to transition to state LIVE after a successful reconnect, then controller deletion already started. In this case there is no point moving forward with reconnect. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme-core: Use nvme_wq to queue async events and fw activationSagi Grimberg
async_event_work might race as it is executed from two different workqueues at the moment. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme: fix sqhd reference when admin queue connect failsJames Smart
Fix bug in sqhd patch. It wasn't the sq that was at risk. In the case where the admin queue connect command fails, the sq->size field is not set. Therefore, this becomes a divide by zero error. Add a quick check to bypass under this failure condition. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25gfs2: Fix debugfs glocks dumpAndreas Gruenbacher
The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a single buffer: the right function for restarting an rhashtable iteration from the beginning of the hash table is rhashtable_walk_enter; rhashtable_walk_stop + rhashtable_walk_start will just resume from the current position. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Cc: stable@vger.kernel.org # v4.3+
2017-09-25selftests: timers: set-timer-lat: Fix hang when testing unsupported alarmsShuah Khan
When timer_create() fails on a bootime or realtime clock, setup_timer() returns 0 as if timer has been set. Callers wait forever for the timer to expire. This hang is seen on a system that doesn't have support for: CLOCK_REALTIME_ALARM ABSTIME missing CAP_WAKE_ALARM? : [UNSUPPORTED] Test hangs waiting for a timer that hasn't been set to expire. Fix setup_timer() to return 1, add handling in callers to detect the unsupported case and return 0 without waiting to not fail the test. Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-25selftests: timers: set-timer-lat: fix hang when std out/err are redirectedShuah Khan
do_timer_oneshot() uses select() as a timer with FD_SETSIZE and readfs is cleared with FD_ZERO without FD_SET. When stdout and stderr are redirected, the test hangs in select forever. Fix the problem calling select() with readfds empty and nfds zero. This is sufficient for using select() for timer. With this fix "./set-timer-lat > /dev/null 2>&1" no longer hangs. Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Acked-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-25selftests/memfd: correct run_tests.sh permissionLi Zhijian
to fix the following issue: ------------------ TAP version 13 selftests: run_tests.sh ======================================== selftests: Warning: file run_tests.sh is not executable, correct this. not ok 1..1 selftests: run_tests.sh [FAIL] ------------------ Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-25selftests/seccomp: Support glibc 2.26 siginfo_t.hKees Cook
The 2.26 release of glibc changed how siginfo_t is defined, and the earlier work-around to using the kernel definition are no longer needed. The old way needs to stay around for a while, though. Reported-by: Seth Forshee <seth.forshee@canonical.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Will Drewry <wad@chromium.org> Cc: Shuah Khan <shuah@kernel.org> Cc: linux-kselftest@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Tested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-25selftests: futex: Makefile: fix for loops in targets to run silentlyShuah Khan
Fix for loops in targets to run silently to avoid cluttering the test results. Suppresses the following from targets: for DIR in functional; do \ BUILD_TARGET=./tools/testing/selftests/futex/$DIR; \ mkdir $BUILD_TARGET -p; \ make OUTPUT=$BUILD_TARGET -C $DIR all;\ done ./tools/testing/selftests/futex/run.sh Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-25selftests: Makefile: fix for loops in targets to run silentlyShuah Khan
Fix for loops in targets to run silently to avoid cluttering the test results. Suppresses the following from targets: e.g run from breakpoints for TARGET in breakpoints; do \ BUILD_TARGET=$BUILD/$TARGET; \ mkdir $BUILD_TARGET -p; \ make OUTPUT=$BUILD_TARGET -C $TARGET;\ done; Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-25selftests: mqueue: Use full path to run tests from MakefileShuah Khan
Use full path including $(OUTPUT) to run tests from Makefile for normal case when objects reside in the source tree as well as when objects are relocated with make O=dir. In both cases $(OUTPUT) will be set correctly by lib.mk. Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-25selftests: futex: copy sub-dir test scripts for make O=dir runShuah Khan
For make O=dir run_tests to work, test scripts from sub-directories need to be copied over to the object directory. Running tests from the object directory is necessary to avoid making the source tree dirty. Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com> Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org> Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-09-25IB/mlx5: Fix NULL deference on mlx5_ib_update_xlt failureIlya Lesokhin
mlx5_ib_reg_user_mr called mlx5_ib_dereg_mr in case of MR population failure. This resulted in a NULL dereference as ibmr->device wasn't initialized yet. We address this by adding an internal dereg_mr function that can handle partially initialized MRs, and fixing clean_mr to work on partially initialized MRs. Fixes: ff740aefecb9 ("IB/mlx5: Decouple MR allocation and population flows") Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB/mlx5: Simplify mlx5_ib_cont_pagesIlya Lesokhin
The patch simplifies mlx5_ib_cont_pages and fixes the following issues in the original implementation: First issues is related to alignment of the PFNs. After the check base + p != PFN, the alignment of the PFN wasn't checked. So the PFN sequence 0, 1, 1, 2 would result in a page_shift of 13 even though the 3rd PFN is not 8KB aligned. This wasn't actually a bug because it was supported by all the existing mlx5 compatible device, but we don't want to require this support in all future devices. Another issue is because the inner loop didn't advance PFN so the test "if (base + p != pfn)" always failed for SGE with len > (1<<page_shift). Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Reviewed-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB/ipoib: Fix inconsistency with free_netdev and free_rdma_netdevAlex Vesker
Call free_rdma_netdev instead of free_netdev each time we want to release a netdevice. This call is also relevant for future freeing of offloaded child interfaces. This patch also adds a missing call for free netdevice when releasing a parent interface that has child interfaces using ipoib_remove_one. Fixes: cd565b4b51e5 ('IB/IPoIB: Support acceleration options callbacks') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB/ipoib: Fix sysfs Pkey create<->remove possible deadlockShalom Lagziel
A possible ABBA lock can happen with RTNL and vlan_rwsem. For example: Flow A: Device Flush __ipoib_ib_dev_flush down_read(vlan_rwsem) // Lock A ipoib_flush_ah flush_workqueue(priv->wq) // Wait for completion A work on shared WQ (Mcast carrier) ipoib_mcast_carrier_on_task while (!rtnl_trylock()) // Wait for lock B Flow B: Sysfs PKEY delete ipoib_vlan_delete lock(RTNL) // Lock B down_write(vlan_rwsem) // Wait for lock A This can happen with PKEY creates as well. The solution is to release the RTNL lock in sysfs functions in case it is not possible to lock VLAN RW semaphore and reset the SYS call. Fixes: 69956d83267e ("IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock") Signed-off-by: Shalom Lagziel <shaloml@mellanox.com> Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB: Correct MR length field to be 64-bitParav Pandit
The ib_mr->length represents the length of the MR in bytes as per the IBTA spec 1.3 section 11.2.10.3 (REGISTER PHYSICAL MEMORY REGION). Currently ib_mr->length field is defined as only 32-bits field. This might result into truncation and failed WRs of consumers who registers more than 4GB bytes memory regions and whose WRs accessing such MRs. This patch makes the length 64-bit to avoid such truncation. Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Faisal Latif <faisal.latif@intel.com> Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API") Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB/core: Fix qp_sec use after free accessParav Pandit
When security_ib_alloc_security fails, qp->qp_sec memory is freed. However ib_destroy_qp still tries to access this memory which result in kernel crash. So its initialized to NULL to avoid such access. Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25IB/core: Fix typo in the name of the tag-matching cap structLeon Romanovsky
The tag matching functionality is implemented by mlx5 driver by extending XRQ, however this internal kernel information was exposed to user space applications with *xrq* name instead of *tm*. This patch renames *xrq* to *tm* to handle that. Fixes: 8d50505ada72 ("IB/uverbs: Expose XRQ capabilities") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
2017-09-25block: fix a crash caused by wrong APIShaohua Li
part_stat_show takes a part device not a disk, so we should use part_to_disk. Fixes: d62e26b3ffd2("block: pass in queue to inflight accounting") Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25fs: Fix page cache inconsistency when mixing buffered and AIO DIOLukas Czerner
Currently when mixing buffered reads and asynchronous direct writes it is possible to end up with the situation where we have stale data in the page cache while the new data is already written to disk. This is permanent until the affected pages are flushed away. Despite the fact that mixing buffered and direct IO is ill-advised it does pose a thread for a data integrity, is unexpected and should be fixed. Fix this by deferring completion of asynchronous direct writes to a process context in the case that there are mapped pages to be found in the inode. Later before the completion in dio_complete() invalidate the pages in question. This ensures that after the completion the pages in the written area are either unmapped, or populated with up-to-date data. Also do the same for the iomap case which uses iomap_dio_complete() instead. This has a side effect of deferring the completion to a process context for every AIO DIO that happens on inode that has pages mapped. However since the consensus is that this is ill-advised practice the performance implication should not be a problem. This was based on proposal from Jeff Moyer, thanks! Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvmet: implement valid sqhd values in completionsJames Smart
To support sqhd, for initiators that are following the spec and paying attention to sqhd vs their sqtail values: - add sqhd to struct nvmet_sq - initialize sqhd to 0 in nvmet_sq_setup - rather than propagate the 0's-based qsize value from the connect message which requires a +1 in every sqhd update, and as nothing else references it, convert to 1's-based value in nvmt_sq/cq_setup() calls. - validate connect message sqsize being non-zero per spec. - updated assign sqhd for every completion that goes back. Also remove handling the NULL sq case in __nvmet_req_complete, as it can't happen with the current code. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme-fabrics: Allow 0 as KATO valueGuilherme G. Piccoli
Currently, driver code allows user to set 0 as KATO (Keep Alive TimeOut), but this is not being respected. This patch enforces the expected behavior. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme: allow timed-out ios to retryJames Smart
Currently the nvme_req_needs_retry() applies several checks to see if a retry is allowed. On of those is whether the current time has exceeded the start time of the io plus the timeout length. This check, if an io times out, means there is never a retry allowed for the io. Which means applications see the io failure. Remove this check and allow the io to timeout, like it does on other protocols, and retries to be made. On the FC transport, a frame can be lost for an individual io, and there may be no other errors that escalate for the connection/association. The io will timeout, which causes the transport to escalate into creating a new association, but the io that timed out, due to this retry logic, has already failed back to the application and things are hosed. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme: stop aer posting if controller state not liveJames Smart
If an nvme async_event command completes, in most cases, a new async event is posted. However, if the controller enters a resetting or reconnecting state, there is nothing to block the scheduled work element from posting the async event again. Nor are there calls from the transport to stop async events when an association dies. In the case of FC, where the association is torn down, the aer must be aborted on the FC link and completes through the normal job completion path. Thus the terminated async event ends up being rescheduled even though the controller isn't in a valid state for the aer, and the reposting gets the transport into a partially torn down data structure. It's possible to hit the scenario on rdma, although much less likely due to an aer completing right as the association is terminated and as the association teardown reclaims the blk requests via nvme_cancel_request() so its immediate, not a link-related action like on FC. Fix by putting controller state checks in both the async event completion routine where it schedules the async event and in the async event work routine before it calls into the transport. It's effectively a "stop_async_events()" behavior. The transport, when it creates a new association with the subsystem will transition the state back to live and is already restarting the async event posting. Signed-off-by: James Smart <james.smart@broadcom.com> [hch: remove taking a lock over reading the controller state] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme-pci: Print invalid SGL only onceKeith Busch
The WARN_ONCE macro returns true if the condition is true, not if the warn was raised, so we're printing the scatter list every time it's invalid. This is excessive and makes debugging harder, so this patch prints it just once. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme-pci: initialize queue memory before interruptsKeith Busch
A spurious interrupt before the nvme driver has initialized the completion queue may inadvertently cause the driver to believe it has a completion to process. This may result in a NULL dereference since the nvmeq's tags are not set at this point. The patch initializes the host's CQ memory so that a spurious interrupt isn't mistaken for a real completion. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvmet-fc: fix failing max io queue connectionsJames Smart
fc transport is treating NVMET_NR_QUEUES as maximum queue count, e.g. admin queue plus NVMET_NR_QUEUES-1 io queues. But NVMET_NR_QUEUES is the number of io queues, so maximum queue count is really NVMET_NR_QUEUES+1. Fix the handling in the target fc transport Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme-fc: use transport-specific sgl formatJames Smart
Sync with NVM Express spec change and FC-NVME 1.18. FC transport sets SGL type to Transport SGL Data Block Descriptor and subtype to transport-specific value 0x0A. Removed the warn-on's on the PRP fields. They are unneeded. They were to check for values from the upper layer that weren't set right, and for the most part were fine. But, with Async events, which reuse the same structure and 2nd time issued the SGL overlay converted them to the Transport SGL values - the warn-on's were errantly firing. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme: add transport SGL definitionsJames Smart
Add transport SGL defintions from NVMe TP 4008, required for the final NVMe-FC standard. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme.h: remove FC transport-specific error valuesJames Smart
The NVM express group recinded the reserved range for the transport. Remove the FC-centric values that had been defined. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25qla2xxx: remove use of FC-specific error codesJames Smart
The qla2xxx driver uses the FC-specific error when it needed to return an error to the FC-NVME transport. Convert to use a generic value instead. Signed-off-by: James Smart <james.smart@broadcom.com> Acked-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25lpfc: remove use of FC-specific error codesJames Smart
The lpfc driver uses the FC-specific error when it needed to return an error to the FC-NVME transport. Convert to use a generic value instead. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvmet-fcloop: remove use of FC-specific error codesJames Smart
The FC-NVME transport loopback test module used the FC-specific error codes in cases where it emulated a transport abort case. Instead of using the FC-specific values, now use a generic value (NVME_SC_INTERNAL). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvmet-fc: remove use of FC-specific error codesJames Smart
The FC-NVME target transport used the FC-specific error codes in return codes when the transport or lldd failed. Instead of using the FC-specific values, now use a generic value (NVME_SC_INTERNAL). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nvme-fc: remove use of FC-specific error codesJames Smart
The FC-NVME transport used the FC-specific error codes in cases where it had to fabricate an error to go back up stack. Instead of using the FC-specific values, now use a generic value (NVME_SC_INTERNAL). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25loop: remove union of use_aio and ref in struct loop_cmdOmar Sandoval
When the request is completed, lo_complete_rq() checks cmd->use_aio. However, if this is in fact an aio request, cmd->use_aio will have already been reused as cmd->ref by lo_rw_aio*. Fix it by not using a union. On x86_64, there's a hole after the union anyways, so this doesn't make struct loop_cmd any bigger. Fixes: 92d773324b7e ("block/loop: fix use after free") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25blktrace: Fix potential deadlock between delete & sysfs opsWaiman Long
The lockdep code had reported the following unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#228); lock(&bdev->bd_mutex/1); lock(s_active#228); lock(&bdev->bd_mutex); *** DEADLOCK *** The deadlock may happen when one task (CPU1) is trying to delete a partition in a block device and another task (CPU0) is accessing tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that partition. The s_active isn't an actual lock. It is a reference count (kn->count) on the sysfs (kernfs) file. Removal of a sysfs file, however, require a wait until all the references are gone. The reference count is treated like a rwsem using lockdep instrumentation code. The fact that a thread is in the sysfs callback method or in the ioctl call means there is a reference to the opended sysfs or device file. That should prevent the underlying block structure from being removed. Instead of using bd_mutex in the block_device structure, a new blk_trace_mutex is now added to the request_queue structure to protect access to the blk_trace structure. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Fix typo in patch subject line, and prune a comment detailing how the code used to work. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25nbd: ignore non-nbd ioctl'sJosef Bacik
In testing we noticed that nbd would spew if you ran a fio job against the raw device itself. This is because fio calls a block device specific ioctl, however the block layer will first pass this back to the driver ioctl handler in case the driver wants to do something special. Since the device was setup using netlink this caused us to spew every time fio called this ioctl. Since we don't have special handling, just error out for any non-nbd specific ioctl's that come in. This fixes the spew. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-09-25bsg-lib: don't free job in bsg_prepare_jobChristoph Hellwig
The job structure is allocated as part of the request, so we should not free it in the error path of bsg_prepare_job. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>