Age | Commit message (Collapse) | Author |
|
'scsi_dif_tuple' is unused since commit 8cb2049c7448 ("[SCSI] qla2xxx: T10
DIF - Handle uninitalized sectors.").
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20240528215640.91771-1-linux@treblig.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The following call trace was observed:
localhost kernel: nvme nvme0: NVME-FC{0}: controller connect complete
localhost kernel: BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u129:4/75092
localhost kernel: nvme nvme0: NVME-FC{0}: new ctrl: NQN "nqn.1992-08.com.netapp:sn.b42d198afb4d11ecad6d00a098d6abfa:subsystem.PR_Channel2022_RH84_subsystem_291"
localhost kernel: caller is qla_nvme_post_cmd+0x216/0x1380 [qla2xxx]
localhost kernel: CPU: 6 PID: 75092 Comm: kworker/u129:4 Kdump: loaded Tainted: G B W OE --------- --- 5.14.0-70.22.1.el9_0.x86_64+debug #1
localhost kernel: Hardware name: HPE ProLiant XL420 Gen10/ProLiant XL420 Gen10, BIOS U39 01/13/2022
localhost kernel: Workqueue: nvme-wq nvme_async_event_work [nvme_core]
localhost kernel: Call Trace:
localhost kernel: dump_stack_lvl+0x57/0x7d
localhost kernel: check_preemption_disabled+0xc8/0xd0
localhost kernel: qla_nvme_post_cmd+0x216/0x1380 [qla2xxx]
Use raw_smp_processor_id() instead of smp_processor_id().
Also use queue_work() across the driver instead of queue_work_on() thus
avoiding usage of smp_processor_id() when CONFIG_DEBUG_PREEMPT is enabled.
Cc: stable@vger.kernel.org
Suggested-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230831112146.32595-2-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add logs for SFP Temperature Alert async event to check if laser is
enabled/disabled.
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Introduce infrastructure in the driver to support the processing of
unsolicited LS (Link Service) requests. This will involve the utilization
of a new pass-up of unsolicited FC-NVMe request IOCB interface. Unsolicited
requests will be submitted to the NVMe transport layer through
nvme_fc_rcv_ls_req(). Any received LS responses, which are sent using
xmt_ls_rsp(), will be forwarded to the firmware through the existing
Pass-Through IOCB interface, responsible for sending FC-NVMe Link Service
requests and responses.
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Different behavior were experienced of session being torn down vs not when
TMF is timed out. When FW detects the time out, the session is torn down.
When driver detects the time out, the session is not torn down.
Allow TMF error to return to upper layer without session tear down.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-10-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Link up failure occurred where driver failed to see certain events from FW
indicating link up (AEN 8011) and fabric login completion (AEN 8014).
Without these 2 events, driver would not proceed forward to scan the
fabric. The cause of this is due to delay in the receive of interrupt for
Mailbox 60 that causes qla to set the fw_started flag late. The late
setting of this flag causes other interrupts to be dropped. These dropped
interrupts happen to be the link up (AEN 8011) and fabric login completion
(AEN 8014).
Set fw_started flag early to prevent interrupts being dropped.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230714070104.40052-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (ufs, pm80xx, libata-scsi, smartpqi,
lpfc, qla2xxx).
We have a couple of major core changes impacting other systems:
- Command Duration Limits, which spills into block and ATA
- block level Persistent Reservation Operations, which touches block,
nvme, target and dm
Both of these are added with merge commits containing a cover letter
explaining what's going on"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (187 commits)
scsi: core: Improve warning message in scsi_device_block()
scsi: core: Replace scsi_target_block() with scsi_block_targets()
scsi: core: Don't wait for quiesce in scsi_device_block()
scsi: core: Don't wait for quiesce in scsi_stop_queue()
scsi: core: Merge scsi_internal_device_block() and device_block()
scsi: sg: Increase number of devices
scsi: bsg: Increase number of devices
scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
scsi: ufs: ufs-pci: Add support for Intel Arrow Lake
scsi: sd: sd_zbc: Use PAGE_SECTORS_SHIFT
scsi: ufs: wb: Add explicit flush_threshold sysfs attribute
scsi: ufs: ufs-qcom: Switch to the new ICE API
scsi: ufs: dt-bindings: qcom: Add ICE phandle
scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_RTC quirk
scsi: ufs: ufs-mediatek: Set UFSHCD_QUIRK_MCQ_BROKEN_INTR quirk
scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_RTC
scsi: ufs: core: Add host quirk UFSHCD_QUIRK_MCQ_BROKEN_INTR
scsi: ufs: core: Remove dedicated hwq for dev command
scsi: ufs: core: mcq: Fix the incorrect OCS value for the device command
scsi: ufs: dt-bindings: samsung,exynos: Drop unneeded quotes
...
|
|
When target mode is enabled, the pci_irq_get_affinity() function may return
a NULL value in qla_mapq_init_qp_cpu_map() due to the qla24xx_enable_msix()
code that handles IRQ settings for target mode. This leads to a crash due
to a NULL pointer dereference.
This patch fixes the issue by adding a check for the NULL value returned by
pci_irq_get_affinity() and introducing a 'cpu_mapped' boolean flag to the
qla_qpair structure, ensuring that the qpair's CPU affinity is updated when
it has not been mapped to a CPU.
Fixes: 1d201c81d4cc ("scsi: qla2xxx: Select qpair depending on which CPU post_cmd() gets called")
Signed-off-by: Gleb Chesnokov <gleb.chesnokov@scst.dev>
Link: https://lore.kernel.org/r/56b416f2-4e0f-b6cf-d6d5-b7c372e3c6a2@scst.dev
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
System crash, where driver is accessing scsi layer's
memory (scsi_cmnd->device->host) to search for a well known internal
pointer (vha). The scsi_cmnd was released back to upper layer which
could be freed, but the driver is still accessing it.
7 [ffffa8e8d2c3f8d0] page_fault at ffffffff86c010fe
[exception RIP: __qla2x00_eh_wait_for_pending_commands+240]
RIP: ffffffffc0642350 RSP: ffffa8e8d2c3f988 RFLAGS: 00010286
RAX: 0000000000000165 RBX: 0000000000000002 RCX: 00000000000036d8
RDX: 0000000000000000 RSI: ffff9c5c56535188 RDI: 0000000000000286
RBP: ffff9c5bf7aa4a58 R8: ffff9c589aecdb70 R9: 00000000000003d1
R10: 0000000000000001 R11: 0000000000380000 R12: ffff9c5c5392bc78
R13: ffff9c57044ff5c0 R14: ffff9c56b5a3aa00 R15: 00000000000006db
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
8 [ffffa8e8d2c3f9c8] qla2x00_eh_wait_for_pending_commands at ffffffffc0646dd5 [qla2xxx]
9 [ffffa8e8d2c3fa00] __qla2x00_async_tm_cmd at ffffffffc0658094 [qla2xxx]
Remove access of freed memory. Currently the driver was checking to see if
scsi_done was called by seeing if the sp->type has changed. Instead,
check to see if the command has left the oustanding_cmds[] array as
sign of scsi_done was called.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Task management cmd failed with status 30h which means
FW is not able to finish processing one task management
before another task management for the same lun.
Hence add wait for completion of marker to space it out.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304271802.uCZfwQC1-lkp@intel.com/
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com <mailto:himanshu.madhani@oracle.com>>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
A system hang was observed with the following call trace:
BUG: kernel NULL pointer dereference, address: 0000000000000000
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 15 PID: 86747 Comm: nvme Kdump: loaded Not tainted 6.2.0+ #1
Hardware name: Dell Inc. PowerEdge R6515/04F3CJ, BIOS 2.7.3 03/31/2022
RIP: 0010:__wake_up_common+0x55/0x190
Code: 41 f6 01 04 0f 85 b2 00 00 00 48 8b 43 08 4c 8d
40 e8 48 8d 43 08 48 89 04 24 48 89 c6\
49 8d 40 18 48 39 c6 0f 84 e9 00 00 00 <49> 8b 40 18 89 6c 24 14 31
ed 4c 8d 60 e8 41 8b 18 f6 c3 04 75 5d
RSP: 0018:ffffb05a82afbba0 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffff8f9b83a00018 RCX: 0000000000000000
RDX: 0000000000000001 RSI: ffff8f9b83a00020 RDI: ffff8f9b83a00018
RBP: 0000000000000001 R08: ffffffffffffffe8 R09: ffffb05a82afbbf8
R10: 70735f7472617473 R11: 5f30307832616c71 R12: 0000000000000001
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f815cf4c740(0000) GS:ffff8f9eeed80000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000010633a000 CR4: 0000000000350ee0
Call Trace:
<TASK>
__wake_up_common_lock+0x83/0xd0
qla_nvme_ls_req+0x21b/0x2b0 [qla2xxx]
__nvme_fc_send_ls_req+0x1b5/0x350 [nvme_fc]
nvme_fc_xmt_disconnect_assoc+0xca/0x110 [nvme_fc]
nvme_fc_delete_association+0x1bf/0x220 [nvme_fc]
? nvme_remove_namespaces+0x9f/0x140 [nvme_core]
nvme_do_delete_ctrl+0x5b/0xa0 [nvme_core]
nvme_sysfs_delete+0x5f/0x70 [nvme_core]
kernfs_fop_write_iter+0x12b/0x1c0
vfs_write+0x2a3/0x3b0
ksys_write+0x5f/0xe0
do_syscall_64+0x5c/0x90
? syscall_exit_work+0x103/0x130
? syscall_exit_to_user_mode+0x12/0x30
? do_syscall_64+0x69/0x90
? exit_to_user_mode_loop+0xd0/0x130
? exit_to_user_mode_prepare+0xec/0x100
? syscall_exit_to_user_mode+0x12/0x30
? do_syscall_64+0x69/0x90
? syscall_exit_to_user_mode+0x12/0x30
? do_syscall_64+0x69/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f815cd3eb97
The IOCB counts are out of order and that would block any commands from
going out and subsequently hang the system. Synchronize the IOCB count to
be in correct order.
Fixes: 5f63a163ed2f ("scsi: qla2xxx: Fix exchange oversubscription for management commands")
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230313043711.13500-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Tested-by: Lin Li <lilin@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The LLDD and the stack currently process FPINs received from the fabric,
but the stack is not aware of any action taken by the driver to alleviate
congestion. The current interface between the driver and the SCSI stack is
limited to passing the notification mainly for statistics and heuristics.
The reaction to an FPIN could be handled either by the driver or by the
stack (marginal path and failover). Amend the interface to indicate if
action on an FPIN has already been reacted to by the LLDDs or not. Add an
additional flag to fc_host_fpin_rcv() to indicate if the FPIN has been
acknowledged/reacted to by the driver.
Also added a new event code FCH_EVT_LINK_FPIN_ACK to notify to the user
that the event has been acknowledged/reacted by the LLDD driver
Link: https://lore.kernel.org/r/20230209034326.882514-1-muneendra.kumar@broadcom.com
Co-developed-by: Anil Gurumurthy <agurumurthy@marvell.com>
Signed-off-by: Anil Gurumurthy <agurumurthy@marvell.com>
Co-developed-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In current I/O path, Tx and Rx may not be processed on same CPU. This may
lead to thrashing and optimum performance may not be achieved.
Pick qpair such that Tx and Rx are processed on same CPU.
Signed-off-by: Shreyas Deodhar <sdeodhar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Residual underrun is not an interface error, hence no need to increment
that count.
Fixes: dbf1f53cfd23 ("scsi: qla2xxx: Implementation to get and manage host, target stats and initiator port")
Cc: stable@vger.kernel.org
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add resource checking for management (non-I/O) commands.
Fixes: 89c72f4245a8 ("scsi: qla2xxx: Add IOCB resource tracking")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
In large environment, it is possible to experience command timeout and
escalation of path recovery. Currently the driver does not track the number
of exchanges/commands sent to FW. If there is a delay for commands at the
head of the queue, then this will create back pressure for commands at the
back of the queue.
Check for exchange availability before command submission.
Fixes: 89c72f4245a8 ("scsi: qla2xxx: Add IOCB resource tracking")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
On some platforms, the current logic of relying on finding new packet
solely based on signature pattern can lead to driver reading stale
packets. Though this is a bug in those platforms, reduce such exposures by
limiting reading packets until the IN pointer.
Link: https://lore.kernel.org/r/20220826102559.17474-3-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
stale packets"
Reverting this commit so that a fixed up patch, without adding new module
parameters, can be submitted.
Link: https://lore.kernel.org/stable/166039743723771@kroah.com/
This reverts commit b1f707146923335849fb70237eec27d4d1ae7d62.
Link: https://lore.kernel.org/r/20220826102559.17474-2-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This patch fixes IKE message being dropped due to error in processing Purex
IOCB and Continuation IOCBs.
Link: https://lore.kernel.org/r/20220713052045.10683-6-njavali@marvell.com
Fixes: fac2807946c1 ("scsi: qla2xxx: edif: Add extraction of auth_els from the wire")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
On some platforms, the current logic of relying on finding new packet
solely based on signature pattern can lead to driver reading stale
packets. Though this is a bug in those platforms, reduce such exposures by
limiting reading packets until the IN pointer.
Two module parameters are introduced:
ql2xrspq_follow_inptr:
When set, on newer adapters that has queue pointer shadowing, look for
response packets only until response queue in pointer.
When reset, response packets are read based on a signature pattern
logic (old way).
ql2xrspq_follow_inptr_legacy:
Like ql2xrspq_follow_inptr, but for those adapters where there is no
queue pointer shadowing.
Link: https://lore.kernel.org/r/20220713052045.10683-5-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Replace display field with the correct field.
Link: https://lore.kernel.org/r/20220713052045.10683-3-njavali@marvell.com
Fixes: 8777e4314d39 ("scsi: qla2xxx: Migrate NVME N2N handling into state machine")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
For 8G adapters, multi-queue was enabled accidentally. Make sure
multi-queue is not enabled.
Link: https://lore.kernel.org/r/20220616053508.27186-5-njavali@marvell.com
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
FW requires minimum 72 bytes buffer size for D_port result. Buffer size
1024 is mentioned in the FW spec so buffer size is increased to 1024.
Rewrite the logic to handle START/RESTART command from SDMAPI.
Link: https://lore.kernel.org/r/20220616053508.27186-3-njavali@marvell.com
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Disable printing I/O error messages by default. The messages will be
printed only when logging was enabled.
Link: https://lore.kernel.org/r/20220616053508.27186-2-njavali@marvell.com
Fixes: 8e2d81c6b5be ("scsi: qla2xxx: Fix excessive messages during device logout")
Cc: stable@vger.kernel.org
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
User experience slow recovery when target device went through a stop/start
of the authentication application (app_stop/app_start).
Between the period of app_stop and app_start on the target device, target
device choose to send ELS Reject for any receive AUTH ELS command. At this
time, authentication application does not do ELS reject if it encounters
error.
Therefore, AUTH ELS reject signify authentication application is not
running. If driver passes up the AUTH ELS Reject to the authentication
application, then it would result in authentication application
retrying/resending the same AUTH ELS command again + delay.
As a work around, driver should trigger a session tear down where it tells
the local authentication application to also tear down. At the next
relogin, both sides are then synchronized.
Link: https://lore.kernel.org/r/20220608115849.16693-10-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
If all keys for a session have been deleted, trigger a session teardown.
Link: https://lore.kernel.org/r/20220608115849.16693-6-njavali@marvell.com
Fixes: dd30706e73b7 ("scsi: qla2xxx: edif: Add key update")
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
User experienced no task management error while target device is responding
with error. The RSP_CODE field in the status IOCB is in little endian.
Driver assumes it's big endian and it picked up erroneous data.
Convert the data back to big endian as is on the wire.
Link: https://lore.kernel.org/r/20220310092604.22950-2-njavali@marvell.com
Fixes: faef62d13463 ("[SCSI] qla2xxx: Fix Task Management command asynchronous handling")
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Make port_state_str and port_dstate_str a little more readable and
maintainable by using named initializers.
Also convert FCS_* macros into an enum.
Link: https://lore.kernel.org/r/AS8PR10MB495215841EB25C16DBC0CB409D349@AS8PR10MB4952.EURPRD10.PROD.OUTLOOK.COM
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Gleb Chesnokov <Chesnokov.G@raidix.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Modify trace messages for additional debugability.
Link: https://lore.kernel.org/r/20211026115412.27691-9-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Disable default logging of some I/O path messages. If desired, the messages
can be turned back on by setting ql2xextended_error_logging.
Link: https://lore.kernel.org/r/20210925035154.29815-1-njavali@marvell.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When Target port transitions personality from one to another (NVMe <-->
FCP), there could be some overlap of the two where one layer is going down
while the other layer is coming up. This overlap can cause temporary I/O
error. Detect those errors/transitions and recover from them. Triggers
session tear down and allow relogin to re-drive the connection under the
following conditions:
- NVMe command error
- On PRLO + N2N (rida format 2)
Link: https://lore.kernel.org/r/20210817051315.2477-11-njavali@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When firmware indicates session has been torn down via UPDATE SA IOCB or
ELS Passthrough IOCB, the driver needs to also tear down the session.
Link: https://lore.kernel.org/r/20210817051315.2477-2-njavali@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The MSI-X and MSI calls fails in kdump kernel. Because of this
qla2xxx_create_qpair() fails leading to .create_queue callback failure.
The fix is to return existing qpair instead of allocating new one and
allocate a single hw queue.
[ 19.975838] qla2xxx [0000:d8:00.1]-00c7:11: MSI-X: Failed to enable support,
giving up -- 16/-28.
[ 19.984885] qla2xxx [0000:d8:00.1]-0037:11: Falling back-to MSI mode --
ret=-28.
[ 19.992278] qla2xxx [0000:d8:00.1]-0039:11: Falling back-to INTa mode --
ret=-28.
..
..
..
[ 21.141518] qla2xxx [0000:d8:00.0]-2104:2: qla_nvme_alloc_queue: handle
00000000e7ee499d, idx =1, qsize 32
[ 21.151166] qla2xxx [0000:d8:00.0]-0181:2: FW/Driver is not multi-queue capable.
[ 21.158558] qla2xxx [0000:d8:00.0]-2122:2: Failed to allocate qpair
[ 21.164824] nvme nvme0: NVME-FC{0}: reset: Reconnect attempt failed (-22)
[ 21.171612] nvme nvme0: NVME-FC{0}: Reconnect attempt in 2 seconds
Link: https://lore.kernel.org/r/20210810043720.1137-13-njavali@marvell.com
Cc: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Suppress logging of retryable errors. These can still be seen if extended
logging is enabled.
Link: https://lore.kernel.org/r/20210810043720.1137-11-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Add debug print of 64G link speed.
Link: https://lore.kernel.org/r/20210810043720.1137-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Increment the command and the completion counts.
Link: https://lore.kernel.org/r/20210624052606.21613-11-njavali@marvell.com
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Some FC adapters from Marvell offer the ability to encrypt data in flight
(EDIF). This feature requires an application to act as an authenticator.
There is no FC switch scan service that can indicate whether a device is
secure or non-secure.
In order to detect whether the remote port supports encrypted operation,
driver must first do a PLOGI with the remote device. On completion of the
PLOGI, driver will query firmware to see if the device supports secure
login. To do that, driver + firmware must advertise the security bit via
PLOGI's service parameter. The remote device shall respond using the same
service parameter whether it supports it or not.
Link: https://lore.kernel.org/r/20210624052606.21613-8-njavali@marvell.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Co-developed-by: Larry Wisneski <Larry.Wisneski@marvell.com>
Signed-off-by: Larry Wisneski <Larry.Wisneski@marvell.com>
Co-developed-by: Duane Grigsby <duane.grigsby@marvell.com>
Signed-off-by: Duane Grigsby <duane.grigsby@marvell.com>
Co-developed-by: Rick Hicksted Jr <rhicksted@marvell.com>
Signed-off-by: Rick Hicksted Jr <rhicksted@marvell.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Some FC adapters from Marvell offer the ability to encrypt data in flight
(EDIF). This feature requires an application to act as an authenticator.
As part of the authentication process, the authentication application will
generate a SADB entry (Security Association/SA, key, SPI value, etc). This
SADB is then passed to driver to be programmed into hardware. There will be
a pair of SADB's (Tx and Rx) for each connection.
After some period, the application can choose to change the key. At that
time, a new set of SADB pair is given to driver. The old set of SADB will
be deleted.
Add a new bsg call (QL_VND_SC_SA_UPDATE) to allow application to allow
adding or deleting SADB entries. Driver will not keep the key in
memory. It will pass it to HW.
It is assumed that application will assign a unique SPI value to this SADB
(SA + key). Driver + hardware will assign a handle to track this unique
SPI/SADB.
Link: https://lore.kernel.org/r/20210624052606.21613-6-njavali@marvell.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Co-developed-by: Larry Wisneski <Larry.Wisneski@marvell.com>
Signed-off-by: Larry Wisneski <Larry.Wisneski@marvell.com>
Co-developed-by: Duane Grigsby <duane.grigsby@marvell.com>
Signed-off-by: Duane Grigsby <duane.grigsby@marvell.com>
Co-developed-by: Rick Hicksted Jr <rhicksted@marvell.com>
Signed-off-by: Rick Hicksted Jr <rhicksted@marvell.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Some FC adapters from Marvell offer the ability to encrypt data in flight
(EDIF). This feature requires an application to act as an authenticator.
Once authentication messages sent from a remote device have arrived, each
message is extracted and placed in a buffer for application to retrieve.
The FC frame header will be stripped, leaving behind the AUTH ELS payload.
It is up to the application to strip the AUTH ELS header to get to the
actual authentication message.
Link: https://lore.kernel.org/r/20210624052606.21613-5-njavali@marvell.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Co-developed-by: Larry Wisneski <Larry.Wisneski@marvell.com>
Signed-off-by: Larry Wisneski <Larry.Wisneski@marvell.com>
Co-developed-by: Duane Grigsby <duane.grigsby@marvell.com>
Signed-off-by: Duane Grigsby <duane.grigsby@marvell.com>
Co-developed-by: Rick Hicksted Jr <rhicksted@marvell.com>
Signed-off-by: Rick Hicksted Jr <rhicksted@marvell.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Some FC adapters from Marvell offer the ability to encrypt data in flight
(EDIF). This feature requires an application to act as an authenticator.
Add the ability for authentication application to send and retrieve
messages as part of the authentication process via existing
FC_BSG_HST_ELS_NOLOGIN BSG interface.
To send a message, application is expected to format the data in the AUTH
ELS format. Refer to FC-SP2 for details.
If a message was received, application is required to reply with either a
LS_ACC or LS_RJT complete the exchange using the same interface. Otherwise,
remote device will treat it as a timeout.
Link: https://lore.kernel.org/r/20210624052606.21613-4-njavali@marvell.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Co-developed-by: Larry Wisneski <Larry.Wisneski@marvell.com>
Signed-off-by: Larry Wisneski <Larry.Wisneski@marvell.com>
Co-developed-by: Duane Grigsby <duane.grigsby@marvell.com>
Signed-off-by: Duane Grigsby <duane.grigsby@marvell.com>
Co-developed-by: Rick Hicksted Jr <rhicksted@marvell.com>
Signed-off-by: Rick Hicksted Jr <rhicksted@marvell.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Use "no-op" mailbox command to check if the adapter firmware is still
responsive.
Link: https://lore.kernel.org/r/20210619052427.6440-1-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Introduce scsi_build_sense() as a wrapper around scsi_build_sense_buffer()
to format the buffer and set the correct SCSI status.
Link: https://lore.kernel.org/r/20210427083046.31620-8-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Commit a6dcfe08487e ("scsi: qla2xxx: Limit interrupt vectors to number of
CPUs") lowers the number of allocated MSI-X vectors to the number of CPUs.
That breaks vector allocation assumptions in qla83xx_iospace_config(),
qla24xx_enable_msix() and qla2x00_iospace_config(). Either of the functions
computes maximum number of qpairs as:
ha->max_qpairs = ha->msix_count - 1 (MB interrupt) - 1 (default
response queue) - 1 (ATIO, in dual or pure target mode)
max_qpairs is set to zero in case of two CPUs and initiator mode. The
number is then used to allocate ha->queue_pair_map inside
qla2x00_alloc_queues(). No allocation happens and ha->queue_pair_map is
left NULL but the driver thinks there are queue pairs available.
qla2xxx_queuecommand() tries to find a qpair in the map and crashes:
if (ha->mqenable) {
uint32_t tag;
uint16_t hwq;
struct qla_qpair *qpair = NULL;
tag = blk_mq_unique_tag(cmd->request);
hwq = blk_mq_unique_tag_to_hwq(tag);
qpair = ha->queue_pair_map[hwq]; # <- HERE
if (qpair)
return qla2xxx_mqueuecommand(host, cmd, qpair);
}
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 72 Comm: kworker/u4:3 Tainted: G W 5.10.0-rc1+ #25
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: scsi_wq_7 fc_scsi_scan_rport [scsi_transport_fc]
RIP: 0010:qla2xxx_queuecommand+0x16b/0x3f0 [qla2xxx]
Call Trace:
scsi_queue_rq+0x58c/0xa60
blk_mq_dispatch_rq_list+0x2b7/0x6f0
? __sbitmap_get_word+0x2a/0x80
__blk_mq_sched_dispatch_requests+0xb8/0x170
blk_mq_sched_dispatch_requests+0x2b/0x50
__blk_mq_run_hw_queue+0x49/0xb0
__blk_mq_delay_run_hw_queue+0xfb/0x150
blk_mq_sched_insert_request+0xbe/0x110
blk_execute_rq+0x45/0x70
__scsi_execute+0x10e/0x250
scsi_probe_and_add_lun+0x228/0xda0
__scsi_scan_target+0xf4/0x620
? __pm_runtime_resume+0x4f/0x70
scsi_scan_target+0x100/0x110
fc_scsi_scan_rport+0xa1/0xb0 [scsi_transport_fc]
process_one_work+0x1ea/0x3b0
worker_thread+0x28/0x3b0
? process_one_work+0x3b0/0x3b0
kthread+0x112/0x130
? kthread_park+0x80/0x80
ret_from_fork+0x22/0x30
The driver should allocate enough vectors to provide every CPU it's own HW
queue and still handle reserved (MB, RSP, ATIO) interrupts.
The change fixes the crash on dual core VM and prevents unbalanced QP
allocation where nr_hw_queues is two less than the number of CPUs.
Link: https://lore.kernel.org/r/20210412165740.39318-1-r.bolshakov@yadro.com
Fixes: a6dcfe08487e ("scsi: qla2xxx: Limit interrupt vectors to number of CPUs")
Cc: Daniel Wagner <daniel.wagner@suse.com>
Cc: Himanshu Madhani <himanshu.madhani@oracle.com>
Cc: Quinn Tran <qutran@marvell.com>
Cc: Nilesh Javali <njavali@marvell.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: stable@vger.kernel.org # 5.11+
Reported-by: Aleksandr Volkov <a.y.volkov@yadro.com>
Reported-by: Aleksandr Miloserdov <a.miloserdov@yadro.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: qla2x00_abort_isp+0x21/0x6b0 [qla2xxx] PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 0 PID: 1715 Comm: kworker/0:2
Tainted: GOE 4.12.14-122.37-default #1 SLE12-SP5
Hardware name: HPE Superdome Flex/Superdome Flex, BIOS
Bundle:3.30.100 SFW:IP147.007.004.017.000.2009211957 09/21/2020
Workqueue: events aer_recover_work_func
task: ffff9e399c14ca80 task.stack: ffffc1c58e4ac000
RIP: 0010:qla2x00_abort_isp+0x21/0x6b0 [qla2xxx]
RSP: 0018:ffffc1c58e4afd50 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff9e419cdef480 RCX: 0000000000000000
RDX: ffff9e399c14ca80 RSI: 0000000000000246 RDI: ffff9e419bbc27b8
RBP: ffff9e419bbc27b8 R08: 0000000000000004 R09: 00000000a0440000
R10: 0000000000000000 R11: ffff9e399416d1a0 R12: ffff9e419cdef000
R13: ffff9e3a7cfae800 R14: ffff9e3a7cfae800 R15: 00000000000000c0
FS: 0000000000000000(0000) GS:ffff9e39a0000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006cd00a005 CR4: 00000000007606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
qla2xxx_pci_slot_reset+0x141/0x160 [qla2xxx]
report_slot_reset+0x41/0x80
? merge_result.part.4+0x30/0x30
pci_walk_bus+0x70/0x90
pcie_do_recovery+0x1db/0x2e0
aer_recover_work_func+0xc2/0xf0
process_one_work+0x14c/0x390
Disable board_disable logic where driver resources are freed while OS is in
the process of recovering the adapter.
Link: https://lore.kernel.org/r/20210329085229.4367-9-njavali@marvell.com
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Fix all recently introduced endianness annotation issues.
Link: https://lore.kernel.org/r/20210320232359.941-4-bvanassche@acm.org
Cc: Quinn Tran <qutran@marvell.com>
Cc: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Fix the following coccicheck warnings:
./drivers/scsi/qla2xxx/qla_isr.c:780:2-18: WARNING: Assignment
of 0/1 to bool variable.
Link: https://lore.kernel.org/r/1611127919-56551-1-git-send-email-abaci-bugfix@linux.alibaba.com
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Zhong <abaci-bugfix@linux.alibaba.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
FW needs to wait for an ABTS response before completing the I/O.
Link: https://lore.kernel.org/r/20210111093134.1206-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
This change will aid in debugging issues arising because of dropped frame,
DIF errors, queue full etc where debug level is not set.
Link: https://lore.kernel.org/r/20210111093134.1206-4-njavali@marvell.com
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
initiator port
This statistics will help in debugging process and checking specific error
counts. It also provides a capability to isolate the port or bring it out
of isolation.
Link: https://lore.kernel.org/r/20210111093134.1206-2-njavali@marvell.com
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
The completion status 0x28 (ppc = be = 0x2800) below indicates session is
not there, trigger session deletion.
qla2xxx [000b:04:00.1]-8009:8: DEVICE RESET ISSUED nexus=8:1:51 cmd=c000001432d0f600.
qla2xxx [000b:04:00.1]-5039:8: Async-tmf error - hdl=67b completion status(2800).
qla2xxx [000b:04:00.1]-8030:8: TM IOCB failed (102).
qla2xxx [000b:04:00.1]-800c:8: do_reset failed for cmd=c000001432d0f600.
qla2xxx [000b:04:00.1]-800f:8: DEVICE RESET FAILED: Task management failed nexus=8:1:51 cmd=c000001432d0f600.
qla2xxx [000b:04:00.1]-8009:8: DEVICE RESET ISSUED nexus=8:1:52 cmd=c000001432d0c200.
qla2xxx [000b:04:00.1]-5039:8: Async-tmf error - hdl=67c completion status(2800).
qla2xxx [000b:04:00.1]-8030:8: TM IOCB failed (102).
Link: https://lore.kernel.org/r/20201202132312.19966-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|