summaryrefslogtreecommitdiff
path: root/drivers/usb/host/xhci-ring.c
AgeCommit message (Collapse)Author
2024-11-06usb: xhci: Avoid queuing redundant Stop Endpoint commandsMichal Pecio
Stop Endpoint command on an already stopped endpoint fails and may be misinterpreted as a known hardware bug by the completion handler. This results in an unnecessary delay with repeated retries of the command. Avoid queuing this command when endpoint state flags indicate that it's stopped or halted and the command will fail. If commands are pending on the endpoint, their completion handlers will process cancelled TDs so it's done. In case of waiting for external operations like clearing TT buffer, the endpoint is stopped and cancelled TDs can be processed now. This eliminates practically all unnecessary retries because an endpoint with pending URBs is maintained in Running state by the driver, unless aforementioned commands or other operations are pending on it. This is guaranteed by xhci_ring_ep_doorbell() and by the fact that it is called every time any of those operations completes. The only known exceptions are hardware bugs (the endpoint never starts at all) and Stream Protocol errors not associated with any TRB, which cause an endpoint reset not followed by restart. Sounds like a bug. Generally, these retries are only expected to happen when the endpoint fails to start for unknown/no reason, which is a worse problem itself, and fixing the bug eliminates the retries too. All cases were tested and found to work as expected. SET_DEQ_PENDING was produced by patching uvcvideo to unlink URBs in 100us intervals, which then runs into this case very often. EP_HALTED was produced by restarting 'cat /dev/ttyUSB0' on a serial dongle with broken cable. EP_CLEARING_TT by the same, with the dongle on an external hub. Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers") CC: stable@vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-34-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06usb: xhci: Fix TD invalidation under pending Set TR DequeueMichal Pecio
xhci_invalidate_cancelled_tds() may not work correctly if the hardware is modifying endpoint or stream contexts at the same time by executing a Set TR Dequeue command. And even if it worked, it would be unable to queue Set TR Dequeue for the next stream, failing to clear xHC cache. On stream endpoints, a chain of Set TR Dequeue commands may take some time to execute and we may want to cancel more TDs during this time. Currently this leads to Stop Endpoint completion handler calling this function without testing for SET_DEQ_PENDING, which will trigger the aforementioned problems when it happens. On all endpoints, a halt condition causes Reset Endpoint to be queued and an error status given to the class driver, which may unlink more URBs in response. Stop Endpoint is queued and its handler may execute concurrently with Set TR Dequeue queued by Reset Endpoint handler. (Reset Endpoint handler calls this function too, but there seems to be no possibility of it running concurrently with Set TR Dequeue). Fix xhci_invalidate_cancelled_tds() to work correctly under a pending Set TR Dequeue. Bail out of the function when SET_DEQ_PENDING is set, then make the completion handler call the function again and also call xhci_giveback_invalidated_tds(), which needs to be called next. This seems to fix another potential bug, where the handler would call xhci_invalidate_cancelled_tds(), which may clear some deferred TDs if a sanity check fails, and the TDs wouldn't be given back promptly. Said sanity check seems to be wrong and prone to false positives when the endpoint halts, but fixing it is beyond the scope of this change, besides ensuring that cleared TDs are given back properly. Fixes: 5ceac4402f5d ("xhci: Handle TD clearing for multiple streams case") CC: stable@vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-33-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06usb: xhci: Limit Stop Endpoint retriesMichal Pecio
Some host controllers fail to atomically transition an endpoint to the Running state on a doorbell ring and enter a hidden "Restarting" state, which looks very much like Stopped, with the important difference that it will spontaneously transition to Running anytime soon. A Stop Endpoint command queued in the Restarting state typically fails with Context State Error and the completion handler sees the Endpoint Context State as either still Stopped or already Running. Even a case of Halted was observed, when an error occurred right after the restart. The Halted state is already recovered from by resetting the endpoint. The Running state is handled by retrying Stop Endpoint. The Stopped state was recognized as a problem on NEC controllers and worked around also by retrying, because the endpoint soon restarts and then stops for good. But there is a risk: the command may fail if the endpoint is "stopped for good" already, and retries will fail forever. The possibility of this was not realized at the time, but a number of cases were discovered later and reproduced. Some proved difficult to deal with, and it is outright impossible to predict if an endpoint may fail to ever start at all due to a hardware bug. One such bug (albeit on ASM3142, not on NEC) was found to be reliably triggered simply by toggling an AX88179 NIC up/down in a tight loop for a few seconds. An endless retries storm is quite nasty. Besides putting needless load on the xHC and CPU, it causes URBs never to be given back, paralyzing the device and connection/disconnection logic for the whole bus if the device is unplugged. User processes waiting for URBs become unkillable, drivers and kworker threads lock up and xhci_hcd cannot be reloaded. For peace of mind, impose a timeout on Stop Endpoint retries in this case. If they don't succeed in 100ms, consider the endpoint stopped permanently for some reason and just give back the unlinked URBs. This failure case is rare already and work is under way to make it rarer. Start this work today by also handling one simple case of race with Reset Endpoint, because it costs just two lines to implement. Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers") CC: stable@vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-32-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06usb: xhci: add help function xhci_dequeue_td()Niklas Neronin
Add xhci_dequeue_td() helper function to reduce code duplication. Function xhci_dequeue_td() advances the dequeue pointer past the specified Transfer Descriptor (TD) and releases the TD. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-30-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06usb: xhci: refactor xhci_td_cleanup() to return voidNiklas Neronin
The function is modified to return 'void' instead of an integer since it invariably returns '0'. Additionally, multiple functions which only return xhci_td_cleanup() are also refactored to return void. This change eliminates the need for callers to handle a return value that does not convey meaningful information and improve code readability, as it becomes immediately clear that the function does not produce a significant output. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-29-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06usb: xhci: remove unused arguments from td_to_noop()Niklas Neronin
Function td_to_noop() does not utilize arguments 'xhci' and 'ep_ring'. These unused arguments are removed to clean up the code. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-28-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06usb: xhci: simplify TDs start and end naming scheme in struct 'xhci_td'Niklas Neronin
Old names: * start_seg - last_trb_seg * first_trb - last_trb New names: * start_seg - end_seg * start_trb - end_trb A Transfer Descriptor (TD) in the xhci driver is a data structure that represents a single transaction to be performed by the USB host controller. This transaction is defined by TRBs from 'start_trb' in 'start_seg' to 'end_trb' in 'end_seg'. The terms "start" and "end" were chosen over "first" and "last" for ease of searching within the codebase. The ring structure uses 'first_seg' and 'last_seg', while the TD structure uses 'start_seg' and 'end_seg'. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-24-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06xhci: Fix control transfer error on Etron xHCI hostKuangyi Chiang
Performing a stability stress test on a USB3.0 2.5G ethernet adapter results in errors like this: [ 91.441469] r8152 2-3:1.0 eth3: get_registers -71 [ 91.458659] r8152 2-3:1.0 eth3: get_registers -71 [ 91.475911] r8152 2-3:1.0 eth3: get_registers -71 [ 91.493203] r8152 2-3:1.0 eth3: get_registers -71 [ 91.510421] r8152 2-3:1.0 eth3: get_registers -71 The r8152 driver will periodically issue lots of control-IN requests to access the status of ethernet adapter hardware registers during the test. This happens when the xHCI driver enqueue a control TD (which cross over the Link TRB between two ring segments, as shown) in the endpoint zero's transfer ring. Seems the Etron xHCI host can not perform this TD correctly, causing the USB transfer error occurred, maybe the upper driver retry that control-IN request can solve problem, but not all drivers do this. | | ------- | TRB | Setup Stage ------- | TRB | Link ------- ------- | TRB | Data Stage ------- | TRB | Status Stage ------- | | To work around this, the xHCI driver should enqueue a No Op TRB if next available TRB is the Link TRB in the ring segment, this can prevent the Setup and Data Stage TRB to be breaked by the Link TRB. Check if the XHCI_ETRON_HOST quirk flag is set before invoking the workaround in xhci_queue_ctrl_tx(). Fixes: d0e96f5a71a0 ("USB: xhci: Control transfer support.") Cc: stable@vger.kernel.org Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-20-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06xhci: trace stream context at Set TR Deq command completionMathias Nyman
A successful 'Set TR Deq' command writes a new dequeue pointer to the stream context for stream rings, show the content of the stream ctx at command completion. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-9-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06xhci: Don't trace ring at every enqueue or dequeue increaseMathias Nyman
Don't trace ring pointers every time driver increases enqueue pointer after queuing a TRB, or dequeue pointer after handling an event. These xhci_inc_deq: and xhci_inc_enq: trace entries fill up the trace and provides little useful info now that the TRB DMA address is printed with the TRB itself. Only trace ring during xhci_inc_enq() and xhci_inc_deq() in case we move to a new ring segment. Also don't show both segment and TRB addess in trace. Segment is just TRB with 0xfff masked off. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-7-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06xhci: show DMA address of TRB when tracing TRBsMathias Nyman
The DMA address of a queued TRB is essential when looking at traces as both transfer events and command completion events refer to the command or transfer based on its DMA address. Previously the TRB address was figured out from xhci_inc_enq and xhci_inc_deq trace entries seen after queuing or handlong a TRB. Now that DMA address is shown in TRB tracing we can get rid of most of the xhci_inc_enq and xhci_inc_deq traces thus decreasing trace size. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-6-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06usb: xhci: Fix sum_trb_lengths()Michal Pecio
This function is supposed to sum the lengths of all transfer TRBs in a TD up to a point, but it starts summing at the current dequeue since it only ever gets called on the first pending TD. This won't work when there are cancelled TDs at the beginning of the ring. The function tries to exclude No-Ops from the count, but not all cancelled TDs are No-Op'ed - not those the HW stopped on. The absolutely obvious fix is to start counting at the TD's first TRB. And remove the now-useless 'ring' parameter, and 'xhci' too. Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-06usb: xhci: Remove unused parameters of next_trb()Michal Pecio
The function has two parameters which it doesn't use and hasn't ever used. One caller even puts NULL there, knowing it will work anyway. Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241106101459.775897-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-11-05Merge v6.12-rc6 into usb-nextGreg Kroah-Hartman
We need the USB fixes in here as well, and this resolves a merge conflict in: drivers/usb/typec/tcpm/tcpm.c Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Link: https://lore.kernel.org/r/20241101150730.090dc30f@canb.auug.org.au Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-29xhci: Fix Link TRB DMA in command ring stopped completion eventFaisal Hassan
During the aborting of a command, the software receives a command completion event for the command ring stopped, with the TRB pointing to the next TRB after the aborted command. If the command we abort is located just before the Link TRB in the command ring, then during the 'command ring stopped' completion event, the xHC gives the Link TRB in the event's cmd DMA, which causes a mismatch in handling command completion event. To address this situation, move the 'command ring stopped' completion event check slightly earlier, since the specific command it stopped on isn't of significant concern. Fixes: 7f84eef0dafb ("USB: xhci: No-op command queueing and irq handler.") Cc: stable@vger.kernel.org Signed-off-by: Faisal Hassan <quic_faisalh@quicinc.com> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241022155631.1185-1-quic_faisalh@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-21Merge 6.12-rc4 into usb-nextGreg Kroah-Hartman
We need the USB fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17usb: xhci: Fix handling errors mid TD followed by other errorsMichal Pecio
Some host controllers fail to produce the final completion event on an isochronous TD which experienced an error mid TD. We deal with it by flagging such TDs and checking if the next event points at the flagged TD or at the next one, and giving back the flagged TD if the latter. This is not enough, because the next TD may be missed by the xHC. Or there may be no next TD but a ring underrun. We also need to get such TD quickly out of the way, or errors on later TDs may be handled wrong. If the next TD experiences a Missed Service Error, we will set the skip flag on the endpoint and then attempt skipping TDs when yet another event arrives. In such scenario, we ought to report the 'error mid TD' transfer as such rather than skip it. Another problem case are Stopped events. If we see one after an error mid TD, we naively assume that it's a Force Stopped Event because it doesn't match the pending TD, but in reality it might be an ordinary Stopped event for the next TD, which we fail to recognize and handle. Fix this by moving error mid TD handling before the whole TD skipping loop. Remove unnecessary conditions, always give back the TD if the new event points to any TRB outside it or if the pointer is NULL, as may be the case in Ring Underrun and Overrun events on 1st gen hardware. Only if the pending TD isn't flagged, consider other actions like skipping. As a side effect of reordering with skip and FSE cases, error mid TD is reordered with last_td_was_short check. This is harmless, because the two cases are mutually exclusive - only one can happen in any given run of handle_tx_event(). Tested on the NEC host and a USB camera with flaky cable. Dynamic debug confirmed that Transaction Errors are sometimes seen, sometimes mid-TD, sometimes followed by Missed Service. In such cases, they were finished properly before skipping began. [Rebase on 6.12-rc1 -Mathias] Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241016140000.783905-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-17xhci: Mitigate failed set dequeue pointer commandsMathias Nyman
Avoid xHC host from processing a cancelled URB by always turning cancelled URB TDs into no-op TRBs before queuing a 'Set TR Deq' command. If the command fails then xHC will start processing the cancelled TD instead of skipping it once endpoint is restarted, causing issues like Babble error. This is not a complete solution as a failed 'Set TR Deq' command does not guarantee xHC TRB caches are cleared. Fixes: 4db356924a50 ("xhci: turn cancelled td cleanup to its own function") Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20241016140000.783905-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-10-04usb: host: fix typo in the commentYan Zhen
Correctly spelled comments make it easier for the reader to understand the code. Fix typos: 'calcalate' -> 'calculate', 'complted' -> 'completed', 'inidicator' -> 'indicator', 'detction' -> 'detection', 'allocte' -> 'allocate', 'controlles' -> 'controllers', 'initated' -> 'initiated', 'resumeable' -> 'resumable', 'aquires' -> 'acquires', 'tranfers' -> 'transfers', 'tranferred' -> 'transferred'. Signed-off-by: Yan Zhen <yanzhen@vivo.com> Link: https://lore.kernel.org/r/20240919110517.1793550-1-yanzhen@vivo.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-11usb: xhci: fix loss of data on Cadence xHCPawel Laszczak
Streams should flush their TRB cache, re-read TRBs, and start executing TRBs from the beginning of the new dequeue pointer after a 'Set TR Dequeue Pointer' command. Cadence controllers may fail to start from the beginning of the dequeue TRB as it doesn't clear the Opaque 'RsvdO' field of the stream context during 'Set TR Dequeue' command. This stream context area is where xHC stores information about the last partially executed TD when a stream is stopped. xHC uses this information to resume the transfer where it left mid TD, when the stream is restarted. Patch fixes this by clearing out all RsvdO fields before initializing new Stream transfer using a 'Set TR Dequeue Pointer' command. Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver") cc: stable@vger.kernel.org Signed-off-by: Pawel Laszczak <pawell@cadence.com> Reviewed-by: Peter Chen <peter.chen@kernel.org> Link: https://lore.kernel.org/r/PH7PR07MB95386A40146E3EC64086F409DD9D2@PH7PR07MB9538.namprd07.prod.outlook.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-05usb: xhci: adjust empty TD list handling in handle_tx_event()Niklas Neronin
Introduce an initial check for an empty list prior to entering the while loop. Which enables, the implementation of distinct warnings to differentiate between scenarios where the list is initially empty and when it has been emptied during processing skipped isoc TDs. These adjustments not only simplifies the large while loop, but also facilitates future enhancements to the handle_tx_event() function. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240905143300.1959279-11-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-05usb: xhci: remove excessive Bulk short packet debug messageNiklas Neronin
Completion codes 'COMP_SUCCESS' and 'COMP_SHORT_PACKET' are the most frequently encountered completion codes. Typically, these codes do not trigger a default debug message but rather a warning that indicates a potential issue. This behavior is consistent across all transfer types with the exception of Bulk transfers. To reduce unnecessary log clutter, remove the Bulk 'COMP_SHORT_PACKET' debug message. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240905143300.1959279-6-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-09-05usb: xhci: remove excessive isoc frame debug message spamNiklas Neronin
The removed debug messages trigger each time an isoc frame is handled. In case of an error, a dedicated debug message exists. For example, a 60fps USB camera will trigger the debug message every 0.6s. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240905143300.1959279-5-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-13usb: xhci: fix duplicate stall handling in handle_tx_event()Niklas Neronin
Stall handling is managed in the 'process_*' functions, which are called right before the 'goto' stall handling code snippet. Thus, there should be a return after the 'process_*' functions. Otherwise, the stall code may run twice. Fixes: 1b349f214ac7 ("usb: xhci: add 'goto' for halted endpoint check in handle_tx_event()") Reported-by: Michal Pecio <michal.pecio@gmail.com> Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240809124408.505786-3-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27xhci: sort out TRB Endpoint ID bitfield macrosMathias Nyman
xhci macros that read and write endpoint ID bitfields of TRBs are mixing the 1-based Endpoint ID as described in the xHCI specification, and 0-based endpoint index used by driver as an array index. Sort this out by naming macros that deal with 1 based Endpoint ID fields to *_EP_ID_*, and 0 based endpoint index values to *_EP_INDEX_*. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-22-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: add 'goto' for halted endpoint check in handle_tx_event()Niklas Neronin
Add 'goto' statement for a halted endpoint, streamlining the error handling process. In future handle_tx_event() changes this 'goto' statement will have more uses. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-20-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: move process TD code out of the while loopNiklas Neronin
This part is and should only performed once, so it's moved out of the while loop to improve code readability. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-19-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: remove infinite loop preventionNiklas Neronin
If a buggy HW reports some unpredicted event (for example, an overrun event following a MSE event while the EP ring is actually not empty), the driver will never find the TD, and it will loop until the TD list is empty. Before commits [1][2], the spin lock was released when giving back a URB in the do-while loop. This could cause more TD to be added to TD list, causing an infinite loop. Because of commits [1][2] the spin lock is not released any more, thus the infinite loop prevention is unnecessary and is removed. [1], commit 0c03d89d0c71 ("xhci: Giveback urb in finish_td directly") [2], commit 36dc01657b49 ("usb: host: xhci: Support running urb giveback in tasklet context") Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-18-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: remove false xhci_giveback_urb_in_irq() header commentNiklas Neronin
The function doesn't releases and re-acquires the lock, this was removed in commit 36dc01657b49 ("usb: host: xhci: Support running urb giveback in tasklet context") Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-17-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: ensure skipped isoc TDs are returned when isoc ring is stoppedNiklas Neronin
Missed service event tells the driver that the hardware wasn't able to process some queued isoc TDs in their right time slots, and some TDs will be skipped. The driver sets a 'skip' flag to indicate that the next transfer event after this event will point to some future TD instead of the next queued TD. Once the driver receives the next event, it will skip and give back all those hardware skipped TDs. However, should this subsequent event be a stop endpoint which does not point to the next pending TD, the driver fails to return the skipped TDs. Instead, it loops for a period before outputting an erroneous message. Fix this by repositioning the 'stop endpoint' check to follow the isoc skip check, ensuring the skipped TDs are properly returned. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-16-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27xhci: rework xhci internal endpoint halt state detection.Mathias Nyman
When xhci_requires_manual_halt_cleanup() was written it wasn't clear that the xhci internal endpoint halt state always needs to be cleared with a reset endpoint command. Functional stall cases additionally halt the device side endpoint which requires class driver to clear the device side halt with a CLEAR_FEATURE(ENDPOINT_HALT) request as well. Clean up, rename, and make sure the new function always return true when internal endpoint state is halted, including stall cases. Based on related cleanup suggestion code by Niklas Neronin cc: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-15-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: remove obsolete sanity check debug messagesNiklas Neronin
Remove debug messages that served as sanity checks during the initial implementation phase of underrun/overrun completion codes. These checks are now unnecessary. Instead, improve the default debug messages for underrun/overrun events, so that they are consistent with the reset of the completion codes. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-14-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: improve error message for targetless transfer eventNiklas Neronin
Improve error message for unknown transfer event without a TRB, by also printing the event code number. This removes the inevitable question; "what was the unknown event code exactly?" Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-13-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: move untargeted transfer event handling to a separate functionNiklas Neronin
Move handling transfer events without a target transfer TRB into handle_transferless_tx_event(), this type of event does not utilize the rest of handle_tx_event() and as a result it's better to separate it into a dedicated function. Additionally, this change reduces handle_tx_event()'s size and makes it more readable. [Mathias: Simplify code to return helper function value directly. This removes the second xhci_err() message for untargeted and unexpected event completion types] Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-12-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: move link chain bit quirk checks into one helper function.Niklas Neronin
Older 0.95 xHCI hosts and some other specific newer hosts require the chain bit to be set for Link TRBs even if the link TRB is not in the middle of a transfer descriptor (TD). move the checks for all those cases into one xhci_link_chain_quirk() function to clean up and avoid code duplication. No functional changes. [skip renaming chain_links flag, reword commit message -Mathias] Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-10-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: remove unused argument from handle_port_status()Niklas Neronin
Argument struct 'xhci_interrupter *ir' is not used, and as a consequence is removed. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-9-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: remove unused argument from xhci_handle_cmd_config_ep()Niklas Neronin
Argument u32 'cmd_comp_code' is not used, and as a consequence is removed. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-8-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: remove unused 'xhci' argumentNiklas Neronin
Remove argument 'struct xhci_hcd *xhci' from functions which do not utilize it. This change contributes to a simpler codebase by avoiding redundant arguments. Functions which have the argument removed: check_interval() xhci_num_trbs_free() xhci_handle_cmd_enable_slot() xhci_clear_interrupt_pending() xhci_requires_manual_halt_cleanup() Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-7-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27usb: xhci: remove 'num_trbs' from struct 'xhci_td'Niklas Neronin
Remove 'num_trbs' from 'xhci_td' as it's no longer used following the removal of 'num_trbs_free' tracking in commit 2710f8186f88 ("xhci: Stop unnecessary tracking of free trbs in a ring"). Tracking of 'num_trbs_free' is still performed in xhci DbC, but it does not utilize 'num_trbs'. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-6-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27xhci: Set correct transferred length for cancelled isoc transfersMathias Nyman
The transferred length is incorrectly zeroed for cancelled isoc transfer TDs when the transfer ring stops on a cancelled TD with a 'Stop - Length Invalid' completion code. Length was always set to zero in these cases even if it should be set to the sum of the TRB transfer block lengths up to the TRB the ring stopped on, _excluding_ the one stopped on. No issues reported due to this isoc case. Found while inspecting related case in bulk transfer 'Stop - Length Invalid' handling. Change this so that 'Stop - Length Invalid' transfer completion cases always sum up TRB lengths instead of report a zero length. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-4-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-27xhci: Remove dead code in xhci_move_dequeue_past_td()Hector Martin
This codepath is trivially dead, since the function is never called with a non-NULL td (the only callsite is immediately preceded by a NULL guard). [remove unused label 'deq_found' -Mathias] Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240626124835.1023046-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12xhci: Handle TD clearing for multiple streams caseHector Martin
When multiple streams are in use, multiple TDs might be in flight when an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for each, to ensure everything is reset properly and the caches cleared. Change the logic so that any N>1 TDs found active for different streams are deferred until after the first one is processed, calling xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to queue another command until we are done with all of them. Also change the error/"should never happen" paths to ensure we at least clear any affected TDs, even if we can't issue a command to clear the hardware cache, and complain loudly with an xhci_warn() if this ever happens. This problem case dates back to commit e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.") early on in the XHCI driver's life, when stream support was first added. It was then identified but not fixed nor made into a warning in commit 674f8438c121 ("xhci: split handling halted endpoints into two steps"), which added a FIXME comment for the problem case (without materially changing the behavior as far as I can tell, though the new logic made the problem more obvious). Then later, in commit 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs."), it was acknowledged again. [Mathias: commit 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.") was a targeted regression fix to the previously mentioned patch. Users reported issues with usb stuck after unmounting/disconnecting UAS devices. This rolled back the TD clearing of multiple streams to its original state.] Apparently the commit author was aware of the problem (yet still chose to submit it): It was still mentioned as a FIXME, an xhci_dbg() was added to log the problem condition, and the remaining issue was mentioned in the commit description. The choice of making the log type xhci_dbg() for what is, at this point, a completely unhandled and known broken condition is puzzling and unfortunate, as it guarantees that no actual users would see the log in production, thereby making it nigh undebuggable (indeed, even if you turn on DEBUG, the message doesn't really hint at there being a problem at all). It took me *months* of random xHC crashes to finally find a reliable repro and be able to do a deep dive debug session, which could all have been avoided had this unhandled, broken condition been actually reported with a warning, as it should have been as a bug intentionally left in unfixed (never mind that it shouldn't have been left in at all). > Another fix to solve clearing the caches of all stream rings with > cancelled TDs is needed, but not as urgent. 3 years after that statement and 14 years after the original bug was introduced, I think it's finally time to fix it. And maybe next time let's not leave bugs unfixed (that are actually worse than the original bug), and let's actually get people to review kernel commits please. Fixes xHC crashes and IOMMU faults with UAS devices when handling errors/faults. Easiest repro is to use `hdparm` to mark an early sector (e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop. At least in the case of JMicron controllers, the read errors end up having to cancel two TDs (for two queued requests to different streams) and the one that didn't get cleared properly ends up faulting the xHC entirely when it tries to access DMA pages that have since been unmapped, referred to by the stale TDs. This normally happens quickly (after two or three loops). After this fix, I left the `cat` in a loop running overnight and experienced no xHC failures, with all read errors recovered properly. Repro'd and tested on an Apple M1 Mac Mini (dwc3 host). On systems without an IOMMU, this bug would instead silently corrupt freed memory, making this a security bug (even on systems with IOMMUs this could silently corrupt memory belonging to other USB devices on the same controller, so it's still a security bug). Given that the kernel autoprobes partition tables, I'm pretty sure a malicious USB device pretending to be a UAS device and reporting an error with the right timing could deliberately trigger a UAF and write to freed memory, with no user action. [Mathias: Commit message and code comment edit, original at:] https://lore.kernel.org/linux-usb/20240524-xhci-streams-v1-1-6b1f13819bea@marcan.st/ Fixes: e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.") Fixes: 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.") Fixes: 674f8438c121 ("xhci: split handling halted endpoints into two steps") Cc: stable@vger.kernel.org Cc: security@kernel.org Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Hector Martin <marcan@marcan.st> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240611120610.3264502-5-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-06-12xhci: Set correct transferred length for cancelled bulk transfersMathias Nyman
The transferred length is set incorrectly for cancelled bulk transfer TDs in case the bulk transfer ring stops on the last transfer block with a 'Stop - Length Invalid' completion code. length essentially ends up being set to the requested length: urb->actual_length = urb->transfer_buffer_length Length for 'Stop - Length Invalid' cases should be the sum of all TRB transfer block lengths up to the one the ring stopped on, _excluding_ the one stopped on. Fix this by always summing up TRB lengths for 'Stop - Length Invalid' bulk cases. This issue was discovered by Alan Stern while debugging https://bugzilla.kernel.org/show_bug.cgi?id=218890, but does not solve that bug. Issue is older than 4.10 kernel but fix won't apply to those due to major reworks in that area. Tested-by: Pierre Tomon <pierretom+12@ik.me> Cc: stable@vger.kernel.org # v4.10+ Cc: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240611120610.3264502-2-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-01usb: xhci: compact 'trb_in_td()' argumentsNiklas Neronin
Pass pointer to the TD (struct xhci_td *) directly, instead of its components separately. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240429140245.3955523-19-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-01usb: xhci: remove duplicate TRB_TO_SLOT_ID() callsNiklas Neronin
Remove unnecessary repeated calls to TRB_TO_SLOT_ID(). The slot ID is stored in the 'slot_id' variable at the function's start. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240429140245.3955523-18-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-01usb: xhci: remove goto 'cleanup' in handle_tx_event()Niklas Neronin
By removing the goto 'cleanup' statement, and replacing it with 'continue', 'break' and 'return', helps simplify the code and further showcase in which case the while loop iterates. This change prepares for the comprehensive handle_tx_event() rework. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240429140245.3955523-14-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-01usb: xhci: replace goto with return when possible in handle_tx_event()Niklas Neronin
Simplifying the handle_tx_event() function by addressing the complexity of its while loop. Replaces specific 'goto cleanup' statements with 'return' statements, applicable only where 'ep->skip' is set to 'false', ensuring loop termination. The original while loop, combined with 'goto cleanup', adds unnecessary complexity. This change aims to untangle the loop's logic, facilitating a more straightforward review of the upcoming comprehensive rework. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240429140245.3955523-13-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-01usb: xhci: remove 'handling_skipped_tds' from handle_tx_event()Niklas Neronin
When handle_tx_event() encounters a COMP_MISSED_SERVICE_ERROR or COMP_NO_PING_RESPONSE_ERROR event, it moves to 'goto cleanup'. Here, it sets a flag, 'handling_skipped_tds', based on conditions that exclude these two error events. Subsequently, the process evaluates the loop that persists as long as 'handling_skipped_tds' remains true. However, since 'trb_comp_code' does not change after its assignment, if it indicates either of the two error conditions, the loop terminates immediately. To simplify this process and enhance clarity, the modification involves returning immediately upon detecting COMP_MISSED_SERVICE_ERROR or COMP_NO_PING_RESPONSE_ERROR. This adjustment allows for the direct use of 'ep->skip', removing the necessity for the 'handling_skipped_tds' flag. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240429140245.3955523-12-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-01usb: xhci: prevent potential failure in handle_tx_event() for Transfer ↵Niklas Neronin
events without TRB Some transfer events don't always point to a TRB, and consequently don't have a endpoint ring. In these cases, function handle_tx_event() should not proceed, because if 'ep->skip' is set, the pointer to the endpoint ring is used. To prevent a potential failure and make the code logical, return after checking the completion code for a Transfer event without TRBs. Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240429140245.3955523-11-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-05-01xhci: remove XHCI_TRUST_TX_LENGTH quirkMathias Nyman
If this quirk was set then driver would treat transfer events with 'Success' completion code as 'Short packet' if there were untransferred bytes left. This is so common that turn it into default behavior. xhci_warn_ratelimited() is no longer used after this, so remove it. A success event with untransferred bytes left doesn't always mean a misbehaving controller. If there was an error mid a multi-TRB TD it's allowed to issue a success event for the last TRB in that TD. See xhci 1.2 spec 4.9.1 Transfer Descriptors "Note: If an error is detected while processing a multi-TRB TD, the xHC shall generate a Transfer Event for the TRB that the error was detected on with the appropriate error Condition Code, then may advance to the next TD. If in the process of advancing to the next TD, a Transfer TRB is encountered with its IOC flag set, then the Condition Code of the Transfer Event generated for that Transfer TRB should be Success, because there was no error actually associated with the TRB that generated the Event. However, an xHC implementation may redundantly assert the original error Condition Code." Co-developed-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Link: https://lore.kernel.org/r/20240429140245.3955523-10-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>