summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-12-05genirq/msi: Provide constants for PCI/IMS supportThomas Gleixner
Provide the necessary constants for PCI/IMS support: - A new bus token for MSI irqdomain identification - A MSI feature flag for the MSI irqdomains to signal support - A secondary domain id The latter expands the device internal domain pointer storage array from 1 to 2 entries. That extra pointer is mostly unused today, but the alternative solutions would not be free either and would introduce more complexity all over the place. Trade the 8bytes for simplicity. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.846169830@linutronix.de
2022-12-05x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYNThomas Gleixner
x86 MSI irqdomains can handle MSI-X allocation post MSI-X enable just out of the box - on the vector domain and on the remapping domains, Add the feature flag to the supported feature list Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.787373104@linutronix.de
2022-12-05PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-XThomas Gleixner
MSI-X vectors can be allocated after the initial MSI-X enablement, but this needs explicit support of the underlying interrupt domains. Provide a function to query the ability and functions to allocate/free individual vectors post-enable. The allocation can either request a specific index in the MSI-X table or with the index argument MSI_ANY_INDEX it allocates the next free vector. The return value is a struct msi_map which on success contains both index and the Linux interrupt number. In case of failure index is negative and the Linux interrupt number is 0. The allocation function is for a single MSI-X index at a time as that's sufficient for the most urgent use case VFIO to get rid of the 'disable MSI-X, reallocate, enable-MSI-X' cycle which is prone to lost interrupts and redirections to the legacy and obviously unhandled INTx. As single index allocation is also sufficient for the use cases Jason Gunthorpe pointed out: Allocation of a MSI-X or IMS vector for a network queue. See Link below. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/all/20211126232735.547996838@linutronix.de Link: https://lore.kernel.org/r/20221124232326.731233614@linutronix.de
2022-12-05PCI/MSI: Provide prepare_desc() MSI domain opThomas Gleixner
The setup of MSI descriptors for PCI/MSI-X interrupts depends partially on the MSI index for which the descriptor is initialized. Dynamic MSI-X vector allocation post MSI-X enablement allows to allocate vectors at a given index or at any free index in the available table range. The latter requires that the descriptor is initialized after the MSI core has chosen an index. Implement the prepare_desc() op in the PCI/MSI-X specific msi_domain_ops which is invoked before the core interrupt descriptor and the associated Linux interrupt number is allocated. That callback is also provided for the upcoming PCI/IMS implementations so the implementation specific interrupt domain can do their domain specific initialization of the MSI descriptors. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.673658806@linutronix.de
2022-12-05PCI/MSI: Split MSI-X descriptor setupThomas Gleixner
The upcoming mechanism to allocate MSI-X vectors after enabling MSI-X needs to share some of the MSI-X descriptor setup. The regular descriptor setup on enable has the following code flow: 1) Allocate descriptor 2) Setup descriptor with PCI specific data 3) Insert descriptor 4) Allocate interrupts which in turn scans the inserted descriptors This cannot be easily changed because the PCI/MSI code needs to handle the legacy architecture specific allocation model and the irq domain model where quite some domains have the assumption that the above flow is how it works. Ideally the code flow should look like this: 1) Invoke allocation at the MSI core 2) MSI core allocates descriptor 3) MSI core calls back into the irq domain which fills in the domain specific parts This could be done for underlying parent MSI domains which support post-enable allocation/free but that would create significantly different code pathes for MSI/MSI-X enable. Though for dynamic allocation which wants to share the allocation code with the upcoming PCI/IMS support it's the right thing to do. Split the MSI-X descriptor setup into the preallocation part which just sets the index and fills in the horrible hack of virtual IRQs and the real PCI specific MSI-X setup part which solely depends on the index in the descriptor. This allows to provide a common dynamic allocation interface at the MSI core level for both PCI/MSI-X and PCI/IMS. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.616292598@linutronix.de
2022-12-05genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYNThomas Gleixner
Provide a new MSI feature flag in preparation for dynamic MSIX allocation after the initial MSI-X enable has been done. This needs to be an explicit MSI interrupt domain feature because quite some implementations (both interrupt domains and legacy allocation mode) have clear expectations that the allocation code is only invoked when MSI-X is about to be enabled. They either talk to hypervisors or do some other work and are not prepared to be invoked on an already MSI-X enabled device. This is also explicit MSI-X only because rewriting the size of the MSI entries is only possible when disabling MSI which in turn might cause lost interrupts on the device. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.558843119@linutronix.de
2022-12-05genirq/msi: Provide msi_domain_alloc_irq_at()Thomas Gleixner
For supporting post MSI-X enable allocations and for the upcoming PCI/IMS support a separate interface is required which allows not only the allocation of a specific index, but also the allocation of any, i.e. the next free index. The latter is especially required for IMS because IMS completely does away with index to functionality mappings which are often found in MSI/MSI-X implementation. But even with MSI-X there are devices where only the first few indices have a fixed functionality and the rest is freely assignable by software, e.g. to queues. msi_domain_alloc_irq_at() is also different from the range based interfaces as it always enforces that the MSI descriptor is allocated by the core code and not preallocated by the caller like the PCI/MSI[-X] enable code path does. msi_domain_alloc_irq_at() can be invoked with the index argument set to MSI_ANY_INDEX which makes the core code pick the next free index. The irq domain can provide a prepare_desc() operation callback in it's msi_domain_ops to do domain specific post allocation initialization before the actual Linux interrupt and the associated interrupt descriptor and hierarchy alloccations are conducted. The function also takes an optional @icookie argument which is of type union msi_instance_cookie. This cookie is not used by the core code and is stored in the allocated msi_desc::data::icookie. The meaning of the cookie is completely implementation defined. In case of IMS this might be a PASID or a pointer to a device queue, but for the MSI core it's opaque and not used in any way. The function returns a struct msi_map which on success contains the allocated index number and the Linux interrupt number so the caller can spare the index to Linux interrupt number lookup. On failure map::index contains the error code and map::virq is 0. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.501359457@linutronix.de
2022-12-05genirq/msi: Provide msi_domain_ops:: Prepare_desc()Thomas Gleixner
The existing MSI domain ops msi_prepare() and set_desc() turned out to be unsuitable for implementing IMS support. msi_prepare() does not operate on the MSI descriptors. set_desc() lacks an irq_domain pointer and has a completely different purpose. Introduce a prepare_desc() op which allows IMS implementations to amend an MSI descriptor which was allocated by the core code, e.g. by adjusting the iomem base or adding some data based on the allocated index. This is way better than requiring that all IMS domain implementations preallocate the MSI descriptor and then allocate the interrupt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.444560717@linutronix.de
2022-12-05genirq/msi: Provide msi_desc:: Msi_dataThomas Gleixner
The upcoming support for PCI/IMS requires to store some information related to the message handling in the MSI descriptor, e.g. PASID or a pointer to a queue. Provide a generic storage struct which maps over the existing PCI specific storage which means the size of struct msi_desc is not getting bigger. This storage struct has two elements: 1) msi_domain_cookie 2) msi_instance_cookie The domain cookie is going to be used to store domain specific information, e.g. iobase pointer, data pointer. The instance cookie is going to be handed in when allocating an interrupt on an IMS domain so the irq chip callbacks of the IMS domain have the necessary per vector information available. It also comes in handy when cleaning up the platform MSI code for wire to MSI bridges which need to hand down the type information to the underlying interrupt domain. For the core code the cookies are opaque and meaningless. It just stores the instance cookie during an allocation through the upcoming interfaces for IMS and wire to MSI brigdes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.385036043@linutronix.de
2022-12-05genirq/msi: Provide struct msi_mapThomas Gleixner
A simple struct to hold a MSI index / Linux interrupt number pair. It will be returned from the dynamic vector allocation function and handed back to the corresponding free() function. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.326410494@linutronix.de
2022-12-05x86/apic/msi: Remove arch_create_remap_msi_irq_domain()Thomas Gleixner
and related code which is not longer required now that the interrupt remap code has been converted to MSI parent domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.267353814@linutronix.de
2022-12-05iommu/amd: Switch to MSI base domainsThomas Gleixner
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.209212272@linutronix.de
2022-12-05iommu/vt-d: Switch to MSI parent domainsThomas Gleixner
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de
2022-12-05PCI/MSI: Remove unused pci_dev_has_special_msi_domain()Thomas Gleixner
The check for special MSI domains like VMD which prevents the interrupt remapping code to overwrite device::msi::domain is not longer required and has been replaced by an x86 specific version which is aware of MSI parent domains. Remove it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.093093200@linutronix.de
2022-12-05x86/apic/vector: Provide MSI parent domainThomas Gleixner
Enable MSI parent domain support in the x86 vector domain and fixup the checks in the iommu implementations to check whether device::msi::domain is the default MSI parent domain. That keeps the existing logic to protect e.g. devices behind VMD working. The interrupt remap PCI/MSI code still works because the underlying vector domain still provides the same functionality. None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are affected either. They still work the same way both at the low level and the PCI/MSI implementations they provide. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de
2022-12-05PCI/MSI: Add support for per device MSI[X] domainsThomas Gleixner
Provide a template and the necessary callbacks to create PCI/MSI and PCI/MSI-X domains. The domains are created when MSI or MSI-X is enabled. The domain's lifetime is either the device lifetime or in case that e.g. MSI-X was tried first and failed, then the MSI-X domain is removed and a MSI domain is created as both are mutually exclusive and reside in the default domain ID slot of the per device domain pointer array. Also expand pci_msi_domain_supports() to handle feature checks correctly even in the case that the per device domain was not yet created by checking the features supported by the MSI parent. Add the necessary setup calls into the MSI and MSI-X enable code path. These setup calls are backwards compatible. They return success when there is no parent domain found, which means the existing global domains or the legacy allocation path keep just working. Co-developed-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.975388241@linutronix.de
2022-12-05genirq/msi: Provide BUS_DEVICE_PCI_MSI[X]Thomas Gleixner
Provide new bus tokens for the upcoming per device PCI/MSI and PCI/MSIX interrupt domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.917219885@linutronix.de
2022-12-05PCI/MSI: Split __pci_write_msi_msg()Thomas Gleixner
The upcoming per device MSI domains will create different domains for MSI and MSI-X. Split the write message function into MSI and MSI-X helpers so they can be used by those new domain functions seperately. Signed-off-by: Ahmed S. Darwish <darwi@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.857982142@linutronix.de
2022-12-05genirq/msi: Add range checking to msi_insert_desc()Thomas Gleixner
Per device domains provide the real domain size to the core code. This allows range checking on insertion of MSI descriptors and also paves the way for dynamic index allocations which are required e.g. for IMS. This avoids external mechanisms like bitmaps on the device side and just utilizes the core internal MSI descriptor storxe for it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232325.798556374@linutronix.de
2022-12-05dt-bindings: soc: qcom: aoss: Add compatible for SM8550Abel Vesa
Document the compatible for SM8550. Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221116113128.2655441-1-abel.vesa@linaro.org
2022-12-05soc: qcom: llcc: Add configuration data for SM8550Abel Vesa
Add LLCC configuration data for SM8550 SoC. Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221116113005.2653284-4-abel.vesa@linaro.org
2022-12-05dt-bindings: arm: msm: Add LLCC compatible for SM8550Abel Vesa
Add LLCC compatible for SM8550 SoC. Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221116113005.2653284-3-abel.vesa@linaro.org
2022-12-05soc: qcom: llcc: Add v4.1 HW version supportAbel Vesa
The LLCC found in SM8550 supports more slice configuration knobs and HW block version has been bumped up to 4.1. Add support for the new version and make sure the new config values are programed on probe. Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221116113005.2653284-2-abel.vesa@linaro.org
2022-12-05soc: qcom: socinfo: Add SM8550 IDAbel Vesa
Add the ID for the Qualcomm SM8550 SoC. Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221116112438.2643607-1-abel.vesa@linaro.org
2022-12-05soc: qcom: rpmh-rsc: Avoid unnecessary checks on irq-done responseAbel Vesa
The RSC interrupt is issued only after the request is complete. For fire-n-forget requests, the irq-done interrupt is sent after issuing the RPMH request and for response-required request, the interrupt is triggered only after all the requests are complete. These unnecessary checks in the interrupt handler issues AHB reads from a critical path. Lets remove them and clean up error handling in rpmh_request data structures. Co-developed-by: Lina Iyer <ilina@codeaurora.org> Signed-off-by: Lina Iyer <ilina@codeaurora.org> Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221116112246.2640648-2-abel.vesa@linaro.org
2022-12-05soc: qcom: rpmh-rsc: Add support for RSC v3 register offsetsAbel Vesa
The SM8550 RSC has a new set of register offsets due to its version bump. So read the version from HW and use the proper register offsets based on that. Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221116112246.2640648-1-abel.vesa@linaro.org
2022-12-05soc: qcom: rpmhpd: Add SM8550 power domainsAbel Vesa
Add the power domains exposed by RPMH in the Qualcomm SM8550 platform. Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221116111745.2633074-3-abel.vesa@linaro.org
2022-12-05dt-bindings: power: rpmpd: Add SM8550 to rpmpd bindingAbel Vesa
Add compatible and constants for the power domains exposed by the RPMH in the Qualcomm SM8550 platform. Signed-off-by: Abel Vesa <abel.vesa@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221116111745.2633074-2-abel.vesa@linaro.org
2022-12-05dt-bindings: arm: qcom: Add Xperia 5 IV (PDX224)Konrad Dybcio
Add a compatible for Sony Xperia 5 IV. Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221114095654.34561-1-konrad.dybcio@linaro.org
2022-12-05dt-bindings: arm: qcom: Document msm8956 and msm8976 SoC and devicesMarijn Suijten
Note that msm8976 is omitted as a compatible, since there are currently no boards/devices using it. Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221111120156.48040-9-angelogioacchino.delregno@collabora.com
2022-12-05blk-throttle: Use more suitable time_after check for update of slice_startKemeng Shi
There is no need to update tg->slice_start[rw] to start when they are equal already. So remove "eq" part of check before update slice_start. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-10-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05blk-throttle: remove repeat check of elapsed timeKemeng Shi
There is no need to check elapsed time from last upgrade for each node in hierarchy. Move this check before traversing as throtl_can_upgrade do to remove repeat check. Reported-by: kernel test robot <lkp@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-9-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05blk-throttle: remove incorrect comment for tg_last_low_overflow_timeKemeng Shi
Function tg_last_low_overflow_time is called with intermediate node as following: throtl_hierarchy_can_downgrade throtl_tg_can_downgrade tg_last_low_overflow_time throtl_hierarchy_can_upgrade throtl_tg_can_upgrade tg_last_low_overflow_time throtl_hierarchy_can_downgrade/throtl_hierarchy_can_upgrade will traverse from leaf node to sub-root node and pass traversed intermediate node to tg_last_low_overflow_time. No such limit could be found from context and implementation of tg_last_low_overflow_time, so remove this limit in comment. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-8-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05blk-throttle: fix typo in comment of throtl_adjusted_limitKemeng Shi
lapsed time -> elapsed time Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-7-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05blk-throttle: simpfy low limit reached check in throtl_tg_can_upgradeKemeng Shi
Commit c79892c557616 ("blk-throttle: add upgrade logic for LIMIT_LOW state") added upgrade logic for low limit and methioned that 1. "To determine if a cgroup exceeds its limitation, we check if the cgroup has pending request. Since cgroup is throttled according to the limit, pending request means the cgroup reaches the limit." 2. "If a cgroup has limit set for both read and write, we consider the combination of them for upgrade. The reason is read IO and write IO can interfere with each other. If we do the upgrade based in one direction IO, the other direction IO could be severly harmed." Besides, we also determine that cgroup reaches low limit if low limit is 0, see comment in throtl_tg_can_upgrade. Collect the information above, the desgin of upgrade check is as following: 1.The low limit is reached if limit is zero or io is already queued. 2.Cgroup will pass upgrade check if low limits of READ and WRITE are both reached. Simpfy the check code described above to removce repeat check and improve readability. There is no functional change. Detail equivalence proof is as following: All replaced conditions to return true are as following: condition 1 (!read_limit && !write_limit) condition 2 read_limit && sq->nr_queued[READ] && (!write_limit || sq->nr_queued[WRITE]) condition 3 write_limit && sq->nr_queued[WRITE] && (!read_limit || sq->nr_queued[READ]) Transferring condition 2 as following: (read_limit && sq->nr_queued[READ]) && (!write_limit || sq->nr_queued[WRITE]) is equivalent to (read_limit && sq->nr_queued[READ]) && (!write_limit || (write_limit && sq->nr_queued[WRITE])) is equivalent to condition 2.1 (read_limit && sq->nr_queued[READ] && !write_limit) || condition 2.2 (read_limit && sq->nr_queued[READ] && (write_limit && sq->nr_queued[WRITE])) Transferring condition 3 as following: write_limit && sq->nr_queued[WRITE] && (!read_limit || sq->nr_queued[READ]) is equivalent to (write_limit && sq->nr_queued[WRITE]) && (!read_limit || (read_limit && sq->nr_queued[READ])) is equivalent to condition 3.1 ((write_limit && sq->nr_queued[WRITE]) && !read_limit) || condition 3.2 ((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ])) Condition 3.2 is the same as condition 2.2, so all conditions we get to return are as following: (!read_limit && !write_limit) (1) (!read_limit && (write_limit && sq->nr_queued[WRITE])) (3.1) ((read_limit && sq->nr_queued[READ]) && !write_limit) (2.1) ((write_limit && sq->nr_queued[WRITE]) && (read_limit && sq->nr_queued[READ])) (2.2) As we can extract conditions "(a1 || a2) && (b1 || b2)" to: a1 && b1 a1 && b2 a2 && b1 ab && b2 Considering that: a1 = !read_limit a2 = read_limit && sq->nr_queued[READ] b1 = !write_limit b2 = write_limit && sq->nr_queued[WRITE] We can pack replaced conditions to (!read_limit || (read_limit && sq->nr_queued[READ])) && (!write_limit || (write_limit && sq->nr_queued[WRITE])) which is equivalent to (!read_limit || sq->nr_queued[READ]) && (!write_limit || sq->nr_queued[WRITE]) Reported-by: kernel test robot <lkp@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-6-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05soc: qcom: socinfo: Add MSM8956/76 SoC IDs to the soc_id tableAngeloGioacchino Del Regno
Add SoC ID table entries for MSM8956 and MSM8976 chips. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221111120156.48040-8-angelogioacchino.delregno@collabora.com
2022-12-05dt-bindings: arm: qcom,ids: Add SoC IDs for MSM8956 and MSM8976AngeloGioacchino Del Regno
Document the identifier of MSM8956/76. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Konrad Dybcio <konrad.dybcio@linaro.org> Signed-off-by: Bjorn Andersson <andersson@kernel.org> Link: https://lore.kernel.org/r/20221111120156.48040-7-angelogioacchino.delregno@collabora.com
2022-12-05blk-throttle: correct calculation of wait time in tg_may_dispatchKemeng Shi
In C language, When executing "if (expression1 && expression2)" and expression1 return false, the expression2 may not be executed. For "tg_within_bps_limit(tg, bio, bps_limit, &bps_wait) && tg_within_iops_limit(tg, bio, iops_limit, &iops_wait))", if bps is limited, tg_within_bps_limit will return false and tg_within_iops_limit will not be called. So even bps and iops are both limited, iops_wait will not be calculated and is always zero. So wait time of iops is always ignored. Fix this by always calling tg_within_bps_limit and tg_within_iops_limit to get wait time for both bps and iops. Observed that: 1. Wait time in tg_within_iops_limit/tg_within_bps_limit need always be stored as wait argument is always passed. 2. wait time is stored to zero if iops/bps is limited otherwise non-zero is stored. Simpfy tg_within_iops_limit/tg_within_bps_limit by removing wait argument and return wait time directly. Caller tg_may_dispatch checks if wait time is zero to find if iops/bps is limited. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-5-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05blk-throttle: ignore cgroup without io queued in blk_throtl_cancel_biosKemeng Shi
Ignore cgroup without io queued in blk_throtl_cancel_bios for two reasons: 1. Save cpu cycle for trying to dispatch cgroup which is no io queued. 2. Avoid non-consistent state that cgroup is inserted to service queue without THROTL_TG_PENDING set as tg_update_disptime will unconditional re-insert cgroup to service queue. If we are on the default hierarchy, IO dispatched from child in tg_dispatch_one_bio will trigger inserting cgroup to service queue without erase first and ruin the tree. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-4-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05blk-throttle: Fix that bps of child could exceed bps limited in parentKemeng Shi
Consider situation as following (on the default hierarchy): HDD | root (bps limit: 4k) | child (bps limit :8k) | fio bs=8k Rate of fio is supposed to be 4k, but result is 8k. Reason is as following: Size of single IO from fio is larger than bytes allowed in one throtl_slice in child, so IOs are always queued in child group first. When queued IOs in child are dispatched to parent group, BIO_BPS_THROTTLED is set and these IOs will not be limited by tg_within_bps_limit anymore. Fix this by only set BIO_BPS_THROTTLED when the bio traversed the entire tree. There patch has no influence on situation which is not on the default hierarchy as each group is a single root group without parent. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-3-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05blk-throttle: correct stale comment in throtl_pd_initKemeng Shi
On the default hierarchy (cgroup2), the throttle interface files don't exist in the root cgroup, so the ablity to limit the whole system by configuring root group is not existing anymore. In general, cgroup doesn't wanna be in the business of restricting resources at the system level, so correct the stale comment that we can limit whole system to we can only limit subtree. Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Kemeng Shi <shikemeng@huawei.com> Link: https://lore.kernel.org/r/20221205115709.251489-2-shikemeng@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-12-05tools/testing/cxl: Require cache invalidation bypassDan Williams
The typical environment where cxl_test is run, QEMU, does not support cpu_cache_invalidate_memregion(). Add the 'test' bypass symbols to the configuration check. Reported-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167026948179.3527561.4535373655515827457.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-05cxl/acpi: Fail decoder add if CXIMS for HBIG is missingAlison Schofield
The BIOS provided CXIMS (CXL XOR Interleave Math Structure) is required for calculating a targets position in an interleave list during region creation. The CXL driver expects to discover a CXIMS that matches the HBIG (Host Bridge Interleave Granularity) and stores the xormaps found in that CXIMS for retrieval during region creation. If there is no CXIMS for an HBIG, no maps are stored. That leads to a NULL pointer dereference at xormap retrieval during region creation. Add a check during ACPI probe for the case of no matching CXIMS. Emit an error message and fail to add the decoder. Fixes: f9db85bfec0d ("cxl/acpi: Support CXL XOR Interleave Math (CXIMS)") Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20221205002951.1788783-1-alison.schofield@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-05cxl/region: Fix spelling mistake "memergion" -> "memregion"Colin Ian King
There is a spelling mistake in a dev_warn message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/r/20221205091819.1943564-1-colin.i.king@gmail.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-05cxl/regs: Fix sparse warningDan Williams
The 0day robot belatedly points out that @addr is not properly tagged as an iomap pointer: "drivers/cxl/core/regs.c:332:14: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected void *addr @@ got void [noderef] __iomem * @@" Fixes: 1168271ca054 ("cxl/acpi: Extract component registers of restricted hosts from RCRB") Reported-by: kernel test robot <lkp@intel.com> Reviewed-by: Robert Richter <rrichter@amd.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://lore.kernel.org/r/167008768190.2516013.11918622906007677341.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2022-12-05Merge branch 'for-6.2/cxl-xor' into for-6.2/cxlDan Williams
Pick up support for "XOR" interleave math when parsing ACPI CFMWS window structures. Fix up conflicts with the RCH emulation already pending in cxl/next.
2022-12-05Merge branch 'for-6.2/cxl-aer' into for-6.2/cxlDan Williams
Pick up CXL AER handling and correctable error extensions. Resolve conflicts with cxl_pmem_wq reworks and RCH support.
2022-12-05Merge branch 'for-6.2/cxl-security' into for-6.2/cxlDan Williams
Pick CXL PMEM security commands for v6.2. Resolve conflicts with the removal of the cxl_pmem_wq.
2022-12-05proc: proc_skip_spaces() shouldn't think it is working on C stringsLinus Torvalds
proc_skip_spaces() seems to think it is working on C strings, and ends up being just a wrapper around skip_spaces() with a really odd calling convention. Instead of basing it on skip_spaces(), it should have looked more like proc_skip_char(), which really is the exact same function (except it skips a particular character, rather than whitespace). So use that as inspiration, odd coding and all. Now the calling convention actually makes sense and works for the intended purpose. Reported-and-tested-by: Kyle Zeng <zengyhkyle@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-05proc: avoid integer type confusion in get_proc_longLinus Torvalds
proc_get_long() is passed a size_t, but then assigns it to an 'int' variable for the length. Let's not do that, even if our IO paths are limited to MAX_RW_COUNT (exactly because of these kinds of type errors). So do the proper test in the rigth type. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>