summaryrefslogtreecommitdiff
path: root/drivers/cpuidle
AgeCommit message (Collapse)Author
2022-03-30RISC-V CPU Idle SupportPalmer Dabbelt
This series adds RISC-V CPU Idle support using SBI HSM suspend function. The RISC-V SBI CPU idle driver added by this series is highly inspired from the ARM PSCI CPU idle driver. Special thanks Sandeep Tripathy for providing early feeback on SBI HSM support in all above projects (RISC-V SBI specification, OpenSBI, and Linux RISC-V). * palmer/riscv-idle: RISC-V: Enable RISC-V SBI CPU Idle driver for QEMU virt machine dt-bindings: Add common bindings for ARM and RISC-V idle states cpuidle: Add RISC-V SBI CPU idle driver cpuidle: Factor-out power domain related code from PSCI domain driver RISC-V: Add SBI HSM suspend related defines RISC-V: Add arch functions for non-retentive suspend entry/exit RISC-V: Rename relocate() and make it global RISC-V: Enable CPU_IDLE drivers
2022-03-23Merge tag 'arm-drivers-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM driver updates from Arnd Bergmann: "There are a few separately maintained driver subsystems that we merge through the SoC tree, notable changes are: - Memory controller updates, mainly for Tegra and Mediatek SoCs, and clarifications for the memory controller DT bindings - SCMI firmware interface updates, in particular a new transport based on OPTEE and support for atomic operations. - Cleanups to the TEE subsystem, refactoring its memory management For SoC specific drivers without a separate subsystem, changes include - Smaller updates and fixes for TI, AT91/SAMA5, Qualcomm and NXP Layerscape SoCs. - Driver support for Microchip SAMA5D29, Tesla FSD, Renesas RZ/G2L, and Qualcomm SM8450. - Better power management on Mediatek MT81xx, NXP i.MX8MQ and older NVIDIA Tegra chips" * tag 'arm-drivers-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (154 commits) ARM: spear: fix typos in comments soc/microchip: fix invalid free in mpfs_sys_controller_delete soc: s4: Add support for power domains controller dt-bindings: power: add Amlogic s4 power domains bindings ARM: at91: add support in soc driver for new SAMA5D29 soc: mediatek: mmsys: add sw0_rst_offset in mmsys driver data dt-bindings: memory: renesas,rpc-if: Document RZ/V2L SoC memory: emif: check the pointer temp in get_device_details() memory: emif: Add check for setup_interrupts dt-bindings: arm: mediatek: mmsys: add support for MT8186 dt-bindings: mediatek: add compatible for MT8186 pwrap soc: mediatek: pwrap: add pwrap driver for MT8186 SoC soc: mediatek: mmsys: add mmsys reset control for MT8186 soc: mediatek: mtk-infracfg: Disable ACP on MT8192 soc: ti: k3-socinfo: Add AM62x JTAG ID soc: mediatek: add MTK mutex support for MT8186 soc: mediatek: mmsys: add mt8186 mmsys routing table soc: mediatek: pm-domains: Add support for mt8186 dt-bindings: power: Add MT8186 power domains soc: mediatek: pm-domains: Add support for mt8195 ...
2022-03-10cpuidle: Add RISC-V SBI CPU idle driverAnup Patel
The RISC-V SBI HSM extension provides HSM suspend call which can be used by Linux RISC-V to enter platform specific low-power state. This patch adds a CPU idle driver based on RISC-V SBI calls which will populate idle states from device tree and use SBI calls to entry these idle states. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Acked-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-03-10cpuidle: Factor-out power domain related code from PSCI domain driverAnup Patel
The generic power domain related code in PSCI domain driver is largely independent of PSCI and can be shared with RISC-V SBI domain driver hence we factor-out this code into dt_idle_genpd.c and dt_idle_genpd.h. Signed-off-by: Anup Patel <anup.patel@wdc.com> Signed-off-by: Anup Patel <apatel@ventanamicro.com> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-03-09cpuidle: haltpoll: Call cpuidle_poll_state_init() laterLi RongQing
Call cpuidle_poll_state_init() only if it is needed to avoid doing useless work. Signed-off-by: Li RongQing <lirongqing@baidu.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03firmware: qcom: scm: Drop cpumask parameter from set_boot_addr()Stephan Gerhold
qcom_scm_set_cold/warm_boot_addr() currently take a cpumask parameter, but it's not very useful because at the end we always set the same entry address for all CPUs. This also allows speeding up probe of cpuidle-qcom-spm a bit because only one SCM call needs to be made to the TrustZone firmware, instead of one per CPU. The main reason for this change is that it allows implementing the "multi-cluster" variant of the set_boot_addr() call more easily without having to rely on functions that break in certain build configurations or that are not exported to modules. Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20211201130505.257379-4-stephan@gerhold.net
2022-02-03cpuidle: qcom-spm: Check if any CPU is managed by SPMStephan Gerhold
At the moment, the "qcom-spm-cpuidle" platform device is always created, even if none of the CPUs is actually managed by the SPM. On non-qcom platforms this will result in infinite probe-deferral due to the failing qcom_scm_is_available() call. To avoid this, look through the CPU DT nodes and check if there is actually any CPU managed by a SPM (as indicated by the qcom,saw property). It should also be available because e.g. MSM8916 has qcom,saw defined but it's typically not enabled with ARM64/PSCI firmwares. This is needed in preparation of a follow-up change that calls qcom_scm_set_warm_boot_addr() a single time before registering any cpuidle drivers. Otherwise this call might be made even on devices that have this driver enabled but actually make use of PSCI. Fixes: 60f3692b5f0b ("cpuidle: qcom_spm: Detach state machine from main SPM handling") Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/86e3e09f-a8d7-3dff-3fc6-ddd7d30c5d78@samsung.com/ Signed-off-by: Stephan Gerhold <stephan@gerhold.net> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20211201130505.257379-2-stephan@gerhold.net
2022-01-05cpuidle: use default_groups in kobj_typeGreg Kroah-Hartman
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the cpuidle sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-17cpuidle: Fix cpuidle_remove_state_sysfs() kerneldoc commentYang Li
Fix function name in sysfs.c kernel-doc comment to remove a warning found by running scripts/kernel-doc, which is caused by using 'make W=1'. drivers/cpuidle/sysfs.c:512: warning: expecting prototype for cpuidle_remove_driver_sysfs(). Prototype was for cpuidle_remove_state_sysfs() instead Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-24cpuidle: menu: Fix typo in a commentJason Wang
The word `these' in a comment is repeated, so drop one. Signed-off-by: Jason Wang <wangborong@cdjrlc.com> [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-03Merge tag 'drivers-5.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC driver updates from Arnd Bergmann: "These are all the driver updates for SoC specific drivers. There are a couple of subsystems with individual maintainers picking up their patches here: - The reset controller subsystem add support for a few new SoC variants to existing drivers, along with other minor improvements - The OP-TEE subsystem gets a driver for the ARM FF-A transport - The memory controller subsystem has improvements for Tegra, Mediatek, Renesas, Freescale and Broadcom specific drivers. - The tegra cpuidle driver changes get merged through this tree this time. There are only minor changes, but they depend on other tegra driver updates here. - The ep93xx platform finally moves to using the drivers/clk/ subsystem, moving the code out of arch/arm in the process. This depends on a small sound driver change that is included here as well. - There are some minor updates for Qualcomm and Tegra specific firmware drivers. The other driver updates are mainly for drivers/soc, which contains a mixture of vendor specific drivers that don't really fit elsewhere: - Mediatek drivers gain more support for MT8192, with new support for hw-mutex and mmsys routing, plus support for reset lines in the mmsys driver. - Qualcomm gains a new "sleep stats" driver, and support for the "Generic Packet Router" in the APR driver. - There is a new user interface for routing the UARTS on ASpeed BMCs, something that apparently nobody else has needed so far. - More drivers can now be built as loadable modules, in particular for Broadcom and Samsung platforms. - Lots of improvements to the TI sysc driver for better suspend/resume support" Finally, there are lots of minor cleanups and new device IDs for amlogic, renesas, tegra, qualcomm, mediateka, samsung, imx, layerscape, allwinner, broadcom, and omap" * tag 'drivers-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (179 commits) optee: Fix spelling mistake "reclain" -> "reclaim" Revert "firmware: qcom: scm: Add support for MC boot address API" qcom: spm: allow compile-testing firmware: arm_ffa: Remove unused 'compat_version' variable soc: samsung: exynos-chipid: add exynosautov9 SoC support firmware: qcom: scm: Don't break compile test on non-ARM platforms soc: qcom: smp2p: Add of_node_put() before goto soc: qcom: apr: Add of_node_put() before return soc: qcom: qcom_stats: Fix client votes offset soc: qcom: rpmhpd: fix sm8350_mxc's peer domain dt-bindings: arm: cpus: Document qcom,msm8916-smp enable-method ARM: qcom: Add qcom,msm8916-smp enable-method identical to MSM8226 firmware: qcom: scm: Add support for MC boot address API soc: qcom: spm: Add 8916 SPM register data dt-bindings: soc: qcom: spm: Document qcom,msm8916-saw2-v3.0-cpu soc: qcom: socinfo: Add PM8150C and SMB2351 models firmware: qcom_scm: Fix error retval in __qcom_scm_is_call_available() soc: aspeed: Add UART routing support soc: fsl: dpio: rename the enqueue descriptor variable soc: fsl: dpio: use an explicit NULL instead of 0 ...
2021-10-13Merge tag 'qcom-drivers-for-5.16' of ↵Arnd Bergmann
git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/drivers Qualcomm driver updates for v5.16 This drops the use of power-domains for exposing the load_state from the QMP driver to clients, to avoid issues related to system suspend. SMP2P becomes wakeup capable, to allow dying remoteprocs to wake up Linux from suspend to perform recovery. It adds RPM power-domain support for SM6350 and MSM8953 and base RPM support for MSM8953 and QCM2290. It adds support for MSM8996, SDM630 and SDM660 in the SPM driver, which will enable the introduction of proper voltage scaling of the CPU subsystem. Support for releasing secondary CPUs on MSM8226 is introduced. The Asynchronous Packet Router (APR) driver is extended to support the new Generic Packet Router (GPR) variant, which is used to communicate with the firmware in the new AudioReach audio driver. Lastly it transitions a number of drivers to safer string functions, as well as switching things to use devm_platform_ioremap_resource(). * tag 'qcom-drivers-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: (40 commits) soc: qcom: apr: Add GPR support soc: dt-bindings: qcom: add gpr bindings soc: qcom: apr: make code more reuseable soc: dt-bindings: qcom: apr: deprecate qcom,apr-domain property soc: dt-bindings: qcom: apr: convert to yaml dt-bindings: soc: qcom: aoss: Delete unused power-domain definitions dt-bindings: msm/dp: Remove aoss-qmp header soc: qcom: aoss: Drop power domain support dt-bindings: soc: qcom: aoss: Drop the load state power-domain soc: qcom: smp2p: Add wakeup capability to SMP2P IRQ dt-bindings: power: rpmpd: Add SM6350 to rpmpd binding dt-bindings: soc: qcom: aoss: Add SM6350 compatible soc: qcom: llcc: Disable MMUHWT retention soc: qcom: smd-rpm: Add QCM2290 compatible dt-bindings: soc: qcom: smd-rpm: Add QCM2290 compatible firmware: qcom_scm: Add compatible for MSM8953 SoC dt-bindings: firmware: qcom-scm: Document msm8953 bindings soc: qcom: pdr: Prefer strscpy over strcpy soc: qcom: rpmh-rsc: Make use of the helper function devm_platform_ioremap_resource_byname() soc: qcom: gsbi: Make use of the helper function devm_platform_ioremap_resource() ... Link: https://lore.kernel.org/r/20211012173442.1017010-1-bjorn.andersson@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-10-05cpuidle: tegra: Check whether PMC is readyDmitry Osipenko
Check whether PMC is ready before proceeding with the cpuidle registration. This fixes racing with the PMC driver probe order, which results in a disabled deepest CC6 idling state if cpuidle driver is probed before the PMC. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-10-05cpuidle: tegra: Enable compile testingDmitry Osipenko
Enable compile testing of tegra-cpuidle driver. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
2021-10-01cpuidle: Fix kobject memory leaks in error pathsAnel Orazgaliyeva
Commit c343bf1ba5ef ("cpuidle: Fix three reference count leaks") fixes the cleanup of kobjects; however, it removes kfree() calls altogether, leading to memory leaks. Fix those and also defer the initialization of dev->kobj_dev until after the error check, so that we do not end up with a dangling pointer. Fixes: c343bf1ba5ef ("cpuidle: Fix three reference count leaks") Signed-off-by: Anel Orazgaliyeva <anelkz@amazon.de> Suggested-by: Aman Priyadarshi <apeureka@amazon.de> [ rjw: Subject edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-09-21cpuidle: qcom_spm: Detach state machine from main SPM handlingAngeloGioacchino Del Regno
In commit a871be6b8eee ("cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver") the SPM driver has been converted to a generic CPUidle driver: that was mainly made to simplify the driver and that was a great accomplishment; Though, at that time, this driver was only applicable to ARM 32-bit SoCs, lacking logic about the handling of newer generation SAW. In preparation for the enablement of SPM features on AArch64/ARM64, split the cpuidle-qcom-spm driver in two: the CPUIdle related state machine (currently used only on ARM SoCs) stays there, while the SPM communication handling lands back in soc/qcom/spm.c and also making sure to not discard the simplifications that were introduced in the aforementioned commit. Since now the "two drivers" are split, the SCM dependency in the main SPM handling is gone and for this reason it was also possible to move the SPM initialization early: this will also make sure that whenever the SAW CPUIdle driver is getting initialized, the SPM driver will be ready to do the job. Please note that the anticipation of the SPM initialization was also done to optimize the boot times on platforms that have their CPU/L2 idle states managed by other means (such as PSCI), while needing SAW initialization for other purposes, like AVS control. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org> Reviewed-by: Stephan Gerhold <stephan@gerhold.net> Tested-by: Stephan Gerhold <stephan@gerhold.net> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20210729155609.608159-2-angelogioacchino.delregno@somainline.org
2021-09-07Merge tag 'mfd-next-5.15' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "Core Frameworks: - Add support for registering devices via MFD cells to Simple MFD (I2C) New Drivers: - Add support for Renesas Synchronization Management Unit (SMU) New Device Support: - Add support for N5010 to Intel M10 BMC - Add support for Cannon Lake to Intel LPSS ACPI - Add support for Samsung SSG{1,2} to ST-Ericsson's U8500 family - Add support for TQMx110EB and TQMxE40x to TQ-Systems PLD TQMx86 New Functionality: - Add support for GPIO to Intel LPC ICH - Add support for Reset to Texas Instruments TPS65086 Fix-ups: - Trivial, sorting, whitespace, renaming, etc; mt6360-core, db8500-prcmu-regs, tqmx86 - Device Tree fiddling; syscon, axp20x, qcom,pm8008, ti,tps65086, brcm,cru - Use proper APIs for IRQ map resolution; ab8500-core, stmpe, tc3589x, wm8994-irq - Pass 'supplied-from' property through axp288_fuel_gauge via swnode - Remove unused file entry; MAINTAINERS - Make interrupt line optional; tps65086 - Rename db8500-cpuidle driver symbol; db8500-prcmu - Remove support for unused hardware; tqmx86 - Provide a standard LPC clock frequency for unknown boards; tqmx86 - Remove unused code; ti_am335x_tscadc - Use of_iomap() instead of ioremap(); syscon Bug Fixes: - Clear GPIO IRQ resource flags when no IRQ is set; tqmx86 - Fix incorrect/misleading frequencies; db8500-prcmu - Mitigate namespace clash with other GPIOBASE users" * tag 'mfd-next-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (31 commits) mfd: lpc_sch: Rename GPIOBASE to prevent build error mfd: syscon: Use of_iomap() instead of ioremap() dt-bindings: mfd: Add Broadcom CRU mfd: ti_am335x_tscadc: Delete superfluous error message mfd: tqmx86: Assume 24MHz LPC clock for unknown boards mfd: tqmx86: Add support for TQ-Systems DMI IDs mfd: tqmx86: Add support for TQMx110EB and TQMxE40x mfd: tqmx86: Fix typo in "platform" mfd: tqmx86: Remove incorrect TQMx90UC board ID mfd: tqmx86: Clear GPIO IRQ resource when no IRQ is set mfd: simple-mfd-i2c: Add support for registering devices via MFD cells mfd/cpuidle: ux500: Rename driver symbol mfd: tps65086: Add cell entry for reset driver mfd: tps65086: Make interrupt line optional dt-bindings: mfd: Convert tps65086.txt to YAML MAINTAINERS: Adjust ARM/NOMADIK/Ux500 ARCHITECTURES to file renaming mfd: db8500-prcmu: Handle missing FW variant mfd: db8500-prcmu: Rename register header mfd: axp20x: Add supplied-from property to axp288_fuel_gauge cell mfd: Don't use irq_create_mapping() to resolve a mapping ...
2021-09-03Merge tag 'powerpc-5.15-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Convert pseries & powernv to use MSI IRQ domains. - Rework the pseries CPU numbering so that CPUs that are removed, and later re-added, are given a CPU number on the same node as previously, when possible. - Add support for a new more flexible device-tree format for specifying NUMA distances. - Convert powerpc to GENERIC_PTDUMP. - Retire sbc8548 and sbc8641d board support. - Various other small features and fixes. Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard, Cédric Le Goater, Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas, Fangrui Song, Finn Thain, Gautham R. Shenoy, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo Bras, Lukas Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan Chancellor, Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R. Sampat, Randy Dunlap, Sebastian Andrzej Siewior, Srikar Dronamraju, Wan Jiabing, Xiongwei Song, and Zheng Yongjun. * tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (154 commits) powerpc/bug: Cast to unsigned long before passing to inline asm powerpc/ptdump: Fix generic ptdump for 64-bit KVM: PPC: Fix clearing never mapped TCEs in realmode powerpc/pseries/iommu: Rename "direct window" to "dma window" powerpc/pseries/iommu: Make use of DDW for indirect mapping powerpc/pseries/iommu: Find existing DDW with given property name powerpc/pseries/iommu: Update remove_dma_window() to accept property name powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw() powerpc/pseries/iommu: Allow DDW windows starting at 0x00 powerpc/pseries/iommu: Add ddw_list_new_entry() helper powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper powerpc/kernel/iommu: Add new iommu_table_in_use() helper powerpc/pseries/iommu: Replace hard-coded page shift powerpc/numa: Update cpu_cpu_map on CPU online/offline powerpc/numa: Print debug statements only when required powerpc/numa: convert printk to pr_xxx powerpc/numa: Drop dbg in favour of pr_debug powerpc/smp: Enable CACHE domain for shared processor powerpc/smp: Update cpu_core_map on all PowerPc systems ...
2021-08-16mfd/cpuidle: ux500: Rename driver symbolLinus Walleij
The PRCMU driver defines this as a DT node but there are no bindings for it and it needs no data from the device tree. Just spawn the device directly in the same way as the watchdog. Name it "db8500-cpuidle" since there are no ambitions to support any more SoCs than this one. This rids this annoying boot message: [ 0.032610] cpuidle-dbx500: Failed to locate of_node [id: 0] However I think the device still spawns and work just fine, despite not finding a device tree node. Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
2021-08-04cpuidle: pseries: Mark pseries_idle_proble() as __initNathan Chancellor
After commit 7cbd631d4dec ("cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwards"), pseries_idle_probe() is no longer inlined when compiling with clang, which causes a modpost warning: WARNING: modpost: vmlinux.o(.text+0xc86a54): Section mismatch in reference from the function pseries_idle_probe() to the function .init.text:fixup_cede0_latency() The function pseries_idle_probe() references the function __init fixup_cede0_latency(). This is often because pseries_idle_probe lacks a __init annotation or the annotation of fixup_cede0_latency is wrong. pseries_idle_probe() is a non-init function, which calls fixup_cede0_latency(), which is an init function, explaining the mismatch. pseries_idle_probe() is only called from pseries_processor_idle_init(), which is an init function, so mark pseries_idle_probe() as __init so there is no more warning. Fixes: 054e44ba99ae ("cpuidle: pseries: Add function to parse extended CEDE records") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210803211547.1093820-1-nathan@kernel.org
2021-08-03cpuidle: teo: Rename two local variables in teo_select()Rafael J. Wysocki
Rename two local variables in teo_select() so that their names better reflect their purpose. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-03cpuidle: teo: Fix alternative idle state lookupRafael J. Wysocki
There are three mistakes in the loop in teo_select() that is looking for an alternative candidate idle state. First, it should walk all of the idle states shallower than the current candidate one, including all of the disabled ones, but it terminates after the first enabled idle state. Second, it should not terminate its last step if idle state 0 is disabled (which is related to the first issue). Finally, it may return the current alternative candidate idle state prematurely if the time span criterion is not met by the idle state under consideration at the moment. To address the issues mentioned above, make the loop in question walk all of the idle states shallower than the current candidate idle state all the way down to idle state 0 and rearrange the checks in it. Fixes: 77577558f25d ("cpuidle: teo: Rework most recent idle duration values treatment") Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-03cpuidle: pseries: Do not cap the CEDE0 latency in fixup_cede0_latency()Gautham R. Shenoy
Currently in fixup_cede0_latency() code, we perform the fixup the CEDE(0) exit latency value only if minimum advertized extended CEDE latency values are less than 10us. This was done so as to not break the expected behaviour on POWER8 platforms where the advertised latency was higher than the default 10us, which would delay the SMT folding on the core. However, after the earlier patch "cpuidle/pseries: Fixup CEDE0 latency only for POWER10 onwards", we can be sure that the fixup of CEDE0 latency is going to happen only from POWER10 onwards. Hence unconditionally use the minimum exit latency provided by the platform. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1626676399-15975-3-git-send-email-ego@linux.vnet.ibm.com
2021-08-03cpuidle: pseries: Fixup CEDE0 latency only for POWER10 onwardsGautham R. Shenoy
Commit d947fb4c965c ("cpuidle: pseries: Fixup exit latency for CEDE(0)") sets the exit latency of CEDE(0) based on the latency values of the Extended CEDE states advertised by the platform On POWER9 LPARs, the firmwares advertise a very low value of 2us for CEDE1 exit latency on a Dedicated LPAR. The latency advertized by the PHYP hypervisor corresponds to the latency required to wakeup from the underlying hardware idle state. However the wakeup latency from the LPAR perspective should include 1. The time taken to transition the CPU from the Hypervisor into the LPAR post wakeup from platform idle state 2. Time taken to send the IPI from the source CPU (waker) to the idle target CPU (wakee). 1. can be measured via timer idle test, where we queue a timer, say for 1ms, and enter the CEDE state. When the timer fires, in the timer handler we compute how much extra timer over the expected 1ms have we consumed. On a a POWER9 LPAR the numbers are CEDE latency measured using a timer (numbers in ns) N Min Median Avg 90%ile 99%ile Max Stddev 400 2601 5677 5668.74 5917 6413 9299 455.01 1. and 2. combined can be determined by an IPI latency test where we send an IPI to an idle CPU and in the handler compute the time difference between when the IPI was sent and when the handler ran. We see the following numbers on POWER9 LPAR. CEDE latency measured using an IPI (numbers in ns) N Min Median Avg 90%ile 99%ile Max Stddev 400 711 7564 7369.43 8559 9514 9698 1200.01 Suppose, we consider the 99th percentile latency value measured using the IPI to be the wakeup latency, the value would be 9.5us This is in the ballpark of the default value of 10us. Hence, use the exit latency of CEDE(0) based on the latency values advertized by platform only from POWER10 onwards. The values advertized on POWER10 platforms is more realistic and informed by the latency measurements. For earlier platforms stick to the default value of 10us. The fix was suggested by Michael Ellerman. Fixes: d947fb4c965c ("cpuidle: pseries: Fixup exit latency for CEDE(0)") Reported-by: Enrico Joedecke <joedecke@de.ibm.com> Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1626676399-15975-2-git-send-email-ego@linux.vnet.ibm.com
2021-06-30Merge tag 'cpuidle-v5.14-rc1' of ↵Rafael J. Wysocki
https://git.linaro.org/people/daniel.lezcano/linux Pull ARM cpuidle updates for v5.14 from Daniel Lezcano: "- Add support for Qcom MSM8226 (Bartosz Dudziak)" * tag 'cpuidle-v5.14-rc1' of https://git.linaro.org/people/daniel.lezcano/linux: cpuidle: qcom: Add SPM register data for MSM8226 dt-bindings: arm: msm: Add SAW2 for MSM8226
2021-06-29Merge tag 'pm-5.14-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add hybrid processors support to the intel_pstate driver and make it work with more processor models when HWP is disabled, make the intel_idle driver use special C6 idle state paremeters when package C-states are disabled, add cooling support to the tegra30 devfreq driver, rework the TEO (timer events oriented) cpuidle governor, extend the OPP (operating performance points) framework to use the required-opps DT property in more cases, fix some issues and clean up a number of assorted pieces of code. Specifics: - Make intel_pstate support hybrid processors using abstract performance units in the HWP interface (Rafael Wysocki). - Add Icelake servers and Cometlake support in no-HWP mode to intel_pstate (Giovanni Gherdovich). - Make cpufreq_online() error path be consistent with the CPU device removal path in cpufreq (Rafael Wysocki). - Clean up 3 cpufreq drivers and the statistics code (Hailong Liu, Randy Dunlap, Shaokun Zhang). - Make intel_idle use special idle state parameters for C6 when package C-states are disabled (Chen Yu). - Rework the TEO (timer events oriented) cpuidle governor to address some theoretical shortcomings in it (Rafael Wysocki). - Drop unneeded semicolon from the TEO governor (Wan Jiabing). - Modify the runtime PM framework to accept unassigned suspend and resume callback pointers (Ulf Hansson). - Improve pm_runtime_get_sync() documentation (Krzysztof Kozlowski). - Improve device performance states support in the generic power domains (genpd) framework (Ulf Hansson). - Fix some documentation issues in genpd (Yang Yingliang). - Make the operating performance points (OPP) framework use the required-opps DT property in use cases that are not related to genpd (Hsin-Yi Wang). - Make lazy_link_required_opp_table() use list_del_init instead of list_del/INIT_LIST_HEAD (Yang Yingliang). - Simplify wake IRQs handling in the core system-wide sleep support code and clean up some coding style inconsistencies in it (Tian Tao, Zhen Lei). - Add cooling support to the tegra30 devfreq driver and improve its DT bindings (Dmitry Osipenko). - Fix some assorted issues in the devfreq core and drivers (Chanwoo Choi, Dong Aisheng, YueHaibing)" * tag 'pm-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (39 commits) PM / devfreq: passive: Fix get_target_freq when not using required-opp cpufreq: Make cpufreq_online() call driver->offline() on errors opp: Allow required-opps to be used for non genpd use cases cpuidle: teo: remove unneeded semicolon in teo_select() dt-bindings: devfreq: tegra30-actmon: Add cooling-cells dt-bindings: devfreq: tegra30-actmon: Convert to schema PM / devfreq: userspace: Use DEVICE_ATTR_RW macro PM: runtime: Clarify documentation when callbacks are unassigned PM: runtime: Allow unassigned ->runtime_suspend|resume callbacks PM: runtime: Improve path in rpm_idle() when no callback PM: hibernate: remove leading spaces before tabs PM: sleep: remove trailing spaces and tabs PM: domains: Drop/restore performance state votes for devices at runtime PM PM: domains: Return early if perf state is already set for the device PM: domains: Split code in dev_pm_genpd_set_performance_state() cpuidle: teo: Use kerneldoc documentation in admin-guide cpuidle: teo: Rework most recent idle duration values treatment cpuidle: teo: Change the main idle state selection logic cpuidle: teo: Cosmetic modification of teo_select() cpuidle: teo: Cosmetic modifications of teo_update() ...
2021-06-17cpuidle: teo: remove unneeded semicolon in teo_select()Wan Jiabing
Fix following coccicheck warning: drivers/cpuidle/governors/teo.c:315:10-11: Unneeded semicolon Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-16cpuidle: qcom: Add SPM register data for MSM8226Bartosz Dudziak
Add MSM8226 register data to SPM AVS Wrapper 2 (SAW2) power controller driver. Reviewed-by: Stephan Gerhold <stephan@gerhold.net> Signed-off-by: Bartosz Dudziak <bartosz.dudziak@snejp.pl> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210612205335.9730-3-bartosz.dudziak@snejp.pl
2021-06-11cpuidle: teo: Use kerneldoc documentation in admin-guideRafael J. Wysocki
There are two descriptions of the TEO (Timer Events Oriented) cpuidle governor in the kernel source tree, one in the C file containing its code and one in cpuidle.rst which is part of admin-guide. Instead of trying to keep them both in sync and in order to reduce text duplication, include the governor description from the C file directly into cpuidle.rst. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-11cpuidle: teo: Rework most recent idle duration values treatmentRafael J. Wysocki
The TEO (Timer Events Oriented) cpuidle governor uses several most recent idle duration values for a given CPU to refine the idle state selection in case the previous long-term trends have not been followed recently and a new trend appears to be forming. That is done by computing the average of the most recent idle duration values falling below the time till the next timer event ("sleep length"), provided that they are the majority of the most recent idle duration values taken into account, and using it as the new expected idle duration value. However, idle state selection based on that value may not be optimal, because the average does not really indicate which of the idle states with target residencies less than or equal to it is likely to be the best fit. Thus, instead of computing the average, make the governor carry out computations based on the distribution of the most recent idle duration values among the bins corresponding to different idle states. Namely, if the majority of the most recent idle duration values taken into consideration are less than the current sleep length (which means that the CPU is likely to wake up early), find the idle state closest to the "candidate" one "matching" the sleep length whose target residency is less than or equal to the majority of the most recent idle duration values that have fallen below the current sleep length (which means that it is likely to be "shallow enough" this time). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-11cpuidle: teo: Change the main idle state selection logicRafael J. Wysocki
Two aspects of the current main idle state selection logic in the TEO (Timer Events Oriented) cpuidle governor are quite questionable. First of all, the "hits" and "misses" metrics used by it are only updated for a given idle state if the time till the next timer event ("sleep length") is between the target residency of that state and the target residency of the next one. Consequently, they are likely to become stale if the sleep length tends to fall outside that interval which increases the likelihood of subomtimal idle state selection. Second, the decision on whether or not to select the idle state "matching" the sleep length is based on the metrics collected for that state alone, whereas in principle the metrics collected for the other idle states should be taken into consideration when that decision is made. For example, if the measured idle duration is less than the target residency of the idle state "matching" the sleep length, then it is also less than the target residency of any deeper idle state and that should be taken into account when considering whether or not to select any of those states, but currently it is not. In order to address the above shortcomings, modify the main idle state selection logic in the TEO governor to take the metrics collected for all of the idle states into account when deciding whether or not to select the one "matching" the sleep length. Moreover, drop the "misses" metric that becomes redundant after the above change and rename the "early_hits" metric to "intercepts" so that its role is better reflected by its name (the idea being that if a CPU wakes up earlier than indicated by the sleep length, then it must be a result of a non-timer interrupt that "intercepts" the CPU). Also rename the states[] array in struct struct teo_cpu to state_bins[] to avoid confusing it with the states[] array in struct cpuidle_driver and update the documentation to match the new code (and make it more comprehensive while at it). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-11cpuidle: teo: Cosmetic modification of teo_select()Rafael J. Wysocki
Initialize local variables in teo_select() where they are declared. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-06-11cpuidle: teo: Cosmetic modifications of teo_update()Rafael J. Wysocki
Rename a local variable in teo_update() so that its purpose is better reflected by its name and use one more local variable in the loop over the CPU idle states in that function to make the code somewhat easier to read. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-05-12sched: Make nr_iowait_cpu() return 32-bit valueAlexey Dobriyan
Runqueue ->nr_iowait counters are 32-bit anyway. Propagate 32-bitness into other code, but don't try too hard. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210422200228.1423391-3-adobriyan@gmail.com
2021-04-08Merge back earlier cpuidle updates for v5.13.Rafael J. Wysocki
2021-04-08cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configurationHe Ying
When CONFIG_ARM_QCOM_SPM_CPUIDLE is y and CONFIG_MMU is not set, compiling errors are encountered as follows: drivers/cpuidle/cpuidle-qcom-spm.o: In function `spm_dev_probe': cpuidle-qcom-spm.c:(.text+0x140): undefined reference to `cpu_resume_arm' cpuidle-qcom-spm.c:(.text+0x148): undefined reference to `cpu_resume_arm' Note that cpu_resume_arm is defined when MMU is set. So, add dependency on MMU in ARM_QCOM_SPM_CPUIDLE configuration. Fixes: a871be6b8eee ("cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: He Ying <heying24@huawei.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210406123328.92904-1-heying24@huawei.com
2021-04-08cpuidle: tegra: Remove do_idle firmware callDmitry Osipenko
The do_idle firmware call is unused by all Tegra SoCs, hence remove it in order to keep driver's code clean. Tested-by: Anton Bambura <jenneron@protonmail.com> # TF701 T114 Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30 Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30 Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210302095405.28453-2-digetx@gmail.com
2021-04-08cpuidle: tegra: Fix C7 idling state on Tegra114Dmitry Osipenko
Trusted Foundation firmware doesn't implement the do_idle call and in this case suspending should fall back to the common suspend path. In order to fix this issue we will unconditionally set the NOFLUSH_L2 mode via firmware call, which is a NO-OP on Tegra30/124, and then proceed to the C7 idling, like it was done by the older Tegra114 cpuidle driver. Fixes: 14e086baca50 ("cpuidle: tegra: Squash Tegra114 driver into the common driver") Cc: stable@vger.kernel.org # 5.7+ Reported-by: Anton Bambura <jenneron@protonmail.com> # TF701 T114 Tested-by: Anton Bambura <jenneron@protonmail.com> # TF701 T114 Tested-by: Matt Merhar <mattmerhar@protonmail.com> # Ouya T30 Tested-by: Peter Geis <pgwipeout@gmail.com> # Ouya T30 Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20210302095405.28453-1-digetx@gmail.com
2021-04-07cpuidle: menu: Take negative "sleep length" values into accountRafael J. Wysocki
Make the menu governor check the tick_nohz_get_next_hrtimer() return value so as to avoid dealing with negative "sleep length" values and make it use that value directly when the tick is stopped. While at it, rename local variable delta_next in menu_select() to delta_tick which better reflects its purpose. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-07cpuidle: teo: Take negative "sleep length" values into accountRafael J. Wysocki
Modify the TEO governor to take possible negative return values of tick_nohz_get_next_hrtimer() into account by changing the data type of some variables used by it to s64 which allows it to carry out computations without potentially problematic data type conversions into u64. Also change the computations in teo_select() so that the negative values themselves are handled in a natural way to avoid adding extra negative value checks to that function. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-07cpuidle: teo: Adjust handling of very short idle timesRafael J. Wysocki
If the time till the next timer event is shorter than the target residency of the first idle state (state 0), the TEO governor does not update its metrics for any idle states, but arguably it should record a "hit" for idle state 0 in that case, so modify it to do that. Accordingly, also make it record an "early hit" for idle state 0 if the measured idle duration is less than its target residency, which allows one branch more to be dropped from teo_update(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-04-07cpuidle: Use s64 as exit_latency_ns and target_residency_ns data typeRafael J. Wysocki
Subsequent changes will cause the exit_latency_ns and target_residency_ns fields in struct cpuidle_state to be used in computations in which data type conversions to u64 may turn a negative number close to zero into a verly large positive number leading to incorrect results. In preparation for that, change the data type of the fields mentioned above to s64, but ensure that they will not be negative themselves. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-12-16Merge tag 'arm-soc-drivers-5.11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC driver updates from Arnd Bergmann: "There are a couple of subsystems maintained by other people that merge their drivers through the SoC tree, those changes include: - The SCMI firmware framework gains support for sensor notifications and for controlling voltage domains. - A large update for the Tegra memory controller driver, integrating it better with the interconnect framework - The memory controller subsystem gains support for Mediatek MT8192 - The reset controller framework gains support for sharing pulsed resets For Soc specific drivers in drivers/soc, the main changes are - The Allwinner/sunxi MBUS gets a rework for the way it handles dma_map_ops and offsets between physical and dma address spaces. - An errata fix plus some cleanups for Freescale Layerscape SoCs - A cleanup for renesas drivers regarding MMIO accesses. - New SoC specific drivers for Mediatek MT8192 and MT8183 power domains - New SoC specific drivers for Aspeed AST2600 LPC bus control and SoC identification. - Core Power Domain support for Qualcomm MSM8916, MSM8939, SDM660 and SDX55. - A rework of the TI AM33xx 'genpd' power domain support to use information from DT instead of platform data - Support for TI AM64x SoCs - Allow building some Amlogic drivers as modules instead of built-in Finally, there are numerous cleanups and smaller bug fixes for Mediatek, Tegra, Samsung, Qualcomm, TI OMAP, Amlogic, Rockchips, Renesas, and Xilinx SoCs" * tag 'arm-soc-drivers-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (222 commits) soc: mediatek: mmsys: Specify HAS_IOMEM dependency for MTK_MMSYS firmware: xilinx: Properly align function parameter firmware: xilinx: Add a blank line after function declaration firmware: xilinx: Remove additional newline firmware: xilinx: Fix kernel-doc warnings firmware: xlnx-zynqmp: fix compilation warning soc: xilinx: vcu: add missing register NUM_CORE soc: xilinx: vcu: use vcu-settings syscon registers dt-bindings: soc: xlnx: extract xlnx, vcu-settings to separate binding soc: xilinx: vcu: drop useless success message clk: samsung: mark PM functions as __maybe_unused soc: samsung: exynos-chipid: initialize later - with arch_initcall soc: samsung: exynos-chipid: order list of SoCs by name memory: jz4780_nemc: Fix potential NULL dereference in jz4780_nemc_probe() memory: ti-emif-sram: only build for ARMv7 memory: tegra30: Support interconnect framework memory: tegra20: Support hardware versioning and clean up OPP table initialization dt-bindings: memory: tegra20-emc: Document opp-supported-hw property soc: rockchip: io-domain: Fix error return code in rockchip_iodomain_probe() reset-controller: ti: force the write operation when assert or deassert ...
2020-12-15Merge tag 'pm-5.11-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These update cpufreq (core and drivers), cpuidle (polling state implementation and the PSCI driver), the OPP (operating performance points) framework, devfreq (core and drivers), the power capping RAPL (Running Average Power Limit) driver, the Energy Model support, the generic power domains (genpd) framework, the ACPI device power management, the core system-wide suspend code and power management utilities. Specifics: - Use local_clock() instead of jiffies in the cpufreq statistics to improve accuracy (Viresh Kumar). - Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq drivers (Viresh Kumar). - Clean up the cpufreq core, the intel_pstate driver and the schedutil cpufreq governor (Rafael Wysocki). - Fix up error code paths in the sti-cpufreq and mediatek cpufreq drivers (Yangtao Li, Qinglang Miao). - Fix cpufreq_online() to return error codes instead of success (0) in all cases when it fails (Wang ShaoBo). - Add mt8167 support to the mediatek cpufreq driver and blacklist mt8516 in the cpufreq-dt-platdev driver (Fabien Parent). - Modify the tegra194 cpufreq driver to always return values from the frequency table as the current frequency and clean up that driver (Sumit Gupta, Jon Hunter). - Modify the arm_scmi cpufreq driver to allow it to discover the power scale present in the performance protocol and provide this information to the Energy Model (Lukasz Luba). - Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali Rohár). - Clean up the CPPC cpufreq driver (Ionela Voinescu). - Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd Bergmann). - Rework the poling interval selection for the polling state in cpuidle (Mel Gorman). - Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver (Ulf Hansson). - Modify the OPP framework to support empty (node-less) OPP tables in DT for passing dependency information (Nicola Mazzucato). - Fix potential lockdep issue in the OPP core and clean up the OPP core (Viresh Kumar). - Modify dev_pm_opp_put_regulators() to accept a NULL argument and update its users accordingly (Viresh Kumar). - Add frequency changes tracepoint to devfreq (Matthias Kaehlcke). - Add support for governor feature flags to devfreq, make devfreq sysfs file permissions depend on the governor and clean up the devfreq core (Chanwoo Choi). - Clean up the tegra20 devfreq driver and deprecate it to allow another driver based on EMC_STAT to be used instead of it (Dmitry Osipenko). - Add interconnect support to the tegra30 devfreq driver, allow it to take the interconnect and OPP information from DT and clean it up (Dmitry Osipenko). - Add interconnect support to the exynos-bus devfreq driver along with interconnect properties documentation (Sylwester Nawrocki). - Add suport for AMD Fam17h and Fam19h processors to the RAPL power capping driver (Victor Ding, Kim Phillips). - Fix handling of overly long constraint names in the powercap framework (Lukasz Luba). - Fix the wakeup configuration handling for bridges in the ACPI device power management core (Rafael Wysocki). - Add support for using an abstract scale for power units in the Energy Model (EM) and document it (Lukasz Luba). - Add em_cpu_energy() micro-optimization to the EM (Pavankumar Kondeti). - Modify the generic power domains (genpd) framwework to support suspend-to-idle (Ulf Hansson). - Fix creation of debugfs nodes in genpd (Thierry Strudel). - Clean up genpd (Lina Iyer). - Clean up the core system-wide suspend code and make it print driver flags for devices with debug enabled (Alex Shi, Patrice Chotard, Chen Yu). - Modify the ACPI system reboot code to make it prepare for system power off to avoid confusing the platform firmware (Kai-Heng Feng). - Update the pm-graph (multiple changes, mostly usability-related) and cpupower (online and offline CPU information support) PM utilities (Todd Brandt, Brahadambal Srinivasan)" * tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits) cpufreq: Fix cpufreq_online() return value on errors cpufreq: Fix up several kerneldoc comments cpufreq: stats: Use local_clock() instead of jiffies cpufreq: schedutil: Simplify sugov_update_next_freq() cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate() PM: domains: create debugfs nodes when adding power domains opp: of: Allow empty opp-table with opp-shared dt-bindings: opp: Allow empty OPP tables media: venus: dev_pm_opp_put_*() accepts NULL argument drm/panfrost: dev_pm_opp_put_*() accepts NULL argument drm/lima: dev_pm_opp_put_*() accepts NULL argument PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table opp: Don't create an OPP table from dev_pm_opp_get_opp_table() cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table opp: Reduce the size of critical section in _opp_kref_release() PM / EM: Micro optimization in em_cpu_energy cpufreq: arm_scmi: Discover the power scale in performance protocol ...
2020-12-01cpuidle: Select polling interval based on a c-state with a longer target ↵Mel Gorman
residency It was noted that a few workloads that idle rapidly regressed when commit 36fcb4292473 ("cpuidle: use first valid target residency as poll time") was merged. The workloads in question were heavy communicators that idle rapidly and were impacted by the c-state exit latency as the active CPUs were not polling at the time of wakeup. As they were not particularly realistic workloads, it was not considered to be a major problem. Unfortunately, a bug was reported for a real workload in a production environment that relied on large numbers of threads operating in a worker pool pattern. These threads would idle for periods of time longer than the C1 target residency and so incurred the c-state exit latency penalty. The application is very sensitive to wakeup latency and indirectly relying on behaviour prior to commit on a37b969a61c1 ("cpuidle: poll_state: Add time limit to poll_idle()") to poll for long enough to avoid the exit latency cost. The target residency of C1 is typically very short. On some x86 machines, it can be as low as 2 microseconds. In poll_idle(), the clock is checked every POLL_IDLE_RELAX_COUNT interations of cpu_relax() and even one iteration of that loop can be over 1 microsecond so the polling interval is very close to the granularity of what poll_idle() can detect. Furthermore, a basic ping pong workload like perf bench pipe has a longer round-trip time than the 2 microseconds meaning that the CPU will almost certainly not be polling when the ping-pong completes. This patch selects a polling interval based on an enabled c-state that has an target residency longer than 10usec. If there is no enabled-cstate then polling will be up to a TICK_NSEC/16 similar to what it was up until kernel 4.20. Polling for a full tick is unlikely (rescheduling event) and is much longer than the existing target residencies for a deep c-state. As an example, consider a CPU with the following c-state information from an Intel CPU; residency exit_latency C1 2 2 C1E 20 10 C3 100 33 C6 400 133 The polling interval selected is 20usec. If booted with intel_idle.max_cstate=1 then the polling interval is 250usec as the deeper c-states were not available. On an AMD EPYC machine, the c-state information is more limited and looks like residency exit_latency C1 2 1 C2 800 400 The polling interval selected is 250usec. While C2 was considered, the polling interval was clamped by CPUIDLE_POLL_MAX. Note that it is not expected that polling will be a universal win. As well as potentially trading power for performance, the performance is not guaranteed if the extra polling prevented a turbo state being reached. Making it a tunable was considered but it's driver-specific, may be overridden by a governor and is not a guaranteed polling interval making it difficult to describe without knowledge of the implementation. tbench4 vanilla polling Hmean 1 497.89 ( 0.00%) 543.15 * 9.09%* Hmean 2 975.88 ( 0.00%) 1059.73 * 8.59%* Hmean 4 1953.97 ( 0.00%) 2081.37 * 6.52%* Hmean 8 3645.76 ( 0.00%) 4052.95 * 11.17%* Hmean 16 6882.21 ( 0.00%) 6995.93 * 1.65%* Hmean 32 10752.20 ( 0.00%) 10731.53 * -0.19%* Hmean 64 12875.08 ( 0.00%) 12478.13 * -3.08%* Hmean 128 21500.54 ( 0.00%) 21098.60 * -1.87%* Hmean 256 21253.70 ( 0.00%) 21027.18 * -1.07%* Hmean 320 20813.50 ( 0.00%) 20580.64 * -1.12%* Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-27Merge branch 'linus' into sched/core, to resolve semantic conflictIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-11-24smp: Cleanup smp_call_function*()Peter Zijlstra
Get rid of the __call_single_node union and cleanup the API a little to avoid external code relying on the structure layout as much. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
2020-11-23Merge back cpuidle changes for v5.11.Rafael J. Wysocki
2020-11-16cpuidle: tegra: Annotate tegra_pm_set_cpu_in_lp2() with RCU_NONIDLEDmitry Osipenko
Annotate tegra_pm_set[clear]_cpu_in_lp2() with RCU_NONIDLE in order to fix lockdep warning about suspicious RCU usage of a spinlock during late idling phase. WARNING: suspicious RCU usage ... include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage! ... (dump_stack) from (lock_acquire) (lock_acquire) from (_raw_spin_lock) (_raw_spin_lock) from (tegra_pm_set_cpu_in_lp2) (tegra_pm_set_cpu_in_lp2) from (tegra_cpuidle_enter) (tegra_cpuidle_enter) from (cpuidle_enter_state) (cpuidle_enter_state) from (cpuidle_enter_state_coupled) (cpuidle_enter_state_coupled) from (cpuidle_enter) (cpuidle_enter) from (do_idle) ... Tested-by: Peter Geis <pgwipeout@gmail.com> Reported-by: Peter Geis <pgwipeout@gmail.com> Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-11-10cpuidle: psci: Enable suspend-to-idle for PSCI OSI modeUlf Hansson
To select domain idlestates for cpuidle-psci when OSI mode has been enabled, the PM domains via genpd are being managed through runtime PM. This works fine for the regular idlepath, but it doesn't during system wide suspend. More precisely, the domain idlestates becomes temporarily disabled, which is because the PM core disables runtime PM for devices during system wide suspend. Later in the system suspend phase, genpd intends to deal with this from its ->suspend_noirq() callback, but this doesn't work as expected for a device corresponding to a CPU, because the domain idlestates needs to be selected on a per CPU basis (the PM core doesn't invoke the callbacks like that). To address this problem, let's enable the syscore flag for the corresponding CPU device that becomes successfully attached to its PM domain (applicable only in OSI mode). This informs the PM core to skip invoke the system wide suspend/resume callbacks for the device, thus also prevents genpd from screwing up its internal state of it. Moreover, to properly select a domain idlestate for the CPUs during suspend-to-idle, let's assign a specific ->enter_s2idle() callback for the corresponding domain idlestate (applicable only in OSI mode). From that callback, let's invoke dev_pm_genpd_suspend|resume(), as this allows a domain idlestate to be selected for the current CPU by genpd. Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>