diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-12-15 16:30:31 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-12-15 16:30:31 -0800 |
commit | b4ec805464a4a0299216a003278351d0b4806450 (patch) | |
tree | 1a864de7b10116bbf6cf16c9095a19921d7ed7eb /Documentation/power/energy-model.rst | |
parent | b109bc72295363fb746bc42bdd777f7a8abb177b (diff) | |
parent | b3fac817830306328d5195e7f4fb332277f3b146 (diff) |
Merge tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These update cpufreq (core and drivers), cpuidle (polling state
implementation and the PSCI driver), the OPP (operating performance
points) framework, devfreq (core and drivers), the power capping RAPL
(Running Average Power Limit) driver, the Energy Model support, the
generic power domains (genpd) framework, the ACPI device power
management, the core system-wide suspend code and power management
utilities.
Specifics:
- Use local_clock() instead of jiffies in the cpufreq statistics to
improve accuracy (Viresh Kumar).
- Fix up OPP usage in the cpufreq-dt and qcom-cpufreq-nvmem cpufreq
drivers (Viresh Kumar).
- Clean up the cpufreq core, the intel_pstate driver and the
schedutil cpufreq governor (Rafael Wysocki).
- Fix up error code paths in the sti-cpufreq and mediatek cpufreq
drivers (Yangtao Li, Qinglang Miao).
- Fix cpufreq_online() to return error codes instead of success (0)
in all cases when it fails (Wang ShaoBo).
- Add mt8167 support to the mediatek cpufreq driver and blacklist
mt8516 in the cpufreq-dt-platdev driver (Fabien Parent).
- Modify the tegra194 cpufreq driver to always return values from the
frequency table as the current frequency and clean up that driver
(Sumit Gupta, Jon Hunter).
- Modify the arm_scmi cpufreq driver to allow it to discover the
power scale present in the performance protocol and provide this
information to the Energy Model (Lukasz Luba).
- Add missing MODULE_DEVICE_TABLE to several cpufreq drivers (Pali
Rohár).
- Clean up the CPPC cpufreq driver (Ionela Voinescu).
- Fix NVMEM_IMX_OCOTP dependency in the imx cpufreq driver (Arnd
Bergmann).
- Rework the poling interval selection for the polling state in
cpuidle (Mel Gorman).
- Enable suspend-to-idle for PSCI OSI mode in the PSCI cpuidle driver
(Ulf Hansson).
- Modify the OPP framework to support empty (node-less) OPP tables in
DT for passing dependency information (Nicola Mazzucato).
- Fix potential lockdep issue in the OPP core and clean up the OPP
core (Viresh Kumar).
- Modify dev_pm_opp_put_regulators() to accept a NULL argument and
update its users accordingly (Viresh Kumar).
- Add frequency changes tracepoint to devfreq (Matthias Kaehlcke).
- Add support for governor feature flags to devfreq, make devfreq
sysfs file permissions depend on the governor and clean up the
devfreq core (Chanwoo Choi).
- Clean up the tegra20 devfreq driver and deprecate it to allow
another driver based on EMC_STAT to be used instead of it (Dmitry
Osipenko).
- Add interconnect support to the tegra30 devfreq driver, allow it to
take the interconnect and OPP information from DT and clean it up
(Dmitry Osipenko).
- Add interconnect support to the exynos-bus devfreq driver along
with interconnect properties documentation (Sylwester Nawrocki).
- Add suport for AMD Fam17h and Fam19h processors to the RAPL power
capping driver (Victor Ding, Kim Phillips).
- Fix handling of overly long constraint names in the powercap
framework (Lukasz Luba).
- Fix the wakeup configuration handling for bridges in the ACPI
device power management core (Rafael Wysocki).
- Add support for using an abstract scale for power units in the
Energy Model (EM) and document it (Lukasz Luba).
- Add em_cpu_energy() micro-optimization to the EM (Pavankumar
Kondeti).
- Modify the generic power domains (genpd) framwework to support
suspend-to-idle (Ulf Hansson).
- Fix creation of debugfs nodes in genpd (Thierry Strudel).
- Clean up genpd (Lina Iyer).
- Clean up the core system-wide suspend code and make it print driver
flags for devices with debug enabled (Alex Shi, Patrice Chotard,
Chen Yu).
- Modify the ACPI system reboot code to make it prepare for system
power off to avoid confusing the platform firmware (Kai-Heng Feng).
- Update the pm-graph (multiple changes, mostly usability-related)
and cpupower (online and offline CPU information support) PM
utilities (Todd Brandt, Brahadambal Srinivasan)"
* tag 'pm-5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (86 commits)
cpufreq: Fix cpufreq_online() return value on errors
cpufreq: Fix up several kerneldoc comments
cpufreq: stats: Use local_clock() instead of jiffies
cpufreq: schedutil: Simplify sugov_update_next_freq()
cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate()
PM: domains: create debugfs nodes when adding power domains
opp: of: Allow empty opp-table with opp-shared
dt-bindings: opp: Allow empty OPP tables
media: venus: dev_pm_opp_put_*() accepts NULL argument
drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
drm/lima: dev_pm_opp_put_*() accepts NULL argument
PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
opp: Reduce the size of critical section in _opp_kref_release()
PM / EM: Micro optimization in em_cpu_energy
cpufreq: arm_scmi: Discover the power scale in performance protocol
...
Diffstat (limited to 'Documentation/power/energy-model.rst')
-rw-r--r-- | Documentation/power/energy-model.rst | 30 |
1 files changed, 25 insertions, 5 deletions
diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst index a6fb986abe3c..60ac091d3b0d 100644 --- a/Documentation/power/energy-model.rst +++ b/Documentation/power/energy-model.rst @@ -20,6 +20,21 @@ possible source of information on its own, the EM framework intervenes as an abstraction layer which standardizes the format of power cost tables in the kernel, hence enabling to avoid redundant work. +The power values might be expressed in milli-Watts or in an 'abstract scale'. +Multiple subsystems might use the EM and it is up to the system integrator to +check that the requirements for the power value scale types are met. An example +can be found in the Energy-Aware Scheduler documentation +Documentation/scheduler/sched-energy.rst. For some subsystems like thermal or +powercap power values expressed in an 'abstract scale' might cause issues. +These subsystems are more interested in estimation of power used in the past, +thus the real milli-Watts might be needed. An example of these requirements can +be found in the Intelligent Power Allocation in +Documentation/driver-api/thermal/power_allocator.rst. +Kernel subsystems might implement automatic detection to check whether EM +registered devices have inconsistent scale (based on EM internal flag). +Important thing to keep in mind is that when the power values are expressed in +an 'abstract scale' deriving real energy in milli-Joules would not be possible. + The figure below depicts an example of drivers (Arm-specific here, but the approach is applicable to any architecture) providing power costs to the EM framework, and interested clients reading the data from it:: @@ -73,7 +88,7 @@ Drivers are expected to register performance domains into the EM framework by calling the following API:: int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, - struct em_data_callback *cb, cpumask_t *cpus); + struct em_data_callback *cb, cpumask_t *cpus, bool milliwatts); Drivers must provide a callback function returning <frequency, power> tuples for each performance state. The callback function provided by the driver is free @@ -81,6 +96,10 @@ to fetch data from any relevant location (DT, firmware, ...), and by any mean deemed necessary. Only for CPU devices, drivers must specify the CPUs of the performance domains using cpumask. For other devices than CPUs the last argument must be set to NULL. +The last argument 'milliwatts' is important to set with correct value. Kernel +subsystems which use EM might rely on this flag to check if all EM devices use +the same scale. If there are different scales, these subsystems might decide +to: return warning/error, stop working or panic. See Section 3. for an example of driver implementing this callback, and kernel/power/energy_model.c for further documentation on this API. @@ -156,7 +175,8 @@ EM framework:: 37 nr_opp = foo_get_nr_opp(policy); 38 39 /* And register the new performance domain */ - 40 em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus); - 41 - 42 return 0; - 43 } + 40 em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus, + 41 true); + 42 + 43 return 0; + 44 } |