summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2024-01-12thermal: netlink: Rework notify API for cooling devicesRafael J. Wysocki
In analogy with some previous thermal netlink API changes, redefine thermal_notify_cdev_state_update(), thermal_notify_cdev_add() and thermal_notify_cdev_delete() to take a const cdev pointer as their first argument and let them extract the requisite information from there by themselves. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-01-12thermal: core: Use kstrdup_const() during cooling device registrationChristophe JAILLET
Some *thermal_cooling_device_register() calls pass a string literal as the 'type' parameter, so kstrdup_const() can be used instead of kstrdup() to avoid a memory allocation in such cases. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-12thermal/debugfs: Add thermal debugfs information for mitigation episodesDaniel Lezcano
The mitigation episodes are recorded. A mitigation episode happens when the first trip point is crossed the way up and then the way down. During this episode other trip points can be crossed also and are accounted for this mitigation episode. The interesting information is the average temperature at the trip point, the undershot and the overshot. The standard deviation of the mitigated temperature will be added later. The thermal debugfs directory structure tries to stay consistent with the sysfs one but in a very simplified way: thermal/ `-- thermal_zones |-- 0 | `-- mitigations `-- 1 `-- mitigations The content of the mitigations file has the following format: ,-Mitigation at 349988258us, duration=130136ms | trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) | | 0 | passive | 65000 | 2000 | 130136 | 68227 | 62500 | 75625 | | 1 | passive | 75000 | 2000 | 104209 | 74857 | 71666 | 77500 | ,-Mitigation at 272451637us, duration=75000ms | trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) | | 0 | passive | 65000 | 2000 | 75000 | 68561 | 62500 | 75000 | | 1 | passive | 75000 | 2000 | 60714 | 74820 | 70555 | 77500 | ,-Mitigation at 238184119us, duration=27316ms | trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) | | 0 | passive | 65000 | 2000 | 27316 | 73377 | 62500 | 75000 | | 1 | passive | 75000 | 2000 | 19468 | 75284 | 69444 | 77500 | ,-Mitigation at 39863713us, duration=136196ms | trip | type | temp(°mC) | hyst(°mC) | duration | avg(°mC) | min(°mC) | max(°mC) | | 0 | passive | 65000 | 2000 | 136196 | 73922 | 62500 | 75000 | | 1 | passive | 75000 | 2000 | 91721 | 74386 | 69444 | 78125 | More information for a better understanding of the thermal behavior will be added after. The idea is to give detailed statistics information about the undershots and overshots, the temperature speed, etc... As all the information in a single file is too much, the idea would be to create a directory named with the mitigation timestamp where all data could be added. Please note this code is immune against trip ordering but not against a trip temperature change while a mitigation is happening. However, this situation should be extremely rare, perhaps not happening and we might question ourselves if something should be done in the core framework for other components first. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> [ rjw: White space fixups, rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-12thermal/debugfs: Add thermal cooling device debugfs informationDaniel Lezcano
The thermal framework does not have any debug information except a sysfs stat which is a bit controversial. This one allocates big chunks of memory for every cooling devices with a high number of states and could represent on some systems in production several megabytes of memory for just a portion of it. As the sysfs is limited to a page size, the output is not exploitable with large data array and gets truncated. The patch provides the same information than sysfs except the transitions are dynamically allocated, thus they won't show more events than the ones which actually occurred. There is no longer a size limitation and it opens the field for more debugging information where the debugfs is designed for, not sysfs. The thermal debugfs directory structure tries to stay consistent with the sysfs one but in a very simplified way: thermal/ -- cooling_devices |-- 0 | |-- clear | |-- time_in_state_ms | |-- total_trans | `-- trans_table |-- 1 | |-- clear | |-- time_in_state_ms | |-- total_trans | `-- trans_table |-- 2 | |-- clear | |-- time_in_state_ms | |-- total_trans | `-- trans_table |-- 3 | |-- clear | |-- time_in_state_ms | |-- total_trans | `-- trans_table `-- 4 |-- clear |-- time_in_state_ms |-- total_trans `-- trans_table The content of the files in the cooling devices directory is the same as the sysfs one except for the trans_table which has the following format: Transition Hits 1->0 246 0->1 246 2->1 632 1->2 632 3->2 98 2->3 98 Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> [ rjw: White space fixups, rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2024-01-09thermal: netlink: Pass thermal zone pointer to notify routinesRafael J. Wysocki
There are several rountines in the thermal netlink API that take a thermal zone ID or a thermal zone type as their arguments, but from their callers perspective it would be more convenient to pass a thermal zone pointer to them and let them extract the necessary data from the given thermal zone object by themselves. Modify the code accordingly. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-01-09thermal: netlink: Drop thermal_notify_tz_trip_add/delete()Rafael J. Wysocki
Because thermal_notify_tz_trip_add/delete() are never used, drop them entirely along with the related code. The addition or removal of trip points is not supported by the thermal core and is unlikely to be supported in the future, so it is also unlikely that these functions will ever be needed. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-01-09thermal: netlink: Pass pointers to thermal_notify_tz_trip_up/down()Rafael J. Wysocki
Instead of requiring the callers of thermal_notify_tz_trip_up/down() to provide specific values needed to populate struct param in them, make them extract those values from objects passed by the callers via const pointers. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-01-09thermal: netlink: Pass pointers to thermal_notify_tz_trip_change()Rafael J. Wysocki
Instead of requiring the caller of thermal_notify_tz_trip_change() to provide specific values needed to populate struct param in it, make it extract those values from objects passed to it by the caller via const pointers. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2024-01-04thermal: trip: Constify thermal zone argument of thermal_zone_trip_id()Rafael J. Wysocki
Because thermal_zone_trip_id() does not update the thermal zone object passed to it, its pointer argument representing the thermal zone can be const, so adjust its definition accordingly. No functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
2024-01-02Merge tag 'thermal-v6.8-rc1' of ↵Rafael J. Wysocki
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux into thermal Merge thermal control material for 6.8-rc1 from Daniel Lezcano: "- Converted Mediatek Thermal to the json-schema (Rafał Miłecki) - Fixed DT bindings issue on Loongson (Binbin Zhou) - Fixed returning NULL instead of -ENODEV on Loogsoo (Binbin Zhou) - Added the DT binding for the tsens on SM8650 platform (Neil Armstrong) - Added a reboot on critical option feature (Fabio Estevam) - Made usage of DEFINE_SIMPLE_DEV_PM_OPS on AmLogic (Uwe Kleine-König) - Added the D1/T113s THS controller support on Sun8i (Maxim Kiselev) - Fixed example in the DT binding for QCom SPMI (Johan Hovold) - Fixed compilation warning for the tmon utility (Florian Eckert) - Added interrupt based configuration on Exynos along with a set of related cleanups (Mateusz Majewski)" * tag 'thermal-v6.8-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (24 commits) thermal/drivers/exynos: Use set_trips ops thermal/drivers/exynos: Use BIT wherever possible thermal/drivers/exynos: Split initialization of TMU and the thermal zone thermal/drivers/exynos: Stop using the threshold mechanism on Exynos 4210 thermal/drivers/exynos: Simplify regulator (de)initialization thermal/drivers/exynos: Handle devm_regulator_get_optional return value correctly thermal/drivers/exynos: Wwitch from workqueue-driven interrupt handling to threaded interrupts thermal/drivers/exynos: Drop id field thermal/drivers/exynos: Remove an unnecessary field description tools/thermal/tmon: Fix compilation warning for wrong format dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Clean up examples dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Fix example node names thermal/drivers/sun8i: Add D1/T113s THS controller support dt-bindings: thermal: sun8i: Add binding for D1/T113s THS controller thermal: amlogic: Use DEFINE_SIMPLE_DEV_PM_OPS for PM functions thermal: amlogic: Make amlogic_thermal_disable() return void thermal/thermal_of: Allow rebooting after critical temp reboot: Introduce thermal_zone_device_critical_reboot() thermal/core: Prepare for introduction of thermal reboot dt-bindings: thermal-zones: Document critical-action ...
2024-01-02thermal/drivers/exynos: Use set_trips opsMateusz Majewski
Currently, each trip point defined in the device tree corresponds to a single hardware interrupt. This commit instead switches to using two hardware interrupts, whose values are set dynamically using the set_trips callback. Additionally, the critical temperature threshold is handled specifically. Setting interrupts in this way also fixes a long-standing lockdep warning, which was caused by calling thermal_zone_get_trips with our lock being held. Do note that this requires TMU initialization to be split into two parts, as done by the parent commit: parts of the initialization call into the thermal_zone_device structure and so must be done after its registration, but the initialization is also responsible for setting up calibration, which must be done before thermal_zone_device registration, which will call set_trips for the first time; if the calibration is not done in time, the interrupt values will be silently wrong! Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Mateusz Majewski <m.majewski2@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231201095625.301884-10-m.majewski2@samsung.com
2024-01-02thermal/drivers/exynos: Use BIT wherever possibleMateusz Majewski
The original driver did not use that macro and it allows us to make our intentions slightly clearer. Reviewed-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Mateusz Majewski <m.majewski2@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231201095625.301884-9-m.majewski2@samsung.com
2024-01-02thermal/drivers/exynos: Split initialization of TMU and the thermal zoneMateusz Majewski
This will be needed in the future, as the thermal zone subsystem might call our callbacks right after devm_thermal_of_zone_register. Currently we just make get_temp return EAGAIN in such case, but this will not be possible with state-modifying callbacks, for instance set_trips. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Mateusz Majewski <m.majewski2@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231201095625.301884-8-m.majewski2@samsung.com
2024-01-02thermal/drivers/exynos: Stop using the threshold mechanism on Exynos 4210Mateusz Majewski
Exynos 4210 supports setting a base threshold value, which is added to all trip points. This might be useful, but is not really necessary in our usecase, so we always set it to 0 to simplify the code a bit. Additionally, this change makes it so that we convert the value to the calibrated one in a slightly different place. This is more correct morally, though it does not make any change when single-point calibration is being used (which is the case currently). Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Mateusz Majewski <m.majewski2@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231201095625.301884-7-m.majewski2@samsung.com
2024-01-02thermal/drivers/exynos: Simplify regulator (de)initializationMateusz Majewski
We rewrite the initialization to enable the regulator as part of devm, which allows us to not handle the struct instance manually. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Mateusz Majewski <m.majewski2@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231201095625.301884-6-m.majewski2@samsung.com
2024-01-02thermal/drivers/exynos: Handle devm_regulator_get_optional return value ↵Mateusz Majewski
correctly Currently, if regulator is required in the SoC, but devm_regulator_get_optional fails for whatever reason, the execution will proceed without propagating the error. Meanwhile there is no reason to output the error in case of -ENODEV. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Mateusz Majewski <m.majewski2@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231201095625.301884-5-m.majewski2@samsung.com
2024-01-02thermal/drivers/exynos: Wwitch from workqueue-driven interrupt handling to ↵Mateusz Majewski
threaded interrupts The workqueue boilerplate is mostly one-to-one what the threaded interrupts do. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Mateusz Majewski <m.majewski2@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231201095625.301884-4-m.majewski2@samsung.com
2024-01-02thermal/drivers/exynos: Drop id fieldMateusz Majewski
We do not use the value, and only Exynos 7 defines this alias anyway. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Mateusz Majewski <m.majewski2@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231201095625.301884-3-m.majewski2@samsung.com
2024-01-02thermal/drivers/exynos: Remove an unnecessary field descriptionMateusz Majewski
It seems that the field has been removed in one of the previous commits, but the description has been forgotten. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Mateusz Majewski <m.majewski2@samsung.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231201095625.301884-2-m.majewski2@samsung.com
2024-01-02tools/thermal/tmon: Fix compilation warning for wrong formatFlorian Eckert
The following warnings are shown during compilation: tui.c: In function 'show_cooling_device': tui.c:216:40: warning: format '%d' expects argument of type 'int', but argument 7 has type 'long unsigned int' [-Wformat=] 216 | "%02d %12.12s%6d %6d", | ~~^ | | | int | %6ld ...... 219 | ptdata.cdi[j].cur_state, | ~~~~~~~~~~~~~~~~~~~~~~~ | | | long unsigned int tui.c:216:44: warning: format '%d' expects argument of type 'int', but argument 8 has type 'long unsigned int' [-Wformat=] 216 | "%02d %12.12s%6d %6d", | ~~^ | | | int | %6ld ...... 220 | ptdata.cdi[j].max_state); | ~~~~~~~~~~~~~~~~~~~~~~~ | | | long unsigned int To fix this, the correct string format must be used for printing. Signed-off-by: Florian Eckert <fe@dev.tdt.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231204141335.2798194-1-fe@dev.tdt.de
2024-01-02dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Clean up examplesJohan Hovold
Clean up the examples by adding newline separators, moving 'reg' properties after 'compatible' and dropping unused labels. Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231130174114.13122-3-johan+linaro@kernel.org
2024-01-02dt-bindings: thermal: qcom-spmi-adc-tm5/hc: Fix example node namesJohan Hovold
The ADC Thermal Monitor is part of an SPMI PMIC, which in turn sits on an SPMI bus. Fixes: db03874b8543 ("dt-bindings: thermal: qcom: add HC variant of adc-thermal monitor bindings") Fixes: e8ffd6c0756b ("dt-bindings: thermal: qcom: add adc-thermal monitor bindings") Signed-off-by: Johan Hovold <johan+linaro@kernel.org> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231130174114.13122-2-johan+linaro@kernel.org
2024-01-02thermal/drivers/sun8i: Add D1/T113s THS controller supportMaxim Kiselev
This patch adds a thermal sensor controller support for the D1/T113s, which is similar to the one on H6, but with only one sensor and different scale and offset values. Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com> Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231217210629.131486-3-bigunclemax@gmail.com
2024-01-02dt-bindings: thermal: sun8i: Add binding for D1/T113s THS controllerMaxim Kiselev
Add a binding for D1/T113s thermal sensor controller. Signed-off-by: Maxim Kiselev <bigunclemax@gmail.com> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231217210629.131486-2-bigunclemax@gmail.com
2024-01-02thermal: amlogic: Use DEFINE_SIMPLE_DEV_PM_OPS for PM functionsUwe Kleine-König
This macro has the advantage over SIMPLE_DEV_PM_OPS that we don't have to care about when the functions are actually used, so the corresponding __maybe_unused can be dropped. Also make use of pm_ptr() to discard all PM related stuff if CONFIG_PM isn't enabled. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231116112633.668826-3-u.kleine-koenig@pengutronix.de
2024-01-02thermal: amlogic: Make amlogic_thermal_disable() return voidUwe Kleine-König
amlogic_thermal_disable() returned zero unconditionally and amlogic_thermal_remove() already ignores the return value. Make it return no value and modify amlogic_thermal_suspend to not check the value. This patch introduces no semantic changes, but makes it more obvious for a human reader that amlogic_thermal_suspend() cannot fail. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231116112633.668826-2-u.kleine-koenig@pengutronix.de
2024-01-02thermal/thermal_of: Allow rebooting after critical tempFabio Estevam
Currently, the default mechanism is to trigger a shutdown after the critical temperature is reached. In some embedded cases, such behavior does not suit well, as the board may be unattended in the field and rebooting may be a better approach. The bootloader may also check the temperature and only allow the boot to proceed when the temperature is below a certain threshold. Introduce support for allowing a reboot to be triggered after the critical temperature is reached. If the "critical-action" devicetree property is not found, fall back to the shutdown action to preserve the existing default behavior. If a custom ops->critical exists, then it takes preference over critical-actions. Tested on a i.MX8MM board with the following devicetree changes: thermal-zones { cpu-thermal { critical-action = "reboot"; }; }; Signed-off-by: Fabio Estevam <festevam@denx.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231129124330.519423-4-festevam@gmail.com
2024-01-02reboot: Introduce thermal_zone_device_critical_reboot()Fabio Estevam
Introduce thermal_zone_device_critical_reboot() to trigger an emergency reboot. It is a counterpart of thermal_zone_device_critical() with the difference that it will force a reboot instead of shutdown. The motivation for doing this is to allow the thermal subystem to trigger a reboot when the temperature reaches the critical temperature. Signed-off-by: Fabio Estevam <festevam@denx.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231129124330.519423-3-festevam@gmail.com
2024-01-02thermal/core: Prepare for introduction of thermal rebootFabio Estevam
Add some helper functions to make it easier introducing the support for thermal reboot. No functional change. Signed-off-by: Fabio Estevam <festevam@denx.de> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231129124330.519423-2-festevam@gmail.com
2024-01-02dt-bindings: thermal-zones: Document critical-actionFabio Estevam
Document the critical-action property to describe the thermal action the OS should perform after the critical temperature is reached. The possible values are "shutdown" and "reboot". The motivation for introducing the critical-action property is that different systems may need different thermal actions when the critical temperature is reached. For example, in a desktop PC, it is desired that a shutdown happens after the critical temperature is reached. However, in some embedded cases, such behavior does not suit well, as the board may be unattended in the field and rebooting may be a better approach. The bootloader may also benefit from this new property as it can check the SoC temperature and in case the temperature is above the critical point, it can trigger a shutdown or reboot accordingly. Signed-off-by: Fabio Estevam <festevam@denx.de> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231129124330.519423-1-festevam@gmail.com
2024-01-02dt-bindings: thermal: qcom-tsens: document the SM8650 Temperature SensorNeil Armstrong
Document the Temperature Sensor (TSENS) on the SM8650 Platform. Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231128-topic-sm8650-upstream-bindings-tsens-v3-1-54179e6646d3@linaro.org
2024-01-02drivers/thermal/loongson2_thermal: Fix incorrect PTR_ERR() judgmentBinbin Zhou
PTR_ERR() returns -ENODEV when thermal-zones are undefined, and we need -ENODEV as the right value for comparison. Otherwise, tz->type is NULL when thermal-zones is undefined, resulting in the following error: [ 12.290030] CPU 1 Unable to handle kernel paging request at virtual address fffffffffffffff1, era == 900000000355f410, ra == 90000000031579b8 [ 12.302877] Oops[#1]: [ 12.305190] CPU: 1 PID: 181 Comm: systemd-udevd Not tainted 6.6.0-rc7+ #5385 [ 12.312304] pc 900000000355f410 ra 90000000031579b8 tp 90000001069e8000 sp 90000001069eba10 [ 12.320739] a0 0000000000000000 a1 fffffffffffffff1 a2 0000000000000014 a3 0000000000000001 [ 12.329173] a4 90000001069eb990 a5 0000000000000001 a6 0000000000001001 a7 900000010003431c [ 12.337606] t0 fffffffffffffff1 t1 54567fd5da9b4fd4 t2 900000010614ec40 t3 00000000000dc901 [ 12.346041] t4 0000000000000000 t5 0000000000000004 t6 900000010614ee20 t7 900000000d00b790 [ 12.354472] t8 00000000000dc901 u0 54567fd5da9b4fd4 s9 900000000402ae10 s0 900000010614ec40 [ 12.362916] s1 90000000039fced0 s2 ffffffffffffffed s3 ffffffffffffffed s4 9000000003acc000 [ 12.362931] s5 0000000000000004 s6 fffffffffffff000 s7 0000000000000490 s8 90000001028b2ec8 [ 12.362938] ra: 90000000031579b8 thermal_add_hwmon_sysfs+0x258/0x300 [ 12.386411] ERA: 900000000355f410 strscpy+0xf0/0x160 [ 12.391626] CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE) [ 12.397898] PRMD: 00000004 (PPLV0 +PIE -PWE) [ 12.403678] EUEN: 00000000 (-FPE -SXE -ASXE -BTE) [ 12.409859] ECFG: 00071c1c (LIE=2-4,10-12 VS=7) [ 12.415882] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0) [ 12.415907] BADV: fffffffffffffff1 [ 12.415911] PRID: 0014a000 (Loongson-64bit, Loongson-2K1000) [ 12.415917] Modules linked in: loongson2_thermal(+) vfat fat uio_pdrv_genirq uio fuse zram zsmalloc [ 12.415950] Process systemd-udevd (pid: 181, threadinfo=00000000358b9718, task=00000000ace72fe3) [ 12.415961] Stack : 0000000000000dc0 54567fd5da9b4fd4 900000000402ae10 9000000002df9358 [ 12.415982] ffffffffffffffed 0000000000000004 9000000107a10aa8 90000001002a3410 [ 12.415999] ffffffffffffffed ffffffffffffffed 9000000107a11268 9000000003157ab0 [ 12.416016] 9000000107a10aa8 ffffff80020fc0c8 90000001002a3410 ffffffffffffffed [ 12.416032] 0000000000000024 ffffff80020cc1e8 900000000402b2a0 9000000003acc000 [ 12.416048] 90000001002a3410 0000000000000000 ffffff80020f4030 90000001002a3410 [ 12.416065] 0000000000000000 9000000002df6808 90000001002a3410 0000000000000000 [ 12.416081] ffffff80020f4030 0000000000000000 90000001002a3410 9000000002df2ba8 [ 12.416097] 00000000000000b4 90000001002a34f4 90000001002a3410 0000000000000002 [ 12.416114] ffffff80020f4030 fffffffffffffff0 90000001002a3410 9000000002df2f30 [ 12.416131] ... [ 12.416138] Call Trace: [ 12.416142] [<900000000355f410>] strscpy+0xf0/0x160 [ 12.416167] [<90000000031579b8>] thermal_add_hwmon_sysfs+0x258/0x300 [ 12.416183] [<9000000003157ab0>] devm_thermal_add_hwmon_sysfs+0x50/0xe0 [ 12.416200] [<ffffff80020cc1e8>] loongson2_thermal_probe+0x128/0x200 [loongson2_thermal] [ 12.416232] [<9000000002df6808>] platform_probe+0x68/0x140 [ 12.416249] [<9000000002df2ba8>] really_probe+0xc8/0x3c0 [ 12.416269] [<9000000002df2f30>] __driver_probe_device+0x90/0x180 [ 12.416286] [<9000000002df3058>] driver_probe_device+0x38/0x160 [ 12.416302] [<9000000002df33a8>] __driver_attach+0xa8/0x200 [ 12.416314] [<9000000002deffec>] bus_for_each_dev+0x8c/0x120 [ 12.416330] [<9000000002df198c>] bus_add_driver+0x10c/0x2a0 [ 12.416346] [<9000000002df46b4>] driver_register+0x74/0x160 [ 12.416358] [<90000000022201a4>] do_one_initcall+0x84/0x220 [ 12.416372] [<90000000022f3ab8>] do_init_module+0x58/0x2c0 [ 12.416386] [<90000000022f6538>] init_module_from_file+0x98/0x100 [ 12.416399] [<90000000022f67f0>] sys_finit_module+0x230/0x3c0 [ 12.416412] [<900000000358f7c8>] do_syscall+0x88/0xc0 [ 12.416431] [<900000000222137c>] handle_syscall+0xbc/0x158 Fixes: e7e3a7c35791 ("thermal/drivers/loongson-2: Add thermal management support") Cc: Yinbo Zhu <zhuyinbo@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/343c14de98216636a47b43e8bfd47b70d0a8e068.1700817227.git.zhoubinbin@loongson.cn
2024-01-02dt-bindings: thermal: loongson,ls2k-thermal: Fix binding check issuesBinbin Zhou
Add the missing 'thermal-sensor-cells' property which is required for every thermal sensor as it's used when using phandles. And add the thermal-sensor.yaml reference. In fact, it was a careless mistake when submitting the driver that caused it to not work properly. So the fix is necessary, although it will result in the ABI break. Fixes: 72684d99a854 ("thermal: dt-bindings: add loongson-2 thermal") Cc: Yinbo Zhu <zhuyinbo@loongson.cn> Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/6d69362632271ab0af9a5fbfa3bc46a0894f1d54.1700817227.git.zhoubinbin@loongson.cn
2024-01-02dt-bindings: thermal: convert Mediatek Thermal to the json-schemaRafał Miłecki
This helps validating DTS files. Introduced changes: 1. Improved title 2. Simplified description (dropped "This describes the device tree...") 3. Dropped undocumented "reset-names" from example Signed-off-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Rob Herring <robh@kernel.org> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/20231117052214.24554-1-zajec5@gmail.com
2023-12-29thermal: gov_power_allocator: Support new update callback of weightsLukasz Luba
When the thermal instance's weight is updated from the sysfs the governor update_tz() callback is triggered. Implement proper reaction to this event in the IPA, which would save CPU cycles spent in throttle(). This will speed-up the main throttle() IPA function and clean it up a bit. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29thermal/sysfs: Update governors when the 'weight' has changedLukasz Luba
Support governors update when the thermal instance's weight has changed. This allows to adjust internal state for the governor. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> [ rjw: Add two empty code lines aroung the locking ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29thermal/sysfs: Update instance->weight under tz lockLukasz Luba
User space can change the weight of a thermal instance via sysfs while the .throttle() callback is running for a governor, because weight_store() does not use the zone lock. The IPA governor uses instance weight values for power calculations and caches the sum of them as total_weight, so it gets confused when one of them changes while its .throttle() callback is running. To prevent that from happening, use thermal zone locking in weight_store(). Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29thermal: gov_power_allocator: Simplify checks for valid power actorLukasz Luba
There is a need to check if the cooling device in the thermal zone supports IPA callback and is set for control trip point. Refactor the code which validates the power actor capabilities and make it more consistent in all places. No intentional functional impact. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29thermal: gov_power_allocator: Move memory allocation out of throttle()Lukasz Luba
The new thermal callback allows to react to the change of cooling instances in the thermal zone. Move the memory allocation to that new callback and save CPU cycles in the throttle() code path. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29thermal: gov_power_allocator: Change trace functionsLukasz Luba
Change trace event trace_thermal_power_allocator() to not use dynamic array for requested power and granted power for all power actors. Instead, simplify the trace event and print other simple values. Add new trace event to print power actor information of requested power and granted power. That trace event would be called in a loop for each power actor. The trace data would be easier to parse comparing to the dynamic array implementation. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29thermal: gov_power_allocator: Refactor checks in divvy_up_power()Lukasz Luba
Simplify the code and remove one extra 'if' block. No intentional functional impact. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29thermal: gov_power_allocator: Refactor check_power_actors()Lukasz Luba
In preparation for a subsequent change, rearrange check_power_actors(). No intentional functional impact. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-29thermal: core: Add governor callback for thermal zone changeLukasz Luba
Add a new callback to the struct thermal_governor. It can be used for updating governors when there is a change in the thermal zone internals, e.g. thermal cooling device is bind to the thermal zone. That makes possible to move some heavy operations like memory allocations related to the number of cooling instances out of the throttle() callback. Both callback code paths (throttle() and update_tz()) are protected with the same thermal zone lock, which guaranties the consistency. Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28thermal: netlink: Add thermal_group_has_listeners() helperStanislaw Gruszka
Add a helper function to check if there are listeners for thermal_gnl_family multicast groups. For now use it to avoid unnecessary allocations and sending thermal genl messages when there are no recipients. In the future, in conjunction with (not yet implemented) notification of change in the netlink socket group membership, this helper can be used to open/close hardware interfaces based on the presence of user space subscribers. Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28thermal: netlink: Add enum for mutlicast groups indexesStanislaw Gruszka
Use enum instead of hard-coded numbers for indexing multicast groups. Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28thermal: core: Resume thermal zones asynchronouslyRafael J. Wysocki
The resume of thermal zones in thermal_pm_notify() is carried out sequentially, which may be a problem if __thermal_zone_device_update() takes a significant time to run for some thermal zones, because some other thermal zones may need to wait for them to resume then and if any other PM notifiers are going to be invoked after the thermal one, they will need to wait for it either. To address this, make thermal_pm_notify() switch the poll_queue delayed work over to a one-shot thermal_zone_device_resume() work function that will restore the original one during the thermal zone resume and queue up poll_queue without a delay for each thermal zone. Link: https://lore.kernel.org/linux-pm/20231120234015.3273143-1-radusolea@google.com/ Reported-by: Radu Solea <radusolea@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28thermal: core: Initialize poll_queue in thermal_zone_device_init()Rafael J. Wysocki
In preparation for a subsequent change, move the initialization of the poll_queue delayed work from thermal_zone_device_register_with_trips() to thermal_zone_device_init() which is called by the former. However, because thermal_zone_device_init() is also called by thermal_pm_notify(), make the latter call cancel_delayed_work() on poll_queue before invoking the former, so as to allow the work item to be re-initialized safely. Also move thermal_zone_device_check() which needs to be defined before thermal_zone_device_init(), so the latter can pass it to the INIT_DELAYED_WORK() macro. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-28thermal: core: Fix thermal zone suspend-resume synchronizationRafael J. Wysocki
There are 3 synchronization issues with thermal zone suspend-resume during system-wide transitions: 1. The resume code runs in a PM notifier which is invoked after user space has been thawed, so it can run concurrently with user space which can trigger a thermal zone device removal. If that happens, the thermal zone resume code may use a stale pointer to the next list element and crash, because it does not hold thermal_list_lock while walking thermal_tz_list. 2. The thermal zone resume code calls thermal_zone_device_init() outside the zone lock, so user space or an update triggered by the platform firmware may see an inconsistent state of a thermal zone leading to unexpected behavior. 3. Clearing the in_suspend global variable in thermal_pm_notify() allows __thermal_zone_device_update() to continue for all thermal zones and it may as well run before the thermal_tz_list walk (or at any point during the list walk for that matter) and attempt to operate on a thermal zone that has not been resumed yet. It may also race destructively with thermal_zone_device_init(). To address these issues, add thermal_list_lock locking to thermal_pm_notify(), especially arount the thermal_tz_list, make it call thermal_zone_device_init() back-to-back with __thermal_zone_device_update() under the zone lock and replace in_suspend with per-zone bool "suspend" indicators set and unset under the given zone's lock. Link: https://lore.kernel.org/linux-pm/20231218162348.69101-1-bo.ye@mediatek.com/ Reported-by: Bo Ye <bo.ye@mediatek.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-21thermal: cpuidle_cooling: fix kernel-doc warning and a spelloRandy Dunlap
Correct one misuse of kernel-doc notation and one spelling error as reported by codespell. cpuidle_cooling.c:152: warning: cannot understand function prototype: 'struct thermal_cooling_device_ops cpuidle_cooling_ops = ' For the kernel-doc warning, don't use "/**" for a comment on data. kernel-doc can be used for structure declarations but not definitions. Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-12-15thermal: core: Fix NULL pointer dereference in zone registration error pathRafael J. Wysocki
If device_register() in thermal_zone_device_register_with_trips() returns an error, the tz variable is set to NULL and subsequently dereferenced in kfree(tz->tzp). Commit adc8749b150c ("thermal/drivers/core: Use put_device() if device_register() fails") added the tz = NULL assignment in question to avoid a possible double-free after dropping the reference to the zone device. However, after commit 4649620d9404 ("thermal: core: Make thermal_zone_device_unregister() return after freeing the zone"), that assignment has become redundant, because dropping the reference to the zone device does not cause the zone object to be freed any more. Drop it to address the NULL pointer dereference. Fixes: 3d439b1a2ad3 ("thermal/core: Alloc-copy-free the thermal zone parameters structure") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>