Commit 9b1b3dc

Merge tag 'pm-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:

"By the number of commits, cpufreq is the leading party (again) and the most visible change there is the removal of the omap-cpufreq driver that has not been used for a long time (good riddance). There are also quite a few changes in the cppc_cpufreq driver, mostly related to fixing its frequency invariance engine in the case when the CPPC registers used by it are not in PCC. In addition to that, support for AM62L3 is added to the ti-cpufreq driver and the cpufreq-dt-platdev list is updated for some platforms. The remaining cpufreq changes are assorted fixes and cleanups.

Next up is cpuidle and the changes there are dominated by intel_idle driver updates, mostly related to the new command line facility allowing users to adjust the list of C-states used by the driver. There are also a few updates of cpuidle governors, including two menu governor fixes and some refinements of the teo governor, and a MAINTAINERS update adding Christian Loehle as a cpuidle reviewer. [Thanks for stepping up Christian!]

The most significant update related to system suspend and hibernation is the one to stop freezing the PM runtime workqueue during system PM transitions which allows some deadlocks to be avoided. There is also a fix for possible concurrent bit field updates in the core device suspend code and a few other minor fixes.

Apart from the above, several drivers are updated to discard the return value of pm_runtime_put() which is going to be converted to a void function as soon as everybody stops using its return value, PL4 support for Ice Lake is added to the Intel RAPL power capping driver, and there are assorted cleanups, documentation fixes, and some cpupower utility improvements.

Specifics:

 - Remove the unused omap-cpufreq driver (Andreas Kemnade)

 - Optimize error handling code in cpufreq_boost_trigger_state() and make cpufreq_boost_trigger_state() return -EOPNOTSUPP if no policy supports boost (Lifeng Zheng)

 - Update cpufreq-dt-platdev list for tegra, qcom, TI (Aaron Kling, Dhruva Gole, and Konrad Dybcio)

 - Minor improvements to the cpufreq and cpumask rust implementation (Alexandre Courbot, Alice Ryhl, Tamir Duberstein, and Yilin Chen)

 - Add support for AM62L3 SoC to the ti-cpufreq driver (Dhruva Gole)

 - Update arch_freq_scale in the CPPC cpufreq driver's frequency invariance engine (FIE) in scheduler ticks if the related CPPC registers are not in PCC (Jie Zhan)

 - Assorted minor cleanups and improvements in ARM cpufreq drivers (Juan Martinez, Felix Gu, Luca Weiss, and Sergey Shtylyov)

 - Add generic helpers for sysfs show/store to cppc_cpufreq (Sumit Gupta)

 - Make the scaling_setspeed cpufreq sysfs attribute return the actual requested frequency to avoid confusion (Pengjie Zhang)

 - Simplify the idle CPU time granularity test in the ondemand cpufreq governor (Frederic Weisbecker)

 - Enable asym capacity in intel_pstate only when CPU SMT is not possible (Yaxiong Tian)

 - Update the description of rate_limit_us default value in cpufreq documentation (Yaxiong Tian)

 - Add a command line option to adjust the C-states table in the intel_idle driver, remove the 'preferred_cstates' module parameter from it, add C-states validation to it and clean it up (Artem Bityutskiy)

 - Make the menu cpuidle governor always check the time till the closest timer event when the scheduler tick has been stopped to prevent it from mistakenly selecting the deepest available idle state (Rafael Wysocki)

 - Update the teo cpuidle governor to avoid making suboptimal decisions in certain corner cases and generally improve idle state selection accuracy (Rafael Wysocki)

 - Remove an unlikely() annotation on the early-return condition in menu_select() that leads to branch misprediction 100% of the time on systems with only 1 idle state enabled, like ARM64 servers (Breno Leitao)

 - Add Christian Loehle to MAINTAINERS as a cpuidle reviewer (Christian Loehle)

 - Stop flagging the PM runtime workqueue as freezable to avoid system suspend and resume deadlocks in subsystems that assume asynchronous runtime PM to work during system-wide PM transitions (Rafael Wysocki)

 - Drop redundant NULL pointer checks before acomp_request_free() from the hibernation code handling image saving (Rafael Wysocki)

 - Update wakeup_sources_walk_start() to handle empty lists of wakeup sources as appropriate (Samuel Wu)

 - Make dev_pm_clear_wake_irq() check the power.wakeirq value under power.lock to avoid race conditions (Gui-Dong Han)

 - Avoid bit field races related to power.work_in_progress in the core device suspend code (Xuewen Yan)

 - Make several drivers discard pm_runtime_put() return value in preparation for converting that function to a void one (Rafael Wysocki)

 - Add PL4 support for Ice Lake to the Intel RAPL power capping driver (Daniel Tang)

 - Replace sprintf() with sysfs_emit() in power capping sysfs show functions (Sumeet Pawnikar)

 - Make dev_pm_opp_get_level() return value match the documentation after a previous update of the latter (Aleks Todorov)

 - Use scoped for each OF child loop in the OPP code (Krzysztof Kozlowski)

 - Fix a bug in an example code snippet and correct typos in the energy model management documentation (Patrick Little)

 - Fix miscellaneous problems in cpupower (Kaushlendra Kumar):
     * idle_monitor: Fix incorrect value logged after stop
     * Fix inverted APERF capability check
     * Use strcspn() to strip trailing newline
     * Reset errno before strtoull()
     * Show C0 in idle-info dump

 - Improve cpupower installation procedure by making the systemd step optional and allowing users to disable the installation of systemd's unit file (João Marcos Costa)"

* tag 'pm-6.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (65 commits)
  PM: sleep: core: Avoid bit field races related to work_in_progress
  PM: sleep: wakeirq: harden dev_pm_clear_wake_irq() against races
  cpufreq: Documentation: Update description of rate_limit_us default value
  cpufreq: intel_pstate: Enable asym capacity only when CPU SMT is not possible
  PM: wakeup: Handle empty list in wakeup_sources_walk_start()
  PM: EM: Documentation: Fix bug in example code snippet
  Documentation: Fix typos in energy model documentation
  cpuidle: governors: teo: Refine intercepts-based idle state lookup
  cpuidle: governors: teo: Adjust the classification of wakeup events
  cpufreq: ondemand: Simplify idle cputime granularity test
  cpufreq: userspace: make scaling_setspeed return the actual requested frequency
  PM: hibernate: Drop NULL pointer checks before acomp_request_free()
  cpufreq: CPPC: Add generic helpers for sysfs show/store
  cpufreq: scmi: Fix device_node reference leak in scmi_cpu_domain_id()
  cpufreq: ti-cpufreq: add support for AM62L3 SoC
  cpufreq: dt-platdev: Add ti,am62l3 to blocklist
  cpufreq/amd-pstate: Add comment explaining nominal_perf usage for performance policy
  cpufreq: scmi: correct SCMI explanation
  cpufreq: dt-platdev: Block the driver from probing on more QC platforms
  rust: cpumask: rename methods of Cpumask for clarity and consistency
  ...
2 parents d84e173 + 0f64b6a commit 9b1b3dc

66 files changed

Lines changed: 633 additions & 546 deletions


Documentation/admin-guide/pm/cpufreq.rst

Lines changed: 1 addition & 1 deletion
@@ -439,7 +439,7 @@ This governor exposes only one tunable:
 ``rate_limit_us``
 	Minimum time (in microseconds) that has to pass between two consecutive
 	runs of governor computations (default: 1.5 times the scaling driver's
-	transition latency or the maximum 2ms).
+	transition latency or 1ms if the driver does not provide a latency value).

 	The purpose of this tunable is to reduce the scheduler context overhead
 	of the governor which might be excessive without it.
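The corrected default above can be sketched as a tiny standalone helper. This is an illustration of the documented rule only, not the kernel's implementation; the function name and the assumption that the driver reports its transition latency in nanoseconds (0 meaning "no value provided") are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical sketch of the documented rate_limit_us default:
 * 1.5 times the scaling driver's transition latency, or 1 ms when
 * the driver does not provide a latency value.
 */
unsigned int default_rate_limit_us(unsigned int transition_latency_ns)
{
	if (transition_latency_ns == 0)
		return 1000;	/* 1 ms fallback, in microseconds */

	/* Convert ns to us, then apply the 1.5x factor. */
	return transition_latency_ns / 1000 * 3 / 2;
}
```

For example, a driver reporting a 1 ms transition latency would get a 1500 us rate limit under this rule.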

Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.yaml

Lines changed: 2 additions & 0 deletions
@@ -35,6 +35,7 @@ properties:
       - description: v2 of CPUFREQ HW (EPSS)
         items:
           - enum:
+              - qcom,milos-cpufreq-epss
              - qcom,qcs8300-cpufreq-epss
              - qcom,qdu1000-cpufreq-epss
              - qcom,sa8255p-cpufreq-epss
@@ -169,6 +170,7 @@ allOf:
           compatible:
             contains:
               enum:
+                - qcom,milos-cpufreq-epss
                - qcom,qcs8300-cpufreq-epss
                - qcom,sc7280-cpufreq-epss
                - qcom,sm8250-cpufreq-epss

Documentation/power/energy-model.rst

Lines changed: 9 additions & 9 deletions
@@ -14,8 +14,8 @@ subsystems willing to use that information to make energy-aware decisions.
 The source of the information about the power consumed by devices can vary greatly
 from one platform to another. These power costs can be estimated using
 devicetree data in some cases. In others, the firmware will know better.
-Alternatively, userspace might be best positioned. And so on. In order to avoid
-each and every client subsystem to re-implement support for each and every
+Alternatively, userspace might be best positioned. In order to avoid
+having each and every client subsystem re-implement support for each and every
 possible source of information on its own, the EM framework intervenes as an
 abstraction layer which standardizes the format of power cost tables in the
 kernel, hence enabling to avoid redundant work.
@@ -32,7 +32,7 @@ be found in the Intelligent Power Allocation in
 Documentation/driver-api/thermal/power_allocator.rst.
 Kernel subsystems might implement automatic detection to check whether EM
 registered devices have inconsistent scale (based on EM internal flag).
-Important thing to keep in mind is that when the power values are expressed in
+An important thing to keep in mind is that when the power values are expressed in
 an 'abstract scale' deriving real energy in micro-Joules would not be possible.

 The figure below depicts an example of drivers (Arm-specific here, but the
@@ -82,7 +82,7 @@ using kref mechanism. The device driver which provided the new EM at runtime,
 should call EM API to free it safely when it's no longer needed. The EM
 framework will handle the clean-up when it's possible.

-The kernel code which want to modify the EM values is protected from concurrent
+The kernel code which wants to modify the EM values is protected from concurrent
 access using a mutex. Therefore, the device driver code must run in sleeping
 context when it tries to modify the EM.

@@ -113,7 +113,7 @@ Registration of 'advanced' EM
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The 'advanced' EM gets its name due to the fact that the driver is allowed
-to provide more precised power model. It's not limited to some implemented math
+to provide a more precise power model. It's not limited to some implemented math
 formula in the framework (like it is in 'simple' EM case). It can better reflect
 the real power measurements performed for each performance state. Thus, this
 registration method should be preferred in case considering EM static power
@@ -172,7 +172,7 @@ Registration of 'simple' EM
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The 'simple' EM is registered using the framework helper function
-cpufreq_register_em_with_opp(). It implements a power model which is tight to
+cpufreq_register_em_with_opp(). It implements a power model which is tied to a
 math formula::

 	Power = C * V^2 * f
@@ -251,7 +251,7 @@ It returns the 'struct em_perf_state' pointer which is an array of performance
 states in ascending order.
 This function must be called in the RCU read lock section (after the
 rcu_read_lock()). When the EM table is not needed anymore there is a need to
-call rcu_real_unlock(). In this way the EM safely uses the RCU read section
+call rcu_read_unlock(). In this way the EM safely uses the RCU read section
 and protects the users. It also allows the EM framework to manage the memory
 and free it. More details how to use it can be found in Section 3.2 in the
 example driver.
@@ -308,12 +308,12 @@ EM framework::
 	05
 	06 /* Use the 'foo' protocol to ceil the frequency */
 	07 freq = foo_get_freq_ceil(dev, *KHz);
-	08 if (freq < 0);
+	08 if (freq < 0)
 	09 		return freq;
 	10
 	11 /* Estimate the power cost for the dev at the relevant freq. */
 	12 power = foo_estimate_power(dev, freq);
-	13 if (power < 0);
+	13 if (power < 0)
 	14 		return power;
 	15
 	16 /* Return the values to the EM framework */
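The "simple" EM power model touched by the hunks above, Power = C * V^2 * f, can be illustrated with a self-contained integer-math sketch. The function name, units, and values below are hypothetical (capacitance factor, voltage in mV, frequency in MHz), chosen only to mirror the kind of fixed-point arithmetic a driver might use; this is not the kernel's em_perf_state code.

```c
#include <assert.h>

/*
 * Illustrative Power = C * V^2 * f computation with made-up units:
 * cap is a dynamic-capacitance factor, mv is voltage in millivolts,
 * mhz is frequency in MHz. The divisor scales the product back down
 * to a small "power" number, as integer EM tables do.
 */
unsigned long long em_simple_power(unsigned long long cap,
				   unsigned long long mv,
				   unsigned long long mhz)
{
	return cap * mv * mv * mhz / 1000000000ULL;
}
```

Note the quadratic dependence on voltage: raising an OPP's voltage hurts energy efficiency much faster than raising its frequency, which is why the document's big-vs-little OPP comparisons are not monotonic in frequency alone.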

Documentation/power/runtime_pm.rst

Lines changed: 3 additions & 4 deletions
@@ -712,10 +712,9 @@ out the following operations:
   * During system suspend pm_runtime_get_noresume() is called for every device
     right before executing the subsystem-level .prepare() callback for it and
     pm_runtime_barrier() is called for every device right before executing the
-    subsystem-level .suspend() callback for it. In addition to that the PM core
-    calls __pm_runtime_disable() with 'false' as the second argument for every
-    device right before executing the subsystem-level .suspend_late() callback
-    for it.
+    subsystem-level .suspend() callback for it. In addition to that, the PM
+    core disables runtime PM for every device right before executing the
+    subsystem-level .suspend_late() callback for it.

   * During system resume pm_runtime_enable() and pm_runtime_put() are called for
     every device right after executing the subsystem-level .resume_early()

Documentation/scheduler/sched-energy.rst

Lines changed: 4 additions & 4 deletions
@@ -244,15 +244,15 @@ Example 2.


 From these calculations, the Case 1 has the lowest total energy. So CPU 1
-is be the best candidate from an energy-efficiency standpoint.
+is the best candidate from an energy-efficiency standpoint.

 Big CPUs are generally more power hungry than the little ones and are thus used
 mainly when a task doesn't fit the littles. However, little CPUs aren't always
 necessarily more energy-efficient than big CPUs. For some systems, the high OPPs
 of the little CPUs can be less energy-efficient than the lowest OPPs of the
 bigs, for example. So, if the little CPUs happen to have enough utilization at
 a specific point in time, a small task waking up at that moment could be better
-of executing on the big side in order to save energy, even though it would fit
+off executing on the big side in order to save energy, even though it would fit
 on the little side.

 And even in the case where all OPPs of the big CPUs are less energy-efficient
@@ -285,7 +285,7 @@ much that can be done by the scheduler to save energy without severely harming
 throughput. In order to avoid hurting performance with EAS, CPUs are flagged as
 'over-utilized' as soon as they are used at more than 80% of their compute
 capacity. As long as no CPUs are over-utilized in a root domain, load balancing
-is disabled and EAS overridess the wake-up balancing code. EAS is likely to load
+is disabled and EAS overrides the wake-up balancing code. EAS is likely to load
 the most energy efficient CPUs of the system more than the others if that can be
 done without harming throughput. So, the load-balancer is disabled to prevent
 it from breaking the energy-efficient task placement found by EAS. It is safe to
@@ -385,7 +385,7 @@ Using EAS with any other governor than schedutil is not supported.
 6.5 Scale-invariant utilization signals
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-In order to make accurate prediction across CPUs and for all performance
+In order to make accurate predictions across CPUs and for all performance
 states, EAS needs frequency-invariant and CPU-invariant PELT signals. These can
 be obtained using the architecture-defined arch_scale{cpu,freq}_capacity()
 callbacks.

MAINTAINERS

Lines changed: 1 addition & 1 deletion
@@ -6561,6 +6561,7 @@ F:	rust/kernel/cpu.rs
 CPU IDLE TIME MANAGEMENT FRAMEWORK
 M:	"Rafael J. Wysocki" <rafael@kernel.org>
 M:	Daniel Lezcano <daniel.lezcano@linaro.org>
+R:	Christian Loehle <christian.loehle@arm.com>
 L:	linux-pm@vger.kernel.org
 S:	Maintained
 B:	https://bugzilla.kernel.org
@@ -19149,7 +19150,6 @@ M: Kevin Hilman <khilman@kernel.org>
 L:	linux-omap@vger.kernel.org
 S:	Maintained
 F:	arch/arm/*omap*/*pm*
-F:	drivers/cpufreq/omap-cpufreq.c

 OMAP POWERDOMAIN SOC ADAPTATION LAYER SUPPORT
 M:	Paul Walmsley <paul@pwsan.com>

drivers/acpi/cppc_acpi.c

Lines changed: 27 additions & 21 deletions
@@ -1423,6 +1423,32 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
 }
 EXPORT_SYMBOL_GPL(cppc_get_perf_caps);

+/**
+ * cppc_perf_ctrs_in_pcc_cpu - Check if any perf counters of a CPU are in PCC.
+ * @cpu: CPU on which to check perf counters.
+ *
+ * Return: true if any of the counters are in PCC regions, false otherwise
+ */
+bool cppc_perf_ctrs_in_pcc_cpu(unsigned int cpu)
+{
+	struct cpc_desc *cpc_desc = per_cpu(cpc_desc_ptr, cpu);
+	struct cpc_register_resource *ref_perf_reg;
+
+	/*
+	 * If reference perf register is not supported then we should use the
+	 * nominal perf value
+	 */
+	ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
+	if (!CPC_SUPPORTED(ref_perf_reg))
+		ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
+
+	return CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
+	       CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
+	       CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]) ||
+	       CPC_IN_PCC(ref_perf_reg);
+}
+EXPORT_SYMBOL_GPL(cppc_perf_ctrs_in_pcc_cpu);
+
 /**
  * cppc_perf_ctrs_in_pcc - Check if any perf counters are in a PCC region.
  *
@@ -1437,27 +1463,7 @@ bool cppc_perf_ctrs_in_pcc(void)
 	int cpu;

 	for_each_online_cpu(cpu) {
-		struct cpc_register_resource *ref_perf_reg;
-		struct cpc_desc *cpc_desc;
-
-		cpc_desc = per_cpu(cpc_desc_ptr, cpu);
-
-		if (CPC_IN_PCC(&cpc_desc->cpc_regs[DELIVERED_CTR]) ||
-		    CPC_IN_PCC(&cpc_desc->cpc_regs[REFERENCE_CTR]) ||
-		    CPC_IN_PCC(&cpc_desc->cpc_regs[CTR_WRAP_TIME]))
-			return true;
-
-		ref_perf_reg = &cpc_desc->cpc_regs[REFERENCE_PERF];
-
-		/*
-		 * If reference perf register is not supported then we should
-		 * use the nominal perf value
-		 */
-		if (!CPC_SUPPORTED(ref_perf_reg))
-			ref_perf_reg = &cpc_desc->cpc_regs[NOMINAL_PERF];
-
-		if (CPC_IN_PCC(ref_perf_reg))
+		if (cppc_perf_ctrs_in_pcc_cpu(cpu))
 			return true;
 	}

drivers/base/power/main.c

Lines changed: 4 additions & 3 deletions
@@ -1647,10 +1647,11 @@ static void device_suspend_late(struct device *dev, pm_message_t state, bool asy
 		goto Complete;

 	/*
-	 * Disable runtime PM for the device without checking if there is a
-	 * pending resume request for it.
+	 * After this point, any runtime PM operations targeting the device
+	 * will fail until the corresponding pm_runtime_enable() call in
+	 * device_resume_early().
 	 */
-	__pm_runtime_disable(dev, false);
+	pm_runtime_disable(dev);

 	if (dev->power.syscore)
 		goto Skip;

drivers/base/power/wakeirq.c

Lines changed: 6 additions & 3 deletions
@@ -83,13 +83,16 @@ EXPORT_SYMBOL_GPL(dev_pm_set_wake_irq);
  */
 void dev_pm_clear_wake_irq(struct device *dev)
 {
-	struct wake_irq *wirq = dev->power.wakeirq;
+	struct wake_irq *wirq;
 	unsigned long flags;

-	if (!wirq)
+	spin_lock_irqsave(&dev->power.lock, flags);
+	wirq = dev->power.wakeirq;
+	if (!wirq) {
+		spin_unlock_irqrestore(&dev->power.lock, flags);
 		return;
+	}

-	spin_lock_irqsave(&dev->power.lock, flags);
 	device_wakeup_detach_irq(dev);
 	dev->power.wakeirq = NULL;
 	spin_unlock_irqrestore(&dev->power.lock, flags);
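The fix above is an instance of the general check-under-lock pattern: the pointer must be read and cleared under the same lock, otherwise a concurrent update can slip in between the unlocked NULL check and the locked clear. A minimal userspace analogue using a pthread mutex (all names here are illustrative, not the kernel API):

```c
#include <pthread.h>
#include <stddef.h>

/* Illustrative stand-ins for struct device / struct wake_irq. */
struct fake_dev {
	pthread_mutex_t lock;
	void *wakeirq;
};

/*
 * Read and clear the pointer under the same lock, mirroring the
 * hardened dev_pm_clear_wake_irq() flow: take the lock first, then
 * check for NULL, and bail out with the lock released if nothing
 * needs to be detached.
 */
void *fake_clear_wake_irq(struct fake_dev *dev)
{
	void *wirq;

	pthread_mutex_lock(&dev->lock);
	wirq = dev->wakeirq;		/* read under the lock... */
	if (!wirq) {
		pthread_mutex_unlock(&dev->lock);
		return NULL;		/* nothing to detach */
	}
	dev->wakeirq = NULL;		/* ...and clear under the same lock */
	pthread_mutex_unlock(&dev->lock);

	return wirq;			/* caller releases resources outside the lock */
}
```

The unlocked early return in the original code was the race window: another thread could set or clear power.wakeirq after the check but before the lock was taken.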

drivers/base/power/wakeup.c

Lines changed: 1 addition & 3 deletions
@@ -275,9 +275,7 @@ EXPORT_SYMBOL_GPL(wakeup_sources_read_unlock);
  */
 struct wakeup_source *wakeup_sources_walk_start(void)
 {
-	struct list_head *ws_head = &wakeup_sources;
-
-	return list_entry_rcu(ws_head->next, struct wakeup_source, entry);
+	return list_first_or_null_rcu(&wakeup_sources, struct wakeup_source, entry);
 }
 EXPORT_SYMBOL_GPL(wakeup_sources_walk_start);
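Why the change above matters: with a circular list head, `head->next` of an empty list points back at the head itself, so blindly converting it to an entry yields a bogus pointer; a first-or-NULL helper makes the empty case explicit. A minimal userspace sketch of that semantic difference (this is a plain circular list, not the kernel's RCU-protected implementation, and the names are illustrative):

```c
#include <stddef.h>

/* A bare circular list node, standing in for struct list_head. */
struct node {
	struct node *next;
};

/*
 * Return the first entry, or NULL when the list is empty, mirroring
 * what list_first_or_null_rcu() provides over raw list_entry_rcu():
 * an empty circular list has head->next == head, which is not a
 * valid entry and must not be handed to the caller.
 */
struct node *list_first_or_null(struct node *head)
{
	return head->next == head ? NULL : head->next;
}
```

The original code's raw `list_entry_rcu(ws_head->next, ...)` would have returned a pointer computed from the list head itself when no wakeup sources were registered.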
