Commit 377e388

Merge back cpufreq material for 6.19
2 parents 4b747cc + 4ab25c9 commit 377e388

3 files changed

Lines changed: 132 additions & 112 deletions

Documentation/admin-guide/pm/intel_pstate.rst

Lines changed: 74 additions & 59 deletions
@@ -48,8 +48,9 @@ only way to pass early-configuration-time parameters to it is via the kernel
4848
command line. However, its configuration can be adjusted via ``sysfs`` to a
4949
great extent. In some configurations it even is possible to unregister it via
5050
``sysfs`` which allows another ``CPUFreq`` scaling driver to be loaded and
51-
registered (see `below <status_attr_>`_).
51+
registered (see :ref:`below <status_attr>`).
5252

53+
.. _operation_modes:
5354

5455
Operation Modes
5556
===============
@@ -62,6 +63,8 @@ a certain performance scaling algorithm. Which of them will be in effect
6263
depends on what kernel command line options are used and on the capabilities of
6364
the processor.
6465

66+
.. _active_mode:
67+
6568
Active Mode
6669
-----------
6770

@@ -94,6 +97,8 @@ Which of the P-state selection algorithms is used by default depends on the
9497
Namely, if that option is set, the ``performance`` algorithm will be used by
9598
default, and the other one will be used by default if it is not set.
9699

100+
.. _active_mode_hwp:
101+
97102
Active Mode With HWP
98103
~~~~~~~~~~~~~~~~~~~~
99104

@@ -123,7 +128,7 @@ Energy-Performance Bias (EPB) knob (otherwise), which means that the processor's
123128
internal P-state selection logic is expected to focus entirely on performance.
124129

125130
This will override the EPP/EPB setting coming from the ``sysfs`` interface
126-
(see `Energy vs Performance Hints`_ below). Moreover, any attempts to change
131+
(see :ref:`energy_performance_hints` below). Moreover, any attempts to change
127132
the EPP/EPB to a value different from 0 ("performance") via ``sysfs`` in this
128133
configuration will be rejected.
129134

@@ -192,6 +197,8 @@ This is the default P-state selection algorithm if the
192197
:c:macro:`CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE` kernel configuration option
193198
is not set.
194199

200+
.. _passive_mode:
201+
195202
Passive Mode
196203
------------
197204

@@ -289,12 +296,12 @@ Unlike ``_PSS`` objects in the ACPI tables, ``intel_pstate`` always exposes
289296
the entire range of available P-states, including the whole turbo range, to the
290297
``CPUFreq`` core and (in the passive mode) to generic scaling governors. This
291298
generally causes turbo P-states to be set more often when ``intel_pstate`` is
292-
used relative to ACPI-based CPU performance scaling (see `below <acpi-cpufreq_>`_
293-
for more information).
299+
used relative to ACPI-based CPU performance scaling (see
300+
:ref:`below <acpi-cpufreq>` for more information).
294301

295302
Moreover, since ``intel_pstate`` always knows what the real turbo threshold is
296303
(even if the Configurable TDP feature is enabled in the processor), its
297-
``no_turbo`` attribute in ``sysfs`` (described `below <no_turbo_attr_>`_) should
304+
``no_turbo`` attribute in ``sysfs`` (described :ref:`below <no_turbo_attr>`) should
298305
work as expected in all cases (that is, if set to disable turbo P-states, it
299306
always should prevent ``intel_pstate`` from using them).
300307

@@ -307,12 +314,12 @@ pieces of information on it to be known, including:
307314

308315
* The minimum supported P-state.
309316

310-
* The maximum supported `non-turbo P-state <turbo_>`_.
317+
* The maximum supported :ref:`non-turbo P-state <turbo>`.
311318

312319
* Whether or not turbo P-states are supported at all.
313320

314-
* The maximum supported `one-core turbo P-state <turbo_>`_ (if turbo P-states
315-
are supported).
321+
* The maximum supported :ref:`one-core turbo P-state <turbo>` (if turbo
322+
P-states are supported).
316323

317324
* The scaling formula to translate the driver's internal representation
318325
of P-states into frequencies and the other way around.
@@ -400,10 +407,10 @@ Energy-Aware Scheduling Support
400407

401408
If ``CONFIG_ENERGY_MODEL`` has been set during kernel configuration and
402409
``intel_pstate`` runs on a hybrid processor without SMT, in addition to enabling
403-
`CAS <CAS_>`_ it registers an Energy Model for the processor. This allows the
410+
:ref:`CAS` it registers an Energy Model for the processor. This allows the
404411
Energy-Aware Scheduling (EAS) support to be enabled in the CPU scheduler if
405412
``schedutil`` is used as the ``CPUFreq`` governor which requires ``intel_pstate``
406-
to operate in the `passive mode <Passive Mode_>`_.
413+
to operate in the :ref:`passive mode <passive_mode>`.
407414

408415
The Energy Model registered by ``intel_pstate`` is artificial (that is, it is
409416
based on abstract cost values and it does not include any real power numbers)
@@ -432,6 +439,8 @@ the ``energy_model`` directory in ``debugfs`` (typlically mounted on
432439
User Space Interface in ``sysfs``
433440
=================================
434441

442+
.. _global_attributes:
443+
435444
Global Attributes
436445
-----------------
437446

@@ -444,17 +453,17 @@ argument is passed to the kernel in the command line.
444453

445454
``max_perf_pct``
446455
Maximum P-state the driver is allowed to set in percent of the
447-
maximum supported performance level (the highest supported `turbo
448-
P-state <turbo_>`_).
456+
maximum supported performance level (the highest supported :ref:`turbo
457+
P-state <turbo>`).
449458

450459
This attribute will not be exposed if the
451460
``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
452461
command line.
453462

454463
``min_perf_pct``
455464
Minimum P-state the driver is allowed to set in percent of the
456-
maximum supported performance level (the highest supported `turbo
457-
P-state <turbo_>`_).
465+
maximum supported performance level (the highest supported :ref:`turbo
466+
P-state <turbo>`).
458467

459468
This attribute will not be exposed if the
460469
``intel_pstate=per_cpu_perf_limits`` argument is present in the kernel
@@ -463,18 +472,18 @@ argument is passed to the kernel in the command line.
463472
``num_pstates``
464473
Number of P-states supported by the processor (between 0 and 255
465474
inclusive) including both turbo and non-turbo P-states (see
466-
`Turbo P-states Support`_).
475+
:ref:`turbo`).
467476

468477
This attribute is present only if the value exposed by it is the same
469478
for all of the CPUs in the system.
470479

471480
The value of this attribute is not affected by the ``no_turbo``
472-
setting described `below <no_turbo_attr_>`_.
481+
setting described :ref:`below <no_turbo_attr>`.
473482

474483
This attribute is read-only.
475484

476485
``turbo_pct``
477-
Ratio of the `turbo range <turbo_>`_ size to the size of the entire
486+
Ratio of the :ref:`turbo range <turbo>` size to the size of the entire
478487
range of supported P-states, in percent.
479488

480489
This attribute is present only if the value exposed by it is the same
@@ -486,7 +495,7 @@ argument is passed to the kernel in the command line.
486495

487496
``no_turbo``
488497
If set (equal to 1), the driver is not allowed to set any turbo P-states
489-
(see `Turbo P-states Support`_). If unset (equal to 0, which is the
498+
(see :ref:`turbo`). If unset (equal to 0, which is the
490499
default), turbo P-states can be set by the driver.
491500
[Note that ``intel_pstate`` does not support the general ``boost``
492501
attribute (supported by some other scaling drivers) which is replaced
@@ -495,11 +504,11 @@ argument is passed to the kernel in the command line.
495504
This attribute does not affect the maximum supported frequency value
496505
supplied to the ``CPUFreq`` core and exposed via the policy interface,
497506
but it affects the maximum possible value of per-policy P-state limits
498-
(see `Interpretation of Policy Attributes`_ below for details).
507+
(see :ref:`policy_attributes_interpretation` below for details).
499508

500509
``hwp_dynamic_boost``
501510
This attribute is only present if ``intel_pstate`` works in the
502-
`active mode with the HWP feature enabled <Active Mode With HWP_>`_ in
511+
:ref:`active mode with the HWP feature enabled <active_mode_hwp>` in
503512
the processor. If set (equal to 1), it causes the minimum P-state limit
504513
to be increased dynamically for a short time whenever a task previously
505514
waiting on I/O is selected to run on a given logical CPU (the purpose
@@ -514,12 +523,12 @@ argument is passed to the kernel in the command line.
514523
Operation mode of the driver: "active", "passive" or "off".
515524

516525
"active"
517-
The driver is functional and in the `active mode
518-
<Active Mode_>`_.
526+
The driver is functional and in the :ref:`active mode
527+
<active_mode>`.
519528

520529
"passive"
521-
The driver is functional and in the `passive mode
522-
<Passive Mode_>`_.
530+
The driver is functional and in the :ref:`passive mode
531+
<passive_mode>`.
523532

524533
"off"
525534
The driver is not functional (it is not registered as a scaling
@@ -547,13 +556,15 @@ argument is passed to the kernel in the command line.
547556
attribute to "1" enables the energy-efficiency optimizations and setting
548557
to "0" disables them.
549558

559+
.. _policy_attributes_interpretation:
560+
550561
Interpretation of Policy Attributes
551562
-----------------------------------
552563

553564
The interpretation of some ``CPUFreq`` policy attributes described in
554565
Documentation/admin-guide/pm/cpufreq.rst is special with ``intel_pstate``
555566
as the current scaling driver and it generally depends on the driver's
556-
`operation mode <Operation Modes_>`_.
567+
:ref:`operation mode <operation_modes>`.
557568

558569
First of all, the values of the ``cpuinfo_max_freq``, ``cpuinfo_min_freq`` and
559570
``scaling_cur_freq`` attributes are produced by applying a processor-specific
@@ -562,9 +573,10 @@ Also, the values of the ``scaling_max_freq`` and ``scaling_min_freq``
562573
attributes are capped by the frequency corresponding to the maximum P-state that
563574
the driver is allowed to set.
564575

565-
If the ``no_turbo`` `global attribute <no_turbo_attr_>`_ is set, the driver is
566-
not allowed to use turbo P-states, so the maximum value of ``scaling_max_freq``
567-
and ``scaling_min_freq`` is limited to the maximum non-turbo P-state frequency.
576+
If the ``no_turbo`` :ref:`global attribute <no_turbo_attr>` is set, the driver
577+
is not allowed to use turbo P-states, so the maximum value of
578+
``scaling_max_freq`` and ``scaling_min_freq`` is limited to the maximum
579+
non-turbo P-state frequency.
568580
Accordingly, setting ``no_turbo`` causes ``scaling_max_freq`` and
569581
``scaling_min_freq`` to go down to that value if they were above it before.
570582
However, the old values of ``scaling_max_freq`` and ``scaling_min_freq`` will be
@@ -576,7 +588,7 @@ and ``scaling_min_freq`` corresponds to the maximum supported turbo P-state,
576588
which also is the value of ``cpuinfo_max_freq`` in either case.
577589

578590
Next, the following policy attributes have special meaning if
579-
``intel_pstate`` works in the `active mode <Active Mode_>`_:
591+
``intel_pstate`` works in the :ref:`active mode <active_mode>`:
580592

581593
``scaling_available_governors``
582594
List of P-state selection algorithms provided by ``intel_pstate``.
@@ -597,20 +609,22 @@ processor:
597609
Shows the base frequency of the CPU. Any frequency above this will be
598610
in the turbo frequency range.
599611

600-
The meaning of these attributes in the `passive mode <Passive Mode_>`_ is the
612+
The meaning of these attributes in the :ref:`passive mode <passive_mode>` is the
601613
same as for other scaling drivers.
602614

603615
Additionally, the value of the ``scaling_driver`` attribute for ``intel_pstate``
604616
depends on the operation mode of the driver. Namely, it is either
605-
"intel_pstate" (in the `active mode <Active Mode_>`_) or "intel_cpufreq" (in the
606-
`passive mode <Passive Mode_>`_).
617+
"intel_pstate" (in the :ref:`active mode <active_mode>`) or "intel_cpufreq"
618+
(in the :ref:`passive mode <passive_mode>`).
619+
620+
.. _pstate_limits_coordination:
607621

608622
Coordination of P-State Limits
609623
------------------------------
610624

611625
``intel_pstate`` allows P-state limits to be set in two ways: with the help of
612-
the ``max_perf_pct`` and ``min_perf_pct`` `global attributes
613-
<Global Attributes_>`_ or via the ``scaling_max_freq`` and ``scaling_min_freq``
626+
the ``max_perf_pct`` and ``min_perf_pct`` :ref:`global attributes
627+
<global_attributes>` or via the ``scaling_max_freq`` and ``scaling_min_freq``
614628
``CPUFreq`` policy attributes. The coordination between those limits is based
615629
on the following rules, regardless of the current operation mode of the driver:
616630

@@ -632,17 +646,18 @@ on the following rules, regardless of the current operation mode of the driver:
632646

633647
3. The global and per-policy limits can be set independently.
634648

635-
In the `active mode with the HWP feature enabled <Active Mode With HWP_>`_, the
649+
In the :ref:`active mode with the HWP feature enabled <active_mode_hwp>`, the
636650
resulting effective values are written into hardware registers whenever the
637651
limits change in order to request its internal P-state selection logic to always
638652
set P-states within these limits. Otherwise, the limits are taken into account
639-
by scaling governors (in the `passive mode <Passive Mode_>`_) and by the driver
640-
every time before setting a new P-state for a CPU.
653+
by scaling governors (in the :ref:`passive mode <passive_mode>`) and by the
654+
driver every time before setting a new P-state for a CPU.
641655

642656
Additionally, if the ``intel_pstate=per_cpu_perf_limits`` command line argument
643657
is passed to the kernel, ``max_perf_pct`` and ``min_perf_pct`` are not exposed
644658
at all and the only way to set the limits is by using the policy attributes.
645659

660+
.. _energy_performance_hints:
646661

647662
Energy vs Performance Hints
648663
---------------------------
@@ -702,9 +717,9 @@ output.
702717
On those systems each ``_PSS`` object returns a list of P-states supported by
703718
the corresponding CPU which basically is a subset of the P-states range that can
704719
be used by ``intel_pstate`` on the same system, with one exception: the whole
705-
`turbo range <turbo_>`_ is represented by one item in it (the topmost one). By
706-
convention, the frequency returned by ``_PSS`` for that item is greater by 1 MHz
707-
than the frequency of the highest non-turbo P-state listed by it, but the
720+
:ref:`turbo range <turbo>` is represented by one item in it (the topmost one).
721+
By convention, the frequency returned by ``_PSS`` for that item is greater by
722+
1 MHz than the frequency of the highest non-turbo P-state listed by it, but the
708723
corresponding P-state representation (following the hardware specification)
709724
returned for it matches the maximum supported turbo P-state (or is the
710725
special value 255 meaning essentially "go as high as you can get").
@@ -730,18 +745,18 @@ benefit from running at turbo frequencies will be given non-turbo P-states
730745
instead.
731746

732747
One more issue related to that may appear on systems supporting the
733-
`Configurable TDP feature <turbo_>`_ allowing the platform firmware to set the
734-
turbo threshold. Namely, if that is not coordinated with the lists of P-states
735-
returned by ``_PSS`` properly, there may be more than one item corresponding to
736-
a turbo P-state in those lists and there may be a problem with avoiding the
737-
turbo range (if desirable or necessary). Usually, to avoid using turbo
738-
P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state listed
739-
by ``_PSS``, but that is not sufficient when there are other turbo P-states in
740-
the list returned by it.
748+
:ref:`Configurable TDP feature <turbo>` allowing the platform firmware to set
749+
the turbo threshold. Namely, if that is not coordinated with the lists of
750+
P-states returned by ``_PSS`` properly, there may be more than one item
751+
corresponding to a turbo P-state in those lists and there may be a problem with
752+
avoiding the turbo range (if desirable or necessary). Usually, to avoid using
753+
turbo P-states overall, ``acpi-cpufreq`` simply avoids using the topmost state
754+
listed by ``_PSS``, but that is not sufficient when there are other turbo
755+
P-states in the list returned by it.
741756

742757
Apart from the above, ``acpi-cpufreq`` works like ``intel_pstate`` in the
743-
`passive mode <Passive Mode_>`_, except that the number of P-states it can set
744-
is limited to the ones listed by the ACPI ``_PSS`` objects.
758+
:ref:`passive mode <passive_mode>`, except that the number of P-states it can
759+
set is limited to the ones listed by the ACPI ``_PSS`` objects.
745760

746761

747762
Kernel Command Line Options for ``intel_pstate``
@@ -756,11 +771,11 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
756771
processor is supported by it.
757772

758773
``active``
759-
Register ``intel_pstate`` in the `active mode <Active Mode_>`_ to start
760-
with.
774+
Register ``intel_pstate`` in the :ref:`active mode <active_mode>` to
775+
start with.
761776

762777
``passive``
763-
Register ``intel_pstate`` in the `passive mode <Passive Mode_>`_ to
778+
Register ``intel_pstate`` in the :ref:`passive mode <passive_mode>` to
764779
start with.
765780

766781
``force``
@@ -793,12 +808,12 @@ of them have to be prepended with the ``intel_pstate=`` prefix.
793808
and this option has no effect.
794809

795810
``per_cpu_perf_limits``
796-
Use per-logical-CPU P-State limits (see `Coordination of P-state
797-
Limits`_ for details).
811+
Use per-logical-CPU P-State limits (see
812+
:ref:`pstate_limits_coordination` for details).
798813

799814
``no_cas``
800-
Do not enable `capacity-aware scheduling <CAS_>`_ which is enabled by
801-
default on hybrid systems without SMT.
815+
Do not enable :ref:`capacity-aware scheduling <CAS>` which is enabled
816+
by default on hybrid systems without SMT.
802817

803818
Diagnostics and Tuning
804819
======================
@@ -810,7 +825,7 @@ There are two static trace events that can be used for ``intel_pstate``
810825
diagnostics. One of them is the ``cpu_frequency`` trace event generally used
811826
by ``CPUFreq``, and the other one is the ``pstate_sample`` trace event specific
812827
to ``intel_pstate``. Both of them are triggered by ``intel_pstate`` only if
813-
it works in the `active mode <Active Mode_>`_.
828+
it works in the :ref:`active mode <active_mode>`.
814829

815830
The following sequence of shell commands can be used to enable them and see
816831
their output (if the kernel is generally configured to support event tracing)::
@@ -822,7 +837,7 @@ their output (if the kernel is generally configured to support event tracing)::
822837
gnome-terminal--4510 [001] ..s. 1177.680733: pstate_sample: core_busy=107 scaled=94 from=26 to=26 mperf=1143818 aperf=1230607 tsc=29838618 freq=2474476
823838
cat-5235 [002] ..s. 1177.681723: cpu_frequency: state=2900000 cpu_id=2
824839

825-
If ``intel_pstate`` works in the `passive mode <Passive Mode_>`_, the
840+
If ``intel_pstate`` works in the :ref:`passive mode <passive_mode>`, the
826841
``cpu_frequency`` trace event will be triggered either by the ``schedutil``
827842
scaling governor (for the policies it is attached to), or by the ``CPUFreq``
828843
core (for the policies with other scaling governors).
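
The documentation touched above states that ``scaling_driver`` reads "intel_pstate" in the active mode and "intel_cpufreq" in the passive mode. A minimal user-space sketch of that mapping follows. The function names are hypothetical and not part of the kernel or of this commit, and the sketch assumes the caller has already read the attribute's contents into a string (possibly with a trailing newline).

```c
#include <string.h>

/*
 * Hypothetical helper: compare the first line of `s` with `name`,
 * ignoring a trailing newline such as the one sysfs reads end with.
 */
static int first_line_equals(const char *s, const char *name)
{
	size_t len = strcspn(s, "\n");

	return len == strlen(name) && strncmp(s, name, len) == 0;
}

/*
 * Hypothetical helper: translate the contents of a policy's
 * scaling_driver attribute into the intel_pstate operation mode,
 * following the mapping given in the documentation text.
 */
const char *intel_pstate_mode_from_driver(const char *scaling_driver)
{
	if (first_line_equals(scaling_driver, "intel_pstate"))
		return "active";
	if (first_line_equals(scaling_driver, "intel_cpufreq"))
		return "passive";
	/* Some other scaling driver is registered. */
	return "unknown";
}
```

The global ``status`` attribute described in the same document reports the mode directly, so a helper like this is only useful when just the per-policy attributes are at hand.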

drivers/cpufreq/cpufreq.c

Lines changed: 7 additions & 4 deletions
@@ -1421,9 +1421,12 @@ static int cpufreq_policy_online(struct cpufreq_policy *policy,
 		 * If there is a problem with its frequency table, take it
 		 * offline and drop it.
 		 */
-		ret = cpufreq_table_validate_and_sort(policy);
-		if (ret)
-			goto out_offline_policy;
+		if (policy->freq_table_sorted != CPUFREQ_TABLE_SORTED_ASCENDING &&
+		    policy->freq_table_sorted != CPUFREQ_TABLE_SORTED_DESCENDING) {
+			ret = cpufreq_table_validate_and_sort(policy);
+			if (ret)
+				goto out_offline_policy;
+		}
 
 		/* related_cpus should at least include policy->cpus. */
 		cpumask_copy(policy->related_cpus, policy->cpus);
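
The hunk above calls ``cpufreq_table_validate_and_sort()`` only when the policy's table is not already marked as sorted in ascending or descending order. The control flow can be modelled in user space as below. This is a sketch under stated assumptions: the enum, the struct, and both functions are hypothetical stand-ins rather than the kernel's definitions, and the model simply pretends that validation always succeeds and leaves the table ascending.

```c
/* Hypothetical stand-in for the kernel's sortedness flag. */
enum table_sorted_model {
	TABLE_UNSORTED,
	TABLE_SORTED_ASCENDING,
	TABLE_SORTED_DESCENDING,
};

/* Hypothetical stand-in for the relevant part of a policy. */
struct policy_model {
	enum table_sorted_model freq_table_sorted;
	int validate_calls;	/* counts how often validation ran */
};

/* Assumption: validation succeeds and marks the table ascending. */
static int validate_and_sort_model(struct policy_model *p)
{
	p->validate_calls++;
	p->freq_table_sorted = TABLE_SORTED_ASCENDING;
	return 0;
}

/* Mirrors the guarded call from the hunk: skip if already sorted. */
int policy_online_model(struct policy_model *p)
{
	if (p->freq_table_sorted != TABLE_SORTED_ASCENDING &&
	    p->freq_table_sorted != TABLE_SORTED_DESCENDING)
		return validate_and_sort_model(p);

	return 0;
}
```

Calling ``policy_online_model()`` twice on the same object runs the validation step once, which is the effect the new condition has when the flag was set earlier.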
@@ -2550,7 +2553,7 @@ void cpufreq_unregister_governor(struct cpufreq_governor *governor)
 	for_each_inactive_policy(policy) {
 		if (!strcmp(policy->last_governor, governor->name)) {
 			policy->governor = NULL;
-			strcpy(policy->last_governor, "\0");
+			policy->last_governor[0] = '\0';
 		}
 	}
 	read_unlock_irqrestore(&cpufreq_driver_lock, flags);
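
The second hunk replaces ``strcpy(policy->last_governor, "\0")`` with a direct store of the terminator. Both forms write a single NUL byte at index 0, because ``strcpy()`` stops after copying the first terminator of its source. The sketch below shows the equivalence in plain user-space C. The two function names are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Old form: copies one byte, the first NUL of the literal "\0". */
void clear_name_with_strcpy(char *name)
{
	strcpy(name, "\0");
}

/* New form: stores the terminator directly, with no library call. */
void clear_name_with_store(char *name)
{
	name[0] = '\0';
}
```

After either call the buffer holds an empty string and the bytes past index 0 are left untouched, so the change should not alter behaviour.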
