Commit b57b849

Tang Yizhou authored and Jonathan Corbet committed
docs: scheduler: Convert schedutil.txt to ReST
All other scheduler documents have been converted to *.rst. Let's do the same for schedutil.txt. Also fixed some typos.

Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Link: https://lore.kernel.org/r/20220312070751.16844-1-tangyizhou@huawei.com
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
1 parent ff13687 commit b57b849

2 files changed

Lines changed: 18 additions & 13 deletions

Documentation/scheduler/index.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Linux Scheduler
     sched-domains
     sched-capacity
     sched-energy
+    schedutil
     sched-nice-design
     sched-rt-group
     sched-stats
Lines changed: 17 additions & 13 deletions
@@ -1,11 +1,15 @@
+=========
+Schedutil
+=========
 
+.. note::
 
-NOTE; all this assumes a linear relation between frequency and work capacity,
-we know this is flawed, but it is the best workable approximation.
+   All this assumes a linear relation between frequency and work capacity,
+   we know this is flawed, but it is the best workable approximation.
 
 
 PELT (Per Entity Load Tracking)
--------------------------------
+===============================
 
 With PELT we track some metrics across the various scheduler entities, from
 individual tasks to task-group slices to CPU runqueues. As the basis for this
@@ -38,24 +42,24 @@ while 'runnable' will increase to reflect the amount of contention.
 For more detail see: kernel/sched/pelt.c
 
 
-Frequency- / CPU Invariance
----------------------------
+Frequency / CPU Invariance
+==========================
 
 Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU
 for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% on
 a big CPU, we allow architectures to scale the time delta with two ratios, one
 Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.
 
 For simple DVFS architectures (where software is in full control) we trivially
-compute the ratio as:
+compute the ratio as::
 
             f_cur
   r_dvfs := -----
             f_max
 
 For more dynamic systems where the hardware is in control of DVFS we use
 hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this ratio.
-For Intel specifically, we use:
+For Intel specifically, we use::
 
            APERF
   f_cur := ----- * P0
@@ -87,7 +91,7 @@ For more detail see:
 
 
 UTIL_EST / UTIL_EST_FASTUP
---------------------------
+==========================
 
 Because periodic tasks have their averages decayed while they sleep, even
 though when running their expected utilization will be the same, they suffer a
@@ -106,7 +110,7 @@ For more detail see: kernel/sched/fair.c:util_est_dequeue()
 
 
 UCLAMP
-------
+======
 
 It is possible to set effective u_min and u_max clamps on each CFS or RT task;
 the runqueue keeps an max aggregate of these clamps for all running tasks.
@@ -115,15 +119,15 @@ For more detail see: include/uapi/linux/sched/types.h
 
 
 Schedutil / DVFS
-----------------
+================
 
 Every time the scheduler load tracking is updated (task wakeup, task
 migration, time progression) we call out to schedutil to update the hardware
 DVFS state.
 
 The basis is the CPU runqueue's 'running' metric, which per the above it is
 the frequency invariant utilization estimate of the CPU. From this we compute
-a desired frequency like:
+a desired frequency like::
 
              max( running, util_est );  if UTIL_EST
     u_cfs := { running;                 otherwise
@@ -135,7 +139,7 @@ a desired frequency like:
 
     f_des := min( f_max, 1.25 u * f_max )
 
-XXX IO-wait; when the update is due to a task wakeup from IO-completion we
+XXX IO-wait: when the update is due to a task wakeup from IO-completion we
 boost 'u' above.
 
 This frequency is then used to select a P-state/OPP or directly munged into a
@@ -153,7 +157,7 @@ For more information see: kernel/sched/cpufreq_schedutil.c
 
 
 NOTES
------
+=====
 
  - On low-load scenarios, where DVFS is most relevant, the 'running' numbers
    will closely reflect utilization.
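
For readers following the formulas that the converted document now marks as literal blocks, here is a minimal C sketch of three of them: `r_dvfs := f_cur / f_max`, the `u_cfs` selection, and `f_des := min( f_max, 1.25 u * f_max )`. It is an illustration only and not the kernel's implementation. The function names, the `SKETCH_SCALE` fixed-point unit, and the kHz frequency unit are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point unit for this sketch: 1024 represents 1.0. */
#define SKETCH_SCALE 1024u

/* r_dvfs := f_cur / f_max, returned in units of 1/SKETCH_SCALE. */
static uint32_t sketch_r_dvfs(uint32_t f_cur_khz, uint32_t f_max_khz)
{
	assert(f_max_khz > 0);
	return (uint32_t)(((uint64_t)f_cur_khz * SKETCH_SCALE) / f_max_khz);
}

/* u_cfs := max(running, util_est) if UTIL_EST is enabled, else running. */
static uint32_t sketch_u_cfs(uint32_t running, uint32_t util_est,
			     int util_est_enabled)
{
	if (!util_est_enabled)
		return running;
	return running > util_est ? running : util_est;
}

/* f_des := min(f_max, 1.25 * u * f_max), with u in units of 1/SKETCH_SCALE. */
static uint32_t sketch_f_des(uint32_t u, uint32_t f_max_khz)
{
	uint64_t f = ((uint64_t)u * f_max_khz) / SKETCH_SCALE;

	f += f >> 2;	/* add a quarter of f, giving 1.25 * f */
	return f < f_max_khz ? (uint32_t)f : f_max_khz;
}
```

With these assumptions, a CPU at half utilization (`u = 512`) with `f_max` of 2 GHz asks for 1.25 GHz, and any utilization at or above 80% clamps to `f_max`.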
