=========
Schedutil
=========

.. note::

   All this assumes a linear relation between frequency and work capacity;
   we know this is flawed, but it is the best workable approximation.


PELT (Per Entity Load Tracking)
===============================

With PELT we track some metrics across the various scheduler entities, from
individual tasks to task-group slices to CPU runqueues. As the basis for this
we use an Exponentially Weighted Moving Average (EWMA); each period (1024us)
is decayed such that y^32 = 0.5.
When there is contention for the CPU, 'running' will decrease to reflect the
fraction of time each task spends on the CPU, while 'runnable' will increase
to reflect the amount of contention.

For more detail see: kernel/sched/pelt.c

Frequency / CPU Invariance
==========================

Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU
for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% on
a big CPU, we allow architectures to scale the time delta with two ratios, one
Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.

For simple DVFS architectures (where software is in full control) we trivially
compute the ratio as::

            f_cur
  r_dvfs := -----
            f_max

For more dynamic systems where the hardware is in control of DVFS we use
hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this ratio.
For Intel specifically, we use::

           APERF
  f_cur := ----- * P0
           MPERF


UTIL_EST / UTIL_EST_FASTUP
==========================

Because periodic tasks have their averages decayed while they sleep, even
though when running their expected utilization will be the same, they suffer a
(DVFS) ramp-up after they are running again.

For more detail see: kernel/sched/fair.c:util_est_dequeue()


UCLAMP
======

It is possible to set effective u_min and u_max clamps on each CFS or RT task;
the runqueue keeps a max aggregate of these clamps for all running tasks.

For more detail see: include/uapi/linux/sched/types.h


Schedutil / DVFS
================

Every time the scheduler load tracking is updated (task wakeup, task
migration, time progression) we call out to schedutil to update the hardware
DVFS state.

The basis is the CPU runqueue's 'running' metric, which per the above is
the frequency invariant utilization estimate of the CPU. From this we compute
a desired frequency like::

             max( running, util_est );  if UTIL_EST
  u_cfs := { running;                   otherwise

               clamp( u_cfs + u_rt, u_min, u_max );  if UCLAMP_TASK
  u_clamp := { clamp( u_cfs + u_rt, 0, 1024 );       otherwise

  u := u_clamp + u_irq + u_dl;   [approx. see source for more detail]

  f_des := min( f_max, 1.25 u * f_max )

XXX IO-wait: when the update is due to a task wakeup from IO-completion we
boost 'u' above.

This frequency is then used to select a P-state/OPP or directly munged into a
CPPC style request to the hardware.

For more information see: kernel/sched/cpufreq_schedutil.c


NOTES
=====

 - On low-load scenarios, where DVFS is most relevant, the 'running' numbers
   will closely reflect utilization.