@@ -20,6 +20,21 @@ possible source of information on its own, the EM framework intervenes as an
 abstraction layer which standardizes the format of power cost tables in the
 kernel, hence enabling to avoid redundant work.

+The power values might be expressed in milli-Watts or in an 'abstract scale'.
+Multiple subsystems might use the EM, and it is up to the system integrator to
+check that the requirements for the power value scale types are met. An example
+can be found in the Energy-Aware Scheduler documentation
+Documentation/scheduler/sched-energy.rst. For some subsystems, like thermal or
+powercap, power values expressed in an 'abstract scale' might cause issues.
+These subsystems are more interested in estimating the power used in the past,
+so real milli-Watts might be needed. An example of these requirements can be
+found in the Intelligent Power Allocation documentation
+Documentation/driver-api/thermal/power_allocator.rst.
+Kernel subsystems might implement automatic detection to check whether the
+EM-registered devices use inconsistent scales (based on an EM internal flag).
+An important thing to keep in mind is that when the power values are expressed
+in an 'abstract scale', deriving real energy in milli-Joules is not possible.
+
 The figure below depicts an example of drivers (Arm-specific here, but the
 approach is applicable to any architecture) providing power costs to the EM
 framework, and interested clients reading the data from it::
@@ -73,14 +88,18 @@ Drivers are expected to register performance domains into the EM framework by
 calling the following API::

   int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
-               struct em_data_callback *cb, cpumask_t *cpus);
+               struct em_data_callback *cb, cpumask_t *cpus, bool milliwatts);

 Drivers must provide a callback function returning <frequency, power> tuples
 for each performance state. The callback function provided by the driver is free
 to fetch data from any relevant location (DT, firmware, ...), and by any mean
 deemed necessary. Only for CPU devices, drivers must specify the CPUs of the
-performance domains using cpumask. For other devices than CPUs the last
+performance domains using cpumask. For other devices than CPUs the 'cpus'
 argument must be set to NULL.
+It is important to set the last argument 'milliwatts' to the correct value.
+Kernel subsystems which use the EM might rely on this flag to check whether all
+EM devices use the same scale. If the scales differ, these subsystems might
+decide to return a warning or an error, stop working, or panic.
 See Section 3. for an example of driver implementing this
 callback, and kernel/power/energy_model.c for further documentation on this
 API.
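A subsystem that relies on the 'milliwatts' flag might check scale consistency along the lines of the user-space C sketch below. It assumes the subsystem can obtain a list of EM devices together with the flag each was registered with; `struct dev_scale_info`, `enum scale_check` and `check_em_scales()` are hypothetical names, and the real EM keeps the flag internally rather than exposing such an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one EM-registered device. */
struct dev_scale_info {
	const char *name;
	bool milliwatts;	/* value passed at registration time */
};

enum scale_check {
	SCALE_EMPTY,		/* no devices to compare */
	SCALE_MILLIWATTS,	/* every device reports milli-Watts */
	SCALE_ABSTRACT,		/* every device uses an 'abstract scale' */
	SCALE_MIXED,		/* scales differ: caller should warn or bail out */
};

/* Hypothetical check: compare every device's flag against the first one. */
enum scale_check check_em_scales(const struct dev_scale_info *devs, size_t n)
{
	size_t i;

	assert(devs != NULL || n == 0);
	if (n == 0)
		return SCALE_EMPTY;

	for (i = 1; i < n; i++)
		if (devs[i].milliwatts != devs[0].milliwatts)
			return SCALE_MIXED;

	return devs[0].milliwatts ? SCALE_MILLIWATTS : SCALE_ABSTRACT;
}
```

A caller could then print a warning and stop using the EM on `SCALE_MIXED`, and accept `SCALE_ABSTRACT` only if it never needs real milli-Watts.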
@@ -156,7 +175,8 @@ EM framework::
   37          nr_opp = foo_get_nr_opp(policy);
   38
   39          /* And register the new performance domain */
-  40          em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus);
-  41
-  42          return 0;
-  43  }
+  40          em_dev_register_perf_domain(cpu_dev, nr_opp, &em_cb, policy->cpus,
+  41                                      true);
+  42
+  43          return 0;
+  44  }
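The driver example passes `nr_opp` and `em_cb` to the framework, which then asks the callback for one <frequency, power> tuple per performance state. The user-space C sketch below is a simplified model of that contract under stated assumptions: a made-up OPP table with frequencies in kHz and power in mW, a callback that rounds the requested frequency up to the next supported state, and a caller that walks the states in ascending frequency order. `fake_active_power()`, `build_table()`, `struct state` and the table values are hypothetical; kernel/power/energy_model.c holds the real logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OPP table standing in for data read from DT or firmware. */
static const unsigned long opp_khz[] = { 500000, 1000000, 1500000 };
static const unsigned long opp_mw[]  = { 80, 210, 450 };
#define NR_OPP (sizeof(opp_khz) / sizeof(opp_khz[0]))

/*
 * Hypothetical, simplified stand-in for a driver callback: round *freq up
 * to the next supported frequency and report its power. Returns 0 on
 * success, -1 if no state exists at or above *freq.
 */
int fake_active_power(unsigned long *power, unsigned long *freq)
{
	size_t i;

	for (i = 0; i < NR_OPP; i++) {
		if (opp_khz[i] >= *freq) {
			*freq = opp_khz[i];
			*power = opp_mw[i];
			return 0;
		}
	}
	return -1;
}

struct state {
	unsigned long freq;	/* kHz */
	unsigned long power;	/* mW */
};

/* Query nr_states states, each time starting just above the last one found. */
int build_table(struct state *table, int nr_states)
{
	unsigned long freq = 0, power;
	int i;

	assert(table != NULL || nr_states == 0);
	for (i = 0; i < nr_states; i++, freq++) {
		if (fake_active_power(&power, &freq))
			return -1;
		table[i].freq = freq;
		table[i].power = power;
	}
	return 0;
}
```

Because this made-up table is in mW, a driver backed by it would pass `true` as the `milliwatts` argument, as the registration call in the example does.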