Commit ead70b7

Frederic Weisbecker authored and KAGA-KOKO committed
timers/nohz: Add a comment about broken iowait counter update race
The per-cpu iowait task counter is incremented locally upon sleeping.
But since the task can be woken to (and by) another CPU, the counter may
then be decremented remotely. This is the source of a race involving
readers VS writer of idle/iowait sleeptime.

The following scenario shows an example where a /proc/stat reader
observes a pending sleep time as IO whereas that pending sleep time
later eventually gets accounted as non-IO.

    CPU 0                       CPU 1                        CPU 2
    -----                       -----                        ------
    //io_schedule() TASK A
    current->in_iowait = 1
    rq(0)->nr_iowait++
    //switch to idle
                                // READ /proc/stat
                                // See nr_iowait_cpu(0) == 1
                                return ts->iowait_sleeptime +
                                       ktime_sub(ktime_get(), ts->idle_entrytime)

                                                             //try_to_wake_up(TASK A)
                                                             rq(0)->nr_iowait--
    //idle exit
    // See nr_iowait_cpu(0) == 0
    ts->idle_sleeptime += ktime_sub(ktime_get(), ts->idle_entrytime)

As a result subsequent reads on /proc/stat may expose backward progress.

This is unfortunately hardly fixable. Just add a comment about that
condition.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230222144649.624380-5-frederic@kernel.org

1 parent 620a30f commit ead70b7

1 file changed

Lines changed: 8 additions & 2 deletions

kernel/time/tick-sched.c

@@ -705,7 +705,10 @@ static u64 get_cpu_sleep_time_us(struct tick_sched *ts, ktime_t *sleeptime,
  * counters if NULL.
  *
  * Return the cumulative idle time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
@@ -728,7 +731,10 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
  * counters if NULL.
  *
  * Return the cumulative iowait time (since boot) for a given
- * CPU, in microseconds.
+ * CPU, in microseconds. Note this is partially broken due to
+ * the counter of iowait tasks that can be remotely updated without
+ * any synchronization. Therefore it is possible to observe backward
+ * values within two consecutive reads.
  *
  * This time is measured via accounting rather than sampling,
  * and is as accurate as ktime_get() is.
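
The race described in the commit message can be replayed deterministically in user space. The following is a minimal sketch, not kernel code: `struct model_cpu`, `read_iowait()`, `idle_enter()`, `idle_exit()` and `replay_scenario()` are hypothetical names invented for illustration, timestamps are plain integers standing in for `ktime_get()`, and the three CPUs are modelled as sequential steps rather than real threads. It assumes the reader attributes the pending idle period to iowait whenever the iowait task counter is non-zero, as the scenario in the commit message suggests.

```c
#include <assert.h>
#include <stdint.h>

/* User-space model of the per-CPU state named in the commit message.
 * All types and helpers here are illustrative assumptions. */
struct model_cpu {
    uint64_t idle_sleeptime;   /* accounted non-IO idle time */
    uint64_t iowait_sleeptime; /* accounted IO-wait idle time */
    uint64_t idle_entrytime;   /* timestamp at which the idle period began */
    int idle_active;           /* non-zero while the CPU is idle */
    int nr_iowait;             /* tasks that went to sleep on IO on this CPU */
};

/* Reader side: iowait time as a /proc/stat read would see it at 'now'. */
static uint64_t read_iowait(const struct model_cpu *c, uint64_t now)
{
    uint64_t t = c->iowait_sleeptime;

    /* The pending idle period counts as IO only if nr_iowait is non-zero. */
    if (c->idle_active && c->nr_iowait > 0)
        t += now - c->idle_entrytime;
    return t;
}

static void idle_enter(struct model_cpu *c, uint64_t now)
{
    c->idle_entrytime = now;
    c->idle_active = 1;
}

/* Writer side: on idle exit, the whole period goes to one bucket,
 * chosen from the value of nr_iowait at exit time. */
static void idle_exit(struct model_cpu *c, uint64_t now)
{
    uint64_t delta;

    assert(c->idle_active);
    delta = now - c->idle_entrytime;
    if (c->nr_iowait > 0)
        c->iowait_sleeptime += delta;
    else
        c->idle_sleeptime += delta;
    c->idle_active = 0;
}

/* Replays the scenario; returns 1 if the second read went backward. */
int replay_scenario(uint64_t *first, uint64_t *second)
{
    struct model_cpu cpu0 = {0};

    cpu0.nr_iowait++;                  /* CPU 0: io_schedule() of TASK A */
    idle_enter(&cpu0, 100);            /* CPU 0: switch to idle */
    *first = read_iowait(&cpu0, 150);  /* CPU 1: pending time seen as IO */
    cpu0.nr_iowait--;                  /* CPU 2: try_to_wake_up(TASK A) */
    idle_exit(&cpu0, 200);             /* CPU 0: period accounted as non-IO */
    *second = read_iowait(&cpu0, 250); /* CPU 1: reads again */
    return *second < *first;
}
```

Calling `replay_scenario()` with two `uint64_t` out-parameters reports whether the second iowait reading is smaller than the first. In this model the first read includes the 50 pending time units, while the second includes none, because the remote decrement made the idle-exit path account the whole period as non-IO idle time.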
