Commit e34881c

Zicheng Qu authored and Peter Zijlstra committed
sched: Re-evaluate scheduling when migrating queued tasks out of throttled cgroups
Consider the following sequence on a CPU configured with nohz_full:

1) A task P runs in cgroup A, and cgroup A becomes throttled by CFS
   bandwidth control. The gse (cgroup A) to which task P is attached is
   dequeued, and the CPU switches to idle.

2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
   another cgroup B (not throttled). During sched_move_task(), task P is
   observed as queued but not running, so no resched_curr() is
   triggered.

3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
   explicit scheduling event, i.e., a resched_curr().

4) For kernels <= 5.10: later, cgroup A is unthrottled. However, task P
   has already been migrated out of cgroup A, so unthrottle_cfs_rq() may
   observe load.weight == 0 and return early without calling
   resched_curr().

   For kernels >= 6.6: the unthrottling path triggers resched_curr() in
   almost all cases, even when no runnable tasks remain in the
   unthrottled cgroup, which prevents the idle stall described above.
   However, if cgroup A is removed before it is unthrottled, the
   unthrottling path for cgroup A is never executed, so no
   resched_curr() is called.

5) At this point, task P is runnable in cgroup B (not throttled), but
   the CPU remains in do_idle() with no pending reschedule point. The
   system stays in this state until an unrelated event (e.g. a new task
   wakeup) triggers a resched_curr() and breaks the nohz_full idle
   state, at which point task P finally gets scheduled.

The root cause is that sched_move_task() may classify the task as only
queued, not running, and therefore fails to trigger a resched_curr(),
while the later unthrottling path no longer has visibility of the
migrated task.

Preserve the existing behavior for running tasks by issuing
resched_curr(), and explicitly invoke check_preempt_curr() for tasks
that were queued at the time of migration. This ensures that runnable
tasks are reconsidered for scheduling even when nohz_full suppresses
periodic ticks.

Fixes: 29f59db ("sched: group-scheduler core")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20260130083438.1122457-1-quzicheng@huawei.com
1 parent 742fe83 commit e34881c

1 file changed

Lines changed: 4 additions & 0 deletions

File tree

kernel/sched/core.c
@@ -9126,6 +9126,7 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 {
 	unsigned int queue_flags = DEQUEUE_SAVE | DEQUEUE_MOVE;
 	bool resched = false;
+	bool queued = false;
 	struct rq *rq;
 
 	CLASS(task_rq_lock, rq_guard)(tsk);
@@ -9137,10 +9138,13 @@ void sched_move_task(struct task_struct *tsk, bool for_autogroup)
 		scx_cgroup_move_task(tsk);
 		if (scope->running)
 			resched = true;
+		queued = scope->queued;
 	}
 
 	if (resched)
 		resched_curr(rq);
+	else if (queued)
+		wakeup_preempt(rq, tsk, 0);
 
 	__balance_callbacks(rq, &rq_guard.rf);
 }
