Commit 46f2029

Jie1zhang authored and alexdeucher committed
drm/amdgpu: resume MES scheduling after user queue hang detection and recovery
This patch ensures the Micro-Engine Scheduler (MES) is properly resumed after detecting and recovering from a user queue hang condition.

Key changes:

1. Track when a hung user queue is detected, using a found_hung_queue flag.
2. Call amdgpu_mes_resume() to restart MES scheduling after the hang recovery process completes.
3. Complement the existing recovery steps (fence force completion and device wedging) by ensuring the scheduler can process new work.

Without this resume call, the MES scheduler may remain in a paused state even after the hung queue has been handled, preventing newly submitted work from being processed and leading to system stalls.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
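The recovery flow described in the commit message can be illustrated with a small user-space sketch. It is not driver code: struct sketch_queue, struct sketch_device, sketch_scheduler_resume() and sketch_detect_and_recover() are hypothetical stand-ins for the amdgpu queue objects, the reset counter and amdgpu_mes_resume(), assumed here only to show the control flow. What it demonstrates is that the flag is set inside the doorbell-matching loop, while the scheduler is resumed at most once, after every hung queue has been handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for driver state; these are NOT the amdgpu structures. */
enum sketch_queue_state { SKETCH_QUEUE_ACTIVE, SKETCH_QUEUE_HUNG };

struct sketch_queue {
	unsigned int doorbell_index;
	enum sketch_queue_state state;
	bool fences_force_completed;	/* models fence force completion */
};

struct sketch_device {
	int reset_counter;	/* models adev->gpu_reset_counter */
	int resume_calls;	/* how many times the scheduler was resumed */
};

/* Hypothetical stand-in for amdgpu_mes_resume(); returns 0 on success. */
static int sketch_scheduler_resume(struct sketch_device *dev)
{
	dev->resume_calls++;
	return 0;
}

/*
 * Mark every queue whose doorbell appears in hung_dbs as hung and
 * force-complete its fences, then resume the scheduler once if at
 * least one hung queue was found.
 */
static int sketch_detect_and_recover(struct sketch_device *dev,
				     struct sketch_queue *queues, size_t nqueues,
				     const unsigned int *hung_dbs, size_t hung_db_num)
{
	bool found_hung_queue = false;
	int r = 0;

	assert(hung_dbs != NULL || hung_db_num == 0);

	for (size_t q = 0; q < nqueues; q++) {
		for (size_t i = 0; i < hung_db_num; i++) {
			if (queues[q].doorbell_index == hung_dbs[i]) {
				queues[q].state = SKETCH_QUEUE_HUNG;
				found_hung_queue = true;
				dev->reset_counter++;
				queues[q].fences_force_completed = true;
			}
		}
	}

	/* Resume only after all hung queues have been processed. */
	if (found_hung_queue)
		r = sketch_scheduler_resume(dev);

	return r;
}
```

Calling sketch_detect_and_recover() with a doorbell list that matches two queues marks both as hung and resumes the scheduler once; with no matching doorbells the scheduler is left untouched. Deferring the resume until after the loop mirrors the patch, which places the amdgpu_mes_resume() call after the loop rather than inside it, and, as in the patch, the function returns the resume status whenever a hung queue was found.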
1 parent 5479855 commit 46f2029

1 file changed

Lines changed: 7 additions & 0 deletions

drivers/gpu/drm/amd/amdgpu/mes_userqueue.c

@@ -208,6 +208,7 @@ static int mes_userq_detect_and_reset(struct amdgpu_device *adev,
 	unsigned int hung_db_num = 0;
 	unsigned long queue_id;
 	u32 db_array[8];
+	bool found_hung_queue = false;
 	int r, i;
 
 	if (db_array_size > 8) {
@@ -232,6 +233,7 @@ static int mes_userq_detect_and_reset(struct amdgpu_device *adev,
 				for (i = 0; i < hung_db_num; i++) {
 					if (queue->doorbell_index == db_array[i]) {
 						queue->state = AMDGPU_USERQ_STATE_HUNG;
+						found_hung_queue = true;
 						atomic_inc(&adev->gpu_reset_counter);
 						amdgpu_userq_fence_driver_force_completion(queue);
 						drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
@@ -241,6 +243,11 @@ static int mes_userq_detect_and_reset(struct amdgpu_device *adev,
 		}
 	}
 
+	if (found_hung_queue) {
+		/* Resume scheduling after hang recovery */
+		r = amdgpu_mes_resume(adev);
+	}
+
 	return r;
 }
 