Commit 61e6b71

Tomasz Lis authored and Thomas Hellström committed
drm/xe/vf: Stop waiting for ring space on VF post migration recovery
If wait for ring space started just before migration, it can delay the recovery process, by waiting without bailout path for up to 2 seconds. Two second wait for recovery is not acceptable, and if the ring was completely filled even without the migration temporarily stopping execution, then such a wait will result in up to a thousand new jobs (assuming constant flow) being added while the wait is happening. While this will not cause data corruption, it will lead to warning messages getting logged due to reset being scheduled on a GT under recovery. Also several seconds of unresponsiveness, as the backlog of jobs gets progressively executed. Add a bailout condition, to make sure the recovery starts without much delay. The recovery is expected to finish in about 100 ms when under moderate stress, so the condition verification period needs to be below that - settling at 64 ms. The theoretical max time which the recovery can take depends on how many requests can be emitted to engine rings and be pending execution. While stress testing, it was possible to reach 10k pending requests on rings when a platform with two GTs was used. This resulted in max recovery time of 5 seconds. But in real life situations, it is very unlikely that the amount of pending requests will ever exceed 100, and for that the recovery time will be around 50 ms - well within our claimed limit of 100ms. Fixes: a4dae94 ("drm/xe/vf: Wakeup in GuC backend on VF post migration recovery") Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251204200820.2206168-1-tomasz.lis@intel.com (cherry picked from commit a00e305) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
1 parent 17d52ab commit 61e6b71

1 file changed: 6 additions & 4 deletions

drivers/gpu/drm/xe/xe_guc_submit.c
```diff
@@ -722,21 +722,23 @@ static int wq_wait_for_space(struct xe_exec_queue *q, u32 wqi_size)
 	struct xe_guc *guc = exec_queue_to_guc(q);
 	struct xe_device *xe = guc_to_xe(guc);
 	struct iosys_map map = xe_lrc_parallel_map(q->lrc[0]);
-	unsigned int sleep_period_ms = 1;
+	unsigned int sleep_period_ms = 1, sleep_total_ms = 0;
 
 #define AVAILABLE_SPACE \
 	CIRC_SPACE(q->guc->wqi_tail, q->guc->wqi_head, WQ_SIZE)
 	if (wqi_size > AVAILABLE_SPACE && !vf_recovery(guc)) {
 try_again:
 		q->guc->wqi_head = parallel_read(xe, map, wq_desc.head);
-		if (wqi_size > AVAILABLE_SPACE) {
-			if (sleep_period_ms == 1024) {
+		if (wqi_size > AVAILABLE_SPACE && !vf_recovery(guc)) {
+			if (sleep_total_ms > 2000) {
 				xe_gt_reset_async(q->gt);
 				return -ENODEV;
 			}
 
 			msleep(sleep_period_ms);
-			sleep_period_ms <<= 1;
+			sleep_total_ms += sleep_period_ms;
+			if (sleep_period_ms < 64)
+				sleep_period_ms <<= 1;
 			goto try_again;
 		}
 	}
```
