
Commit fb3965f

drm/i915/guc: Flag an error if an engine reset fails
If GuC encounters an error during engine reset, the i915 driver promotes to full GT reset. This includes an info message about why the reset is happening. However, that is not treated as a failure by any of the CI systems because resets are an expected occurrence during testing. This kind of failure is a major problem and should never happen. So, complain more loudly and make sure CI notices.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211211065859.2248188-4-John.C.Harrison@Intel.com
1 parent 0dd8674 commit fb3965f

1 file changed

Lines changed: 11 additions & 3 deletions

drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c

@@ -4033,11 +4033,12 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 					 const u32 *msg, u32 len)
 {
 	struct intel_engine_cs *engine;
+	struct intel_gt *gt = guc_to_gt(guc);
 	u8 guc_class, instance;
 	u32 reason;
 
 	if (unlikely(len != 3)) {
-		drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len);
+		drm_err(&gt->i915->drm, "Invalid length %u", len);
 		return -EPROTO;
 	}
 
@@ -4047,12 +4048,19 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc,
 
 	engine = guc_lookup_engine(guc, guc_class, instance);
 	if (unlikely(!engine)) {
-		drm_err(&guc_to_gt(guc)->i915->drm,
+		drm_err(&gt->i915->drm,
 			"Invalid engine %d:%d", guc_class, instance);
 		return -EPROTO;
 	}
 
-	intel_gt_handle_error(guc_to_gt(guc), engine->mask,
+	/*
+	 * This is an unexpected failure of a hardware feature. So, log a real
+	 * error message not just the informational that comes with the reset.
+	 */
+	drm_err(&gt->i915->drm, "GuC engine reset request failed on %d:%d (%s) because 0x%08X",
+		guc_class, instance, engine->name, reason);
+
+	intel_gt_handle_error(gt, engine->mask,
 			      I915_ERROR_CAPTURE,
 			      "GuC failed to reset %s (reason=0x%08x)\n",
 			      engine->name, reason);
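
For readers without the kernel tree at hand, the control flow of the patched handler can be approximated in user space. This is only a sketch under stated assumptions, not the driver code: `struct fake_log`, `fake_lookup_engine()`, `fake_engine_failure_process_msg()` and the engine names are hypothetical stand-ins, the mapping of `msg[0..2]` to class, instance and reason is assumed because the diff elides those lines, and the final `return 0` is likewise assumed. It shows the ordering the commit introduces: an error-level message is emitted before the escalation, which on its own only logs at info level.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counters standing in for the kernel log and the GT reset path. */
struct fake_log {
	int err_count;   /* error-level messages, the kind CI flags */
	int info_count;  /* info-level messages, ignored by CI */
	int full_resets; /* escalations to a full reset */
};

/* Hypothetical engine table; the real driver uses guc_lookup_engine(). */
static const char *fake_lookup_engine(uint8_t guc_class, uint8_t instance)
{
	static const char *const names[2][2] = {
		{ "rcs0", "rcs1" },
		{ "vcs0", "vcs1" },
	};

	if (guc_class >= 2 || instance >= 2)
		return NULL;
	return names[guc_class][instance];
}

/* Sketch of the patched flow: validate, log an error, then escalate. */
static int fake_engine_failure_process_msg(struct fake_log *log,
					   const uint32_t *msg, uint32_t len)
{
	const char *name;
	uint8_t guc_class, instance;
	uint32_t reason;

	assert(log != NULL && msg != NULL);

	if (len != 3) {
		log->err_count++;
		fprintf(stderr, "ERROR: Invalid length %u\n", (unsigned int)len);
		return -EPROTO;
	}

	/* Assumed message layout: class, instance, reason. */
	guc_class = (uint8_t)msg[0];
	instance = (uint8_t)msg[1];
	reason = msg[2];

	name = fake_lookup_engine(guc_class, instance);
	if (!name) {
		log->err_count++;
		fprintf(stderr, "ERROR: Invalid engine %d:%d\n", guc_class, instance);
		return -EPROTO;
	}

	/* New in the patch: a real error first, so the failure is visible. */
	log->err_count++;
	fprintf(stderr, "ERROR: engine reset request failed on %d:%d (%s) because 0x%08X\n",
		guc_class, instance, name, (unsigned int)reason);

	/* The escalation itself only reports at info level. */
	log->info_count++;
	log->full_resets++;
	fprintf(stderr, "INFO: failed to reset %s (reason=0x%08x)\n",
		name, (unsigned int)reason);

	return 0; /* assumed; the end of the function is not shown in the diff */
}
```

Calling `fake_engine_failure_process_msg()` with a valid three-word message increments both `err_count` and `full_resets`, whereas before the patch the equivalent path would have produced only the info-level message.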
