Commit ca25834

accel/amdxdna: Fix deadlock between context destroy and job timeout
The hardware context destroy function holds dev_lock while waiting for all jobs to complete. The timeout job also needs to acquire dev_lock, which leads to a deadlock.

Fix the issue by temporarily releasing dev_lock before waiting for all jobs to finish, and reacquiring it afterward.

Fixes: 4fd6ca9 ("accel/amdxdna: Refactor hardware context destroy routine")
Reviewed-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251107181050.1293125-1-lizhi.hou@amd.com
1 parent 6ff9385 commit ca25834

1 file changed

Lines changed: 4 additions & 2 deletions

drivers/accel/amdxdna/aie2_ctx.c

@@ -690,17 +690,19 @@ void aie2_hwctx_fini(struct amdxdna_hwctx *hwctx)
 	xdna = hwctx->client->xdna;
 
 	XDNA_DBG(xdna, "%s sequence number %lld", hwctx->name, hwctx->priv->seq);
-	drm_sched_entity_destroy(&hwctx->priv->entity);
-
 	aie2_hwctx_wait_for_idle(hwctx);
 
 	/* Request fw to destroy hwctx and cancel the rest pending requests */
 	aie2_release_resource(hwctx);
 
+	mutex_unlock(&xdna->dev_lock);
+	drm_sched_entity_destroy(&hwctx->priv->entity);
+
 	/* Wait for all submitted jobs to be completed or canceled */
 	wait_event(hwctx->priv->job_free_wq,
 		   atomic64_read(&hwctx->job_submit_cnt) ==
 		   atomic64_read(&hwctx->job_free_cnt));
+	mutex_lock(&xdna->dev_lock);
 
 	drm_sched_fini(&hwctx->priv->sched);
 	aie2_ctx_syncobj_destroy(hwctx);
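
The unlock, wait, relock pattern in this change can be illustrated outside the kernel with a small userspace sketch. This is not the driver code: it uses POSIX threads, and every name in it (`demo_ctx`, `demo_timeout_worker`, `demo_ctx_fini`, `demo_run`) is hypothetical. A pthread mutex stands in for `dev_lock`, and a condition variable with two counters stands in for `job_free_wq` and the submit/free counts. The worker thread plays the role of a timeout path that must take the device lock before its job can be freed.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the driver state; not the real structures. */
struct demo_ctx {
	pthread_mutex_t dev_lock;    /* plays the role of xdna->dev_lock */
	pthread_mutex_t cnt_lock;    /* protects the counters below */
	pthread_cond_t  job_free_cv; /* plays the role of job_free_wq */
	long submit_cnt;             /* jobs submitted */
	long free_cnt;               /* jobs freed */
};

/* Timeout path: it needs dev_lock before the job can be freed. */
static void *demo_timeout_worker(void *arg)
{
	struct demo_ctx *ctx = arg;

	pthread_mutex_lock(&ctx->dev_lock);
	/* recovery work that requires the device lock would go here */
	pthread_mutex_unlock(&ctx->dev_lock);

	/* Mark the job as freed and wake the destroy path. */
	pthread_mutex_lock(&ctx->cnt_lock);
	ctx->free_cnt++;
	pthread_cond_broadcast(&ctx->job_free_cv);
	pthread_mutex_unlock(&ctx->cnt_lock);
	return NULL;
}

/* Destroy path: entered with dev_lock held, returns with dev_lock held. */
static void demo_ctx_fini(struct demo_ctx *ctx)
{
	/*
	 * Drop dev_lock before waiting. If it stayed held here, the worker
	 * could never acquire it, free_cnt would never advance, and this
	 * wait would never finish.
	 */
	pthread_mutex_unlock(&ctx->dev_lock);

	pthread_mutex_lock(&ctx->cnt_lock);
	while (ctx->submit_cnt != ctx->free_cnt)
		pthread_cond_wait(&ctx->job_free_cv, &ctx->cnt_lock);
	pthread_mutex_unlock(&ctx->cnt_lock);

	/* Reacquire dev_lock so the caller's locking expectations hold. */
	pthread_mutex_lock(&ctx->dev_lock);
}

/* Runs one destroy against one pending job; returns 0 if all jobs were freed. */
int demo_run(void)
{
	struct demo_ctx ctx = { .submit_cnt = 1, .free_cnt = 0 };
	pthread_t worker;
	int rc, ok;

	pthread_mutex_init(&ctx.dev_lock, NULL);
	pthread_mutex_init(&ctx.cnt_lock, NULL);
	pthread_cond_init(&ctx.job_free_cv, NULL);

	pthread_mutex_lock(&ctx.dev_lock); /* caller holds dev_lock */
	rc = pthread_create(&worker, NULL, demo_timeout_worker, &ctx);
	assert(rc == 0);
	(void)rc;

	demo_ctx_fini(&ctx);
	pthread_mutex_unlock(&ctx.dev_lock);
	pthread_join(worker, NULL);

	ok = (ctx.submit_cnt == ctx.free_cnt);

	pthread_cond_destroy(&ctx.job_free_cv);
	pthread_mutex_destroy(&ctx.cnt_lock);
	pthread_mutex_destroy(&ctx.dev_lock);
	return ok ? 0 : -1;
}
```

To try it, call `demo_run()` from a `main` function and build with `cc -pthread`. Removing the `pthread_mutex_unlock` and `pthread_mutex_lock` pair in `demo_ctx_fini` reproduces the hang in this model: the waiter holds the lock that the only thread able to satisfy the wait condition needs.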
