Commit 465f1db
drm/xe/devcoredump: Defer devcoredump initialization during probe
Initializing devcoredump before the GT looks harmless, but it causes a
problem during driver unbind. Because of this order, the GT/Engine
release functions are called before the xe devcoredump release function
(xe_driver_devcoredump_fini), leading to the following kernel crash[1]
because the devcoredump functions might still use GT/Engine
data structures after those are freed.
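The ordering problem described above can be sketched with a toy model.
This is only an illustration, assuming release actions run in the
reverse order of their registration; every toy_* name is hypothetical
and none of this is the xe driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of reverse-order release actions; all names are hypothetical. */
#define MAX_ACTIONS 8

enum toy_component { TOY_COREDUMP, TOY_GT };

struct toy_device {
	enum toy_component actions[MAX_ACTIONS]; /* in registration order */
	size_t count;
};

/* Register a release action at probe time. */
static void toy_add_release_action(struct toy_device *dev,
				   enum toy_component c)
{
	assert(dev->count < MAX_ACTIONS);
	dev->actions[dev->count++] = c;
}

/*
 * Unwind the release actions, last registered first. Returns 1 if the GT
 * was released while the coredump machinery was still active, 0 otherwise.
 */
static int toy_unbind(struct toy_device *dev)
{
	int coredump_live = 1;
	int hazard = 0;

	while (dev->count > 0) {
		enum toy_component c = dev->actions[--dev->count];

		if (c == TOY_GT && coredump_live)
			hazard = 1; /* coredump may still touch GT data */
		if (c == TOY_COREDUMP)
			coredump_live = 0;
	}
	return hazard;
}

/* Probe with one of the two orderings, then unbind. */
int toy_probe_then_unbind(int coredump_first)
{
	struct toy_device dev = { .count = 0 };

	if (coredump_first) {
		toy_add_release_action(&dev, TOY_COREDUMP);
		toy_add_release_action(&dev, TOY_GT);
	} else {
		toy_add_release_action(&dev, TOY_GT);
		toy_add_release_action(&dev, TOY_COREDUMP);
	}
	return toy_unbind(&dev);
}
```

In this model, toy_probe_then_unbind(1) reports the hazard because the
GT is released first, while toy_probe_then_unbind(0), with the coredump
registered after the GT, does not.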
The following crash is observed while running the IGT test
xe_wedged@wedged-at-any-timeout. The test forces a wedged state by
submitting a workload that hangs, then unbinds and rebinds the
driver to recover from the wedged state.
The hung workload leads to a devcoredump. The crash occurs when the
devcoredump capture races with the driver unbind.
During driver unbind, the release function hw_engine_fini() is
called, which assigns NULL to hwe->gt. But the same data structure is
accessed during the coredump capture in xe_engine_snapshot_print(),
which reads snapshot->hwe->gt.
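The state that triggers the NULL dereference can be sketched as below.
This is a minimal model, assuming only what the text states (the fini
path clears hwe->gt and the print path reads snapshot->hwe->gt); the
toy_* types and functions are hypothetical stand-ins, not the real xe
structures.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the GT, hw engine and engine snapshot. */
struct toy_gt { int id; };
struct toy_hwe { struct toy_gt *gt; };
struct toy_snapshot { struct toy_hwe *hwe; };

/* Models the release path: the engine's GT pointer is cleared. */
static void toy_hw_engine_fini(struct toy_hwe *hwe)
{
	hwe->gt = NULL;
}

/*
 * Models the read done by the print path. Returns 1 if reading through
 * snapshot->hwe->gt at this point would dereference NULL.
 */
static int toy_print_would_crash(const struct toy_snapshot *snapshot)
{
	return snapshot->hwe->gt == NULL;
}

/* Run the two steps in the given order and report the outcome. */
int toy_race(int fini_before_print)
{
	struct toy_gt gt = { .id = 0 };
	struct toy_hwe hwe = { .gt = &gt };
	struct toy_snapshot snapshot = { .hwe = &hwe };

	if (fini_before_print)
		toy_hw_engine_fini(&hwe);
	return toy_print_would_crash(&snapshot);
}
```

Here toy_race(1) models the unbind winning the race, and toy_race(0)
models the capture finishing before the engine is released.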
With this patch, we make sure the devcoredump is stopped before
deinitializing the core driver functions.
[1]:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe]
RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe]
Call Trace:
<TASK>
? drm_printf+0x64/0x90
__xe_devcoredump_read+0x23f/0x2d0 [xe]
? __pfx___drm_printfn_coredump+0x10/0x10
? __pfx___drm_puts_coredump+0x10/0x10
xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe]
process_one_work+0x22e/0x6f0
worker_thread+0x1e8/0x3d0
? __pfx_worker_thread+0x10/0x10
kthread+0x11f/0x250
? __pfx_kthread+0x10/0x10
ret_from_fork+0x47/0x70
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
v2: Detailed commit description (Rodrigo)
v3: FIXME added (Rodrigo, Stuart)
Fixes: 4209d63 ("drm/xe: Remove devcoredump during driver release")
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 1fdc4c3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
1 parent df9bdd4 commit 465f1db
2 files changed
Lines changed: 10 additions & 4 deletions
[Diff content not captured; only line numbers survive.]
File 1: original lines 805-808 removed; 4 lines added at new lines 869-872.
File 2: 6 lines added at new lines 1820-1825.