
Commit 28695ca

superm1 authored and alexdeucher committed
drm/amd: Clean up kfd node on surprise disconnect
When an eGPU is unplugged the KFD topology should also be destroyed for that GPU. This never happens because the fini_sw callbacks never get to run. Run them manually before calling amdgpu_device_ip_fini_early() when a device has already been disconnected.

This location is intentionally chosen to make sure that the kfd locking refcount doesn't get incremented unintentionally.

Cc: kent.russell@amd.com
Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6a23e7b)
Cc: stable@vger.kernel.org
1 parent 9cb6278 commit 28695ca

1 file changed

Lines changed: 8 additions & 0 deletions


drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

```diff
@@ -5063,6 +5063,14 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
 	amdgpu_ttm_set_buffer_funcs_status(adev, false);
 
+	/*
+	 * device went through surprise hotplug; we need to destroy topology
+	 * before ip_fini_early to prevent kfd locking refcount issues by calling
+	 * amdgpu_amdkfd_suspend()
+	 */
+	if (drm_dev_is_unplugged(adev_to_drm(adev)))
+		amdgpu_amdkfd_device_fini_sw(adev);
+
 	amdgpu_device_ip_fini_early(adev);
 
 	amdgpu_irq_fini_hw(adev);
```
