Commit f64fa33

Farah-kassabri authored and ogabbay committed
accel/habanalabs: add pcie reset prepare/done hooks
On a bare-metal system, the firmware handles a function-level reset (FLR) on its own and the driver has no knowledge of it. This causes two issues:

1. The driver stays in the operational state while it should be in reset. The heartbeat mechanism therefore keeps sending messages to the FW while the PCI device is in reset. Eventually the heartbeat fails and the device ends up in a non-operational state.
2. After the FW handles the FLR, the reset sends it back to the preboot stage, and the driver needs to perform a hard reset in order to load the boot fit binary.

This patch adds a reset_prepare hook that puts the device in the disabled state, so it is not operational, and a reset_done hook that is called after the actual FLR handling and performs a hard reset.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
1 parent a0a2895 commit f64fa33

1 file changed

Lines changed: 34 additions & 0 deletions

drivers/accel/habanalabs/common/habanalabs_drv.c

@@ -670,6 +670,38 @@ static pci_ers_result_t hl_pci_err_slot_reset(struct pci_dev *pdev)
 	return PCI_ERS_RESULT_RECOVERED;
 }
 
+static void hl_pci_reset_prepare(struct pci_dev *pdev)
+{
+	struct hl_device *hdev;
+
+	hdev = pci_get_drvdata(pdev);
+	if (!hdev)
+		return;
+
+	hdev->disabled = true;
+}
+
+static void hl_pci_reset_done(struct pci_dev *pdev)
+{
+	struct hl_device *hdev;
+	u32 flags;
+
+	hdev = pci_get_drvdata(pdev);
+	if (!hdev)
+		return;
+
+	/*
+	 * Schedule a thread to trigger hard reset.
+	 * The reason for this handler, is for rare cases where the driver is up
+	 * and FLR occurs. This is valid only when working with no VM, so FW handles FLR
+	 * and resets the device. FW will go back preboot stage, so driver needs to perform
+	 * hard reset in order to load FW fit again.
+	 */
+	flags = HL_DRV_RESET_HARD | HL_DRV_RESET_BYPASS_REQ_TO_FW;
+
+	hl_device_reset(hdev, flags);
+}
+
 static const struct dev_pm_ops hl_pm_ops = {
 	.suspend = hl_pmops_suspend,
 	.resume = hl_pmops_resume,
@@ -679,6 +711,8 @@ static const struct pci_error_handlers hl_pci_err_handler = {
 	.error_detected = hl_pci_err_detected,
 	.slot_reset = hl_pci_err_slot_reset,
 	.resume = hl_pci_err_resume,
+	.reset_prepare = hl_pci_reset_prepare,
+	.reset_done = hl_pci_reset_done,
 };
 
 static struct pci_driver hl_pci_driver = {
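
The prepare/done sequence described in the commit message can be modeled in user space as a rough sketch. Everything prefixed `mock_` or `MOCK_` below is a hypothetical stand-in invented for illustration, not a kernel or habanalabs API. The sketch also assumes that the heartbeat is skipped while the device is marked disabled and that a hard reset re-enables the device once the firmware is loaded again; the commit message implies both, but the diff itself does not show them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical firmware stages; the real driver tracks this differently. */
enum mock_fw_stage { MOCK_FW_PREBOOT, MOCK_FW_RUNNING };

/* Hypothetical stand-in for struct hl_device, reduced to what the flow needs. */
struct mock_device {
	bool disabled;                /* plays the role of hdev->disabled */
	enum mock_fw_stage fw_stage;
	unsigned int heartbeats_sent;
	unsigned int hard_resets;
};

/* Assumption: no heartbeat is sent while the device is marked disabled. */
static bool mock_heartbeat(struct mock_device *dev)
{
	if (dev->disabled)
		return false;
	dev->heartbeats_sent++;
	return true;
}

/* Mirrors hl_pci_reset_prepare(): mark the device as not operational. */
static void mock_reset_prepare(struct mock_device *dev)
{
	if (!dev)
		return;
	dev->disabled = true;
}

/* Models the firmware handling the FLR and dropping back to preboot. */
static void mock_fw_handle_flr(struct mock_device *dev)
{
	/* In this model the FLR only happens after reset_prepare has run. */
	assert(dev->disabled);
	dev->fw_stage = MOCK_FW_PREBOOT;
}

/*
 * Mirrors hl_pci_reset_done(): perform a hard reset so the firmware is
 * loaded again. Re-enabling the device here is an assumption of the model.
 */
static void mock_reset_done(struct mock_device *dev)
{
	if (!dev)
		return;
	dev->hard_resets++;
	dev->fw_stage = MOCK_FW_RUNNING;
	dev->disabled = false;
}
```

Calling `mock_reset_prepare()`, `mock_fw_handle_flr()`, and `mock_reset_done()` in that order on a running device reproduces the intended behavior: no heartbeat goes out during the reset window, and the device comes back operational after exactly one hard reset.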
