Commit c648548

Tomer Tayar authored and ogabbay committed
accel/habanalabs/gaudi2: assume hard-reset by FW upon PCIe AXI drain
When a PCIe AXI drain event happens, it is possible that the driver cannot access the device through PCIe, and therefore cannot send a hard-reset request to FW. Starting from FW version 1.13, FW will initiate a hard-reset in such a case without waiting for a reset request from the driver.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
1 parent fbc2a09 commit c648548

2 files changed

Lines changed: 10 additions & 0 deletions

drivers/accel/habanalabs/common/habanalabs.h

Lines changed: 8 additions & 0 deletions
@@ -3594,6 +3594,14 @@ static inline bool hl_is_fw_sw_ver_below(struct hl_device *hdev, u32 fw_sw_major
 	return false;
 }
 
+static inline bool hl_is_fw_sw_ver_equal_or_greater(struct hl_device *hdev, u32 fw_sw_major,
+							u32 fw_sw_minor)
+{
+	return (hdev->fw_sw_major_ver > fw_sw_major ||
+			(hdev->fw_sw_major_ver == fw_sw_major &&
+					hdev->fw_sw_minor_ver >= fw_sw_minor));
+}
+
 /*
  * Kernel module functions that can be accessed by entire module
  */
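
To make the boundary behavior of the new helper easier to check, here is a minimal user-space sketch of the same comparison. It is not driver code: `struct fw_ver`, `fw_ver_equal_or_greater()` and `at_least_1_13()` are hypothetical stand-ins, and only the two version fields read by the helper are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct hl_device: only the fields the helper reads. */
struct fw_ver {
	uint32_t fw_sw_major_ver;
	uint32_t fw_sw_minor_ver;
};

/*
 * Same comparison as hl_is_fw_sw_ver_equal_or_greater() in the hunk above:
 * true if the major version is newer, or if the major version matches and
 * the minor version is at least the requested one.
 */
static bool fw_ver_equal_or_greater(const struct fw_ver *v,
				    uint32_t major, uint32_t minor)
{
	return v->fw_sw_major_ver > major ||
	       (v->fw_sw_major_ver == major && v->fw_sw_minor_ver >= minor);
}

/* Convenience wrapper (hypothetical): is the given version at least 1.13? */
static bool at_least_1_13(uint32_t major, uint32_t minor)
{
	struct fw_ver v = { major, minor };

	return fw_ver_equal_or_greater(&v, 1, 13);
}
```

Under this sketch, 1.13, 1.14 and 2.0 satisfy the check, while 1.12 and 0.99 do not. A higher major version passes regardless of its minor version.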

drivers/accel/habanalabs/gaudi2/gaudi2.c

Lines changed: 2 additions & 0 deletions
@@ -10007,6 +10007,8 @@ static void gaudi2_handle_eqe(struct hl_device *hdev, struct hl_eq_entry *eq_ent
 		error_count = gaudi2_handle_pcie_drain(hdev, &eq_entry->pcie_drain_ind_data);
 		reset_flags |= HL_DRV_RESET_FW_FATAL_ERR;
 		event_mask |= HL_NOTIFIER_EVENT_GENERAL_HW_ERR;
+		if (hl_is_fw_sw_ver_equal_or_greater(hdev, 1, 13))
+			is_critical = true;
 		break;
 
 	case GAUDI2_EVENT_PSOC59_RPM_ERROR_OR_DRAIN:
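
The effect of the two added lines can be seen in isolation with a small user-space model of the PCIe drain case. Everything prefixed `demo_`/`DEMO_` is hypothetical, including the bit values, which do not correspond to the real `HL_*` constants. How the rest of `gaudi2_handle_eqe()` consumes `is_critical` is not shown in this hunk and is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real HL_* constants are defined by the driver. */
#define DEMO_RESET_FW_FATAL_ERR   (1u << 0)
#define DEMO_EVENT_GENERAL_HW_ERR (1u << 1)

/* Hypothetical stand-in for the device: only the FW version fields. */
struct demo_dev {
	uint32_t fw_sw_major_ver;
	uint32_t fw_sw_minor_ver;
};

/* What the modeled case leaves behind for the caller. */
struct demo_result {
	uint32_t reset_flags;
	uint32_t event_mask;
	bool is_critical;
};

/* Same comparison as the helper added in habanalabs.h. */
static bool demo_fw_at_least(const struct demo_dev *dev,
			     uint32_t major, uint32_t minor)
{
	return dev->fw_sw_major_ver > major ||
	       (dev->fw_sw_major_ver == major && dev->fw_sw_minor_ver >= minor);
}

/* Models only the flag handling of the PCIe drain case in the hunk above. */
static struct demo_result demo_handle_pcie_drain(const struct demo_dev *dev)
{
	struct demo_result r = { 0, 0, false };

	/* Set unconditionally, both before and after this commit. */
	r.reset_flags |= DEMO_RESET_FW_FATAL_ERR;
	r.event_mask |= DEMO_EVENT_GENERAL_HW_ERR;

	/* New in this commit: FW >= 1.13 is assumed to hard-reset on its own. */
	if (demo_fw_at_least(dev, 1, 13))
		r.is_critical = true;

	return r;
}
```

In this model, a device running FW 1.12 gets the same reset flags and event mask as before, with `is_critical` left false. From FW 1.13 onward the event is additionally marked critical.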
