Commit 0288c3e

jbrandeb authored and kuba-moo committed
ice: reset first in crash dump kernels

When the system boots into the crash dump kernel after a panic, the ice networking device may still have pending transactions that can cause errors or machine checks when the device is re-enabled. This can prevent the crash dump kernel from loading the driver or collecting the crash data.

To avoid this issue, perform a function level reset (FLR) on the ice device via PCIe config space before enabling it on the crash kernel. This will clear any outstanding transactions and stop all queues and interrupts. Restore the config space after the FLR, otherwise it was found in testing that the driver wouldn't load successfully.

The following sequence causes the original issue:
- Load the ice driver with modprobe ice
- Enable SR-IOV with 2 VFs: echo 2 > /sys/class/net/eth0/device/sriov_num_vfs
- Trigger a crash with echo c > /proc/sysrq-trigger
- Load the ice driver again (or let it load automatically) with modprobe ice
- The system crashes again during pcim_enable_device()

Fixes: 837f08f ("ice: Add basic driver framework for Intel(R) E800 Series")
Reported-by: Vishal Agrawal <vagrawal@redhat.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Link: https://lore.kernel.org/r/20231011233334.336092-3-jacob.e.keller@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
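
For readers who want to tell from userspace whether they are in the crash kernel described above, here is a minimal sketch. It assumes the kdump tooling boots the capture kernel with an elfcorehdr= parameter on its command line, which is the case on common setups but is an assumption here, not something this commit states. The helper name cmdline_has_elfcorehdr() is hypothetical; the driver itself uses the in-kernel is_kdump_kernel() instead.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical userspace helper (not part of the ice driver).
 * Returns true when the kernel command line passed in contains an
 * "elfcorehdr=" parameter as its own whitespace-separated token.
 * Assumption: the capture kernel is booted with elfcorehdr=... set.
 */
static bool cmdline_has_elfcorehdr(const char *cmdline)
{
	static const char key[] = "elfcorehdr=";
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Accept a match only at the start or after whitespace. */
		if (p == cmdline || p[-1] == ' ' || p[-1] == '\t' || p[-1] == '\n')
			return true;
		p += sizeof(key) - 1;
	}
	return false;
}
```

A caller would read /proc/cmdline into a buffer and pass it to the helper; checking for the existence of /proc/vmcore is another commonly used heuristic.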
1 parent fc6f716 commit 0288c3e

1 file changed

Lines changed: 15 additions & 0 deletions

drivers/net/ethernet/intel/ice/ice_main.c

@@ -6,6 +6,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <generated/utsrelease.h>
+#include <linux/crash_dump.h>
 #include "ice.h"
 #include "ice_base.h"
 #include "ice_lib.h"
@@ -5014,6 +5015,20 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		return -EINVAL;
 	}
 
+	/* when under a kdump kernel initiate a reset before enabling the
+	 * device in order to clear out any pending DMA transactions. These
+	 * transactions can cause some systems to machine check when doing
+	 * the pcim_enable_device() below.
+	 */
+	if (is_kdump_kernel()) {
+		pci_save_state(pdev);
+		pci_clear_master(pdev);
+		err = pcie_flr(pdev);
+		if (err)
+			return err;
+		pci_restore_state(pdev);
+	}
+
 	/* this driver uses devres, see
 	 * Documentation/driver-api/driver-model/devres.rst
 	 */
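
The ordering in the hunk above (save, clear bus mastering, FLR, restore) can be illustrated with a toy userspace model. This is a sketch only: struct toy_dev and every toy_* function are hypothetical stand-ins, not the kernel's struct pci_dev or PCI API, and the model assumes an FLR returns the Command register and BAR0 to zero while discarding in-flight transactions.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_CMD_MEMORY 0x2u	/* memory space enable */
#define TOY_CMD_MASTER 0x4u	/* bus master enable */

/* Toy stand-in for a PCI function; NOT the kernel's struct pci_dev. */
struct toy_dev {
	uint16_t command;	/* live Command register */
	uint32_t bar0;		/* live BAR0 */
	uint16_t saved_command;
	uint32_t saved_bar0;
	bool state_saved;
	unsigned int pending_dma;	/* transactions left by the crashed kernel */
};

/* Mirrors the role of pci_save_state(): snapshot config registers. */
static void toy_save_state(struct toy_dev *d)
{
	d->saved_command = d->command;
	d->saved_bar0 = d->bar0;
	d->state_saved = true;
}

/* Mirrors the role of pci_clear_master(): stop the device issuing DMA. */
static void toy_clear_master(struct toy_dev *d)
{
	d->command &= (uint16_t)~TOY_CMD_MASTER;
}

/* Assumed FLR behaviour: drop pending work, reset config registers. */
static int toy_flr(struct toy_dev *d)
{
	d->pending_dma = 0;
	d->command = 0;
	d->bar0 = 0;
	return 0;
}

/* Mirrors the role of pci_restore_state(): write the snapshot back. */
static void toy_restore_state(struct toy_dev *d)
{
	if (!d->state_saved)
		return;
	d->command = d->saved_command;
	d->bar0 = d->saved_bar0;
	d->state_saved = false;
}

/* Same ordering as the is_kdump_kernel() branch in the diff. */
static int toy_kdump_reset(struct toy_dev *d)
{
	int err;

	toy_save_state(d);
	toy_clear_master(d);
	err = toy_flr(d);
	if (err)
		return err;
	toy_restore_state(d);
	return 0;
}
```

In this model, skipping toy_restore_state() leaves bar0 at zero after the reset, which is consistent with the commit message's observation that the driver would not load unless config space was restored after the FLR.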
