Commit 4f6aaad
scsi: qla2xxx: Fix lost interrupts with qlini_mode=disabled
When qla2xxx is loaded with qlini_mode=disabled,
ha->flags.disable_msix_handshake is used before it is set, resulting in
the wrong interrupt handler being used on certain HBAs
(qla2xxx_msix_rsp_q_hs() is used when qla2xxx_msix_rsp_q() should be
used). The only difference between these two interrupt handlers is that
the _hs() version writes to a register to clear the "RISC" interrupt,
whereas the other version does not. So this bug results in the RISC
interrupt being cleared when it should not be. This occasionally causes
a different interrupt handler qla24xx_msix_default() for a different
vector to see ((stat & HSRX_RISC_INT) == 0) and ignore its interrupt,
which then causes problems like:
qla2xxx [0000:02:00.0]-d04c:6: MBX Command timeout for cmd 20,
iocontrol=8 jiffies=1090c0300 mb[0-3]=[0x4000 0x0 0x40 0xda] mb7 0x500
host_status 0x40000010 hccr 0x3f00
qla2xxx [0000:02:00.0]-101e:6: Mailbox cmd timeout occurred, cmd=0x20,
mb[0]=0x20. Scheduling ISP abort
(the cmd varies; observed values include 0x20, 0x22, 0x54, 0x5a, 0x5d, and 0x6a)
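The interference described above can be modeled in a few lines of userspace C. This is a hypothetical sketch, not driver code. Every `sim_*` name is invented for illustration. `SIM_HSRX_RISC_INT` is an arbitrary bit that stands in for the real `HSRX_RISC_INT`. The clearing write is modeled as clearing a bit in one shared status word, whereas the real driver writes a hardware register. The sketch shows only the ordering problem: a response-queue vector that clears the RISC interrupt before the default vector runs makes the default vector bail out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; not the driver's real HSRX_RISC_INT definition. */
#define SIM_HSRX_RISC_INT (1u << 3)

/* Simulated host_status word shared by all MSI-X vectors of one HBA. */
static uint32_t sim_host_status;

/* Models qla2xxx_msix_rsp_q_hs(): schedules its work AND clears the RISC interrupt. */
static void sim_rsp_q_hs(void)
{
	sim_host_status &= ~SIM_HSRX_RISC_INT;
}

/* Models qla2xxx_msix_rsp_q(): schedules its work and leaves the RISC interrupt alone. */
static void sim_rsp_q(void)
{
}

/* Models the check in qla24xx_msix_default(); returns true if the event is serviced. */
static bool sim_msix_default(void)
{
	uint32_t stat = sim_host_status;

	if ((stat & SIM_HSRX_RISC_INT) == 0)
		return false; /* interrupt ignored, e.g. a mailbox completion is lost */
	sim_host_status &= ~SIM_HSRX_RISC_INT; /* the default vector clears it itself */
	return true;
}

/* Raise a RISC interrupt (e.g. mailbox completion), let a response-queue
 * vector run first, then run the default vector. */
static bool sim_mailbox_completion_seen(bool use_hs_handler)
{
	sim_host_status |= SIM_HSRX_RISC_INT;
	if (use_hs_handler)
		sim_rsp_q_hs();
	else
		sim_rsp_q();
	return sim_msix_default();
}
```

With `use_hs_handler` false the default vector services the event. With it true the event is dropped, which corresponds to the mailbox timeouts shown in the log above.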
This problem can be reproduced with a 16 or 32 Gbps HBA by loading
qla2xxx with qlini_mode=disabled and running a high IOPS test while
triggering frequent RSCN database change events.
While analyzing the problem I discovered that even with
disable_msix_handshake forced to 0, it is not necessary to clear the
RISC interrupt from qla2xxx_msix_rsp_q_hs() (more below). So just
completely remove qla2xxx_msix_rsp_q_hs() and the logic for selecting
it, which also fixes the bug with qlini_mode=disabled.
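The use-before-set problem and the effect of removing the selection logic can be sketched as follows. This is a hypothetical model. `struct sim_hw`, `sim_config_rings()` and `sim_bring_up()` are invented stand-ins, loosely named after `qla24xx_config_rings()` and `ha->flags.disable_msix_handshake`. The commit states only that the flag is read before it is set under qlini_mode=disabled, so the exact call ordering in the sketch is an assumption. On a handshake-free ("NACK-capable") HBA, the pre-patch code picks the `_hs()` handler whenever selection happens before the flag is written. Once the `_hs()` variant is gone, ordering no longer matters.

```c
#include <stdbool.h>

/* Which response-queue handler a qpair vector ends up with. */
enum sim_handler {
	SIM_RSP_Q,    /* models qla2xxx_msix_rsp_q() */
	SIM_RSP_Q_HS, /* models qla2xxx_msix_rsp_q_hs() */
};

/* Hypothetical stand-in for the relevant bits of the HBA state. */
struct sim_hw {
	bool nack_capable;           /* HBA/firmware can run without the handshake */
	bool disable_msix_handshake; /* stays false until config_rings runs */
};

/* Models the config_rings step: the only place the flag gets set. */
static void sim_config_rings(struct sim_hw *hw)
{
	if (hw->nack_capable)
		hw->disable_msix_handshake = true;
}

/* Models handler selection at IRQ request time, before and after the patch. */
static enum sim_handler sim_select_handler(const struct sim_hw *hw, bool patched)
{
	if (patched)
		return SIM_RSP_Q; /* _hs variant and its selection logic removed */
	return hw->disable_msix_handshake ? SIM_RSP_Q : SIM_RSP_Q_HS;
}

/* Bring up a NACK-capable HBA, selecting the handler either before or
 * after config_rings, and report which handler was chosen. */
static enum sim_handler sim_bring_up(bool irq_before_config_rings, bool patched)
{
	struct sim_hw hw = { .nack_capable = true, .disable_msix_handshake = false };
	enum sim_handler h;

	if (irq_before_config_rings) {
		h = sim_select_handler(&hw, patched); /* reads the flag too early */
		sim_config_rings(&hw);
	} else {
		sim_config_rings(&hw);
		h = sim_select_handler(&hw, patched);
	}
	return h;
}
```

Pre-patch, `sim_bring_up(true, false)` yields `SIM_RSP_Q_HS` even though the HBA does not need the handshake. The patched path returns `SIM_RSP_Q` in every ordering.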
The following test justifies removing
qla2xxx_msix_rsp_q_hs():
Force disable_msix_handshake to 0:
qla24xx_config_rings():
if (0 && (ha->fw_attributes & BIT_6) && (IS_MSIX_NACK_CAPABLE(ha)) &&
(ha->flags.msix_enabled)) {
In qla24xx_msix_rsp_q() and qla2xxx_msix_rsp_q_hs(), check:
(rd_reg_dword(&reg->host_status) & HSRX_RISC_INT)
Count the number of calls to each function with HSRX_RISC_INT set and
the number with HSRX_RISC_INT not set while performing some I/O.
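The counting described above could be done with a small helper. Each instrumented handler would pass it the value returned by `rd_reg_dword(&reg->host_status)`. This is a hypothetical userspace sketch: `struct risc_int_stats`, `risc_int_sample()` and `risc_int_pct_set()` are invented names, and `SIM_HSRX_RISC_INT` is an arbitrary stand-in bit. Inside the kernel, vectors run concurrently on different CPUs, so the counters would need to be atomic or per-CPU. Plain `unsigned long` is used here only to keep the sketch self-contained.

```c
#include <stdint.h>

/* Hypothetical bit value; not the driver's real HSRX_RISC_INT definition. */
#define SIM_HSRX_RISC_INT (1u << 3)

/* One instance per instrumented handler. */
struct risc_int_stats {
	unsigned long set;   /* calls that saw HSRX_RISC_INT set */
	unsigned long clear; /* calls that saw HSRX_RISC_INT not set */
};

/* Record one handler invocation given the host_status value it read. */
static void risc_int_sample(struct risc_int_stats *s, uint32_t host_status)
{
	if (host_status & SIM_HSRX_RISC_INT)
		s->set++;
	else
		s->clear++;
}

/* Percentage of calls that saw the bit set, rounded down; 0 if no calls yet. */
static unsigned int risc_int_pct_set(const struct risc_int_stats *s)
{
	unsigned long total = s->set + s->clear;

	return total ? (unsigned int)(s->set * 100 / total) : 0;
}
```

The interrupt-count ratio reported below follows from comparing `set + clear` across the two handlers' stats.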
If qla2xxx_msix_rsp_q_hs() clears the RISC interrupt (original code):
qla24xx_msix_rsp_q: 50% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs: 5% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
(# of qla24xx_msix_rsp_q interrupts) * 3
If qla2xxx_msix_rsp_q_hs() does not clear the RISC interrupt (patched
code):
qla24xx_msix_rsp_q: 100% of calls have HSRX_RISC_INT set
qla2xxx_msix_rsp_q_hs: 9% of calls have HSRX_RISC_INT set
(# of qla2xxx_msix_rsp_q_hs interrupts) =
(# of qla24xx_msix_rsp_q interrupts) * 3
In the case of the original code, qla24xx_msix_rsp_q() was seeing
HSRX_RISC_INT set only 50% of the time because qla2xxx_msix_rsp_q_hs()
was clearing it when it shouldn't have been. In the patched code,
qla24xx_msix_rsp_q() sees HSRX_RISC_INT set 100% of the time, which
makes sense if that interrupt handler needs to clear the RISC interrupt
(which it does). qla2xxx_msix_rsp_q_hs() sees HSRX_RISC_INT only 9% of
the time, which is just overlap from the other interrupt during the
high IOPS test.
Tested with SCST on:
QLE2742 FW:v9.08.02 (32 Gbps 2-port)
QLE2694L FW:v9.10.11 (16 Gbps 4-port)
QLE2694L FW:v9.08.02 (16 Gbps 4-port)
QLE2672 FW:v8.07.12 (16 Gbps 2-port)
both initiator and target mode
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Link: https://patch.msgid.link/56d378eb-14ad-49c7-bae9-c649b6c7691e@cybernetics.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Parent commit: 8f58fc6
4 files changed
Lines changed: 5 additions & 34 deletions