Commit de10506
nvme: fix reconnection fail due to reserved tag allocation
We found an issue in a production environment while using NVMe over RDMA:
admin_q reconnection failed forever even though the remote target and the
network were fine. After digging into it, we found it may be caused by an
ABBA deadlock in tag allocation. In my case, the tag was held by a
keep-alive request waiting inside admin_q. Because we quiesce admin_q while
resetting the controller, the request is marked as idle and will not be
processed until the reset succeeds. As fabric_q shares its tagset with
admin_q, reconnecting to the remote target requires a tag for the connect
command, but the only reserved tag was held by the keep-alive command
waiting inside admin_q. As a result, we failed to reconnect admin_q
forever. To fix this issue, I think we should keep two reserved tags for
the admin queue.
Fixes: ed01fee ("nvme-fabrics: only reserve a single tag")
Signed-off-by: Chunguang Xu <chunguang.xu@shopee.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
1 parent 2bc9174 commit de10506
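
The deadlock described in the commit message can be illustrated with a toy model of a reserved tag pool. This is only a sketch under stated assumptions: `struct rsv_pool`, `rsv_tag_get`, `rsv_tag_put`, and `reconnect_can_proceed` are hypothetical names invented for illustration, not the blk-mq or NVMe driver API, and the real allocation path blocks or times out rather than returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the reserved part of a shared tag set (hypothetical). */
struct rsv_pool {
    unsigned int total;   /* number of reserved tags in the set */
    unsigned int in_use;  /* reserved tags currently held by requests */
};

/* Non-blocking allocation: true if a reserved tag was obtained. */
static bool rsv_tag_get(struct rsv_pool *p)
{
    if (p->in_use >= p->total)
        return false;
    p->in_use++;
    return true;
}

/* Release a previously obtained reserved tag. */
static void rsv_tag_put(struct rsv_pool *p)
{
    assert(p->in_use > 0);
    p->in_use--;
}

/*
 * Replay the sequence from the commit message:
 *   1. A keep-alive request takes a reserved tag and is then parked in the
 *      quiesced admin queue, so it cannot complete and release the tag.
 *   2. The reconnect path needs a reserved tag for the connect command.
 * Returns true if the connect command could obtain a tag.
 */
bool reconnect_can_proceed(unsigned int reserved_tags)
{
    struct rsv_pool pool = { .total = reserved_tags, .in_use = 0 };
    bool keep_alive_parked = rsv_tag_get(&pool);
    bool connect_ok = rsv_tag_get(&pool);

    if (connect_ok) {
        /* Connect completes and releases its tag. */
        rsv_tag_put(&pool);
        /* The queue is unquiesced, so the keep-alive can finish too. */
        if (keep_alive_parked)
            rsv_tag_put(&pool);
    }
    return connect_ok;
}
```

In this model, `reconnect_can_proceed(1)` returns false because the parked keep-alive holds the only reserved tag, while `reconnect_can_proceed(2)` returns true, which matches the reasoning behind keeping two reserved tags.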
2 files changed
Lines changed: 4 additions & 9 deletions
| File | Original line(s) | New line(s) | Change |
|---|---|---|---|
| 1 | 4388 | 4388-4389 | 1 line replaced by 2 |
| 1 | 4457 | 4458-4459 | 1 line replaced by 2 |
| 2 | 21-27 | | 7 lines removed |