Commit 673db05

mikechristie authored and martinkpetersen committed
scsi: target: Fix multiple LUN_RESET handling
This fixes a bug where an initiator thinks a LUN_RESET has cleaned up running commands when it hasn't. The bug was added in commit 51ec502 ("target: Delete tmr from list before processing").

The problem occurs when:

1. We have N I/O cmds running in the target layer spread over 2 sessions.
2. The initiator sends a LUN_RESET for each session.
3. session1's LUN_RESET loops over all the running commands from both sessions and moves them to its local drain_task_list.
4. session2's LUN_RESET does not see the LUN_RESET from session1 because the commit above has it remove itself. session2 also does not see any commands since the other reset moved them off the state lists.
5. session2's LUN_RESET will then complete with a successful response.
6. session2's initiator believes the running commands on its session are now cleaned up due to the successful response and cleans up the running commands from its side. It then restarts them.
7. The commands do eventually complete on the backend and the target starts to return aborted task statuses for them. The initiator will either throw an invalid ITT error or might accidentally look up a new task if the ITT has been reallocated already.

Fix the bug by reverting the patch and serializing the execution of LUN_RESETs and Preempt and Aborts.

Also prevent us from waiting on LUN_RESETs in core_tmr_drain_tmr_list, because it turns out the original patch fixed a bug that was not mentioned. For LUN_RESET1, core_tmr_drain_tmr_list can see a second LUN_RESET and wait on it. Then the second reset will run core_tmr_drain_tmr_list, see the first reset, and wait on it, resulting in a deadlock.

Fixes: 51ec502 ("target: Delete tmr from list before processing")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230319015620.96006-8-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
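The ordering problem in steps 3 to 5 can be illustrated with a small, single-threaded userspace model. This is only a sketch under stated assumptions: every `toy_` name is hypothetical, none of it is kernel code, and "waiting" is modeled as simply marking the claimed commands as finished. It shows that when the second reset scans after the first has claimed everything but before the first has finished waiting, the second reset reports success while its session's commands are still running; serializing the two resets closes that window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_CMDS 4

/* Hypothetical stand-in for a running command; not the kernel's struct se_cmd. */
struct toy_cmd {
	int session;        /* session that issued the command (1 or 2) */
	bool on_state_list; /* still visible to a reset scanning for commands */
	bool running;       /* backend has not completed the command yet */
};

/* Spread the commands over two sessions, all running and visible. */
static void toy_init(struct toy_cmd *cmds)
{
	for (size_t i = 0; i < TOY_NUM_CMDS; i++) {
		cmds[i].session = (int)(i % 2) + 1;
		cmds[i].on_state_list = true;
		cmds[i].running = true;
	}
}

/* A reset claims every visible command, regardless of session. */
static size_t toy_claim(struct toy_cmd *cmds, bool *claimed)
{
	size_t n = 0;

	assert(cmds != NULL && claimed != NULL);
	for (size_t i = 0; i < TOY_NUM_CMDS; i++) {
		claimed[i] = cmds[i].on_state_list;
		if (claimed[i]) {
			cmds[i].on_state_list = false;
			n++;
		}
	}
	return n;
}

/* Model "wait for the claimed commands" as marking them finished. */
static void toy_wait(struct toy_cmd *cmds, const bool *claimed)
{
	for (size_t i = 0; i < TOY_NUM_CMDS; i++)
		if (claimed[i])
			cmds[i].running = false;
}

static size_t toy_running_for(const struct toy_cmd *cmds, int session)
{
	size_t n = 0;

	for (size_t i = 0; i < TOY_NUM_CMDS; i++)
		if (cmds[i].session == session && cmds[i].running)
			n++;
	return n;
}

/*
 * Returns how many session-2 commands are still running at the moment
 * session 2's reset reports success.
 */
size_t toy_two_resets(bool serialized)
{
	struct toy_cmd cmds[TOY_NUM_CMDS];
	bool claimed1[TOY_NUM_CMDS], claimed2[TOY_NUM_CMDS];
	size_t still_running;

	toy_init(cmds);
	toy_claim(cmds, claimed1);       /* reset 1 claims all commands */
	if (serialized)
		toy_wait(cmds, claimed1);    /* reset 2 may not start until reset 1 is done */

	toy_claim(cmds, claimed2);       /* reset 2 scans: nothing left to claim */
	toy_wait(cmds, claimed2);
	still_running = toy_running_for(cmds, 2); /* reset 2 reports success here */

	if (!serialized)
		toy_wait(cmds, claimed1);    /* reset 1 finishes only afterwards */
	return still_running;
}
```

In this model, `toy_two_resets(false)` reproduces the premature success described above, while `toy_two_resets(true)` corresponds to the serialized behavior the fix aims for.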
1 parent d8990b5 commit 673db05

3 files changed

Lines changed: 25 additions & 3 deletions

drivers/target/target_core_device.c

Lines changed: 1 addition & 0 deletions
@@ -782,6 +782,7 @@ struct se_device *target_alloc_device(struct se_hba *hba, const char *name)
 	spin_lock_init(&dev->t10_alua.lba_map_lock);
 
 	INIT_WORK(&dev->delayed_cmd_work, target_do_delayed_work);
+	mutex_init(&dev->lun_reset_mutex);
 
 	dev->t10_wwn.t10_dev = dev;
 	/*

drivers/target/target_core_tmr.c

Lines changed: 23 additions & 3 deletions
@@ -188,14 +188,23 @@ static void core_tmr_drain_tmr_list(
 	 * LUN_RESET tmr..
 	 */
 	spin_lock_irqsave(&dev->se_tmr_lock, flags);
-	if (tmr)
-		list_del_init(&tmr->tmr_list);
 	list_for_each_entry_safe(tmr_p, tmr_pp, &dev->dev_tmr_list, tmr_list) {
+		if (tmr_p == tmr)
+			continue;
+
 		cmd = tmr_p->task_cmd;
 		if (!cmd) {
 			pr_err("Unable to locate struct se_cmd for TMR\n");
 			continue;
 		}
+
+		/*
+		 * We only execute one LUN_RESET at a time so we can't wait
+		 * on them below.
+		 */
+		if (tmr_p->function == TMR_LUN_RESET)
+			continue;
+
 		/*
 		 * If this function was called with a valid pr_res_key
 		 * parameter (eg: for PROUT PREEMPT_AND_ABORT service action
@@ -379,14 +388,25 @@ int core_tmr_lun_reset(
 				tmr_nacl->initiatorname);
 		}
 	}
+
+
+	/*
+	 * We only allow one reset or preempt and abort to execute at a time
+	 * to prevent one call from claiming all the cmds causing a second
+	 * call from returning while cmds it should have waited on are still
+	 * running.
+	 */
+	mutex_lock(&dev->lun_reset_mutex);
+
 	pr_debug("LUN_RESET: %s starting for [%s], tas: %d\n",
 			(preempt_and_abort_list) ? "Preempt" : "TMR",
 			dev->transport->name, tas);
-
 	core_tmr_drain_tmr_list(dev, tmr, preempt_and_abort_list);
 	core_tmr_drain_state_list(dev, prout_cmd, tmr_sess, tas,
 				preempt_and_abort_list);
 
+	mutex_unlock(&dev->lun_reset_mutex);
+
 	/*
 	 * Clear any legacy SPC-2 reservation when called during
 	 * LOGICAL UNIT RESET
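
The two `continue` checks added to core_tmr_drain_tmr_list can be looked at in isolation. The following is a minimal userspace sketch, not the kernel implementation: the `toy_` types and function are hypothetical, an array stands in for `dev->dev_tmr_list`, and the function only counts the entries a draining call would go on to wait for. It assumes, as the hunk suggests, that the caller's own TMR stays on the list and is skipped by pointer identity, and that other LUN_RESET entries are skipped because resets are now executed one at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical TMR kinds; the kernel uses TMR_* constants instead. */
enum toy_tmr_func { TOY_ABORT_TASK, TOY_LUN_RESET };

/* Hypothetical stand-in for struct se_tmr_req. */
struct toy_tmr {
	enum toy_tmr_func function;
};

/*
 * Count the TMRs that a drain on behalf of `self` would wait on.
 * `self` may be NULL, modeling a caller that has no TMR of its own.
 */
size_t toy_count_waited_on(const struct toy_tmr *list, size_t n,
			   const struct toy_tmr *self)
{
	size_t count = 0;

	assert(list != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Never wait on ourselves; we stay on the list now. */
		if (&list[i] == self)
			continue;
		/* Never wait on another reset; that is how the deadlock arose. */
		if (list[i].function == TOY_LUN_RESET)
			continue;
		count++;
	}
	return count;
}
```

With a list holding two resets and two aborts, a drain run for either reset would wait only on the two aborts, so two resets can no longer end up waiting on each other.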

include/target/target_core_base.h

Lines changed: 1 addition & 0 deletions
@@ -872,6 +872,7 @@ struct se_device {
 	struct rcu_head rcu_head;
 	int queue_cnt;
 	struct se_device_queue *queues;
+	struct mutex lun_reset_mutex;
 };
 
 struct target_opcode_descriptor {
