Commit b4bb6da

Brian Kao authored and martinkpetersen committed
scsi: ufs: core: Fix EH failure after W-LUN resume error
When a W-LUN resume fails, its parent devices in the SCSI hierarchy,
including the scsi_target, may be runtime suspended. Subsequently, the
error handler in ufshcd_recover_pm_error() fails to set the W-LUN device
back to active because the parent target is not active. This results in
the following errors:

  google-ufshcd 3c2d0000.ufs: ufshcd_err_handler started; HBA state eh_fatal; ...
  ufs_device_wlun 0:0:0:49488: START_STOP failed for power mode: 1, result 40000
  ufs_device_wlun 0:0:0:49488: ufshcd_wl_runtime_resume failed: -5
  ...
  ufs_device_wlun 0:0:0:49488: runtime PM trying to activate child device 0:0:0:49488 but parent (target0:0:0) is not active

Address this by:

1. Ensuring the W-LUN's parent scsi_target is runtime resumed before
   attempting to set the W-LUN to active within
   ufshcd_recover_pm_error().

2. Explicitly checking for power.runtime_error on the HBA and W-LUN
   devices before calling pm_runtime_set_active() to clear the error
   state.

3. Adding pm_runtime_get_sync(hba->dev) in
   ufshcd_err_handling_prepare() to ensure the HBA itself is active
   during error recovery, even if a child device resume failed.

These changes ensure the device power states are managed correctly
during error recovery.

Signed-off-by: Brian Kao <powenkao@google.com>
Tested-by: Brian Kao <powenkao@google.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251112063214.1195761-1-powenkao@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
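
The ordering constraint described in the commit message can be illustrated with a small userspace sketch. This is not kernel code: `toy_dev`, `toy_set_active()` and `toy_recover()` are hypothetical names invented for illustration, and the model only assumes the one rule quoted in the log above, namely that a child device cannot be marked active while its parent is not active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for per-device runtime PM state. */
enum toy_rpm_status { TOY_RPM_ACTIVE, TOY_RPM_SUSPENDED };

struct toy_dev {
	struct toy_dev *parent;		/* NULL when there is no functional parent */
	enum toy_rpm_status status;
	int runtime_error;		/* non-zero after a failed resume */
};

/*
 * Toy version of the rule behind the "parent ... is not active" message:
 * refuse to activate a child whose parent is suspended.
 * Returns 0 on success, -1 if the parent is not active.
 */
int toy_set_active(struct toy_dev *dev)
{
	assert(dev != NULL);
	if (dev->parent && dev->parent->status != TOY_RPM_ACTIVE)
		return -1;
	dev->status = TOY_RPM_ACTIVE;
	dev->runtime_error = 0;		/* activating also clears the error */
	return 0;
}

/*
 * Recovery order from the commit message: only act when an error is
 * recorded, resume the parent first, then activate the child.
 */
int toy_recover(struct toy_dev *wlun)
{
	if (!wlun->runtime_error)
		return 0;
	if (wlun->parent)
		wlun->parent->status = TOY_RPM_ACTIVE;	/* stands in for resuming the target */
	return toy_set_active(wlun);
}
```

With a suspended parent and a child carrying error -5, calling `toy_set_active()` on the child directly fails and leaves the error in place, whereas `toy_recover()` succeeds because the parent is made active first.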
1 parent 82f78ac commit b4bb6da

1 file changed

Lines changed: 28 additions & 8 deletions

drivers/ufs/core/ufshcd.c

@@ -6504,6 +6504,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
 
 static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
 {
+	/*
+	 * A WLUN resume failure could potentially lead to the HBA being
+	 * runtime suspended, so take an extra reference on hba->dev.
+	 */
+	pm_runtime_get_sync(hba->dev);
 	ufshcd_rpm_get_sync(hba);
 	if (pm_runtime_status_suspended(&hba->ufs_device_wlun->sdev_gendev) ||
 	    hba->is_sys_suspended) {
@@ -6543,6 +6548,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
 	if (ufshcd_is_clkscaling_supported(hba))
 		ufshcd_clk_scaling_suspend(hba, false);
 	ufshcd_rpm_put(hba);
+	pm_runtime_put(hba->dev);
 }
 
 static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
@@ -6557,28 +6563,42 @@ static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
 #ifdef CONFIG_PM
 static void ufshcd_recover_pm_error(struct ufs_hba *hba)
 {
+	struct scsi_target *starget = hba->ufs_device_wlun->sdev_target;
 	struct Scsi_Host *shost = hba->host;
 	struct scsi_device *sdev;
 	struct request_queue *q;
-	int ret;
+	bool resume_sdev_queues = false;
 
 	hba->is_sys_suspended = false;
+
 	/*
-	 * Set RPM status of wlun device to RPM_ACTIVE,
-	 * this also clears its runtime error.
+	 * Ensure the parent's error status is cleared before proceeding
+	 * to the child, as the parent must be active to activate the child.
 	 */
-	ret = pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
+	if (hba->dev->power.runtime_error) {
+		/* hba->dev has no functional parent thus simplily set RPM_ACTIVE */
+		pm_runtime_set_active(hba->dev);
+		resume_sdev_queues = true;
+	}
+
+	if (hba->ufs_device_wlun->sdev_gendev.power.runtime_error) {
+		/*
+		 * starget, parent of wlun, might be suspended if wlun resume failed.
+		 * Make sure parent is resumed before set child (wlun) active.
+		 */
+		pm_runtime_get_sync(&starget->dev);
+		pm_runtime_set_active(&hba->ufs_device_wlun->sdev_gendev);
+		pm_runtime_put_sync(&starget->dev);
+		resume_sdev_queues = true;
+	}
 
-	/* hba device might have a runtime error otherwise */
-	if (ret)
-		ret = pm_runtime_set_active(hba->dev);
 	/*
 	 * If wlun device had runtime error, we also need to resume those
 	 * consumer scsi devices in case any of them has failed to be
 	 * resumed due to supplier runtime resume failure. This is to unblock
 	 * blk_queue_enter in case there are bios waiting inside it.
 	 */
-	if (!ret) {
+	if (resume_sdev_queues) {
 		shost_for_each_device(sdev, shost) {
 			q = sdev->request_queue;
 			if (q->dev && (q->rpm_status == RPM_SUSPENDED ||
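
The first two hunks pair a new pm_runtime_get_sync(hba->dev) in ufshcd_err_handling_prepare() with a new pm_runtime_put(hba->dev) in ufshcd_err_handling_unprepare(). The reference balance this relies on can be sketched in userspace as follows. All `toy_*` names are hypothetical, and the counter is only an assumed simplification of runtime PM usage counting, not the kernel implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for a device's runtime PM usage counter. */
struct toy_pm_dev {
	int usage_count;
};

/* Take a reference; while the count is above zero the device stays active. */
void toy_pm_get(struct toy_pm_dev *dev)
{
	dev->usage_count++;
}

/* Drop a reference; an unbalanced put is treated as a bug in this model. */
void toy_pm_put(struct toy_pm_dev *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/* Toy HBA with its own device and the W-LUN device. */
struct toy_hba {
	struct toy_pm_dev dev;
	struct toy_pm_dev wlun;
};

void toy_err_handling_prepare(struct toy_hba *hba)
{
	toy_pm_get(&hba->dev);	/* reference added by this commit */
	toy_pm_get(&hba->wlun);	/* models the existing ufshcd_rpm_get_sync() */
}

void toy_err_handling_unprepare(struct toy_hba *hba)
{
	toy_pm_put(&hba->wlun);	/* models the existing ufshcd_rpm_put() */
	toy_pm_put(&hba->dev);	/* matching put added by this commit */
}
```

In this model both counters return to their starting value after a prepare/unprepare pair, which is the property the matching get and put in the diff preserve.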
