Commit 8d355a4

Li Nan authored and liu-song-6 committed
md/raid10: Do not add spare disk when recovery fails
In raid10_sync_request(), if data cannot be read from any disk for recovery, it will go to 'giveup' and increment 'chunks_skipped'. After multiple 'giveup', when 'chunks_skipped >= geo.raid_disks', it will return 'max_sector', indicating that the recovery has been completed. However, the recovery is just aborted and the data remains inconsistent.

Fix it by setting mirror->recovery_disabled, which will prevent the spare disk from being added to this mirror. The same issue also exists during resync; it will be fixed afterwards.

Signed-off-by: Li Nan <linan122@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230602091839.743798-2-linan666@huaweicloud.com
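
The commit message says that setting mirror->recovery_disabled keeps the spare disk from being added to that mirror. The following is a minimal user-space sketch of how such a check could work, assuming the add-disk path skips any slot whose per-mirror counter equals the array-wide counter. The names model_mirror, model_array and model_pick_slot are hypothetical and are not kernel APIs; only the recovery_disabled field names come from the patch.

```c
#include <stddef.h>

#define MODEL_MAX_DISKS 4

/* Hypothetical stand-in for one entry of conf->mirrors[]. */
struct model_mirror {
	int has_rdev;          /* 1 if the slot already holds a device */
	int recovery_disabled; /* generation in which recovery last failed here */
};

/* Hypothetical stand-in for the array (mddev plus conf). */
struct model_array {
	int recovery_disabled; /* current array-wide generation counter */
	int raid_disks;
	struct model_mirror mirrors[MODEL_MAX_DISKS];
};

/*
 * Return the first slot a spare may occupy, or -1 if there is none.
 * Assumption: a slot is refused when its recovery_disabled value matches
 * the array-wide value, which is the state the patch sets on failure.
 */
static int model_pick_slot(const struct model_array *a)
{
	for (int i = 0; i < a->raid_disks && i < MODEL_MAX_DISKS; i++) {
		const struct model_mirror *m = &a->mirrors[i];

		if (m->recovery_disabled == a->recovery_disabled)
			continue; /* recovery already failed for this slot */
		if (m->has_rdev)
			continue; /* slot is occupied */
		return i;
	}
	return -1;
}
```

In this model, an empty slot is offered to a spare until its counter is set equal to the array-wide counter. After that the slot is skipped, so the same spare is not re-added to a mirror whose recovery just failed.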
1 parent 4d8a575 commit 8d355a4

1 file changed

Lines changed: 18 additions & 2 deletions

drivers/md/raid10.c

@@ -3311,6 +3311,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 	int chunks_skipped = 0;
 	sector_t chunk_mask = conf->geo.chunk_mask;
 	int page_idx = 0;
+	int error_disk = -1;
 
 	/*
 	 * Allow skipping a full rebuild for incremental assembly
@@ -3394,8 +3395,21 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 		return reshape_request(mddev, sector_nr, skipped);
 
 	if (chunks_skipped >= conf->geo.raid_disks) {
-		/* if there has been nothing to do on any drive,
-		 * then there is nothing to do at all..
+		pr_err("md/raid10:%s: %s fails\n", mdname(mddev),
+			test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");
+		if (error_disk >= 0 &&
+		    !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+			/*
+			 * recovery fails, set mirrors.recovery_disabled,
+			 * device shouldn't be added to there.
+			 */
+			conf->mirrors[error_disk].recovery_disabled =
+						mddev->recovery_disabled;
+			return 0;
+		}
+		/*
+		 * if there has been nothing to do on any drive,
+		 * then there is nothing to do at all.
 		 */
 		*skipped = 1;
 		return (max_sector - sector_nr) + sectors_skipped;
@@ -3646,6 +3660,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
 							       mdname(mddev));
 						mirror->recovery_disabled
 							= mddev->recovery_disabled;
+					} else {
+						error_disk = i;
 					}
 					put_buf(r10_bio);
 					if (rb2)
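
The decision added in the second hunk can be exercised outside the kernel with a small model. This is a sketch only: sync_model, model_sector_t and model_threshold_result are hypothetical names, the is_resync flag stands in for test_bit(MD_RECOVERY_SYNC, &mddev->recovery), and the function covers only the case where chunks_skipped has already reached geo.raid_disks.

```c
#include <stdbool.h>

#define MODEL_MAX_DISKS 8

typedef unsigned long long model_sector_t;

/* Hypothetical stand-in for the state touched by the patched branch. */
struct sync_model {
	int raid_disks;
	int array_recovery_disabled;                   /* mddev->recovery_disabled */
	int mirror_recovery_disabled[MODEL_MAX_DISKS]; /* conf->mirrors[i].recovery_disabled */
};

/*
 * Model of the branch taken once chunks_skipped >= raid_disks.
 * Returns 0 when a recovery is aborted; otherwise reports the rest of
 * the range as skipped, as the code did before the patch.
 */
static model_sector_t model_threshold_result(struct sync_model *m,
					     int error_disk, bool is_resync,
					     model_sector_t sector_nr,
					     model_sector_t max_sector,
					     model_sector_t sectors_skipped,
					     int *skipped)
{
	if (error_disk >= 0 && error_disk < MODEL_MAX_DISKS && !is_resync) {
		/* Recovery failed: mark the mirror so the spare is not re-added. */
		m->mirror_recovery_disabled[error_disk] =
			m->array_recovery_disabled;
		return 0;
	}
	/* Nothing to do on any drive (or resync, which this patch leaves as is). */
	*skipped = 1;
	return (max_sector - sector_nr) + sectors_skipped;
}
```

With error_disk left at -1 the model behaves like the pre-patch code and reports the remaining sectors as skipped. With a recorded error_disk during recovery it returns 0 and marks the mirror, which matches the intent described in the commit message. The resync case still takes the old path, consistent with the note that it will be fixed afterwards.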
