Commit 9164e4a

Merge branch 'md-suspend-rewrite' into md-next
From Yu Kuai, written by Song Liu

Recent tests with raid10 revealed many issues with the following scenarios:

- add or remove disks to the array
- issue io to the array

At first, we fixed each problem independently, accepting that io can run concurrently with array reconfiguration. However, with more issues being reported continuously, I am hoping to fix these problems thoroughly.

Refer to how the block layer protects io against queue reconfiguration (for example, changing the elevator):

blk_mq_freeze_queue
-> wait for all io to be done, and prevent new io from being dispatched
// reconfiguration
blk_mq_unfreeze_queue

I think we can do something similar to synchronize io with array reconfiguration.

Current synchronization works as follows. For the reconfiguration operation:

1. Hold 'reconfig_mutex';
2. Check that rdev can be added/removed; one condition is that there is no IO (for example, check nr_pending).
3. Do the actual operations to add/remove a rdev; one procedure is to set/clear a pointer to rdev.
4. Check if there is still no IO on this rdev; if not, revert the change.

IO path uses rcu_read_lock/unlock() to access rdev.

- rcu is used wrongly;
- There are lots of places involved where an old rdev can be read; however, many places don't handle the old value correctly;
- Between step 3 and 4, if new io is dispatched, NULL will be read for the rdev, and data will be lost if step 4 failed.

The new synchronization is similar to blk_mq_freeze_queue(). To add or remove a disk:

1. Suspend the array, that is, stop new IO from being dispatched and wait for inflight IO to finish.
2. Add or remove rdevs to array;
3. Resume the array;

IO path doesn't need to change for now, and all rcu implementation can be removed.

The main work is divided into 3 steps:

First, make sure the new apis to suspend the array are general:

- make sure suspend array will wait for io to be done (Done by [1]);
- make sure suspend array can be called for all personalities (Done by [2]);
- make sure suspend array can be called at any time (Done by [3]);
- make sure suspend array doesn't rely on 'reconfig_mutex' (PATCH 3-5);

Second, replace old apis with new apis (PATCH 6-16). Specifically, the synchronization is changed from:

  lock reconfig_mutex
  suspend array
  make changes
  resume array
  unlock reconfig_mutex

to:

  suspend array
  lock reconfig_mutex
  make changes
  unlock reconfig_mutex
  resume array

Finally, for the remaining paths that involve reconfiguration, suspend the array first (PATCH 11,12, [4] and PATCH 17).

Preparatory work:
[1] https://lore.kernel.org/all/20230621165110.1498313-1-yukuai1@huaweicloud.com/
[2] https://lore.kernel.org/all/20230628012931.88911-2-yukuai1@huaweicloud.com/
[3] https://lore.kernel.org/all/20230825030956.1527023-1-yukuai1@huaweicloud.com/
[4] https://lore.kernel.org/all/20230825031622.1530464-1-yukuai1@huaweicloud.com/

* md-suspend-rewrite:
  md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
  md: remove old apis to suspend the array
  md: suspend array in md_start_sync() if array need reconfiguration
  md/raid5: replace suspend with quiesce() callback
  md/md-linear: cleanup linear_add()
  md: cleanup mddev_create/destroy_serial_pool()
  md: use new apis to suspend array before mddev_create/destroy_serial_pool
  md: use new apis to suspend array for ioctls involed array reconfiguration
  md: use new apis to suspend array for adding/removing rdev from state_store()
  md: use new apis to suspend array for sysfs apis
  md/raid5: use new apis to suspend array
  md/raid5-cache: use new apis to suspend array
  md/md-bitmap: use new apis to suspend array for location_store()
  md/dm-raid: use new apis to suspend array
  md: add new helpers to suspend/resume and lock/unlock array
  md: add new helpers to suspend/resume array
  md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery()
  md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
  md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
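The suspend, lock, change, unlock, resume ordering described in the message can be sketched as a small single-threaded userspace model. This is only an illustration under stated assumptions: `struct toy_array` and every `toy_*` function are hypothetical names invented here, not kernel APIs, and draining in-flight IO inline stands in for the sleeping wait a real implementation would perform.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an md array; not the kernel's struct mddev. */
struct toy_array {
	int suspended;   /* > 0: new IO must not be dispatched */
	int active_io;   /* IO currently in flight */
	bool locked;     /* stands in for 'reconfig_mutex' */
	int nr_disks;    /* the state that reconfiguration changes */
};

/* IO entry point: refused while suspended (real code would block instead). */
bool toy_io_start(struct toy_array *a)
{
	if (a->suspended > 0)
		return false;
	a->active_io++;
	return true;
}

void toy_io_end(struct toy_array *a)
{
	assert(a->active_io > 0);
	a->active_io--;
}

/*
 * Stop new IO first, then drain in-flight IO. Completing the IO inline is
 * a single-threaded substitute for sleeping until it finishes.
 */
void toy_suspend(struct toy_array *a)
{
	a->suspended++;
	while (a->active_io > 0)
		toy_io_end(a);
}

void toy_resume(struct toy_array *a)
{
	assert(a->suspended > 0);
	a->suspended--;
}

void toy_lock(struct toy_array *a)
{
	assert(!a->locked);
	a->locked = true;
}

void toy_unlock(struct toy_array *a)
{
	assert(a->locked);
	a->locked = false;
}

/* New ordering from the message: suspend, lock, change, unlock, resume. */
int toy_add_disk(struct toy_array *a)
{
	toy_suspend(a);
	toy_lock(a);
	assert(a->active_io == 0);   /* no IO can observe the change */
	a->nr_disks++;
	toy_unlock(a);
	toy_resume(a);
	return a->nr_disks;
}
```

In this model an IO started before `toy_add_disk()` is drained before the disk count changes, and an IO attempted while the array is suspended is refused, so the "check, change, re-check, revert" sequence of the old scheme has no counterpart here.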
2 parents 9e55a22 + 2b16a52 commit 9164e4a

8 files changed

Lines changed: 226 additions & 204 deletions

drivers/md/dm-raid.c

Lines changed: 3 additions & 7 deletions
@@ -3244,7 +3244,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);

 	/* Has to be held on running the array */
-	mddev_lock_nointr(&rs->md);
+	mddev_suspend_and_lock_nointr(&rs->md);
 	r = md_run(&rs->md);
 	rs->md.in_sync = 0; /* Assume already marked dirty */
 	if (r) {
@@ -3268,7 +3268,6 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 		}
 	}

-	mddev_suspend(&rs->md);
 	set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags);

 	/* Try to adjust the raid4/5/6 stripe cache size to the stripe size */
@@ -3798,9 +3797,7 @@ static void raid_postsuspend(struct dm_target *ti)
 		if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
 			md_stop_writes(&rs->md);

-		mddev_lock_nointr(&rs->md);
-		mddev_suspend(&rs->md);
-		mddev_unlock(&rs->md);
+		mddev_suspend(&rs->md, false);
 	}
 }

@@ -4059,8 +4056,7 @@ static void raid_resume(struct dm_target *ti)
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 		mddev->ro = 0;
 		mddev->in_sync = 0;
-		mddev_resume(mddev);
-		mddev_unlock(mddev);
+		mddev_unlock_and_resume(mddev);
 	}
 }
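The combined helpers used in this diff, `mddev_suspend_and_lock()` and `mddev_unlock_and_resume()`, are defined elsewhere in the series and are not shown on this page. A plausible composition, written as a userspace sketch, is given here. Everything in it is an assumption for illustration: the `sk_*` names, the `fail_next_lock` test hook, and in particular the rollback of the suspend when taking the lock fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the array state touched by the helpers. */
struct sk_mddev {
	int suspended;         /* nesting count of suspend calls */
	bool locked;           /* stands in for 'reconfig_mutex' */
	bool fail_next_lock;   /* test hook: simulate an interrupted lock */
};

int sk_suspend(struct sk_mddev *m)
{
	m->suspended++;
	return 0;
}

void sk_resume(struct sk_mddev *m)
{
	assert(m->suspended > 0);
	m->suspended--;
}

/* Interruptible lock: may fail with -EINTR, like a signal during the wait. */
int sk_lock(struct sk_mddev *m)
{
	if (m->fail_next_lock) {
		m->fail_next_lock = false;
		return -EINTR;
	}
	assert(!m->locked);
	m->locked = true;
	return 0;
}

void sk_unlock(struct sk_mddev *m)
{
	assert(m->locked);
	m->locked = false;
}

/* Suspend first, then lock; undo the suspend if the lock cannot be taken. */
int sk_suspend_and_lock(struct sk_mddev *m)
{
	int ret = sk_suspend(m);

	if (ret)
		return ret;

	ret = sk_lock(m);
	if (ret)
		sk_resume(m);

	return ret;
}

/* Mirror image: release the lock before letting IO flow again. */
void sk_unlock_and_resume(struct sk_mddev *m)
{
	sk_unlock(m);
	sk_resume(m);
}
```

The point of the rollback is that a caller which sees an error from the combined helper holds neither the lock nor a suspend reference, so it can return immediately, as the `if (err)` paths in the callers on this page do.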

drivers/md/md-autodetect.c

Lines changed: 2 additions & 2 deletions
@@ -175,7 +175,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
 		return;
 	}

-	err = mddev_lock(mddev);
+	err = mddev_suspend_and_lock(mddev);
 	if (err) {
 		pr_err("md: failed to lock array %s\n", name);
 		goto out_mddev_put;
@@ -221,7 +221,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
 	if (err)
 		pr_warn("md: starting %s failed\n", name);
 out_unlock:
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 out_mddev_put:
 	mddev_put(mddev);
 }

drivers/md/md-bitmap.c

Lines changed: 8 additions & 10 deletions
@@ -1861,7 +1861,7 @@ void md_bitmap_destroy(struct mddev *mddev)

 	md_bitmap_wait_behind_writes(mddev);
 	if (!mddev->serialize_policy)
-		mddev_destroy_serial_pool(mddev, NULL, true);
+		mddev_destroy_serial_pool(mddev, NULL);

 	mutex_lock(&mddev->bitmap_info.mutex);
 	spin_lock(&mddev->lock);
@@ -1977,7 +1977,7 @@ int md_bitmap_load(struct mddev *mddev)
 		goto out;

 	rdev_for_each(rdev, mddev)
-		mddev_create_serial_pool(mddev, rdev, true);
+		mddev_create_serial_pool(mddev, rdev);

 	if (mddev_is_clustered(mddev))
 		md_cluster_ops->load_bitmaps(mddev, mddev->bitmap_info.nodes);
@@ -2348,11 +2348,10 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
 {
 	int rv;

-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;

-	mddev_suspend(mddev);
 	if (mddev->pers) {
 		if (mddev->recovery || mddev->sync_thread) {
 			rv = -EBUSY;
@@ -2429,8 +2428,7 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
 	}
 	rv = 0;
 out:
-	mddev_resume(mddev);
-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	if (rv)
 		return rv;
 	return len;
@@ -2539,7 +2537,7 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (backlog > COUNTER_MAX)
 		return -EINVAL;

-	rv = mddev_lock(mddev);
+	rv = mddev_suspend_and_lock(mddev);
 	if (rv)
 		return rv;

@@ -2564,16 +2562,16 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
 		if (!mddev->serialize_policy)
-			mddev_destroy_serial_pool(mddev, NULL, false);
+			mddev_destroy_serial_pool(mddev, NULL);
 	} else if (backlog && !mddev->serial_info_pool) {
 		/* serial_info_pool is needed since backlog is not zero */
 		rdev_for_each(rdev, mddev)
-			mddev_create_serial_pool(mddev, rdev, false);
+			mddev_create_serial_pool(mddev, rdev);
 	}
 	if (old_mwb != backlog)
 		md_bitmap_update_sb(mddev->bitmap);

-	mddev_unlock(mddev);
+	mddev_unlock_and_resume(mddev);
 	return len;
 }
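The serial-pool calls in this file lose their third argument. Assuming that argument told the helper whether the caller had already suspended the array (the diff does not show the helper bodies, so this is an inference from the call sites), the change in shape can be sketched in userspace as follows. `struct pool_dev`, `pool_create_old()` and `pool_create_new()` are hypothetical names, not the kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: just enough state to show who suspends the array. */
struct pool_dev {
	int suspended;       /* nesting count of suspend calls */
	int self_suspends;   /* how often the helper suspended on its own */
	bool has_pool;       /* whether the serial pool exists */
};

/*
 * Assumed old shape: a flag says whether the caller already suspended.
 * If not, the helper suspends and resumes around its own work.
 */
void pool_create_old(struct pool_dev *d, bool is_suspend)
{
	if (!is_suspend) {
		d->suspended++;
		d->self_suspends++;
	}

	d->has_pool = true;

	if (!is_suspend)
		d->suspended--;
}

/*
 * Assumed new shape: no flag. The caller must have suspended the array
 * before calling, which this sketch checks with an assertion.
 */
void pool_create_new(struct pool_dev *d)
{
	assert(d->suspended > 0);
	d->has_pool = true;
}
```

Under this reading, `backlog_store()` switching to `mddev_suspend_and_lock()` in the same diff is what allows the `false` argument to disappear: the suspend moves from inside the helper to the caller.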

drivers/md/md-linear.c

Lines changed: 0 additions & 2 deletions
@@ -183,7 +183,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
 	 * in linear_congested(), therefore kfree_rcu() is used to free
 	 * oldconf until no one uses it anymore.
 	 */
-	mddev_suspend(mddev);
 	oldconf = rcu_dereference_protected(mddev->private,
 			lockdep_is_held(&mddev->reconfig_mutex));
 	mddev->raid_disks++;
@@ -192,7 +191,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
 	rcu_assign_pointer(mddev->private, newconf);
 	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
 	set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
-	mddev_resume(mddev);
 	kfree_rcu(oldconf, rcu);
 	return 0;
 }
