Skip to content

Commit

Permalink
Merge branch 'md-suspend-rewrite' into md-next
Browse files Browse the repository at this point in the history
From Yu Kuai, written by Song Liu

Recent tests with raid10 revealed many issues with the following scenarios:

- add or remove disks to the array
- issue io to the array

At first, we fixed each problem independently respect that io can
concurrent with array reconfiguration. However, with more issues reported
continuously, I am hoping to fix these problems thoroughly.

Refer to how block layer protect io with queue reconfiguration (for
example, change elevator):

blk_mq_freeze_queue
-> wait for all io to be done, and prevent new io to be dispatched
// reconfiguration
blk_mq_unfreeze_queue

I think we can do something similar to synchronize io with array
reconfiguration.

Current synchronization works as the following. For the reconfiguration
operation:

1. Hold 'reconfig_mutex';
2. Check that rdev can be added/removed, one condition is that there is no
   IO (for example, check nr_pending).
3. Do the actual operations to add/remove a rdev, one procedure is
   set/clear a pointer to rdev.
4. Check if there is still no IO on this rdev, if not, revert the
   change.

IO path uses rcu_read_lock/unlock() to access rdev.

- rcu is used wrongly;
- There are lots of places involved that old rdev can be read, however,
many places doesn't handle old value correctly;
- Between step 3 and 4, if new io is dispatched, NULL will be read for
the rdev, and data will be lost if step 4 failed.

The new synchronization is similar to blk_mq_freeze_queue(). To add or
remove disk:

1. Suspend the array, that is, stop new IO from being dispatched
   and wait for inflight IO to finish.
2. Add or remove rdevs to array;
3. Resume the array;

IO path doesn't need to change for now, and all rcu implementation can
be removed.

Then main work is divided into 3 steps:

First, first make sure new apis to suspend the array is general:

- make sure suspend array will wait for io to be done(Done by [1]);
- make sure suspend array can be called for all personalities(Done by [2]);
- make sure suspend array can be called at any time(Done by [3]);
- make sure suspend array doesn't rely on 'reconfig_mutex'(PATCH 3-5);

Second replace old apis with new apis(PATCH 6-16). Specifically, the
synchronization is changed from:

  lock reconfig_mutex
  suspend array
  make changes
  resume array
  unlock reconfig_mutex

to:
   suspend array
   lock reconfig_mutex
   make changes
   unlock reconfig_mutex
   resume array

Finally, for the remain path that involved reconfiguration, suspend the
array first(PATCH 11,12, [4] and PATCH 17):

Preparatory work:
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
[3] https://lore.kernel.org/all/[email protected]/
[4] https://lore.kernel.org/all/[email protected]/

* md-suspend-rewrite:
  md: rename __mddev_suspend/resume() back to mddev_suspend/resume()
  md: remove old apis to suspend the array
  md: suspend array in md_start_sync() if array need reconfiguration
  md/raid5: replace suspend with quiesce() callback
  md/md-linear: cleanup linear_add()
  md: cleanup mddev_create/destroy_serial_pool()
  md: use new apis to suspend array before mddev_create/destroy_serial_pool
  md: use new apis to suspend array for ioctls involed array reconfiguration
  md: use new apis to suspend array for adding/removing rdev from state_store()
  md: use new apis to suspend array for sysfs apis
  md/raid5: use new apis to suspend array
  md/raid5-cache: use new apis to suspend array
  md/md-bitmap: use new apis to suspend array for location_store()
  md/dm-raid: use new apis to suspend array
  md: add new helpers to suspend/resume and lock/unlock array
  md: add new helpers to suspend/resume array
  md: replace is_md_suspended() with 'mddev->suspended' in md_check_recovery()
  md/raid5-cache: use READ_ONCE/WRITE_ONCE for 'conf->log'
  md: use READ_ONCE/WRITE_ONCE for 'suspend_lo' and 'suspend_hi'
  • Loading branch information
liu-song-6 committed Oct 11, 2023
2 parents 9e55a22 + 2b16a52 commit 9164e4a
Show file tree
Hide file tree
Showing 8 changed files with 226 additions and 204 deletions.
10 changes: 3 additions & 7 deletions drivers/md/dm-raid.c
Original file line number Diff line number Diff line change
Expand Up @@ -3244,7 +3244,7 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
set_bit(MD_RECOVERY_FROZEN, &rs->md.recovery);

/* Has to be held on running the array */
mddev_lock_nointr(&rs->md);
mddev_suspend_and_lock_nointr(&rs->md);
r = md_run(&rs->md);
rs->md.in_sync = 0; /* Assume already marked dirty */
if (r) {
Expand All @@ -3268,7 +3268,6 @@ static int raid_ctr(struct dm_target *ti, unsigned int argc, char **argv)
}
}

mddev_suspend(&rs->md);
set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags);

/* Try to adjust the raid4/5/6 stripe cache size to the stripe size */
Expand Down Expand Up @@ -3798,9 +3797,7 @@ static void raid_postsuspend(struct dm_target *ti)
if (!test_bit(MD_RECOVERY_FROZEN, &rs->md.recovery))
md_stop_writes(&rs->md);

mddev_lock_nointr(&rs->md);
mddev_suspend(&rs->md);
mddev_unlock(&rs->md);
mddev_suspend(&rs->md, false);
}
}

Expand Down Expand Up @@ -4059,8 +4056,7 @@ static void raid_resume(struct dm_target *ti)
clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
mddev->ro = 0;
mddev->in_sync = 0;
mddev_resume(mddev);
mddev_unlock(mddev);
mddev_unlock_and_resume(mddev);
}
}

Expand Down
4 changes: 2 additions & 2 deletions drivers/md/md-autodetect.c
Original file line number Diff line number Diff line change
Expand Up @@ -175,7 +175,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
return;
}

err = mddev_lock(mddev);
err = mddev_suspend_and_lock(mddev);
if (err) {
pr_err("md: failed to lock array %s\n", name);
goto out_mddev_put;
Expand Down Expand Up @@ -221,7 +221,7 @@ static void __init md_setup_drive(struct md_setup_args *args)
if (err)
pr_warn("md: starting %s failed\n", name);
out_unlock:
mddev_unlock(mddev);
mddev_unlock_and_resume(mddev);
out_mddev_put:
mddev_put(mddev);
}
Expand Down
18 changes: 8 additions & 10 deletions drivers/md/md-bitmap.c
Original file line number Diff line number Diff line change
Expand Up @@ -1861,7 +1861,7 @@ void md_bitmap_destroy(struct mddev *mddev)

md_bitmap_wait_behind_writes(mddev);
if (!mddev->serialize_policy)
mddev_destroy_serial_pool(mddev, NULL, true);
mddev_destroy_serial_pool(mddev, NULL);

mutex_lock(&mddev->bitmap_info.mutex);
spin_lock(&mddev->lock);
Expand Down Expand Up @@ -1977,7 +1977,7 @@ int md_bitmap_load(struct mddev *mddev)
goto out;

rdev_for_each(rdev, mddev)
mddev_create_serial_pool(mddev, rdev, true);
mddev_create_serial_pool(mddev, rdev);

if (mddev_is_clustered(mddev))
md_cluster_ops->load_bitmaps(mddev, mddev->bitmap_info.nodes);
Expand Down Expand Up @@ -2348,11 +2348,10 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
{
int rv;

rv = mddev_lock(mddev);
rv = mddev_suspend_and_lock(mddev);
if (rv)
return rv;

mddev_suspend(mddev);
if (mddev->pers) {
if (mddev->recovery || mddev->sync_thread) {
rv = -EBUSY;
Expand Down Expand Up @@ -2429,8 +2428,7 @@ location_store(struct mddev *mddev, const char *buf, size_t len)
}
rv = 0;
out:
mddev_resume(mddev);
mddev_unlock(mddev);
mddev_unlock_and_resume(mddev);
if (rv)
return rv;
return len;
Expand Down Expand Up @@ -2539,7 +2537,7 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
if (backlog > COUNTER_MAX)
return -EINVAL;

rv = mddev_lock(mddev);
rv = mddev_suspend_and_lock(mddev);
if (rv)
return rv;

Expand All @@ -2564,16 +2562,16 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
if (!backlog && mddev->serial_info_pool) {
/* serial_info_pool is not needed if backlog is zero */
if (!mddev->serialize_policy)
mddev_destroy_serial_pool(mddev, NULL, false);
mddev_destroy_serial_pool(mddev, NULL);
} else if (backlog && !mddev->serial_info_pool) {
/* serial_info_pool is needed since backlog is not zero */
rdev_for_each(rdev, mddev)
mddev_create_serial_pool(mddev, rdev, false);
mddev_create_serial_pool(mddev, rdev);
}
if (old_mwb != backlog)
md_bitmap_update_sb(mddev->bitmap);

mddev_unlock(mddev);
mddev_unlock_and_resume(mddev);
return len;
}

Expand Down
2 changes: 0 additions & 2 deletions drivers/md/md-linear.c
Original file line number Diff line number Diff line change
Expand Up @@ -183,7 +183,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
* in linear_congested(), therefore kfree_rcu() is used to free
* oldconf until no one uses it anymore.
*/
mddev_suspend(mddev);
oldconf = rcu_dereference_protected(mddev->private,
lockdep_is_held(&mddev->reconfig_mutex));
mddev->raid_disks++;
Expand All @@ -192,7 +191,6 @@ static int linear_add(struct mddev *mddev, struct md_rdev *rdev)
rcu_assign_pointer(mddev->private, newconf);
md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
set_capacity_and_notify(mddev->gendisk, mddev->array_sectors);
mddev_resume(mddev);
kfree_rcu(oldconf, rcu);
return 0;
}
Expand Down
Loading

0 comments on commit 9164e4a

Please sign in to comment.