Skip to content

Commit

Permalink
Merge branch 'for-linus' of git://neil.brown.name/md
Browse files Browse the repository at this point in the history
* 'for-linus' of git://neil.brown.name/md: (53 commits)
  md/raid5 revise rules for when to update metadata during reshape
  md/raid5: minor code cleanups in make_request.
  md: remove CONFIG_MD_RAID_RESHAPE config option.
  md/raid5: be more careful about write ordering when reshaping.
  md: don't display meaningless values in sysfs files resync_start and sync_speed
  md/raid5: allow layout and chunksize to be changed on active array.
  md/raid5: reshape using largest of old and new chunk size
  md/raid5: prepare for allowing reshape to change layout
  md/raid5: prepare for allowing reshape to change chunksize.
  md/raid5: clearly differentiate 'before' and 'after' stripes during reshape.
  Documentation/md.txt update
  md: allow number of drives in raid5 to be reduced
  md/raid5: change reshape-progress measurement to cope with reshaping backwards.
  md: add explicit method to signal the end of a reshape.
  md/raid5: enhance raid5_size to work correctly with negative delta_disks
  md/raid5: drop qd_idx from r6_state
  md/raid6: move raid6 data processing to raid6_pq.ko
  md: raid5 run(): Fix max_degraded for raid level 4.
  md: 'array_size' sysfs attribute
  md: centralize ->array_sectors modifications
  ...
  • Loading branch information
torvalds committed Apr 3, 2009
2 parents 31e6e2d + c8f517c commit 223cdea
Show file tree
Hide file tree
Showing 39 changed files with 2,002 additions and 860 deletions.
37 changes: 30 additions & 7 deletions Documentation/md.txt
Original file line number Diff line number Diff line change
Expand Up @@ -164,15 +164,19 @@ All md devices contain:
raid_disks
a text file with a simple number indicating the number of devices
in a fully functional array. If this is not yet known, the file
will be empty. If an array is being resized (not currently
possible) this will contain the larger of the old and new sizes.
Some raid level (RAID1) allow this value to be set while the
array is active. This will reconfigure the array. Otherwise
it can only be set while assembling an array.
will be empty. If an array is being resized this will contain
the new number of devices.
Some raid levels allow this value to be set while the array is
active. This will reconfigure the array. Otherwise it can only
be set while assembling an array.
A change to this attribute will not be permitted if it would
reduce the size of the array. To reduce the number of drives
in an e.g. raid5, the array size must first be reduced by
setting the 'array_size' attribute.

chunk_size
This is the size if bytes for 'chunks' and is only relevant to
raid levels that involve striping (1,4,5,6,10). The address space
This is the size in bytes for 'chunks' and is only relevant to
raid levels that involve striping (0,4,5,6,10). The address space
of the array is conceptually divided into chunks and consecutive
chunks are striped onto neighbouring devices.
The size should be at least PAGE_SIZE (4k) and should be a power
Expand All @@ -183,6 +187,20 @@ All md devices contain:
simply a number that is interpretted differently by different
levels. It can be written while assembling an array.

array_size
This can be used to artificially constrain the available space in
the array to be less than is actually available on the combined
devices. Writing a number (in Kilobytes) which is less than
the available size will set the size. Any reconfiguration of the
array (e.g. adding devices) will not cause the size to change.
Writing the word 'default' will cause the effective size of the
array to be whatever size is actually available based on
'level', 'chunk_size' and 'component_size'.

This can be used to reduce the size of the array before reducing
the number of devices in a raid4/5/6, or to support external
metadata formats which mandate such clipping.

reshape_position
This is either "none" or a sector number within the devices of
the array where "reshape" is up to. If this is set, the three
Expand All @@ -207,6 +225,11 @@ All md devices contain:
about the array. It can be 0.90 (traditional format), 1.0, 1.1,
1.2 (newer format in varying locations) or "none" indicating that
the kernel isn't managing metadata at all.
Alternately it can be "external:" followed by a string which
is set by user-space. This indicates that metadata is managed
by a user-space program. Any device failure or other event that
requires a metadata update will cause array activity to be
suspended until the event is acknowledged.

resync_start
The point at which resync should start. If no resync is needed,
Expand Down
2 changes: 1 addition & 1 deletion crypto/xor.c
Original file line number Diff line number Diff line change
Expand Up @@ -18,8 +18,8 @@

#define BH_TRACE 0
#include <linux/module.h>
#include <linux/raid/md.h>
#include <linux/raid/xor.h>
#include <linux/jiffies.h>
#include <asm/xor.h>

/* The xor routines to use. */
Expand Down
31 changes: 3 additions & 28 deletions drivers/md/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,7 @@ config MD_RAID10
config MD_RAID456
tristate "RAID-4/RAID-5/RAID-6 mode"
depends on BLK_DEV_MD
select MD_RAID6_PQ
select ASYNC_MEMCPY
select ASYNC_XOR
---help---
Expand Down Expand Up @@ -151,34 +152,8 @@ config MD_RAID456

If unsure, say Y.

config MD_RAID5_RESHAPE
bool "Support adding drives to a raid-5 array"
depends on MD_RAID456
default y
---help---
A RAID-5 set can be expanded by adding extra drives. This
requires "restriping" the array which means (almost) every
block must be written to a different place.

This option allows such restriping to be done while the array
is online.

You will need mdadm version 2.4.1 or later to use this
feature safely. During the early stage of reshape there is
a critical section where live data is being over-written. A
crash during this time needs extra care for recovery. The
newer mdadm takes a copy of the data in the critical section
and will restore it, if necessary, after a crash.

The mdadm usage is e.g.
mdadm --grow /dev/md1 --raid-disks=6
to grow '/dev/md1' to having 6 disks.

Note: The array can only be expanded, not contracted.
There should be enough spares already present to make the new
array workable.

If unsure, say Y.
config MD_RAID6_PQ
tristate

config MD_MULTIPATH
tristate "Multipath I/O support"
Expand Down
16 changes: 9 additions & 7 deletions drivers/md/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -2,20 +2,21 @@
# Makefile for the kernel software RAID and LVM drivers.
#

dm-mod-objs := dm.o dm-table.o dm-target.o dm-linear.o dm-stripe.o \
dm-mod-y += dm.o dm-table.o dm-target.o dm-linear.o dm-stripe.o \
dm-ioctl.o dm-io.o dm-kcopyd.o dm-sysfs.o
dm-multipath-objs := dm-path-selector.o dm-mpath.o
dm-snapshot-objs := dm-snap.o dm-exception-store.o dm-snap-transient.o \
dm-multipath-y += dm-path-selector.o dm-mpath.o
dm-snapshot-y += dm-snap.o dm-exception-store.o dm-snap-transient.o \
dm-snap-persistent.o
dm-mirror-objs := dm-raid1.o
md-mod-objs := md.o bitmap.o
raid456-objs := raid5.o raid6algos.o raid6recov.o raid6tables.o \
dm-mirror-y += dm-raid1.o
md-mod-y += md.o bitmap.o
raid456-y += raid5.o
raid6_pq-y += raid6algos.o raid6recov.o raid6tables.o \
raid6int1.o raid6int2.o raid6int4.o \
raid6int8.o raid6int16.o raid6int32.o \
raid6altivec1.o raid6altivec2.o raid6altivec4.o \
raid6altivec8.o \
raid6mmx.o raid6sse1.o raid6sse2.o
hostprogs-y := mktables
hostprogs-y += mktables

# Note: link order is important. All raid personalities
# and must come before md.o, as they each initialise
Expand All @@ -26,6 +27,7 @@ obj-$(CONFIG_MD_LINEAR) += linear.o
obj-$(CONFIG_MD_RAID0) += raid0.o
obj-$(CONFIG_MD_RAID1) += raid1.o
obj-$(CONFIG_MD_RAID10) += raid10.o
obj-$(CONFIG_MD_RAID6_PQ) += raid6_pq.o
obj-$(CONFIG_MD_RAID456) += raid456.o
obj-$(CONFIG_MD_MULTIPATH) += multipath.o
obj-$(CONFIG_MD_FAULTY) += faulty.o
Expand Down
49 changes: 39 additions & 10 deletions drivers/md/bitmap.c
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
* wait if count gets too high, wake when it drops to half.
*/

#include <linux/blkdev.h>
#include <linux/module.h>
#include <linux/errno.h>
#include <linux/slab.h>
Expand All @@ -26,8 +27,8 @@
#include <linux/file.h>
#include <linux/mount.h>
#include <linux/buffer_head.h>
#include <linux/raid/md.h>
#include <linux/raid/bitmap.h>
#include "md.h"
#include "bitmap.h"

/* debug macros */

Expand Down Expand Up @@ -111,9 +112,10 @@ static int bitmap_checkpage(struct bitmap *bitmap, unsigned long page, int creat
unsigned char *mappage;

if (page >= bitmap->pages) {
printk(KERN_ALERT
"%s: invalid bitmap page request: %lu (> %lu)\n",
bmname(bitmap), page, bitmap->pages-1);
/* This can happen if bitmap_start_sync goes beyond
* End-of-device while looking for a whole page.
* It is harmless.
*/
return -EINVAL;
}

Expand Down Expand Up @@ -265,7 +267,6 @@ static mdk_rdev_t *next_active_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
list_for_each_continue_rcu(pos, &mddev->disks) {
rdev = list_entry(pos, mdk_rdev_t, same_set);
if (rdev->raid_disk >= 0 &&
test_bit(In_sync, &rdev->flags) &&
!test_bit(Faulty, &rdev->flags)) {
/* this is a usable devices */
atomic_inc(&rdev->nr_pending);
Expand Down Expand Up @@ -297,7 +298,7 @@ static int write_sb_page(struct bitmap *bitmap, struct page *page, int wait)
+ size/512 > 0)
/* bitmap runs in to metadata */
goto bad_alignment;
if (rdev->data_offset + mddev->size*2
if (rdev->data_offset + mddev->dev_sectors
> rdev->sb_start + bitmap->offset)
/* data runs in to bitmap */
goto bad_alignment;
Expand Down Expand Up @@ -570,7 +571,7 @@ static int bitmap_read_sb(struct bitmap *bitmap)
else if (le32_to_cpu(sb->version) < BITMAP_MAJOR_LO ||
le32_to_cpu(sb->version) > BITMAP_MAJOR_HI)
reason = "unrecognized superblock version";
else if (chunksize < PAGE_SIZE)
else if (chunksize < 512)
reason = "bitmap chunksize too small";
else if ((1 << ffz(~chunksize)) != chunksize)
reason = "bitmap chunksize not a power of 2";
Expand Down Expand Up @@ -1306,6 +1307,9 @@ void bitmap_endwrite(struct bitmap *bitmap, sector_t offset, unsigned long secto
PRINTK(KERN_DEBUG "dec write-behind count %d/%d\n",
atomic_read(&bitmap->behind_writes), bitmap->max_write_behind);
}
if (bitmap->mddev->degraded)
/* Never clear bits or update events_cleared when degraded */
success = 0;

while (sectors) {
int blocks;
Expand Down Expand Up @@ -1345,8 +1349,8 @@ void bitmap_endwrite(struct bitmap *bitmap, sector_t offset, unsigned long secto
}
}

int bitmap_start_sync(struct bitmap *bitmap, sector_t offset, int *blocks,
int degraded)
static int __bitmap_start_sync(struct bitmap *bitmap, sector_t offset, int *blocks,
int degraded)
{
bitmap_counter_t *bmc;
int rv;
Expand Down Expand Up @@ -1374,6 +1378,29 @@ int bitmap_start_sync(struct bitmap *bitmap, sector_t offset, int *blocks,
return rv;
}

int bitmap_start_sync(struct bitmap *bitmap, sector_t offset, int *blocks,
int degraded)
{
/* bitmap_start_sync must always report on multiples of whole
* pages, otherwise resync (which is very PAGE_SIZE based) will
* get confused.
* So call __bitmap_start_sync repeatedly (if needed) until
* At least PAGE_SIZE>>9 blocks are covered.
* Return the 'or' of the result.
*/
int rv = 0;
int blocks1;

*blocks = 0;
while (*blocks < (PAGE_SIZE>>9)) {
rv |= __bitmap_start_sync(bitmap, offset,
&blocks1, degraded);
offset += blocks1;
*blocks += blocks1;
}
return rv;
}

void bitmap_end_sync(struct bitmap *bitmap, sector_t offset, int *blocks, int aborted)
{
bitmap_counter_t *bmc;
Expand Down Expand Up @@ -1443,6 +1470,8 @@ void bitmap_cond_end_sync(struct bitmap *bitmap, sector_t sector)
wait_event(bitmap->mddev->recovery_wait,
atomic_read(&bitmap->mddev->recovery_active) == 0);

bitmap->mddev->curr_resync_completed = bitmap->mddev->curr_resync;
set_bit(MD_CHANGE_CLEAN, &bitmap->mddev->flags);
sector &= ~((1ULL << CHUNK_BLOCK_SHIFT(bitmap)) - 1);
s = 0;
while (s < sector && s < bitmap->mddev->resync_max_sectors) {
Expand Down
File renamed without changes.
19 changes: 17 additions & 2 deletions drivers/md/faulty.c
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,10 @@
#define ModeShift 5

#define MaxFault 50
#include <linux/raid/md.h>
#include <linux/blkdev.h>
#include <linux/raid/md_u.h>
#include "md.h"
#include <linux/seq_file.h>


static void faulty_fail(struct bio *bio, int error)
Expand Down Expand Up @@ -280,6 +283,17 @@ static int reconfig(mddev_t *mddev, int layout, int chunk_size)
return 0;
}

static sector_t faulty_size(mddev_t *mddev, sector_t sectors, int raid_disks)
{
WARN_ONCE(raid_disks,
"%s does not support generic reshape\n", __func__);

if (sectors == 0)
return mddev->dev_sectors;

return sectors;
}

static int run(mddev_t *mddev)
{
mdk_rdev_t *rdev;
Expand All @@ -298,7 +312,7 @@ static int run(mddev_t *mddev)
list_for_each_entry(rdev, &mddev->disks, same_set)
conf->rdev = rdev;

mddev->array_sectors = mddev->size * 2;
md_set_array_sectors(mddev, faulty_size(mddev, 0, 0));
mddev->private = conf;

reconfig(mddev, mddev->layout, -1);
Expand All @@ -325,6 +339,7 @@ static struct mdk_personality faulty_personality =
.stop = stop,
.status = status,
.reconfig = reconfig,
.size = faulty_size,
};

static int __init raid_init(void)
Expand Down
25 changes: 20 additions & 5 deletions drivers/md/linear.c
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,11 @@
Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/

#include <linux/raid/linear.h>
#include <linux/blkdev.h>
#include <linux/raid/md_u.h>
#include <linux/seq_file.h>
#include "md.h"
#include "linear.h"

/*
* find which device holds a particular offset
Expand Down Expand Up @@ -97,6 +101,16 @@ static int linear_congested(void *data, int bits)
return ret;
}

static sector_t linear_size(mddev_t *mddev, sector_t sectors, int raid_disks)
{
linear_conf_t *conf = mddev_to_conf(mddev);

WARN_ONCE(sectors || raid_disks,
"%s does not support generic reshape\n", __func__);

return conf->array_sectors;
}

static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
{
linear_conf_t *conf;
Expand Down Expand Up @@ -135,8 +149,8 @@ static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
mddev->queue->max_sectors > (PAGE_SIZE>>9))
blk_queue_max_sectors(mddev->queue, PAGE_SIZE>>9);

disk->num_sectors = rdev->size * 2;
conf->array_sectors += rdev->size * 2;
disk->num_sectors = rdev->sectors;
conf->array_sectors += rdev->sectors;

cnt++;
}
Expand Down Expand Up @@ -249,7 +263,7 @@ static int linear_run (mddev_t *mddev)
if (!conf)
return 1;
mddev->private = conf;
mddev->array_sectors = conf->array_sectors;
md_set_array_sectors(mddev, linear_size(mddev, 0, 0));

blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec);
mddev->queue->unplug_fn = linear_unplug;
Expand Down Expand Up @@ -283,7 +297,7 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev)
newconf->prev = mddev_to_conf(mddev);
mddev->private = newconf;
mddev->raid_disks++;
mddev->array_sectors = newconf->array_sectors;
md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
set_capacity(mddev->gendisk, mddev->array_sectors);
return 0;
}
Expand Down Expand Up @@ -381,6 +395,7 @@ static struct mdk_personality linear_personality =
.stop = linear_stop,
.status = linear_status,
.hot_add_disk = linear_add,
.size = linear_size,
};

static int __init linear_init (void)
Expand Down
2 changes: 0 additions & 2 deletions include/linux/raid/linear.h → drivers/md/linear.h
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
#ifndef _LINEAR_H
#define _LINEAR_H

#include <linux/raid/md.h>

struct dev_info {
mdk_rdev_t *rdev;
sector_t num_sectors;
Expand Down
Loading

0 comments on commit 223cdea

Please sign in to comment.