Merge tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - Fix DM multipath by restoring full path selector functionality for
   bio-based configurations that don't have a SCSI device handler.

 - Fix dm-btree removal to ensure non-root btree nodes have at least
   (max_entries / 3) entries. This resolves the userspace thin_check
   utility's report of "too few entries in btree_node".

 - Fix both the DM thin-provisioning and dm-clone targets to properly
   flush the data device prior to metadata commit. This resolves the
   potential for inconsistency across a power loss event when the data
   device has a volatile writeback cache.

 - Small documentation fixes to dm-clone and dm-integrity.
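To see why a rebalance threshold matters for the (max_entries / 3) bound, here is a small standalone model. It assumes, as a simplification, that two siblings holding `total` entries are split evenly and that the pending removal then takes one entry from the smaller child. The two thresholds compared in the usage note (2 * (max_entries / 3) + 1 and 2 * (max_entries / 3 + 1)) are illustrative candidates, and every function name here is hypothetical rather than taken from dm-btree.

```c
#include <assert.h>

/*
 * Minimum number of entries a non-root node must hold, per the commit
 * message: max_entries / 3.
 */
static unsigned int min_entries(unsigned int max_entries)
{
	return max_entries / 3;
}

/*
 * Simplified model of the "redistribute" case: two siblings holding
 * `total` entries are split evenly, so the smaller child gets total / 2.
 * The pending removal then takes one entry from that child.
 */
static unsigned int smaller_child_after_removal(unsigned int total)
{
	assert(total >= 2);
	return total / 2 - 1;
}

/*
 * Worst case over every total the redistribute case accepts, i.e.
 * threshold <= total <= 2 * max_entries (two full siblings).
 */
static unsigned int worst_case(unsigned int threshold, unsigned int max_entries)
{
	unsigned int worst = max_entries;

	for (unsigned int total = threshold; total <= 2 * max_entries; total++) {
		unsigned int left = smaller_child_after_removal(total);

		if (left < worst)
			worst = left;
	}
	return worst;
}

/* Does a given threshold keep every child at or above the minimum? */
static int threshold_is_safe(unsigned int threshold, unsigned int max_entries)
{
	return worst_case(threshold, max_entries) >= min_entries(max_entries);
}
```

For max_entries = 126 (minimum 42), this model shows a threshold of 85 can leave a child with 41 entries after the removal, while a threshold of 86 keeps every child at 42 or more. Raising the threshold by one guarantees each half starts with at least (max_entries / 3) + 1 entries, so losing one still satisfies the bound.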

* tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  docs: dm-integrity: remove reference to ARC4
  dm thin: Flush data device before committing metadata
  dm thin metadata: Add support for a pre-commit callback
  dm clone: Flush destination device before committing metadata
  dm clone metadata: Use a two phase commit
  dm clone metadata: Track exact changes per transaction
  dm btree: increase rebalance threshold in __rebalance2()
  dm: add dm-clone to the documentation index
  dm mpath: remove harmful bio-based optimization
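For dm-thin, the shortlog describes adding a pre-commit callback to the metadata layer so that the data device can be flushed before metadata is committed. The sketch below illustrates that general pattern in user space. All types and functions (`pool_md`, `register_pre_commit`, `pool_commit`, `data_dev`, `flush_data_dev`) are hypothetical, and a counter stands in for both the volatile cache and the metadata write.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: 0 on success, negative errno on failure. */
typedef int (*pre_commit_fn)(void *context);

/* Hypothetical, heavily simplified stand-in for the pool's metadata object. */
struct pool_md {
	pre_commit_fn pre_commit;   /* run before any metadata is written */
	void *pre_commit_context;
	unsigned int commits;       /* number of successful metadata commits */
};

static void register_pre_commit(struct pool_md *md, pre_commit_fn fn,
				void *context)
{
	assert(md != NULL);
	md->pre_commit = fn;
	md->pre_commit_context = context;
}

/* Commit metadata only if the pre-commit callback (if any) succeeded. */
static int pool_commit(struct pool_md *md)
{
	if (md->pre_commit) {
		int r = md->pre_commit(md->pre_commit_context);

		if (r)
			return r;  /* data may not be durable: keep old metadata */
	}
	md->commits++;             /* stand-in for writing the new metadata */
	return 0;
}

/* Hypothetical data device with a volatile writeback cache. */
struct data_dev {
	unsigned int cached_writes;  /* writes not yet on stable media */
	int flush_fails;             /* simulate a flush error */
};

/* Pre-commit callback: "flush" the data device's cache. */
static int flush_data_dev(void *context)
{
	struct data_dev *dev = context;

	if (dev->flush_fails)
		return -EIO;
	dev->cached_writes = 0;
	return 0;
}
```

The callback keeps the metadata layer unaware of the data device: it only guarantees that whatever the target registered has succeeded before a new transaction is written, and a failed flush leaves the previous metadata in place.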
torvalds committed Dec 13, 2019
2 parents 22ff311 + 7fc979f commit 15da849
Showing 10 changed files with 248 additions and 84 deletions.
2 changes: 1 addition & 1 deletion Documentation/admin-guide/device-mapper/dm-integrity.rst
@@ -144,7 +144,7 @@ journal_crypt:algorithm(:key) (the key is optional)
Encrypt the journal using given algorithm to make sure that the
attacker can't read the journal. You can use a block cipher here
(such as "cbc(aes)") or a stream cipher (for example "chacha20",
"salsa20", "ctr(aes)" or "ecb(arc4)").
"salsa20" or "ctr(aes)").

The journal contains history of last writes to the block device,
an attacker reading the journal could see the last sector numbers
1 change: 1 addition & 0 deletions Documentation/admin-guide/device-mapper/index.rst
@@ -8,6 +8,7 @@ Device Mapper
cache-policies
cache
delay
dm-clone
dm-crypt
dm-dust
dm-flakey
136 changes: 99 additions & 37 deletions drivers/md/dm-clone-metadata.c
@@ -67,23 +67,34 @@ struct superblock_disk {
* To save constantly doing look ups on disk we keep an in core copy of the
* on-disk bitmap, the region_map.
*
* To further reduce metadata I/O overhead we use a second bitmap, the dmap
* (dirty bitmap), which tracks the dirty words, i.e. longs, of the region_map.
* In order to track which regions are hydrated during a metadata transaction,
* we use a second set of bitmaps, the dmap (dirty bitmap), which includes two
* bitmaps, namely dirty_regions and dirty_words. The dirty_regions bitmap
* tracks the regions that got hydrated during the current metadata
* transaction. The dirty_words bitmap tracks the dirty words, i.e. longs, of
* the dirty_regions bitmap.
*
* This allows us to precisely track the regions that were hydrated during the
* current metadata transaction and update the metadata accordingly, when we
* commit the current transaction. This is important because dm-clone should
* only commit the metadata of regions that were properly flushed to the
* destination device beforehand. Otherwise, in case of a crash, we could end
* up with a corrupted dm-clone device.
*
* When a region finishes hydrating dm-clone calls
* dm_clone_set_region_hydrated(), or for discard requests
* dm_clone_cond_set_range(), which sets the corresponding bits in region_map
* and dmap.
*
* During a metadata commit we scan the dmap for dirty region_map words (longs)
* and update accordingly the on-disk metadata. Thus, we don't have to flush to
* disk the whole region_map. We can just flush the dirty region_map words.
* During a metadata commit we scan dmap->dirty_words and dmap->dirty_regions
* and update the on-disk metadata accordingly. Thus, we don't have to flush to
* disk the whole region_map. We can just flush the dirty region_map bits.
*
* We use a dirty bitmap, which is smaller than the original region_map, to
* reduce the amount of memory accesses during a metadata commit. As dm-bitset
* accesses the on-disk bitmap in 64-bit word granularity, there is no
* significant benefit in tracking the dirty region_map bits with a smaller
* granularity.
* We use the helper dmap->dirty_words bitmap, which is smaller than the
* original region_map, to reduce the amount of memory accesses during a
* metadata commit. Moreover, as dm-bitset also accesses the on-disk bitmap in
* 64-bit word granularity, the dirty_words bitmap helps us avoid useless disk
* accesses.
*
* We could update directly the on-disk bitmap, when dm-clone calls either
* dm_clone_set_region_hydrated() or dm_clone_cond_set_range(), but this
@@ -92,12 +103,13 @@ struct superblock_disk {
* e.g., in a hooked overwrite bio's completion routine, and further reduce the
* I/O completion latency.
*
* We maintain two dirty bitmaps. During a metadata commit we atomically swap
* the currently used dmap with the unused one. This allows the metadata update
* functions to run concurrently with an ongoing commit.
* We maintain two dirty bitmap sets. During a metadata commit we atomically
* swap the currently used dmap with the unused one. This allows the metadata
* update functions to run concurrently with an ongoing commit.
*/
struct dirty_map {
unsigned long *dirty_words;
unsigned long *dirty_regions;
unsigned int changed;
};

@@ -115,6 +127,9 @@ struct dm_clone_metadata {
struct dirty_map dmap[2];
struct dirty_map *current_dmap;

/* Protected by lock */
struct dirty_map *committing_dmap;

/*
* In core copy of the on-disk bitmap to save constantly doing look ups
* on disk.
@@ -461,34 +476,53 @@ static size_t bitmap_size(unsigned long nr_bits)
return BITS_TO_LONGS(nr_bits) * sizeof(long);
}

static int dirty_map_init(struct dm_clone_metadata *cmd)
static int __dirty_map_init(struct dirty_map *dmap, unsigned long nr_words,
unsigned long nr_regions)
{
cmd->dmap[0].changed = 0;
cmd->dmap[0].dirty_words = kvzalloc(bitmap_size(cmd->nr_words), GFP_KERNEL);
dmap->changed = 0;

if (!cmd->dmap[0].dirty_words) {
DMERR("Failed to allocate dirty bitmap");
dmap->dirty_words = kvzalloc(bitmap_size(nr_words), GFP_KERNEL);
if (!dmap->dirty_words)
return -ENOMEM;

dmap->dirty_regions = kvzalloc(bitmap_size(nr_regions), GFP_KERNEL);
if (!dmap->dirty_regions) {
kvfree(dmap->dirty_words);
return -ENOMEM;
}

cmd->dmap[1].changed = 0;
cmd->dmap[1].dirty_words = kvzalloc(bitmap_size(cmd->nr_words), GFP_KERNEL);
return 0;
}

static void __dirty_map_exit(struct dirty_map *dmap)
{
kvfree(dmap->dirty_words);
kvfree(dmap->dirty_regions);
}

static int dirty_map_init(struct dm_clone_metadata *cmd)
{
if (__dirty_map_init(&cmd->dmap[0], cmd->nr_words, cmd->nr_regions)) {
DMERR("Failed to allocate dirty bitmap");
return -ENOMEM;
}

if (!cmd->dmap[1].dirty_words) {
if (__dirty_map_init(&cmd->dmap[1], cmd->nr_words, cmd->nr_regions)) {
DMERR("Failed to allocate dirty bitmap");
kvfree(cmd->dmap[0].dirty_words);
__dirty_map_exit(&cmd->dmap[0]);
return -ENOMEM;
}

cmd->current_dmap = &cmd->dmap[0];
cmd->committing_dmap = NULL;

return 0;
}

static void dirty_map_exit(struct dm_clone_metadata *cmd)
{
kvfree(cmd->dmap[0].dirty_words);
kvfree(cmd->dmap[1].dirty_words);
__dirty_map_exit(&cmd->dmap[0]);
__dirty_map_exit(&cmd->dmap[1]);
}

static int __load_bitset_in_core(struct dm_clone_metadata *cmd)
@@ -633,21 +667,23 @@ unsigned long dm_clone_find_next_unhydrated_region(struct dm_clone_metadata *cmd
return find_next_zero_bit(cmd->region_map, cmd->nr_regions, start);
}

static int __update_metadata_word(struct dm_clone_metadata *cmd, unsigned long word)
static int __update_metadata_word(struct dm_clone_metadata *cmd,
unsigned long *dirty_regions,
unsigned long word)
{
int r;
unsigned long index = word * BITS_PER_LONG;
unsigned long max_index = min(cmd->nr_regions, (word + 1) * BITS_PER_LONG);

while (index < max_index) {
if (test_bit(index, cmd->region_map)) {
if (test_bit(index, dirty_regions)) {
r = dm_bitset_set_bit(&cmd->bitset_info, cmd->bitset_root,
index, &cmd->bitset_root);

if (r) {
DMERR("dm_bitset_set_bit failed");
return r;
}
__clear_bit(index, dirty_regions);
}
index++;
}
@@ -721,7 +757,7 @@ static int __flush_dmap(struct dm_clone_metadata *cmd, struct dirty_map *dmap)
if (word == cmd->nr_words)
break;

r = __update_metadata_word(cmd, word);
r = __update_metadata_word(cmd, dmap->dirty_regions, word);

if (r)
return r;
@@ -743,15 +779,17 @@ static int __flush_dmap(struct dm_clone_metadata *cmd, struct dirty_map *dmap)
return 0;
}

int dm_clone_metadata_commit(struct dm_clone_metadata *cmd)
int dm_clone_metadata_pre_commit(struct dm_clone_metadata *cmd)
{
int r = -EPERM;
int r = 0;
struct dirty_map *dmap, *next_dmap;

down_write(&cmd->lock);

if (cmd->fail_io || dm_bm_is_read_only(cmd->bm))
if (cmd->fail_io || dm_bm_is_read_only(cmd->bm)) {
r = -EPERM;
goto out;
}

/* Get current dirty bitmap */
dmap = cmd->current_dmap;
@@ -763,7 +801,7 @@ int dm_clone_metadata_commit(struct dm_clone_metadata *cmd)
* The last commit failed, so we don't have a clean dirty-bitmap to
* use.
*/
if (WARN_ON(next_dmap->changed)) {
if (WARN_ON(next_dmap->changed || cmd->committing_dmap)) {
r = -EINVAL;
goto out;
}
@@ -773,11 +811,33 @@ int dm_clone_metadata_commit(struct dm_clone_metadata *cmd)
cmd->current_dmap = next_dmap;
spin_unlock_irq(&cmd->bitmap_lock);

/*
* No one is accessing the old dirty bitmap anymore, so we can flush
* it.
*/
r = __flush_dmap(cmd, dmap);
/* Set old dirty bitmap as currently committing */
cmd->committing_dmap = dmap;
out:
up_write(&cmd->lock);

return r;
}

int dm_clone_metadata_commit(struct dm_clone_metadata *cmd)
{
int r = -EPERM;

down_write(&cmd->lock);

if (cmd->fail_io || dm_bm_is_read_only(cmd->bm))
goto out;

if (WARN_ON(!cmd->committing_dmap)) {
r = -EINVAL;
goto out;
}

r = __flush_dmap(cmd, cmd->committing_dmap);
if (!r) {
/* Clear committing dmap */
cmd->committing_dmap = NULL;
}
out:
up_write(&cmd->lock);

@@ -802,6 +862,7 @@ int dm_clone_set_region_hydrated(struct dm_clone_metadata *cmd, unsigned long re
dmap = cmd->current_dmap;

__set_bit(word, dmap->dirty_words);
__set_bit(region_nr, dmap->dirty_regions);
__set_bit(region_nr, cmd->region_map);
dmap->changed = 1;

@@ -830,6 +891,7 @@ int dm_clone_cond_set_range(struct dm_clone_metadata *cmd, unsigned long start,
if (!test_bit(region_nr, cmd->region_map)) {
word = region_nr / BITS_PER_LONG;
__set_bit(word, dmap->dirty_words);
__set_bit(region_nr, dmap->dirty_regions);
__set_bit(region_nr, cmd->region_map);
dmap->changed = 1;
}
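The comment at the top of this file describes two levels of dirty tracking: dirty_regions records exactly which regions were hydrated in the current transaction, and dirty_words records which words of that bitmap are non-zero so that a commit can skip clean words. Below is a single-threaded user-space model of that idea, not the kernel code: `bit_set`, `bit_test`, `mark_hydrated`, `flush_dmap`, and the model structs are hypothetical. A counter stands in for dm_bitset_set_bit() calls, and the model scans every word instead of searching for the next dirty one.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define NR_REGIONS 256UL
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS ((NR_REGIONS + WORD_BITS - 1) / WORD_BITS)
#define NR_WORD_LONGS ((NR_WORDS + WORD_BITS - 1) / WORD_BITS)

static void bit_set(unsigned long nr, unsigned long *map)
{
	map[nr / WORD_BITS] |= 1UL << (nr % WORD_BITS);
}

static void bit_clear(unsigned long nr, unsigned long *map)
{
	map[nr / WORD_BITS] &= ~(1UL << (nr % WORD_BITS));
}

static bool bit_test(unsigned long nr, const unsigned long *map)
{
	return (map[nr / WORD_BITS] >> (nr % WORD_BITS)) & 1UL;
}

/* One set of dirty bitmaps for the current transaction. */
struct dmap_model {
	unsigned long dirty_words[NR_WORD_LONGS]; /* one bit per dirty_regions word */
	unsigned long dirty_regions[NR_WORDS];    /* one bit per region */
};

struct clone_md_model {
	unsigned long region_map[NR_WORDS];  /* in-core hydration bitmap */
	unsigned long on_disk[NR_WORDS];     /* stand-in for the on-disk bitset */
	unsigned int disk_bit_updates;       /* stand-in for dm_bitset_set_bit() calls */
	struct dmap_model dmap;
};

static void model_init(struct clone_md_model *md)
{
	memset(md, 0, sizeof(*md));
}

/* Record a hydrated region in region_map and in both dirty bitmaps. */
static void mark_hydrated(struct clone_md_model *md, unsigned long region)
{
	assert(region < NR_REGIONS);
	bit_set(region / WORD_BITS, md->dmap.dirty_words);
	bit_set(region, md->dmap.dirty_regions);
	bit_set(region, md->region_map);
}

/* Write out only regions hydrated in this transaction, skipping clean words. */
static void flush_dmap(struct clone_md_model *md)
{
	for (unsigned long word = 0; word < NR_WORDS; word++) {
		if (!bit_test(word, md->dmap.dirty_words))
			continue;

		unsigned long index = word * WORD_BITS;
		unsigned long max_index = index + WORD_BITS;

		if (max_index > NR_REGIONS)
			max_index = NR_REGIONS;

		for (; index < max_index; index++) {
			if (bit_test(index, md->dmap.dirty_regions)) {
				bit_set(index, md->on_disk);
				md->disk_bit_updates++;
				bit_clear(index, md->dmap.dirty_regions);
			}
		}
		bit_clear(word, md->dmap.dirty_words);
	}
}
```

Because the inner loop tests dirty_regions rather than region_map, a region that is already hydrated but was not touched in this transaction is not rewritten, and clearing each bit after it is written leaves the map ready for reuse.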
17 changes: 17 additions & 0 deletions drivers/md/dm-clone-metadata.h
@@ -75,7 +75,23 @@ void dm_clone_metadata_close(struct dm_clone_metadata *cmd);

/*
* Commit dm-clone metadata to disk.
*
* We use a two phase commit:
*
* 1. dm_clone_metadata_pre_commit(): Prepare the current transaction for
* committing. After this is called, all subsequent metadata updates, done
* through either dm_clone_set_region_hydrated() or
* dm_clone_cond_set_range(), will be part of the **next** transaction.
*
* 2. dm_clone_metadata_commit(): Actually commit the current transaction to
* disk and start a new transaction.
*
* This allows dm-clone to flush the destination device after step (1) to
* ensure that all freshly hydrated regions, for which we are updating the
* metadata, are properly written to non-volatile storage and won't be lost in
* case of a crash.
*/
int dm_clone_metadata_pre_commit(struct dm_clone_metadata *cmd);
int dm_clone_metadata_commit(struct dm_clone_metadata *cmd);

/*
@@ -112,6 +128,7 @@ int dm_clone_metadata_abort(struct dm_clone_metadata *cmd);
* Switches metadata to a read only mode. Once read-only mode has been entered
* the following functions will return -EPERM:
*
* dm_clone_metadata_pre_commit()
* dm_clone_metadata_commit()
* dm_clone_set_region_hydrated()
* dm_clone_cond_set_range()
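The two-phase commit described in the header comment can be modelled as a pair of dirty maps plus a "committing" pointer: pre-commit freezes the current map and redirects new updates to the other one, the caller flushes the destination device, and commit persists only the frozen map. This is a minimal single-threaded sketch under stated assumptions: the names (`clone_md`, `txn_map`, `pre_commit`, `commit`) are hypothetical, locking is omitted, and each dirty map is a single `unsigned long`, so only regions 0 to 31 are supported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical single-word dirty map: supports regions 0..31 only. */
struct txn_map {
	unsigned long dirty_regions;
	int changed;
};

struct clone_md {
	unsigned long region_map;    /* in-core hydration state */
	unsigned long on_disk;       /* committed hydration state */
	struct txn_map dmap[2];
	struct txn_map *current;     /* receives new updates */
	struct txn_map *committing;  /* frozen by pre_commit(), written by commit() */
};

static void md_init(struct clone_md *md)
{
	md->region_map = 0;
	md->on_disk = 0;
	md->dmap[0] = (struct txn_map){ 0, 0 };
	md->dmap[1] = (struct txn_map){ 0, 0 };
	md->current = &md->dmap[0];
	md->committing = NULL;
}

static void set_region_hydrated(struct clone_md *md, unsigned int region)
{
	assert(region < 32);
	md->current->dirty_regions |= 1UL << region;
	md->current->changed = 1;
	md->region_map |= 1UL << region;
}

/* Phase 1: freeze the current transaction; later updates go to the other map. */
static int pre_commit(struct clone_md *md)
{
	struct txn_map *next = (md->current == &md->dmap[0]) ? &md->dmap[1]
							    : &md->dmap[0];

	if (next->changed || md->committing)
		return -EINVAL;  /* previous transaction was never committed */

	md->committing = md->current;
	md->current = next;
	return 0;
}

/* Phase 2: persist only the frozen transaction, then clear it for reuse. */
static int commit(struct clone_md *md)
{
	if (!md->committing)
		return -EINVAL;  /* pre_commit() was not called first */

	md->on_disk |= md->committing->dirty_regions;
	md->committing->dirty_regions = 0;
	md->committing->changed = 0;
	md->committing = NULL;
	return 0;
}
```

A caller would hydrate regions, call pre_commit(), flush the destination device, and only then call commit(). A region hydrated between the two calls lands in the other map, so it is not recorded on disk until a later transaction, after its own data has been flushed.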