Skip to content

Commit

Permalink
Merge tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kern…
Browse files Browse the repository at this point in the history
…el/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - the most extensive changes this cycle are the DM core improvements to
   add full blk-mq support to request-based DM.

    - disabled by default but user can opt-in with CONFIG_DM_MQ_DEFAULT
    - depends on some blk-mq changes from Jens' for-4.1/core branch so
      that explains why this pull is built on linux-block.git

 - update DM to use name_to_dev_t() rather than open-coding a less
   capable device parser.

    - includes a couple small improvements to name_to_dev_t() that offer
      stricter constraints that DM's code provided.

 - improvements to the dm-cache "mq" cache replacement policy.

 - a DM crypt crypt_ctr() error path fix and an async crypto deadlock
   fix

 - a small efficiency improvement for DM crypt decryption by leveraging
   immutable biovecs

 - add error handling modes for corrupted blocks to DM verity

 - a new "log-writes" DM target from Josef Bacik that is meant for file
   system developers to test file system integrity at particular points
   in the life of a file system

 - a few DM log userspace cleanups and fixes

 - a few Documentation fixes (for thin, cache, crypt and switch)

* tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (34 commits)
  dm crypt: fix missing error code return from crypt_ctr error path
  dm crypt: fix deadlock when async crypto algorithm returns -EBUSY
  dm crypt: leverage immutable biovecs when decrypting on read
  dm crypt: update URLs to new cryptsetup project page
  dm: add log writes target
  dm table: use bool function return values of true/false not 1/0
  dm verity: add error handling modes for corrupted blocks
  dm thin: remove stale 'trim' message documentation
  dm delay: use msecs_to_jiffies for time conversion
  dm log userspace base: fix compile warning
  dm log userspace transfer: match wait_for_completion_timeout return type
  dm table: fall back to getting device using name_to_dev_t()
  init: stricter checking of major:minor root= values
  init: export name_to_dev_t and mark name argument as const
  dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr
  dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq
  dm: add full blk-mq support to request-based DM
  dm: impose configurable deadline for dm_request_fn's merge heuristic
  dm sysfs: introduce ability to add writable attributes
  dm: don't start current request if it would've merged with the previous
  ...
  • Loading branch information
torvalds committed Apr 18, 2015
2 parents 04b7fe6 + 44c144f commit afad97e
Show file tree
Hide file tree
Showing 24 changed files with 1,928 additions and 343 deletions.
22 changes: 22 additions & 0 deletions Documentation/ABI/testing/sysfs-block-dm
Original file line number Diff line number Diff line change
Expand Up @@ -23,3 +23,25 @@ Description: Device-mapper device suspend state.
Contains the value 1 while the device is suspended.
Otherwise it contains 0. Read-only attribute.
Users: util-linux, device-mapper udev rules

What: /sys/block/dm-<num>/dm/rq_based_seq_io_merge_deadline
Date: March 2015
KernelVersion: 4.1
Contact: [email protected]
Description: Allow control over how long a request that is a
reasonable merge candidate can be queued on the request
queue. The resolution of this deadline is in
microseconds (ranging from 1 to 100000 usecs).
Setting this attribute to 0 (the default) will disable
request-based DM's merge heuristic and associated extra
accounting. This attribute is not applicable to
bio-based DM devices so it will only ever report 0 for
them.

What: /sys/block/dm-<num>/dm/use_blk_mq
Date: March 2015
KernelVersion: 4.1
Contact: [email protected]
Description: Request-based Device-mapper blk-mq I/O path mode.
Contains the value 1 if the device is using blk-mq.
Otherwise it contains 0. Read-only attribute.
4 changes: 2 additions & 2 deletions Documentation/device-mapper/dm-crypt.txt
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ Device-Mapper's "crypt" target provides transparent encryption of block devices
using the kernel crypto API.

For a more detailed description of supported parameters see:
http://code.google.com/p/cryptsetup/wiki/DMCrypt
https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt

Parameters: <cipher> <key> <iv_offset> <device path> \
<offset> [<#opt_params> <opt_params>]
Expand Down Expand Up @@ -80,7 +80,7 @@ Example scripts
===============
LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
encryption with dm-crypt using the 'cryptsetup' utility, see
http://code.google.com/p/cryptsetup/
https://gitlab.com/cryptsetup/cryptsetup

[[
#!/bin/sh
Expand Down
140 changes: 140 additions & 0 deletions Documentation/device-mapper/log-writes.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,140 @@
dm-log-writes
=============

This target takes 2 devices, one to pass all IO to normally, and one to log all
of the write operations to. This is intended for file system developers wishing
to verify the integrity of metadata or data as the file system is written to.
There is a log_write_entry written for every WRITE request and the target is
able to take arbitrary data from userspace to insert into the log. The data
that is in the WRITE requests is copied into the log to make the replay happen
exactly as it happened originally.

Log Ordering
============

We log things in order of completion once we are sure the write is no longer in
cache. This means that normal WRITE requests are not actually logged until the
next REQ_FLUSH request. This is to make it easier for userspace to replay the
log in a way that correlates to what is on disk and not what is in cache, to
make it easier to detect improper waiting/flushing.

This works by attaching all WRITE requests to a list once the write completes.
Once we see a REQ_FLUSH request we splice this list onto the request and once
the FLUSH request completes we log all of the WRITEs and then the FLUSH. Only
completed WRITEs, at the time the REQ_FLUSH is issued, are added in order to
simulate the worst case scenario with regard to power failures. Consider the
following example (W means write, C means complete):

W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following

W3,W2,flush,W1....

Again this is to simulate what is actually on disk, this allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete as those requests will obviously bypass the device cache.

Any REQ_DISCARD requests are treated like WRITE requests. Otherwise we would
have all the DISCARD requests, and then the WRITE requests and then the FLUSH
request. Consider the following example:

WRITE block 1, DISCARD block 1, FLUSH

If we logged DISCARD when it completed, the replay would look like this

DISCARD 1, WRITE 1, FLUSH

which isn't quite what happened and wouldn't be caught during the log replay.

Target interface
================

i) Constructor

log-writes <dev_path> <log_dev_path>

dev_path : Device that all of the IO will go to normally.
log_dev_path : Device where the log entries are written to.

ii) Status

<#logged entries> <highest allocated sector>

#logged entries : Number of logged entries
highest allocated sector : Highest allocated sector

iii) Messages

mark <description>

You can use a dmsetup message to set an arbitrary mark in a log.
For example say you want to fsck a file system after every
write, but first you need to replay up to the mkfs to make sure
we're fsck'ing something reasonable, you would do something like
this:

mkfs.btrfs -f /dev/mapper/log
dmsetup message log 0 mark mkfs
<run test>

This would allow you to replay the log up to the mkfs mark and
then replay from that point on doing the fsck check in the
interval that you want.

Every log has a mark at the end labeled "dm-log-writes-end".

Userspace component
===================

There is a userspace tool that will replay the log for you in various ways.
It can be found here: https://github.com/josefbacik/log-writes

Example usage
=============

Say you want to test fsync on your file system. You would do something like
this:

TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
dmsetup create log --table "$TABLE"
mkfs.btrfs -f /dev/mapper/log
dmsetup message log 0 mark mkfs

mount /dev/mapper/log /mnt/btrfs-test
<some test that does fsync at the end>
dmsetup message log 0 mark fsync
md5sum /mnt/btrfs-test/foo
umount /mnt/btrfs-test

dmsetup remove log
replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
mount /dev/sdb /mnt/btrfs-test
md5sum /mnt/btrfs-test/foo
<verify md5sum's are correct>

Another option is to do a complicated file system operation and verify the file
system is consistent during the entire operation. You could do this with:

TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
dmsetup create log --table "$TABLE"
mkfs.btrfs -f /dev/mapper/log
dmsetup message log 0 mark mkfs

mount /dev/mapper/log /mnt/btrfs-test
<fsstress to dirty the fs>
btrfs filesystem balance /mnt/btrfs-test
umount /mnt/btrfs-test
dmsetup remove log

replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
btrfsck /dev/sdb
replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
--fsck "btrfsck /dev/sdb" --check fua

And that will replay the log until it sees a FUA request, run the fsck command
and if the fsck passes it will replay to the next FUA, until it is completed or
the fsck command exists abnormally.
4 changes: 2 additions & 2 deletions Documentation/device-mapper/switch.txt
Original file line number Diff line number Diff line change
Expand Up @@ -47,8 +47,8 @@ consume far too much memory.
Using this device-mapper switch target we can now build a two-layer
device hierarchy:

Upper Tier Determine which array member the I/O should be sent to.
Lower Tier Load balance amongst paths to a particular member.
Upper Tier - Determine which array member the I/O should be sent to.
Lower Tier - Load balance amongst paths to a particular member.

The lower tier consists of a single dm multipath device for each member.
Each of these multipath devices contains the set of paths directly to
Expand Down
3 changes: 0 additions & 3 deletions Documentation/device-mapper/thin-provisioning.txt
Original file line number Diff line number Diff line change
Expand Up @@ -380,9 +380,6 @@ then you'll have no access to blocks mapped beyond the end. If you
load a target that is bigger than before, then extra blocks will be
provisioned as and when needed.

If you wish to reduce the size of your thin device and potentially
regain some space then send the 'trim' message to the pool.

ii) Status

<nr mapped sectors> <highest mapped sector>
Expand Down
21 changes: 19 additions & 2 deletions Documentation/device-mapper/verity.txt
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@ Construction Parameters
<data_block_size> <hash_block_size>
<num_data_blocks> <hash_start_block>
<algorithm> <digest> <salt>
[<#opt_params> <opt_params>]

<version>
This is the type of the on-disk hash format.
Expand Down Expand Up @@ -62,6 +63,22 @@ Construction Parameters
<salt>
The hexadecimal encoding of the salt value.

<#opt_params>
Number of optional parameters. If there are no optional parameters,
the optional paramaters section can be skipped or #opt_params can be zero.
Otherwise #opt_params is the number of following arguments.

Example of optional parameters section:
1 ignore_corruption

ignore_corruption
Log corrupted blocks, but allow read operations to proceed normally.

restart_on_corruption
Restart the system when a corrupted block is discovered. This option is
not compatible with ignore_corruption and requires user space support to
avoid restart loops.

Theory of operation
===================

Expand Down Expand Up @@ -125,7 +142,7 @@ block boundary) are the hash blocks which are stored a depth at a time

The full specification of kernel parameters and on-disk metadata format
is available at the cryptsetup project's wiki page
http://code.google.com/p/cryptsetup/wiki/DMVerity
https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity

Status
======
Expand All @@ -142,7 +159,7 @@ Set up a device:

A command line tool veritysetup is available to compute or verify
the hash tree or activate the kernel device. This is available from
the cryptsetup upstream repository http://code.google.com/p/cryptsetup/
the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/
(as a libcryptsetup extension).

Create hash on the device:
Expand Down
27 changes: 27 additions & 0 deletions drivers/md/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,17 @@ config BLK_DEV_DM

If unsure, say N.

config DM_MQ_DEFAULT
bool "request-based DM: use blk-mq I/O path by default"
depends on BLK_DEV_DM
---help---
This option enables the blk-mq based I/O path for request-based
DM devices by default. With the option the dm_mod.use_blk_mq
module/boot option defaults to Y, without it to N, but it can
still be overriden either way.

If unsure say N.

config DM_DEBUG
bool "Device mapper debugging support"
depends on BLK_DEV_DM
Expand Down Expand Up @@ -432,4 +443,20 @@ config DM_SWITCH

If unsure, say N.

config DM_LOG_WRITES
tristate "Log writes target support"
depends on BLK_DEV_DM
---help---
This device-mapper target takes two devices, one device to use
normally, one to log all write operations done to the first device.
This is for use by file system developers wishing to verify that
their fs is writing a consitent file system at all times by allowing
them to replay the log in a variety of ways and to check the
contents.

To compile this code as a module, choose M here: the module will
be called dm-log-writes.

If unsure, say N.

endif # MD
1 change: 1 addition & 0 deletions drivers/md/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ obj-$(CONFIG_DM_CACHE) += dm-cache.o
obj-$(CONFIG_DM_CACHE_MQ) += dm-cache-mq.o
obj-$(CONFIG_DM_CACHE_CLEANER) += dm-cache-cleaner.o
obj-$(CONFIG_DM_ERA) += dm-era.o
obj-$(CONFIG_DM_LOG_WRITES) += dm-log-writes.o

ifeq ($(CONFIG_DM_UEVENT),y)
dm-mod-objs += dm-uevent.o
Expand Down
Loading

0 comments on commit afad97e

Please sign in to comment.