Merge tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - the most extensive changes this cycle are the DM core improvements to
   add full blk-mq support to request-based DM.
    - disabled by default but user can opt-in with CONFIG_DM_MQ_DEFAULT
    - depends on some blk-mq changes from Jens' for-4.1/core branch so
      that explains why this pull is built on linux-block.git

 - update DM to use name_to_dev_t() rather than open-coding a less
   capable device parser.
    - includes a couple small improvements to name_to_dev_t() that offer
      stricter constraints than DM's code provided.

 - improvements to the dm-cache "mq" cache replacement policy.

 - a DM crypt crypt_ctr() error path fix and an async crypto deadlock fix

 - a small efficiency improvement for DM crypt decryption by leveraging
   immutable biovecs

 - add error handling modes for corrupted blocks to DM verity

 - a new "log-writes" DM target from Josef Bacik that is meant for file
   system developers to test file system integrity at particular points
   in the life of a file system

 - a few DM log userspace cleanups and fixes

 - a few Documentation fixes (for thin, cache, crypt and switch)

* tag 'dm-4.1-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (34 commits)
  dm crypt: fix missing error code return from crypt_ctr error path
  dm crypt: fix deadlock when async crypto algorithm returns -EBUSY
  dm crypt: leverage immutable biovecs when decrypting on read
  dm crypt: update URLs to new cryptsetup project page
  dm: add log writes target
  dm table: use bool function return values of true/false not 1/0
  dm verity: add error handling modes for corrupted blocks
  dm thin: remove stale 'trim' message documentation
  dm delay: use msecs_to_jiffies for time conversion
  dm log userspace base: fix compile warning
  dm log userspace transfer: match wait_for_completion_timeout return type
  dm table: fall back to getting device using name_to_dev_t()
  init: stricter checking of major:minor root= values
  init: export name_to_dev_t and mark name argument as const
  dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr
  dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq
  dm: add full blk-mq support to request-based DM
  dm: impose configurable deadline for dm_request_fn's merge heuristic
  dm sysfs: introduce ability to add writable attributes
  dm: don't start current request if it would've merged with the previous
  ...
Showing 24 changed files with 1,928 additions and 343 deletions.
@@ -23,3 +23,25 @@ Description: Device-mapper device suspend state.
Contains the value 1 while the device is suspended.
Otherwise it contains 0. Read-only attribute.
Users: util-linux, device-mapper udev rules

What: /sys/block/dm-<num>/dm/rq_based_seq_io_merge_deadline
Date: March 2015
KernelVersion: 4.1
Contact: [email protected]
Description: Allow control over how long a request that is a
reasonable merge candidate can be queued on the request
queue. The resolution of this deadline is in
microseconds (ranging from 1 to 100000 usecs).
Setting this attribute to 0 (the default) will disable
request-based DM's merge heuristic and associated extra
accounting. This attribute is not applicable to
bio-based DM devices so it will only ever report 0 for
them.

What: /sys/block/dm-<num>/dm/use_blk_mq
Date: March 2015
KernelVersion: 4.1
Contact: [email protected]
Description: Request-based Device-mapper blk-mq I/O path mode.
Contains the value 1 if the device is using blk-mq.
Otherwise it contains 0. Read-only attribute.
@@ -0,0 +1,140 @@
dm-log-writes
=============

This target takes 2 devices, one to pass all IO to normally, and one to log all
of the write operations to. This is intended for file system developers wishing
to verify the integrity of metadata or data as the file system is written to.
There is a log_write_entry written for every WRITE request, and the target is
able to take arbitrary data from userspace to insert into the log. The data
that is in the WRITE requests is copied into the log so that the replay happens
exactly as it happened originally.

Log Ordering
============

We log things in order of completion once we are sure the write is no longer in
cache. This means that normal WRITE requests are not actually logged until the
next REQ_FLUSH request. This makes it easier for userspace to replay the log in
a way that correlates to what is on disk and not what is in cache, which in turn
makes improper waiting/flushing easier to detect.

This works by attaching all WRITE requests to a list once the write completes.
Once we see a REQ_FLUSH request we splice this list onto the request, and once
the FLUSH request completes we log all of the WRITEs and then the FLUSH. Only
WRITEs that have completed by the time the REQ_FLUSH is issued are added, in
order to simulate the worst-case scenario with regard to power failures.
Consider the following example (W means write, C means complete):

W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following:

W3,W2,flush,W1....

Again, this is to simulate what is actually on disk; it allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete, as those requests bypass the device cache.

Any REQ_DISCARD requests are treated like WRITE requests. Otherwise we would
have all the DISCARD requests, and then the WRITE requests, and then the FLUSH
request. Consider the following example:

WRITE block 1, DISCARD block 1, FLUSH

If we logged DISCARD when it completed, the replay would look like this:

DISCARD 1, WRITE 1, FLUSH

which isn't quite what happened and wouldn't be caught during the log replay.

Target interface
================

i) Constructor

log-writes <dev_path> <log_dev_path>

dev_path : Device that all of the IO will go to normally.
log_dev_path : Device where the log entries are written to.

ii) Status

<#logged entries> <highest allocated sector>

#logged entries : Number of logged entries
highest allocated sector : Highest allocated sector

iii) Messages

mark <description>

You can use a dmsetup message to set an arbitrary mark in a log.
For example, say you want to fsck a file system after every
write, but first you need to replay up to the mkfs to make sure
you're fsck'ing something reasonable. You would do something like
this:

mkfs.btrfs -f /dev/mapper/log
dmsetup message log 0 mark mkfs
<run test>

This would allow you to replay the log up to the mkfs mark and
then replay from that point on doing the fsck check in the
interval that you want.

Every log has a mark at the end labeled "dm-log-writes-end".

Userspace component
===================

There is a userspace tool that will replay the log for you in various ways.
It can be found here: https://github.com/josefbacik/log-writes

Example usage
=============

Say you want to test fsync on your file system. You would do something like
this:

TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
dmsetup create log --table "$TABLE"
mkfs.btrfs -f /dev/mapper/log
dmsetup message log 0 mark mkfs

mount /dev/mapper/log /mnt/btrfs-test
<some test that does fsync at the end>
dmsetup message log 0 mark fsync
md5sum /mnt/btrfs-test/foo
umount /mnt/btrfs-test

dmsetup remove log
replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync
mount /dev/sdb /mnt/btrfs-test
md5sum /mnt/btrfs-test/foo
<verify md5sums are correct>

Another option is to do a complicated file system operation and verify the file
system is consistent during the entire operation. You could do this with:

TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc"
dmsetup create log --table "$TABLE"
mkfs.btrfs -f /dev/mapper/log
dmsetup message log 0 mark mkfs

mount /dev/mapper/log /mnt/btrfs-test
<fsstress to dirty the fs>
btrfs filesystem balance /mnt/btrfs-test
umount /mnt/btrfs-test
dmsetup remove log

replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs
btrfsck /dev/sdb
replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \
--fsck "btrfsck /dev/sdb" --check fua

And that will replay the log until it sees a FUA request, run the fsck command,
and if the fsck passes it will replay to the next FUA, until it is completed or
the fsck command exits abnormally.