Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block
Pull core block IO changes from Jens Axboe:
 "The major piece in here is the immutable bio_vec series from Kent, the
  rest is fairly minor. It was supposed to go in last round, but various
  issues pushed it to this release instead. The pull request contains:

   - Various smaller blk-mq fixes from different folks. Nothing major
     here, just minor fixes and cleanups.

   - Fix for a memory leak in the error path in the block ioctl code
     from Christian Engelmayer.

   - Header export fix from CaiZhiyong.

   - Finally the immutable biovec changes from Kent Overstreet. This
     enables some nice future work on making arbitrarily sized bios
     possible, and making splitting more efficient.

  Related fixes to immutable bio_vecs:

   - dm-cache immutable fixup from Mike Snitzer.

   - btrfs immutable fixup from Muthu Kumar.

   - bio-integrity fix from Nic Bellinger, which is also going to stable"

* 'for-3.14/core' of git://git.kernel.dk/linux-block: (44 commits)
  xtensa: fixup simdisk driver to work with immutable bio_vecs
  block/blk-mq-cpu.c: use hotcpu_notifier()
  blk-mq: for_each_* macro correctness
  block: Fix memory leak in rw_copy_check_uvector() handling
  bio-integrity: Fix bio_integrity_verify segment start bug
  block: remove unrelated header files and export symbol
  blk-mq: uses page->list incorrectly
  blk-mq: use __smp_call_function_single directly
  btrfs: fix missing increment of bi_remaining
  Revert "block: Warn and free bio if bi_end_io is not set"
  block: Warn and free bio if bi_end_io is not set
  blk-mq: fix initializing request's start time
  block: blk-mq: don't export blk_mq_free_queue()
  block: blk-mq: make blk_sync_queue support mq
  block: blk-mq: support draining mq queue
  dm cache: increment bi_remaining when bi_end_io is restored
  block: fixup for generic bio chaining
  block: Really silence spurious compiler warnings
  block: Silence spurious compiler warnings
  block: Kill bio_pair_split()
  ...

139 changed files with 2,144 additions and 2,683 deletions.

Immutable biovecs and biovec iterators:
=======================================

Kent Overstreet

As of 3.14, biovecs should never be modified after a bio has been submitted.
Instead, we have a new struct bvec_iter which represents a range of a biovec;
the iterator is modified as the bio is completed, not the biovec.

More specifically, old code that needed to partially complete a bio would
update bi_sector and bi_size, and advance bi_idx to the next biovec. If it
ended up partway through a biovec, it would increment bv_offset and decrement
bv_len by the number of bytes completed in that biovec.

In the new scheme of things, everything that must be mutated in order to
partially complete a bio is segregated into struct bvec_iter: bi_sector,
bi_size and bi_idx have been moved there; and instead of modifying bv_offset
and bv_len, struct bvec_iter has bi_bvec_done, which represents the number of
bytes completed in the current bvec.

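Here is a minimal userspace sketch of that split. It uses simplified stand-in types: `struct model_bio_vec` and `struct model_bvec_iter` are hypothetical names, not kernel API. They reuse the field names described above, with bv_page reduced to a plain byte pointer. Completing part of the I/O updates only the iterator, roughly what bio_advance_iter() does on top of bvec_iter_advance(). The biovec array is passed as const and never written. The sector arithmetic assumes 512-byte sectors and sector-aligned byte counts.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins for struct bio_vec and struct bvec_iter.
 * Field names follow the text above; bv_page is simplified to a byte pointer. */
struct model_bio_vec {
	unsigned char *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_bvec_iter {
	unsigned long long bi_sector;	/* device address, in 512-byte sectors */
	unsigned int bi_size;		/* bytes left to complete */
	unsigned int bi_idx;		/* current index into the biovec array */
	unsigned int bi_bvec_done;	/* bytes completed in bv[bi_idx] */
};

/* Partially complete `bytes` of I/O. Only *iter changes; the array is const.
 * Assumes `bytes` is a multiple of 512 so the sector arithmetic is exact. */
static void model_complete_bytes(const struct model_bio_vec *bv,
				 struct model_bvec_iter *iter,
				 unsigned int bytes)
{
	assert(bytes <= iter->bi_size);

	iter->bi_sector += bytes >> 9;
	iter->bi_size -= bytes;

	while (bytes) {
		unsigned int left = bv[iter->bi_idx].bv_len - iter->bi_bvec_done;
		unsigned int step = bytes < left ? bytes : left;

		iter->bi_bvec_done += step;
		bytes -= step;

		/* Finished this biovec: move on to the next one. */
		if (iter->bi_bvec_done == bv[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}
```

Take a 4096-byte biovec followed by a 1024-byte one. Completing 4608 bytes leaves bi_idx at 1 and bi_bvec_done at 512. The second biovec's bv_offset and bv_len are exactly what they were before.
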
There are a bunch of new helper macros for hiding the gory details; in
particular, they present the illusion of partially completed biovecs so that
normal code doesn't have to deal with bi_bvec_done.

* Driver code should no longer refer to biovecs directly; we now have
  bio_iovec() and bio_iter_iovec() macros that return literal struct bio_vecs,
  constructed from the raw biovecs but taking into account bi_bvec_done and
  bi_size.

  bio_for_each_segment() has been updated to take a bvec_iter argument
  instead of an integer (that corresponded to bi_idx); for a lot of code the
  conversion just required changing the types of the arguments to
  bio_for_each_segment().

* Advancing a bvec_iter is done with bio_advance_iter(); bio_advance() is a
  wrapper around bio_advance_iter() that operates on bio->bi_iter, and also
  advances the bio integrity iter if present.

  There is a lower level advance function, bvec_iter_advance(), which takes
  a pointer to a biovec, not a bio; this is used by the bio integrity code.

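The helpers in the list above can be approximated the same way. This is a hypothetical userspace sketch, not the kernel's definitions. model_iter_bvec() plays the role of bio_iter_iovec(): it folds bi_bvec_done into the offset and clamps the length to bi_size. The model_for_each_segment() macro is shaped like bio_for_each_segment(): it walks a copy of the iterator and advances it by each segment's length. Sector tracking is left out, and every biovec is assumed to have a nonzero bv_len.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct bio_vec / struct bvec_iter. */
struct model_bio_vec {
	unsigned char *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_bvec_iter {
	unsigned int bi_size;		/* bytes left */
	unsigned int bi_idx;		/* current biovec */
	unsigned int bi_bvec_done;	/* bytes done in bv[bi_idx] */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Build the current segment by value; the raw biovec is never modified. */
static struct model_bio_vec model_iter_bvec(const struct model_bio_vec *bv,
					    struct model_bvec_iter iter)
{
	const struct model_bio_vec *raw = &bv[iter.bi_idx];
	struct model_bio_vec seg;

	seg.bv_page = raw->bv_page;
	seg.bv_offset = raw->bv_offset + iter.bi_bvec_done;
	seg.bv_len = min_uint(iter.bi_size, raw->bv_len - iter.bi_bvec_done);
	return seg;
}

/* Advance the iterator only (no sector tracking in this sketch). */
static void model_iter_advance(const struct model_bio_vec *bv,
			       struct model_bvec_iter *iter, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);
	while (bytes) {
		unsigned int len = min_uint(bytes, model_iter_bvec(bv, *iter).bv_len);

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;
		if (iter->bi_bvec_done == bv[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}

/* Loop shape modelled on bio_for_each_segment(bvl, bio, iter).
 * Assumes no zero-length biovecs, otherwise the loop would not progress. */
#define model_for_each_segment(seg, bv, start, it)			\
	for ((it) = (start);						\
	     (it).bi_size && (((seg) = model_iter_bvec((bv), (it))), 1);	\
	     model_iter_advance((bv), &(it), (seg).bv_len))
```

Suppose the iterator starts 1024 bytes into the first of three 4096-byte biovecs, with 6144 bytes left. The loop yields two 3072-byte segments, the second clamped by bi_size. It never touches the third biovec, even though the array has three entries.
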
What does all this get us?
==========================

Having a real iterator, and making biovecs immutable, has a number of
advantages:

* Before, iterating over bios was very awkward when you weren't processing
  exactly one bvec at a time: for example, bio_copy_data() in fs/bio.c,
  which copies the contents of one bio into another. Because the biovecs
  wouldn't necessarily be the same size, the old code was tricky and
  convoluted; it had to walk two different bios at the same time, keeping
  both a bi_idx and an offset into the current biovec for each.

  The new code is much more straightforward; have a look. This sort of
  pattern comes up in a lot of places; a lot of drivers were essentially
  open coding bvec iterators before, and having a common implementation
  considerably simplifies a lot of code.

* Before, any code that might need to use the biovec after the bio had been
  completed (perhaps to copy the data somewhere else, or perhaps to resubmit
  it somewhere else if there was an error) had to save the entire bvec
  array; again, this was being done in a fair number of places. Since the
  biovec is no longer modified, it is still intact after the bio completes.

* Biovecs can be shared between multiple bios: a bvec iter can represent an
  arbitrary range of an existing biovec, both starting and ending midway
  through biovecs. This is what enables efficient splitting of arbitrary
  bios. Note that this means we _only_ use bi_size to determine when we've
  reached the end of a bio, not bi_vcnt, and the bio_iovec() macro takes
  bi_size into account when constructing biovecs.

* Splitting bios is now much simpler. The old bio_split() didn't even work on
  bios with more than a single bvec! Now, we can efficiently split
  arbitrarily sized bios, because the new bio can share the old bio's
  biovec.

  Care must be taken, though, to ensure the biovec isn't freed while the
  split bio is still using it, in case the original bio completes first.
  Using bio_chain() when splitting bios helps with this.

* Submitting partially completed bios is now perfectly fine. This comes up
  occasionally in stacking block drivers, and various code (e.g. md and
  bcache) had some ugly workarounds for it.

  It used to be the case that submitting a partially completed bio would
  work fine on _most_ devices, but since accessing the raw bvec array was
  the norm, not all drivers would respect bi_idx, and those would break.
  Now, since all drivers _must_ go through the bvec iterator (and have been
  audited to make sure they do), submitting partially completed bios is
  perfectly fine.

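The two-bio walk from the first item above can be sketched in the same way. The sketch follows the spirit of bio_copy_data() but uses the same hypothetical userspace stand-ins; the `model_*` names are illustrative. Bio chaining via bi_next and page mapping are omitted. Each step copies the smaller of the two current segments and then advances both iterators by that amount. This means the source and destination biovecs can be split up differently.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for struct bio_vec / struct bvec_iter. */
struct model_bio_vec {
	unsigned char *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_bvec_iter {
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Current segment, with bi_bvec_done and bi_size taken into account. */
static struct model_bio_vec model_iter_bvec(const struct model_bio_vec *bv,
					    struct model_bvec_iter iter)
{
	const struct model_bio_vec *raw = &bv[iter.bi_idx];
	struct model_bio_vec seg;

	seg.bv_page = raw->bv_page;
	seg.bv_offset = raw->bv_offset + iter.bi_bvec_done;
	seg.bv_len = min_uint(iter.bi_size, raw->bv_len - iter.bi_bvec_done);
	return seg;
}

static void model_iter_advance(const struct model_bio_vec *bv,
			       struct model_bvec_iter *iter, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);
	while (bytes) {
		unsigned int len = min_uint(bytes, model_iter_bvec(bv, *iter).bv_len);

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;
		if (iter->bi_bvec_done == bv[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}

/* Copy from src to dst until either iterator runs out. The iterators are
 * taken by value, so the callers' copies are left untouched. */
static void model_copy_data(const struct model_bio_vec *dst_bv,
			    struct model_bvec_iter dst_iter,
			    const struct model_bio_vec *src_bv,
			    struct model_bvec_iter src_iter)
{
	while (src_iter.bi_size && dst_iter.bi_size) {
		struct model_bio_vec s = model_iter_bvec(src_bv, src_iter);
		struct model_bio_vec d = model_iter_bvec(dst_bv, dst_iter);
		unsigned int bytes = min_uint(s.bv_len, d.bv_len);

		memcpy(d.bv_page + d.bv_offset, s.bv_page + s.bv_offset, bytes);
		model_iter_advance(src_bv, &src_iter, bytes);
		model_iter_advance(dst_bv, &dst_iter, bytes);
	}
}
```

For instance, copying 8 bytes from biovecs of 3 and 5 bytes into biovecs of 5 and 3 bytes takes three steps, of 3, 2 and 3 bytes. The copy loop itself does no per-bio index or offset bookkeeping.
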
Other implications:
===================

* Almost all usage of bi_idx is now incorrect and has been removed; instead,
  where previously you would have used bi_idx you'd now use a bvec_iter,
  probably passing it to one of the helper macros.

  That is, instead of using bio_iovec_idx() (or
  bio->bi_io_vec[bio->bi_idx]), you now use bio_iter_iovec(), which takes a
  bvec_iter and returns a literal struct bio_vec, constructed on the fly
  from the raw biovec but taking into account bi_bvec_done (and bi_size).

* bi_vcnt can't be trusted or relied upon by driver code, i.e. anything that
  doesn't actually own the bio. The reason is twofold: firstly, it's no
  longer needed for iterating over the bio, since we only use bi_size.
  Secondly, when cloning a bio and reusing (a portion of) the original bio's
  biovec, calculating bi_vcnt for the new bio would require iterating over
  all the biovecs in the new bio, which is silly since the count isn't
  needed.

  So, don't use bi_vcnt anymore.

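Splitting and the bi_vcnt point can be illustrated together with one more hypothetical userspace sketch. The `model_*` names are not kernel API. model_split() follows the general idea of bio_split(): the front part is a copy of the iterator with bi_size trimmed, and the original is advanced past it. Both parts keep pointing into the same biovec array, and nothing is copied. Counting the segments of either part requires walking it, which is the work an accurate bi_vcnt for a clone would cost. Keeping the shared array alive (bio_chain() and friends) is out of scope here.

```c
#include <assert.h>

/* Hypothetical stand-ins for struct bio_vec / struct bvec_iter. */
struct model_bio_vec {
	unsigned char *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

struct model_bvec_iter {
	unsigned int bi_size;
	unsigned int bi_idx;
	unsigned int bi_bvec_done;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static struct model_bio_vec model_iter_bvec(const struct model_bio_vec *bv,
					    struct model_bvec_iter iter)
{
	const struct model_bio_vec *raw = &bv[iter.bi_idx];
	struct model_bio_vec seg;

	seg.bv_page = raw->bv_page;
	seg.bv_offset = raw->bv_offset + iter.bi_bvec_done;
	seg.bv_len = min_uint(iter.bi_size, raw->bv_len - iter.bi_bvec_done);
	return seg;
}

static void model_iter_advance(const struct model_bio_vec *bv,
			       struct model_bvec_iter *iter, unsigned int bytes)
{
	assert(bytes <= iter->bi_size);
	while (bytes) {
		unsigned int len = min_uint(bytes, model_iter_bvec(bv, *iter).bv_len);

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;
		if (iter->bi_bvec_done == bv[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}

/* Split the first `bytes` off *iter: the returned iterator covers them and
 * *iter is advanced past them. Both still point into the same array. */
static struct model_bvec_iter model_split(const struct model_bio_vec *bv,
					  struct model_bvec_iter *iter,
					  unsigned int bytes)
{
	struct model_bvec_iter front = *iter;

	assert(bytes <= iter->bi_size);
	front.bi_size = bytes;
	model_iter_advance(bv, iter, bytes);
	return front;
}

/* Count segments by walking the iterator; this is the cost an accurate
 * bi_vcnt for a clone would impose. Assumes no zero-length biovecs. */
static unsigned int model_count_segments(const struct model_bio_vec *bv,
					 struct model_bvec_iter iter)
{
	unsigned int n = 0;

	while (iter.bi_size) {
		model_iter_advance(bv, &iter, model_iter_bvec(bv, iter).bv_len);
		n++;
	}
	return n;
}
```

Take a 12288-byte iterator over three 4096-byte biovecs and split 5120 bytes off it. The front part has two segments, of 4096 and 1024 bytes. The remainder starts 1024 bytes into the second biovec and also has two segments, of 3072 and 4096 bytes. The shared array still has three entries throughout.
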