Skip to content

Commit

Permalink
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Browse files Browse the repository at this point in the history
Pull block-related fixes from Jens Axboe:

 - Improvements to the buffered and direct write IO plugging from
   Fengguang.

 - Abstract out the mapping of a bio in a request, and use that to
   provide a blk_bio_map_sg() helper.  Useful for mapping just a bio
   instead of a full request.

 - Regression fix from Hugh, fixing up a patch that went into the
   previous release cycle (and marked stable, too) attempting to prevent
   a loop in __getblk_slow().

 - Updates to discard requests, fixing up the sizing and how we align
   them.  Also a change to disallow merging of discard requests, since
   that doesn't really work properly yet.

 - A few drbd fixes.

 - Documentation updates.

* 'for-linus' of git://git.kernel.dk/linux-block:
  block: replace __getblk_slow misfix by grow_dev_page fix
  drbd: Write all pages of the bitmap after an online resize
  drbd: Finish requests that completed while IO was frozen
  drbd: fix drbd wire compatibility for empty flushes
  Documentation: update tunable options in block/cfq-iosched.txt
  Documentation: update tunable options in block/cfq-iosched.txt
  Documentation: update missing index files in block/00-INDEX
  block: move down direct IO plugging
  block: remove plugging at buffered write time
  block: disable discard request merge temporarily
  bio: Fix potential memory leak in bio_find_or_create_slab()
  block: Don't use static to define "void *p" in show_partition_start()
  block: Add blk_bio_map_sg() helper
  block: Introduce __blk_segment_map_sg() helper
  fs/block-dev.c:fix performance regression in O_DIRECT writes to md block devices
  block: split discard into aligned requests
  block: reorganize rounding of max_discard_sectors
  • Loading branch information
torvalds committed Aug 25, 2012
2 parents da31ce7 + 676ce6d commit a7e546f
Show file tree
Hide file tree
Showing 17 changed files with 378 additions and 123 deletions.
10 changes: 8 additions & 2 deletions Documentation/block/00-INDEX
Original file line number Diff line number Diff line change
Expand Up @@ -3,15 +3,21 @@
biodoc.txt
- Notes on the Generic Block Layer Rewrite in Linux 2.5
capability.txt
- Generic Block Device Capability (/sys/block/<disk>/capability)
- Generic Block Device Capability (/sys/block/<device>/capability)
cfq-iosched.txt
- CFQ IO scheduler tunables
data-integrity.txt
- Block data integrity
deadline-iosched.txt
- Deadline IO scheduler tunables
ioprio.txt
- Block io priorities (in CFQ scheduler)
queue-sysfs.txt
- Queue's sysfs entries
request.txt
- The members of struct request (in include/linux/blkdev.h)
stat.txt
- Block layer statistics in /sys/block/<dev>/stat
- Block layer statistics in /sys/block/<device>/stat
switching-sched.txt
- Switching I/O schedulers at runtime
writeback_cache_control.txt
Expand Down
77 changes: 77 additions & 0 deletions Documentation/block/cfq-iosched.txt
Original file line number Diff line number Diff line change
@@ -1,3 +1,14 @@
CFQ (Complete Fairness Queueing)
===============================

The main aim of CFQ scheduler is to provide a fair allocation of the disk
I/O bandwidth for all the processes which requests an I/O operation.

CFQ maintains the per process queue for the processes which request I/O
operation(syncronous requests). In case of asynchronous requests, all the
requests from all the processes are batched together according to their
process's I/O priority.

CFQ ioscheduler tunables
========================

Expand Down Expand Up @@ -25,6 +36,72 @@ there are multiple spindles behind single LUN (Host based hardware RAID
controller or for storage arrays), setting slice_idle=0 might end up in better
throughput and acceptable latencies.

back_seek_max
-------------
This specifies, given in Kbytes, the maximum "distance" for backward seeking.
The distance is the amount of space from the current head location to the
sectors that are backward in terms of distance.

This parameter allows the scheduler to anticipate requests in the "backward"
direction and consider them as being the "next" if they are within this
distance from the current head location.

back_seek_penalty
-----------------
This parameter is used to compute the cost of backward seeking. If the
backward distance of request is just 1/back_seek_penalty from a "front"
request, then the seeking cost of two requests is considered equivalent.

So scheduler will not bias toward one or the other request (otherwise scheduler
will bias toward front request). Default value of back_seek_penalty is 2.

fifo_expire_async
-----------------
This parameter is used to set the timeout of asynchronous requests. Default
value of this is 248ms.

fifo_expire_sync
----------------
This parameter is used to set the timeout of synchronous requests. Default
value of this is 124ms. In case to favor synchronous requests over asynchronous
one, this value should be decreased relative to fifo_expire_async.

slice_async
-----------
This parameter is same as of slice_sync but for asynchronous queue. The
default value is 40ms.

slice_async_rq
--------------
This parameter is used to limit the dispatching of asynchronous request to
device request queue in queue's slice time. The maximum number of request that
are allowed to be dispatched also depends upon the io priority. Default value
for this is 2.

slice_sync
----------
When a queue is selected for execution, the queues IO requests are only
executed for a certain amount of time(time_slice) before switching to another
queue. This parameter is used to calculate the time slice of synchronous
queue.

time_slice is computed using the below equation:-
time_slice = slice_sync + (slice_sync/5 * (4 - prio)). To increase the
time_slice of synchronous queue, increase the value of slice_sync. Default
value is 100ms.

quantum
-------
This specifies the number of request dispatched to the device queue. In a
queue's time slice, a request will not be dispatched if the number of request
in the device exceeds this parameter. This parameter is used for synchronous
request.

In case of storage with several disk, this setting can limit the parallel
processing of request. Therefore, increasing the value can imporve the
performace although this can cause the latency of some I/O to increase due
to more number of requests.

CFQ IOPS Mode for group scheduling
===================================
Basic CFQ design is to provide priority based time slices. Higher priority
Expand Down
64 changes: 64 additions & 0 deletions Documentation/block/queue-sysfs.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,20 +9,71 @@ These files are the ones found in the /sys/block/xxx/queue/ directory.
Files denoted with a RO postfix are readonly and the RW postfix means
read-write.

add_random (RW)
----------------
This file allows to trun off the disk entropy contribution. Default
value of this file is '1'(on).

discard_granularity (RO)
-----------------------
This shows the size of internal allocation of the device in bytes, if
reported by the device. A value of '0' means device does not support
the discard functionality.

discard_max_bytes (RO)
----------------------
Devices that support discard functionality may have internal limits on
the number of bytes that can be trimmed or unmapped in a single operation.
The discard_max_bytes parameter is set by the device driver to the maximum
number of bytes that can be discarded in a single operation. Discard
requests issued to the device must not exceed this limit. A discard_max_bytes
value of 0 means that the device does not support discard functionality.

discard_zeroes_data (RO)
------------------------
When read, this file will show if the discarded block are zeroed by the
device or not. If its value is '1' the blocks are zeroed otherwise not.

hw_sector_size (RO)
-------------------
This is the hardware sector size of the device, in bytes.

iostats (RW)
-------------
This file is used to control (on/off) the iostats accounting of the
disk.

logical_block_size (RO)
-----------------------
This is the logcal block size of the device, in bytes.

max_hw_sectors_kb (RO)
----------------------
This is the maximum number of kilobytes supported in a single data transfer.

max_integrity_segments (RO)
---------------------------
When read, this file shows the max limit of integrity segments as
set by block layer which a hardware controller can handle.

max_sectors_kb (RW)
-------------------
This is the maximum number of kilobytes that the block layer will allow
for a filesystem request. Must be smaller than or equal to the maximum
size allowed by the hardware.

max_segments (RO)
-----------------
Maximum number of segments of the device.

max_segment_size (RO)
---------------------
Maximum segment size of the device.

minimum_io_size (RO)
--------------------
This is the smallest preferred io size reported by the device.

nomerges (RW)
-------------
This enables the user to disable the lookup logic involved with IO
Expand All @@ -45,11 +96,24 @@ per-block-cgroup request pool. IOW, if there are N block cgroups,
each request queue may have upto N request pools, each independently
regulated by nr_requests.

optimal_io_size (RO)
--------------------
This is the optimal io size reported by the device.

physical_block_size (RO)
------------------------
This is the physical block size of device, in bytes.

read_ahead_kb (RW)
------------------
Maximum number of kilobytes to read-ahead for filesystems on this block
device.

rotational (RW)
---------------
This file is used to stat if the device is of rotational type or
non-rotational type.

rq_affinity (RW)
----------------
If this option is '1', the block layer will migrate request completions to the
Expand Down
41 changes: 28 additions & 13 deletions block/blk-lib.c
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
struct request_queue *q = bdev_get_queue(bdev);
int type = REQ_WRITE | REQ_DISCARD;
unsigned int max_discard_sectors;
unsigned int granularity, alignment, mask;
struct bio_batch bb;
struct bio *bio;
int ret = 0;
Expand All @@ -54,18 +55,20 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
if (!blk_queue_discard(q))
return -EOPNOTSUPP;

/* Zero-sector (unknown) and one-sector granularities are the same. */
granularity = max(q->limits.discard_granularity >> 9, 1U);
mask = granularity - 1;
alignment = (bdev_discard_alignment(bdev) >> 9) & mask;

/*
* Ensure that max_discard_sectors is of the proper
* granularity
* granularity, so that requests stay aligned after a split.
*/
max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
max_discard_sectors = round_down(max_discard_sectors, granularity);
if (unlikely(!max_discard_sectors)) {
/* Avoid infinite loop below. Being cautious never hurts. */
return -EOPNOTSUPP;
} else if (q->limits.discard_granularity) {
unsigned int disc_sects = q->limits.discard_granularity >> 9;

max_discard_sectors &= ~(disc_sects - 1);
}

if (flags & BLKDEV_DISCARD_SECURE) {
Expand All @@ -79,25 +82,37 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
bb.wait = &wait;

while (nr_sects) {
unsigned int req_sects;
sector_t end_sect;

bio = bio_alloc(gfp_mask, 1);
if (!bio) {
ret = -ENOMEM;
break;
}

req_sects = min_t(sector_t, nr_sects, max_discard_sectors);

/*
* If splitting a request, and the next starting sector would be
* misaligned, stop the discard at the previous aligned sector.
*/
end_sect = sector + req_sects;
if (req_sects < nr_sects && (end_sect & mask) != alignment) {
end_sect =
round_down(end_sect - alignment, granularity)
+ alignment;
req_sects = end_sect - sector;
}

bio->bi_sector = sector;
bio->bi_end_io = bio_batch_end_io;
bio->bi_bdev = bdev;
bio->bi_private = &bb;

if (nr_sects > max_discard_sectors) {
bio->bi_size = max_discard_sectors << 9;
nr_sects -= max_discard_sectors;
sector += max_discard_sectors;
} else {
bio->bi_size = nr_sects << 9;
nr_sects = 0;
}
bio->bi_size = req_sects << 9;
nr_sects -= req_sects;
sector = end_sect;

atomic_inc(&bb.done);
submit_bio(type, bio);
Expand Down
Loading

0 comments on commit a7e546f

Please sign in to comment.