Skip to content

Commit

Permalink
Merge branch 'for-3.9/core' of git://git.kernel.dk/linux-block
Browse files Browse the repository at this point in the history
Pull block IO core bits from Jens Axboe:
 "Below are the core block IO bits for 3.9.  It was delayed a few days
  since my workstation kept crashing every 2-8h after pulling it into
  current -git, but turns out it is a bug in the new pstate code (divide
  by zero, will report separately).  In any case, it contains:

   - The big cfq/blkcg update from Tejun and and Vivek.

   - Additional block and writeback tracepoints from Tejun.

   - Improvement of the should sort (based on queues) logic in the plug
     flushing.

   - _io() variants of the wait_for_completion() interface, using
     io_schedule() instead of schedule() to contribute to io wait
     properly.

   - Various little fixes.

  You'll get two trivial merge conflicts, which should be easy enough to
  fix up"

Fix up the trivial conflicts due to hlist traversal cleanups (commit
b67bfe0: "hlist: drop the node parameter from iterators").

* 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
  block: remove redundant check to bd_openers()
  block: use i_size_write() in bd_set_size()
  cfq: fix lock imbalance with failed allocations
  drivers/block/swim3.c: fix null pointer dereference
  block: don't select PERCPU_RWSEM
  block: account iowait time when waiting for completion of IO request
  sched: add wait_for_completion_io[_timeout]
  writeback: add more tracepoints
  block: add block_{touch|dirty}_buffer tracepoint
  buffer: make touch_buffer() an exported function
  block: add @Req to bio_{front|back}_merge tracepoints
  block: add missing block_bio_complete() tracepoint
  block: Remove should_sort judgement when flush blk_plug
  block,elevator: use new hashtable implementation
  cfq-iosched: add hierarchical cfq_group statistics
  cfq-iosched: collect stats from dead cfqgs
  cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
  blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
  block: RCU free request_queue
  blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
  ...
  • Loading branch information
torvalds committed Feb 28, 2013
2 parents 21f3b24 + de33127 commit ee89f81
Show file tree
Hide file tree
Showing 30 changed files with 1,246 additions and 258 deletions.
58 changes: 58 additions & 0 deletions Documentation/block/cfq-iosched.txt
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,64 @@ processing of request. Therefore, increasing the value can imporve the
performace although this can cause the latency of some I/O to increase due
to more number of requests.

CFQ Group scheduling
====================

CFQ supports blkio cgroup and has "blkio." prefixed files in each
blkio cgroup directory. It is weight-based and there are four knobs
for configuration - weight[_device] and leaf_weight[_device].
Internal cgroup nodes (the ones with children) can also have tasks in
them, so the former two configure how much proportion the cgroup as a
whole is entitled to at its parent's level while the latter two
configure how much proportion the tasks in the cgroup have compared to
its direct children.

Another way to think about it is assuming that each internal node has
an implicit leaf child node which hosts all the tasks whose weight is
configured by leaf_weight[_device]. Let's assume a blkio hierarchy
composed of five cgroups - root, A, B, AA and AB - with the following
weights where the names represent the hierarchy.

weight leaf_weight
root : 125 125
A : 500 750
B : 250 500
AA : 500 500
AB : 1000 500

root never has a parent making its weight is meaningless. For backward
compatibility, weight is always kept in sync with leaf_weight. B, AA
and AB have no child and thus its tasks have no children cgroup to
compete with. They always get 100% of what the cgroup won at the
parent level. Considering only the weights which matter, the hierarchy
looks like the following.

root
/ | \
A B leaf
500 250 125
/ | \
AA AB leaf
500 1000 750

If all cgroups have active IOs and competing with each other, disk
time will be distributed like the following.

Distribution below root. The total active weight at this level is
A:500 + B:250 + C:125 = 875.

root-leaf : 125 / 875 =~ 14%
A : 500 / 875 =~ 57%
B(-leaf) : 250 / 875 =~ 28%

A has children and further distributes its 57% among the children and
the implicit leaf node. The total active weight at this level is
AA:500 + AB:1000 + A-leaf:750 = 2250.

A-leaf : ( 750 / 2250) * A =~ 19%
AA(-leaf) : ( 500 / 2250) * A =~ 12%
AB(-leaf) : (1000 / 2250) * A =~ 25%

CFQ IOPS Mode for group scheduling
===================================
Basic CFQ design is to provide priority based time slices. Higher priority
Expand Down
35 changes: 24 additions & 11 deletions Documentation/cgroups/blkio-controller.txt
Original file line number Diff line number Diff line change
Expand Up @@ -94,30 +94,32 @@ Throttling/Upper Limit policy

Hierarchical Cgroups
====================
- Currently none of the IO control policy supports hierarchical groups. But
cgroup interface does allow creation of hierarchical cgroups and internally
IO policies treat them as flat hierarchy.
- Currently only CFQ supports hierarchical groups. For throttling,
cgroup interface does allow creation of hierarchical cgroups and
internally it treats them as flat hierarchy.

So this patch will allow creation of cgroup hierarchcy but at the backend
everything will be treated as flat. So if somebody created a hierarchy like
as follows.
If somebody created a hierarchy like as follows.

root
/ \
test1 test2
|
test3

CFQ and throttling will practically treat all groups at same level.
CFQ will handle the hierarchy correctly but and throttling will
practically treat all groups at same level. For details on CFQ
hierarchy support, refer to Documentation/block/cfq-iosched.txt.
Throttling will treat the hierarchy as if it looks like the
following.

pivot
/ / \ \
root test1 test2 test3

Down the line we can implement hierarchical accounting/control support
and also introduce a new cgroup file "use_hierarchy" which will control
whether cgroup hierarchy is viewed as flat or hierarchical by the policy..
This is how memory controller also has implemented the things.
Nesting cgroups, while allowed, isn't officially supported and blkio
genereates warning when cgroups nest. Once throttling implements
hierarchy support, hierarchy will be supported and the warning will
be removed.

Various user visible config options
===================================
Expand Down Expand Up @@ -172,6 +174,12 @@ Proportional weight policy files
dev weight
8:16 300

- blkio.leaf_weight[_device]
- Equivalents of blkio.weight[_device] for the purpose of
deciding how much weight tasks in the given cgroup has while
competing with the cgroup's child cgroups. For details,
please refer to Documentation/block/cfq-iosched.txt.

- blkio.time
- disk time allocated to cgroup per device in milliseconds. First
two fields specify the major and minor number of the device and
Expand Down Expand Up @@ -279,6 +287,11 @@ Proportional weight policy files
and minor number of the device and third field specifies the number
of times a group was dequeued from a particular device.

- blkio.*_recursive
- Recursive version of various stats. These files show the
same information as their non-recursive counterparts but
include stats from all the descendant cgroups.

Throttling/Upper limit policy files
-----------------------------------
- blkio.throttle.read_bps_device
Expand Down
1 change: 0 additions & 1 deletion block/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,6 @@
menuconfig BLOCK
bool "Enable the block layer" if EXPERT
default y
select PERCPU_RWSEM
help
Provide block layer support for the kernel.

Expand Down
Loading

0 comments on commit ee89f81

Please sign in to comment.