docs: scheduler: convert docs to ReST and rename to *.rst

In order to prepare to add them to the Kernel API book,
convert the files to ReST format.

The conversion is actually:
  - add blank lines and indentation in order to identify paragraphs;
  - fix tables markups;
  - add some lists markups;
  - mark literal blocks;
  - adjust title markups.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Jonathan Corbet <[email protected]>
mchehab authored and Jonathan Corbet committed Jun 14, 2019
1 parent d223884 commit d6a3b24
Showing 16 changed files with 340 additions and 242 deletions.
2 changes: 1 addition & 1 deletion Documentation/ABI/testing/sysfs-kernel-uids
@@ -11,4 +11,4 @@ Description:
example would be, if User A has shares = 1024 and user
B has shares = 2048, User B will get twice the CPU
bandwidth user A will. For more details refer
Documentation/scheduler/sched-design-CFS.txt
Documentation/scheduler/sched-design-CFS.rst
@@ -1,3 +1,4 @@
================================================
Completions - "wait for completion" barrier APIs
================================================

@@ -46,7 +47,7 @@ it has to wait for it.

To use completions you need to #include <linux/completion.h> and
create a static or dynamic variable of type 'struct completion',
which has only two fields:
which has only two fields::

struct completion {
unsigned int done;
@@ -57,7 +58,7 @@ This provides the ->wait waitqueue to place tasks on for waiting (if any), and
the ->done completion flag for indicating whether it's completed or not.

Completions should be named to refer to the event that is being synchronized on.
A good example is:
A good example is::

wait_for_completion(&early_console_added);

@@ -81,7 +82,7 @@ have taken place, even if these wait functions return prematurely due to a timeo
or a signal triggering.

Initializing of dynamically allocated completion objects is done via a call to
init_completion():
init_completion()::

init_completion(&dynamic_object->done);

@@ -100,7 +101,8 @@ but be aware of other races.

For static declaration and initialization, macros are available.

For static (or global) declarations in file scope you can use DECLARE_COMPLETION():
For static (or global) declarations in file scope you can use
DECLARE_COMPLETION()::

static DECLARE_COMPLETION(setup_done);
DECLARE_COMPLETION(setup_done);
@@ -111,7 +113,7 @@ initialized to 'not done' and doesn't require an init_completion() call.
When a completion is declared as a local variable within a function,
then the initialization should always use DECLARE_COMPLETION_ONSTACK()
explicitly, not just to make lockdep happy, but also to make it clear
that limited scope had been considered and is intentional:
that limited scope had been considered and is intentional::

DECLARE_COMPLETION_ONSTACK(setup_done)

@@ -140,11 +142,11 @@ Waiting for completions:
------------------------

For a thread to wait for some concurrent activity to finish, it
calls wait_for_completion() on the initialized completion structure:
calls wait_for_completion() on the initialized completion structure::

void wait_for_completion(struct completion *done)

A typical usage scenario is:
A typical usage scenario is::

CPU#1 CPU#2

@@ -192,17 +194,17 @@ A common problem that occurs is to have unclean assignment of return types,
so take care to assign return-values to variables of the proper type.

Checking for the specific meaning of return values also has been found
to be quite inaccurate, e.g. constructs like:
to be quite inaccurate, e.g. constructs like::

if (!wait_for_completion_interruptible_timeout(...))

... would execute the same code path for successful completion and for the
interrupted case - which is probably not what you want.
interrupted case - which is probably not what you want::

int wait_for_completion_interruptible(struct completion *done)

This function marks the task TASK_INTERRUPTIBLE while it is waiting.
If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise.
If a signal was received while waiting it will return -ERESTARTSYS; 0 otherwise::

unsigned long wait_for_completion_timeout(struct completion *done, unsigned long timeout)

@@ -214,7 +216,7 @@ Timeouts are preferably calculated with msecs_to_jiffies() or usecs_to_jiffies()
to make the code largely HZ-invariant.

If the returned timeout value is deliberately ignored a comment should probably explain
why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc()).
why (e.g. see drivers/mfd/wm8350-core.c wm8350_read_auxadc())::

long wait_for_completion_interruptible_timeout(struct completion *done, unsigned long timeout)
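
The three outcomes of wait_for_completion_interruptible_timeout() can be told apart by the sign of the returned long. The following is a minimal userspace sketch, not kernel code: the enum, the helper names and the local ERESTARTSYS define are illustrative assumptions made so the block compiles on its own.

```c
/* Kernel-internal errno value, repeated here only for illustration. */
#define ERESTARTSYS 512

/* Hypothetical labels for the three outcomes; not kernel identifiers. */
enum wait_outcome {
	WAIT_INTERRUPTED,	/* negative return, e.g. -ERESTARTSYS */
	WAIT_TIMED_OUT,		/* return value 0 */
	WAIT_COMPLETED		/* positive return: remaining jiffies */
};

/*
 * Classify the long returned by a
 * wait_for_completion_interruptible_timeout()-style call.
 */
static enum wait_outcome classify_wait_result(long ret)
{
	if (ret < 0)
		return WAIT_INTERRUPTED;
	if (ret == 0)
		return WAIT_TIMED_OUT;
	return WAIT_COMPLETED;
}

/* The shortcut test "if (!ret)": true only for the timeout case. */
static int shortcut_test(long ret)
{
	return !ret;
}
```

With this sketch, shortcut_test() gives the same answer for -ERESTARTSYS and for any positive remainder, which is the ambiguity the text warns about; classify_wait_result() keeps the three cases separate.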

@@ -225,14 +227,14 @@ jiffies if completion occurred.

Further variants include _killable which uses TASK_KILLABLE as the
designated tasks state and will return -ERESTARTSYS if it is interrupted,
or 0 if completion was achieved. There is a _timeout variant as well:
or 0 if completion was achieved. There is a _timeout variant as well::

long wait_for_completion_killable(struct completion *done)
long wait_for_completion_killable_timeout(struct completion *done, unsigned long timeout)

The _io variants wait_for_completion_io() behave the same as the non-_io
variants, except for accounting waiting time as 'waiting on IO', which has
an impact on how the task is accounted in scheduling/IO stats:
an impact on how the task is accounted in scheduling/IO stats::

void wait_for_completion_io(struct completion *done)
unsigned long wait_for_completion_io_timeout(struct completion *done, unsigned long timeout)
@@ -243,11 +245,11 @@ Signaling completions:

A thread that wants to signal that the conditions for continuation have been
achieved calls complete() to signal exactly one of the waiters that it can
continue:
continue::

void complete(struct completion *done)

... or calls complete_all() to signal all current and future waiters:
... or calls complete_all() to signal all current and future waiters::

void complete_all(struct completion *done)

@@ -268,22 +270,22 @@ probably are a design bug.

Signaling completion from IRQ context is fine as it will appropriately
lock with spin_lock_irqsave()/spin_unlock_irqrestore() and it will never
sleep.
sleep.


try_wait_for_completion()/completion_done():
--------------------------------------------

The try_wait_for_completion() function will not put the thread on the wait
queue but rather returns false if it would need to enqueue (block) the thread,
else it consumes one posted completion and returns true.
else it consumes one posted completion and returns true::

bool try_wait_for_completion(struct completion *done)

Finally, to check the state of a completion without changing it in any way,
call completion_done(), which returns false if there are no posted
completions that were not yet consumed by waiters (implying that there are
waiters) and true otherwise;
waiters) and true otherwise::

bool completion_done(struct completion *done)
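
The behaviour of try_wait_for_completion() and completion_done() can be pictured as operations on the ->done counter alone. The following is a single-threaded userspace model under stated assumptions: the struct and function names are hypothetical, there is no waitqueue or locking, and the use of UINT_MAX as the "completed for everyone" marker reflects one reading of the kernel implementation rather than anything stated in this document.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: only the ->done counter, no waitqueue and no lock. */
struct model_completion {
	unsigned int done;
};

/* Start in the 'not done' state. */
static void model_init(struct model_completion *x)
{
	x->done = 0;
}

/* complete(): post one completion unless complete_all() already ran. */
static void model_complete(struct model_completion *x)
{
	if (x->done != UINT_MAX)
		x->done++;
}

/* complete_all(): saturate the counter for current and future waiters. */
static void model_complete_all(struct model_completion *x)
{
	x->done = UINT_MAX;
}

/* try_wait_for_completion(): consume one posted completion, never block. */
static bool model_try_wait(struct model_completion *x)
{
	if (!x->done)
		return false;
	if (x->done != UINT_MAX)
		x->done--;
	return true;
}

/* completion_done(): report the state without changing it. */
static bool model_completion_done(const struct model_completion *x)
{
	return x->done != 0;
}
```

In this model a single model_complete() satisfies exactly one model_try_wait(), while model_complete_all() lets every later model_try_wait() succeed.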

29 changes: 29 additions & 0 deletions Documentation/scheduler/index.rst
@@ -0,0 +1,29 @@
:orphan:

===============
Linux Scheduler
===============

.. toctree::
:maxdepth: 1


completion
sched-arch
sched-bwc
sched-deadline
sched-design-CFS
sched-domains
sched-energy
sched-nice-design
sched-rt-group
sched-stats

text_files

.. only:: subproject and html

Indices
=======

* :ref:`genindex`
@@ -1,4 +1,6 @@
CPU Scheduler implementation hints for architecture specific code
=================================================================
CPU Scheduler implementation hints for architecture specific code
=================================================================

Nick Piggin, 2005

@@ -35,9 +37,10 @@ Your cpu_idle routines need to obey the following rules:
4. The only time interrupts need to be disabled when checking
need_resched is if we are about to sleep the processor until
the next interrupt (this doesn't provide any protection of
need_resched, it prevents losing an interrupt).
need_resched, it prevents losing an interrupt):

4a. Common problem with this type of sleep appears to be::

4a. Common problem with this type of sleep appears to be:
local_irq_disable();
if (!need_resched()) {
local_irq_enable();
@@ -51,10 +54,10 @@ Your cpu_idle routines need to obey the following rules:
although it may be reasonable to do some background work or enter
a low CPU priority.

5a. If TIF_POLLING_NRFLAG is set, and we do decide to enter
an interrupt sleep, it needs to be cleared then a memory
barrier issued (followed by a test of need_resched with
interrupts disabled, as explained in 3).
- 5a. If TIF_POLLING_NRFLAG is set, and we do decide to enter
an interrupt sleep, it needs to be cleared then a memory
barrier issued (followed by a test of need_resched with
interrupts disabled, as explained in 3).

arch/x86/kernel/process.c has examples of both polling and
sleeping idle functions.
@@ -71,4 +74,3 @@ sh64 - Is sleeping racy vs interrupts? (See #4a)

sparc - IRQs on at this point(?), change local_irq_save to _disable.
- TODO: needs secondary CPUs to disable preempt (See #1)

@@ -1,8 +1,9 @@
=====================
CFS Bandwidth Control
=====================

[ This document only discusses CPU bandwidth control for SCHED_NORMAL.
The SCHED_RT case is covered in Documentation/scheduler/sched-rt-group.txt ]
The SCHED_RT case is covered in Documentation/scheduler/sched-rt-group.rst ]

CFS bandwidth control is a CONFIG_FAIR_GROUP_SCHED extension which allows the
specification of the maximum CPU bandwidth available to a group or hierarchy.
@@ -27,7 +28,8 @@ cpu.cfs_quota_us: the total available run-time within a period (in microseconds)
cpu.cfs_period_us: the length of a period (in microseconds)
cpu.stat: exports throttling statistics [explained further below]

The default values are:
The default values are::

cpu.cfs_period_us=100ms
cpu.cfs_quota=-1

@@ -55,7 +57,8 @@ For efficiency run-time is transferred between the global pool and CPU local
on large systems. The amount transferred each time such an update is required
is described as the "slice".

This is tunable via procfs:
This is tunable via procfs::

/proc/sys/kernel/sched_cfs_bandwidth_slice_us (default=5ms)

Larger slice values will reduce transfer overheads, while smaller values allow
@@ -66,6 +69,7 @@ Statistics
A group's bandwidth statistics are exported via 3 fields in cpu.stat.

cpu.stat:

- nr_periods: Number of enforcement intervals that have elapsed.
- nr_throttled: Number of times the group has been throttled/limited.
- throttled_time: The total time duration (in nanoseconds) for which entities
@@ -78,12 +82,15 @@ Hierarchical considerations
The interface enforces that an individual entity's bandwidth is always
attainable, that is: max(c_i) <= C. However, over-subscription in the
aggregate case is explicitly allowed to enable work-conserving semantics
within a hierarchy.
within a hierarchy:

e.g. \Sum (c_i) may exceed C

[ Where C is the parent's bandwidth, and c_i its children ]
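
The two conditions above, max(c_i) <= C being enforced while \Sum (c_i) may exceed C, can be checked with a small sketch. This is an illustration only: the function names are hypothetical, and bandwidth is assumed to be a plain number in one common unit (for example percent of one CPU) rather than the quota/period pairs the interface actually takes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Each child's bandwidth c_i must be attainable: max(c_i) <= C. */
static bool children_attainable(const long *c, size_t n, long parent)
{
	for (size_t i = 0; i < n; i++)
		if (c[i] > parent)
			return false;
	return true;
}

/* Sum of the children's bandwidth; this is allowed to exceed C. */
static long children_total(const long *c, size_t n)
{
	long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += c[i];
	return sum;
}
```

For a parent with C = 100 and children of 60 and 70, every child is attainable even though the total of 130 over-subscribes the parent.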


There are two ways in which a group may become throttled:

a. it fully consumes its own quota within a period
b. a parent's quota is fully consumed within its period

@@ -92,18 +99,18 @@ be allowed to until the parent's runtime is refreshed.

Examples
--------
1. Limit a group to 1 CPU worth of runtime.
1. Limit a group to 1 CPU worth of runtime::

If period is 250ms and quota is also 250ms, the group will get
1 CPU worth of runtime every 250ms.

# echo 250000 > cpu.cfs_quota_us /* quota = 250ms */
# echo 250000 > cpu.cfs_period_us /* period = 250ms */

2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine.
2. Limit a group to 2 CPUs worth of runtime on a multi-CPU machine

With 500ms period and 1000ms quota, the group can get 2 CPUs worth of
runtime every 500ms.
With 500ms period and 1000ms quota, the group can get 2 CPUs worth of
runtime every 500ms::

# echo 1000000 > cpu.cfs_quota_us /* quota = 1000ms */
# echo 500000 > cpu.cfs_period_us /* period = 500ms */
@@ -112,11 +119,10 @@ Examples

3. Limit a group to 20% of 1 CPU.

With 50ms period, 10ms quota will be equivalent to 20% of 1 CPU.
With 50ms period, 10ms quota will be equivalent to 20% of 1 CPU::

# echo 10000 > cpu.cfs_quota_us /* quota = 10ms */
# echo 50000 > cpu.cfs_period_us /* period = 50ms */
By using a small period here we are ensuring a consistent latency
response at the expense of burst capacity.

By using a small period here we are ensuring a consistent latency
response at the expense of burst capacity.
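
The arithmetic behind the three examples is quota divided by period. The following is a minimal sketch, assuming both values are given in microseconds as in cpu.cfs_quota_us and cpu.cfs_period_us; the function name is hypothetical, and treating a negative quota as "no limit" is an assumption based on the -1 default listed earlier.

```c
#include <assert.h>

/*
 * CPU bandwidth as a percentage of one CPU: 100 * quota / period.
 * Returns -1 for a negative quota, which this sketch treats as "no limit".
 */
static long cfs_cpu_percent(long quota_us, long period_us)
{
	assert(period_us > 0);
	if (quota_us < 0)
		return -1;
	return 100 * quota_us / period_us;
}
```

Applied to the examples, 250000/250000 gives 100 (1 CPU), 1000000/500000 gives 200 (2 CPUs) and 10000/50000 gives 20 (20% of 1 CPU).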