Skip to content

Commit

Permalink
Merge branch 'pm-sleep' into pm-for-linus
Browse files Browse the repository at this point in the history
* pm-sleep: (51 commits)
  PM: Drop generic_subsys_pm_ops
  PM / Sleep: Remove forward-only callbacks from AMBA bus type
  PM / Sleep: Remove forward-only callbacks from platform bus type
  PM: Run the driver callback directly if the subsystem one is not there
  PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
  PM / Sleep: Merge internal functions in generic_ops.c
  PM / Sleep: Simplify generic system suspend callbacks
  PM / Hibernate: Remove deprecated hibernation snapshot ioctls
  PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
  PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly
  PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep()
  PM / Sleep: Make [un]lock_system_sleep() generic
  PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs
  PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()
  PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else'
  Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
  PM / Sleep: Unify diagnostic messages from device suspend/resume
  ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR
  PM / Hibernate: Remove deprecated hibernation test modes
  PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path
  ...

Conflicts:
	kernel/kmod.c
  • Loading branch information
rjwysocki committed Dec 25, 2011
2 parents 8d274ab + 90363dd commit b7ba68c
Show file tree
Hide file tree
Showing 81 changed files with 770 additions and 1,268 deletions.
11 changes: 0 additions & 11 deletions Documentation/feature-removal-schedule.txt
Original file line number Diff line number Diff line change
Expand Up @@ -85,17 +85,6 @@ Who: Robin Getz <[email protected]> & Matt Mackall <[email protected]>

---------------------------

What: Deprecated snapshot ioctls
When: 2.6.36

Why: The ioctls in kernel/power/user.c were marked as deprecated long time
ago. Now they notify users about that so that they need to replace
their userspace. After some more time, remove them completely.

Who: Jiri Slaby <[email protected]>

---------------------------

What: The ieee80211_regdom module parameter
When: March 2010 / desktop catchup

Expand Down
37 changes: 21 additions & 16 deletions Documentation/power/devices.txt
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,9 @@ The core methods to suspend and resume devices reside in struct dev_pm_ops
pointed to by the ops member of struct dev_pm_domain, or by the pm member of
struct bus_type, struct device_type and struct class. They are mostly of
interest to the people writing infrastructure for platforms and buses, like PCI
or USB, or device type and device class drivers.
or USB, or device type and device class drivers. They also are relevant to the
writers of device drivers whose subsystems (PM domains, device types, device
classes and bus types) don't provide all power management methods.

Bus drivers implement these methods as appropriate for the hardware and the
drivers using it; PCI works differently from USB, and so on. Not many people
Expand Down Expand Up @@ -268,32 +270,35 @@ various phases always run after tasks have been frozen and before they are
unfrozen. Furthermore, the *_noirq phases run at a time when IRQ handlers have
been disabled (except for those marked with the IRQF_NO_SUSPEND flag).

All phases use PM domain, bus, type, or class callbacks (that is, methods
defined in dev->pm_domain->ops, dev->bus->pm, dev->type->pm, or dev->class->pm).
These callbacks are regarded by the PM core as mutually exclusive. Moreover,
PM domain callbacks always take precedence over bus, type and class callbacks,
while type callbacks take precedence over bus and class callbacks, and class
callbacks take precedence over bus callbacks. To be precise, the following
rules are used to determine which callback to execute in the given phase:
All phases use PM domain, bus, type, class or driver callbacks (that is, methods
defined in dev->pm_domain->ops, dev->bus->pm, dev->type->pm, dev->class->pm or
dev->driver->pm). These callbacks are regarded by the PM core as mutually
exclusive. Moreover, PM domain callbacks always take precedence over all of the
other callbacks and, for example, type callbacks take precedence over bus, class
and driver callbacks. To be precise, the following rules are used to determine
which callback to execute in the given phase:

1. If dev->pm_domain is present, the PM core will attempt to execute the
callback included in dev->pm_domain->ops. If that callback is not
present, no action will be carried out for the given device.
1. If dev->pm_domain is present, the PM core will choose the callback
included in dev->pm_domain->ops for execution

2. Otherwise, if both dev->type and dev->type->pm are present, the callback
included in dev->type->pm will be executed.
included in dev->type->pm will be chosen for execution.

3. Otherwise, if both dev->class and dev->class->pm are present, the
callback included in dev->class->pm will be executed.
callback included in dev->class->pm will be chosen for execution.

4. Otherwise, if both dev->bus and dev->bus->pm are present, the callback
included in dev->bus->pm will be executed.
included in dev->bus->pm will be chosen for execution.

This allows PM domains and device types to override callbacks provided by bus
types or device classes if necessary.

These callbacks may in turn invoke device- or driver-specific methods stored in
dev->driver->pm, but they don't have to.
The PM domain, type, class and bus callbacks may in turn invoke device- or
driver-specific methods stored in dev->driver->pm, but they don't have to do
that.

If the subsystem callback chosen for execution is not present, the PM core will
execute the corresponding method from dev->driver->pm instead if there is one.


Entering System Suspend
Expand Down
39 changes: 32 additions & 7 deletions Documentation/power/freezing-of-tasks.txt
Original file line number Diff line number Diff line change
Expand Up @@ -21,18 +21,18 @@ freeze_processes() (defined in kernel/power/process.c) is called. It executes
try_to_freeze_tasks() that sets TIF_FREEZE for all of the freezable tasks and
either wakes them up, if they are kernel threads, or sends fake signals to them,
if they are user space processes. A task that has TIF_FREEZE set, should react
to it by calling the function called refrigerator() (defined in
to it by calling the function called __refrigerator() (defined in
kernel/freezer.c), which sets the task's PF_FROZEN flag, changes its state
to TASK_UNINTERRUPTIBLE and makes it loop until PF_FROZEN is cleared for it.
Then, we say that the task is 'frozen' and therefore the set of functions
handling this mechanism is referred to as 'the freezer' (these functions are
defined in kernel/power/process.c, kernel/freezer.c & include/linux/freezer.h).
User space processes are generally frozen before kernel threads.

It is not recommended to call refrigerator() directly. Instead, it is
recommended to use the try_to_freeze() function (defined in
include/linux/freezer.h), that checks the task's TIF_FREEZE flag and makes the
task enter refrigerator() if the flag is set.
__refrigerator() must not be called directly. Instead, use the
try_to_freeze() function (defined in include/linux/freezer.h), that checks
the task's TIF_FREEZE flag and makes the task enter __refrigerator() if the
flag is set.

For user space processes try_to_freeze() is called automatically from the
signal-handling code, but the freezable kernel threads need to call it
Expand Down Expand Up @@ -61,13 +61,13 @@ wait_event_freezable() and wait_event_freezable_timeout() macros.
After the system memory state has been restored from a hibernation image and
devices have been reinitialized, the function thaw_processes() is called in
order to clear the PF_FROZEN flag for each frozen task. Then, the tasks that
have been frozen leave refrigerator() and continue running.
have been frozen leave __refrigerator() and continue running.

III. Which kernel threads are freezable?

Kernel threads are not freezable by default. However, a kernel thread may clear
PF_NOFREEZE for itself by calling set_freezable() (the resetting of PF_NOFREEZE
directly is strongly discouraged). From this point it is regarded as freezable
directly is not allowed). From this point it is regarded as freezable
and must call try_to_freeze() in a suitable place.

IV. Why do we do that?
Expand Down Expand Up @@ -176,3 +176,28 @@ tasks, since it generally exists anyway.
A driver must have all firmwares it may need in RAM before suspend() is called.
If keeping them is not practical, for example due to their size, they must be
requested early enough using the suspend notifier API described in notifiers.txt.

VI. Are there any precautions to be taken to prevent freezing failures?

Yes, there are.

First of all, grabbing the 'pm_mutex' lock to mutually exclude a piece of code
from system-wide sleep such as suspend/hibernation is not encouraged.
If possible, that piece of code must instead hook onto the suspend/hibernation
notifiers to achieve mutual exclusion. Look at the CPU-Hotplug code
(kernel/cpu.c) for an example.

However, if that is not feasible, and grabbing 'pm_mutex' is deemed necessary,
it is strongly discouraged to directly call mutex_[un]lock(&pm_mutex) since
that could lead to freezing failures, because if the suspend/hibernate code
successfully acquired the 'pm_mutex' lock, and hence that other entity failed
to acquire the lock, then that task would get blocked in TASK_UNINTERRUPTIBLE
state. As a consequence, the freezer would not be able to freeze that task,
leading to freezing failure.

However, the [un]lock_system_sleep() APIs are safe to use in this scenario,
since they ask the freezer to skip freezing this task, since it is anyway
"frozen enough" as it is blocked on 'pm_mutex', which will be released
only after the entire suspend/hibernation sequence is complete.
So, to summarize, use [un]lock_system_sleep() instead of directly using
mutex_[un]lock(&pm_mutex). That would prevent freezing failures.
130 changes: 68 additions & 62 deletions Documentation/power/runtime_pm.txt
Original file line number Diff line number Diff line change
Expand Up @@ -57,93 +57,99 @@ the following:

4. Bus type of the device, if both dev->bus and dev->bus->pm are present.

If the subsystem chosen by applying the above rules doesn't provide the relevant
callback, the PM core will invoke the corresponding driver callback stored in
dev->driver->pm directly (if present).

The PM core always checks which callback to use in the order given above, so the
priority order of callbacks from high to low is: PM domain, device type, class
and bus type. Moreover, the high-priority one will always take precedence over
a low-priority one. The PM domain, bus type, device type and class callbacks
are referred to as subsystem-level callbacks in what follows.

By default, the callbacks are always invoked in process context with interrupts
enabled. However, subsystems can use the pm_runtime_irq_safe() helper function
to tell the PM core that their ->runtime_suspend(), ->runtime_resume() and
->runtime_idle() callbacks may be invoked in atomic context with interrupts
disabled for a given device. This implies that the callback routines in
question must not block or sleep, but it also means that the synchronous helper
functions listed at the end of Section 4 may be used for that device within an
interrupt handler or generally in an atomic context.

The subsystem-level suspend callback is _entirely_ _responsible_ for handling
the suspend of the device as appropriate, which may, but need not include
executing the device driver's own ->runtime_suspend() callback (from the
enabled. However, the pm_runtime_irq_safe() helper function can be used to tell
the PM core that it is safe to run the ->runtime_suspend(), ->runtime_resume()
and ->runtime_idle() callbacks for the given device in atomic context with
interrupts disabled. This implies that the callback routines in question must
not block or sleep, but it also means that the synchronous helper functions
listed at the end of Section 4 may be used for that device within an interrupt
handler or generally in an atomic context.

The subsystem-level suspend callback, if present, is _entirely_ _responsible_
for handling the suspend of the device as appropriate, which may, but need not
include executing the device driver's own ->runtime_suspend() callback (from the
PM core's point of view it is not necessary to implement a ->runtime_suspend()
callback in a device driver as long as the subsystem-level suspend callback
knows what to do to handle the device).

* Once the subsystem-level suspend callback has completed successfully
for given device, the PM core regards the device as suspended, which need
not mean that the device has been put into a low power state. It is
supposed to mean, however, that the device will not process data and will
not communicate with the CPU(s) and RAM until the subsystem-level resume
callback is executed for it. The runtime PM status of a device after
successful execution of the subsystem-level suspend callback is 'suspended'.

* If the subsystem-level suspend callback returns -EBUSY or -EAGAIN,
the device's runtime PM status is 'active', which means that the device
_must_ be fully operational afterwards.

* If the subsystem-level suspend callback returns an error code different
from -EBUSY or -EAGAIN, the PM core regards this as a fatal error and will
refuse to run the helper functions described in Section 4 for the device,
until the status of it is directly set either to 'active', or to 'suspended'
(the PM core provides special helper functions for this purpose).

In particular, if the driver requires remote wake-up capability (i.e. hardware
* Once the subsystem-level suspend callback (or the driver suspend callback,
if invoked directly) has completed successfully for the given device, the PM
core regards the device as suspended, which need not mean that it has been
put into a low power state. It is supposed to mean, however, that the
device will not process data and will not communicate with the CPU(s) and
RAM until the appropriate resume callback is executed for it. The runtime
PM status of a device after successful execution of the suspend callback is
'suspended'.

* If the suspend callback returns -EBUSY or -EAGAIN, the device's runtime PM
status remains 'active', which means that the device _must_ be fully
operational afterwards.

* If the suspend callback returns an error code different from -EBUSY and
-EAGAIN, the PM core regards this as a fatal error and will refuse to run
the helper functions described in Section 4 for the device until its status
is directly set to either'active', or 'suspended' (the PM core provides
special helper functions for this purpose).

In particular, if the driver requires remote wakeup capability (i.e. hardware
mechanism allowing the device to request a change of its power state, such as
PCI PME) for proper functioning and device_run_wake() returns 'false' for the
device, then ->runtime_suspend() should return -EBUSY. On the other hand, if
device_run_wake() returns 'true' for the device and the device is put into a low
power state during the execution of the subsystem-level suspend callback, it is
expected that remote wake-up will be enabled for the device. Generally, remote
wake-up should be enabled for all input devices put into a low power state at
run time.

The subsystem-level resume callback is _entirely_ _responsible_ for handling the
resume of the device as appropriate, which may, but need not include executing
the device driver's own ->runtime_resume() callback (from the PM core's point of
view it is not necessary to implement a ->runtime_resume() callback in a device
driver as long as the subsystem-level resume callback knows what to do to handle
the device).

* Once the subsystem-level resume callback has completed successfully, the PM
core regards the device as fully operational, which means that the device
_must_ be able to complete I/O operations as needed. The runtime PM status
of the device is then 'active'.

* If the subsystem-level resume callback returns an error code, the PM core
regards this as a fatal error and will refuse to run the helper functions
described in Section 4 for the device, until its status is directly set
either to 'active' or to 'suspended' (the PM core provides special helper
functions for this purpose).

The subsystem-level idle callback is executed by the PM core whenever the device
appears to be idle, which is indicated to the PM core by two counters, the
device's usage counter and the counter of 'active' children of the device.
device_run_wake() returns 'true' for the device and the device is put into a
low-power state during the execution of the suspend callback, it is expected
that remote wakeup will be enabled for the device. Generally, remote wakeup
should be enabled for all input devices put into low-power states at run time.

The subsystem-level resume callback, if present, is _entirely_ _responsible_ for
handling the resume of the device as appropriate, which may, but need not
include executing the device driver's own ->runtime_resume() callback (from the
PM core's point of view it is not necessary to implement a ->runtime_resume()
callback in a device driver as long as the subsystem-level resume callback knows
what to do to handle the device).

* Once the subsystem-level resume callback (or the driver resume callback, if
invoked directly) has completed successfully, the PM core regards the device
as fully operational, which means that the device _must_ be able to complete
I/O operations as needed. The runtime PM status of the device is then
'active'.

* If the resume callback returns an error code, the PM core regards this as a
fatal error and will refuse to run the helper functions described in Section
4 for the device, until its status is directly set to either 'active', or
'suspended' (by means of special helper functions provided by the PM core
for this purpose).

The idle callback (a subsystem-level one, if present, or the driver one) is
executed by the PM core whenever the device appears to be idle, which is
indicated to the PM core by two counters, the device's usage counter and the
counter of 'active' children of the device.

* If any of these counters is decreased using a helper function provided by
the PM core and it turns out to be equal to zero, the other counter is
checked. If that counter also is equal to zero, the PM core executes the
subsystem-level idle callback with the device as an argument.
idle callback with the device as its argument.

The action performed by a subsystem-level idle callback is totally dependent on
the subsystem in question, but the expected and recommended action is to check
The action performed by the idle callback is totally dependent on the subsystem
(or driver) in question, but the expected and recommended action is to check
if the device can be suspended (i.e. if all of the conditions necessary for
suspending the device are satisfied) and to queue up a suspend request for the
device in that case. The value returned by this callback is ignored by the PM
core.

The helper functions provided by the PM core, described in Section 4, guarantee
that the following constraints are met with respect to the bus type's runtime
PM callbacks:
that the following constraints are met with respect to runtime PM callbacks for
one device:

(1) The callbacks are mutually exclusive (e.g. it is forbidden to execute
->runtime_suspend() in parallel with ->runtime_resume() or with another
Expand Down
2 changes: 0 additions & 2 deletions arch/alpha/include/asm/thread_info.h
Original file line number Diff line number Diff line change
Expand Up @@ -79,15 +79,13 @@ register struct thread_info *__current_thread_info __asm__("$8");
#define TIF_UAC_SIGBUS 12 /* ! userspace part of 'osf_sysinfo' */
#define TIF_MEMDIE 13 /* is terminating due to OOM killer */
#define TIF_RESTORE_SIGMASK 14 /* restore signal mask in do_signal */
#define TIF_FREEZE 16 /* is freezing for suspend */

#define _TIF_SYSCALL_TRACE (1<<TIF_SYSCALL_TRACE)
#define _TIF_SIGPENDING (1<<TIF_SIGPENDING)
#define _TIF_NEED_RESCHED (1<<TIF_NEED_RESCHED)
#define _TIF_POLLING_NRFLAG (1<<TIF_POLLING_NRFLAG)
#define _TIF_RESTORE_SIGMASK (1<<TIF_RESTORE_SIGMASK)
#define _TIF_NOTIFY_RESUME (1<<TIF_NOTIFY_RESUME)
#define _TIF_FREEZE (1<<TIF_FREEZE)

/* Work to do on interrupt/exception return. */
#define _TIF_WORK_MASK (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
Expand Down
2 changes: 0 additions & 2 deletions arch/arm/include/asm/thread_info.h
Original file line number Diff line number Diff line change
Expand Up @@ -142,7 +142,6 @@ extern void vfp_flush_hwstate(struct thread_info *);
#define TIF_POLLING_NRFLAG 16
#define TIF_USING_IWMMXT 17
#define TIF_MEMDIE 18 /* is terminating due to OOM killer */
#define TIF_FREEZE 19
#define TIF_RESTORE_SIGMASK 20
#define TIF_SECCOMP 21

Expand All @@ -152,7 +151,6 @@ extern void vfp_flush_hwstate(struct thread_info *);
#define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
#define _TIF_POLLING_NRFLAG (1 << TIF_POLLING_NRFLAG)
#define _TIF_USING_IWMMXT (1 << TIF_USING_IWMMXT)
#define _TIF_FREEZE (1 << TIF_FREEZE)
#define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK)
#define _TIF_SECCOMP (1 << TIF_SECCOMP)

Expand Down
2 changes: 0 additions & 2 deletions arch/avr32/include/asm/thread_info.h
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,6 @@ static inline struct thread_info *current_thread_info(void)
#define TIF_RESTORE_SIGMASK 7 /* restore signal mask in do_signal */
#define TIF_CPU_GOING_TO_SLEEP 8 /* CPU is entering sleep 0 mode */
#define TIF_NOTIFY_RESUME 9 /* callback before returning to user */
#define TIF_FREEZE 29
#define TIF_DEBUG 30 /* debugging enabled */
#define TIF_USERSPACE 31 /* true if FS sets userspace */

Expand All @@ -98,7 +97,6 @@ static inline struct thread_info *current_thread_info(void)
#define _TIF_RESTORE_SIGMASK (1 << TIF_RESTORE_SIGMASK)
#define _TIF_CPU_GOING_TO_SLEEP (1 << TIF_CPU_GOING_TO_SLEEP)
#define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
#define _TIF_FREEZE (1 << TIF_FREEZE)

/* Note: The masks below must never span more than 16 bits! */

Expand Down
Loading

0 comments on commit b7ba68c

Please sign in to comment.