Skip to content

Commit

Permalink
Merge tag 'pm-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/…
Browse files Browse the repository at this point in the history
…git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "There are no real big ticket items here this time.

  The most noticeable change is probably the relocation of the OPP
  (Operating Performance Points) framework to its own directory under
  drivers/ as it has grown big enough for that. Also Viresh is now going
  to maintain it and send pull requests for it to me, so you will see
  this change in the git history going forward (but still not right
  now).

  Another noticeable set of changes is the modifications of the PM core,
  the PCI subsystem and the ACPI PM domain to allow of more integration
  between system-wide suspend/resume and runtime PM. For now it's just a
  way to avoid resuming devices from runtime suspend unnecessarily
  during system suspend (if the driver sets a flag to indicate its
  readiness for that) and in the works is an analogous mechanism to
  allow devices to stay suspended after system resume.

  In addition to that, we have some changes related to supporting
  frequency-invariant CPU utilization metrics in the scheduler and in
  the schedutil cpufreq governor on ARM and changes to add support for
  device performance states to the generic power domains (genpd)
  framework.

  The rest is mostly fixes and cleanups of various sorts.

  Specifics:

   - Relocate the OPP (Operating Performance Points) framework to its
     own directory under drivers/ and add support for power domain
     performance states to it (Viresh Kumar).

   - Modify the PM core, the PCI bus type and the ACPI PM domain to
     support power management driver flags allowing device drivers to
     specify their capabilities and preferences regarding the handling
     of devices with enabled runtime PM during system suspend/resume and
     clean up that code somewhat (Rafael Wysocki, Ulf Hansson).

   - Add frequency-invariant accounting support to the task scheduler on
     ARM and ARM64 (Dietmar Eggemann).

   - Fix PM QoS device resume latency framework to prevent "no
     restriction" requests from overriding requests with specific
     requirements and drop the confusing PM_QOS_FLAG_REMOTE_WAKEUP
     device PM QoS flag (Rafael Wysocki).

   - Drop legacy class suspend/resume operations from the PM core and
     drop legacy bus type suspend and resume callbacks from ARM/locomo
     (Rafael Wysocki).

   - Add min/max frequency support to devfreq and clean it up somewhat
     (Chanwoo Choi).

   - Rework wakeup support in the generic power domains (genpd)
     framework and update some of its users accordingly (Geert
     Uytterhoeven).

   - Convert timers in the PM core to use timer_setup() (Kees Cook).

   - Add support for exposing the SLP_S0 (Low Power S0 Idle) residency
     counter based on the LPIT ACPI table on Intel platforms (Srinivas
     Pandruvada).

   - Add per-CPU PM QoS resume latency support to the ladder cpuidle
     governor (Ramesh Thomas).

   - Fix a deadlock between the wakeup notify handler and the notifier
     removal in the ACPI core (Ville Syrjälä).

   - Fix a cpufreq schedutil governor issue causing it to use stale
     cached frequency values sometimes (Viresh Kumar).

   - Fix an issue in the system suspend core support code causing wakeup
     events detection to fail in some cases (Rajat Jain).

   - Fix the generic power domains (genpd) framework to prevent the PM
     core from using the direct-complete optimization with it as that is
     guaranteed to fail (Ulf Hansson).

   - Fix a minor issue in the cpuidle core and clean it up a bit (Gaurav
     Jindal, Nicholas Piggin).

   - Fix and clean up the intel_idle and ARM cpuidle drivers (Jason
     Baron, Len Brown, Leo Yan).

   - Fix a couple of minor issues in the OPP framework and clean it up
     (Arvind Yadav, Fabio Estevam, Sudeep Holla, Tobias Jordan).

   - Fix and clean up some cpufreq drivers and fix a minor issue in the
     cpufreq statistics code (Arvind Yadav, Bhumika Goyal, Fabio
     Estevam, Gautham Shenoy, Gustavo Silva, Marek Szyprowski, Masahiro
     Yamada, Robert Jarzmik, Zumeng Chen).

   - Fix minor issues in the system suspend and hibernation core, in
     power management documentation and in the AVS (Adaptive Voltage
     Scaling) framework (Helge Deller, Himanshu Jha, Joe Perches, Rafael
     Wysocki).

   - Fix some issues in the cpupower utility and document that Shuah
     Khan is going to maintain it going forward (Prarit Bhargava, Shuah
     Khan)"

* tag 'pm-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (88 commits)
  tools/power/cpupower: add libcpupower.so.0.0.1 to .gitignore
  tools/power/cpupower: Add 64 bit library detection
  intel_idle: Graceful probe failure when MWAIT is disabled
  cpufreq: schedutil: Reset cached_raw_freq when not in sync with next_freq
  freezer: Fix typo in freezable_schedule_timeout() comment
  PM / s2idle: Clear the events_check_enabled flag
  cpufreq: stats: Handle the case when trans_table goes beyond PAGE_SIZE
  cpufreq: arm_big_little: make cpufreq_arm_bL_ops structures const
  cpufreq: arm_big_little: make function arguments and structure pointer const
  cpuidle: Avoid assignment in if () argument
  cpuidle: Clean up cpuidle_enable_device() error handling a bit
  ACPI / PM: Fix acpi_pm_notifier_lock vs flush_workqueue() deadlock
  PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare()
  cpuidle: ladder: Add per CPU PM QoS resume latency support
  PM / QoS: Fix device resume latency framework
  PM / domains: Rework governor code to be more consistent
  PM / Domains: Remove gpd_dev_ops.active_wakeup() callback
  soc: rockchip: power-domain: Use GENPD_FLAG_ACTIVE_WAKEUP
  soc: mediatek: Use GENPD_FLAG_ACTIVE_WAKEUP
  ARM: shmobile: pm-rmobile: Use GENPD_FLAG_ACTIVE_WAKEUP
  ...
  • Loading branch information
torvalds committed Nov 14, 2017
2 parents b29c6ef + 990a848 commit bd2cd7d
Show file tree
Hide file tree
Showing 99 changed files with 1,758 additions and 1,063 deletions.
20 changes: 3 additions & 17 deletions Documentation/ABI/testing/sysfs-devices-power
Original file line number Diff line number Diff line change
Expand Up @@ -211,7 +211,9 @@ Description:
device, after it has been suspended at run time, from a resume
request to the moment the device will be ready to process I/O,
in microseconds. If it is equal to 0, however, this means that
the PM QoS resume latency may be arbitrary.
the PM QoS resume latency may be arbitrary and the special value
"n/a" means that user space cannot accept any resume latency at
all for the given device.

Not all drivers support this attribute. If it isn't supported,
it is not present.
Expand Down Expand Up @@ -258,19 +260,3 @@ Description:

This attribute has no effect on system-wide suspend/resume and
hibernation.

What: /sys/devices/.../power/pm_qos_remote_wakeup
Date: September 2012
Contact: Rafael J. Wysocki <[email protected]>
Description:
The /sys/devices/.../power/pm_qos_remote_wakeup attribute
is used for manipulating the PM QoS "remote wakeup required"
flag. If set, this flag indicates to the kernel that the
device is a source of user events that have to be signaled from
its low-power states.

Not all drivers support this attribute. If it isn't supported,
it is not present.

This attribute has no effect on system-wide suspend/resume and
hibernation.
25 changes: 25 additions & 0 deletions Documentation/acpi/lpit.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
To enumerate platform Low Power Idle states, Intel platforms are using
“Low Power Idle Table” (LPIT). More details about this table can be
downloaded from:
http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf

Residencies for each low power state can be read via FFH
(Function fixed hardware) or a memory mapped interface.

On platforms supporting S0ix sleep states, there can be two types of
residencies:
- CPU PKG C10 (Read via FFH interface)
- Platform Controller Hub (PCH) SLP_S0 (Read via memory mapped interface)

The following attributes are added dynamically to the cpuidle
sysfs attribute group:
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us

The "low_power_idle_cpu_residency_us" attribute shows time spent
by the CPU package in PKG C10

The "low_power_idle_system_residency_us" attribute shows SLP_S0
residency, or system time spent with the SLP_S0# signal asserted.
This is the lowest possible system power state, achieved only when CPU is in
PKG C10 and all functional blocks in PCH are in a low power state.
3 changes: 3 additions & 0 deletions Documentation/cpu-freq/cpufreq-stats.txt
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,9 @@ Freq_i to Freq_j. Freq_i is in descending order with increasing rows and
Freq_j is in descending order with increasing columns. The output here also
contains the actual freq values for each row and column for better readability.

If the transition table is bigger than PAGE_SIZE, reading this will
return an -EFBIG error.

--------------------------------------------------------------------------------
<mysystem>:/sys/devices/system/cpu/cpu0/cpufreq/stats # cat trans_table
From : To
Expand Down
61 changes: 59 additions & 2 deletions Documentation/driver-api/pm/devices.rst
Original file line number Diff line number Diff line change
Expand Up @@ -274,7 +274,7 @@ sleep states and the hibernation state ("suspend-to-disk"). Each phase involves
executing callbacks for every device before the next phase begins. Not all
buses or classes support all these callbacks and not all drivers use all the
callbacks. The various phases always run after tasks have been frozen and
before they are unfrozen. Furthermore, the ``*_noirq phases`` run at a time
before they are unfrozen. Furthermore, the ``*_noirq`` phases run at a time
when IRQ handlers have been disabled (except for those marked with the
IRQF_NO_SUSPEND flag).

Expand Down Expand Up @@ -328,7 +328,10 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
After the ``->prepare`` callback method returns, no new children may be
registered below the device. The method may also prepare the device or
driver in some way for the upcoming system power transition, but it
should not put the device into a low-power state.
should not put the device into a low-power state. Moreover, if the
device supports runtime power management, the ``->prepare`` callback
method must not update its state in case it is necessary to resume it
from runtime suspend later on.

For devices supporting runtime power management, the return value of the
prepare callback can be used to indicate to the PM core that it may
Expand All @@ -351,11 +354,35 @@ the phases are: ``prepare``, ``suspend``, ``suspend_late``, ``suspend_noirq``.
is because all such devices are initially set to runtime-suspended with
runtime PM disabled.

This feature also can be controlled by device drivers by using the
``DPM_FLAG_NEVER_SKIP`` and ``DPM_FLAG_SMART_PREPARE`` driver power
management flags. [Typically, they are set at the time the driver is
probed against the device in question by passing them to the
:c:func:`dev_pm_set_driver_flags` helper function.] If the first of
these flags is set, the PM core will not apply the direct-complete
procedure described above to the given device and, consequenty, to any
of its ancestors. The second flag, when set, informs the middle layer
code (bus types, device types, PM domains, classes) that it should take
the return value of the ``->prepare`` callback provided by the driver
into account and it may only return a positive value from its own
``->prepare`` callback if the driver's one also has returned a positive
value.

2. The ``->suspend`` methods should quiesce the device to stop it from
performing I/O. They also may save the device registers and put it into
the appropriate low-power state, depending on the bus type the device is
on, and they may enable wakeup events.

However, for devices supporting runtime power management, the
``->suspend`` methods provided by subsystems (bus types and PM domains
in particular) must follow an additional rule regarding what can be done
to the devices before their drivers' ``->suspend`` methods are called.
Namely, they can only resume the devices from runtime suspend by
calling :c:func:`pm_runtime_resume` for them, if that is necessary, and
they must not update the state of the devices in any other way at that
time (in case the drivers need to resume the devices from runtime
suspend in their ``->suspend`` methods).

3. For a number of devices it is convenient to split suspend into the
"quiesce device" and "save device state" phases, in which cases
``suspend_late`` is meant to do the latter. It is always executed after
Expand Down Expand Up @@ -729,6 +756,36 @@ state temporarily, for example so that its system wakeup capability can be
disabled. This all depends on the hardware and the design of the subsystem and
device driver in question.

If it is necessary to resume a device from runtime suspend during a system-wide
transition into a sleep state, that can be done by calling
:c:func:`pm_runtime_resume` for it from the ``->suspend`` callback (or its
couterpart for transitions related to hibernation) of either the device's driver
or a subsystem responsible for it (for example, a bus type or a PM domain).
That is guaranteed to work by the requirement that subsystems must not change
the state of devices (possibly except for resuming them from runtime suspend)
from their ``->prepare`` and ``->suspend`` callbacks (or equivalent) *before*
invoking device drivers' ``->suspend`` callbacks (or equivalent).

Some bus types and PM domains have a policy to resume all devices from runtime
suspend upfront in their ``->suspend`` callbacks, but that may not be really
necessary if the driver of the device can cope with runtime-suspended devices.
The driver can indicate that by setting ``DPM_FLAG_SMART_SUSPEND`` in
:c:member:`power.driver_flags` at the probe time, by passing it to the
:c:func:`dev_pm_set_driver_flags` helper. That also may cause middle-layer code
(bus types, PM domains etc.) to skip the ``->suspend_late`` and
``->suspend_noirq`` callbacks provided by the driver if the device remains in
runtime suspend at the beginning of the ``suspend_late`` phase of system-wide
suspend (or in the ``poweroff_late`` phase of hibernation), when runtime PM
has been disabled for it, under the assumption that its state should not change
after that point until the system-wide transition is over. If that happens, the
driver's system-wide resume callbacks, if present, may still be invoked during
the subsequent system-wide resume transition and the device's runtime power
management status may be set to "active" before enabling runtime PM for it,
so the driver must be prepared to cope with the invocation of its system-wide
resume callbacks back-to-back with its ``->runtime_suspend`` one (without the
intervening ``->runtime_resume`` and so on) and the final state of the device
must reflect the "active" status for runtime PM in that case.

During system-wide resume from a sleep state it's easiest to put devices into
the full-power state, as explained in :file:`Documentation/power/runtime_pm.txt`.
Refer to that document for more information regarding this particular issue as
Expand Down
33 changes: 33 additions & 0 deletions Documentation/power/pci.txt
Original file line number Diff line number Diff line change
Expand Up @@ -961,6 +961,39 @@ dev_pm_ops to indicate that one suspend routine is to be pointed to by the
.suspend(), .freeze(), and .poweroff() members and one resume routine is to
be pointed to by the .resume(), .thaw(), and .restore() members.

3.1.19. Driver Flags for Power Management

The PM core allows device drivers to set flags that influence the handling of
power management for the devices by the core itself and by middle layer code
including the PCI bus type. The flags should be set once at the driver probe
time with the help of the dev_pm_set_driver_flags() function and they should not
be updated directly afterwards.

The DPM_FLAG_NEVER_SKIP flag prevents the PM core from using the direct-complete
mechanism allowing device suspend/resume callbacks to be skipped if the device
is in runtime suspend when the system suspend starts. That also affects all of
the ancestors of the device, so this flag should only be used if absolutely
necessary.

The DPM_FLAG_SMART_PREPARE flag instructs the PCI bus type to only return a
positive value from pci_pm_prepare() if the ->prepare callback provided by the
driver of the device returns a positive value. That allows the driver to opt
out from using the direct-complete mechanism dynamically.

The DPM_FLAG_SMART_SUSPEND flag tells the PCI bus type that from the driver's
perspective the device can be safely left in runtime suspend during system
suspend. That causes pci_pm_suspend(), pci_pm_freeze() and pci_pm_poweroff()
to skip resuming the device from runtime suspend unless there are PCI-specific
reasons for doing that. Also, it causes pci_pm_suspend_late/noirq(),
pci_pm_freeze_late/noirq() and pci_pm_poweroff_late/noirq() to return early
if the device remains in runtime suspend in the beginning of the "late" phase
of the system-wide transition under way. Moreover, if the device is in
runtime suspend in pci_pm_resume_noirq() or pci_pm_restore_noirq(), its runtime
power management status will be changed to "active" (as it is going to be put
into D0 going forward), but if it is in runtime suspend in pci_pm_thaw_noirq(),
the function will set the power.direct_complete flag for it (to make the PM core
skip the subsequent "thaw" callbacks for it) and return.

3.2. Device Runtime Power Management
------------------------------------
In addition to providing device power management callbacks PCI device drivers
Expand Down
13 changes: 6 additions & 7 deletions Documentation/power/pm_qos_interface.txt
Original file line number Diff line number Diff line change
Expand Up @@ -98,8 +98,7 @@ Values are updated in response to changes of the request list.
The target values of resume latency and active state latency tolerance are
simply the minimum of the request values held in the parameter list elements.
The PM QoS flags aggregate value is a gather (bitwise OR) of all list elements'
values. Two device PM QoS flags are defined currently: PM_QOS_FLAG_NO_POWER_OFF
and PM_QOS_FLAG_REMOTE_WAKEUP.
values. One device PM QoS flag is defined currently: PM_QOS_FLAG_NO_POWER_OFF.

Note: The aggregated target values are implemented in such a way that reading
the aggregated value does not require any locking mechanism.
Expand Down Expand Up @@ -153,14 +152,14 @@ PM QoS list of resume latency constraints and remove sysfs attribute
pm_qos_resume_latency_us from the device's power directory.

int dev_pm_qos_expose_flags(device, value)
Add a request to the device's PM QoS list of flags and create sysfs attributes
pm_qos_no_power_off and pm_qos_remote_wakeup under the device's power directory
allowing user space to change these flags' value.
Add a request to the device's PM QoS list of flags and create sysfs attribute
pm_qos_no_power_off under the device's power directory allowing user space to
change the value of the PM_QOS_FLAG_NO_POWER_OFF flag.

void dev_pm_qos_hide_flags(device)
Drop the request added by dev_pm_qos_expose_flags() from the device's PM QoS list
of flags and remove sysfs attributes pm_qos_no_power_off and pm_qos_remote_wakeup
under the device's power directory.
of flags and remove sysfs attribute pm_qos_no_power_off from the device's power
directory.

Notification mechanisms:
The per-device PM QoS framework has a per-device notification tree.
Expand Down
4 changes: 3 additions & 1 deletion MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -3637,6 +3637,8 @@ F: drivers/cpufreq/arm_big_little_dt.c

CPU POWER MONITORING SUBSYSTEM
M: Thomas Renninger <[email protected]>
M: Shuah Khan <[email protected]>
M: Shuah Khan <[email protected]>
L: [email protected]
S: Maintained
F: tools/power/cpupower/
Expand Down Expand Up @@ -10060,7 +10062,7 @@ M: Stephen Boyd <[email protected]>
L: [email protected]
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git
F: drivers/base/power/opp/
F: drivers/opp/
F: include/linux/pm_opp.h
F: Documentation/power/opp.txt
F: Documentation/devicetree/bindings/opp/
Expand Down
24 changes: 0 additions & 24 deletions arch/arm/common/locomo.c
Original file line number Diff line number Diff line change
Expand Up @@ -826,28 +826,6 @@ static int locomo_match(struct device *_dev, struct device_driver *_drv)
return dev->devid == drv->devid;
}

static int locomo_bus_suspend(struct device *dev, pm_message_t state)
{
struct locomo_dev *ldev = LOCOMO_DEV(dev);
struct locomo_driver *drv = LOCOMO_DRV(dev->driver);
int ret = 0;

if (drv && drv->suspend)
ret = drv->suspend(ldev, state);
return ret;
}

static int locomo_bus_resume(struct device *dev)
{
struct locomo_dev *ldev = LOCOMO_DEV(dev);
struct locomo_driver *drv = LOCOMO_DRV(dev->driver);
int ret = 0;

if (drv && drv->resume)
ret = drv->resume(ldev);
return ret;
}

static int locomo_bus_probe(struct device *dev)
{
struct locomo_dev *ldev = LOCOMO_DEV(dev);
Expand Down Expand Up @@ -875,8 +853,6 @@ struct bus_type locomo_bus_type = {
.match = locomo_match,
.probe = locomo_bus_probe,
.remove = locomo_bus_remove,
.suspend = locomo_bus_suspend,
.resume = locomo_bus_resume,
};

int locomo_driver_register(struct locomo_driver *driver)
Expand Down
2 changes: 0 additions & 2 deletions arch/arm/include/asm/hardware/locomo.h
Original file line number Diff line number Diff line change
Expand Up @@ -189,8 +189,6 @@ struct locomo_driver {
unsigned int devid;
int (*probe)(struct locomo_dev *);
int (*remove)(struct locomo_dev *);
int (*suspend)(struct locomo_dev *, pm_message_t);
int (*resume)(struct locomo_dev *);
};

#define LOCOMO_DRV(_d) container_of((_d), struct locomo_driver, drv)
Expand Down
8 changes: 8 additions & 0 deletions arch/arm/include/asm/topology.h
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,14 @@ void init_cpu_topology(void);
void store_cpu_topology(unsigned int cpuid);
const struct cpumask *cpu_coregroup_mask(int cpu);

#include <linux/arch_topology.h>

/* Replace task scheduler's default frequency-invariant accounting */
#define arch_scale_freq_capacity topology_get_freq_scale

/* Replace task scheduler's default cpu-invariant accounting */
#define arch_scale_cpu_capacity topology_get_cpu_scale

#else

static inline void init_cpu_topology(void) { }
Expand Down
Loading

0 comments on commit bd2cd7d

Please sign in to comment.