Skip to content

Commit

Permalink
Merge tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/…
Browse files Browse the repository at this point in the history
…git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "Again, cpufreq gets more changes than the other parts this time (one
  new driver, one old driver less, a bunch of enhancements of the
  existing code, new CPU IDs, fixes, cleanups)

  There also are some changes in cpuidle (idle injection rework, a
  couple of new CPU IDs, online/offline rework in intel_idle, fixes and
  cleanups), in the generic power domains framework (mostly related to
  supporting power domains containing CPUs), and in the Operating
  Performance Points (OPP) library (mostly related to supporting devices
  with multiple voltage regulators)

  In addition to that, the system sleep state selection interface is
  modified to make it easier for distributions with unchanged user space
  to support suspend-to-idle as the default system suspend method, some
  issues are fixed in the PM core, the latency tolerance PM QoS
  framework is improved a bit, the Intel RAPL power capping driver is
  cleaned up and there are some fixes and cleanups in the devfreq
  subsystem

  Specifics:

   - New cpufreq driver for Broadcom STB SoCs and a Device Tree binding
     for it (Markus Mayer)

   - Support for ARM Integrator/AP and Integrator/CP in the generic DT
     cpufreq driver and elimination of the old Integrator cpufreq driver
     (Linus Walleij)

   - Support for the zx296718, r8a7743 and r8a7745, Socionext UniPhier,
     and PXA SoCs in the the generic DT cpufreq driver (Baoyou Xie,
     Geert Uytterhoeven, Masahiro Yamada, Robert Jarzmik)

   - cpufreq core fix to eliminate races that may lead to using inactive
     policy objects and related cleanups (Rafael Wysocki)

   - cpufreq schedutil governor update to make it use SCHED_FIFO kernel
     threads (instead of regular workqueues) for doing delayed work (to
     reduce the response latency in some cases) and related cleanups
     (Viresh Kumar)

   - New cpufreq sysfs attribute for resetting statistics (Markus Mayer)

   - cpufreq governors fixes and cleanups (Chen Yu, Stratos Karafotis,
     Viresh Kumar)

   - Support for using generic cpufreq governors in the intel_pstate
     driver (Rafael Wysocki)

   - Support for per-logical-CPU P-state limits and the EPP/EPB (Energy
     Performance Preference/Energy Performance Bias) knobs in the
     intel_pstate driver (Srinivas Pandruvada)

   - New CPU ID for Knights Mill in intel_pstate (Piotr Luc)

   - intel_pstate driver modification to use the P-state selection
     algorithm based on CPU load on platforms with the system profile in
     the ACPI tables set to "mobile" (Srinivas Pandruvada)

   - intel_pstate driver cleanups (Arnd Bergmann, Rafael Wysocki,
     Srinivas Pandruvada)

   - cpufreq powernv driver updates including fast switching support
     (for the schedutil governor), fixes and cleanus (Akshay Adiga,
     Andrew Donnellan, Denis Kirjanov)

   - acpi-cpufreq driver rework to switch it over to the new CPU
     offline/online state machine (Sebastian Andrzej Siewior)

   - Assorted cleanups in cpufreq drivers (Wei Yongjun, Prashanth
     Prakash)

   - Idle injection rework (to make it use the regular idle path instead
     of a home-grown custom one) and related powerclamp thermal driver
     updates (Peter Zijlstra, Jacob Pan, Petr Mladek, Sebastian Andrzej
     Siewior)

   - New CPU IDs for Atom Z34xx and Knights Mill in intel_idle (Andy
     Shevchenko, Piotr Luc)

   - intel_idle driver cleanups and switch over to using the new CPU
     offline/online state machine (Anna-Maria Gleixner, Sebastian
     Andrzej Siewior)

   - cpuidle DT driver update to support suspend-to-idle properly
     (Sudeep Holla)

   - cpuidle core cleanups and misc updates (Daniel Lezcano, Pan Bian,
     Rafael Wysocki)

   - Preliminary support for power domains including CPUs in the generic
     power domains (genpd) framework and related DT bindings (Lina Iyer)

   - Assorted fixes and cleanups in the generic power domains (genpd)
     framework (Colin Ian King, Dan Carpenter, Geert Uytterhoeven)

   - Preliminary support for devices with multiple voltage regulators
     and related fixes and cleanups in the Operating Performance Points
     (OPP) library (Viresh Kumar, Masahiro Yamada, Stephen Boyd)

   - System sleep state selection interface rework to make it easier to
     support suspend-to-idle as the default system suspend method
     (Rafael Wysocki)

   - PM core fixes and cleanups, mostly related to the interactions
     between the system suspend and runtime PM frameworks (Ulf Hansson,
     Sahitya Tummala, Tony Lindgren)

   - Latency tolerance PM QoS framework imorovements (Andrew Lutomirski)

   - New Knights Mill CPU ID for the Intel RAPL power capping driver
     (Piotr Luc)

   - Intel RAPL power capping driver fixes, cleanups and switch over to
     using the new CPU offline/online state machine (Jacob Pan, Thomas
     Gleixner, Sebastian Andrzej Siewior)

   - Fixes and cleanups in the exynos-ppmu, exynos-nocp, rk3399_dmc,
     rockchip-dfi devfreq drivers and the devfreq core (Axel Lin,
     Chanwoo Choi, Javier Martinez Canillas, MyungJoo Ham, Viresh Kumar)

   - Fix for false-positive KASAN warnings during resume from ACPI S3
     (suspend-to-RAM) on x86 (Josh Poimboeuf)

   - Memory map verification during resume from hibernation on x86 to
     ensure a consistent address space layout (Chen Yu)

   - Wakeup sources debugging enhancement (Xing Wei)

   - rockchip-io AVS driver cleanup (Shawn Lin)"

* tag 'pm-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (127 commits)
  devfreq: rk3399_dmc: Don't use OPP structures outside of RCU locks
  devfreq: rk3399_dmc: Remove dangling rcu_read_unlock()
  devfreq: exynos: Don't use OPP structures outside of RCU locks
  Documentation: intel_pstate: Document HWP energy/performance hints
  cpufreq: intel_pstate: Support for energy performance hints with HWP
  cpufreq: intel_pstate: Add locking around HWP requests
  PM / sleep: Print active wakeup sources when blocking on wakeup_count reads
  PM / core: Fix bug in the error handling of async suspend
  PM / wakeirq: Fix dedicated wakeirq for drivers not using autosuspend
  PM / Domains: Fix compatible for domain idle state
  PM / OPP: Don't WARN on multiple calls to dev_pm_opp_set_regulators()
  PM / OPP: Allow platform specific custom set_opp() callbacks
  PM / OPP: Separate out _generic_set_opp()
  PM / OPP: Add infrastructure to manage multiple regulators
  PM / OPP: Pass struct dev_pm_opp_supply to _set_opp_voltage()
  PM / OPP: Manage supply's voltage/current in a separate structure
  PM / OPP: Don't use OPP structure outside of rcu protected section
  PM / OPP: Reword binding supporting multiple regulators per device
  PM / OPP: Fix incorrect cpu-supply property in binding
  cpuidle: Add a kerneldoc comment to cpuidle_use_deepest_state()
  ..
  • Loading branch information
torvalds committed Dec 13, 2016
2 parents 36869cb + bbc17bb commit 7b9dc3f
Show file tree
Hide file tree
Showing 79 changed files with 4,399 additions and 1,549 deletions.
45 changes: 25 additions & 20 deletions Documentation/ABI/testing/sysfs-power
Original file line number Diff line number Diff line change
Expand Up @@ -7,30 +7,35 @@ Description:
subsystem.

What: /sys/power/state
Date: May 2014
Date: November 2016
Contact: Rafael J. Wysocki <[email protected]>
Description:
The /sys/power/state file controls system sleep states.
Reading from this file returns the available sleep state
labels, which may be "mem", "standby", "freeze" and "disk"
(hibernation). The meanings of the first three labels depend on
the relative_sleep_states command line argument as follows:
1) relative_sleep_states = 1
"mem", "standby", "freeze" represent non-hibernation sleep
states from the deepest ("mem", always present) to the
shallowest ("freeze"). "standby" and "freeze" may or may
not be present depending on the capabilities of the
platform. "freeze" can only be present if "standby" is
present.
2) relative_sleep_states = 0 (default)
"mem" - "suspend-to-RAM", present if supported.
"standby" - "power-on suspend", present if supported.
"freeze" - "suspend-to-idle", always present.

Writing to this file one of these strings causes the system to
transition into the corresponding state, if available. See
Documentation/power/states.txt for a description of what
"suspend-to-RAM", "power-on suspend" and "suspend-to-idle" mean.
labels, which may be "mem" (suspend), "standby" (power-on
suspend), "freeze" (suspend-to-idle) and "disk" (hibernation).

Writing one of the above strings to this file causes the system
to transition into the corresponding state, if available.

See Documentation/power/states.txt for more information.

What: /sys/power/mem_sleep
Date: November 2016
Contact: Rafael J. Wysocki <[email protected]>
Description:
The /sys/power/mem_sleep file controls the operating mode of
system suspend. Reading from it returns the available modes
as "s2idle" (always present), "shallow" and "deep" (present if
supported). The mode that will be used on subsequent attempts
to suspend the system (by writing "mem" to the /sys/power/state
file described above) is enclosed in square brackets.

Writing one of the above strings to this file causes the mode
represented by it to be used on subsequent attempts to suspend
the system.

See Documentation/power/states.txt for more information.

What: /sys/power/disk
Date: September 2006
Expand Down
22 changes: 15 additions & 7 deletions Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -1560,6 +1560,12 @@
disable
Do not enable intel_pstate as the default
scaling driver for the supported processors
passive
Use intel_pstate as a scaling driver, but configure it
to work with generic cpufreq governors (instead of
enabling its internal governor). This mode cannot be
used along with the hardware-managed P-states (HWP)
feature.
force
Enable intel_pstate on systems that prohibit it by default
in favor of acpi-cpufreq. Forcing the intel_pstate driver
Expand All @@ -1580,6 +1586,9 @@
Description Table, specifies preferred power management
profile as "Enterprise Server" or "Performance Server",
then this feature is turned on by default.
per_cpu_perf_limits
Allow per-logical-CPU P-State performance control limits using
cpufreq sysfs interface

intremap= [X86-64, Intel-IOMMU]
on enable Interrupt Remapping (default)
Expand Down Expand Up @@ -2122,6 +2131,12 @@
memory contents and reserves bad memory
regions that are detected.

mem_sleep_default= [SUSPEND] Default system suspend mode:
s2idle - Suspend-To-Idle
shallow - Power-On Suspend or equivalent (if supported)
deep - Suspend-To-RAM or equivalent (if supported)
See Documentation/power/states.txt.

meye.*= [HW] Set MotionEye Camera parameters
See Documentation/video4linux/meye.txt.

Expand Down Expand Up @@ -3475,13 +3490,6 @@
[KNL, SMP] Set scheduler's default relax_domain_level.
See Documentation/cgroup-v1/cpusets.txt.

relative_sleep_states=
[SUSPEND] Use sleep state labeling where the deepest
state available other than hibernation is always "mem".
Format: { "0" | "1" }
0 -- Traditional sleep state labels.
1 -- Relative sleep state labels.

reserve= [KNL,BUGS] Force the kernel to ignore some iomem area

reservetop= [X86-32]
Expand Down
6 changes: 6 additions & 0 deletions Documentation/cpu-freq/cpufreq-stats.txt
Original file line number Diff line number Diff line change
Expand Up @@ -44,11 +44,17 @@ the stats driver insertion.
total 0
drwxr-xr-x 2 root root 0 May 14 16:06 .
drwxr-xr-x 3 root root 0 May 14 15:58 ..
--w------- 1 root root 4096 May 14 16:06 reset
-r--r--r-- 1 root root 4096 May 14 16:06 time_in_state
-r--r--r-- 1 root root 4096 May 14 16:06 total_trans
-r--r--r-- 1 root root 4096 May 14 16:06 trans_table
--------------------------------------------------------------------------------

- reset
Write-only attribute that can be used to reset the stat counters. This can be
useful for evaluating system behaviour under different governors without the
need for a reboot.

- time_in_state
This gives the amount of time spent in each of the frequencies supported by
this CPU. The cat output will have "<frequency> <time>" pair in each line, which
Expand Down
54 changes: 49 additions & 5 deletions Documentation/cpu-freq/intel-pstate.txt
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ In addition to the frequency-controlling interfaces provided by the cpufreq
core, the driver provides its own sysfs files to control the P-State selection.
These files have been added to /sys/devices/system/cpu/intel_pstate/.
Any changes made to these files are applicable to all CPUs (even in a
multi-package system).
multi-package system, Refer to later section on placing "Per-CPU limits").

max_perf_pct: Limits the maximum P-State that will be requested by
the driver. It states it as a percentage of the available performance. The
Expand Down Expand Up @@ -120,13 +120,57 @@ frequency is fictional for Intel Core processors. Even if the scaling
driver selects a single P-State, the actual frequency the processor
will run at is selected by the processor itself.

Per-CPU limits

The kernel command line option "intel_pstate=per_cpu_perf_limits" forces
the intel_pstate driver to use per-CPU performance limits. When it is set,
the sysfs control interface described above is subject to limitations.
- The following controls are not available for both read and write
/sys/devices/system/cpu/intel_pstate/max_perf_pct
/sys/devices/system/cpu/intel_pstate/min_perf_pct
- The following controls can be used to set performance limits, as far as the
architecture of the processor permits:
/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- User can still observe turbo percent and number of P-States from
/sys/devices/system/cpu/intel_pstate/turbo_pct
/sys/devices/system/cpu/intel_pstate/num_pstates
- User can read write system wide turbo status
/sys/devices/system/cpu/no_turbo

Support of energy performance hints
It is possible to provide hints to the HWP algorithms in the processor
to be more performance centric to more energy centric. When the driver
is using HWP, two additional cpufreq sysfs attributes are presented for
each logical CPU.
These attributes are:
- energy_performance_available_preferences
- energy_performance_preference

To get list of supported hints:
$ cat energy_performance_available_preferences
default performance balance_performance balance_power power

The current preference can be read or changed via cpufreq sysfs
attribute "energy_performance_preference". Reading from this attribute
will display current effective setting. User can write any of the valid
preference string to this attribute. User can always restore to power-on
default by writing "default".

Since threads can migrate to different CPUs, this is possible that the
new CPU may have different energy performance preference than the previous
one. To avoid such issues, either threads can be pinned to specific CPUs
or set the same energy performance preference value to all CPUs.

Tuning Intel P-State driver

When HWP mode is not used, debugfs files have also been added to allow the
tuning of the internal governor algorithm. These files are located at
/sys/kernel/debug/pstate_snb/. The algorithm uses a PID (Proportional
Integral Derivative) controller. The PID tunable parameters are:
When the performance can be tuned using PID (Proportional Integral
Derivative) controller, debugfs files are provided for adjusting performance.
They are presented under:
/sys/kernel/debug/pstate_snb/

The PID tunable parameters are:
deadband
d_gain_pct
i_gain_pct
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
Broadcom AVS mail box and interrupt register bindings
=====================================================

A total of three DT nodes are required. One node (brcm,avs-cpu-data-mem)
references the mailbox register used to communicate with the AVS CPU[1]. The
second node (brcm,avs-cpu-l2-intr) is required to trigger an interrupt on
the AVS CPU. The interrupt tells the AVS CPU that it needs to process a
command sent to it by a driver. Interrupting the AVS CPU is mandatory for
commands to be processed.

The interface also requires a reference to the AVS host interrupt controller,
so a driver can react to interrupts generated by the AVS CPU whenever a command
has been processed. See [2] for more information on the brcm,l2-intc node.

[1] The AVS CPU is an independent co-processor that runs proprietary
firmware. On some SoCs, this firmware supports DFS and DVFS in addition to
Adaptive Voltage Scaling.

[2] Documentation/devicetree/bindings/interrupt-controller/brcm,l2-intc.txt


Node brcm,avs-cpu-data-mem
--------------------------

Required properties:
- compatible: must include: brcm,avs-cpu-data-mem and
should include: one of brcm,bcm7271-avs-cpu-data-mem or
brcm,bcm7268-avs-cpu-data-mem
- reg: Specifies base physical address and size of the registers.
- interrupts: The interrupt that the AVS CPU will use to interrupt the host
when a command completed.
- interrupt-parent: The interrupt controller the above interrupt is routed
through.
- interrupt-names: The name of the interrupt used to interrupt the host.

Optional properties:
- None

Node brcm,avs-cpu-l2-intr
-------------------------

Required properties:
- compatible: must include: brcm,avs-cpu-l2-intr and
should include: one of brcm,bcm7271-avs-cpu-l2-intr or
brcm,bcm7268-avs-cpu-l2-intr
- reg: Specifies base physical address and size of the registers.

Optional properties:
- None


Example
=======

avs_host_l2_intc: interrupt-controller@f04d1200 {
#interrupt-cells = <1>;
compatible = "brcm,l2-intc";
interrupt-parent = <&intc>;
reg = <0xf04d1200 0x48>;
interrupt-controller;
interrupts = <0x0 0x19 0x0>;
interrupt-names = "avs";
};

avs-cpu-data-mem@f04c4000 {
compatible = "brcm,bcm7271-avs-cpu-data-mem",
"brcm,avs-cpu-data-mem";
reg = <0xf04c4000 0x60>;
interrupts = <0x1a>;
interrupt-parent = <&avs_host_l2_intc>;
interrupt-names = "sw_intr";
};

avs-cpu-l2-intr@f04d1100 {
compatible = "brcm,bcm7271-avs-cpu-l2-intr",
"brcm,avs-cpu-l2-intr";
reg = <0xf04d1100 0x10>;
};
27 changes: 19 additions & 8 deletions Documentation/devicetree/bindings/opp/opp.txt
Original file line number Diff line number Diff line change
Expand Up @@ -86,8 +86,14 @@ Optional properties:
Single entry is for target voltage and three entries are for <target min max>
voltages.

Entries for multiple regulators must be present in the same order as
regulators are specified in device's DT node.
Entries for multiple regulators shall be provided in the same field separated
by angular brackets <>. The OPP binding doesn't provide any provisions to
relate the values to their power supplies or the order in which the supplies
need to be configured and that is left for the implementation specific
binding.

Entries for all regulators shall be of the same size, i.e. either all use a
single value or triplets.

- opp-microvolt-<name>: Named opp-microvolt property. This is exactly similar to
the above opp-microvolt property, but allows multiple voltage ranges to be
Expand All @@ -104,10 +110,13 @@ Optional properties:

Should only be set if opp-microvolt is set for the OPP.

Entries for multiple regulators must be present in the same order as
regulators are specified in device's DT node. If this property isn't required
for few regulators, then this should be marked as zero for them. If it isn't
required for any regulator, then this property need not be present.
Entries for multiple regulators shall be provided in the same field separated
by angular brackets <>. If current values aren't required for a regulator,
then it shall be filled with 0. If current values aren't required for any of
the regulators, then this field is not required. The OPP binding doesn't
provide any provisions to relate the values to their power supplies or the
order in which the supplies need to be configured and that is left for the
implementation specific binding.

- opp-microamp-<name>: Named opp-microamp property. Similar to
opp-microvolt-<name> property, but for microamp instead.
Expand Down Expand Up @@ -386,10 +395,12 @@ Example 4: Handling multiple regulators
/ {
cpus {
cpu@0 {
compatible = "arm,cortex-a7";
compatible = "vendor,cpu-type";
...

cpu-supply = <&cpu_supply0>, <&cpu_supply1>, <&cpu_supply2>;
vcc0-supply = <&cpu_supply0>;
vcc1-supply = <&cpu_supply1>;
vcc2-supply = <&cpu_supply2>;
operating-points-v2 = <&cpu0_opp_table>;
};
};
Expand Down
33 changes: 33 additions & 0 deletions Documentation/devicetree/bindings/power/domain-idle-state.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
PM Domain Idle State Node:

A domain idle state node represents the state parameters that will be used to
select the state when there are no active components in the domain.

The state node has the following parameters -

- compatible:
Usage: Required
Value type: <string>
Definition: Must be "domain-idle-state".

- entry-latency-us
Usage: Required
Value type: <prop-encoded-array>
Definition: u32 value representing worst case latency in
microseconds required to enter the idle state.
The exit-latency-us duration may be guaranteed
only after entry-latency-us has passed.

- exit-latency-us
Usage: Required
Value type: <prop-encoded-array>
Definition: u32 value representing worst case latency
in microseconds required to exit the idle state.

- min-residency-us
Usage: Required
Value type: <prop-encoded-array>
Definition: u32 value representing minimum residency duration
in microseconds after which the idle state will yield
power benefits after overcoming the overhead in entering
i the idle state.
Loading

0 comments on commit 7b9dc3f

Please sign in to comment.