Merge tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:
 "A pretty small batch for us, and apologies for it being a bit late, I
  wanted to sneak Christophe's user_access_begin() series in.

  Summary:

   - Implement user_access_begin() and friends for our platforms that
     support controlling kernel access to userspace.

   - Enable CONFIG_VMAP_STACK on 32-bit Book3S and 8xx.

   - Some tweaks to our pseries IOMMU code to allow SVMs ("secure"
     virtual machines) to use the IOMMU.

   - Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE to the 32-bit
     VDSO, and some other improvements.

   - A series to use the PCI hotplug framework to control opencapi
     card's so that they can be reset and re-read after flashing a new
     FPGA image.

  As well as other minor fixes and improvements as usual.

  Thanks to: Alastair D'Silva, Alexandre Ghiti, Alexey Kardashevskiy,
  Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Bai Yingjie, Chen
  Zhou, Christophe Leroy, Frederic Barrat, Greg Kurz, Jason A.
  Donenfeld, Joel Stanley, Jordan Niethe, Julia Lawall, Krzysztof
  Kozlowski, Laurent Dufour, Laurentiu Tudor, Linus Walleij, Michael
  Bringmann, Nathan Chancellor, Nicholas Piggin, Nick Desaulniers,
  Oliver O'Halloran, Peter Ujfalusi, Pingfan Liu, Ram Pai, Randy Dunlap,
  Russell Currey, Sam Bobroff, Sebastian Andrzej Siewior, Shawn
  Anastasio, Stephen Rothwell, Steve Best, Sukadev Bhattiprolu, Thiago
  Jung Bauermann, Tyrel Datwyler, Vaibhav Jain"

* tag 'powerpc-5.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (131 commits)
  powerpc: configs: Cleanup old Kconfig options
  powerpc/configs/skiroot: Enable some more hardening options
  powerpc/configs/skiroot: Disable xmon default & enable reboot on panic
  powerpc/configs/skiroot: Enable security features
  powerpc/configs/skiroot: Update for symbol movement only
  powerpc/configs/skiroot: Drop default n CONFIG_CRYPTO_ECHAINIV
  powerpc/configs/skiroot: Drop HID_LOGITECH
  powerpc/configs: Drop NET_VENDOR_HP which moved to staging
  powerpc/configs: NET_CADENCE became NET_VENDOR_CADENCE
  powerpc/configs: Drop CONFIG_QLGE which moved to staging
  powerpc: Do not consider weak unresolved symbol relocations as bad
  powerpc/32s: Fix kasan_early_hash_table() for CONFIG_VMAP_STACK
  powerpc: indent to improve Kconfig readability
  powerpc: Provide initial documentation for PAPR hcalls
  powerpc: Implement user_access_save() and user_access_restore()
  powerpc: Implement user_access_begin and friends
  powerpc/32s: Prepare prevent_user_access() for user_access_end()
  powerpc/32s: Drop NULL addr verification
  powerpc/kuap: Fix set direction in allow/prevent_user_access()
  powerpc/32s: Fix bad_kuap_fault()
  ...
Showing 141 changed files with 2,310 additions and 1,121 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,199 @@ | ||
.. SPDX-License-Identifier: GPL-2.0 | ||
.. _imc: | ||
|
||
=================================== | ||
IMC (In-Memory Collection Counters) | ||
=================================== | ||
|
||
Anju T Sudhakar, 10 May 2019 | ||
|
||
.. contents:: | ||
   :depth: 3 | ||
|
||
|
||
Basic overview | ||
============== | ||
|
||
IMC (In-Memory collection counters) is a hardware monitoring facility that | ||
collects large numbers of hardware performance events at Nest level (these are | ||
on-chip but off-core), Core level and Thread level. | ||
|
||
The Nest PMU counters are handled by Nest IMC microcode that runs in the OCC | ||
(On-Chip Controller) complex. The microcode collects the nest IMC counter data | ||
and moves it to memory. | ||
|
||
The Core and Thread IMC PMU counters are handled in the core. Core-level PMU | ||
counters give us the IMC counters' data per core, and thread-level PMU counters | ||
give us the IMC counters' data per CPU thread. | ||
|
||
OPAL obtains the IMC PMU and supported events information from the IMC Catalog | ||
and passes it on to the kernel via the device tree. The information for each | ||
event contains: | ||
|
||
- Event name | ||
- Event Offset | ||
- Event description | ||
|
||
and possibly also: | ||
|
||
- Event scale | ||
- Event unit | ||
|
||
Some PMUs may have common scale and unit values for all their supported | ||
events. In those cases, the scale and unit properties of the events must be | ||
inherited from the PMU. | ||
|
||
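The inheritance rule above can be sketched as follows. This is a minimal illustration in Python, not the kernel's device tree parser: the dictionary layout, the function name `resolve_event`, and the sample names and values are all assumptions made for the example.

```python
def resolve_event(pmu, event):
    """Return a copy of `event` with scale/unit filled in from `pmu` when missing.

    Both arguments are plain dicts; this layout is hypothetical and only
    mirrors the properties named in the text (name, offset, scale, unit).
    """
    resolved = dict(event)
    for key in ("scale", "unit"):
        # An event-level property wins; otherwise fall back to the PMU value.
        if key not in resolved and key in pmu:
            resolved[key] = pmu[key]
    return resolved


# Hypothetical PMU with a common scale and unit, and an event without its own.
example_pmu = {"name": "example_pmu", "scale": "1.2207e-4", "unit": "MiB"}
example_event = {"name": "EXAMPLE_EVENT", "offset": 0x118}
print(resolve_event(example_pmu, example_event))
```

An event that carries its own scale or unit keeps it; only missing properties are taken from the PMU.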
The event offset is the location in memory where the counter data for that | ||
event gets accumulated. | ||
|
||
The IMC catalog is available at: | ||
https://github.com/open-power/ima-catalog | ||
|
||
The kernel discovers the IMC counters information in the device tree at the | ||
`imc-counters` device node, which has a compatible field | ||
`ibm,opal-in-memory-counters`. From the device tree, the kernel parses the PMUs | ||
and their events' information and registers each PMU and its attributes in the | ||
kernel. | ||
|
||
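The compatible-string match described above can be illustrated from user space. The sketch below is not the kernel's discovery code. It assumes the running system exposes its flattened device tree as a directory hierarchy at `/proc/device-tree` and that each node's `compatible` property is a file of NUL-terminated strings. The function name `find_imc_nodes` is hypothetical.

```python
import os

# Compatible string named in the text above.
IMC_COMPATIBLE = b"ibm,opal-in-memory-counters"


def find_imc_nodes(dt_root="/proc/device-tree"):
    """Return the node directories under dt_root whose 'compatible' matches."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(dt_root):
        if "compatible" not in filenames:
            continue
        with open(os.path.join(dirpath, "compatible"), "rb") as prop:
            # The property holds one or more NUL-terminated strings.
            entries = prop.read().split(b"\0")
        if IMC_COMPATIBLE in entries:
            matches.append(dirpath)
    return matches


if __name__ == "__main__":
    # Prints an empty list on systems without such a node.
    print(find_imc_nodes())
```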
IMC example usage | ||
================= | ||
|
||
.. code-block:: sh | ||
|
||
  # perf list | ||
  [...] | ||
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/ [Kernel PMU event] | ||
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/ [Kernel PMU event] | ||
  [...] | ||
  core_imc/CPM_0THRD_NON_IDLE_PCYC/ [Kernel PMU event] | ||
  core_imc/CPM_1THRD_NON_IDLE_INST/ [Kernel PMU event] | ||
  [...] | ||
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/ [Kernel PMU event] | ||
  thread_imc/CPM_1THRD_NON_IDLE_INST/ [Kernel PMU event] | ||
|
||
To see per chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/: | ||
|
||
.. code-block:: sh | ||
|
||
  # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket | ||
|
||
To see non-idle instructions for core 0: | ||
|
||
.. code-block:: sh | ||
|
||
  # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000 | ||
|
||
To see non-idle cycles for a "make": | ||
|
||
.. code-block:: sh | ||
|
||
  # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make | ||
|
||
IMC Trace-mode | ||
=============== | ||
|
||
POWER9 supports two modes for IMC: Accumulation mode and Trace mode. In | ||
Accumulation mode, event counts are accumulated in system memory. The | ||
hypervisor then reads the posted counts periodically or when requested. In IMC | ||
Trace mode, the 64-bit trace SCOM value is initialized with the event | ||
information. The CPMCxSEL and CPMC_LOAD fields in the trace SCOM specify the | ||
event to be monitored and the sampling duration. On each overflow in the | ||
CPMCxSEL, hardware snapshots the program counter along with event counts and | ||
writes them into the memory pointed to by LDBAR. | ||
|
||
LDBAR is a 64-bit special-purpose per-thread register. It has bits to indicate | ||
whether hardware is configured for accumulation or trace mode. | ||
|
||
LDBAR Register Layout | ||
--------------------- | ||
|
||
+-------+----------------------+ | ||
| 0     | Enable/Disable       | | ||
+-------+----------------------+ | ||
| 1     | 0: Accumulation Mode | | ||
|       +----------------------+ | ||
|       | 1: Trace Mode        | | ||
+-------+----------------------+ | ||
| 2:3   | Reserved             | | ||
+-------+----------------------+ | ||
| 4:6   | PB scope             | | ||
+-------+----------------------+ | ||
| 7     | Reserved             | | ||
+-------+----------------------+ | ||
| 8:50  | Counter Address      | | ||
+-------+----------------------+ | ||
| 51:63 | Reserved             | | ||
+-------+----------------------+ | ||
|
||
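To make the LDBAR layout concrete, here is a small decoding sketch in Python. It assumes the bit positions in the table use IBM (big-endian) bit numbering, where bit 0 is the most significant bit of the 64-bit register. The helper names `ibm_field` and `decode_ldbar` are made up for this example, and the counter address is returned as the raw 8:50 field without interpreting it further.

```python
def ibm_field(value, first, last, width=64):
    """Extract bits first..last of `value`, where bit 0 is the MSB (assumption)."""
    nbits = last - first + 1
    shift = (width - 1) - last
    return (value >> shift) & ((1 << nbits) - 1)


def decode_ldbar(ldbar):
    """Split a 64-bit LDBAR value into the fields listed in the table."""
    return {
        "enable": ibm_field(ldbar, 0, 0),
        "trace_mode": ibm_field(ldbar, 1, 1),  # 0: accumulation, 1: trace
        "pb_scope": ibm_field(ldbar, 4, 6),
        "counter_address_field": ibm_field(ldbar, 8, 50),
    }


# Illustrative value only: enable bit and trace-mode bit set, nothing else.
print(decode_ldbar((1 << 63) | (1 << 62)))
```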
TRACE_IMC_SCOM bit representation | ||
--------------------------------- | ||
|
||
+-------+------------+ | ||
| 0:1   | SAMPSEL    | | ||
+-------+------------+ | ||
| 2:33  | CPMC_LOAD  | | ||
+-------+------------+ | ||
| 34:40 | CPMC1SEL   | | ||
+-------+------------+ | ||
| 41:47 | CPMC2SEL   | | ||
+-------+------------+ | ||
| 48:50 | BUFFERSIZE | | ||
+-------+------------+ | ||
| 51:63 | RESERVED   | | ||
+-------+------------+ | ||
|
||
CPMC_LOAD contains the sampling duration. SAMPSEL and CPMCxSEL determine the | ||
event to count. BUFFERSIZE indicates the memory range. On each overflow, | ||
hardware snapshots the program counter along with event counts, updates the | ||
memory, and reloads the CPMC_LOAD value for the next sampling duration. IMC | ||
hardware does not support exceptions, so it quietly wraps around when the | ||
memory buffer reaches its end. | ||
|
||
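The TRACE_IMC_SCOM layout can be exercised with a short packing sketch in Python. As with other POWER register descriptions, it assumes IBM bit numbering (bit 0 is the most significant bit of the 64-bit word). The function name `pack_trace_scom` and the field values in the example are illustrative and are not taken from the kernel sources.

```python
# Field name -> (first bit, last bit), copied from the table above.
TRACE_SCOM_FIELDS = {
    "sampsel": (0, 1),
    "cpmc_load": (2, 33),
    "cpmc1sel": (34, 40),
    "cpmc2sel": (41, 47),
    "buffersize": (48, 50),
}


def pack_trace_scom(**values):
    """Pack named field values into a 64-bit word (bit 0 = MSB, an assumption)."""
    word = 0
    for name, value in values.items():
        first, last = TRACE_SCOM_FIELDS[name]
        nbits = last - first + 1
        if not 0 <= value < (1 << nbits):
            raise ValueError(f"{name} does not fit in {nbits} bits")
        word |= value << (63 - last)
    return word


# Arbitrary example values, chosen only to show the packing.
print(hex(pack_trace_scom(sampsel=1, cpmc_load=0x1000, cpmc1sel=2, buffersize=1)))
```

Reserved bits 51:63 are left as zero because the table assigns them no field.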
*Currently the event monitored for trace-mode is fixed as cycle.* | ||
|
||
Trace IMC example usage | ||
======================= | ||
|
||
.. code-block:: sh | ||
|
||
  # perf list | ||
  [....] | ||
  trace_imc/trace_cycles/ [Kernel PMU event] | ||
|
||
To record an application/process with the trace-imc event: | ||
|
||
.. code-block:: sh | ||
|
||
  # perf record -e trace_imc/trace_cycles/ yes > /dev/null | ||
  [ perf record: Woken up 1 times to write data ] | ||
  [ perf record: Captured and wrote 0.012 MB perf.data (21 samples) ] | ||
|
||
The generated `perf.data` can be read using perf report. | ||
|
||
Benefits of using IMC trace-mode | ||
================================ | ||
|
||
PMI (Performance Monitoring Interrupt) handling is avoided, since IMC trace | ||
mode snapshots the program counter and writes it to memory. This also | ||
provides a way for the operating system to do instruction sampling in real | ||
time without PMI processing overhead. | ||
|
||
Performance data using `perf top` with and without the trace-imc event. | ||
|
||
PMI counts when the `perf top` command is executed first without and then with | ||
the trace-imc event: | ||
|
||
.. code-block:: sh | ||
|
||
  # grep PMI /proc/interrupts | ||
  PMI: 0 0 0 0 Performance monitoring interrupts | ||
  # ./perf top | ||
  ... | ||
  # grep PMI /proc/interrupts | ||
  PMI: 39735 8710 17338 17801 Performance monitoring interrupts | ||
  # ./perf top -e trace_imc/trace_cycles/ | ||
  ... | ||
  # grep PMI /proc/interrupts | ||
  PMI: 39735 8710 17338 17801 Performance monitoring interrupts | ||
|
||
That is, the PMI interrupt counts do not increment when using the `trace_imc` event. |
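
The comparison above can also be done programmatically. The following Python sketch parses `PMI:` lines in the format shown in the `/proc/interrupts` output and computes per-CPU deltas between two snapshots. The function names are hypothetical, and the sample lines are the ones from the example above.

```python
def pmi_counts(line):
    """Parse a 'PMI:' line from /proc/interrupts into a list of per-CPU counts."""
    label, _, rest = line.partition(":")
    if label.strip() != "PMI":
        raise ValueError("not a PMI line")
    counts = []
    for token in rest.split():
        if not token.isdigit():
            break  # the trailing description starts here
        counts.append(int(token))
    return counts


def pmi_delta(before, after):
    """Per-CPU increase in PMI count between two snapshot lines."""
    return [a - b for b, a in zip(pmi_counts(before), pmi_counts(after))]


start = "PMI: 0 0 0 0 Performance monitoring interrupts"
after_perf_top = "PMI: 39735 8710 17338 17801 Performance monitoring interrupts"
after_trace_imc = "PMI: 39735 8710 17338 17801 Performance monitoring interrupts"

print(pmi_delta(start, after_perf_top))            # increase during plain perf top
print(pmi_delta(after_perf_top, after_trace_imc))  # increase during trace_imc run
```

A delta of zero on every CPU for the second interval corresponds to the observation that the trace-imc run raises no PMIs.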