Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
   vector granule register added to the user regs together with SVE
   perf extensions documentation.

 - SVE updates: add HWCAP for SVE EBF16, update the SVE ABI
   documentation to match the actual kernel behaviour (zeroing the
   registers on syscall rather than "zeroed or preserved" previously).

 - More conversions to automatic system registers generation.

 - vDSO: use self-synchronising virtual counter access in
   gettimeofday() if the architecture supports it.

 - arm64 stacktrace cleanups and improvements.

 - arm64 atomics improvements: always inline assembly, remove LL/SC
   trampolines.

 - Improve the reporting of EL1 exceptions: rework BTI and FPAC
   exception handling, better EL1 undefs reporting.

 - Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
   result.

 - arm64 defconfig updates: build CoreSight as a module, enable options
   necessary for docker, memory hotplug/hotremove, enable all PMUs
   provided by Arm.

 - arm64 ptrace() support for TPIDR2_EL0 (register provided with the
   SME extensions).

 - arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
   unused function.

 - kselftest updates for arm64: simple HWCAP validation, FP stress test
   improvements, validation of ZA regs in signal handlers, include
   larger SVE and SME vector lengths in signal tests, various cleanups.

 - arm64 alternatives (code patching) improvements to robustness and
   consistency: replace cpucap static branches with equivalent
   alternatives, associate callback alternatives with a cpucap.

 - Miscellaneous updates: optimise kprobe performance of patching
   single-step slots, simplify uaccess_mask_ptr(), move MTE registers
   initialisation to C, support huge vmalloc() mappings, run softirqs
   on the per-CPU IRQ stack, compat (arm32) misalignment fixups for
   multiword accesses.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (126 commits)
  arm64: alternatives: Use vdso/bits.h instead of linux/bits.h
  arm64/kprobe: Optimize the performance of patching single-step slot
  arm64: defconfig: Add Coresight as module
  kselftest/arm64: Handle EINTR while reading data from children
  kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
  kselftest/arm64: Don't repeat termination handler for fp-stress
  ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
  arm64/mm: fold check for KFENCE into can_set_direct_map()
  arm64: ftrace: fix module PLTs with mcount
  arm64: module: Remove unused plt_entry_is_initialized()
  arm64: module: Make plt_equals_entry() static
  arm64: fix the build with binutils 2.27
  kselftest/arm64: Don't enable v8.5 for MTE selftest builds
  arm64: uaccess: simplify uaccess_mask_ptr()
  arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
  kselftest/arm64: Fix typo in hwcap check
  arm64: mte: move register initialization to C
  arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
  arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
  arm64/sve: Add Perf extensions documentation
  ...
Showing 130 changed files with 4,608 additions and 1,259 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
============================================================= | ||
Alibaba's T-Head SoC Uncore Performance Monitoring Unit (PMU) | ||
============================================================= | ||
|
||
The Yitian 710, custom-built by Alibaba Group's chip development business, | ||
T-Head, implements uncore PMU for performance and functional debugging to | ||
facilitate system maintenance. | ||
|
||
DDR Sub-System Driveway (DRW) PMU Driver | ||
========================================= | ||
|
||
Yitian 710 employs eight DDR5/4 channels, four on each die. Each DDR5 channel | ||
services system memory requests independently of the others and is split into | ||
two independent sub-channels. The DDR Sub-System Driveway implements a | ||
separate PMU for each sub-channel to monitor various performance metrics. | ||
|
||
The Driveway PMU devices are named ali_drw_<sys_base_addr> in perf. For | ||
example, ali_drw_21000 and ali_drw_21080 are the PMU devices for the two | ||
sub-channels of the same channel in die 0. The PMU devices of die 1 are | ||
prefixed with ali_drw_400XXXXX, e.g. ali_drw_40021000. | ||
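
The naming scheme above can be decoded mechanically. The following is a minimal sketch, not part of the patch: `parse_drw_name` is a hypothetical helper, and it assumes that the suffix is a hexadecimal base address and that die 1 devices are exactly those whose address falls in the 0x400XXXXX range quoted in the text.

```python
import re

# Hypothetical helper, not part of the driver: decode a Driveway PMU name.
_NAME_RE = re.compile(r"^ali_drw_([0-9a-fA-F]+)$")


def parse_drw_name(name):
    """Return (die, base_addr) for a name such as 'ali_drw_40021000'."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a Driveway PMU device name: {name!r}")
    # Assumption: the suffix is the base address written in hexadecimal.
    base_addr = int(match.group(1), 16)
    # Assumption: die 1 addresses are exactly the 0x400XXXXX range.
    die = 1 if (base_addr >> 20) == 0x400 else 0
    return die, base_addr


if __name__ == "__main__":
    for name in ("ali_drw_21000", "ali_drw_21080", "ali_drw_40021000"):
        die, base = parse_drw_name(name)
        print(f"{name}: die {die}, base address {base:#x}")
```

On a real system the candidate names would normally come from listing the perf event source devices; the three names used here are the ones given in the text.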
|
||
Each sub-channel has 36 PMU counters in total, which are classified into | ||
four groups: | ||
|
||
- Group 0: PMU Cycle Counter. This group has one pair of counters, | ||
  pmu_cycle_cnt_low and pmu_cycle_cnt_high, that are used as the cycle count | ||
  based on the DDRC core clock. | ||
|
||
- Group 1: PMU Bandwidth Counters. This group has 8 counters that count the | ||
  total number of accesses, either to each of the eight bank groups in a | ||
  selected rank, or to each of four ranks separately using the first 4 | ||
  counters. The base transfer unit is 64B. | ||
|
||
- Group 2: PMU Retry Counters. This group has 10 counters that are intended | ||
  to count the total number of retries for each type of uncorrectable error. | ||
|
||
- Group 3: PMU Common Counters. This group has 16 counters that are used to | ||
  count the common events. | ||
|
||
For now, the Driveway PMU driver only uses counters in group 0 and group 3. | ||
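
As a quick consistency check on the grouping above, the counter counts can be tabulated and summed. This is only an illustrative sketch: the dictionary layout and the function names are hypothetical, while the group names and counts are taken from the list above.

```python
# Counter groups as listed above: group number -> (name, number of counters).
COUNTER_GROUPS = {
    0: ("PMU Cycle Counter", 2),        # pmu_cycle_cnt_low / pmu_cycle_cnt_high
    1: ("PMU Bandwidth Counters", 8),
    2: ("PMU Retry Counters", 10),
    3: ("PMU Common Counters", 16),
}

# Groups the text says the driver currently draws on.
GROUPS_USED_BY_DRIVER = (0, 3)


def total_counters(groups=COUNTER_GROUPS):
    """Sum the counters over all groups of one sub-channel."""
    return sum(count for _name, count in groups.values())


def counters_in_used_groups(groups=COUNTER_GROUPS, used=GROUPS_USED_BY_DRIVER):
    """Sum the counters that belong to the groups the driver uses."""
    return sum(groups[group][1] for group in used)


if __name__ == "__main__":
    print("counters per sub-channel:", total_counters())
    print("counters in groups 0 and 3:", counters_in_used_groups())
```

The per-group counts add up to the 36 counters stated for each sub-channel.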
|
||
The DDR Controller (DDRCTL) and DDR PHY combine to create a complete solution | ||
for connecting an SoC application bus to DDR memory devices. The DDRCTL | ||
receives transactions from the Host Interface (HIF), which is custom-defined | ||
by Synopsys. These transactions are queued internally and scheduled for | ||
access while satisfying the SDRAM protocol timing requirements, transaction | ||
priorities, and dependencies between the transactions. The DDRCTL in turn | ||
issues commands on the DDR PHY Interface (DFI) to the PHY module, which | ||
launches and captures data to and from the SDRAM. The Driveway PMUs have | ||
hardware logic to gather statistics and performance logging signals on HIF, | ||
DFI, etc. | ||
|
||
By counting the READ, WRITE and RMW commands sent to the DDRC through the HIF, | ||
we can calculate the bandwidth. Example usage for counting memory data | ||
bandwidth:: | ||
|
||
perf stat \ | ||
-e ali_drw_21000/hif_wr/ \ | ||
-e ali_drw_21000/hif_rd/ \ | ||
-e ali_drw_21000/hif_rmw/ \ | ||
-e ali_drw_21000/cycle/ \ | ||
-e ali_drw_21080/hif_wr/ \ | ||
-e ali_drw_21080/hif_rd/ \ | ||
-e ali_drw_21080/hif_rmw/ \ | ||
-e ali_drw_21080/cycle/ \ | ||
-e ali_drw_23000/hif_wr/ \ | ||
-e ali_drw_23000/hif_rd/ \ | ||
-e ali_drw_23000/hif_rmw/ \ | ||
-e ali_drw_23000/cycle/ \ | ||
-e ali_drw_23080/hif_wr/ \ | ||
-e ali_drw_23080/hif_rd/ \ | ||
-e ali_drw_23080/hif_rmw/ \ | ||
-e ali_drw_23080/cycle/ \ | ||
-e ali_drw_25000/hif_wr/ \ | ||
-e ali_drw_25000/hif_rd/ \ | ||
-e ali_drw_25000/hif_rmw/ \ | ||
-e ali_drw_25000/cycle/ \ | ||
-e ali_drw_25080/hif_wr/ \ | ||
-e ali_drw_25080/hif_rd/ \ | ||
-e ali_drw_25080/hif_rmw/ \ | ||
-e ali_drw_25080/cycle/ \ | ||
-e ali_drw_27000/hif_wr/ \ | ||
-e ali_drw_27000/hif_rd/ \ | ||
-e ali_drw_27000/hif_rmw/ \ | ||
-e ali_drw_27000/cycle/ \ | ||
-e ali_drw_27080/hif_wr/ \ | ||
-e ali_drw_27080/hif_rd/ \ | ||
-e ali_drw_27080/hif_rmw/ \ | ||
-e ali_drw_27080/cycle/ -- sleep 10 | ||
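
The event list in the command above is regular (four events for each of eight PMU devices), so it can be generated instead of typed out. The sketch below is not part of the patch: `build_perf_stat_args` is a hypothetical helper, and the device and event names are copied from the example command rather than discovered from the running system.

```python
import shlex

# Device and event names copied from the example command above.
DEVICES = [
    "ali_drw_21000", "ali_drw_21080",
    "ali_drw_23000", "ali_drw_23080",
    "ali_drw_25000", "ali_drw_25080",
    "ali_drw_27000", "ali_drw_27080",
]
EVENTS = ["hif_wr", "hif_rd", "hif_rmw", "cycle"]


def build_perf_stat_args(devices=DEVICES, events=EVENTS, seconds=10):
    """Build the argv list for a 'perf stat' run over every device/event pair."""
    args = ["perf", "stat"]
    for device in devices:
        for event in events:
            args += ["-e", f"{device}/{event}/"]
    # Measure for the duration of a plain 'sleep', as in the example.
    args += ["--", "sleep", str(seconds)]
    return args


if __name__ == "__main__":
    # Print the command line; pass the list to subprocess.run() to execute it.
    print(shlex.join(build_perf_stat_args()))
```

Returning an argv list rather than a single string keeps the result directly usable with `subprocess.run()` without shell quoting concerns.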
|
||
The average DRAM bandwidth can be calculated as follows: | ||
|
||
- Read Bandwidth = perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle | ||
- Write Bandwidth = (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle | ||
|
||
Here, DDRC_WIDTH = 64 bytes. | ||
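
The two formulas above translate directly into code. The following is a minimal sketch under stated assumptions: DDRC_Cycle is read as the value of the `cycle` event and DDRC_Freq as the DDRC core clock frequency in Hz, so the result is in bytes per second for one PMU device. The function names and the sample numbers are hypothetical, not measured values.

```python
DDRC_WIDTH_BYTES = 64  # base transfer unit stated in the text


def _bandwidth(transfers, ddrc_cycles, ddrc_freq_hz):
    """transfers * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle, in bytes per second."""
    if ddrc_cycles <= 0:
        raise ValueError("cycle count must be positive")
    return transfers * DDRC_WIDTH_BYTES * ddrc_freq_hz / ddrc_cycles


def read_bandwidth(hif_rd, ddrc_cycles, ddrc_freq_hz):
    """Average read bandwidth from the hif_rd and cycle counts."""
    return _bandwidth(hif_rd, ddrc_cycles, ddrc_freq_hz)


def write_bandwidth(hif_wr, hif_rmw, ddrc_cycles, ddrc_freq_hz):
    """Average write bandwidth; RMW commands count as writes."""
    return _bandwidth(hif_wr + hif_rmw, ddrc_cycles, ddrc_freq_hz)


if __name__ == "__main__":
    # Made-up counts for illustration only.
    cycles, freq_hz = 2_000_000_000, 1_000_000_000
    print("read  B/s:", read_bandwidth(1_000_000, cycles, freq_hz))
    print("write B/s:", write_bandwidth(500_000, 500_000, cycles, freq_hz))
```

Dividing the cycle count by the clock frequency gives the elapsed time, which is why the formulas need no separate wall-clock measurement. Totals for a channel or a die would be obtained by summing the per-device results.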
|
||
The current driver does not support sampling, so "perf record" is | ||
unsupported. Attaching to a task is also unsupported because all the events | ||
are uncore. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,3 +18,4 @@ Performance monitor support | |
xgene-pmu | ||
arm_dsu_pmu | ||
thunderx2-pmu | ||
alibaba_pmu |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -748,6 +748,12 @@ S: Supported | |
F: drivers/infiniband/hw/erdma | ||
F: include/uapi/rdma/erdma-abi.h | ||
|
||
ALIBABA PMU DRIVER | ||
M: Shuai Xue <[email protected]> | ||
S: Supported | ||
F: Documentation/admin-guide/perf/alibaba_pmu.rst | ||
F: drivers/perf/alibaba_uncore_drw_pmu.c | ||
|
||
ALIENWARE WMI DRIVER | ||
L: [email protected] | ||
S: Maintained | ||
|