Skip to content

Commit

Permalink
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kern…
Browse files Browse the repository at this point in the history
…el/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - arm64 perf: DDR PMU driver for Alibaba's T-Head Yitian 710 SoC, SVE
   vector granule register added to the user regs together with SVE perf
   extensions documentation.

 - SVE updates: add HWCAP for SVE EBF16, update the SVE ABI
   documentation to match the actual kernel behaviour (zeroing the
   registers on syscall rather than "zeroed or preserved" previously).

 - More conversions to automatic system registers generation.

 - vDSO: use self-synchronising virtual counter access in gettimeofday()
   if the architecture supports it.

 - arm64 stacktrace cleanups and improvements.

 - arm64 atomics improvements: always inline assembly, remove LL/SC
   trampolines.

 - Improve the reporting of EL1 exceptions: rework BTI and FPAC
   exception handling, better EL1 undefs reporting.

 - Cortex-A510 erratum 2658417: remove BF16 support due to incorrect
   result.

 - arm64 defconfig updates: build CoreSight as a module, enable options
   necessary for docker, memory hotplug/hotremove, enable all PMUs
   provided by Arm.

 - arm64 ptrace() support for TPIDR2_EL0 (register provided with the SME
   extensions).

 - arm64 ftraces updates/fixes: fix module PLTs with mcount, remove
   unused function.

 - kselftest updates for arm64: simple HWCAP validation, FP stress test
   improvements, validation of ZA regs in signal handlers, include
   larger SVE and SME vector lengths in signal tests, various cleanups.

 - arm64 alternatives (code patching) improvements to robustness and
   consistency: replace cpucap static branches with equivalent
   alternatives, associate callback alternatives with a cpucap.

 - Miscellaneous updates: optimise kprobe performance of patching
   single-step slots, simplify uaccess_mask_ptr(), move MTE registers
   initialisation to C, support huge vmalloc() mappings, run softirqs on
   the per-CPU IRQ stack, compat (arm32) misalignment fixups for
   multiword accesses.

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (126 commits)
  arm64: alternatives: Use vdso/bits.h instead of linux/bits.h
  arm64/kprobe: Optimize the performance of patching single-step slot
  arm64: defconfig: Add Coresight as module
  kselftest/arm64: Handle EINTR while reading data from children
  kselftest/arm64: Flag fp-stress as exiting when we begin finishing up
  kselftest/arm64: Don't repeat termination handler for fp-stress
  ARM64: reloc_test: add __init/__exit annotations to module init/exit funcs
  arm64/mm: fold check for KFENCE into can_set_direct_map()
  arm64: ftrace: fix module PLTs with mcount
  arm64: module: Remove unused plt_entry_is_initialized()
  arm64: module: Make plt_equals_entry() static
  arm64: fix the build with binutils 2.27
  kselftest/arm64: Don't enable v8.5 for MTE selftest builds
  arm64: uaccess: simplify uaccess_mask_ptr()
  arm64: asm/perf_regs.h: Avoid C++-style comment in UAPI header
  kselftest/arm64: Fix typo in hwcap check
  arm64: mte: move register initialization to C
  arm64: mm: handle ARM64_KERNEL_USES_PMD_MAPS in vmemmap_populate()
  arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()
  arm64/sve: Add Perf extensions documentation
  ...
  • Loading branch information
torvalds committed Oct 6, 2022
2 parents 41fc64a + d299524 commit 18fd049
Show file tree
Hide file tree
Showing 130 changed files with 4,608 additions and 1,259 deletions.
7 changes: 6 additions & 1 deletion Documentation/admin-guide/kernel-parameters.txt
Original file line number Diff line number Diff line change
Expand Up @@ -3203,6 +3203,7 @@
spectre_v2_user=off [X86]
spec_store_bypass_disable=off [X86,PPC]
ssbd=force-off [ARM64]
nospectre_bhb [ARM64]
l1tf=off [X86]
mds=off [X86]
tsx_async_abort=off [X86]
Expand Down Expand Up @@ -3609,7 +3610,7 @@

nohugeiomap [KNL,X86,PPC,ARM64] Disable kernel huge I/O mappings.

nohugevmalloc [PPC] Disable kernel huge vmalloc mappings.
nohugevmalloc [KNL,X86,PPC,ARM64] Disable kernel huge vmalloc mappings.

nosmt [KNL,S390] Disable symmetric multithreading (SMT).
Equivalent to smt=1.
Expand All @@ -3627,6 +3628,10 @@
vulnerability. System may allow data leaks with this
option.

nospectre_bhb [ARM64] Disable all mitigations for Spectre-BHB (branch
history injection) vulnerability. System may allow data leaks
with this option.

nospec_store_bypass_disable
[HW] Disable all mitigations for the Speculative Store Bypass vulnerability

Expand Down
100 changes: 100 additions & 0 deletions Documentation/admin-guide/perf/alibaba_pmu.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
=============================================================
Alibaba's T-Head SoC Uncore Performance Monitoring Unit (PMU)
=============================================================

The Yitian 710, custom-built by Alibaba Group's chip development business,
T-Head, implements uncore PMU for performance and functional debugging to
facilitate system maintenance.

DDR Sub-System Driveway (DRW) PMU Driver
=========================================

Yitian 710 employs eight DDR5/4 channels, four on each die. Each DDR5 channel
is independent of others to service system memory requests. And one DDR5
channel is split into two independent sub-channels. The DDR Sub-System Driveway
implements separate PMUs for each sub-channel to monitor various performance
metrics.

The Driveway PMU devices are named as ali_drw_<sys_base_addr> with perf.
For example, ali_drw_21000 and ali_drw_21080 are two PMU devices for two
sub-channels of the same channel in die 0. And the PMU device of die 1 is
prefixed with ali_drw_400XXXXX, e.g. ali_drw_40021000.

Each sub-channel has 36 PMU counters in total, which is classified into
four groups:

- Group 0: PMU Cycle Counter. This group has one pair of counters
pmu_cycle_cnt_low and pmu_cycle_cnt_high, that is used as the cycle count
based on DDRC core clock.

- Group 1: PMU Bandwidth Counters. This group has 8 counters that are used
to count the total access number of either the eight bank groups in a
selected rank, or four ranks separately in the first 4 counters. The base
transfer unit is 64B.

- Group 2: PMU Retry Counters. This group has 10 counters, that intend to
count the total retry number of each type of uncorrectable error.

- Group 3: PMU Common Counters. This group has 16 counters, that are used
to count the common events.

For now, the Driveway PMU driver only uses counters in group 0 and group 3.

The DDR Controller (DDRCTL) and DDR PHY combine to create a complete solution
for connecting an SoC application bus to DDR memory devices. The DDRCTL
receives transactions Host Interface (HIF) which is custom-defined by Synopsys.
These transactions are queued internally and scheduled for access while
satisfying the SDRAM protocol timing requirements, transaction priorities, and
dependencies between the transactions. The DDRCTL in turn issues commands on
the DDR PHY Interface (DFI) to the PHY module, which launches and captures data
to and from the SDRAM. The driveway PMUs have hardware logic to gather
statistics and performance logging signals on HIF, DFI, etc.

By counting the READ, WRITE and RMW commands sent to the DDRC through the HIF
interface, we could calculate the bandwidth. Example usage of counting memory
data bandwidth::

perf stat \
-e ali_drw_21000/hif_wr/ \
-e ali_drw_21000/hif_rd/ \
-e ali_drw_21000/hif_rmw/ \
-e ali_drw_21000/cycle/ \
-e ali_drw_21080/hif_wr/ \
-e ali_drw_21080/hif_rd/ \
-e ali_drw_21080/hif_rmw/ \
-e ali_drw_21080/cycle/ \
-e ali_drw_23000/hif_wr/ \
-e ali_drw_23000/hif_rd/ \
-e ali_drw_23000/hif_rmw/ \
-e ali_drw_23000/cycle/ \
-e ali_drw_23080/hif_wr/ \
-e ali_drw_23080/hif_rd/ \
-e ali_drw_23080/hif_rmw/ \
-e ali_drw_23080/cycle/ \
-e ali_drw_25000/hif_wr/ \
-e ali_drw_25000/hif_rd/ \
-e ali_drw_25000/hif_rmw/ \
-e ali_drw_25000/cycle/ \
-e ali_drw_25080/hif_wr/ \
-e ali_drw_25080/hif_rd/ \
-e ali_drw_25080/hif_rmw/ \
-e ali_drw_25080/cycle/ \
-e ali_drw_27000/hif_wr/ \
-e ali_drw_27000/hif_rd/ \
-e ali_drw_27000/hif_rmw/ \
-e ali_drw_27000/cycle/ \
-e ali_drw_27080/hif_wr/ \
-e ali_drw_27080/hif_rd/ \
-e ali_drw_27080/hif_rmw/ \
-e ali_drw_27080/cycle/ -- sleep 10

The average DRAM bandwidth can be calculated as follows:

- Read Bandwidth = perf_hif_rd * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle
- Write Bandwidth = (perf_hif_wr + perf_hif_rmw) * DDRC_WIDTH * DDRC_Freq / DDRC_Cycle

Here, DDRC_WIDTH = 64 bytes.

The current driver does not support sampling. So "perf record" is
unsupported. Also attach to a task is unsupported as the events are all
uncore.
1 change: 1 addition & 0 deletions Documentation/admin-guide/perf/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,3 +18,4 @@ Performance monitor support
xgene-pmu
arm_dsu_pmu
thunderx2-pmu
alibaba_pmu
3 changes: 3 additions & 0 deletions Documentation/arm64/elf_hwcaps.rst
Original file line number Diff line number Diff line change
Expand Up @@ -272,6 +272,9 @@ HWCAP2_WFXT
HWCAP2_EBF16
Functionality implied by ID_AA64ISAR1_EL1.BF16 == 0b0010.

HWCAP2_SVE_EBF16
Functionality implied by ID_AA64ZFR0_EL1.BF16 == 0b0010.

4. Unused AT_HWCAP bits
-----------------------

Expand Down
2 changes: 2 additions & 0 deletions Documentation/arm64/silicon-errata.rst
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A510 | #2441009 | ARM64_ERRATUM_2441009 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 |
Expand Down
3 changes: 3 additions & 0 deletions Documentation/arm64/sme.rst
Original file line number Diff line number Diff line change
Expand Up @@ -331,6 +331,9 @@ The regset data starts with struct user_za_header, containing:
been read if a PTRACE_GETREGSET of NT_ARM_ZA were executed for each thread
when the coredump was generated.

* The NT_ARM_TLS note will be extended to two registers, the second register
will contain TPIDR2_EL0 on systems that support SME and will be read as
zero with writes ignored otherwise.

9. System runtime configuration
--------------------------------
Expand Down
22 changes: 21 additions & 1 deletion Documentation/arm64/sve.rst
Original file line number Diff line number Diff line change
Expand Up @@ -111,7 +111,7 @@ the SVE instruction set architecture.

* On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of
Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR
become unspecified on return from a syscall.
become zero on return from a syscall.

* The SVE registers are not used to pass arguments to or receive results from
any syscall.
Expand Down Expand Up @@ -452,6 +452,24 @@ The regset data starts with struct user_sve_header, containing:
* Modifying the system default vector length does not affect the vector length
of any existing process or thread that does not make an execve() call.

10. Perf extensions
--------------------------------

* The arm64 specific DWARF standard [5] added the VG (Vector Granule) register
at index 46. This register is used for DWARF unwinding when variable length
SVE registers are pushed onto the stack.

* Its value is equivalent to the current SVE vector length (VL) in bits divided
by 64.

* The value is included in Perf samples in the regs[46] field if
PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set.

* The value is the current value at the time the sample was taken, and it can
change over time.

* If the system doesn't support SVE when perf_event_open is called with these
settings, the event will fail to open.

Appendix A. SVE programmer's model (informative)
=================================================
Expand Down Expand Up @@ -593,3 +611,5 @@ References
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
Procedure Call Standard for the ARM 64-bit Architecture (AArch64)

[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
6 changes: 6 additions & 0 deletions MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -748,6 +748,12 @@ S: Supported
F: drivers/infiniband/hw/erdma
F: include/uapi/rdma/erdma-abi.h

ALIBABA PMU DRIVER
M: Shuai Xue <[email protected]>
S: Supported
F: Documentation/admin-guide/perf/alibaba_pmu.rst
F: drivers/perf/alibaba_uncore_dwr_pmu.c

ALIENWARE WMI DRIVER
L: [email protected]
S: Maintained
Expand Down
18 changes: 18 additions & 0 deletions arch/arm64/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,7 @@ config ARM64
select HAVE_ARCH_AUDITSYSCALL
select HAVE_ARCH_BITREVERSE
select HAVE_ARCH_COMPILER_H
select HAVE_ARCH_HUGE_VMALLOC
select HAVE_ARCH_HUGE_VMAP
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
Expand Down Expand Up @@ -230,6 +231,7 @@ config ARM64
select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
select TRACE_IRQFLAGS_SUPPORT
select TRACE_IRQFLAGS_NMI_SUPPORT
select HAVE_SOFTIRQ_ON_OWN_STACK
help
ARM 64-bit (AArch64) Linux support.

Expand Down Expand Up @@ -733,6 +735,19 @@ config ARM64_ERRATUM_2077057

If unsure, say Y.

config ARM64_ERRATUM_2658417
bool "Cortex-A510: 2658417: remove BF16 support due to incorrect result"
default y
help
This option adds the workaround for ARM Cortex-A510 erratum 2658417.
Affected Cortex-A510 (r0p0 to r1p1) may produce the wrong result for
BFMMLA or VMMLA instructions in rare circumstances when a pair of
A510 CPUs are using shared neon hardware. As the sharing is not
discoverable by the kernel, hide the BF16 HWCAP to indicate that
user-space should not be using these instructions.

If unsure, say Y.

config ARM64_ERRATUM_2119858
bool "Cortex-A710/X2: 2119858: workaround TRBE overwriting trace data in FILL mode"
default y
Expand Down Expand Up @@ -1562,6 +1577,9 @@ config THUMB2_COMPAT_VDSO
Compile the compat vDSO with '-mthumb -fomit-frame-pointer' if y,
otherwise with '-marm'.

config COMPAT_ALIGNMENT_FIXUPS
bool "Fix up misaligned multi-word loads and stores in user space"

menuconfig ARMV8_DEPRECATED
bool "Emulate deprecated/obsolete ARMv8 instructions"
depends on SYSCTL
Expand Down
17 changes: 17 additions & 0 deletions arch/arm64/configs/defconfig
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ CONFIG_NUMA_BALANCING=y
CONFIG_MEMCG=y
CONFIG_BLK_CGROUP=y
CONFIG_CGROUP_PIDS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_HUGETLB=y
CONFIG_CPUSETS=y
CONFIG_CGROUP_DEVICE=y
Expand Down Expand Up @@ -102,6 +103,8 @@ CONFIG_ARM_SCMI_CPUFREQ=y
CONFIG_ARM_TEGRA186_CPUFREQ=y
CONFIG_QORIQ_CPUFREQ=y
CONFIG_ACPI=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_ACPI_HMAT=y
CONFIG_ACPI_APEI=y
CONFIG_ACPI_APEI_GHES=y
CONFIG_ACPI_APEI_PCIEAER=y
Expand All @@ -126,6 +129,8 @@ CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_KSM=y
CONFIG_MEMORY_FAILURE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
Expand All @@ -139,12 +144,16 @@ CONFIG_IP_PNP_DHCP=y
CONFIG_IP_PNP_BOOTP=y
CONFIG_IPV6=m
CONFIG_NETFILTER=y
CONFIG_BRIDGE_NETFILTER=m
CONFIG_NF_CONNTRACK=m
CONFIG_NF_CONNTRACK_EVENTS=y
CONFIG_NETFILTER_XT_MARK=m
CONFIG_NETFILTER_XT_TARGET_CHECKSUM=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NETFILTER_XT_MATCH_ADDRTYPE=m
CONFIG_NETFILTER_XT_MATCH_CONNTRACK=m
CONFIG_NETFILTER_XT_MATCH_IPVS=m
CONFIG_IP_VS=m
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_FILTER=m
CONFIG_IP_NF_TARGET_REJECT=m
Expand Down Expand Up @@ -1349,4 +1358,12 @@ CONFIG_DEBUG_FS=y
# CONFIG_SCHED_DEBUG is not set
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_FTRACE is not set
CONFIG_CORESIGHT=m
CONFIG_CORESIGHT_LINK_AND_SINK_TMC=m
CONFIG_CORESIGHT_CATU=m
CONFIG_CORESIGHT_SINK_TPIU=m
CONFIG_CORESIGHT_SINK_ETBV10=m
CONFIG_CORESIGHT_STM=m
CONFIG_CORESIGHT_CPU_DEBUG=m
CONFIG_CORESIGHT_CTI=m
CONFIG_MEMTEST=y
Loading

0 comments on commit 18fd049

Please sign in to comment.