[Arm64] Vector Load/Store structure instructions (dotnet#33461)
This adds support in the JIT emitter for the Vector Load/Store structure instructions (section C3.2.10 of the Arm
Architecture Reference Manual):

- LD1 (1-4 registers)
- LD2
- LD3
- LD4
- LD1R
- LD2R
- LD3R
- LD4R
- ST1 (1-4 registers)
- ST2
- ST3
- ST4

in the following addressing modes:

- Base register only
- Post-indexed by a 64-bit register
- Post-indexed by an immediate, equal to the number of bytes transferred
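For the third addressing mode, the immediate follows directly from the register count and the arrangement. The following is a minimal sketch, assuming the standard A64 transfer sizes. The `Arrangement` enum and the two helper functions are hypothetical illustrations and are not the emitter's API (the JIT uses its own `insOpts` and `emitAttr` values). For multiple-structure forms (LD1-LD4, ST1-ST4), every listed register is transferred in full. For LD1R-LD4R and the single-lane forms, each register transfers one element.

```cpp
#include <cassert>

// Hypothetical arrangement enum for illustration only; not the JIT's insOpts.
enum class Arrangement { B8, B16, H4, H8, S2, S4, D1, D2 };

// Bytes in one element of the arrangement (.b = 1, .h = 2, .s = 4, .d = 8).
unsigned elementBytes(Arrangement a)
{
    switch (a)
    {
        case Arrangement::B8:
        case Arrangement::B16:
            return 1;
        case Arrangement::H4:
        case Arrangement::H8:
            return 2;
        case Arrangement::S2:
        case Arrangement::S4:
            return 4;
        case Arrangement::D1:
        case Arrangement::D2:
            return 8;
    }
    return 0; // unreachable for valid arrangements
}

// True for arrangements that fill a 128-bit register (16b, 8h, 4s, 2d).
bool isFullWidth(Arrangement a)
{
    return a == Arrangement::B16 || a == Arrangement::H8 || a == Arrangement::S4 || a == Arrangement::D2;
}

// LD1 (1-4 registers), LD2-LD4, ST1-ST4 (multiple structures):
// each register is transferred in full, 8 or 16 bytes.
// Note: LD2-LD4/ST2-ST4 do not accept the .1d arrangement; this sketch does not check that.
unsigned postIndexMultipleStructures(unsigned regCount, Arrangement a)
{
    assert(regCount >= 1 && regCount <= 4);
    return regCount * (isFullWidth(a) ? 16u : 8u);
}

// LD1R-LD4R and single-lane forms such as st1 {v0.b}[3]:
// each register transfers exactly one element.
unsigned postIndexSingleStructure(unsigned regCount, unsigned elemBytes)
{
    assert(regCount >= 1 && regCount <= 4);
    assert(elemBytes == 1 || elemBytes == 2 || elemBytes == 4 || elemBytes == 8);
    return regCount * elemBytes;
}
```

For example, `ld1 {v5.16b, v6.16b, v7.16b, v8.16b}, [x9]` post-indexes by 4 * 16 = 64 bytes. `ld4r {v0.4s, v1.4s, v2.4s, v3.4s}, [x0]` post-indexes by 4 * 4 = 16 bytes. `st1 {v0.b}[3], [x1]` post-indexes by 1 byte.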

Also adds support in JitDump for printing:

* A SIMD vector register list.
  For example, ld1     {v5.16b, v6.16b, v7.16b, v8.16b}, [x9]

* A SIMD vector element list.
  For example, st1     {v0.b}[3], [x1], #1
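Both JitDump formats can be sketched as plain string formatting. This is a minimal illustration, assuming that register numbers in a list wrap modulo 32, as A64 register lists do (for example `{v31.4s, v0.4s}`). The function names `formatRegList` and `formatElemList` are hypothetical and are not the JIT's actual display routines.

```cpp
#include <cassert>
#include <string>

// Formats a vector register list, e.g. "{v5.16b, v6.16b, v7.16b, v8.16b}".
std::string formatRegList(unsigned firstReg, unsigned regCount, const std::string& arrangement)
{
    assert(firstReg < 32 && regCount >= 1 && regCount <= 4);
    std::string out = "{";
    for (unsigned i = 0; i < regCount; i++)
    {
        if (i > 0)
            out += ", ";
        // Consecutive registers wrap around from v31 to v0.
        out += "v" + std::to_string((firstReg + i) % 32) + "." + arrangement;
    }
    out += "}";
    return out;
}

// Formats a vector element list, e.g. "{v0.b}[3]"; elemSize is one of 'b', 'h', 's', 'd'.
std::string formatElemList(unsigned firstReg, unsigned regCount, char elemSize, unsigned index)
{
    assert(firstReg < 32 && regCount >= 1 && regCount <= 4);
    std::string out = "{";
    for (unsigned i = 0; i < regCount; i++)
    {
        if (i > 0)
            out += ", ";
        out += "v" + std::to_string((firstReg + i) % 32) + "." + elemSize;
    }
    // The lane index applies to every register in the list.
    out += "}[" + std::to_string(index) + "]";
    return out;
}
```

With these helpers, `formatRegList(5, 4, "16b")` yields the register list from the `ld1` example, and `formatElemList(0, 1, 'b', 3)` yields the element list from the `st1` example.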
echesakov authored Mar 14, 2020
1 parent a1af0f2 commit 6b8cda0
Showing 6 changed files with 1,727 additions and 181 deletions.
720 changes: 720 additions & 0 deletions src/coreclr/src/jit/codegenarm64.cpp

2 changes: 2 additions & 0 deletions src/coreclr/src/jit/emit.h
@@ -1233,6 +1233,8 @@ class emitter
#define PERFSCORE_THROUGHPUT_4C 4.0f // slower - 4 cycles
#define PERFSCORE_THROUGHPUT_5C 5.0f // slower - 5 cycles
#define PERFSCORE_THROUGHPUT_6C 6.0f // slower - 6 cycles
#define PERFSCORE_THROUGHPUT_7C 7.0f // slower - 7 cycles
#define PERFSCORE_THROUGHPUT_8C 8.0f // slower - 8 cycles
#define PERFSCORE_THROUGHPUT_9C 9.0f // slower - 9 cycles
#define PERFSCORE_THROUGHPUT_10C 10.0f // slower - 10 cycles
#define PERFSCORE_THROUGHPUT_13C 13.0f // slower - 13 cycles