In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

Retrying after fixing an overly aggressive load-store forwarding optimization.

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through a
simplified store-merging search that only checks for parallel stores
through the chain subgraph. This is cleaner, as it separates the
handling of non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and
find all possible stores by looking down through that load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).
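
A minimal, self-contained toy model may make the shape of that walk
clearer. Everything here (the Node struct, candidateStores) is invented
for illustration; it is not the SelectionDAG API, and it ignores the
address/type checks the real combiner also performs:

    // toy_candidate_search.cpp -- hypothetical sketch of the
    // parallel-store candidate search described above.
    #include <iostream>
    #include <string>
    #include <vector>

    struct Node {
      std::string kind; // "Entry", "Load", "TokenFactor", or "Store"
      Node *chain;      // incoming chain operand (nullptr for Entry)
      int offset;       // byte offset of the access, for memory nodes
    };

    // Walk up from St through at most one load to the chain root, then
    // walk back down, gathering every store that reaches the same root.
    // Such stores are "parallel": no chain path orders them against one
    // another, so they are candidates for merging.
    std::vector<Node *> candidateStores(Node *St,
                                        const std::vector<Node *> &DAG) {
      Node *Root = St->chain;
      if (Root && Root->kind == "Load") // look through a single load
        Root = Root->chain;
      std::vector<Node *> Candidates;
      for (Node *N : DAG) {
        if (N->kind != "Store")
          continue;
        Node *Up = N->chain;
        if (Up && Up->kind == "Load") // likewise look through one load
          Up = Up->chain;
        if (Up == Root)
          Candidates.push_back(N);
      }
      return Candidates;
    }

    int main() {
      Node Entry{"Entry", nullptr, 0};
      Node TF{"TokenFactor", &Entry, 0};
      Node Ld{"Load", &TF, 16};
      Node S0{"Store", &TF, 0}, S1{"Store", &TF, 4}, S2{"Store", &Ld, 8};
      std::vector<Node *> DAG{&Entry, &TF, &Ld, &S0, &S1, &S2};
      for (Node *N : candidateStores(&S0, DAG)) // finds S0, S1, and S2
        std::cout << "candidate store at offset " << N->offset << "\n";
    }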

Additional Minor Changes:

   1. Finish removing the unused AliasLoad code.
   2. Unify the chain aggregation of merged stores across code paths.
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests; a sketch of the walk it bounds follows this list.
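
To illustrate item 4, here is a rough, self-contained sketch (again
with invented types, not the real DAGCombiner code) of the walk that
GatherAllAliasesMaxDepth bounds: a depth-first search up the chain
operands that records possibly aliasing memory operations and gives up
conservatively once the depth budget is spent:

    // toy_gather_aliases.cpp -- hypothetical sketch only.
    #include <iostream>
    #include <vector>

    struct MemOp {
      int offset, size;              // fake footprint [offset, offset+size)
      std::vector<MemOp *> chainOps; // earlier ops this one depends on
    };

    bool mayAlias(const MemOp &a, const MemOp &b) {
      return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
    }

    // Returns false if the walk hit the depth limit; the caller must
    // then conservatively treat everything beyond the frontier as
    // aliasing and leave the chain alone.
    bool gatherAliases(const MemOp &op, const MemOp *node, unsigned depth,
                       unsigned maxDepth,
                       std::vector<const MemOp *> &aliases) {
      if (depth > maxDepth)
        return false;
      for (const MemOp *pred : node->chainOps) {
        if (mayAlias(op, *pred))
          aliases.push_back(pred);
        if (!gatherAliases(op, pred, depth + 1, maxDepth, aliases))
          return false;
      }
      return true;
    }

    int main() {
      MemOp a{0, 4, {}}, b{8, 4, {&a}}, c{2, 4, {&b}};
      MemOp st{0, 4, {&c}};
      std::vector<const MemOp *> aliases;
      bool complete = gatherAliases(st, &st, 0, /*maxDepth=*/18, aliases);
      std::cout << aliases.size() << " possible aliases, complete walk: "
                << std::boolalpha << complete << "\n"; // 2 aliases (c, a)
    }

A larger budget lets the walk prove independence across longer chains;
the commit only claims that 18 was enough to avoid test regressions.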

This finishes the change Matt Arsenault started in r246307, building on
jyknight's original patch.

Many tests required changes because memory operations are now
reorderable. Tests that relied on a particular order were changed to
use volatile memory operations, which pin it (a hypothetical C++
analogy follows).
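
This is the property those tests now lean on (illustrative example, not
taken from the test suite): plain stores may be reordered or merged,
while volatile ones keep their order.

    // pin_order.cpp -- hypothetical illustration.
    #include <cstdint>

    void plain(std::uint32_t *p) {
      p[0] = 0; // the optimizer may reorder or merge these two plain
      p[1] = 0; // stores into one wider store
    }

    void pinned(volatile std::uint32_t *p) {
      p[0] = 0; // volatile: must remain a separate store, issued
      p[1] = 0; // strictly before the next volatile access
    }

    int main() {
      std::uint32_t buf[2];
      plain(buf);
      pinned(buf);
      return 0;
    }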

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.ll -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do not yet lower a 16-byte constant-zero store
      efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are (see the example after this list).

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing, but the code appears to have improved because the memory
      operations are now recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.
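
For the pr18023 note above, here is a C++ analogy (hypothetical
example, not from the commit) of the reordering that is in fact legal:
a volatile access is ordered only against other volatile accesses, so
the compiler may move the plain store across the volatile load.

    // reorder_past_volatile.cpp -- hypothetical illustration.
    #include <cstdint>
    #include <iostream>

    std::uint32_t consume(std::uint32_t *x, volatile std::uint32_t *flag) {
      *x = 42;                 // non-volatile store; the compiler may
      std::uint32_t f = *flag; // legally sink it past this volatile load
      return f + *x;
    }

    int main() {
      std::uint32_t x = 0;
      volatile std::uint32_t flag = 1;
      std::cout << consume(&x, &flag) << "\n"; // prints 43
    }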

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@289221 91177308-0d34-0410-b5e6-96231b3b80d8
niravhdave committed Dec 9, 2016
1 parent 8c44335 commit 615f3cc
Showing 67 changed files with 1,682 additions and 1,995 deletions.
416 changes: 141 additions & 275 deletions lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion lib/CodeGen/TargetLoweringBase.cpp
@@ -828,7 +828,7 @@ TargetLoweringBase::TargetLoweringBase(const TargetMachine &tm) : TM(tm) {
   MinFunctionAlignment = 0;
   PrefFunctionAlignment = 0;
   PrefLoopAlignment = 0;
-  GatherAllAliasesMaxDepth = 6;
+  GatherAllAliasesMaxDepth = 18;
   MinStackArgumentAlignment = 1;
   // TODO: the default will be switched to 0 in the next commit, along
   // with the Target-specific changes necessary.
10 changes: 0 additions & 10 deletions lib/Target/AMDGPU/AMDGPUISelLowering.cpp
@@ -449,16 +449,6 @@ AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
 
   PredictableSelectIsExpensive = false;
 
-  // We want to find all load dependencies for long chains of stores to enable
-  // merging into very wide vectors. The problem is with vectors with > 4
-  // elements. MergeConsecutiveStores will attempt to merge these because x8/x16
-  // vectors are a legal type, even though we have to split the loads
-  // usually. When we can more precisely specify load legality per address
-  // space, we should be able to make FindBetterChain/MergeConsecutiveStores
-  // smarter so that they can figure out what to do in 2 iterations without all
-  // N > 4 stores on the same chain.
-  GatherAllAliasesMaxDepth = 16;
-
   // FIXME: Need to really handle these.
   MaxStoresPerMemcpy = 4096;
   MaxStoresPerMemmove = 4096;
2 changes: 1 addition & 1 deletion test/CodeGen/AArch64/argument-blocks.ll
@@ -62,7 +62,7 @@ define i64 @test_hfa_ignores_gprs([7 x float], [2 x float] %in, i64, i64 %res) {
 ; but should go in an 8-byte aligned slot.
 define void @test_varargs_stackalign() {
 ; CHECK-LABEL: test_varargs_stackalign:
-; CHECK-DARWINPCS: stp {{w[0-9]+}}, {{w[0-9]+}}, [sp, #16]
+; CHECK-DARWINPCS: str {{x[0-9]+}}, [sp, #16]
 
   call void(...) @callee([3 x float] undef, [2 x float] [float 1.0, float 2.0])
   ret void
5 changes: 1 addition & 4 deletions test/CodeGen/AArch64/arm64-abi.ll
@@ -205,10 +205,7 @@ declare i32 @args_i32(i32, i32, i32, i32, i32, i32, i32, i32, i16 signext, i32,
 define i32 @test8(i32 %argc, i8** nocapture %argv) nounwind {
 entry:
 ; CHECK-LABEL: test8
-; CHECK: strb {{w[0-9]+}}, [sp, #3]
-; CHECK: strb wzr, [sp, #2]
-; CHECK: strb {{w[0-9]+}}, [sp, #1]
-; CHECK: strb wzr, [sp]
+; CHECK: str w8, [sp]
 ; CHECK: bl
 ; FAST-LABEL: test8
 ; FAST: strb {{w[0-9]+}}, [sp]
4 changes: 2 additions & 2 deletions test/CodeGen/AArch64/arm64-memset-inline.ll
@@ -13,8 +13,8 @@ define void @t2() nounwind ssp {
 entry:
 ; CHECK-LABEL: t2:
 ; CHECK: strh wzr, [sp, #32]
-; CHECK: stp xzr, xzr, [sp, #16]
-; CHECK: str xzr, [sp, #8]
+; CHECK: stp xzr, xzr, [sp, #8]
+; CHECK: str xzr, [sp, #24]
   %buf = alloca [26 x i8], align 1
   %0 = getelementptr inbounds [26 x i8], [26 x i8]* %buf, i32 0, i32 0
   call void @llvm.memset.p0i8.i32(i8* %0, i8 0, i32 26, i32 1, i1 false)
13 changes: 8 additions & 5 deletions test/CodeGen/AArch64/ldst-opt.ll
@@ -1370,8 +1370,11 @@ entry:
 define void @merge_zr32_2_offset(i32* %p) {
 ; CHECK-LABEL: merge_zr32_2_offset:
 ; CHECK: // %entry
-; CHECK-NEXT: stp xzr, xzr, [x{{[0-9]+}}, #504]
+; CHECK-NEXT: str xzr, [x0, #512]
+; CHECK-NEXT: str xzr, [x0, #504]
 ; CHECK-NEXT: ret
+; We should be able to merge these stores
+; CHECKFIXME-NEXT: stp xzr, xzr, [x{{[0-9]+}}, #504]
 entry:
   %p0 = getelementptr i32, i32* %p, i32 126
   store i32 0, i32* %p0
@@ -1411,8 +1414,8 @@ entry:
 define void @merge_zr32_3(i32* %p) {
 ; CHECK-LABEL: merge_zr32_3:
 ; CHECK: // %entry
-; CHECK-NEXT: movi v[[REG:[0-9]]].2d, #0000000000000000
-; CHECK-NEXT: stp q[[REG]], q[[REG]], [x{{[0-9]+}}]
+; CHECK-NEXT: stp xzr, xzr, [x[[REG:[0-9]+]]]
+; CHECK-NEXT: stp xzr, xzr, [x[[REG]], #16]
 ; CHECK-NEXT: ret
 entry:
   store i32 0, i32* %p
@@ -1507,8 +1510,8 @@ entry:
 define void @merge_zr64_2(i64* %p) {
 ; CHECK-LABEL: merge_zr64_2:
 ; CHECK: // %entry
-; CHECK-NEXT: movi v[[REG:[0-9]]].2d, #0000000000000000
-; CHECK-NEXT: stp q[[REG]], q[[REG]], [x{{[0-9]+}}]
+; CHECK-NEXT: stp xzr, xzr, [x[[REG:[0-9]+]]]
+; CHECK-NEXT: stp xzr, xzr, [x[[REG]], #16]
 ; CHECK-NEXT: ret
 entry:
   store i64 0, i64* %p
5 changes: 3 additions & 2 deletions test/CodeGen/AArch64/merge-store.ll
@@ -4,8 +4,9 @@
 @g0 = external global <3 x float>, align 16
 @g1 = external global <3 x float>, align 4
 
-; CHECK: ldr s[[R0:[0-9]+]], {{\[}}[[R1:x[0-9]+]]{{\]}}, #4
-; CHECK: ld1{{\.?s?}} { v[[R0]]{{\.?s?}} }[1], {{\[}}[[R1]]{{\]}}
+; CHECK: ldr q[[R0:[0-9]+]], {{\[}}[[R1:x[0-9]+]], :lo12:g0
+;; TODO: this next line seems like a redundant no-op move?
+; CHECK: ins v0.s[1], v0.s[1]
 ; CHECK: str d[[R0]]
 
 define void @blam() {
3 changes: 1 addition & 2 deletions test/CodeGen/AArch64/vector_merge_dep_check.ll
@@ -1,5 +1,4 @@
-; RUN: llc --combiner-alias-analysis=false < %s | FileCheck %s
-; RUN: llc --combiner-alias-analysis=true < %s | FileCheck %s
+; RUN: llc < %s | FileCheck %s
 
 ; This test checks that we do not merge stores together which have
 ; dependencies through their non-chain operands (e.g. one store is the
24 changes: 16 additions & 8 deletions test/CodeGen/AMDGPU/debugger-insert-nops.ll
@@ -1,13 +1,21 @@
-; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -mattr=+amdgpu-debugger-insert-nops -verify-machineinstrs < %s | FileCheck %s
+; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -mattr=+amdgpu-debugger-insert-nops -verify-machineinstrs < %s | FileCheck %s --check-prefix=CHECK
+; RUN: llc -O0 -mtriple=amdgcn--amdhsa -mcpu=fiji -mattr=+amdgpu-debugger-insert-nops -verify-machineinstrs < %s | FileCheck %s --check-prefix=CHECKNOP
 
-; CHECK: test01.cl:2:{{[0-9]+}}
-; CHECK-NEXT: s_nop 0
+; This test expects that we have one instance for each line in some order with "s_nop 0" instances after each.
 
-; CHECK: test01.cl:3:{{[0-9]+}}
-; CHECK-NEXT: s_nop 0
+; Check that each line appears at least once
+; CHECK-DAG: test01.cl:2:3
+; CHECK-DAG: test01.cl:3:3
+; CHECK-DAG: test01.cl:4:3
 
-; CHECK: test01.cl:4:{{[0-9]+}}
-; CHECK-NEXT: s_nop 0
+
+; Check that each of each of the lines consists of the line output, followed by "s_nop 0"
+; CHECKNOP: test01.cl:{{[234]}}:3
+; CHECKNOP-NEXT: s_nop 0
+; CHECKNOP: test01.cl:{{[234]}}:3
+; CHECKNOP-NEXT: s_nop 0
+; CHECKNOP: test01.cl:{{[234]}}:3
+; CHECKNOP-NEXT: s_nop 0
 
 ; CHECK: test01.cl:5:{{[0-9]+}}
 ; CHECK-NEXT: s_nop 0
@@ -21,7 +29,7 @@ entry:
   call void @llvm.dbg.declare(metadata i32 addrspace(1)** %A.addr, metadata !17, metadata !18), !dbg !19
   %0 = load i32 addrspace(1)*, i32 addrspace(1)** %A.addr, align 4, !dbg !20
   %arrayidx = getelementptr inbounds i32, i32 addrspace(1)* %0, i32 0, !dbg !20
-  store i32 1, i32 addrspace(1)* %arrayidx, align 4, !dbg !21
+  store i32 1, i32 addrspace(1)* %arrayidx, align 4, !dbg !20
   %1 = load i32 addrspace(1)*, i32 addrspace(1)** %A.addr, align 4, !dbg !22
   %arrayidx1 = getelementptr inbounds i32, i32 addrspace(1)* %1, i32 1, !dbg !22
   store i32 2, i32 addrspace(1)* %arrayidx1, align 4, !dbg !23
5 changes: 2 additions & 3 deletions test/CodeGen/AMDGPU/insert_vector_elt.ll
@@ -253,9 +253,8 @@ define void @dynamic_insertelement_v2i8(<2 x i8> addrspace(1)* %out, <2 x i8> %a
 
 ; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:2
 ; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offset:1
-; GCN-DAG: buffer_store_byte v{{[0-9]+}}, off, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}{{$}}
-
-; GCN: buffer_store_byte v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}
+; GCN-DAG: buffer_store_byte v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}
+; GCN-DAG: buffer_store_byte v{{[0-9]+}}, v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}} offen{{$}}
 
 ; GCN: buffer_load_ubyte
 ; GCN: buffer_load_ubyte
17 changes: 3 additions & 14 deletions test/CodeGen/AMDGPU/merge-stores.ll
@@ -1,8 +1,5 @@
-; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-NOAA %s
-; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-NOAA %s
-
-; RUN: llc -march=amdgcn -verify-machineinstrs -combiner-alias-analysis -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-AA %s
-; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -combiner-alias-analysis -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-AA %s
+; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-AA %s
+; RUN: llc -march=amdgcn -mcpu=bonaire -verify-machineinstrs -amdgpu-load-store-vectorizer=0 < %s | FileCheck -check-prefix=SI -check-prefix=GCN -check-prefix=GCN-AA %s
 
 ; This test is mostly to test DAG store merging, so disable the vectorizer.
 ; Run with devices with different unaligned load restrictions.
@@ -474,17 +471,9 @@ define void @merge_global_store_4_adjacent_loads_i8_natural_align(i8 addrspace(1
   ret void
 }
 
-; This works once AA is enabled on the subtarget
 ; GCN-LABEL: {{^}}merge_global_store_4_vector_elts_loads_v4i32:
 ; GCN: buffer_load_dwordx4 [[LOAD:v\[[0-9]+:[0-9]+\]]]
-
-; GCN-NOAA: buffer_store_dword v
-; GCN-NOAA: buffer_store_dword v
-; GCN-NOAA: buffer_store_dword v
-; GCN-NOAA: buffer_store_dword v
-
-; GCN-AA: buffer_store_dwordx4 [[LOAD]]
-
+; GCN: buffer_store_dwordx4 [[LOAD]]
 ; GCN: s_endpgm
 define void @merge_global_store_4_vector_elts_loads_v4i32(i32 addrspace(1)* %out, <4 x i32> addrspace(1)* %in) #0 {
   %out.gep.1 = getelementptr i32, i32 addrspace(1)* %out, i32 1
12 changes: 6 additions & 6 deletions test/CodeGen/AMDGPU/private-element-size.ll
@@ -32,10 +32,10 @@
 ; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:24{{$}}
 ; HSA-ELT4-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s9 offset:28{{$}}
 
-; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}
-; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}
-; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}}
-; HSA-ELT4: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}}
+; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen{{$}}
+; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:4{{$}}
+; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:8{{$}}
+; HSA-ELT4-DAG: buffer_load_dword {{v[0-9]+}}, v{{[0-9]+}}, s[0:3], s9 offen offset:12{{$}}
 define void @private_elt_size_v4i32(<4 x i32> addrspace(1)* %out, i32 addrspace(1)* %index.array) #0 {
 entry:
   %tid = call i32 @llvm.amdgcn.workitem.id.x()
@@ -130,8 +130,8 @@ entry:
 ; HSA-ELT8: private_element_size = 2
 ; HSA-ELT4: private_element_size = 1
 
-; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9{{$}}
-; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, off, s[0:3], s9 offset:8
+; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], s9{{$}}
+; HSA-ELTGE8-DAG: buffer_store_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, {{off|v[0-9]}}, s[0:3], s9 offset:8
 
 ; HSA-ELTGE8: buffer_load_dwordx2 {{v\[[0-9]+:[0-9]+\]}}, v{{[0-9]+}}, s[0:3], s9 offen
 
17 changes: 8 additions & 9 deletions test/CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll
@@ -157,9 +157,8 @@ define void @reorder_global_load_local_store_global_load(i32 addrspace(1)* %out,
 
 ; FUNC-LABEL: @reorder_local_offsets
 ; CI: ds_read2_b32 {{v\[[0-9]+:[0-9]+\]}}, {{v[0-9]+}} offset0:100 offset1:102
-; CI: ds_write2_b32 {{v[0-9]+}}, {{v[0-9]+}}, {{v[0-9]+}} offset0:3 offset1:100
-; CI: ds_read_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:12
-; CI: ds_write_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:408
+; CI-DAG: ds_write2_b32 {{v[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}} offset0:3 offset1:100
+; CI-DAG: ds_write_b32 {{v[0-9]+}}, {{v[0-9]+}} offset:408
 ; CI: buffer_store_dword
 ; CI: s_endpgm
 define void @reorder_local_offsets(i32 addrspace(1)* nocapture %out, i32 addrspace(1)* noalias nocapture readnone %gptr, i32 addrspace(3)* noalias nocapture %ptr0) #0 {
@@ -181,12 +180,12 @@ define void @reorder_local_offsets(i32 addrspace(1)* nocapture %out, i32 addrspa
 }
 
 ; FUNC-LABEL: @reorder_global_offsets
-; CI: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:400
-; CI: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:408
-; CI: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:12
-; CI: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:400
-; CI: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:408
-; CI: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:12
+; CI-DAG: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:400
+; CI-DAG: buffer_load_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:408
+; CI-DAG: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:12
+; CI-DAG: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:400
+; CI-DAG: buffer_store_dword {{v[0-9]+}}, off, {{s\[[0-9]+:[0-9]+\]}}, 0 offset:408
+; CI: buffer_store_dword
 ; CI: s_endpgm
 define void @reorder_global_offsets(i32 addrspace(1)* nocapture %out, i32 addrspace(1)* noalias nocapture readnone %gptr, i32 addrspace(1)* noalias nocapture %ptr0) #0 {
   %ptr1 = getelementptr inbounds i32, i32 addrspace(1)* %ptr0, i32 3
3 changes: 2 additions & 1 deletion test/CodeGen/ARM/2012-10-04-AAPCS-byval-align8.ll
@@ -12,7 +12,8 @@ define void @test_byval_8_bytes_alignment(i32 %i, ...) {
 entry:
 ; CHECK: sub sp, sp, #12
 ; CHECK: sub sp, sp, #4
-; CHECK: stmib sp, {r1, r2, r3}
+; CHECK: add r0, sp, #4
+; CHECK: stm sp, {r0, r1, r2, r3}
   %g = alloca i8*
   %g1 = bitcast i8** %g to i8*
   call void @llvm.va_start(i8* %g1)
100 changes: 51 additions & 49 deletions test/CodeGen/ARM/alloc-no-stack-realign.ll
@@ -1,5 +1,4 @@
-; RUN: llc < %s -mtriple=armv7-apple-ios -O0 | FileCheck %s -check-prefix=NO-REALIGN
-; RUN: llc < %s -mtriple=armv7-apple-ios -O0 | FileCheck %s -check-prefix=REALIGN
+; RUN: llc < %s -mtriple=armv7-apple-ios -O0 | FileCheck %s
 
 ; rdar://12713765
 ; When realign-stack is set to false, make sure we are not creating stack
@@ -8,29 +7,31 @@
 
 define void @test1(<16 x float>* noalias sret %agg.result) nounwind ssp "no-realign-stack" {
 entry:
-; NO-REALIGN-LABEL: test1
-; NO-REALIGN: mov r[[R2:[0-9]+]], r[[R1:[0-9]+]]
-; NO-REALIGN: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
-; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32
-; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1]], #48
-; NO-REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1:[0-9]+]], #48
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: mov r[[R3:[0-9]+]], r[[R1]]
-; NO-REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R3]]:128]!
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R3]]:128]
-
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R0:0]], #48
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: add r[[R2:[0-9]+]], r[[R0]], #32
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; NO-REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]!
-; NO-REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]
+; CHECK-LABEL: test1
+; CHECK: ldr r[[R1:[0-9]+]], [pc, r1]
+; CHECK: add r[[R2:[0-9]+]], r1, #48
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: mov r[[R2:[0-9]+]], r[[R1]]
+; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: add r[[R1:[0-9]+]], r[[R1]], #32
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: mov r[[R1:[0-9]+]], sp
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: add r[[R2:[0-9]+]], r[[R1]], #32
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]!
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: add r[[R1:[0-9]+]], r0, #48
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: add r[[R1:[0-9]+]], r0, #32
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r0:128]!
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r0:128]
 %retval = alloca <16 x float>, align 16
 %0 = load <16 x float>, <16 x float>* @T3_retval, align 16
 store <16 x float> %0, <16 x float>* %retval
@@ -41,32 +42,33 @@ entry:
 
 define void @test2(<16 x float>* noalias sret %agg.result) nounwind ssp {
 entry:
-; REALIGN-LABEL: test2
-; REALIGN: bfc sp, #0, #6
-; REALIGN: mov r[[R2:[0-9]+]], r[[R1:[0-9]+]]
-; REALIGN: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
-; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #32
-; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; REALIGN: add r[[R2:[0-9]+]], r[[R1]], #48
-; REALIGN: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: ldr r[[R1:[0-9]+]], [pc, r1]
+; CHECK: add r[[R2:[0-9]+]], r[[R1]], #48
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: mov r[[R2:[0-9]+]], r[[R1]]
+; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: add r[[R1:[0-9]+]], r[[R1]], #32
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: mov r[[R1:[0-9]+]], sp
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: orr r[[R2:[0-9]+]], r[[R1]], #32
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]!
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: vld1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]!
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: vld1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
+; CHECK: add r[[R1:[0-9]+]], r0, #48
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: add r[[R1:[0-9]+]], r0, #32
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
+; CHECK: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r0:128]!
+; CHECK: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r0:128]
+
 
-; REALIGN: orr r[[R2:[0-9]+]], r[[R1:[0-9]+]], #48
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; REALIGN: orr r[[R2:[0-9]+]], r[[R1]], #32
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; REALIGN: orr r[[R2:[0-9]+]], r[[R1]], #16
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R2]]:128]
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
-
-; REALIGN: add r[[R1:[0-9]+]], r[[R0:0]], #48
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
-; REALIGN: add r[[R1:[0-9]+]], r[[R0]], #32
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R1]]:128]
-; REALIGN: vst1.32 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]!
-; REALIGN: vst1.64 {{{d[0-9]+, d[0-9]+}}}, [r[[R0]]:128]
-%retval = alloca <16 x float>, align 16
 %retval = alloca <16 x float>, align 16
 %0 = load <16 x float>, <16 x float>* @T3_retval, align 16
 store <16 x float> %0, <16 x float>* %retval
 %1 = load <16 x float>, <16 x float>* %retval
2 changes: 0 additions & 2 deletions test/CodeGen/ARM/ifcvt10.ll
@@ -9,8 +9,6 @@ entry:
 ; CHECK-LABEL: t:
 ; CHECK: vpop {d8}
 ; CHECK-NOT: vpopne
-; CHECK: pop {r7, pc}
-; CHECK: vpop {d8}
 ; CHECK: pop {r7, pc}
 br i1 undef, label %if.else, label %if.then
 
