Skip to content

Commit

Permalink
compile: prefer an AND instead of SHR+SHL instructions
Browse files Browse the repository at this point in the history
On modern 64bit CPUs a SHR, SHL or AND instruction take 1 cycle to execute.
A pair of shifts that operate on the same register will take 2 cycles
and needs to wait for the input register value to be available.

Large constants used to mask the high bits of a register with an AND
instruction can not be encoded as an immediate in the AND instruction
on amd64 and therefore need to be loaded into a register with a MOV
instruction.

However that MOV instruction is not dependent on the output register and
on many CPUs does not compete with the AND or shift instructions for
execution ports.

Using a pair of shifts to mask high bits instead of an AND to mask high
bits of a register has a shorter encoding and uses one less general
purpose register but is slower due to taking one clock cycle longer
if there is no register pressure that would make the AND variant need to
generate a spill.

For example the instructions emitted for (x & 1 << 63) before this CL are:
48c1ea3f                SHRQ $0x3f, DX
48c1e23f                SHLQ $0x3f, DX

after this CL the instructions are the same as GCC and LLVM use:
48b80000000000000080    MOVQ $0x8000000000000000, AX
4821d0                  ANDQ DX, AX

Some platforms such as arm64 already have SSA optimization rules to fuse
two shift instructions back into an AND.

Removing the general rule to rewrite AND to SHR+SHL speeds up this benchmark:

    var GlobalU uint

    func BenchmarkAndHighBits(b *testing.B) {
        x := uint(0)
        for i := 0; i < b.N; i++ {
                x &= 1 << 63
        }
        GlobalU = x
    }

amd64/darwin on Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz:
name           old time/op  new time/op  delta
AndHighBits-4  0.61ns ± 6%  0.42ns ± 6%  -31.42%  (p=0.000 n=25+25):

'go run run.go -all_codegen -v codegen' passes  with following adjustments:

ARM64: The BFXIL pattern ((x << lc) >> rc | y & ac) needed adjustment
       since ORshiftRL generation fusing '>> rc' and '|' interferes
       with matching ((x << lc) >> rc) to generate UBFX. Previously
       ORshiftLL was created first using the shifts generated for (y & ac).

S390X: Add rules for abs and copysign to match use of AND instead of SHIFTs.

Updates golang#33826
Updates golang#32781

Change-Id: I43227da76b625de03fbc51117162b23b9c678cdb
Reviewed-on: https://go-review.googlesource.com/c/go/+/194297
Run-TryBot: Martin Möhrmann <[email protected]>
TryBot-Result: Gobot Gobot <[email protected]>
Reviewed-by: Cherry Zhang <[email protected]>
  • Loading branch information
martisch committed Sep 21, 2019
1 parent ecc7dd5 commit 4e2b84f
Show file tree
Hide file tree
Showing 7 changed files with 625 additions and 293 deletions.
7 changes: 4 additions & 3 deletions src/cmd/compile/internal/ssa/gen/ARM64.rules
Original file line number Diff line number Diff line change
Expand Up @@ -1863,9 +1863,8 @@
(XORshiftLL <t> [c] (UBFX [bfc] x) x2) && c < 32 && t.Size() == 4 && bfc == armBFAuxInt(32-c, c)
-> (EXTRWconst [32-c] x2 x)

// Generic rules rewrite certain AND to a pair of shifts.
// However, on ARM64 the bitmask can fit into an instruction.
// Rewrite it back to AND.
// Rewrite special pairs of shifts to AND.
// On ARM64 the bitmask can fit into an instruction.
(SRLconst [c] (SLLconst [c] x)) && 0 < c && c < 64 -> (ANDconst [1<<uint(64-c)-1] x) // mask out high bits
(SLLconst [c] (SRLconst [c] x)) && 0 < c && c < 64 -> (ANDconst [^(1<<uint(c)-1)] x) // mask out low bits

Expand Down Expand Up @@ -1971,6 +1970,8 @@
-> (BFXIL [bfc] y x)
(ORshiftLL [sc] (UBFX [bfc] x) (SRLconst [sc] y)) && sc == getARM64BFwidth(bfc)
-> (BFXIL [bfc] y x)
(ORshiftRL [rc] (ANDconst [ac] y) (SLLconst [lc] x)) && lc < rc && ac == ^((1<<uint(64-rc)-1))
-> (BFXIL [armBFAuxInt(rc-lc, 64-rc)] y x)

// do combined loads
// little endian loads
Expand Down
4 changes: 4 additions & 0 deletions src/cmd/compile/internal/ssa/gen/S390X.rules
Original file line number Diff line number Diff line change
Expand Up @@ -701,6 +701,8 @@
// may need to be reworked when NIHH/OIHH are added
(SRDconst [1] (SLDconst [1] (LGDR <t> x))) -> (LGDR <t> (LPDFR <x.Type> x))
(LDGR <t> (SRDconst [1] (SLDconst [1] x))) -> (LPDFR (LDGR <t> x))
(AND (MOVDconst [^(-1<<63)]) (LGDR <t> x)) -> (LGDR <t> (LPDFR <x.Type> x))
(LDGR <t> (AND (MOVDconst [^(-1<<63)]) x)) -> (LPDFR (LDGR <t> x))
(OR (MOVDconst [-1<<63]) (LGDR <t> x)) -> (LGDR <t> (LNDFR <x.Type> x))
(LDGR <t> (OR (MOVDconst [-1<<63]) x)) -> (LNDFR (LDGR <t> x))

Expand All @@ -710,6 +712,8 @@
// detect copysign
(OR (SLDconst [63] (SRDconst [63] (LGDR x))) (LGDR (LPDFR <t> y))) -> (LGDR (CPSDR <t> y x))
(OR (SLDconst [63] (SRDconst [63] (LGDR x))) (MOVDconst [c])) && c & -1<<63 == 0 -> (LGDR (CPSDR <x.Type> (FMOVDconst <x.Type> [c]) x))
(OR (AND (MOVDconst [-1<<63]) (LGDR x)) (LGDR (LPDFR <t> y))) -> (LGDR (CPSDR <t> y x))
(OR (AND (MOVDconst [-1<<63]) (LGDR x)) (MOVDconst [c])) && c & -1<<63 == 0 -> (LGDR (CPSDR <x.Type> (FMOVDconst <x.Type> [c]) x))
(CPSDR y (FMOVDconst [c])) && c & -1<<63 == 0 -> (LPDFR y)
(CPSDR y (FMOVDconst [c])) && c & -1<<63 != 0 -> (LNDFR y)

Expand Down
8 changes: 0 additions & 8 deletions src/cmd/compile/internal/ssa/gen/generic.rules
Original file line number Diff line number Diff line change
Expand Up @@ -542,14 +542,6 @@
(Slicemask (Const64 [x])) && x > 0 -> (Const64 [-1])
(Slicemask (Const64 [0])) -> (Const64 [0])

// Rewrite AND of consts as shifts if possible, slightly faster for 64 bit operands
// leading zeros can be shifted left, then right
(And64 <t> (Const64 [y]) x) && nlz(y) + nto(y) == 64 && nto(y) >= 32
-> (Rsh64Ux64 (Lsh64x64 <t> x (Const64 <t> [nlz(y)])) (Const64 <t> [nlz(y)]))
// trailing zeros can be shifted right, then left
(And64 <t> (Const64 [y]) x) && nlo(y) + ntz(y) == 64 && ntz(y) >= 32
-> (Lsh64x64 (Rsh64Ux64 <t> x (Const64 <t> [ntz(y)])) (Const64 <t> [ntz(y)]))

// simplifications often used for lengths. e.g. len(s[i:i+5])==5
(Sub(64|32|16|8) (Add(64|32|16|8) x y) x) -> y
(Sub(64|32|16|8) (Add(64|32|16|8) x y) y) -> x
Expand Down
27 changes: 27 additions & 0 deletions src/cmd/compile/internal/ssa/rewriteARM64.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 4e2b84f

Please sign in to comment.