[AIE2P] Instruction select G_TRUNC for vector operands #341

niwinanto · 2025-02-06T17:02:11Z

We can instruction-select G_TRUNC into a VSHUFFLE instruction. VSHUFFLE supports de-interleaving modes, which can implement the G_TRUNC operation.

For example, Dst = VSHUFFLE Src0, Src1, Mode0 is equivalent to a shuffle mask of <0, 2, 4, ...> applied to the concatenation of Src0 and Src1.
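
To make the equivalence concrete, here is a minimal standalone sketch (plain C++, not LLVM or AIE code). It assumes 32-bit shuffle lanes and little-endian lane order, so each 64-bit element occupies two consecutive lanes with the low half first. The helper names `deinterleaveEven`, `asLanes32` and `truncRef` are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Concatenate Src0 and Src1 and keep the lanes at even indices,
// i.e. apply the shuffle mask <0, 2, 4, ...>.
std::vector<uint32_t> deinterleaveEven(const std::vector<uint32_t> &Src0,
                                       const std::vector<uint32_t> &Src1) {
  assert(Src0.size() == Src1.size() && "sources must have the same length");
  std::vector<uint32_t> Concat(Src0);
  Concat.insert(Concat.end(), Src1.begin(), Src1.end());
  std::vector<uint32_t> Dst;
  for (std::size_t I = 0; I < Concat.size(); I += 2)
    Dst.push_back(Concat[I]);
  return Dst;
}

// View each 64-bit element as two 32-bit lanes, low half first
// (assumption: little-endian lane order).
std::vector<uint32_t> asLanes32(const std::vector<uint64_t> &V) {
  std::vector<uint32_t> Lanes;
  for (uint64_t E : V) {
    Lanes.push_back(static_cast<uint32_t>(E));
    Lanes.push_back(static_cast<uint32_t>(E >> 32));
  }
  return Lanes;
}

// Reference semantics of an element-wise 64-bit to 32-bit truncation.
std::vector<uint32_t> truncRef(const std::vector<uint64_t> &V) {
  std::vector<uint32_t> Dst;
  for (uint64_t E : V)
    Dst.push_back(static_cast<uint32_t>(E));
  return Dst;
}
```

Under these assumptions, calling `deinterleaveEven` on the lane views of the two halves of a vector gives the same result as `truncRef` on the whole vector.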

martien-de-jong · 2025-02-07T08:30:31Z

llvm/lib/Target/AIE/aie2p/AIE2PInstructionSelector.cpp

+    I.eraseFromParent();
+    return constrainSelectedInstRegOperands(*MI, TII, TRI, RBI);
+  }
+  default:


We are called for Size >= 512, but we only handle 1024?

Yes, this was work in progress. Now ready for review.

konstantinschwarz · 2025-02-07T19:08:59Z

llvm/lib/Target/AIE/aie2p/AIE2PInstrPatterns.td

+// vectors of the VSHUFFLE instruction.
+
+// src accumulator register bank
+def : Pat<(v16i32 (trunc (v16i64 ACC1024:$s1))),


Looks like these patterns could be refactored into a common class?

It's a good idea. Done.

konstantinschwarz · 2025-02-07T19:14:01Z

llvm/test/CodeGen/AIE/extractelement.ll

@@ -72,10 +72,10 @@ define signext i8 @extract_v16i8_signext(<16 x i8> %v) nounwind {
 ; AIE2P-LABEL: extract_v16i8_signext:
 ; AIE2P:         .p2align 4
 ; AIE2P-NEXT:  // %bb.0:
-; AIE2P-NEXT:    ret lr
+; AIE2P-NEXT:    nopa ; nopb ; nops ; ret lr; nopm ; nopv


We don't handle 128-bit vectors properly in the calling convention.
For AIE2, we have custom code to handle passing 128-bit vectors in 256-bit registers: https://github.com/Xilinx/llvm-aie/blob/aie-public/llvm/lib/Target/AIE/AIE2ISelLowering.cpp#L58

@SagarMaheshwari99 is working on a fix

konstantinschwarz · 2025-02-07T19:15:33Z

llvm/lib/Target/AIE/AIELegalizerHelper.cpp

+  MachineIRBuilder &MIRBuilder = Helper.MIRBuilder;
+  MachineRegisterInfo &MRI = *MIRBuilder.getMRI();
+
+  const Register DstReg = MI.getOperand(0).getReg();


Nit: can be written as:
const auto [DstReg, DstVecTy, SrcReg, SrcVecTy] = MI.getFirst2RegLLTs();

This is really nice. Done.

konstantinschwarz · 2025-02-07T19:18:03Z

llvm/lib/Target/AIE/AIELegalizerHelper.cpp

+  const unsigned PadOpc = II->getGenericPadVectorOpcode();
+  const unsigned UnpadOpc = II->getGenericUnpadVectorOpcode();
+
+  const LLT NewPadRegTy = LLT::fixed_vector(SrcVecTy.getNumElements() * 2,


Should use

const unsigned BasicVecSize = II()->getBasicVectorBitSize();

to get the size to pad
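
The pad, truncate, unpad sequence that this helper builds can be sketched in standalone C++ as follows. This is only a model of the idea, not the legalizer code: `BasicVecBits` is a hypothetical constant standing in for the value returned by getBasicVectorBitSize() (assumed to be 512 here), and the padding elements are zero-filled although their contents do not matter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: stands in for the target's basic vector size in bits.
constexpr unsigned BasicVecBits = 512;
constexpr unsigned SrcElemBits = 64;

// Truncate a narrow vector of 64-bit elements by first widening it to
// the basic vector size, truncating there, and dropping the padding.
std::vector<uint32_t> truncViaPad(const std::vector<uint64_t> &Src) {
  assert(Src.size() * SrcElemBits <= BasicVecBits && "source is too wide");

  // Pad: extend the source to the basic vector size.
  std::vector<uint64_t> Padded(Src);
  Padded.resize(BasicVecBits / SrcElemBits, 0);

  // Truncate the padded vector element-wise.
  std::vector<uint32_t> Wide;
  for (uint64_t E : Padded)
    Wide.push_back(static_cast<uint32_t>(E));

  // Unpad: keep only the elements that came from the original source.
  Wide.resize(Src.size());
  return Wide;
}
```

Deriving the padded size from one constant rather than from `getNumElements() * 2` keeps the sketch correct for any source that is narrower than the basic vector size.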

konstantinschwarz · 2025-02-07T19:28:20Z

llvm/lib/Target/AIE/aie2p/AIE2PInstrPatterns.td

+    (VSHUFFLE_vec_shuffle_x ACC512:$s1, ACC512:$s1, (MOV_RLC_imm11_pseudo (i32 2))), 
+    sub_256_lo)>;
+
+// src vector register bank


Why is there no pattern for
v16i32 (trunc (v16i64 ...))
on the vector register bank?

You are right. I missed it.

martien-de-jong · 2025-02-10T12:09:44Z

llvm/lib/Target/AIE/AIELegalizerHelper.cpp

+
+  const AIEBaseInstrInfo *II = ST.getInstrInfo();
+  const unsigned PadOpc = II->getGenericPadVectorOpcode();
+  const unsigned UnpadOpc = II->getGenericUnpadVectorOpcode();


Conventionally called TII (TargetInstrInfo).

martien-de-jong · 2025-02-10T12:13:36Z

llvm/lib/Target/AIE/aie2p/AIE2PLegalizerInfo.cpp

+        const LLT &SrcTy = Query.Types[1];
+        const LLT &DstTy = Query.Types[0];
+        return SrcTy.isVector() && DstTy.isVector() &&
+               SrcTy.getSizeInBits() > 256 && SrcTy.getSizeInBits() < 2048 &&


nit: I think we can just as easily pin this down to 512 and 1024.
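
As a rough illustration of that suggestion, the predicate could be pinned to the two source sizes explicitly. The sketch below is standalone C++ rather than the LegalityQuery lambda itself; `TruncQuery` is a hypothetical stand-in for the two LLTs, and the requirement that the source be twice as wide as the destination is assumed from the rest of the review.

```cpp
#include <cassert>

// Hypothetical stand-in for the source and destination types of a G_TRUNC.
struct TruncQuery {
  bool SrcIsVector;
  bool DstIsVector;
  unsigned SrcBits;
  unsigned DstBits;
};

// Legal only for vector truncations from exactly 512 or 1024 bits to a
// destination that is half as wide.
constexpr bool isLegalVectorTrunc(const TruncQuery &Q) {
  return Q.SrcIsVector && Q.DstIsVector &&
         (Q.SrcBits == 512 || Q.SrcBits == 1024) &&
         Q.SrcBits == 2 * Q.DstBits;
}
```

Compared with the range check `> 256 && < 2048`, this rejects odd sizes such as 768 bits up front instead of relying on later stages to fail.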

andcarminati · 2025-02-10T12:33:26Z

llvm/lib/Target/AIE/aie2p/AIE2PInstrPatterns.td

+def : Pat<(v16i32 (trunc (v16i64 ACC1024:$s1))),
+  (VSHUFFLE_vec_shuffle_x
+   (EXTRACT_SUBREG ACC1024:$s1, sub_512_acc_lo ),
+   (EXTRACT_SUBREG ACC1024:$s1, sub_512_acc_lo ),


What happens to the hi subreg?
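
The effect of passing the low half twice can be seen in a small standalone model (plain C++, not the TableGen pattern). `shuffleTrunc` is a hypothetical helper that models the de-interleaving shuffle on two 64-bit-element sources by keeping the low 32 bits of each element, and `splitHalves` models extracting the low and high halves of the wide source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Halves {
  std::vector<uint64_t> Lo;
  std::vector<uint64_t> Hi;
};

// Split a wide source into its low and high halves.
Halves splitHalves(const std::vector<uint64_t> &Src) {
  assert(Src.size() % 2 == 0 && "source must split evenly");
  const auto Mid = Src.begin() + static_cast<std::ptrdiff_t>(Src.size() / 2);
  return {std::vector<uint64_t>(Src.begin(), Mid),
          std::vector<uint64_t>(Mid, Src.end())};
}

// Model of the shuffle: low 32 bits of every element of Src0, followed by
// the low 32 bits of every element of Src1.
std::vector<uint32_t> shuffleTrunc(const std::vector<uint64_t> &Src0,
                                   const std::vector<uint64_t> &Src1) {
  std::vector<uint32_t> Dst;
  for (uint64_t E : Src0)
    Dst.push_back(static_cast<uint32_t>(E));
  for (uint64_t E : Src1)
    Dst.push_back(static_cast<uint32_t>(E));
  return Dst;
}
```

With a four-element source, `shuffleTrunc(H.Lo, H.Hi)` yields the truncation of all four elements, whereas `shuffleTrunc(H.Lo, H.Lo)` repeats the first two results and silently drops the upper half of the source.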

You are absolutely right.

khallouh · 2025-02-10T13:16:07Z

llvm/lib/Target/AIE/aie2p/AIE2PInstructionSelector.cpp

+  LLT SrcTy = MRI.getType(SrcReg);
+  unsigned Size = SrcTy.getSizeInBits();
+  // G_TRUNC S32 <- S64
+  if (Size == 64) {


Nit: SrcSize

khallouh · 2025-02-10T13:18:42Z

llvm/test/CodeGen/AIE/aie2p/GlobalIsel/inst-select-trunc.mir

+# RUN: llc -mtriple aie2p -run-pass=instruction-select %s -verify-machineinstrs -o - | FileCheck %s
+
+---
+name:            v16s32_trunc_v16s64_acc1024


Could you add instruction select tests for scalar source? I noticed we don't have them for AIE2P

I would prefer to keep that for a follow-up PR; I want to focus only on vector operands in this PR.

We already have the test for AIE2 in aie2/GlobalISel/inst-select-gtrunc.mir; you could just enable it for AIE2P, but it's up to you.

khallouh · 2025-02-10T13:19:27Z

llvm/lib/Target/AIE/aie2p/AIE2PInstructionSelector.cpp

+    return selectImpl(I, *CoverageInfo);
+  } else {
+    I.setDesc(TII.get(TargetOpcode::COPY));
+    return selectCopy(I, MRI);


Curious, which legal case can we select to a simple copy?

Not off the top of my head; I need to investigate.

I checked with Konstantin, this was meant for the trunc from 32 to 20 bits. Maybe we could (here or in a follow-up for scalars) check explicitly for that case and return false otherwise, so that we don't only fail later in CopyPhysReg for other unsupported cases.

So let's do it in a follow-up PR along with the scalar tests. AIECC-925

khallouh · 2025-02-10T14:35:18Z

llvm/test/CodeGen/AIE/aie2p/GlobalIsel/inst-select-trunc.mir

+    ; CHECK-NEXT: [[VSHUFFLE_vec_shuffle_x:%[0-9]+]]:vec512 = VSHUFFLE_vec_shuffle_x [[COPY2]], [[COPY3]], [[MOV_RLC_imm11_pseudo]]
+    ; CHECK-NEXT: PseudoRET implicit $lr, implicit [[VSHUFFLE_vec_shuffle_x]]
+    %1:accregbank(<16 x s64>) = G_IMPLICIT_DEF
+    %0:vregbank(<16 x s32>) = G_TRUNC %1(<16 x s64>)


What if it gets mapped to the accumulator bank? Could we select VSHUFFLE_vec_shuffle_bm in that case?

You are right, the accumulator bank is possible for this type and the pattern was missing. I have added the pattern.

khallouh

Do we need custom handling of G_TRUNC in RegBankSelect? Not sure when we will map which bank (we don't have regbankselect tests).
If G_TRUNC happens to be mapped to the accumulator bank, we won't be able to select with the current patterns.

niwinanto · 2025-02-10T21:59:30Z

Irrespective of the source register bank, we should be able to select the correct instruction. If the source is in the accumulator register bank, cross-bank copies will be inserted to use the vector register bank. For a destination in the accumulator register bank, one pattern was missing, and I have added that as well. Please see the tests.

khallouh · 2025-02-11T10:54:59Z

llvm/lib/Target/AIE/aie2p/AIE2PLegalizerInfo.cpp

+      })
+      .customIf([=](const LegalityQuery &Query) {
+        const LLT &SrcTy = Query.Types[1];
+        return SrcTy.isVector() && SrcTy.getSizeInBits() == 256;


Do we want to check that the Src size is double the Dst size, like we do for the legal case above?

niwinanto · 2025-02-11T13:39:40Z

I investigated the register bank selection for G_TRUNC further and found that we always assign the vector register bank to the destination operand. So if we want to assign the accumulator bank, we might need to use the use-def approach like we do for other special cases. However, we cannot select it via TableGen to the *_bm variant of VSHUFFLE, because AIE2PAcc512RegisterClass has only the [v16i32, v8i64, v16f32] types. We might need to add v32i16 and v64i8 to the list. @konstantinschwarz @khallouh Do you think we should go for it? Or are you OK with the current PR, which costs one additional cross-bank copy in case of an accumulator destination operand? Please have a look at the new commit, which has the register bank tests.
Also, greedy register bank selection was failing because we had an alternative mapping to the GPR bank irrespective of vector/scalar.

khallouh · 2025-02-11T17:35:38Z

For me it's fine as is for now. We can keep the pattern and live with the cross-bank copies, and optimize this in a follow-up. Maybe you could add a TODO and create a ticket to keep track of this.

martien-de-jong reviewed Feb 7, 2025

niwinanto force-pushed the niwin.gtrunc.vec branch from d3812fe to 3dc635a Compare February 7, 2025 11:37

niwinanto changed the title ~~[WIP] Instruction G_TRUNC to VSHUFFLE~~ [AIE2P] Instruction select G_TRUNC for vector operands Feb 7, 2025

niwinanto force-pushed the niwin.gtrunc.vec branch 2 times, most recently from 6603a0e to 2f8c42a Compare February 7, 2025 13:53

niwinanto marked this pull request as ready for review February 7, 2025 14:00

niwinanto requested review from abhinay-anubola, abnikant, andcarminati, F-Stuckmann, gbossu, katerynamuts, khallouh, konstantinschwarz, SagarMaheshwari99 and stephenneuendorffer as code owners February 7, 2025 14:00

konstantinschwarz reviewed Feb 7, 2025

martien-de-jong reviewed Feb 10, 2025

andcarminati reviewed Feb 10, 2025

khallouh reviewed Feb 10, 2025

niwinanto added 2 commits February 10, 2025 21:33

[AIE2P] Instruction select G_TRUNC for vector operands

72697af

[AIE2P] Legalize G_TRUNC for vector types

c593c62

niwinanto force-pushed the niwin.gtrunc.vec branch from 2f8c42a to c593c62 Compare February 10, 2025 21:35

khallouh reviewed Feb 11, 2025

[AIE2P] Add register bank select tests for G_TRUNC with vector operands

8b0414b
