Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AIE2P] Instruction select G_TRUNC for vector operands #341

Open
wants to merge 3 commits into
base: aie-public
Choose a base branch
from

Conversation

niwinanto
Copy link
Collaborator

@niwinanto niwinanto commented Feb 6, 2025

We can instruction select G_TRUNC in to VSHUFFLE instruction. VSHUFFLE instruction supports de-interleaved modes which can be translated to G_TRUNC operation.

For an example, Dst = VSHUFFLE Src0, Src1, Mode0 is equalant to a shuffle mask of <0, 2, 4, ...> on concatenated output of Src0 and Src1.

I.eraseFromParent();
return constrainSelectedInstRegOperands(*MI, TII, TRI, RBI);
}
default:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are called for Size >= 512, we only handle 1024?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was work in progress. Now ready for review.

@niwinanto niwinanto changed the title [WIP] Instruction G_TRUNC to VSHUFFLE [AIE2P] Instruction select G_TRUNC for vector operands Feb 7, 2025
@niwinanto niwinanto force-pushed the niwin.gtrunc.vec branch 2 times, most recently from 6603a0e to 2f8c42a Compare February 7, 2025 13:53
@niwinanto niwinanto marked this pull request as ready for review February 7, 2025 14:00
// vectors of the VSHUFFLE instruction.

// src accumulator register bank
def : Pat<(v16i32 (trunc (v16i64 ACC1024:$s1))),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these patterns could be refactored into a common class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its a good idea. Done.

@@ -72,10 +72,10 @@ define signext i8 @extract_v16i8_signext(<16 x i8> %v) nounwind {
; AIE2P-LABEL: extract_v16i8_signext:
; AIE2P: .p2align 4
; AIE2P-NEXT: // %bb.0:
; AIE2P-NEXT: ret lr
; AIE2P-NEXT: nopa ; nopb ; nops ; ret lr; nopm ; nopv
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't handle 128-bit vectors properly in the calling convention.
For AIE2, we have custom code to handle passing 128-bit vectors in 256-bit registers: https://github.com/Xilinx/llvm-aie/blob/aie-public/llvm/lib/Target/AIE/AIE2ISelLowering.cpp#L58

@SagarMaheshwari99 is working on a fix

MachineIRBuilder &MIRBuilder = Helper.MIRBuilder;
MachineRegisterInfo &MRI = *MIRBuilder.getMRI();

const Register DstReg = MI.getOperand(0).getReg();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can be written as:
const auto [DstReg, DstVecTy, SrcReg, SrcVecTy] = MI.getFirst2RegLLTs();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really nice. Done.

const unsigned PadOpc = II->getGenericPadVectorOpcode();
const unsigned UnpadOpc = II->getGenericUnpadVectorOpcode();

const LLT NewPadRegTy = LLT::fixed_vector(SrcVecTy.getNumElements() * 2,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use

    const unsigned BasicVecSize = II()->getBasicVectorBitSize();

to get the size to pad

(VSHUFFLE_vec_shuffle_x ACC512:$s1, ACC512:$s1, (MOV_RLC_imm11_pseudo (i32 2))),
sub_256_lo)>;

// src vector register bank
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is there no pattern for
v16i32 (trunc (v16i64 ...))
on the vector register bank?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I missed it.


const AIEBaseInstrInfo *II = ST.getInstrInfo();
const unsigned PadOpc = II->getGenericPadVectorOpcode();
const unsigned UnpadOpc = II->getGenericUnpadVectorOpcode();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conventionally called TII, TargetInstructionInfo

const LLT &SrcTy = Query.Types[1];
const LLT &DstTy = Query.Types[0];
return SrcTy.isVector() && DstTy.isVector() &&
SrcTy.getSizeInBits() > 256 && SrcTy.getSizeInBits() < 2048 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we can just as easily pin this down to 512 and 1024.

def : Pat<(v16i32 (trunc (v16i64 ACC1024:$s1))),
(VSHUFFLE_vec_shuffle_x
(EXTRACT_SUBREG ACC1024:$s1, sub_512_acc_lo ),
(EXTRACT_SUBREG ACC1024:$s1, sub_512_acc_lo ),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens to the hi subreg?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are absolutely right.

LLT SrcTy = MRI.getType(SrcReg);
unsigned Size = SrcTy.getSizeInBits();
// G_TRUNC S32 <- S64
if (Size == 64) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: SrcSize

# RUN: llc -mtriple aie2p -run-pass=instruction-select %s -verify-machineinstrs -o - | FileCheck %s

---
name: v16s32_trunc_v16s64_acc1024
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add instruction select tests for scalar source? I noticed we don't have them for AIE2P

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I would like to keep it for a followup PR, I want to focus only with vector operands with this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have the test in aie2 aie2/GlobalISel/inst-select-gtrunc.mir, you could just enable it for AIE2P but up to you.

return selectImpl(I, *CoverageInfo);
} else {
I.setDesc(TII.get(TargetOpcode::COPY));
return selectCopy(I, MRI);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, which legal case can we select to a simple copy?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not from my head, need to investigate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with Konstantin, this was meant for the trunc from 32 to 20 bits. Maybe we could (here or in a follow-up for scalars) check explicitly for that case and return false otherwise, so that we don't only fail later in CopyPhysReg for other unsupported cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, lets do it it a followup PR along with scalar tests. AIECC-925

; CHECK-NEXT: [[VSHUFFLE_vec_shuffle_x:%[0-9]+]]:vec512 = VSHUFFLE_vec_shuffle_x [[COPY2]], [[COPY3]], [[MOV_RLC_imm11_pseudo]]
; CHECK-NEXT: PseudoRET implicit $lr, implicit [[VSHUFFLE_vec_shuffle_x]]
%1:accregbank(<16 x s64>) = G_IMPLICIT_DEF
%0:vregbank(<16 x s32>) = G_TRUNC %1(<16 x s64>)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it gets mapped to the accumulator bank? Could we select VSHUFFLE_vec_shuffle_bm in that case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, accumulator bank is possible for this type and the pattern was missing. Included the pattern.

Copy link
Collaborator

@khallouh khallouh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need custom handling of G_TRUNC in RegBankSelect? Not sure when we will map which bank (we don't have regbankselect tests).
If G_TRUNC happens to be mapped to the accumulator bank, we won't be able to select with the current patterns.

@niwinanto
Copy link
Collaborator Author

Do we need custom handling of G_TRUNC in RegBankSelect? Not sure when we will map which bank (we don't have regbankselect tests). If G_TRUNC happens to be mapped to the accumulator bank, we won't be able to select with the current patterns.

Irrespective of the source register bank, we should be able to instruction select the correct instruction. If accumulator register bank, cross bank copies will be inserted to use vector register bank. And for output type accumulator register bank, one pattern was missing and I did include that as well. Please see the tests.

})
.customIf([=](const LegalityQuery &Query) {
const LLT &SrcTy = Query.Types[1];
return SrcTy.isVector() && SrcTy.getSizeInBits() == 256;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to check if the the Src size is double the Dst size like we do for the legal case above?

@niwinanto
Copy link
Collaborator Author

Do we need custom handling of G_TRUNC in RegBankSelect? Not sure when we will map which bank (we don't have regbankselect tests). If G_TRUNC happens to be mapped to the accumulator bank, we won't be able to select with the current patterns.

Irrespective of the source register bank, we should be able to instruction select the correct instruction. If accumulator register bank, cross bank copies will be inserted to use vector register bank. And for output type accumulator register bank, one pattern was missing and I did include that as well. Please see the tests.

I did further investigate the register bank selection for G_TRUNC and found we always assign vector register bank to destination operand. So if we want to assign accumulator bank, we might need to use the use-def approach like we do for other special cases. However, we cannot select it using tablegen to *_bm variant of vshuffle, because AIE2PAcc512RegisterClass has only [v16i32, v8i64, v16f32] types. We might need to add v32i16, v64i8 to the list. @konstantinschwarz @khallouh Do you think we should go for it? or ok with the current PR, that is one additional cross bank copy incase of accumulator destination operand. Please have a look at the new commit which has the registerbank tests.
Also, greedy register bank selection was failing because we had alternative mapping to gpr bank irrespective of vector/scalar.

@khallouh
Copy link
Collaborator

Do we need custom handling of G_TRUNC in RegBankSelect? Not sure when we will map which bank (we don't have regbankselect tests). If G_TRUNC happens to be mapped to the accumulator bank, we won't be able to select with the current patterns.

Irrespective of the source register bank, we should be able to instruction select the correct instruction. If accumulator register bank, cross bank copies will be inserted to use vector register bank. And for output type accumulator register bank, one pattern was missing and I did include that as well. Please see the tests.

I did further investigate the register bank selection for G_TRUNC and found we always assign vector register bank to destination operand. So if we want to assign accumulator bank, we might need to use the use-def approach like we do for other special cases. However, we cannot select it using tablegen to *_bm variant of vshuffle, because AIE2PAcc512RegisterClass has only [v16i32, v8i64, v16f32] types. We might need to add v32i16, v64i8 to the list. @konstantinschwarz @khallouh Do you think we should go for it? or ok with the current PR, that is one additional cross bank copy incase of accumulator destination operand. Please have a look at the new commit which has the registerbank tests. Also, greedy register bank selection was failing because we had alternative mapping to gpr bank irrespective of vector/scalar.

For me its fine as is for now. We can keep the pattern and survive with the cross bank copies. We can optimize this in a follow-up. Maybe you could add a TODO and create a ticket to keep track of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants