-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIE2P] Instruction select G_TRUNC for vector operands #341
base: aie-public
Are you sure you want to change the base?
Conversation
I.eraseFromParent(); | ||
return constrainSelectedInstRegOperands(*MI, TII, TRI, RBI); | ||
} | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are called for Size >= 512, we only handle 1024?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was work in progress. Now ready for review.
d3812fe
to
3dc635a
Compare
6603a0e
to
2f8c42a
Compare
// vectors of the VSHUFFLE instruction. | ||
|
||
// src accumulator register bank | ||
def : Pat<(v16i32 (trunc (v16i64 ACC1024:$s1))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like these patterns could be refactored into a common class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a good idea. Done.
@@ -72,10 +72,10 @@ define signext i8 @extract_v16i8_signext(<16 x i8> %v) nounwind { | |||
; AIE2P-LABEL: extract_v16i8_signext: | |||
; AIE2P: .p2align 4 | |||
; AIE2P-NEXT: // %bb.0: | |||
; AIE2P-NEXT: ret lr | |||
; AIE2P-NEXT: nopa ; nopb ; nops ; ret lr; nopm ; nopv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't handle 128-bit vectors properly in the calling convention.
For AIE2, we have custom code to handle passing 128-bit vectors in 256-bit registers: https://github.com/Xilinx/llvm-aie/blob/aie-public/llvm/lib/Target/AIE/AIE2ISelLowering.cpp#L58
@SagarMaheshwari99 is working on a fix
MachineIRBuilder &MIRBuilder = Helper.MIRBuilder; | ||
MachineRegisterInfo &MRI = *MIRBuilder.getMRI(); | ||
|
||
const Register DstReg = MI.getOperand(0).getReg(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: can be written as:
const auto [DstReg, DstVecTy, SrcReg, SrcVecTy] = MI.getFirst2RegLLTs();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really nice. Done.
const unsigned PadOpc = II->getGenericPadVectorOpcode(); | ||
const unsigned UnpadOpc = II->getGenericUnpadVectorOpcode(); | ||
|
||
const LLT NewPadRegTy = LLT::fixed_vector(SrcVecTy.getNumElements() * 2, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use
const unsigned BasicVecSize = II()->getBasicVectorBitSize();
to get the size to pad
(VSHUFFLE_vec_shuffle_x ACC512:$s1, ACC512:$s1, (MOV_RLC_imm11_pseudo (i32 2))), | ||
sub_256_lo)>; | ||
|
||
// src vector register bank |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is there no pattern for
v16i32 (trunc (v16i64 ...))
on the vector register bank?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I missed it.
|
||
const AIEBaseInstrInfo *II = ST.getInstrInfo(); | ||
const unsigned PadOpc = II->getGenericPadVectorOpcode(); | ||
const unsigned UnpadOpc = II->getGenericUnpadVectorOpcode(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conventionally called TII, TargetInstructionInfo
const LLT &SrcTy = Query.Types[1]; | ||
const LLT &DstTy = Query.Types[0]; | ||
return SrcTy.isVector() && DstTy.isVector() && | ||
SrcTy.getSizeInBits() > 256 && SrcTy.getSizeInBits() < 2048 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we can just as easily pin this down to 512 and 1024.
def : Pat<(v16i32 (trunc (v16i64 ACC1024:$s1))), | ||
(VSHUFFLE_vec_shuffle_x | ||
(EXTRACT_SUBREG ACC1024:$s1, sub_512_acc_lo ), | ||
(EXTRACT_SUBREG ACC1024:$s1, sub_512_acc_lo ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens to the hi subreg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are absolutely right.
LLT SrcTy = MRI.getType(SrcReg); | ||
unsigned Size = SrcTy.getSizeInBits(); | ||
// G_TRUNC S32 <- S64 | ||
if (Size == 64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: SrcSize
# RUN: llc -mtriple aie2p -run-pass=instruction-select %s -verify-machineinstrs -o - | FileCheck %s | ||
|
||
--- | ||
name: v16s32_trunc_v16s64_acc1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add instruction select tests for scalar source? I noticed we don't have them for AIE2P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably I would like to keep it for a followup PR, I want to focus only with vector operands with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have the test in aie2 aie2/GlobalISel/inst-select-gtrunc.mir
, you could just enable it for AIE2P but up to you.
return selectImpl(I, *CoverageInfo); | ||
} else { | ||
I.setDesc(TII.get(TargetOpcode::COPY)); | ||
return selectCopy(I, MRI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, which legal case can we select to a simple copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not from my head, need to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked with Konstantin, this was meant for the trunc from 32 to 20 bits. Maybe we could (here or in a follow-up for scalars) check explicitly for that case and return false otherwise, so that we don't only fail later in CopyPhysReg for other unsupported cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, lets do it it a followup PR along with scalar tests. AIECC-925
; CHECK-NEXT: [[VSHUFFLE_vec_shuffle_x:%[0-9]+]]:vec512 = VSHUFFLE_vec_shuffle_x [[COPY2]], [[COPY3]], [[MOV_RLC_imm11_pseudo]] | ||
; CHECK-NEXT: PseudoRET implicit $lr, implicit [[VSHUFFLE_vec_shuffle_x]] | ||
%1:accregbank(<16 x s64>) = G_IMPLICIT_DEF | ||
%0:vregbank(<16 x s32>) = G_TRUNC %1(<16 x s64>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it gets mapped to the accumulator bank? Could we select VSHUFFLE_vec_shuffle_bm
in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, accumulator bank is possible for this type and the pattern was missing. Included the pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need custom handling of G_TRUNC
in RegBankSelect
? Not sure when we will map which bank (we don't have regbankselect
tests).
If G_TRUNC
happens to be mapped to the accumulator bank, we won't be able to select with the current patterns.
2f8c42a
to
c593c62
Compare
Irrespective of the source register bank, we should be able to instruction select the correct instruction. If accumulator register bank, cross bank copies will be inserted to use vector register bank. And for output type accumulator register bank, one pattern was missing and I did include that as well. Please see the tests. |
}) | ||
.customIf([=](const LegalityQuery &Query) { | ||
const LLT &SrcTy = Query.Types[1]; | ||
return SrcTy.isVector() && SrcTy.getSizeInBits() == 256; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to check if the the Src size is double the Dst size like we do for the legal case above?
I did further investigate the register bank selection for G_TRUNC and found we always assign vector register bank to destination operand. So if we want to assign accumulator bank, we might need to use the use-def approach like we do for other special cases. However, we cannot select it using tablegen to *_bm variant of vshuffle, because |
For me its fine as is for now. We can keep the pattern and survive with the cross bank copies. We can optimize this in a follow-up. Maybe you could add a TODO and create a ticket to keep track of this. |
We can instruction select
G_TRUNC
in toVSHUFFLE
instruction.VSHUFFLE
instruction supports de-interleaved modes which can be translated to G_TRUNC operation.For an example,
Dst = VSHUFFLE Src0, Src1, Mode0
is equalant to ashuffle mask of <0, 2, 4, ...>
on concatenated output ofSrc0
andSrc1
.