AIE Kernels

These kernels are provided as example building blocks for larger designs and as illustrations of how to write single-core programs for AIEs, which can then be duplicated or combined into multi-core designs using the structural IRON API.

Some kernels are generic C code and will run on any family of AI Engines, with varying performance. Others are optimized for the AIE1 and AIE2 architectures. Finally, some kernels use the AIE API, a C++ header-only library providing types and operations that are translated into efficient low-level intrinsics, and whose documentation can be found here, while others use the architecture-specific low-level intrinsics directly.
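To make the "generic C code" style concrete, here is a minimal sketch of a portable scalar kernel. It is not taken from this directory: the function name `scale_scalar` and its signature are hypothetical, and it does not reproduce the behavior of `scale.cc`. It only illustrates the kind of plain loop that compiles for any target and that an AIE API or intrinsics version would replace with vector operations.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable kernel (illustration only, not from the repository):
// multiply every element of `in` by `factor` and store the result in `out`.
// A plain scalar loop like this uses no architecture-specific types or calls.
void scale_scalar(const int32_t *in, int32_t *out, int32_t factor, int32_t n) {
  assert(n >= 0);  // precondition: non-negative element count
  for (int32_t i = 0; i < n; ++i) {
    out[i] = in[i] * factor;  // caller must ensure the product fits in int32_t
  }
}
```

A loop like this is a convenient starting point and a functional reference when checking an optimized version of the same operation.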

NOTE: This set of AIE kernels is meant for demonstration alongside the programming examples. The goal is not to be 100% performant; there may be room for further improvement. The kernels are provided as-is, with no guarantee of support from AMD or AMD Research and Advanced Development.

Generic

| Class | Name | Coding style | Purpose | Datatypes |
|-|-|-|-|-|
| basic | passThrough.cc | AIE API | A simple memcpy operation | uint8_t, int16_t, int32_t |
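As a host-side illustration of what "a simple memcpy operation" over several datatypes means, the following sketch is a templated reference copy. The name `pass_through_ref` is hypothetical and the code is plain C++, not the AIE API implementation in `passThrough.cc`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical reference model (not the repository's kernel): copy `count`
// elements of type T from `in` to `out`. The buffers must not overlap.
template <typename T>
void pass_through_ref(const T *in, T *out, std::size_t count) {
  assert(count == 0 || (in != nullptr && out != nullptr));
  std::memcpy(out, in, count * sizeof(T));
}
```

Instantiating the template with `uint8_t`, `int16_t`, or `int32_t` covers the datatypes listed in the table.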

AIE1

| Name | Coding style | Purpose |
|-|-|-|

AIE2

| Class | Name | Coding style | Purpose | Datatypes |
|-|-|-|-|-|
| basic | zero.cc | AIE API | Fill a tensor with zeroes | template |
| basic | add.cc | AIE API | Pointwise addition of 2 tensors | bfloat16 |
| basic | mul.cc | AIE API | Pointwise multiplication of 2 tensors | bfloat16 |
| basic | scale.cc | AIE API | Scale all elements of a tensor with a scale factor | int32_t |
| basic | bitwiseOR.cc | AIE API | Bitwise OR of fixed point tensors | uint8_t, int16_t, int32_t |
| basic | bitwiseAND.cc | AIE API | Bitwise AND of fixed point tensors | uint8_t, int16_t, int32_t |
| gemm | mm.cc | AIE API | Matrix/Matrix multiplication | int16_t, bfloat16_t |
| gemm | mv.cc | AIE API | Matrix/Vector multiplication | bfloat16_t |
| reduction | reduce_add.cc | Intrinsics | Find the sum of elements in a tensor | int32_t |
| reduction | reduce_max.cc | Intrinsics | Find max value across a tensor | int32_t |
| reduction | reduce_min.cc | Intrinsics | Find min value across a tensor | int32_t |
| ml | conv2dk1_i8.cc | AIE API | 1x1 Conv2D | int8_t |
| ml | conv2dk1.cc | AIE API | 1x1 Conv2D with fused ReLU | int8_t, uint8_t |
| ml | conv2dk3.cc | AIE API | 3x3 Conv2D with fused ReLU | int8_t, uint8_t |
| ml | conv2dk1_skip.cc | AIE API | 1x1 Conv2D with fused skip addition | int8_t, uint8_t |
| ml | conv2dk1_skip_init.cc | AIE API | 1x1 Conv2D with fused 1x1 Conv2D skip addition | int8_t, uint8_t |
| ml | relu.cc | Intrinsics | ReLU activation function | bfloat16_t |
| ml | bf16_exp.cc | AIE API | Compute $e^x$ for every element $x$ of a bfloat16 tensor | bfloat16_t |
| vision | gray2rgba.cc | AIE API | Convert from grayscale to RGBA format | uint8_t |
| vision | rgba2gray.cc | AIE API | Convert from RGBA format to grayscale | uint8_t |
| vision | rgba2hue.cc | AIE API | Convert from RGBA to hue | uint8_t |
| vision | addWeighted.cc | AIE API | Fixed point weighted sum of two tensors | uint8_t |
| vision | threshold.cc | AIE API | Clipping | uint8_t |
| vision | filter2d.cc | AIE API | Fixed point 2D image processing filter | uint8_t |
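When experimenting with the reduction kernels, a scalar reference on the host can help check results. The sketch below is an assumption-laden illustration in plain C++: the names `ref_reduce_add`, `ref_reduce_max`, and `ref_reduce_min` are hypothetical, and the code says nothing about how `reduce_add.cc`, `reduce_max.cc`, or `reduce_min.cc` are implemented or how they treat overflow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Hypothetical host-side reference models (not the repository's kernels).

// Sum of all elements. Accumulates in 64 bits, then narrows; this sketch
// assumes the true sum fits in int32_t.
int32_t ref_reduce_add(const int32_t *in, std::size_t n) {
  const int64_t sum = std::accumulate(in, in + n, int64_t{0});
  return static_cast<int32_t>(sum);
}

// Largest element; requires a non-empty tensor.
int32_t ref_reduce_max(const int32_t *in, std::size_t n) {
  assert(n > 0);
  return *std::max_element(in, in + n);
}

// Smallest element; requires a non-empty tensor.
int32_t ref_reduce_min(const int32_t *in, std::size_t n) {
  assert(n > 0);
  return *std::min_element(in, in + n);
}
```

Comparing a kernel's output against such a reference on small, hand-checkable inputs is a simple first test before moving to larger tensors.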