Facebook
- Menlo Park
Stars
[TMLR 2024] Efficient Large Language Models: A Survey
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
Fast and memory-efficient exact attention
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!