  • Facebook
  • Menlo Park


[TMLR 2024] Efficient Large Language Models: A Survey

1,104 94 Updated Feb 4, 2025

High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.

Cuda 97 5 Updated Jul 13, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 6,755 1,109 Updated Feb 26, 2025

Fast and memory-efficient exact attention

Python 15,937 1,497 Updated Feb 25, 2025

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,603 376 Updated Dec 4, 2024

Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!

Python 38,796 7,347 Updated Nov 27, 2022