ipiszy

Ying Zhang ipiszy

30 followers · 8 following

Facebook
Menlo Park

Stars

deepseek-ai / DeepSeek-V3

Python · 89,322 stars · 14,392 forks · Updated Feb 24, 2025

AIoT-MLSys-Lab / Efficient-LLMs-Survey

[TMLR 2024] Efficient Large Language Models: A Survey

1,104 stars · 94 forks · Updated Feb 4, 2025

wangsiping97 / FastGEMV

High-speed GEMV kernels, with up to 2.7x speedup compared to the PyTorch baseline.

CUDA · 97 stars · 5 forks · Updated Jul 13, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ · 6,755 stars · 1,109 forks · Updated Feb 26, 2025

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python · 15,937 stars · 1,497 forks · Updated Feb 25, 2025

facebookincubator / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python · 4,603 stars · 376 forks · Updated Dec 4, 2024

floodsung / Deep-Learning-Papers-Reading-Roadmap

Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!

Python · 38,796 stars · 7,347 forks · Updated Nov 27, 2022
