CS PhD Student at Rice University
Rice University, Houston, United States
14 stars written in C++
High-speed Large Language Model Serving for Local Deployment
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Optimized primitives for collective multi-GPU communication
On-device AI across mobile, embedded and edge for PyTorch
A GPU benchmark suite for assessing on-chip GPU memory bandwidth