Stars
3 results for sponsorable starred repositories
Transformer: PyTorch Implementation of "Attention Is All You Need"
A high-throughput and memory-efficient inference and serving engine for LLMs
LLM papers I'm reading, mostly on inference and model compression