Stars
JupyterLab for AI in Docker! Anaconda and PyTorch GPU supported.
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023).
Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought and OpenAI o1 🍓
Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, IP-Adapter.
An implementation of local windowed attention for language modeling
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
Diffusion Reading Group at EleutherAI
Fast Hadamard transform in CUDA, with a PyTorch interface
Efficient GPU kernels for block-sparse matrix multiplication and convolution
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
Official repository of Agent Attention (ECCV 2024)
Transformer based on a variant of attention that has linear complexity with respect to sequence length
Awesome list for LLM quantization
Code repo for the paper "SpinQuant: LLM Quantization with Learned Rotations"
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
A list of papers, blogs, datasets and software in the field of lifelong/continual machine learning
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Code for the article "What if Neural Networks had SVDs?", to be presented as a spotlight paper at NeurIPS 2020.
Fast and memory-efficient exact attention
A learning rate range test implementation in PyTorch
A new regularization technique that stochastically freezes the layers of deep neural networks.
An implementation of knowledge distillation for segmentation, to train a small (student) UNet from a larger (teacher) UNet, thereby reducing the size of the network while achieving performance simil…
Optimization with orthogonal constraints and on general manifolds