- Microsoft
- Redmond, Washington
- https://renll.github.io/
Stars
nnScaler: Compiling DNN models for Parallel Training
DeepEP: an efficient expert-parallel communication library
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
MoBA: Mixture of Block Attention for Long-Context LLMs
A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on core code snippets. Feel free to copy code and launch discussions about the problems you hav…
Pretraining code for a large-scale depth-recurrent language model
Democratizing Reinforcement Learning for LLMs
EvaByte: Efficient Byte-level Language Models at Scale
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Training Large Language Model to Reason in a Continuous Latent Space
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
This repository contains the results and code for the AlgoPerf v0.5 benchmark.
Helpful tools and examples for working with flex-attention
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
Fast and memory-efficient exact attention
A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton (an unfused reference of what this computes is sketched after this list).
A large-scale RWKV v6, v7 (World, ARWKV) inference engine. Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy on Docker. Supports true multi-batch generation and dynamic State sw…
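
As a companion to the fused linear + cross-entropy entry above, here is a minimal, unfused PyTorch reference of what such a kernel computes; the function name and shapes are illustrative assumptions, not the repo's Triton implementation.

```python
# A minimal, unfused reference (an assumption, not the repo's Triton kernel) of
# what a fused linear + cross-entropy loss computes: project hidden states to
# vocabulary logits, then take cross-entropy against the target token ids.
import torch
import torch.nn.functional as F

def linear_cross_entropy_reference(hidden, weight, bias, targets):
    # hidden: (N, d), weight: (vocab, d), bias: (vocab,), targets: (N,)
    logits = F.linear(hidden, weight, bias)   # (N, vocab)
    return F.cross_entropy(logits, targets)   # mean loss over N tokens

# Illustrative shapes only.
hidden  = torch.randn(8, 256)
weight  = torch.randn(32000, 256) * 0.02
bias    = torch.zeros(32000)
targets = torch.randint(0, 32000, (8,))
print(linear_cross_entropy_reference(hidden, weight, bias, targets).item())
```

A fused Triton kernel performs both steps in one pass so the full (N, vocab) logits matrix need not be materialized, which is typically the motivation for fusing them.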