renll

Organizations

@IPPPP


s1: Simple test-time scaling

Python 5,914 680 Updated Mar 6, 2025

nnScaler: Compiling DNN models for Parallel Training

Python 100 13 Updated Feb 14, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,105 613 Updated Mar 10, 2025

FlashMLA: Efficient MLA decoding kernels

C++ 11,232 785 Updated Mar 1, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 527 27 Updated Mar 4, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,712 197 Updated Mar 4, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,635 93 Updated Mar 7, 2025

A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to copy code and launch discussions about the problems you hav…

Python 55 2 Updated Jul 4, 2023

Pretraining code for a large-scale depth-recurrent language model

Python 667 53 Updated Mar 5, 2025

Democratizing Reinforcement Learning for LLMs

Python 1,962 171 Updated Feb 16, 2025

EvaByte: Efficient Byte-level Language Models at Scale

Python 84 3 Updated Feb 28, 2025

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python 3,108 229 Updated Feb 19, 2025

Clean, minimal, accessible reproduction of DeepSeek R1-Zero

Python 11,071 1,411 Updated Feb 1, 2025

Training Large Language Model to Reason in a Continuous Latent Space

Python 950 83 Updated Jan 24, 2025
Python 258 13 Updated Feb 21, 2025
Python 20 Updated May 4, 2024

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 138 9 Updated Feb 23, 2025

Code for BLT research paper

Python 1,432 109 Updated Mar 5, 2025

[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…

Jupyter Notebook 6,878 443 Updated Jan 12, 2025
Python 21 Updated Nov 9, 2024

This repository contains the results and code for the AlgoPerf v0.5 benchmark.

Python 5 Updated Oct 4, 2024

Helpful tools and examples for working with flex-attention

Python 677 36 Updated Mar 9, 2025

A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.

Python 505 42 Updated Feb 25, 2025

Fast and memory-efficient exact attention

Python 66 7 Updated Mar 3, 2025

sigma-MoE layer

Python 18 2 Updated Jan 5, 2024

NanoGPT (124M) in 3 minutes

Python 2,364 259 Updated Mar 9, 2025

A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton.

Python 63 6 Updated Aug 2, 2024

Large-scale RWKV v6 and v7 (World, ARWKV) inference. Capable of inference by combining multiple states (Pseudo MoE). Easy to deploy on Docker. Supports true multi-batch generation and dynamic State sw…

Python 31 1 Updated Feb 21, 2025