- Tsinghua University
- Beijing, China
- https://shiml20.github.io/
Stars
MoBA: Mixture of Block Attention for Long-Context LLMs
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
SkyReels V1: The first and most advanced open-source human-centric video foundation model
🚀 Train a 26M-parameter visual multimodal LLM (VLM) from scratch in just 1 hour!
🚀🚀 Train a small 26M-parameter GPT completely from scratch in just 2 hours!
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Various AI scripts. Mostly Stable Diffusion stuff.
Fully open reproduction of DeepSeek-R1
Ongoing research training transformer models at scale
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
[Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
The official GitHub page for the survey paper "A Survey on Mixture of Experts in Large Language Models".
A PyTorch implementation of the paper "All are Worth Words: A ViT Backbone for Diffusion Models".
Cosmos is a world model development platform that consists of world foundation models, tokenizers, and a video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
“FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching” FlowAR employs a simple scale design and is compatible with any VAE.
Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
MoH: Multi-Head Attention as Mixture-of-Head Attention
The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (V…
[NeurIPS 2024] DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model