Starred repositories
verl: Volcano Engine Reinforcement Learning for LLMs
SOTA Re-identification Methods and Toolbox
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, I…
A paper list of recent works on token compression for ViT and VLM
✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy
A journey to real multimodal R1! We are running large-scale experiments
Witness the aha moment of VLM with less than $3.
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
Fully open reproduction of DeepSeek-R1
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
[CVPR 2024] CapsFusion: Rethinking Image-Text Data at Scale
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…
Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high …
Performance instrumentation and tracing for Android, Linux and Chrome (read-only mirror of https://android.googlesource.com/platform/external/perfetto/)
FlagGems is an operator library for large language models implemented in Triton Language.
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Efficient Triton Kernels for LLM Training
Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"
The official GitHub page for the survey paper "A Survey of Large Language Models".