Stars
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
A fork to add multimodal model training to open-r1
Fully open reproduction of DeepSeek-R1
RAGEN is the first open-source reproduction of DeepSeek-R1 for training agentic models via reinforcement learning.
Benchmarking LLMs' Gaming Ability in Multi-Agent Environments
Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Official inference repo for FLUX.1 models
GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
(NeurIPS 2024 Spotlight) TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment
A minimal and universal controller for FLUX.1.
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Video-LLaVA fine-tuning for CinePile evaluation
[Siggraph Asia 2024] Follow-Your-Emoji: This repo is the official implementation of "Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation"
Fast and memory-efficient exact attention
A personal investigative project to track the latest progress in the field of multi-modal object tracking.
(arXiv:2405.18406) RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
[NeurIPS 2024] Accepted as a Spotlight presentation paper
SEED-Story: Multimodal Long Story Generation with Large Language Model