- Huazhong University of Science and Technology
- Wuhan, Hubei, China
Stars
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Witness the aha moment of VLM with less than $3.
Fully open reproduction of DeepSeek-R1
Solve Visual Understanding with Reinforced VLMs
MM-EUREKA: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning
An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)
A paper list of recent works on token compression for ViTs and VLMs
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
[CVPR 2025] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Official repository for paper "Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving"
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
[TMLR 2025🔥] A survey for the autoregressive models in vision.
An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
[ECCV 2024] A Simple and Effective 3D DETR in Point Clouds
[NeurIPS 2024] Official code of "LION: Linear Group RNN for 3D Object Detection in Point Clouds"
[ECCV 2024] OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
[NeurIPS 2023] Query-based Temporal Fusion with Explicit Motion for 3D Object Detection
The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
Efficient Triton Kernels for LLM Training
[CVPR 2025] MINIMA: Modality Invariant Image Matching
Doe-1: Closed-Loop Autonomous Driving with Large World Model
Collection of papers and repos for multimodal chain-of-thought
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! 🔥