HKUST; NKU
Stars
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
[ICLR 2025] Implementation of "FACTS: A Factored State-Space Framework For World Modelling"
An open source code repository of driving world models, with training, inferencing, evaluation tools, and pretrained checkpoints.
Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)
Official code for "QueST: Self-Supervised Skill Abstractions for Continuous Control" [NeurIPS 2024]
Famous Vision Language Models and Their Architectures
Deep Reinforcement Learning for mobile robot navigation in ROS Gazebo simulator. Using Twin Delayed Deep Deterministic Policy Gradient (TD3) neural network, a robot learns to navigate to a random g…
A goal-driven autonomous exploration through deep reinforcement learning (ICRA 2022) system that combines reactive and planned robot navigation in unknown environments
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.
The official implementation of flow Q-learning (FQL)
Witness the aha moment of VLM with less than $3.
This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Official implementation of "ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills"
[ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"
Clean, minimal, accessible reproduction of DeepSeek R1-Zero
[RSS 2024] NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation
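AIInfra (AI infrastructure) refers to the AI system stack, from underlying hardware such as chips up to the upper-level software stack, that supports training and inference of large AI models.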
Code for "MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training", Arxiv 2025.
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
[CVPR 2024] MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
Official repository for our work on micro-budget training of large-scale diffusion models.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Helpful tools and examples for working with flex-attention
Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…