- Shenzhen, China
Stars
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
This repository provides a simple pipeline to load data, train a Pi Zero model, and evaluate its performance.
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making
Implementation of π₀, the robotic foundation model architecture proposed by Physical Intelligence
Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜
Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw
A simple testbed for robotics manipulation policies
[RSS 2024] Code for "Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals" for CALVIN experiments with pre-trained weights
Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
TorchCFM: a Conditional Flow Matching library
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
A comprehensive list of papers on robot manipulation, with links to the papers, code, and related websites.
An Open-source Toolkit for LLM Development
Codebase for the BestMan Mobile Manipulator Platform
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Central repository for tools, tutorials, resources, and documentation for robotics simulation in Unity.
This repo is designed for the General Robotic Operation System.
Official implementation of CoPa: General Robotic Manipulation through Spatial Constraints of Parts with Foundation Models
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Learning Latent Dynamics for Planning from Pixels
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
O1 Replication Journey: A Strategic Progress Report – Part I
Fast and simple implementation of RL algorithms, designed to run fully on GPU.