Stars
Latent Motion Token as the Bridging Language for Robot Manipulation
IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
[CoRL 2024] Official code for "Scaling Robot Policy Learning via Zero-Shot Labeling with Foundation Models"
A handy command-line tool to download arXiv PDF files by ID(s)
Fetch citations and abstracts for a Google Scholar paper and generate a prompt for an LLM
Depth Any Video with Scalable Synthetic Data
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Sparsh: Self-supervised touch representations for vision-based tactile sensing
A web-based tool that converts GitHub repository contents into a single formatted text file
A suite of image and video neural tokenizers
robofamily / 1xgpt
Forked from 1x-technologies/1xgpt: world modeling challenge for humanoid robots
It's not a list of papers, but a list of paper reading lists...
Code for "Differentiable Robot Rendering" (CoRL 2024)
A comprehensive list of Robot Manipulation resources, including papers, code, and related websites.
Visualizing the DROID dataset using Rerun
LAPA: Latent Action Pretraining from Videos
Janus-Series: Unified Multimodal Understanding and Generation Models
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
Inpaint anything using Segment Anything and inpainting models.
📹 A more flexible CogVideoX that can generate videos at any resolution and create videos from images.
Grounded SAM 2: Ground and Track Anything in Videos with Grounding DINO, Florence-2 and SAM 2