Stars
A unified architecture for multimodal multi-task robotic policy learning.
[ICML 2024] 3D-VLA: A 3D Vision-Language-Action Generative World Model
Code for 3D-LLM: Injecting the 3D World into Large Language Models
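[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
A 2021 roundup of recommended reading for engineers: books on computer science, software technology, entrepreneurship, ideas, mathematics, and biographies.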
Official codebase for "Any-point Trajectory Modeling for Policy Learning"
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
High-speed downloads from mirror sites using HuggingFace's official download tool.
An example RLDS dataset builder for X-embodiment dataset conversion.
CVPR2023 (highlight) - UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View
A JAX-based simulator for autonomous driving research.
An open-source codebase for exploring autonomous driving pre-training
Awesome papers about Multi-Camera 3D Object Detection and Segmentation in Bird's-Eye-View, such as DETR3D, BEVDet, BEVFormer, BEVDepth, UniAD
Running large language models on a single GPU for throughput-oriented scenarios.
Making large AI models cheaper, faster and more accessible