
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

  • Paper (arXiv): TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

📰 News

  • Feb. 17th, 2025: 🔥🔥🔥Our code is released!
  • Feb. 9th, 2025: 🔥🔥🔥TinyVLA is accepted by IEEE Robotics and Automation Letters (RA-L) 2025!
  • Nov. 19th, 2024: TinyVLA is out! The paper can be found here. The project page can be found here.

Contents

  • Install
  • Data Preparation
  • Download Pretrained VLM
  • Train
  • Evaluation
  • Acknowledgement
  • Citation

Install

  1. Clone this repository and navigate to the TinyVLA folder:
git clone https://github.com/liyaxuanliyaxuan/TinyVLA
cd TinyVLA
  2. Install the packages:
conda create -n tinyvla python=3.10 -y
conda activate tinyvla
pip install --upgrade pip
pip install -r requirements.txt
# install policy heads
cd policy_heads
pip install -e .
# install llava-pythia
cd ../llava-pythia
pip install -e . 

Data Preparation

  1. Our data format is the same as ACT, so you need to convert your data into HDF5 (h5py) format. You can refer to rlds_to_h5py.py, which converts data from the RLDS format to HDF5. The expected per-episode structure is shown below, and a minimal writer sketch follows this list.
# h5 data structure
root
  |-action (100,10)
  |-language_raw (1,)
  |-observations
      |-images # multi-view
          |-left (100,480,640,3)
          |-right (100,480,640,3)
          |-wrist (100,480,640,3)
      |-joint_positions (100,7)
      |-qpos (100,7)
      |-qvel (100,7)
  2. Add an entry to constants.py that specifies the path of your data, as follows:
    'your_task_name':{
        'dataset_dir': DATA_DIR + '/your_task_path', # define the path of the dataset
        'episode_len': 1000, # max length of the episode
        'camera_names': ['front', 'wrist'] # camera names used as keys when reading the data
    }
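
For reference, here is a minimal sketch of how one episode in the structure above could be written with h5py. The array shapes follow the example layout (100 timesteps, 10-dimensional actions, a 7-DoF arm, and three 480x640 RGB views); save_episode is a hypothetical helper, and the dummy arrays stand in for your own recorded data.

import h5py
import numpy as np

T = 100  # number of timesteps in this illustrative episode

def save_episode(path, actions, language, images, joint_positions, qpos, qvel):
    """Write one episode to an HDF5 file in the layout shown above (hypothetical helper)."""
    with h5py.File(path, "w") as root:
        root.create_dataset("action", data=actions)                                # (T, 10)
        root.create_dataset("language_raw", data=np.array([language], dtype="S"))  # (1,)
        obs = root.create_group("observations")
        img = obs.create_group("images")
        for cam_name, frames in images.items():                                    # e.g. left / right / wrist
            img.create_dataset(cam_name, data=frames)                              # (T, 480, 640, 3)
        obs.create_dataset("joint_positions", data=joint_positions)                # (T, 7)
        obs.create_dataset("qpos", data=qpos)                                      # (T, 7)
        obs.create_dataset("qvel", data=qvel)                                      # (T, 7)

# Example call with dummy data:
save_episode(
    "episode_0.hdf5",
    actions=np.zeros((T, 10), dtype=np.float32),
    language="pick up the red cube",
    images={cam: np.zeros((T, 480, 640, 3), dtype=np.uint8) for cam in ("left", "right", "wrist")},
    joint_positions=np.zeros((T, 7), dtype=np.float32),
    qpos=np.zeros((T, 7), dtype=np.float32),
    qvel=np.zeros((T, 7), dtype=np.float32),
)

Typically, each episode is stored as its own HDF5 file under the dataset_dir configured above.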

Download Pretrained VLM

We construct the VLM backbone by integrating a series of tiny LLMs (Pythia) into the LLaVA framework. We follow the standard training pipeline and data provided by LLaVA. All VLM weights used in our paper are listed below:

Model                 Usage          Link
Llava-Pythia(~400M)   For TinyVLA-S  huggingface
Llava-Pythia(~700M)   For TinyVLA-B  huggingface
Llava-Pythia(~1.3B)   For TinyVLA-H  huggingface
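
The links in the table above point to the corresponding Hugging Face model pages. Below is a minimal sketch of fetching one of them with huggingface_hub; the repo_id is a placeholder, so substitute the actual model id from the link in the table.

from huggingface_hub import snapshot_download

# NOTE: placeholder repo id; replace it with the actual Llava-Pythia model id
# taken from the Hugging Face link in the table above.
local_path = snapshot_download(
    repo_id="<llava-pythia-repo-id>",
    local_dir="./pretrained_vlm/llava_pythia",
)
print("VLM weights downloaded to:", local_path)

The resulting directory can then be passed as model_name_or_path in scripts/train.sh (see the Train section below).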

Train

The training script is scripts/train.sh. You need to change the following parameters:

  1. OUTPUT: the save directory for training. It must include the keyword "llava_pythia" (and optionally "lora"); if LoRA training is used, the name must include "lora" (e.g., "llava_pythia_lora").
  2. task_name: the task used for training, which should correspond to "your_task_name" in aloha_scripts/constants.py.
  3. model_name_or_path: path to the pretrained VLM weights.
  4. Other hyperparameters such as "batch_size" and "save_steps" can be adjusted to your computation resources.

Start training with the following command:

./scripts/train.sh

Evaluation

Before evaluation, we provide a post-processing script, scripts/process_ckpts.sh, that generates smaller, directly usable weights. You need to set the following parameters:

  1. source_dir: path to the trained VLA directory (the same as OUTPUT in train.sh).
  2. target_dir: path where the processed VLA weights are saved.

You can refer to our evaluation script eval_real_franka.py.

Acknowledgement

We build our project based on:

  • LLaVA: an amazing open-source project for vision-language assistants
  • act-plus-plus: an amazing open-source project for robotic visuomotor learning
  • Miphi: an amazing open-source project for tiny vision-language models

Citation

If you find TinyVLA useful for your research and applications, please cite it using this BibTeX:

@inproceedings{wen2024tinyvla,
    title={TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation},
    author={Wen, Junjie and Zhu, Yichen and Li, Jinming and Zhu, Minjie and Wu, Kun and Xu, Zhiyuan and Liu, Ning and Cheng, Ran and Shen, Chaomin and Peng, Yaxin and others},
    booktitle={IEEE Robotics and Automation Letters (RA-L)},
    year={2025}
}
