- Feb. 17th, 2025: 🔥🔥🔥 Our code is released!
- Feb. 9th, 2025: 🔥🔥🔥 TinyVLA is accepted by IEEE Robotics and Automation Letters (RA-L) 2025!
- Nov. 19th, 2024: TinyVLA is out! The paper can be found here. The project website can be found here.
- 📰 News
- Contents
- Install
- Data Preparation
- Download Pretrained VLM
- Train
- Evaluation
- Acknowledgement
- Citation
- Clone this repository and navigate to the diffusion-vla folder
```bash
git clone https://github.com/liyaxuanliyaxuan/TinyVLA
```
- Install Package
```bash
conda create -n tinyvla python=3.10 -y
conda activate tinyvla
pip install --upgrade pip
pip install -r requirements.txt
# install the policy heads module
cd policy_heads
pip install -e .
# install llava-pythia
cd ../llava-pythia
pip install -e .
```
- Our data format is the same as ACT, so you need to convert your data into the HDF5 format. You can refer to rlds_to_h5py.py, which converts data from the RLDS format to HDF5. The expected layout, and a minimal writing sketch, are shown below.
```
# h5 data structure
root
  |-action (100,10)
  |-language_raw (1,)
  |-observations
      |-images # multi-view
          |-left (100,480,640,3)
          |-right (100,480,640,3)
          |-wrist (100,480,640,3)
      |-joint_positions (100,7)
      |-qpos (100,7)
      |-qvel (100,7)
```
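A minimal sketch of writing one episode in this layout with h5py is shown below. Shapes follow the example above (100 timesteps); the file name, the language string, and the zero-filled arrays are placeholders, so adapt them to your own data and double-check the keys against rlds_to_h5py.py.

```python
# Minimal sketch: write one episode into the HDF5 layout above.
# All array contents below are placeholders (zeros); replace them with real data.
import h5py
import numpy as np

T = 100  # number of timesteps in this episode

with h5py.File('episode_0.hdf5', 'w') as root:
    root.create_dataset('action', data=np.zeros((T, 10), dtype=np.float32))

    # variable-length string dataset holding the raw language instruction
    str_dt = h5py.string_dtype(encoding='utf-8')
    root.create_dataset('language_raw',
                        data=np.array(['pick up the cup'], dtype=object),  # placeholder instruction
                        dtype=str_dt)

    obs = root.create_group('observations')
    images = obs.create_group('images')  # multi-view RGB images
    for cam in ['left', 'right', 'wrist']:
        images.create_dataset(cam, data=np.zeros((T, 480, 640, 3), dtype=np.uint8))

    obs.create_dataset('joint_positions', data=np.zeros((T, 7), dtype=np.float32))
    obs.create_dataset('qpos', data=np.zeros((T, 7), dtype=np.float32))
    obs.create_dataset('qvel', data=np.zeros((T, 7), dtype=np.float32))
```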
- You need to add an entry to constants.py that specifies the path of your data, as follows.
```python
'your_task_name': {
    'dataset_dir': DATA_DIR + '/your_task_path',  # path to the dataset
    'episode_len': 1000,  # maximum length of an episode
    'camera_names': ['front', 'wrist']  # camera names, used as keys when reading data
}
```
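For reference, the training code looks this entry up by its task name. The sketch below shows how such a lookup might look; the dictionary name TASK_CONFIGS and the import path are assumptions, so check constants.py in this repo for the actual names.

```python
# Hypothetical lookup of the task entry defined above. TASK_CONFIGS and the
# import path are assumptions -- verify them against constants.py in this repo.
from aloha_scripts.constants import TASK_CONFIGS

task_config = TASK_CONFIGS['your_task_name']
print(task_config['dataset_dir'])   # where your h5 episodes live
print(task_config['episode_len'])   # maximum episode length
print(task_config['camera_names'])  # image keys read from each episode
```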
We construct the VLM backbone by integrating a series of tiny LLMs (Pythia) into the LLaVA framework. We follow the standard training pipeline and data provided by LLaVA. All VLM weights used in our paper are listed below:
| Model | Usage | Link |
|---|---|---|
| Llava-Pythia (~400M) | For TinyVLA-S | huggingface |
| Llava-Pythia (~700M) | For TinyVLA-B | huggingface |
| Llava-Pythia (~1.3B) | For TinyVLA-H | huggingface |
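To fetch one of these checkpoints locally you can use huggingface_hub, as in the sketch below. The repo id is a placeholder; replace it with the actual id behind the corresponding link in the table.

```python
# Sketch: download a pretrained Llava-Pythia checkpoint from the Hugging Face Hub.
# The repo_id below is a placeholder -- replace it with the id from the table's link.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="your-org/llava-pythia-700m",        # placeholder repo id
    local_dir="./pretrained/llava_pythia_700m",  # where to store the weights
)
print("Pretrained VLM weights saved to:", local_path)
```

The resulting local directory is what you would pass as model_name_or_path when training.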
The training script is "scripts/train.sh", and you need to change the following parameters:
- OUTPUT: the save directory for training, which must include the keyword "llava_pythia" (and optionally "lora"). If LoRA training is used, the name must include "lora" (e.g., "llava_pythia_lora").
- task_name: the task used for training, which must match "your_task_name" in aloha_scripts/constants.py.
- model_name_or_path: path to the pretrained VLM weights.
- Other hyperparameters, such as "batch_size" and "save_steps", can be customized according to your computational resources.
Start training with the following command:
```bash
./scripts/train.sh
```
Before evaluation, we provide a post-processing script that produces smaller, directly usable weights. The script is "scripts/process_ckpts.sh", and you need to change the following parameters:
- source_dir: path to the trained VLA directory; this equals OUTPUT in train.sh.
- target_dir: path where the processed VLA weights are saved.
You can refer to our evaluation script eval_real_franka.py.
We build our project based on:
- LLaVA: an amazing open-source project for vision-language assistants
- act-plus-plus: an amazing open-source project for robotic visuomotor learning
- Mipha: an amazing open-source project for tiny vision-language models
If you find TinyVLA useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{wen2024tinyvla,
  title={TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation},
  author={Wen, Junjie and Zhu, Yichen and Li, Jinming and Zhu, Minjie and Wu, Kun and Xu, Zhiyuan and Liu, Ning and Cheng, Ran and Shen, Chaomin and Peng, Yaxin and others},
  journal={IEEE Robotics and Automation Letters (RA-L)},
  year={2025}
}
```