The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning
Masked Multimodal Learning (M3L) is a representation learning technique for reinforcement learning, aimed at robotic manipulation systems equipped with vision and high-resolution touch.
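To illustrate what "masked multimodal" input can look like, here is a minimal sketch of MAE-style random masking applied jointly to visual and tactile patch tokens. This is not taken from this repository. The image and tactile resolutions, the patch size, the embedding width D, the 0.75 mask ratio, and the helper names patchify and joint_random_mask are all hypothetical. The sketch assumes that each modality gets its own linear embedding to a shared width before the tokens are concatenated and masked together.

```python
import numpy as np


def patchify(x: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) array into flattened, non-overlapping patch tokens."""
    h, w, c = x.shape
    assert h % patch == 0 and w % patch == 0, "size must be divisible by patch"
    x = x.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group pixels by patch: (nh, nw, p, p, c)
    return x.reshape(-1, patch * patch * c)


def joint_random_mask(vision, touch, w_vision, w_touch, mask_ratio, rng):
    """Embed both modalities to a shared width, then mask tokens jointly.

    Returns the visible tokens, their indices, and a boolean mask over all
    tokens (True = hidden, i.e. a reconstruction target).
    """
    # Hypothetical modality-specific linear embeddings to a common width D.
    tokens = np.concatenate([vision @ w_vision, touch @ w_touch], axis=0)
    n = tokens.shape[0]
    n_keep = int(n * (1.0 - mask_ratio))
    # One random permutation over the joint sequence, so either modality can be hidden.
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return tokens[keep_idx], keep_idx, mask


rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))    # hypothetical RGB observation
tactile = rng.random((32, 32, 3))  # hypothetical tactile map
v_tok = patchify(image, 8)         # 64 tokens of size 192
t_tok = patchify(tactile, 8)       # 16 tokens of size 192
D = 32
w_v = rng.standard_normal((v_tok.shape[1], D))
w_t = rng.standard_normal((t_tok.shape[1], D))
visible, keep_idx, mask = joint_random_mask(v_tok, t_tok, w_v, w_t, 0.75, rng)
print(visible.shape, mask.sum())  # prints: (20, 32) 60
```

In an MAE-style setup, only the visible tokens would be passed to the encoder, and a decoder would be trained to reconstruct the hidden patches of both modalities.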
Please install tactile_envs first. Then, install the remaining dependencies:
pip install -r requirements.txt
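To train M3L on the Insertion task (MUJOCO_GL='egl' selects MuJoCo's EGL backend for headless rendering):
MUJOCO_GL='egl' python train.py --env tactile_envs/Insertion-v0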
To train with vision-only control (--vision_only_control True):
MUJOCO_GL='egl' python train.py --env tactile_envs/Insertion-v0 --vision_only_control True
If you find M3L useful for your research, please cite this work:
@article{sferrazza2023power,
title={The power of the senses: Generalizable manipulation from vision and touch through masked multimodal learning},
author={Sferrazza, Carmelo and Seo, Younggyo and Liu, Hao and Lee, Youngwoon and Abbeel, Pieter},
journal={arXiv preprint arXiv:2311.00924},
year={2023}
}
This codebase contains some files adapted from other sources:
- vit-pytorch: https://github.com/lucidrains/vit-pytorch