Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning implemented in PyTorch

jkulhanek/visual-navigation-agent-pytorch

Target-driven Visual Navigation Model using Deep Reinforcement Learning

This is a PyTorch implementation of http://web.stanford.edu/~yukez/papers/icra2017.pdf. It aims to reproduce the results of the TensorFlow implementation, which can be found here: https://github.com/zfw1226/icra2017-visual-navigation.

THOR scene samples

Introduction

This repository provides a PyTorch implementation of the deep siamese actor-critic model for indoor scene navigation introduced in the following paper:

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, and Ali Farhadi
ICRA 2017, Singapore

Setup and run

This code is implemented in PyTorch 0.4. It uses Docker to automate the installation process. To run this code, I recommend pulling the image from my Docker Hub repository.

Run the following commands to start training:

git clone https://github.com/jkulhanek/visual-navigation-agent-pytorch
cd visual-navigation-agent-pytorch
docker-compose run train

Scenes

To facilitate training, we provide hdf5 dumps of the simulated scenes. Each dump contains the agent's first-person observations sampled from a discrete grid in four cardinal directions. To be more specific, each dump stores the following information row by row:

  • observation: 300x400x3 RGB image (agent's first-person view)
  • resnet_feature: 2048-d ResNet-50 feature extracted from the observation
  • location: (x, y) coordinates of the sampled scene locations on a discrete grid with 0.5-meter offset
  • rotation: agent's rotation in one of the four cardinal directions, 0, 90, 180, and 270 degrees
  • graph: a state-action transition graph, where graph[i][j] is the location id of the destination by taking action j in location i, and -1 indicates collision while the agent stays in the same place.
  • shortest_path_distance: a square matrix of shortest path distance (in number of steps) between pairwise locations, where -1 means two states are unreachable from each other.
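
The relationship between graph and shortest_path_distance can be illustrated with a small sketch. It is not taken from this repository: the function names step and shortest_path_distances and the toy four-state graph are hypothetical. It only assumes the conventions described above, namely that graph[i][j] is the destination id for action j in location i, that -1 in graph means a collision, and that -1 in the distance matrix means unreachable. In practice the arrays would be read from an hdf5 dump rather than written by hand.

```python
from collections import deque


def step(graph, state, action):
    """Apply one action; on collision (-1) the agent stays where it is."""
    nxt = graph[state][action]
    return state if nxt == -1 else nxt


def shortest_path_distances(graph):
    """All-pairs distances in number of steps, computed by BFS.

    Returns a square list of lists where -1 marks unreachable pairs,
    mirroring the convention described for shortest_path_distance.
    """
    n = len(graph)
    dist = [[-1] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in graph[cur]:
                # Skip collisions and states that were already reached.
                if nxt != -1 and dist[src][nxt] == -1:
                    dist[src][nxt] = dist[src][cur] + 1
                    queue.append(nxt)
    return dist


# Hypothetical toy graph: 4 states, 2 actions each. State 3 is isolated.
toy_graph = [
    [1, -1],   # state 0: action 0 -> state 1, action 1 collides
    [2, 0],    # state 1: action 0 -> state 2, action 1 -> state 0
    [-1, 1],   # state 2: action 0 collides, action 1 -> state 1
    [-1, -1],  # state 3: every action collides
]
toy_dist = shortest_path_distances(toy_graph)
```

A precomputed distance matrix of this kind makes it cheap to compare the length of an agent's trajectory with the optimal number of steps to the target.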

Acknowledgements

I would like to acknowledge the following references, which were a great help to me in implementing the model.

Citation

Please cite our ICRA'17 paper if you find this code useful for your research.

@InProceedings{zhu2017icra,
  title = {{Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning}},
  author = {Yuke Zhu and Roozbeh Mottaghi and Eric Kolve and Joseph J. Lim and Abhinav Gupta and Li Fei-Fei and Ali Farhadi},
  booktitle = {{IEEE International Conference on Robotics and Automation}},
  year = 2017,
}

License

MIT
