Reinforcement Learning (RL) is an area of machine learning focused on agents that learn to maximize the total reward they collect while acting in an environment. The agent is often a robot or a game avatar, but it can also be a recommender system, a notification bot, or any other system that makes decisions. The reward can be points in a game, or engagement with content on a website. Facebook uses reinforcement learning to power several efforts in the company. Sharing an open-source fork of our Caffe2 RL framework allows us to give back to the open source community and to collaborate with other institutions as RL finds more applications in industry.
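The agent-environment loop described above can be sketched in a few lines of Python. The `CoinFlipEnv` class, its reward rule, and `run_episode` are hypothetical and exist only for illustration. The environment merely imitates the classic OpenAI Gym interface, in which `reset()` returns an observation and `step(action)` returns `(observation, reward, done, info)`.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of RL_Caffe2 or Gym).

    Imitates the classic Gym interface: reset() returns an observation,
    step(action) returns (observation, reward, done, info).
    """

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0
        self.coin = 0

    def reset(self):
        self.t = 0
        self.coin = self.rng.randint(0, 1)
        return self.coin  # the observation is the visible coin

    def step(self, action):
        # Reward 1 when the action matches the visible coin, else 0.
        reward = 1.0 if action == self.coin else 0.0
        self.t += 1
        self.coin = self.rng.randint(0, 1)
        done = self.t >= self.episode_length
        return self.coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total (undiscounted) reward."""
    obs = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    env = CoinFlipEnv(episode_length=10)
    print(run_episode(env, policy=lambda obs: obs))  # 10.0: always matches the coin
```

An RL algorithm's job is to learn a `policy` that makes this total as large as possible. The loop itself does not change.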
This project, called RL_Caffe2, contains several RL implementations built on Caffe2 and running inside OpenAI Gym.
RL_Caffe2 runs on any platform that supports Caffe2 and OpenAI Gym. Notably, Windows support for OpenAI Gym is being tracked here: openai/gym#11.
For Mac users, we recommend using Anaconda instead of the system Python. The system Python does not support upgrading numpy and is outdated in other ways. Install Anaconda and make sure you are using the Anaconda version of Python before installing the other dependencies.
To install Caffe2, follow this tutorial: Installing Caffe2.
The KNN-DQN model depends on FAISS. For details on installing FAISS, go here:
OpenAI Gym can be installed using pip, which should come with your Python installation on Linux or with Anaconda on OS X. For the basic environments, run:
pip install gym
This installs the basic version with these domains:
To install all environments, run this instead:
pip install "gym[all]"
Clone and install from source:
git clone
cd reinforcement-learning-models
python build
To list the available arguments, run the helper:
python -h
Train models by specifying the OpenAI Gym environment, model id, type, and other hyperparameters (the default environment and model settings are -g CartPole-v0 -m DQN):
python -g CartPole-v0 -l 0.1
python -g CartPole-v1 -y 2 -z 200
python -g Acrobot-v1 -w 1000 -r
python -g FrozenLake-v0 -l 0.5 -y 2 -z 100 -i 5000
python -g MountainCar-v0 -l 0.1 -w 5000
python -g MountainCarContinuous-v0 -m ACTORCRITIC
python -g Pendulum-v0 -m ACTORCRITIC -l 0.01 -y 10 -z 500 -x -1 -i 50000 -w 10000 -c
If you installed caffe2 from source, you may need to first run:
export PYTHONPATH=/usr/local:$PYTHONPATH
Evaluate models with the -t option, specifying the OpenAI Gym environment, model id, and type:
python -t [... the rest is the same as for training]
A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track.
python -g CartPole-v0 -o ADAGRAD -l 0.1
When validating, the average reward should be > 195.0
python -g CartPole-v0 -o ADAGRAD -l 0.1 -t
python -g CartPole-v1 -y 2 -z 200
When validating, the average reward should be > 475.0
python -g CartPole-v1 -y 2 -z 200 -t
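The success criteria above can be checked with a few lines of Python. This is a minimal sketch. The README does not say how many episodes the evaluation averages over, so the 100-episode window is an assumption borrowed from OpenAI Gym's usual convention for these environments. `is_solved` is a hypothetical helper, not part of RL_Caffe2. The thresholds match the values above, which Gym also registers as `reward_threshold` for CartPole-v0 and CartPole-v1.

```python
# Thresholds quoted above; Gym registers the same values as each
# environment's reward_threshold.
REWARD_THRESHOLDS = {"CartPole-v0": 195.0, "CartPole-v1": 475.0}


def is_solved(episode_rewards, env_id, window=100):
    """Hypothetical helper: True if the mean total reward of the last
    `window` episodes is strictly greater than the threshold for env_id.

    window=100 is an assumption (Gym's usual convention), not something
    this README specifies.
    """
    threshold = REWARD_THRESHOLDS[env_id]
    if len(episode_rewards) < window:
        return False  # not enough episodes to judge yet
    recent = episode_rewards[-window:]
    return sum(recent) / window > threshold


if __name__ == "__main__":
    rewards = [200.0] * 100  # CartPole-v0 caps episodes at 200 steps
    print(is_solved(rewards, "CartPole-v0"))  # True
    print(is_solved(rewards, "CartPole-v1"))  # False
```

The comparison is strict (`>`) to match the wording above. An agent that averages exactly 195.0 on CartPole-v0 would not count as passing.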
Check out this page for the success criteria of additional environments:
We are currently releasing SARSA, DQN-max-action, and Actor-Critic models.
- SARSA: on-policy TD learning
- input: state: s_t, discrete action: a_t
- output: value_of_a: Q(s_t, a_t)
- DQN-max-action: Deep Q-Network from dqn-Atari by DeepMind
- input: state: s_t
- output: value_max_a: Q_max(s_t, a)
- Actor-Critic: ActorCritic-mujoco (DeepMind)
- input: state: s_t, continuous action: a_t
- output: policy: u(s_t), value_of_u: Q(s_t, u(s_t))
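The list above gives only each model's inputs and outputs. To illustrate roughly how those outputs are typically trained, the sketch below computes the standard one-step temporal-difference targets from the textbook forms of each method. It is not taken from RL_Caffe2's code. The function names, the discount factor `gamma`, the learning rate `lr`, and the tabular `q_table` are assumptions made for illustration.

```python
from collections import defaultdict


def sarsa_target(reward, q_next, next_action, gamma, done):
    """SARSA (on-policy): bootstrap from Q(s_{t+1}, a_{t+1}) for the action actually taken next."""
    return reward if done else reward + gamma * q_next[next_action]


def max_action_target(reward, q_next, gamma, done):
    """DQN-style (off-policy): bootstrap from max_a Q(s_{t+1}, a)."""
    return reward if done else reward + gamma * max(q_next)


def actor_critic_target(reward, q_next_at_policy_action, gamma, done):
    """Actor-critic: bootstrap from Q(s_{t+1}, u(s_{t+1})), the critic's value of the actor's action.

    DDPG-style methods usually compute this with slowly updated target networks, omitted here.
    """
    return reward if done else reward + gamma * q_next_at_policy_action


def td_update(q_table, state, action, target, lr):
    """Move the tabular estimate Q(s_t, a_t) a step of size lr toward target."""
    q_table[state][action] += lr * (target - q_table[state][action])


if __name__ == "__main__":
    # Hypothetical table with two discrete actions per state.
    q = defaultdict(lambda: [0.0, 0.0])
    q["s1"] = [0.5, 2.0]
    # Suppose a_{t+1} = 0, e.g. chosen by an epsilon-greedy policy.
    print(sarsa_target(1.0, q["s1"], next_action=0, gamma=0.9, done=False))  # about 1.45
    print(max_action_target(1.0, q["s1"], gamma=0.9, done=False))  # about 2.8
    target = sarsa_target(1.0, q["s1"], 0, 0.9, False)
    td_update(q, "s0", 0, target, lr=0.1)
    print(q["s0"])  # about [0.145, 0.0]
```

The three targets differ only in which next-state value they bootstrap from. SARSA uses the action actually taken (on-policy), DQN uses the best available discrete action, and the actor-critic uses the actor's continuous action u(s_{t+1}) as scored by the critic.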
If you have any issues or feedback about the implementations, feel free to file an issue:
Otherwise feel free to contact [email protected] with questions.
RL_Caffe2 is BSD-licensed. We also provide an additional patent grant.