This is a TensorFlow 2.0 implementation of the MADDPG algorithm, completed by Bowei He (BUAA) and Alex Zhao (UMich). The code is largely based on the original MADDPG implementation.
This is the code for implementing the MADDPG algorithm presented in the paper: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. It is configured to be run in conjunction with environments from the Multi-Agent Particle Environments (MPE). Note: this codebase has been restructured since the original paper, and the results may vary from those reported in the paper.
Update: the original implementation for policy ensemble and policy estimation can be found here. The code is provided as-is.
- To install, `cd` into the root directory and type `pip install -e .`
- Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.8.0), numpy (1.14.5)
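A quick, purely illustrative way to confirm the dependencies resolve in your environment:

```python
# Illustrative sanity check, not part of the repo: import the pinned
# dependencies and print the versions actually installed.
import gym            # imported only to verify availability
import numpy as np
import tensorflow as tf

print("numpy:", np.__version__)
print("tensorflow:", tf.__version__)
```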
We demonstrate here how the code can be used in conjunction with the Multi-Agent Particle Environments (MPE).
- Download and install the MPE code here by following the `README`.
- Ensure that `multiagent-particle-envs` has been added to your `PYTHONPATH` (e.g. in `~/.bashrc` or `~/.bash_profile`).
- To run the code, `cd` into the `experiments` directory and run `train.py`:

```
python train.py --scenario simple
```

- You can replace `simple` with any environment in the MPE you'd like to run.
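Internally, `train.py` constructs the environment from an MPE scenario file. A minimal sketch of that loading pattern (this is the standard MPE recipe, not a verbatim excerpt from this repo):

```python
# Sketch of the standard MPE loading pattern; assumes the `multiagent`
# package from multiagent-particle-envs is on PYTHONPATH.
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios

scenario = scenarios.load("simple.py").Scenario()
world = scenario.make_world()
env = MultiAgentEnv(world, scenario.reset_world, scenario.reward, scenario.observation)

obs_n = env.reset()  # one observation per agent
print(len(obs_n), "agent(s)")
```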
`train.py` accepts the following environment options (combined in the example below):

- `--scenario`: defines which environment in the MPE is to be used (default: `"simple"`)
- `--max-episode-len`: maximum length of each episode for the environment (default: `25`)
- `--num-episodes`: total number of training episodes (default: `60000`)
- `--num-adversaries`: number of adversaries in the environment (default: `0`)
- `--good-policy`: algorithm used for the 'good' (non-adversary) policies in the environment (default: `"maddpg"`; options: {`"maddpg"`, `"ddpg"`})
- `--adv-policy`: algorithm used for the adversary policies in the environment (default: `"maddpg"`; options: {`"maddpg"`, `"ddpg"`})
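For example, to train MADDPG 'good' agents against DDPG adversaries on the predator-prey scenario `simple_tag` (which has three adversaries):

```
python train.py --scenario simple_tag --num-adversaries 3 --good-policy maddpg --adv-policy ddpg
```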
Core training parameters:

- `--lr`: learning rate (default: `1e-2`)
- `--gamma`: discount factor (default: `0.95`)
- `--batch-size`: batch size (default: `1024`)
- `--num-units`: number of units in the MLP (default: `64`)
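Each of these can be overridden on the command line, e.g.:

```
python train.py --scenario simple --lr 1e-3 --batch-size 512 --num-units 128
```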
Checkpointing options:

- `--exp-name`: name of the experiment, used as the file name to save all results (default: `None`)
- `--save-dir`: directory where intermediate training results and model will be saved (default: `"/tmp/policy/"`)
- `--save-rate`: model is saved every time this number of episodes has been completed (default: `1000`)
- `--load-dir`: directory where training state and model are loaded from (default: `""`)
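A run that checkpoints under a custom directory (the path here is illustrative) might look like:

```
python train.py --scenario simple --exp-name simple_run --save-dir ./policy/simple_run/
```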
Evaluation options:

- `--restore`: restores previous training state stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), and continues training (default: `False`)
- `--display`: displays to the screen the trained policy stored in `load-dir` (or in `save-dir` if no `load-dir` has been provided), but does not continue training (default: `False`)
- `--benchmark`: runs benchmarking evaluations on saved policy, saves results to `benchmark-dir` folder (default: `False`)
- `--benchmark-iters`: number of iterations to run benchmarking for (default: `100000`)
- `--benchmark-dir`: directory where benchmarking data is saved (default: `"./benchmark_files/"`)
- `--plots-dir`: directory where training curves are saved (default: `"./learning_curves/"`)
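To watch a previously saved policy without further training, point `--load-dir` at the checkpoint directory:

```
python train.py --scenario simple --load-dir ./policy/simple_run/ --display
```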
Code structure:

- `./experiments/train.py`: contains code for training MADDPG on the MPE
- `./maddpg/trainer/maddpg.py`: core code for the MADDPG algorithm
- `./maddpg/trainer/replay_buffer.py`: replay buffer code for MADDPG
- `./maddpg/common/distributions.py`: useful distributions used in `maddpg.py`
- `./maddpg/common/tf_util.py`: useful TensorFlow functions used in `maddpg.py`
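The core idea in `maddpg.py` is centralized training with decentralized execution: each agent's critic conditions on the observations and actions of all agents, while each actor acts only on its own observation. A minimal TensorFlow 2.x sketch of that structure (illustrative only; the names, shapes, and agent counts below are hypothetical, not the repo's actual code):

```python
import tensorflow as tf

N_AGENTS, OBS_DIM, ACT_DIM, BATCH = 3, 10, 5, 1024  # hypothetical sizes

def mlp(out_dim, num_units=64):
    # Two ReLU hidden layers, mirroring the --num-units flag above.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(num_units, activation="relu"),
        tf.keras.layers.Dense(num_units, activation="relu"),
        tf.keras.layers.Dense(out_dim),
    ])

# One actor per agent: pi_i(o_i) -> a_i (decentralized execution).
actors = [mlp(ACT_DIM) for _ in range(N_AGENTS)]
# One centralized critic per agent: Q_i(o_1..o_N, a_1..a_N) -> scalar.
critics = [mlp(1) for _ in range(N_AGENTS)]

obs_n = [tf.random.normal([BATCH, OBS_DIM]) for _ in range(N_AGENTS)]
act_n = [actors[i](obs_n[i]) for i in range(N_AGENTS)]

# The critic for agent 0 sees everyone's observations and actions.
critic_in = tf.concat(obs_n + act_n, axis=-1)
q0 = critics[0](critic_in)  # shape [BATCH, 1]
print(q0.shape)
```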
If you used this code for your experiments or found it helpful, consider citing the following paper:
```
@article{lowe2017multi,
  title={Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments},
  author={Lowe, Ryan and Wu, Yi and Tamar, Aviv and Harb, Jean and Abbeel, Pieter and Mordatch, Igor},
  journal={Neural Information Processing Systems (NIPS)},
  year={2017}
}
```