I'm experimenting with adapting this framework to hex-and-counter wargames. Such games differ from the usual abstract two-player strategy games in several ways:
- A player may move one or more pieces during their turn
- The player turn is broken down into several phases, in which units sometimes move and sometimes attack
- Sometimes during a player's turn, the opponent is asked to make decisions
- There are many different types of units
- The outcome of attacks is random
To understand how to represent some of these mechanics in the AlphaGo-General framework, I'm planning a series of experimental mini-games. The first one is RunToTheTop, where a player may move zero, one, or all of their units during their turn; one possible action encoding is sketched below.
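As a first experiment, one way to fit "move zero, one, or all units" into the framework's single fixed-size discrete action space is to enumerate composite moves as individual actions. The sketch below is purely illustrative: the names (`NUM_UNITS`, `action_size`, `decode_action`) are hypothetical and not part of the framework, and RunToTheTop's actual encoding may end up different.

```python
# Hypothetical action encoding for RunToTheTop. The framework expects one
# flat, fixed-size action space, so composite moves are enumerated:
#   action 0           -> pass (move zero units)
#   actions 1..N       -> move unit i-1 one step toward the top
#   action N + 1       -> move all units one step toward the top
NUM_UNITS = 3

def action_size():
    return NUM_UNITS + 2  # pass + one action per unit + move-all

def decode_action(action):
    """Map a flat action index back to a (kind, unit_index) pair."""
    if action == 0:
        return ("pass", None)
    if 1 <= action <= NUM_UNITS:
        return ("move_one", action - 1)
    return ("move_all", None)
```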
A simplified, highly flexible, commented and (hopefully) easy-to-understand implementation of self-play reinforcement learning based on the AlphaGo Zero paper (Silver et al.). It is designed to be easy to adapt to any two-player turn-based adversarial game and any deep learning framework of your choice. A sample implementation has been provided for the game of Othello in PyTorch and Keras. An accompanying tutorial can be found here. We also have implementations for many other games like GoBang and TicTacToe.
To use a game of your choice, subclass the classes in `Game.py` and `NeuralNet.py` and implement their functions. Example implementations for Othello can be found in `othello/OthelloGame.py` and `othello/{pytorch,keras}/NNet.py`.
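For orientation, here is a minimal skeleton of such a subclass. The method names follow the interface declared in `Game.py`; the 6x6 board, the action count, and the method bodies are placeholders to be replaced with your game's actual rules.

```python
import numpy as np
from Game import Game

class MyGame(Game):
    """Skeleton of a Game subclass; method names follow Game.py."""

    def getInitBoard(self):
        # Starting position as a numpy array.
        return np.zeros((6, 6), dtype=np.int8)

    def getBoardSize(self):
        return (6, 6)

    def getActionSize(self):
        # Number of discrete actions, including any pass move.
        return 6 * 6 + 1

    def getNextState(self, board, player, action):
        # Apply `action` for `player`; return (next_board, next_player).
        raise NotImplementedError

    def getValidMoves(self, board, player):
        # Binary vector of length getActionSize(); 1 marks a legal action.
        raise NotImplementedError

    def getGameEnded(self, board, player):
        # 0 if ongoing, 1 if `player` won, -1 if lost, small value for draw.
        raise NotImplementedError

    def getCanonicalForm(self, board, player):
        # Board from the current player's point of view.
        return player * board

    def getSymmetries(self, board, pi):
        # List of (board, pi) pairs; identity only if nothing else applies.
        return [(board, pi)]

    def stringRepresentation(self, board):
        # Hashable encoding used by MCTS to cache visited states.
        return board.tobytes()
```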
`Coach.py` contains the core training loop and `MCTS.py` performs the Monte Carlo Tree Search. The parameters for self-play can be specified in `main.py`. Additional neural network parameters are in `othello/{pytorch,keras}/NNet.py` (cuda flag, batch size, epochs, learning rate, etc.).
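For reference, the self-play parameters in `main.py` are typically gathered in a `dotdict`; the names and values below are indicative rather than authoritative, so check `main.py` in your checkout for the exact set.

```python
from utils import dotdict

# Indicative self-play settings in the style of main.py's `args`.
args = dotdict({
    'numIters': 1000,        # training iterations
    'numEps': 100,           # self-play episodes per iteration
    'numMCTSSims': 25,       # MCTS simulations per move
    'tempThreshold': 15,     # move count after which temperature drops to 0
    'updateThreshold': 0.6,  # win rate required to accept the new network
    'arenaCompare': 40,      # games used to compare old vs. new network
    'cpuct': 1,              # exploration constant in the PUCT formula
    'checkpoint': './temp/', # directory for model checkpoints
})
```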
To start training a model for Othello:

    python main.py

Choose your framework and game in `main.py`.
For easy environment setup, you can use nvidia-docker. Once you have nvidia-docker set up, simply run:

    ./setup_env.sh

to set up a Jupyter docker container (PyTorch by default). You can then open a new terminal and enter:

    docker exec -ti pytorch_notebook python main.py
We trained a PyTorch model for 6x6 Othello (~80 iterations, 100 episodes per iteration, and 25 MCTS simulations per turn). This took about 3 days on an NVIDIA Tesla K80. The pretrained model (PyTorch) can be found in `pretrained_models/othello/pytorch/`. Below is the performance of the model against a random and a greedy baseline as a function of the number of iterations. You can play a game against the pretrained model using `pit.py`.
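Under the hood, `pit.py` plays two agents against each other with the `Arena` class. The following is a rough sketch of that pattern using a trivial random player; the player function here is illustrative, so see `pit.py` for the actual Othello setup.

```python
import numpy as np
import Arena
from othello.OthelloGame import OthelloGame

game = OthelloGame(6)

def random_player(board):
    # Baseline agent: pick a uniformly random legal move.
    valids = game.getValidMoves(board, 1)
    return np.random.choice(np.flatnonzero(valids))

arena = Arena.Arena(random_player, random_player, game)
print(arena.playGames(10))  # -> (wins for player 1, wins for player 2, draws)
```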
A concise description of our algorithm can be found here.
If you found this work useful, feel free to cite it as:

    @misc{thakoor2016learning,
      title={Learning to play othello without human knowledge},
      author={Thakoor, Shantanu and Nair, Surag and Jhunjhunwala, Megha},
      year={2016},
      publisher={Stanford University, Final Project Report}
    }
While the current code is fairly functional, we could benefit from the following contributions:
- Game logic files for more games that follow the specifications in `Game.py`, along with their neural networks
- Neural networks in other frameworks
- Pre-trained models for different game configurations
- An asynchronous version of the code: parallel processes for self-play, neural net training and model comparison
- Asynchronous MCTS as described in the paper
Some extensions have been implemented here.
- Shantanu Thakoor and Megha Jhunjhunwala helped with core design and implementation.
- Shantanu Kumar contributed TensorFlow and Keras models for Othello.
- Evgeny Tyurin contributed rules and a trained model for TicTacToe.
- MBoss contributed rules and a model for GoBang.
- Jernej Habjan contributed an RTS game.
- Adam Lawson contributed rules and a trained model for 3D TicTacToe.
- Carlos Aguayo contributed rules and a trained model for Dots and Boxes along with a JavaScript implementation.
- Robert Ronan contributed rules for Santorini.
Note: Chainer and TensorFlow v1 versions have been removed but can be found prior to commit 2ad461c.