This is a simplified implementation of MiniGo based on the code provided by the MiniGo authors.
MiniGo is a minimalist Go engine modeled after AlphaGo Zero, as described in "Mastering the Game of Go without Human Knowledge". A useful one-diagram overview of AlphaGo Zero can be found in the cheat sheet.
The implementation of MiniGo consists of three main components: the DualNet model, the Monte Carlo Tree Search (MCTS), and Go domain knowledge. Currently, the DualNet model is our focus.
DualNet is the neural network used in MiniGo. It is based on a residual tower with two output heads (policy and value). The following is a brief overview of the DualNet architecture.
The input to the neural network is a [board_size * board_size * 17] image stack comprising 17 binary feature planes: 8 feature planes indicate the presence of the current player's stones, a further 8 feature planes represent the corresponding features for the opponent's stones, and the final feature plane represents the color to play, with a constant value of 1 if black is to play and 0 if white is to play. Check features.py for more details.
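As a concrete illustration, the sketch below shows one way such a 17-plane stack could be assembled with NumPy. The function name, the board encoding (1 for black, -1 for white, 0 for empty), and the history ordering are assumptions made for this example; they are not the actual features.py implementation.

import numpy as np

def make_input_planes(position_history, to_play, board_size=9):
    # Illustrative sketch only -- see features.py for the real implementation.
    # position_history: the 8 most recent boards, newest first, each of shape
    # [board_size, board_size] with 1 = black stone, -1 = white stone, 0 = empty.
    # to_play: 1 if black is to move, -1 if white is to move.
    planes = np.zeros([board_size, board_size, 17], dtype=np.uint8)
    for t, board in enumerate(position_history[:8]):
        planes[:, :, t] = (board == to_play)        # current player's stones at step t
        planes[:, :, 8 + t] = (board == -to_play)   # opponent's stones at step t
    planes[:, :, 16] = 1 if to_play == 1 else 0     # constant color-to-play plane
    return planes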
In the MiniGo implementation, the input features are processed by a residual tower that consists of a single convolutional block followed by either 9 or 19 residual blocks (a sketch of these blocks follows the note below). The convolutional block applies the following modules:
- A convolution of num_filter filters of kernel size 3 x 3 with stride 1
- Batch normalization
- A rectifier non-linearity
Each residual block applies the following modules sequentially to its input:
- A convolution of num_filter filters of kernel size 3 x 3 with stride 1
- Batch normalization
- A rectifier non-linearity
- A convolution of num_filter filters of kernel size 3 x 3 with stride 1
- Batch normalization
- A skip connection that adds the input to the block
- A rectifier non-linearity
Note: num_filter is 128 for the 19 x 19 board size and 32 for the 9 x 9 board size in the MiniGo implementation.
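The following is a minimal Keras-style sketch of the convolutional block and a residual block as described above. It is meant to illustrate the layer ordering only, not to reproduce the actual DualNet code.

import tensorflow as tf

def conv_block(x, num_filter):
    # Convolution -> batch normalization -> rectifier, as listed above.
    x = tf.keras.layers.Conv2D(num_filter, kernel_size=3, strides=1, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.Activation('relu')(x)

def residual_block(x, num_filter):
    # Two conv/BN stages, a skip connection adding the block input, then a rectifier.
    shortcut = x
    x = tf.keras.layers.Conv2D(num_filter, kernel_size=3, strides=1, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = tf.keras.layers.Conv2D(num_filter, kernel_size=3, strides=1, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Add()([x, shortcut])
    return tf.keras.layers.Activation('relu')(x)

def residual_tower(x, num_filter=32, num_res_blocks=9):
    # 9x9 defaults shown; use num_filter=128 and num_res_blocks=19 for 19x19.
    x = conv_block(x, num_filter)
    for _ in range(num_res_blocks):
        x = residual_block(x, num_filter)
    return x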
The output of the residual tower is passed into two separate "heads" for computing the policy and value respectively (a sketch of both heads follows the value head description). The policy head applies the following modules:
- A convolution of 2 filters of kernel size 1 x 1 with stride 1
- Batch normalization
- A rectifier non-linearity
- A fully connected linear layer that outputs a vector of size (board_size * board_size + 1) corresponding to logit probabilities for all intersections and the pass move
The value head applies the following modules:
- A convolution of 1 filter of kernel size 1 x 1 with stride 1
- Batch normalization
- A rectifier non-linearity
- A fully connected linear layer to a hidden layer of size 256 for 19 x 19 board size and 64 for 9x9 board size
- A rectifier non-linearity
- A fully connected linear layer to a scalar
- A tanh non-linearity outputting a scalar in the range [-1, 1]
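A matching Keras-style sketch of the two heads is shown below; again, this illustrates the layer ordering rather than the actual DualNet code, and tower_out is assumed to be the output of the residual tower above.

import tensorflow as tf

def policy_head(tower_out, board_size=9):
    # 1x1 conv with 2 filters, BN, rectifier, then logits for every
    # intersection plus the pass move.
    x = tf.keras.layers.Conv2D(2, kernel_size=1, strides=1)(tower_out)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = tf.keras.layers.Flatten()(x)
    return tf.keras.layers.Dense(board_size * board_size + 1)(x)

def value_head(tower_out, fc_width=64):
    # 1x1 conv with 1 filter, BN, rectifier, hidden layer (64 for 9x9,
    # 256 for 19x19), then a tanh scalar in [-1, 1].
    x = tf.keras.layers.Conv2D(1, kernel_size=1, strides=1)(tower_out)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(fc_width, activation='relu')(x)
    return tf.keras.layers.Dense(1, activation='tanh')(x)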
In MiniGo, the residual tower of the 10-block or 20-block network contributes 19 or 39 parameterized layers respectively (one convolution in the initial block plus two per residual block: 1 + 2 x 9 = 19 and 1 + 2 x 19 = 39), plus an additional 2 layers for the policy head and 3 layers for the value head.
This project assumes you have virtualenv, TensorFlow (>= 1.5), and two other Go-related packages installed: pygtp (>= 0.4) and sgf (== 0.5).
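For example, inside an activated virtualenv the Python dependencies could be installed with pip, using the version pins listed above (adjust to your environment):

pip install "tensorflow>=1.5" "pygtp>=0.4" "sgf==0.5"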
One iteration of reinforcement learning (RL) consists of the following steps:
- Bootstrap: initializes a random DualNet model. If the estimator directory already exists, the model is initialized from the last checkpoint.
- Selfplay: plays games with the latest model (or the best model so far, as identified by evaluation), producing data used for training.
- Gather: groups games played with the same model into larger files of tf.Examples.
- Train: trains a new model with the selfplay results from the most recent N generations.
To run the RL pipeline, issue the following command:
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256
Arguments:
- --base_dir: Base directory for MiniGo data and models. If not specified, it defaults to /tmp/minigo/.
- --board_size: Go board size, either 9 or 19. Defaults to 9.
- --batch_size: Batch size for model training. If not specified, it is calculated based on the Go board size.
Use the --help or -h flag to get a full list of possible arguments. In addition to these arguments, other parameters of the RL pipeline and the DualNet model can be found and configured in model_params.py.
Suppose the base_dir argument is $HOME/minigo/ and the board_size is 9. After model training, the following directories are created to store models and game data:
$HOME/minigo # base directory
│
├── 9_size # directory for 9x9 board size
│ │
│ ├── data
│ │ ├── holdout # holdout data for model validation
│ │ ├── selfplay # data generated by selfplay of each model
│ │ └── training_chunks # gathered tf_examples for model training
│ │
│ ├── estimator_model_dir # estimator working directory
│ │
│ ├── trained_models # all the trained models
│ │
│ └── sgf # sgf (smart go files) folder
│ ├── 000000-bootstrap # model name
│ │ ├── clean # clean sgf files of model selfplay
│ │ └── full # full sgf files of model selfplay
│ ├── ...
│ └── evaluate # clean sgf files of model evaluation
│
└── ...
To validate the trained model, issue the following command with the --validation argument:
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --validation
The performance of two models is compared in the evaluation step. Given two models, one plays black and the other plays white. They play several games (the number of games can be configured by the eval_games parameter in model_params.py), and the model that wins by a margin of 55% is declared the winner.
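The gating logic amounts to a simple win-rate check, sketched below; eval_games comes from model_params.py, while the function and variable names here are purely illustrative.

def challenger_becomes_best(challenger_wins, eval_games, win_margin=0.55):
    # Illustrative gating check: the challenger replaces best_model_so_far
    # only if it wins at least 55% of the evaluation games.
    return challenger_wins >= win_margin * eval_games

# Example: winning 6 of 10 evaluation games (60%) clears the 55% margin.
assert challenger_becomes_best(6, 10)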
To include the evaluation step in the RL pipeline, the --evaluation argument can be specified to compare the performance of the current_trained_model and the best_model_so_far. The winner is used to update best_model_so_far. Run the following command to include the evaluation step in the pipeline:
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --evaluation
As the whole RL pipeline may take hours to train even for a 9x9 board size, a --test argument is provided to test the pipeline quickly with a dummy neural network model.
To test the RL pipeline with a dummy model, issue the following command:
python minigo.py --base_dir=$HOME/minigo/ --board_size=9 --batch_size=256 --test
A selfplay-only option is provided to run the selfplay step individually, generating training data in parallel. Issue the following command to run selfplay only with the latest trained model:
python minigo.py --selfplay
Other optional arguments:
- --selfplay_model_name: The name of the model used for selfplay only. If not specified, the latest trained model is used for selfplay.
- --selfplay_max_games: The maximum number of games selfplay is required to generate. If not specified, the default parameter max_games_per_generation is used.
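For example, to cap selfplay at 50 games using a specific trained model (the model name below is only illustrative):

python minigo.py --selfplay --selfplay_model_name=000001-first_generation --selfplay_max_games=50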