A commented and documented implementation of MuZero based on the Google DeepMind paper (Nov 2019) and the associated pseudocode. It is designed to be easily adaptable to any game or reinforcement learning environment (such as Gym). You only need to add a game file containing the hyperparameters and the game class. Please refer to the documentation and the example.
MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but it has no prior knowledge of the environment's underlying dynamics. MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions. MuZero is also close to Value Prediction Networks. See How it works.
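As a rough illustration of the paragraph above, the sketch below writes the three learned functions described in the MuZero paper (representation, dynamics, prediction) as plain Python callables, with toy arithmetic standing in for neural networks. The names `ToyModel` and `unroll` are hypothetical and are not this repository's API; the point is only that planning steps run on the learned model rather than on the real environment.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

HiddenState = Tuple[float, ...]


@dataclass
class ToyModel:
    """Stand-in for MuZero's three learned functions (toy arithmetic, no training)."""

    num_actions: int = 2

    def representation(self, observation: Sequence[float]) -> HiddenState:
        # h: observation -> hidden state. A real network learns this mapping.
        return (sum(observation), float(len(observation)))

    def dynamics(self, state: HiddenState, action: int) -> Tuple[HiddenState, float]:
        # g: (hidden state, action) -> (next hidden state, predicted reward).
        next_state = (state[0] + action, state[1])
        reward = 1.0 if action == 1 else 0.0
        return next_state, reward

    def prediction(self, state: HiddenState) -> Tuple[List[float], float]:
        # f: hidden state -> (policy over actions, value estimate).
        policy = [1.0 / self.num_actions] * self.num_actions
        value = state[0] / max(state[1], 1.0)
        return policy, value


def unroll(model: ToyModel, observation: Sequence[float], actions: Sequence[int]):
    """Roll the model forward over a sequence of actions without the real environment."""
    state = model.representation(observation)
    trajectory = []
    for action in actions:
        state, reward = model.dynamics(state, action)
        policy, value = model.prediction(state)
        trajectory.append((action, reward, policy, value))
    return trajectory


if __name__ == "__main__":
    for step in unroll(ToyModel(), observation=[0.5, 1.5], actions=[1, 0, 1]):
        print(step)
```

In the real algorithm these three functions are neural networks trained jointly, and the unroll happens inside a Monte Carlo tree search rather than over a fixed action list.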
- Termination and truncation information from Gymnasium
- Residual Network and Fully connected network in PyTorch
- Multi-Threaded/Asynchronous/Cluster with Ray
- Multi-GPU support for training and self-play
- TensorBoard real-time monitoring
- Model weights automatically saved at checkpoints
- Single and two player mode
- Commented and documented
- Easily adaptable for new games
- Examples of board games, Gym and Atari games (See list of implemented games)
- Pretrained weights available
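For the Gymnasium item in the list above: Gymnasium's `Env.step` returns a five-element tuple `(observation, reward, terminated, truncated, info)`. How this repository consumes the two flags is not described here, so the following is only a sketch of one common convention, with hypothetical helper names (`episode_over`, `bootstrap_value`): stop the episode on either flag, but treat only `terminated` as a true terminal state when bootstrapping a value target.

```python
from typing import Any, Dict, Tuple

# Shape of the tuple returned by gymnasium.Env.step().
StepResult = Tuple[Any, float, bool, bool, Dict[str, Any]]


def episode_over(step_result: StepResult) -> bool:
    """Return True when the episode must stop, whatever the reason."""
    _obs, _reward, terminated, truncated, _info = step_result
    return terminated or truncated


def bootstrap_value(step_result: StepResult, next_value: float) -> float:
    """Value to bootstrap from after this step (assumed convention).

    terminated: the state is truly terminal, so nothing follows it.
    truncated:  the episode was cut short (e.g. a time limit), so the
                estimate of the next state is still meaningful.
    """
    _obs, _reward, terminated, _truncated, _info = step_result
    return 0.0 if terminated else next_value


if __name__ == "__main__":
    timed_out = ([0.0], 1.0, False, True, {})
    print(episode_over(timed_out), bootstrap_value(timed_out, next_value=3.5))
```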
These improvements are active research. They are personal ideas that go beyond the MuZero paper. We are open to contributions and other ideas.
- Hyperparameter search
- Continuous action space
- Tool to understand the learned model
- Support of stochastic environments
- Support of more than two player games
- RL tricks (Never Give Up, Adaptive Exploration, ...)
All performance metrics are tracked and displayed in real time in TensorBoard:
Testing Lunar Lander:
- Cartpole (Tested with the fully connected network)
- Lunar Lander (Tested in deterministic mode with the fully connected network)
- Gridworld (Tested with the fully connected network)
- Tic-tac-toe (Tested with the fully connected network and the residual network)
- Connect4 (Lightly tested with the residual network)
- Gomoku
- Twenty-One / Blackjack (Tested with the residual network)
- Atari Breakout
Tests are run on Ubuntu with 16 GB RAM, an Intel i7, and a GTX 1050Ti Max-Q. We check that training makes progress and reaches a level showing that the agent has learned, but we do not systematically reach human level. In some environments, performance regresses after a certain time. The proposed configurations are certainly not optimal, and for now we do not focus on hyperparameter optimization. Any help is welcome.
Network summary:
Installing on Windows is recommended; see the reference documentation.
- Install MongoDB (version 6.0) on WSL (Ubuntu 20.04):
- Download the Anaconda installer
- Install
bash Anaconda3-2022.05-Linux-x86_64.sh
- Update conda
conda update -n base -c defaults conda
- Install PyTorch
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
- Verify
- rl environment
# Install Ray with support for the dashboard + cluster launcher
pip install -U "ray[default]"
# Install Ray with minimal dependencies
# pip install -U ray
- Source code
git clone https://github.com/liudengfeng/muzeroxq.git
cd muzeroxq
conda activate rl
- Install packages
pip install -r requirements.txt
pip install -U tensorboard-plugin-profile
- Build
- Start Visual Studio Code and set CMake to release mode
- Build
- Install
cd muzeroxq
conda activate rl
pip install .
# Editable install for debugging
# pip install -e .
- Test
cd muzeroxq
pytest --html report.html
python muzero.py
To visualize the training results, run in a new terminal:
tensorboard --logdir ./results
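Goal: let WSL2 instances on the local network reach each other
- Pin the WSL2 address: edit the file /etc/wsl.conf so that it contains the following, which guards against the IP address not being updated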
[network]
generateHosts = false
If wsl.conf does not exist, create it, edit it locally, and then move it into the /etc directory
sudo mv wsl.conf /etc/
- Install the network tools package
sudo apt install net-tools
- Run the PowerShell script. The following steps must be run in PowerShell as administrator.
- Set the script execution policy
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
- Edit the script file. File name: WSL2.ps1
$remoteport = bash.exe -c "ifconfig eth0 | grep 'inet '"
$found = $remoteport -match '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';
if( $found ){
$remoteport = $matches[0];
} else{
Write-Output "The Script Exited, the ip address of WSL 2 cannot be found";
exit;
}
#[Ports]
# 6379 for ray
# 40050 for redis
# All the ports you want to forward, separated by commas
$ports=@(6379,40050,40051,40052);
#[Static ip]
#You can change the addr to your ip config to listen to a specific address
$addr='0.0.0.0';
$ports_a = $ports -join ",";
# Remove the old firewall exception rules
Invoke-Expression "Remove-NetFireWallRule -DisplayName 'WSL2 Firewall Unlock' ";
# Add exception rules for inbound and outbound traffic
Invoke-Expression "New-NetFireWallRule -DisplayName 'WSL2 Firewall Unlock' -Direction Outbound -LocalPort $ports_a -Action Allow -Protocol TCP";
Invoke-Expression "New-NetFireWallRule -DisplayName 'WSL2 Firewall Unlock' -Direction Inbound -LocalPort $ports_a -Action Allow -Protocol TCP";
for( $i = 0; $i -lt $ports.length; $i++ ){
$port = $ports[$i];
# Delete the old port forwarding rule
Invoke-Expression "netsh interface portproxy delete v4tov4 listenport=$port listenaddress=$addr";
# Add the new port forwarding rule
Invoke-Expression "netsh interface portproxy add v4tov4 listenport=$port listenaddress=$addr connectport=$port connectaddress=$remoteport";
}
- Run the script from the directory that contains WSL2.ps1
.\WSL2.ps1
- Verify
- Start an HTTP server inside WSL2
python -m http.server 6379
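- In a browser on the local machine, enter http://localhost:6379/ in the address bar; you should see the file directory listing
- In a browser on another machine on the local network, enter http://<server IP address>:6379/ in the address bar; if the same content appears, the verification has passed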
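The browser check above can also be scripted. The sketch below is not part of the repository; it uses only the Python standard library to test whether a TCP connection to a forwarded port succeeds. The helper name `port_is_open` is hypothetical, and the port list is copied from the `$ports` array in WSL2.ps1. A port reports open only while something is actually listening on it, for example the `python -m http.server 6379` process started in WSL2.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Replace "localhost" with the server's LAN address when testing from another machine.
    for port in (6379, 40050, 40051, 40052):
        print(port, port_is_open("localhost", port))
```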
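🚨 Ray port configuration
Ray requires bidirectional communication between the nodes of a cluster. Each node must open specific ports to receive incoming network requests.
Therefore, the procedure above has to be run on every machine.
Ray local cluster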
- Install Ray on every node
pip install -U "ray[default]"
- Start the head node
ray start --head --port=6379
- Start the worker nodes
# Note: the head node IP address is the Windows host's IP address, not the WSL2 IP address
ray start --address=<head-node-address:port>
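Once the head and worker nodes are started as above, the cluster can be inspected from Python. This is a minimal sketch rather than code from this repository: `head_address` and `cluster_summary` are hypothetical helper names, while `ray.init(address="auto")`, `ray.nodes()` and `ray.cluster_resources()` are existing Ray calls. Ray is imported lazily so the file still loads on a machine where it is not installed.

```python
def head_address(host: str, port: int = 6379) -> str:
    """Format the value passed to `ray start --address=...` on worker nodes."""
    return f"{host}:{port}"


def cluster_summary() -> dict:
    """Attach to the already-running Ray cluster and summarise its nodes."""
    import ray  # lazy import: only needed when actually connecting

    # address="auto" attaches to the existing cluster instead of starting a new one.
    ray.init(address="auto")
    try:
        alive = [node for node in ray.nodes() if node["Alive"]]
        return {"alive_nodes": len(alive), "resources": ray.cluster_resources()}
    finally:
        ray.shutdown()


if __name__ == "__main__":
    # Use the Windows host's IP address here, not the WSL2 address.
    print("workers should use:", head_address("<head-node-ip>"))
    # Uncomment on a machine that has joined the cluster:
    # print(cluster_summary())
```

If the number of alive nodes is lower than the number of machines started, the port forwarding and firewall rules on the missing machine are the first thing to check.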
Start the Head Node
You can adapt the configurations of each game by editing the MuZeroConfig class of the respective file in the games folder.
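To give an idea of what such a class might contain, here is a reduced, hypothetical sketch. The attribute names and values are assumptions modelled on typical MuZero hyperparameters, not a copy of any file in the games folder, so check the actual game file before relying on them.

```python
class MuZeroConfig:
    """Illustrative subset of per-game hyperparameters (names and values are assumptions)."""

    def __init__(self):
        self.seed = 0                        # random seed for reproducibility
        self.observation_shape = (1, 1, 4)   # shape of one observation
        self.action_space = list(range(2))   # all discrete actions
        self.num_simulations = 50            # tree search simulations per move
        self.discount = 0.997                # reward discount factor
        self.training_steps = 10000          # total number of training steps
        self.batch_size = 128                # samples per training batch
        self.lr_init = 0.02                  # initial learning rate

    def visit_softmax_temperature_fn(self, trained_steps: int) -> float:
        """Lower the action-sampling temperature as training progresses."""
        if trained_steps < 0.5 * self.training_steps:
            return 1.0
        if trained_steps < 0.75 * self.training_steps:
            return 0.5
        return 0.25


if __name__ == "__main__":
    config = MuZeroConfig()
    config.num_simulations = 25  # example of adapting a single hyperparameter
    print(config.num_simulations, config.visit_softmax_temperature_fn(6000))
```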
conda activate rl
cd ~/github/muzeroxq
python main.py --op train --force --use_wandb
tensorboard --logdir results --load_fast true
- Werner Duvaud
- Aurèle Hainaut
- Paul Lenoir
- Contributors
Please use this bibtex if you want to cite this repository (master branch) in your publications:
@misc{muzero-general,
author = {Werner Duvaud and Aurèle Hainaut},
title = {MuZero General: Open Reimplementation of MuZero},
year = {2019},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/werner-duvaud/muzero-general}},
}
- GitHub Issues: For reporting bugs.
- Pull Requests: For submitting code contributions.
- Discord server: For discussions about development or any general questions.