MuZero General

A commented and documented implementation of MuZero based on the Google DeepMind paper (Nov 2019) and the associated pseudocode. It is designed to be easily adaptable to any game or reinforcement learning environment (such as gym). You only need to add a game file containing the hyperparameters and the game class. Please refer to the documentation and the example.
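
As a rough illustration of what a game class might look like, here is a minimal sketch of a toy corridor environment. The class name `Game` and the methods `reset`, `legal_actions` and `step` are assumptions chosen for illustration; the abstract interface this repository actually expects may differ, so check the documentation and the example game before copying it.

```python
class Game:
    """Hypothetical corridor game: walk from cell 0 to the last cell."""

    LEFT, RIGHT = 0, 1

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode and return the first observation."""
        self.position = 0
        return self._observation()

    def legal_actions(self):
        """Actions that are valid in the current state."""
        return [self.LEFT, self.RIGHT]

    def step(self, action):
        """Apply an action and return (observation, reward, done)."""
        if action not in self.legal_actions():
            raise ValueError(f"illegal action: {action}")
        move = 1 if action == self.RIGHT else -1
        # Clamp the position so the agent cannot leave the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self._observation(), reward, done

    def _observation(self):
        """One-hot encoding of the current cell."""
        return [1.0 if i == self.position else 0.0 for i in range(self.length)]
```

The environment only exposes observations, rewards and legal actions; MuZero never needs access to the transition rule inside `step`.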

MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but it requires no knowledge of the environment's underlying dynamics. MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions. MuZero is also closely related to Value Prediction Networks. See How it works.
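
The learned model described above is usually split into three functions: representation, dynamics and prediction. The toy sketch below shows how they compose when unrolling a sequence of actions from one observation. It uses random linear maps in place of trained networks, and every name and size in it (`HIDDEN`, `unroll`, and so on) is an assumption made for illustration, not this repository's network code.

```python
import math
import random

random.seed(0)
OBS, HIDDEN, ACTIONS = 3, 4, 2  # toy sizes, chosen arbitrarily


def linear(n_out, n_in):
    """Random weight matrix standing in for a trained network."""
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def apply(weights, x):
    """Matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


W_REPR = linear(HIDDEN, OBS)
W_DYN = linear(HIDDEN, HIDDEN + ACTIONS)
W_REWARD = linear(1, HIDDEN + ACTIONS)
W_POLICY = linear(ACTIONS, HIDDEN)
W_VALUE = linear(1, HIDDEN)


def representation(observation):
    """Observation -> internal (hidden) state."""
    return apply(W_REPR, observation)


def dynamics(state, action):
    """(hidden state, action) -> (next hidden state, predicted reward)."""
    one_hot = [1.0 if a == action else 0.0 for a in range(ACTIONS)]
    x = state + one_hot
    return apply(W_DYN, x), apply(W_REWARD, x)[0]


def prediction(state):
    """Hidden state -> (policy distribution, value estimate)."""
    logits = apply(W_POLICY, state)
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps], apply(W_VALUE, state)[0]


def unroll(observation, actions):
    """Encode the observation once, then plan purely in hidden space."""
    state = representation(observation)
    policy, value = prediction(state)
    steps = [{"reward": 0.0, "value": value, "policy": policy}]
    for action in actions:
        state, reward = dynamics(state, action)
        policy, value = prediction(state)
        steps.append({"reward": reward, "value": value, "policy": policy})
    return steps


for step in unroll([1.0, 0.0, 0.5], actions=[0, 1, 1]):
    print(step)
```

Note that the real environment is consulted only for the first observation; every later step is predicted from the hidden state.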

Features

Further improvements

These improvements are areas of active research. They are personal ideas that go beyond the MuZero paper. We are open to contributions and other ideas.

Hyperparameter search
Continuous action space
Tool to understand the learned model
Support of stochastic environments
Support of more than two player games
RL tricks (Never Give Up, Adaptive Exploration, ...)

Demo

All performance metrics are tracked and displayed in real time in TensorBoard:

Testing Lunar Lander:

Games already implemented

Cartpole (Tested with the fully connected network)
Lunar Lander (Tested in deterministic mode with the fully connected network)
Gridworld (Tested with the fully connected network)
Tic-tac-toe (Tested with the fully connected network and the residual network)
Connect4 (Slightly tested with the residual network)
Gomoku
Twenty-One / Blackjack (Tested with the residual network)
Atari Breakout

Tests are run on Ubuntu with 16 GB RAM / Intel i7 / GTX 1050Ti Max-Q. We check that training progresses and reaches a level showing that the agent has learned, but we do not systematically reach human level. For certain environments, we observe a regression after some time. The proposed configurations are certainly not optimal, and for now we do not focus on hyperparameter optimization. Any help is welcome.

Code structure

Network summary:

Getting started

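Prerequisites

Visual Studio Code

WSL2

Chinese fonts for pygame

Chinese fonts

Chinese fonts for matplotlib

Chinese fonts

Install MongoDB

Installing it on Windows is recommended; see the reference documentation.

To install MongoDB (version 6.0) on WSL (Ubuntu 20.04):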

Anaconda

Download
Install

bash Anaconda3-2022.05-Linux-x86_64.sh

Update

conda update -n base -c defaults conda

pytorch

Install

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

Verify
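
One way to check the PyTorch installation is the short script below. It is a minimal sketch, not the repository's own verification step: the helper name `describe_torch` is hypothetical, while `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
def describe_torch():
    """Report the installed PyTorch build, or note that it is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(describe_torch())
```

If `cuda_available` is false on a machine with a GPU, the installed wheel and the driver's CUDA version are the first things to compare.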

ray

rl environment

# Install Ray with support for the dashboard + cluster launcher
pip install -U "ray[default]"

# Install Ray with minimal dependencies
# pip install -U ray

Installation

Source code

git clone https://github.com/liudengfeng/muzeroxq.git
cd muzeroxq
conda activate rl

Install packages

pip install -r requirements.txt

pip install -U tensorboard-plugin-profile

Build

Launch Visual Studio Code and set CMake to Release mode.

Build
Install

cd muzeroxq
conda activate rl

pip install . 

# Editable install for debugging
# pip install -e .

Test

cd muzeroxq
pytest --html report.html

Run

python muzero.py

To visualize the training results, run in a new terminal:

tensorboard --logdir ./results

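Accessing WSL2 over the LAN

Goal: allow WSL2 instances on the same local network to reach each other.

Fix the WSL2 address. Edit /etc/wsl.conf so that it contains the following, which guards against the IP address not being updated: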

[network]
generateHosts = false

If the file does not exist, create wsl.conf locally, edit it, and then move it into /etc:

sudo mv wsl.conf /etc/

Install the networking tools

sudo apt install net-tools

Run the PowerShell script. The following steps require PowerShell to be opened as administrator.

Set the script execution policy

Set-ExecutionPolicy RemoteSigned -Scope CurrentUser

Edit the script file. File name: WSL2.ps1

$remoteport = bash.exe -c "ifconfig eth0 | grep 'inet '"
$found = $remoteport -match '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';

if( $found ){
  $remoteport = $matches[0];
} else{
  Write-Output "The Script Exited, the ip address of WSL 2 cannot be found";
  exit;
}

#[Ports]

# 6379 for ray
# 40050 for redis
# All the ports you want to forward, separated by commas
$ports=@(6379,40050,40051,40052);


#[Static ip]
#You can change the addr to your ip config to listen to a specific address
$addr='0.0.0.0';
$ports_a = $ports -join ",";


# Remove the old firewall exception rules
Invoke-Expression "Remove-NetFireWallRule -DisplayName 'WSL2 Firewall Unlock' ";

# Add exception rules for inbound and outbound traffic
Invoke-Expression "New-NetFireWallRule -DisplayName 'WSL2 Firewall Unlock' -Direction Outbound -LocalPort $ports_a -Action Allow -Protocol TCP";
Invoke-Expression "New-NetFireWallRule -DisplayName 'WSL2 Firewall Unlock' -Direction Inbound -LocalPort $ports_a -Action Allow -Protocol TCP";

for( $i = 0; $i -lt $ports.length; $i++ ){
  $port = $ports[$i];
  # Delete the old port-forwarding rule
  Invoke-Expression "netsh interface portproxy delete v4tov4 listenport=$port listenaddress=$addr";
  # Add the new port-forwarding rule
  Invoke-Expression "netsh interface portproxy add v4tov4 listenport=$port listenaddress=$addr connectport=$port connectaddress=$remoteport";
}

.\WSL2.ps1
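
To make the logic of the script above easier to follow, here is a small Python sketch of its two core steps: extracting the first IPv4 address from the `ifconfig` output, and building one delete/add pair of `netsh interface portproxy` commands per port. The function names and the sample `ifconfig` line are made up for illustration; the sketch only builds command strings and does not run them.

```python
import re

# Same pattern the PowerShell script uses with -match.
IPV4 = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def first_ipv4(text):
    """Return the first IPv4-looking token in text, or None."""
    match = IPV4.search(text)
    return match.group(0) if match else None


def portproxy_commands(wsl_ip, ports, listen_addr="0.0.0.0"):
    """Build the delete/add netsh command pair for every forwarded port."""
    commands = []
    for port in ports:
        commands.append(
            "netsh interface portproxy delete v4tov4 "
            f"listenport={port} listenaddress={listen_addr}"
        )
        commands.append(
            "netsh interface portproxy add v4tov4 "
            f"listenport={port} listenaddress={listen_addr} "
            f"connectport={port} connectaddress={wsl_ip}"
        )
    return commands


# Hypothetical ifconfig line; real output will show a different address.
sample = "inet 172.20.144.5  netmask 255.255.240.0  broadcast 172.20.159.255"
wsl_ip = first_ipv4(sample)
for command in portproxy_commands(wsl_ip, [6379, 40050, 40051, 40052]):
    print(command)
```

Because only the first match is used, the netmask and broadcast values on the same line are ignored, which mirrors how `$matches[0]` behaves in the script.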

Verify

Start an HTTP server inside WSL2

python -m http.server 6379

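On the local machine, enter http://localhost:6379/ in the browser address bar; you should see the directory tree.
On another machine in the LAN, enter http://<server IP address>:6379/ in the browser address bar; if the same content appears, the check has passed.

🚨 Ray port configuration: Ray requires bidirectional communication between the nodes of a cluster. Each node must open specific ports to receive incoming network requests, so the procedure above has to be run on every machine.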
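
The browser check above can also be scripted. The sketch below tests whether a TCP port accepts connections, using only the standard library; the helper name `port_is_open` is hypothetical, and the host shown is the local machine, so replace it with the server's address when testing from another computer.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports forwarded by the script; adjust to match your own list.
    for port in (6379, 40050, 40051, 40052):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(f"port {port}: {state}")
```

A port reported as closed from a remote machine but open locally usually points at the firewall rule or the port-forwarding entry rather than at the service itself.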

Local Ray cluster

Install on every node

pip install -U "ray[default]"

Start the head node

ray start --head --port=6379

Start the worker nodes

# Note: the head node address is the IP address of the Windows host, not the WSL2 IP address
ray start --address=<head-node-address:port>
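
Once the head and worker nodes are running, a quick way to confirm that they joined the same cluster is to connect from Python and count the nodes. This is a minimal sketch: the helper name `cluster_summary` is hypothetical, and it assumes it is run on a machine where `ray start` has already been executed, so that `address="auto"` can find the running cluster.

```python
def cluster_summary(address="auto"):
    """Connect to a running Ray cluster and summarise it.

    Returns None when Ray is not installed in this environment.
    """
    try:
        import ray
    except ImportError:
        return None
    # address="auto" attaches to a cluster started with `ray start`.
    ray.init(address=address)
    try:
        alive = [node for node in ray.nodes() if node.get("Alive")]
        return {
            "alive_nodes": len(alive),
            "resources": ray.cluster_resources(),
        }
    finally:
        ray.shutdown()


# Example (run on a node that is already part of the cluster):
#   print(cluster_summary())
```

If the number of live nodes is lower than expected, recheck the port forwarding and firewall rules on the machine that is missing.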

Start the Head Node

You can adapt the configurations of each game by editing the MuZeroConfig class of the respective file in the games folder.
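
As an illustration of what adapting a configuration can look like, the sketch below defines a heavily reduced `MuZeroConfig` and a helper that overrides selected hyperparameters. The attribute names and default values are assumptions for illustration and may not match the ones in this repository's game files, and `with_overrides` is a hypothetical helper, not part of the code base.

```python
class MuZeroConfig:
    """Hypothetical, heavily reduced configuration for one game."""

    def __init__(self):
        self.seed = 0
        self.action_space = list(range(2))  # indices of all actions
        self.num_workers = 1        # parallel self-play workers
        self.num_simulations = 50   # MCTS simulations per move
        self.discount = 0.997       # reward discount factor
        self.training_steps = 10000
        self.batch_size = 128
        self.lr_init = 0.02         # initial learning rate


def with_overrides(config, **overrides):
    """Set existing hyperparameters, rejecting misspelled names."""
    for name, value in overrides.items():
        if not hasattr(config, name):
            raise AttributeError(f"unknown hyperparameter: {name}")
        setattr(config, name, value)
    return config


config = with_overrides(MuZeroConfig(), num_simulations=25, lr_init=0.005)
print(config.num_simulations, config.lr_init)
```

Rejecting unknown names is a deliberate choice here: a silently ignored typo in a hyperparameter name is hard to spot once a long training run has started.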

Run

Simple model

conda activate rl
cd ~/github/muzeroxq
python main.py --op train --force --use_wandb

tensorboard --logdir results --load_fast true

Authors

Werner Duvaud
Aurèle Hainaut
Paul Lenoir
Contributors

Please use this bibtex if you want to cite this repository (master branch) in your publications:

@misc{muzero-general,
  author       = {Werner Duvaud and Aurèle Hainaut},
  title        = {MuZero General: Open Reimplementation of MuZero},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/werner-duvaud/muzero-general}},
}

Getting involved

GitHub Issues: For reporting bugs.
Pull Requests: For submitting code contributions.
Discord server: For discussions about development or any general questions.
