liudengfeng/muzero_xq_v2


MuZero General

A commented and documented implementation of MuZero based on the Google DeepMind paper (Nov 2019) and the associated pseudocode. It is designed to be easily adaptable to any game or reinforcement learning environment (such as gym). You only need to add a game file containing the hyperparameters and the game class. Please refer to the documentation and the example.

MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but it requires no knowledge of the environment's underlying dynamics. MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy and transitions. MuZero is also close to Value Prediction Networks. See How it works.
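
As a rough illustration of the paragraph above, the sketch below unrolls a learned model using the three functions described in the MuZero paper (representation, dynamics, prediction). The function bodies are hand-written toy arithmetic standing in for neural networks; they are assumptions made for illustration and are not taken from this repository.

```python
import math


def representation(observation):
    """Toy stand-in for h: map a raw observation to a hidden state."""
    return [0.5 * x for x in observation]


def dynamics(hidden_state, action):
    """Toy stand-in for g: predict the next hidden state and a reward."""
    next_state = [0.9 * s + 0.1 * action for s in hidden_state]
    reward = sum(next_state) / len(next_state)
    return next_state, reward


def prediction(hidden_state, num_actions=2):
    """Toy stand-in for f: predict a policy (softmax) and a value."""
    total = sum(hidden_state)
    logits = [total * (a + 1) for a in range(num_actions)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    norm = sum(exps)
    policy = [e / norm for e in exps]
    return policy, total


def unroll(observation, actions):
    """Plan entirely inside the model: the environment is never queried."""
    state = representation(observation)
    steps = []
    for action in actions:
        policy, value = prediction(state)
        state, reward = dynamics(state, action)
        steps.append({"policy": policy, "value": value, "reward": reward})
    return steps


if __name__ == "__main__":
    for step in unroll([1.0, 2.0], [0, 1, 1]):
        print(step)
```

The point of the sketch is only the data flow: after the first observation is encoded, every later step works on hidden states, which is why no simulator of the environment is needed during planning.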

Features

  • Gymnasium termination and truncation information
  • Residual Network and Fully connected network in PyTorch
  • Multi-Threaded/Asynchronous/Cluster with Ray
  • Multi-GPU support for training and self-play
  • TensorBoard real-time monitoring
  • Model weights automatically saved at checkpoints
  • Single and two player mode
  • Commented and documented
  • Easily adaptable for new games
  • Examples of board games, Gym and Atari games (See list of implemented games)
  • Pretrained weights available

Further improvements

These improvements are active research; they are personal ideas that go beyond the MuZero paper. We are open to contributions and other ideas.

Demo

All performance metrics are tracked and displayed in real time in TensorBoard:

[Image: cartpole training summary]

Testing Lunar Lander:

[Image: lunarlander training preview]

Games already implemented

  • Cartpole (Tested with the fully connected network)
  • Lunar Lander (Tested in deterministic mode with the fully connected network)
  • Gridworld (Tested with the fully connected network)
  • Tic-tac-toe (Tested with the fully connected network and the residual network)
  • Connect4 (Slightly tested with the residual network)
  • Gomoku
  • Twenty-One / Blackjack (Tested with the residual network)
  • Atari Breakout

Tests are run on Ubuntu with 16 GB RAM / Intel i7 / GTX 1050Ti Max-Q. We check that training progresses and reaches a level showing that the agent has learned, but we do not systematically reach human level. For certain environments, we observe a regression after some time. The proposed configurations are certainly not optimal, and for now we do not focus on hyperparameter optimization. Any help is welcome.

Code structure

[Image: code structure]

Network summary:

Getting started

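Prerequisites

Visual Studio Code

WSL2

Chinese fonts for pygame

Chinese fonts

Chinese fonts for matplotlib

Chinese fonts

Install MongoDB

Installing on Windows is recommended; see the reference documentation.

  • Install MongoDB (version 6.0) on WSL (Ubuntu 20.04):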

Anaconda

  • Download

  • Install

bash Anaconda3-2022.05-Linux-x86_64.sh
  • Update
conda update -n base -c defaults conda

pytorch

  • Install
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
  • Verify

[Image: gpu]
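
The verification step can be sketched in Python as follows. This is a minimal check, not the author's own procedure: the helper name `describe_torch_install` is hypothetical, while `torch.cuda.is_available()` and `torch.cuda.device_count()` are standard PyTorch calls. The function degrades gracefully when PyTorch is not installed.

```python
import importlib.util


def describe_torch_install():
    """Report whether PyTorch is importable and whether it can see a GPU."""
    report = {"torch_installed": False, "cuda_available": False, "gpu_count": 0}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in the active environment.
        return report

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    print(describe_torch_install())
```

If `cuda_available` is reported as false after installing the cu116 wheel, the usual suspects are a CPU-only wheel or a missing GPU driver on the WSL2 side.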

ray

  • In the rl environment
# Install Ray with support for the dashboard + cluster launcher
pip install -U "ray[default]"

# Install Ray with minimal dependencies
# pip install -U ray

Installation

  • Source code
git clone https://github.com/liudengfeng/muzeroxq.git
cd muzeroxq
conda activate rl
  • Install packages
pip install -r requirements.txt

pip install -U tensorboard-plugin-profile
  • Build
  1. Launch Visual Studio Code and set CMake to Release mode

[Image: build]

  2. Build

  3. Install

cd muzeroxq
conda activate rl
pip install . 

# Editable install for debugging
# pip install -e .
  4. Test
cd muzeroxq
pytest --html report.html

Run

python muzero.py

To visualize the training results, run in a new terminal:

tensorboard --logdir ./results

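LAN access to WSL2

Goal: let WSL2 instances on the local network reach each other

  1. Fix the WSL2 address. Edit /etc/wsl.conf to avoid stale IP address entries; the file should contain:
[network]
generateHosts = false

If the file does not exist, create wsl.conf locally and then move it into /etc:

sudo mv wsl.conf /etc/
  2. Install the network tools
sudo apt install net-tools
  3. Run the PowerShell script. The following steps require PowerShell opened as administrator.
  • Set the script execution policy
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser
  • Create the script file, named WSL2.ps1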
$remoteport = bash.exe -c "ifconfig eth0 | grep 'inet '"
$found = $remoteport -match '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';

if( $found ){
  $remoteport = $matches[0];
} else{
  Write-Output "The Script Exited, the ip address of WSL 2 cannot be found";
  exit;
}

#[Ports]

# 6379 for ray
# 40050 for redis
#All the ports you want to forward, separated by commas
$ports=@(6379,40050,40051,40052);


#[Static ip]
#You can change the addr to your ip config to listen to a specific address
$addr='0.0.0.0';
$ports_a = $ports -join ",";


#Remove Firewall Exception Rules
# Remove the old firewall rules
Invoke-Expression "Remove-NetFireWallRule -DisplayName 'WSL2 Firewall Unlock' ";

# #adding Exception Rules for inbound and outbound Rules
Invoke-Expression "New-NetFireWallRule -DisplayName 'WSL2 Firewall Unlock' -Direction Outbound -LocalPort $ports_a -Action Allow -Protocol TCP";
Invoke-Expression "New-NetFireWallRule -DisplayName 'WSL2 Firewall Unlock' -Direction Inbound -LocalPort $ports_a -Action Allow -Protocol TCP";

for( $i = 0; $i -lt $ports.length; $i++ ){
  $port = $ports[$i];
  # Delete the old port forwarding
  Invoke-Expression "netsh interface portproxy delete v4tov4 listenport=$port listenaddress=$addr";
  # Add the new port forwarding
  Invoke-Expression "netsh interface portproxy add v4tov4 listenport=$port listenaddress=$addr connectport=$port connectaddress=$remoteport";
}
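  • Run from the directory containing WSL2.ps1
.\WSL2.ps1
  4. Verify
  • Start an HTTP server inside WSL2
python -m http.server 6379
  • On the same machine, open http://localhost:6379/ in a browser; a directory listing should appear

  • On another machine on the local network, open http://<server IP address>:6379/ in a browser; if the same content appears, the verification has passed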
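
The browser check above can also be scripted. The sketch below is one possible way to test whether the forwarded ports accept TCP connections, using only the Python standard library. The helper name `is_port_open` is hypothetical, and the host `127.0.0.1` in the usage block is a placeholder to replace with the server's address.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder host: replace with the Windows host's LAN address.
    host = "127.0.0.1"
    # The ports forwarded by WSL2.ps1.
    for port in (6379, 40050, 40051, 40052):
        state = "open" if is_port_open(host, port) else "closed"
        print(f"{host}:{port} is {state}")
```

A port only shows as open when something is listening on it inside WSL2, so run the check while the HTTP server (or Ray) is up.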

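🚨 Ray port configuration: Ray requires bidirectional communication between the nodes in a cluster. Each node should open specific ports to receive incoming network requests. The procedure above therefore needs to be run on every machine.

  5. Local Ray cluster
  • Install on every node
pip install -U "ray[default]"
  • Start the head node
ray start --head --port=6379
  • Start the worker nodes
# Note: the head node IP address is the Windows host's IP address, not the WSL2 IP address
ray start --address=<head-node-address:port>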
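
Once the head and worker nodes are running, a short script can confirm that the workers actually joined. The sketch below assumes it is run on a machine that is already part of the cluster; the helper names `count_alive` and `cluster_summary` are hypothetical, while `ray.init`, `ray.nodes`, `ray.cluster_resources` and `ray.shutdown` are standard Ray calls.

```python
def count_alive(nodes):
    """Count node records (as returned by ray.nodes()) marked as alive."""
    return sum(1 for node in nodes if node.get("Alive"))


def cluster_summary(address="auto"):
    """Connect to a running Ray cluster and summarize its nodes and resources.

    address="auto" attaches to the cluster started on this machine with
    `ray start`; this raises an error if no cluster is running.
    """
    import ray  # imported here so the helper above works without Ray installed

    ray.init(address=address)
    try:
        return {
            "alive_nodes": count_alive(ray.nodes()),
            "resources": ray.cluster_resources(),
        }
    finally:
        ray.shutdown()
```

Calling `cluster_summary()` on the head node should report one alive node per machine that has joined; a lower count usually points back to the port forwarding or firewall steps above.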

Start the Head Node

You can adapt the configurations of each game by editing the MuZeroConfig class of the respective file in the games folder.
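
As an illustration only, a game configuration might look like the sketch below. The attribute names follow the upstream muzero-general project and are an assumption here; the values are placeholders, and the `override` helper is hypothetical. Check the actual game file in the games folder for the fields it really defines.

```python
class MuZeroConfig:
    """Illustrative subset of per-game hyperparameters (placeholder values)."""

    def __init__(self):
        self.seed = 0                        # random seed
        self.observation_shape = (1, 1, 4)   # shape of one observation
        self.action_space = list(range(2))   # all legal action indices
        self.players = list(range(1))        # one entry per player
        self.num_workers = 1                 # parallel self-play workers
        self.num_simulations = 50            # MCTS simulations per move
        self.discount = 0.997                # reward discount factor
        self.training_steps = 10000          # total optimizer steps
        self.batch_size = 128
        self.lr_init = 0.02                  # initial learning rate


def override(config, **changes):
    """Apply hyperparameter changes, rejecting names the config does not define."""
    for name, value in changes.items():
        if not hasattr(config, name):
            raise AttributeError(f"unknown hyperparameter: {name}")
        setattr(config, name, value)
    return config


if __name__ == "__main__":
    config = override(MuZeroConfig(), num_simulations=25, num_workers=2)
    print(config.num_simulations, config.num_workers)
```

Rejecting unknown names catches typos such as `num_simulation`, which would otherwise silently create a new attribute that training never reads.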

Run

Simple model

conda activate rl
cd ~/github/muzeroxq
python main.py --op train --force --use_wandb
tensorboard --logdir results --load_fast true

Authors

Please use this bibtex if you want to cite this repository (master branch) in your publications:

@misc{muzero-general,
  author       = {Werner Duvaud, Aurèle Hainaut},
  title        = {MuZero General: Open Reimplementation of MuZero},
  year         = {2019},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/werner-duvaud/muzero-general}},
}

Getting involved
