Reinforcement Learning for Othello

An Othello environment for learning and testing reinforcement learning algorithms. Run Play.py to start the game.

Requirements:

Othello Environments:
1. Parallelisable: games run in batches to make full use of the GPU. Performance may be weaker at small batch sizes, but keeping the environment on the GPU still avoids the cost of transferring data between the CPU and GPU.
2. Supports the standard Gymnasium interface.
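
As a rough illustration of the Gymnasium calling convention (`reset` returns `(obs, info)`, `step` returns `(obs, reward, terminated, truncated, info)`), here is a minimal single-board sketch in plain Python. It is not the code in Othello.py, which is batched and runs on the GPU. The class name `OthelloEnvSketch`, the `action_mask` info key, the board encoding (1 = black, -1 = white, 0 = empty), and the reward from black's point of view are all assumptions made for this example.

```python
# Sketch only: single board, plain Python, hypothetical names throughout.
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def flips(board, player, r, c):
    """Return the opponent discs that placing `player` at (r, c) would flip."""
    if board[r][c] != 0:
        return []
    out = []
    for dr, dc in DIRS:
        line = []
        i, j = r + dr, c + dc
        # Walk over a run of opponent discs in this direction.
        while 0 <= i < 8 and 0 <= j < 8 and board[i][j] == -player:
            line.append((i, j))
            i, j = i + dr, j + dc
        # The run only counts if it is closed by one of our own discs.
        if line and 0 <= i < 8 and 0 <= j < 8 and board[i][j] == player:
            out.extend(line)
    return out

def legal_mask(board, player):
    """Flat list of 64 booleans, True where `player` may move."""
    return [bool(flips(board, player, r, c)) for r in range(8) for c in range(8)]

class OthelloEnvSketch:
    def reset(self, seed=None, options=None):
        self.board = [[0] * 8 for _ in range(8)]
        self.board[3][3] = self.board[4][4] = -1  # white
        self.board[3][4] = self.board[4][3] = 1   # black, moves first
        self.player = 1
        return self._obs(), {"action_mask": legal_mask(self.board, self.player)}

    def step(self, action):
        r, c = divmod(action, 8)
        captured = flips(self.board, self.player, r, c)
        if not captured:
            raise ValueError("illegal move")
        self.board[r][c] = self.player
        for i, j in captured:
            self.board[i][j] = self.player
        # Switch sides; if the opponent has no move, the turn passes back.
        self.player = -self.player
        if not any(legal_mask(self.board, self.player)):
            self.player = -self.player
        mask = legal_mask(self.board, self.player)
        terminated = not any(mask)  # neither side can move
        reward = 0.0
        if terminated:
            diff = sum(map(sum, self.board))
            reward = float((diff > 0) - (diff < 0))  # +1 black wins, -1 white wins
        return self._obs(), reward, terminated, False, {"action_mask": mask}

    def _obs(self):
        return [row[:] for row in self.board]
```

Returning the legal-move mask in `info` is one convenient way to hand it to a policy that masks invalid positions. A batched version would carry a leading batch dimension through the same steps.
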
Policy Gradient:
1. Only Actor.
2. Online Learning.
3. Uses a mask to eliminate moves on invalid positions, and modifies the default probability distribution used in calculating the KL divergence accordingly.
4. Win rate versus a random player: 99%.
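
The masking idea in item 3 can be sketched as follows. This is one plausible reading, not necessarily what PolicyGradient.ipynb does: the softmax is taken over legal squares only, and the reference distribution in the KL term is assumed to be uniform over those same legal squares. The function names are hypothetical, and plain Python is used for clarity in place of tensors.

```python
import math

def masked_softmax(logits, mask):
    """Softmax over legal entries only; illegal entries get probability 0."""
    legal = [x for x, m in zip(logits, mask) if m]
    if not legal:
        raise ValueError("no legal move")
    top = max(legal)  # subtract the max for numerical stability
    exps = [math.exp(x - top) if m else 0.0 for x, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def kl_to_uniform_over_legal(probs, mask):
    """KL(probs || u), with u uniform over the legal moves (an assumption)."""
    n_legal = sum(1 for m in mask if m)
    u = 1.0 / n_legal
    return sum(p * math.log(p / u) for p, m in zip(probs, mask) if m and p > 0.0)

probs = masked_softmax([1.0, 2.0, 3.0, 0.0], [True, False, True, False])
kl = kl_to_uniform_over_legal(probs, [True, False, True, False])
```

Restricting the reference distribution to legal moves keeps the KL term finite and equal to zero when the policy is uniform over the moves it is actually allowed to play.
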
PPO:
1. Only Actor.
2. Online Learning.
3. Win rate versus a random player: 99.8%.
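
For reference, the standard PPO clipped surrogate objective can be sketched as below. This is the textbook form, not code taken from PPO.ipynb. The function name and the clipping range `eps=0.2` are assumptions, and how advantages are obtained in the actor-only setting is left open here.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), to be maximised."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clip removes the incentive to push the probability ratio outside `[1 - eps, 1 + eps]` in the direction that would increase the objective, which limits how far a single update moves the policy.
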
PPO (Actor + Critic):
1. Actor + Critic.
2. GAE (Generalized Advantage Estimation) to reduce the variance of the advantage estimates.
3. Win rate versus a random player: 99.8%.
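
A minimal sketch of GAE for a single trajectory, in plain Python. The function name, the argument layout (`values` holds one extra bootstrap entry), and the defaults `gamma=0.99` and `lam=0.95` are assumptions for this example, not values taken from PPO_critic.ipynb.

```python
def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.

    `values` must have len(rewards) + 1 entries; the last one is the
    bootstrap value of the state after the final step.
    """
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0
        # One-step TD error from the critic's value estimates.
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # Exponentially weighted sum of TD errors, reset at episode ends.
        running = delta + gamma * lam * nonterminal * running
        adv[t] = running
    return adv
```

With `lam=0` this reduces to the one-step TD error (low variance, more bias), and with `lam=1` to the discounted return minus the value baseline (less bias, higher variance). Intermediate values trade the two off.
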
