An Othello environment for learning and testing reinforcement learning. Run Play.py to start the game.
Requirements:
- numpy
- matplotlib
- torch
- einops
- tqdm
- gymnasium
Othello Environments:
- Parallelisable: games run as a batch on the GPU. The speed-up is smaller at small batch sizes, but running on the GPU still avoids the cost of transferring data between the CPU and GPU.
- Supports the standard Gymnasium interface.
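To make the interface concrete, here is a minimal single-board sketch in plain Python. It is not the repository's batched GPU environment: the class name `TinyOthelloEnv`, the `action_mask` entry in `info`, and the end-of-game reward are all assumptions made for illustration. It only mirrors the Gymnasium convention that `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`.

```python
# Hypothetical single-board sketch; names such as TinyOthelloEnv are illustrative only.
EMPTY, BLACK, WHITE = 0, 1, -1
DIRECTIONS = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


class TinyOthelloEnv:
    """Mirrors the Gymnasium reset/step signatures without importing gymnasium."""

    def reset(self, seed=None, options=None):
        self.board = [[EMPTY] * 8 for _ in range(8)]
        self.board[3][3] = self.board[4][4] = WHITE
        self.board[3][4] = self.board[4][3] = BLACK
        self.player = BLACK  # Black moves first
        return self._obs(), {"action_mask": self.action_mask(self.player)}

    def _obs(self):
        return [row[:] for row in self.board]

    def _flips(self, r, c, player):
        """Discs that `player` would flip by playing at (r, c); empty if illegal."""
        if self.board[r][c] != EMPTY:
            return []
        flips = []
        for dr, dc in DIRECTIONS:
            line, rr, cc = [], r + dr, c + dc
            while 0 <= rr < 8 and 0 <= cc < 8 and self.board[rr][cc] == -player:
                line.append((rr, cc))
                rr, cc = rr + dr, cc + dc
            # The run of opponent discs must be closed by one of the player's own discs.
            if line and 0 <= rr < 8 and 0 <= cc < 8 and self.board[rr][cc] == player:
                flips.extend(line)
        return flips

    def action_mask(self, player):
        """64 booleans, True where `player` has a legal move (row-major order)."""
        return [bool(self._flips(r, c, player)) for r in range(8) for c in range(8)]

    def step(self, action):
        r, c = divmod(action, 8)
        mover = self.player
        flips = self._flips(r, c, mover)
        if not flips:
            raise ValueError(f"illegal move: {action}")
        for rr, cc in flips + [(r, c)]:
            self.board[rr][cc] = mover
        # The opponent moves next unless they must pass;
        # the game ends when neither side can move.
        if any(self.action_mask(-mover)):
            self.player = -mover
        terminated = not any(self.action_mask(self.player))
        reward = 0.0
        if terminated:
            # Assumed reward: +1 / -1 / 0 for win / loss / draw, from the mover's view.
            diff = sum(map(sum, self.board)) * mover
            reward = float((diff > 0) - (diff < 0))
        info = {"action_mask": self.action_mask(self.player)}
        return self._obs(), reward, terminated, False, info
```

Returning the legal-move mask in `info` keeps the observation a plain board while still letting a policy mask invalid moves before sampling.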
Policy Gradient:
- Actor only.
- Online learning.
- Uses a mask to eliminate moves on invalid positions, and modifies the default probability distribution used in calculating the KL divergence accordingly.
- Win rate versus a random player: 99%.
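The masking idea can be sketched as follows. This is one possible reading, not necessarily the repository's implementation: it assumes the "default" distribution is uniform, and restricts it to the valid moves so that the KL divergence stays finite. The function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over valid moves only; invalid moves get probability exactly 0."""
    valid = [l for l, m in zip(logits, mask) if m]
    if not valid:
        raise ValueError("at least one move must be valid")
    mx = max(valid)  # subtract the max for numerical stability
    exps = [math.exp(l - mx) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def masked_uniform(mask):
    """Assumed default distribution: uniform over the valid moves only."""
    n = sum(1 for m in mask if m)
    return [1.0 / n if m else 0.0 for m in mask]


def kl_divergence(p, q):
    """KL(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


mask = [True, True, False, False]
policy = masked_softmax([math.log(3.0), 0.0, 9.0, 9.0], mask)
kl = kl_divergence(policy, masked_uniform(mask))
```

Because both distributions are zero on the same invalid positions, the large logits on masked moves have no effect on either the policy or the KL term.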
PPO:
- Actor only.
- Online learning.
- Win rate versus a random player: 99.8%.
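For reference, the standard PPO clipped surrogate loss can be sketched like this. The function name, the argument names, and the default `clip_eps=0.2` are assumptions, and plain Python lists stand in for tensors. How the advantages are obtained in the actor-only setting (for example, from game outcomes) is not specified here.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # The pessimistic (smaller) objective limits how far one update can move the policy.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

With a ratio of 2 and a positive advantage, the objective is capped at `1.2 * A`; with a negative advantage the unclipped term is kept, so harmful moves are still fully penalised.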
PPO:
- Actor + Critic.
- GAE (Generalized Advantage Estimation) to reduce the variance of the advantage estimates.
- Win rate versus a random player: 99.8%.
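A minimal sketch of GAE for a single trajectory is given below. The function name, argument layout, and default `gamma` and `lam` values are assumptions rather than the repository's settings. `values` holds the critic's estimates for each visited state and `last_value` is the estimate for the state after the final step.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Return (advantages, returns) using Generalized Advantage Estimation."""
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        nonterminal = 0.0 if dones[t] else 1.0  # do not bootstrap across episode ends
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]  # TD error
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
        next_value = values[t]
    # The critic's regression targets are advantage plus baseline.
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

Setting `lam=1` recovers the Monte Carlo return minus the baseline (high variance, low bias), while `lam=0` reduces to the one-step TD error (low variance, more bias).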
More Algorithms:
- PPO ✅
- SAC ⌛
- DQN ⌛
- DDQN ⌛
- TD3 ⌛
More Tricks:
- GAE + Critic ✅
- Monte-Carlo Search ⌛