Commit

initial commit
polarbart committed Feb 3, 2021
1 parent 0e829a1 commit 5274816
Showing 17 changed files with 219 additions and 45 deletions.
173 changes: 164 additions & 9 deletions README.md
@@ -1,15 +1,170 @@
# Super Hexagon AI

- Based on reinforcement learning (Distributional Q-Learning [1] with Noisy Nets [2])
- Beats all six levels easily end to end (i.e. from pixel input to action output)
- A level counts as beaten if the agent survives for at least 60 seconds
- Fast C++ implementation that hooks into the game and hands the frames over to the Python process

## Example
<a href="https://youtu.be/gjqSZ_4mQjg" target="_blank"> <img src="images/thumbnail+controls.png"> </a>

## Results

- The graphs below show the performance of the AI
- The AI was trained for roughly 1.5 days of real time, corresponding to roughly 9 days of in-game time
- The AI learns to complete all levels after only a few hours of in-game time
- A level counts as completed if the player survives for at least 60 seconds

<div align="center">
<img src="images/smoothed_time_alive_2.png" width="100%">
<br>
<img src="images/highscores_2.png" width="100%">
</div>

### Glitches
The AI discovered that, in some situations, it can glitch through a wall.

The left glitch was also discovered by <a href="https://youtu.be/CI04x9au_Es" target="_blank">others</a>.
The right glitch may only be possible when rotation is disabled.

<div align="center">
<img src="images/a_compressed.gif" width="47%">
&nbsp;
<img src="images/b_compressed2.gif" width="47%">
</div>

## Implementation Details

### Training
- The reinforcement learning agent receives a reward of -1 if it dies and a reward of 0 otherwise
- The network has two convolutional streams (a sketch of the architecture is shown below)
- For the first stream the frame is cropped to a square and resized to 60 by 60 pixels
- Since the small triangle controlled by the player is barely visible in this downscaled input,
the second stream receives a zoomed-in crop around the player
- For the fully connected part of the network the feature maps of both streams are flattened and concatenated
- See `utils.Network` for more implementation details
- Additionally, a threshold function is applied to each frame
so that the walls and the player belong to the foreground and everything else belongs to the background
- See `superhexagon.SuperHexagonInterface._preprocess_frame` for more implementation details
- The hyperparameters used for training can be found at the bottom of `trainer.py` below `if __name__ == '__main__':`
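
A minimal sketch of such a two-stream network is shown below.
It only illustrates how the full frame and the zoomed-in crop are processed by separate convolutional streams whose features are concatenated before the fully connected head;
the layer sizes and names are illustrative assumptions and do not reproduce `utils.Network` exactly.

```
import torch
import torch.nn as nn


class TwoStreamSketch(nn.Module):
    """Illustrative two-stream CNN; layer sizes are assumptions, not the exact utils.Network."""

    def __init__(self, n_frames: int, n_actions: int, n_atoms: int):
        super().__init__()

        def stream():
            # small convolutional stack applied to a stack of n_frames binary 60x60 frames
            return nn.Sequential(
                nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
            )

        self.full = stream()  # stream 1: frame cropped to a square and resized to 60x60
        self.zoom = stream()  # stream 2: zoomed-in crop around the player triangle
        # fully connected head on the concatenated features;
        # for 60x60 inputs each stream yields 64 * 6 * 6 = 2304 features
        self.head = nn.Sequential(
            nn.Linear(2 * 2304, 512), nn.ReLU(),
            nn.Linear(512, n_actions * n_atoms),
        )
        self.n_actions, self.n_atoms = n_actions, n_atoms

    def forward(self, frame, frame_cropped):
        x = torch.cat([self.full(frame.float()), self.zoom(frame_cropped.float())], dim=1)
        logits = self.head(x).view(-1, self.n_actions, self.n_atoms)
        return torch.softmax(logits, dim=-1)  # one return distribution per action
```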


### Rainbow
- All six Rainbow [3] extensions have been evaluated
- Double Q-Learning [4] and Dueling Networks [5] did not improve the performance
- n-step returns significantly decreased the performance
- Prioritized experience replay [6] performs better at first;
however, after roughly 300,000 training steps the agent trained without it performs better
- The distributional approach [1] significantly increases the performance of the agent
(see the snippet after this list for how a greedy action is derived from the predicted return distribution)
- Distributional RL with quantile regression [7] gives similar results
- Noisy networks [2] facilitate the exploration process
- However, the noise is turned off after 500,000 training iterations
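
With the distributional approach the network does not output a single Q-value per action but a probability distribution over a fixed support of possible returns.
The sketch below shows how a greedy action can be derived from such an output;
the support `np.linspace(-1, 0, n_atoms)` matches the one used in `eval.py`, while the function and variable names are illustrative.

```
import numpy as np

n_atoms = 51
# returns lie in [-1, 0] because the only reward is -1 on death
support = np.linspace(-1, 0, n_atoms)

def greedy_action(dist):
    # dist: array of shape (n_actions, n_atoms), the predicted probability
    # mass over the support for each action
    q_values = (dist * support).sum(axis=-1)  # expected return per action
    return int(q_values.argmax())
```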

### PyRLHook
In order to train the agent efficiently, a C++ library was written. This library serves two purposes:
Firstly, it efficiently retrieves the frames and sends them to the Python process.
Secondly, it intercepts the system calls used to get the system time so that the game can be run at a desired speed.

To do so, the library injects a DLL into the game's process.
This DLL hooks into the OpenGL function `wglSwapBuffers` as well as the system calls `timeGetTime`, `GetTickCount`, `GetTickCount64`, and `RtlQueryPerformanceCounter`.

The function `wglSwapBuffers` is called every time the game finishes rendering a frame in order to swap the back and front buffers.
The `wglSwapBuffers` hook first locks the game's execution.
If one wants to advance the game by one frame and retrieve the next frame, `GameInterface.step` can be called from Python.
This call releases the lock until `wglSwapBuffers` is called again.
Then the back buffer is copied into a shared memory region so that it can be returned by `GameInterface.step`.

The time perceived by the game can be adjusted with the methods `GameInterface.set_speed(double factor)` and `GameInterface.run_afap(double fps)`.
`set_speed` scales the perceived time by the given factor, i.e. with a factor of `0.5` the game runs at half speed.
`run_afap` makes the game think it runs at the specified FPS, i.e. the perceived time is incremented by `1/fps` every time `GameInterface.step` is called.
A short usage sketch is shown below.
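
The following is a hypothetical minimal example; the constructor and method calls mirror those used in `superhexagon.py`, while the import path is an assumption based on the pybind11 module name `pyrlhook`.

```
from pyrlhook import GameInterface, PixelFormat, PixelDataType

# attach to the running game process and request RGB uint8 frames
game = GameInterface('superhexagon.exe', PixelFormat.RGB, PixelDataType.UINT8)
game.run_afap(62.5)      # the game now believes it runs at 62.5 FPS

frame = game.step(True)  # advance one frame and copy the back buffer from shared memory
game.step(False)         # advance one frame without reading the pixel buffer
print(frame.shape)       # (height, width, channels)
```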

For more implementation details see `RLHookLib/PyRLHook/src/GameInterface.cpp` (especially the constructor and the method `step`)
as well as `RLHookLib/RLHookDLL/src/GameInterfaceDll.cpp` (especially the methods `onAttach`, `initFromRenderThread`, and `hWglSwapBuffers`).

In theory, this library should also work for other games written with OpenGL.

## Usage

The following Python libraries are required:

```
numpy
pytorch
opencv-python
```

Note that both the Python process and the game process need to be run with admin privileges.
In order to always run the game with admin privileges, right-click the Super Hexagon executable `superhexagon.exe`,
select `Properties`, and within the `Compatibility` tab check `Run this program as an administrator`.

Additionally, you need to compile the C++ library.
Make sure you have a Visual Studio compiler installed, including CMake.

The game should be in windowed mode and VSync should be disabled.

Clone the repository.
```
git clone https://github.com/polarbart/SuperHexagonAI.git
cd SuperHexagonAI
```

Compile the DLL as well as a helper executable.
```
cd RLHookLib
python compile_additional_binaries.py
```

Then you can install the library globally using pip.

```
pip install .
```

### Evaluation
In order to run the AI, first download the pretrained network [super_hexagon_net](https://github.com/polarbart/SuperHexagonAI/releases/tag/v1.0)
and place it in the main folder (i.e. the folder where `eval.py` is located).

Then, start the game and execute the `eval.py` script, both with admin privileges.

The level being played as well as other parameters can be adjusted within the script.

### Training
In order to train your own AI run `trainer.py` with admin privileges.

Note that the AI is trained on all six levels simultaneously and that you do not need to start the game manually,
since the script starts the game automatically.
Please adjust the path to the Super Hexagon executable at the bottom of the trainer script if necessary
and make sure that the game is always run with admin privileges as described above.

Since the game sometimes gets stuck within a level, the script automatically restarts the game from time to time.
Therefore, you may want to disable the message box asking for admin privileges, so that these restarts do not require manual confirmation.


## Other People's Approaches
This person (<a href="http://cs231n.stanford.edu/reports/2016/pdfs/115_Report.pdf" target="_blank">pdf</a>) first takes a screenshot of the game.
Then they used a CNN in order to detect the walls and the player.
This information is then used by a hand-crafted decision maker in order to select an action.

This person (<a href="https://github.com/adrianchifor/super-hexagon-ai" target="_blank">github</a>) reads the game's state from the game's memory.
Then they write the correct player position into the game's memory.

This person (<a href="https://crackedopenmind.com/portfolio/super-hexagon-bot/" target="_blank">crackedopenmind.com</a>)
also retrieves the frames by hooking `SwapBuffers`.
Then they analyze the frames with OpenCV and use a hand-crafted decision maker in order to select an action.

Let me know if you find any other approaches, so I can add them here :)

## References

[1] Bellemare, Marc G., Will Dabney, and Rémi Munos. "A distributional perspective on reinforcement learning." arXiv preprint arXiv:1707.06887 (2017).

[2] Fortunato, Meire, et al. "Noisy networks for exploration." arXiv preprint arXiv:1706.10295 (2017).

[3] Hessel, Matteo, et al. "Rainbow: Combining improvements in deep reinforcement learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. No. 1. 2018.

[4] Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep reinforcement learning with double Q-learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 30. No. 1. 2016.

[5] Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." International conference on machine learning. PMLR, 2016.

[6] Schaul, Tom, et al. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).

[7] Dabney, Will, et al. "Distributional reinforcement learning with quantile regression." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. No. 1. 2018.
6 changes: 6 additions & 0 deletions RLHookLib/CMakeLists.txt
@@ -3,6 +3,12 @@ project(RLHookLib)

set(CMAKE_CXX_STANDARD 17)

find_package(Git QUIET)
if(GIT_FOUND AND (NOT EXISTS "${PROJECT_SOURCE_DIR}/libs/pybind11/CMakeLists.txt" OR NOT EXISTS "${PROJECT_SOURCE_DIR}/libs/Detours/Detours/README.md"))
message(STATUS "Cloning pybind11 and detours")
execute_process(COMMAND ${GIT_EXECUTABLE} submodule update --init --recursive)
endif()

if (NOT EXISTS "${PROJECT_SOURCE_DIR}/libs/pybind11/CMakeLists.txt" OR NOT EXISTS "${PROJECT_SOURCE_DIR}/libs/Detours/Detours/README.md")
message(FATAL_ERROR "The submodules pybind11 and/or Detours have not been cloned! Clone them with \"git submodule update --init --recursive\".")
endif()
8 changes: 8 additions & 0 deletions RLHookLib/PyRLHook/src/GameInterface.cpp
@@ -36,6 +36,11 @@ GameInterface::GameInterface(DWORD pid,
gameProcess = OpenProcess(OPEN_PROCESS_ACCESS_RIGHTS, FALSE, pid);
Utils::checkError(gameProcess == nullptr, "OpenProcess");

// check if the game uses OpenGL
HMODULE openGlModule = Utils::getModuleBaseAddress(gameProcess, "Opengl32.dll");
if (openGlModule == nullptr)
throw std::runtime_error("Does the game use OpenGL? Could not find \"Opengl32.dll\" within the target process.");

// create mailslot to which the target process can communicate exceptions
exceptionMailSlot = Utils::createMailslot(EXCEPTION_MS_NAME);

@@ -142,7 +147,9 @@ std::optional<py::array> GameInterface::step(bool readPixelBuffer) {
checkForException();

try {
// request the next frame
pipe.write(readPixelBuffer);
// wait for the game to advance one frame
pipe.read<bool>();
} catch (...) {
checkForException();
@@ -152,6 +159,7 @@
if (!readPixelBuffer)
return std::nullopt;

// return the screen which was written into the shared memory 'pBuf'
return py::array(
pixelDataType == UBYTE ? py::dtype::of<std::uint8_t>() : py::dtype::of<std::float_t>(),
{height, width, channels},
4 changes: 1 addition & 3 deletions RLHookLib/PyRLHook/src/PythonBindings.cpp
@@ -43,8 +43,6 @@ PYBIND11_MODULE(pyrlhook, m) {
.value("UINT8", PixelDataType::UBYTE)
.value("FLOAT32", PixelDataType::FLOAT32);

// py::scoped_interpreter guard {};
py::dict locals;
auto path = py::module::import("sys").attr("prefix").cast<std::string>().append("\\Lib\\site-packages");
auto path = py::module::import("sys").attr("prefix").cast<std::string>() + "\\Lib\\site-packages";
GameInterface::basePath = path;
}
4 changes: 3 additions & 1 deletion RLHookLib/RLHookDLL/src/GameInterfaceDll.cpp
@@ -229,7 +229,7 @@ void GameInterface::onDetach() {

if (isOutOfRenderHook != INVALID_HANDLE_VALUE) {
WaitForSingleObject(isOutOfRenderHook, 1000);
// just to make sure that the game thread is out of this module, wait for another 100ms
// just to make sure that the render thread is out of this module, wait for another 100ms
std::this_thread::sleep_for(std::chrono::milliseconds(100));
CloseHandle(isOutOfRenderHook);
isOutOfRenderHook = INVALID_HANDLE_VALUE;
@@ -276,7 +276,9 @@ void WINAPI GameInterface::hWglSwapBuffers(HDC arg) {
tWglSwapBuffers(arg);

if (clientConnected) {
// notify the python process that the game advanced one frame
pipe.write(true);
// wait for the python process to request the next frame
readPixelBuffer = pipe.read<bool>([] { return !isFinished; });
}

4 changes: 1 addition & 3 deletions RLHookLib/compile_additional_binaries.py
@@ -37,9 +37,7 @@ def build_cmake_project(
f'-DCMAKE_RUNTIME_OUTPUT_DIRECTORY={out_dir}',
f'-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{cfg.upper()}={out_dir}',
f'-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY_{cfg.upper()}={out_dir}',
f'-DCMAKE_RUNTIME_OUTPUT_DIRECTORY_{cfg.upper()}={out_dir}',
#f'-DPYTHON_EXECUTABLE={sys.executable}',
#f'-DCMAKE_BUILD_TYPE={cfg.upper()}'
f'-DCMAKE_RUNTIME_OUTPUT_DIRECTORY_{cfg.upper()}={out_dir}'
]
if additional_binaries:
cmake_args += ['-DONLY_ADDITIONAL_BINARIES=ON']
13 changes: 7 additions & 6 deletions eval.py
@@ -1,26 +1,27 @@
import numpy as np
import torch
from Utils import Network
from SuperHexagon import SuperHexagonInterface
from utils import Network
from superhexagon import SuperHexagonInterface
from itertools import count


if __name__ == '__main__':

# parameters
level = 0
device = 'cuda'
net_path = 'super_hexagon_net'

n_frames = 4
frame_skip = 4
log_every = 1000
n_atoms = 51
level = 0
net_path = 'net1'
device = 'cuda'

# setup
fp, fcp = np.zeros((1, n_frames, *SuperHexagonInterface.frame_size), dtype=np.bool), np.zeros((1, n_frames, *SuperHexagonInterface.frame_size_cropped), dtype=np.bool)
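# support of the categorical return distribution; returns lie in [-1, 0] since the only reward is -1 on death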
support = np.linspace(-1, 0, n_atoms)

net = Network(n_frames, SuperHexagonInterface.n_actions, n_atoms, 1024).cuda()
net = Network(n_frames, SuperHexagonInterface.n_actions, n_atoms).cuda()
net.load_state_dict(torch.load(net_path))
net.eval()

Binary file added images/a_compressed.gif
Binary file added images/b_compressed2.gif
Binary file added images/highscores_2.png
Binary file added images/smoothed_time_alive_2.png
Binary file added images/thumbnail+controls.png
Binary file added images/thumbnail.png
Binary file added images/thumbnail.xcf
8 changes: 5 additions & 3 deletions SuperHexagon.py → superhexagon.py
@@ -120,6 +120,8 @@ def _restart_game(self):
def _attach_game(self):
self.game = GameInterface('superhexagon.exe', PixelFormat.RGB, PixelDataType.UINT8)

# make the game think it runs at 62.5 FPS no matter how frequently self.game.step is called.
# afap -> as fast as possible
if self.run_afap:
self.game.run_afap(62.5)

@@ -200,7 +202,7 @@ def select_level(self, level: int):
for _ in range(30):
self.game.step(False)

def _postprocess_frame(self, frame):
def _preprocess_frame(self, frame):
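# crop the frame to a square, downscale it, and convert it to grayscale;
# do the same for a small zoomed-in crop around the player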
f = cv2.cvtColor(cv2.resize(frame[:, 144:624], self.frame_size, interpolation=cv2.INTER_NEAREST), cv2.COLOR_RGB2GRAY)
fc = cv2.cvtColor(cv2.resize(frame[150:330, 294:474], self.frame_size_cropped, interpolation=cv2.INTER_NEAREST), cv2.COLOR_RGB2GRAY)
center_color = f[self.frame_size[0] // 2, self.frame_size[1] // 2]
@@ -236,7 +238,7 @@ def reset(self):
self.steps_alive = self.get_n_survived_frames()
self.simulated_steps = 0

return self._postprocess_frame(frame)
return self._preprocess_frame(frame)

def step(self, action):

@@ -255,7 +257,7 @@

is_game_over = self.steps_alive < steps_alive_old + self.frame_skip

frame, frame_cropped = self._postprocess_frame(frame)
frame, frame_cropped = self._preprocess_frame(frame)

if action == 1:
self._left(False)