Commit

initial commit
polarbart committed Feb 3, 2021
1 parent 0e829a1 commit 5274816
Showing 17 changed files with 219 additions and 45 deletions.
173 changes: 164 additions & 9 deletions README.md
@@ -1,15 +1,170 @@
# Super Hexagon AI

- Based on reinforcement learning (Distributional Q-Learning [1] with Noisy Nets [2])
- Beats all six levels easily end to end (i.e. from pixel input to action output)
- A level counts as beaten if the agent survives for at least 60 seconds
- Fast C++ implementation that hooks into the game and hands the frames over to the Python process

## Example
<a href="https://youtu.be/gjqSZ_4mQjg" target="_blank"> <img src="images/thumbnail+controls.png"> </a>

## Results

- The graphs below show the performance of the AI
- The AI was trained for roughly 1.5 days of real time, corresponding to roughly 9 days of in-game time
- The AI learns to complete all levels after only a few hours of in-game time
- A level counts as completed if the player survives for at least 60 seconds

<div align="center">
<img src="images/smoothed_time_alive_2.png" width="100%">
<br>
<img src="images/highscores_2.png" width="100%">
</div>

### Glitches
The AI discovered that, in some situations, it can glitch through a wall.

The left glitch was also discovered by <a href="https://youtu.be/CI04x9au_Es" target="_blank">others</a>.
The right glitch may only be possible when rotation is disabled.

<div align="center">
<img src="images/a_compressed.gif" width="47%">
&nbsp;
<img src="images/b_compressed2.gif" width="47%">
</div>

## Implementation Details

### Training
- The reinforcement learning agent receives a reward of -1 if it dies and a reward of 0 otherwise
- The network has two convolutional streams (a sketch of the architecture is shown below)
- For the first stream the frame is cropped to a square and resized to 60 by 60 pixels
- Since the small triangle controlled by the player is barely visible in this downscaled input,
the second stream receives a zoomed-in crop around the player
- For the fully connected part of the network the feature maps of both streams are flattened and concatenated
- See `utils.Network` for more implementation details
- Additionally, a threshold function is applied to each frame
so that the walls and the player belong to the foreground and everything else belongs to the background
- See `superhexagon.SuperHexagonInterface._preprocess_frame` for more implementation details
- The hyperparameters used for training can be found at the bottom of `trainer.py` below `if __name__ == '__main__':`
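
A minimal sketch of such a two-stream network is shown below.
It only illustrates how the full frame and the zoomed-in crop are processed by separate convolutional streams whose features are concatenated before the fully connected head;
the layer sizes and names are illustrative assumptions and do not reproduce `utils.Network` exactly.

```
import torch
import torch.nn as nn


class TwoStreamSketch(nn.Module):
    """Illustrative two-stream CNN; layer sizes are assumptions, not the exact utils.Network."""

    def __init__(self, n_frames: int, n_actions: int, n_atoms: int):
        super().__init__()

        def stream():
            # small convolutional stack applied to a stack of n_frames binary 60x60 frames
            return nn.Sequential(
                nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Flatten(),
            )

        self.full = stream()  # stream 1: frame cropped to a square and resized to 60x60
        self.zoom = stream()  # stream 2: zoomed-in crop around the player triangle
        # fully connected head on the concatenated features;
        # for 60x60 inputs each stream yields 64 * 6 * 6 = 2304 features
        self.head = nn.Sequential(
            nn.Linear(2 * 2304, 512), nn.ReLU(),
            nn.Linear(512, n_actions * n_atoms),
        )
        self.n_actions, self.n_atoms = n_actions, n_atoms

    def forward(self, frame, frame_cropped):
        x = torch.cat([self.full(frame.float()), self.zoom(frame_cropped.float())], dim=1)
        logits = self.head(x).view(-1, self.n_actions, self.n_atoms)
        return torch.softmax(logits, dim=-1)  # one return distribution per action
```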


### Rainbow
- All six Rainbow [3] extensions have been evaluated
- Double Q-Learning [4] and Dueling Networks [5] did not improve the performance
- n-step returns significantly decreased the performance
- Prioritized experience replay [6] performs better at first;
however, after roughly 300,000 training steps the agent trained without it performs better
- The distributional approach [1] significantly increases the performance of the agent
(see the snippet after this list for how a greedy action is derived from the predicted return distribution)
- Distributional RL with quantile regression [7] gives similar results
- Noisy networks [2] facilitate the exploration process
- However, the noise is turned off after 500,000 training iterations
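
With the distributional approach the network does not output a single Q-value per action but a probability distribution over a fixed support of possible returns.
The sketch below shows how a greedy action can be derived from such an output;
the support `np.linspace(-1, 0, n_atoms)` matches the one used in `eval.py`, while the function and variable names are illustrative.

```
import numpy as np

n_atoms = 51
# returns lie in [-1, 0] because the only reward is -1 on death
support = np.linspace(-1, 0, n_atoms)

def greedy_action(dist):
    # dist: array of shape (n_actions, n_atoms), the predicted probability
    # mass over the support for each action
    q_values = (dist * support).sum(axis=-1)  # expected return per action
    return int(q_values.argmax())
```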

### PyRLHook
In order to train the agent efficiently, a C++ library was written. This library serves two purposes:
Firstly, it efficiently retrieves the frames and sends them to the Python process.
Secondly, it intercepts the system calls used to get the system time so that the game can be run at a desired speed.

To do so, the library injects a DLL into the game's process.
This DLL hooks into the OpenGL function `wglSwapBuffers` as well as the system calls `timeGetTime`, `GetTickCount`, `GetTickCount64`, and `RtlQueryPerformanceCounter`.

The function `wglSwapBuffers` is called every time the game finishes rendering a frame in order to swap the back and front buffers.
The `wglSwapBuffers` hook first locks the game's execution.
If one wants to advance the game by one frame and retrieve the next frame, `GameInterface.step` can be called from Python.
This call releases the lock until `wglSwapBuffers` is called again.
Then the back buffer is copied into a shared memory region so that it can be returned by `GameInterface.step`.

The time perceived by the game can be adjusted with the methods `GameInterface.set_speed(double factor)` and `GameInterface.run_afap(double fps)`.
`set_speed` scales the perceived time by the given factor, i.e. with a factor of `0.5` the game runs at half speed.
`run_afap` makes the game think it runs at the specified FPS, i.e. the perceived time is incremented by `1/fps` every time `GameInterface.step` is called.
A short usage sketch is shown below.
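
The following is a hypothetical minimal example; the constructor and method calls mirror those used in `superhexagon.py`, while the import path is an assumption based on the pybind11 module name `pyrlhook`.

```
from pyrlhook import GameInterface, PixelFormat, PixelDataType

# attach to the running game process and request RGB uint8 frames
game = GameInterface('superhexagon.exe', PixelFormat.RGB, PixelDataType.UINT8)
game.run_afap(62.5)      # the game now believes it runs at 62.5 FPS

frame = game.step(True)  # advance one frame and copy the back buffer from shared memory
game.step(False)         # advance one frame without reading the pixel buffer
print(frame.shape)       # (height, width, channels)
```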

For more implementation details see `RLHookLib/PyRLHook/src/GameInterface.cpp` (especially the constructor and the method `step`)
as well as `RLHookLib/RLHookDLL/src/GameInterfaceDll.cpp` (especially the methods `onAttach`, `initFromRenderThread`, and `hWglSwapBuffers`).

In theory, this library should also work for other games written with OpenGL.

## Usage

The following Python libraries are required:

```
numpy
pytorch
opencv-python
```

Note that both the Python process and the game process need to be run with admin privileges.
In order to always run the game with admin privileges, right-click the Super Hexagon executable `superhexagon.exe`,
select `Properties`, and within the `Compatibility` tab check `Run this program as an administrator`.

Additionally, you need to compile the C++ library.
Make sure you have a Visual Studio compiler installed, including CMake.

The game should be in windowed mode and VSync should be disabled.

Clone the repository.
```
git clone https://github.com/polarbart/SuperHexagonAI.git
cd SuperHexagonAI
```

Compile the DLL as well as a helper executable.
```
cd RLHookLib
python compile_additional_binaries.py
```

Then you can install the library globally using pip.

```
pip install .
```

### Evaluation
In order to run the AI, first download the pretrained network [super_hexagon_net](https://github.com/polarbart/SuperHexagonAI/releases/tag/v1.0)
and place it in the main folder (i.e. the folder where `eval.py` is located).

Then, start the game and execute the `eval.py` script, both with admin privileges.

The level being played as well as other parameters can be adjusted within the script.

### Training
In order to train your own AI run `trainer.py` with admin privileges.

Note that the AI is trained on all six levels simultaneously and that you do not need to start the game manually,
since the script starts the game automatically.
Please adjust the path to the Super Hexagon executable at the bottom of the trainer script if necessary
and make sure that the game is always run with admin privileges as described above.

Since the game sometimes gets stuck within a level, the script automatically restarts the game from time to time.
Therefore, you may want to disable the message box asking for admin privileges, so that these restarts do not require manual confirmation.


## Other People's Approaches
This person (<a href="http://cs231n.stanford.edu/reports/2016/pdfs/115_Report.pdf" target="_blank">pdf</a>) first takes a screenshot of the game.
Then they used a CNN in order to detect the walls and the player.
This information is then used by a hand-crafted decision maker in order to select an action.

This person (<a href="https://github.com/adrianchifor/super-hexagon-ai" target="_blank">github</a>) reads the game's state from the game's memory.
Then they write the correct player position into the game's memory.

This person (<a href="https://crackedopenmind.com/portfolio/super-hexagon-bot/" target="_blank">crackedopenmind.com</a>)
also retrieves the frames by hooking `SwapBuffers`.
Then they analyze the frames with OpenCV and use a hand-crafted decision maker in order to select an action.

Let me know if you find any other approaches, so I can add them here :)

## References

[1] Bellemare, Marc G., Will Dabney, and Rémi Munos. "A distributional perspective on reinforcement learning." arXiv preprint arXiv:1707.06887 (2017).

[2] Fortunato, Meire, et al. "Noisy networks for exploration." arXiv preprint arXiv:1706.10295 (2017).

[3] Hessel, Matteo, et al. "Rainbow: Combining improvements in deep reinforcement learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. No. 1. 2018.

[4] Van Hasselt, Hado, Arthur Guez, and David Silver. "Deep reinforcement learning with double Q-learning." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 30. No. 1. 2016.

[5] Wang, Ziyu, et al. "Dueling network architectures for deep reinforcement learning." International conference on machine learning. PMLR, 2016.

[6] Schaul, Tom, et al. "Prioritized experience replay." arXiv preprint arXiv:1511.05952 (2015).

[7] Dabney, Will, et al. "Distributional reinforcement learning with quantile regression." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 32. No. 1. 2018.
6 changes: 6 additions & 0 deletions RLHookLib/CMakeLists.txt
@@ -3,6 +3,12 @@ project(RLHookLib)

set(CMAKE_CXX_STANDARD 17)

find_package(Git QUIET)
if(GIT_FOUND AND (NOT EXISTS "${PROJECT_SOURCE_DIR}/libs/pybind11/CMakeLists.txt" OR NOT EXISTS "${PROJECT_SOURCE_DIR}/libs/Detours/Detours/README.md"))
message(STATUS "Cloning pybind11 and detours")
execute_process(COMMAND ${GIT_EXECUTABLE} submodule update --init --recursive)
endif()

if (NOT EXISTS "${PROJECT_SOURCE_DIR}/libs/pybind11/CMakeLists.txt" OR NOT EXISTS "${PROJECT_SOURCE_DIR}/libs/Detours/Detours/README.md")
message(FATAL_ERROR "The submodules pybind11 and/or Detours have not been cloned! Clone them with \"git submodule update --init --recursive\".")
endif()
8 changes: 8 additions & 0 deletions RLHookLib/PyRLHook/src/GameInterface.cpp
@@ -36,6 +36,11 @@ GameInterface::GameInterface(DWORD pid,
gameProcess = OpenProcess(OPEN_PROCESS_ACCESS_RIGHTS, FALSE, pid);
Utils::checkError(gameProcess == nullptr, "OpenProcess");

// check if the game uses OpenGL
HMODULE openGlModule = Utils::getModuleBaseAddress(gameProcess, "Opengl32.dll");
if (openGlModule == nullptr)
throw std::runtime_error("Does the game use OpenGL? Could not find \"Opengl32.dll\" within the target process.");

// create mailslot to which the target process can communicate exceptions
exceptionMailSlot = Utils::createMailslot(EXCEPTION_MS_NAME);

@@ -142,7 +147,9 @@ std::optional<py::array> GameInterface::step(bool readPixelBuffer) {
checkForException();

try {
// request the next frame
pipe.write(readPixelBuffer);
// wait for the game to advance one frame
pipe.read<bool>();
} catch (...) {
checkForException();
@@ -152,6 +159,7 @@
if (!readPixelBuffer)
return std::nullopt;

// return the screen which was written into the shared memory 'pBuf'
return py::array(
pixelDataType == UBYTE ? py::dtype::of<std::uint8_t>() : py::dtype::of<std::float_t>(),
{height, width, channels},
4 changes: 1 addition & 3 deletions RLHookLib/PyRLHook/src/PythonBindings.cpp
@@ -43,8 +43,6 @@ PYBIND11_MODULE(pyrlhook, m) {
.value("UINT8", PixelDataType::UBYTE)
.value("FLOAT32", PixelDataType::FLOAT32);

// py::scoped_interpreter guard {};
py::dict locals;
auto path = py::module::import("sys").attr("prefix").cast<std::string>().append("\\Lib\\site-packages");
auto path = py::module::import("sys").attr("prefix").cast<std::string>() + "\\Lib\\site-packages";
GameInterface::basePath = path;
}
4 changes: 3 additions & 1 deletion RLHookLib/RLHookDLL/src/GameInterfaceDll.cpp
@@ -229,7 +229,7 @@ void GameInterface::onDetach() {

if (isOutOfRenderHook != INVALID_HANDLE_VALUE) {
WaitForSingleObject(isOutOfRenderHook, 1000);
// just to make sure that the game thread is out of this module, wait for another 100ms
// just to make sure that the render thread is out of this module, wait for another 100ms
std::this_thread::sleep_for(std::chrono::milliseconds(100));
CloseHandle(isOutOfRenderHook);
isOutOfRenderHook = INVALID_HANDLE_VALUE;
@@ -276,7 +276,9 @@ void WINAPI GameInterface::hWglSwapBuffers(HDC arg) {
tWglSwapBuffers(arg);

if (clientConnected) {
// notify the python process that the game advanced one frame
pipe.write(true);
// wait for the python process to request the next frame
readPixelBuffer = pipe.read<bool>([] { return !isFinished; });
}

4 changes: 1 addition & 3 deletions RLHookLib/compile_additional_binaries.py
@@ -37,9 +37,7 @@ def build_cmake_project(
f'-DCMAKE_RUNTIME_OUTPUT_DIRECTORY={out_dir}',
f'-DCMAKE_LIBRARY_OUTPUT_DIRECTORY_{cfg.upper()}={out_dir}',
f'-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY_{cfg.upper()}={out_dir}',
f'-DCMAKE_RUNTIME_OUTPUT_DIRECTORY_{cfg.upper()}={out_dir}',
#f'-DPYTHON_EXECUTABLE={sys.executable}',
#f'-DCMAKE_BUILD_TYPE={cfg.upper()}'
f'-DCMAKE_RUNTIME_OUTPUT_DIRECTORY_{cfg.upper()}={out_dir}'
]
if additional_binaries:
cmake_args += ['-DONLY_ADDITIONAL_BINARIES=ON']
13 changes: 7 additions & 6 deletions eval.py
@@ -1,26 +1,27 @@
import numpy as np
import torch
from Utils import Network
from SuperHexagon import SuperHexagonInterface
from utils import Network
from superhexagon import SuperHexagonInterface
from itertools import count


if __name__ == '__main__':

# parameters
level = 0
device = 'cuda'
net_path = 'super_hexagon_net'

n_frames = 4
frame_skip = 4
log_every = 1000
n_atoms = 51
level = 0
net_path = 'net1'
device = 'cuda'

# setup
fp, fcp = np.zeros((1, n_frames, *SuperHexagonInterface.frame_size), dtype=np.bool), np.zeros((1, n_frames, *SuperHexagonInterface.frame_size_cropped), dtype=np.bool)
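# support of the categorical return distribution; returns lie in [-1, 0] since the only reward is -1 on death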
support = np.linspace(-1, 0, n_atoms)

net = Network(n_frames, SuperHexagonInterface.n_actions, n_atoms, 1024).cuda()
net = Network(n_frames, SuperHexagonInterface.n_actions, n_atoms).cuda()
net.load_state_dict(torch.load(net_path))
net.eval()

Binary file added images/a_compressed.gif
Binary file added images/b_compressed2.gif
Binary file added images/highscores_2.png
Binary file added images/smoothed_time_alive_2.png
Binary file added images/thumbnail+controls.png
Binary file added images/thumbnail.png
Binary file added images/thumbnail.xcf
8 changes: 5 additions & 3 deletions SuperHexagon.py → superhexagon.py
@@ -120,6 +120,8 @@ def _restart_game(self):
def _attach_game(self):
self.game = GameInterface('superhexagon.exe', PixelFormat.RGB, PixelDataType.UINT8)

# make the game think it runs at 62.5 FPS no matter how frequently self.game.step is called.
# afap -> as fast as possible
if self.run_afap:
self.game.run_afap(62.5)

@@ -200,7 +202,7 @@ def select_level(self, level: int):
for _ in range(30):
self.game.step(False)

def _postprocess_frame(self, frame):
def _preprocess_frame(self, frame):
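# crop the frame to a square, downscale it, and convert it to grayscale;
# do the same for a small zoomed-in crop around the player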
f = cv2.cvtColor(cv2.resize(frame[:, 144:624], self.frame_size, interpolation=cv2.INTER_NEAREST), cv2.COLOR_RGB2GRAY)
fc = cv2.cvtColor(cv2.resize(frame[150:330, 294:474], self.frame_size_cropped, interpolation=cv2.INTER_NEAREST), cv2.COLOR_RGB2GRAY)
center_color = f[self.frame_size[0] // 2, self.frame_size[1] // 2]
@@ -236,7 +238,7 @@ def reset(self):
self.steps_alive = self.get_n_survived_frames()
self.simulated_steps = 0

return self._postprocess_frame(frame)
return self._preprocess_frame(frame)

def step(self, action):

@@ -255,7 +257,7 @@

is_game_over = self.steps_alive < steps_alive_old + self.frame_skip

frame, frame_cropped = self._postprocess_frame(frame)
frame, frame_cropped = self._preprocess_frame(frame)

if action == 1:
self._left(False)