First, we sincerely thank the Stable-Baselines3 project (https://github.com/DLR-RM/stable-baselines3) for supplying the base code.
Our goal is to simulate a high jump in MuJoCo, since the standard MuJoCo environments usually only aim to move forward.
CPU : Intel i7 - 13700
GPU : RTX - 4090
RAM : 64 GB
OS : Ubuntu 22.04.3 LTS
Conda environment : Python 3.11.0
PyTorch version : 2.1.0
MuJoCo version : MuJoCo210
MuJoCo Env version : v4 (Gymnasium, not OpenAI Gym)
MuJoCo model : mujoco210
Base code : Stable-Baselines3, Stable-Baselines3 Contrib (sb3_contrib)
Train settings (All of them are v4)
Humanoid | Walker2d | Hopper | HalfCheetah | Ant |
---|---|---|---|---|
![]() | ![]() | ![]() | ![]() | ![]() |
No Wall | Wall 0.4 |
---|---|
![]() | ![]() |
Normal reward | Jump reward |
---|---|
![]() | ![]() |
A2C | SAC | DDPG | TD3 | PPO | TRPO |
---|---|---|---|---|---|
(For the jump reward, we added the weighted z-coordinate and the weighted z-velocity to total_reward.)
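
As a rough illustration of the jump reward described above, the sketch below adds a weighted height term and a weighted vertical-velocity term to a base reward. The function name `jump_reward` and the weights `w_z` and `w_vz` are hypothetical; the weights and the exact implementation used in this repository are not stated here.

```python
def jump_reward(base_reward: float, z: float, z_vel: float,
                w_z: float = 1.0, w_vz: float = 0.5) -> float:
    """Return the base reward plus weighted height and vertical velocity.

    w_z and w_vz are illustrative weights (assumptions), not the values
    used in this repository.
    """
    return base_reward + w_z * z + w_vz * z_vel


# Example: base reward 1.0, height 1.5, upward velocity 2.0
total_reward = jump_reward(1.0, z=1.5, z_vel=2.0)
print(total_reward)
```

Larger weights push the agent toward height over forward progress, so in practice they would likely need tuning per environment.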
To combine these operations into a single pipeline, we wrote code that makes editing the environments efficient.
※ Note that our environment is Ubuntu 22.04.3 LTS; the code may not work on Windows Subsystem for Linux, a virtual machine, or macOS.
- Creating the conda environment
git clone https://github.com/john123zerg/Mujoco_high_jump_simulator.git
conda create -n mujoco python==3.11.0 -y
conda activate mujoco
cd Mujoco_high_jump_simulator
pip install gymnasium
pip install sb3_contrib
pip install gymnasium[mujoco]
pip install tensorboard
pip install patchelf
python init.py
The following platforms are currently supported:
- Linux with Python 3.6+. See the Dockerfile for the canonical list of system dependencies.
- OS X with Python 3.6+.
The following platforms are DEPRECATED and unsupported:
- Windows support has been DEPRECATED and removed in 2.0.2.0. One known good past version is 1.50.1.68.
- Python 2 has been DEPRECATED and removed in 1.50.1.0. Python 2 users can stay on the 0.5 branch. The latest release there is 0.5.7, which can be installed with pip install mujoco-py==0.5.7.
- Download the MuJoCo version 2.1 binaries for Linux or OSX.
- Extract the downloaded mujoco210 directory into ~/.mujoco/mujoco210.
If you want to specify a nonstandard location for the package, use the env variable MUJOCO_PY_MUJOCO_PATH.
[reference : https://github.com/openai/mujoco-py/blob/master/README.md?plain=1 ]
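
To confirm the binaries ended up where the quoted instructions expect them, a small check like the following may help. It is only a sketch that assumes the mujoco-py convention described above (default `~/.mujoco/mujoco210`, overridden by `MUJOCO_PY_MUJOCO_PATH`); the helper name `mujoco_path` is hypothetical and not part of this repository.

```python
import os
from pathlib import Path


def mujoco_path() -> Path:
    """Return the directory where the MuJoCo 2.1 binaries are expected."""
    override = os.environ.get("MUJOCO_PY_MUJOCO_PATH")
    if override:
        # A nonstandard location was requested through the env variable.
        return Path(override).expanduser()
    return Path.home() / ".mujoco" / "mujoco210"


path = mujoco_path()
status = "found" if path.is_dir() else "missing"
print(f"MuJoCo directory {path}: {status}")
```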
Supported environments: ['Walker2d','Hopper','HalfCheetah','Humanoid','Ant']
Supported algorithms: ['SAC','A2C','PPO','TRPO','DDPG','TD3']
- Train
If there is no wall (wall 0), you do not need to pass -w.
Training runs for 1 million timesteps.
python main.py Walker2d SAC -t -z 1 -w 1 -ws 0.2 -tw 1 -tws 0.2
The parameters
-t : train
-z : change the reward function to the high jump reward (Bool)
-w : wall existence for path (Bool)
-ws : wall_size for path (Float)
-tw : train_wall (Bool)
-tws : modify the wall size (Float)
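
For readers who want to see how such flags could be wired up, here is a hypothetical argparse sketch of the training options listed above. It is not the actual main.py; the defaults, the 0/1 integer encoding of the Bool flags, and the help strings are assumptions based only on the example command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the training flags (not the real main.py)."""
    parser = argparse.ArgumentParser(description="Sketch of the main.py CLI")
    parser.add_argument("env", choices=["Walker2d", "Hopper", "HalfCheetah", "Humanoid", "Ant"])
    parser.add_argument("algo", choices=["SAC", "A2C", "PPO", "TRPO", "DDPG", "TD3"])
    parser.add_argument("-t", action="store_true", help="train")
    # Bool flags are passed as 0 or 1 in the example command.
    parser.add_argument("-z", type=int, choices=[0, 1], default=0, help="use the high jump reward")
    parser.add_argument("-w", type=int, choices=[0, 1], default=0, help="wall existence for path")
    parser.add_argument("-ws", type=float, default=0.0, help="wall size for path")
    parser.add_argument("-tw", type=int, choices=[0, 1], default=0, help="train with a wall")
    parser.add_argument("-tws", type=float, default=0.0, help="modified wall size")
    return parser


args = build_parser().parse_args(
    "Walker2d SAC -t -z 1 -w 1 -ws 0.2 -tw 1 -tws 0.2".split()
)
print(args.env, args.algo, args.t, bool(args.z), bool(args.w), args.ws, bool(args.tw), args.tws)
```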
- Test
If you want to test with a wall when you did not train with a wall, pass -w 1 together with -tw 0. The example below uses -tw 1, which loads a wall-trained model:
python main.py Humanoid SAC -s . -w 1 -ws 0.2 -tw 1 -tws 0.2 -z 1 -r 1 -f 0
The parameters
-s : enter test mode
-tw : train_wall, tells the path_parser whether to look for a wall-trained model (Bool)
-w, -ws, -tws : change the XML to delete, create, or modify the wall
-r : replay; 1 makes the test run forever, otherwise it ends after 10 seconds (Bool)
-f : file rank number; 0 is the default (Int)
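
The wall options edit the environment XML. One way such an edit could look is sketched below with the standard library's ElementTree. This is an assumption about the general approach, not the repository's code: the geom name `wall`, its position and thickness, and the reading of the wall size as the box half-height are all illustrative.

```python
import xml.etree.ElementTree as ET

WALL_NAME = "wall"  # hypothetical geom name


def set_wall(xml_text: str, exists: bool, size: float = 0.2) -> str:
    """Delete, create, or resize a box-shaped wall geom in a MuJoCo XML string."""
    root = ET.fromstring(xml_text)
    worldbody = root.find("worldbody")
    if worldbody is None:
        raise ValueError("no <worldbody> element found")
    wall = worldbody.find(f"geom[@name='{WALL_NAME}']")
    if not exists:
        # Delete the wall if it is present.
        if wall is not None:
            worldbody.remove(wall)
        return ET.tostring(root, encoding="unicode")
    if wall is None:
        # Create the wall if it is missing.
        wall = ET.SubElement(worldbody, "geom", name=WALL_NAME, type="box")
    # MJCF box sizes are half-extents; `size` is used as the half-height here
    # and the box is raised by the same amount so it rests on the ground.
    wall.set("size", f"0.05 2 {size}")
    wall.set("pos", f"3 0 {size}")
    return ET.tostring(root, encoding="unicode")


base = "<mujoco><worldbody><geom name='floor' type='plane' size='40 40 40'/></worldbody></mujoco>"
print(set_wall(base, exists=True, size=0.2))
```

Calling `set_wall` again with a different `size` modifies the existing geom rather than adding a second one, and `exists=False` removes it.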
- TensorBoard command (your training run needs to be running)
tensorboard --logdir ./logs
- Episode reward mean results (e.g., Humanoid without walls and with normal rewards)
※ If you want extra models, please contact me.
- Humanoid no wall base reward
python main.py Humanoid SAC -s . -w 1 -ws 0.2 -tw 0 -tws 0.2 -z 0 -r 1
- Humanoid no wall change reward
python main.py Humanoid SAC -s . -w 1 -ws 0.2 -tw 0 -tws 0.2 -z 1 -r 1
- Humanoid wall base reward
python main.py Humanoid SAC -s . -w 1 -ws 0.2 -tw 1 -tws 0.2 -z 0 -r 1
- Humanoid wall change reward
python main.py Humanoid SAC -s . -w 1 -ws 0.2 -tw 1 -tws 0.2 -z 1 -r 1
- HalfCheetah no wall base reward
python main.py HalfCheetah SAC -s . -w 1 -ws 0.2 -tw 0 -tws 0.2 -z 0 -r 1
- HalfCheetah no wall change reward
python main.py HalfCheetah SAC -s . -w 1 -ws 0.2 -tw 0 -tws 0.2 -z 1 -r 1
- HalfCheetah wall base reward
python main.py HalfCheetah SAC -s . -w 1 -ws 0.2 -tw 1 -tws 0.2 -z 0 -r 1
- HalfCheetah wall change reward
python main.py HalfCheetah SAC -s . -w 1 -ws 0.2 -tw 1 -tws 0.2 -z 1 -r 1
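
The commands above differ only in the environment name, -tw, and -z, so they can be generated rather than typed by hand. The helper below is a hypothetical convenience, not part of the repository; it only prints the command strings and does not run them.

```python
from itertools import product


def build_test_command(env: str, algo: str, train_wall: int, jump_reward: int) -> str:
    """Build one test command in the same form as the examples above."""
    return (
        f"python main.py {env} {algo} -s . -w 1 -ws 0.2 "
        f"-tw {train_wall} -tws 0.2 -z {jump_reward} -r 1"
    )


# Every combination of environment, wall-trained model (-tw), and reward type (-z).
for env, tw, z in product(["Humanoid", "HalfCheetah"], [0, 1], [0, 1]):
    print(build_test_command(env, "SAC", tw, z))
```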