This repository contains a project that uses reinforcement learning, specifically Proximal Policy Optimization (PPO), to make a humanoid robot walk in a PyBullet simulation. The project is structured into multiple Python files and uses a provided URDF file for the humanoid robot. The training process prints the reward for every episode and runs a PyBullet visualization of every Nth episode; the visualization animates the model and ends once the robot falls.
- `train.py`: Main script to train the humanoid robot.
- `agent.py`: Defines the PPO agent used for training.
- `environment.py`: Custom Gym environment for the humanoid robot.
- `humanoid.urdf`: URDF file for the humanoid robot.
- Python 3.7+
- PyBullet
- Gym
- NumPy
- PyTorch
- Clone the repository:

```bash
git clone https://github.com/yourusername/Humanoid-Robot-Reinforcement-Learning-PPO.git
cd Humanoid-Robot-Reinforcement-Learning-PPO
```
- Create a virtual environment and activate it:

```bash
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
```
- Install the required packages:

```bash
pip install -r requirements.txt
```

If `requirements.txt` is not provided, install the packages manually:

```bash
pip install pybullet gym numpy torch
```
To train the humanoid robot, run the `train.py` script:

```bash
python train.py
```
This script will:
- Initialize the custom Gym environment (`HumanoidEnv`).
- Initialize the PPO agent.
- Train the agent for a specified number of episodes.
- Print the reward for every episode.
- Visualize the robot's behavior every Nth episode (see the sketch below).
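For orientation, here is a minimal sketch of how these steps might fit together. It is an assumption based on this README, not the repository's actual code: the interfaces `PPO(state_dim, action_dim)`, `select_action`, `store_transition`, and `update`, as well as the `visualize_episode` call, are hypothetical names.

```python
# Minimal sketch of the training loop; all agent/environment interfaces
# below are assumptions based on this README, not a confirmed API.
from environment import HumanoidEnv
from agent import PPO

n_episodes = 1000
visualize_every = 100

env = HumanoidEnv()  # assumes Gym-style observation/action spaces
agent = PPO(state_dim=env.observation_space.shape[0],
            action_dim=env.action_space.shape[0])

for episode in range(n_episodes):
    state = env.reset()
    episode_reward = 0.0
    done = False
    while not done:
        action = agent.select_action(state)               # actor samples an action
        next_state, reward, done, info = env.step(action)
        agent.store_transition(state, action, reward, done)
        state = next_state
        episode_reward += reward
    agent.update()                                        # one PPO update per episode
    print(f"Episode {episode}: reward = {episode_reward:.2f}")
    if (episode + 1) % visualize_every == 0:
        visualize_episode(env, agent, episode)            # defined elsewhere in train.py
```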
To visualize a saved episode, you can use the `visualize_episode` function in the `train.py` script. This function is called automatically every Nth episode during training. If you want to visualize a specific saved model or episode, modify `visualize_episode` as needed.
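As a starting point for such a modification, here is a hedged sketch of replaying a saved policy. It assumes the policy weights were saved with `torch.save` under a hypothetical filename `ppo_humanoid.pth` and that the agent exposes a `policy` module; neither detail is confirmed by the repository.

```python
# Hypothetical replay of a saved policy; the checkpoint name, the
# agent.policy attribute, and select_action are assumptions.
import torch
from environment import HumanoidEnv
from agent import PPO

env = HumanoidEnv()
agent = PPO(state_dim=env.observation_space.shape[0],
            action_dim=env.action_space.shape[0])
agent.policy.load_state_dict(torch.load("ppo_humanoid.pth"))

state = env.reset()
done = False
while not done:                          # run until the robot falls
    action = agent.select_action(state)
    state, reward, done, info = env.step(action)
env.close()
```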
You can customize various parameters in the `train.py` script, such as the number of episodes and the frequency of visualization. Here are some key variables you might want to adjust:

- `n_episodes`: Number of episodes for training.
- `visualize_every`: Frequency of visualization (e.g., every 100 episodes).

```python
n_episodes = 1000
visualize_every = 100
```
Defines the `HumanoidEnv` class, a custom Gym environment for the humanoid robot (a condensed sketch follows the list):

- `__init__`: Initializes the environment and loads the URDF file.
- `step`: Executes a step in the environment using the given action.
- `reset`: Resets the environment to its initial state.
- `render`: Renders the environment (not used in this example).
- `close`: Closes the environment.
- `get_state`: Retrieves the current state of the robot.
- `calculate_reward`: Calculates the reward based on the robot's position.
- `check_done`: Checks if the episode is done.
- `set_episode_number` and `update_debug_text`: Used for displaying the current episode number in the PyBullet GUI.
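The following skeleton illustrates how these pieces could fit together with the PyBullet API. The method bodies here are simplified assumptions (e.g., the exact reward and fall check), not the repository's actual implementation.

```python
# Condensed sketch of the HumanoidEnv structure; method bodies are
# simplified assumptions, not the repository's actual code.
import gym
import numpy as np
import pybullet as p
import pybullet_data

class HumanoidEnv(gym.Env):
    def __init__(self):
        p.connect(p.DIRECT)  # headless; use p.GUI to watch the simulation
        p.setAdditionalSearchPath(pybullet_data.getDataPath())
        self.robot = None
        # (observation/action space definitions omitted for brevity)

    def reset(self):
        p.resetSimulation()
        p.setGravity(0, 0, -9.81)
        p.loadURDF("plane.urdf")                  # ground plane from pybullet_data
        self.robot = p.loadURDF("humanoid.urdf", basePosition=[0, 0, 1])
        return self.get_state()

    def step(self, action):
        # apply the action as joint motor commands (details depend on the URDF)
        p.stepSimulation()
        return self.get_state(), self.calculate_reward(), self.check_done(), {}

    def close(self):
        p.disconnect()

    def get_state(self):
        pos, orn = p.getBasePositionAndOrientation(self.robot)
        return np.array(pos + orn, dtype=np.float32)

    def calculate_reward(self):
        pos, _ = p.getBasePositionAndOrientation(self.robot)
        return pos[0]                             # e.g., forward progress along x

    def check_done(self):
        pos, _ = p.getBasePositionAndOrientation(self.robot)
        return pos[2] < 0.5                       # treat a low torso height as a fall
```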
Defines the PPO agent and the Actor-Critic neural network (an illustrative skeleton follows the list):

- `ActorCritic`: Neural network with separate actor and critic components.
- `PPO`: Proximal Policy Optimization agent that interacts with the environment, stores transitions, and performs training.
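For illustration, here is a PyTorch skeleton of an actor-critic network with separate heads. The layer sizes and the Gaussian policy head are assumptions, not the repository's exact architecture.

```python
# Illustrative ActorCritic skeleton; layer sizes and the Gaussian
# policy head are assumptions, not the repository's exact design.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, state_dim, action_dim, hidden_dim=64):
        super().__init__()
        self.actor = nn.Sequential(               # policy head: state -> action mean
            nn.Linear(state_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, action_dim), nn.Tanh(),
        )
        self.critic = nn.Sequential(              # value head: state -> scalar value
            nn.Linear(state_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.log_std = nn.Parameter(torch.zeros(action_dim))  # state-independent std

    def forward(self, state):
        mean = self.actor(state)
        std = self.log_std.exp().expand_as(mean)
        dist = torch.distributions.Normal(mean, std)  # action distribution
        value = self.critic(state)
        return dist, value
```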
Script for training and visualizing the humanoid robot:

- `main`: Initializes the environment and agent, runs the training loop, and handles visualization.
- `visualize_episode`: Visualizes the robot's behavior for a specific episode.
Main script for running saved episodes from `episodes_data.pkl`.
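The structure of the pickled data is not documented here; as a hedged example, loading it might look like this:

```python
# Hypothetical loader for episodes_data.pkl; the structure of the
# pickled object is an assumption and may differ.
import pickle

with open("episodes_data.pkl", "rb") as f:
    episodes = pickle.load(f)

print(f"Loaded {len(episodes)} saved episodes")
```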
If you encounter issues during training or visualization, ensure that (a quick sanity check follows the list):

- All required packages are installed.
- The URDF file (`humanoid.urdf`) is in the correct location.
- You are using a compatible version of Python (3.7+).
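This small snippet checks all three conditions at once; it is a convenience sketch, not part of the repository, and assumes the URDF sits in the current working directory.

```python
# Sanity check for the troubleshooting items above (convenience sketch,
# not part of the repository).
import os
import sys

assert sys.version_info >= (3, 7), "Python 3.7+ is required"

import pybullet, gym, numpy, torch  # raises ImportError if a package is missing

assert os.path.exists("humanoid.urdf"), "humanoid.urdf not found in the working directory"
print("Environment looks OK")
```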
For further assistance, feel free to open an issue on the repository.
Contributions are welcome! If you have suggestions for improvements or new features, please open an issue or submit a pull request.