Robot arm control with Reinforcement Learning

This project focuses on controlling a 7-DOF robot arm, provided by the [panda-gym](https://github.com/qgallouedec/panda-gym) Reach environment, using two continuous-action reinforcement learning algorithms: DDPG (Deep Deterministic Policy Gradient) and TD3 (Twin Delayed Deep Deterministic Policy Gradient). Hindsight Experience Replay (HER) is used to enhance the learning process of both algorithms.
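As a minimal sketch (assuming panda-gym v3, which registers its tasks with gymnasium; the environment id may differ in other versions), the goal-conditioned Reach environment can be loaded as follows:

```python
import gymnasium as gym
import panda_gym  # registers the Panda robot environments

env = gym.make("PandaReach-v3")
obs, info = env.reset()
# The observation is goal-conditioned: a dict holding the robot state plus
# the achieved and desired goals, which is exactly what HER relies on.
print(obs["observation"], obs["achieved_goal"], obs["desired_goal"])
env.close()
```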

Continuous RL Algorithms

Continuous reinforcement learning deals with environments where actions are continuous, such as the precise control of robot arm joints or the throttle of an autonomous vehicle. The primary objective is to find policies that effectively map observed states to continuous actions, maximizing the expected cumulative reward. Several algorithms have been developed specifically for this setting, including DDPG, TD3, SAC, and PPO.

1- DDPG (Deep Deterministic Policy Gradient)

DDPG is an actor-critic algorithm designed for continuous action spaces. It combines the strengths of policy gradients and Q-learning: an actor network learns a deterministic policy, while a critic network approximates the action-value function (Q-function). The actor directly outputs continuous actions, which the critic evaluates, allowing for fine-grained control.
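For illustration, here is a minimal sketch of the DDPG update step in PyTorch. The names (`actor`, `critic`, their target copies, the optimizers, and the sampled batch) are hypothetical stand-ins, not the repo's actual classes:

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch

    # Critic step: regress Q(s, a) toward the one-step bootstrapped target
    # computed with the slowly-updated target networks.
    with torch.no_grad():
        next_q = target_critic(next_states, target_actor(next_states))
        target_q = rewards + gamma * (1.0 - dones) * next_q
    critic_loss = F.mse_loss(critic(states, actions), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor step: follow the deterministic policy gradient, i.e. maximize
    # the critic's value of the actor's own actions.
    actor_loss = -critic(states, actor(states)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```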

2- TD3 (Twin Delayed Deep Deterministic Policy Gradient)

TD3 is an enhancement of DDPG that addresses issues such as overestimation bias. It introduces "twin" critics to estimate the Q-value (two critic networks instead of DDPG's single one), taking the minimum of their two estimates when forming targets, and it uses target networks with delayed updates to stabilize training. TD3 is known for its robustness and improved performance over DDPG.
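The core of TD3's critic target is sketched below in PyTorch, with illustrative names and default values (the repo's own implementation may differ): clipped Gaussian noise smooths the target action, and the minimum over the twin critics counteracts overestimation.

```python
import torch

def td3_target(target_actor, target_critic1, target_critic2,
               rewards, next_states, dones, gamma=0.99,
               noise_std=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        # Target policy smoothing: add clipped Gaussian noise to the target
        # action so the critic cannot exploit narrow peaks in the Q-function.
        next_actions = target_actor(next_states)
        noise = (torch.randn_like(next_actions) * noise_std).clamp(-noise_clip, noise_clip)
        next_actions = (next_actions + noise).clamp(-max_action, max_action)

        # Clipped double Q-learning: use the smaller of the twin critics'
        # estimates to counteract overestimation bias.
        q1 = target_critic1(next_states, next_actions)
        q2 = target_critic2(next_states, next_actions)
        return rewards + gamma * (1.0 - dones) * torch.min(q1, q2)
```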

Hindsight Experience Replay

Hindsight Experience Replay (HER) is a technique developed to address the challenge of sparse, binary rewards in RL environments. In many robotic tasks, reaching the desired goal is rare: the agent always gets a zero reward unless the robot successfully completes the task, so traditional RL algorithms struggle to learn because they get no signal about whether intermediate steps were good or bad.

HER tackles this issue by reusing past experiences for learning even when they didn't lead to the desired goal: transitions are relabeled with goals that were actually achieved and stored in the replay buffer, so the agent learns from both successful and failed attempts, which significantly accelerates the learning process.

Link to HER paper: https://arxiv.org/pdf/1707.01495.pdf
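For concreteness, here is a minimal sketch of HER relabeling with the "final" goal-selection strategy. The function names, buffer layout, and the sparse reward convention (0 on success, -1 otherwise, as in the HER paper) are illustrative, not the repo's code:

```python
import numpy as np

def sparse_reward(achieved_goal, goal, threshold=0.05):
    # 0 on success, -1 otherwise: the sparse, binary signal HER is built for.
    return 0.0 if np.linalg.norm(achieved_goal - goal) < threshold else -1.0

def her_relabel(episode, replay_buffer):
    """episode: list of (obs, action, next_obs, achieved_goal, desired_goal)."""
    final_goal = episode[-1][3]  # goal actually reached at the episode's end
    for obs, action, next_obs, achieved, desired in episode:
        # Store the original transition with the real (usually missed) goal.
        replay_buffer.append((obs, action, next_obs, desired,
                              sparse_reward(achieved, desired)))
        # Store a relabeled copy that pretends the final achieved state was
        # the goal all along, turning a failure into a useful success signal.
        replay_buffer.append((obs, action, next_obs, final_goal,
                              sparse_reward(achieved, final_goal)))
```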

How to run

  • You can train a given model simply by running one of the scripts in the `training` folder.

    DDPG with HER: `ddpg_her.py`

    TD3 with HER: `td3_her_training.py`

  • You can change the values of the hyperparameters of both algorithms (learning rates (alpha/beta), discount factor (gamma), ...) directly in each agent class in the `agents` folder (an illustrative sketch follows this list). The architecture of the actor/critic networks can be modified in the `networks.py` file.
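A purely illustrative sketch of such hyperparameters (hypothetical names and values; the actual settings live inside the agent classes in the repo):

```python
# Hypothetical defaults only -- not the repo's actual values.
hyperparameters = {
    "alpha": 1e-4,          # actor learning rate
    "beta": 1e-3,           # critic learning rate
    "gamma": 0.98,          # discount factor
    "tau": 0.005,           # soft target-network update rate
    "batch_size": 256,      # replay batch size
    "buffer_size": 1_000_000,  # replay buffer capacity
}
```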

Results

The training of both agents was done in the Google Colab environment.


Contact

If you have any questions, feedback, or issues, please don't hesitate to open an issue or reach out to me: [email protected].

License

Distributed under the MIT License. See LICENSE.txt for more information.
