TD3

While DDPG can achieve great performance sometimes, it is frequently brittle with respect to hyperparameters and other kinds of tuning. A common failure mode for DDPG is that the learned Q-function begins to dramatically overestimate Q-values, which then leads to the policy breaking, because it exploits the errors in the Q-function. Twin Delayed DDPG (TD3) is an algorithm that addresses this issue by introducing three critical tricks:
Trick One: Clipped Double-Q Learning. TD3 learns two Q-functions instead of one (hence “twin”), and uses the smaller of the two Q-values to form the targets in the Bellman error loss functions.
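As a concrete illustration, here is a minimal PyTorch-style sketch of the clipped double-Q target. The q1_targ, q2_targ, and pi_targ network arguments and the float done mask are assumptions made for the example, not part of any particular implementation.

```python
import torch

def clipped_double_q_targets(rewards, next_obs, done,
                             q1_targ, q2_targ, pi_targ, gamma=0.99):
    """Form Bellman targets using the smaller of the two target Q-values."""
    with torch.no_grad():
        next_act = pi_targ(next_obs)          # action from the target policy
        q1 = q1_targ(next_obs, next_act)      # first target critic
        q2 = q2_targ(next_obs, next_act)      # second target critic
        min_q = torch.min(q1, q2)             # clipped double-Q estimate
        # done is assumed to be a float tensor of 0s and 1s
        return rewards + gamma * (1.0 - done) * min_q
```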
Trick Two: “Delayed” Policy Updates. TD3 updates the policy (and target networks) less frequently than the Q-function. The paper recommends one policy update for every two Q-function updates.
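A rough sketch of this update schedule is below, assuming hypothetical update_critics, update_actor, and update_targets callables supplied by the caller; only the relative frequency of the updates is the point here.

```python
def train(replay_buffer, batch_size, total_steps,
          update_critics, update_actor, update_targets, policy_delay=2):
    """Update the critics every step; the actor and target networks every `policy_delay` steps."""
    for step in range(total_steps):
        batch = replay_buffer.sample(batch_size)
        update_critics(batch)                  # Q-functions: every gradient step
        if step % policy_delay == 0:
            update_actor(batch)                # policy: once per policy_delay steps
            update_targets()                   # target networks updated on the same schedule
```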
Trick Three: Target Policy Smoothing. TD3 adds noise to the target action, to make it harder for the policy to exploit Q-function errors by smoothing out Q along changes in action.
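A minimal sketch of target policy smoothing in PyTorch is shown below; the noise scale, clip range, and action limit defaults are illustrative assumptions (chosen to be in line with commonly used values), and pi_targ is again a placeholder for the target policy network.

```python
import torch

def smoothed_target_action(pi_targ, next_obs, act_limit=1.0,
                           noise_std=0.2, noise_clip=0.5):
    """Add clipped Gaussian noise to the target action, then clip to the valid action range."""
    with torch.no_grad():
        act = pi_targ(next_obs)
        noise = (noise_std * torch.randn_like(act)).clamp(-noise_clip, noise_clip)
        return (act + noise).clamp(-act_limit, act_limit)
```

The smoothed action would then be fed into the target Q-functions when forming the Bellman targets, so sharp peaks in the learned Q-function cannot be exploited by the policy.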
Together, these three tricks result in substantially improved performance over baseline DDPG.
# Create and activate a dedicated conda environment for the TD3 experiments
conda create -n rllib-td3 python=3.10
conda activate rllib-td3
# Install the project's pinned dependencies
pip install -r requirements.txt
# Install the package itself in editable mode with its development extras
pip install -e '.[development]'