Highway-Decision-Transformer

Decision Transformer for offline single-agent autonomous highway driving.

Code Structure

.
├── README.md
├── modules # all model modules contained here
│   ├── __init__.py
├── pipelines # all training, testing, preprocessing, and data gathering pipelines contained here
│   ├── __init__.py
├── expert_scripts # all expert data collection scripts contained here
├── example-notebooks # scratch/example Jupyter notebooks
├── experiments # all training, testing, and demo experiment files for various models

Environment

We used the open-source highway-env Python framework to simulate a highway environment. highway-env is built on top of the OpenAI Gym toolkit, which is widely used in reinforcement learning. It provides a customizable and modular framework for designing experiments related to highway traffic, such as vehicle dynamics, traffic flow, and behavior modeling, and includes various metrics and evaluation tools to assess the performance of different agents in the simulated environment. In our experiments, we focus on a highway-env instance consisting of 3 lanes and 20 other simulated cars. Our goal is to train an agent that drives safely in this scenario and maximizes the default highway-env reward:

$R(s,a) = a \, \frac{v - v_{min}}{v_{max} - v_{min}} - b \cdot \text{collision}$

where $v$, $v_{min}$, and $v_{max}$ are the current, minimum, and maximum speeds of the ego-vehicle respectively, $\text{collision}$ is an indicator equal to 1 when the ego-vehicle crashes, and $a$ and $b$ are coefficients.
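For context, here is a minimal sketch of how such an environment could be set up, assuming recent highway-env/gymnasium APIs; the config keys are standard highway-env options, but the exact values below are illustrative rather than the settings used in our experiments:

```python
import gymnasium as gym
import highway_env  # noqa: F401  (registers the highway-v0 environments)

env = gym.make("highway-v0")
env.unwrapped.configure({
    "lanes_count": 3,          # 3-lane highway
    "vehicles_count": 20,      # 20 other simulated cars
    "duration": 40,            # illustrative episode length (steps)
    "high_speed_reward": 0.4,  # corresponds to coefficient a above
    "collision_reward": -1.0,  # corresponds to -b above
})
obs, info = env.reset()
```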

Data Collection

Three online RL methods were used to collect expert data: Proximal Policy Optimization (PPO), Deep Q-Network (DQN), and Monte Carlo Tree Search (MCTS). The scripts to collect data can be found in /expert_scripts.
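As a rough sketch of what such a collection loop might look like (assuming a stable-baselines3 DQN expert and gymnasium-style APIs; the actual /expert_scripts may differ), trained-expert rollouts are recorded as (observation, action, reward) trajectories for offline training:

```python
import numpy as np
import gymnasium as gym
import highway_env  # noqa: F401
from stable_baselines3 import DQN

env = gym.make("highway-v0")
expert = DQN("MlpPolicy", env, verbose=0)
expert.learn(total_timesteps=20_000)  # illustrative training budget

trajectories = []
for _ in range(100):  # number of expert episodes to record
    obs, info = env.reset()
    states, actions, rewards = [], [], []
    done = truncated = False
    while not (done or truncated):
        action, _ = expert.predict(obs, deterministic=True)
        states.append(obs)
        actions.append(action)
        obs, reward, done, truncated, info = env.step(action)
        rewards.append(reward)
    trajectories.append({
        "observations": np.array(states),
        "actions": np.array(actions),
        "rewards": np.array(rewards),
    })

np.save("expert_trajectories.npy", np.array(trajectories, dtype=object), allow_pickle=True)
```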

Below is a demonstration of the performance of the various experts. PPO and DQN are popular, state-of-the-art online RL methods, while MCTS plans by searching the game tree at each step using the environment as a simulator, which lets it reliably find high-reward actions at a much higher computational cost.

Models

Benchmark: Imitation Learning

This is one of two benchmark models used in the original DT paper. Following an imitation-learning approach, we developed an agent that mimics the behaviours of the expert it is trained on.
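A minimal sketch of what this benchmark boils down to, assuming the kinematics observation and highway-env's 5 discrete meta-actions (a hypothetical supervised policy, not the repository's implementation): the agent is trained with a cross-entropy loss to reproduce the expert's action choices.

```python
import torch
import torch.nn as nn

# Hypothetical behavioural-cloning policy: flattened 5x5 kinematics
# observation (25 features) -> logits over 5 discrete actions.
policy = nn.Sequential(
    nn.Linear(25, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 5),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def bc_step(expert_states, expert_actions):
    """One supervised update: expert_states (B, 25) float, expert_actions (B,) long."""
    logits = policy(expert_states)
    loss = loss_fn(logits, expert_actions)  # imitate the expert's actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```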

Benchmark: Conservative Q-Learning

Conservative Q-Learning (CQL) is a state-of-the-art offline RL method based on temporal-difference learning.
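For intuition, here is a sketch of the discrete-action CQL(H) objective: the usual TD error plus a conservative term that pushes down Q-values on all actions while pushing up the Q-value of the action actually taken in the dataset (function and batch names are illustrative, not the repository's code):

```python
import torch
import torch.nn.functional as F

def cql_loss(q_net, q_target, batch, gamma=0.99, alpha=1.0):
    """batch: states (B, D), actions (B,), rewards (B,), next_states (B, D), dones (B,)."""
    q_values = q_net(batch["states"])                                    # (B, num_actions)
    q_data = q_values.gather(1, batch["actions"].unsqueeze(1)).squeeze(1)

    with torch.no_grad():  # standard TD target from a target network
        next_q = q_target(batch["next_states"]).max(dim=1).values
        td_target = batch["rewards"] + gamma * (1.0 - batch["dones"]) * next_q

    td_error = F.mse_loss(q_data, td_target)
    # Conservative regularizer: logsumexp over all actions minus the dataset action's Q.
    conservative = (torch.logsumexp(q_values, dim=1) - q_data).mean()
    return td_error + alpha * conservative
```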

Several experiments with various configurations of training datasets and parameters have been conducted in /experiments/.

The DT model is based on GPT-2 defined in /modules/trajectory_gpt2.py.
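For orientation, here is a simplified sketch of the decision-transformer forward pass following the original DT paper (a stand-in for illustration, not the code in /modules/; hyperparameters are arbitrary, and actions are treated as dense vectors for brevity, whereas highway-env's discrete actions would typically be one-hot encoded):

```python
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2Model

class MiniDecisionTransformer(nn.Module):
    def __init__(self, state_dim, act_dim, hidden_size=128, max_ep_len=1000):
        super().__init__()
        config = GPT2Config(vocab_size=1, n_embd=hidden_size, n_layer=3, n_head=1)
        self.transformer = GPT2Model(config)  # causal self-attention backbone
        self.embed_timestep = nn.Embedding(max_ep_len, hidden_size)
        self.embed_return = nn.Linear(1, hidden_size)
        self.embed_state = nn.Linear(state_dim, hidden_size)
        self.embed_action = nn.Linear(act_dim, hidden_size)
        self.embed_ln = nn.LayerNorm(hidden_size)
        self.predict_action = nn.Linear(hidden_size, act_dim)

    def forward(self, states, actions, returns_to_go, timesteps):
        # states: (B, T, state_dim), actions: (B, T, act_dim),
        # returns_to_go: (B, T, 1), timesteps: (B, T) integer indices
        B, T = states.shape[0], states.shape[1]
        time_emb = self.embed_timestep(timesteps)
        r = self.embed_return(returns_to_go) + time_emb
        s = self.embed_state(states) + time_emb
        a = self.embed_action(actions) + time_emb
        # Interleave tokens as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        tokens = torch.stack((r, s, a), dim=2).reshape(B, 3 * T, -1)
        tokens = self.embed_ln(tokens)
        hidden = self.transformer(inputs_embeds=tokens).last_hidden_state
        # Predict the action from each state token (index 1 of every triple).
        state_tokens = hidden.reshape(B, T, 3, -1)[:, :, 1]
        return self.predict_action(state_tokens)
```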

Decision Transformer with Different Encoders

We experimented with two input encodings (see the observation-config sketch after this list):

  1. Kinematics: a 5x5 array containing the positions and velocities of the 5 nearest vehicles.
  2. Image-Based: a stack of the 4 most recent grayscale bird's-eye-view images.
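Both encodings correspond to standard highway-env observation types; here is a sketch with illustrative values (the exact shapes, features, and weights used in our experiments may differ):

```python
# Kinematics: one row per vehicle, a few features per row.
kinematics_config = {
    "observation": {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy"],
    }
}

# Grayscale images: a stack of the 4 most recent bird's-eye-view frames.
grayscale_config = {
    "observation": {
        "type": "GrayscaleObservation",
        "observation_shape": (128, 64),
        "stack_size": 4,
        "weights": [0.2989, 0.5870, 0.1140],  # RGB-to-grayscale conversion weights
    }
}
```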

LSTM

LSTM is a type of recurrent neural network (RNN) that is classically used for sequence modelling problems. A common criticism of DTs is that they are no different than sequence modelling with RNNs. We plan on replacing DT blocks with LSTM blocks to verify whether this criticism holds.
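A rough sketch of the planned ablation, under the same token layout as the transformer sketch above: the attention backbone is swapped for an LSTM over the interleaved (return, state, action) embeddings (names and sizes are illustrative):

```python
import torch.nn as nn

class LSTMBackbone(nn.Module):
    """Drop-in replacement for the GPT-2 backbone: same token sequence in,
    per-token hidden states out. Recurrence is causal by construction,
    so no attention mask is needed."""

    def __init__(self, hidden_size=128, num_layers=3):
        super().__init__()
        self.lstm = nn.LSTM(
            input_size=hidden_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True,
        )

    def forward(self, tokens):
        # tokens: (B, 3*T, hidden_size) interleaved (R, s, a) embeddings
        hidden, _ = self.lstm(tokens)
        return hidden
```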

Results

Decision Transformer on Kinematic Input (best 42.19 mean reward)

Decision Transformer on Grayscale Image Input (best 61.39 mean reward)
