This project implements various reinforcement learning algorithms to solve the CartPole environment from OpenAI Gym (now Gymnasium). It supports training, evaluation, and hyperparameter tuning for different algorithms.
Note: This project was built for self-instructional and experimentation purposes with extensive assistance from various Large Language Models (LLMs).
- Supported Algorithms
- Installation
- Miniconda Setup
- Usage
- Configuration
- Key Components
- Logging and Visualization
- Contributing
- License
- Q-Learning
- SARSA
- Expected SARSA
- Q-Learning with Eligibility Traces (Q(λ))
- SARSA with Eligibility Traces (SARSA(λ))
- True Online SARSA(λ)
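All of these are tabular temporal-difference methods that differ mainly in the bootstrap target used to update the value table. The sketch below is purely illustrative; the function names and Q-table layout are assumptions, not the script's actual code:

```python
import numpy as np

def qlearning_target(Q, reward, next_state, gamma):
    # Off-policy: bootstrap from the greedy action in the next state.
    return reward + gamma * np.max(Q[next_state])

def sarsa_target(Q, reward, next_state, next_action, gamma):
    # On-policy: bootstrap from the action the behaviour policy actually takes next.
    return reward + gamma * Q[next_state][next_action]

def expected_sarsa_target(Q, reward, next_state, gamma, epsilon):
    # Expectation over the epsilon-greedy policy instead of a single sampled action.
    n_actions = len(Q[next_state])
    probs = np.full(n_actions, epsilon / n_actions)
    probs[int(np.argmax(Q[next_state]))] += 1.0 - epsilon
    return reward + gamma * float(np.dot(probs, Q[next_state]))

# Each method then nudges the value estimate towards its target:
#   Q[state][action] += learning_rate * (target - Q[state][action])
# The eligibility-trace variants (Q(lambda), SARSA(lambda), True Online SARSA(lambda))
# spread this update over recently visited state-action pairs.
```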
- Clone this repository:
git clone https://github.com/tsilva/gym-simple-rl.git
cd gym-simple-rl
- Install Miniconda:
  - Visit the Miniconda website and download the appropriate installer for your operating system.
  - Follow the installation instructions for your platform.
- Create a new Conda environment:
conda env create -f environment.yml
- Activate the new environment:
conda activate gym-simple-rl
The script can be run in three modes: train, eval, and tune.
Train an agent using a specific algorithm:
python gym_simple_rl.py train --algo <algorithm_name> --seeds <seed_values>
Example:
python gym_simple_rl.py train --algo qlearning --seeds 123
Evaluate a trained model:
python gym_simple_rl.py eval --model_path <path_to_model>
Example:
python gym_simple_rl.py eval --model_path output/best_cartpole_model.npy
Perform hyperparameter optimization:
python gym_simple_rl.py tune --study_name <study_name> --seeds <seed_values> --n_trials <num_trials> --algo <algorithm_name>
Example:
python gym_simple_rl.py tune --study_name sarsa_study --seeds 123 456 789 --n_trials 100 --algo sarsa
Additional arguments:
- --n_timesteps: Number of timesteps for training (default: 500,000)
- --trial_prune_interval: Interval for pruning trials in Optuna (default: 500)
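The --trial_prune_interval argument suggests that each trial reports an intermediate score at regular intervals so that Optuna can stop unpromising trials early. Below is a minimal sketch of that pattern, using a placeholder training function; it is not the project's actual objective:

```python
import optuna

def train_for_n_steps(learning_rate, gamma, n_steps):
    # Placeholder for the real training loop; returns a mean episode reward.
    return learning_rate * gamma * n_steps

def objective(trial):
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    gamma = trial.suggest_float("gamma", 0.90, 0.999)

    mean_reward = 0.0
    for step in range(0, 500_000, 500):  # report every trial_prune_interval timesteps
        mean_reward = train_for_n_steps(learning_rate, gamma, n_steps=500)
        trial.report(mean_reward, step)
        if trial.should_prune():
            raise optuna.TrialPruned()
    return mean_reward

study = optuna.create_study(study_name="sarsa_study", direction="maximize")
study.optimize(objective, n_trials=100)
```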
The script uses a configuration dictionary to set various parameters. You can modify these in the config dictionary within the script:
- Environment ID
- State discretization settings
- Learning parameters (learning rate, discount factor, etc.)
- Exploration parameters (epsilon min/max, decay rate)
- Algorithm-specific parameters (e.g., λ for eligibility trace methods)
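For illustration, such a dictionary might look like the one below; the key names and values are hypothetical, and the actual config in the script may differ:

```python
config = {
    "env_id": "CartPole-v1",      # environment ID
    "n_bins": (6, 12, 6, 12),     # discretization bins per state dimension
    "learning_rate": 0.1,         # step size (alpha)
    "discount_factor": 0.99,      # gamma
    "epsilon_max": 1.0,           # initial exploration rate
    "epsilon_min": 0.05,          # exploration floor
    "epsilon_decay": 0.9995,      # decay applied per episode
    "trace_lambda": 0.9,          # lambda for eligibility-trace methods
}
```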
- State Discretization: Converts the continuous state space to a discrete one for tabular methods.
- Action Selection: Implements an ε-greedy policy to balance exploration and exploitation (both are illustrated in the sketch after this list).
- Learning Functions: Separate functions for each supported algorithm.
- Training Loop: Implements the main training process for all algorithms.
- Evaluation: Renders the environment to visualize the trained agent's performance.
- Hyperparameter Tuning: Uses Optuna for optimizing hyperparameters.
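As a rough illustration of the first two components, CartPole discretization and ε-greedy action selection often look like this; the bin counts, bounds, and names here are assumptions, not the script's exact implementation:

```python
import numpy as np

def discretize(observation, bins, low, high):
    # Map each continuous state variable to a bin index; the resulting tuple keys the Q-table.
    low, high = np.asarray(low), np.asarray(high)
    clipped = np.clip(observation, low, high)
    ratios = (clipped - low) / (high - low)
    return tuple((ratios * (np.asarray(bins) - 1)).astype(int))

def epsilon_greedy(Q, state, n_actions, epsilon, rng):
    # Explore with probability epsilon, otherwise exploit the current value estimates.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))
```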
- Console logging using the logging module.
- TensorBoard logging for training metrics.
- Optional plotting of mean rewards over episodes.
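TensorBoard metrics can be written with a summary writer, for example torch.utils.tensorboard; the writer library and tag names used by the script are assumptions here:

```python
from torch.utils.tensorboard import SummaryWriter

# Write scalars under the same directory that the tensorboard command below points at.
writer = SummaryWriter(log_dir="runs/cartpole")
for episode in range(100):
    mean_reward = 0.0  # placeholder for the metric computed during training
    writer.add_scalar("reward/episode_mean", mean_reward, episode)
writer.close()
```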
To view TensorBoard logs:
tensorboard --logdir=runs/cartpole
This project is licensed under the MIT License - see the LICENSE file for details.