A hands-on exploration of reinforcement learning algorithms solving the CartPole environment from OpenAI Gym (now Gymnasium). Perfect for learning and experimenting with RL basics!
Note: This project was built for self-learning and experimentation purposes with extensive assistance from various Large Language Models (LLMs). 🤝
## Table of Contents

- Supported Algorithms
- Installation
- Miniconda Setup
- Usage
- Configuration
- Key Components
- Logging and Visualization
- Contributing
- License
## Supported Algorithms

- Q-Learning
- SARSA
- Expected SARSA
- Q-Learning with Eligibility Traces (Q(λ))
- SARSA with Eligibility Traces (SARSA(λ))
- True Online SARSA(λ)
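To show how the first two methods differ, here is a minimal sketch of the tabular Q-Learning (off-policy) and SARSA (on-policy) update rules. The variable names and table layout are illustrative assumptions, not the project's exact code:

```python
import numpy as np

# Illustrative tabular updates. Assumes q_table is a NumPy array indexed as
# q_table[state][action], where state is a tuple of discretized bin indices.
# alpha = learning rate, gamma = discount factor.

def q_learning_update(q_table, state, action, reward, next_state, alpha, gamma):
    # Off-policy: bootstrap from the greedy (max-value) action in the next state.
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state][action] += alpha * (td_target - q_table[state][action])

def sarsa_update(q_table, state, action, reward, next_state, next_action, alpha, gamma):
    # On-policy: bootstrap from the action the agent actually takes next.
    td_target = reward + gamma * q_table[next_state][next_action]
    q_table[state][action] += alpha * (td_target - q_table[state][action])
```

Expected SARSA replaces the sampled next-action value with its expectation under the ε-greedy policy, and the λ variants additionally propagate each TD error backwards through an eligibility trace.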
## Installation

- Clone this repository:

  ```bash
  git clone https://github.com/tsilva/gym-simple-rl.git
  cd gym-simple-rl
  ```

### Miniconda Setup

- Install Miniconda:
  - Visit the Miniconda website and download the appropriate installer for your operating system.
  - Follow the installation instructions for your platform.
- Create a new Conda environment:

  ```bash
  conda env create -f environment.yml
  ```

- Activate the new environment:

  ```bash
  conda activate gym-simple-rl
  ```
## Usage

The script supports three main modes of operation:

Train an agent using a specific algorithm:

```bash
python main.py train --algo <algorithm_name> --seeds <seed_values>
```

Example:

```bash
python main.py train --algo qlearning --seeds 123
```

Evaluate a trained model:

```bash
python main.py eval --model_path <path_to_model>
```

Example:

```bash
python main.py eval --model_path output/best_cartpole_model.npy
```

Perform hyperparameter optimization:

```bash
python main.py tune --study_name <study_name> --seeds <seed_values> --n_trials <num_trials> --algo <algorithm_name>
```

Example:

```bash
python main.py tune --study_name sarsa_study --seeds 123 456 789 --n_trials 100 --algo sarsa
```

Additional arguments:

- `--n_timesteps`: Number of timesteps for training (default: 500,000)
- `--trial_prune_interval`: Interval for pruning trials in Optuna (default: 500)
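For example, assuming these flags can be combined with the mode-specific arguments shown above, a longer training run might look like:

```bash
python main.py train --algo qlearning --seeds 123 --n_timesteps 1000000
```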
## Configuration

Customize your experiment through the configuration dictionary:
- Environment ID
- State discretization settings
- Learning parameters (learning rate, discount factor, etc.)
- Exploration parameters (epsilon min/max, decay rate)
- Algorithm-specific parameters (e.g., λ for eligibility trace methods)
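For illustration, such a dictionary might look like the sketch below; the key names and values are hypothetical, so refer to the configuration dictionary in the source for the actual fields:

```python
# Hypothetical example only; the real dictionary in the source may use different keys.
config = {
    "env_id": "CartPole-v1",      # Environment ID
    "n_bins": (6, 12, 6, 12),     # State discretization: bins per observation dimension
    "learning_rate": 0.1,         # Step size (alpha)
    "discount_factor": 0.99,      # Discount factor (gamma)
    "epsilon_max": 1.0,           # Initial exploration rate
    "epsilon_min": 0.05,          # Final exploration rate
    "epsilon_decay": 0.999,       # Decay applied to epsilon after each episode
    "lambda": 0.9,                # Trace decay for Q(λ) / SARSA(λ) variants
}
```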
## Key Components

- State Discretization 📊: Smart conversion of the continuous state space into discrete values (see the sketch after this list)
- Action Selection 🎯: Implements ε-greedy policy for balanced exploration
- Learning Functions 🧠: Clean implementation of each supported algorithm
- Training Loop 🔄: Efficient main training process
- Evaluation 👀: Visual feedback of your agent's performance
- Hyperparameter Tuning 🎛️: Optuna-powered optimization
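As a rough illustration of the first two components, the sketch below bins a continuous CartPole observation into a discrete state and draws an ε-greedy action from a Q-table. The bin counts, clipping bounds, and function names are assumptions, not the project's actual implementation:

```python
import numpy as np

rng = np.random.default_rng()

# Assumed bins and clipping bounds per CartPole observation dimension
# (cart position, cart velocity, pole angle, pole angular velocity).
N_BINS = (6, 12, 6, 12)
LOWER = np.array([-2.4, -3.0, -0.21, -3.0])
UPPER = np.array([2.4, 3.0, 0.21, 3.0])
BIN_EDGES = [np.linspace(LOWER[i], UPPER[i], N_BINS[i] + 1)[1:-1] for i in range(4)]

def discretize(observation):
    """Map a continuous observation to a tuple of bin indices."""
    clipped = np.clip(observation, LOWER, UPPER)
    return tuple(int(np.digitize(clipped[i], BIN_EDGES[i])) for i in range(4))

def epsilon_greedy_action(q_table, state, epsilon, n_actions=2):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(q_table[state]))

# Example usage: a Q-table covering every discrete state-action pair.
q_table = np.zeros(N_BINS + (2,))
state = discretize(np.array([0.1, -0.5, 0.02, 0.3]))
action = epsilon_greedy_action(q_table, state, epsilon=0.1)
```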
## Logging and Visualization

Track your agent's progress with:
- Console logging for real-time updates
- TensorBoard logging for detailed metrics
- Optional reward plotting
Fire up TensorBoard to see your results:

```bash
tensorboard --logdir=runs/cartpole
```
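TensorBoard metrics are typically written with a `SummaryWriter` pointed at that log directory; here is a minimal sketch of the pattern (it assumes PyTorch's `torch.utils.tensorboard` writer and an illustrative tag name, which may differ from what the project actually uses):

```python
# Minimal TensorBoard logging sketch; assumes torch.utils.tensorboard is installed.
# The tag name and placeholder rewards are illustrative.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/cartpole")

# Log one scalar per episode, e.g. the total reward collected.
for episode, episode_reward in enumerate([12.0, 35.0, 58.0]):
    writer.add_scalar("reward/episode", episode_reward, episode)

writer.close()
```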
## License

This project is licensed under the MIT License - see the LICENSE file for details.