QGym is an open-source simulation framework designed to benchmark queueing policies across diverse and realistic problem instances. The framework supports a wide range of environments including parallel servers, criss-cross, tandem, and re-entrant networks. It provides a platform for comparing both model-free RL methods and classical queueing policies. See our paper for more details.
- OpenAI Gym Interface: Easy deployment of RL algorithms.
- Event-driven Simulation: Precise timekeeping and fast simulation.
- Job-Level Tracking: Allows modeling parallel server systems.
- Arbitrary Arrival Patterns: Simulates time-varying arrival patterns.
- Server Pool: Fast simulation for a large number of same-class servers.
- Batch Simulation: Efficient parallel simulation of multiple trajectories.
- Open-sourced: Adaptable to custom needs.
- `main`
  - `run_experiments.py`: Runs a batch of experiments configured in `configs/experiments`. See Running Experiments for details.
  - `trainer.py`: Defines the `Trainer` class, a collection of methods for training and evaluating models.
  - `env.py`: Simulator code with an OpenAI Gym interface.
- `configs`
  - `experiments`: Each directory under `experiments` contains one or more YAML files. Each YAML file configures a single experiment. See Experiment Configuration for details.
  - `env`: Contains YAML files specifying queueing networks.
  - `model`: Contains YAML files specifying queueing policies.
  - `scripts`: Contains Python files defining arrival and service patterns.
- `logs`: Contains loss and switchplot logs for experiments.
- `policies`: Contains queueing policy implementations.
- `utils`: Contains utility functions for routing and plotting.
- `RL`: Contains the PPO baselines and their variants used in the benchmark.
  - `policies`: Contains various PPO policies, including 'WC' and 'Vanilla'.
  - `PPO`: Contains the implementation of the PPO algorithm.
    - `trainer.py`: PPO trainer.
    - `train.py`: Main script for launching PPO experiments.
  - `utils`
    - `rl_env.py`: Wrapper that makes the queueing environment compatible with RL algorithms.
    - `eval.py`: Evaluation utilities for RL models.
  - `policy_configs`: Contains YAML configuration files for different RL policies.
Users can write customized experiment code using only our environment as an OpenAI Gym environment via the `DiffDiscreteEventSystem` class in `main/env.py`. Users interact with this environment through the `reset` and `step` methods, as with other OpenAI Gym environments.

Below is detailed documentation of the `DiffDiscreteEventSystem` class and other helper classes in `main/env.py`.
To use this simulator, create an instance of `DiffDiscreteEventSystem` with appropriate parameters, then use the `reset()` method to initialize the environment and `step(action)` to simulate the system over time.
Example:

```python
env = DiffDiscreteEventSystem(network, mu, h, draw_service, draw_inter_arrivals)
obs, state = env.reset()
for _ in range(num_steps):
    action = policy(obs)
    obs, reward, done, truncated, info = env.step(action)
```
See the section Defining a Queueing Network for details on how to define each parameter for a queueing system.
- `network` (torch.Tensor): The network topology.
- `mu` (torch.Tensor): Service rates for each server.
- `h` (float): Holding cost per unit time for jobs in queues.
- `draw_service` (callable): Function to draw service times for jobs in the queues.
  - Input: `time` (torch.Tensor): Current simulation time, shape (batch_size, 1).
  - Output: `torch.Tensor`: Drawn service times, shape (batch_size, 1, num_queues). Each element represents the service time for a new job in the corresponding queue.
- `draw_inter_arrivals` (callable): Function to draw inter-arrival times for each queue.
  - Input: `time` (torch.Tensor): Current simulation time, shape (batch_size, 1).
  - Output: `torch.Tensor`: Drawn inter-arrival times, shape (batch_size, num_queues). Each element represents the time until the next arrival for the corresponding queue.
- `init_time` (float, optional): Initial simulation time. Default is 0.
- `batch` (int, optional): Batch size for parallel simulations. Default is 1.
- `queue_event_options` (torch.Tensor, optional): Custom queue event options.
- `straight_through_min` (bool, optional): Use a straight-through estimator for the min operation. Default is False.
- `queue_lim` (int, optional): Maximum queue length.
- `temp` (float, optional): Temperature for Gumbel-Softmax. Default is 1.
- `seed` (int, optional): Random seed. Default is 3003.
- `device` (str, optional): Device to run computations on. Default is "cpu".
- `f_hook` (bool, optional): Enable hooks for debugging. Default is False.
- `f_verbose` (bool, optional): Enable verbose output. Default is False.
Reset the environment to its initial state.

- Parameters:
  - `init_queues` (torch.Tensor, optional): Initial queue lengths. If None, all queues start empty.
  - `time` (torch.Tensor, optional): Initial simulation time. If None, starts at 0.
  - `seed` (int, optional): Random seed for reproducibility.
- Returns:
  - `Obs`: An `Obs` namedtuple containing:
    - `queues` (torch.Tensor): Initial queue lengths.
    - `time` (torch.Tensor): Initial simulation time.
  - `EnvState`: An `EnvState` namedtuple containing:
    - `queues` (torch.Tensor): Initial queue lengths.
    - `time` (torch.Tensor): Initial simulation time.
    - `service_times` (list of lists): Initial service times for each job in each queue.
    - `arrival_times` (torch.Tensor): Initial time until the next arrival for each queue.
Perform one step of the simulation given an action.

- Parameters:
  - `action` (torch.Tensor): The action to take, representing the allocation of servers to queues.
- Returns:
  - `queues` (torch.Tensor): Updated queue lengths after the step.
  - `reward` (torch.Tensor): Negative of the cost incurred during this step.
  - `done` (bool): Whether the episode has ended (always False in this implementation).
  - `truncated` (bool): Whether the episode was truncated (always False in this implementation).
  - `info` (dict): Additional information about the step, containing:
    - `obs` (Obs): Current observation after the step.
    - `state` (EnvState): Full environment state after the step.
    - `cost` (torch.Tensor): Cost incurred during this step.
    - `event_time` (torch.Tensor): Time elapsed during this step.
    - `queues` (torch.Tensor): Current queue lengths after the step.
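Continuing the construction sketch above, a single step and the contents of the `info` dict might look like the following; the action shape shown is an assumption for the one-server, one-queue example and may differ for other topologies:

```python
# Allocate the single server to the single queue for every batch element.
# (Hypothetical action shape for the one-queue, one-server sketch above.)
action = torch.ones(batch, 1, 1)

queues, reward, done, truncated, info = env.step(action)

print(info["cost"])        # holding cost incurred over this event
print(info["event_time"])  # simulated time elapsed during this step
print(info["obs"].queues)  # queue lengths after the step
```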
Get the current observation of the system state.

- Returns:
  - `torch.Tensor`: Current queue lengths.
Print the current state of the system.

- Prints:
  - Total accumulated cost
  - Total time elapsed
  - Current queue lengths
  - Remaining service times for jobs in each queue
  - Time until next arrival for each queue
In addition to the Gym environment for queueing systems, we provide an interface for easy configuration of queueing systems, policies, and training and testing procedures. Users can easily create and run training/testing experiments and view results. We provide a demo in this Colab notebook. We detail each step of configuring an experiment below:
In the `main` directory, run the `run_experiments.py` script with the `-exp_dir` argument set to the name of the subdirectory in `configs/experiments` containing the desired experiment YAML files.
For example, to run all experiments in the `reentrant_5` subdirectory, run:

```bash
python main/run_experiments.py -exp_dir=reentrant_5
```
Experiments are configured using YAML files located in the `configs/experiments` directory. Each experiment has its own subdirectory containing one or more YAML files specifying the environment, model, and script to run.
An example experiment YAML file:

```yaml
env: 'reentrant_5.yaml'
model: 'ppg_linearassignment.yaml'
script: 'fixed_arrival_rate_cmuq.py'
experiment_name: 'reentrant_5_cmuq'
```
Further description of each field:

- `env`: refers to a file located under `configs/env`. Use this file to define the parameters of the queueing network. See section Defining a Queueing Network for more details.
- `model`: refers to a file located under `configs/model`. Use this file to define the parameters of the routing policy. See section Defining a Queueing Policy for more details.
- `script`: refers to a file located under `configs/scripts`. Use this file to (1) define arrival and service patterns as functions of time using the parameters specified in the `env` file; (2) specify which policy class to use and create the policy using the parameters specified in the `model` file; (3) train and evaluate the policy and write the loss log under `logs/{experiment_name}`.
The parameters of a queueing network are defined in a file under the `configs/env` directory. Each YAML file contains the following keys:

- `network`: Defines the network topology.
- `mu`: Defines the service rates.
- `lam_params`: Defines the arrival parameters.
- `server_pool_size`: Defines the server pool size.
- `h`: Defines the holding cost for each queue.
- `queue_event_options`: Defines the change to each queue at each arrival or service event.
- `train_T`, `test_T`: Number of steps in each simulation trajectory.
- `init_queues`: Initial queue lengths.
The figure below gives an intuitive illustration of the ingredients of a queueing network, with example parameters for the criss-cross network.

We also provide configuration files for all systems benchmarked in our paper in `configs/env`. Refer to the figure below for an intuitive illustration of the benchmarked queueing systems:
The arrival and service patterns are defined in the `configs/scripts` directory. Define arrival and service patterns as arbitrary functions of time that return the time until the next arrival or service for each queue, e.g.:

```python
def draw_inter_arrivals(self, time):
    ...
    return interarrivals
```
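As an illustration, a simple time-varying arrival pattern could be sketched as below; the per-queue rates and the rate schedule are purely illustrative assumptions, not the patterns shipped in `configs/scripts`:

```python
import torch

def draw_inter_arrivals(self, time):
    # time: (batch_size, 1) current simulation time.
    # Illustrative schedule: arrivals are twice as frequent during the
    # second half of each unit-length period.
    base_rates = torch.tensor([1.0, 0.5])  # hypothetical rates for two queues
    boost = torch.where(time % 1.0 > 0.5, torch.tensor(2.0), torch.tensor(1.0))
    rates = base_rates * boost             # broadcasts to (batch_size, num_queues)
    # One exponential draw per queue gives the time until the next arrival.
    return torch.distributions.Exponential(rates).sample()
```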
To define the exact logic of a policy, use the file `policies/<policy>.py`. Each policy file contains a class that implements the policy. The policy class is used in `configs/scripts/<script>.py`.

Each class includes a mandatory `test_forward` method that takes in observations and returns a queue-server priority matrix. Optionally, it can include `train_forward` for training the policy.

We provide the code for the policies benchmarked in our paper in the `policies` directory.
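For concreteness, a minimal policy class might look like the sketch below. It is not one of the packaged policies; the observation unpacking (`obs.queues`) follows the `Obs` namedtuple described above, while the number of servers and the orientation of the priority matrix are illustrative assumptions:

```python
import torch

class StaticPriorityPolicy:
    """Hypothetical example: rank queues by a fixed priority vector."""

    def __init__(self, priorities, num_servers=1):
        # priorities: (num_queues,) tensor; larger value = served first.
        self.priorities = priorities
        self.num_servers = num_servers

    def test_forward(self, obs):
        # obs.queues: (batch_size, num_queues) current queue lengths.
        queues = obs.queues
        batch_size, num_queues = queues.shape
        # Give zero priority to empty queues so servers are not assigned to them.
        scores = self.priorities * (queues > 0).float()
        # Queue-server priority matrix, assumed shape (batch_size, num_servers, num_queues),
        # with every server sharing the same priorities.
        return scores.unsqueeze(1).expand(batch_size, self.num_servers, num_queues)
```

A `train_forward` method with the same signature could be added when the policy has trainable parameters.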
The parameters of a queueing policy are defined in a file under the `configs/model` directory. Each YAML file contains the following important hyperparameters:

- `test_batch_size`: The batch size for the test set.
- `num_epochs`: The number of epochs to train the policy.
- `test_policy`, `train_policy`: Assignment algorithm. Supported options are `linear_assignment`, `sinkhorn`, and `softmax`.
For static policies such as c-$\mu$ and max weight, use `ppg_linearassignment.yaml`.
We provide a codebase to train and evaluate several RL baselines. To run a reinforcement learning experiment, use the following command in the `RL/PPO` directory:

```bash
python train.py <policy-config-name> <queue-env-name>
```
Specifically, you can choose from three policy configs:

- `WC.yaml`: Work-conserving policy
- `vanilla.yaml`: Vanilla policy
- `discrete.yaml`: Discrete action space policy
These policy config files are located in the `QGym/RL/policy_configs` directory. For the queue environment name, use the name of the YAML file (without the `.yaml` extension) from the `QGym/configs/env` directory that defines your desired queueing network.
For example, to train a work-conserving policy on the reentrant line with 2 stations, you would run:

```bash
python train.py WC reentrant_2
```
To simulate different environments, such as defining different service and arrival functions, you can go to `QGym/RL/utils/rl_env.py` and change your custom functions in `load_rl_p_env`.
We show current benchmarking results below.
| Network | MW | MP | FP | PPO | PPO BC | PPO WC |
| --- | --- | --- | --- | --- | --- | --- |
| Criss Cross BH | | | | | | |
|  | MW | MP | FP | PPO | PPO BC | PPO WC |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | | | | | | |
| 3 | | | | | | |
| 4 | | | | | | |
| 5 | | | | | | |
| 6 | | | | | | |
| 7 | | | | | | |
| 8 | | | | | | |
| 9 | | | | | | |
| 10 | | | | | | |
| Network | MW | MP | FP | PPO | PPO BC | PPO WC |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | | | | | | |
| 3 | | | | | | |
| 4 | | | | | | |
| 5 | | | | | | |
| 6 | | | | | | |
| 7 | | | | | | |
| 8 | | | | | | |
| 9 | | | | | | |
| 10 | | | | | | |
| L | cμ | MW | MP | FP | PPO | PPO BC | PPO WC | A2C WC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2 | 31.69 ± 1.3 | 22.40 ± 1.2 | 43.8 ± 1.8 | 43.6 ± 7.5 | 9.9E+3 ± 20.7 | 62.7 ± 1.3 | 29.9 ± 0.7 | 30.2 ± 0.7 |
| 3 | 36.76 ± 1.9 | 43.00 ± 2.2 | 68.7 ± 2.7 | 59.2 ± 8.2 | 19.6E+3 ± 58.0 | 305.1 ± 13.8 | 47.5 ± 0.8 | 47.8 ± 1.1 |
| 4 | 58.58 ± 2.5 | 74.54 ± 2.8 | 89.4 ± 3.6 | 75.6 ± 15.3 | 18.9E+3 ± 53.1 | 167.2 ± 5.1 | 64.4 ± 1.2 | 62.8 ± 1.4 |
| 5 | 68.91 ± 4.0 | 73.19 ± 3.7 | 112.0 ± 4.9 | 97.0 ± 12.9 | 48.0E+3 ± 153.5 | 913.4 ± 19.9 | 81.8 ± 1.1 | 84.9 ± 1.2 |
| 6 | 85.16 ± 4.7 | 98.75 ± 3.9 | 126.7 ± 6.2 | 111.2 ± 14.4 | 59.1E+3 ± 336.4 | 2383.0 ± 15.2 | 99.8 ± 1.5 | 100.8 ± 1.4 |
| 7 | 100.24 ± 5.9 | 119.01 ± 3.9 | 152.3 ± 6.6 | 151.0 ± 21.3 | 65.4E+3 ± 325.9 | 3054.6 ± 16.6 | 118.2 ± 2.0 | 120.5 ± 2.1 |
| Net | MW | MP | FP | PPO | PPO BC | PPO WC |
| --- | --- | --- | --- | --- | --- | --- |
| 2 | | | | | | |
| 3 | | | | | | |
| 4 | | | | | | |
| 5 | | | | | | |
| 6 | | | | | | |
| 7 | | | | | | |
| Network | MW | MP | FP | PPO | PPO BC | PPO WC |
| --- | --- | --- | --- | --- | --- | --- |
| N Model | | | | | | |
| Five-by-Five (Time-varying) | | | | | | |
| Network | MW | MP | FP | PPO | PPO BC | PPO WC |
| --- | --- | --- | --- | --- | --- | --- |
| Input Switch | | | | | | |
| Hospital | | | | | | |