
Commit dd5bf4a

[Draft] PettingZoo Support (LucasAlegre#45)
PettingZoo env and parallel_env support!

Co-authored-by: Lucas Alegre <[email protected]>
1 parent f0b387f commit dd5bf4a

16 files changed: +409 −101 lines

.github/workflows/linux-test.yml

+32
@@ -0,0 +1,32 @@
+name: Python tests
+
+on:
+  push:
+    branches: [ master ]
+  pull_request:
+    branches: [ master ]
+
+jobs:
+  linux-test:
+    runs-on: ubuntu-20.04
+    strategy:
+      matrix:
+        python-version: ['3.6', '3.7', '3.8', '3.9']
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+    - name: Install dependencies
+      run: |
+        sudo add-apt-repository ppa:sumo/stable
+        sudo apt-get update
+        sudo apt-get install sumo sumo-tools sumo-doc
+        pip install pytest
+        pip install -e .[all]
+    - name: Full Python tests
+      run: |
+        export SUMO_HOME="/usr/share/sumo"
+        export LIBSUMO_AS_TRACI=1
+        pytest ./tests/pz_test.py

README.md

+18 −3
@@ -8,8 +8,9 @@
 
 SUMO-RL provides a simple interface to instantiate Reinforcement Learning environments with [SUMO](https://github.com/eclipse/sumo) for Traffic Signal Control.
 
-The main class [SumoEnvironment](https://github.com/LucasAlegre/sumo-rl/blob/master/environment/env.py) inherits [MultiAgentEnv](https://github.com/ray-project/ray/blob/master/python/ray/rllib/env/multi_agent_env.py) from [RLlib](https://github.com/ray-project/ray/tree/master/python/ray/rllib).
+The main class [SumoEnvironment](https://github.com/LucasAlegre/sumo-rl/blob/master/sumo_rl/environment/env.py) behaves like a [MultiAgentEnv](https://github.com/ray-project/ray/blob/master/python/ray/rllib/env/multi_agent_env.py) from [RLlib](https://github.com/ray-project/ray/tree/master/python/ray/rllib).
 If instantiated with parameter 'single-agent=True', it behaves like a regular [Gym Env](https://github.com/openai/gym/blob/master/gym/core.py) from [OpenAI](https://github.com/openai).
+Call [env](https://github.com/LucasAlegre/sumo-rl/blob/master/sumo_rl/environment/env.py) or [parallel_env](https://github.com/LucasAlegre/sumo-rl/blob/master/sumo_rl/environment/env.py) to instantiate a [PettingZoo](https://github.com/PettingZoo-Team/PettingZoo) environment.
 [TrafficSignal](https://github.com/LucasAlegre/sumo-rl/blob/master/sumo_rl/environment/traffic_signal.py) is responsible for retrieving information and actuating on traffic lights using the [TraCI](https://sumo.dlr.de/wiki/TraCI) API.
 
 Goals of this repository:
@@ -57,9 +58,10 @@ pip install -e .
 ### Observation
 The default observation for each traffic signal agent is a vector:
 ```
-obs = [phase_one_hot, lane_1_density,...,lane_n_density, lane_1_queue,...,lane_n_queue]
+obs = [phase_one_hot, min_green_elapsed, lane_1_density,...,lane_n_density, lane_1_queue,...,lane_n_queue]
 ```
 - ```phase_one_hot``` is a one-hot encoded vector indicating the current active green phase
+- ```min_green_elapsed``` is a binary variable indicating whether min_green seconds have already passed in the current phase
 - ```lane_i_density``` is the number of vehicles in incoming lane i divided by the total capacity of the lane
 - ```lane_i_queue``` is the number of queued (speed below 0.1 m/s) vehicles in incoming lane i divided by the total capacity of the lane
 
@@ -73,7 +75,7 @@ E.g.: In the [2-way single intersection](https://github.com/DLR-RM/stable-baseli
 
 <img src="outputs/actions.png" align="center" width="75%"/>
 
-Obs: Every time a phase change occurs, the next phase is preceded by a yellow phase lasting ```yellow_time``` seconds.
+Important: every time a phase change occurs, the next phase is preceded by a yellow phase lasting ```yellow_time``` seconds.
 
 ### Rewards
 The default reward function is the change in cumulative vehicle delay:
@@ -86,6 +88,19 @@ You can define your own reward function changing the method 'compute_reward' of
 
 ## Examples
 
+### PettingZoo API
+```python
+env = sumo_rl.env(net_file='sumo_net_file.net.xml',
+                  route_file='sumo_route_file.rou.xml',
+                  use_gui=True,
+                  num_seconds=3600)
+env.reset()
+for agent in env.agent_iter():
+    observation, reward, done, info = env.last()
+    action = policy(observation) if not done else None
+    env.step(action)
+```
+
 Check [experiments](https://github.com/LucasAlegre/sumo-rl/tree/master/experiments) to see how to instantiate a SumoEnvironment and use it with your RL algorithm.
 
 ### [Q-learning](https://github.com/LucasAlegre/sumo-rl/blob/master/agents/ql_agent.py) in a one-way single intersection:
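For completeness, here is a minimal sketch of the `parallel_env` counterpart that `experiments/sb3.py` below builds on. File names are placeholders, random actions stand in for a trained policy, and the per-agent `action_spaces` dict is assumed to be exposed the same way `experiments/ql_4x4grid_pz.py` uses it:

```python
import sumo_rl

# Parallel API sketch: every traffic signal acts simultaneously at each step.
env = sumo_rl.parallel_env(net_file='sumo_net_file.net.xml',
                           route_file='sumo_route_file.rou.xml',
                           use_gui=False,
                           num_seconds=3600)
observations = env.reset()
done = False
while not done:
    # Random actions as a stand-in for a learned policy.
    actions = {agent: env.action_spaces[agent].sample() for agent in env.agents}
    observations, rewards, dones, infos = env.step(actions)
    done = all(dones.values())
env.close()
```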

experiments/a2c_2way-single-intersection.py

+1 −3
@@ -18,16 +18,14 @@
 
 write_route_file('nets/2way-single-intersection/single-intersection-gen.rou.xml', 400000, 100000)
 
-# multiprocess environment
-n_cpu = 1
 env = SubprocVecEnv([lambda: SumoEnvironment(net_file='nets/2way-single-intersection/single-intersection.net.xml',
                                              route_file='nets/2way-single-intersection/single-intersection-gen.rou.xml',
                                              out_csv_name='outputs/2way-single-intersection/a2c',
                                              single_agent=True,
                                              use_gui=False,
                                              num_seconds=100000,
                                              min_green=5,
-                                             max_depart_delay=0) for _ in range(n_cpu)])
+                                             max_depart_delay=0)])
 
 model = A2C(MlpPolicy, env, verbose=1, learning_rate=0.001, lr_schedule='constant')
 model.learn(total_timesteps=100000)

experiments/a3c_4x4grid.py

+5 −5
@@ -1,4 +1,3 @@
-import argparse
 import os
 import sys
 if 'SUMO_HOME' in os.environ:
@@ -10,27 +9,28 @@
 import ray
 from ray.rllib.agents.a3c.a3c import A3CTrainer
 from ray.rllib.agents.a3c.a3c_tf_policy import A3CTFPolicy
+from ray.rllib.env import PettingZooEnv
 from ray.tune.registry import register_env
 from gym import spaces
 import numpy as np
-from sumo_rl import SumoEnvironment
+import sumo_rl
 import traci
 
 
 if __name__ == '__main__':
     ray.init()
 
-    register_env("4x4grid", lambda _: SumoEnvironment(net_file='nets/4x4-Lucas/4x4.net.xml',
+    register_env("4x4grid", lambda _: PettingZooEnv(sumo_rl.env(net_file='nets/4x4-Lucas/4x4.net.xml',
                                                       route_file='nets/4x4-Lucas/4x4c1c2c1c2.rou.xml',
                                                       out_csv_name='outputs/4x4grid/a3c',
                                                       use_gui=False,
                                                       num_seconds=80000,
-                                                      max_depart_delay=0))
+                                                      max_depart_delay=0)))
 
     trainer = A3CTrainer(env="4x4grid", config={
         "multiagent": {
             "policies": {
-                '0': (A3CTFPolicy, spaces.Box(low=np.zeros(10), high=np.ones(10)), spaces.Discrete(2), {})
+                '0': (A3CTFPolicy, spaces.Box(low=np.zeros(11), high=np.ones(11)), spaces.Discrete(2), {})
             },
             "policy_mapping_fn": (lambda id: '0')  # Traffic lights are always controlled by this policy
         },
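Note that the policy's observation space grows here from a 10- to an 11-dimensional Box, which lines up with the new `min_green_elapsed` entry added to the observation vector in the README. A hedged alternative to hard-coding the size is to read the spaces off the PettingZoo env itself; this sketch assumes the per-agent `observation_spaces`/`action_spaces` dicts used in `experiments/ql_4x4grid_pz.py`:

```python
# Sketch: derive the policy spaces from a reset env instead of hard-coding Box(0, 1, (11,)).
pz_env = sumo_rl.env(net_file='nets/4x4-Lucas/4x4.net.xml',
                     route_file='nets/4x4-Lucas/4x4c1c2c1c2.rou.xml',
                     num_seconds=80000)
pz_env.reset()
some_ts = pz_env.agents[0]  # all traffic signals share the same spaces here
policies = {'0': (A3CTFPolicy,
                  pz_env.observation_spaces[some_ts],
                  pz_env.action_spaces[some_ts],
                  {})}
pz_env.close()
```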

experiments/ql_4x4grid.py

+5 −3
@@ -24,8 +24,10 @@
 
     env = SumoEnvironment(net_file='nets/4x4-Lucas/4x4.net.xml',
                           route_file='nets/4x4-Lucas/4x4c1c2c1c2.rou.xml',
-                          use_gui=True,
+                          use_gui=False,
                           num_seconds=80000,
+                          min_green=8,
+                          delta_time=5,
                           max_depart_delay=0)
 
     for run in range(1, runs+1):
@@ -42,11 +44,11 @@
             actions = {ts: ql_agents[ts].act() for ts in ql_agents.keys()}
 
             s, r, done, info = env.step(action=actions)
 
             for agent_id in s.keys():
                 ql_agents[agent_id].learn(next_state=env.encode(s[agent_id], agent_id), reward=r[agent_id])
 
-        env.save_csv('outputs/4x4/ql_test', run)
+        env.save_csv('outputs/4x4/ql-test!', run)
         env.close()
 
 
experiments/ql_4x4grid_pz.py

+52
@@ -0,0 +1,52 @@
+import argparse
+import os
+import sys
+import pandas as pd
+
+if 'SUMO_HOME' in os.environ:
+    tools = os.path.join(os.environ['SUMO_HOME'], 'tools')
+    sys.path.append(tools)
+else:
+    sys.exit("Please declare the environment variable 'SUMO_HOME'")
+
+import traci
+import sumo_rl
+from sumo_rl.agents import QLAgent
+from sumo_rl.exploration import EpsilonGreedy
+
+
+if __name__ == '__main__':
+
+    alpha = 0.1
+    gamma = 0.99
+    decay = 1
+    runs = 1
+
+    env = sumo_rl.env(net_file='nets/4x4-Lucas/4x4.net.xml',
+                      route_file='nets/4x4-Lucas/4x4c1c2c1c2.rou.xml',
+                      use_gui=False,
+                      min_green=8,
+                      delta_time=5,
+                      num_seconds=80000,
+                      max_depart_delay=0)
+
+    for run in range(1, runs+1):
+        env.reset()
+        initial_states = {ts: env.observe(ts) for ts in env.agents}
+        ql_agents = {ts: QLAgent(starting_state=env.unwrapped.env.encode(initial_states[ts], ts),
+                                 state_space=env.observation_spaces[ts],
+                                 action_space=env.action_spaces[ts],
+                                 alpha=alpha,
+                                 gamma=gamma,
+                                 exploration_strategy=EpsilonGreedy(initial_epsilon=0.05, min_epsilon=0.005, decay=decay)) for ts in env.agents}
+        infos = []
+        for agent in env.agent_iter():
+            s, r, done, info = env.last()
+            if ql_agents[agent].action is not None:
+                ql_agents[agent].learn(next_state=env.unwrapped.env.encode(s, agent), reward=r)
+
+            action = ql_agents[agent].act() if not done else None
+            env.step(action)
+
+        env.unwrapped.env.save_csv('outputs/4x4/pz_ql', run)
+        env.close()
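Two details in this new script are easy to miss: the `ql_agents[agent].action is not None` check skips the learning update on each agent's first turn (there is no previous action to credit yet), and stepping with `None` is what PettingZoo expects for an agent whose episode is done. Also, `env.unwrapped.env` reaches through the PettingZoo wrappers to the underlying `SumoEnvironment`, where `encode()` and `save_csv()` live; a small readability sketch, assuming the same wrapper layout, is to alias it once:

```python
# Sketch: alias the wrapped SumoEnvironment once instead of repeating env.unwrapped.env.
sumo_env = env.unwrapped.env
starting_states = {ts: sumo_env.encode(initial_states[ts], ts) for ts in env.agents}
# ... and later in the loop:
# ql_agents[agent].learn(next_state=sumo_env.encode(s, agent), reward=r)
# sumo_env.save_csv('outputs/4x4/pz_ql', run)
```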

experiments/sarsa_2way-single-intersection.py

+1 −2
@@ -41,8 +41,7 @@
                           use_gui=args.gui,
                           num_seconds=args.seconds,
                           min_green=args.min_green,
-                          max_green=args.max_green,
-                          max_depart_delay=0)
+                          max_green=args.max_green)
 
     for run in range(1, args.runs+1):
         obs = env.reset()

experiments/sarsa_double.py

+1 −3
@@ -25,9 +25,7 @@ def run(use_gui=True, runs=1):
                           num_seconds=86400,
                           yellow_time=3,
                           min_green=5,
-                          max_green=60,
-                          max_depart_delay=300,
-                          time_to_load_vehicles=0)
+                          max_green=60)
 
     fixed_tl = False
     agents = {ts_id: TrueOnlineSarsaLambda(env.observation_spaces(ts_id), env.action_spaces(ts_id), alpha=0.000000001, gamma=0.95, epsilon=0.05, lamb=0.1, fourier_order=7)

experiments/sb3.py

+80
@@ -0,0 +1,80 @@
+from stable_baselines3 import PPO
+import sumo_rl
+import supersuit as ss
+from stable_baselines3.common.vec_env import VecMonitor
+from stable_baselines3.common.evaluation import evaluate_policy
+from stable_baselines3.common.callbacks import EvalCallback
+import numpy as np
+from array2gif import write_gif
+
+n_evaluations = 20
+n_agents = 2
+n_envs = 1
+n_timesteps = 8000000
+
+env = sumo_rl.parallel_env(net_file='nets/4x4-Lucas/4x4.net.xml',
+                           route_file='nets/4x4-Lucas/4x4c1c2c1c2.rou.xml',
+                           out_csv_name='outputs/4x4grid/test',
+                           use_gui=False,
+                           num_seconds=80000)
+
+env = ss.frame_stack_v1(env, 3)
+env = ss.pettingzoo_env_to_vec_env_v0(env)
+env = ss.concat_vec_envs_v0(env, n_envs, num_cpus=1, base_class='stable_baselines3')
+env = VecMonitor(env)
+
+""" eval_env = sumo_rl.parallel_env(net_file='nets/4x4-Lucas/4x4.net.xml',
+                                    route_file='nets/4x4-Lucas/4x4c1c2c1c2.rou.xml',
+                                    out_csv_name='outputs/4x4grid/test',
+                                    use_gui=False,
+                                    num_seconds=80000)
+
+eval_env = ss.frame_stack_v1(eval_env, 3)
+eval_env = ss.pettingzoo_env_to_vec_env_v0(eval_env)
+eval_env = ss.concat_vec_envs_v0(eval_env, 1, num_cpus=1, base_class='stable_baselines3')
+eval_env = VecMonitor(eval_env) """
+
+eval_freq = int(n_timesteps / n_evaluations)
+eval_freq = max(eval_freq // (n_envs*n_agents), 1)
+
+model = PPO("MlpPolicy", env, verbose=3, gamma=0.95, n_steps=256, ent_coef=0.0905168, learning_rate=0.00062211, vf_coef=0.042202, max_grad_norm=0.9, gae_lambda=0.99, n_epochs=5, clip_range=0.3, batch_size=256)
+#eval_callback = EvalCallback(eval_env, best_model_save_path='./logs/', log_path='./logs/', eval_freq=eval_freq, deterministic=True, render=False)
+model.learn(total_timesteps=n_timesteps) #callback=eval_callback)
+
+model = PPO.load("./logs/best_model")
+
+mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
+
+print(mean_reward)
+print(std_reward)
+
+""" render_env = sumo_rl.env(net_file='nets/4x4-Lucas/4x4.net.xml',
+                            route_file='nets/4x4-Lucas/4x4c1c2c1c2.rou.xml',
+                            out_csv_name='outputs/4x4grid/test',
+                            use_gui=False,
+                            num_seconds=80000)
+
+render_env = render_env.parallel_env()
+render_env = ss.color_reduction_v0(render_env, mode='B')
+render_env = ss.resize_v0(render_env, x_size=84, y_size=84)
+render_env = ss.frame_stack_v1(render_env, 3)
+
+obs_list = []
+i = 0
+render_env.reset()
+
+
+while True:
+    for agent in render_env.agent_iter():
+        observation, _, done, _ = render_env.last()
+        action = model.predict(observation, deterministic=True)[0] if not done else None
+
+        render_env.step(action)
+        i += 1
+        if i % (len(render_env.possible_agents)) == 0:
+            obs_list.append(np.transpose(render_env.render(mode='rgb_array'), axes=(1, 0, 2)))
+    render_env.close()
+    break
+
+print('Writing gif')
+write_gif(obs_list, 'kaz.gif', fps=15) """
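One caveat in `experiments/sb3.py` as committed: `PPO.load("./logs/best_model")` only succeeds if the commented-out `EvalCallback` has previously saved a best model to `./logs/`. A minimal sketch for evaluating the freshly trained weights instead, with a hypothetical save path, could be:

```python
# Sketch: evaluate the model that was just trained, rather than reloading a
# checkpoint that only exists when the EvalCallback above is enabled.
model.save("./logs/ppo_4x4grid")  # hypothetical path; keeps a copy on disk
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10)
print(mean_reward, std_reward)
```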

setup.py

+8 −5
@@ -1,21 +1,24 @@
 from setuptools import setup, find_packages
 
-REQUIRED = ['gym', 'numpy', 'pandas', 'ray[rllib]']
+REQUIRED = ['gym', 'numpy', 'pandas', 'pillow']
 
-with open("README.md", "r") as fh:
-    long_description = fh.read()
+extras = {
+    "pettingzoo": ["pettingzoo"],
+}
+extras["all"] = extras["pettingzoo"]
 
 setup(
     name='sumo-rl',
     version='1.0',
-    packages=['sumo_rl',],
+    packages=['sumo_rl'],
     install_requires=REQUIRED,
+    extras_require=extras,
     author='LucasAlegre',
     author_email='[email protected]',
     url='https://github.com/LucasAlegre/sumo-rl',
     download_url='https://github.com/LucasAlegre/sumo-rl/archive/v1.0.tar.gz',
     long_description=open("README.md", encoding="utf-8").read(),
     long_description_content_type="text/markdown",
     license="MIT",
-    description='Environments inheriting OpenAI Gym Env and RL algorithms for Traffic Signal Control on SUMO.'
+    description='RL environments and learning code for traffic signal control in SUMO.'
 )
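With these packaging changes, the base install no longer pulls in `ray[rllib]`; PettingZoo becomes an optional extra, installed via `pip install -e .[pettingzoo]` or `pip install -e .[all]` (the latter is what the new CI workflow above uses).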

sumo_rl/.DS_Store

6 KB
Binary file not shown.

sumo_rl/__init__.py

+2 −1
@@ -1 +1,2 @@
-from sumo_rl.environment.env import SumoEnvironment
+from sumo_rl.environment.env import SumoEnvironment
+from sumo_rl.environment.env import env, parallel_env
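With this re-export, both entry points are importable from the package root; a two-line import sketch:

```python
from sumo_rl import SumoEnvironment    # RLlib/Gym-style environment class
from sumo_rl import env, parallel_env  # PettingZoo AEC and parallel factories
```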

sumo_rl/environment/.DS_Store

6 KB
Binary file not shown.
