
Commit

updated RL docs
sytelus committed Nov 10, 2017
1 parent 2bff15e commit a0e70d6
Showing 4 changed files with 48 additions and 12 deletions.
Binary file added docs/images/dqn_car.png
Binary file added docs/images/dqn_quadcopter.png
58 changes: 47 additions & 11 deletions docs/reinforcement_learning.md
@@ -9,12 +9,20 @@ This is still in active development. What we share below is a framework that can

## RL with Car

[Source code](https://github.com/Microsoft/AirSim/blob/master/PythonClient/DQNcar.py)

This example works with the AirSimNeighborhood environment available in [releases](https://github.com/Microsoft/AirSim/releases).

First, we need to get the images from the simulation and transform them appropriately. Below, we show how a depth image can be obtained from the ego camera and transformed into an 84x84 input to the network. (You can use other sensor modalities and sensor inputs as well; of course, you'll have to modify the code accordingly.)

```
responses = client.simGetImages([ImageRequest(0, AirSimImageType.DepthPerspective, True, False)])
current_state = transform_input(responses)
```
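
The `transform_input` helper is not shown above. A minimal sketch of such a function, assuming the depth-float response fields (`image_data_float`, `height`, `width`) exposed by the AirSim Python client, might look like the following; the clipping and resizing choices are illustrative rather than the exact DQNcar.py logic:

```
import numpy as np
from PIL import Image

def transform_input(responses):
    # a sketch, not necessarily identical to the helper used by DQNcar.py
    response = responses[0]
    # depth values arrive as a flat list of floats (metres)
    img1d = np.array(response.image_data_float, dtype=np.float32)
    # map depth into a 0-255 range, clipping the divisor so near points stay in range
    img1d = 255.0 / np.maximum(np.ones(img1d.size, dtype=np.float32), img1d)
    img2d = np.reshape(img1d, (response.height, response.width))
    # downsample to the 84x84 grayscale input expected by the network
    image = Image.fromarray(img2d.astype(np.uint8))
    return np.array(image.resize((84, 84)).convert('L'))
```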

We further define the six actions (braking, straight with throttle, full-left with throttle, full-right with throttle, half-left with throttle, half-right with throttle) that an agent can execute. This is done via the function `interpret_action`:

```
def interpret_action(action):
    car_controls.brake = 0
    car_controls.throttle = 1
@@ -32,7 +40,11 @@ def interpret_action(action):
    else:
        car_controls.steering = -0.25
    return car_controls
```

We then define the reward function in `compute_reward` as a convex combination of how fast the vehicle is travelling and how much it deviates from the center line. The agent gets a high reward when it's moving fast and staying in the center of the lane.

```
def compute_reward(car_state):
    MAX_SPEED = 300
    MIN_SPEED = 10
@@ -57,7 +69,11 @@ def compute_reward(car_state):
    reward = reward_dist + reward_speed
    return reward
```
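
The body of `compute_reward` is partly collapsed above. As a rough illustration of the convex combination just described, a reward of this shape could be computed as follows; the lane-center waypoints, `THRESH_DIST`, and `BETA` are assumptions rather than the actual values in DQNcar.py:

```
import numpy as np

def line_dist(p, q, r):
    # perpendicular distance from point r to the line through p and q (all 2D)
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) / np.linalg.norm(q - p)

def compute_reward_sketch(car_pt, speed, center_pts):
    # car_pt: (x, y) of the car; speed: current speed; center_pts: assumed lane-center waypoints
    MAX_SPEED = 300
    MIN_SPEED = 10
    THRESH_DIST = 3.5   # assumed: beyond this we treat the car as off the lane
    BETA = 3            # assumed decay rate for the distance term

    car_pt = np.asarray(car_pt, dtype=float)
    pts = [np.asarray(p, dtype=float) for p in center_pts]

    dist = min(line_dist(p, q, car_pt) for p, q in zip(pts[:-1], pts[1:]))
    if dist > THRESH_DIST:
        return -3.0     # strongly penalize leaving the lane

    reward_dist = np.exp(-BETA * dist) - 0.5                              # high when centered
    reward_speed = (speed - MIN_SPEED) / (MAX_SPEED - MIN_SPEED) - 0.5    # high when fast
    return reward_dist + reward_speed
```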

The function `isDone` determines if the episode has terminated (e.g., due to collision). We look at the speed of the vehicle; if it is less than a threshold, the episode is considered terminated.

```
def isDone(car_state, car_controls, reward):
    done = 0
    if reward < -1:
@@ -66,20 +82,32 @@ def isDone(car_state, car_controls, reward):
    if car_state.speed <= 5:
        done = 1
    return done
```

The main loop then sequences through obtaining the image, computing the action to take according to the current policy, getting a reward, and so forth (a sketch of this loop appears after the reset snippet below).
If the episode terminates, we reset the vehicle to its original state via:

```
client.reset()
car_control = interpret_action(1)  # reset position and drive straight for one second
client.setCarControls(car_control)
time.sleep(1)
```
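
The loop itself is not spelled out here; below is a minimal sketch of how the pieces above might fit together. The `agent` object and its `act`/`observe`/`train` methods stand in for a DQN implementation and are an assumed interface, not AirSim API:

```
# a sketch of the interaction loop, not the exact DQNcar.py code;
# `client` and the helpers defined above (transform_input, interpret_action,
# compute_reward, isDone) are taken as given
import time

responses = client.simGetImages([ImageRequest(0, AirSimImageType.DepthPerspective, True, False)])
current_state = transform_input(responses)

while True:
    action = agent.act(current_state)                   # pick an action from the current policy
    car_controls = interpret_action(action)             # map it to throttle/brake/steering
    client.setCarControls(car_controls)
    time.sleep(0.1)                                     # let the action play out in the simulator

    car_state = client.getCarState()
    reward = compute_reward(car_state)
    done = isDone(car_state, car_controls, reward)

    agent.observe(current_state, action, reward, done)  # store the transition in replay memory
    agent.train()                                       # one DQN update step

    if done:                                            # reset as shown above
        client.reset()
        car_control = interpret_action(1)
        client.setCarControls(car_control)
        time.sleep(1)

    responses = client.simGetImages([ImageRequest(0, AirSimImageType.DepthPerspective, True, False)])
    current_state = transform_input(responses)
```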

Note that the simulation needs to be up and running before you execute DQNcar.py. The video below shows the first few episodes of DQN training.

[![Reinforcement Learning - Car](images/dqn_car.png)](https://youtu.be/fv-oFPAqSZ4)

## RL with Quadrotor

[Source code](https://github.com/Microsoft/AirSim/blob/master/PythonClient/DQNdrone.py)

This example works with the AirSimMountainLandscape environment available in [releases](https://github.com/Microsoft/AirSim/releases).

We can similarly apply RL to various autonomous flight scenarios with quadrotors. Below is an example of how RL can be used to train quadrotors to follow high-tension powerlines (e.g., for energy infrastructure inspection).
There are seven actions here that correspond to the different directions in which the quadrotor can move (six directions plus one hovering action).

```
def interpret_action(action):
    if action == 0:
        quad_offset = (0, 0, 0)
@@ -96,9 +124,11 @@ def interpret_action(action):
    elif action == 6:
        quad_offset = (0, 0, -1)
    return quad_offset
```
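
The chosen offset then has to be turned into motion. One plausible way to do that, sketched below, is to add the scaled offset to the quadrotor's current velocity; the `scaling_factor`, the command duration, and the use of `getVelocity`/`moveByVelocity` here are assumptions about this commit's client code rather than a description of it:

```
import time

# a sketch, assuming `client` is a connected AirSim multirotor client and
# `action` was chosen by the policy; the constants are illustrative
scaling_factor = 0.25                      # assumed step size per action
quad_offset = interpret_action(action)
quad_vel = client.getVelocity()            # current velocity of the quadrotor
client.moveByVelocity(quad_vel.x_val + scaling_factor * quad_offset[0],
                      quad_vel.y_val + scaling_factor * quad_offset[1],
                      quad_vel.z_val + scaling_factor * quad_offset[2],
                      5)                   # fly with the new velocity for a few seconds
time.sleep(0.5)
```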

The reward again is a function of how fast the quad travels in conjunction with how far it gets from the known powerlines.

```
def compute_reward(quad_state, quad_vel, collision_info):
    thresh_dist = 10
    beta = 1
@@ -120,14 +150,20 @@ def compute_reward(quad_state, quad_vel, collision_info):
        reward = 2*(0.5 - math.exp(-beta*dist)) + np.linalg.norm([quad_vel.x_val, quad_vel.y_val, quad_vel.z_val])
    return reward
```
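
The `dist` term above is computed in the collapsed portion of the function; conceptually it is the minimum distance from the quadrotor to the polyline through the known powerline points. A sketch of that idea (the helper name and waypoint format are ours, not necessarily what DQNdrone.py uses):

```
import numpy as np

def min_dist_to_powerline(quad_pt, pts):
    # minimum distance from the quad's 3D position to the polyline of powerline points
    # quad_pt: (x, y, z); pts: list of (x, y, z) waypoints (placeholder values)
    quad_pt = np.asarray(quad_pt, dtype=float)
    pts = [np.asarray(p, dtype=float) for p in pts]
    dist = float("inf")
    for p, q in zip(pts[:-1], pts[1:]):
        # distance from a point to the line through p and q
        d = np.linalg.norm(np.cross(quad_pt - p, quad_pt - q)) / np.linalg.norm(q - p)
        dist = min(dist, d)
    return dist
```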

We consider an episode to terminate if the quadrotor drifts too far away from the known powerline coordinates.
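
The termination check itself is not shown here; a minimal sketch consistent with that description (the threshold and the function name are assumptions, and `dist` could come from the distance sketch above):

```
def is_done(dist, thresh_dist=10):
    # end the episode once the quad has drifted too far from the powerline
    done = 0
    if dist > thresh_dist:
        done = 1
    return done
```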

The reset function here flies the quadrotor to the initial starting point:

```
if done:
    client.moveToZ(clearZ, 2)                       # climb to a clearance altitude
    client.moveToPosition(initX, initY, clearZ, 2)  # fly back over the start point
    client.moveToPosition(initX, initY, initZ, 2)   # descend to the initial altitude
current_step += 1
```

Here is a video of the first few episodes during training.

[![Reinforcement Learning - Quadrotor](images/dqn_quadcopter.png)](https://youtu.be/uKm15Y3M1Nk)
2 changes: 1 addition & 1 deletion docs/who_is_using.md
@@ -1,6 +1,6 @@
# Who is Using AirSim?

Would you like to see your own group or project here? Just add a [GitHub issue](https://github.com/microsoft/airsim/issues) with quick details and a link to your website or paper, or send us an email at msrair at microsoft.com.

* [Astrobotic](https://www.astrobotic.com/technology) - [Kerry Snyder](http://www.ohscope.com/)
* [Ghent University](https://www.ugent.be) - [Nils Tijtgat](https://timebutt.github.io/static/author/nils/)
