
Commit

updated RL docs
sytelus committed Nov 10, 2017
1 parent 2bff15e commit a0e70d6
Showing 4 changed files with 48 additions and 12 deletions.
Binary file added docs/images/dqn_car.png
Binary file added docs/images/dqn_quadcopter.png
58 changes: 47 additions & 11 deletions docs/reinforcement_learning.md
@@ -9,12 +9,20 @@ This is still in active development. What we share below is a framework that can

## RL with Car

[Source code](https://github.com/Microsoft/AirSim/blob/master/PythonClient/DQNcar.py)

This example works with the AirSimNeighborhood environment available in [releases](https://github.com/Microsoft/AirSim/releases).

First, we need to get the images from the simulation and transform them appropriately. Below, we show how a depth image can be obtained from the ego camera and transformed into an 84x84 input to the network. (You can use other sensor modalities and sensor inputs as well; of course, you'll have to modify the code accordingly.)

```
responses = client.simGetImages([ImageRequest(0, AirSimImageType.DepthPerspective, True, False)])
current_state = transform_input(responses)
```
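
The `transform_input` helper is not shown above. A minimal sketch of such a function, assuming the depth-float response fields (`image_data_float`, `height`, `width`) exposed by the AirSim Python client, might look like the following; the clipping and resizing choices are illustrative rather than the exact DQNcar.py logic:

```
import numpy as np
from PIL import Image

def transform_input(responses):
    # a sketch, not necessarily identical to the helper used by DQNcar.py
    response = responses[0]
    # depth values arrive as a flat list of floats (metres)
    img1d = np.array(response.image_data_float, dtype=np.float32)
    # map depth into a 0-255 range, clipping the divisor so near points stay in range
    img1d = 255.0 / np.maximum(np.ones(img1d.size, dtype=np.float32), img1d)
    img2d = np.reshape(img1d, (response.height, response.width))
    # downsample to the 84x84 grayscale input expected by the network
    image = Image.fromarray(img2d.astype(np.uint8))
    return np.array(image.resize((84, 84)).convert('L'))
```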

We further define the six actions (braking, straight with throttle, full-left with throttle, full-right with throttle, half-left with throttle, half-right with throttle) that an agent can execute. This is done via the function `interpret_action`:

```
def interpret_action(action):
    car_controls.brake = 0
    car_controls.throttle = 1
@@ -32,7 +40,11 @@ def interpret_action(action):
    else:
        car_controls.steering = -0.25
    return car_controls
```

We then define the reward function in `compute_reward` as a convex combination of how fast the vehicle is travelling and how much it deviates from the center line. The agent gets a high reward when it's moving fast and staying in the center of the lane.

```
def compute_reward(car_state):
    MAX_SPEED = 300
    MIN_SPEED = 10
@@ -57,7 +69,11 @@ def compute_reward(car_state):
    reward = reward_dist + reward_speed
    return reward
```
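
The body of `compute_reward` is partly collapsed above. As a rough illustration of the convex combination just described, a reward of this shape could be computed as follows; the lane-center waypoints, `THRESH_DIST`, and `BETA` are assumptions rather than the actual values in DQNcar.py:

```
import numpy as np

def line_dist(p, q, r):
    # perpendicular distance from point r to the line through p and q (all 2D)
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])) / np.linalg.norm(q - p)

def compute_reward_sketch(car_pt, speed, center_pts):
    # car_pt: (x, y) of the car; speed: current speed; center_pts: assumed lane-center waypoints
    MAX_SPEED = 300
    MIN_SPEED = 10
    THRESH_DIST = 3.5   # assumed: beyond this we treat the car as off the lane
    BETA = 3            # assumed decay rate for the distance term

    car_pt = np.asarray(car_pt, dtype=float)
    pts = [np.asarray(p, dtype=float) for p in center_pts]

    dist = min(line_dist(p, q, car_pt) for p, q in zip(pts[:-1], pts[1:]))
    if dist > THRESH_DIST:
        return -3.0     # strongly penalize leaving the lane

    reward_dist = np.exp(-BETA * dist) - 0.5                              # high when centered
    reward_speed = (speed - MIN_SPEED) / (MAX_SPEED - MIN_SPEED) - 0.5    # high when fast
    return reward_dist + reward_speed
```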

The function `isDone` determines if the episode has terminated (e.g., due to collision). We look at the speed of the vehicle; if it is less than a threshold, the episode is considered terminated.

```
def isDone(car_state, car_controls, reward):
    done = 0
    if reward < -1:
@@ -66,20 +82,32 @@ def isDone(car_state, car_controls, reward):
    if car_state.speed <= 5:
        done = 1
    return done
```

The main loop then sequences through obtaining the image, computing the action to take according to the current policy, getting a reward, and so forth (a sketch of this loop appears after the reset snippet below).
If the episode terminates, we reset the vehicle to its original state via:

```
client.reset()
car_control = interpret_action(1)  # reset position and drive straight for one second
client.setCarControls(car_control)
time.sleep(1)
```
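
The loop itself is not spelled out here; below is a minimal sketch of how the pieces above might fit together. The `agent` object and its `act`/`observe`/`train` methods stand in for a DQN implementation and are an assumed interface, not AirSim API:

```
# a sketch of the interaction loop, not the exact DQNcar.py code;
# `client` and the helpers defined above (transform_input, interpret_action,
# compute_reward, isDone) are taken as given
import time

responses = client.simGetImages([ImageRequest(0, AirSimImageType.DepthPerspective, True, False)])
current_state = transform_input(responses)

while True:
    action = agent.act(current_state)                   # pick an action from the current policy
    car_controls = interpret_action(action)             # map it to throttle/brake/steering
    client.setCarControls(car_controls)
    time.sleep(0.1)                                     # let the action play out in the simulator

    car_state = client.getCarState()
    reward = compute_reward(car_state)
    done = isDone(car_state, car_controls, reward)

    agent.observe(current_state, action, reward, done)  # store the transition in replay memory
    agent.train()                                       # one DQN update step

    if done:                                            # reset as shown above
        client.reset()
        car_control = interpret_action(1)
        client.setCarControls(car_control)
        time.sleep(1)

    responses = client.simGetImages([ImageRequest(0, AirSimImageType.DepthPerspective, True, False)])
    current_state = transform_input(responses)
```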

Note that the simulation needs to be up and running before you execute DQNcar.py. The video below shows the first few episodes of DQN training.

[![Reinforcement Learning - Car](images/dqn_car.png)](https://youtu.be/fv-oFPAqSZ4)

## RL with Quadrotor

[Source code](https://github.com/Microsoft/AirSim/blob/master/PythonClient/DQNdrone.py)

This example works with the AirSimMountainLandscape environment available in [releases](https://github.com/Microsoft/AirSim/releases).

We can similarly apply RL to various autonomous flight scenarios with quadrotors. Below is an example of how RL can be used to train quadrotors to follow high-tension powerlines (e.g., for energy infrastructure inspection).
There are seven actions here that correspond to the different directions in which the quadrotor can move (six directions plus one hovering action).

```
def interpret_action(action):
    if action == 0:
        quad_offset = (0, 0, 0)
@@ -96,9 +124,11 @@ def interpret_action(action):
    elif action == 6:
        quad_offset = (0, 0, -1)
    return quad_offset
```
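
The chosen offset then has to be turned into motion. One plausible way to do that, sketched below, is to add the scaled offset to the quadrotor's current velocity; the `scaling_factor`, the command duration, and the use of `getVelocity`/`moveByVelocity` here are assumptions about this commit's client code rather than a description of it:

```
import time

# a sketch, assuming `client` is a connected AirSim multirotor client and
# `action` was chosen by the policy; the constants are illustrative
scaling_factor = 0.25                      # assumed step size per action
quad_offset = interpret_action(action)
quad_vel = client.getVelocity()            # current velocity of the quadrotor
client.moveByVelocity(quad_vel.x_val + scaling_factor * quad_offset[0],
                      quad_vel.y_val + scaling_factor * quad_offset[1],
                      quad_vel.z_val + scaling_factor * quad_offset[2],
                      5)                   # fly with the new velocity for a few seconds
time.sleep(0.5)
```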

The reward again is a function of how fast the quad travels in conjunction with how far it gets from the known powerlines.

```
def compute_reward(quad_state, quad_vel, collision_info):
    thresh_dist = 10
    beta = 1
@@ -120,14 +150,20 @@ def compute_reward(quad_state, quad_vel, collision_info):
        reward = 2*(0.5 - math.exp(-beta*dist)) + np.linalg.norm([quad_vel.x_val, quad_vel.y_val, quad_vel.z_val])
    return reward
```
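
The `dist` term above is computed in the collapsed portion of the function; conceptually it is the minimum distance from the quadrotor to the polyline through the known powerline points. A sketch of that idea (the helper name and waypoint format are ours, not necessarily what DQNdrone.py uses):

```
import numpy as np

def min_dist_to_powerline(quad_pt, pts):
    # minimum distance from the quad's 3D position to the polyline of powerline points
    # quad_pt: (x, y, z); pts: list of (x, y, z) waypoints (placeholder values)
    quad_pt = np.asarray(quad_pt, dtype=float)
    pts = [np.asarray(p, dtype=float) for p in pts]
    dist = float("inf")
    for p, q in zip(pts[:-1], pts[1:]):
        # distance from a point to the line through p and q
        d = np.linalg.norm(np.cross(quad_pt - p, quad_pt - q)) / np.linalg.norm(q - p)
        dist = min(dist, d)
    return dist
```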

We consider an episode to terminate if the quadrotor drifts too far away from the known powerline coordinates.
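
The termination check itself is not shown here; a minimal sketch consistent with that description (the threshold and the function name are assumptions, and `dist` could come from the distance sketch above):

```
def is_done(dist, thresh_dist=10):
    # end the episode once the quad has drifted too far from the powerline
    done = 0
    if dist > thresh_dist:
        done = 1
    return done
```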

The reset function here flies the quadrotor to the initial starting point:

```
if done:
    client.moveToZ(clearZ, 2)                       # climb to a clearance altitude
    client.moveToPosition(initX, initY, clearZ, 2)  # fly back over the start point
    client.moveToPosition(initX, initY, initZ, 2)   # descend to the initial altitude
current_step += 1
```

Here is a video of the first few episodes during training.

[![Reinforcement Learning - Quadrotor](images/dqn_quadcopter.png)](https://youtu.be/uKm15Y3M1Nk)
2 changes: 1 addition & 1 deletion docs/who_is_using.md
@@ -1,6 +1,6 @@
# Who is Using AirSim?

Would you like to see your own group or project here? Just add a [GitHub issue](https://github.com/microsoft/airsim/issues) with quick details and a link to your website or paper, or send us an email at msrair at microsoft.com.

* [Astrobotic](https://www.astrobotic.com/technology) - [Kerry Snyder](http://www.ohscope.com/)
* [Ghent University](https://www.ugent.be) - [Nils Tijtgat](https://timebutt.github.io/static/author/nils/)
