fixed magent documentation
benblack769 committed Nov 29, 2020
1 parent 2912898 commit 4282893
Showing 7 changed files with 144 additions and 33 deletions.
12 changes: 4 additions & 8 deletions docs/magent.md
@@ -20,19 +20,15 @@ Gather is a competitive free for all game where agents try to stay alive for as

### Key Concepts

* **HP decay**: The agents in Gather and the tigers in Tiger-Deer have HP (health points) that decays over time, so the agents slowly lose HP until they die. The only way to prevent this is by eating something.

* **HP recovery**: In battle games, agents recover HP over time, so low HP agents can hide or be protected until they heal.

* **Observation view**: All agents observe a box around themselves. The observation channels indicate whether each coordinate is empty, contains an obstacle, or contains an agent. If an agent is on a coordinate, that entry will contain the value (agent's HP / max agent HP).

* **Feature vector**: The feature vector contains `<agent_id, action, last_reward>`
* **Feature vector**: The feature vector contains information about the agent itself, rather than its surroundings. In normal mode it contains `<agent_id, action, last_reward>`; in minimap mode it also contains the agent's position on the map, normalized to 0-1.

* Observation concatenates the 1D feature vector with 3D observation view by repeating the value of the feature across an entire image channel.
* **Observation**: The observation is the 3D observation view concatenated with the 1D feature vector, with each feature value repeated across an entire image channel (see the sketch below).

* **Minimap mode**: For the battle games (Battle, Battlefield, Combined Arms), the agents have access to additional global information: two density maps of the teams' respective presences on the map that are binned and concatenated onto the agent's observation view (concatenated in the channel dimension, axis=2). Their own absolute positions on the global map is appended to the feature vector.
* **Minimap mode**: For most of the games (Battle, Battlefield, Combined Arms, Gather), the agents have access to additional global information: two density maps of the teams' respective presences on the map, binned and concatenated onto the agent's observation view (in the channel dimension, axis=2). Each agent's absolute position on the global map is also appended to its feature vector. This feature can be turned on or off with the `minimap_mode` environment argument.

* **Moving and attacking**: An agent can only act or move with a single action, so the action space is the concatenations of all possible moves and all possible attacks.
* **Moving and attacking**: An agent can either move or attack each step, so the action space is the concatenation of all possible moves and all possible attacks.
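
Below is a minimal sketch of the observation assembly described above (illustrative only, not code from the MAgent library; the shapes and feature values are made up):

```
import numpy as np

def build_observation(view, features):
    """Concatenate a (H, W, C) local view with a 1D feature vector.

    Each feature value is broadcast across its own (H, W) image channel,
    which is how the docs above describe the final observation tensor.
    """
    h, w, _ = view.shape
    feature_channels = np.ones((h, w, features.shape[0])) * features
    return np.concatenate([view, feature_channels], axis=2)

# Illustrative shapes: a 13x13 view with 5 channels plus the 3-entry
# feature vector <agent_id, action, last_reward> gives a 13x13x8 tensor.
view = np.zeros((13, 13, 5))
features = np.array([7.0, 2.0, -0.1])
assert build_observation(view, features).shape == (13, 13, 8)
```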

### Termination

18 changes: 12 additions & 6 deletions docs/magent/adversarial_pursuit.md
@@ -13,20 +13,26 @@ agent-labels: "agents= [predator_[0-24], prey_[0-49]]"

{% include info_box.md %}

In this environment, red agents work navigate the obstacles and attack the blue agents, who in turn work to avoid attatcks. To be effect the red agents, who are much are slower and larger than the blue agents, must work together to trap blue agents and attack them continually.
The red agents must navigate the obstacles and tag (attack, but without damaging) the blue agents. The blue agents should try to avoid being tagged. To be effective, the red agents, who are much slower and larger than the blue agents, must work together to trap blue agents so they can be tagged continually.

#### Action Space

Key: `move_N`: options to move to the N nearest squares.

Predator action options: `[do_nothing, move_4, attack_8]`

#### Reward

Predator's reward is given as:

* 1 reward for attacking a prey
* -0.2 reward for attacking (attack_penalty option)
* 1 reward for tagging a prey
* -0.2 reward for attacking anywhere (`attack_penalty` option)

Prey action options: `[do_nothing, move_8]`

Prey's reward is given as:

* -1 reward for being attacked
* -1 reward for being tagged

Observation space: `[obstacle, my_team_presence, my_team_presence_health, other_team_presence, other_team_presence_health, one_hot_action, last_reward]`

@@ -36,10 +42,10 @@ Observation space: `[obstacle, my_team_presence, my_team_presence_health, other_team_presence, other_team_presence_health, one_hot_action, last_reward]`

```
adversarial_pursuit_v2.env(map_size=45, minimap_mode=False, attack_penalty=-0.2, max_cycles=500)
```

`map_size`: Sets dimensions of the (square) map. Increasing the size increases the number of agents.
`map_size`: Sets dimensions of the (square) map. Increasing the size increases the number of agents. Minimum size is 7.

`minimap_mode`: Turns on global minimap observations. These observations include your and your opponent's piece densities binned over the 2D grid of the observation space. It also includes your `agent_position`, the absolute position on the map (rescaled from 0 to 1).

`attack_penalty`: Adds the following value to the reward whenever an attacking action is taken
`attack_penalty`: reward added each time a red agent tags anything

`max_cycles`: number of frames (a step for each agent) until game terminates
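
As a usage sketch, a random-policy rollout might look like the following. This assumes the PettingZoo AEC API of this era (`agent_iter`, `last`, the `action_spaces` dict, and stepping done agents with `None`); treat the details as assumptions rather than a definitive recipe.

```
from pettingzoo.magent import adversarial_pursuit_v2

env = adversarial_pursuit_v2.env(map_size=45, minimap_mode=False,
                                 attack_penalty=-0.2, max_cycles=500)
env.reset()
for agent in env.agent_iter():
    observation, reward, done, info = env.last()
    # Agents that are done must be stepped with None before removal.
    action = None if done else env.action_spaces[agent].sample()
    env.step(action)
env.close()
```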
36 changes: 30 additions & 6 deletions docs/magent/battle.md
@@ -15,12 +15,19 @@ agent-labels: "agents= [red_[0-80], blue_[0-80]]"



A large-scale team battle.
A large-scale team battle. Agents are rewarded for their individual performance, and not for the performance of their neighbors, so coordination is difficult. Agents slowly regain HP over time, so it is best to kill an opposing agent quickly.

Like all MAgent environments, agents can either move or attack each turn. An attack against another agent on their own team will not be registered.

#### Action space

Key: `move_N`: options to move to the N nearest squares.

Action options: `[do_nothing, move_12, attack_8]`


#### Reward

Reward is given as:

* 5 reward for killing an opponent
@@ -31,24 +38,41 @@ Reward is given as:

If multiple options apply, rewards are added.

Observation space: `[obstacle, my_team_presence, my_team_presence_health, my_team_presence_minimap, other_team_presence, other_team_presence_health, other_team_presence_minimap, binary_agent_id(10), one_hot_action, last_reward, agent_position]`
#### Observation space

The observation space is a 13x13 map with 41 channels, which are (in order):

name | number of channels
--- | ---
obstacle/off the map| 1
my_team_presence| 1
my_team_hp| 1
my_team_minimap| 1
other_team_presence| 1
other_team_hp| 1
other_team_minimap| 1
binary_agent_id| 10
one_hot_action| 21
last_reward| 1
agent_position| 2
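
A small sketch of slicing an observation into these groups, with the channel offsets derived from the table above (illustrative; not code from the library):

```
import numpy as np

# Channel layout taken from the table above (41 channels in total).
CHANNEL_GROUPS = [
    ("obstacle", 1), ("my_team_presence", 1), ("my_team_hp", 1),
    ("my_team_minimap", 1), ("other_team_presence", 1),
    ("other_team_hp", 1), ("other_team_minimap", 1),
    ("binary_agent_id", 10), ("one_hot_action", 21),
    ("last_reward", 1), ("agent_position", 2),
]

def split_channels(obs):
    """Split a (13, 13, 41) battle observation into named channel groups."""
    groups, start = {}, 0
    for name, width in CHANNEL_GROUPS:
        groups[name] = obs[:, :, start:start + width]
        start += width
    return groups

parts = split_channels(np.zeros((13, 13, 41)))
assert parts["one_hot_action"].shape == (13, 13, 21)
```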


### Arguments

```
battle_v2.env(map_size=45, minimap_mode=True, step_reward=-0.005, dead_penalty=-0.1, attack_penalty=-0.1, attack_opponent_reward=0.2, max_cycles=1000)
```

`map_size`: Sets dimensions of the (square) map. Increasing the size increases the number of agents.
`map_size`: Sets dimensions of the (square) map. Increasing the size increases the number of agents. Minimum size is 12.

`minimap_mode`: Turns on global minimap observations. These observations include your and your opponent's piece densities binned over the 2D grid of the observation space. It also includes your `agent_position`, the absolute position on the map (rescaled from 0 to 1).


`step_reward`: reward added unconditionally
`step_reward`: reward after every step

`dead_penalty`: reward added when killed
`dead_penalty`: reward when killed

`attack_penalty`: reward added for attacking
`attack_penalty`: reward when attacking anything

`attack_opponent_reward`: reward added for attacking an opponent

28 changes: 25 additions & 3 deletions docs/magent/battlefield.md
@@ -17,12 +17,18 @@ agent-labels: "agents= [red_[0-11], blue_[0-11]]"

Same as [battle](./battle) but with fewer agents arrayed in a larger space with obstacles.

A small-scale team battle, where agents have to figure out the optimal way to coordinate their small team in a large space and maneuver around obstacles in order to defeat the opposing team.
A small-scale team battle, where agents have to figure out the optimal way to coordinate their small team in a large space and maneuver around obstacles in order to defeat the opposing team. Agents are rewarded for their individual performance, and not for the performance of their neighbors, so coordination is difficult. Agents slowly regain HP over time, so it is best to kill an opposing agent quickly.

Like all MAgent environments, agents can either move or attack each turn. An attack against another agent on their own team will not be registered.

#### Action Space

Key: `move_N`: options to move to the N nearest squares.

Action options: `[do_nothing, move_12, attack_8]`

#### Reward

Reward is given as:

* 5 reward for killing an opponent
@@ -33,15 +39,31 @@ Reward is given as:

If multiple options apply, rewards are added.

Observation space: `[obstacle, my_team_presence, my_team_presence_health, my_team_presence_minimap, other_team_presence, other_team_presence_health, other_team_presence_minimap, binary_agent_id(10), one_hot_action, last_reward, agent_position]`
#### Observation space

The observation space is a 13x13 map with 41 channels, which are (in order):

name | number of channels
--- | ---
obstacle/off the map| 1
my_team_presence| 1
my_team_hp| 1
my_team_minimap| 1
other_team_presence| 1
other_team_hp| 1
other_team_minimap| 1
binary_agent_id| 10
one_hot_action| 21
last_reward| 1
agent_position| 2

### Arguments

```
battlefield_v2.env(map_size=80, minimap_mode=True, step_reward=-0.005, dead_penalty=-0.1, attack_penalty=-0.1, attack_opponent_reward=0.2, max_cycles=1000)
```

`map_size`: Sets dimensions of the (square) map.
`map_size`: Sets dimensions of the (square) map. Minimum size is 45.

`minimap_mode`: Turns on global minimap observations. These observations include your and your opponent's piece densities binned over the 2D grid of the observation space. It also includes your `agent_position`, the absolute position on the map (rescaled from 0 to 1).

27 changes: 24 additions & 3 deletions docs/magent/combined_arms.md
@@ -15,12 +15,18 @@ agent-labels: "agents= [redmelee_[0-44], redranged_[0-35], bluemelee_[0-44], blu



A large-scale team battle. Here there are two types of agents on each team, ranged units which can attack father and move faster but have less HP, and melee units which can only attack close units and move more slowly but have more HP. Unlike battle and battlefield, agents can attack units on their own team (they just are not rewarded for doing so).
A large-scale team battle. Here there are two types of agents on each team: ranged units, which can attack farther and move faster but have less HP, and melee units, which can only attack close units and move more slowly but have more HP. Unlike battle and battlefield, agents can attack units on their own team (they just are not rewarded for doing so). Agents slowly regain HP over time, so it is best to kill an opposing agent quickly.

#### Action Space

Key: `move_N`: options to move to the N nearest squares.

Melee action options: `[do_nothing, move_4, attack_4]`

Ranged action options: `[do_nothing, move_12, attack_12]`

#### Reward

Reward is given as:

* 5 reward for killing an opponent
@@ -31,7 +37,22 @@ Reward is given as:

If multiple options apply, rewards are added.

Observation space: `[obstacle, my_group_presence, my_group_presence_health, my_group_presence_minimap, other_team_presences_healths_minimaps(9), binary_agent_id(10), one_hot_action, last_reward, agent_position]`

#### Observation space

The observation space is a 13x13 map with 35 channels for melee units and 51 channels for ranged units, which are (in order):

name | number of channels
--- | ---
obstacle/off the map| 1
my_team_presence| 1
my_team_hp| 1
my_team_minimap| 1
other teams' presences/healths/minimaps (in some order) | 9
binary_agent_id| 10
one_hot_action| 9 (melee) / 25 (ranged)
last_reward| 1
agent_position| 2
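
The two unit types therefore expose different action spaces (1 + 4 + 4 = 9 actions for melee, 1 + 12 + 12 = 25 for ranged). A quick sketch to confirm this, assuming the agent names follow the `redmelee_*`/`redranged_*` pattern listed above and the `action_spaces` dict of this PettingZoo era:

```
from pettingzoo.magent import combined_arms_v3

env = combined_arms_v3.env(map_size=45, minimap_mode=True)
env.reset()
# Melee agents should report 9 possible actions, ranged agents 25.
for agent in env.agents:
    kind = "melee" if "melee" in agent else "ranged"
    print(agent, kind, env.action_spaces[agent].n)
env.close()
```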


### Arguments
@@ -40,7 +61,7 @@

```
combined_arms_v3.env(map_size=45, minimap_mode=True, step_reward=-0.005, dead_penalty=-0.1, attack_penalty=-0.1, attack_opponent_reward=0.2, max_cycles=1000)
```

`map_size`: Sets dimensions of the (square) map. Increasing the size increases the number of agents.
`map_size`: Sets dimensions of the (square) map. Increasing the size increases the number of agents. Minimum size is 16.

`minimap_mode`: Turns on global minimap observations. These observations include your and your opponent's piece densities binned over the 2D grid of the observation space. It also includes your `agent_position`, the absolute position on the map (rescaled from 0 to 1).

26 changes: 24 additions & 2 deletions docs/magent/gather.md
@@ -15,10 +15,16 @@ agent-labels: "agents= [ omnivore_[0-494] ]"



In gather, the agents gain reward by eating food. Agent's don't die unless attacked. You expect to see that agents coordinate by not attacking each other until food is scarce. When food is scarce, agents may attack each other to try to monopolize the food.
In gather, the agents gain reward by eating food. Food needs to be broken down by several "attacks" before it is absorbed. Since there is a finite amount of food on the map, there is competitive pressure between agents over the food. One would expect agents to coordinate by not attacking each other until food is scarce; when food is scarce, agents may attack each other to try to monopolize the food.

#### Action Space

Key: `move_N`: options to move to the N nearest squares.

Action options: `[do_nothing, move_28, attack_4]`

#### Reward

Reward is given as:

* 5 reward for eating a food (requires multiple attacks)
@@ -27,7 +33,23 @@ Reward is given as:
* -1 reward for dying (dead_penalty option)
* 0.5 reward for attacking a food (attack_food_reward option)

Observation space: `[empty, obstacle, omnivore, food, omnivore_minimap, food_minimap, one_hot_action, last_reward, agent_position]`
#### Observation space

The observation space is a 13x13 map with 41 channels, which are (in order):

name | number of channels
--- | ---
obstacle/off the map| 1
omnivore_presence| 1
omnivore_hp| 1
omnivore_minimap| 1
food_presence| 1
food_hp| 1
food_minimap| 1
one_hot_action| 33
last_reward| 1
agent_position| 2
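
For example, the food channels could be used to check what an agent currently sees. This is a sketch based only on the channel order in the table above (not on the library's internals), so treat the indices as assumptions:

```
import numpy as np

# Per the table above: channel 4 is food_presence, channel 5 is food_hp.
FOOD_PRESENCE, FOOD_HP = 4, 5

def food_in_view(obs):
    """True if any food cell is visible in this observation."""
    return bool(np.any(obs[:, :, FOOD_PRESENCE] > 0))

def visible_food_hp(obs):
    """Total HP of visible food; food must be attacked down before it can be eaten."""
    return float(obs[:, :, FOOD_HP].sum())
```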


### Arguments

30 changes: 25 additions & 5 deletions docs/magent/tiger_deer.md
@@ -15,31 +15,51 @@ agent-labels: "agents= [ deer_[0-100], tiger_[0-19] ]"
{% include info_box.md %}


In tiger-deer, there are a number of tigers who are only rewarded for teaming up to take down the deer (two tigers must attack a deer in the same step to receive reward). If they do not eat the deer, they will slowly lose heath until they die. At the same time, the deer are trying to avoid getting attacked. It is not clear what emergent behavior is expected in this environment.

In tiger-deer, there are a number of tigers who are only rewarded for teaming up to take down the deer (two tigers must attack a deer in the same step to receive reward). If they do not eat the deer, they will slowly lose health until they die. At the same time, the deer are trying to avoid getting attacked.

#### Action Space

Key: `move_N`: options to move to the N nearest squares.

Tiger action space: `[do_nothing, move_4, attack_4]`

Deer action space: `[do_nothing, move_4]`

#### Reward

Tiger's reward scheme is:

* 1 reward for attacking a deer alongside another tiger

Deer action space: `[do_nothing, move_4]`

Deer's reward scheme is:

* -1 reward for dying
* -0.1 for being attacked

Observation space: `[obstacle, my_team_presence, my_team_presence_health, other_team_presence, other_team_presence_health, one_hot_action, last_reward]`
#### Observation space

The observation space is a 3x3 map with 21 channels for deer and a 9x9 map with 25 channels for tigers, which are (in order):

name | number of channels
--- | ---
obstacle/off the map| 1
my_team_presence| 1
my_team_hp| 1
other_team_presence| 1
other_team_hp| 1
binary_agent_id| 10
one_hot_action| 5 (deer) / 9 (tiger)
last_reward| 1
agent_position| 2
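
Since deer and tigers see different amounts of the map, their observation spaces differ per agent. A quick sketch to inspect them, assuming the agent naming shown above and the `observation_spaces` dict of this PettingZoo era:

```
from pettingzoo.magent import tiger_deer_v3

env = tiger_deer_v3.env(map_size=45, minimap_mode=False)
env.reset()
# Deer have a small 3x3 view; tigers have a larger 9x9 view.
for agent in ("deer_0", "tiger_0"):
    print(agent, env.observation_spaces[agent].shape)
env.close()
```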

### Arguments

```
tiger_deer_v3.env(map_size=45, minimap_mode=False, tiger_step_recover=-0.1, deer_attacked=-0.1, max_cycles=500)
```

`map_size`: Sets dimensions of the (square) map. Increasing the size increases the number of agents.
`map_size`: Sets dimensions of the (square) map. Increasing the size increases the number of agents. Minimum size is 10.

`minimap_mode`: Turns on global minimap observations. These observations include your and your opponent's piece densities binned over the 2D grid of the observation space. It also includes your `agent_position`, the absolute position on the map (rescaled from 0 to 1).

