
Application

Starcraft

Why is Starcraft Hard?

  • The game of Starcraft is:
    • Adversarial
    • Long Horizon
    • Partially Observable
      • that's the fog of war
    • Realtime
    • Huge branching factor
    • Concurrent
      • you have a lot of units that you can command at the same time
      • so your branching factor for actions is huge, because every unit could be doing something at any given time
    • Resource-rich
  • No single algorithm (e.g. minimax) will solve it off-the-shelf!

The Berkeley Overmind

  • Search: path planning
  • CSPs: base layout
  • Minimax: targeting
  • Learning: micro control
  • Inference: tracking units
    • sending scouts to gather evidence
  • Scheduling: resources
    • think about which resources you need and in what order to generate them to be successful
    • it is essentially constraint satisfaction problem solving
  • Hierarchical control
    • plan at different layers
    • at a high level you plan, say, what you're going to build next and so forth
    • at a low level you then instantiate how you build it, what resources you need and so forth (see the sketch after this list)
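
A hypothetical sketch of this layering in Python (none of the function or key names below come from the Overmind; they are made up for illustration): a high-level planner picks a strategic goal, and a low-level layer expands it into concrete steps and their resource needs.

```python
# Hypothetical sketch of hierarchical control: a high-level planner decides
# *what* to do next; a low-level layer decides *how* to do it. All names
# here are made up for illustration.
def high_level_plan(state):
    """Pick the next strategic goal from a coarse view of the game."""
    if state["army_size"] < state["enemy_army_estimate"]:
        return "grow_army"
    return "attack"

def instantiate(goal):
    """Expand a strategic goal into low-level steps and resource needs."""
    plans = {
        "grow_army": [("build", "production_building"), ("train", "combat_unit")],
        "attack":    [("group", "combat_units"), ("move", "enemy_base")],
    }
    return plans.get(goal, [])
```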

Search for Pathing

  • you are the red unit; the green units are the enemy
  • you want to reach the top of the screen
  • What's the problem here?
    • the shortest path is a very deadly path
  • this is all search, this is planning ahead: the cost is not just the length of the path, but also how dangerous it is (see the sketch below)
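
A minimal sketch of this kind of search, assuming a grid map, a `neighbors` function, and a made-up `danger` map that scores how risky each cell is; each step costs its length plus a weighted danger penalty, so the cheapest route trades off distance against safety.

```python
import heapq

def plan_path(start, goal, neighbors, danger, danger_weight=5.0):
    """Uniform-cost search where each step costs 1 plus a penalty for how
    dangerous the destination cell is (danger is a hypothetical cell -> risk map)."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt in neighbors(node):
            step_cost = 1.0 + danger_weight * danger.get(nxt, 0.0)
            heapq.heappush(frontier, (cost + step_cost, nxt, path + [nxt]))
    return None  # no route found
```

Raising `danger_weight` makes the planner prefer longer but safer detours around the enemy units.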

Minimax for Targeting

In this game there are many things you might want to target. There are production units that can produce new units, there are battle units that can damage you, ...

The question is how to decide what to do. This is a minimax problem.
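
A depth-limited minimax sketch of this decision, assuming hypothetical game-specific hooks `actions`, `result`, and `evaluate` (these are placeholders for illustration, not the Overmind's actual interface).

```python
def minimax(state, depth, maximizing, actions, result, evaluate):
    """Depth-limited minimax for choosing a target: we pick an action, the
    opponent responds, and we plan against their best response."""
    acts = actions(state, maximizing)
    if depth == 0 or not acts:
        return evaluate(state), None
    best_action = None
    if maximizing:
        best = float("-inf")
        for a in acts:
            value, _ = minimax(result(state, a), depth - 1, False, actions, result, evaluate)
            if value > best:
                best, best_action = value, a
    else:
        best = float("inf")
        for a in acts:
            value, _ = minimax(result(state, a), depth - 1, True, actions, result, evaluate)
            if value < best:
                best, best_action = value, a
    return best, best_action
```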

Machine Learning for Micro Control

Reinforcement learning using policy search.

The representation used for path planning is called "potential fields". In potential fields, you associate a repelling or attracting force with anything on the map.

So at any given location you can compute the net force field acting on you, and that will pull or push you in a certain direction.

Now it's difficult to design those force fields by hand so that they behave well, because you need to trade off how powerful each of these forces is, how quickly they decay as a function of distance, and so forth.

That's a lot like learning a set of weights for a feature vector. Your policy is encoded as a force field, there is a weighting between the different contributions to the force field, and you can run reinforcement learning to learn the weighting that gives you the best performance. Over multiple runs you have a certain weighting and see what happens, then you change the weighting a little bit. If the results are better you keep the new weighting; if the results are worse you go back to the old weighting, and you keep repeating this.
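
A minimal sketch of both pieces, assuming point objects on a 2D map and a hypothetical `evaluate` hook that plays games with a given weighting and returns a score; the policy search is the simple keep-if-better hill climbing described above.

```python
import random

def force_on_unit(unit_pos, objects, weights):
    """Potential-field policy: each map object contributes a force that is
    attracting (positive weight) or repelling (negative weight) and decays
    with squared distance. Returns the net (fx, fy) acting on the unit."""
    fx, fy = 0.0, 0.0
    ux, uy = unit_pos
    for kind, (ox, oy) in objects:
        dx, dy = ox - ux, oy - uy
        dist = max((dx * dx + dy * dy) ** 0.5, 1e-6)
        magnitude = weights[kind] / (dist * dist)
        fx += magnitude * dx / dist
        fy += magnitude * dy / dist
    return fx, fy

def policy_search(weights, evaluate, iterations=100, step=0.1):
    """Hill-climbing policy search: perturb one weight, keep the change only
    if the evaluated performance improves. `evaluate` is a hypothetical hook
    that runs games with the given weights and returns a score."""
    best_score = evaluate(weights)
    for _ in range(iterations):
        candidate = dict(weights)
        kind = random.choice(list(candidate))
        candidate[kind] += random.uniform(-step, step)
        score = evaluate(candidate)
        if score > best_score:
            weights, best_score = candidate, score
    return weights
```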

Inference / VPI / Scouting

It's really interesting to know what the other player's units are doing, because what you want to build depends on what the enemy builds.

You never get to see their plan, because you cannot see into their mind. But you might observe some evidence about what they're doing.
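
A small sketch of this inference step as a Bayesian belief update over enemy strategies, with a made-up strategy set and likelihood model (the names and numbers are purely illustrative).

```python
def update_belief(belief, observation, likelihood):
    """Bayesian update of a belief over enemy strategies.
    `belief` maps strategy -> prior probability; `likelihood(obs, strategy)`
    is an assumed model of how likely a scouted observation is under each
    strategy. Returns the normalized posterior."""
    posterior = {s: p * likelihood(observation, s) for s, p in belief.items()}
    total = sum(posterior.values())
    if total == 0:
        return belief  # observation uninformative under this model; keep the prior
    return {s: p / total for s, p in posterior.items()}

# Example: scouting an early barracks is assumed more likely under a rush.
belief = {"rush": 0.5, "expand": 0.5}
likelihood = lambda obs, s: {"rush": 0.8, "expand": 0.2}[s] if obs == "early_barracks" else 0.5
belief = update_belief(belief, "early_barracks", likelihood)
```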