

CS294: Deep Reinforcement Learning, Spring 2017

http://rll.berkeley.edu/deeprlcourse

https://www.reddit.com/r/berkeleydeeprlcourse/

Week 1: Introduction

Why deep reinforcement learning?

  • Deep = can process complex sensory input
    • … and also compute really complex functions
  • Reinforcement learning = can choose complex actions

What is Reinforcement Learning?

  • An agent interacting with a previously unknown environment, trying to maximize cumulative reward
  • Formalized as a partially observable Markov decision process (POMDP)
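
One common way to write the POMDP objective (a sketch in standard notation; these symbols are not taken from the course notes): the agent picks a policy that maximizes expected cumulative discounted reward.

```latex
% A POMDP is often written as the tuple (S, A, O, T, R, \Omega, \gamma):
% S: states, A: actions, O: observations,
% T(s' | s, a): transition distribution, R(s, a): reward,
% \Omega(o | s): observation distribution, \gamma: discount factor.
% The agent's goal (illustrative notation, not quoted from the slides):
\[
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t} \, r(s_t, a_t) \right]
\]
```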

Motor Control and Robotics

Robotics:

  • Observations: camera images, joint angles
  • Actions: joint torques
  • Rewards: stay balanced, navigate to target locations, serve and protect humans

Business Operations

Inventory Management

  • Observations: current inventory levels
  • Actions: number of units of each item to purchase
  • Rewards: profit
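
A minimal sketch of how this inventory example could be expressed as an environment. The `InventoryEnv` class, Poisson demand model, and the specific prices/costs are illustrative assumptions, not part of the course material.

```python
import numpy as np

class InventoryEnv:
    """Toy single-item inventory environment (assumed: Poisson demand,
    fixed unit cost and sale price)."""

    def __init__(self, price=2.0, cost=1.0, mean_demand=5, seed=0):
        self.price = price              # revenue per unit sold
        self.cost = cost                # purchase cost per unit
        self.mean_demand = mean_demand
        self.rng = np.random.default_rng(seed)
        self.inventory = 0              # observation: current inventory level

    def step(self, units_to_purchase):
        """Action: units to purchase this period. Reward: profit."""
        self.inventory += units_to_purchase
        demand = self.rng.poisson(self.mean_demand)
        sold = min(demand, self.inventory)
        self.inventory -= sold
        reward = self.price * sold - self.cost * units_to_purchase
        return self.inventory, reward   # (next observation, reward)

# One interaction: observe inventory, purchase 5 units, receive the profit.
env = InventoryEnv()
obs, reward = env.step(5)
print(obs, reward)
```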

What Is Deep Reinforcement Learning?

  • Reinforcement learning using neural networks to approximate functions
    • Policies (select the next action)
    • Value functions (measure the goodness of states or state-action pairs)
    • Dynamics models (predict next states and rewards)
      • i.e., try to approximate how the system will evolve over time
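
As a rough illustration of "a neural network as the policy" (my sketch, not code from the course): a small multilayer perceptron maps an observation vector to a distribution over discrete actions. The class name, layer sizes, and initialization are arbitrary assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

class MLPPolicy:
    """Tiny two-layer network: observation -> action probabilities."""

    def __init__(self, obs_dim, n_actions, hidden=32, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W1 = self.rng.normal(scale=0.1, size=(obs_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = self.rng.normal(scale=0.1, size=(hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def act(self, obs):
        h = np.tanh(obs @ self.W1 + self.b1)    # hidden features
        probs = softmax(h @ self.W2 + self.b2)  # distribution over actions
        return self.rng.choice(len(probs), p=probs)

policy = MLPPolicy(obs_dim=4, n_actions=2)
action = policy.act(np.zeros(4))
```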

How Does RL Relate to Other ML Problems?

  • Reinforcement learning:
    • Environment samples input x_t ~ P(x_t | x_{t-1}, y_{t-1})
      • Environment is stateful: the input depends on your previous actions!
    • Agent takes action ŷ_t = f(x_t)
    • Agent receives cost c_t ~ P(c_t | x_t, ŷ_t), where P is a probability distribution unknown to the agent.
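
A minimal sketch of the loop described above: the environment's next input depends on the previous input and the agent's previous action, and the cost distribution is unknown to the agent, who only observes samples from it. `sample_input`, `sample_cost`, and the fixed policy `f` are hypothetical placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_input(x_prev, y_prev):
    """x_t ~ P(x_t | x_{t-1}, y_{t-1}): stateful environment (toy dynamics)."""
    return 0.9 * x_prev + 0.1 * y_prev + rng.normal(scale=0.1)

def sample_cost(x_t, y_t):
    """c_t ~ P(c_t | x_t, y_t): unknown to the agent, only observed."""
    return (x_t - y_t) ** 2 + rng.normal(scale=0.01)

def f(x_t):
    """Agent's (fixed) policy: y_t = f(x_t)."""
    return 0.5 * x_t

x, y = 0.0, 0.0
for t in range(5):
    x = sample_input(x, y)   # environment produces the next input
    y = f(x)                 # agent acts on what it observes
    c = sample_cost(x, y)    # agent receives a cost sample
    print(t, x, y, c)
```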

Week 2: Supervised learning and decision making (Levine)

Terminology & notation

  • x_t : state

  • o_t : observation

  • u_t : action

  • π_θ(u_t | o_t) : policy

  • c : cost function

  • r : reward

    • c = -r
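
Putting the terminology together, the learning objective in this cost notation might be written as follows (a sketch assuming a finite horizon T; not quoted from the slides). Since c = -r, minimizing total cost is the same as maximizing total reward.

```latex
% Choose policy parameters \theta to minimize expected total cost
% along trajectories generated by \pi_\theta (illustrative notation):
\[
\min_{\theta} \; \mathbb{E}_{\pi_{\theta}}\!\left[ \sum_{t=1}^{T} c(x_t, u_t) \right]
\]
```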