...menustart
- CS294: Deep Reinforcement Learning, Spring 2017
- Week1 : Introduction
- Week2 : Supervised learning and decision making (Levine)
...menuend
http://rll.berkeley.edu/deeprlcourse
https://www.reddit.com/r/berkeleydeeprlcourse/
- Deep = can process complex sensory input
  - … and also compute really complex functions
- Reinforcement learning = can choose complex actions
- An agent interacts with a previously unknown environment, trying to maximize cumulative reward
- Formalized as a partially observable Markov decision process (POMDP); a minimal interaction loop is sketched below
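A minimal sketch of that agent-environment loop, assuming a hypothetical gym-style interface (`env`, `agent`, `reset`, `step`, and `act` are placeholders, not course code):

```python
# Roll out one episode in a POMDP-style environment and return the
# cumulative reward the agent is trying to maximize.
def run_episode(env, agent, max_steps=1000):
    obs = env.reset()          # the agent sees an observation, not the full state
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)               # agent chooses a (possibly complex) action
        obs, reward, done = env.step(action)  # environment transitions stochastically
        total_reward += reward                # objective: maximize cumulative reward
        if done:
            break
    return total_reward
```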
Robotics:
- Observations: camera images, joint angles
- Actions: joint torques
- Rewards: stay balanced, navigate to target locations, serve and protect humans
Inventory Management (a toy version is sketched below):
- Observations: current inventory levels
- Actions: number of units of each item to purchase
- Rewards: profit
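As a concrete illustration of the inventory example, here is a toy environment in the same spirit; the Poisson demand model, price, and unit cost are all invented for the sketch:

```python
import numpy as np

# Toy inventory MDP: observation = current inventory levels, action = number
# of units of each item to purchase, reward = profit. All numbers are made up.
class InventoryEnv:
    def __init__(self, n_items=3, price=2.0, unit_cost=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_items = n_items
        self.price = price
        self.unit_cost = unit_cost
        self.inventory = np.zeros(n_items)

    def reset(self):
        self.inventory = np.zeros(self.n_items)
        return self.inventory.copy()

    def step(self, order):
        # order: units of each item to purchase (the action)
        self.inventory += order
        demand = self.rng.poisson(5, self.n_items)   # random customer demand
        sold = np.minimum(self.inventory, demand)
        self.inventory -= sold
        profit = self.price * sold.sum() - self.unit_cost * np.sum(order)
        return self.inventory.copy(), profit         # next observation, reward
```

For example, `InventoryEnv().step(np.array([5, 5, 5]))` purchases five units of each item and returns the new inventory levels together with that step's profit.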
- Reinforcement learning using neural networks to approximate functions (sketched below):
  - Policies (select the next action)
  - Value functions (measure goodness of states or state-action pairs)
  - Dynamics models (predict next states and rewards)
    - i.e., try to approximate how the system is going to evolve over time
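A sketch of those three function approximators as small neural networks (PyTorch is used here purely for illustration; the layer sizes and dimensions are arbitrary):

```python
import torch.nn as nn

obs_dim, act_dim = 8, 2   # arbitrary example dimensions

# Policy network: observation -> action (selects the next action).
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                       nn.Linear(64, act_dim))

# Value network: state -> scalar estimate of how good that state is.
value_fn = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                         nn.Linear(64, 1))

# Dynamics model: (state, action) -> predicted next state and reward,
# i.e. an approximation of how the system evolves over time.
dynamics = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
                         nn.Linear(64, obs_dim + 1))
```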
- Reinforcement learning:
  - Environment samples input x_t ~ P(x_t | x_{t-1}, y_{t-1})
    - Environment is stateful: the input depends on your previous actions!
  - Agent takes action ŷ_t = f(x_t)
  - Agent receives cost c_t ~ P(c_t | x_t, ŷ_t), where P is a probability distribution unknown to the agent (see the sketch below)
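The loop above in code, with stand-in dynamics and cost (the quadratic cost and the drifting dynamics are invented; the point is only that x_t and c_t are sampled from distributions the agent never sees):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # initial input x_1

def f(x):
    # The agent's policy: a fixed, arbitrary function of the current input.
    return -0.1 * x

total_cost = 0.0
for t in range(100):
    y = f(x)                                      # agent takes action y_t = f(x_t)
    c = float(x @ x) + rng.normal(scale=0.1)      # cost c_t ~ P(c_t | x_t, y_t)
    total_cost += c
    # Environment is stateful: the next input depends on x_t and y_t.
    x = x + y + rng.normal(scale=0.01, size=3)    # x_{t+1} ~ P(x_{t+1} | x_t, y_t)
```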
Notation (the course uses control-theory notation, so the action u_t plays the role of y_t above):
- x_t : state
- o_t : observation
- u_t : action
- π_θ(u_t | o_t) : policy
- c : cost function
- r : reward
- c = -r (a small sketch follows)
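A tiny sketch tying the notation together: a parameterized stochastic policy π_θ(u_t | o_t), where θ is just a weight matrix (a deliberately simple, illustrative choice), plus the cost/reward sign flip:

```python
import numpy as np

def policy(theta, o_t, rng):
    # Sample u_t ~ π_θ(u_t | o_t): a Gaussian around a linear function of o_t.
    mean = theta @ o_t
    return mean + 0.1 * rng.normal(size=mean.shape)

def cost(r):
    return -r   # c = -r: minimizing cost == maximizing reward
```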