sunhangqi · Dec 15, 2016
diff --git a/README.md b/README.md
+19 -120
diff --git a/RL/1_command_line_reinforcement_learning/treasure_on_right.py b/RL/1_command_line_reinforcement_learning/treasure_on_right.py
+107
diff --git a/RL/q_learning_example/RL_brain.py b/RL/2_Q_Learning_maze/RL_brain.py
diff --git a/RL/q_learning_example/maze_env.py b/RL/2_Q_Learning_maze/maze_env.py
diff --git a/RL/q_learning_example/run_this.py b/RL/2_Q_Learning_maze/run_this.py
diff --git a/RL/sarsa_example/RL_brain.py b/RL/3_Sarsa_maze/RL_brain.py
diff --git a/RL/sarsa_example/maze_env.py b/RL/3_Sarsa_maze/maze_env.py
diff --git a/RL/sarsa_example/run_this.py b/RL/3_Sarsa_maze/run_this.py
diff --git a/RL/sarsa_lambda_example/RL_brain.py b/RL/4_Sarsa_lambda_maze/RL_brain.py
diff --git a/RL/sarsa_lambda_example/maze_env.py b/RL/4_Sarsa_lambda_maze/maze_env.py
diff --git a/RL/sarsa_lambda_example/run_this.py b/RL/4_Sarsa_lambda_maze/run_this.py
@@ -2,7 +2,7 @@
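 This GitHub repository is the backbone of my Python machine learning video tutorials. It holds the source code shown in the videos, which focus mainly on machine learning, including neural networks.
 My own research is in neural networks and reinforcement learning, so reinforcement learning tutorials will be added later; stay tuned. Feel free to share them so that we can all learn with ease~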
 
-[<img src="https://github.com/MorvanZhou/tutorials/blob/master/%E7%89%87%E5%A4%B4.png?raw=true" height="200">](http://morvanzhou.github.io/tutorials/)
+[<img src="https://github.com/MorvanZhou/tutorials/blob/master/%E7%89%87%E5%A4%B4.png?raw=true" height="200">](https://morvanzhou.github.io/tutorials/)
 
 
 ## If you are in mainland China and cannot get past the firewall, the videos are also on Youku and NetEase Cloud
@@ -15,123 +15,22 @@
 [Youtube channel home page](https://www.youtube.com/channel/UCdyjiB5H8Pu7aDTNVXTTpcg)
 
 
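-## Here is how the channel is organized
-### Python Basics
-If Python is still new to you, no problem: this is a quick way in. It is not necessarily the best route, but it is a good one on the road to machine learning. I have filtered out the unnecessary material and kept only what is relevant to machine learning.
-
-[Python Basics tutorial videos (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cIRP5gCi8AlYwQ1uFO2aQBw)
-
-[Python Basics tutorial videos (Youku)](http://list.youku.com/albumlist/show?id=27312381&ascending=1&page=1)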
-
----
-
-### [Machine Learning: Introduction Series](https://github.com/MorvanZhou/tutorials/blob/master/ML_intro/README.md)
-[<img src='https://github.com/MorvanZhou/tutorials/blob/master/ML_intro/ML%20brief%20intro.png?raw=true' height=200>](https://github.com/MorvanZhou/tutorials/blob/master/ML_intro/README.md)
-
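-This is a short introduction to the various machine learning methods, along with many practical tips for learning machine learning well.
-
-[Machine Learning: Introduction Series video tutorials (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cIFm8Y8vMkNNPPXAtYXwKin)
-
-[Machine Learning: Introduction Series video tutorials (Youku)](http://list.youku.com/albumlist/show?id=27892935&ascending=1&page=1)
-
-See also the [table of contents](https://github.com/MorvanZhou/tutorials/blob/master/ML_intro/README.md) for the Machine Learning: Introduction Series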
-
----
-
-### [Tensorflow](https://github.com/MorvanZhou/tutorials/blob/master/tensorflowTUT/Readme.md)
-[<img src="https://github.com/MorvanZhou/tutorials/blob/master/tensorflowTUT/Tensorflow%20course%20cover%20page.jpg?raw=true" height='200'>](https://github.com/MorvanZhou/tutorials/blob/master/tensorflowTUT/Readme.md)
-
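-Tensorflow is one of the main workhorses for neural networks. This series goes from the very basics up to the most advanced topics in a simple, lighthearted way, making it a top choice for getting started with Tensorflow neural networks.
-
-[Tensorflow neural networks and deep learning video tutorials (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cKI5AIlf5TxxFPzb-0zeVZ8)
-
-[Tensorflow neural networks and deep learning video tutorials (Youku)](http://www.youku.com/playlist_show/id_27327189.html)
-
-See also the Tensorflow [table of contents](https://github.com/MorvanZhou/tutorials/blob/master/tensorflowTUT/Readme.md)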
-
----
-
-### [SciKit-Learn (sklearn)](https://github.com/MorvanZhou/tutorials/blob/master/sklearnTUT/README.md)
-[<img src='https://github.com/MorvanZhou/tutorials/blob/master/sklearnTUT/sklearn%20cover%20page.jpg?raw=true' height='200'>](https://github.com/MorvanZhou/tutorials/blob/master/sklearnTUT/README.md)
-
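-SciKit-Learn brings together a wide range of machine learning methods in one comprehensive collection. Different projects may call for different methods; once you have mastered sklearn's common workflow, you can take on almost any of them.
-
-[scikit-learn machine learning video tutorials (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cI7ZleLM5i3XXhhe9YmVrRO)
-
-[scikit-learn machine learning video tutorials (Youku)](http://www.youku.com/playlist_show/id_27469882.html)
-
-See also the Sklearn [table of contents](https://github.com/MorvanZhou/tutorials/blob/master/sklearnTUT/README.md)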
-
----
-
-### [Theano](https://github.com/MorvanZhou/tutorials/blob/master/theanoTUT/README.md)
-[<img src='https://github.com/MorvanZhou/tutorials/blob/master/theanoTUT/theano%20cover%20page.jpg?raw=true' height='200'>](https://github.com/MorvanZhou/tutorials/blob/master/theanoTUT/README.md)
-
-Theano can be regarded as a forerunner of Tensorflow and is structured in a similar way. At the moment Tensorflow does not support Windows well, so Theano is a good alternative for working with neural networks on Windows.
-
-[Theano neural networks and machine learning video tutorials (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cKpDID642AjNkygrSR5X15T)
-
-[Theano neural networks and machine learning video tutorials (Youku)](http://www.youku.com/playlist_show/id_27743371.html)
-
-See also the Theano [table of contents](https://github.com/MorvanZhou/tutorials/blob/master/theanoTUT/README.md)
-
-
----
-
-### [Matplotlib](https://github.com/MorvanZhou/tutorials/blob/master/matplotlibTUT/README.md)
-[<img src='https://github.com/MorvanZhou/tutorials/blob/master/matplotlibTUT/cover%20page.jpg?raw=true' height='200'>](https://github.com/MorvanZhou/tutorials/blob/master/matplotlibTUT/README.md)
-
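-Beyond the machine learning methods themselves, we also need other tools to round out the learning process and learn more efficiently. One of them is visualizing data and results: with an intuitive picture in front of us, the material is easier to grasp. Matplotlib is Python's go-to visualization library.
-
-[Matplotlib visualization video tutorials (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cKiBRXYqNNCw8AUo6tYen3l)
-
-[Matplotlib visualization video tutorials (Youku)](http://www.youku.com/playlist_show/id_28097045.html)
-
-See also the Matplotlib [table of contents](https://github.com/MorvanZhou/tutorials/blob/master/matplotlibTUT/README.md)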
-
----
-
-### [Numpy & Pandas](https://github.com/MorvanZhou/tutorials/blob/master/numpy%26pandas/README.md)
-[<img src='https://github.com/MorvanZhou/tutorials/blob/master/numpy&pandas/cover%20page.jpg?raw=true' height='200'>](https://github.com/MorvanZhou/tutorials/blob/master/numpy%26pandas/README.md)
-
-Scientific computing is an indispensable tool for engineering students, and numpy and pandas were developed to make efficient scientific computing available in Python.
-
-[Numpy & Pandas data processing video tutorials (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cKKyC45gatc8wEc3Ue7BlI4)
-
-[Numpy & Pandas data processing video tutorials (Youku)](http://www.youku.com/playlist_show/id_27329155.html)
-
-See also the Numpy & Pandas [table of contents](https://github.com/MorvanZhou/tutorials/blob/master/numpy%26pandas/README.md)
-
----
-
-### Python Multiprocessing
-This is one of the basic Python tutorials. It helps in understanding how to use a computer's processing power efficiently.
-
-[Multiprocessing video tutorials (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cJgYDaJbwhg629-Il5cfkhe)
-
-[Multiprocessing video tutorials (Youku)](http://www.youku.com/playlist_show/id_27423283.html)
-
----
-
-### Python Threading
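-As above, although in machine learning applications it is sometimes not as good a fit as the multiprocessing module.
-
-[Threading (multithreading) video tutorials (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cKaHtKLn-jat8SOGndS3MEt)
-
-[Threading (multithreading) video tutorials (Youku)](http://www.youku.com/playlist_show/id_27399497.html)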
-
----
-
-### [Python Tkinter](https://github.com/MorvanZhou/tutorials/blob/master/tkinterTUT/README.md)
-[<img src='https://github.com/MorvanZhou/tutorials/blob/master/tkinterTUT/cover%20page.jpg?raw=true' height='200'>](https://github.com/MorvanZhou/tutorials/blob/master/tkinterTUT/README.md)
-
-Tkinter is the GUI window package bundled with Python. It can be used to visualize simulations, and it is cross-platform.
-
-[Tkinter cross-platform GUI video tutorials (Youtube)](https://www.youtube.com/playlist?list=PLXO45tsB95cJU56K4EtkG0YNGBZCuDwAH)
-
-[Tkinter cross-platform GUI video tutorials (Youku)](http://www.youku.com/playlist_show/id_27433146.html)
-
-See also the tkinter [table of contents](https://github.com/MorvanZhou/tutorials/blob/master/tkinterTUT/README.md)
-
----
+All content:
+
+* [Python Basics](https://morvanzhou.github.io/tutorials/python-basic/)
+  * [Basics](https://morvanzhou.github.io/tutorials/python-basic/basic/)
+  * [Multithreading (threading)](https://morvanzhou.github.io/tutorials/python-basic/threading/)
+  * [Multiprocessing (multiprocessing)](https://morvanzhou.github.io/tutorials/python-basic/multiprocessing/)
+  * [Simple GUI windows (tkinter)](https://morvanzhou.github.io/tutorials/python-basic/tkinter/)
+* [Machine Learning](https://morvanzhou.github.io/tutorials/machine-learning/)
+  * [Fun Machine Learning](https://morvanzhou.github.io/tutorials/machine-learning/ML-intro/)
+  * [Tensorflow](https://morvanzhou.github.io/tutorials/machine-learning/tensorflow/)
+  * [Theano](https://morvanzhou.github.io/tutorials/machine-learning/theano/)
+  * [Keras](https://morvanzhou.github.io/tutorials/machine-learning/keras/)
+  * [Scikit-Learn](https://morvanzhou.github.io/tutorials/machine-learning/sklearn/)
+* [Data Manipulation](https://morvanzhou.github.io/tutorials/data-manipulation/)
+  * [Numpy & Pandas](https://morvanzhou.github.io/tutorials/data-manipulation/np-pd/)
+  * [Matplotlib](https://morvanzhou.github.io/tutorials/data-manipulation/plt/)
+* [Others](https://morvanzhou.github.io/tutorials/others/)
+  * [Git version control](https://morvanzhou.github.io/tutorials/others/git/)
 
@@ -0,0 +1,107 @@
+"""
+A simple example of reinforcement learning using the table-lookup Q-learning method.
+An agent "o" starts on the left of a 1-dimensional world; the treasure is at the rightmost location.
+Run this program to see how the agent improves its strategy for finding the treasure.
+
+View more on 莫烦Python: https://morvanzhou.github.io/tutorials/
+"""
+
+import numpy as np
+import pandas as pd
+import time
+
+np.random.seed(2)  # reproducible
+
+
+N_STATES = 6   # the length of the 1-dimensional world
+ACTIONS = ['left', 'right']     # available actions
+EPSILON = 0.9   # greedy policy: probability of acting greedily
+ALPHA = 0.1     # learning rate
+LAMBDA = 0.9    # discount factor (usually written gamma)
+MAX_EPISODES = 13   # maximum number of episodes
+FRESH_TIME = 0.3    # refresh interval in seconds for one move
+
+
+def build_q_table(n_states, actions):
+    table = pd.DataFrame(
+        np.zeros((n_states, len(actions))),     # q_table initial values
+        columns=actions,    # action names
+    )
+    # print(table)    # show table
+    return table
+
+
+def choose_action(state, q_table):
+    # Choose an action with an epsilon-greedy policy
+    state_actions = q_table.iloc[state, :]
+    if (np.random.rand() > EPSILON) or (state_actions == 0).all():  # act non-greedily, or this state has no action values yet
+        action_name = np.random.choice(ACTIONS)
+    else:   # act greedily
+        action_name = state_actions.idxmax()    # idxmax returns the column label; argmax returns a position in newer pandas
+    return action_name
+
+
+def get_env_feedback(S, A):
+    # This is how the agent interacts with the environment
+    if A == 'right':    # move right
+        if S == N_STATES - 2:   # terminate
+            S_ = 'terminal'
+            R = 1
+        else:
+            S_ = S + 1
+            R = 0
+    else:   # move left
+        R = 0
+        if S == 0:
+            S_ = S  # reach the wall
+        else:
+            S_ = S - 1
+    return S_, R
+
+
+def update_env(S, episode, step_counter):
+    # This is how the environment display is updated
+    env_list = ['-']*(N_STATES-1) + ['T']   # '-----T' is our environment
+    if S == 'terminal':
+        interaction = 'Episode %s: total_steps = %s' % (episode+1, step_counter)
+        print('\r{}'.format(interaction), end='')
+        time.sleep(2)
+        print('\r                                ', end='')
+    else:
+        env_list[S] = 'o'
+        interaction = ''.join(env_list)
+        print('\r{}'.format(interaction), end='')
+        time.sleep(FRESH_TIME)
+
+
+def rl():
+    # main part of RL loop
+    q_table = build_q_table(N_STATES, ACTIONS)
+    for episode in range(MAX_EPISODES):
+        step_counter = 0
+        S = 0
+        is_terminated = False
+        update_env(S, episode, step_counter)
+        while not is_terminated:
+
+            A = choose_action(S, q_table)
+            S_, R = get_env_feedback(S, A)  # take action & get next state and reward
+            q_predict = q_table.loc[S, A]   # .loc replaces .ix, which newer pandas no longer provides
+            if S_ != 'terminal':
+                q_target = R + LAMBDA * q_table.iloc[S_, :].max()   # next state is not terminal
+            else:
+                q_target = R     # next state is terminal
+                is_terminated = True    # terminate this episode
+
+            q_table.loc[S, A] += ALPHA * (q_target - q_predict)  # update
+            S = S_  # move to next state
+
+            update_env(S, episode, step_counter+1)
+            step_counter += 1
+    return q_table
+
+
+if __name__ == "__main__":
+    q_table = rl()
+    print('\r\nQ-table:\n')
+    print(q_table)
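
The update step inside `rl()` in `treasure_on_right.py` can be tried in isolation, without the terminal animation. The following is a minimal sketch, not the author's code: `q_update` and `greedy_action` are hypothetical helper names introduced here, `GAMMA` stands in for the script's `LAMBDA` discount factor, and label-based `.loc` / `idxmax` are used on the assumption of a recent pandas release.

```python
import numpy as np
import pandas as pd

ACTIONS = ['left', 'right']
ALPHA = 0.1   # learning rate, same value as in the script
GAMMA = 0.9   # discount factor (named LAMBDA in the script)


def q_update(q_table, s, a, r, s_next):
    """Apply one tabular Q-learning update in place and return the TD error.

    Hypothetical helper: it mirrors the body of the while loop in rl().
    """
    q_predict = q_table.loc[s, a]
    if s_next == 'terminal':
        q_target = r                                        # no future value after the end
    else:
        q_target = r + GAMMA * q_table.loc[s_next, :].max()  # bootstrap from the next state
    td_error = q_target - q_predict
    q_table.loc[s, a] += ALPHA * td_error
    return td_error


def greedy_action(q_table, s):
    """Return the label of the best action, or None if the row is still all zeros."""
    row = q_table.loc[s, :]
    if (row == 0).all():
        return None        # nothing learned yet; the script picks randomly here
    return row.idxmax()    # column label, e.g. 'right'


q_table = pd.DataFrame(np.zeros((6, len(ACTIONS))), columns=ACTIONS)

# Reaching the treasure from state 4 gives reward 1: Q(4, right) moves from 0 to 0.1.
q_update(q_table, 4, 'right', 1, 'terminal')
# One step earlier there is no reward, but the value propagates back:
# Q(3, right) becomes about 0.1 * 0.9 * 0.1 = 0.009.
q_update(q_table, 3, 'right', 0, 4)

print(q_table)
print(greedy_action(q_table, 4), greedy_action(q_table, 0))
```

Two updates are enough to see the reward propagate backwards from the treasure one state at a time, which is why the agent in the script needs several episodes before it heads straight to the right.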