Use reinforcement learning to play blackjack (21)
Learns a value function.
Uses Monte Carlo (MC) sampling of games to estimate it.
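
The notes above do not show the implementation, so the following is only a minimal sketch of what Monte Carlo value estimation for blackjack can look like. Everything in it is an assumption made for illustration: the simplified rules (infinite deck, dealer draws to 17, no doubling or splitting), the state `(player total, dealer's visible card, usable ace)`, the epsilon-greedy behaviour policy, and all function names (`hand_value`, `play_episode`, `mc_control`, `evaluate`).

```python
import random
from collections import defaultdict

# Assumed simplification: infinite deck, ace stored as 1, face cards as 10.
CARDS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
STICK, HIT = 0, 1


def hand_value(cards):
    """Return (total, usable_ace); one ace counts as 11 if that does not bust."""
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


def play_episode(policy, rng):
    """Play one hand. Return the visited (state, action) pairs and the reward."""
    player = [rng.choice(CARDS), rng.choice(CARDS)]
    dealer = [rng.choice(CARDS), rng.choice(CARDS)]
    trajectory = []
    while True:
        total, usable = hand_value(player)
        if total > 21:                      # player busts: immediate loss
            return trajectory, -1
        state = (total, dealer[0], usable)
        action = policy(state)
        trajectory.append((state, action))
        if action == STICK:
            break
        player.append(rng.choice(CARDS))
    while hand_value(dealer)[0] < 17:       # dealer draws until 17 or more
        dealer.append(rng.choice(CARDS))
    p, d = hand_value(player)[0], hand_value(dealer)[0]
    if d > 21 or p > d:
        return trajectory, 1
    return trajectory, (-1 if p < d else 0)


def mc_control(episodes, epsilon=0.1, seed=0):
    """Estimate action values Q(state, action) from sampled episodes."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])     # running mean return per action
    n = defaultdict(lambda: [0, 0])         # visit counts per action

    def behaviour(state):
        if rng.random() < epsilon:          # explore with probability epsilon
            return rng.randrange(2)
        return STICK if q[state][STICK] >= q[state][HIT] else HIT

    for _ in range(episodes):
        trajectory, reward = play_episode(behaviour, rng)
        for state, action in trajectory:
            n[state][action] += 1
            # Incremental average of the episode's final reward.
            q[state][action] += (reward - q[state][action]) / n[state][action]
    return q


def evaluate(policy, games, seed=1):
    """Count wins (+1), losses (-1) and ties (0) over a number of games."""
    rng = random.Random(seed)
    counts = {1: 0, -1: 0, 0: 0}
    for _ in range(games):
        counts[play_episode(policy, rng)[1]] += 1
    return counts


q = mc_control(episodes=20000)
greedy = lambda s: STICK if q[s][STICK] >= q[s][HIT] else HIT
print(evaluate(greedy, games=1000))
```

Because the rules and hyperparameters here are assumed, the counts printed by this sketch should not be expected to match the totals reported in this README.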
Results

Random strategy:
total: win = 48185, lose = 47967, tie = 3847

Baseline strategy:
total: win = 58928, lose = 36216, tie = 4855

RL strategy:
total: win = 65167, lose = 30227, tie = 4605
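
To compare the three runs on a common scale, the totals can be turned into win rates. The short sketch below does only that arithmetic on the numbers reported above; the dictionary keys and the `win_rate` helper are illustrative names, not part of the original code.

```python
# (win, lose, tie) totals copied from the results above.
results = {
    "random": (48185, 47967, 3847),
    "baseline": (58928, 36216, 4855),
    "rl": (65167, 30227, 4605),
}


def win_rate(win, lose, tie):
    """Fraction of all games that were won (ties count as games played)."""
    return win / (win + lose + tie)


for name, counts in results.items():
    print(f"{name}: {win_rate(*counts):.2%} of {sum(counts)} games won")
```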