- 8. Deep Q Network (lecture time: 01:17:28)
- 9. Policy-based RL: Stochastic Policy Gradient (lecture time: 41:54)
- 10. Policy-based RL: TRPO, PPO (lecture time: 44:54)
- 11. Policy-based RL: DPG, DDPG, CEM (lecture time: 43:36)
- 12. Exploration vs. Exploitation (lecture time: 49:00)
- 13. Average-reward MDP and finite-horizon MDP (lecture time: 38:12)
- 14. AlphaGo & Reward shaping (lecture time: 52:18)
In preparation.