Instructor Introduction
-
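Professor Hayong Shin, Department of Industrial and Systems Engineering, KAIST
Instructor: Hayong Shin
2001-present: Professor, Department of Industrial and Systems Engineering, KAIST
1991-2001: Researcher at LG Electronics, Cubictek Co., Ltd., and Chrysler (USA)
Vice President (Journals), Korean Institute of Industrial Engineers; recipient of the Jeongheon Academic Grand Award (2021)
Senior Vice President, Society for Computational Design and Engineering (Korea); recipient of the Gaheon Academic Award (2002, 2005, 2009)
Editorial board member, Computer-Aided Design journal (2005-present)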
Course Plan
Lectures
-
8. Deep Q Network
- Neural net
- NN for RL
- DQN
- Improvements to DQN
- Quiz 8
-
9. Policy-based RL: Stochastic Policy Gradient
- Policy-based RL
- Policy gradient theorem
- Policy gradient algorithms
- Quiz 9
-
10. Policy-based RL: TRPO, PPO
- Revisiting policy gradient
- Trust region policy optimization (TRPO) algorithm
- Proximal Policy Optimization (PPO) algorithm
- Quiz 10
-
11. Policy-based RL: DPG, DDPG, CEM
- Theoretical foundation of DPG
- DPG & DDPG algorithms
- Derivative-free methods and CEM
- Quiz 11
-
12. Exploration vs Exploitation
- Multi-Armed Bandit problem
- Basic MAB algorithms
- Advanced MAB algorithms
- Quiz 12
-
13. Average-reward MDP and finite-horizon MDP
- Average-reward RL
- Finite-horizon MDP
- Finite-horizon MDP examples
- Quiz 13
-
14. AlphaGo & Reward shaping
- Components of AlphaGo
- Training AlphaGo and MCTS
- AlphaGo Zero and beyond
- Reward shaping
- Quiz 14