
Instructor Introduction
-
Professor Hayong Shin, Department of Industrial and Systems Engineering, KAIST
Instructor: Hayong Shin
2001-present: Professor, Department of Industrial and Systems Engineering, KAIST
1991-2001: Researcher at LG Electronics, Cubictek Co., Ltd., and Chrysler (USA)
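Vice President (Journal), Korean Institute of Industrial Engineers; recipient of the Jeongheon Academic Grand Prize (2021)
Senior Vice President, Korean CDE Society; recipient of the Gaheon Academic Award (2002, 2005, 2009)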
Editorial board member, Computer-Aided Design journal (since 2005)
Lecture plan
Lectures
-
1. Introduction
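- What is reinforcement learning?
- Characteristics and examples of reinforcement learning
- Dynamic systems
- Components of reinforcement learning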
- Quiz 1
-
2. Markov Decision Process
- Markov Chain
- Markov Reward Process
- Markov Decision Process
- Quiz 2
-
3. Dynamic Programming
- Dynamic programming?
- Policy evaluation
- Optimal policy
- Asynchronous DP
- Quiz 3
-
4. Monte Carlo methods
- Overview of Monte Carlo methods
- Stochastic approximation
- MC policy evaluation
- MC control
- Quiz 4
-
5. Temporal difference methods
- Overview of TD learning
- TD control
- Q-learning
- Double Q-learning
- Quiz 5
-
6. n-Step TD methods
- n-step return
- TD(λ) policy evaluation
- Eligibility traces and TD control
- Q(λ) algorithm
- Quiz 6
-
7. Value function approximation
- Overview of value function approximation
- Features for VFA
- Application of VFA : Cartpole
- Linear VFA for Cartpole
- Quiz 7