????? ??? ??
? ?
03.30
????? ??? ??? ??
???? ?????? ??? ??? ??
?? ?? ?? ??????.
??
??? ???
City University of New York -Baruch College
Data Science ??
ConnexionAI ???
Freelancer Data Scientist
?????? ???? ???
Github:
https://github.com/wonseokjung
Facebook:
https://www.facebook.com/ws.jung.798
Blog:
https://wonseokjung.github.io/
1. Dynamic Programming
a. Policy iteration
b. Value iteration
2. Monte Carlo method
3. Temporal-Difference Learning
a. Sarsa
b. Q-learning
4. ??? ????? ??? ?? ? ????? ?? ??
5. DQN? ??? ???? ????? ???
Model-free
Model-based
Deeplearning?
+?
RL
Grid world
??
Before Deeplearning: Tabular
After Deeplearning: Image, text, voice, ...
?? ??? ??
Classic RL + DeepLearning = !
?? ????? ??? ??
Q-learning + CNN -> DQN
DQN
? Level? State? ??? ??? General agent??
???? ???.
??? ?? ???
????? ??
University of California, Berkeley
ICML 2017
Curiosity-driven Exploration by
Self-supervised Prediction
https://github.com/wonseokjung/KIPS_Reinforcement
?????
Code + Jupyter Notebook ?? + ??? ?? ??
????? ???? ????? ?????!
Markov Decision Process
Return of Episode
The return of an episode is the sum of its rewards
Total Reward
Discounted Return
The sum of rewards weighted by the discount factor
Total discounted reward
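The difference between the total reward and the discounted return can be sketched in a few lines. This is a minimal illustration, not the talk's code: the function name `discounted_return` and the three-step reward sequence are hypothetical, and the discount factor 0.9 is the value used for the grid world in these slides.

```python
def discounted_return(rewards, gamma=0.9):
    """Return r_1 + gamma * r_2 + gamma^2 * r_3 + ... for one episode."""
    g = 0.0
    # Accumulate backwards so that g = r + gamma * (return of the remaining steps).
    for r in reversed(rewards):
        g = r + gamma * g
    return g


episode_rewards = [-1, -1, 1]          # hypothetical rewards of one episode
total_reward = sum(episode_rewards)    # undiscounted return
discounted = discounted_return(episode_rewards, gamma=0.9)
print(total_reward, discounted)
```

With a discount factor below 1, rewards that arrive later contribute less to the return than rewards that arrive early.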
MDP ??? 5 x 5 Grid world
Grid World Environment
MDP??? 5 x 5 Grid world
State : ???? ??
Action : ?, ? , ?, ?
Reward : ?? = -1, ?? = 1?
Transition Probability : 1
Discount factor : 0.9
Reward
+ 1
Reward
-1
State
Action
Grid World Environment
State-value function
(Policy? ?? state-value function)
State value
State-value function
(Policy? ?? state-value function)
State value
Action Value function
(Policy? ?? action-value function)
State-action value
Bellman equation !
A fundamental property of value functions
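The formula on this slide is not preserved in the transcript. For reference, the Bellman equation for the state-value function of a policy is usually written as below. The notation (p for the transition dynamics, gamma for the discount factor) follows Sutton and Barto and is an assumption about what the slide showed.

```latex
\begin{aligned}
v_\pi(s) &= \mathbb{E}_\pi\left[ R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s \right] \\
         &= \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma\, v_\pi(s') \right]
\end{aligned}
```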
Optimal Policy? ?? - state
Value? ??? !
Optimal state value function
Optimal Policy ? ?? - state action
Value? ??? !
Optimal state-action value function
Bellman equation + Optimality
Bellman optimality equation v*
Bellman equation + Optimality
Bellman optimality equation q*
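The two optimality equations named on these slides are not preserved in the transcript. In the notation of Sutton and Barto (an assumption about what the slides showed), they read:

```latex
\begin{aligned}
v_*(s)    &= \max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma\, v_*(s') \right] \\
q_*(s, a) &= \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \max_{a'} q_*(s', a') \right]
\end{aligned}
```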
??
MDP
Return of Episode
Return of Episode (discounted)
State-value function
Action-value function
Optimal Policy
Bellman Equation
Bellman optimality equation
Bellman Equation + Optimal Policy
Dynamic Programming
??
State, Reward, Action
??
Transition Probability
?? ????.
Dynamic Programming?
Value function? ???? ?? ?? Policy
? ???? ??? ??? ???? ??.
Dynamic programming? Key idea!
Dynamic programming
5 x 5 Grid world?? Dynamic Programming
Grid World Environment
5 x 5 Grid world
State : ???? ??
Action : ?, ? , ?, ?
Reward : ?? = -1, ?? = 1?
Transition Probability : 1
Discount factor : 0.9
Reward
+ 1
Reward
-1
?? state
Action
Grid World Environment
?? state
?? state
Update Rule?
Bellman equation? ???? ??????.
State!
? ??? Optimal Value functions
State Value
Bellman optimality equations ??
? ??? Optimal Value functions
Action Value
Bellman optimality equations ??
Dynamic Programming?
? ??? ?? Value function
State-action Value function
Policy Iteration
Value Iteration
Dynamic Programming
Policy iteration
1.Policy? ?? state-value? ????!
Policy Evaluation
2. ? ?? Policy? ??!
Policy Improvement
??? Policy?
???? ???
??
Policy iteration- Policy Evaluation
Update Rule? ???? Evaluation? ??.
Value update
Terms of the update rule: Policy, Transition Probability, Reward, Next State's estimated value
1. ?? state? V(s) = 0 ?? ??? ???.
2. ? state? Update Rule? ???? V(s)? ???? ??.
Policy iteration- Policy Evaluation
3. ?????? V(s)? ???? ?? ??? ????? ???.
Policy? ?? state-value? ????!
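The three evaluation steps above (initialise V(s) = 0, apply the update rule to every state, repeat until the values stop changing) can be sketched as follows. This is a minimal sketch under stated assumptions, not the talk's code: the two goal cells are placed in opposite corners, every move gives reward -1, the policy is uniformly random, and bumping into a wall leaves the agent in place. With a discount factor of 0.9 the numbers will not necessarily match the figures in the slides.

```python
N = 5                                          # 5 x 5 grid world
GAMMA = 0.9                                    # discount factor from the slides
GOALS = {(0, 0), (N - 1, N - 1)}               # assumed goal positions
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
STATES = [(r, c) for r in range(N) for c in range(N)]


def step(state, action):
    """Deterministic transition (probability 1); every move costs -1."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < N and 0 <= c < N):
        r, c = state                           # wall: stay in place
    return (r, c), -1.0


def policy_evaluation(theta=1e-6):
    """Evaluate the uniform random policy with the Bellman expectation update."""
    V = {s: 0.0 for s in STATES}               # step 1: V(s) = 0 everywhere
    while True:
        new_V = {}
        for s in STATES:                       # step 2: update every state
            if s in GOALS:
                new_V[s] = 0.0                 # terminal states keep value 0
                continue
            total = 0.0
            for a in ACTIONS:
                s_next, reward = step(s, a)
                total += 0.25 * (reward + GAMMA * V[s_next])
            new_V[s] = total
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < theta:                      # step 3: stop once values settle
            return V


V = policy_evaluation()
for r in range(N):
    print(" ".join(f"{V[(r, c)]:6.2f}" for c in range(N)))
```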
Policy iteration- Improvement
Policy? ?? Value function? ??? ??? ? ??
Policy? ???????.
Greedy Policy
Policy iteration- Improvement
Greedy Policy ??
Policy iteration
Policy iteration? Optimal policy? ?????
Policy Evaluation? Policy Improvement? ????.
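The alternation of policy evaluation and greedy policy improvement can be sketched as below. The grid layout (goals in two opposite corners, reward -1 per move, walls keep the agent in place), the all-"up" starting policy, and the `max_rounds` safety cap are assumptions made for this illustration, not details taken from the talk.

```python
N, GAMMA = 5, 0.9
GOALS = {(0, 0), (N - 1, N - 1)}               # assumed goal positions
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
STATES = [(r, c) for r in range(N) for c in range(N) if (r, c) not in GOALS]


def step(state, action):
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < N and 0 <= c < N):
        r, c = state                           # wall: stay in place
    return (r, c), -1.0


def evaluate(policy, theta=1e-6):
    """Policy evaluation for a deterministic policy (state -> action index)."""
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in STATES:
            s_next, reward = step(s, ACTIONS[policy[s]])
            v = reward + GAMMA * V[s_next]
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def improve(V):
    """Policy improvement: pick the greedy action with respect to V."""
    policy = {}
    for s in STATES:
        q = []
        for a in ACTIONS:
            s_next, reward = step(s, a)
            q.append(reward + GAMMA * V[s_next])
        policy[s] = max(range(len(ACTIONS)), key=lambda i: q[i])
    return policy


def policy_iteration(max_rounds=100):
    policy = {s: 0 for s in STATES}            # arbitrary start: always "up"
    for _ in range(max_rounds):
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:               # policy is stable: stop
            break
        policy = new_policy
    return policy, V


policy, V = policy_iteration()
```

Under these assumptions the loop stops after a handful of rounds, because each round fixes the states one step further away from a goal.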
Grid World Environment
5 x 5 Grid world
5 x 5 Grid world
State : ???? ??
Action : ?, ? , ?, ?
Reward : ?? ?????? -1
Transition Probability : 1
Discount factor : 0.9
Grid World Environment - Policy iteration
[Figure: 5 x 5 grid world with two Goal cells and an Action arrow; each of the other 23 cells is labeled Reward -1]
Grid World Environment - Policy iteration
k=0
[Figure: Vk and Greedy Policy; all 25 state values are 0.0]
k=1
[Figure: Vk and Greedy Policy; the two goal states are 0.0 and every other state is -1.0]
k=2
[Figure: Vk and Greedy Policy; the two goal states are 0.0, four states are -1.7, and the remaining 19 states are -2.0]
k=inf
[Figure: Vk and Greedy Policy; the two goal states are 0.0 and the other state values range from -90 to -100]
Policy iteration
??
Policy Iteration
Value Iteration
Dynamic Programming
Grid World Environment - Value iteration
[Figure: Vk and Greedy Policy; the two goal states are 0.0 and the other state values range from -90 to -100]
Value Iteration
???? ???!
State, Action!
Value iteration
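Value iteration folds evaluation and improvement into a single update that takes the maximum over actions. The sketch below is an illustration under assumptions, not the talk's code: goals sit in two opposite corners, every move gives reward -1, walls keep the agent in place, and the discount factor is 0.9.

```python
N, GAMMA = 5, 0.9
GOALS = {(0, 0), (N - 1, N - 1)}               # assumed goal positions
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
STATES = [(r, c) for r in range(N) for c in range(N)]


def step(state, action):
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < N and 0 <= c < N):
        r, c = state                           # wall: stay in place
    return (r, c), -1.0


def value_iteration(theta=1e-6):
    """Apply the Bellman optimality update until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in GOALS:
                continue                       # terminal states keep value 0
            candidates = []
            for a in ACTIONS:
                s_next, reward = step(s, a)
                candidates.append(reward + GAMMA * V[s_next])
            best = max(candidates)             # max over actions, no explicit policy
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V


V_star = value_iteration()
for r in range(N):
    print(" ".join(f"{V_star[(r, c)]:6.2f}" for c in range(N)))
```

A greedy policy can be read off afterwards by choosing, in each state, the action whose one-step lookahead value is largest.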
??
Model? ?????
??? ??? ??? ??? ????? ????.
Monte Carlo method
Monte Carlo method? Dynamic programming??
?? ??? ?? ?? ?? ?? ??
??? ??? ?? ??? ????? ??.
Monte Carlo
??? ??? ?? ??? ??? ?? ?? environment?
??? ??? ??? ??? ?? optimal behavior? ???
???
Monte Carlo
Monte Carlo
Monte Carlo? episode-by-episode? ???? ??
????? ??? ???? terminal state ??
?? ???? ??.
Monte Carlo? ??? ?? return ? sample? ????
state-action value? ???? ??????.
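Estimating state-action values from sampled returns, one finished episode at a time, can be sketched as follows. This is a first-visit Monte Carlo update written for illustration; the function name `mc_update`, the episode format, and the sample episode are hypothetical and not taken from the talk.

```python
from collections import defaultdict


def mc_update(Q, counts, episode, gamma=0.9):
    """First-visit Monte Carlo update of Q from one finished episode.

    episode is a list of (state, action, reward) tuples in time order.
    """
    # Remember the first time step at which each (state, action) pair occurs.
    first_visit = {}
    for t, (s, a, _) in enumerate(episode):
        first_visit.setdefault((s, a), t)

    g = 0.0
    # Walk backwards so that g is the discounted return following step t.
    for t in range(len(episode) - 1, -1, -1):
        s, a, r = episode[t]
        g = r + gamma * g
        if first_visit[(s, a)] == t:
            counts[(s, a)] += 1
            # Incremental mean of the returns sampled for this pair.
            Q[(s, a)] += (g - Q[(s, a)]) / counts[(s, a)]


Q = defaultdict(float)
counts = defaultdict(int)
# Hypothetical episode that ends at a goal with reward +1.
episode = [((0, 0), "right", -1.0), ((0, 1), "right", -1.0), ((0, 2), "down", 1.0)]
mc_update(Q, counts, episode)
```

Because the return is only known once the episode has ended, the update cannot run before a terminal state is reached.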
Goal
Monte Carlo-GridWorld
??? ??? Update
Start
Monte Carlo
??
Temporal-Difference Learning
?? ????? ??? ? ?? ????? ??? ?
?? TD(temporal-difference) learning ? ???.
-Sutton
Temporal-Difference Learning
Temporal-Difference Learning
Monte Carlo + Dynamic programming
Monte Carlo ?? ?? ?? ??? ??? value? ????
DP?? ??? ?? ??? ???? value? estimate???? ??
Temporal-Difference Learning
?? state?? action? ???? ?? Reward ? ?? State? discount factor? ???
state value? estimate?? update??.
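The update described above uses the observed reward plus the discounted estimate of the next state's value as its target. A minimal TD(0) sketch, with a hypothetical function name, state labels, and step size alpha:

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.9, done=False):
    """One TD(0) step: move V(state) towards reward + gamma * V(next_state)."""
    next_value = 0.0 if done else V.get(next_state, 0.0)
    target = reward + gamma * next_value       # bootstrapped target
    current = V.get(state, 0.0)
    V[state] = current + alpha * (target - current)
    return V[state]


V = {"B": 1.0}                                 # hypothetical current estimates
td0_update(V, "A", reward=-1.0, next_state="B")
print(V["A"])
```

The update runs after every single transition, so no complete episode is needed.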
Monte Carlo??? Gt? ??? ???? ?? :
TD? ??
1. Monte Carlo? ??? ??? ??? ?? ??? ????
2. Dynamic programming?? ?? On-line ????.
3. ??? ???? ???, ???? update? ????? episode?
?? ??? ?? continue? model?? ???? ??.
TD? ????
Temporal-Difference Learning
Sarsa, Q-learning
Temporal-Difference Learning includes two algorithms: Sarsa and Q-learning.
Sarsa: On-policy
Q-learning: Off-policy
Temporal-Difference Learning
Sarsa
on-policy ??? ???? Sarsa
state-value function ?? action-value function? ??
Sarsa
?? time step ?? state? action? ?? ???? action value? estimate??.
Sarsa-pseudo code
Sarsa-pseudo code
On-policy
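The pseudo code image is not preserved in this transcript. A minimal sketch of the Sarsa update, assuming an epsilon-greedy behaviour policy and hypothetical state names, could look like this. The point to notice is that the next action is chosen by the same policy that is being learned, and its value is what enters the target.

```python
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]


def epsilon_greedy(Q, state, epsilon=0.1):
    """Behaviour policy: random action with probability epsilon, else greedy."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9, done=False):
    """On-policy TD control: the target uses the action actually taken next."""
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


Q = defaultdict(float)
Q[("s1", "right")] = 2.0                       # hypothetical current estimate
a_next = epsilon_greedy(Q, "s1", epsilon=0.0)  # epsilon 0 makes this greedy
sarsa_update(Q, "s0", "right", -1.0, "s1", a_next)
```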
Sarsa-gridworld
Goal
Start
At+1 St+1
Sarsa-gridworld
[Figure: grid world cells annotated with action values 2.0, 1.1, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, and 0.0]
???? ?? ????? ??? ??.
Sarsa-gridworld
[Figure: grid world cells annotated with action values 2.0, 1.1, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, and 0.0]
??? ??? ??? action-value? ??????.
Policy? On-policy
[Figure: cells along the path annotated with action values 2.0, 1.1, 1.0, 0.9, 0.8, 0.7, and 0.6]
Sarsa
??
Sarsa
Q-learning
Temporal-Difference Learning
Q-learning
Q-learning??? ??? off-policy TD control?? ????? ???? ??? ???.
-(Watkins, 1989)
exploration? exploitation? ?? ??.
Q-learning-pseudo code
Off-policy
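The pseudo code image is not preserved in this transcript. A minimal sketch of the Q-learning update, with hypothetical state names and step size, is shown below. Unlike Sarsa, the target takes the maximum over the next state's action values, whatever action the behaviour policy actually takes next.

```python
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, done=False):
    """Off-policy TD control: the target uses the greedy value of the next state."""
    best_next = 0.0 if done else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


Q = defaultdict(float)
Q[("s1", "up")] = 2.0                          # hypothetical current estimates
Q[("s1", "down")] = 0.5
q_learning_update(Q, "s0", "right", -1.0, "s1")
print(Q[("s0", "right")])
```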
Q-learning-gridworld
Goal
Start
Argmax
St+1
Q-learning-gridworld
[Figure: grid world cells annotated with action values 2.0, 1.1, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, and 0.0]
???? ?? ????? ??? ??.
Q-learning-gridworld
[Figure: grid world cells annotated with action values 2.0, 1.1, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, and 0.0]
??? ??? ??? action-value? ??????.
Policy? Off-policy
[Figure: cells along the path annotated with action values 2.0, 1.1, 1.0, 0.9, 0.8, 0.7, and 0.6]
Q-learning
??
??? ????? ??? ?? ????? ?? ??
Deeplearning
https://goo.gl/images/VA89CC
Deeplearning?? ??
https://chaosmail.github.io/deeplearning/2016/10/22/intro-to-deep-learning-for-computer-vision/
??? input?? ???? ????
Deep Reinforcement Learning
Deeplearning+Reinforcement Learning
https://goo.gl/images/oNu5Gr
Deepmind, DQN
https://www.youtube.com/watch?v=V1eYniJ0Rnk
Deeplearning? ????? ????, ???? ???? ??? ????? ??
DQN, Keras(breakout)
????
1.Main
2.library
3.Function
DQN, Keras(breakout)
????
1.Main
2.library
3.DQN
1. ??? ????.
2. agent? ????.
3. score, episode, global_step ? ????.
4. ??? ?????? ??? ??
- ?? ???? ????.
- ??? ???? ????.
- ?? ?????? ?? action? ??? ????
?? ??.
- ?? ?? ???? ??? ??? ???.
- ????? ???? ??? ????? ???.
Main-1
* ??? ????? ?? ???? while?? ??
?.
-render? ??? ????. (render? ??? ?
? ?? ??? ????.
- ??? ??? ??? ????.
- ??? ??? ????.
- ??? state( history )? ???? action?
????.
Main-2
Main-3
- ??? action?? ??? ?????? ???? ?? ???
?, reward, done, info? ?? ???.
- ?? ?? ????? ?? ??? ???.
- history ?? ?? ??? ?? ??? state? ???
next_history? ??
- q_max? ??? ???? ??? ??? model ? ?? ?? Q
?? max? agent.avg_q_max? ???.
- ?? dead? ?? dead? True? ???, start_life? ??
????.
Main-4
- ?????? target model? update??.
- ??? ??? dead = false? ??? ???
next history ?? history? ???.
- ??? done?? ????? ????? ??
?? ????.
- ?? ?????? ??? ????.
DQN, Keras(breakout)
????
1.Main
2.library
3.DQN
Main-4
- ?? ??? ????? ??? ? ??? ???? ??? -1~1
? ??.
- s,a,r,s'? ???? ???? ????.
- ???? ???? ????? ???? ??? ????.
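The first bullet above mentions the range -1~1 for the reward. One common reading, assumed here because the surrounding text is not fully recoverable, is that each reward is clipped to that range before the transition is stored. The helper name `clip_reward` is hypothetical.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clip a raw game reward into [low, high] (assumed to be -1 to 1)."""
    return max(low, min(high, float(reward)))


print(clip_reward(7), clip_reward(-3), clip_reward(0.5))
```

Clipping keeps the scale of the learning targets comparable across games whose raw scores differ widely.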
Main-4
- ?????? target model? update??.
- ??? ??? dead = false? ??? ???
next history ?? history? ???.
- ?? ?????? ??? ????.
Import-1
1.??? ?????? ????.
a.Keras
* CNN layer
* Dense layer
* optimizer
* ?????? ??? ??
Import-2
b. ??? ???
* input?? ???? ??? ?? ??
* RGB? Gray? ??? ?????
* replay memory???
c.Tensorflow
* tensorflow backend
* tensorflow
d. ??
* numpy
* random
* gym
* os
DQN, Keras(breakout)
????
1.Main
2.library
3.DQN
DQN-1
? - render ? ??
? - model? load ??
? - state ???
? - action ???
? - epsilon?
? - epsilon? ???? ?? ( decay? ?
? )
? - epsilon? decay step ??
?
???
DQN-2
? - ???? ????? ?? ????? ??
? - ??? ??? ?? ??
? - ?? ??? ???? ?? ??
? - discount factor
? - ??????? ???? ??
? - ????? action? ??? ???? ??
? - Deeplearning model
? - Target model
? - update target model
?
???
DQN-3
? - optimizer
? - Tensorboard
?
???-2
Save? ??? ???? ???? ??
DQN-4
Keras? ??? ?? ???
CNN Layers
Dense Layer
DQN-5
action? ???? ??(policy) : ?
??? Epsilon greedy
?? model? weight? ???? target
model? ???? ???? ?? ??
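Epsilon-greedy action selection with a decaying epsilon can be sketched as below. The class name and the default values (start 1.0, end 0.1, one million decay steps) are illustrative assumptions, not settings confirmed by the slides, and the Q-values are passed in as a plain list instead of coming from a Keras model.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy policy with a linearly decaying epsilon."""

    def __init__(self, n_actions, start=1.0, end=0.1, decay_steps=1_000_000):
        self.n_actions = n_actions
        self.epsilon = start
        self.end = end
        self.decay = (start - end) / decay_steps   # amount removed per step

    def select(self, q_values):
        if random.random() < self.epsilon:
            action = random.randrange(self.n_actions)   # explore
        else:
            # exploit: index of the largest predicted Q-value
            action = max(range(self.n_actions), key=lambda i: q_values[i])
        self.epsilon = max(self.end, self.epsilon - self.decay)
        return action


policy = EpsilonGreedy(n_actions=3)
action = policy.select([0.1, 0.9, 0.3])
```

Early in training almost every action is random; as epsilon shrinks, the agent relies more and more on the network's predictions.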
DQN-6
state, action, reward, next state? ???? ???? ????? ??
Replay Memory
DQN-7
???? ????? ??? ??? ??? ???? ??
Replay memory-2
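A replay memory that stores (state, action, reward, next state) transitions and returns random mini-batches can be sketched with a bounded deque. The class name, the capacity, and the batch size are assumptions made for illustration; the `done` flag is added because DQN targets usually need it.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size buffer of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity=400_000):
        self.buffer = deque(maxlen=capacity)

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling breaks the correlation between
        # consecutive frames of the same episode.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for i in range(5):                             # hypothetical integer "states"
    memory.append(i, 0, 0.0, i + 1, False)
batch = memory.sample(batch_size=2)
```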
DQN-8
? optimizer
? Tensorboard
? ???? ? ?? ??? ???
? ??
???
DQN-9
???? ???? ?? ??
?
???
DQN-10
Optimizer ??
???? Huber Loss??
https://goo.gl/images/XGsfYx
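The Huber loss is quadratic for small errors and linear for large ones, which limits the size of the gradient when a TD error is large. The sketch below computes it for a single error value in plain Python; the threshold delta = 1.0 is an assumption, and the talk's Keras code is not reproduced here.

```python
def huber_loss(error, delta=1.0):
    """Huber loss of a single error value."""
    abs_error = abs(error)
    if abs_error <= delta:
        return 0.5 * error * error                 # quadratic region
    return delta * (abs_error - 0.5 * delta)       # linear region


for e in (0.5, 1.0, 3.0, -3.0):
    print(e, huber_loss(e))
```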
DQN, Keras(breakout)
??
????? ?? ??
Human A.I.
Deeplearning+Reinforcement Learning
?? ?????? ??? ????!
?? ??? ??
??? ??? ?? ??? ???? ????? ???? ??.
??? ????
State ??, action? ???. ?
?
?? ????? ??????, Deeplearning model, hyper parameter?
???? ???? ??.
Emulator
Environment
Algorithm
Programming Language
????? ??? ??? ???
1. https://www.python.org/downloads/ - 3.5 version
2. https://www.anaconda.com/download/ -Anaconda
3. https://www.tensorflow.org/install/ - TensorFlow
4. https://keras.io/#installation -Keras
Programming Language - Python
Emulator http://www.fceux.com/web/home.html
Ubuntu
sudo apt-get update
sudo apt-get install fceux
macOS
https://brew.sh/ - Homebrew website
Open a terminal -> brew install fceux
Emulator - FCEUX
Environment OpenAI_Gym
https://github.com/openai/gym
pip3 install gym
Or install from source:
git clone https://github.com/openai/gym.git
cd gym
pip install -e .
OpenAI_Gym
OpenAI? Gym? ???? ?? ?? ???? ??? ????.
Environment
Baselines
https://github.com/openai/baselines
pip3 install baselines
Or install from source:
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
OpenAI_Baselines
Environment
Philip Paquette
https://github.com/ppaquette/gym-super-mario
pip3 install gym-pull
import gym
import gym_pull
gym_pull.pull('github.com/ppaquette/gym-super-mario')
env = gym.make('ppaquette/SuperMarioBros-1-1-v0')
SuperMario
Algorithm
DEEP Q-NETWORK
Algorithm-DQN
??? ??? ?????
https://github.com/wonseokjung/KIPS_Reinforcement/tree/master/DQN
?? ???? ???? ?? github? ??? ??? ???????.
DQN? ??? ???? ????? ???
?????? ??? ????
State : ???? ??
Action : ?, ? , ?, ?
Reward : ?? = -1, ?? = 1?
Transition Probability : 1
Discount factor : 0.9
Reward
+ 1
Reward
-1
State
Action
?????? ???????
Goal? ??
Goal
Start
?????? ??? ??? ???????? ??? goal state? ???
???????? ??
State : ??
Action : ?, ? , ?, ?,??,???, action? ??
Reward : ??? ???? Reward +1, ???? -1?
Transition Probability : 1
Discount factor : 0.9
State
Action
???? ??? ??? ??? ?? reward? ???.
???? ??´
??
1. ???? ??? ???? ???? ?? ??
2. State? breakout?? ? ???? action? ??.
Reward ??
?????? ??? -
??? ????? -
???? ???? -
??? ????? +
??? ???? +
Penalty, Bonus reward??
Deeplearning model
VGG model and regular ??
https://goo.gl/images/eoXooC
https://goo.gl/images/s8XrCK
? ?? ????
????(reinforcement learning) ?? ?? ? Unity ml-agent? ???? ?? ??? ???? ???? ??
DQN? ??? ???? ????? ???
??
?? 1 ? ??? ?? ???? ?? ?..
?? ???? ??? ?? ????. Overfitting? ????
https://goo.gl/images/6uDmqH
????? ??? ??? ??? !
Reward Exploration Algorithm
?????.
Github:
https://github.com/wonseokjung
Facebook:
https://www.facebook.com/ws.jung.798
Blog:
https://wonseokjung.github.io/
References:
* Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition (in progress), MIT Press, Cambridge, MA, 2017
* https://github.com/rlcode/reinforcement-learning-kr
