????? ??
???
? ??? ?? ?? ???? ??????.
?? : ???? ???? ??? ????
??? ?? : http://wikibook.co.kr/reinforcement-learning
Reference
Index
1. ???? ??
2. ???? ??
3. ??
????? ??
1. ???? ??
?????? ???? ????? ????
?????? ????
??(Reinforcement)? ???
????(Trial and Error)? ?? ???? ?? ? ??
?????? ????
???? ? ??? ????
?????? ????
1. ?? ?? ??? ???.
2. ?? ?????? ??? ?? ?? ??
???? ??? ??.
3. ???? ??? ??? ???.
4. ???? ??? ??? ???? ?????
??? ?? ?? ?????.
5. ???? ??? ?? ?? ???? ???
?? ?? ??? ??? ??? ???
?? ?? ?? ???? ?? ??? ??.
6. ? ??? ????? ?? ???? ???
??? ?? ? ??? ?? ????.
?? : ???? ???? ??? ???? ????
??? ?? : /WoongwonLee/ss-78783597
?????? ????
??? ??? ?? ??? ?? ??? ??? ?? ??
?????? ????
?? ??? ??? ? ? ??? ???? ????
??? ?? ??? ??? ???? ? ?? ??.
?? : ???? ???? ??? ???? ????
??? ?? : /WoongwonLee/ss-78783597
?????? ????
?? ????? ?? : ??? ???? ?? ??? ?? ??
?? : ??????2 ????(StarCraft II Reinforcement Learning)
??? ?? : /sjhshy/2-starcraft-ii-reinforcement-learning-80779324
1. ???? ??
?????? ???? ????? ????
????? ????
????(Machine Learning)???
????? ? ????
???? ??? ???? ?? ????? ???? ?? ??
?? : ????, ?????, ????
????(Reinforcement Learning)
- ??(Reward)? ?? ??
- ??? ???? ??? ??(Action)? ?? ??? ??
- ??? ??? ??? ???? ??? ?? ??
- ??? ?? ?? ??? ?? ?? ??? ??
????? ????
????? ????
?? : [???AI???]???? ???? ????? ??
??? ?? : https://brunch.co.kr/@kakao-it/73
????? ????
????? ?? : "??? ???? ?? ??"? ???? ?
?? : [???AI???]???? ???? ????? ??
??? ?? : https://brunch.co.kr/@kakao-it/73
????? ????
????? ?????
????? ????
??? ?? ????? ??? ????? ?
2. ???? ??
????? ??? ????? ??? ?? ??? ??
2. ???? ??
????? ??? ?? ??? ??? ? ???? ??
MDP(Markov Decision Process)
- ??(State) : ??? ?? + ??? ??(ex. ??, ??? ?)
- ??(Action) : ??? ???? ?? ? ?? ??(ex. ?, ?, ?, ?)
- ??(Reward) : ????? ??? ? ?? ??? ??
(?? ???? ??? ??? ????? ?? ???? ??!)
- ??(Policy) : ??? ?? ?? ??(MDP)?? ??? ? ?
?? ??? ?? ????? ?? ??? ?? ??? ???? ?
??? ?? ?? ??? ???? -> 'Optimal Policy'? ???!
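The four MDP ingredients listed above (State, Action, Reward, Policy) can be illustrated with a very small sketch. Everything in it is an assumption made for illustration and is not taken from the slides: a hypothetical four-cell corridor, two actions, a reward of 1 for reaching the last cell, and a discount factor of 0.9. Value iteration is used here only as one common way to arrive at an optimal policy for such a toy problem.

```python
# Hypothetical 1x4 corridor MDP; all names and numbers are illustrative assumptions.
STATES = [0, 1, 2, 3]          # State: the agent's position; 3 is the terminal goal
ACTIONS = ["left", "right"]    # Action: what the agent can do in a state
GAMMA = 0.9                    # discount factor (assumed)


def step(state, action):
    """Deterministic transition: return (next_state, reward)."""
    nxt = max(state - 1, 0) if action == "left" else min(state + 1, 3)
    reward = 1.0 if nxt == 3 else 0.0   # Reward: given only on reaching the goal
    return nxt, reward


def value_iteration(sweeps=50):
    """Estimate state values, then read off the greedy policy."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES[:-1]:           # the terminal state keeps value 0
            values[s] = max(r + GAMMA * values[n]
                            for n, r in (step(s, a) for a in ACTIONS))

    def action_value(s, a):
        n, r = step(s, a)
        return r + GAMMA * values[n]

    # Policy: a mapping from each non-terminal state to the action to take
    policy = {s: max(ACTIONS, key=lambda a: action_value(s, a))
              for s in STATES[:-1]}
    return values, policy


values, policy = value_iteration()
print(policy)
```

In this toy corridor the greedy policy moves right in every non-terminal state, which is the optimal policy for the assumed reward.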
????? ??????
?????.
