What is the World Model Algorithm?
Aug 29, 2018

Sooyoung Moon
World Model 覦 覈
World Model 覦 覈
襷 RNN瑚 る? 旧 蠍  RNN l    .
World Model 覦 process (simple tasks ver.)
<Car Racing> Exploration with a random policy
Fake environment
Exploration & Flexibility
When training (M)
When testing
When training (C)
<Car Racing>
Training
Simulate
1. Reset the environment and get the first obs

2. The agent takes a random action based on obs

3. obs, reward, done, info = model.env.step(action)

4. [encoded_obs, action] is fed into the rnn as input

5. Get the z and h values

6. Total reward += reward

7. Go back to step 2 and repeat
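The seven steps above can be sketched as a single rollout loop. This is a minimal illustration, not the code from 05_train_controller.py: `ToyEnv`, `encode`, `rnn_step`, and `random_policy` are hypothetical stand-ins for the real environment, the VAE encoder, the MDN-RNN, and the agent, and the reward rule is invented for the example.

```python
import math
import random


class ToyEnv:
    """Hypothetical stand-in for the real environment (e.g. Car Racing)."""

    def __init__(self, horizon=20):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0, 0.0]  # first observation

    def step(self, action):
        self.t += 1
        obs = [math.sin(self.t), action]
        reward = 1.0 - abs(action)  # toy reward: small actions score higher
        done = self.t >= self.horizon
        return obs, reward, done, {}


def encode(obs):
    """Placeholder for the VAE encoder: obs -> latent z."""
    return [0.5 * x for x in obs]


def rnn_step(z, action, h):
    """Placeholder for the MDN-RNN: folds [z, action] into the hidden state h."""
    inputs = z + [action]
    return [math.tanh(0.9 * h_i + 0.1 * sum(inputs)) for h_i in h]


def random_policy(z, h):
    """Random action in [-1, 1], ignoring z and h."""
    return random.uniform(-1.0, 1.0)


def simulate(env, policy, hidden_size=4):
    obs = env.reset()                               # step 1
    h = [0.0] * hidden_size
    total_reward = 0.0
    done = False
    while not done:                                 # step 7: repeat from step 2
        z = encode(obs)
        action = policy(z, h)                       # step 2
        obs, reward, done, info = env.step(action)  # step 3
        h = rnn_step(z, action, h)                  # steps 4-5
        total_reward += reward                      # step 6
    return total_reward


total = simulate(ToyEnv(horizon=20), random_policy)
```

The value returned by `simulate` is the cumulative reward that the optimizer would later try to maximize.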
Optimizing
CMA-ES: an evolutionary algorithm, run through the cma optimizer

Finds the W, b that maximize the cumulative reward
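To make the idea of searching for W, b by cumulative reward concrete, here is a deliberately simplified Gaussian evolution strategy. It is not CMA-ES and does not use the cma package: CMA-ES additionally adapts a full covariance matrix and the step size. The function names and the toy reward are assumptions made for illustration only.

```python
import random


def simple_es(fitness, dim, population=16, sigma=0.5, generations=30, seed=0):
    """Simplified Gaussian evolution strategy (not full CMA-ES).

    Samples candidates around a mean, then moves the mean to the average
    of the best quarter of the population.
    """
    rng = random.Random(seed)
    mean = [0.0] * dim
    best_params, best_score = list(mean), fitness(mean)
    for _ in range(generations):
        candidates = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                      for _ in range(population)]
        scored = sorted(candidates, key=fitness, reverse=True)
        elite = scored[: max(1, population // 4)]
        mean = [sum(c[i] for c in elite) / len(elite) for i in range(dim)]
        top_score = fitness(scored[0])
        if top_score > best_score:
            best_params, best_score = scored[0], top_score
    return best_params, best_score


def toy_cumulative_reward(params):
    """Hypothetical stand-in for a rollout's cumulative reward.

    params = [W, b]; the reward peaks at W = 1.0, b = -2.0.
    """
    w, b = params
    return -((w - 1.0) ** 2) - ((b + 2.0) ** 2)


best_params, best_score = simple_es(toy_cumulative_reward, dim=2)
```

Only the scalar score of each candidate is used, which is why this family of optimizers needs nothing beyond the final cumulative reward of a rollout.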
World Model 覦 process (simple tasks ver.)
<VizDoom>
Exploration with a random policy
Fake environment
Exploration & Flexibility
World Model 覦 process (simple tasks ver.)
World Model 覦 process (complicated tasks ver.)
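1. Initialize the parameters of M and C randomly.

2. Collect many actions and obs, and use them to train M.

3. M models action, reward, and done as well, not only obs. C is then trained inside the trained M.

4. For more complex tasks that need further M-C training, go back to step 2.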
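The control flow of this iterative procedure can be sketched as follows. Everything here is a toy under stated assumptions: M and C are reduced to one scalar weight each, the `toy_*` helpers are hypothetical, and the reward and done predictions mentioned in step 3 are left out for brevity.

```python
import random


def iterative_training(rollout, train_m, train_c, is_solved, max_iters=10, seed=0):
    rng = random.Random(seed)
    m = {"w": rng.gauss(0.0, 1.0)}   # step 1: random init of M
    c = {"w": rng.gauss(0.0, 1.0)}   # step 1: random init of C
    storage = []
    for iteration in range(1, max_iters + 1):
        storage.extend(rollout(c))   # step 2: collect (action, obs) pairs
        m = train_m(m, storage)      # step 2: fit M on the collected data
        c = train_c(c, m)            # step 3: train C using M only
        if is_solved(c):             # step 4: otherwise loop back to step 2
            return m, c, iteration
    return m, c, max_iters


def toy_rollout(c):
    action = c["w"]
    return [(action, 2.0 * action)]  # toy dynamics: obs = 2 * action


def toy_train_m(m, storage):
    # Least-squares slope through the origin, so M learns obs = w * action.
    num = sum(a * o for a, o in storage)
    den = sum(a * a for a, _ in storage) or 1.0
    return {"w": num / den}


def toy_train_c(c, m):
    # Move C halfway toward the action that M predicts will give obs = 1.0.
    target_action = 1.0 / m["w"] if m["w"] else c["w"]
    return {"w": c["w"] + 0.5 * (target_action - c["w"])}


def toy_is_solved(c):
    return abs(2.0 * c["w"] - 1.0) < 0.05


m, c, iterations = iterative_training(
    toy_rollout, toy_train_m, toy_train_c, toy_is_solved)
```

Note that `toy_train_c` never touches the toy dynamics directly; it only consults M, which mirrors the idea of training C inside the learned model.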
Loss function
Maximum likelihood loss function
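The slide names only a maximum likelihood loss. Assuming it refers to the MDN-RNN, whose output is a mixture of Gaussians, the loss would be the negative log-likelihood of the target under that mixture. The sketch below shows the scalar, one-dimensional case; the function name and argument layout are illustrative, not taken from 04_train_rnn.py.

```python
import math


def mdn_negative_log_likelihood(target, weights, means, stds):
    """Negative log-likelihood of a scalar target under a 1-D Gaussian mixture.

    weights must be positive and sum to 1; stds must be positive.
    """
    log_terms = []
    for pi_k, mu_k, sigma_k in zip(weights, means, stds):
        log_gauss = (-0.5 * ((target - mu_k) / sigma_k) ** 2
                     - math.log(sigma_k)
                     - 0.5 * math.log(2.0 * math.pi))
        log_terms.append(math.log(pi_k) + log_gauss)
    # Log-sum-exp keeps the sum stable when individual terms are tiny.
    top = max(log_terms)
    log_likelihood = top + math.log(sum(math.exp(t - top) for t in log_terms))
    return -log_likelihood


# A mixture with a component centred on the target gives a lower loss.
good = mdn_negative_log_likelihood(0.0, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0])
bad = mdn_negative_log_likelihood(0.0, [0.5, 0.5], [3.0, 5.0], [1.0, 1.0])
```

Minimizing this quantity over the training sequences is equivalent to maximizing the likelihood of the observed next latent values.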
Implementation files
01_generate_data.py
02_train_vae.py
03_generate_rnn_data.py
04_train_rnn.py
05_train_controller.py
python 01_generate_data.py car_racing --total_episodes 2000 --start_batch 0 --time_steps 300
python 02_train_vae.py --start_batch 0 --max_batch 9 --new_model
python 03_generate_rnn_data.py --start_batch 0 --max_batch 9
python 04_train_rnn.py --start_batch 0 --max_batch 9 --new_model
python 05_train_controller.py car_racing --num_worker 16 --num_worker_trial 2 --num_episode 4 --max_length 1000 --eval_steps 25
Latent space mapping via the VAE
Sequential information processing via the MDN-RNN

[Pipeline diagram: 01_generate_data.py (200 x 10 batches, 300 time steps) -> 02_train_vae.py (VAE, ./vae/weights.h5) -> 03_generate_rnn_data.py -> 04_train_rnn.py (MDN-RNN, ./rnn/weights.h5)]
Implementation files
CMA-ES
An intuitive understanding of what CMA-ES is
http://blog.otoro.net/2017/10/29/visual-evolution-strategies/
05_train_controller.py
Implementation files
Discussion
1. 願 螳旧 一碁.

2. 襴殊螳 旧     螻 襴 蟆暑慨 覈詩讌 .

3. 譴 : 襴 蟆曙 觜瑚 一磯 襷 蟆 觜 蟲ロ 企. 一磯 豌願
.

4. World model simulation朱 谿蠏殊姶蠏 覦一 蟆 襴 襦 牛 policy襯 transfer
  蠍 覓語  糾骸 蟆壱  螻殊 讌朱    

5. Other advantages:

1. Because the controller model is kept small, a simple ES can be used to optimize it instead of a more complex algorithm. The optimizer only needs the final cumulative reward.

2. ES レ 覲  GPU 郁鍵 譬 伎 豕 RNN 蟲譟磯ゼ
  蟆 .

6. Limitations:

1. Capacity is limited, because continuous data keeps accumulating

2. Because an RNN is used, it can run into the catastrophic forgetting problem.

 solution: make the small MDN-RNN larger, or add an external memory module

7. 朱:

1. 觚危 螳ロ讌 豌危企慨覃 譬 蟆

2. Which tasks would it be good for?
