Running A.I
1st 蟷 ル 貉朱一
06.28
螳

伎襴暑 - Baruch college (Data Science Major)
ConnexionAI Freelance Researcher
覈郁規 CTRL (Contest in RL) 
DeepLearningCollege 螳 郁規
Github: https://github.com/wonseokjung
Facebook: https://www.facebook.com/ws.jung.798
Blog: https://wonseokjung.github.io/
1. Reinforcement Learning
2. Atari
3. SuperMario
4. Sonic
5. Prosthetics
6. Latest trend

The Rise of Reinforcement Learning
By Wonseok Jung
Reinforcement learning
How does a baby learn?
https://goo.gl/images/RQgtpW
蠍郁 朱伎, 譯殊襯 覦朱慨螻,  り, 碁 
蟲郁螳 螳襯伎  蟆 蟆企.
覦朱蓋
襷讌

企
願 牛 螻殊
殊企 り 豺谿 殊企 り
 襾轟朱 . 蠑語  襾轟
碁, 螻 ろ
A computational approach to how a learner learns by interacting with its environment
is called reinforcement learning.
Choose the action that maximizes reward
https://goo.gl/images/HpBRJT
Reinforcement learning learns the actions that maximize reward.
1. 覦讌襯 譴
2. 襯 覲伎螻 .
3.螳 蠍企.
4. 襯 °.
Fail and Success
The learner tries out actions and discovers
which action yields the most reward.
https://goo.gl/images/GoHQYh
Reinforcement Learning
An action affects not only the immediate reward
but also the next situation, and through it every reward that follows.
[Figure: an action produces an immediate reward and changes the state, which in turn affects future rewards]
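To make delayed reward concrete, here is a minimal Python sketch (not from the original slides) that computes the discounted return $G_t = R_{t+1} + \gamma G_{t+1}$ for every step of one episode. The function name `discounted_returns` and the discount factor `gamma` are illustrative assumptions.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [G_0, G_1, ...] where rewards[t] holds R_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each G_t reuses G_{t+1}: G_t = R_{t+1} + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# A reward that only arrives at the end still gives earlier steps (discounted) credit.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.9))
```

The earlier an action is, the more its credit is discounted, which is how a single final reward still shapes the first actions of an episode.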
Exploration and Exploitation
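Exploitation means the agent takes the action that has yielded the most reward so far;
exploration means trying actions it has not taken yet, which may turn out to be better.
The agent needs both.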
[Figure: the agent choosing between exploitation and exploration]
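A common way to balance the two is epsilon-greedy action selection. The sketch below is an illustration on a hypothetical three-armed bandit with Gaussian rewards; `epsilon_greedy`, `run_bandit`, the arm means, and `epsilon = 0.1` are all assumptions made for the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploitation


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [0.0] * len(true_means)   # estimated value of each action
    n = [0] * len(true_means)     # how often each action was tried
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)   # noisy reward from the environment
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]           # incremental sample average
    return q, n


q, n = run_bandit([0.2, 0.5, 1.0])
print("estimates:", [round(v, 2) for v in q], "pulls:", n)
```

With `epsilon = 0` the agent could lock onto whichever arm looked good first; the small amount of exploration is what lets it find the best arm.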
Markov Decision Process
[Figure: in state $S_t$ the agent takes action $A_t$ in the environment, which returns reward $R_{t+1}$ and next state $S_{t+1}$ to the agent]
Through the MDP, the agent learns by interacting with the environment.
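The agent-environment loop can be sketched with tabular Q-learning, the method DQN builds on, on a hypothetical six-state corridor whose flag sits at the right end. `Corridor`, its reward values, and the hyperparameters are assumptions for illustration only.

```python
import random


class Corridor:
    """Hypothetical toy MDP: states 0..length-1, the flag is at the right end."""

    def __init__(self, length=6):
        self.length = length

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):                      # action 0 = left, 1 = right
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else -0.01          # small step penalty, +1 at the flag
        return self.state, reward, done


def q_learning(env, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max((0, 1), key=lambda x: q[s][x])
            s_next, r, done = env.step(a)        # S_t, A_t -> R_{t+1}, S_{t+1}
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q = q_learning(Corridor())
print(["right" if qs[1] > qs[0] else "left" for qs in q[:-1]])
```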
Atari
High dimensional state
Discrete actions
Deep learning
Advances in deep learning made it possible to take high-dimensional data as input.
Deep learning+Reinforcement Learning
https://goo.gl/images/oNu5Gr
An algorithm that combines a deep network with reinforcement learning
DeepMind, DQN
Using deep learning, DeepMind built an AI that plays Atari games at human level on many of them.
Deep Q network Architecture
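[Figure: the environment's state $S_t$ is the input to the Q-network, which outputs an action value $Q(S_t, a)$ per action; the chosen action $A_t$ produces $R_{t+1}$ and $S_{t+1}$, and each transition $(S_t, A_t, R_{t+1}, S_{t+1})$ is stored in replay memory]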
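A minimal sketch of the two DQN ingredients in the figure, replay memory and the bootstrapped target $y = R_{t+1} + \gamma \max_a Q(S_{t+1}, a)$, in plain Python. `fake_target_q` is a hypothetical stand-in for the target Q-network; a real implementation would use a neural-network library.

```python
import random
from collections import deque, namedtuple

# One stored transition: (S_t, A_t, R_{t+1}, S_{t+1}) plus an end-of-episode flag.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform random minibatch, breaking the correlation between consecutive steps.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(batch, target_q, gamma=0.99):
    """y = R_{t+1} + gamma * max_a Q_target(S_{t+1}, a); no bootstrapping at episode end."""
    return [t.reward if t.done else t.reward + gamma * max(target_q(t.next_state))
            for t in batch]


def fake_target_q(state):
    # Hypothetical target network: two actions, values depend on the state.
    return [0.0, float(state)]


memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(i, 0, 1.0, i + 1, i == 4)      # only the last transition ends the episode
print(len(memory), dqn_targets(memory.sample(2, random.Random(0)), fake_target_q, gamma=0.5))
```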
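Atari DQN results
Works well only on certain games
Games where performance is good
Games where performance is poor
Skiing, Chopper Command, James Bond
Performance drops when the background changes.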
[Figures: DQN results on James Bond, Skiing, and Chopper Command]
What about more complex states and more actions?
SuperMario
High dimensional state
Discrete actions
Complex Environment
First challenge - SuperMario Bros
1985 Nintendo
Let's build a Mario agent with reinforcement learning
Comparing the goals of Breakout and Mario
Mario: reach the flag. Breakout: break all the bricks.
Reward - Breakout
State: game screen, [210, 160, 3]
Action: None, left, right
Reward: breaking a brick
The agent receives a reward every time it breaks a brick.
Reward - Mario
State: game screen
Action: up, down, left, right, jump, run, and combinations of these
Reward: +1 for moving forward, -1 for moving backward
Transition Probability : 1
The closer Mario gets to the flag, the more reward it receives.
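DQN
[Figure: DQN applied to SuperMario: state s from the environment goes into the Q-network, which outputs Q(s, a); action a returns reward r, and transitions $(S_t, A_t, R_{t+1}, S_{t+1})$ are stored in replay memory]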
螻 ろ
https://youtu.be/zRf_7Xa_MSE
What was the problem?
Action 覯
Complexity
The complexity makes learning difficult.
Reward design
Losing a life: -
Standing still: -
Moving away from the flag: -
Getting closer to the flag: +
Reaching the goal: +
Added penalty and bonus rewards
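The reward rules above can be collected into a single shaping function. This is a hypothetical sketch: the inputs (`prev_x`, `x`, `died`, `reached_flag`) and every penalty and bonus magnitude are assumptions, not the values used in the original experiment.

```python
def shaped_reward(prev_x, x, died, reached_flag,
                  stand_still_penalty=-0.1, death_penalty=-15.0, flag_bonus=15.0):
    """Hypothetical Mario-style reward; all magnitudes are assumed, not from the slides."""
    reward = float(x - prev_x)            # closer to the flag: +, moving away: -
    if x == prev_x:
        reward += stand_still_penalty     # standing still is penalised
    if died:
        reward += death_penalty           # losing a life
    if reached_flag:
        reward += flag_bonus              # reaching the goal
    return reward


print(shaped_reward(10, 13, died=False, reached_flag=False))
```

Using the horizontal position change as the base signal gives dense feedback on every step, while the death penalty and flag bonus mark the rare, decisive events.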
Deep learning model
Comparing a VGG model with a regular model
https://goo.gl/images/eoXooC
https://goo.gl/images/s8XrCK
 蟾蟆 覲伎
Level 1 cleared!
https://youtu.be/WlLBRsgSFt8
After 7,000 episodes (6 days)
Because every level's screen looks different, building a general agent is difficult.
Remaining problems
Still training on level 2...
Exploration??
https://youtu.be/EvyM4ZUhDpE
Sonic
High dimensional state
Discrete actions
More Complex Environment
Skills
Third step
(even more complex states, even more actions)
OpenAI Retro challenge
Took part in the Sonic contest hosted by OpenAI
Action combinations plus skills: more complexity
More and more action combinations
Let's try the latest DQN algorithms
To the Rainbow
October 2017: DeepMind publishes Rainbow DQN
https://github.com/wonseokjung/wonseokjung.github.io/blob/master/_posts/2018-05-23-RL-Totherb7.md
References:
https://wonseokjung.github.io//reinforcementlearning/update/RL-Totherb7/
+ 7.A3C
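Deep Q network
[Figure: the DQN loop (Env, Q-network, replay memory) annotated with the Rainbow components]
- Double DQN: $\left(R_{t+1} + \gamma_{t+1}\, q_{\bar{\theta}}\big(S_{t+1}, \arg\max_{a'} q_{\theta}(S_{t+1}, a')\big) - q_{\theta}(S_t, A_t)\right)^2$
- Distributional RL
- Noisy networks
- Prioritized replay: sample the transitions that matter most for learning
- Multi-step learning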
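The Double DQN loss above, combined with a multi-step (n-step) return, can be sketched as follows. `online_q` and `target_q` are hypothetical callables standing in for the online network $q_\theta$ and the target network $q_{\bar\theta}$; with a single reward (n = 1) the target matches the loss above. The full Rainbow agent also uses a distributional loss, which this sketch omits.

```python
def n_step_return(rewards, gamma):
    """Discounted sum R_{t+1} + gamma R_{t+2} + ... + gamma^(n-1) R_{t+n}."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))


def double_dqn_target(rewards, bootstrap_state, done, online_q, target_q, gamma=0.99):
    """Multi-step Double DQN target; with len(rewards) == 1 it is the one-step target."""
    g = n_step_return(rewards, gamma)
    if done:                                           # episode ended: nothing to bootstrap
        return g
    q_online = online_q(bootstrap_state)
    best = max(range(len(q_online)), key=lambda a: q_online[a])   # argmax with q_theta
    return g + gamma ** len(rewards) * target_q(bootstrap_state)[best]  # value from q_theta_bar


def squared_td_loss(target, q_sa):
    """(target - q_theta(S_t, A_t))^2, the loss on the slide."""
    return (target - q_sa) ** 2


def online_q(state):          # hypothetical online network: prefers action 1
    return [1.0, 2.0]


def target_q(state):          # hypothetical target network: overrates action 0
    return [5.0, 0.5]


y = double_dqn_target([1.0], "s_next", False, online_q, target_q, gamma=0.9)
print(y, squared_td_loss(y, q_sa=1.0))
```

Plain DQN would take the maximum of `target_q` directly (1 + 0.9 * 5.0 = 5.5 here); selecting the action with one network and evaluating it with the other is what reduces that overestimation.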
To the Rainbow-2
Rainbow combines six DQN-family improvements, together with A3C, into a single algorithm.
The DQN algorithm used for SuperMario
Comparison chart
Performance comparison on Atari games
Applying Rainbow to Sonic
Sonic - Rainbow DQN (with noisy network, epsilon = 0)
https://contest.openai.com/videos/132.mp4
Finished the OpenAI contest in the top 10%!
An agent with far more actions than a game:
can it be trained with reinforcement learning?
A.I Prosthetics
High dimensional state
Continuous actions
Fourth step
(continuous actions)
NIPS 2018 : AI for Prosthetics Challenge
Discrete actions vs. continuous actions
Action in Real world
DQN handles high-dimensional states, but not continuous actions
https://twitter.com/iamruj
Two ways of choosing actions
1. Action-value methods:
- Learn the action values
- Select actions based on the estimated action values
- The policy would not even exist without the action-value estimates
2. Parameterized policy:
- Select actions without consulting a value function
- A value function may still be used to learn the policy parameters
- The value function is not the basis for selecting actions
$J(\theta)$: performance measure
$q_\pi(s, a) = \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$
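The difference between the two families fits in a few lines: an action-value method derives its policy from the $q$ estimates (here greedily), while a parameterized policy maps parameters $\theta$ directly to action probabilities. The linear-softmax form and the feature vectors below are assumptions for illustration, not the method used in the slides.

```python
import math
import random


def greedy_from_q(q_values):
    """Action-value method: the policy is read off the estimates (argmax)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_policy(theta, features):
    """Parameterized policy pi(a | s, theta) proportional to exp(theta . x(s, a)).

    `features[a]` is a hypothetical feature vector x(s, a) for action a.
    """
    prefs = [sum(t * x for t, x in zip(theta, f)) for f in features]
    m = max(prefs)                                  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_policy(theta=[1.0, 0.0], features=[[1.0, 0.0], [0.0, 1.0]])
action = random.choices(range(len(probs)), weights=probs)[0]   # sampled, no value function
print(greedy_from_q([0.1, 0.7, 0.3]), [round(p, 3) for p in probs], action)
```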
Selecting actions with policy-gradient (PG) methods
https://www.cs.ubc.ca/~gberseth/blog/demystifying-the-many-deep-reinforcement-learning-algorithms.html
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/pg.pdf
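For continuous actions, a policy-gradient method can output the parameters of a distribution instead of one value per action. Below is a minimal REINFORCE sketch with a Gaussian policy $a \sim \mathcal{N}(\mu, \sigma^2)$ on a hypothetical one-step task whose best action is 2.0; the task, the fixed $\sigma$, the learning rate, and the running-average baseline are all assumptions.

```python
import random


def reinforce_gaussian(target=2.0, sigma=0.5, lr=0.01, steps=3000, seed=0):
    """REINFORCE with a Gaussian policy a ~ N(mu, sigma^2) on a one-step continuous task.

    Only the mean mu is learned; sigma stays fixed to keep the sketch short.
    """
    rng = random.Random(seed)
    mu, baseline = 0.0, 0.0
    for _ in range(steps):
        a = rng.gauss(mu, sigma)                  # sample a continuous action
        reward = -(a - target) ** 2               # hypothetical reward, best action = target
        grad_log_pi = (a - mu) / sigma ** 2       # d/dmu of log N(a | mu, sigma^2)
        mu += lr * (reward - baseline) * grad_log_pi
        baseline += 0.05 * (reward - baseline)    # running-average baseline reduces variance
    return mu


print(round(reinforce_gaussian(), 2))             # a value near the target 2.0
```

The same idea scales to the many joint torques of a prosthetics task: the network outputs a mean (and often a standard deviation) per action dimension, so no maximization over an infinite action set is needed.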
襯 一危故
Emergence of Locomotion Behaviours in Rich Environments
https://www.youtube.com/watch?v=hx_bgoTF7bs&t=98s
覓 蠍一
Community 殊 蠍譴..
Problems reinforcement learning still has to solve
Latest trends
DeepMimic
Imitates a reference motion
螻 磯狩 伎
伎 螻殊 讌 
襦..
https://www.youtube.com/watch?v=XCLSkFKTWyg
This virtual stuntman could improve video game physics
Could we build environments like these ourselves?
Unity ml-agent
With Unity Machine Learning Agents, you can build your own virtual environments
Unity ml-agent
Imitation learning
https://www.youtube.com/watch?v=kpb8ZkMBFYs&feature=youtu.be
 危蟆 
旧朱
Unity ml-agent
Curriculum learning
Very easy → Easy → Medium → Hard → Very hard
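The curriculum idea can be sketched as a small scheduler that promotes the agent to the next difficulty once its recent success rate crosses a threshold. This is a hypothetical illustration, not the Unity ML-Agents API; the class name `Curriculum`, the threshold, and the window size are assumptions.

```python
class Curriculum:
    """Hypothetical scheduler: promote to the next level when the recent success rate is high."""

    def __init__(self, levels, threshold=0.8, window=20):
        self.levels = levels
        self.threshold = threshold        # required success rate (assumed value)
        self.window = window              # number of recent episodes to look at
        self.index = 0
        self.results = []

    @property
    def level(self):
        return self.levels[self.index]

    def report(self, success):
        """Record one episode outcome and return the level to train on next."""
        self.results = (self.results + [bool(success)])[-self.window:]
        enough_data = len(self.results) == self.window
        if (enough_data and sum(self.results) / self.window >= self.threshold
                and self.index < len(self.levels) - 1):
            self.index += 1
            self.results = []             # measure the new level from scratch
        return self.level


curriculum = Curriculum(["Very easy", "Easy", "Medium", "Hard", "Very hard"], window=5)
for outcome in [True, True, False, True, True, True, True, True]:
    print(curriculum.report(outcome))
```

Resetting the success history after each promotion keeps results from the easier level from leaking into the decision about the harder one.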
https://youtu.be/vRPJAefVYEQ
Exploration?
Sparse reward?
Exploration
1. Reinforcement Learning
2. Atari
3. SuperMario
4. Sonic
5. Prosthetics
6. Latest trend
Summary
The Rise of Reinforcement Learning
By Wonseok Jung
Thank you.
Github: https://github.com/wonseokjung
Facebook: https://www.facebook.com/ws.jung.798
Blog: https://wonseokjung.github.io/
