FeUdal Networks for
Hierarchical RL
Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, et al.
DeepMind
Youngseok Yoon, MLG POSTECH.
Contents
 Architecture
 Learning
 Experiments
 Ablative analysis
 Discussion
Architecture (1)
Architecture (2)
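 $f^{percept}$: CNN (16 filters of 8x8, stride 4; 32 filters of 4x4, stride 2) + FCL (256)
 $f^{Mspace}$: FCL
 To encourage exploration in the transition policy, with a small probability $\epsilon$ a randomly sampled goal is emitted.
 $f^{Wrnn}$: standard LSTM, 256
 $f^{Mrnn}$: dLSTM, 256
 dLSTM: dilated LSTM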
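The slide only names the dilated LSTM, so the following is a minimal sketch of the routing idea, assuming the usual description: the recurrent state is split into r groups and step t updates only group t % r. The names `DilatedRecurrent` and `counting_cell` are hypothetical, and the toy cell stands in for a real LSTM step. Any pooling of outputs across groups is left out.

```python
from typing import Callable, List, Tuple

# Hypothetical types: `Cell` stands in for one LSTM step (input, state) -> (output, new state).
State = float
Cell = Callable[[float, State], Tuple[float, State]]


class DilatedRecurrent:
    """Keeps r independent sub-states; step t reads and writes only sub-state t % r."""

    def __init__(self, cell: Cell, r: int, init_state: State = 0.0) -> None:
        if r < 1:
            raise ValueError("r must be >= 1")
        self.cell = cell
        self.r = r
        self.states: List[State] = [init_state] * r
        self.t = 0

    def step(self, x: float) -> float:
        idx = self.t % self.r  # which sub-state is active at this time step
        out, self.states[idx] = self.cell(x, self.states[idx])
        self.t += 1
        return out


def counting_cell(x: float, state: State) -> Tuple[float, State]:
    """Toy cell (an assumption for illustration): a running sum of its inputs."""
    new_state = state + x
    return new_state, new_state


dlstm = DilatedRecurrent(counting_cell, r=3)
outputs = [dlstm.step(1.0) for _ in range(7)]
```

With r = 3, each sub-state sees only every third input, so each one effectively runs at a coarser time scale than a standard recurrent cell.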
Learning (1)
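 Advantage actor-critic (A-C): $\nabla \pi_t = A_t \nabla_\theta \log \pi(a_t \mid x_t; \theta)$
$A_t = R_t - V(x_t; \theta)$
 The direction in state space (goal):
$p(s_{t+c} \mid s_t, g_t) \propto \exp\left(d_{\cos}(s_{t+c} - s_t, g_t)\right)$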
 Learning of the goal:
$\nabla g_t = A^M_t \nabla_\theta d_{\cos}(s_{t+c} - s_t, g_t(\theta))$
$A^M_t = R_t - V^M_t(x_t, \theta)$
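As a rough illustration of the goal-learning rule, the sketch below computes the cosine similarity between the state change s_{t+c} - s_t and the goal g_t, scaled by the Manager's advantage R_t - V. The function names, the `eps` stabiliser and the sign convention (a loss to be minimised) are assumptions for this example. Plain Python has no autodiff, so this only evaluates the scalar whose gradient with respect to the goal parameters would be followed, with the advantage treated as a constant.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """d_cos(a, b) = a.b / (|a| |b|); eps (an assumption) guards against zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def manager_goal_loss(
    s_t: Sequence[float],
    s_t_plus_c: Sequence[float],
    g_t: Sequence[float],
    ret: float,
    value: float,
) -> float:
    """Negative of advantage * d_cos(s_{t+c} - s_t, g_t), so lower is better."""
    advantage = ret - value  # Manager advantage: R_t - V(x_t)
    delta = [b - a for a, b in zip(s_t, s_t_plus_c)]
    return -advantage * cosine_similarity(delta, g_t)


# Goal aligned with the observed state change: the loss is strongly negative.
aligned = manager_goal_loss([0.0, 0.0], [1.0, 0.0], [1.0, 0.0], ret=2.0, value=0.5)
# Goal orthogonal to the observed state change: no learning signal.
orthogonal = manager_goal_loss([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], ret=2.0, value=0.5)
```

When the advantage is positive, goals pointing along the direction the state actually moved are reinforced; when it is negative, they are pushed away from that direction.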
Learning (2)
 Intrinsic reward for the Worker:
$r^I_t = \frac{1}{c} \sum_{i=1}^{c} d_{\cos}(s_t - s_{t-i}, g_{t-i})$
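The intrinsic reward averages, over the last c steps, the cosine similarity between how far the state has moved since step t - i and the goal issued at step t - i. A minimal sketch, assuming states and goals are stored as plain lists indexed by time (the function names and the `eps` stabiliser are hypothetical):

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """d_cos(a, b) = a.b / (|a| |b|); eps (an assumption) guards against zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def intrinsic_reward(
    states: List[Sequence[float]],
    goals: List[Sequence[float]],
    t: int,
    c: int,
) -> float:
    """r^I_t = (1/c) * sum_{i=1..c} d_cos(s_t - s_{t-i}, g_{t-i})."""
    if c < 1 or t - c < 0:
        raise ValueError("need c >= 1 and at least c earlier steps before t")
    total = 0.0
    for i in range(1, c + 1):
        delta = [now - past for now, past in zip(states[t], states[t - i])]
        total += cosine_similarity(delta, goals[t - i])
    return total / c


states = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
goals = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
reward = intrinsic_reward(states, goals, t=2, c=2)
```

In this toy trajectory the state moved along the goal issued at step 0 but orthogonally to the goal issued at step 1, so the two terms average to roughly 0.5.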
 Worker's policy gradient:
$\nabla \pi_t = A^D_t \nabla_\theta \log \pi(a_t \mid x_t; \theta)$
$A^D_t = R_t + \alpha R^I_t - V^D_t(x_t; \theta)$
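The Worker's advantage combines the extrinsic return, the intrinsic return weighted by alpha, and a value baseline. The sketch below assumes both returns are ordinary discounted sums with a single discount factor `gamma`; the slide does not state how the returns are discounted, so `gamma`, the function names and the example numbers are illustrative assumptions.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return R_t = r_t + gamma * R_{t+1} for every step, computed backwards."""
    returns: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def worker_advantage(ext_return: float, int_return: float, value: float, alpha: float) -> float:
    """A^D_t = R_t + alpha * R^I_t - V^D_t."""
    return ext_return + alpha * int_return - value


extrinsic = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
intrinsic = discounted_returns([0.5, 0.5, 0.5], gamma=0.5)
adv_0 = worker_advantage(extrinsic[0], intrinsic[0], value=0.1, alpha=0.5)
```

Setting alpha to 0 reduces this to the plain advantage actor-critic case, in which the Worker ignores the Manager's goals when computing its learning signal.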
 Manager's Transition Policy Gradients:
$\nabla_\theta \pi^{TP}_t = \mathbb{E}\left[(R_t - V(s_t)) \nabla_\theta \log p(s_{t+c} \mid s_t; \theta)\right]$
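One way to see how the transition policy gradient connects to the goal update is the short derivation below. It is a sketch under two assumptions: the transition model is proportional to the exponential of the cosine similarity between the state change and the goal, and its normaliser Z does not depend on $\theta$ (this holds, for example, for a von Mises-Fisher distribution with fixed concentration).

```latex
\begin{align*}
p(s_{t+c} \mid s_t; \theta)
  &= \frac{1}{Z} \exp\big(d_{\cos}(s_{t+c} - s_t,\; g_t(\theta))\big) \\
\log p(s_{t+c} \mid s_t; \theta)
  &= d_{\cos}(s_{t+c} - s_t,\; g_t(\theta)) - \log Z \\
\nabla_\theta \log p(s_{t+c} \mid s_t; \theta)
  &= \nabla_\theta\, d_{\cos}(s_{t+c} - s_t,\; g_t(\theta))
\end{align*}
```

Substituting the last line into the expectation gives an advantage-weighted gradient of the cosine similarity, which has the same form as the goal-learning rule.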
Experiments (1)
Experiments (2)
Experiments (3)
Experiments (4)
Experiments (5)
Experiments (7)
Ablative analysis (1)
Ablative analysis (2)
Ablative analysis (3)
Ablative analysis (4)
Ablative analysis (5)
Discussion
 Using the previous action as an input.
 In effect, the method predicts the state $s_{t+c}$ and uses it as the sub-goal.
 How should the step length c be chosen? (For example, c = 1.)
=> The size of c probably matters for some problems and not for others.
 The Manager predicts from the input state s across c steps; could c be a model parameter rather than a hyper-parameter?
 Could a Meta-Manager (Hyper-Manager) operating at 2c, c^2, etc. be placed above the Manager?
Thank you!
