Diversity is all you need (DIAYN): Learning Skills without a Reward Function
Yechan (Paul) Kim
DIAYN is an unsupervised reinforcement learning method that learns diverse skills without a reward function. It maximizes the mutual information between skills and visited states, so that each skill dictates which states the agent visits, and minimizes the mutual information between skills and actions given the state, so that skills are distinguished by states rather than by actions. It also maximizes the entropy of the mixture of policies formed by all skills, which encourages each skill to act as randomly as possible while remaining distinguishable. Experiments show that DIAYN discovers locomotion skills in complex environments and sometimes learns skills that solve benchmark tasks. The learned skills can then be adapted to maximize a reward, used for hierarchical RL, and used to imitate experts.
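A minimal sketch of how this objective is commonly turned into a per-step training signal, assuming a learned discriminator q(z|s) over skills and a uniform skill prior p(z): the policy is rewarded with log q(z|s) - log p(z). The function names and the toy logits are illustrative assumptions, not taken from the slides.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def diayn_pseudo_reward(discriminator_logits, skill, num_skills):
    """Return log q(z|s) - log p(z), assuming p(z) is uniform over num_skills."""
    log_q = log_softmax(discriminator_logits)[skill]
    log_p = -math.log(num_skills)
    return log_q - log_p

# Uninformative discriminator: the state says nothing about the active skill.
r_uniform = diayn_pseudo_reward([0.0, 0.0, 0.0, 0.0], skill=2, num_skills=4)

# Confident discriminator: the state clearly identifies skill 2.
r_confident = diayn_pseudo_reward([0.0, 0.0, 5.0, 0.0], skill=2, num_skills=4)
```

The reward is zero when the visited state carries no information about the skill and grows, up to log(num_skills), as the state makes the skill easier to identify.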
5. Training Data Input
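Training data consists of 3-tuples (w, x, y)
w : natural-language question
x : image
y : answer
The model is specified by a set of modules {m}, each with associated parameters
theta (W in the figure on the right), and a network layout predictor P
that maps strings to networks
Given (w, x), the model instantiates a network based on P(w), passes x
in as input, and obtains a distribution over answers
(e.g., for VQA the final module is always a classifier)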
6. Modules
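The goal is a small set of modules that can be assembled into every configuration the task requires. This amounts to
identifying a minimal set of composable vision primitives
Modules operate on three basic data types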
A. Images
B. Unnormalized attention
C. Labels
Notation: TYPE[INSTANCE](ARG, ...)
A. TYPE : high-level module type (Attention, Re-Attention, ...)
B. INSTANCE : particular instance of the model under consideration
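The TYPE[INSTANCE](ARG, ...) notation can be sketched as a small recursive data structure. The `Module` class and its field names are hypothetical, introduced only to illustrate the notation; lowercase type names follow the layout examples later in the slides.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Module:
    module_type: str                    # TYPE, e.g. "attend" or "classify"
    instance: str                       # INSTANCE, e.g. "cat" or "color"
    args: Tuple["Module", ...] = ()     # nested modules passed as arguments

    def __str__(self) -> str:
        head = f"{self.module_type}[{self.instance}]"
        if not self.args:
            return head
        return head + "(" + ", ".join(str(a) for a in self.args) + ")"

# A classify module whose single argument is an attend module.
layout = Module("classify", "color", (Module("attend", "cat"),))
layout_text = str(layout)
```

Keeping the type and the instance as separate fields makes it easy to compare two layouts by structure alone, ignoring which instance each module uses.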
13. Parsing
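Each question is parsed with the Stanford Parser to obtain a universal dependency representation
The parser also performs lemmatization, e.g. turning the plural kites into the singular kite
The dependencies are then filtered to those connected to the wh-word of the question
: this expresses the core part of the sentence's meaning in a simple symbolic form
ex)
What is standing in the field? -> what(stand)
What color is the truck? -> color(truck)
Is there a circle next to a square? -> is(circle, next-to(square))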
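The symbolic forms above can be sketched as nested tuples with a small renderer. This does not run the Stanford Parser: the filtered dependencies are written by hand, and the `LEMMAS` table is a hypothetical stand-in for the parser's lemmatizer.

```python
# Hypothetical stand-in for the parser's lemmatizer: a tiny lookup table.
LEMMAS = {"kites": "kite", "standing": "stand", "flying": "fly"}

def lemma(word):
    """Lowercase a word and map it to its lemma when the table knows it."""
    word = word.lower()
    return LEMMAS.get(word, word)

def render(form):
    """Render a nested (head, child, ...) tuple as head(child, ...)."""
    if isinstance(form, str):
        return lemma(form)
    head, *children = form
    return lemma(head) + "(" + ", ".join(render(c) for c in children) + ")"

# Hand-written filtered dependencies for the three example questions.
examples = {
    "What is standing in the field?": ("what", "standing"),
    "What color is the truck?": ("color", "truck"),
    "Is there a circle next to a square?": ("is", "circle", ("next-to", "square")),
}
rendered = {question: render(form) for question, form in examples.items()}
```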
14. Layout
Every leaf becomes an attend module, internal nodes become re-attend or combine modules, and the root
node becomes a measure module for YES/NO questions and a
classify module for all other questions
Parameter tying
Networks that share the same high-level structure but use different instances of individual modules can be
processed in the same batch
ex. what color is the cat? -> classify[color](attend[cat]),
where is the truck? -> classify[where](attend[truck])
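The mapping rule and the batching idea can be sketched together. The code below assumes symbolic forms are given as nested (head, child, ...) tuples and that internal nodes with one child become re-attend while those with more become combine; the function names are illustrative.

```python
from collections import defaultdict

def to_layout(form, is_root=True, yes_no=False):
    """Map a nested (head, child, ...) form to (module_type, instance, children)."""
    if isinstance(form, str):
        return ("attend", form, ())          # leaves become attend modules
    head, *children = form
    if is_root:
        kind = "measure" if yes_no else "classify"
    else:
        kind = "re-attend" if len(children) == 1 else "combine"
    return (kind, head, tuple(to_layout(c, is_root=False) for c in children))

def show(node, with_instance=True):
    """Render a layout; drop instances to get only the high-level structure."""
    kind, instance, children = node
    text = f"{kind}[{instance}]" if with_instance else kind
    if children:
        text += "(" + ", ".join(show(c, with_instance) for c in children) + ")"
    return text

questions = {
    "what color is the cat?": ("color", "cat"),
    "where is the truck?": ("where", "truck"),
}
layouts = {q: to_layout(form) for q, form in questions.items()}

# Group questions whose layouts share the same high-level structure.
batches = defaultdict(list)
for q, node in layouts.items():
    batches[show(node, with_instance=False)].append(q)
```

Both example questions fall under the structure `classify(attend)`, so they land in the same batch even though they use different module instances.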
15. Answering natural language questions
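LSTM question encoder
A. Using the parser alone simplifies the question aggressively, so grammatical cues that barely change the meaning of the
question but do affect the answer are lost
ex) What is flying? and What are flying? are both converted to what(fly),
but the answers should be kite and kites respectively
=> the question encoder lets the model capture syntactic regularities in the
data
B. It also captures semantic regularities.
ex) for the question "what color is the bear?", guessing a color that suits a bear is reasonable,
while guessing green is not
=> the question encoder takes this kind of prior into account, i.e., it models semantic regularities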
16. Answering natural language questions
Final model: the Neural Module Network output is combined with the LSTM question encoder
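The slide does not spell out how the two outputs are combined. One possible sketch, loosely following the description in the Neural Module Networks paper and to be treated as an assumption here: project the final LSTM state, add it elementwise to the NMN root output, apply a ReLU, then a linear layer and a softmax. All weights below are made-up toy values.

```python
import math

def linear(weights, bias, vec):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]

def relu(vec):
    return [max(0.0, x) for x in vec]

def softmax(vec):
    m = max(vec)
    exps = [math.exp(x - m) for x in vec]
    total = sum(exps)
    return [e / total for e in exps]

def answer_distribution(nmn_out, lstm_state, w_q, b_q, w_out, b_out):
    """Combine the NMN root output with the projected LSTM question encoding."""
    question_vec = linear(w_q, b_q, lstm_state)
    hidden = relu([a + q for a, q in zip(nmn_out, question_vec)])
    return softmax(linear(w_out, b_out, hidden))

# Toy inputs: 2-d NMN output, 3-d LSTM state, 3 candidate answers.
nmn_out = [1.0, -2.0]
lstm_state = [0.5, 0.5, 0.0]
w_q = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_q = [0.0, 0.0]
w_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
b_out = [0.0, 0.0, 0.0]
probs = answer_distribution(nmn_out, lstm_state, w_q, b_q, w_out, b_out)
```

Adding the two representations before the final classifier lets the question encoding shift the answer distribution even when the visual evidence is weak.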