5. Training Data Input
Training data consists of 3-tuples (w, x, y):
w : natural-language question
x : image
y : answer
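The model is specified by a collection of modules {m}, each with associated
parameters theta (W in the figure on the right), and a network layout
predictor P that maps strings to networks.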
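Given (w, x), the model instantiates a network based on P(w), passes x in
as input, and obtains a distribution over labels
(ex. for the VQA task, the output module is always a Classifier)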
6. Modules
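The goal is to find a set of modules that can be assembled into every required
configuration. This amounts to identifying a minimal set of composable vision primitives.
Modules operate on three basic data types: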
A. Images
B. Unnormalized attention
C. Labels
Notation: TYPE[INSTANCE](ARG, ...)
A. TYPE : high-level module type (Attention, Re-Attention, ...)
B. INSTANCE : particular instance of the model under consideration
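The TYPE[INSTANCE](ARG, ...) notation above can be read as a small tree grammar. The following is a minimal sketch of a reader for that notation, not code from the original work: the `Module` class and `parse` function are hypothetical names, and it assumes that types and instances consist only of letters, `_` and `-`.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Module:
    """One node of a layout: TYPE[INSTANCE](ARG, ...). Hypothetical helper type."""
    type: str                                   # high-level module type, e.g. "attend"
    instance: str                               # particular instance, e.g. "cat"
    args: list = field(default_factory=list)    # nested Module arguments


# Assumption: names use only letters, "_" and "-"; anything else is ignored.
_TOKEN = re.compile(r"[A-Za-z_-]+|[\[\](),]")


def parse(text):
    """Parse a layout string such as 'classify[color](attend[cat])'."""
    tokens = _TOKEN.findall(text)
    pos = 0

    def take(expected=None):
        # Consume one token, optionally checking that it is the expected one.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        if expected is not None and token != expected:
            raise ValueError(f"expected {expected!r}, got {token!r}")
        pos += 1
        return token

    def node():
        mtype = take()
        take("[")
        instance = take()
        take("]")
        args = []
        # The argument list is optional: leaf modules have none.
        if pos < len(tokens) and tokens[pos] == "(":
            take("(")
            args.append(node())
            while pos < len(tokens) and tokens[pos] == ",":
                take(",")
                args.append(node())
            take(")")
        return Module(mtype, instance, args)

    root = node()
    if pos != len(tokens):
        raise ValueError("trailing input after the root module")
    return root


layout = parse("classify[color](attend[cat])")
print(layout.type, layout.instance, [a.type for a in layout.args])
```

Keeping TYPE and INSTANCE as separate fields makes it easy to compare layouts by structure alone, ignoring which instance each module uses.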
13. Parsing
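The Stanford Parser is used to obtain a universal dependency representation of each question
The parser also performs lemmatization, e.g. turning the plural kites into the singular kite
The dependencies are then filtered down to those connected to the wh-word in the question
: this yields a symbolic form that expresses part of the sentence's meaning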
ex)
What is standing in the field? -> what(stand)
What color is the truck? -> color(truck)
Is there a circle next to a square? -> is(circle, next-to(square))
14. Layout
Every leaf becomes an attend module, internal nodes become re-attend or combine modules,
and the root node becomes a measure module for YES/NO questions and a classify module
for all other questions
Parameter linking
Networks that share the same high-level structure but use different instances of the
individual modules can be processed in the same batch
ex. what color is the cat? -> classify[color](attend[cat]),
where is the truck? -> classify[where](attend[truck])
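The mapping rule and the batching remark above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original implementation: `Node`, `to_layout` and `structure` are hypothetical names, the caller says whether the question is yes/no, internal nodes are assumed to become re-attend with one child and combine with several, and the root simply receives all of its mapped children as arguments.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of the symbolic form, e.g. color(cat). Hypothetical helper type."""
    label: str
    children: list = field(default_factory=list)


def to_layout(node, yes_no, is_root=True):
    """Map a symbolic form to a layout string in TYPE[INSTANCE](ARG, ...) notation."""
    args = ", ".join(to_layout(c, yes_no, is_root=False) for c in node.children)
    if is_root:
        # Root: measure for YES/NO questions, classify for everything else.
        mtype = "measure" if yes_no else "classify"
    elif not node.children:
        # Leaves become attend modules and take no arguments.
        return f"attend[{node.label}]"
    elif len(node.children) == 1:
        # Assumption: one child -> re-attend, several children -> combine.
        mtype = "re-attend"
    else:
        mtype = "combine"
    return f"{mtype}[{node.label}]({args})"


def structure(layout):
    """Strip the [INSTANCE] parts so layouts can be grouped by high-level structure."""
    return re.sub(r"\[[^\]]*\]", "", layout)


cat = to_layout(Node("color", [Node("cat")]), yes_no=False)
truck = to_layout(Node("where", [Node("truck")]), yes_no=False)
print(cat)
print(truck)
# Same structure, different instances: candidates for the same batch.
print(structure(cat) == structure(truck))
```

Grouping questions by the value of `structure` is one simple way to collect networks that can share a batch.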
15. Answering natural language questions
LSTM question Encoder
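A. With the parser alone, the question is simplified aggressively. The meaning is largely
preserved, but grammatical cues that can affect the answer are lost
ex) What is flying? and What are flying? are both converted to what(fly),
but the answers should be kite and kites respectively
=> The question encoder lets the model capture syntactic regularities in the data
B. The same holds for semantic regularities.
ex) For the question what color is the bear?, answering brown is a reasonable guess,
while guessing green is not
=> The question encoder captures this kind of effect, i.e. it models semantic regularities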
16. Answering natural language questions
Final model
The output of the Neural Module Network is combined with the LSTM question encoder
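The slide does not say how the two outputs are combined. As one possible illustration, the sketch below assumes that both components produce a score vector over the same answer vocabulary, adds them elementwise, applies a ReLU, and normalizes with a softmax. The function names are hypothetical and any learned layers are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_outputs(nmn_scores, lstm_scores):
    """Fuse two score vectors into one answer distribution.

    Assumption: elementwise sum -> ReLU -> softmax. This is a sketch of one
    plausible combination, not a statement of the method used in the slides.
    """
    if len(nmn_scores) != len(lstm_scores):
        raise ValueError("both score vectors must cover the same answers")
    fused = [max(0.0, a + b) for a, b in zip(nmn_scores, lstm_scores)]
    return softmax(fused)


answers = ["brown", "green", "kite"]      # hypothetical answer vocabulary
nmn_scores = [2.0, 0.5, -1.0]             # hypothetical scores from the module network
lstm_scores = [0.0, 1.0, 0.2]             # hypothetical scores from the question encoder
probs = combine_outputs(nmn_scores, lstm_scores)
print(answers[probs.index(max(probs))])
```

With this kind of fusion, the question encoder can shift probability toward answers that are plausible from the wording alone, while the module network contributes the evidence from the image.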