Ask the right question
Active Question Reformulation with Reinforcement Learning

2018.06.11 ???
Table of Contents
1. Reinforcement Learning

2. Active Question Answering

3. BiDirectional Attention Flow

4. Experiment

5. Analysis of the Agent's Language
Reinforcement Learning
Reinforcement Learning
• Reinforcement Learning = Reinforcement + Machine Learning

• What is Reinforcement?

? ??? ???? ?? ????? ??? ? ?? ?? ??? ??? ??? ? 

? Ex) Skinner? ???? ??
Reinforcement Learning
• What is Reinforcement Learning?

? ??? X: ?? ???? ?? ??? ???

? ?? Y: ??? ??? ????

? ???? ?? ??? ????? ??  ??? ?? ?? ?? ??? ??
Reinforcement Learning
• What is Reinforcement Learning?

• The agent and the environment interact and generate data (a history of states, actions, and rewards)

• The agent learns a policy that maximizes the reward it receives

? Agent: ??? ???? ??? ???? ?? ???? ??

? Environment: agent? ??? ???
Markov Decision Process
• MDP (Markov Decision Process)

• A framework for formalizing sequential decision-making problems

• 5-tuple: (state, action, reward, transition probability, discount factor)
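The 5-tuple can be written down as a small data structure. The sketch below is only an illustration: the class name `MDP`, the field names, and the two-state toy problem are assumptions, not something taken from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

State = str
Action = str


@dataclass(frozen=True)
class MDP:
    """Container for the 5-tuple (state, action, reward, transition probability, discount factor)."""
    states: List[State]
    actions: List[Action]
    # reward[(s, a)] is the immediate reward for taking action a in state s
    reward: Dict[Tuple[State, Action], float]
    # transition[(s, a)] maps each next state to its probability
    transition: Dict[Tuple[State, Action], Dict[State, float]]
    discount: float


def is_valid(mdp: MDP) -> bool:
    """Check that the discount lies in [0, 1] and every transition distribution sums to 1."""
    if not 0.0 <= mdp.discount <= 1.0:
        return False
    return all(abs(sum(dist.values()) - 1.0) < 1e-9
               for dist in mdp.transition.values())


# Hypothetical two-state problem, used only to exercise the structure.
toy = MDP(
    states=["start", "goal"],
    actions=["stay", "move"],
    reward={("start", "stay"): 0.0, ("start", "move"): 1.0,
            ("goal", "stay"): 0.0, ("goal", "move"): 0.0},
    transition={("start", "stay"): {"start": 1.0},
                ("start", "move"): {"goal": 0.9, "start": 0.1},
                ("goal", "stay"): {"goal": 1.0},
                ("goal", "move"): {"goal": 1.0}},
    discount=0.9,
)
```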
Markov Decision Process
? ????? MDP ??
Markov Decision Process
? ????? MDP ??

? ??? ??? ??? policy? ?? ?

• Policy = the probability of taking each action in a given state
Markov Decision Process
• Return: the sum of the rewards collected after taking an action in a state
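As a small illustration of the return, the sketch below sums a reward sequence while applying the discount factor from the MDP 5-tuple. The function name `discounted_return` and the reward values are assumptions made for the example.

```python
from typing import List


def discounted_return(rewards: List[float], discount: float) -> float:
    """Return r_0 + discount * r_1 + discount^2 * r_2 + ... for one reward sequence."""
    g = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + discount * G_{t+1}
    for r in reversed(rewards):
        g = r + discount * g
    return g


# Three steps with reward 1 each, discount 0.5: 1 + 0.5 + 0.25
example_return = discounted_return([1.0, 1.0, 1.0], 0.5)
```

With a discount of 1.0 the same function reduces to the plain sum of rewards.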
Policy Gradient
• Uses a parameterized policy (linear function approximation or a neural network)

• Policy input: state features or raw pixels / output: probability of each action
http://karpathy.github.io/2016/05/31/rl/
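A minimal sketch of a parameterized policy, assuming the linear function approximation case: one weight vector per action, followed by a softmax that turns the scores into action probabilities. The names `softmax` and `linear_policy` and the example numbers are illustrative, not from the slides.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Turn arbitrary scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_policy(weights: List[List[float]], features: List[float]) -> List[float]:
    """weights[a] is the weight vector of action a; returns P(action | state features)."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(logits)


# Two actions, two state features; the second action gets the higher score.
action_probs = linear_policy([[0.0, 0.0], [1.0, 0.5]], [1.0, 2.0])
```

A neural network policy has the same interface: state features in, a probability for each action out.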
Policy Gradient
• Supervised Learning: maximize the log-likelihood

• The policy is trained from labeled examples (imitation learning; requires correct action labels)
Policy Gradient
• Policy Gradient

• Maximize the log-likelihood of the actions taken, weighted by the reward (return)
Policy Gradient
• The policy distribution shifts toward actions that received positive reward
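The idea of weighting the log-likelihood gradient by the reward can be tried on a toy problem. The sketch below runs a REINFORCE-style update on a hypothetical two-armed bandit with a softmax policy; the bandit, the step count, and the learning rate are assumptions chosen only for illustration.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(rewards: List[float], steps: int = 200,
                 lr: float = 0.1, seed: int = 0) -> List[float]:
    """Run REINFORCE on a bandit whose arm k always pays rewards[k]."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # one logit per arm
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = rewards[action]
        for k in range(len(theta)):
            # d/d theta_k of log softmax(theta)[action] = 1[k == action] - probs[k]
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * ret * grad_log  # gradient ascent, weighted by the return
    return softmax(theta)


# Arm 1 is the only rewarded arm, so its probability should grow during training.
final_probs = train_bandit([0.0, 1.0])
```

Because only rewarded samples move the parameters here, probability mass drifts toward the rewarded arm.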
Application of DeepRL
• Game play

  - AlphaGo, Atari, ViZDoom

• Robotics

  - Robot arm manipulation, locomotion

• Natural language processing

  - Question answering, chatting

• Autonomous driving

  - Mobileye
https://www.youtube.com/watch?v=vppFvq2quQ0
Active Question Answering
Ask The Right Question
• Accepted as an oral presentation at ICLR 2018
Jeopardy!
? Jeopardy! : ??? ??? ???

? ??? ??? ? ??? ???? ??? ?? ??? ?? 

? ???? Question Answering ??
https://namu.wiki/w/Jeopardy! https://abcnews.go.com/Entertainment/jeopardy-things-americas-favorite-quiz-show/story?id=18824501
Jeopardy! Dataset
? ??? ?? ????? ???? ??: Q&A pairs
SearchQA Dataset
• SearchQA

• Matthew Dunn, Levent Sagun, Mike Higgins, Ugur Guney, Volkan Cirik, and Kyunghyun Cho. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. https://arxiv.org/abs/1704.05179, 2017.

• GitHub repo: https://github.com/nyu-dl/SearchQA

? Jeopardy! ??? ? dataset? ?????? web ???

• 140k question-answer pairs, with an average of 49.6 snippets per pair

• Each question was used to query Google

? ? ? ???? information retrieval system? ?? ????
SearchQA Dataset
Problem Definition
• Active Question Answering

• Frame QA as a reinforcement learning problem

• English-to-English machine translation: paraphrasing

• Formulated as an MDP

• Agent: reformulator

• Environment: Q&A system

? State: ????? ?? ?? ??

• Action: q (the question reformulated by the agent)

• Reward: question answering quality
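One interaction between the agent and the environment in this formulation could look roughly like the sketch below. All three callables are hypothetical stand-ins: the real reformulator is a seq2seq model and the real environment is a QA system, neither of which is reproduced here.

```python
from typing import Callable, Tuple


def aqa_step(original_question: str,
             reformulate: Callable[[str], str],
             qa_system: Callable[[str], str],
             score: Callable[[str], float]) -> Tuple[str, str, float]:
    """One agent-environment interaction: rewrite the question, get an answer, score it."""
    rewrite = reformulate(original_question)  # action chosen by the agent
    answer = qa_system(rewrite)               # response of the environment
    reward = score(answer)                    # answer quality, used as the reward
    return rewrite, answer, reward


# Hypothetical stand-ins, for illustration only.
def toy_reformulate(question: str) -> str:
    return "what is name " + question


def toy_qa(question: str) -> str:
    return "tolstoy" if "war" in question else "unknown"


def toy_score(answer: str) -> float:
    return 1.0 if answer == "tolstoy" else 0.0


rewrite, answer, reward = aqa_step(
    "gandhi deeply influenced count wrote war peace",
    toy_reformulate, toy_qa, toy_score)
```

Keeping the QA system behind a plain callable mirrors the black-box role the environment plays for the agent.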
AQA Model
1. QA Environment

• Uses the BiDirectional Attention Flow (BiDAF) model

• Returns an answer to the question (used as the reward during training)

• Reward: token-level F1 score (measures the quality of the answer)

2. Reformulation Model

• Sequence-to-sequence model

• Pre-trained on multilingual translation

3. Answer Selection Model

? Test time? ???? ?? (?? ????? ?)

• During training, the QA environment's output is used as the reward; at test time, the best answer has to be chosen among the candidate answers

• Embeddings of [query, rewrite, answer] are pre-trained, and the three embeddings are concatenated

• Binary classification with a 1-D convolution
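The token-level F1 score used as the reward can be sketched as follows. This is a simplified version that assumes lowercasing and whitespace tokenization; evaluation scripts for QA datasets usually also strip punctuation and articles, which is omitted here.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-level F1 between a predicted answer and the gold answer."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match, otherwise a miss
        return float(pred_tokens == gold_tokens)
    # Multiset intersection: each shared token counts at most as often as it occurs in both
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# One of two predicted tokens is correct: precision 0.5, recall 1.0
partial_credit = token_f1("leo tolstoy", "tolstoy")
```

Unlike exact match, this gives partial credit to answers that overlap with the gold answer, which makes it a smoother reward signal.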
Reformulation model
• Massive Exploration of Neural Machine Translation Architectures - Denny Britz, 2017
BiDirectional Attention Flow
• Bi-Directional Attention Flow for Machine Comprehension - Minjoon Seo, 2017

• GitHub: https://github.com/allenai/bi-att-flow

• A model that answers a query given a context

• The bidirectional attention flow mechanism produces a query-aware representation of the context

• State of the art on SQuAD (Stanford Question Answering Dataset) as of 2017
https://rajpurkar.github.io/SQuAD-explorer/
BiDirectional Attention Flow
? SQuAD ??
BiDirectional Attention Flow
• Character embedding + word embedding → contextual embedding

• Attention Flow: not fixed length + memoryless
Training: Reformulation
• Policy Gradient Training

? ?? ??? ?? ?? ?? ??? question? ?? ?? ?? answer? ????? ???.

? Parameterized policy ??

? policy? seq2seq model??? ??? ?? ?? ??

? Policy? ?? question? ??? ? question? ?? environment? action? ??  reward ??

? ?? reward? ??? ?? policy? ??
Training: Reformulation
• Policy Gradient Training

• Uses REINFORCE: the gradient of the log-likelihood is weighted by the reward

• A baseline is used to reduce the high variance of the REINFORCE gradient estimate

• Entropy regularization is used to avoid converging to a sub-optimal policy (insufficient exploration)
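The three ingredients above (reward-weighted log-likelihood, baseline, entropy regularization) can be combined into one scalar surrogate loss for a single sampled reformulation. The sketch below uses a generic textbook form rather than the exact objective from the paper; the function names and the toy distributions are assumptions. In an actual training loop the term `reward - baseline` would be treated as a constant when differentiating.

```python
import math
from typing import List


def entropy(dist: List[float]) -> float:
    """Shannon entropy of one per-token distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def reinforce_loss(token_dists: List[List[float]], sampled: List[int],
                   reward: float, baseline: float,
                   entropy_weight: float) -> float:
    """Surrogate loss: -(reward - baseline) * log p(sample) - entropy_weight * entropy."""
    # Log-probability of the sampled token sequence under the policy
    log_prob = sum(math.log(dist[tok]) for dist, tok in zip(token_dists, sampled))
    # Entropy bonus summed over decoding steps, to encourage exploration
    total_entropy = sum(entropy(dist) for dist in token_dists)
    return -(reward - baseline) * log_prob - entropy_weight * total_entropy


# Toy two-step sequence over a two-word vocabulary.
dists = [[0.5, 0.5], [0.5, 0.5]]
loss = reinforce_loss(dists, sampled=[0, 1], reward=1.0,
                      baseline=0.5, entropy_weight=0.0)
```

When the reward equals the baseline, the first term vanishes, so only samples that do better or worse than expected move the policy.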
Training: Reformulation
• Policy Gradient Training

• The final objective function is as follows. The baseline is the mean reward given q_0

• Pre-training

• Paraphrasing pre-training: translate English-to-English

? ??? ???? ???? ?? multilingual translation ?? (English-Spanish, French-English, etc.)

? Multilingual United Nations Parallel Corpus v1.0: 11.4M sentences

? Monolingual data? ???? ?? ?? (small corpus) 

• Paralex database of question paraphrases: 1.5M pairs (4 paraphrases per question)
Experiment
Training: Reformulation
• Pretraining setting

  - Optimizer: Adam

  - Learning rate: 0.001, train: 400M instances

• RL setting

  - Optimizer: SGD

  - Train: 100k RL steps

  - Batch size: 64

  - Learning rate: 0.001

  - Regularization weight: 0.001

• The QA system runs on GPU; the reformulation model is trained on CPU
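For reference, the settings listed above can be collected in two small configuration objects. The class and field names are assumptions; only the values come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    """Pretraining setting from the slide (field names are illustrative)."""
    optimizer: str = "adam"
    learning_rate: float = 0.001
    train_instances: int = 400_000_000


@dataclass(frozen=True)
class RLConfig:
    """RL setting from the slide (field names are illustrative)."""
    optimizer: str = "sgd"
    rl_steps: int = 100_000
    batch_size: int = 64
    learning_rate: float = 0.001
    regularization_weight: float = 0.001  # weight of the entropy regularizer


pretrain_cfg = PretrainConfig()
rl_cfg = RLConfig()
```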
Training: Answer Selector
• Answer selector's task: binary classification

• The reformulator generates 20 questions, producing [query, rewrite, answer] tuples

? ? ??? ?? ?? answer? ???? ?

• Classifies whether the answer obtained for a rewrite is above or below average

• Above average: positive, below average: negative

• 100-dimensional embeddings are pre-trained for each token

• Query → embedding → 100-d vector → 1-D CNN (filter size = 3)
• Rewrite → embedding → 100-d vector → 1-D CNN (filter size = 3)
• Answer → embedding → 100-d vector → 1-D CNN (filter size = 3)
• The three CNN outputs are passed to a feed-forward network
? (??? ?? ?? answer? ??? ??? ??..?)
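The training labels for the selector can be sketched as follows, assuming that a rewrite is a positive example when the F1 score of its answer is above the average over all rewrites of the same question, and a negative example otherwise. The function name and the scores are made up for illustration.

```python
from typing import List


def label_rewrites(f1_scores: List[float]) -> List[int]:
    """Label each rewrite of one question: 1 if its answer F1 is above the mean, else 0."""
    mean = sum(f1_scores) / len(f1_scores)
    return [1 if score > mean else 0 for score in f1_scores]


# Hypothetical F1 scores for four rewrites of one question (the slide uses 20 per question).
labels = label_rewrites([0.0, 0.5, 1.0, 0.1])
```

Labeling relative to the per-question mean means the selector learns to rank rewrites of the same question instead of predicting an absolute score.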
Result
• EM, F1: token-level metrics computed on the answer returned by the model

• TopHyp: uses the top hypothesis among the seq2seq model's outputs as the reformulation

• CNN: uses the CNN-based selector to choose the best answer
Analysis of the Agent's Language
Statistics of Questions
• Length: number of words in the question. TF (term frequency): number of repeated words within the question

• DF (document frequency): median number of times the question's tokens appear in the context

? QC(Query Clarity): question? reformulation ??? relative entropy
          | Question
Clue      | gandhi deeply influenced count wrote war peace.
Base-NMT  | Who influenced count wrote war?
AQA-QR    | What is name gandhi gandhi influence wrote peace peace?
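Two of these statistics can be computed directly on the example questions. The sketch below assumes that Length is the number of word tokens and that TF is the mean number of occurrences per distinct token; the exact definitions used for the reported numbers may differ.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def question_length(text: str) -> int:
    """Number of word tokens in the question."""
    return len(tokenize(text))


def mean_term_frequency(text: str) -> float:
    """Average number of occurrences per distinct token (assumed TF definition)."""
    counts = Counter(tokenize(text))
    return sum(counts.values()) / len(counts)


base_nmt = "Who influenced count wrote war?"
aqa_qr = "What is name gandhi gandhi influence wrote peace peace?"

aqa_length = question_length(aqa_qr)   # 9 tokens
aqa_tf = mean_term_frequency(aqa_qr)   # 9 tokens over 7 distinct tokens
```

The repeated "gandhi" and "peace" tokens push the AQA-QR term frequency above 1, while the Base-NMT question has no repeated tokens.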
Statistics of Questions
? Base-NMT

? ?? syntactically well-formed question

? Lower DF: NMT training corpus? SearchQA ??? ?? ??? ????

? AQA-QR: TopHyp

• 99.8% start with "what is name", presumably because most answers are names

• Less fluent

• Repeated tokens occur about twice as often as in the SearchQA questions
Paraphrasing Quality
• Paraphrasing quality is measured on an image captioning dataset

• Uses the MSCOCO dataset

• Each image has 5 captions; one is picked as the source and the remaining 4 serve as references

• Base-NMT: 11.4 BLEU / AQA-QR: 8.6 BLEU
Reformulation Examples
Future work
• One-shot decision → sequential decision

• Information seeking task

• End-to-end RL problem

• Closed loop between reformulator and selector
Thank you
