Ask the right question
Active Question Reformulation with Reinforcement Learning

2018.06.11
Table of Contents
1. Reinforcement Learning

2. Active Question Answering

3. BiDirectional Attention Flow

4. Experiment

5. Analysis of the Agent's Language
Reinforcement Learning
Reinforcement Learning
• Reinforcement Learning = Reinforcement + Machine Learning

• What is Reinforcement?

• Giving a reward when a desired behavior occurs, so that the behavior is performed more often

• Ex) Skinner's box experiments
Reinforcement Learning
• What is Reinforcement Learning?

• Data X: which action was taken in which state

• Label Y: how much reward was received

• From the data, learn the correlation between the two and choose the actions that earn the most reward
Reinforcement Learning
• What is Reinforcement Learning?

• The agent interacts with the environment and accumulates data (a history of states, actions, and rewards)

• The goal of learning is to find the optimal policy, i.e., the one that maximizes the (cumulative) reward

• Agent: the entity that observes the state and acts toward the goal

• Environment: everything outside the agent
Markov Decision Process
• MDP (Markov Decision Process)

• A framework for sequential decision-making problems

• A 5-tuple: (state, action, reward, transition probability, discount factor), formalized below
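For reference, the 5-tuple can be written out formally. This is the standard textbook formalization (the notation is an assumption; it does not appear on the slides):

```latex
% A Markov Decision Process is the 5-tuple
(\mathcal{S},\; \mathcal{A},\; R,\; P,\; \gamma)
% \mathcal{S}: set of states,  \mathcal{A}: set of actions
% R(s, a): expected reward for taking action a in state s
% P(s' \mid s, a): probability of transitioning to state s'
% \gamma \in [0, 1): discount factor applied to future rewards
```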
Markov Decision Process
• An example MDP, illustrated
Markov Decision Process
• An example MDP, illustrated

• The goal of learning is to find the optimal policy

• Policy = the probability of taking a particular action in a particular state (written out below)
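In standard notation (a sketch; the slide's own figure is not available), a stochastic policy is the conditional distribution

```latex
\pi(a \mid s) \;=\; \Pr\left[ A_t = a \mid S_t = s \right]
```

A deterministic policy is the special case that puts all probability mass on a single action per state.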
Markov Decision Process
• Return: the cumulative reward received from a particular state and action onward (see the formula below)
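Concretely, using the discount factor γ from the MDP tuple, the return from time step t is the standard discounted sum (a textbook formulation, not copied from the slide):

```latex
G_t \;=\; R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \;=\; \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}
```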
Policy Gradient
• Assume a parameterized policy (linear function approximation or a neural network)

• The policy's input is a state feature vector (or raw pixels); its output is a probability distribution over actions (sketched below)
http://karpathy.github.io/2016/05/31/rl/
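A minimal PyTorch sketch of such a parameterized policy (the layer sizes, the 4-d state, and the two-action setup are illustrative assumptions, not the slide's exact model):

```python
import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Maps a state feature vector to a probability distribution over actions."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Softmax turns raw scores into a probability for each action.
        return torch.softmax(self.net(state), dim=-1)

# Sample an action from the current policy for a (random) 4-d state:
policy = PolicyNetwork(state_dim=4, n_actions=2)
probs = policy(torch.randn(1, 4))
action = torch.distributions.Categorical(probs=probs).sample()
```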
Policy Gradient
• Supervised Learning: maximize the log likelihood

• The policy is trained to follow the given labels (Imitation Learning: a correct action label exists for each state), as written below
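In other words, given a dataset D of expert state-action pairs, imitation learning solves (standard formulation):

```latex
\max_{\theta} \;\; \sum_{(s,\, a^{*}) \in \mathcal{D}} \log \pi_{\theta}(a^{*} \mid s)
```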
Policy Gradient
• Policy Gradient

• Maximize the log likelihood of the actions taken, weighted by the reward (return); see the gradient below
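That is, REINFORCE replaces the supervised label with the agent's own sampled action and weights its log-likelihood gradient by the return it obtained (standard form):

```latex
\nabla_{\theta} J(\theta) \;=\; \mathbb{E}_{\pi_{\theta}}\!\left[ \, G_t \, \nabla_{\theta} \log \pi_{\theta}(a_t \mid s_t) \, \right]
```

With a ground-truth label this reduces to the supervised gradient above; here the return decides how strongly each sampled action is reinforced.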
Policy Gradient
• Intuition: the policy distribution shifts toward the actions that received reward
Application of DeepRL
• Game play

• AlphaGo, Atari, Vizdoom

• Robotics

• robot arm manipulation, locomotion

• Natural language processing

• Question Answering, Chatting

• Autonomous driving

• Mobileye
https://www.youtube.com/watch?v=vppFvq2quQ0
Active Question Answering
Ask The Right Question
• Accepted as an oral presentation at ICLR 2018
Jeopardy!
• Jeopardy!: a famous American TV quiz show

• A clue is given, and contestants must come up with the answer the clue describes

• A typical Question Answering problem
https://namu.wiki/w/Jeopardy! https://abcnews.go.com/Entertainment/jeopardy-things-americas-favorite-quiz-show/story?id=18824501
Jeopardy! Dataset
• Provides a dataset of Q&A pairs like the following:
SearchQA Dataset
• SearchQA

• Matthew Dunn, Levent Sagun, Mike Higgins, Ugur Guney, Volkan Cirik, and Kyunghyun Cho. SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine. https://arxiv.org/abs/1704.05179, 2017.

• Github repo: https://github.com/nyu-dl/SearchQA

• Built from Jeopardy! questions and answers, augmented with snippets crawled from the web

• 140k question-answer pairs; each pair has 49.6 snippets on average

• Each question was issued as a Google query to collect the snippets

• A dataset that better matches a real information-retrieval setting
SearchQA Dataset
Problem Definition
• Active Question Answering

• frame QA as a Reinforcement Learning problem

• English-to-English machine translation: paraphrasing

• Cast it as an MDP:

• Agent: the reformulator

• Environment: the Q&A system

• State: the original question from the dataset

• Action: q, the question the agent produces by reformulation

• Reward: question answering quality
AQA Model
1. QA Environment

• Uses the BiDirectional Attention Flow (BiDAF) model

• Takes a question and produces an answer (and returns a reward during training)

• Reward: token-level F1 score, measuring answer quality (see the sketch after this list)

2. Reformulation Model

• Sequence-to-sequence model

• Pre-trained via multilingual translation

3. Answer Selection Model

• Used only at test time (must be trained separately)

• During training, the QA environment's output is used to compute the reward; at test time, the selector picks the best among the candidate answers

• Pre-trained embeddings for [query, rewrite, answer]; the three embeddings are concatenated

• Binary classification via 1-d convolution
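A minimal sketch of such a token-level F1 reward (SQuAD-style; the exact tokenization and normalization used in the paper are assumptions here):

```python
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between the predicted and the gold answer;
    in AQA this score is the reward returned by the QA environment."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)  # multiset overlap
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```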
Reformulation model
• Massive Exploration of Neural Machine Translation Architectures - Denny Britz, 2017
BiDirectional Attention Flow
• BI-DIRECTIONAL ATTENTION FLOW FOR MACHINE COMPREHENSION - MinJoon Seo, 2017

• GitHub: https://github.com/allenai/bi-att-flow

• A model that answers a query using a given context

• The bidirectional attention flow mechanism links the query and the context in both directions

• State-of-the-art on the SQuAD (Stanford Question Answering Dataset) benchmark as of the paper's 2017 publication
https://rajpurkar.github.io/SQuAD-explorer/
BiDirectional Attention Flow
• SQuAD
BiDirectional Attention Flow
• Character embeddings and word embeddings are combined, then contextual embeddings are applied

• Attention Flow: the attention is not summarized into a fixed-length vector, and it is memoryless
Training: Reformulation
• Policy Gradient Training

• Ultimately, what we want is to produce the best possible answer for a given question.

• A parameterized policy is used

• Since the policy is a seq2seq model, it takes the autoregressive form π_θ(q | q₀) = ∏_t p_θ(w_t | w₁…w_{t−1}, q₀)

• The policy generates a question; that question is the action taken in the environment, which returns a reward for it

• The policy is trained to maximize this reward (see the toy sketch below)
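The loop above, reduced to a runnable toy (in AQA the policy is the seq2seq reformulator and the environment is the BiDAF QA system; both are stand-in stubs here so that only the REINFORCE mechanics are visible):

```python
import random
import torch
import torch.nn as nn

class StubPolicy(nn.Module):
    """Stand-in policy: scores three canned rewrites of one question."""
    def __init__(self, n_rewrites: int = 3):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_rewrites))

    def forward(self) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.logits)

def qa_environment_reward(rewrite_id: int) -> float:
    """Stand-in for BiDAF + token-level F1: rewrite 2 happens to answer best."""
    return [0.1, 0.4, 0.9][rewrite_id] + random.uniform(-0.05, 0.05)

policy = StubPolicy()
optimizer = torch.optim.SGD(policy.parameters(), lr=0.1)

for step in range(200):
    dist = policy()
    action = dist.sample()                         # "ask" a rewritten question
    reward = qa_environment_reward(action.item())  # environment answers; reward = F1
    loss = -reward * dist.log_prob(action)         # reward-weighted log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(policy().probs)  # probability mass concentrates on the best rewrite
```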
Training: Reformulation
• Policy Gradient Training

• Uses the REINFORCE algorithm for training: updates follow the log-likelihood gradient, weighted by the reward

• Because the REINFORCE gradient estimate has high variance, a baseline is used

• Entropy regularization is added to keep the policy from collapsing to a sub-optimal deterministic one (preserving exploration); see the reconstruction below
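Putting the three pieces together, a plausible reconstruction of the per-question gradient estimate (the slide's equation image is missing, so treat the exact form as an assumption consistent with the description above):

```latex
\nabla_{\theta} J(q_0) \;\approx\; \big( R(q) - B(q_0) \big)\, \nabla_{\theta} \log \pi_{\theta}(q \mid q_0)
\;+\; \lambda \, \nabla_{\theta} H\!\left[ \pi_{\theta}(\cdot \mid q_0) \right],
\qquad q \sim \pi_{\theta}(\cdot \mid q_0)
```

Here R(q) is the answer's token-level F1, B(q₀) the baseline, H the entropy of the rewrite distribution, and λ the regularization weight.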
Training: Reformulation
• Policy Gradient Training

• The final objective function combines the reward, baseline, and entropy terms (one reconstruction is sketched above); the baseline is computed as the average reward over the rewrites of q_0

• Pre-training

• Pre-training for paraphrasing: translate English-to-English

• To compensate for scarce paraphrase data, a multilingual translation model is trained (English-Spanish, French-English, etc.)

• Multilingual United Nations Parallel Corpus v1.0: 11.4M sentences

• Additional training using monolingual data (a small corpus)

• Paralex database of question paraphrases: 1.5M pairs (about 4 paraphrases per question)
Experiment
Training: Reformulation
• Pretraining setting

• Optimizer: Adam

• Learning rate: 0.001; train: 400M instances

• RL setting

• Optimizer: SGD

• Train: 100k RL steps

• Batch size: 64

• Learning rate: 0.001

• Regularization weight: 0.001

• The QA system runs on GPU; the reformulation model is trained on CPU
Training: Answer Selector
• Answer Selector: binary classification

• The reformulator generates 20 questions, yielding a [query, rewrite, answer] triple for each

• Among these, the single best answer must be chosen

• To do this, the model classifies whether each rewrite's answer is above or below average quality

• Above average: positive; below average: negative

• Each token gets a pre-trained 100-dimension embedding

• Three parallel branches, one each for query, rewrite, and answer: token embeddings → 100-d vectors → 1-d CNN (filter size = 3); the three outputs are concatenated and fed into a feed-forward network (see the sketch below)

• (But then how exactly is the single best answer chosen..?)
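A sketch of the selector as described (the slides specify the 100-d embeddings, the filter size of 3, and the concatenation into a feed-forward head; the max-pooling, filter count, and hidden size are assumptions):

```python
import torch
import torch.nn as nn

class AnswerSelector(nn.Module):
    """Binary classifier over [query, rewrite, answer] triples."""
    def __init__(self, vocab_size: int, emb_dim: int = 100, n_filters: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # pre-trained in the paper
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, kernel_size=3) for _ in range(3)]
        )
        self.head = nn.Sequential(
            nn.Linear(3 * n_filters, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, query, rewrite, answer):
        feats = []
        for conv, tokens in zip(self.convs, (query, rewrite, answer)):
            x = self.embed(tokens).transpose(1, 2)               # (B, emb_dim, seq_len)
            feats.append(torch.relu(conv(x)).max(dim=2).values)  # max-pool over time
        return self.head(torch.cat(feats, dim=1))  # logit: above-average answer?

# Example: a batch of 2 triples, each field padded to 10 token ids.
sel = AnswerSelector(vocab_size=1000)
ids = lambda: torch.randint(0, 1000, (2, 10))
logit = sel(ids(), ids(), ids())
```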
Result
• EM, F1: token-level metrics between the gold answer and the model's answer

• TopHyp: uses the first (top) hypothesis among the seq2seq model's outputs as the reformulation

• CNN: selects the best answer with the CNN-based selector
Analysis of the Agent's Language
Statistics of Questions
• Length: the number of words in a question. TF (term frequency): how often words repeat within a question

• DF (document frequency): the median frequency of a question's tokens across the context documents

• QC (Query Clarity): a relative-entropy measure of the question's (or reformulation's) term distribution (see the formula below)
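The relative entropy (KL divergence) between two unigram term distributions P and Q is the standard quantity below; exactly which pair of distributions the paper plugs in for query clarity is not recoverable from the slide, so treat that as an assumption:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{w \in V} P(w) \, \log \frac{P(w)}{Q(w)}
```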
Example (one SearchQA clue and its rewrites):

• Question / Clue: gandhi deeply influenced count wrote war peace.

• Base-NMT: Who influenced count wrote war?

• AQA-QR: What is name gandhi gandhi influence wrote peace peace?
Statistics of Questions
• Base-NMT

• Produces more syntactically well-formed questions

• Lower DF: likely because the NMT training corpus's distribution differs substantially from the SearchQA data

• AQA-QR (TopHyp)

• 99.8% begin with "what is name": presumably learned because many answers are related to names

• Less fluent

• Repeats tokens about twice as often as the original SearchQA questions
Paraphrasing Quality
• Paraphrasing quality was tested on an image-captioning dataset

• Uses the MSCOCO dataset

• For each image, one of its five captions serves as the source and the remaining four as references

• Base-NMT: 11.4 BLEU / AQA-QR: 8.6 BLEU
Reformulation Examples
Future work
• From one-shot decisions to sequential decision-making

• Information-seeking tasks

• End-to-end RL problem

• Closed loop between reformulator and selector
Thank you
