Talking Geckos QA System
Jie Cao
Boya Song
Overview
• Rule-based system [based on Ellen Riloff's Quarc paper]
• Stanford CoreNLP library: POS tagging, sentence splitting, NER, dependency relations
• Classify questions by keyword: what, where, when, why, how, etc.
• Word-match function
• Rules that award points to each sentence
• Narrow down the answer according to keyword type
Work Process
• Input: the question and the story sentences
• wordMatch (common component): (Question, StorySentences) -> ScoredSentences
• BestSentence(Why), BestSentence(What), BestSentence(How), …: ScoredSentence -> correctSentence
• narrowDownAnswer(Why), narrowDownAnswer(What), narrowDownAnswer(How), …: correctSentence -> correctAnswer
• reverseWordMatch (make assumption): correctAnswer -> correctSentence
• Rule isolation per question type made it possible to develop the tasks in parallel
• Google: "Fast is better than slow"
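A minimal sketch of these stages as Java interfaces (hypothetical names following the diagram; the actual implementation is not shown in the slides):

    import java.util.List;

    // Hypothetical types mirroring the pipeline above.
    record Question(String text, String type) {}           // type: "why", "what", "how", ...
    record ScoredSentence(String text, double score) {}

    interface WordMatch {                                  // common component
        List<ScoredSentence> score(Question q, List<String> storySentences);
    }
    interface BestSentence {                               // one rule set per question type
        ScoredSentence pick(List<ScoredSentence> scored);
    }
    interface NarrowDownAnswer {                           // one rule set per question type
        String narrow(Question q, ScoredSentence best);
    }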
Reports and Testing
• Fast testing: rules for classifying question type and WordMatch rules, with object persistence and reload
• Pre-annotated stories and questions: test all data in less than 1 minute
• Annotated test report; test by question-type list
• Annotated and scored report for every question type
• 151 stories, 1,200 questions
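A minimal sketch of the persist-and-reload idea, assuming standard Java serialization and hypothetical class names (the slides do not show the actual mechanism):

    import java.io.*;

    // Hypothetical cache: write the annotated objects once, reload them on
    // later test runs instead of re-running the NLP pipeline.
    final class ObjectCache {
        static void save(File file, Serializable obj) throws IOException {
            try (var out = new ObjectOutputStream(new FileOutputStream(file))) {
                out.writeObject(obj);
            }
        }
        static Object load(File file) throws IOException, ClassNotFoundException {
            try (var in = new ObjectInputStream(new FileInputStream(file))) {
                return in.readObject();
            }
        }
    }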
Precision Improvement
? Used reports and searched for general patterns to narrow down
answer within sentence
? Find continuous named entity tags in answer
? Location. time, and preposition keywords
? Why: substring after ¡°so¡± or ¡°because¡±¡ªsimple but very effective
? What: substring after core sub, core verb, root verb, or strike of
word contained in question
? Who, Where type of questions are more complicated
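A minimal sketch of the "why" heuristic described above (hypothetical helper; the real rule set is richer):

    // Keep the clause after a causal cue word; fall back to the whole sentence.
    static String narrowWhyAnswer(String sentence) {
        String lower = sentence.toLowerCase();
        for (String cue : new String[] { "because", "so" }) {
            int i = lower.indexOf(" " + cue + " ");
            if (i >= 0) return sentence.substring(i + cue.length() + 2).trim();
        }
        return sentence;
    }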
Sample Report

QuestionID: 1999-W38-1-9
Question: How many peacekeeping troops does Canada now have in 22 nations around the world?
Answer: about 3,900 | 3,900
SentenceSize: 1
CorrectSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
MyAnswer: 3,900 22
MyAnswerSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
MyAnswerScore: AnswerScore{recall=1.0, precise=0.5, fmeasure=0.6666666666666666, myCorrect=1, correctTotal=1, myTotal=2, matchKey=' 3,900'}
Difficulty: easy
Included: Yes

QuestionID: 1999-W37-5-3
Question: How much longer did Meiorin take to run 2.5 kilometres than she was supposed to?
Answer: 49 seconds
SentenceSize: 1
CorrectSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer [+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
MyAnswer: 49 seconds
MyAnswerSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer [+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
MyAnswerScore: AnswerScore{recall=1.0, precise=1.0, fmeasure=1.0, myCorrect=2, correctTotal=2, myTotal=2, matchKey='49 seconds'}
Difficulty: moderate
Included: Yes

Overall: rightSentence=94, length = 175, avgRecall = 0.47672550966159993, avgPrecision = 0.3014595835896379, avgFmeasure = 0.32152025726198186
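The scores in AnswerScore follow the standard definitions; a minimal sketch (hypothetical helper, matching the numbers in the first report above):

    // recall = myCorrect / correctTotal, precision = myCorrect / myTotal,
    // fmeasure = harmonic mean of the two.
    static double fmeasure(int myCorrect, int correctTotal, int myTotal) {
        double recall = (double) myCorrect / correctTotal;    // 1 / 1 = 1.0
        double precision = (double) myCorrect / myTotal;      // 1 / 2 = 0.5
        return 2 * precision * recall / (precision + recall); // 0.666...
    }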
WordMatch: TokenScore & RuleScore

Question: Why is the Sheldon Kennedy Foundation abandoning its dream of building a ranch for sexually abused children?
MyAnswerSentence: [47.5]Troubled by poor business decisions , the Sheldon [+strike_1+word_0.5+disword_1.0] Kennedy [+strike_1+word_0.5+disword_1.0] Foundation [+strike_1+word_0.5+disword_2.0+subj_1+secverb_3] has abandoned [+word_0.5+rootverb_6] its [+strike_1+word_0.5+disword_1.0] dream [+strike_1+word_0.5+disword_2.0+dobj_1+secverb_3] of [+strike_1] building [+strike_1+word_0.5+coreverb_4] a [+strike_1] ranch [+strike_1+word_0.5+disword_0.5] for [+strike_1] sexually [+strike_1+word_0.5+disword_0.125] abused [+strike_1+word_0.5+disword_0.125] children [+strike_1+word_0.5+disword_0.25] and will hand its [+word_0.5+disword_0.5] donations over to the Canadian Red Cross . [WHY_CURRENT_GOOD:2.0 ]

Annotation legend (callouts from the slide):
• Total score = token score + rule score
• word_0.5: word match
• strike_1: continuous match
• disword: Math.pow(2, 2 - distance); words closer to the root score more (disword_1.0) and words further away score less (disword_0.125)
• dobj_1, rootverb_6; coreverb_4: a matched verb that is not the root
• Stopwords (Lucene stopword list) receive no score
• [WHY_CURRENT_GOOD:2.0]: the Why rule score
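A minimal sketch of that distance weighting (per the next slide, the distance is the word's dependency distance from the root):

    // Distance-weighted word score: halves with each step away from the root.
    static double disWordScore(int distance) {
        return Math.pow(2, 2 - distance); // 1 -> 2.0, 2 -> 1.0, 5 -> 0.125
    }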
WordMatch: TokenScore (all weights are collected in the sketch after this list)
1. Generate
   1. VerbsInQuestion, OthersInQuestion, rootVerb, subjInQuestion, dObjInQuestion
2. General word-match score
   1. WORD: "word_0.5"
   2. DIS_WORD: "disword_{pow(2, 2 - distance)}" // unimportant "Mod"; the longer the distance from the root, the lower the score
3. Continuous word-match bonus score
   1. STRIKE: "strike_1" // for every continuous word (stopwords excluded)
4. Verb match
   1. Verb stopwords get no score. // Lucene stopwords
   2. VerbsInQuestion.contains("verb")
      • AUX_VERB: 0.5
      • ROOT_VERB: 6
      • COM_VERB: 0.5
      • CORE_VERB: 4
5. Noun match
   1. ROOT_VERB: 6 // the root verb in noun form
   2. subjInQuestion != null && matched
      • SUBJ: 1 // matches the subj in the question
      • CORE_SUBJ: 3 // matches the core_subj in the question
      • SEC_VERB: 3 // the verb of this matched noun is also in the question
   3. dObjInQuestion != null && matched
      • DOBJ: 1 // matches the dobj in the question
      • SEC_VERB: 3 // the verb of this matched noun is also in the question
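A hypothetical weight table collecting the constant feature scores listed above (a sketch, not the actual code; DIS_WORD is computed, not constant):

    import java.util.Map;

    // Feature weights as listed on this slide.
    final class TokenWeights {
        static final Map<String, Double> WEIGHTS = Map.of(
            "WORD", 0.5, "STRIKE", 1.0,
            "AUX_VERB", 0.5, "ROOT_VERB", 6.0, "COM_VERB", 0.5, "CORE_VERB", 4.0,
            "SUBJ", 1.0, "CORE_SUBJ", 3.0, "SEC_VERB", 3.0, "DOBJ", 1.0);
    }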
Time was limited, so we just extracted features and tuned the weights manually.
Future work:
1. Learning the feature weights
2. dcoref (Stanford CoreNLP's deterministic coreference)
3. Thesaurus
BestSentence: Rule Score

Clue weights: WEAK_CLUE=1, CLUE=2, GOOD_CLUE=4, CONFIDENT=6, SLAM_DUNK=20

Results over the whole dataset (average = CorrectSentence/All); the first figure uses TokenScore only, the second TokenScore + RuleScore:

Where
  Rules: GOOD_CLUE: WHERE_LOCATION_PREP; CLUE: WHERE_LOCATION_NER
  86/155 = 0.5548 vs. 93/155 = 0.6000
When
  Rules: GOOD_CLUE: WHEN_TIME_NER; SLAM_DUNK: WHEN_ORDER_TOKEN; SLAM_DUNK: WHEN_BEGIN_TOKEN
  117/173 = 0.6763 vs. 119/173 = 0.6879
Why
  Rules: GOOD_CLUE: WHY_REASON_TOKEN; CLUE: WHY_CURRENT_GOOD; CLUE: WHY_POST_GOOD; CLUE: WHY_PRE_GOOD; WEAK_CLUE: WHY_LEAD_TOKEN; WEAK_CLUE: WHY_THINK_TOKEN; WEAK_CLUE: WHY_WANT_TOKEN
  70/115 = 0.6087 vs. 71/115 = 0.6174
Who/Whose
  Rules: GOOD_CLUE: WHO_PERSON_NER; CLUE: WHO_ORGANIZATION_MISC_NER; CLUE: WHOSE_PRP_POS
  116/187 = 0.6203 vs. 118/187 = 0.6310
What
  Rules: GOOD_CLUE: WHAT_KIND_TOKEN; CLUE: WHAT_DATE_NER; SLAM_DUNK: WHAT_NAME_TOKEN
  202/311 = 0.6325 vs. 202/311 = 0.6325
How
  Rules: HOW_SECKEY_K: distance from k to root; GOOD_CLUE: HOW_DISTANCE_TEXT
  141/232 = 0.6078 vs. 143/232 = 0.6164
testset1
  52/84 What, 35/57 How, 27/44 Where, 25/41 Who, 23/32 When, 16/28 Why, …
  212/313 = 0.6773 vs. 212/313 = 0.6773
testset2
  57/88 What, 41/50 How, 28/44 Who, 27/40 When, 24/38 Where, 21/31 Why, …
  215/315 = 0.6825 vs. 221/315 = 0.7016
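A minimal sketch of the clue weights as a Java enum (hypothetical; each rule name above maps to one of these levels):

    // Clue confidence levels and their point values, as listed above.
    enum Clue {
        WEAK_CLUE(1), CLUE(2), GOOD_CLUE(4), CONFIDENT(6), SLAM_DUNK(20);
        final int points;
        Clue(int points) { this.points = points; }
    }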
Summary
• The reporting system worked very well and helped us improve precision
• The system is fairly simple and straightforward
• Unfortunately, due to a Stanford CoreNLP library bug, we could not incorporate coreference resolution into our system
• Short on time: the weights were tuned manually (machine learning?); recall and precision still have room for improvement
Thanks
Q&A
