The document describes a rule-based question answering system that takes in a question and story and uses Stanford NLP tools and custom rules to identify the sentence containing the answer. It classifies the question type, calculates word matching scores between the question and sentences, and applies rules to narrow down and score sentences. It outputs a report on test questions with the predicted answer and sentence. The system was improved over time by analyzing reports, adding rules to better identify answers for different question types, and incorporating named entity recognition.
2. Overview
• Rule-based system [based on Ellen Riloff's Quarc paper]
• Stanford CoreNLP library: POS tagging, sentence splitting, NER, dependency relations
• Classify questions by keyword: what, where, when, why, how, etc.
• Word match function
• Rules that award points to each sentence
• Narrow down the answer according to the keyword type
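A minimal sketch of the CoreNLP pipeline setup the overview lists (the annotator names are standard Stanford CoreNLP; the wrapper class and method names are illustrative, not our actual code):

```java
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import java.util.Properties;

public class PipelineSetup {
    // POS, sentence split, NER, and dependency relations, as listed above.
    public static StanfordCoreNLP buildPipeline() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, depparse");
        return new StanfordCoreNLP(props);
    }

    public static Annotation annotate(StanfordCoreNLP pipeline, String text) {
        Annotation doc = new Annotation(text);
        pipeline.annotate(doc);
        return doc;
    }
}
```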
3. Work Process
Input
• wordMatch (common component): (Question, StorySentences) -> ScoredSentences
• reverseWordMatch (makes an assumption): correctAnswer -> correctSentence
• BestSentence(Why): ScoredSentences -> correctSentence
• BestSentence(What): ScoredSentences -> correctSentence
• BestSentence(…): ScoredSentences -> correctSentence
• BestSentence(How): ScoredSentences -> correctSentence
• narrowDownAnswer(Why): correctSentence -> correctAnswer
• narrowDownAnswer(What): correctSentence -> correctAnswer
• narrowDownAnswer(…): correctSentence -> correctAnswer
• narrowDownAnswer(How): correctSentence -> correctAnswer
Rule isolation: each question type gets its own BestSentence and narrowDownAnswer rules, which let us develop tasks in parallel.
Google: "Fast is better than slow."
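A skeleton of how these stages chain together (class and method names are assumptions; the slides do not show the actual code):

```java
import java.util.List;
import java.util.Map;

/** Illustrative skeleton of the work process above. */
public class QaFlow {
    public String answer(String question, List<String> storySentences) {
        String type = classifyByKeyword(question);                        // "why", "what", "how", ...
        Map<String, Double> scored = wordMatch(question, storySentences); // common component: token scores
        String best = bestSentence(type, scored);                         // adds type-specific rule scores
        return narrowDownAnswer(type, best, question);                    // type-specific answer extraction
    }

    String classifyByKeyword(String question) {
        String q = question.toLowerCase();
        for (String kw : new String[]{"why", "where", "when", "whose", "who", "what", "how"}) {
            if (q.startsWith(kw)) return kw;
        }
        return "other";
    }

    // The isolated, per-type rule modules would plug in behind these stubs.
    Map<String, Double> wordMatch(String q, List<String> sentences) { throw new UnsupportedOperationException(); }
    String bestSentence(String type, Map<String, Double> scored) { throw new UnsupportedOperationException(); }
    String narrowDownAnswer(String type, String sentence, String q) { throw new UnsupportedOperationException(); }
}
```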
4. Reports and Testing
Fast testing
• Object persistence and reload of pre-annotated stories and questions
• Test all data (151 stories, 1,200 questions) in less than 1 minute
• Components exercised: classify-question-type rules, WordMatch rules
Reports
• Annotated test report
• Test by question-type list
• Annotated and scored report for every question type
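One way to get the persist-and-reload speedup is plain Java serialization, since CoreNLP's Annotation objects are serializable. A sketch (class name and file handling are assumptions):

```java
import edu.stanford.nlp.pipeline.Annotation;
import java.io.*;

public class AnnotationCache {
    // Persist an annotated story once, so later test runs skip re-annotation.
    public static void save(Annotation doc, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(doc);
        }
    }

    public static Annotation load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Annotation) in.readObject();
        }
    }
}
```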
5. Precision Improvement
? Used reports and searched for general patterns to narrow down
answer within sentence
? Find continuous named entity tags in answer
? Location. time, and preposition keywords
? Why: substring after ¡°so¡± or ¡°because¡±¡ªsimple but very effective
? What: substring after core sub, core verb, root verb, or strike of
word contained in question
? Who, Where type of questions are more complicated
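A minimal sketch of the Why heuristic above, assuming a simple substring-after-marker implementation (the method and class names are illustrative):

```java
public class WhyRule {
    /** Why questions: the reason usually follows "because" or "so" in the
     *  best sentence, so return everything after the first such marker. */
    public static String narrowDownWhy(String sentence) {
        String lower = sentence.toLowerCase();
        for (String marker : new String[]{"because", "so"}) {
            int i = lower.indexOf(" " + marker + " ");
            if (i >= 0) {
                return sentence.substring(i + marker.length() + 2).trim();
            }
        }
        return sentence; // no marker found: fall back to the whole sentence
    }
}
```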
6. Sample Report
QuestionID: 1999-W38-1-9
Question: How many peacekeeping troops does Canada now have in 22 nations around the world?
Answer: about 3,900 | 3,900
SentenceSize: 1
CorrectSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
MyAnswer: 3,900 22
MyAnswerSentence: [8.5]Right now [+word_0.5] , Canada [+word_0.5] has [+word_0.5+auxverb_0.5] about 3,900 peacekeepers on 22 [+word_0.5] missions around [+word_0.5] the world [+strike_1+word_0.5] . [HOW_NUMBER_NER:4.0 ]
MyAnswerScore: AnswerScore{recall=1.0, precise=0.5, fmeasure=0.6666666666666666, myCorrect=1, correctTotal=1, myTotal=2, matchKey=' 3,900'}
Difficulty: easy
Included: Yes

QuestionID: 1999-W37-5-3
Question: How much longer did Meiorin take to run 2.5 kilometres than she was supposed to?
Answer: 49 seconds
SentenceSize: 1
CorrectSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer [+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
MyAnswer: 49 seconds
MyAnswerSentence: [15.5]She [+word_0.5+disword_2.0] took [+word_0.5+rootverb_6] 49 seconds longer [+word_0.5+disword_2.0] . [HOW_NUMBER_NER:4.0 ]
MyAnswerScore: AnswerScore{recall=1.0, precise=1.0, fmeasure=1.0, myCorrect=2, correctTotal=2, myTotal=2, matchKey='49 seconds'}
Difficulty: moderate
Included: Yes

Overall: rightSentence=94, length=175, avgRecall=0.47672550966159993, avgPrecision=0.3014595835896379, avgFmeasure=0.32152025726198186
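The AnswerScore fields are consistent with token-level precision/recall against the gold answer (e.g. recall=1.0, precise=0.5 gives fmeasure = 2·1.0·0.5/1.5 ≈ 0.667). A sketch of that computation, with the class shape inferred from the report rather than taken from our source:

```java
import java.util.*;

/** Token-overlap score between my answer and the gold answer, matching the report's fields. */
public class AnswerScore {
    final double recall, precise, fmeasure;

    public AnswerScore(String myAnswer, String goldAnswer) {
        Set<String> gold = new HashSet<>(Arrays.asList(goldAnswer.toLowerCase().split("\\s+")));
        String[] mine = myAnswer.toLowerCase().split("\\s+");
        int myCorrect = 0;
        for (String tok : mine) if (gold.contains(tok)) myCorrect++;
        recall = gold.isEmpty() ? 0 : (double) myCorrect / gold.size();    // myCorrect / correctTotal
        precise = mine.length == 0 ? 0 : (double) myCorrect / mine.length; // myCorrect / myTotal
        fmeasure = (recall + precise == 0) ? 0 : 2 * recall * precise / (recall + precise);
    }
}
```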
7. WordMatch: TokenScore & RuleScore
Question: Why is the Sheldon Kennedy Foundation abandoning its dream of building a
ranch for sexually abused children?
MyAnswerSentence: [47.5]Troubled by poor business decisions , the Sheldon
[+strike_1+word_0.5+disword_1.0] Kennedy [+strike_1+word_0.5+disword_1.0]
Foundation [+strike_1+word_0.5+disword_2.0+subj_1+secverb_3] has abandoned
[+word_0.5+rootverb_6] its [+strike_1+word_0.5+disword_1.0] dream
[+strike_1+word_0.5+disword_2.0+dobj_1+secverb_3] of [+strike_1] building
[+strike_1+word_0.5+coreverb_4] a [+strike_1] ranch [+strike_1+word_0.5+disword_0.5]
for [+strike_1] sexually [+strike_1+word_0.5+disword_0.125] abused
[+strike_1+word_0.5+disword_0.125] children [+strike_1+word_0.5+disword_0.25] and will
hand its [+word_0.5+disword_0.5] donations over to the Canadian Red Cross .
[WHY_CURRENT_GOOD:2.0 ]
Legend for the score annotations above:
• Total score = token score + rule score
• strike_1: continuous-match bonus
• word_0.5: plain word match
• disword: Math.pow(2, 2 - distance); words closer to the dependency root score more (disword_1.0), words farther score less (disword_0.125)
• subj_1 / dobj_1: subject / direct-object match; rootverb_6: root verb match; coreverb_4: verb match that is not the root
• Lucene stopwords receive no score
• [WHY_CURRENT_GOOD:2.0] is the Why rule score
8. WordMatch: TokenScore
1. Generate
   1. VerbsInQuestion, OthersInQuestion, rootVerb, subjInQuestion, dObjInQuestion
2. General word match score
   1. WORD: "word_0.5"
   2. DIS_WORD: "disword_{pow(2, 2 - distance)}" // less important "Mod" (modifier): longer distance from the root scores less
3. Continuous word match bonus score
   1. STRIKE: "strike_1" // for every continuous matched word (stopwords excluded)
4. Verb match
   1. Verb stopwords get no score. // Lucene stopwords
   2. VerbsInQuestion.contains("verb")
      • AUX_VERB: 0.5
      • ROOT_VERB: 6
      • COM_VERB: 0.5
      • CORE_VERB: 4
5. Noun match
   1. ROOT_VERB: 6 // root verb in noun form
   2. subjInQuestion != null && matched
      • SUBJ: 1 // matches subj in question
      • CORE_SUBJ: 3 // matches core_subj in question
      • SEC_VERB: 3 // the verb of this matched noun is also in the question
   3. dObjInQuestion != null && matched
      • DOBJ: 1 // matches dobj in question
      • SEC_VERB: 3 // the verb of this matched noun is also in the question
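A sketch of the DIS_WORD and STRIKE scores above; the 2^(2 - distance) falloff is stated on the slide, while the run handling for STRIKE is inferred from the sample report:

```java
import java.util.List;
import java.util.Set;

public class TokenScores {
    /** DIS_WORD: weight a matched modifier by its dependency distance from the root. */
    static double disWordScore(int distanceFromRoot) {
        return Math.pow(2, 2 - distanceFromRoot); // distance 1 -> 2.0, 2 -> 1.0, 5 -> 0.125
    }

    /** STRIKE: +strike_1 per token inside a continuous run of question-word matches.
     *  Stopwords ride along inside a run; the exact run semantics are inferred. */
    static double strikeScore(List<String> sentence, Set<String> questionWords, Set<String> stopwords) {
        double score = 0;
        int runLength = 0;
        for (String tok : sentence) {
            if (questionWords.contains(tok) || (runLength > 0 && stopwords.contains(tok))) {
                runLength++;
            } else {
                if (runLength >= 2) score += runLength; // award the run only once it is continuous
                runLength = 0;
            }
        }
        if (runLength >= 2) score += runLength;
        return score;
    }
}
```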
Time limited: we just extract features and tune the weights manually.
Future work:
1. Learning weights for features
2. dcoref (coreference resolution)
3. Thesaurus (synonym matching)
9. BestSentence: Rule Score
WEAK_CLUE=1
CLUE=2
GOOD_CLUE =4
CONFIDENT=6
SLAM_DUNK=20
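A sketch of how one of these tiers might be awarded, using WHEN_TIME_NER from the table below as an example (the method shape is an assumption; the [RULE_NAME:score] logging format is from the sample reports):

```java
import java.util.List;

public class WhenRules {
    static final double WEAK_CLUE = 1, CLUE = 2, GOOD_CLUE = 4, CONFIDENT = 6, SLAM_DUNK = 20;

    /** WHEN_TIME_NER: award GOOD_CLUE if the sentence contains a DATE/TIME entity.
     *  A hit would be logged in the report as [WHEN_TIME_NER:4.0]. */
    static double whenTimeNer(List<String> sentenceNerTags) {
        for (String tag : sentenceNerTags) {
            if (tag.equals("DATE") || tag.equals("TIME")) return GOOD_CLUE;
        }
        return 0;
    }
}
```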
Accuracy is averaged over the whole dataset as CorrectSentence/All, comparing token score only against token score + rule score.

Type        Rules                                                        TokenScore only    TokenScore+RuleScore
Where       GOOD_CLUE: WHERE_LOCATION_PREP; CLUE: WHERE_LOCATION_NER     86/155 = 0.5548    93/155 = 0.6000
When        GOOD_CLUE: WHEN_TIME_NER; SLAM_DUNK: WHEN_ORDER_TOKEN,       117/173 = 0.6763   119/173 = 0.6879
            WHEN_BEGIN_TOKEN
Why         GOOD_CLUE: WHY_REASON_TOKEN; CLUE: WHY_CURRENT_GOOD,         70/115 = 0.6087    71/115 = 0.6174
            WHY_POST_GOOD, WHY_PRE_GOOD; WEAK_CLUE: WHY_LEAD_TOKEN,
            WHY_THINK_TOKEN, WHY_WANT_TOKEN
Who/Whose   GOOD_CLUE: WHO_PERSON_NER; CLUE: WHO_ORGANIZATION_MISC_NER,  116/187 = 0.6203   118/187 = 0.6310
            WHOSE_PRP_POS
What        GOOD_CLUE: WHAT_KIND_TOKEN; CLUE: WHAT_DATE_NER;             202/311 = 0.6325   202/311 = 0.6325
            SLAM_DUNK: WHAT_NAME_TOKEN
How         HOW_SECKEY_K: distance from k to root;                       141/232 = 0.6078   143/232 = 0.6164
            GOOD_CLUE: HOW_DISTANCE_TEXT

testset1: 52/84 What, 35/57 How, 27/44 Where, 25/41 Who, 23/32 When, 16/28 Why, … -> 212/313 = 0.6773 (token only), 212/313 = 0.6773 (with rules)
testset2: 57/88 What, 41/50 How, 28/44 Who, 27/40 When, 24/38 Where, 21/31 Why, … -> 215/315 = 0.6825 (token only), 221/315 = 0.7016 (with rules)
10. Summary
• The reporting system worked very well and helped us improve precision
• The system is fairly simple and straightforward
• Unfortunately, due to a Stanford CoreNLP library bug, we could not incorporate coreference resolution into our system
• Short on time: weights are tuned manually (ML?); recall and precision still have room for improvement