Building a Graph of Names and Contextual
Patterns for Named Entity Classification
César de Pablo Sánchez and Paloma Martínez
LABDA, Computer Science Dept., Universidad Carlos III de Madrid
{cdepablo,pmf}@inf.uc3m.es
Objectives
• NERC for multilingual applications
• Bootstrap a name list and indicative patterns
– Large document collection
– Few example seeds for every class (N_seeds < 40)
– Language independence (as an aim)
Initial assumptions
• Dual bootstrapping
• One sense per entity type (name)
• Indelibility of class assignments
• Counter-training: learn several classes at once
• Query-based exploration of the indexed collection
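The assumptions above can be illustrated with a minimal sketch of one possible dual bootstrapping loop. This is not the authors' implementation: the `bootstrap` function, the toy `(left, name, right)` occurrence tuples, and the rule that a pattern is kept only when all labelled names it matches share one class are assumptions made for illustration.

```python
from collections import defaultdict

def bootstrap(occurrences, seeds, rounds=3):
    """Toy dual bootstrapping over (left_context, name, right_context) tuples.

    seeds maps each class to a set of seed names. All names and rules here
    are hypothetical simplifications, not the poster's actual algorithm.
    """
    # One sense per name: each name maps to exactly one class.
    label = {name: cls for cls, names in seeds.items() for name in names}
    for _ in range(rounds):
        # Step 1: collect, per pattern, the classes of the labelled names it matches.
        votes = defaultdict(set)
        for left, name, right in occurrences:
            if name in label:
                votes[("L", left)].add(label[name])
                votes[("R", right)].add(label[name])
        # Counter-training (simplified): classes compete for patterns, so a
        # pattern matched by more than one class is discarded.
        pattern_class = {p: next(iter(c)) for p, c in votes.items() if len(c) == 1}
        # Step 2: propose classes for unlabelled names through accepted patterns.
        proposals = defaultdict(set)
        for left, name, right in occurrences:
            if name in label:
                continue  # indelibility: an assigned class is never revised
            for pattern in (("L", left), ("R", right)):
                if pattern in pattern_class:
                    proposals[name].add(pattern_class[pattern])
        new = {n: next(iter(c)) for n, c in proposals.items() if len(c) == 1}
        if not new:
            break
        label.update(new)
    return label

occurrences = [
    ("presidente", "Allende", ", viajará"),
    ("presidente", "Peres", ", visitará"),
    ("ciudad de", "Madrid", "y"),
    ("ciudad de", "Lima", "y"),
]
print(bootstrap(occurrences, {"PER": {"Allende"}, "LOC": {"Madrid"}}))
```

In this toy run the seed "Allende" validates the left pattern "presidente", which in turn labels "Peres"; learning PER and LOC together is what lets ambiguous patterns be rejected.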
PERSON(x)                Left patterns                   Right patterns
Num  Name                Num  Text                       Num  Text
15   Fernando Arrabal    0    Gobierno del presidente    6    , ### esta tarde
64   Teodoro Obiang      1    Gobierno del ###           9    , vencedor
68   Salvador Allende    12   gobierno del presidente    21   y el ex
128  Peres               13   presidente del país ,      26   , viajará
156  Edouard Balladur    29   actual presidente          34   , y su colega
332  Grachov             47   palabras de                42   , visitará
423  Calderón            50   cuyo ### ,                 49   , y el líder
450  Colom               60   presidente ,               63   y el presidente
522  Joaquín Almunia     61   reunión con                65   se entrevistó
Direct Evaluation: Name Lists (AvgPrec)
Model  PER   LOC   ORG   M / T  Mean
PLO    94.8  52.7  67.1  –      71.5
PLOM   93.0  44.8  79.3  75.0   73.0
PLOT   94.8  87.4  81.1  40.9   76.0
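The poster does not spell out how AvgPrec is computed. The sketch below assumes the usual information-retrieval definition for a ranked list, normalised by the number of correct names found in the list; the function name and toy data are hypothetical.

```python
def average_precision(ranked_names, correct_names):
    """Average precision of a ranked name list (assumed IR-style definition).

    Precision is measured at every rank holding a correct name and then
    averaged over the correct names found in the list.
    """
    hits = 0
    total = 0.0
    for rank, name in enumerate(ranked_names, start=1):
        if name in correct_names:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0

# Toy example: correct names at ranks 1 and 3 give (1/1 + 2/3) / 2.
print(average_precision(["Peres", "Madrid", "Colom"], {"Peres", "Colom"}))
```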
Name Classification
Model              P      R      F      Acc
baseline
  CONLL            26.27  56.48  35.86  –
  ORG              –      –      –      39.34
entities
  PLO              77.33  54.34  63.83  64.04
  PLOM             78.85  51.53  62.36  66.24
  PLOT             78.72  41.58  54.42  62.18
entities+patterns
  PLO              66.12  57.97  61.78  63.17
  PLOM             73.65  61.73  67.17  71.29
  PLOT             66.35  56.62  61.10  62.50
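The F column is consistent with the balanced F-measure, the harmonic mean of precision and recall. A small check under that assumption (the function name is illustrative):

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Recompute F from the P and R values reported for two rows of the table.
print(round(f_measure(77.33, 54.34), 2))  # entities, PLO
print(round(f_measure(73.65, 61.73), 2))  # entities+patterns, PLOM
```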
Algorithm
Pattern selection and evaluation
1. Rank by Support, filter min-support, select top-k
2. Evaluate min-Acc: $Acc(p) = \frac{Pos}{Pos + Neg}$
3. Evaluate min-Conf: $Conf(p) = \frac{Pos - Neg}{Pos + Neg + Unk}$
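The three pattern selection steps can be sketched as follows. The poster does not define Support or give threshold values, so this sketch assumes that support equals the Pos count and uses made-up thresholds; `select_patterns` and its parameters are hypothetical names.

```python
def pattern_scores(pos, neg, unk):
    """Return (Acc, Conf) for a pattern from its Pos, Neg and Unk match counts."""
    acc = pos / (pos + neg) if pos + neg else 0.0
    total = pos + neg + unk
    conf = (pos - neg) / total if total else 0.0
    return acc, conf

def select_patterns(stats, min_support=2, top_k=50, min_acc=0.8, min_conf=0.3):
    """stats maps a pattern to (pos, neg, unk). Thresholds are illustrative only."""
    # Step 1: rank by support (assumed here to be the Pos count), filter, keep top-k.
    ranked = sorted(stats.items(), key=lambda item: item[1][0], reverse=True)
    candidates = [(p, s) for p, s in ranked if s[0] >= min_support][:top_k]
    # Steps 2 and 3: keep patterns that pass both the accuracy and confidence thresholds.
    kept = []
    for pattern, (pos, neg, unk) in candidates:
        acc, conf = pattern_scores(pos, neg, unk)
        if acc >= min_acc and conf >= min_conf:
            kept.append(pattern)
    return kept

stats = {
    "presidente del": (8, 0, 2),  # accurate and well supported
    "palabras de": (5, 3, 10),    # too many negative and unknown matches
    "y": (1, 0, 0),               # below the minimum support
}
print(select_patterns(stats))
```

Conf penalises patterns that mostly match unknown names, which Acc alone cannot do because Unk does not appear in its denominator.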
Entity selection and evaluation
1. Rank by Support, filter min-support, select top-k
2. Evaluate min-Conf:
$Conf_{slot}(a) = 1 - \prod_i \left(1 - Conf_{pattern}(p_i)\right)$,
$Conf_{NE}(a) = Conf_{left}(a) \cdot Conf_{right}(a)$
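A minimal sketch of the entity confidence computation, assuming the slot confidence combines the matched patterns as one minus the product of their complements (a noisy-or) and that the left and right slots are multiplied. Function names and input values are illustrative.

```python
import math

def conf_slot(pattern_confs):
    """Noisy-or over the confidences of the patterns matched in one slot."""
    return 1.0 - math.prod(1.0 - c for c in pattern_confs)

def conf_ne(left_confs, right_confs):
    """Entity confidence: product of the left-slot and right-slot confidences."""
    return conf_slot(left_confs) * conf_slot(right_confs)

# Two left patterns with confidence 0.5 each give a left-slot confidence of 0.75.
print(conf_slot([0.5, 0.5]))
print(conf_ne([0.5, 0.5], [0.8]))
```

Under this reading, a candidate with no matching pattern on one side gets a slot confidence of 0, so its overall confidence is 0: it needs evidence from both contexts to be accepted.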
Conclusions
• Efficient bootstrapping from large indexed collections
with fewer seeds
• Already useful for NERC
• F-measure is lower than supervised machine learning
• More classes improve precision, but not always recall
Future work
• Other languages and domains
• Complex semantic models
• Language independence and NE Recognition
• Seed selection and improved effectiveness
Acknowledgements: This work has been supported by the Regional Government of Madrid under the Research Network MAVIR (S-0505/TIC-0267)
and by the Spanish Ministry of Education under the project BRAVO (TIN2007-67407-C03-01).

Building a Graph of Names and Contextual Patterns for Named Entity Classification (ECIR09 poster)