Building a Graph of Names and Contextual Patterns for Named Entity Classification
César de Pablo Sánchez and Paloma Martínez
LABDA, Computer Science Dept., Universidad Carlos III de Madrid
{cdepablo,pmf}@inf.uc3m.es
Objectives
• NERC (named entity recognition and classification) for multilingual applications
• Bootstrap a name list and indicative patterns
• Use a large document collection
• Few example seeds for every class (N_seeds < 40)
• Language independence (as an aim)
Initial assumptions
• Dual bootstrapping of names and patterns
• One sense per name: each name has a single entity type
• Indelibility of class assignments: once assigned, a class is never revised
• Counter-training: learn several classes at once
• Query-based exploration of the indexed collection
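The assumptions above can be illustrated with a toy Python sketch. It is not the authors' implementation: the input format (pre-extracted `(left, name, right)` context triples standing in for query results from the index), the function name `bootstrap`, and the simplified counter-training rule (a pattern is rejected as soon as it matches names of two different classes) are all assumptions. The support thresholds and confidence scores of the algorithm below are omitted.

```python
from collections import Counter, defaultdict

# Hypothetical data format: (left_context, name, right_context) triples,
# standing in for query results retrieved from the indexed collection.
Occurrence = tuple[str, str, str]


def bootstrap(occurrences: list[Occurrence],
              seeds: dict[str, set[str]],
              iterations: int = 5) -> dict[str, str]:
    """Toy dual bootstrapping of names and patterns for several classes at once."""
    # One sense per name: each name maps to a single class, starting from the seeds.
    assigned = {name: cls for cls, names in seeds.items() for name in names}
    for _ in range(iterations):
        # Names -> patterns: record which classes each context co-occurs with.
        seen_with: dict[str, Counter] = defaultdict(Counter)
        for left, name, right in occurrences:
            if name in assigned:
                seen_with["L:" + left][assigned[name]] += 1
                seen_with["R:" + right][assigned[name]] += 1
        # Counter-training (simplified): accept a pattern only if every
        # classified name it matched belongs to the same class.
        accepted = {p: next(iter(c)) for p, c in seen_with.items() if len(c) == 1}
        # Patterns -> names: accepted patterns vote for unclassified names.
        votes: dict[str, Counter] = defaultdict(Counter)
        for left, name, right in occurrences:
            if name in assigned:
                continue  # indelibility: earlier assignments are never revised
            for p in ("L:" + left, "R:" + right):
                if p in accepted:
                    votes[name][accepted[p]] += 1
        if not votes:
            break  # nothing new was learned in this iteration
        for name, counter in votes.items():
            assigned[name] = counter.most_common(1)[0][0]
    return assigned


occurrences = [
    ("presidente", "Allende", "visitará"),
    ("presidente", "Balladur", "visitará"),
    ("presidente", "Almunia", "dijo"),
    ("ciudad de", "Madrid", "capital"),
    ("ciudad de", "Toledo", "capital"),
    ("ciudad de", "Sevilla", "dijo"),
    ("el", "Zapatero", "dijo"),
]
result = bootstrap(occurrences, {"PER": {"Allende"}, "LOC": {"Madrid"}})
print(result)
```

In this made-up example the right context "dijo" ("said") co-occurs with both a PER and a LOC name in the second iteration, so counter-training rejects it and "Zapatero", whose only contexts are "el" and "dijo", stays unclassified.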
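PERSON(x)
Names                    | Left patterns                  | Right patterns
Num  Name                | Num  Text                      | Num  Text
15   Fernando Arrabal    | 0    Gobierno del presidente   | 6    , ### esta tarde
64   Teodoro Obiang      | 1    Gobierno del ###          | 9    , vencedor
68   Salvador Allende    | 12   gobierno del presidente   | 21   y el ex
128  Peres               | 13   presidente del país ,     | 26   , viajará
156  Edouard Balladur    | 29   actual presidente         | 34   , y su colega
332  Grachov             | 47   palabras de               | 42   , visitará
423  Calderón            | 50   cuyo ### ,                | 49   , y el líder
450  Colom               | 60   presidente ,              | 63   y el presidente
522  Joaquín Almunia     | 61   reunión con               | 65   se entrevistó
Glosses of the Spanish patterns: Gobierno del presidente = government of President; presidente del país = president of the country; actual presidente = current president; palabras de = words of; cuyo = whose; reunión con = meeting with; esta tarde = this afternoon; vencedor = winner; y el ex = and the former; viajará = will travel; y su colega = and his colleague; visitará = will visit; y el líder = and the leader; y el presidente = and the president; se entrevistó = met.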
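Left and right patterns such as those in the table are contexts adjacent to a name occurrence. A minimal Python sketch of extracting such contexts, assuming whitespace-tokenized text and contexts of one to `max_len` tokens; the helper `context_patterns`, the window size and the example sentence are hypothetical, since the poster does not say how patterns are delimited.

```python
def context_patterns(tokens: list[str], start: int, end: int,
                     max_len: int = 3) -> tuple[list[str], list[str]]:
    """Return the left and right context n-grams (1 to max_len tokens)
    adjacent to the name spanning tokens[start:end]."""
    # Left contexts end right before the name; longer ones extend leftwards.
    left = [" ".join(tokens[start - n:start])
            for n in range(1, max_len + 1) if start - n >= 0]
    # Right contexts start right after the name; longer ones extend rightwards.
    right = [" ".join(tokens[end:end + n])
             for n in range(1, max_len + 1) if end + n <= len(tokens)]
    return left, right


tokens = "el gobierno del presidente Salvador Allende , vencedor".split()
left, right = context_patterns(tokens, 4, 6)
print(left)   # ['presidente', 'del presidente', 'gobierno del presidente']
print(right)  # [',', ', vencedor']
```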
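Direct Evaluation: Name Lists (AvgPrec = average precision)
Model   PER    LOC    ORG    M / T   Mean
PLO     94.8   52.7   67.1   -       71.5
PLOM    93.0   44.8   79.3   75.0    73.0
PLOT    94.8   87.4   81.1   40.9    76.0
(M / T is the fourth class learned by PLOM and PLOT, respectively; PLO has none.)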
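The Mean column is consistent with an unweighted average over the classes each model learns: three for PLO, four for PLOM and PLOT. A quick check in Python using the values from the table (the helper name `macro_mean` is illustrative):

```python
from statistics import mean

# Per-class average precision from the table; PLO has no fourth (M / T) class.
AVG_PREC = {
    "PLO": [94.8, 52.7, 67.1],
    "PLOM": [93.0, 44.8, 79.3, 75.0],
    "PLOT": [94.8, 87.4, 81.1, 40.9],
}
REPORTED_MEAN = {"PLO": 71.5, "PLOM": 73.0, "PLOT": 76.0}


def macro_mean(scores: list[float]) -> float:
    """Unweighted mean over the classes a model learns."""
    return mean(scores)


for model, scores in AVG_PREC.items():
    print(f"{model}: computed {macro_mean(scores):.3f}, reported {REPORTED_MEAN[model]}")
```

The computed means (about 71.53, 73.03 and 76.05) agree with the reported values to within 0.05.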
Name Classification
Model                  P       R       F       Acc
baseline
  CONLL                26.27   56.48   35.86   -
  ORG                  -       -       -       39.34
entities
  PLO                  77.33   54.34   63.83   64.04
  PLOM                 78.85   51.53   62.36   66.24
  PLOT                 78.72   41.58   54.42   62.18
entities+patterns
  PLO                  66.12   57.97   61.78   63.17
  PLOM                 73.65   61.73   67.17   71.29
  PLOT                 66.35   56.62   61.10   62.50
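The F column is consistent with the balanced F-measure (F1), the harmonic mean of P and R; for example, the entities PLO row gives 2 · 77.33 · 54.34 / (77.33 + 54.34) ≈ 63.83. A small Python helper for checking such rows (the name `f_measure` is illustrative):

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(77.33, 54.34), 2))  # entities, PLO: 63.83
print(round(f_measure(26.27, 56.48), 2))  # CONLL baseline: 35.86
```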
Algorithm
Pattern selection and evaluation
1. Rank by Support, filter by min-support, select the top-k
2. Evaluate min-Acc: $\mathrm{Acc}(p) = \frac{Pos}{Pos + Neg}$
3. Evaluate min-Conf: $\mathrm{Conf}(p) = \frac{Pos - Neg}{Pos + Neg + Unk}$
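A Python sketch of the three pattern-selection steps. The poster does not define the counts, so this assumes that Pos, Neg and Unk count the names a pattern matches that are already assigned to the target class, assigned to a competing class (the counter-training evidence), and not yet assigned, respectively, and that Support is Pos + Neg + Unk. `PatternStats`, `select_patterns`, the counts and all thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PatternStats:
    """Hypothetical counts for one candidate pattern."""
    pattern: str
    pos: int  # matched names assigned to the target class
    neg: int  # matched names assigned to a competing class
    unk: int  # matched names not yet assigned

    @property
    def support(self) -> int:
        # Assumption: support is the total number of matched names.
        return self.pos + self.neg + self.unk

    def acc(self) -> float:
        known = self.pos + self.neg
        return self.pos / known if known else 0.0

    def conf(self) -> float:
        total = self.pos + self.neg + self.unk
        return (self.pos - self.neg) / total if total else 0.0


def select_patterns(cands: list[PatternStats], min_support: int, k: int,
                    min_acc: float, min_conf: float) -> list[PatternStats]:
    """Rank by support, drop those below min_support, keep the top-k,
    then keep only patterns passing both min-Acc and min-Conf."""
    ranked = sorted((c for c in cands if c.support >= min_support),
                    key=lambda c: c.support, reverse=True)[:k]
    return [c for c in ranked if c.acc() >= min_acc and c.conf() >= min_conf]


candidates = [
    PatternStats("actual presidente", pos=8, neg=0, unk=4),
    PatternStats(", y el", pos=5, neg=5, unk=10),
    PatternStats("palabras de", pos=3, neg=1, unk=0),
    PatternStats("rare", pos=1, neg=0, unk=0),
]
selected = select_patterns(candidates, min_support=2, k=3, min_acc=0.7, min_conf=0.4)
print([c.pattern for c in selected])  # ['actual presidente', 'palabras de']
```

Here ", y el" survives the support filter but fails min-Acc (5/10 = 0.5), because it matches as many names of competing classes as of the target class.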
Entity selection and evaluation
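1. Rank by Support, filter by min-support, select the top-k
2. Evaluate min-Conf:
   $\mathrm{Conf}_{slot}(a) = 1 - \prod_i \left(1 - \mathrm{Conf}_{pattern}(p_i)\right)$, for slot ∈ {left, right}
   $\mathrm{Conf}_{NE}(a) = \mathrm{Conf}_{left}(a) \cdot \mathrm{Conf}_{right}(a)$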
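The entity score combines pattern confidences within each slot with a noisy-or and then combines the left and right slot scores. A Python sketch, assuming the p_i are the accepted patterns matching entity a in the given slot and that the operator joining the two slots is a product; the function names are hypothetical.

```python
from math import prod


def slot_confidence(pattern_confs: list[float]) -> float:
    """Noisy-or: 1 - prod_i (1 - Conf_pattern(p_i)) over one slot's patterns."""
    return 1.0 - prod(1.0 - c for c in pattern_confs)


def entity_confidence(left_confs: list[float], right_confs: list[float]) -> float:
    """Conf_NE(a) = Conf_left(a) * Conf_right(a), assuming a product."""
    return slot_confidence(left_confs) * slot_confidence(right_confs)


print(round(slot_confidence([0.5, 0.5]), 2))           # 0.75
print(round(entity_confidence([0.5, 0.5], [0.6]), 2))  # 0.45
```

Under this reading, an entity with no matching pattern in one slot gets Conf_NE = 0, because the noisy-or over an empty set is 0.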
Conclusions
• Efficient bootstrapping from large indexed collections with fewer seeds
• Already useful for NERC
• F-measure is still lower than that of supervised machine learning
• Learning more classes improves precision, but not always recall
Future work
• Other languages and domains
• Complex semantic models
• Language independence and NE recognition
• Seed selection and improved effectiveness
Acknowledgements: This work has been supported by the Regional Government of Madrid under the Research Network MAVIR (S-0505/TIC-0267)
and by the Spanish Ministry of Education under the project BRAVO (TIN2007-67407-C03-01).

ECIR09 poster