The document describes a method called Temporal Entity Random Indexing (TERI) that can automatically identify and quantify contextual shifts surrounding significant entities over time. TERI builds distributional semantic models called WordSpaces for different time periods using Random Indexing. It generates time series data for each entity by tracking its vectors across the WordSpaces. Change point detection is then used to find statistically significant changes in an entity's contextual meaning. The method is tested on news articles from 1987-2007, identifying contextual shifts around 2001 for entities like the FBI, the Pentagon and George H. W. Bush. Future work involves applying the approach to streaming data and creating a dataset for evaluating temporal entity context shift.
EDRAK: Entity-centric Data Resource for Arabic Knowledge (Mohamed Gad-elrab)
EDRAK is an entity-centric data resource for Arabic knowledge created by researchers at the Max Planck Institute for Informatics. It contains over 2.4 million entities with Arabic names, keyphrases and semantic types. The researchers created EDRAK by extracting data from various sources, including the Arabic and English Wikipedias. They also generated additional Arabic names using techniques like entity name translation and transliteration of person names. An evaluation found the highest precision for names directly from Wikipedias, and lower precision for some generated data like redirected names. EDRAK is intended to support tasks like entity linking and question answering in Arabic.
A talk, given as part of the FIU CIS invited lecture series, on selected InferLink Corp R&D and commercialization work in AI and data-driven solutions.
Why CxOs care about Data Governance; the roadblock to digital mastery (Coert Du Plessis)
1. The document discusses strategies for removing friction from organizational data flow in large organizations. Exponential growth in data is creating challenges as data becomes more decentralized.
2. It argues that data should be viewed as a network of identities rather than a traditional hierarchical structure. Authority over data decisions needs to move closer to where information and knowledge resides.
3. For effective data governance, real customers and choices are needed. Data owners should have authority over decisions about their data domains while still relying on central services. Data ownership structures should be mutually exclusive and collectively exhaustive.
The document discusses the benefits of linked data and provides instructions for creating linked data. It describes how linked data allows for connecting and sharing information on the web through the use of URIs and RDF triples. The key steps outlined for creating linked data include establishing the entities in your data, giving them URIs, describing each entity, and linking to authoritative hubs. Schema.org is presented as a vocabulary that is widely used and can be extended for specific domains.
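To make those steps concrete, here is a minimal sketch using Python's rdflib (an assumed toolkit, not one named in the document; the entity URI base and the Wikidata identifier are illustrative): mint a URI for an entity, describe it with RDF triples using Schema.org terms, and link it to an authoritative hub.

```python
# Minimal linked-data sketch with rdflib; URIs and identifiers below are illustrative only.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SCHEMA = Namespace("https://schema.org/")
EX = Namespace("https://example.org/id/")  # hypothetical URI base for our own entities

g = Graph()
g.bind("schema", SCHEMA)

author = URIRef(EX["person/ada-lovelace"])            # step 1: give the entity a URI
g.add((author, RDF.type, SCHEMA.Person))              # step 2: describe it with RDF triples
g.add((author, SCHEMA.name, Literal("Ada Lovelace")))
# step 3: link to an authoritative hub (a Wikidata entity, as an example)
g.add((author, SCHEMA.sameAs, URIRef("http://www.wikidata.org/entity/Q7259")))

print(g.serialize(format="turtle"))
```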
This document discusses using public cloud resources for "context data" while reserving scarce resources like time, talent, and management attention for core data that provides a competitive advantage. It defines context data as things like supplier data, asset classifications, and geographic information that are less critical. The document advocates using open web identifiers and publishing systems to define master data entities externally rather than within individual organizations. This allows offsetting costs by leveraging public cloud resources for context data operations and synchronization.
The rapid advancement of language models has greatly improved text generation and interaction, but integrating these models into industry poses challenges in balancing data utility with privacy. This presentation will provide practical insights into building a Retrieval Augmented Generation (RAG) system, with a focus on protecting sensitive data. RAG combines the generative power of language models with precise information retrieval, allowing relevant data to be incorporated on demand without exposing private information. We'll explore the technical aspects of RAG, including its architecture and privacy safeguards, and present a case study demonstrating how RAG meets specific privacy requirements and the challenges faced during implementation.
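As a rough illustration of the retrieve-then-generate flow described above, the sketch below uses a local TF-IDF retriever (scikit-learn) and a placeholder generate() function standing in for the language model; the redact() step shows one possible privacy safeguard. All names here are hypothetical assumptions, not the presenter's actual system.

```python
# Minimal RAG sketch: retrieval with scikit-learn, generation stubbed out; names are hypothetical.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Order 1042 was shipped to alice@example.com on 2024-03-02.",
    "Our refund policy allows returns within 30 days of delivery.",
]

def redact(text):
    # One possible privacy safeguard: mask e-mail addresses before the text reaches the model.
    return re.sub(r"\S+@\S+", "[REDACTED_EMAIL]", text)

corpus = [redact(d) for d in documents]
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(corpus)

def retrieve(query, k=1):
    # Rank documents by cosine similarity to the query and return the top k.
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

def generate(prompt):
    # Placeholder for the language-model call (hosted or local LLM).
    return "[LLM answer conditioned on]\n" + prompt

query = "What is the refund policy?"
context = "\n".join(retrieve(query))
print(generate("Context:\n" + context + "\n\nQuestion: " + query))
```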
- The FBI attempted to modernize its systems and processes with a $400 million project called Sentinel to replace outdated methods that contributed to security breaches.
- The first attempt used a traditional "waterfall" approach with a large upfront design that led to scope creep, changing requirements, and delays. After spending over $170 million, the project was failing to deliver.
- A new Chief Information Officer, Zalmai Azmi, was appointed. With his background in IT leadership and military service, he faced the challenge of getting the troubled Sentinel project back on track.
The document discusses Richard Wallis and his work extending Schema.org to better describe bibliographic data. Wallis is an independent consultant who chairs several W3C community groups focused on expanding Schema.org for bibliographic and archives data. He has worked with organizations like OCLC and Google to develop vocabularies that extend Schema.org to describe over 330 million bibliographic resources in linked data.
This is a presentation I gave at Hadoop Summit San Jose 2014, on doing fuzzy matching at large scale using combinations of Hadoop & Solr-based techniques.
Data Modelling is an important tool in the toolbox of a developer. By building and communicating a shared understanding of the domain they're working with, their applications and APIs are more useable and maintainable. However, as you scale up your technical teams, how do you keep these benefits whilst avoiding time-consuming meetings every time something new comes along? This talk reminds us of key data modelling techniques and how our use of Kafka changes and informs them. It then examines how these patterns change as more teams join your organisation and how Kafka comes into its own in this world.
Talk at a Data Journalism BootCamp organised by ICFJ, World Bank Group and African Media Initiative in New Delhi to a group of 60 journalists, coders and social sector folks. Other amazing sessions included those from Govind Ethiraj of IndiaSpend, Andrew from BBC, Parul from Google, Nasr from HacksHacker, Thej from DataMeet and David from Code for Africa. http://delhi.dbootcamp.org/
A Data Warehouse is a collection of integrated, subject-oriented databases designed to support decision-making. It contains non-volatile data that is relevant to a specific point in time. An operational data store feeds the data warehouse with a stream of raw data for analysis. Metadata provides information about the data in the warehouse.
A Data Warehouse is a collection of integrated, subject-oriented databases designed to support decision-making. It contains non-volatile data that is relevant to a point in time. An operational data store feeds the data warehouse with a stream of raw data. Metadata provides information about the data in the warehouse.
This document discusses Richard Wallis and his work extending the Schema.org vocabulary. It notes that Wallis is an independent consultant who founded Data Liberate and currently works with OCLC and Google. He chairs several W3C community groups focused on extending Schema.org for bibliographic and archive data. The document outlines how Schema.org was created in 2011 as a general purpose vocabulary for describing things on the web and how it can be extended through groups like the Schema Bib Extend community to cover additional domains beyond its original 640 types.
Slides for VU Web Technology course lecture on "Search on the Web", explaining how search engines work, some basic information laws and inverted indices.
How to Reveal Hidden Relationships in Data and Risk Analytics (Ontotext)
Imagine a risk analysis manager or compliance officer who can easily discover relationships like this: Big Bucks Café out of Seattle controls My Local Café in NYC through an offshore company. Such a discovery can be a game changer if My Local Café pretends to be an independent small enterprise while Big Bucks has recently experienced financial difficulties.
The 2016 State of Storage in Virtualization Survey Results (Flash Storage)
ActualTech Media polled over 1,000 IT pros from across the world in order to learn about what organizations of all shapes and sizes are doing with regard to storage and virtualization, and to gain insight into the kinds of challenges that are being faced by these organizations and how they are leveraging such services as cloud storage, VMware Virtual Volumes (VVols), and more.
Invited Talk at Modern Data Management Systems Summit on August 29-30, 2014 at Tsinghua University in Beijing, China.
http://ise.thss.tsinghua.edu.cn/MDMS/English/program.jsp
Abstract:
Modern enterprises are increasingly relying on complex analyses on large data sets to drive business decisions. Tasks such as root cause analysis from system logs and lead generation based on social media, customer retention and digital marketing are rapidly gaining importance. These applications generally consist of three major analytic phases: text analytics, semi-structured data processing (joins, group-by, aggregation), and statistical/predictive modeling. The size of the datasets in conjunction with the complexity of the analysis necessitates large-scale distributed processing of the analytical algorithms. At IBM we are building tools and technologies based on declarative languages to support each of these analytic phases. The declarative nature of the language abstracts away the need for programmer-optimization. Furthermore, the syntax of these languages is designed to appeal to the corresponding communities. As an example for statistical modeling, we expose a high-level language with syntax similar to R -- a very popular statistical processing language.
In this talk I will give an overview of some real-world big data applications we are currently working on and use that to motivate the need for declarative analytics consisting of the three major phases discussed above. I will then describe, in some detail, declarative systems for text analytics along with a discussion on speeds, feeds and comparisons.
This document discusses getting to know data using R. It begins by outlining the typical steps in a data analysis, including defining the question, obtaining and cleaning the data, performing exploratory analysis, modeling, interpreting results, and creating reproducible code. It then describes different types of data science questions from descriptive to mechanistic. The remainder of the document provides more details on descriptive, exploratory, inferential, predictive, causal, and mechanistic analysis. It also discusses R, including its design, packages, data types like vectors, matrices, factors, lists, and data frames.
IRE "Better Watchdog" workshop presentation "Data: Now I've got it, what do I...J T "Tom" Johnson
The document discusses analyzing data that has been collected through investigative journalism projects. It provides tips on storing data in the cloud and bookmarking tools, challenges in analyzing poorly formatted government data from New Mexico's transparency portal, and strategies for analyzing both qualitative and quantitative data through tools like spreadsheets, databases, and data visualization programs. The goal is to turn collected data into useful information that can be shared through stories.
Bus infoengineers january_25_2013_engr185_final in class (Michael Oppenheim)
The document discusses strategies and sources for locating business information. It outlines developing an efficient research strategy and sample product research strategies. These include using an interactive libguide on business information for engineers and finding industry, company, market, and consumer information. It also discusses finding government information on consumers and regulations. Sources mentioned include IBISWorld, MarketLine Advantage, Business Source Complete, Factiva, SimplyMap, American FactFinder, and various government websites. The document encourages following up with the librarian for any future questions.
This document provides guidance on researching companies and industries. It outlines a four step process: 1) choosing a topic and scope of research, 2) accessing relevant information sources effectively, 3) analyzing and evaluating sources, and 4) presenting findings. Key information sources discussed include company websites, annual reports, databases like Hoover's, Factiva, and Mergent, as well as government sites like EDGAR and SEDAR for filings and industry sources. Tips are provided on searching techniques, evaluating source reliability and bias, and analyzing business information.
Leveraging Wikipedia-based Features for Entity Relatedness and Recommendations (Nitish Aggarwal)
This presentation describes three contributions of my PhD work:
1. Distributional Semantics for Entity Relatedness (DiSER)
2. Wikipedia Features for Entity Recommendations (WiFER)
3. Non-Orthogonal Explicit Semantic Analysis (NESA) for Word Relatedness
Further, it presents some of our work in collaboration with IBM Watson and Yahoo Research.
How IKANOW uses MongoDB to help organizations solve really big problems (ikanow)
The open source document analysis platform. Or, how IKANOW uses MongoDB to help organizations solve really big problems.
Infinit.e is an open source document discovery and analysis platform developed by IKANOW. It uses a number of open source tools like MongoDB to ingest, enrich, analyze, and visualize structured and unstructured documents at scale. MongoDB is well-suited for Infinit.e because it allows for flexible schema-less storage of documents, supports the common JSON format, and is highly scalable for large document workloads. Infinit.e demonstrates how document analysis of tweets about a MongoDB conference can provide insight into who is tweeting, how people are connected, what topics are being discussed, and sentiment analysis.
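A tiny pymongo sketch (the driver, collection and field names are my assumptions, not Infinit.e's actual schema) of what the schema-less JSON storage described above looks like in practice: documents of different shapes live in one collection and can still be queried together.

```python
# Sketch of schema-less JSON document storage and querying with pymongo; field names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
docs = client["infinite_demo"]["documents"]

# Two documents with different shapes can live in the same collection.
docs.insert_one({"source": "twitter",
                 "text": "Great talk at the MongoDB conference!",
                 "entities": [{"type": "Event", "value": "MongoDB conference"}]})
docs.insert_one({"source": "news", "title": "Storage trends", "sentiment": 0.4})

# Query across whatever fields happen to exist.
for d in docs.find({"entities.type": "Event"}):
    print(d["text"])
```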
First presentation of the local-to-global range of gifts table that disintermediates, eliminates NGO waste and theft, and allows for meeting the needs of all people through a data-driven sparse matrix that leverages multiple individual humans to meet specific needs.
CSCI 340 Final Group Project - Natalie Warden, Arturo Gonzalez, R.docx (mydrynan)
CSCI 340 Final Group Project
Natalie Warden, Arturo Gonzalez, Ricky Gaji
Introduction
As our world continues to rely on technology to store our information, issues concerning data storage and organization will arise
Association of Computing Machinery (ACM) has asked us to prepare a database through which they can easily and effectively access this information
In this project we have created a tier system of entities, established the relationships between them, and decreased redundancy by eliminating repeating attributes
Responsibility Matrix (Task / Person: Natalie, Arturo, Ricky). Analysis: M S; ER-Diagram: S M; Redundancy: S S S; SQL: M S; Logical Design: M; Analysis Doc: M; Relationships Doc: M; ReadMe Doc: S M; Database: M S S
Software Used:
Analysis:
Google Docs - helped to bring the group together and organize all our information to make sure we were on the same page.
Google Slides - served as the main platform for putting together our presentation and visualizing what we planned to do.
Draw.io - used to build our many ER diagrams
Database Design:
x10 web hosting - hosted our website and had the tools necessary to get started on the database
phpMyAdmin - here we created our database tables and made sure all the attributes' data types and each entity's primary key, foreign keys, and attributes were correct.
MySQL databases - used as the relational database management system
generatedata.com - used to create "dummy" data to incorporate in the SQL testing
Analysis and Findings
Problems/Results
Final Decision
Decided to create entities for leadership
Took inspiration from University database setup
ER-Diagram
Tables
Tables
Building the ACM Database
Populated Tables
SQL/RESULTS
Name
Course
Date
Instructor
Benchmark - Gospel Essentials
In at least 150 words, complete your introductory paragraph with a thesis statement in which you will address each of the following six sections with at least one paragraph each.
God
In at least 150 words, respond thoroughly to the questions in the assignment. Be sure to include citations.
Humanity
In at least 150 words, respond thoroughly to the questions in the assignment. Be sure to include citations.
Jesus
In at least 150 words, respond thoroughly to the questions in the assignment. Be sure to include citations.
Restoration
In at least 150 words, respond thoroughly to the questions in the assignment. Be sure to include citations.
Analysis
In at least 150 words, respond thoroughly to the questions in the assignment. Be sure to include citations.
Reflection
In at least 150 words, respond thoroughly to the questions in the assignment. Be sure to include citations.
Conclusion
In at least 150 words, synthesize the main points, pulling the ideas of the paper together. Be sure to include citations.
References
Author, A. A., .
Towards Explainable Fact Checking (DIKU Business Club presentation) (Isabelle Augenstein)
Outline:
- Fact checking – what is it and why do we need it?
- False information online
- Content-based automatic fact checking
- Explainability – what is it and why do we need it?
- Making the right predictions for the right reasons
- Model training pipeline
- Explainable fact checking – some first solutions
- Rationale selection
- Generating free-text explanations
- Wrap-up
Turinton Insights - Enterprise Agentic AI Platform (vikrant530668)
An enterprise agentic AI platform that helps organizations build AI 10X faster and 3X more optimised, yielding 5X ROI. It helps organizations build an AI-driven data fabric within their data ecosystem and infrastructure.
It enables users to explore enterprise-wide information and build enterprise AI apps, ML models, and agents. It maps and correlates data across databases, files, and SOR, creating a unified data view using AI. Leveraging AI, it uncovers hidden patterns and potential relationships in the data, forms relationships between data objects and business processes, and observes anomalies for failure prediction and proactive resolution.
cPanel Dedicated Server Hosting at Top-Tier Data Center comes with a Premier ... (soniaseo850)
cPanel Dedicated Server Hosting at Top-Tier Data Center comes with a Premier Metal License. Enjoy powerful performance, full control & enhanced security.
The rise of AI Agents - Beyond Automation_ The Rise of AI Agents in Service ... (Yasen Lilov)
Deep dive into how an agency service-based business can leverage AI and AI agents for automation and scale. Case study example with the platforms used outlined in the slides.
Exploratory data analysis (EDA) is used by data scientists to analyze and inv... (jimmy841199)
EDA review" can refer to several things, including the European Defence Agency (EDA), Electronic Design Automation (EDA), Exploratory Data Analysis (EDA), or Electron Donor-Acceptor (EDA) photochemistry, and requires context to understand the specific meaning.
In the era of big data and AI, ethical data handling is no longer optional; it's essential. This presentation explores the core principles of data ethics, data privacy regulations (like GDPR), consent, bias, and the responsibilities analysts must uphold. Learn how to protect users and build trust through responsible data practices.
Many confuse artificial intelligence with data science, but they serve distinct purposes. In this engaging slide deck, you'll discover how AI, machine learning, and data science overlap, where they differ, and how businesses use them together to unlock smart solutions. Ideal for beginners and tech-curious professionals.
Information Security Management-Planning 1.pptx (FrancisFayiah)
Information Security Management Planning refers to the process of designing and implementing a structured approach to protect an organization's information assets against threats, vulnerabilities, and risks. It is an essential part of overall corporate governance and risk management. Here's a comprehensive overview:
A key metric for current SaaS companies is Weekly Active Users. It's also a dangerous one because the graph we use to represent it, even when it looks up and to the right, can be hiding a ticking growth bomb.
This bomb is the byproduct of how we think and how we try to improve Activation, that stage that goes from Signup to happy loyal user.
In this talk, you will learn a new way to think about Activation:
- What are the users trying to achieve during this period?
- What is blocking them in their journey to happy users?
- How can you solve the blockers without creating bigger problems down the funnel?
- How to measure all of that so you have an accurate depiction of your current activation.
1. Temporal Entity Random Indexing
Annalina Caputo, Gary Munnelly, Seamus Lawless
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
2. Some things stay the same
[1] https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Arnold_Schwarzenegger_1974.jpg/220px-Arnold_Schwarzenegger_1974.jpg
[2] http://epmghispanic.media.clients.ellingtoncms.com/img/photos/2017/08/01/Arnold_Schwarzenegger_t750x550.jpg
[3] http://cdn.cultofmac.com/wp-content/uploads/2014/05/arnold3.jpg
3. Some things change
Body Builder → Actor → Governor
Is it possible to automatically identify and quantify the contextual shift surrounding significant entities?
[1] https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Arnold_Schwarzenegger_1974.jpg/220px-Arnold_Schwarzenegger_1974.jpg
[2] http://epmghispanic.media.clients.ellingtoncms.com/img/photos/2017/08/01/Arnold_Schwarzenegger_t750x550.jpg
[3] http://cdn.cultofmac.com/wp-content/uploads/2014/05/arnold3.jpg
4. The Corpus
- Provided by the Linguistic Data Consortium [1]
- 1.8 million articles written and published between January 1, 1987 and June 19, 2007
- 5,268,315 recognised entities
- 22,738 entities which appear in every year
5. Method
Pipeline: Entity Linking → TRI → Time Series → Change Point Detection
- Entity Linking: recognition and linking of entity mentions to DBpedia
- TRI: run TRI on the New York Corpus (1987-2007), producing a WordSpace for each year
- Time Series: provide a time series for each word/entity
- Change Point Detection: detect significant changes in the time series
7. Entity Linking
- Task of linking entity mentions to entries in a knowledge base (DBpedia):
  - CogComp [2] for Named Entity Recognition
  - AGDISTIS [3] for Named Entity Linking
Example: "Asked to name the leader of the Democratic Party, Mr. Lieberman did not immediately mention Mr. Gore, the standard bearer from 2000, who beat George W. Bush in the popular vote."
8. Entity Linking
- Task of linking entity mentions to entries in a knowledge base (DBpedia):
  - CogComp [2] for Named Entity Recognition
  - AGDISTIS [3] for Named Entity Linking
Example: "Asked to name the leader of the [dbp:Democratic_Party_(United_States)], Mr. [dbp:Joe_Lieberman] did not immediately mention Mr. [dbp:Al_Gore], the standard bearer from 2000, who beat [dbp:George_W._Bush] in the popular vote."
[1] https://upload.wikimedia.org/wikipedia/commons/thumb/7/73/US_Democratic_Party_Logo.svg/300px-US_Democratic_Party_Logo.svg.png
[2] https://en.wikipedia.org/wiki/File:George-W-Bush.jpeg
[3] https://en.wikipedia.org/wiki/File:Al_Gore,_Vice_President_of_the_United_States,_official_portrait_1994.jpg
[4] https://upload.wikimedia.org/wikipedia/commons/thumb/6/62/Joe_Lieberman_official_portrait_2.jpg
9. Method
Pipeline so far: Entity Linking → TRI
- Entity Linking: recognition and linking of entity mentions to DBpedia
- TRI: run TRI on the New York Corpus (1987-2007), producing a WordSpace for each year
11. Random Indexing [4,5]
Random vector: … 0 0 1 0 0 0 0 0 0 -1 …
- Sparse
- High dimensional
- Ternary {-1, 0, +1}
- Small number of randomly distributed non-zero elements
Building the WordSpace:
- Assign a random vector to each term in the corpus vocabulary
- The semantic vector for a term is the sum of the context vectors co-occurring with the term
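A minimal NumPy sketch of the construction just described, using illustrative parameters (dimensionality, number of non-zero entries, window size) rather than the ones used in the talk: each vocabulary term gets a sparse ternary random vector, and a term's semantic vector is the sum of the random vectors of the terms co-occurring with it.

```python
# Random Indexing sketch with NumPy; DIM, NONZERO and WINDOW are illustrative parameters.
from collections import defaultdict
import numpy as np

DIM, NONZERO, WINDOW = 1000, 10, 2
rng = np.random.default_rng(0)

def random_vector():
    # Sparse ternary vector: a few randomly placed +1/-1 entries, zeros elsewhere.
    v = np.zeros(DIM)
    idx = rng.choice(DIM, size=NONZERO, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=NONZERO)
    return v

def build_wordspace(sentences):
    # sentences: list of tokenised sentences (lists of strings).
    vocab = {w for s in sentences for w in s}
    random_vectors = {w: random_vector() for w in vocab}       # one random vector per term
    semantic = defaultdict(lambda: np.zeros(DIM))
    for sentence in sentences:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - WINDOW), min(len(sentence), i + WINDOW + 1)
            for j in range(lo, hi):
                if j != i:
                    semantic[word] += random_vectors[sentence[j]]  # sum co-occurring random vectors
    return dict(semantic)

space = build_wordspace([["arnold", "wins", "election"], ["arnold", "stars", "in", "film"]])
print(space["arnold"][:10])
```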
13. TRI: Temporal Random Indexing [6]
- Corpus with temporal information: split the corpus into several time periods
- Build a WordSpace for each time period
- Words in different WordSpaces are comparable!
Corpus87 → RI Space87, Corpus88 → RI Space88, …, Corpus07 → RI Space07
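The sketch below (same illustrative parameters and helper style as the previous one; this is my sketch, not the authors' code) shows the point of TRI: the random vectors are generated once and shared across all time periods, so WordSpaces built per year yield word vectors that can be compared directly with cosine similarity.

```python
# Temporal Random Indexing sketch: shared random vectors, one WordSpace per time period.
from collections import defaultdict
import numpy as np

DIM, NONZERO, WINDOW = 1000, 10, 2
rng = np.random.default_rng(0)

def ternary_vector():
    v = np.zeros(DIM)
    idx = rng.choice(DIM, size=NONZERO, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=NONZERO)
    return v

# The same random vector is reused whenever a word reappears, in any year.
random_vectors = defaultdict(ternary_vector)

def build_wordspace(sentences):
    semantic = defaultdict(lambda: np.zeros(DIM))
    for sentence in sentences:
        for i, word in enumerate(sentence):
            for j in range(max(0, i - WINDOW), min(len(sentence), i + WINDOW + 1)):
                if j != i:
                    semantic[word] += random_vectors[sentence[j]]
    return dict(semantic)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

corpus_by_year = {                       # toy stand-in for the yearly corpora
    2000: [["pentagon", "budget", "report"]],
    2001: [["pentagon", "attack", "september"]],
}
space_by_year = {y: build_wordspace(s) for y, s in corpus_by_year.items()}
# Because the random vectors are shared, this comparison across spaces is meaningful.
print(cosine(space_by_year[2000]["pentagon"], space_by_year[2001]["pentagon"]))
```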
14. Method
Pipeline so far: Entity Linking → TRI → Time Series
- Entity Linking: recognition and linking of entity mentions to DBpedia
- TRI: run TRI on the New York Corpus (1987-2007), producing a WordSpace for each year
- Time Series: provide a time series for each word/entity
16. Method
Pipeline: Entity Linking → TRI → Time Series → Change Point Detection
- Entity Linking: recognition and linking of entity mentions to DBpedia
- TRI: run TRI on the New York Corpus (1987-2007), producing a WordSpace for each year
- Time Series: provide a time series for each word/entity
- Change Point Detection: detect significant changes in the time series
18. Evaluation Methodology
- 20 WordSpaces: one for each year
- Context window of 10 words
- Selected the top 100 entities with the highest temporal shift
- Selected the largest group of entities which underwent a semantic shift in the same year
19. Some results
- 12 entities are associated with a context shift in 2001; 9 of them are statistically significant
Named Entity p-value
Federal_Bureau_of_Investigation 0.0649
Texas 0.0017
West 0.0963
Saddam_Hussein 0.0026
Pentagon 0.019
Department_of_Justice 0.5033
Congress 0.0185
White_House 0.0004
George_H._W._Bush 0.0031
New_York 0.0138
Republican_Party_(United_States) 0.0019
American_Motors 0.0495
27. Conclusions and Future Work
- TERI allows the automatic identification of contextual shifts of entities of interest
- Does not require alignment between spaces
- It is incremental; no need for retraining
- Future work:
  - Application on streams of data such as Twitter
  - Build a dataset for temporal entity context shift
  - Play with different time slice granularities
28. Thank you
Questions
annalina.caputo@adaptcentre.ie
@headlighty
https://tinyurl.com/ybv7za9t
30. Time Series
Several time series Γ at the time interval k:
- log frequency: word frequency in each time period k
- point-wise: cosine similarity between word vectors across two time periods
- cumulative: considers a cumulative vector of the previous k-1 time periods
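A hedged sketch of the three series for one entity, given its per-year semantic vectors and raw frequencies; the exact formulations used in the talk may differ in details (e.g. log base or normalisation).

```python
# Sketch of the three per-entity time series; exact formulations in TERI may differ.
import numpy as np

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def entity_time_series(vectors_by_year, counts_by_year):
    # vectors_by_year: year -> semantic vector; counts_by_year: year -> raw frequency.
    years = sorted(vectors_by_year)
    log_freq = {y: float(np.log1p(counts_by_year.get(y, 0))) for y in years}
    pointwise = {}    # similarity between word vectors of two adjacent time periods
    cumulative = {}   # similarity against the summed vector of the previous periods
    running = np.zeros_like(vectors_by_year[years[0]])
    for prev, curr in zip(years, years[1:]):
        pointwise[curr] = cosine(vectors_by_year[prev], vectors_by_year[curr])
    for y in years:
        if np.any(running):
            cumulative[y] = cosine(running, vectors_by_year[y])
        running = running + vectors_by_year[y]
    return log_freq, pointwise, cumulative
```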
31. Change point detection: Mean shift model
- Mean shift of Γ pivoted at time period j
- Search for statistically significant mean shifts
- Bootstrapping approach under the null hypothesis that there is no change in meaning
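To close, a sketch of one plausible mean-shift statistic pivoted at period j, with a bootstrap (permutation) significance test under the null hypothesis of no change; this is a common formulation and not necessarily the exact statistic used in the talk.

```python
# Mean-shift change-point sketch with a bootstrap significance test; one plausible formulation.
import numpy as np

rng = np.random.default_rng(0)

def mean_shift(series, j):
    # Shift in the series mean when pivoting at time period j.
    return float(series[j:].mean() - series[:j].mean())

def change_point(series, n_boot=1000):
    # Pick the pivot with the largest absolute mean shift, then estimate a p-value by
    # recomputing the statistic on shuffled copies of the series (null: no change in meaning).
    pivots = range(1, len(series))
    j_star = max(pivots, key=lambda j: abs(mean_shift(series, j)))
    observed = abs(mean_shift(series, j_star))
    null = [max(abs(mean_shift(rng.permutation(series), j)) for j in pivots)
            for _ in range(n_boot)]
    p_value = float(np.mean([s >= observed for s in null]))
    return j_star, observed, p_value

# Toy series that drops after index 10 (e.g. a point-wise similarity series).
series = np.concatenate([np.full(10, 0.9), np.full(10, 0.6)]) + rng.normal(0, 0.02, 20)
print(change_point(series))
```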