Temporal Entity Random Indexing
Annalina Caputo, Gary Munnelly, Seamus Lawless
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
www.adaptcentre.ie
Some things stay the same
[1] https://upload.wikimedia.org/wikipedia/commons/thumb/0/05/Arnold_Schwarzenegger_1974.jpg/220px-Arnold_Schwarzenegger_1974.jpg
[2] http://epmghispanic.media.clients.ellingtoncms.com/img/photos/2017/08/01/Arnold_Schwarzenegger_t750x550.jpg
[3] http://cdn.cultofmac.com/wp-content/uploads/2014/05/arnold3.jpg
Some things change
Body Builder, Actor, Governor
Is it possible to automatically identify and quantify the contextual shift surrounding significant entities?
The Corpus
• Provided by the Linguistic Data Consortium¹
• 1.8 million articles written and published between January 1, 1987 and June 19, 2007
• 5,268,315 recognised entities
• 22,738 entities which appear in every year
Method
1. Entity Linking: recognition and linking of entity mentions to DBpedia
2. TRI: run TRI on the New York Times corpus (1987-2007), building a WordSpace for each year
3. Time Series: provide a time series for each word/entity
4. Change Point Detection: detect significant changes in the time series
Entity Linking
• Task of linking entity mentions to entries in a knowledge base (DBpedia):
  – CogComp² for Named Entity Recognition
  – AGDISTIS³ for Named Entity Linking
Asked to name the leader of the Democratic Party, Mr. Lieberman did not immediately mention Mr. Gore, the standard bearer from 2000, who beat George W. Bush in the popular vote.
The same sentence after entity linking:
Asked to name the leader of the [dbp:Democratic_Party_(United_States)], Mr. [dbp:Joe_Lieberman] did not immediately mention Mr. [dbp:Al_Gore], the standard bearer from 2000, who beat [dbp:George_W._Bush] in the popular vote.
[1] https://upload.wikimedia.org/wikipedia/commons/thumb/7/73/US_Democratic_Party_Logo.svg/300px-US_Democratic_Party_Logo.svg.png
[2] https://en.wikipedia.org/wiki/File:George-W-Bush.jpeg
[3] https://en.wikipedia.org/wiki/File:Al_Gore,_Vice_President_of_the_United_States,_official_portrait_1994.jpg
[4] https://upload.wikimedia.org/wikipedia/commons/thumb/6/62/Joe_Lieberman_official_portrait_2.jpg
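The annotated sentence can be reproduced with a small sketch. It assumes the mentions have already been recognised and resolved; the `link_mentions` helper and the hard-coded mention table are hypothetical and stand in for the CogComp and AGDISTIS steps, whose APIs are not shown here.

```python
import re

def link_mentions(text, mentions):
    """Replace each surface form with a [dbp:...] tag in a single pass.

    `mentions` maps surface forms to DBpedia identifiers. It is a
    hypothetical, pre-resolved table: a real pipeline would obtain it from
    a named entity recogniser followed by an entity linker.
    """
    # Longest surface forms first, so a short mention cannot match inside
    # a longer one.
    surfaces = sorted(mentions, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(s) for s in surfaces))
    return pattern.sub(lambda m: f"[dbp:{mentions[m.group(0)]}]", text)

sentence = (
    "Asked to name the leader of the Democratic Party, Mr. Lieberman did not "
    "immediately mention Mr. Gore, the standard bearer from 2000, who beat "
    "George W. Bush in the popular vote."
)
mentions = {
    "Democratic Party": "Democratic_Party_(United_States)",
    "Lieberman": "Joe_Lieberman",
    "Gore": "Al_Gore",
    "George W. Bush": "George_W._Bush",
}
linked = link_mentions(sentence, mentions)
print(linked)
```

Plain string matching ignores context, so this only illustrates the output format; disambiguating a mention such as "Bush" is exactly what the linking step is for.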
Distributional Semantic Models
• Analysis of word-usage statistics over huge corpora
• Geometric space of concepts (WordSpace)
• Similar words are represented close together in the space
Random Indexing⁴,⁵
Random Vector
… 0 0 1 0 0 0 0 0 0 -1 …
• Sparse
• High dimensional
• Ternary {-1, 0, +1}
• Small number of randomly distributed non-zero elements
Building the WordSpace
• Assign a random vector to each term in the corpus vocabulary
• The semantic vector for a term is the sum of the random vectors of the terms that co-occur with it
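The two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation in references 4 and 5: the dimension, the number of non-zero elements, the window size and the toy sentences are all invented for the example.

```python
import random

def random_vector(dim, nonzero, rng):
    """Sparse ternary vector: `nonzero` entries are +1 or -1, the rest 0."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def build_wordspace(sentences, random_vectors, window=2):
    """Semantic vector of a term = sum of its neighbours' random vectors."""
    dim = len(next(iter(random_vectors.values())))
    space = {}
    for tokens in sentences:
        for i, term in enumerate(tokens):
            sem = space.setdefault(term, [0] * dim)
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue  # a term is not its own context
                ctx = random_vectors[tokens[j]]
                for d in range(dim):
                    sem[d] += ctx[d]
    return space

# Toy data, invented for illustration only.
rng = random.Random(42)
sentences = [["white", "house", "press", "office"], ["white", "house", "staff"]]
vocab = sorted({t for s in sentences for t in s})
random_vectors = {t: random_vector(dim=20, nonzero=4, rng=rng) for t in vocab}
space = build_wordspace(sentences, random_vectors)
```

Because the semantic vectors are plain sums, new text can be folded in by adding further random vectors, without recomputing what is already there.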
Random Indexing
A WordSpace is a snapshot of a specific corpus: it does not take temporal information into account.
TRI: Temporal Random Indexing⁶
• Corpus with temporal information: split the corpus into several time periods
• Build a WordSpace for each time period
• Words in different WordSpaces are comparable!
Corpus87 → RI → Space87
Corpus88 → RI → Space88
…
Corpus07 → RI → Space07
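A small sketch of why the per-period spaces stay comparable: every period reuses the same random vectors, so the semantic vectors of all periods live in one coordinate system and can be compared directly with cosine similarity. The corpus, dimensions and helper names below are invented for illustration; the code in reference 6 may differ in its details.

```python
import math
import random

def random_vector(dim, nonzero, rng):
    """Sparse ternary vector with `nonzero` entries set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec

def build_space(sentences, random_vectors, window=2):
    """One WordSpace: sum the random vectors of each term's neighbours."""
    space = {}
    for tokens in sentences:
        for i, term in enumerate(tokens):
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            for other in neighbours:
                ctx = random_vectors[other]
                sem = space.setdefault(term, [0] * len(ctx))
                for d, value in enumerate(ctx):
                    sem[d] += value
    return space

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy corpus split by year (invented data).
corpus_by_year = {
    1987: [["governor", "signs", "budget"]],
    1988: [["governor", "signs", "budget"]],
    2007: [["governor", "stars", "film"]],
}
rng = random.Random(0)
vocab = sorted({t for docs in corpus_by_year.values() for s in docs for t in s})
# The same random vectors are shared by every time period.
shared_vectors = {t: random_vector(50, 4, rng) for t in vocab}
spaces = {y: build_space(docs, shared_vectors) for y, docs in corpus_by_year.items()}

sim_same = cosine(spaces[1987]["governor"], spaces[1988]["governor"])
sim_changed = cosine(spaces[1987]["governor"], spaces[2007]["governor"])
print(sim_same, sim_changed)
```

If each year drew its own random vectors instead, the spaces would need an alignment step before any cross-year comparison.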
Time Series
[Figure: two semantic vectors for the same term, one per time period, compared to produce a point of its time series]
−0.04, −0.15, 0.04, 0.00, 0.01, …
0.00, −0.22, 0.05, −0.01, −0.03, …
Change point detection: Mean shift model⁷
[Figure: CUSUM of the similarity series; the max-min difference of the original series is compared with the max-min differences of the bootstrap series]
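The figure labels describe a familiar recipe: take the cumulative sum (CUSUM) of the deviations from the mean, measure its max-min range, and compare that range with the ranges obtained from shuffled copies of the series. The sketch below follows that generic recipe; it is an assumption that this matches the code in reference 7, and the series and sample count are invented.

```python
import random

def cusum(series):
    """Cumulative sum of deviations from the series mean."""
    mean = sum(series) / len(series)
    total, out = 0.0, []
    for x in series:
        total += x - mean
        out.append(total)
    return out

def cusum_range(series):
    """Max-min difference of the CUSUM, including the starting value 0."""
    values = [0.0] + cusum(series)
    return max(values) - min(values)

def bootstrap_p_value(series, samples=1000, seed=0):
    """Share of shuffled series whose CUSUM range reaches the observed one."""
    rng = random.Random(seed)
    observed = cusum_range(series)
    shuffled = list(series)
    hits = 0
    for _ in range(samples):
        rng.shuffle(shuffled)
        if cusum_range(shuffled) >= observed:
            hits += 1
    return hits / samples

def estimate_change_index(series):
    """First index of the new regime: one past the largest |CUSUM| value."""
    values = cusum(series)
    peak = max(range(len(values)), key=lambda i: abs(values[i]))
    return peak + 1

# Invented similarity series with a drop after the 14th point.
similarities = [0.9] * 14 + [0.4] * 6
p_value = bootstrap_p_value(similarities)
change_at = estimate_change_index(similarities)
print(p_value, change_at)
```

Shuffling destroys any ordering in the data, so the shuffled ranges show how large the CUSUM range gets when there is no change at all.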
Evaluation Methodology
• 20 WordSpaces: one for each year
• Context window of 10 words
• Selected the top 100 entities with the highest temporal shift
• Selected the largest group of entities which underwent a semantic shift in the same year
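The last two selection steps amount to a sort followed by a group count. A minimal sketch, assuming each detected change is available as an (entity, year, shift score) triple; that record layout, the `largest_shift_group` name and the sample rows are hypothetical.

```python
from collections import Counter

def largest_shift_group(change_points, top_n=100):
    """Keep the `top_n` strongest shifts, then return the most common year
    among them together with the entities that shifted in that year.

    `change_points` is a list of (entity, year, shift_score) triples; this
    layout is an assumption made for the example.
    """
    top = sorted(change_points, key=lambda row: row[2], reverse=True)[:top_n]
    year, _count = Counter(y for _e, y, _s in top).most_common(1)[0]
    return year, [e for e, y, _s in top if y == year]

# Invented rows, for illustration only.
rows = [
    ("Entity_A", 2001, 0.9),
    ("Entity_B", 2001, 0.8),
    ("Entity_C", 1995, 0.7),
    ("Entity_D", 2003, 0.1),
]
year, entities = largest_shift_group(rows, top_n=3)
print(year, entities)
```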
Some results
• 12 entities are associated with a context shift in 2001; 9 of them are statistically significant (p < 0.05)

Named Entity                        p-value
Federal_Bureau_of_Investigation     0.0649
Texas                               0.0017
West                                0.0963
Saddam_Hussein                      0.0026
Pentagon                            0.019
Department_of_Justice               0.5033
Congress                            0.0185
White_House                         0.0004
George_H._W._Bush                   0.0031
New_York                            0.0138
Republican_Party_(United_States)    0.0019
American_Motors                     0.0495
Some results: White House
[Figure: similar entities in 2000 not present in 2001, and similar entities in 2001 not present in 2000]
Some results: Republican Party (US)
[Figure: similar entities in 2000 not present in 2001, and similar entities in 2001 not present in 2000]
Some results: George H. W. Bush
[Figure: similar entities in 2000 not present in 2001, and similar entities in 2001 not present in 2000]
Similarity vs Frequentist approach
Something went wrong!
The Band!
Still something wrong!
George W. Bush presidency
Conclusions and Future Work
• TERI allows the automatic identification of contextual shifts of entities of interest
• Does not require alignment between spaces
• It is incremental: no need for retraining
• Future work
  – Application to data streams such as Twitter
  – Build a dataset for temporal entity context shift
  – Experiment with different time slice granularities
Thank you
Questions
annalina.caputo@adaptcentre.ie
@headlighty
https://tinyurl.com/ybv7za9t
References
1. https://catalog.ldc.upenn.edu/ldc2008t19
2. https://github.com/CogComp/cogcomp-nlp/tree/master/ner
3. https://github.com/dice-group/AGDISTIS
4. Magnus Sahlgren. An Introduction to Random Indexing. In Methods and Applications of Semantic Indexing Workshop at TKE 2005, vol. 5, 2005.
5. https://github.com/semanticvectors/semanticvectors
6. https://github.com/pippokill/tri
7. https://github.com/viveksck/langchangetrack
Time Series
Several time series Γ at the time interval k
• log frequency: word frequency in each time period k
• point-wise: cosine similarity between word vectors across two time periods
• cumulative: considers a cumulative vector of the previous k-1 time periods
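The three series can be sketched as follows. Several details are assumptions rather than something the slide states: the point-wise series compares consecutive periods, the cumulative series compares period k with the sum of all earlier vectors, and the frequency series uses the log of the relative frequency. Function names and inputs are illustrative.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def log_frequency_series(counts, totals):
    """Log relative frequency per period (assumes every count is > 0)."""
    return [math.log(c / t) for c, t in zip(counts, totals)]

def pointwise_series(vectors):
    """Cosine between the vectors of consecutive periods k-1 and k."""
    return [cosine(vectors[k - 1], vectors[k]) for k in range(1, len(vectors))]

def cumulative_series(vectors):
    """Cosine between period k and the sum of the vectors before it."""
    series = []
    running = list(vectors[0])
    for k in range(1, len(vectors)):
        series.append(cosine(running, vectors[k]))
        running = [a + b for a, b in zip(running, vectors[k])]
    return series

# Invented per-period vectors for one term: stable, then a sharp change.
vectors = [[1, 0], [1, 0], [0, 1]]
print(pointwise_series(vectors))
print(cumulative_series(vectors))
```

Restricting the analysis to entities that appear in every year keeps the counts positive and guarantees a vector exists for each period.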
Change point detection: Mean shift model
• Mean shift of Γ pivoted at time period j
• Search for statistically significant mean shifts
• Bootstrapping approach under the null hypothesis that there is no change in the meaning
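A simplified sketch of the mean shift test: for each pivot j, compare the mean of the series from j onwards with the mean before j, then estimate how often a shuffled series produces a shift at least as large. Using the absolute shift, breaking ties by the largest shift, and the sample count are choices made for this example; they are not taken from reference 7.

```python
import random

def mean_shift(series, j):
    """Mean of series[j:] minus mean of series[:j], for 0 < j < len(series)."""
    before, after = series[:j], series[j:]
    return sum(after) / len(after) - sum(before) / len(before)

def detect_change_point(series, samples=1000, seed=0):
    """Return (pivot, p_value) for the most significant mean shift.

    Null hypothesis: no change, so every ordering of the values is equally
    likely and shuffling yields the reference distribution.
    """
    rng = random.Random(seed)
    pivots = range(1, len(series))
    observed = {j: abs(mean_shift(series, j)) for j in pivots}
    exceed = {j: 0 for j in pivots}
    shuffled = list(series)
    for _ in range(samples):
        rng.shuffle(shuffled)
        for j in pivots:
            if abs(mean_shift(shuffled, j)) >= observed[j]:
                exceed[j] += 1
    p_values = {j: exceed[j] / samples for j in pivots}
    # Smallest p-value wins; ties go to the pivot with the largest shift.
    best = min(pivots, key=lambda j: (p_values[j], -observed[j]))
    return best, p_values[best]

# Invented series: the level drops from 0.9 to 0.3 at index 12.
series = [0.9] * 12 + [0.3] * 8
pivot, p_value = detect_change_point(series, samples=200)
print(pivot, p_value)
```

Picking the pivot with the smallest p-value means many pivots are tested on the same series, so the reported p-value is best read as a ranking score rather than an exact significance level.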

