Presentation at the "Proximity in Information Retrieval" symposium on the occasion of the PhD thesis defense of Jeroen Vuurens
April 26, 2017, Delft University of Technology
Enriching Linked Open Data with distributional semantics to study concept drift
1. Enriching Linked Open Data with distributional semantics to study concept drift
Astrid van Aggelen, Laura Hollink, Jacco van Ossenbruggen
Information Access Group
5. What is concept drift?
The phenomenon where the characteristics of a concept change over time, signifying a shift in meaning.
Intension: definitions, properties, necessary and sufficient conditions
e.g. science, gender nonconformity
Extension: the instances of a class
e.g. new Nobel prize winners, EU member states
Labels: words used to refer to a concept
e.g. migrant, refugee
Betti, A, van den Berg, H. Modelling the history of ideas. British Journal for the History of Philosophy, 22(4):812-835, 2014.
Wang, S, Schlobach, S, Klein, M. Concept drift and how to identify it. Journal of Web Semantics, 9(3):247-265, 2011.
Kenter, T, Wevers, M, Huijnen, P, de Rijke, M. Ad Hoc Monitoring of Vocabulary Shifts over Time. In Proceedings of CIKM, October 2015.
6. Linked Open Data
Classes, instances, their properties and labels are
explicitly encoded in formal languages.
[Diagram: a hierarchy of classes, each with instances (i) and labels attached.]
7. Concept drift problems in LOD applications
Semantic annotation under concept drift
Ontology matching under concept drift
Interpreting user input under concept drift
[Figures: miniatures of the three examples worked out on later slides: the ICD9 2008/2009 annotation example and the diagram of matched ontologies A and B with their new versions A' and B'.]
10. Semantic annotation under concept drift
Example adapted from:
Cédric Pruski, keynote presentation at Drift-a-LOD17, First workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, at EKAW, Bologna, Italy, 20 November 2016.
Annotated document: De Lignieres, B., et al. "Prevention of menstrual migraine by percutaneous oestradiol." British medical journal (Clinical research ed.) 293.6561 (1986): 1540.
ICD9 2008: Premenstrual tension syndromes; Tension syndromes; synonyms: "menstrual migraine"
ICD9 2009: Premenstrual tension syndromes; Tension syndromes; Menstrual migraine; Migraine; x
12. Interpreting user input under concept drift
http://www.delpher.nl provides access to the digitised collections from
the National Library of the Netherlands.
S: (n) Holocaust, final solution (the mass murder of Jews under the German Nazi regime from 1941 until 1945)
Semantic annotation / named entity detection
x
15. Ontology matching under concept drift
Example adapted from:
Julio Cesar dos Reis, Cédric Pruski, Marcos Da Silveira, Chantal Reynaud-Delaître. Understanding semantic mapping evolution by observing changes in biomedical ontologies. Journal of Biomedical Informatics, Volume 47, February 2014, Pages 71-82.
[Diagram, built up in three steps: Ontology A is matched to Ontology B; a new version A' of Ontology A puts a question mark (?) on the match; a new version B' of Ontology B adds a second one (??).]
19. Studying concept drift in Linked Open Data
Prediction: Which concept will be deleted / merged / split / edited?
Versioning: Keeping links & annotations up to date when entities change
RDF diff: Which syntactic change is also a semantic change?
Recent work: tracking changes on LOD scale
Table from: Käfer, Tobias, et al. "Observing linked data dynamics." Extended Semantic Web Conference. Springer Berlin Heidelberg, 2013.
Apart from these practical issues, it is also just interesting to see how knowledge evolves!
21. Changes in explicit knowledge are explicit too.
We can now measure where and when intensional, extensional and label changes took place.
But only to the extent that the facts are explicitly modelled.
The association between science and religion is not explicit.
The prevalent meaning of polysemous words is not explicit.
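As a minimal sketch of such a measurement, assume each version of a concept is available as three sets: properties (intension), instances (extension) and labels. The changes between two versions can then be read off with set differences. The `Concept` record, the `diff` helper and the property string are hypothetical, and the instance sets are a small illustrative subset of the EU member states example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """One version of a concept; the three fields are illustrative."""
    properties: frozenset  # intension
    instances: frozenset   # extension
    labels: frozenset      # words used to refer to the concept


def diff(old: Concept, new: Concept) -> dict:
    """Return, per aspect, what was added and removed between two versions."""
    changes = {}
    for aspect in ("properties", "instances", "labels"):
        before, after = getattr(old, aspect), getattr(new, aspect)
        changes[aspect] = {
            "added": sorted(after - before),
            "removed": sorted(before - after),
        }
    return changes


# Illustrative subset: Croatia joined the EU in 2013.
eu_2012 = Concept(
    properties=frozenset({"is a: sovereign state"}),  # hypothetical property
    instances=frozenset({"France", "Germany"}),
    labels=frozenset({"EU member state"}),
)
eu_2013 = Concept(
    properties=frozenset({"is a: sovereign state"}),
    instances=frozenset({"France", "Germany", "Croatia"}),
    labels=frozenset({"EU member state"}),
)

report = diff(eu_2012, eu_2013)
print(report["instances"])
```

Here only the extension changes. A diff like this cannot see a shift in how a label is used in text, because that information is not in the explicit model.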
24. Distributional semantics works well for detecting changes in word meaning
Evaluated e.g. in Frermann & Lapata, A Bayesian Model of Diachronic Meaning Change.
Examples by Aurelie Herbelot, http://aurelieherbelot.net/research/distributional-semantics-intro/
Matrices from https://cs224d.stanford.edu/lecture_notes/notes1.pdf
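The idea can be sketched in a few lines, assuming a word is represented per period by counts of the words that occur near it, and that change is scored as the cosine distance between the two count vectors. The toy sentences, the window size and the function names are illustrative assumptions. This is not the Bayesian model of Frermann & Lapata.

```python
import math
from collections import Counter


def context_counts(sentences, target, window=2):
    """Count the words occurring within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], lo) if j != i
                )
    return counts


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


# Toy corpora standing in for two time periods.
period_1 = ["the mouse ate the cheese", "a cat chased the mouse"]
period_2 = ["click the mouse button", "plug the mouse into the computer"]

early = context_counts(period_1, "mouse")
late = context_counts(period_2, "mouse")
change = cosine_distance(early, late)
print(f"change score for 'mouse': {change:.3f}")
```

On real corpora one would typically reweight or filter frequent function words and use dense embeddings rather than raw counts, but the comparison step stays the same.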
25. Image from: Lea Frermann. "Modelling fine-grained Change in Word Meaning over centuries from Large Collections of Unstructured Text." Keynote presentation at Drift-a-LOD17, First workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, at EKAW, Bologna, Italy, 20 November 2016.
27. Information on the level of individual words
Open questions:
Have synonyms changed too? And hyponyms?
Have all the words for political systems changed?
Which group of words has changed most?
30. Enriching Linked Open Data with distributional
semantics
GTAA
+
* A method to link the two data sources
* A data model to represent the combination
* An RDF dataset that can be queried: https://github.com/aan680/SemanticChange_data
Code
Embeddings derived from Google Books
Change scores for the top 10,000 words between each decade over 200 years.
31. WordNet Data Model
example of data from WordNet RDF
LexicalEntry: Form "democracy"@en; part of speech: noun
Synset (democracy): gloss "a political system in which the supreme power lies in a body of citizens who can elect people to represent them"; domain: noun.group
Synset (mobocracy): gloss "a political system in which a mob is the source of control; government by the masses"
Other synsets shown: Synset (political system), Synset (parliamentary democracy), Synset (political party)
Relations shown between synsets: hypernym (three links), meronym (one link)
32. Data model for change scores
{lexical entry, decade 1, decade 2, change score}
33. Data model for change scores
8,878 matches (out of 10,000 words), mapped onto 12,469 lexical entries
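To make the record and the linking step concrete, here is a minimal sketch under stated assumptions: a change score is held as a record with the four fields listed above, and a word is linked to every lexical entry whose written form equals it, so one word can map to several entries (for example a noun and a verb). The record type, the entry identifiers, the words and the scores are invented for illustration. They are not taken from the published dataset, and string matching on the written form is an assumed linking rule.

```python
from typing import NamedTuple


class ChangeScore(NamedTuple):
    lexical_entry: str  # identifier of a lexical entry (hypothetical IDs below)
    decade_1: int
    decade_2: int
    score: float


def link(word_scores, entries_by_form):
    """Attach each word's scores to all lexical entries sharing its form.

    word_scores: {word: [(decade_1, decade_2, score), ...]}
    entries_by_form: {written form: [lexical entry id, ...]}
    Returns the linked records and the words without a matching entry.
    """
    records, unmatched = [], []
    for word, scores in word_scores.items():
        entries = entries_by_form.get(word, [])
        if not entries:
            unmatched.append(word)
        for entry in entries:
            records.extend(ChangeScore(entry, d1, d2, s) for d1, d2, s in scores)
    return records, unmatched


# Invented sample data.
word_scores = {
    "broadcast": [(1900, 1910, 0.12)],
    "zzyzx": [(1900, 1910, 0.30)],
}
entries_by_form = {"broadcast": ["broadcast-n", "broadcast-v"]}

records, unmatched = link(word_scores, entries_by_form)
print(len(records), "records;", "unmatched:", unmatched)
```

A one-to-many mapping of this kind would explain why the number of lexical entries on the slide exceeds the number of matched words.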
34. Example query
WordNet synsets are classified into 46 domains.
Which domain has changed most in the past two centuries?
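The dataset itself is RDF and would be queried as such. Purely to illustrate the aggregation behind this question, the sketch below averages change scores per domain over in-memory rows. The scores, the assignment of rows to domains and the choice of the mean as the aggregate are all assumptions.

```python
from collections import defaultdict
from statistics import mean


def most_changed_domain(rows):
    """rows: iterable of (domain, change_score); returns (domain, mean score)."""
    by_domain = defaultdict(list)
    for domain, score in rows:
        by_domain[domain].append(score)
    averages = {d: mean(scores) for d, scores in by_domain.items()}
    best = max(averages, key=averages.get)
    return best, averages[best]


# Invented scores; domain names follow the WordNet style (e.g. noun.group).
rows = [
    ("noun.group", 0.20), ("noun.group", 0.40),
    ("noun.artifact", 0.50), ("noun.artifact", 0.70),
    ("verb.motion", 0.10),
]

domain, average = most_changed_domain(rows)
print(domain, round(average, 2))
```

Other aggregates, such as the median or the share of words above a threshold, could give a different ranking, so the choice should be stated alongside any result.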
43. Conclusion
A first step to enrich LOD with information about lexical change, obtained from large volumes of unstructured text.
GTAA
Next steps: enrich LOD with info about how concepts are used: popularity? importance? controversy?
Published as:
A. van Aggelen, L. Hollink and J. van Ossenbruggen. Combining distributional semantics and structured data to study lexical change. In proceedings of the first Drift-a-LOD workshop, co-located with EKAW, Bologna, Italy, 20 Nov. 2016