1 of 20
Introduction
Semantic Web → "Web of Data"
Unprecedented production of resources published as Linked Open Data (LOD)
RDF provides formal ways to build the assertions
LOD → typed links, mostly RDF identity links → sameAs statements
2 of 20
The problem of identity
Most of the RDF links connecting resources coming from different data sources are RDF identity links → sameAs statements.
Many existing identity links do not reflect such genuine identity
[Figure: example sameAs links between instances: sameAs(i1, i2), sameAs(i4, i3), sameAs(i1, i5), sameAs(i5, i6)]
3 of 20
What we propose
An ontology-based logical method to detect invalid sameAs statements
We build a contextual graph 'around' each one of the two resources involved in the sameAs statement
We study the descriptions provided in these contextual graphs to eventually detect inconsistencies
4 of 20
How to choose the properties?
Properties meaningful for the validation, possibly not used in the linking…
Functional and Inverse Functional Properties
sameAs(x,y) ∧ p1(x,w1) ∧ p1(y,w2) → sameAs(w1,w2)
[Figure: book1 and book2, with publishers publish1 ≠ publish2 named 'Springer' and 'ACM Press']
…Local Completeness
5 of 20
How to choose the properties?
Local Completeness
If p(x,w1), then ∄ w2 such that ¬sameAs(w2,w1) ∧ p(x,w2)
[Figure: book1 and book2, each with two authors; 'L. Papaleo' appears on both, 'N. Pernelle' and 'F. Sais' on one each; book1 ≠ book2]
6 of 20
Property-based walk of length n: w{n,s,P}
Alternating sequence of nodes and predicates
{v_0 ≡ s, p_0, v_1, p_1, v_2, ..., v_{n-1}, p_{n-1}, v_n}
such that
v_0, ..., v_{n-1} are resources with URI
v_n is a literal
each triple {v_i, p_i, v_{i+1}} is an RDF triple in G
all the resources in the walk are distinct from one another
[Figure: example walk over the nodes b1, p1, l1 ending in the literals 'New York' and 'US']
7 of 20
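The definition of a property-based walk can be illustrated with a short Python sketch. It is only an illustration under assumptions that are not in the slides: the graph is a plain list of (subject, predicate, object) tuples, literals are marked with a hypothetical `Literal` wrapper, and the function name `property_walks` and the predicate names in the toy graph are invented here.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking literal values; resources are plain strings."""
    value: str


def property_walks(graph, s, P, n):
    """Return all property-based walks of length n starting at resource s.

    A walk is a list [v_0, p_0, v_1, ..., p_{n-1}, v_n] where v_0 == s, every
    predicate is in P, v_0..v_{n-1} are pairwise distinct resources and
    v_n is a literal.
    """
    walks = []

    def extend(path, visited):
        steps = (len(path) - 1) // 2  # number of predicates used so far
        current = path[-1]
        for subj, pred, obj in graph:
            if subj != current or pred not in P:
                continue
            if isinstance(obj, Literal):
                if steps + 1 == n:  # the walk must end on a literal at step n
                    walks.append(path + [pred, obj])
            elif obj not in visited and steps + 1 < n:
                extend(path + [pred, obj], visited | {obj})

    extend([s], {s})
    return walks


# Toy graph; node and predicate names are illustrative assumptions.
G = [
    ("b1", "publishedIn", "l1"),
    ("l1", "cityName", Literal("New York")),
    ("l1", "country", Literal("US")),
    ("b1", "hasTitle", Literal("My Title")),
]
P = {"publishedIn", "cityName", "country", "hasTitle"}

walks_1 = property_walks(G, "b1", P, 1)  # walks that reach a literal in one step
walks_2 = property_walks(G, "b1", P, 2)  # walks that reach a literal in two steps
```

With this toy graph, `walks_1` holds the single walk through `hasTitle`, while `walks_2` holds the two walks that pass through `l1`.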
m-degree Contextual Graph G{m,s,P}
Sub-graph of G such that every node v_i ∈ G{m,s,P} belongs to a property-based walk of length n, with n ≤ m.
Set the depth of the sub-graph!
[Figure: example contextual graph over the nodes b1, p1, l1, e1 with the literals 'New York', 'US', 'E. Hauss', 'My Title']
8 of 20
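As a rough sketch of how the depth m bounds the sub-graph, the Python below collects every triple that lies on some property-based walk of length n ≤ m from a starting resource. The representation is an assumption made for illustration (a list of triples, a hypothetical `Literal` wrapper for literal values, invented predicate names); the slides do not prescribe a data structure or an algorithm.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking literal values; resources are plain strings."""
    value: str


def contextual_graph(graph, s, P, m):
    """Return the triples of graph that lie on a property-based walk
    of length n <= m starting at resource s and using predicates in P."""
    kept = set()
    if m < 1:
        return kept

    def visit(node, visited, trail):
        for triple in graph:
            subj, pred, obj = triple
            if subj != node or pred not in P:
                continue
            if isinstance(obj, Literal):
                # A walk of length len(trail) + 1 ends here: keep all its triples.
                kept.update(trail + [triple])
            elif obj not in visited and len(trail) + 1 < m:
                visit(obj, visited | {obj}, trail + [triple])

    visit(s, {s}, [])
    return kept


# Toy graph; the predicate names are illustrative assumptions.
G = [
    ("b1", "hasTitle", Literal("My Title")),
    ("b1", "hasPublisher", "p1"),
    ("p1", "locatedIn", "l1"),
    ("l1", "cityName", Literal("New York")),
    ("l1", "country", Literal("US")),
    ("b1", "hasEditor", "e1"),
    ("e1", "hasName", Literal("E. Hauss")),
]
P = {"hasTitle", "hasPublisher", "locatedIn", "cityName", "country", "hasEditor", "hasName"}

g2 = contextual_graph(G, "b1", P, 2)  # title and editor name only
g3 = contextual_graph(G, "b1", P, 3)  # also reaches the location literals
```

Raising m from 2 to 3 pulls in the publisher and location triples, because the literals 'New York' and 'US' are three steps away from b1 in this toy graph.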
Problem statement
Check for inconsistencies in the assertion sameAs(x, y) according to the knowledge provided in the RDF graph G.
Tools: sameAs statements, set of properties, the contextual graph.
9 of 20
Given sameAs(x,y), P = {hasTitle, numPages, hasPublisher, hasName}
and the 2-degree contextual graphs G{2,Book1,P} and G{2,Book2,P}:
G{2,Book1,P} ∪ G{2,Book2,P} ∪ sameAs(Book1,Book2) ⊨ ⊥ ?
[Figure: Book1 sameAs Book2; both titled 'one title'; page counts 35 and 23; hasPublisher links to publisher1 and publisher2, named 'name pub' and 'name pub2']
10 of 20
Method
F is the set of RDF facts enriched by a set of ¬synVals facts in the form ¬synVals(w1, w2), w1 and w2 being literals and different.
EXAMPLES:
- notSynVals('231', '100') for a functional property numOfPages
- notSynVals('New York', 'Paris') for a functional property cityName
… knowledge from expert or extracted.
12 of 20
Method
F is the set of RDF facts, enriched also by a set of facts in the form ¬p_i^LC(s, w)
For every p_i^LC (local complete property), ¬p_i^LC(s, w) is added if w is different from all the w' s.t. p_i^LC(s, w') belongs to F
EXAMPLES:
¬hasAuthor(b1, 'N. Pernelle')
with hasAuthor a local complete property and the book b1 having authors 'L. Papaleo' and 'F. Sais'
13 of 20
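The enrichment step for local complete properties can be sketched as follows. This is a minimal illustration, assuming facts are stored as (property, subject, value) tuples, that "different" means plain string inequality, and that the candidate values to test are supplied by the caller (for instance, the values seen on the other resource of the sameAs statement). The function name `negative_lc_facts` and the `candidates` parameter are hypothetical.

```python
def negative_lc_facts(facts, lc_properties, candidates):
    """Build negative facts ("not", p, s, w) for local complete properties.

    facts:         set of (p, s, w) tuples, the positive facts in F
    lc_properties: properties declared local complete
    candidates:    dict mapping a property to the values w to test
    """
    negatives = set()
    for p in lc_properties:
        subjects = {s for (q, s, _) in facts if q == p}
        for s in subjects:
            known = {w for (q, t, w) in facts if q == p and t == s}
            for w in candidates.get(p, ()):
                # w differs from every value asserted for p(s, .): add the negative fact
                if w not in known:
                    negatives.add(("not", p, s, w))
    return negatives


F = {
    ("hasAuthor", "b1", "L. Papaleo"),
    ("hasAuthor", "b1", "F. Sais"),
}
neg = negative_lc_facts(F, {"hasAuthor"}, {"hasAuthor": {"N. Pernelle", "L. Papaleo"}})
```

Here `neg` contains only the fact standing for ¬hasAuthor(b1, 'N. Pernelle'), since 'L. Papaleo' is already an asserted author of b1.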
Method
R the set of rules
(inverse) functional properties
sameAs(x,y) ∧ numOfPages(x,w1) ∧ numOfPages(y,w2) → SynVals(w1,w2)
local complete properties
sameAs(x,y) ∧ hasAuthor(x,w1) → hasAuthor(y,w1)
14 of 20
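To show how these two rules can expose a conflict, here is a small Python sketch. It does not reproduce the resolution procedure used by the prototype: it is a direct forward check, written under the assumptions that facts are (property, subject, value) tuples, that ¬synVals facts are stored as unordered pairs, and that negative facts are stored as (property, subject, value) tuples. All names (`find_conflicts` and its parameters) are hypothetical.

```python
def find_conflicts(x, y, facts, functional, local_complete, not_syn_vals, negatives):
    """Return facts derived from sameAs(x, y) that contradict a negative fact.

    facts:        set of (p, s, w) tuples
    not_syn_vals: set of frozenset({w1, w2}), standing for ¬synVals(w1, w2)
    negatives:    set of (p, s, w) tuples, standing for ¬p(s, w)
    """
    conflicts = []
    for (p, s, w1) in sorted(facts):
        if s != x:
            continue
        if p in functional:
            for (q, t, w2) in sorted(facts):
                # Rule: sameAs(x,y) ∧ p(x,w1) ∧ p(y,w2) → SynVals(w1,w2)
                if q == p and t == y and frozenset({w1, w2}) in not_syn_vals:
                    conflicts.append(("SynVals", w1, w2))
        if p in local_complete:
            # Rule: sameAs(x,y) ∧ p(x,w1) → p(y,w1)
            if (p, y, w1) in negatives:
                conflicts.append((p, y, w1))
    return conflicts


facts = {
    ("numOfPages", "b1", "231"),
    ("numOfPages", "b2", "100"),
    ("hasAuthor", "b1", "N. Pernelle"),
}
conflicts = find_conflicts(
    "b1", "b2", facts,
    functional={"numOfPages"},
    local_complete={"hasAuthor"},
    not_syn_vals={frozenset({"231", "100"})},
    negatives={("hasAuthor", "b2", "N. Pernelle")},
)
```

A non-empty result marks sameAs(b1, b2) as inconsistent. Because sameAs is symmetric, a fuller check would also call the function with x and y swapped.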
Results
A prototype of our validation framework has been implemented in Java, using the AIMA library for the resolution.
We started from the output of different linking methods [1], [2] and [3]. Each of them provides a set of sameAs statements.
[1] Sais et al.: LN2R: a knowledge-based reference reconciliation system: OAEI 2010 results. In: OM 2010 (2010)
[2] Symeonidou et al.: SAKey: Scalable Almost Key Discovery in RDF Data. In: ISWC 2014. LNCS (2014)
[3] Jean-Mary et al.: Ontology matching with semantic verification. Web Semantics 7(3) (2009)
15 of 20
Contextual graph
OAEI 2010 dataset on Restaurants
[Figure: contextual graph of restaurant2 with the literals 'bel air', 'Californian', '818/788-3536', 'cafe bizou', '701 stone canyon rd.']
16 of 20
Results
LM   LM precision  TC   RG  TN  TP  FN  FP  Accuracy  Recall  IA precision  LM+IA precision
[2]  95.55%        90   4   81  3   1   5   93.34%    75%     37%           98.85%
[1]  69.71%        142  43  94  38  5   5   92.9%     88.4%   88.4%         95.19%
[3]  90.17%        112  11  86  11  0   16  86.60%    100%    42.30%        100%
Improvement in precision
17 of 20
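The percentages in the results table can be recomputed from its counts. The formulas below are not stated in the slides; they are inferred because they reproduce the reported values for the rows of methods [2] and [1], reading TC as the total number of sameAs links produced by the linking method, RG as the number of those that are genuinely invalid, and TP as the invalid links correctly flagged. The counts listed for [3] do not sum to TC under this reading, so that row is left out. Treat this as a sketch under those assumptions.

```python
def validation_metrics(tc, rg, tn, tp, fn, fp):
    """Recompute the table's ratios from its counts (formulas are inferred, not sourced)."""
    return {
        "lm_precision": (tc - rg) / tc,            # valid links among all produced links
        "accuracy": (tp + tn) / tc,                # correct decisions of the invalidation step
        "recall": tp / (tp + fn),                  # invalid links that were flagged
        "ia_precision": tp / (tp + fp),            # flagged links that were really invalid
        "lm_ia_precision": (tc - rg) / (tc - tp),  # precision once the TP links are removed
    }


row_2 = validation_metrics(tc=90, rg=4, tn=81, tp=3, fn=1, fp=5)
row_1 = validation_metrics(tc=142, rg=43, tn=94, tp=38, fn=5, fp=5)

for name, row in (("[2]", row_2), ("[1]", row_1)):
    print(name, {key: f"{value:.2%}" for key, value in row.items()})
```

The small differences from the table (for example 95.56% against 95.55%) are consistent with the slide truncating rather than rounding.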
Results
Callouts added to the results table of slide 17:
Because we cannot find inconsistencies wrt the properties selected
Because there are errors in the data, synVals were not accurate, or some functional properties are not functional
18 of 20
Conclusions
On evaluating the quality of sameAs links in the LOD
An ontology-based logical evaluation method
→ Definition of contextual graph
→ (inverse) functional properties and the properties defined as local complete
Definition of property-based walk
Definition of m-degree contextual graph
Definition of rules and facts for the resolution system
Tests on data coming from linking methods → promising results
19 of 20
Future work
Run experiments on more complex datasets.
Generalization process for "almost sameAs"
A new numerical approach using similarity measures
20 of 20

Logical Detection of Invalid SameAs Statements in RDF Data


Editor's Notes

  • #3: The Semantic Web is a 'Web of Data', where data can be processed by machines, extending the principles of the Web from documents to data. Today, we are experiencing an unprecedented production of resources, published as Linked Open Data. This is leading to the creation of a global data space containing billions of assertions. RDF provides formal ways to build these assertions.
  • #4: It is becoming extremely important to study the quality of data and links in the LOD. This is useful in applications that want to consume Linked Data as well as in Semantic Web frameworks dedicated to data linking or data integration.
  • #5: We claim that, when logical conflicts are encountered, the initial RDF identity link is 'inconsistent', meaning that it requires further investigation (supervised or automatic). We suppose that, in case of multiple data sources, mappings between properties are provided.
  • #6: The contextual graph is built according to specific properties. Functional properties…
  • #7: Properties that can be considered local complete. The closed-world assumption is in general inappropriate for the Semantic Web due to its size and rate of change. But in some domains and specific contexts, local completeness for RDF predicates (properties) can be assured.
  • #8: A collection of assertions selected according to specific conditions (the starting resource s and the set of properties P). In other words, with a walk w{n,s,P} in the graph G, we select a sequence of assertions in some way related to the resource s.
  • #9: An m-degree contextual graph for a resource s can be seen as a subset of knowledge pertinent to s, bounded by the set of predicates P.
  • #12: Given
  • #15: Semantics of equality. The table defines the equality relation owl:sameAs as being reflexive, symmetric, and transitive, and it axiomatizes the standard replacement properties of equality for it. Table 4. The Semantics of Equality (premises → conclusions):
      eq-ref: T(?s, ?p, ?o) → T(?s, owl:sameAs, ?s), T(?p, owl:sameAs, ?p), T(?o, owl:sameAs, ?o)
      eq-sym: T(?x, owl:sameAs, ?y) → T(?y, owl:sameAs, ?x)
      eq-trans: T(?x, owl:sameAs, ?y), T(?y, owl:sameAs, ?z) → T(?x, owl:sameAs, ?z)
      eq-rep-s: T(?s, owl:sameAs, ?s'), T(?s', ?p, ?o) → T(?s, ?p, ?o)
      eq-rep-p: T(?p, owl:sameAs, ?p'), T(?s, ?p', ?o) → T(?s, ?p, ?o)
      eq-rep-o: T(?o, owl:sameAs, ?o'), T(?s, ?p, ?o') → T(?s, ?p, ?o)
  • #16: In [20] the sameAs statements are computed according to similarity measures over specific property descriptions, as in [26], where similarity between entities is iteratively calculated by analyzing specific features. In [24], instead, sameAs statements are computed on the basis of a novel algorithm for key discovery. We tested the approach on sameAs statements provided by linking tools that have been applied on Ontology Alignment Evaluation Initiative (OAEI) datasets, showing that our research direction is promising.
  • #17: An instance of restaurant in the dataset: 'restaurant1'. Given the functional properties phone number, has address and city, a contextual graph of degree 2 is depicted… Given a sameAs statement in the form sameAs(x,y), we computed the contextual graph of degree 2 considering the three functional properties.
  • #18: In conclusion, our results showed that, when our validation tool is applied after one of the linking tools, the precision of each tool can be improved: for [24] we pass from a precision of 95.55% to 98.85%, for [20] from 69.71% to 95.19%, and finally for [26] from 90.17% to 100%.