In recent years, thanks to the standardization of Semantic Web technologies, we have been experiencing an unprecedented production of data, published online as Linked Data. In this context, when a typed link is instantiated between two different resources referring to the same real-world entity, owl:sameAs is generally the predominant choice. However, recent research discussions have shown issues in the use of owl:sameAs. Problems arise both when a sameAs link is erroneously discovered by an automatic data linking tool and when users declare it while meaning something less 'strict' than the semantics defined by OWL. In this work, we discuss this issue further and present a method to logically detect invalid sameAs statements under specific circumstances. We report experimental results, obtained on OAEI datasets, which suggest that the approach is promising.
Logical Detection of Invalid SameAs Statements in RDF Data
2. Introduction
Semantic Web: a "Web of Data"
Unprecedented production of resources
published as
Linked Open Data (LOD)
RDF provides formal ways to build
the assertions
LOD → typed links
mostly RDF identity links → sameAs statements
3. The problem of identity
Most of the RDF links
connecting resources
coming from different
data sources are
RDF identity links
→ sameAs statements.
Many existing identity links
do not reflect such genuine identity
Figure: sameAs links among the resources i1 to i6: sameAs(i1, i2), sameAs(i4, i3), sameAs(i1, i5), sameAs(i5, i6).
4. What we propose
An ontology-based logical method to
detect invalid sameAs statements
We build a contextual graph 'around' each
of the two resources involved in the
sameAs statement
We study the descriptions provided in
these contextual graphs to detect
possible inconsistencies
5. How to choose the properties?
Properties meaningful for the validation, possibly not used in
the linking…
Functional and Inverse Functional Properties
sameAs(x,y) ∧ p1(x,w1) ∧ p1(y,w2) → sameAs(w1,w2)
… Local Completeness
Figure: book1 and book2 with their publishers publish1 and publish2, named 'Springer' and 'ACM Press'; publish1 ≠ publish2.
6. How to choose the properties?
Local Completeness
If p(x,w1), then ∄ w2 such that ¬sameAs(w2,w1) ∧ p(x,w2)
Figure: book1 and book2 with author values 'L. Papaleo', 'N. Pernelle' and 'L. Papaleo', 'F. Sais'; 'N. Pernelle' ≠ 'F. Sais'.
7. Property-based walk
of length n w{n,s,P}.
Alternating sequence of nodes and predicates
{v0 ≡ s, p0, v1, p1, v2, ..., vn-1, pn-1, vn}
such that
v0, ..., vn-1 are resources with a URI
vn is a literal
each triple {vi, pi, vi+1} is an RDF triple in G
all the resources in the walk are
distinct from one another
Figure: example graph with the resources b1, p1, l1 and the literals 'New York' and 'US'.
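The definition of a property-based walk can be sketched in Python as a depth-first enumeration. This is a minimal sketch under stated assumptions, not the authors' implementation: the graph is assumed to be a plain list of (subject, predicate, object) tuples, literals are marked with a hypothetical Literal wrapper, and the toy triples and property names (hasPublisher, locatedIn, cityName, countryName, hasTitle) are invented for illustration.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical wrapper marking a node as a literal rather than a URI."""
    value: str


def property_walks(graph, s, P, n):
    """Yield every property-based walk of length n starting at resource s.

    A walk is a tuple (v0, p0, v1, ..., p_{n-1}, vn) where v0 == s, every
    predicate is in P, v0..v_{n-1} are pairwise distinct resources and vn
    is a literal.
    """
    def extend(walk, visited, remaining):
        current = walk[-1]
        for subj, pred, obj in graph:
            if subj != current or pred not in P:
                continue
            if remaining == 1:
                # Last step: the walk must end on a literal.
                if isinstance(obj, Literal):
                    yield walk + (pred, obj)
            elif not isinstance(obj, Literal) and obj not in visited:
                # Intermediate step: a resource not yet on the walk.
                yield from extend(walk + (pred, obj), visited | {obj}, remaining - 1)

    if n >= 1:
        yield from extend((s,), {s}, n)


# Toy graph (invented for illustration).
G = [
    ("b1", "hasPublisher", "p1"),
    ("p1", "locatedIn", "l1"),
    ("l1", "cityName", Literal("New York")),
    ("l1", "countryName", Literal("US")),
    ("b1", "hasTitle", Literal("My Title")),
]
P = {"hasPublisher", "locatedIn", "cityName", "hasTitle"}

walks_1 = list(property_walks(G, "b1", P, 1))
walks_3 = list(property_walks(G, "b1", P, 3))
```

With these toy triples, the only walk of length 1 ends on the title, and the only walk of length 3 reaches 'New York'; the 'US' literal is not reached because countryName is deliberately left out of P.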
8. m-degree
Contextual Graph G{m,s,P}
Sub-graph of G such that
every node vi ∈ G{m,s,P}
belongs to a property-based walk of
length n, with n ≤ m.
Set the depth of the sub-graph!
Figure: example graph with the resources b1, p1, l1, e1 and the literals 'New York', 'US', 'E. Hauss' and 'My Title'.
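How the depth m bounds the contextual graph can be illustrated with a short Python sketch. It is only a sketch under assumptions: triples are plain tuples, literals are recognised by a caller-supplied is_literal predicate (here, strings wrapped in double quotes), and the property names are invented. It keeps a triple only if it lies on a walk that reaches a literal within m steps without revisiting a resource.

```python
def contextual_graph(graph, s, P, m, is_literal):
    """Return the triples of G lying on a property-based walk of length <= m from s."""
    def triples_on_walks(node, visited, depth_left):
        found = set()
        if depth_left == 0:
            return found
        for subj, pred, obj in graph:
            if subj != node or pred not in P:
                continue
            if is_literal(obj):
                # A walk ends here, so this triple belongs to the sub-graph.
                found.add((subj, pred, obj))
            elif obj not in visited:
                deeper = triples_on_walks(obj, visited | {obj}, depth_left - 1)
                if deeper:
                    # Keep the edge only if it leads to a literal in time.
                    found.add((subj, pred, obj))
                    found |= deeper
        return found

    return triples_on_walks(s, {s}, m)


def quoted(node):
    """Assumed convention for this sketch: literals are wrapped in double quotes."""
    return node.startswith('"')


# Toy graph (invented for illustration).
G = [
    ("b1", "hasTitle", '"My Title"'),
    ("b1", "hasPublisher", "p1"),
    ("b1", "hasEditor", "e1"),
    ("e1", "hasName", '"E. Hauss"'),
    ("p1", "locatedIn", "l1"),
    ("l1", "cityName", '"New York"'),
    ("l1", "countryName", '"US"'),
]
P = {pred for _, pred, _ in G}

g1 = contextual_graph(G, "b1", P, 1, quoted)
g2 = contextual_graph(G, "b1", P, 2, quoted)
g3 = contextual_graph(G, "b1", P, 3, quoted)
```

Raising m from 1 to 3 grows the sub-graph from the title alone, to the title plus the editor's name, to the whole toy graph.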
9. Problem statement
Check for inconsistencies in the assertion sameAs(x, y)
according to the knowledge provided in the RDF graph G.
Tools: the sameAs statements, a set of properties, the contextual graphs.
10. Example
Given sameAs(x, y), P = {hasTitle, numPages, hasPublisher, hasName}
and the 2-contextual graphs G{2,Book1,P} and G{2,Book2,P}:
G{2,Book1,P} ∪ G{2,Book2,P} ∪ sameAs(Book1,Book2) ⊨ ⊥
Figure: Book1 (title 'one title', 35 pages, hasPublisher publisher1 with name 'name pub') and Book2 (title 'one title', 23 pages, hasPublisher publisher2 with name 'name pub2'), linked by sameAs.
12. Method
F is the set of RDF facts
enriched by a set of ¬synVals
facts in the form
¬synVals(w1, w2),
w1 and w2 being different literals.
EXAMPLES:
- notSynVals('231', '100')
for a functional property numOfPages
- notSynVals('New York', 'Paris')
for a functional property cityName
… knowledge from experts or extracted.
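One way the ¬synVals facts could be extracted automatically is sketched below in Python. The slides only say that this knowledge comes from experts or is extracted, so everything here is an assumption: the normalize function (ignoring case and whitespace) is a hypothetical stand-in for expert knowledge, and a pair of literals is declared non-synonymous whenever the normalized forms differ.

```python
from itertools import combinations


def normalize(value):
    """Hypothetical stand-in for expert knowledge: ignore case and extra whitespace."""
    return " ".join(str(value).lower().split())


def not_syn_vals(values):
    """Return the pairs (w1, w2) of literals declared non-synonymous."""
    distinct = sorted(set(values))
    return {
        (w1, w2)
        for w1, w2 in combinations(distinct, 2)
        if normalize(w1) != normalize(w2)
    }


# Literal values observed for a functional property such as cityName.
city_names = ["New York", "new york", "Paris"]
facts = not_syn_vals(city_names)

# Literal values observed for a functional property such as numOfPages.
page_facts = not_syn_vals(["231", "100"])
```

Here 'New York' and 'new york' are treated as synonymous by the assumed normalization, so only the pairs involving 'Paris' are produced.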
13. Method
F is the set of RDF facts, also enriched
by a set of facts in the form
¬piLC(s, w)
for every piLC (local-complete property),
if w is different from all the w' s.t.
piLC(s, w') belongs to F
EXAMPLE:
¬hasAuthor(b1, 'N. Pernelle')
with
hasAuthor a local-complete property
and
the book b1 having authors 'L. Papaleo'
and 'F. Sais'
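The generation of negative facts for local-complete properties can be sketched in Python as follows. This is an illustrative sketch, not the prototype's code: facts are assumed to be (predicate, subject, value) tuples, "different" is simplified to plain inequality (no synonym handling), and the authors given to b2 are invented for the example.

```python
def negative_lc_facts(facts, lc_properties, candidates):
    """Derive not-p(s, w) for each local-complete p when w is not a known value of p for s.

    facts: set of (predicate, subject, value) tuples.
    candidates: the literal values to test against each subject.
    """
    negatives = set()
    for p in lc_properties:
        subjects = {s for (q, s, _) in facts if q == p}
        for s in subjects:
            known = {w for (q, t, w) in facts if q == p and t == s}
            for w in candidates:
                if w not in known:
                    negatives.add((p, s, w))
    return negatives


F = {
    ("hasAuthor", "b1", "L. Papaleo"),
    ("hasAuthor", "b1", "F. Sais"),
    ("hasAuthor", "b2", "L. Papaleo"),   # b2's authors are invented for the example
    ("hasAuthor", "b2", "N. Pernelle"),
}
all_authors = {w for (_, _, w) in F}
negatives = negative_lc_facts(F, {"hasAuthor"}, all_authors)
```

For b1 this yields the fact from the slide, ¬hasAuthor(b1, 'N. Pernelle'), and symmetrically ¬hasAuthor(b2, 'F. Sais') for the invented b2.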
14. Method
R is the set of rules
(Inverse) functional properties
sameAs(x,y) ∧ numOfPages(x,w1) ∧ numOfPages(y,w2) → synVals(w1,w2)
Local complete properties
sameAs(x,y) ∧ hasAuthor(x,w1) → hasAuthor(y,w1)
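The two kinds of rules can be illustrated with a small Python check. Note the assumptions: this is a direct forward check written for illustration, standing in for (and much simpler than) the resolution-based Java prototype; it handles only literal-valued properties, treats local completeness as plain set difference, and uses invented facts that mix the examples from the slides.

```python
def check_same_as(x, y, facts, functional, local_complete, not_syn_vals):
    """Return the conflicts found for sameAs(x, y); an empty list means none detected.

    facts: set of (predicate, subject, value) tuples.
    not_syn_vals: set of frozenset({w1, w2}) pairs known to be non-synonymous.
    """
    def values(p, s):
        return {w for (q, t, w) in facts if q == p and t == s}

    conflicts = []
    # Functional rule: sameAs(x,y) ∧ p(x,w1) ∧ p(y,w2) → synVals(w1,w2)
    for p in functional:
        for w1 in values(p, x):
            for w2 in values(p, y):
                if frozenset((w1, w2)) in not_syn_vals:
                    conflicts.append(("functional", p, w1, w2))
    # Local-completeness rule: sameAs(x,y) ∧ p(x,w) → p(y,w),
    # applied in both directions because sameAs is symmetric.
    for p in local_complete:
        for a, b in ((x, y), (y, x)):
            for w in values(p, a) - values(p, b):
                conflicts.append(("local-complete", p, b, w))
    return sorted(conflicts)


facts = {
    ("hasTitle", "Book1", "one title"), ("hasTitle", "Book2", "one title"),
    ("numPages", "Book1", "35"), ("numPages", "Book2", "23"),
    ("hasAuthor", "Book1", "L. Papaleo"), ("hasAuthor", "Book1", "N. Pernelle"),
    ("hasAuthor", "Book2", "L. Papaleo"), ("hasAuthor", "Book2", "F. Sais"),
}
not_syn = {frozenset(("35", "23"))}
conflicts = check_same_as(
    "Book1", "Book2", facts,
    functional={"hasTitle", "numPages"},
    local_complete={"hasAuthor"},
    not_syn_vals=not_syn,
)
```

A non-empty result marks the sameAs statement as inconsistent with respect to the selected properties; an empty result only means that no conflict was found, not that the link is correct.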
15. Results
A prototype of our validation framework has been implemented in
Java, using the AIMA library for the resolution.
We started from the output of different linking methods [1], [2] and
[3]. Each of them provides a set of sameAs statements.
[1] Sais et al.: LN2R, a knowledge-based reference reconciliation system: OAEI 2010 results. In: OM 2010 (2010)
[2] Symeonidou et al.: SAKey: Scalable Almost Key Discovery in RDF Data. In: ISWC 2014. LNCS (2014)
[3] Jean-Mary et al.: Ontology matching with semantic verification. Web Semantics 7(3) (2009)
16. Contextual graph
OAEI 2010 dataset on Restaurants
Figure: contextual graph of restaurant2, with the literals 'bel air', 'Californian', '818/788-3536', 'cafe bizou' and '701 stone canyon rd.'
18. Results
LM   LM precision  TC   RG  TN  TP  FN  FP  Accuracy  Recall  IA precision  LM+IA precision
[2]  95.55%        90   4   81  3   1   5   93.34%    75%     37%           98.85%
[1]  69.71%        142  43  94  38  5   5   92.9%     88.4%   88.4%         95.19%
[3]  90.17%        112  11  86  11  0   16  86.60%    100%    42.30%        100%
Some invalid links go undetected because we cannot find inconsistencies w.r.t. the
properties selected.
Some valid links are flagged because there are errors in the data, the synVals were
not accurate, or some properties assumed functional are not actually
functional.
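The accuracy, recall and IA precision columns can be recomputed from the TN, TP, FN and FP counts with the standard definitions, as in the Python sketch below. It assumes (the slides do not state this) that a true positive is an invalid sameAs link correctly flagged by the invalidation approach; the counts are those of the first row, for linking method [2].

```python
def evaluate(tn, tp, fn, fp):
    """Standard accuracy, recall and precision from a confusion matrix."""
    total = tn + tp + fn + fp
    return {
        "accuracy": (tn + tp) / total,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Counts reported for linking method [2].
metrics = evaluate(tn=81, tp=3, fn=1, fp=5)
```

Under this reading the sketch gives an accuracy of 84/90, a recall of 3/4 and a precision of 3/8, which agree with the reported 93.34%, 75% and 37% up to rounding.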
19. Conclusions
On evaluating the
quality of sameAs links
in the LOD
- Definition of contextual
graph
- (Inverse) functional
properties and the
properties defined as
local complete.
An ontology-based
logical
evaluation method
Tests on data
coming from
linking methods →
promising results
Definition of property-based
walk
Definition of m-degree
contextual graph
Definition of rules and facts for
the resolution system
20. Future works
Run experiments on more complex datasets.
Generalization process for "almost sameAs"
A new numerical approach using similarity measures
Editor's Notes
#3: The Semantic Web is a 'Web of Data', where data can be processed by machines,
extending the principles of the Web from documents to data.
Today, we are experiencing an
unprecedented production of resources, published as Linked Open Data.
This is leading to the creation of a global data space containing billions
of assertions.
RDF provides formal ways to build these assertions.
#4: It is becoming extremely important to study
the quality of data and links in the LOD.
This is useful in applications that consume Linked Data
as well as in Semantic Web frameworks dedicated
to data linking or data integration.
#5: We claim that, when logical conflicts are encountered,
the initial RDF identity link is 'inconsistent',
meaning that it requires further investigation (supervised or automatic).
We assume that, in the case of multiple data sources,
mappings between properties are provided.
#6: The contextual graph is built according to specific properties.
Functional properties…
#7: Properties that can be considered local complete.
The closed-world assumption is in general
inappropriate for the Semantic Web
due to its size and rate of change.
But in some domains and specific contexts,
local completeness of RDF predicates (properties)
can be assured.
#8: A property-based walk is a collection of assertions
selected according to specific conditions
(the starting resource s and the set of properties P).
In other words, with a walk w{n,s,P} in the graph G,
we select a sequence of assertions related in some way
to the resource s.
#9: An m-degree contextual graph for a resource s can be seen
as a subset of knowledge pertinent to s,
bounded by the set of predicates P.
#15: Semantics of equality.
The equality relation owl:sameAs is defined as being reflexive, symmetric, and transitive,
and the standard replacement properties of equality are axiomatized for it.
Table 4. The Semantics of Equality
eq-ref: if T(?s, ?p, ?o) then T(?s, owl:sameAs, ?s), T(?p, owl:sameAs, ?p), T(?o, owl:sameAs, ?o)
eq-sym: if T(?x, owl:sameAs, ?y) then T(?y, owl:sameAs, ?x)
eq-trans: if T(?x, owl:sameAs, ?y) and T(?y, owl:sameAs, ?z) then T(?x, owl:sameAs, ?z)
eq-rep-s: if T(?s, owl:sameAs, ?s') and T(?s, ?p, ?o) then T(?s', ?p, ?o)
eq-rep-p: if T(?p, owl:sameAs, ?p') and T(?s, ?p, ?o) then T(?s, ?p', ?o)
eq-rep-o: if T(?o, owl:sameAs, ?o') and T(?s, ?p, ?o) then T(?s, ?p, ?o')
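The effect of the eq-sym and eq-trans rules, namely that sameAs links partition resources into equivalence classes, can be illustrated with a union-find sketch in Python. This is an illustration only: union-find is a technique chosen here for brevity, not something the slides mention, and the example pairs are illustrative.

```python
def same_as_classes(pairs):
    """Group resources into the equivalence classes induced by sameAs pairs."""
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes

    classes = {}
    for node in list(parent):
        classes.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in classes.values())


links = [("i1", "i2"), ("i4", "i3"), ("i1", "i5"), ("i5", "i6")]
classes = same_as_classes(links)
```

Because of transitivity, a single invalid link such as sameAs(i1, i5) would merge everything known about i5 and i6 into the class of i1 and i2, which is why detecting invalid statements matters.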
#16: In [20], the sameAs statements are computed according to similarity measures over specific property descriptions, as in [26], where similarity between entities is iteratively calculated by analyzing specific features. In [24], instead, sameAs statements are computed on the basis of a novel algorithm for key discovery.
We tested the approach on sameAs statements provided by linking tools that have been applied to
Ontology Alignment Evaluation Initiative (OAEI) datasets, showing that our research direction is promising.
#17: An instance of restaurant in the dataset 'restaurant1'. Given the functional properties phone number, has address and city, a contextual graph of degree 2 is depicted…
Given a sameAs statement in the form sameAs(x,y), we computed the contextual graph of degree 2 considering the three functional properties.
#18, #19: In conclusion, our results showed that, when our validation tool is applied after one of the linking tools, the precision of each tool can be improved: for [24] we go from a precision of 95.55% to 98.85%, for [20] from 69.71% to 95.19%, and for [26] from 90.17% to 100%.