This document discusses managing and reasoning with completeness information for RDF data sources. It presents two tools for completeness management: CORNER, a generic completeness reasoner, and COOL-WD, a tool tailored for Wikidata. It also describes techniques for optimizing completeness reasoning, including data-aware reasoning, time-aware reasoning, constant-relevance, completeness templates, and partial matching. The goal is to develop flexible completeness reasoning methods that scale to real-world datasets.
2017 UniBZ Winter Seminar Poster: Managing and Consuming Completeness Information for RDF Data Sources
Fariz Darari
Supervisors: Werner Nutt, Sebastian Rudolph
Managing and Consuming Completeness Information
for RDF Data Sources
Why completeness information?
Though generally incomplete, parts of data on the Web are indeed complete!
Completeness information lets us know exactly which parts are complete
Real-world RDF data sources need a large number of completeness
statements, resulting in long reasoning time.
Data-agnostic reasoning optimization
CORNER: generic data source; supports data-agnostic reasoning; highlights: RDFS extension, federated extension
COOL-WD: Wikidata-specific; supports data-aware reasoning; highlights: built into Wikidata, completeness analytics, query diagnostics
Complete for all Apollo 11 crew:
Compl(apollo11,crew,?crew)
Give me people who are NOT Apollo 11 crew:
SELECT *
WHERE { ?person isA person .
FILTER NOT EXISTS { apollo11 crew ?person } }
Is this query Qneg sound?
Give me the children of Apollo 11 crew:
SELECT *
WHERE { apollo11 crew ?crew .
?crew child ?child }
Is this query Qpos complete?*
*Suppose Wikidata is also complete for all children of Neil, Buzz, and Michael
Let's manage and consume completeness information!
Data-aware completeness reasoning
Darari et al. (ISWC13) formalized data-agnostic completeness reasoning.
Abstracting from the data graph results in weaker inferences:
e.g., it fails to guarantee the completeness of Qpos.
But data-aware reasoning can guarantee it.
Incorporating the data graph increases the complexity from
NP-complete (data-agnostic) to $\Pi^P_2$-complete (data-aware).
Yet, optimization techniques exist for practical settings.
Optimizing completeness reasoning
Data-aware reasoning optimization
Soundness reasoning
Answer soundness reasoning
Is my query answer sound?
Input: P query with negation,
C set of completeness statements,
G graph,
u answer mapping
Output: true iff u is sound wrt. P, C, and G
Characterization: The answer u of P over G wrt. C is sound iff
all of P's NOT-EXISTS BGPs (= negative parts), after u is applied
to them, are complete for G wrt. C
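The characterization above can be made concrete with a minimal Python sketch. It assumes a simplified setting that is not from the poster: triple patterns are tuples of strings, variables start with '?', and every completeness statement is a single condition-free triple pattern such as Compl(apollo11,crew,?crew). Under these assumptions, an instantiated negative triple pattern is complete if some statement subsumes it. The helper names (subsumes, apply_mapping, is_answer_sound) are hypothetical, and the sketch does not consult G, so it only tests a sufficient condition: it may return False where full reasoning would return True.

```python
def is_var(term):
    """Variables are written with a leading '?', as on the poster."""
    return term.startswith("?")


def subsumes(stmt, pattern):
    """True if some substitution of stmt's variables turns stmt into pattern,
    i.e. every instance of `pattern` is also an instance of `stmt`."""
    binding = {}
    for s_term, p_term in zip(stmt, pattern):
        if is_var(s_term):
            # A repeated statement variable must map to the same term.
            if binding.setdefault(s_term, p_term) != p_term:
                return False
        elif s_term != p_term:
            return False
    return True


def apply_mapping(pattern, mu):
    """Replace the variables bound by mu with their values."""
    return tuple(mu.get(term, term) for term in pattern)


def is_answer_sound(negative_bgps, statements, mu):
    """Sufficient check: every triple pattern of every NOT-EXISTS BGP,
    after applying mu, is covered by a single condition-free statement."""
    return all(
        any(subsumes(c, apply_mapping(t, mu)) for c in statements)
        for bgp in negative_bgps
        for t in bgp
    )


# Qneg: ?person isA person . FILTER NOT EXISTS { apollo11 crew ?person }
negative = [[("apollo11", "crew", "?person")]]
C = [("apollo11", "crew", "?crew")]  # Compl(apollo11,crew,?crew)
print(is_answer_sound(negative, C, {"?person": "alice"}))  # True
```

Here "alice" is a hypothetical answer binding for ?person.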
Time-aware completeness reasoning
Completeness statements can become out of date. Capturing how data changes over time
increases the flexibility of completeness reasoning!
Completeness management tools
To increase the potential uptake of our completeness reasoning framework, we have developed two completeness
management tools: CORNER (for Completeness Reasoner) and COOL-WD (for Completeness Tool for Wikidata)
Publications
Radityo Eko Prasojo, Fariz Darari, Simon Razniewski, Werner Nutt: Managing and Consuming Completeness Information for Wikidata Using COOL-WD. COLD 2016.
Fariz Darari, Simon Razniewski, Radityo Eko Prasojo, Werner Nutt: Enabling Fine-Grained RDF Data Completeness Assessment. ICWE 2016.
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: Expressing No-Value Information in RDF. ISWC (P&D) 2015.
Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic Gap between RDF and SPARQL using Completeness Statements. ISWC (P&D) 2014.
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A Completeness Reasoner for SPARQL Queries over RDF Data Sources. ESWC (P&D) 2014.
Cardinality extraction from the Web: Auto-generating completeness information
Cardinality information often expresses complete count information; when it matches the count of the respective data in a KB,
completeness statements can be generated automatically!
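A minimal sketch of this last step, assuming (hypothetically) that extracted cardinalities arrive as a dictionary from (subject, predicate) pairs to counts and that the KB is a set of triples. When the extracted count equals the KB's count, the sketch emits a statement in the poster's Compl(s,p,?p) style. The function name auto_completeness_statements is illustrative, and the extraction itself is assumed to be correct.

```python
def auto_completeness_statements(cardinalities, kb):
    """cardinalities: {(subject, predicate): count extracted from text}
    kb: set of (subject, predicate, object) triples.
    Emit Compl(subject, predicate, ?predicate) whenever the extracted count
    equals the number of values the KB already has for that pair."""
    statements = []
    for (subject, predicate), count in cardinalities.items():
        kb_count = sum(1 for s, p, _ in kb if s == subject and p == predicate)
        if kb_count == count:
            statements.append((subject, predicate, "?" + predicate))
    return statements


kb = {("apollo11", "crew", "neil"), ("apollo11", "crew", "buzz"),
      ("apollo11", "crew", "michael")}
extracted = {("apollo11", "crew"): 3}  # e.g. from "a crew of three"
print(auto_completeness_statements(extracted, kb))
# [('apollo11', 'crew', '?crew')]
```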
Pipeline:
Web documents (e.g., Wikipedia) are processed with POS tagging, NER tagging, and parsing.
Distant supervision learning: a KB with completeness statements yields the training data, namely sentences containing a number that matches the count of a relation's values, and sentences containing a number that does NOT match it.
Learned classifiers: Naïve Bayes, Logistic Regression, SVM, Conditional Random Fields.
Output: cardinalities.
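The distant-supervision labelling step can be sketched as follows. This is an illustrative Python sketch, not the poster's implementation: it only recognizes digits and a small hypothetical list of number words, and it leaves out the POS/NER/parse features and the classifiers listed above. kb_count is assumed to come from a relation the KB marks as complete, which is what makes its count a trustworthy label.

```python
import re

# Hypothetical, deliberately small lexicon of number words.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}


def numbers_in(sentence):
    """Return the numbers mentioned in a sentence, as digits or number words."""
    found = []
    for token in re.findall(r"[a-z]+|\d+", sentence.lower()):
        if token.isdigit():
            found.append(int(token))
        elif token in NUMBER_WORDS:
            found.append(NUMBER_WORDS[token])
    return found


def label_sentences(sentences, kb_count):
    """Distant supervision: a (sentence, number) pair is labelled positive
    when the number equals the value count of the complete KB relation."""
    return [(s, n, n == kb_count) for s in sentences for n in numbers_in(s)]


s = "Apollo 11 carried a crew of three astronauts."
print(label_sentences([s], 3))
# [('Apollo 11 carried a crew of three astronauts.', 11, False),
#  ('Apollo 11 carried a crew of three astronauts.', 3, True)]
```

The mislabelled-looking "11" shows why a classifier is still needed on top of the distant labels: not every number in a sentence is a cardinality.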
Pattern soundness reasoning
Is my query pattern sound?
Input: P minimal query with negation,
C set of completeness statements
Output: true iff P is sound wrt. C
Characterization The query P is sound wrt. C iff
each BGP of the NOT-EXISTS patterns (= negative parts)
is complete wrt. C under the condition of
the positive part of P
Qneg is pattern-sound, since the statement
Compl(apollo11,crew,?crew) guarantees the completeness of
apollo11 crew ?person under any condition
(hence also under the condition ?person isA person)
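For the Qneg example, pattern soundness can be sketched under the same simplifying assumptions as a minimal illustration: single-triple, condition-free statements and '?'-prefixed variables. Query variables in the negative part are treated as fixed terms, so Compl(apollo11,crew,?crew) covers apollo11 crew ?person but Compl(apollo11,crew,neil) does not. Because a condition-free statement holds under any condition, this hypothetical is_pattern_sound never looks at the positive part; with conditional statements, the positive part would matter.

```python
def is_var(term):
    return term.startswith("?")


def subsumes(stmt, pattern):
    """Statement variables may map to any term, including query variables;
    query variables are treated as fixed (frozen) terms."""
    binding = {}
    for s_term, p_term in zip(stmt, pattern):
        if is_var(s_term):
            if binding.setdefault(s_term, p_term) != p_term:
                return False
        elif s_term != p_term:
            return False
    return True


def is_pattern_sound(negative_bgps, statements):
    """Sufficient check for condition-free statements: a statement that
    covers a negative triple pattern covers it under any condition, so the
    positive part of the query never needs to be inspected."""
    return all(
        any(subsumes(c, t) for c in statements)
        for bgp in negative_bgps
        for t in bgp
    )


negative = [[("apollo11", "crew", "?person")]]
print(is_pattern_sound(negative, [("apollo11", "crew", "?crew")]))  # True
print(is_pattern_sound(negative, [("apollo11", "crew", "neil")]))   # False
```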
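Qpos: apollo11 crew ?crew . ?crew child ?child
Compl(apollo11,crew,?crew) guarantees the crew part; in the graph, the crew consists of neil, buzz, and michael, so Qpos reduces to
neil child ?child, buzz child ?child, michael child ?child,
which are covered by Compl(neil,child,?child), Compl(buzz,child,?child), Compl(michael,child,?child).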
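The data-aware argument for Qpos can be sketched in Python. This is a simplified illustration, not the reasoning algorithm of Darari et al.: it assumes single-triple, condition-free statements and a graph given as a set of triples, and it walks the BGP in the given order. Each triple pattern, instantiated with the bindings found in the graph so far, must be covered by a statement; since each step is then complete, the bindings the graph yields are the ones a complete graph would yield. The function names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def subsumes(stmt, pattern):
    """Every instance of `pattern` is an instance of the statement pattern."""
    binding = {}
    for s_term, p_term in zip(stmt, pattern):
        if is_var(s_term):
            if binding.setdefault(s_term, p_term) != p_term:
                return False
        elif s_term != p_term:
            return False
    return True


def instantiate(pattern, mu):
    return tuple(mu.get(term, term) for term in pattern)


def extend(pattern, triple, mu):
    """Extend mapping mu so that pattern matches triple, or return None."""
    ext = dict(mu)
    for p_term, g_term in zip(instantiate(pattern, mu), triple):
        if is_var(p_term):
            if ext.setdefault(p_term, g_term) != g_term:
                return None
        elif p_term != g_term:
            return None
    return ext


def is_complete_data_aware(bgp, statements, graph):
    """Sufficient check: each pattern, instantiated with the bindings found
    so far in the graph, must be covered by some statement."""
    mappings = [{}]
    for pattern in bgp:
        next_mappings = []
        for mu in mappings:
            if not any(subsumes(c, instantiate(pattern, mu)) for c in statements):
                return False  # this part cannot be guaranteed complete
            for triple in graph:
                ext = extend(pattern, triple, mu)
                if ext is not None:
                    next_mappings.append(ext)
        mappings = next_mappings
    return True


G = {("apollo11", "crew", "neil"), ("apollo11", "crew", "buzz"),
     ("apollo11", "crew", "michael")}
Q_pos = [("apollo11", "crew", "?crew"), ("?crew", "child", "?child")]
C = [("apollo11", "crew", "?crew"), ("neil", "child", "?child"),
     ("buzz", "child", "?child"), ("michael", "child", "?child")]
print(is_complete_data_aware(Q_pos, C, G))  # True
```

Without the graph, ?crew child ?child is not subsumed by any single statement, which is why data-agnostic reasoning cannot guarantee Qpos.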
Constant-relevance
A completeness statement C is relevant to the query Q
iff all constants in C appear in Q.
Example: for the query michael child ?child (constants: {michael, child}),
the statements Compl(neil,spouse,?spouse) (constants: {neil, spouse}) and
Compl(buzz,child,?child) (constants: {buzz, child}) are ruled out,
leaving only Compl(michael,child,?child) (constants: {michael, child}).
Retrieval of constant-relevant statements can be reduced to subset querying.
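Constant-relevance amounts to a subset test on constant sets. The Python sketch below is a minimal illustration under the same assumptions as before ('?' marks variables; statements are single triple patterns); relevant_statements is a hypothetical name, and the linear scan stands in for a dedicated subset-query index.

```python
def constants(patterns):
    """Collect the non-variable terms of a list of triple patterns."""
    return {term for triple in patterns for term in triple if not term.startswith("?")}


def relevant_statements(query_bgp, statements):
    """Keep a statement only if its constants form a subset of the query's
    constants. This linear scan answers the subset queries naively; an index
    built for subset queries (e.g. a set-trie) could replace it."""
    query_constants = constants(query_bgp)
    return [c for c in statements if constants([c]) <= query_constants]


C = [("neil", "spouse", "?spouse"), ("buzz", "child", "?child"),
     ("michael", "child", "?child")]
print(relevant_statements([("michael", "child", "?child")], C))
# [('michael', 'child', '?child')]
```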
Completeness template
Generalize similar completeness statements into one template
so that they can be matched simultaneously:
Compl(neil,child,?child), Compl(buzz,child,?child), Compl(michael,child,?child)
become Compl[$p,child,?child]
with $p = {neil, buzz, michael}
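A minimal Python sketch of template building, assuming (beyond the poster) that templates only abstract the subject position and that grouped statements have constant subjects and constant predicates. Statements sharing predicate and object are grouped, and the set of subjects plays the role of $p. The names build_templates and template_covers are hypothetical.

```python
from collections import defaultdict


def build_templates(statements):
    """Group statements that differ only in their (constant) subject into one
    template Compl[$p, predicate, object], keyed by (predicate, object) and
    storing the set of admissible subjects."""
    templates = defaultdict(set)
    for subject, predicate, obj in statements:
        templates[(predicate, obj)].add(subject)
    return dict(templates)


def template_covers(templates, pattern):
    """Check a triple pattern against each template with one set lookup,
    instead of scanning every grouped statement."""
    subject, predicate, obj = pattern
    for (t_pred, t_obj), subjects in templates.items():
        object_ok = t_obj.startswith("?") or t_obj == obj
        if t_pred == predicate and object_ok and subject in subjects:
            return True
    return False


T = build_templates([("neil", "child", "?child"), ("buzz", "child", "?child"),
                     ("michael", "child", "?child")])
print(template_covers(T, ("buzz", "child", "?c")))  # True
```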
Partial matching
Filter out irrelevant completeness templates
by ruling out templates whose body does not overlap
with the query's body
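Partial matching can be sketched as a cheap pre-filter. The poster does not define "overlap" precisely; this sketch assumes that a template body is a single triple pattern and that it overlaps the query when it is position-wise compatible with at least one query triple pattern, treating '?' variables and '$' placeholders as wildcards. The function names are hypothetical.

```python
def is_open(term):
    """Query variables start with '?', template placeholders with '$'."""
    return term.startswith("?") or term.startswith("$")


def may_unify(a, b):
    """Position-wise compatibility of two triple patterns. Ignoring repeated
    variables makes this an over-approximation of unifiability, so the filter
    never discards a template that actually overlaps the query."""
    return all(is_open(x) or is_open(y) or x == y for x, y in zip(a, b))


def partial_match_filter(templates, query_bgp):
    """Keep only templates whose body overlaps some triple pattern of the query."""
    return [t for t in templates if any(may_unify(t, q) for q in query_bgp)]


templates = [("$p", "child", "?child"), ("$p", "spouse", "?spouse")]
Q_pos = [("apollo11", "crew", "?crew"), ("?crew", "child", "?child")]
print(partial_match_filter(templates, Q_pos))  # [('$p', 'child', '?child')]
```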
Experiments showed a 50,000× speed-up!
Experiments showed a 112× speed-up!
Completeness statements turn the open-world style of negation into a closed-world style:
reducing soundness checking to completeness checking!
Example (timeline marked 2012):
Completeness statements: Compl(?movie,director,tarantino), Compl(?movie,actor,tarantino)
Query: SELECT * WHERE { ?movie actor tarantino . ?movie director tarantino }
GCD := the maximum date d such that all parts of the query Q can be
guaranteed to be complete
Guaranteed Completeness Date (GCD) = 2012
Algorithm
Incrementally compute the union of the query parts that can be guaranteed to be complete,
going from the latest date in C to the earliest, and check along the way whether all query parts are already included; the first date at which they are is the GCD.
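The GCD algorithm can be sketched in Python under assumptions not stated on the poster: dates are years, each dated statement is paired with the set of query parts it guarantees to be complete, and query parts are identified by strings. The 2015 date below is hypothetical; the poster shows only 2012.

```python
def guaranteed_completeness_date(query_parts, dated_statements):
    """dated_statements: (date, parts) pairs, where `parts` is the set of query
    parts a statement guarantees to be complete. Walk from the latest date to
    the earliest, accumulating covered parts; the first date at which every
    part is covered is the GCD. Returns None if they are never all covered."""
    needed = set(query_parts)
    covered = set()
    for date, parts in sorted(dated_statements, key=lambda pair: pair[0], reverse=True):
        covered |= parts & needed
        if covered == needed:
            return date
    return None


parts = {"?movie actor tarantino", "?movie director tarantino"}
C = [(2015, {"?movie actor tarantino"}),     # hypothetical date
     (2012, {"?movie director tarantino"})]
print(guaranteed_completeness_date(parts, C))  # 2012
```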