An Empirical Evaluation of Cost-based
Federated SPARQL Query Processing Engines
Umair Qudus
Muhammad Saleem
Axel-Cyrille Ngonga Ngomo
Young-koo Lee
INTRODUCTION
 Finding a good query plan is a key step in optimizing query runtime.
 Several metrics have been proposed to measure the quality of a query plan, including query runtime, result set completeness and correctness, the number of sources selected, and the number of requests sent.
 Although informative, these metrics are generic and cannot quantify the accuracy of the cardinality estimators used by cost-based federation engines.
 We present novel evaluation metrics targeted at fine-grained benchmarking of cost-based federated SPARQL query engines.
Motivating Example
We need methods to measure the quality of cost
estimations for better query planning.
Motivation (2)
RELATED WORK
Current Performance Metrics
METRICS:
Definitions (1)
 q-error: the factor by which an estimated cardinality Ce deviates from the real cardinality Cr, i.e., q-error = max(Ce/Cr, Cr/Ce).
 Example (see the sketch below):
 Cr(TP1) = 100, Ce(TP1) = 90
 q-error(TP1) = max(90/100, 100/90) = 1.11
 q-error of all TPs = max(1.11, 1.25, 1) = 1.25
 q-error of the whole plan (TPs + joins) = max(1.11, 1.25, 1, 1.3, 3) = 3
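To make the arithmetic concrete, here is a minimal Python sketch (not the authors' code) that reproduces the q-error computation above; the (real, estimated) cardinality pairs are the ones from this example.

```python
# Minimal sketch of the q-error computation (not the authors' code).
# q-error = max(Ce/Cr, Cr/Ce): the factor by which an estimate
# deviates from the real cardinality, in either direction.

def q_error(real: float, estimated: float) -> float:
    return max(estimated / real, real / estimated)

# (real, estimated) cardinalities from the slide example:
triple_patterns = [(100, 90), (200, 250), (300, 300)]  # TP1..TP3
joins = [(50, 65), (50, 150)]                          # two joins

tp_errors = [q_error(r, e) for r, e in triple_patterns]
plan_errors = tp_errors + [q_error(r, e) for r, e in joins]

print(max(tp_errors))    # 1.25 -> q-error of all TPs
print(max(plan_errors))  # 3.0  -> q-error of the whole plan
```

Taking the maximum makes the q-error sensitive only to the single worst estimate in the plan.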
Definitions (2)
 Proposed Similarity Error: a cosine-based measure of how far the vector of estimated cardinalities deviates from the vector of real cardinalities across the whole plan (see the sketch below).
 Example: real = (100, 200, 300, 50, 50), estimated = (90, 250, 300, 65, 150)
 Ep(engine 1) = 2 * 0.1391 = 0.2784, Ep(engine 2) = 2 * 0.3838 = 0.7676
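The slide shows only the resulting Ep values, not the full formula. As a hedged illustration of the cosine-based idea, the sketch below computes the plain cosine distance between the real and estimated cardinality vectors; the paper's similarity error applies additional normalization, so this does not reproduce the Ep values above.

```python
import math

# Illustrative sketch only: plain cosine distance between the real and
# estimated cardinality vectors. Assumption: the paper's similarity
# error is built on such a cosine measure, but its exact normalization
# differs, so the number below is not the slide's Ep value.

def cosine_distance(real: list[float], estimated: list[float]) -> float:
    dot = sum(r * e for r, e in zip(real, estimated))
    norm_r = math.sqrt(sum(r * r for r in real))
    norm_e = math.sqrt(sum(e * e for e in estimated))
    return 1.0 - dot / (norm_r * norm_e)

real = [100, 200, 300, 50, 50]       # TP1..TP3 + two joins
estimated = [90, 250, 300, 65, 150]

print(cosine_distance(real, estimated))  # ~0.031; larger = worse estimates
```

Unlike the q-error, a vector-based measure accounts for all cardinality estimates in the plan at once rather than only the worst one.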
EXPERIMENTS AND RESULTS
Experimental Settings
 Federated Query Engines
 CostFed
 SPLENDID
 SemaGrow
 LHD
 Odyssey
 Queries and datasets
 FedBench and LargeRDFBench benchmarks. 13 Virtuoso endpoints.
 Technical specifications: each Virtuoso instance was deployed on a physical machine (32 GB RAM, Core i7 processor, 500 GB hard disk). We ran the selected federation engines on a local client machine with the same specifications.
Overall Plan Error (Similarity Error vs. q-error)
Join Error (Similarity Error vs. q-error)
Triple pattern error (Similarity Error vs. q-error)
Correlating metrics with runtime
Regression Experiments
Query Runtime (1/3)
Query Runtime (2/3)
Query Runtime (3/3)
Conclusion
 Both error metrics (q-error and similarity error) show a positive correlation with query runtimes.
 The cosine-based errors yield higher correlation coefficients (R values) than the q-error (see the sketch after this list).
 The cosine-based errors also yield smaller p-values than the q-error.
 Errors in join cardinality estimation correlate more strongly with runtimes than errors in the cardinality estimation of triple patterns.
 On average, the CostFed engine produces the smallest estimation errors and has the smallest execution time for the majority of the LargeRDFBench queries.
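As a rough illustration of how such correlation figures can be computed, assuming per-query arrays of estimation errors and runtimes (the values below are invented for the example, and the paper's choice of correlation test may differ):

```python
from scipy.stats import pearsonr

# Hypothetical per-query data (NOT the paper's measurements): one
# estimation error and one runtime (seconds) per benchmark query.
similarity_errors = [0.28, 0.77, 0.10, 0.45, 0.60]
runtimes_sec = [1.2, 5.4, 0.8, 2.9, 4.1]

r, p = pearsonr(similarity_errors, runtimes_sec)
print(f"R = {r:.3f}, p = {p:.4f}")  # higher R and lower p => stronger evidence
```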
Twitter: @UQudus
Paper Link: http://www.semantic-web-journal.net/system/files/swj2604.pdf
https://dice-research.org/UmairQudus
