Finding a good query plan is key to the optimization of query runtime. This holds in particular for cost-based federation
engines, which make use of cardinality estimations to achieve this goal. A number of studies compare SPARQL federation
engines across different performance metrics, including query runtime, result set completeness and correctness, number of sources
selected and number of requests sent. Albeit informative, these metrics are generic and unable to quantify and evaluate the
accuracy of the cardinality estimators of cost-based federation engines. To thoroughly evaluate cost-based federation engines, the
effect of estimated cardinality errors on the overall query runtime performance must be measured. In this paper, we address this
challenge by presenting novel evaluation metrics targeted at a fine-grained benchmarking of cost-based federated SPARQL query
engines. We evaluate five cost-based federated SPARQL query engines with existing as well as novel evaluation metrics on
LargeRDFBench queries. Our results provide a detailed analysis of the experimental outcomes and reveal novel insights that are useful
for the development of future cost-based federated SPARQL query processing engines.
1. An Empirical Evaluation of Cost-based
Federated SPARQL Query Processing Engines
Umair Qudus
Muhammad Saleem
Axel-Cyrille Ngonga Ngomo
Young-koo Lee
2. INTRODUCTION
Finding a good query plan is a key step in the optimization of
query runtime.
Different metrics have been proposed to measure the quality of a query plan,
including query runtime, result set completeness and correctness,
number of sources selected, and number of requests sent.
Although informative, these metrics are generic and unable to
quantify and evaluate the accuracy of the cardinality estimators of
the cost-based federation engines.
We present novel evaluation metrics targeted at a fine-grained
benchmarking of cost-based federated SPARQL query engines (a sketch of such error metrics follows).
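The paper contrasts the established q-error with novel cosine-based errors. As a minimal, illustrative Python sketch, assuming the usual definition of the q-error as max(est/act, act/est) and treating the cosine-based error as one minus the cosine similarity between the vectors of estimated and actual cardinalities (the paper's exact definitions may differ):

    import math

    def q_error(estimated: float, actual: float) -> float:
        # q-error: multiplicative factor by which an estimate deviates
        # from the true cardinality (always >= 1). Clamp to 1 to avoid
        # division by zero on empty results.
        estimated, actual = max(estimated, 1.0), max(actual, 1.0)
        return max(estimated / actual, actual / estimated)

    def cosine_error(estimated, actual):
        # 1 - cosine similarity between the vectors of estimated and
        # actual cardinalities of a query's triple patterns/joins.
        dot = sum(e * a for e, a in zip(estimated, actual))
        norm = (math.sqrt(sum(e * e for e in estimated))
                * math.sqrt(sum(a * a for a in actual)))
        return 1.0 - (dot / norm if norm else 0.0)

    # Illustrative values for a query with three triple patterns.
    est = [120.0, 4500.0, 30.0]   # estimated cardinalities
    act = [100.0, 9000.0, 25.0]   # actual cardinalities
    print([round(q_error(e, a), 2) for e, a in zip(est, act)])  # [1.2, 2.0, 1.2]
    print(round(cosine_error(est, act), 4))

Unlike the per-operator q-error, the cosine-based error summarizes the estimation quality of a whole query plan in a single bounded value, which makes correlating it with runtimes straightforward.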
11. Experimental Settings
Federated Query Engines
CostFed
SPLENDID
SemaGrow
LHD
Odyssey
Queries and datasets
FedBench and LargeRDFBench benchmarks, hosted across 13 Virtuoso endpoints.
Technical specifications: each Virtuoso instance was deployed on a physical machine
(32 GB RAM, Core i7 processor, and a 500 GB hard disk). We ran the selected
federation engines on a local client machine with the same specifications.
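As a minimal sketch of how a single query's runtime can be measured against one of the Virtuoso endpoints, using the SPARQLWrapper library (the endpoint URL and query below are placeholders, not the benchmark setup itself):

    import time
    from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

    # Placeholder URL: Virtuoso exposes SPARQL at /sparql by default.
    endpoint = SPARQLWrapper("http://localhost:8890/sparql")
    endpoint.setReturnFormat(JSON)
    endpoint.setQuery("SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }")

    start = time.perf_counter()
    bindings = endpoint.query().convert()["results"]["bindings"]
    runtime = time.perf_counter() - start

    print(f"count = {bindings[0]['c']['value']}, runtime = {runtime:.3f}s")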
20. Conclusion
Cardinality estimation errors show a positive correlation with query runtimes.
Cosine-based errors yield higher correlation coefficients (R values) than the q-error.
Cosine-based errors also yield smaller p-values than the q-error (see the sketch after this slide).
Errors in the cardinality estimation of joins show a higher correlation with runtimes
than errors in the cardinality estimation of triple patterns.
On average, the CostFed engine produces the smallest estimation errors and has the
smallest execution times for the majority of the LargeRDFBench queries.
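The R values and p-values above can be obtained with a standard correlation test. A minimal sketch using Pearson's correlation from SciPy; the per-query error and runtime values are made up for illustration and are not results from the paper:

    from scipy.stats import pearsonr  # pip install scipy

    # Made-up per-query measurements: cardinality-estimation error
    # and observed query runtime in seconds (illustrative only).
    errors   = [0.02, 0.15, 0.40, 0.65, 0.80, 0.91]
    runtimes = [0.9, 1.4, 3.8, 7.2, 9.5, 14.1]

    r, p = pearsonr(errors, runtimes)
    print(f"R = {r:.3f}, p-value = {p:.4f}")

A higher R indicates that larger estimation errors go hand in hand with longer runtimes; a smaller p-value indicates that the observed correlation is less likely to be due to chance.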
21. Twitter: @UQudus
Paper Link: http://www.semantic-web-journal.net/system/files/swj2604.pdf
https://dice-research.org/UmairQudus