An Empirical Evaluation of Cost-based
Federated SPARQL Query Processing Engines
Umair Qudus
Muhammad Saleem
Axel-Cyrille Ngonga Ngomo
Young-koo Lee
INTRODUCTION
 Finding a good query plan is a key step in optimizing query runtime.
 Several metrics have been proposed to measure the quality of a query plan, including query runtime, result set completeness and correctness, the number of sources selected, and the number of requests sent.
 Although informative, these metrics are generic and cannot quantify the accuracy of the cardinality estimators used by cost-based federation engines.
 We present novel evaluation metrics targeted at fine-grained benchmarking of cost-based federated SPARQL query engines.
Motivating Example
We need methods to measure the quality of cost
estimations for better query planning.
Motivation (2)
RELATED WORK
Current Performance Metrics
METRICS:
Definitions (1)
 q-error: the factor by which an estimated cardinality Ce deviates from the real cardinality Cr, i.e., q-error = max(Ce/Cr, Cr/Ce).
 Example (see the sketch below):
 Cr(TP1) = 100, Ce(TP1) = 90
 q-error(TP1) = max(90/100, 100/90) = 1.11
 q-error of all TPs = max(1.11, 1.25, 1) = 1.25
 q-error of the whole plan (TPs + joins) = max(1.11, 1.25, 1, 1.3, 3) = 3
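To make the arithmetic concrete, here is a minimal Python sketch (not the authors' code) that reproduces the q-error computation above; the (real, estimated) cardinality pairs are the ones from this example.

```python
# Minimal sketch of the q-error computation (not the authors' code).
# q-error = max(Ce/Cr, Cr/Ce): the factor by which an estimate
# deviates from the real cardinality, in either direction.

def q_error(real: float, estimated: float) -> float:
    return max(estimated / real, real / estimated)

# (real, estimated) cardinalities from the slide example:
triple_patterns = [(100, 90), (200, 250), (300, 300)]  # TP1..TP3
joins = [(50, 65), (50, 150)]                          # two joins

tp_errors = [q_error(r, e) for r, e in triple_patterns]
plan_errors = tp_errors + [q_error(r, e) for r, e in joins]

print(max(tp_errors))    # 1.25 -> q-error of all TPs
print(max(plan_errors))  # 3.0  -> q-error of the whole plan
```

Taking the maximum makes the q-error sensitive only to the single worst estimate in the plan.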
Definitions (2)
 Proposed Similarity Error: a cosine-based measure of how far the vector of estimated cardinalities deviates from the vector of real cardinalities across the whole plan (see the sketch below).
 Example: real = (100, 200, 300, 50, 50), estimated = (90, 250, 300, 65, 150)
 Ep(engine 1) = 2 * 0.1391 = 0.2784, Ep(engine 2) = 2 * 0.3838 = 0.7676
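The slide shows only the resulting Ep values, not the full formula. As a hedged illustration of the cosine-based idea, the sketch below computes the plain cosine distance between the real and estimated cardinality vectors; the paper's similarity error applies additional normalization, so this does not reproduce the Ep values above.

```python
import math

# Illustrative sketch only: plain cosine distance between the real and
# estimated cardinality vectors. Assumption: the paper's similarity
# error is built on such a cosine measure, but its exact normalization
# differs, so the number below is not the slide's Ep value.

def cosine_distance(real: list[float], estimated: list[float]) -> float:
    dot = sum(r * e for r, e in zip(real, estimated))
    norm_r = math.sqrt(sum(r * r for r in real))
    norm_e = math.sqrt(sum(e * e for e in estimated))
    return 1.0 - dot / (norm_r * norm_e)

real = [100, 200, 300, 50, 50]       # TP1..TP3 + two joins
estimated = [90, 250, 300, 65, 150]

print(cosine_distance(real, estimated))  # ~0.031; larger = worse estimates
```

Unlike the q-error, a vector-based measure accounts for all cardinality estimates in the plan at once rather than only the worst one.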
EXPERIMENTS AND RESULTS
Experimental Settings
 Federated Query Engines
 CostFed
 SPLENDID
 SemaGrow
 LHD
 Odyssey
 Queries and datasets
 FedBench and LargeRDFBench benchmarks. 13 Virtuoso endpoints.
 Technical specifications: each Virtuoso instance was deployed on a physical machine (32 GB RAM, Core i7 processor, 500 GB hard disk). We ran the selected federation engines on a local client machine with the same specifications.
Overall Plan Error (Similarity Error vs. q-error)
Join Error (Similarity Error vs. q-error)
Triple pattern error (Similarity Error vs. q-error)
Correlating metrics with runtime
Regression Experiments
Query Runtime (1/3)
Query Runtime (2/3)
Query Runtime (3/3)
Conclusion
 Both error metrics (q-error and similarity error) show a positive correlation with query runtimes.
 The cosine-based errors yield higher correlation coefficients (R values) than the q-error (see the sketch after this list).
 The cosine-based errors also yield smaller p-values than the q-error.
 Errors in join cardinality estimation correlate more strongly with runtimes than errors in the cardinality estimation of triple patterns.
 On average, the CostFed engine produces the smallest estimation errors and has the smallest execution time for the majority of the LargeRDFBench queries.
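As a rough illustration of how such correlation figures can be computed, assuming per-query arrays of estimation errors and runtimes (the values below are invented for the example, and the paper's choice of correlation test may differ):

```python
from scipy.stats import pearsonr

# Hypothetical per-query data (NOT the paper's measurements): one
# estimation error and one runtime (seconds) per benchmark query.
similarity_errors = [0.28, 0.77, 0.10, 0.45, 0.60]
runtimes_sec = [1.2, 5.4, 0.8, 2.9, 4.1]

r, p = pearsonr(similarity_errors, runtimes_sec)
print(f"R = {r:.3f}, p = {p:.4f}")  # higher R and lower p => stronger evidence
```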
Twitter: @UQudus
Paper Link: http://www.semantic-web-journal.net/system/files/swj2604.pdf
https://dice-research.org/UmairQudus
