AtScale, the first company to provide business users with speed, security and simplicity for BI on Hadoop, shares here the results of a comprehensive Business Intelligence benchmark for SQL-on-Hadoop engines.
The goal of the Business Intelligence for Hadoop benchmark is to help technology evaluators select the best SQL-on-Hadoop technology for their use cases.
The benchmark tested the industry's top SQL-on-Hadoop engines on key Business Intelligence (BI) use-case queries. It rates the strengths and weaknesses of each engine and reveals which ones are best suited to various scenarios.
To learn more about how AtScale can help you make BI work on Hadoop in your enterprise, visit www.atscale.com.
The Business Intelligence for Hadoop Benchmark - Q1 2016
1. The BI for Hadoop Benchmark
Q1 2016
atscale.com/benchmark
2. Hadoop Use Cases have evolved
© 2015 ATSCALE, INC. ALL RIGHTS RESERVED. CONFIDENTIAL & PROPRIETARY
[Chart: Hadoop use cases (ETL, Data Science, Business Intelligence), Yesterday vs. Today. Values as extracted: 74%, 62%, 65%; 51%, 56%, 69%.]
atscale.com/survey
3. Self-Service leads to Business Value
atscale.com/survey
[Chart: No Access vs. Self Service. Values as extracted: 41%, 61%, 59%, 39%.]
Companies that provide self-service access to business units are 50% more likely to gain value out of Hadoop.
4. Most Don't Have Self-Service on Hadoop
atscale.com/survey
Close to 60% have not yet provided self-service access to Hadoop.
[Chart: Yes 41%, No 59%.]
5. Why Self-Service is so Hard
1. Current BI Tools are limited
2. Hadoop is not optimized for performance
3. Governance and security are an issue
4. Current approaches are unnatural
atscale.com/benchmark
6. The BI for Hadoop Benchmark
Q1 2016
atscale.com/benchmark
7. Benchmark Framework
Three key concepts need to be inspected when evaluating SQL-on-Hadoop engines and their fitness to satisfy Business Intelligence workloads:
• Performs on Big Data: the SQL-on-Hadoop engine must be able to consistently analyze billions or trillions of rows of data without generating errors and with response times on the order of tens or hundreds of seconds.
• Fast on Small Data: the engine needs to deliver interactive performance on known query patterns, so it is important that the SQL-on-Hadoop engine return results in no more than a few seconds on small data sets (on the order of thousands or millions of rows).
• Stable for Many Users: Enterprise BI user bases consist of hundreds or thousands of data workers, and as a result the underlying SQL-on-Hadoop engine must perform reliably under highly concurrent analysis workloads.
atscale.com/benchmark
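The "Stable for Many Users" criterion above can be probed with a small concurrency harness. The following is only a sketch under stated assumptions, not the harness used in this benchmark: the function names (`timed_call`, `measure_concurrency`, `sample_query`) are hypothetical, and an in-memory SQLite query stands in for a real SQL-on-Hadoop engine.

```python
import sqlite3
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(run_query):
    """Run one query and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    run_query()
    return time.perf_counter() - start


def measure_concurrency(run_query, users):
    """Submit `users` queries to a thread pool and collect one latency each."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(timed_call, run_query) for _ in range(users)]
        return [f.result() for f in futures]


def sample_query():
    # Hypothetical stand-in workload: each simulated user opens its own
    # in-memory SQLite database instead of connecting to a real engine.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "SELECT SUM(value) FROM (SELECT 1 AS value UNION ALL SELECT 2)"
        ).fetchone()
    finally:
        conn.close()


latencies = measure_concurrency(sample_query, users=8)
print("median latency (s):", statistics.median(latencies))
print("worst latency (s):", max(latencies))
```

Repeating the run with increasing `users` values and comparing the median and worst latencies gives a rough picture of how an engine degrades under load.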
8. Benchmark Queries
Data Set: Star Schema Benchmark (SSB) data set
6B rows, 13 queries, 3 patterns
1. Quick Metric queries: Compute a particular metric value for a period of time. These
queries have a small number of joins and minimal or no group-bys (Q1.1 - Q1.3)
2. Product Insight queries: Compute a metric (or several metrics) aggregated against a
set of product and date-based dimensions. These queries include medium-sized joins
and a small number of group-bys (Q2.1 - Q2.3)
3. Customer Insight queries: Compute a metric (or several metrics) aggregated against a set of
product, customer, and date-based dimensions. These queries include both medium
and very large joins as well as a number of group-bys (Q3.1 - Q4.3)
atscale.com/benchmark
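To make the Quick Metric pattern concrete, here is a minimal sketch of a query with that shape: one join to a date table, no group-by, and three filters. It is not the benchmark's actual query text. The column names follow the naming style of the public SSB schema and are assumptions here, the date table is called `dates` for illustration, and the three rows of data are invented. SQLite stands in for a SQL-on-Hadoop engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
CREATE TABLE lineorder (
    lo_orderdate INTEGER,
    lo_quantity INTEGER,
    lo_discount INTEGER,
    lo_extendedprice INTEGER
);
INSERT INTO dates VALUES (19930101, 1993), (19940101, 1994);
INSERT INTO lineorder VALUES
    (19930101, 10, 2, 100),  -- passes every filter
    (19930101, 30, 2, 500),  -- rejected: quantity too high
    (19940101, 10, 2, 700);  -- rejected: wrong year
""")

# Quick Metric shape: a single metric over a period of time,
# with one join, one range filter, one comparison filter, and one year filter.
QUICK_METRIC = """
SELECT SUM(lo_extendedprice * lo_discount) AS revenue
FROM lineorder
JOIN dates ON lo_orderdate = d_datekey
WHERE d_year = 1993
  AND lo_discount BETWEEN 1 AND 3
  AND lo_quantity < 25
"""

revenue = conn.execute(QUICK_METRIC).fetchone()[0]
print(revenue)  # 200: only the first row survives the filters (100 * 2)
```

Product Insight and Customer Insight queries extend the same shape with more joined dimension tables and a GROUP BY over their attributes.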
10. Benchmark Key Findings
• One engine does not fit all: Depending on raw data size, query complexity, and the target number of end users, enterprises will find that one engine can't accomplish it all. Each engine has its own sweet spot, and enterprises may find that a blended usage of SQL-on-Hadoop engines fits their company's goals better.
• Small vs. Big Data: While all query engines successfully completed the Large Data query tests, Spark SQL and Impala performed better on smaller data sets: tables with thousands or several million rows of data.
• Few vs. Many Users: Impala showed the best concurrency test results, ahead of Hive and Spark SQL. Companies that anticipate connecting large numbers of business users to Hadoop may want to consider Impala.
• Constant Innovation: Open source contribution, as seen in the Spark SQL improvements, provides constant innovation. We expect the industry to continue innovating here: for example, Cloudera donated the Impala project to the Apache Software Foundation this past November. There is no doubt more innovation will come out of this new development.
atscale.com/benchmark
12. Benchmarks: Environment
RAM per node: 128G
CPU specs for data (worker) nodes: 32 CPU cores
Storage specs for data (worker) nodes: 2x 512mb SSD
For our test environment we used a 12-node cluster with:
• 1 master node
• 1 gateway node
• 10 data nodes
13. Benchmarks: Data Set
Table Name | Number of Rows
CUSTOMER_SMALL | 30M
CUSTOMER | 1B
LINEORDER | 6B
SUPPLIER | 2M
PART | 2M
DATE | 16K
14. Benchmarks: Queries
Query ID | Number of Joins | Largest Join Table | Number of Group Bys | Number of Filters | Comments
Q1.1 | 1 | 16,799 | 0 | 3 | 1 range condition, 1 comparative filter condition directly on LINEORDER table
Q1.2 | 1 | 16,799 | 0 | 3 | 2 range filter conditions directly on LINEORDER table
Q1.3 | 1 | 16,799 | 0 | 4 | 2 range filter conditions directly on LINEORDER table, 2 conditions on joined table
Q2.1 | 3 | 2,000,000 | 2 | 2 | filter on p_category (less selective)
Q2.2 | 3 | 2,000,000 | 2 | 2 | filter on p_brand, 2 values (more selective)
Q2.3 | 3 | 2,000,000 | 2 | 2 | filter on p_brand, 1 value (most selective)
Q3.1 | 3 | 1,050,000,000 | 3 | 3 | filter on region (less selective)
Q3.2 | 3 | 1,050,000,000 | 3 | 3 | filter on nation (more selective)
Q3.3 | 3 | 1,050,000,000 | 3 | 3 | filter on city (most selective)
Q3.4 | 3 | 1,050,000,000 | 3 | 3 | filter on city (most selective) and month (vs. year)
Q4.1 | 4 | 1,050,000,000 | 2 | 2 |
Q4.2 | 4 | 1,050,000,000 | 3 | 3 | includes filter on year (more selective)
Q4.3 | 4 | 1,050,000,000 | 3 | 3 | includes filter on year and nation (most selective)
16. AtScale Intelligence Platform
I.T. needs Control & Consistency
The Business needs Freedom & Self-Service
The Business Interface for Hadoop
17. Superior Architecture
• Any BI tool
• Industry standards
• Schema on demand
• Write once