Amazon Redshift:
How we managed 300 billion rows with no DBA
Matt Cohen
Founder & President
matt@onespot.com
December 10th, 2013

Copyright © 2013 OneSpot, Proprietary & Confidential
What is OneSpot?
• OneSpot is a content advertising platform that distributes content as ads that people want to click on.
  - Fortune 2000 clients
  - Real-time ad exchange bidding
  - Adaptive machine learning
  - Seed-funded until our $5.3M Series A last month
• Big data, big analysis
What is Redshift?
1. When light from a receding object appears shifted to the red end of the spectrum
   - A consequence of the expanding universe
2. A cheap, fast, petabyte-scale, managed SQL data warehouse service from Amazon Web Services
   - A consequence of the expanding cloud ecosystem

Why Redshift?
• Cheap
• Fast
• Petabyte-scale
• Managed service
• SQL
• Data warehouse
• From AWS

SQL Data Warehouse
• Based on the commercial ParAccel database
  - Which is based on Postgres
• Standards-based tools and knowledge
• Built for data warehousing
  - Column-oriented
  - Cluster architecture
  - Read optimized
  - No relational integrity
  - Almost no SQL extensions

SQL Data Warehouse
• Column-oriented

SQL Data Warehouse
• Column-oriented
• 11 different compression techniques
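To show why a column layout compresses well, here is a minimal Python sketch (a toy model, not Redshift's storage code) that pivots hypothetical ad-event rows into one list per column and applies run-length encoding, one of the encodings Redshift offers. Repeated values in a sorted column collapse into (value, count) pairs, and a query that needs only one column reads only that column's list. The row data and function names are illustrative assumptions.

```python
from itertools import groupby

# Hypothetical ad-event rows (date, campaign, clicks); illustrative data only.
rows = [
    ("2013-12-01", "acme", 3),
    ("2013-12-01", "acme", 5),
    ("2013-12-01", "globex", 2),
    ("2013-12-02", "globex", 7),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a column-oriented layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows, ["date", "campaign", "clicks"])

# A query that only needs clicks reads one column, not whole rows.
total_clicks = sum(columns["clicks"])

# Sorted, repetitive columns shrink to a few (value, count) pairs.
encoded_dates = run_length_encode(columns["date"])
print(encoded_dates)  # [('2013-12-01', 3), ('2013-12-02', 1)]
```

Run-length encoding pays off only when equal values sit next to each other, which is one reason the choice of sort key and compression encoding interact.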

SQL Data Warehouse
• Cluster architecture
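In a Redshift cluster, a table's rows are spread across the slices of the compute nodes; with a distribution key, rows sharing a key value land on the same slice, so a join on that key can run without moving data between nodes. The Python sketch below models only that idea. Its assumptions: a hypothetical 4-slice cluster, two hypothetical tables keyed by `user_id`, and `zlib.crc32` standing in for Redshift's internal hash function.

```python
import zlib
from collections import defaultdict

# Hypothetical cluster: 2 compute nodes x 2 slices each (illustrative only).
NUM_SLICES = 4


def slice_for(dist_key_value, num_slices=NUM_SLICES):
    """Pick a slice for a distribution-key value with a deterministic hash.

    zlib.crc32 is a stand-in; it is not Redshift's internal hash function.
    """
    return zlib.crc32(str(dist_key_value).encode("utf-8")) % num_slices


def distribute(rows, key_index, num_slices=NUM_SLICES):
    """Group rows by the slice their distribution-key column hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_index], num_slices)].append(row)
    return dict(slices)


# Two hypothetical tables that share the distribution key user_id.
clicks = [(user_id, "click") for user_id in range(1000)]
users = [(user_id, f"user-{user_id}") for user_id in range(1000)]

click_slices = distribute(clicks, key_index=0)
user_slices = distribute(users, key_index=0)

# Matching user_ids are co-located, so a join on user_id stays slice-local.
for s in sorted(click_slices):
    print(f"slice {s}: {len(click_slices[s])} click rows")
```

A deterministic hash is the essential design choice here: the same key always maps to the same slice, regardless of which table it comes from.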

SQL Data Warehouse
• Read optimized
  - Large block size (1MB)
  - Data replication: 2x live, 1x S3
  - No indexes: sort and distribution keys
• No relational integrity
• Almost no SQL extensions
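With no indexes, Redshift relies on sort keys plus per-block min/max metadata (zone maps) to skip blocks that cannot match a filter. The sketch below is a simplified Python model, not Redshift internals: 100 values per block stands in for the 1MB block, and the integer "timestamps" are synthetic. Sorting the column keeps each block's min/max range narrow, so a range filter reads one block instead of nearly all of them.

```python
import random
from dataclasses import dataclass


@dataclass
class Block:
    """One storage block of a single column plus its zone-map min and max."""
    values: list
    min_value: int
    max_value: int


def build_blocks(column, rows_per_block):
    """Split a column into fixed-size blocks and record each block's min/max."""
    blocks = []
    for start in range(0, len(column), rows_per_block):
        chunk = column[start:start + rows_per_block]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_scan(blocks, low, high):
    """Return values in [low, high] and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block.max_value < low or block.min_value > high:
            continue  # zone map proves the block holds no match: skip it
        blocks_read += 1
        matches.extend(v for v in block.values if low <= v <= high)
    return matches, blocks_read


# Synthetic event timestamps; 100 values per block stands in for a 1MB block.
sorted_column = list(range(10_000))
shuffled_column = sorted_column[:]
random.Random(0).shuffle(shuffled_column)

sorted_hits, sorted_reads = range_scan(build_blocks(sorted_column, 100), 5000, 5099)
shuffled_hits, shuffled_reads = range_scan(build_blocks(shuffled_column, 100), 5000, 5099)

print(sorted_reads)    # 1 block out of 100
print(shuffled_reads)  # typically almost all 100, since each min/max range is wide
```

Both scans return the same rows; only the amount of data read differs, which is the point of choosing a sort key that matches common filters.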

Fast = Cheap
• Starts with 1 XL node
  - 85¢ an hour ($620/month) on demand
  - 50¢ an hour ($365/month) with a 1-year reservation
• Benchmarks say:
  - Scales linearly
  - 5-10x faster than Hadoop/Hive
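The monthly figures follow from the hourly rates if a month is taken as 730 hours (8,760 hours a year divided by 12). A small Python check of that arithmetic, using the 2013 rates quoted above rather than current pricing; the `nodes` parameter is an illustrative addition for per-node billing.

```python
HOURS_PER_YEAR = 24 * 365
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730 hours in an average month


def monthly_cost(hourly_rate_usd, nodes=1):
    """Monthly cost of a cluster billed per node-hour."""
    return hourly_rate_usd * HOURS_PER_MONTH * nodes


# 2013 rates for one XL node, as quoted on the slide (not current pricing).
on_demand = monthly_cost(0.85)
reserved = monthly_cost(0.50)

print(f"on demand: ${on_demand:,.2f}/month")
print(f"reserved:  ${reserved:,.2f}/month")
print(f"reserved saves {1 - reserved / on_demand:.0%}")
```

Because billing is per node-hour, cost grows linearly with cluster size: 32 XL nodes on demand come to 32 times the single-node figure.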

Petabyte scale
• Up to
  - 32 XL nodes (64 terabytes)
  - 100 8XL nodes (1.6 petabytes)
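The slide's limits imply 2 TB of storage per XL node (64 TB / 32) and 16 TB per 8XL node (1.6 PB / 100), in decimal units. A quick Python sketch of that arithmetic; the per-node constants are derived from the slide's figures for the 2013 node types, not taken from current AWS documentation.

```python
# Per-node storage implied by the slide, in decimal terabytes (2013 node types).
TB_PER_XL_NODE = 2
TB_PER_8XL_NODE = 16
TB_PER_PB = 1000


def cluster_capacity_tb(nodes, tb_per_node):
    """Raw storage of a cluster: node count times per-node storage."""
    return nodes * tb_per_node


xl_max_tb = cluster_capacity_tb(32, TB_PER_XL_NODE)
xl8_max_tb = cluster_capacity_tb(100, TB_PER_8XL_NODE)

print(f"32 XL nodes:   {xl_max_tb} TB")
print(f"100 8XL nodes: {xl8_max_tb / TB_PER_PB} PB")
```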

Managed Service from AWS
• Managed service
  - Incredibly easy
  - Nice UI
  - Works with most SQL tools
• From AWS
  - Free data transfer
  - Easy load from S3
  - Use AWS Data Pipeline
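Loading from S3 uses Redshift's `COPY` command, which reads the objects under an S3 key prefix in parallel across the cluster. The Python helper below only assembles the SQL text; the table name, bucket, and role ARN are hypothetical placeholders, and running the statement requires a SQL client connected to the cluster. It authorizes with `IAM_ROLE`, a later addition to Redshift; at the time of this talk, `CREDENTIALS` with access keys was the usual form.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn, delimiter="|", gzip=True):
    """Assemble a Redshift COPY statement that loads files under an S3 prefix.

    No quoting or escaping is done: all arguments must be trusted values.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    clauses = [
        f"COPY {table}",
        f"FROM '{s3_prefix}'",
        f"IAM_ROLE '{iam_role_arn}'",
        f"DELIMITER '{delimiter}'",
    ]
    if gzip:
        clauses.append("GZIP")  # input files are gzip-compressed
    return "\n".join(clauses) + ";"


# Hypothetical table, bucket, and role ARN, for illustration only.
sql = build_copy_statement(
    "ad_events",
    "s3://example-bucket/ad_events/2013-12-10/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
print(sql)
```

Splitting input into several files under one prefix lets `COPY` spread the load across slices instead of reading a single large file.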

The TL;DR
• Pros
  - Standard SQL
  - Super easy
  - Very fast
  - Affordable
  - Integrates with AWS
  - No DBA
  - No sysadmin

• Cons
  - Standard SQL
  - Almost no SQL extensions
  - Best with a star schema
    • Big joins can be slow
  - No MapReduce
  - Fixed columns
  - Consistency
  - 1.6 PB limit
