Cloudera Impala
Charm City Linux, March 2014

Alex Moundalexis
@technmsg

Thirty Seconds About Alex

• Solutions Architect
  • aka consultant
  • government
  • infrastructure
• former coder of Perl
• former administrator
• likes shiny objects

What Does Cloudera Do?

• product
  • distribution of Hadoop components, Apache licensed
  • enterprise tooling
• support
• training
• services (aka consulting)
• community

Disclaimer

• Cloudera builds software
  • most donated to Apache
  • some closed-source
• Cloudera products I reference are open source
  • Apache Licensed
  • source code is on GitHub
  • https://github.com/cloudera

What This Talk Isn't About

• deploying
  • Puppet, Chef, Ansible, homegrown scripts, intern labor
• sizing & tuning
  • depends heavily on data and workload
• coding
  • unless you count XML or CSV or SQL
• algorithms

The Apache Hadoop Ecosystem
Quick and dirty, for context.

Why Ecosystem?

• In the beginning, just Hadoop
  • HDFS
  • MapReduce
• Today, dozens of interrelated components
  • I/O
  • Processing
  • Specialty Applications
  • Configuration
  • Workflow

HDFS

• Distributed, highly fault-tolerant filesystem
• Optimized for large streaming access to data
• Based on Google File System
  • http://research.google.com/archive/gfs.html

Lots of Commodity Machines

Image: Yahoo! Hadoop cluster [OSCON 07]

MapReduce (MR)

• Programming paradigm
• Batch oriented, not realtime
• Works well with distributed computing
• Lots of Java, but other languages supported
• Based on Google's paper
  • http://research.google.com/archive/mapreduce.html

Under the Covers

You specify map() and reduce() functions.
The framework does the rest.

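To make that division of labor concrete, here is a minimal single-process sketch of a word count. It is not Hadoop's API: `map_fn`, `reduce_fn`, and `run_job` are hypothetical names, and the toy `run_job` only imitates the map, shuffle, and reduce phases that the real framework distributes across machines.

```python
from collections import defaultdict


def map_fn(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Combine every count emitted for a single word."""
    return word, sum(counts)


def run_job(lines, mapper, reducer):
    """Toy stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # the "shuffle": collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


word_counts = run_job(["impala is fast", "hive is batch"], map_fn, reduce_fn)
print(word_counts)
```

Only `map_fn` and `reduce_fn` carry the logic of the job; everything in `run_job` is the part a real cluster handles for you.
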
Apache Hive

• Abstraction of Hadoop's Java API
• HiveQL compiles down to MR
  • a SQL-like language
• Eases analysis using MapReduce

Apache Hive Metastore

• Maps HDFS files to DB-like resources
  • Databases
  • Tables
  • Column/field names, data types
  • Roles/users
  • InputFormat/OutputFormat

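As a rough illustration of what "mapping files to DB-like resources" means, the sketch below keeps a table definition separate from the raw file and applies it only when the data is read. It does not use the real metastore API: the `metastore` dictionary, the `read_table` helper, the table name, and the warehouse path are all hypothetical.

```python
import csv
import io

# Hypothetical metastore entry; the names and the path are illustrative only.
metastore = {
    "default.people": {
        "location": "/user/hive/warehouse/people",  # assumed HDFS directory
        "columns": [("name", str), ("state", str), ("employer", str), ("year", int)],
        "delimiter": ",",
    }
}


def read_table(table, raw_text):
    """Apply the stored column names and types to raw delimited text."""
    entry = metastore[table]
    reader = csv.reader(io.StringIO(raw_text), delimiter=entry["delimiter"])
    return [
        {name: cast(value) for (name, cast), value in zip(entry["columns"], row)}
        for row in reader
    ]


# The file itself is plain text; the schema lives only in the metastore entry.
raw_file = "Alex,Maryland,Cloudera,2013\nJoey,Maryland,Cloudera,2011\n"
rows = read_table("default.people", raw_file)
print(rows[0])
```

The point of the design is that several engines can share one set of table definitions while the files stay where they are.
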
But wait
WHY DO WE NEED THIS?

Super Shady SQL Supplement
I am not a SQL wizard by any means

A Simple Relational Database

name   state     employer  year
Alex   Maryland  Cloudera  2013
Joey   Maryland  Cloudera  2011
Sean   Texas     Cloudera  2013
Paris  Maryland  AOL       2011

Interacting with Relational Data

SELECT * FROM people;

name   state     employer  year
Alex   Maryland  Cloudera  2013
Joey   Maryland  Cloudera  2011
Sean   Texas     Cloudera  2013
Paris  Maryland  AOL       2011

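The query can be tried without a cluster. The sketch below loads the same four rows into an in-memory SQLite database from Python; SQLite is only a stand-in here (an assumption of this example, not something the talk uses), and the column types are guesses.

```python
import sqlite3

# In-memory SQLite database standing in for a real SQL engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, state TEXT, employer TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Maryland", "Cloudera", 2013),
        ("Joey", "Maryland", "Cloudera", 2011),
        ("Sean", "Texas", "Cloudera", 2013),
        ("Paris", "Maryland", "AOL", 2011),
    ],
)

# SELECT * returns every column of every row.
all_rows = conn.execute("SELECT * FROM people").fetchall()
for row in all_rows:
    print(row)
```
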
Requesting Specific Fields

SELECT name, state FROM people;

name   state     employer  year
Alex   Maryland  Cloudera  2013
Joey   Maryland  Cloudera  2011
Sean   Texas     Cloudera  2013
Paris  Maryland  AOL       2011

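Naming columns instead of using `*` narrows the result to just those fields. A small sketch, again assuming an in-memory SQLite database as a stand-in engine:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, state TEXT, employer TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Maryland", "Cloudera", 2013),
        ("Joey", "Maryland", "Cloudera", 2011),
        ("Sean", "Texas", "Cloudera", 2013),
        ("Paris", "Maryland", "AOL", 2011),
    ],
)

# Only the two requested columns come back; employer and year are dropped.
name_state = conn.execute("SELECT name, state FROM people").fetchall()
for row in name_state:
    print(row)
```
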
Requesting Specific Rows

SELECT name, state FROM people WHERE year 2012;

name   state     employer  year
Alex   Maryland  Cloudera  2013
Joey   Maryland  Cloudera  2011
Sean   Texas     Cloudera  2013
Paris  Maryland  AOL       2011

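The comparison operator in the WHERE clause did not survive in this copy of the slides. The sketch below assumes it was `>`, which is a guess; with `<` the other two rows would come back instead. SQLite is again only a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, state TEXT, employer TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Maryland", "Cloudera", 2013),
        ("Joey", "Maryland", "Cloudera", 2011),
        ("Sean", "Texas", "Cloudera", 2013),
        ("Paris", "Maryland", "AOL", 2011),
    ],
)

# Assumed operator: ">" (the original operator is missing from the source).
recent = conn.execute(
    "SELECT name, state FROM people WHERE year > 2012"
).fetchall()
for row in recent:
    print(row)
```
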
Two Simple Tables

owner  species  name
Alex   Cactus   Marvin
Joey   Cat      Brain
Sean   None
Paris  Unknown

name   state     employer  year
Alex   Maryland  Cloudera  2013
Joey   Maryland  Cloudera  2011
Sean   Texas     Cloudera  2013
Paris  Maryland  AOL       2011

Joining Two Tables

SELECT people.name AS owner, people.state AS state, pets.name AS pet
FROM people LEFT JOIN pets ON people.name = pets.owner

owner  species  name
Alex   Cactus   Marvin
Joey   Cat      Brain
Sean   None
Paris  Unknown

name   state     employer  year
Alex   Maryland  Cloudera  2013
Joey   Maryland  Cloudera  2011
Sean   Texas     Cloudera  2013
Paris  Maryland  AOL       2011

Joining Two Tables

SELECT people.name AS owner, people.state AS state, pets.name AS pet
FROM people LEFT JOIN pets ON people.name = pets.owner

owner  state     pet
Alex   Maryland  Marvin
Joey   Maryland  Brain
Sean   Texas
Paris  Maryland

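The join can be reproduced with the sketch below. It assumes the blank pet names for Sean and Paris are stored as NULL, which the slides do not state, and it uses an in-memory SQLite database as a stand-in engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, state TEXT, employer TEXT, year INTEGER)"
)
conn.execute("CREATE TABLE pets (owner TEXT, species TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Alex", "Maryland", "Cloudera", 2013),
        ("Joey", "Maryland", "Cloudera", 2011),
        ("Sean", "Texas", "Cloudera", 2013),
        ("Paris", "Maryland", "AOL", 2011),
    ],
)
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        ("Alex", "Cactus", "Marvin"),
        ("Joey", "Cat", "Brain"),
        ("Sean", "None", None),      # assumption: blank name stored as NULL
        ("Paris", "Unknown", None),  # assumption: blank name stored as NULL
    ],
)

# LEFT JOIN keeps every row of people, with or without a pet name.
joined = conn.execute(
    "SELECT people.name AS owner, people.state AS state, pets.name AS pet "
    "FROM people LEFT JOIN pets ON people.name = pets.owner"
).fetchall()
for row in joined:
    print(row)
```
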
Varying Implementation of JOIN

SELECT people.name AS owner, people.state AS state, pets.name AS pet
FROM people LEFT JOIN pets ON people.name = pets.owner

owner  state     pet
Alex   Maryland  Marvin
Joey   Maryland  Brain
Sean   Texas     ?
Paris  Maryland  ?

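One reading of the question marks is that a missing pet name may come back as NULL or as an empty string, depending on how the data was stored and how the engine treats it. The sketch below illustrates that difference under assumed data: Sean's pet name is stored as NULL and Paris's as an empty string, neither of which is stated in the talk, and SQLite stands in for the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, state TEXT)")
conn.execute("CREATE TABLE pets (owner TEXT, species TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Alex", "Maryland"), ("Joey", "Maryland"), ("Sean", "Texas"), ("Paris", "Maryland")],
)
conn.executemany(
    "INSERT INTO pets VALUES (?, ?, ?)",
    [
        ("Alex", "Cactus", "Marvin"),
        ("Joey", "Cat", "Brain"),
        ("Sean", "None", None),    # assumed: stored as NULL
        ("Paris", "Unknown", ""),  # assumed: stored as an empty string
    ],
)

rows = conn.execute(
    "SELECT people.name AS owner, pets.name AS pet, pets.name IS NULL AS pet_is_null "
    "FROM people LEFT JOIN pets ON people.name = pets.owner"
).fetchall()

# A client decides how to display NULL; here it is rendered as "?".
rendered = {owner: ("?" if pet is None else pet) for owner, pet, _ in rows}
print(rendered)
```

NULL and the empty string are distinct values, so a filter such as `pet IS NULL` would match Sean but not Paris in this setup.
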
Cloudera Impala
Familiar interface, but more powerful.

Cloudera Impala

• Interactive query on Hadoop
  • think seconds, not minutes
• Nearly ANSI-92 standard SQL
  • compatible with HiveQL
• Native MPP query engine
  • built for low-latency queries

Cloudera Impala - Design Choices

• Native daemons, written in C/C++
  • No JVM, no MapReduce
• Saturate disks on reads
• Uses in-memory HDFS caching
• Re-uses Hive metastore
• Not as fault-tolerant as MapReduce

Cloudera Impala - Architecture

• Impala Daemon
  • runs on every node
  • handles client requests
  • handles query planning and execution
• State Store Daemon
  • provides name service
  • metadata distribution
  • used for finding data

Impala Query Execution

[Diagram: a SQL App connects through ODBC; Hive Metastore, HDFS NN, and Statestore sit alongside three nodes, each with a Query Planner, Query Coordinator, Query Executor, HDFS DN, and HBase. Arrows are labeled "SQL request" and "Query results".]

1) Request arrives via ODBC/JDBC/HUE/Shell
2) Planner turns request into collections of plan fragments
3) Coordinator initiates execution on impalad(s) local to data
4) Intermediate results are streamed between impalad(s)
5) Query results are streamed back to client

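The five steps follow a scatter-gather pattern, which the toy sketch below imitates with threads in one process. It is not Impala code: the node names, the data placement in `NODE_DATA`, and the functions `plan`, `execute_fragment`, and `coordinate` are all hypothetical, and partial results are simply gathered at the coordinator rather than streamed between daemons.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placement: each "node" holds a local slice of the people table.
NODE_DATA = {
    "node1": [("Alex", "Maryland", 2013), ("Joey", "Maryland", 2011)],
    "node2": [("Sean", "Texas", 2013)],
    "node3": [("Paris", "Maryland", 2011)],
}


def plan(min_year):
    """Planner: turn the request into one fragment per node holding data."""
    return [{"node": node, "min_year": min_year} for node in NODE_DATA]


def execute_fragment(fragment):
    """Executor: scan only local rows and return a partial count per state."""
    partial = {}
    for _name, state, year in NODE_DATA[fragment["node"]]:
        if year > fragment["min_year"]:
            partial[state] = partial.get(state, 0) + 1
    return partial


def coordinate(min_year):
    """Coordinator: run fragments in parallel, then merge the partial results."""
    totals = {}
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(execute_fragment, plan(min_year)):
            for state, count in partial.items():
                totals[state] = totals.get(state, 0) + count
    return totals


# Roughly: SELECT state, COUNT(*) FROM people WHERE year > 2010 GROUP BY state
result = coordinate(2010)
print(result)
```

Each fragment touches only the rows on its own node, so the work that crosses the network is limited to the small partial counts.
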
Cloudera Impala - Results

• Allows for fast iteration/discovery
• How much faster?
  • 3-4x faster on I/O bound workloads
  • up to 45x faster on multi-MR queries
  • up to 90x faster on in-memory cache

Demo
Hold onto something, folks.

What's Next?

• Download Hadoop!
• CDH available at www.cloudera.com
• Already done that? Contribute
• Cloudera provides pre-loaded VMs
  • http://tiny.cloudera.com/quickstartvm
• Clone our repos!
  • https://github.com/cloudera

Special thanks:
PARIS

Questions?
Preferably related to the talk or not.

Thank You!

Alex Moundalexis
@technmsg

We're hiring, kids! Well, not kids.
