ºÝºÝߣ

ºÝºÝߣShare a Scribd company logo
DATAINTEGRATIONAND
SQLAPPLICATIONMIGRATION
WITH
CASCADINGLINGUAL
Chris K Wensel | Hadoop Summit EU 2014
? Not a ¡°data scientist¡±
? No idea what ¡°big data¡± means
? Used MR in anger once, and did it wrong
? Author of Cascading
? Co-Author of Lingual (w/ Julian Hyde)
CHRISKWENSEL
2
3
Why is Hadoop & ¡°big data¡± a thing?
More is better
HADOOP&BIGDATA
4
More Data
More Machines
More Algorithms
More Tools
HADOOP&BIGDATA
5
Worse is better
HADOOP&BIGDATA
6
Less red tape
More degrees of freedom
No upfront design
HADOOP&BIGDATA
7
8
Why Cascading?
Makes hard things possible.
CASCADING
9
While helping to retain
Conceptual Integrity.
CASCADING
10
"the speed of innovation is
proportional to the arrival rate of
answers to questions"
HADOOP&BIGDATA
11
True when you are questioning
Data, Algorithms, and
Architecture
CASCADING
12
? Java API (alternative to Hadoop MapReduce)
? Separates business logic from integration
? Testable at every lifecycle stage
? Works with any JVM language
? Many integration adapters
CASCADING
13
Process Planner
Processing API Integration API
Scheduler API
Scheduler
Compute
Cascading
Data Stores
Scripting
Scala, Clojure, JRuby, Jython, Groovy
Enterprise Java
ECOSYSTEM
14
Lingual Pattern
Cascading
Hadoop MR
Scalding Cascalog
Hadoop Tez Whatever
? Started in 2007
? 2.0 released June 2012
? 2.5 stable out now
? 3.0 wip now available
? Tez support coming soon
? Apache Licensed Open-Source
? Supports all Hadoop 1 & 2 distros
CASCADING
15
ANSI SQL
on Cascading
on Whatever
LINGUAL
16
How¡¯s this different than all the
other ¡°SQL for Hadoop¡± projects?
LINGUAL
17
Not intended as
an ad-hoc query interface.
[Lingual is only as fast as Hadoop]
WHYLINGUAL?
18
Is intended to be
as standards compliant as
possible.
WHYLINGUAL?
19
Migrate workloads from expensive systems
to less expensive Hadoop
WHYLINGUAL?
20
Liberate the data trapped on Hadoop w/o
involving an Engineer
WHYLINGUAL?
21
? ANSI Compatible SQL
? JDBC Driver
? Cascading Java API
? SQL Command Shell
? Catalog Manager Tool
? Data Provider API
LINGUAL
22
Query Planner
JDBC API Lingual APIProvider API
Cascading
Compute
Lingual
Data Stores
CLI / Shell Enterprise Java
Catalog
? SQL-92
? Character, Numeric, and Temporal types
? IN and CASE
? FROM sub-queries
? CAST and CONVERT
? CURRENT_*
ANSISQL
23
http://docs.cascading.org/lingual/1.1/#sql-support
24
query:	
 ?
	
 ?	
 ?{	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?select	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?query	
 ?UNION	
 ?[	
 ?ALL	
 ?]	
 ?query	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?query	
 ?EXCEPT	
 ?query	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?query	
 ?INTERSECT	
 ?query	
 ?
	
 ?	
 ?}	
 ?
	
 ?	
 ?[	
 ?ORDER	
 ?BY	
 ?orderItem	
 ?[,	
 ?orderItem	
 ?]*	
 ?]	
 ?
	
 ?	
 ?[	
 ?LIMIT	
 ?{	
 ?count	
 ?|	
 ?ALL	
 ?}	
 ?]	
 ?
	
 ?	
 ?[	
 ?OFFSET	
 ?start	
 ?{	
 ?ROW	
 ?|	
 ?ROWS	
 ?}	
 ?]	
 ?
	
 ?	
 ?[	
 ?FETCH	
 ?{	
 ?FIRST	
 ?|	
 ?NEXT	
 ?}	
 ?[	
 ?count	
 ?]	
 ?{	
 ?ROW	
 ?|	
 ?ROWS	
 ?}	
 ?]	
 ?
!
orderItem:	
 ?
	
 ?	
 ?expression	
 ?[	
 ?ASC	
 ?|	
 ?DESC	
 ?]	
 ?[	
 ?NULLS	
 ?FIRST	
 ?|	
 ?NULLS	
 ?LAST	
 ?]	
 ?
!
select:	
 ?
	
 ?	
 ?SELECT	
 ?[	
 ?ALL	
 ?|	
 ?DISTINCT	
 ?]	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?{	
 ?*	
 ?|	
 ?projectItem	
 ?[,	
 ?projectItem	
 ?]*	
 ?}	
 ?
	
 ?	
 ?FROM	
 ?tableExpression	
 ?
	
 ?	
 ?[	
 ?WHERE	
 ?booleanExpression	
 ?]	
 ?
	
 ?	
 ?[	
 ?GROUP	
 ?BY	
 ?{	
 ?()	
 ?|	
 ?expression	
 ?[,	
 ?expression]*	
 ?}	
 ?]	
 ?
	
 ?	
 ?[	
 ?HAVING	
 ?booleanExpression	
 ?]	
 ?
	
 ?	
 ?[	
 ?WINDOW	
 ?windowName	
 ?AS	
 ?windowSpec	
 ?[,	
 ?windowName	
 ?AS	
 ?windowSpec	
 ?]*	
 ?]	
 ?
!
projectItem:	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?expression	
 ?[	
 ?[	
 ?AS	
 ?]	
 ?columnAlias	
 ?]	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?tableAlias	
 ?.	
 ?*	
 ?
tableExpression:	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?tableReference	
 ?[,	
 ?tableReference	
 ?]*	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?tableExpression	
 ?[	
 ?NATURAL	
 ?]	
 ?[	
 ?LEFT	
 ?|	
 ?RIGHT	
 ?|	
 ?FULL	
 ?]	
 ?	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?JOIN	
 ?tableExpression	
 ?[	
 ?joinCondition	
 ?]	
 ?
!
joinCondition:	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?ON	
 ?booleanExpression	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?USING	
 ?(	
 ?column	
 ?[,	
 ?column	
 ?]*	
 ?)	
 ?
!
tableReference:	
 ?
	
 ?	
 ?tablePrimary	
 ?[	
 ?[	
 ?AS	
 ?]	
 ?alias	
 ?[	
 ?(	
 ?columnAlias	
 ?[,	
 ?columnAlias	
 ?]*	
 ?)	
 ?]	
 ?]	
 ?
!
tablePrimary:	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?[	
 ?TABLE	
 ?]	
 ?[	
 ?[	
 ?catalogName	
 ?.	
 ?]	
 ?schemaName	
 ?.	
 ?]	
 ?tableName	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?(	
 ?query	
 ?)	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?VALUES	
 ?expression	
 ?[,	
 ?expression	
 ?]*	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?(	
 ?TABLE	
 ?expression	
 ?)	
 ?
!
windowRef:	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?windowName	
 ?
	
 ?	
 ?|	
 ?	
 ?	
 ?windowSpec	
 ?
!
windowSpec:	
 ?
	
 ?	
 ?[	
 ?windowName	
 ?]	
 ?
	
 ?	
 ?(	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?[	
 ?ORDER	
 ?BY	
 ?orderItem	
 ?[,	
 ?orderItem	
 ?]*	
 ?]	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?[	
 ?PARTITION	
 ?BY	
 ?expression	
 ?[,	
 ?expression	
 ?]*	
 ?]	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?{	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?RANGE	
 ?numericOrInterval	
 ?{	
 ?PRECEDING	
 ?|	
 ?FOLLOWING	
 ?}	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?|	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?ROWS	
 ?numeric	
 ?{	
 ?PRECEDING	
 ?|	
 ?FOLLOWING	
 ?}	
 ?
	
 ?	
 ?	
 ?	
 ?	
 ?	
 ?}	
 ?
	
 ?	
 ?)
Lingual 1.1 -> Optiq 0.4.12.3
https://github.com/julianhyde/optiq/blob/master/REFERENCE.md
Lingual provides two interfaces.
APIS
25
Allows SQL and non-SQL Flows to work
together as a single application via
conceptually similar interfaces
CASCADINGAPI
26
27
Cascading API	

!
FlowDef flowDef = FlowDef.flowDef()!
.setName( "sqlflow" )!
.addSource( "example.employee", emplTap )!
.addSource( "example.sales", salesTap )!
.addSink( "results", resultsTap );!
?!
SQLPlanner sqlPlanner = new SQLPlanner()!
.setSql( sqlStatement );!
?!
flowDef.addAssemblyPlanner( sqlPlanner );!
!
Flow	
 ?flow	
 ?=	
 ?new	
 ?HadoopFlowConnector().connect(	
 ?flowDef	
 ?);	
 ?
!
flow.complete();
So Systems and People can talk directly to
Hadoop visible data
JDBCAPI
28
29
JDBC driver	

public void run() throws ClassNotFoundException, SQLException {!
Class.forName( "cascading.lingual.jdbc.Driver" );!
Connection connection =!
DriverManager.getConnection(!
"jdbc:lingual:local;schemas=src/main/resources/data/example"
);!
Statement statement = connection.createStatement();!
?!
ResultSet resultSet = statement.executeQuery(!
"select *n"!
+ "from "EXAMPLE"."SALES_FACT_1997" as sn"!
+ "join "EXAMPLE"."EMPLOYEE" as en"!
+ "on e."EMPID" = s."CUST_ID"" );!
?!
// do something!
?!
resultSet.close();!
statement.close();!
connection.close();!
}
JDBC
30
Server / Desktop
JDBC
FlowAssembly
Cluster
JobJobSQL
select * from
employees ...
SQL
select * from
employees ...
SQL
select * from
employees ...
lingual-hadoop-1.1.0-jdbc.jar
meta-data catalog
DEFAULTSHELL
31
select dept_no, avg( max_salary ) from employees.dept_emp,
( select emp_no as sal_emp_no, max( salary ) as max_salary from employees.salaries
group by emp_no )
where dept_emp.emp_no = sal_emp_no group by dept_no;
SUB-QUERY
32
ACCESSHADOOPFROMR
33
# load the JDBC package!
library(RJDBC)!
?!
# set up the driver!
drv <- JDBC("cascading.lingual.jdbc.Driver", !
"~/src/concur/lingual/lingual-local/build/libs/lingual-local-1.0.0-wip-dev-jdbc.jar")!
?!
# set up a database connection to a local repository!
connection <- dbConnect(drv, !
"jdbc:lingual:local;catalog=~/src/concur/lingual/lingual-examples/tables;schema=EMPLOYEES")!
?!
# query the repository: in this case the MySQL sample database (CSV files)!
df <- dbGetQuery(connection, !
"SELECT * FROM EMPLOYEES.EMPLOYEES WHERE FIRST_NAME = 'Gina'")!
head(df)!
?!
# use R functions to summarize and visualize part of the data!
df$hire_age <- as.integer(as.Date(df$HIRE_DATE) - as.Date(df$BIRTH_DATE)) / 365.25!
summary(df$hire_age)!
!
library(ggplot2)!
m <- ggplot(df, aes(x=hire_age))!
m <- m + ggtitle("Age at hire, people named Gina")!
m + geom_histogram(binwidth=1, aes(y=..density.., fill=..count..)) + geom_density()
RESULTS
34
> summary(df$hire_age)
Min. 1st Qu. Median Mean 3rd Qu. Max.
20.86 27.89 31.70 31.61 35.01 43.92
INTEGRATION
35
But I use a custom data format!
? Any Cascading Tap and/or Scheme can be used from JDBC
? Use a ¡°fat jar¡± on local disk or from a Maven repo
? cascading-jdbc:cascading-jdbc-oracle-provider:1.0
? The Jar is dynamically loaded into cluster, on the ?y
DATAPROVIDERAPI
36
DATAPROVIDER
37
JDBC
Maven Repo
Assembly Flow
Cluster
JobJob
lingual-hadoop-1.1.0-jdbc.jar
cascading-jdbc-oracle-provider.jar
your-avro-provider.jar
AMAZONEMR&REDSHIFT
38
Amazon Elastic MapReduce
Job Job Job Job
SELECT ... FROM ?le1 JOIN ?le2 ON ?le1.id = ?le2.id ...
Amazon S3
Amazon RedShift
?le1 ?le2
results
http://docs.cascading.org/tutorials/lingual-redshift/
All Cascading applications can be
visualized and monitored ¡­
MANAGED
39
? Understand how your application maps onto your cluster
? Identify bottlenecks (data, code, or the system)
? Jump to the line of code implicated on a failure
? Plugin available via Maven repo
? Beta UI hosted online
DRIVEN
40
http://cascading.io/driven/
MANAGEDWITHDRIVEN
41
42
ABOOK!
43
Enterprise DataWork?ows?
with Cascading	

O¡¯Reilly, 2013	

amazon.com/dp/1449358721
chris@concurrentinc.com
!
!
@cwensel
DONE
44

More Related Content

Hadoop Summit EU 2014

  • 2. ? Not a ¡°data scientist¡± ? No idea what ¡°big data¡± means ? Used MR in anger once, and did it wrong ? Author of Cascading ? Co-Author of Lingual (w/ Julian Hyde) CHRISKWENSEL 2
  • 3. 3 Why is Hadoop & ¡°big data¡± a thing?
  • 5. More Data More Machines More Algorithms More Tools HADOOP&BIGDATA 5
  • 7. Less red tape More degrees of freedom No upfront design HADOOP&BIGDATA 7
  • 9. Makes hard things possible. CASCADING 9
  • 10. While helping to retain Conceptual Integrity. CASCADING 10
  • 11. "the speed of innovation is proportional to the arrival rate of answers to questions" HADOOP&BIGDATA 11
  • 12. True when you are questioning Data, Algorithms, and Architecture CASCADING 12
  • 13. ? Java API (alternative to Hadoop MapReduce) ? Separates business logic from integration ? Testable at every lifecycle stage ? Works with any JVM language ? Many integration adapters CASCADING 13 Process Planner Processing API Integration API Scheduler API Scheduler Compute Cascading Data Stores Scripting Scala, Clojure, JRuby, Jython, Groovy Enterprise Java
  • 15. ? Started in 2007 ? 2.0 released June 2012 ? 2.5 stable out now ? 3.0 wip now available ? Tez support coming soon ? Apache Licensed Open-Source ? Supports all Hadoop 1 & 2 distros CASCADING 15
  • 16. ANSI SQL on Cascading on Whatever LINGUAL 16
  • 17. How¡¯s this different than all the other ¡°SQL for Hadoop¡± projects? LINGUAL 17
  • 18. Not intended as an ad-hoc query interface. [Lingual is only as fast as Hadoop] WHYLINGUAL? 18
  • 19. Is intended to be as standards compliant as possible. WHYLINGUAL? 19
  • 20. Migrate workloads from expensive systems to less expensive Hadoop WHYLINGUAL? 20
  • 21. Liberate the data trapped on Hadoop w/o involving an Engineer WHYLINGUAL? 21
  • 22. ? ANSI Compatible SQL ? JDBC Driver ? Cascading Java API ? SQL Command Shell ? Catalog Manager Tool ? Data Provider API LINGUAL 22 Query Planner JDBC API Lingual APIProvider API Cascading Compute Lingual Data Stores CLI / Shell Enterprise Java Catalog
  • 23. ? SQL-92 ? Character, Numeric, and Temporal types ? IN and CASE ? FROM sub-queries ? CAST and CONVERT ? CURRENT_* ANSISQL 23 http://docs.cascading.org/lingual/1.1/#sql-support
  • 24. 24 query: ? ? ?{ ? ? ? ? ? ? ?select ? ? ?| ? ? ?query ?UNION ?[ ?ALL ?] ?query ? ? ?| ? ? ?query ?EXCEPT ?query ? ? ?| ? ? ?query ?INTERSECT ?query ? ? ?} ? ? ?[ ?ORDER ?BY ?orderItem ?[, ?orderItem ?]* ?] ? ? ?[ ?LIMIT ?{ ?count ?| ?ALL ?} ?] ? ? ?[ ?OFFSET ?start ?{ ?ROW ?| ?ROWS ?} ?] ? ? ?[ ?FETCH ?{ ?FIRST ?| ?NEXT ?} ?[ ?count ?] ?{ ?ROW ?| ?ROWS ?} ?] ? ! orderItem: ? ? ?expression ?[ ?ASC ?| ?DESC ?] ?[ ?NULLS ?FIRST ?| ?NULLS ?LAST ?] ? ! select: ? ? ?SELECT ?[ ?ALL ?| ?DISTINCT ?] ? ? ? ? ? ? ?{ ?* ?| ?projectItem ?[, ?projectItem ?]* ?} ? ? ?FROM ?tableExpression ? ? ?[ ?WHERE ?booleanExpression ?] ? ? ?[ ?GROUP ?BY ?{ ?() ?| ?expression ?[, ?expression]* ?} ?] ? ? ?[ ?HAVING ?booleanExpression ?] ? ? ?[ ?WINDOW ?windowName ?AS ?windowSpec ?[, ?windowName ?AS ?windowSpec ?]* ?] ? ! projectItem: ? ? ? ? ? ? ?expression ?[ ?[ ?AS ?] ?columnAlias ?] ? ? ?| ? ? ?tableAlias ?. ?* ? tableExpression: ? ? ? ? ? ? ?tableReference ?[, ?tableReference ?]* ? ? ?| ? ? ?tableExpression ?[ ?NATURAL ?] ?[ ?LEFT ?| ?RIGHT ?| ?FULL ?] ? ? ? ? ? ? ? ? ? ?JOIN ?tableExpression ?[ ?joinCondition ?] ? ! joinCondition: ? ? ? ? ? ? ?ON ?booleanExpression ? ? ?| ? ? ?USING ?( ?column ?[, ?column ?]* ?) ? ! tableReference: ? ? ?tablePrimary ?[ ?[ ?AS ?] ?alias ?[ ?( ?columnAlias ?[, ?columnAlias ?]* ?) ?] ?] ? ! tablePrimary: ? ? ? ? ? ? ?[ ?TABLE ?] ?[ ?[ ?catalogName ?. ?] ?schemaName ?. ?] ?tableName ? ? ?| ? ? ?( ?query ?) ? ? ?| ? ? ?VALUES ?expression ?[, ?expression ?]* ? ? ?| ? ? ?( ?TABLE ?expression ?) ? ! windowRef: ? ? ? ? ? ? ?windowName ? ? ?| ? ? ?windowSpec ? ! windowSpec: ? ? ?[ ?windowName ?] ? ? ?( ? ? ? ? ? ? ?[ ?ORDER ?BY ?orderItem ?[, ?orderItem ?]* ?] ? ? ? ? ? ? ?[ ?PARTITION ?BY ?expression ?[, ?expression ?]* ?] ? ? ? ? ? ? ?{ ? ? ? ? ? ? ? ? ? ? ?RANGE ?numericOrInterval ?{ ?PRECEDING ?| ?FOLLOWING ?} ? ? ? ? ? ? ?| ? ? ? ? ? ? ? ? ? ? ?ROWS ?numeric ?{ ?PRECEDING ?| ?FOLLOWING ?} ? ? ? ? ? ? ?} ? ? ?) Lingual 1.1 -> Optiq 0.4.12.3 https://github.com/julianhyde/optiq/blob/master/REFERENCE.md
  • 25. Lingual provides two interfaces. APIS 25
  • 26. Allows SQL and non-SQL Flows to work together as a single application via conceptually similar interfaces CASCADINGAPI 26
  • 27. 27 Cascading API ! FlowDef flowDef = FlowDef.flowDef()! .setName( "sqlflow" )! .addSource( "example.employee", emplTap )! .addSource( "example.sales", salesTap )! .addSink( "results", resultsTap );! ?! SQLPlanner sqlPlanner = new SQLPlanner()! .setSql( sqlStatement );! ?! flowDef.addAssemblyPlanner( sqlPlanner );! ! Flow ?flow ?= ?new ?HadoopFlowConnector().connect( ?flowDef ?); ? ! flow.complete();
  • 28. So Systems and People can talk directly to Hadoop visible data JDBCAPI 28
  • 29. 29 JDBC driver public void run() throws ClassNotFoundException, SQLException {! Class.forName( "cascading.lingual.jdbc.Driver" );! Connection connection =! DriverManager.getConnection(! "jdbc:lingual:local;schemas=src/main/resources/data/example" );! Statement statement = connection.createStatement();! ?! ResultSet resultSet = statement.executeQuery(! "select *n"! + "from "EXAMPLE"."SALES_FACT_1997" as sn"! + "join "EXAMPLE"."EMPLOYEE" as en"! + "on e."EMPID" = s."CUST_ID"" );! ?! // do something! ?! resultSet.close();! statement.close();! connection.close();! }
  • 30. JDBC 30 Server / Desktop JDBC FlowAssembly Cluster JobJobSQL select * from employees ... SQL select * from employees ... SQL select * from employees ... lingual-hadoop-1.1.0-jdbc.jar meta-data catalog
  • 32. select dept_no, avg( max_salary ) from employees.dept_emp, ( select emp_no as sal_emp_no, max( salary ) as max_salary from employees.salaries group by emp_no ) where dept_emp.emp_no = sal_emp_no group by dept_no; SUB-QUERY 32
  • 33. ACCESSHADOOPFROMR 33 # load the JDBC package! library(RJDBC)! ?! # set up the driver! drv <- JDBC("cascading.lingual.jdbc.Driver", ! "~/src/concur/lingual/lingual-local/build/libs/lingual-local-1.0.0-wip-dev-jdbc.jar")! ?! # set up a database connection to a local repository! connection <- dbConnect(drv, ! "jdbc:lingual:local;catalog=~/src/concur/lingual/lingual-examples/tables;schema=EMPLOYEES")! ?! # query the repository: in this case the MySQL sample database (CSV files)! df <- dbGetQuery(connection, ! "SELECT * FROM EMPLOYEES.EMPLOYEES WHERE FIRST_NAME = 'Gina'")! head(df)! ?! # use R functions to summarize and visualize part of the data! df$hire_age <- as.integer(as.Date(df$HIRE_DATE) - as.Date(df$BIRTH_DATE)) / 365.25! summary(df$hire_age)! ! library(ggplot2)! m <- ggplot(df, aes(x=hire_age))! m <- m + ggtitle("Age at hire, people named Gina")! m + geom_histogram(binwidth=1, aes(y=..density.., fill=..count..)) + geom_density()
  • 34. RESULTS 34 > summary(df$hire_age) Min. 1st Qu. Median Mean 3rd Qu. Max. 20.86 27.89 31.70 31.61 35.01 43.92
  • 35. INTEGRATION 35 But I use a custom data format!
  • 36. ? Any Cascading Tap and/or Scheme can be used from JDBC ? Use a ¡°fat jar¡± on local disk or from a Maven repo ? cascading-jdbc:cascading-jdbc-oracle-provider:1.0 ? The Jar is dynamically loaded into cluster, on the ?y DATAPROVIDERAPI 36
  • 38. AMAZONEMR&REDSHIFT 38 Amazon Elastic MapReduce Job Job Job Job SELECT ... FROM ?le1 JOIN ?le2 ON ?le1.id = ?le2.id ... Amazon S3 Amazon RedShift ?le1 ?le2 results http://docs.cascading.org/tutorials/lingual-redshift/
  • 39. All Cascading applications can be visualized and monitored ¡­ MANAGED 39
  • 40. ? Understand how your application maps onto your cluster ? Identify bottlenecks (data, code, or the system) ? Jump to the line of code implicated on a failure ? Plugin available via Maven repo ? Beta UI hosted online DRIVEN 40 http://cascading.io/driven/
  • 42. 42