1,000,000 foot view of Hadoop-like
parallel data processing systems
2014.10.11 @DSIRNLP
Hiroyuki Yamada (feeblefakie)

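About me
• Hiroyuki Yamada (feeblefakie)
• From the author profile in a book I wrote recently:
After working at IBM Japan, Ltd., he engaged in research and development of a distributed full-text search engine at Yahoo Japan Corporation. In the first-half 2008 term of the Exploratory IT Human Resources Project (MITOU), he was certified as a Super Creator for developing a high-performance distributed search engine. He currently works on research and development of high-performance parallel databases at the Institute of Industrial Science, the University of Tokyo. Ph.D. (Information Science and Technology).
• I'm a systems person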

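I wrote a book with [co-author's name and affiliation illegible in source]
• Unrelated to today's talk
• "If you don't buy it, you'll get bullied in class tomorrow" [attribution illegible in source]
Power Push!
Ranked No. 3 overall [ranking source illegible] (for a time)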

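No databases!
• DSIRNLP's target areas:
- "Data structures and algorithms, information retrieval, natural language processing, data mining, machine learning, reinforcement learning, artificial intelligence, concurrent programming, information theory, data compression, and so on. That said, apart from 'I tried using X!' talks, pretty much any theme or talk is OK."
• So I will talk about databases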

Figure by courtesy of Cloudera.

Hadoop-like parallel data processing
systems from 1,000,000 feet
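• Parallel data processing systems such as Hadoop
- Hadoop, Hive, Impala, Presto, Spark, Tez
• Introduce the essence of these systems, which are a mixed bag of good and bad
- Why are Impala and Presto fast?
- What are Spark and Tez?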

MapReduce: A major step backwards
• By D. DeWitt and M. Stonebraker
• MapReduce is not novel
http://homes.cs.washington.edu/~billhowe/mapreduce_a_major_step_backwards.html
The MapReduce community seems to feel that they have discovered an entirely new paradigm for
processing large data sets. In actuality, the techniques employed by MapReduce are more than 20
years old. The idea of partitioning a large data set into smaller partitions was first proposed in
"Application of Hash to Data Base Machine and Its Architecture" [11] as the basis for a new type of join
algorithm. In "Multiprocessor Hash-Based Join Algorithms," [7], Gerber demonstrated how
Kitsuregawa's techniques could be extended to execute joins in parallel on a shared-nothing [8]
cluster using a combination of partitioned tables, partitioned execution, and hash based splitting.
DeWitt [2] showed how these techniques could be adopted to execute aggregates with and without
group by clauses in parallel. DeWitt and Gray [6] described parallel database systems and how they
process queries. Shatdal and Naughton [9] explored alternative strategies for executing aggregates in
parallel.
Teradata has been selling a commercial DBMS utilizing all of these techniques for more than 20
years; exactly the techniques that the MapReduce crowd claims to have invented.

Hadoop
• I/F: map(), reduce() (Procedural)
• Processing model: parallel merge sort using hash clustering
• Unary operators => σ, π, aggregation
• Forcibly turned into a binary operator => join
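The processing model on this slide can be sketched in a few lines of Python. This is a minimal single-process illustration, not Hadoop's implementation: `map_reduce`, `tokenize`, and `count` are hypothetical names, the partitions are processed one after another rather than on separate machines, and the word-count input is made up for the example.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

def map_reduce(records, map_fn, reduce_fn, num_partitions=4):
    # Map phase: apply map_fn and route each (key, value) pair to a
    # partition chosen by hashing the key (the "hash clustering" step).
    partitions = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            partitions[hash(key) % num_partitions].append((key, value))

    # Reduce phase: each partition is sorted by key on its own (in a real
    # system these sorts run in parallel), then equal keys are merged.
    output = []
    for pid in sorted(partitions):
        pairs = sorted(partitions[pid], key=itemgetter(0))
        for key, group in groupby(pairs, key=itemgetter(0)):
            output.append((key, reduce_fn(key, [v for _, v in group])))
    return output

def tokenize(line):
    # Hypothetical map function: emit (word, 1) for every word.
    return [(word, 1) for word in line.split()]

def count(key, values):
    # Hypothetical reduce function: an aggregation over one key's values.
    return sum(values)

lines = ["hadoop hive", "hive impala", "hive"]
result = dict(map_reduce(lines, tokenize, count))
```

Because every value for a given key lands in the same partition, a unary operator such as an aggregation fits naturally. A join needs two inputs, which is why it has to be squeezed into the same single-input pipeline.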

Hive
• I/F: SQL (Declarative)
• Processing model: parallel merge sort using hash clustering, w/ Left-Deep Tree
• Sort-Merge Join (Reduce-side Join)
• Co-partitioned/Broadcast Hash Join (Map-side Join)
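The two join styles named above can be compared with a small Python sketch. It only illustrates the algorithms, under the assumption that each input is a list of (key, value) pairs; it is not Hive code, and the function names and sample tables are invented for the example.

```python
def sort_merge_join(left, right):
    """Sort both inputs on the join key, then merge them in one pass."""
    left = sorted(left, key=lambda kv: kv[0])
    right = sorted(right, key=lambda kv: kv[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key, then emit the
            # cross product of the matching left and right runs.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out

def broadcast_hash_join(small, large):
    """Build a hash table on the small input and stream the large one past it."""
    table = {}
    for key, value in small:
        table.setdefault(key, []).append(value)
    return [(key, sv, lv) for key, lv in large for sv in table.get(key, [])]

# Made-up sample data: user_id -> name, and user_id -> purchased item.
users = [(1, "ann"), (2, "bob")]
orders = [(2, "pen"), (1, "ink"), (2, "cup"), (3, "map")]
merged = sort_merge_join(users, orders)
hashed = broadcast_hash_join(users, orders)
```

Both functions return the same set of rows. The sort-merge variant needs both inputs sorted on the key, which the shuffle before the reduce phase provides. The hash variant needs no sort but assumes the small input fits in memory on every worker.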

Impala, Presto
• I/F: SQL (Declarative)
• Processing model: In-memory Hash Join w/ Right-Deep Tree
• Partitioned+Pipelined Hash Join (Repartition, Broadcast)
- In memory only, No Grace, No Hybrid
• Differences from Hive
- Hash Join vs. Sort Merge Join
- In-memory based (Pipeline) vs. Disk-based (External Sort)
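A partitioned, pipelined hash join can be sketched with Python generators. This is a toy model under stated assumptions, not code from Impala or Presto: the helper names are hypothetical, partitions run sequentially in one process, and the build side is assumed to fit in memory because nothing spills to disk (no Grace or Hybrid variant).

```python
def repartition(rows, num_partitions):
    """Split (key, value) rows into buckets by hashing the join key."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in rows:
        buckets[hash(key) % num_partitions].append((key, value))
    return buckets

def hash_join(build_rows, probe_rows):
    """In-memory hash join written as a generator.

    The build phase is blocking: the whole table is constructed when the
    first output row is requested. The probe phase is pipelined: each
    probe row is matched and its results are yielded immediately.
    """
    table = {}
    for key, value in build_rows:
        table.setdefault(key, []).append(value)
    for key, value in probe_rows:
        for match in table.get(key, []):
            yield key, match, value

def partitioned_hash_join(build_rows, probe_rows, num_partitions=3):
    # Rows with equal keys hash to the same bucket on both sides, so each
    # pair of buckets can be joined independently of the others.
    build_parts = repartition(build_rows, num_partitions)
    probe_parts = repartition(probe_rows, num_partitions)
    for build_part, probe_part in zip(build_parts, probe_parts):
        yield from hash_join(build_part, probe_part)

# Made-up sample data: user_id -> name, and user_id -> purchased item.
users = [(1, "ann"), (2, "bob")]
orders = [(1, "ink"), (2, "pen"), (2, "cup"), (3, "map")]
joined = sorted(partitioned_hash_join(users, orders))
```

Since the probe side streams through without being sorted or written out, the output of one join can feed the probe side of the next, which is the shape a right-deep tree gives.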

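[Supplement] Parallelism in Database Systems
• Inter-query parallelism: run multiple queries at the same time
• Intra-query parallelism: run a single query in parallel
• Independent, Pipelined, Partitioned
Pipelined: operators in a producer-consumer relationship each run on a different machine
Independent: mutually independent operators run on different machines
Partitioned: a single operator runs on multiple machines
[Figure: logical operator tree]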

[Supplement] Hash Join w/ Left/Right/Bushy tree
[Figure: hash join plans over relations R1-R4 with joins J1-J3, drawn as left-deep (LD), right-deep (RD), and bushy (BS) trees. Join inputs are labeled B and P, and each join is split into partitions a, b, c (J1a, J1b, J1c, and so on). The figure contrasts Sequential Processing with Pipelined Processing.]

[Supplement] Hash Join w/ other trees
Segmented RD, Zig Zag
Figures by courtesy of Dr. Nakano.
• Actively researched from the late 1980s to the early 1990s
• (Probably) never made it into commercial DBs

Spark, Tez
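• I/F: Procedural
• Processing model: DAG
- Build a DAG by connecting arbitrary (or predefined) processing operators
- cf. Dryad [Isard'07], Hyracks [Borkar'11]
• General-purpose parallel data processing frameworks
- Building Hadoop, Impala, or Presto on top of them is also possible in principle
- Do you want to write the DAG yourself?
(The logo is lame)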
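What "writing the DAG yourself" means can be illustrated with a small Python sketch. It is not the Spark or Tez API: `run_dag`, the operator names, and the sample data are all hypothetical, and the graph runs in a single process using the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

def run_dag(operators, edges, sources):
    """Run a DAG of operators in dependency order.

    operators: node name -> function taking the outputs of its upstream nodes
    edges:     node name -> list of upstream node names (order = argument order)
    sources:   node name -> input data for nodes that have no operator
    """
    results = dict(sources)
    for name in TopologicalSorter(edges).static_order():
        if name in results:
            continue  # source node: its data is already available
        inputs = [results[upstream] for upstream in edges.get(name, [])]
        results[name] = operators[name](*inputs)
    return results

def drop_maps(rows):
    # Selection: keep every order except the made-up item "map".
    return [row for row in rows if row[1] != "map"]

def join_users(users, orders):
    # Hash join: build a table on users, then probe it with the orders.
    names = dict(users)
    return [(uid, names[uid], item) for uid, item in orders if uid in names]

def count_rows(rows):
    # Aggregation: count the joined rows.
    return len(rows)

operators = {"filter": drop_maps, "join": join_users, "count": count_rows}
edges = {"filter": ["orders"], "join": ["users", "filter"], "count": ["join"]}
sources = {
    "users": [(1, "ann"), (2, "bob")],
    "orders": [(1, "ink"), (2, "pen"), (2, "cup"), (3, "map")],
}
results = run_dag(operators, edges, sources)
```

The graph here is still a tree of relational operators, so a SQL layer could have generated it. The procedural interface only pays off when the user really needs a shape that such a layer would not produce.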

Spark SQL, Hive on Tez
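• SQL on top of DAG
- Optimizing a DAG is hard because the search space explodes
- Even for trees, the search is restricted to particular tree shapes
• In the end, optimization is done over trees
- There is no longer any point in the layer underneath being a DAG
- Left-deep, Bushy in Hive on Tez (H2 2014)
- In principle the same as Impala, Presto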

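Summary
• A broad overview of recent open-source parallel data processing systems
• The foundation is parallel database technology from the 1980s and 1990s
• Aim to understand the essentials rather than individual products
- If you want to understand Hadoop-style processing systems, understanding parallel DBs is the shortcut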
