Spark Performance Tuning - Part #2 (Parallel Processing)
2019. 1. 22
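Contents
1. Introduction
   1. Research background and method
   2. Theoretical review
   3. Development plan
2. Body
   1. System configuration
   2. Data configuration
   3. System settings
   4. Tests
3. Conclusion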
1. Introduction
2-1. Research background and method
Ecosystem build and data transfer test

Service configuration (Hadoop Ecosystem)
  p-master: Hadoop Name Node, Secondary Name Node, Spark Master, Hive Master, Resource Manager, Maria DB
  hadoop1:  Hadoop DataNode, Spark Worker, NodeManager
  hadoop2:  Hadoop DataNode, Spark Worker, NodeManager
  hadoop3:  Hadoop DataNode, Spark Worker, NodeManager

No  Lv1 / Lv2      Version     Contents
1   Oracle Linux   7.3         OS
2   Hadoop         2.7.6       Distributed Storage
3   Spark          2.2.0       Distributed Processing
4   Hive           2.3.3       Support SQL (Master, Only master)
5   MariaDB        10.2.11     RDB (Master, Only master)
6   Oracle Client  18.3.0.0.0  Oracle DB client
Load -> Process -> Save
Source Data -> Spark -> Output Data
Sales results (Oracle DB) -> data transfer (Spark) -> sales results (Oracle DB)
Each of the three stages is shown with its S/W and H/W.
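2-2. Theoretical review
Partition
Partitions are used to distribute data across the cluster and to reduce the memory load on each node.
Spark splits the data into partitions and assigns each partition to an Executor.
Whole data -> Partition #1, Partition #2, Partition #3 -> Executor1, Executor2, Executor3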
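The partition-to-Executor picture can be illustrated with a small pure-Python toy model. This is only a sketch: the function names `split_into_partitions` and `assign_to_executors` are hypothetical, the data is made up, and a real Spark scheduler hands out partitions to Executors dynamically rather than by a fixed round-robin rule.

```python
from typing import Dict, List, Sequence


def split_into_partitions(rows: Sequence[int], num_partitions: int) -> List[List[int]]:
    """Split rows into num_partitions contiguous chunks of near-equal size."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    base, extra = divmod(len(rows), num_partitions)
    partitions: List[List[int]] = []
    start = 0
    for index in range(num_partitions):
        # The first `extra` partitions take one additional row each.
        size = base + (1 if index < extra else 0)
        partitions.append(list(rows[start:start + size]))
        start += size
    return partitions


def assign_to_executors(partitions: List[List[int]],
                        executors: List[str]) -> Dict[str, List[int]]:
    """Assign partition indexes to executors round-robin (toy scheduling rule)."""
    assignment: Dict[str, List[int]] = {name: [] for name in executors}
    for index in range(len(partitions)):
        assignment[executors[index % len(executors)]].append(index)
    return assignment


whole_data = list(range(10))  # stand-in for the whole data set
partitions = split_into_partitions(whole_data, 3)
assignment = assign_to_executors(partitions, ["Executor1", "Executor2", "Executor3"])
```

With three partitions and three Executors, each Executor receives exactly one partition, which matches the one-to-one layout on the slide.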
1. How Spark runs
spark-submit launches the Spark application.
The SparkContext driver runs the program.
The cluster master works through the WORKER NODEs.
Once the Executors start, the program runs.
2. Shuffling
Shuffling occurs when data has to move between partitions (for example, for grouping).
Partition #1, #2, #3 -> write -> Executor1, Executor2, Executor3 -> read -> Partition #1, #2, #3
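The write and read halves of a shuffle can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's implementation: the record data, the byte-sum hash, and the function names `shuffle_write`, `shuffle_read`, and `sum_by_key` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]


def shuffle_write(partition: List[Record], num_targets: int) -> Dict[int, List[Record]]:
    """Write side: bucket each record by the target partition of its key."""
    buckets: Dict[int, List[Record]] = defaultdict(list)
    for key, value in partition:
        target = sum(key.encode("utf-8")) % num_targets  # deterministic toy hash
        buckets[target].append((key, value))
    return dict(buckets)


def shuffle_read(written: List[Dict[int, List[Record]]], target: int) -> List[Record]:
    """Read side: collect the buckets destined for one target partition."""
    records: List[Record] = []
    for buckets in written:
        records.extend(buckets.get(target, []))
    return records


def sum_by_key(records: List[Record]) -> Dict[str, int]:
    """Grouping step that needs all rows of a key in the same partition."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


input_partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]
written = [shuffle_write(part, 3) for part in input_partitions]
output_partitions = [sum_by_key(shuffle_read(written, t)) for t in range(3)]
```

Every key is spread over two input partitions here, so no grouping result can be computed without moving rows, which is why grouping triggers a shuffle.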
3. Main settings
(base settings) spark-env.sh
(settings) spark-defaults.conf
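2-3. Development plan
4. Development process

Phase        Step                   Tasks
Initiation   Planning               Project plan
Analysis     Analysis               Requirements and system analysis
Design       Analysis model design  Analysis, data, algorithm design and verification
             System design          Database, I/F and system design (tables, programs, screens, interface list)
Build        System development     Screen, DB, batch, I/F development
Test         System test            Unit / integration tests
Deployment   System deployment      Deployment plan, manuals
Closure      Project closure        Final report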
3. Body
3-1. System configuration
Input DB (Oracle):  192.168.110.112
Output DB (Oracle): 192.168.110.111

Hadoop Ecosystem
  p-master (192.168.110.117): Hadoop Name Node, Secondary Name Node, Spark Master, Hive Master, Resource Manager, Maria DB
  hadoop1  (192.168.110.118): Hadoop DataNode, Spark Worker, NodeManager
  hadoop2  (192.168.110.119): Hadoop DataNode, Spark Worker, NodeManager
  hadoop3  (192.168.110.120): Hadoop DataNode, Spark Worker, NodeManager
3-2. Data configuration
Define the outbound and inbound data

Inbound
No  InterfaceID  Content    System      Type  Count        Periods  Column cnt  Comments
1   IB-001       Sellout    Dev System  RDB   100 million  -        17          TBD
2   IB-002       Sellout    Dev System  RDB   13 million            17          1/22
3   IB-003       Parameter  Dev System  RDB   2                     5

Outbound
No  InterfaceID  Content    System      Type  Count        Periods  Column cnt  Comments
1   OB-001       Sellout    Op System   RDB   100 million  -        17          TBD
2   OB-002       Sellout    Op System   RDB   13 million            17          1/22
3-3. System settings

Div                          Value
Cluster                      3
Worker (per server)          1
Executor-count (per server)  3
Executor-core                4
Executor-memory              10

Master PC: MEM: 5G
Slave PC (same layout on each of the three slave PCs): CPU: 16 cores, MEM: 40G
  Worker #1 (Worker #2, ...)
    Executor: Core: 4, Mem: 10
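The settings table implies a simple resource budget per server, which the following sketch works out. The numbers come from the table and diagram; reading Executor-memory 10 as 10 GB is an assumption (the diagram gives node memory as 40G), and the constant and function names are hypothetical.

```python
# Values taken from the settings table and diagram.
SERVERS = 3
EXECUTORS_PER_SERVER = 3
CORES_PER_EXECUTOR = 4
MEMORY_PER_EXECUTOR_GB = 10  # assumption: the table's "10" means 10 GB
SERVER_CORES = 16
SERVER_MEMORY_GB = 40


def per_server_usage() -> dict:
    """Cores and memory claimed by executors on one server, plus what is left."""
    cores_used = EXECUTORS_PER_SERVER * CORES_PER_EXECUTOR
    memory_used = EXECUTORS_PER_SERVER * MEMORY_PER_EXECUTOR_GB
    return {
        "cores_used": cores_used,
        "cores_free": SERVER_CORES - cores_used,
        "memory_used_gb": memory_used,
        "memory_free_gb": SERVER_MEMORY_GB - memory_used,
    }


def cluster_totals() -> dict:
    """Executor count, cores, and memory summed over all servers."""
    usage = per_server_usage()
    return {
        "executors": SERVERS * EXECUTORS_PER_SERVER,
        "cores": SERVERS * usage["cores_used"],
        "memory_gb": SERVERS * usage["memory_used_gb"],
    }


usage = per_server_usage()
totals = cluster_totals()
```

Under these assumptions each server keeps 4 cores and 10 GB outside the executors, and the cluster runs 9 executors with 36 cores in total, so at most 36 tasks can run at the same time.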
3-4. Tests
(Case #1) Parallel processing code not applied (code)
spark-submit --class com.spark.c10_dataTransfer.basicDataTransfer sparkProgramming-spark-1.0.jar
Partitioning not applied
(Case #1) Parallel processing code not applied (result)
(Case #2) Parallel processing code applied (code)
spark-submit --class com.spark.c10_dataTransfer.partitionDataTransfer sparkProgramming-spark-1.0.jar
Partition size set to 100
(Case #2) Parallel processing code applied (result)
(Case #3) Parallel processing code applied (code)
spark-submit --class com.spark.c10_dataTransfer.partitionDataTransfer sparkProgramming-spark-1.0.jar
Partition size set to 1000
(Case #3) Parallel processing code applied (result)
(Case #4) Parallel processing code applied (code)
spark-submit --class com.spark.c10_dataTransfer.partitionDataTransfer sparkProgramming-spark-1.0.jar
Partition size set to 10
(Case #4) Parallel processing code applied (result)
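The slides do not show the code of partitionDataTransfer, so the following is only a sketch of one common way such a job parallelizes a database read: Spark's JDBC source accepts the options partitionColumn, lowerBound, upperBound and numPartitions and turns them into one range predicate per partition. The function below is a simplified pure-Python model of that splitting, not Spark's own code. The column name ID, the bounds, and reading "partition size 100" as 100 partitions are all assumptions.

```python
from typing import List, Optional


def jdbc_partition_predicates(column: str, lower: int, upper: int,
                              num_partitions: int) -> List[Optional[str]]:
    """Build one WHERE predicate per partition from a numeric range.

    The bounds only decide the stride; the first and last predicates are
    open-ended so that rows outside [lower, upper) are still read.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if num_partitions == 1 or upper <= lower:
        return [None]  # a single partition reads the whole table
    stride = max((upper - lower) // num_partitions, 1)
    predicates: List[Optional[str]] = []
    boundary = lower
    for index in range(num_partitions):
        lower_clause = f"{column} >= {boundary}" if index > 0 else None
        boundary += stride
        upper_clause = f"{column} < {boundary}" if index < num_partitions - 1 else None
        if lower_clause is None:
            predicates.append(f"{upper_clause} OR {column} IS NULL")
        elif upper_clause is None:
            predicates.append(lower_clause)
        else:
            predicates.append(f"{lower_clause} AND {upper_clause}")
    return predicates


# Hypothetical example: 15 million sequential IDs split into 100 partitions.
predicates = jdbc_partition_predicates("ID", 0, 15_000_000, 100)
```

In this model each of the 100 queries covers 150,000 IDs. With 10 partitions each query would cover 1.5 million IDs, and with 1000 partitions the job would issue 1000 small queries, which is the trade-off the four test cases probe.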
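4. Conclusion
When loading and saving data with Spark, applying parallel processing in the code can improve performance.
(However, the partition size should be chosen with the actual bandwidth in mind.)

Category                                           Data (15 million rows, 2.6GB)  Remarks
Parallel processing not applied                    7 min                          No partition
Parallel processing applied (partition size 100)   3.7 min                        Partition size 100
Parallel processing applied (partition size 1000)  12 min                         Partition size 1000
Parallel processing applied (partition size 10)    16 min                         Partition size 100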
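The measured times can be compared as speedups over the run without parallel processing. The sketch below uses only the minutes reported in the table; the names `BASELINE_MIN`, `RUNS`, and `speedup` are hypothetical.

```python
# Elapsed minutes from the results table, keyed by partition size setting.
BASELINE_MIN = 7.0
RUNS = {100: 3.7, 1000: 12.0, 10: 16.0}


def speedup(baseline_min: float, elapsed_min: float) -> float:
    """Speedup relative to the baseline; values below 1.0 mean a slowdown."""
    return baseline_min / elapsed_min


speedups = {size: speedup(BASELINE_MIN, minutes) for size, minutes in RUNS.items()}
best_setting = max(speedups, key=speedups.get)

for size, factor in sorted(speedups.items()):
    print(f"partition size {size}: {factor:.2f}x")
```

Only the setting of 100 beats the baseline, at roughly 1.9 times faster. The settings of 1000 and 10 are both slower than running without parallel processing, which supports the caveat that the partition size has to be chosen deliberately.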
Thank you