Spark Performance Tuning - Part #2 (Parallel Processing)
2019. 1. 22
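Contents
1. Introduction
   1. Research background and method
   2. Theoretical review
   3. Development process
2. Main body
   1. System configuration
   2. Data configuration
   3. System settings
   4. Execution and test
3. Conclusion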
1. Introduction
2-1. Research background and method
Ecosystem setup and data transfer test
Service configuration (Hadoop Ecosystem)
p-master: Hadoop Name Node, Secondary Name Node, Spark Master, Hive Master, Resource Manager, MariaDB
hadoop1, hadoop2, hadoop3 (each): Hadoop DataNode, Spark Worker, NodeManager

No  Lv1 / Lv2      Version     Contents
1   Oracle Linux   7.3         OS
2   Hadoop         2.7.6       Distributed Storage
3   Spark          2.2.0       Distributed Processing
4   Hive           2.3.3       Support SQL (Master, Only master)
5   MariaDB        10.2.11     RDB (Master, Only master)
6   Oracle Client  18.3.0.0.0  Oracle DB client
Data flow: Load (Source Data, Oracle DB) → Process (Spark) → Save (Output Data, Oracle DB), each stage with its S/W and H/W
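2-2. Theoretical review
Partition
A partition is a chunk of data split so that it can be distributed across the cluster and held in each node's memory.
Spark divides data into partitions, and each partition is processed by an Executor.
Whole data → Partition #1, Partition #2, Partition #3 → Executor1, Executor2, Executor3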
1. How Spark runs
spark-submit: submits the Spark job
SparkContext (driver): runs the program
Cluster master: works through the worker nodes
Executor: starts and runs the program
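2. Shuffle
Data is redistributed between partitions (for example, when grouping).
Write: Partition #1, Partition #2, Partition #3 on Executor1, Executor2, Executor3
Read: Partition #1, Partition #2, Partition #3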
3. Main settings
(Base settings) spark-env.sh
(Application settings) spark-defaults.conf
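As a rough illustration of the second file: spark-defaults.conf holds one property per line, a key followed by whitespace and a value. The Python sketch below parses text in that format. The helper name `parse_spark_defaults` is hypothetical, and the sample values are assumptions that mirror the executor core and memory figures in the system settings of this deck (the `g` unit is assumed, since the slide only says 10).

```python
def parse_spark_defaults(text):
    """Parse spark-defaults.conf style text into a dict of property -> value."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # The key is the first token; the rest of the line is the value.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        props[key] = value
    return props


# Assumed sample content, not taken from the author's actual file.
EXAMPLE_CONF = """
# executor sizing (assumed values)
spark.executor.cores    4
spark.executor.memory   10g
"""

settings = parse_spark_defaults(EXAMPLE_CONF)
for name, value in sorted(settings.items()):
    print(f"{name} = {value}")
```

Values given on the spark-submit command line or set in application code take precedence over this file, so it is best used for cluster-wide defaults.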
2-3. Development process
4. Development process
覿襯 譴覿襯  一覓
谿 螻襴 襦 螻 襴 /  襦 螻
覿
覿 蟲 , ろ 覿 蟲 
覿螻殊  覿螻殊 豢, , 
ろ ろ豎 
覿覈  覿覈/ろ 
り
覿覈 り 覿 襴, 一危, 螻襴讀 り 覦 蟆讀 覿覈 
ろ り 一危磯伎, I/F 覦 ろ り
企, 襦蠏碁, 覃
誤壱伎 覈襦/
蟲豢 ろ 螳覦 覃, DB, 覦一, I/F 螳覦 れ, DB
ろ /牛 ろ
ろ 襴  覦   覦 牛 ろ
螻/谿/蟆郁骸牛ろ 螻 / 襴 襴 覦 ろ
危
覦 
ろ 危 危螻  危 螻
 蟲 /伎 蟲 れ 覦 襷る伎 
 襷る伎,
伎 襷る伎
譬襭 襦 譬襭 譬襭覲願 覦 語 襭覲願, 蟆語
3. Main body
3-1. System configuration
Input DB: Oracle (192.168.110.112)
Output DB: Oracle (192.168.110.111)
Hadoop Ecosystem
p-master (192.168.110.117): Hadoop Name Node, Secondary Name Node, Spark Master, Hive Master, Resource Manager, MariaDB
hadoop1 (192.168.110.118), hadoop2 (192.168.110.119), hadoop3 (192.168.110.120): Hadoop DataNode, Spark Worker, NodeManager on each
3-2. Data configuration
Define the outbound and inbound data

Inbound
No  InterfaceID  Content    System      Type  Count        Periods  Column cnt  Comments
1   IB-001       Sellout    Dev System  RDB   100 million  -        17          TBD
2   IB-002       Sellout    Dev System  RDB   13 million            17          1/22
3   IB-003       Parameter  Dev System  RDB   2                     5

Outbound
No  InterfaceID  Content    System      Type  Count        Periods  Column cnt  Comments
1   OB-001       Sellout    Op System   RDB   100 million  -        17          TBD
2   OB-002       Sellout    Op System   RDB   13 million            17          1/22
3-3. System settings
Div                          Value
Cluster                      3
Worker (per server)          1
Executor-count (per server)  3
Executor-core                4
Executor-memory              10

[Figure: executor layout per PC. Each worker PC has CPU: 16 cores and MEM: 40G and runs Executors with Core: 4, Mem: 10; the master PC is shown with MEM: 5G.]
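The sizing above can be sanity-checked with a little arithmetic. The Python sketch below assumes that the 16 cores and 40G in the figure are the capacity of one worker PC and that executor memory is in gigabytes; the class name `WorkerPlan` and its fields are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkerPlan:
    """Executor sizing for a single worker machine (illustrative only)."""
    cpu_cores: int               # cores available on the machine
    memory_gb: int               # memory available on the machine
    executors: int               # executors launched on the machine
    cores_per_executor: int
    memory_per_executor_gb: int

    def cores_used(self) -> int:
        return self.executors * self.cores_per_executor

    def memory_used_gb(self) -> int:
        return self.executors * self.memory_per_executor_gb

    def fits(self) -> bool:
        # The plan is feasible only if it stays within both limits.
        return (self.cores_used() <= self.cpu_cores
                and self.memory_used_gb() <= self.memory_gb)


# Values from the settings table: 3 executors per server, 4 cores, 10 (GB assumed).
plan = WorkerPlan(cpu_cores=16, memory_gb=40, executors=3,
                  cores_per_executor=4, memory_per_executor_gb=10)
workers = 3  # "Cluster 3" in the table

total_executors = workers * plan.executors
total_cores = workers * plan.cores_used()

print("fits on one worker:", plan.fits())
print("per worker:", plan.cores_used(), "cores,", plan.memory_used_gb(), "GB")
print("cluster:", total_executors, "executors,", total_cores, "cores")
```

Under these assumptions each worker keeps 4 cores and 10 GB free for the operating system and the other daemons that share the machine (DataNode, NodeManager).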
3-4. Execution and test
(Case #1) Parallel processing code not applied (code)
spark-submit --class com.spark.c10_dataTransfer.basicDataTransfer sparkProgramming-spark-1.0.jar
Partitioning not applied
(Case #1) Parallel processing code not applied (result)
(Case #2) Parallel processing code applied (code)
spark-submit --class com.spark.c10_dataTransfer.partitionDataTransfer sparkProgramming-spark-1.0.jar
Partition count set to 100
(Case #2) Parallel processing code applied (result)
(Case #3) Parallel processing code applied (code)
spark-submit --class com.spark.c10_dataTransfer.partitionDataTransfer sparkProgramming-spark-1.0.jar
Partition count set to 1000
(Case #3) Parallel processing code applied (result)
(Case #4) Parallel processing code applied (code)
spark-submit --class com.spark.c10_dataTransfer.partitionDataTransfer sparkProgramming-spark-1.0.jar
Partition count set to 10
(Case #4) Parallel processing code applied (result)
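The slides do not show the source of `partitionDataTransfer`, so the following is only a sketch of the general idea behind a partitioned database read: a numeric column range is cut into contiguous slices, and each slice becomes its own query that one task can run in parallel. The helper `partition_predicates`, the column name `id` and the bounds are all hypothetical. Spark's JDBC reader takes similar inputs (a partition column, lower and upper bounds, and a partition count), but this is a simplified illustration, not its implementation.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Return one WHERE predicate per partition covering the whole table.

    The first and last predicates are open-ended so that rows outside
    [lower, upper) and NULL keys are still read by some partition.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # Width of each slice; never smaller than 1.
    stride = max(1, (upper - lower) // num_partitions)
    predicates = []
    bound = lower
    for i in range(num_partitions):
        low = None if i == 0 else f"{column} >= {bound}"
        bound += stride
        high = None if i == num_partitions - 1 else f"{column} < {bound}"
        if low is None and high is None:
            predicates.append("1 = 1")  # single partition reads everything
        elif low is None:
            predicates.append(f"{high} OR {column} IS NULL")
        elif high is None:
            predicates.append(low)
        else:
            predicates.append(f"{low} AND {high}")
    return predicates


example = partition_predicates("id", 0, 100, 4)
for predicate in example:
    print(predicate)
```

With too few slices most executor cores sit idle; with too many, each slice opens its own database connection and the per-query overhead starts to dominate, which is one plausible reading of why the 10 and 1000 partition cases behave differently from the 100 partition case.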
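4. Conclusion
When loading and saving data with Spark, applying parallel processing in the code can improve performance.
(However, the partition count needs to be chosen with the Band size in mind.)

Category                                    Data (15 million rows, 2.6GB)  Remarks
Parallel processing not applied             7 min                          No partition
Parallel processing (partition count 100)   3.7 min                        Partition size 100
Parallel processing (partition count 1000)  12 min                         Partition size 1000
Parallel processing (partition count 10)    16 min                         Partition size 100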
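To make the comparison concrete, the elapsed times in the table can be turned into speedup factors relative to the run without partitioning. The Python sketch below uses only the four timings reported above; the labels and the helper name `speedups` are hypothetical.

```python
# Elapsed minutes as reported in the results table.
elapsed_minutes = {
    "no partitioning": 7.0,
    "100 partitions": 3.7,
    "1000 partitions": 12.0,
    "10 partitions": 16.0,
}


def speedups(times, baseline):
    """Speedup of each run relative to the baseline run (above 1 means faster)."""
    base = times[baseline]
    return {name: round(base / minutes, 2) for name, minutes in times.items()}


result = speedups(elapsed_minutes, "no partitioning")
for name, factor in result.items():
    print(f"{name}: {factor}x")
```

Only the 100 partition run beats the baseline; the 10 and 1000 partition runs are both slower than not partitioning at all, which is why the partition count has to be tuned rather than simply raised.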
Thank you