???? ??? ? ?? ??? ??
: Tajo on AWS
??? ?? ???
(?) ???
About me
- Gruter / BigData Engineer
- Apache Tajo Committer
- jhjung@gruter.com
- http://blrunner.com
- Author of a book on Hadoop
??
1. ?? ???? ???
2. ?? ??? ?? ??
3. Tajo
4. Locket ?? ??
1. ?? ???? ???
??? ?? ?? ??
1. ?? ???? ???
1. ?? ???? ???
- PV / UV
- DAU / MAU
- PU / ARPU
- NRU / PR
- etc…
1. ?? ???? ???
????? ???? ??? ???? ???
http://www.slideshare.net/yongho/ss-32267675
1. ?? ???? ???
?? ?? ???…
1. ?? ???? ???
- ?? ?? ??
?????? ??? ???! ??????!
?? ??? ?????! ??????!
- ??? ?? ???? ? ??? ??
- ?? ??
- ?? ??
???? ??? ?? ?? ??
??? ?? ?? ??? ??? 1?
??? ?? ??
2. ?? ??? ?? ??
2.1 RDBMS ??
RDBMS? SQL ??? ???? ???, ?? ??? ??? ??? ?????.
???? ???? ??? ???, ??? ??? ???? ????.
??) WAS + MySQL
[Diagram: three servers, each running Apache Httpd with mod_php, and one MySQL instance]
2.2 NoSQL ??
???? ???? ??? ????, JSON ???? ???? ??? ? ????.
NoSQL ?? ??? ???, ?? ???? ??? ??? ? ????.
??? ?? ??? ????, ??? ???? ?? ?? ??? ?????.
??) WAS + Redis + MongoDB
[Diagram: three servers, each running Apache Httpd with mod_php and Predis, plus Redis and MongoDB]
2.3 Hadoop ??
??? ????? ??? ??? ?, ??? ??? ? ???, ??? ??? ?????
? ?????. ?? DBMS? ??? ?? ??? ETL ??? ????, ?? ?? ???
? ?? ????? ???.
??) Flume + Kafka + Hadoop
[Diagram: WAS logs are read by Flume agents (Source: Spooling Directory, Sink: Kafka), which feed a set of Kafka brokers, followed by Hadoop]
?? ? ? ???
???…
?? ?? ???
??? ??? ??????
???? ????
??? ?? ?????
?? ??? ?? ????
??? ?? ????
??? ?????? ???
??? ??, ??? ?? ????
2.4 Tajo on AWS
EC2 ?????? ??? ??? S3? ????, Tajo? ???? ?????.
S3? ????? ????, ??? ??? ???, CPU ??? ?? ????.
?20TB ? ??? ? 700?? (?? ? 82??)
[Diagram: EC2 instances running Apache Httpd, S3, and Tajo (EMR or EC2)]
3. Tajo
3.1 Tajo overview
- ?? ??? ???? ????? ???
- 2013? ??? ?????, 2014? ??? ??? ????
- ANSI SQL ??
- ?? ??
  - ?? ??? ?? ?? ?? (Not MapReduce)
  - ??? ?? ??? ?? ? ???? ??
  - ??? ?? ???? ETL ?? ??
  - ?? ????? ?? ???? ????? ?? ??
3.2 Tajo Architecture
[Diagram: Tajo architecture]
- Client: JDBC, TSql, Web UI
- Master Server (HA): TajoMaster (two instances shown)
- CatalogStore: DBMS, HCatalog
- Slave Server (three shown), each with: TajoWorker, QueryMaster, Local Query Engine, StorageManager, Local FileSystem, HDFS
- Arrow labels: Submit a Query, Manage metadata, Allocate a query, Send task & monitor
3.3 Tajo Comparative Advantages
- ANSI SQL ??
  - ???? ??? ? ?? ???? ?? ??
  - ??? SQL? ??, Oracle? PostgreSQL? ??
- ???? ???
  - ??? ???? ?? ???
  - ?? ?????? 500??? ??
- ??? ?? ?? ??
  - ????: ??? ???? 100MB/sec (SATA ??)
  - ??? ? ??? ??? ?? ??, 10 ??? 100?? 1TB ?? ??
  - ??? ?? ?, 10??? 5? ??? 1TB ?? ??
3.4 Nested ? JSON ?? ??
JSON ?? ??? HDFS? S3? ??? ?, Tajo?? external table ? ???? ???
?, SELECT ??? ??? ? ????. ? ?? ??? ??? ???? ????? SQL
??? ?????.
?? ???
??? ??
SQL ?
3.5 AWS ??
- EMR ? S3 ? ?? ???
- ???? ??? ?? ??? S3 ?? Fix
- EMR bootstrap ??
- EMR ? ???
  - http://www.gruter.com/blog/setting-up-a-tajo-cluster-on-amazon-emr/
- EMR bootstrap
  - https://github.com/awslabs/emr-bootstrap-actions/tree/master/taj
3.6 Pluggable Storage Layer
Hadoop, S3 ? ??? ???? ???? ?? ??? ? ????.
??? ???? ????? ?? ??? ?????.
[Diagram: Pluggable Storage Layer. A TajoMaster and five TajoWorkers sit on a pluggable storage layer over HDFS, HBase, AWS S3, Local Storage, and OpenStack Swift]
3.7 ????(Fault Tolerance) ? ???? ?? ?? ???
???? ???? ????? ????, ??? ??? ?????
?????.
1. EC2 ????: c3.4xlarge (vCPU: 16, ???: 30GiB, SSD????: 160GB x 2)
2. Tajo ??: 0.9.1-SNAPSHOT ??, 1 master, 16 worker
3. ????: TPC-H 1TB
AWS ???? ??
[Chart: per-query time in seconds for TPC-H queries Q1 to Q22; series: hive, presto, spark, tajo]
Tajo? Hive ?? ??4?, Presto ?? ??1.5? ??.
Spark? ??, ??? ???? ??? ?? ??.
3.8 ??? ?? ???
????? ??? ?? ???? ?? ??? ?????.
1. EC2 ????: c3.4xlarge (vCPU: 16, ???: 30GiB, SSD????: 160GB x 2)
2. Tajo ??: 0.9.1-SNAPSHOT ??
3. ????: TPC-H 1TB
AWS ???? ??
[Chart: per-query time in seconds for TPC-H queries Q1 to Q22; series: 16 workers, 8 workers, 4 workers]
4 -> 8?? ??? 1.6? ?? ?? ??
4 -> 16?? ??? 2.4? ?? ?? ??
?? ???? ???? 500? ???? ?? ??
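The two scaling figures above can be turned into a parallel-efficiency number. This is a rough sketch, not part of the original slides. It assumes that "1.6" and "2.4" are speedup factors relative to the 4-worker run. The function name parallel_efficiency is made up for illustration.

```python
def parallel_efficiency(base_workers: int, scaled_workers: int, speedup: float) -> float:
    """Speedup divided by the factor by which the worker count grew.

    1.0 means perfectly linear scaling; lower values mean diminishing returns.
    """
    return speedup / (scaled_workers / base_workers)


# Figures from the slide, read as speedups over the 4-worker baseline (assumption).
eff_8 = parallel_efficiency(4, 8, 1.6)
eff_16 = parallel_efficiency(4, 16, 2.4)

print(f"4 -> 8 workers:  {eff_8:.0%}")   # 80%
print(f"4 -> 16 workers: {eff_16:.0%}")  # 60%
```

Under that reading, doubling the workers keeps about 80% of the ideal gain, and quadrupling them keeps about 60%.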
4. Locket ?? ??
4.1 About Locket
Locket? ????? Best Apps of 2014? ??? ???
lock screen App? ????? ?????.
4.2 Needs
- ??? ??? ??? ??? ??? ???? Join ?????.
- ???? ?? ??? ??? ??? ? ??? ???.
4.3 Workloads
?? ??? ??? ?? EC2 spot ???? 10?? ???? cohort ??? ?????
[Diagram: Amazon EC2 Cloud containing a Tajo Cluster (TajoMaster and TajoWorkers); S3 holds Source Data and Tajo Tables; RDS MySQL]
1. ?? ??
2. ?? ?? ??
external ??? ??
3. Cohort ?? ?? ??
4. ?? ?? ?? ??
5. ?? ??? ??
4.4 ? ???? (TCO)
- ?? ??
  - EC2 c3.2xlarge ???? 10?? ??GB ??? ? 40? ?? ??
    ? ?????.
- EC2 ????
  - ??: c3.2xlarge
  - ??: CPU 8 core, ??? 15GB, ???: SSD 80GB x 2
  - ??: ??? 0.420 ?? (??: 489.85?)
- ? ?? ??
  - 0.420 * 10 = 4.20 ?? (??: 4898.5?)
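The cost arithmetic above can be reproduced in a few lines. This is a minimal sketch, not part of the original slides. It assumes that 0.420 is the hourly price of one c3.2xlarge instance in US dollars, and that 489.85 is the same price in local currency. The unit labels did not survive in this transcript, so both are assumptions. The function and constant names are made up for illustration.

```python
from decimal import Decimal

# Figures as printed on the slide; the units are assumptions (see the note above).
HOURLY_PRICE_PER_INSTANCE = Decimal("0.420")  # assumed USD per hour, one c3.2xlarge
HOURLY_PRICE_LOCAL = Decimal("489.85")        # the slide's parenthesised figure
INSTANCE_COUNT = 10


def cluster_hourly_cost(price_per_instance: Decimal, instances: int) -> Decimal:
    """Hourly cost of a cluster made of identical instances."""
    return price_per_instance * instances


usd_per_hour = cluster_hourly_cost(HOURLY_PRICE_PER_INSTANCE, INSTANCE_COUNT)
local_per_hour = cluster_hourly_cost(HOURLY_PRICE_LOCAL, INSTANCE_COUNT)

print(usd_per_hour)    # 4.200
print(local_per_hour)  # 4898.50
```

Decimal is used instead of float so the products match the slide's figures exactly.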
Q & A
Welcome to Tajo
1. Homepage
- http://tajo.apache.org/
2. ?? ?? ??? ??
- ?? ??: https://groups.google.com/forum/#!forum/tajo-user-kr
- ????: https://www.facebook.com/groups/tajokorea/
3. ?? ?? ??? ????
- http://bit.ly/1Ir417T
4. ?? ?? ???
- http://www.gruter.com/blog/tag/apache-tajo/
- http://teamblog.gruter.com/tag/apache-tajo/
- http://blrunner.com/category/Development/Tajo
?? ??
- NDC 2015 Cookie Run Log System
  https://speakerdeck.com/junggun_lim/ndc-2015-cookie-run-log-system
- Redis, MongoDB ??? MySQL ? ???? ??? ?????? ?????? ??
??? ??
  http://www.slideshare.net/lqez/redis-mongodb-mysql
GRUTER: YOUR PARTNER
IN THE BIG DATA REVOLUTION
Phone +82-2-508-5911
Fax +82-2-508-5912
E-mail contact@gruter.com
Web www.gruter.com
