1
pgday.Seoul 2018
Greenplum? ?? ?? ??
??? ??? ?? ?? ?? ?? ???
2018. 11. 03
Pivotal Korea
???
2
??? ??? ??? ?? ??
?? 3??? ??? ???..
??? ??? ??? ??? ??..
????? ??? ????..
?? ???? ?? ??? ??..
??? ??? ?? ?????..
3
?? ???, ?? ??, ??? ?? ??
4
PostgreSQL? ?????
100 GB    1 TB    10 TB    100 TB
5
?? ?? PostgreSQL? ??? ?? ???
??? ???? ??? ????
10 TB 10 TB 10 TB 10 TB 10 TB 10 TB 10 TB 10 TB 10 TB 10 TB
6
2005? Greenplum? ??!
[Figure: Greenplum architecture. Master Node and Standby Master, Network Interconnect, Segment Nodes (…). Each segment node shows a query plan built from SEQ SCAN, HASH, and HASH JOIN operators.]
? ?? ??
? HA ??? ?? ?? ??
? ??? ? ??? ??
? ?? ??? ??, ??, ??
? ???? ? ???? ??? ??? ??
? ?? ??(File Server, Hadoop, Cloud?)? ?? ?? ??? ?? ??
? ?? ??? ??? ?? ??? ??? ???? ??
7
Greenplum? ??? ?? ??
??? ???? ?? ??? ?? ??
?? ???? ??? ?? ??? ??
PLAN PLAN PLAN PLAN
???? ???? ??? ??
??? ?? ???? ?? ??
8
?????? ??? ??, ??? ?? ?? ??
Deploy Anywhere
On-premise
Private Cloud
Public Cloud
Pivotal Container Service (PKS)
9
??1. ??? ?? ???? Greenplum? ??????
17?? ???
42?????
12
> 200TB
??? ?????
6.2?? ????
480
????????
?? ????
??? ?????? ??? ?? ???
10
??2. ??? Greenplum ??? ?? ???
HDFS
Cluster
Computing
Cloud
Storage
In-Memory
Data Grid
interconnect
master
master
System B
interconnect
master
master
System A
???? 1:1??? ??????? ???? ?? ??
? Spark?? Greenplum? ???? Read ? Write
? ??? ??????? Spark ??? ??
? ??? ???? ??? 4.8 hours with Spark connector vs 15 days with JDBC (75? ??)
Yes! ?????? ?? ?? ?? ?? ??? ????? ??
11
??3. ??? ??? ??? ??? ?? ????? ????
No. ????? ??? ????? ???? ??
REGRESSION, CLASSIFICATION, CLUSTERING, Graph, Geospatial, Traditional BI / Report, SQL, TEXT, Transformation
Structured Data
Any Workload
Any Data
??? ??? ?? ?? ?? : ??? ???? ??
??? ??? ?? ?? ???? ?? ?????? ?? ??? ???? ?? 10? ? ???
Data ?? Data ??/?? ? ?? Application ??
§ ??? ??? ??? ??? ?? § Large-scale 3D rain computer simulation
- 100?? ?? ?? ????? ?? ??
(? ??: 1?? ????? ?? ??)
- 100m ??? ?? ?? ??
(? ??: 2km ?? 5km ??? ?? ??)
§ 3D Nowcasting
- ????? ?? ???
??? ????
?? 10? ? ??
- ??? 80% ??
(? ?? 50% ??)
- Phased-array radar ?? ???
: 15? ?? ?? ??
3?? ??
(? ??: 1? ?
2?? ??)
: 30? ?? ????
- ?? ?? ?? ???
: ??? ??? ??, ??, ??,
?? ? ??
- ?? ???
- ?? ?? ???
- ?? ??? ??? ???
✓ ?? ?
?? ??
???
[ GPDB ?? ??? ???? ?? ??? ]
???
?? ???
???
???
??
???
- ?? ?? ???
??? ??? ?? ?? ??
(Massively Parallel Processing)
✓ ??? ??? ???? ?? ??? ? ??
In-Database
?? ?? ??
…
GPText
* source: "Greenplum for Extreme Weather Predictions and Analytics at Japan's NICT" (https://www.youtube.com/watch?v=pjDSi1KGaDU)
12
??4. ???? ??? ??? ? ????
Yes! Greenplum? ?-?????? ?? ?? ??
In-Database Analytics
Native support
?? Language
???? ????? ??? ??? ??
?? ?? ??
DW
???? ???? ??
??? ??
?? ??
?? ??? ?? ?? ?? ??? ?? ??? ?? ??
?????
(summary)
?????
(raw data)
??
??
??
??
??
????
??
???? ??
???? ??
…
? 50?? ??? ?? ??
13
1.??? ??? ??
2.??? ??? ????
3.?? ???? ??? ??? ??? ?? ??
??? ? ?? ??? ?? ???
14
Appendix : Greenplum Tuning? ?? ?? ??
1
??? Skew
??
?? ???? ??? ??? ?? ? ? ??? ???? ?????
Seg1 Seg2 Seg3 Seg4
CREATE TABLE customer (
cust_id VARCHAR(80)
,gender CHAR(5))
DISTRIBUTED BY(gender);
Data Data
Seg1 Seg2 Seg3 Seg4
CREATE TABLE customer (
cust_id VARCHAR(80)
,gender CHAR(5))
DISTRIBUTED BY(cust_id);
Data Data Data Data
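
The two customer tables above differ only in their distribution key. The following Python sketch is an illustration of why the choice matters, not Greenplum's actual implementation: it uses SHA-256 as a stand-in hash (Greenplum uses its own hash function), and the 10,000 sample customers and the "M"/"F" values are made up for the example.

```python
import hashlib
from collections import Counter

NUM_SEGMENTS = 4  # matches Seg1..Seg4 on the slide


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment index.

    SHA-256 is a stand-in here; it is NOT the hash Greenplum uses.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def rows_per_segment(keys, num_segments=NUM_SEGMENTS):
    """Count how many rows land on each segment for the given key values."""
    counts = Counter(segment_for(k, num_segments) for k in keys)
    return [counts.get(i, 0) for i in range(num_segments)]


# Hypothetical sample data: (cust_id, gender) pairs.
customers = [(f"C{i:06d}", "M" if i % 2 == 0 else "F") for i in range(10_000)]

# A key with two distinct values can reach at most two segments.
by_gender = rows_per_segment(g for _, g in customers)
# A high-cardinality key spreads rows over every segment.
by_cust_id = rows_per_segment(c for c, _ in customers)

print("DISTRIBUTED BY (gender): ", by_gender)
print("DISTRIBUTED BY (cust_id):", by_cust_id)
```

With the low-cardinality key, at least two of the four segments stay empty while the others hold all the rows, which matches the "Data Data" versus "Data Data Data Data" pictures on the slide.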
15
Appendix : Greenplum Tuning? ?? ?? ??
2
?????
???
??? I/O?? ???? ? ?? ???? ???? ????
Seg1 Seg2 Seg3 Seg4
CREATE TABLE orders (
order_id INT
,order_date DATE )
DISTRIBUTED BY (order_id) ;
Data Data
Seg1 Seg2 Seg3 Seg4
:
DISTRIBUTED BY (order_id)
PARTITION BY RANGE (order_date)
(START ('2018-01-01')
END ('2018-12-31')
EVERY (INTERVAL '1 month'));
SELECT COUNT(*) FROM orders WHERE order_date BETWEEN '2018-10-22' AND '2018-10-27';
[Figure: monthly partitions 06, 07, 08, 09, and 10 on each of Seg1 to Seg4]
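
A small Python sketch of the idea behind this slide: a date-range query only needs to read the monthly partitions that overlap its range. It is a simplified model, not Greenplum's planner. It reads the slide's END value as 2018-12-31 and assumes the END bound is exclusive; the helper names `monthly_partitions` and `partitions_to_scan` are hypothetical.

```python
from datetime import date


def monthly_partitions(start, end):
    """Build half-open [first_day, next_first_day) ranges, one per month.

    Assumes `start` is the first day of a month; the last range is
    clipped at `end`, which is treated as exclusive.
    """
    parts = []
    cur = start
    while cur < end:
        # First day of the following month (rolls over to January).
        nxt = date(cur.year + (cur.month == 12), cur.month % 12 + 1, 1)
        parts.append((cur, min(nxt, end)))
        cur = nxt
    return parts


def partitions_to_scan(parts, lo, hi):
    """Return the partitions that overlap the inclusive range [lo, hi].

    This mirrors BETWEEN lo AND hi, where both ends are included.
    """
    return [(s, e) for s, e in parts if s <= hi and lo < e]


parts = monthly_partitions(date(2018, 1, 1), date(2018, 12, 31))
scanned = partitions_to_scan(parts, date(2018, 10, 22), date(2018, 10, 27))

print(f"{len(scanned)} of {len(parts)} partitions scanned:")
for first_day, end_exclusive in scanned:
    print("  ", first_day, "to", end_exclusive, "(exclusive)")
```

Under these assumptions the query from the slide touches only the October partition, so the other eleven months are never read.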
16
Appendix : Greenplum Tuning? ?? ?? ??
3
????
I/O ??
??? I/O?? ???? ? ?? ???? ???? ????
Seg1 Seg2 Seg3 Seg4
CREATE TABLE orders (
order_id INT
,order_date DATE )
DISTRIBUTED BY (order_id)
PARTITION BY RANGE (order_date)
:
Seg1 Seg2 Seg3 Seg4
CREATE TABLE orders (
order_id INT
,order_date DATE )
WITH (appendonly=true, compresslevel=5)
:
[Figure: monthly partitions 06, 07, 08, 09, and 10 on each of Seg1 to Seg4]
17
Appendix : Greenplum Tuning? ?? ?? ??
4
???
?????
?? ???? ??? ???? ?? ??? ????? ???? ????
Seg1 Seg2 Seg3 Seg4
CREATE TABLE orders (
order_id INT
,order_date DATE
,product_id INT )
DISTRIBUTED BY (order_id);
Seg1 Seg2 Seg3 Seg4
CREATE INDEX idx_order_pid
ON orders (product_id);
Data Data Data Data Data Data Data Data
18
GREENPLUM SUMMIT at PostgresConf 2019
by Pivotal
