際際滷

際際滷Share a Scribd company logo
Presto User & Admin Guide
Presto 0.197 User & Admin Guide

filfal2001@gmail.com
1. Presto 螳
2. Presto Architecture
3. In-Memory Processing  Hive vs Presto
4. Query Processing
5. Query Execution
6. Presto 
7. 谿瑚襭
Contents
Presto 螳
Distributed SQL Query Engine for Big Data
 Presto  一危 覿 蟲襯  Facebook  襦碁 2013 11 7 
豺 殊伎る 螻糾給.
 覿 覲 SQL 讌朱 interactive Query襯   朱 Storage Layer 譟伎讌
螻 Query襯  Engine朱襷 蟲焔 朱 譴 ANSI SQL 讌.
 MPP(Massive Parallel Processing)   豌襴 覦 Pipelined Execution Model 覦朱
Query襯 豌襴覃 Storage Layer螳   Source Data(Hive, HDFS or etc) 讌 蠏狩
一危磯ゼ 螳讌螻   襦  伎螻 企ゼ  れ Connectorれ 螻牛.
2
Architecture  Coordinator & Worker
Presto Coordinator Worker襦 蟲焔 朱 螳螳  螳  企麹.
Coordinator
 Coordinator Query Parsing, Query Planning 覦 Worker Node 蟯襴襯 企麹. ,
Client螳 Query襯 蠍  郁屋 Node. 覈 Presto れ  伎
Worker Coordinator螳 伎 .
 Coordinator Worker   襭覃 Worker襦 覿 豕譬 蟆郁骸襯 螳語 Client
 覦.
Worker
 Worker  ろ 覦 一危 豌襴襯 企麹 Node . Worker Connector襯 牛
一危磯ゼ 螳語り 譴螳 一危磯ゼ 襦 蟲覃伎  豌襴蟆 .
 Worker Node螳 ろ覃 Coordinator Discovery Server  ろ 襴螻
Coordinator  覦   螳 .
3
Architecture  Connector
 Presto Connector 襯 牛  Data Source 蠏狩 一危磯ゼ 螳語  貎
襴 覿 螳ロ. 螻牛 Connector  螳給.
 Accumulo
 Black Hole
 Cassandra
 Hive
 JMX
 Kafka
 Local file
 MongoDB
 MySQL
 PostgreSQL
 Redis
 Amazon Redshift
 MS SQL Server
 Thrift
4
In-Memory Processing  Hive vs Presto
Hive vs Presto 豌襴 觜蟲
 Hive 螳 螻覲 蟆郁骸襯 Disk襦 Spill 螻覲襦 Disk Read/Write 豌襴 覦
 Presto 螳 螻覲 蟆郁骸襯 pipeline 牛 memory-to-memory襦  豌襴 覦
5
In-Memory Processing  Hive vs Presto
Hive Presto 豌襴 覦 磯ジ Query  螳企  螳 襴  給.
Hive(Data warehouse)
  一危一  batch 豌襴 
  JOIN, Aggregation, , count  蟆曙
 貎朱Μ  願屋(fault tolerance)  蟆曙
 ETL,  螳 伎 Long time 貎朱Μ, 貎朱Μ螳 Fault tolerance襯 讌伎  蟆曙
Presto(Distributed Query Engine)
 觜襯 旧  ad-hoc Query
 觜襯 旧 螳讌  貎朱Μ  願屋 蠍磯(No fault tolerance) 
  count, distinct, union, order by   - 覃覈襴 襷朱 豌襴   螻 (Disk
Spill 旧 牛 覃覈襴 Over 一危磯 豌襴 螳ロ 焔 螳)
 In-Memory 豌襴 螳ロ 一危 伎 Simple Query  蟆曙
6
Query Processing
Presto Query れ螻 螳 ろ 襴朱 豌襴.
 (Worker) Worker Coordinator Discovery Service  ろ 襴暑.
 (Client) Client Coordinator Query襯 豢覃 ろ 豌.
 (Coordinator) Coordinator 豌 覦 Query  Parsing/Analyzing -> Planning -> Scheduling 螻殊 .
 (Coordinator) Coordinator Query  焔 Task襯 Worker . 企, Worker ろ task襯 豢
 task螳 ろ讌 螳.
 (Worker) Worker 覦 Task ろ  Connector襯 牛 Data Source Data襯 曙伎給.
 (Worker) Workerれ Task 蟆郁骸襯 Pipeline Execution 牛 譯手 覦朱 Query襯 覃 豕譬 蟆郁骸襯 Coordinator襦
.
 (Coordinator) Coordinator Worker襦覿 Query  豕譬 蟆郁骸襯 覦る 企ゼ Client .
1
2 4
3
5
6
7
Query Execution - 1
Coordinator 豌 覦 Query SQL螻 蟷 Connector襯 牛 覦 Metadata襯
螳讌螻 Logical Query plan 1谿朱 燕.
Coordinator れ 焔 Logical Query plan 覿  螳ロ襦
Distributed Query plan朱 覲 燕.
8
Query Execution - 2
焔 Query Plan Worker Node 螳 磯 覿  螳ロ襦
Distributed Query plan朱 覲 螳螳 Worker Node .
Distributed Query Plan 螳 螻 Stage手 豺覃 Stage 碁Μ  螻豸旧
蟲譟磯 焔. 螳 襷讌襷 豕譬 蟆郁骸襯 譬 Stage襯 Root Stage手 
.
9
Query Execution - 3
 Stage 螳螳 Worker Node  螳 Task襦 ろ. Worker
Node  Stage 伎  Task襯  覦蟆 . 企蟆  Stage
 螳螳 Worker Node Task 襦 覲 蟆 .
螳 Task  螳 Operator襦  讌. 蠍一 Operator Filter, Projection,
Table Scan, Aggregation, Join 螳  .
10
Query Execution - 4
螳螳 Task  伎 Split 螳讌蟆 . Split Data Partition朱
Connector襯 牛伎 Data Source 螳語り碓 讌 襭 Stage Task
Pipeline execution 牛 螳語り  企 Stage螳 Data  螻殊
Exchange手 .
11
Query Execution - 5
豌 Query ろ 螻殊 螳 渚覃  螳 襴  給.
 豌 覯讌 Stage Task Connector襯 牛 一危磯ゼ 曙伎 Split 
 Task Split  螳 Operator襦  豌襴
 Task ろ 襭  れ Stage Split朱 一危磯ゼ Exchange
 Query 豕譬 蟆郁骸 Root Stage 譬 Client 
12
Query Execution - 6
Query Plan in Presto Web UI
Presto Web UI ろ 貎朱Μ
 history 蠍磯レ 螻牛覃 ろ 貎
襴  Plan 覲企ゼ Graphical 蟆
覲  給.
Query Plan EXPLAIN 覈轟企ゼ 牛
text 襦 豢 螳ロ.
13
一危 (Data Types)(1/2)
Groups Type Description Examples
Boolean BOOLEAN true  false 螳 .
Integer
TINYINT
豕螳 -2^7螻 豕螳 2^7 - 1 螳 8 觜 覿
 2 覲伎 .
SMALLINT
豕螳 -2^15螻 豕螳 2^15 - 1 螳 16 觜
覿語 2 覲伎.
INTEGER
豕螳 -2^31螻 豕 螳 2^31 - 1 螳 32 觜
覿語 2 覲伎.( INT 螳.)
BIGINT
豕螳 -2^63螻 豕螳 2^63 - 1 螳 64 觜
覿語 2 覲伎 .
Floating-Point
REAL
れ 2 讌 覿  一  IEEE 譴 754
襯 蟲 32 觜. (4 Byte  覿 )
DOUBLE
double 伎 覿  一  IEEE 譴
754襯 蟲 64 觜 (8 Byte覦一 覿 )
Fixed-Precision DECIMAL
螻 覦 10 讌. 38 襴蟾讌 覦螳 讌
讌襷 焔レ 豕 18 襴蟾讌 螳.
String
VARCHAR  豕 蠍語企ゼ 螳 螳覲 蠍語. 覓語 一危.
CHAR 螻 蠍語 覓語 一危
VARBINARY 螳覲 蠍語 2 讌 一危.
JSON
JSON る誤 JSON 企, JSON 覯,
JSON 覓語   JSON 螳  true, false
 null.
Presto   一危 
14
一危 (Data Types)(2/2)
Groups Type Description Examples
Date and Time
DATE  讌 (, , ). : DATE '2001-08-22'
TIME TIME ZONE  螳 (, 覿, 豐, 覦襴 豐). : TIME '01:02:03.456'
TIME WITH TIME ZONE TIME ZONE  螳 (, 覿, 豐, 覦襴 豐).
: TIME '01:02:03.456 America
/Los_Angeles'
TIMESTAMP
TIME ZONE  螳螻 讌螳   語ろ
.
: TIMESTAMP '2001-08-22 03
:04:05.321'
TIMESTAMP WITH TIM
E ZONE
TIME ZONE  讌 螳   語ろ
.
: TIMESTAMP '2001-08-22 03
:04:05.321 America/Los_Angele
s'
INTERVAL YEAR TO MO
NTH
螻  覯. : INTERVAL '3' MONTH
INTERVAL DAY TO SEC
OND
, 螳, 覿, 豐 覦 覦襴 豐 覯. : INTERVAL '2' DAY
Structural
ARRAY 讌 貉危朱  覦一. : ARRAY[1, 2, 3]
MAP 讌 貉危朱  企 MAP.
: MAP(ARRAY['foo', 'bar'], AR
RAY[1, 2])
ROW
覈覈  襦 蟲焔 蟲譟.   SQL
 螳讌 ,  谿語^ 一一襯  ′
   .
: CAST(ROW(1, 2.0) AS ROW(
x BIGINT, y DOUBLE))
Network Address IPADDRESS IPv4  IPv6 譯殊襯   IP 譯殊.
 : ,IPADDRESS '10.0.0.1'IPAD
DRESS '2001:db8::1'
HyperLogLog
HyperLogLog HyperLogLog れ豺襦  螻一 螳.
P4HyperLogLog
P4HyperLogLog れ豺 ,讌襷 譟磯  
 .
Quantile Digest QDigest
Quantile れ伎ろ (qdigest) 譯殊伎  誤
  旧 一危 覿襯 谿  蟲
譟
Presto   一危 
15
SQL Statement Syntax (1/3)
 /覲蟆
Syntax Description Synopsis Examples
CREATE
襦 ろる,
企 煙
燕.
CREATE SCHEMA [ IF NOT EXISTS ] schema_name
[ WITH ( property_name = expression [, ...] )
CREATE SCHEMA IF NOT EXISTS
orders;
CREATE TABLE [ IF NOT EXISTS ]
table_name (
{ column_name data_type [ COMMENT comment ]
[ WITH ( property_name = expression [, ...] ) ]
| LIKE existing_table_name [ { INCLUDING |
EXCLUDING } PROPERTIES ] }
[, ...]
)
[ COMMENT table_comment ]
[ WITH ( property_name = expression [, ...] ) ]
CREATE TABLE IF NOT EXISTS orders (
orderkey bigint,
orderstatus varchar,
totalprice double COMMENT 'Price in
cents.',
orderdate date
)
COMMENT 'A table to keep track of
orders.'
ALTER
Definition
覲蟆渚.
ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMN column_name data_
type [ COMMENT comment ] [ WITH ( property_name
= expression [, ...] ) ]
ALTER TABLE name RENAME TO
new_name
ALTER TABLE name ADD COLUMN
column_name orderkey bigint
ALTER TABLE name DROP COLUMN
column_name
ALTER TABLE name RENAME
COLUMN column_name TO
new_column_name
Presto   SQL Syntax
16
SQL Statement Syntax (2/3)
 譟壱/焔/覲蟆/
Syntax Description Synopsis Examples
INSERT
Table 襦 一磯ゼ
1 焔.
INSERT INTO table_name [ ( column [, ... ] ) ] que
ry
INSERT INTO nation (nationkey, n
ame, regionkey, comment)
VALUES (26, 'POLAND', 3, 'no co
mment');
SELECT
譟郁唄 襷 一危磯ゼ 譟
.
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expr [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element
[, ...] ]
[ HAVING condition]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTIN
CT ] select ]
[ ORDER BY expression [ ASC | DESC ] [, ...] ]
[ LIMIT [ count | ALL ] ]
SELECT origin_state, NULL, NULL,
sum(package_weight)
FROM shipping
GROUP BY origin_state
DROP
schema/table/view 煙

DROP SCHEMA [ IF EXISTS ] schema_name
DROP TABLE [ IF EXISTS ] table_name
DROP VIEW [ IF EXISTS ] view_name
DROP TABLE sales
DROP TABLE IF EXISTS tb_date
DROP VIEW IF EXISTS vtb_date
DELETE
Table 一危磯ゼ 
.(一 貉朱 
螳.)
DELETE FROM table_name [ WHERE condition ]
DELETE FROM lineitem
WHERE orderkey IN (SELECT ord
erkey FROM orders WHERE priori
ty = 'LOW');
Presto   SQL Syntax
17
SQL Statement Syntax (3/3)
蠍壱
Syntax Description Synopsis Examples
COMMIT
豕 transaction
COMMIT.
COMMIT [ WORK ] COMMIT;
EXPLAIN
覈豪語 朱Μ  覿
 ろ 螻 蟇
 覈豪語 煙 蟆
.
EXPLAIN [ ( option [, ...] ) ] statement
where option can be one of:
FORMAT { TEXT | GRAPH+D8+B7:C7+A
7:C7
- EXPLAIN SELECT regionkey, count(*) F
ROM nation GROUP BY 1;
- EXPLAIN (TYPE DISTRIBUTED) SELEC
T regionkey, count(*) FROM nation GRO
UP BY 1;
Presto   SQL Syntax
18
- Logical Operators
朱Μ一一襯 螻牛.
- Comparison Functions and
Operators
觜蟲一一襯 螻牛.
 (Functions)(1/5)
- Range Operator: BETWEEN
Operators ろ 螳 轟 覯 伎  蟆曙 
襯 螻牛.
- IS NULL and IS NOT NULL
螳 NULL(覩語)語 覿 誤襯 螻牛.
- Conditional Expressions
譟郁唄覓語 螻牛.
SELECT 3 BETWEEN 2 AND 6;
SELECT 3 NOT BETWEEN 2 AND 6;
SELECT 3.0 IS NULL; -- false
SELECT a, CASE a WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE
'many' END
if( 譟郁唄 , true_value , false_value )
譟郁唄 谿語企 true_value, 蟇一企 false_value 覦.
Presto  - Functions
19
- Mathematical Operators
襯 螻牛.
- Comparison Functions and
Operatorsabs( x )  [リ骸 ]. x  螳 覦 .
cbrt( x )  覦, x 觚 襭碁ゼ 覦 .
ceil( x )  [リ骸 ], ceiling()螻 .
ceiling( x )  [リ骸 ], 覦 x螳 螳 螳蟾 襦 覦襴朱
cosine_similarity( x , y )  double, ろ  覯″ x yれ 螻 螳 貊 煙 覦 .
floor( x )  [リ骸 ] / 覦 螳 x螳 螳蟾 襦 企朱.
random( )  double / 0.0 <= x <1.0 覯   豺襯 れ.
round( x , d )  [リ骸 ] / x襦 覦襴 d 危 襴酔襯 覦.
Presto  - Functions
 (Functions)(2/5)
20
- Mathematical Operators
覓語 襯 螻牛.
concat( string1 , ... , stringN )  varchar /   SQL 譴 郁屋 一一 ( ||)  狩 蠍磯レ 螻牛 .
length( 覓語 )  bigint / string覓語  蠍語企ゼ  譴.
lower( 覓語 )  varchar / string覓語襦 覲 .
lpad( string , size , padstring )  varchar / 襷 string 覓語伎 蠍語願 size覲企  蟆曙 るジ讓 覿 padstring
覓語襦 size襷 豈讌.
trim( 覓語 )  varchar / 螻給葦 蟇壱.
replace( 覓語 , searchfrom , replace )  varchar / 覓語伎 searchfrom  狩 企ゼ 谿場 replace 襦 覓語
襯 覲 .
split( 覓語 , 蟲覿 蠍壱 )  覦一 <varchar> / 覓語伎 delimiter襦蟲覿 覦一伎 覦.
substr( 覓語 ,  )  varchar / 覓語伎  豺覿 襾語 覿覿 覦
substr( 覓語 ,  , 蠍語 )  varchar / 覓語伎  豺覿 蠍語願讌 覓語伎 覦 .
upper( 覓語 )  varchar / 覓語襦 覲 .
Presto  - Functions
 (Functions)(3/5)
21
- Unicode Operators
貊 襯 螻牛.
- Regular Expression Functions
覈 蠏   覈 螳讌 碁ゼ  螻 Java  蟲覓語  .
to_utf8( 覓語 )  varbinary / 覓語 UTF-8 varbinary 朱 語 .
From_utf8( 覦企襴 )  varchar / 覦企襴  UTF-8襦 語  覓語伎 貊.
from_utf8( 覦企襴 , 覦蠑瑚鍵 )  varchar / 覦企襴 UTF-8襦 語  覓語伎 貊 .
regexp_extract( 覓語 ,  )  varchar / 蠏   伎 襷 豌覯讌 覓語 覦.
regexp_like( 覓語 ,  )  boolean / 覓語伎 蠏 蠏  pattern  讌 誤 .
regexp_replace( 覓語 ,  , 豌)  varchar / 蠏   pattern 襷 覓語伎 豌企 覲蟆渚.
regexp_split( 覓語 ,  )  覦一 <varchar> / 蠏   pattern襯 蟲覿襦 覿 覦一伎 覦.
Presto  - Functions
 (Functions)(4/5)
22
- Aggregate Functions
朱 螳    蟆郁骸襯 螻壱 讌螻 襯 螻牛.
- Lambda
  ->襦 . SELECT filter(ARRAY [5, -6, NULL, 7], x -> x > 0); -- [5, 7]
- GeoSpatial
Presto 讌襴 螻糾  WKT (Well-Known Text)  螻糾 螳豌企ゼ 讌.
- URL
URL 豢豢  HTTP URL ( RFC 2396  譴  URI ) 讌.
- JSON
一企襯 JSON朱 覲 螳ロ覃 JSON  一危磯ゼ る 襯 讌json_array_length( json )  bigint / JSON 覦一伎  覓語  覦一 蠍語企ゼ 覦 .
json_extract( json , json_path )  json / JSON 覦一伎 json_path    JSON 覓語企 覦.
json_format( json )  varchar /   JSON 螳 讌  JSON ろ碁ゼ 覦.
Presto  - Functions
 (Functions)(5/5)
23
-  columns襷 讌 覈.
SELECT * FROM TB; (X)  SELECT a,b FROM TB; (O)
- GROUP BY 伎 cardinality 螻.
SELECT GROUP BY gender, age; (X)  SELECT GROUP BY age, gender ; (O)
- ORDER BY 蟷 LIMIT 
SELECT col FROM tbl ORDER BY time ; (X)  SELECT col FROM tbl ORDER BY time LIMIT 100 ; (O)
- 蠏殊 讌螻 (approximate aggregate functions) 
SELECT distinct(user_id) FROM (X)  SELECT approx_distinct user_id FROM; (O)
- り唄 LIKE   regexp_like 襦 覲蟆.
WHERE method LIKE %GET% OR method LIKE %POST% (X)  WHERE regex_like(method, GET|POST) (O)
- JOIN   企 襾殊 讌
FROM small s, large l where l.id=s.id (X)  FROM large l, small s where l.id=s.id (O)
- TOP N 貊襯  ROW_NUMBER  RANK 襯 .
 牛 焔ロレ(SQL)
Presto  - tuning
24
Presto  - tuning
 distributed join
 Join Key襦 hash partitioning
 朱 襴讌襷,  豌襴
 broadcast join
 Join るジ讓 企 螳 Worker襦 覲旧
 朱 觜襯伎襷,  豌襴 覿螳
 牛 焔ロレ(JOIN)
SET SESSION distributed_join = true;
Hash distributed join  蟆語  覿襯 Session Level襦 讌 螳ロ. Hash
distributed join   Join   讌襷 Broadcast join覲企 螳 れ  給.
Session  螳 れ  給. 蠍磯蓋 れ true .
25
Presto  - tuning
redistribute_writes (default : true)
Presto襯 牛 write  蠍 伎 一危磯ゼ 覿覦壱 Skew襦 誤
覦 焔 襯 覦讌蠍  覈. る 一危郁  蟆曙 一危 覿覦 覦
Hashing 朱 誤 企ろ  Overhead 覦  .
optimize_hash_generation (default : true)
貎朱Μ   Join 覦 Aggregation  hash 螳 螻壱讌  覿
貎朱Μ  譴 狩 Hash 螳 螻一 讌  覩襦 CPU  譴企れ襷
Hash 螳 豌 Stage 譴覲給 °蠍 覓語 Network  觜 讀螳
push_aggregation_through_join (default : true)
Scala sub query   sub query  aggregation 覦蟾レ曙朱 蟆
襦 Query Plan .
push_table_write_through_union (default : true)
UNION ALL  覿一襴  蟆語  覿. UNION ALL    焔レ
レ讌襷 企ろ 襦螳 讀螳 襦螳  企ろ一 讌 .
* 覈 Session level propertyれ  螳  螳ロ.
 牛 焔ロレ(Configuration) (1/2)
set session 覈轟企ゼ 牛 Query 襦 れ 螳ロ property
SET SESSION distributed_join = true;
26
Presto  - tuning
task_concurrency (default : 16)
JOIN 覦 Aggregation 螳 覲 一一  蟆曙一 local concurrency襯 讌.
 螳 蟆 れ 蟆曙 企ろ一   貎朱Μ ルレ レ 讌襷
context switching 覦 蠍壱 るる襦 誤 企ろ 焔レ 螳  .
企ろ一   貎朱Μ襷  蟆曙 螳 襦 企ろ 焔 レ.
task_writer_count (default : 1)
Worker Query  豪 Write thread 螳襯 讌.
Write thread CPU襯 豢螳 麹   蟆曙 Write螳 觜殊  讌襷
豢  蠍壱 語朱 誤 朱 Connector 覲覈  殊企  .
* 覈 Session level propertyれ  螳  螳ロ.
 牛 焔ロレ(Configuration) (2/2)
set session 覈轟企ゼ 牛 Query 襦 れ 螳ロ property
SET SESSION distributed_join = true;
27
谿瑚
  覓語 Presto 0.197 覯 蠍一朱 螻旧危碁ゼ 谿瑚 燕給.
  誤 覲企  Presto 螻旧危語 螻牛.
Presto | Distributed SQL Query Engine for Big Data :
https://prestodb.io/
GitHub - prestodb/presto: Distributed SQL query engine for big data :
https://github.com/prestodb/presto
28

More Related Content

Presto User & Admin Guide

  • 1. Presto User & Admin Guide Presto 0.197 User & Admin Guide filfal2001@gmail.com
  • 2. 1. Presto 螳 2. Presto Architecture 3. In-Memory Processing Hive vs Presto 4. Query Processing 5. Query Execution 6. Presto 7. 谿瑚襭 Contents
  • 3. Presto 螳 Distributed SQL Query Engine for Big Data Presto 一危 覿 蟲襯 Facebook 襦碁 2013 11 7 豺 殊伎る 螻糾給. 覿 覲 SQL 讌朱 interactive Query襯 朱 Storage Layer 譟伎讌 螻 Query襯 Engine朱襷 蟲焔 朱 譴 ANSI SQL 讌. MPP(Massive Parallel Processing) 豌襴 覦 Pipelined Execution Model 覦朱 Query襯 豌襴覃 Storage Layer螳 Source Data(Hive, HDFS or etc) 讌 蠏狩 一危磯ゼ 螳讌螻 襦 伎螻 企ゼ れ Connectorれ 螻牛. 2
  • 4. Architecture Coordinator & Worker Presto Coordinator Worker襦 蟲焔 朱 螳螳 螳 企麹. Coordinator Coordinator Query Parsing, Query Planning 覦 Worker Node 蟯襴襯 企麹. , Client螳 Query襯 蠍 郁屋 Node. 覈 Presto れ 伎 Worker Coordinator螳 伎 . Coordinator Worker 襭覃 Worker襦 覿 豕譬 蟆郁骸襯 螳語 Client 覦. Worker Worker ろ 覦 一危 豌襴襯 企麹 Node . Worker Connector襯 牛 一危磯ゼ 螳語り 譴螳 一危磯ゼ 襦 蟲覃伎 豌襴蟆 . Worker Node螳 ろ覃 Coordinator Discovery Server ろ 襴螻 Coordinator 覦 螳 . 3
  • 5. Architecture Connector Presto Connector 襯 牛 Data Source 蠏狩 一危磯ゼ 螳語 貎 襴 覿 螳ロ. 螻牛 Connector 螳給. Accumulo Black Hole Cassandra Hive JMX Kafka Local file MongoDB MySQL PostgreSQL Redis Amazon Redshift MS SQL Server Thrift 4
  • 6. In-Memory Processing Hive vs Presto Hive vs Presto 豌襴 觜蟲 Hive 螳 螻覲 蟆郁骸襯 Disk襦 Spill 螻覲襦 Disk Read/Write 豌襴 覦 Presto 螳 螻覲 蟆郁骸襯 pipeline 牛 memory-to-memory襦 豌襴 覦 5
  • 7. In-Memory Processing Hive vs Presto Hive Presto 豌襴 覦 磯ジ Query 螳企 螳 襴 給. Hive(Data warehouse) 一危一 batch 豌襴 JOIN, Aggregation, , count 蟆曙 貎朱Μ 願屋(fault tolerance) 蟆曙 ETL, 螳 伎 Long time 貎朱Μ, 貎朱Μ螳 Fault tolerance襯 讌伎 蟆曙 Presto(Distributed Query Engine) 觜襯 旧 ad-hoc Query 觜襯 旧 螳讌 貎朱Μ 願屋 蠍磯(No fault tolerance) count, distinct, union, order by - 覃覈襴 襷朱 豌襴 螻 (Disk Spill 旧 牛 覃覈襴 Over 一危磯 豌襴 螳ロ 焔 螳) In-Memory 豌襴 螳ロ 一危 伎 Simple Query 蟆曙 6
  • 8. Query Processing Presto Query れ螻 螳 ろ 襴朱 豌襴. (Worker) Worker Coordinator Discovery Service ろ 襴暑. (Client) Client Coordinator Query襯 豢覃 ろ 豌. (Coordinator) Coordinator 豌 覦 Query Parsing/Analyzing -> Planning -> Scheduling 螻殊 . (Coordinator) Coordinator Query 焔 Task襯 Worker . 企, Worker ろ task襯 豢 task螳 ろ讌 螳. (Worker) Worker 覦 Task ろ Connector襯 牛 Data Source Data襯 曙伎給. (Worker) Workerれ Task 蟆郁骸襯 Pipeline Execution 牛 譯手 覦朱 Query襯 覃 豕譬 蟆郁骸襯 Coordinator襦 . (Coordinator) Coordinator Worker襦覿 Query 豕譬 蟆郁骸襯 覦る 企ゼ Client . 1 2 4 3 5 6 7
  • 9. Query Execution - 1 Coordinator 豌 覦 Query SQL螻 蟷 Connector襯 牛 覦 Metadata襯 螳讌螻 Logical Query plan 1谿朱 燕. Coordinator れ 焔 Logical Query plan 覿 螳ロ襦 Distributed Query plan朱 覲 燕. 8
  • 10. Query Execution - 2 焔 Query Plan Worker Node 螳 磯 覿 螳ロ襦 Distributed Query plan朱 覲 螳螳 Worker Node . Distributed Query Plan 螳 螻 Stage手 豺覃 Stage 碁Μ 螻豸旧 蟲譟磯 焔. 螳 襷讌襷 豕譬 蟆郁骸襯 譬 Stage襯 Root Stage手 . 9
  • 11. Query Execution - 3 Stage 螳螳 Worker Node 螳 Task襦 ろ. Worker Node Stage 伎 Task襯 覦蟆 . 企蟆 Stage 螳螳 Worker Node Task 襦 覲 蟆 . 螳 Task 螳 Operator襦 讌. 蠍一 Operator Filter, Projection, Table Scan, Aggregation, Join 螳 . 10
  • 12. Query Execution - 4 螳螳 Task 伎 Split 螳讌蟆 . Split Data Partition朱 Connector襯 牛伎 Data Source 螳語り碓 讌 襭 Stage Task Pipeline execution 牛 螳語り 企 Stage螳 Data 螻殊 Exchange手 . 11
  • 13. Query Execution - 5 豌 Query ろ 螻殊 螳 渚覃 螳 襴 給. 豌 覯讌 Stage Task Connector襯 牛 一危磯ゼ 曙伎 Split Task Split 螳 Operator襦 豌襴 Task ろ 襭 れ Stage Split朱 一危磯ゼ Exchange Query 豕譬 蟆郁骸 Root Stage 譬 Client 12
  • 14. Query Execution - 6 Query Plan in Presto Web UI Presto Web UI ろ 貎朱Μ history 蠍磯レ 螻牛覃 ろ 貎 襴 Plan 覲企ゼ Graphical 蟆 覲 給. Query Plan EXPLAIN 覈轟企ゼ 牛 text 襦 豢 螳ロ. 13
  • 15. 一危 (Data Types)(1/2) Groups Type Description Examples Boolean BOOLEAN true false 螳 . Integer TINYINT 豕螳 -2^7螻 豕螳 2^7 - 1 螳 8 觜 覿 2 覲伎 . SMALLINT 豕螳 -2^15螻 豕螳 2^15 - 1 螳 16 觜 覿語 2 覲伎. INTEGER 豕螳 -2^31螻 豕 螳 2^31 - 1 螳 32 觜 覿語 2 覲伎.( INT 螳.) BIGINT 豕螳 -2^63螻 豕螳 2^63 - 1 螳 64 觜 覿語 2 覲伎 . Floating-Point REAL れ 2 讌 覿 一 IEEE 譴 754 襯 蟲 32 觜. (4 Byte 覿 ) DOUBLE double 伎 覿 一 IEEE 譴 754襯 蟲 64 觜 (8 Byte覦一 覿 ) Fixed-Precision DECIMAL 螻 覦 10 讌. 38 襴蟾讌 覦螳 讌 讌襷 焔レ 豕 18 襴蟾讌 螳. String VARCHAR 豕 蠍語企ゼ 螳 螳覲 蠍語. 覓語 一危. CHAR 螻 蠍語 覓語 一危 VARBINARY 螳覲 蠍語 2 讌 一危. JSON JSON る誤 JSON 企, JSON 覯, JSON 覓語 JSON 螳 true, false null. Presto 一危 14
  • 16. 一危 (Data Types)(2/2) Groups Type Description Examples Date and Time DATE 讌 (, , ). : DATE '2001-08-22' TIME TIME ZONE 螳 (, 覿, 豐, 覦襴 豐). : TIME '01:02:03.456' TIME WITH TIME ZONE TIME ZONE 螳 (, 覿, 豐, 覦襴 豐). : TIME '01:02:03.456 America /Los_Angeles' TIMESTAMP TIME ZONE 螳螻 讌螳 語ろ . : TIMESTAMP '2001-08-22 03 :04:05.321' TIMESTAMP WITH TIM E ZONE TIME ZONE 讌 螳 語ろ . : TIMESTAMP '2001-08-22 03 :04:05.321 America/Los_Angele s' INTERVAL YEAR TO MO NTH 螻 覯. : INTERVAL '3' MONTH INTERVAL DAY TO SEC OND , 螳, 覿, 豐 覦 覦襴 豐 覯. : INTERVAL '2' DAY Structural ARRAY 讌 貉危朱 覦一. : ARRAY[1, 2, 3] MAP 讌 貉危朱 企 MAP. : MAP(ARRAY['foo', 'bar'], AR RAY[1, 2]) ROW 覈覈 襦 蟲焔 蟲譟. SQL 螳讌 , 谿語^ 一一襯 ′ . : CAST(ROW(1, 2.0) AS ROW( x BIGINT, y DOUBLE)) Network Address IPADDRESS IPv4 IPv6 譯殊襯 IP 譯殊. : ,IPADDRESS '10.0.0.1'IPAD DRESS '2001:db8::1' HyperLogLog HyperLogLog HyperLogLog れ豺襦 螻一 螳. P4HyperLogLog P4HyperLogLog れ豺 ,讌襷 譟磯 . Quantile Digest QDigest Quantile れ伎ろ (qdigest) 譯殊伎 誤 旧 一危 覿襯 谿 蟲 譟 Presto 一危 15
  • 17. SQL Statement Syntax (1/3) /覲蟆 Syntax Description Synopsis Examples CREATE 襦 ろる, 企 煙 燕. CREATE SCHEMA [ IF NOT EXISTS ] schema_name [ WITH ( property_name = expression [, ...] ) CREATE SCHEMA IF NOT EXISTS orders; CREATE TABLE [ IF NOT EXISTS ] table_name ( { column_name data_type [ COMMENT comment ] [ WITH ( property_name = expression [, ...] ) ] | LIKE existing_table_name [ { INCLUDING | EXCLUDING } PROPERTIES ] } [, ...] ) [ COMMENT table_comment ] [ WITH ( property_name = expression [, ...] ) ] CREATE TABLE IF NOT EXISTS orders ( orderkey bigint, orderstatus varchar, totalprice double COMMENT 'Price in cents.', orderdate date ) COMMENT 'A table to keep track of orders.' ALTER Definition 覲蟆渚. ALTER TABLE name RENAME TO new_name ALTER TABLE name ADD COLUMN column_name data_ type [ COMMENT comment ] [ WITH ( property_name = expression [, ...] ) ] ALTER TABLE name RENAME TO new_name ALTER TABLE name ADD COLUMN column_name orderkey bigint ALTER TABLE name DROP COLUMN column_name ALTER TABLE name RENAME COLUMN column_name TO new_column_name Presto SQL Syntax 16
  • 18. SQL Statement Syntax (2/3) 譟壱/焔/覲蟆/ Syntax Description Synopsis Examples INSERT Table 襦 一磯ゼ 1 焔. INSERT INTO table_name [ ( column [, ... ] ) ] que ry INSERT INTO nation (nationkey, n ame, regionkey, comment) VALUES (26, 'POLAND', 3, 'no co mment'); SELECT 譟郁唄 襷 一危磯ゼ 譟 . [ WITH with_query [, ...] ] SELECT [ ALL | DISTINCT ] select_expr [, ...] [ FROM from_item [, ...] ] [ WHERE condition ] [ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ] [ HAVING condition] [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTIN CT ] select ] [ ORDER BY expression [ ASC | DESC ] [, ...] ] [ LIMIT [ count | ALL ] ] SELECT origin_state, NULL, NULL, sum(package_weight) FROM shipping GROUP BY origin_state DROP schema/table/view 煙 DROP SCHEMA [ IF EXISTS ] schema_name DROP TABLE [ IF EXISTS ] table_name DROP VIEW [ IF EXISTS ] view_name DROP TABLE sales DROP TABLE IF EXISTS tb_date DROP VIEW IF EXISTS vtb_date DELETE Table 一危磯ゼ .(一 貉朱 螳.) DELETE FROM table_name [ WHERE condition ] DELETE FROM lineitem WHERE orderkey IN (SELECT ord erkey FROM orders WHERE priori ty = 'LOW'); Presto SQL Syntax 17
  • 19. SQL Statement Syntax (3/3) 蠍壱 Syntax Description Synopsis Examples COMMIT 豕 transaction COMMIT. COMMIT [ WORK ] COMMIT; EXPLAIN 覈豪語 朱Μ 覿 ろ 螻 蟇 覈豪語 煙 蟆 . EXPLAIN [ ( option [, ...] ) ] statement where option can be one of: FORMAT { TEXT | GRAPH+D8+B7:C7+A 7:C7 - EXPLAIN SELECT regionkey, count(*) F ROM nation GROUP BY 1; - EXPLAIN (TYPE DISTRIBUTED) SELEC T regionkey, count(*) FROM nation GRO UP BY 1; Presto SQL Syntax 18
  • 20. - Logical Operators 朱Μ一一襯 螻牛. - Comparison Functions and Operators 觜蟲一一襯 螻牛. (Functions)(1/5) - Range Operator: BETWEEN Operators ろ 螳 轟 覯 伎 蟆曙 襯 螻牛. - IS NULL and IS NOT NULL 螳 NULL(覩語)語 覿 誤襯 螻牛. - Conditional Expressions 譟郁唄覓語 螻牛. SELECT 3 BETWEEN 2 AND 6; SELECT 3 NOT BETWEEN 2 AND 6; SELECT 3.0 IS NULL; -- false SELECT a, CASE a WHEN 1 THEN 'one' WHEN 2 THEN 'two' ELSE 'many' END if( 譟郁唄 , true_value , false_value ) 譟郁唄 谿語企 true_value, 蟇一企 false_value 覦. Presto - Functions 19
  • 21. - Mathematical Operators 襯 螻牛. - Comparison Functions and Operatorsabs( x ) [リ骸 ]. x 螳 覦 . cbrt( x ) 覦, x 觚 襭碁ゼ 覦 . ceil( x ) [リ骸 ], ceiling()螻 . ceiling( x ) [リ骸 ], 覦 x螳 螳 螳蟾 襦 覦襴朱 cosine_similarity( x , y ) double, ろ 覯″ x yれ 螻 螳 貊 煙 覦 . floor( x ) [リ骸 ] / 覦 螳 x螳 螳蟾 襦 企朱. random( ) double / 0.0 <= x <1.0 覯 豺襯 れ. round( x , d ) [リ骸 ] / x襦 覦襴 d 危 襴酔襯 覦. Presto - Functions (Functions)(2/5) 20
  • 22. - Mathematical Operators 覓語 襯 螻牛. concat( string1 , ... , stringN ) varchar / SQL 譴 郁屋 一一 ( ||) 狩 蠍磯レ 螻牛 . length( 覓語 ) bigint / string覓語 蠍語企ゼ 譴. lower( 覓語 ) varchar / string覓語襦 覲 . lpad( string , size , padstring ) varchar / 襷 string 覓語伎 蠍語願 size覲企 蟆曙 るジ讓 覿 padstring 覓語襦 size襷 豈讌. trim( 覓語 ) varchar / 螻給葦 蟇壱. replace( 覓語 , searchfrom , replace ) varchar / 覓語伎 searchfrom 狩 企ゼ 谿場 replace 襦 覓語 襯 覲 . split( 覓語 , 蟲覿 蠍壱 ) 覦一 <varchar> / 覓語伎 delimiter襦蟲覿 覦一伎 覦. substr( 覓語 , ) varchar / 覓語伎 豺覿 襾語 覿覿 覦 substr( 覓語 , , 蠍語 ) varchar / 覓語伎 豺覿 蠍語願讌 覓語伎 覦 . upper( 覓語 ) varchar / 覓語襦 覲 . Presto - Functions (Functions)(3/5) 21
  • 23. - Unicode Operators 貊 襯 螻牛. - Regular Expression Functions 覈 蠏 覈 螳讌 碁ゼ 螻 Java 蟲覓語 . to_utf8( 覓語 ) varbinary / 覓語 UTF-8 varbinary 朱 語 . From_utf8( 覦企襴 ) varchar / 覦企襴 UTF-8襦 語 覓語伎 貊. from_utf8( 覦企襴 , 覦蠑瑚鍵 ) varchar / 覦企襴 UTF-8襦 語 覓語伎 貊 . regexp_extract( 覓語 , ) varchar / 蠏 伎 襷 豌覯讌 覓語 覦. regexp_like( 覓語 , ) boolean / 覓語伎 蠏 蠏 pattern 讌 誤 . regexp_replace( 覓語 , , 豌) varchar / 蠏 pattern 襷 覓語伎 豌企 覲蟆渚. regexp_split( 覓語 , ) 覦一 <varchar> / 蠏 pattern襯 蟲覿襦 覿 覦一伎 覦. Presto - Functions (Functions)(4/5) 22
  • 24. - Aggregate Functions 朱 螳 蟆郁骸襯 螻壱 讌螻 襯 螻牛. - Lambda ->襦 . SELECT filter(ARRAY [5, -6, NULL, 7], x -> x > 0); -- [5, 7] - GeoSpatial Presto 讌襴 螻糾 WKT (Well-Known Text) 螻糾 螳豌企ゼ 讌. - URL URL 豢豢 HTTP URL ( RFC 2396 譴 URI ) 讌. - JSON 一企襯 JSON朱 覲 螳ロ覃 JSON 一危磯ゼ る 襯 讌json_array_length( json ) bigint / JSON 覦一伎 覓語 覦一 蠍語企ゼ 覦 . json_extract( json , json_path ) json / JSON 覦一伎 json_path JSON 覓語企 覦. json_format( json ) varchar / JSON 螳 讌 JSON ろ碁ゼ 覦. Presto - Functions (Functions)(5/5) 23
  • 25. - columns襷 讌 覈. SELECT * FROM TB; (X) SELECT a,b FROM TB; (O) - GROUP BY 伎 cardinality 螻. SELECT GROUP BY gender, age; (X) SELECT GROUP BY age, gender ; (O) - ORDER BY 蟷 LIMIT SELECT col FROM tbl ORDER BY time ; (X) SELECT col FROM tbl ORDER BY time LIMIT 100 ; (O) - 蠏殊 讌螻 (approximate aggregate functions) SELECT distinct(user_id) FROM (X) SELECT approx_distinct user_id FROM; (O) - り唄 LIKE regexp_like 襦 覲蟆. WHERE method LIKE %GET% OR method LIKE %POST% (X) WHERE regex_like(method, GET|POST) (O) - JOIN 企 襾殊 讌 FROM small s, large l where l.id=s.id (X) FROM large l, small s where l.id=s.id (O) - TOP N 貊襯 ROW_NUMBER RANK 襯 . 牛 焔ロレ(SQL) Presto - tuning 24
  • 26. Presto - tuning distributed join Join Key襦 hash partitioning 朱 襴讌襷, 豌襴 broadcast join Join るジ讓 企 螳 Worker襦 覲旧 朱 觜襯伎襷, 豌襴 覿螳 牛 焔ロレ(JOIN) SET SESSION distributed_join = true; Hash distributed join 蟆語 覿襯 Session Level襦 讌 螳ロ. Hash distributed join Join 讌襷 Broadcast join覲企 螳 れ 給. Session 螳 れ 給. 蠍磯蓋 れ true . 25
  • 27. Presto - tuning redistribute_writes (default : true) Presto襯 牛 write 蠍 伎 一危磯ゼ 覿覦壱 Skew襦 誤 覦 焔 襯 覦讌蠍 覈. る 一危郁 蟆曙 一危 覿覦 覦 Hashing 朱 誤 企ろ Overhead 覦 . optimize_hash_generation (default : true) 貎朱Μ Join 覦 Aggregation hash 螳 螻壱讌 覿 貎朱Μ 譴 狩 Hash 螳 螻一 讌 覩襦 CPU 譴企れ襷 Hash 螳 豌 Stage 譴覲給 °蠍 覓語 Network 觜 讀螳 push_aggregation_through_join (default : true) Scala sub query sub query aggregation 覦蟾レ曙朱 蟆 襦 Query Plan . push_table_write_through_union (default : true) UNION ALL 覿一襴 蟆語 覿. UNION ALL 焔レ レ讌襷 企ろ 襦螳 讀螳 襦螳 企ろ一 讌 . * 覈 Session level propertyれ 螳 螳ロ. 牛 焔ロレ(Configuration) (1/2) set session 覈轟企ゼ 牛 Query 襦 れ 螳ロ property SET SESSION distributed_join = true; 26
  • 28. Presto - tuning task_concurrency (default : 16) JOIN 覦 Aggregation 螳 覲 一一 蟆曙一 local concurrency襯 讌. 螳 蟆 れ 蟆曙 企ろ一 貎朱Μ ルレ レ 讌襷 context switching 覦 蠍壱 るる襦 誤 企ろ 焔レ 螳 . 企ろ一 貎朱Μ襷 蟆曙 螳 襦 企ろ 焔 レ. task_writer_count (default : 1) Worker Query 豪 Write thread 螳襯 讌. Write thread CPU襯 豢螳 麹 蟆曙 Write螳 觜殊 讌襷 豢 蠍壱 語朱 誤 朱 Connector 覲覈 殊企 . * 覈 Session level propertyれ 螳 螳ロ. 牛 焔ロレ(Configuration) (2/2) set session 覈轟企ゼ 牛 Query 襦 れ 螳ロ property SET SESSION distributed_join = true; 27
  • 29. 谿瑚 覓語 Presto 0.197 覯 蠍一朱 螻旧危碁ゼ 谿瑚 燕給. 誤 覲企 Presto 螻旧危語 螻牛. Presto | Distributed SQL Query Engine for Big Data : https://prestodb.io/ GitHub - prestodb/presto: Distributed SQL query engine for big data : https://github.com/prestodb/presto 28