DRUID
BITMAP INDEX: OR operation
thinkware.com -> [111000]
facebook.com -> [000111]
SELECT * FROM table
WHERE publisher= 'thinkware.com' OR publisher='facebook.com'
thinkware.com OR facebook.com
= [111000] OR [000111]
= [111111]
Result: rows 1 through 6 all match.
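To make the OR step concrete, here is a minimal Python sketch of a bitmap index over a hypothetical six-row `publisher` column, laid out so that it reproduces the bitmaps above. Druid itself stores compressed bitmaps (Concise or Roaring); plain lists of 0/1 flags are used here only for readability, and the function names are illustrative.

```python
# Minimal bitmap-index sketch: one bitmap (list of 0/1 flags) per distinct value.
# The six-row column is hypothetical sample data matching the slide:
#   thinkware.com -> 111000, facebook.com -> 000111

def build_bitmap_index(column):
    """Map each distinct value to a list of 0/1 flags, one flag per row."""
    index = {}
    for row, value in enumerate(column):
        bits = index.setdefault(value, [0] * len(column))
        bits[row] = 1
    return index

def bitmap_or(a, b):
    """Row-wise OR of two equal-length bitmaps."""
    return [x | y for x, y in zip(a, b)]

def to_str(bits):
    """Render a bitmap as a string such as '111000'."""
    return "".join(str(b) for b in bits)

publisher = ["thinkware.com"] * 3 + ["facebook.com"] * 3
index = build_bitmap_index(publisher)

# WHERE publisher='thinkware.com' OR publisher='facebook.com'
result = bitmap_or(index["thinkware.com"], index["facebook.com"])
matching_rows = [i + 1 for i, bit in enumerate(result) if bit]  # 1-based row numbers

print(to_str(index["thinkware.com"]), to_str(index["facebook.com"]), to_str(result))
print(matching_rows)
```

Running it prints `111000 000111 111111` followed by `[1, 2, 3, 4, 5, 6]`: the predicate is answered purely from the bitmaps, without touching the column values.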
"Druid is NOT a time-series DB"
Druid does not keep the raw data as-is; it indexes the incoming data and stores the indexed result.
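[Figure: Druid architecture. Components: CLIENT, BROKER, REALTIME nodes, HISTORICAL nodes backed by HDFS. A DATA STREAM is indexed (INDEXING) by REALTIME nodes, and the indexed data is handed off (HAND OFF) to HISTORICAL nodes.]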
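A toy Python model of the flow in this figure, assuming one realtime node, one historical node and a single count-by-publisher query. All class and method names (`Node`, `RealtimeNode`, `hand_off`, `Broker`) are hypothetical, and the trip through deep storage (HDFS) is collapsed into a direct copy from the realtime node to the historical node.

```python
from collections import Counter

# Toy model (hypothetical names): a realtime node indexes recent events,
# hands older ones off to a historical node, and the broker merges partial
# results from both. Deep storage (HDFS) is skipped for brevity.

class Node:
    def __init__(self, name):
        self.name = name
        self.rows = []  # (timestamp, publisher) pairs served by this node

    def count_by_publisher(self, start, end):
        """Partial aggregate over [start, end) for the rows this node holds."""
        return Counter(p for ts, p in self.rows if start <= ts < end)

class RealtimeNode(Node):
    def ingest(self, ts, publisher):
        self.rows.append((ts, publisher))

    def hand_off(self, historical, before):
        """Move rows older than `before` to a historical node."""
        old = [r for r in self.rows if r[0] < before]
        self.rows = [r for r in self.rows if r[0] >= before]
        historical.rows.extend(old)

class Broker:
    def __init__(self, nodes):
        self.nodes = nodes

    def query(self, start, end):
        """Scatter the query to every node and merge the partial counts."""
        total = Counter()
        for node in self.nodes:
            total.update(node.count_by_publisher(start, end))
        return dict(total)

realtime = RealtimeNode("realtime")
historical = Node("historical")
for ts, pub in [(1, "thinkware.com"), (2, "facebook.com"),
                (3, "thinkware.com"), (10, "facebook.com")]:
    realtime.ingest(ts, pub)
realtime.hand_off(historical, before=5)  # events at t=1..3 move to historical

broker = Broker([realtime, historical])
print(broker.query(0, 100))
```

Queried over the whole range, the broker reports two events for each publisher even though three of the four events now live on the historical node; the client never needs to know where the data sits.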
New in Druid 0.10.0
Built-in SQL (Powered by Apache Calcite)
- Provides a JDBC driver in addition to the REST API.
- Implements a Druid input format as a Hive StorageHandler.
- Hive tables can be created on top of Druid.
Druid Input Format for Hive
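Besides JDBC, the Druid broker accepts SQL over HTTP as a JSON document POSTed to `/druid/v2/sql/`. The Python sketch below, using only the standard library, builds such a request without sending it. The host reuses the `druid.broker.hostname:8082` placeholder from the Hive configuration, and the datasource name `druid_source` and the query text are assumptions for illustration.

```python
import json
import urllib.request

# Sketch: build (but do not send) a Druid SQL request.
# BROKER is a placeholder host; the query and datasource name are hypothetical.
BROKER = "http://druid.broker.hostname:8082"

def druid_sql_request(sql):
    """Wrap a SQL string in the JSON body expected by the broker's SQL endpoint."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        BROKER + "/druid/v2/sql/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = druid_sql_request("SELECT page, COUNT(*) AS cnt FROM druid_source GROUP BY page")
print(req.get_method(), req.full_url)

# Against a live broker, the request could be sent with:
#   with urllib.request.urlopen(req) as resp:
#       rows = json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a real cluster.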
Druid Query Recognition (Powered by Apache Calcite)
SELECT user, SUM(sales) AS s
FROM druid_table
WHERE month IN (12, 11, 10)
AND year=2017 AND state='CA'
GROUP BY user ORDER BY s DESC
LIMIT 10;
Apache Hive Query
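Calcite lets Hive recognize which parts of this query Druid can evaluate. As a hedged illustration, the Python sketch below hand-builds a Druid native `groupBy` query equivalent to the Hive query, assuming `druid_table` is registered against the Druid datasource `druid_source` and that `month`/`year` are derived from Druid's timestamp. It is not the exact JSON Hive generates; the `doubleSum` aggregator, the `all` granularity and the `months_to_interval` helper are illustrative choices.

```python
import json

# Hand-written approximation of the Hive query as a Druid groupBy query.
# Not Hive's actual output; aggregator type and helper are assumptions.

def months_to_interval(year, months):
    """Fold `year = Y AND month IN (...)` into one ISO-8601 interval.
    Only a contiguous run of months, e.g. (10, 11, 12), is supported."""
    lo, hi = min(months), max(months)
    if sorted(months) != list(range(lo, hi + 1)):
        raise ValueError("months must be contiguous for a single interval")
    end_year, end_month = (year + 1, 1) if hi == 12 else (year, hi + 1)
    return f"{year}-{lo:02d}-01/{end_year}-{end_month:02d}-01"

query = {
    "queryType": "groupBy",
    "dataSource": "druid_source",
    "granularity": "all",
    # WHERE month IN (12, 11, 10) AND year=2017 becomes a time interval
    "intervals": [months_to_interval(2017, (12, 11, 10))],
    # AND state='CA'
    "filter": {"type": "selector", "dimension": "state", "value": "CA"},
    # GROUP BY user
    "dimensions": ["user"],
    # SUM(sales) AS s
    "aggregations": [{"type": "doubleSum", "name": "s", "fieldName": "sales"}],
    # ORDER BY s DESC LIMIT 10
    "limitSpec": {
        "type": "default",
        "limit": 10,
        "columns": [{"dimension": "s", "direction": "descending"}],
    },
}
print(json.dumps(query, indent=2))
```

The key point is that the `year`/`month` predicates fold into the single interval `2017-10-01/2018-01-01`, so Druid only scans segments for October to December 2017. For one dimension ordered by a metric with a LIMIT, Druid's `topN` query type is another plausible target; `groupBy` is used here because it maps most directly onto the SQL.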
Registering Druid Data Sources
Point Hive to the Druid broker:
- SET hive.druid.broker.address.default=druid.broker.hostname:8082;
Create an external table:
CREATE EXTERNAL TABLE druid_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_source");
Push Data to Druid without Hive
Push Data to Druid with Hive
CREATE TABLE druid_table
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES ("druid.datasource" = "druid_source",
"druid.segment.granularity" = "HOUR")
AS SELECT CAST(time AS TIMESTAMP) AS `__time`, page, user, c_added FROM src;
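Setting `"druid.segment.granularity"="HOUR"` means the rows are partitioned into one Druid segment per hour of their timestamp. The Python sketch below shows only that bucketing step on a few hypothetical rows keyed by Druid's timestamp column `__time`; real segments additionally carry a version and may be split into several partitions per hour.

```python
from collections import defaultdict
from datetime import datetime

# Sketch of HOUR segment granularity: group rows by the hour their
# timestamp falls into. The rows are hypothetical sample data.

def hour_bucket(ts):
    """Floor a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)

def partition_by_hour(rows):
    """Return {hour_start: [rows in that hour]}, one entry per would-be segment."""
    segments = defaultdict(list)
    for row in rows:
        segments[hour_bucket(row["__time"])].append(row)
    return dict(segments)

rows = [
    {"__time": datetime(2017, 6, 1, 9, 15), "page": "A", "user": "u1", "c_added": 3},
    {"__time": datetime(2017, 6, 1, 9, 59), "page": "B", "user": "u2", "c_added": 1},
    {"__time": datetime(2017, 6, 1, 10, 5), "page": "A", "user": "u1", "c_added": 7},
]
segments = partition_by_hour(rows)
for start, seg in sorted(segments.items()):
    print(start.isoformat(), len(seg))
```

This prints `2017-06-01T09:00:00 2` and `2017-06-01T10:00:00 1`: the two 9 o'clock rows share a segment and the 10:05 row starts a new one.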
Benefits to both Druid and Apache Hive
Druid
- Can be queried with SQL
- Complex operations such as joins become possible through Hive
Hive
- Real-time data processing becomes possible
APACHE CALCITE
Planning Queries
SELECT p.productName, COUNT(*) as cnt
FROM splunk.splunk AS s
JOIN mysql.products AS p
ON s.productID = p.productID
WHERE s.action = 'purchase'
GROUP BY p.productName
ORDER BY cnt DESC
[Figure: logical plan, read bottom-up: SCAN splunk and SCAN mysql -> JOIN (KEY: productID) -> FILTER (action="purchase") -> GROUP BY -> ORDER BY]
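The unoptimized plan can be written down as a small operator tree. This Python sketch uses a hypothetical `Rel` dataclass, not Calcite's Java API (where the corresponding operators are classes such as `LogicalTableScan`, `LogicalJoin` and `LogicalFilter`), and prints the tree top-down with inputs indented under their parent.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical minimal relational-algebra node; only the tree shape matters here.

@dataclass
class Rel:
    op: str                                   # operator name, e.g. "SCAN", "JOIN"
    arg: str = ""                             # table name, predicate, keys, ...
    inputs: List["Rel"] = field(default_factory=list)

def explain(rel, depth=0):
    """Render the tree top-down, one operator per line, inputs indented."""
    lines = ["  " * depth + f"{rel.op}({rel.arg})"]
    for child in rel.inputs:
        lines.extend(explain(child, depth + 1))
    return lines

# Unoptimized plan: the FILTER sits above the JOIN.
plan = Rel("ORDER BY", "cnt DESC", [
    Rel("GROUP BY", "p.productName", [
        Rel("FILTER", "s.action = 'purchase'", [
            Rel("JOIN", "s.productID = p.productID", [
                Rel("SCAN", "splunk.splunk"),
                Rel("SCAN", "mysql.products"),
            ]),
        ]),
    ]),
])
print("\n".join(explain(plan)))
```

Read from the bottom up, every splunk row is joined with mysql before the `action = 'purchase'` predicate discards most of them, which is the inefficiency the optimizer removes.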
Optimized Queries
SELECT p.productName, COUNT(*) as cnt
FROM splunk.splunk AS s
JOIN mysql.products AS p
ON s.productID = p.productID
WHERE s.action = 'purchase'
GROUP BY p.productName
ORDER BY cnt DESC
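[Figure: optimized plan, read bottom-up: SCAN splunk -> FILTER (action="purchase"), then JOIN with SCAN mysql (KEY: productID) -> GROUP BY -> ORDER BY. The FILTER now sits below the JOIN, on the splunk input.]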
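The change between the two plans is a filter-pushdown rewrite (in Calcite this is done by rules such as `FilterJoinRule`). The Python sketch below applies a simplified version to a tuple-based plan: a FILTER directly above a JOIN is moved onto the join input that owns the predicate's table alias. Reading ownership from the alias prefix of a single-column predicate is a simplifying assumption; the node layout and function names are hypothetical.

```python
# Nodes are (op, arg, children) tuples. Scans carry "table AS alias".

def aliases(node):
    """Table aliases produced by a subtree (taken from 'table AS alias' scans)."""
    op, arg, children = node
    if op == "SCAN":
        return {arg.split(" AS ")[1]}
    return set().union(*(aliases(c) for c in children))

def push_filter_below_join(node):
    """Recursively move FILTER(pred) under a JOIN when pred uses one side only."""
    op, arg, children = node
    children = [push_filter_below_join(c) for c in children]
    if op == "FILTER" and children and children[0][0] == "JOIN":
        join_op, join_arg, (left, right) = children[0]
        pred_alias = arg.split(".", 1)[0]  # "s" in "s.action = 'purchase'"
        if pred_alias in aliases(left):
            return (join_op, join_arg, [("FILTER", arg, [left]), right])
        if pred_alias in aliases(right):
            return (join_op, join_arg, [left, ("FILTER", arg, [right])])
    return (op, arg, children)

def explain(node, depth=0):
    """Render the tree top-down, one operator per line, inputs indented."""
    op, arg, children = node
    lines = ["  " * depth + f"{op}({arg})"]
    for child in children:
        lines.extend(explain(child, depth + 1))
    return lines

plan = ("ORDER BY", "cnt DESC", [
    ("GROUP BY", "p.productName", [
        ("FILTER", "s.action = 'purchase'", [
            ("JOIN", "s.productID = p.productID", [
                ("SCAN", "splunk.splunk AS s", []),
                ("SCAN", "mysql.products AS p", []),
            ]),
        ]),
    ]),
])
optimized = push_filter_below_join(plan)
print("\n".join(explain(optimized)))
```

After the rewrite the FILTER sits directly on `SCAN(splunk.splunk AS s)`, so only purchase events reach the JOIN; a further step (not shown) could push the predicate into the splunk scan itself.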
Using the AdaptiveMonteCarlo Algorithm
Harinarayan, Rajaraman, Ullman (1996), "Implementing data cubes efficiently"
org.pentaho.aggdes.algorithm.impl.AdaptiveMonteCarloAlgorithm
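The cited Harinarayan, Rajaraman and Ullman paper chooses which aggregates (views) of a data cube to materialize with a greedy step: repeatedly add the view whose materialization saves the most rows read across all the group-bys it can answer. The Python sketch below implements that greedy selection over a three-dimension cube with hypothetical row counts, using the linear cost model (answering a group-by costs the size of the smallest materialized view containing it). It is not the Monte Carlo search that `AdaptiveMonteCarloAlgorithm` performs; the function names are illustrative.

```python
from itertools import combinations

def all_views(dims):
    """Every group-by combination of the dimensions: the nodes of the cube lattice."""
    return [frozenset(c) for r in range(len(dims) + 1) for c in combinations(dims, r)]

def cost(w, selected, size):
    """Rows read to answer w: size of the smallest selected view that contains w."""
    return min(size[v] for v in selected if w <= v)

def benefit(v, selected, size, views):
    """Benefit of adding v: total saving over every view that v can answer."""
    return sum(max(0, cost(w, selected, size) - size[v]) for w in views if w <= v)

def hru_greedy(dims, size, k):
    """Greedily pick up to k views on top of the always-materialized base view."""
    views = all_views(dims)
    selected = {frozenset(dims)}  # the full group-by (base data) is always available
    for _ in range(k):
        candidates = [v for v in views if v not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda v: benefit(v, selected, size, views))
        if benefit(best, selected, size, views) <= 0:
            break
        selected.add(best)
    return selected

# Hypothetical row counts per group-by over dimensions a, b, c ("" = grand total).
raw_sizes = {"abc": 100, "ab": 50, "ac": 75, "bc": 100, "a": 20, "b": 30, "c": 40, "": 1}
size = {frozenset(k): n for k, n in raw_sizes.items()}

chosen = hru_greedy(("a", "b", "c"), size, k=2)
print(sorted("".join(sorted(v)) for v in chosen))
```

With these sizes the first round picks `ab` (benefit 200: four views each drop from 100 to 50 rows) and the second picks `c` (benefit 70), so the output is `['ab', 'abc', 'c']`. `bc` is never chosen because it is as large as the base view and saves nothing.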