際際滷

際際滷Share a Scribd company logo
JECB: a Join-Extension,
Code-Based Approach to
OLTP Data Partitioning
Khai Q. Tran -- Oracle Labs1
Jeffrey F. Naughton  UW Madison
Bruhathi Sundarmurthy -- UW Madison
Dimitris Tsirogiannis -- Cloudera2
1
1: Work done at UW Madison
2: Work done at the Microsoft Gray Systems Lab
Motivation
 Modern OLTP applications often require
high transaction throughput
 Scaling out OLTP transactions in
distributed systems is challenging
 Communication protocols
(2PC, Paxos) are expensive
 A smart way to partition the
database to avoid distributed
transactions is crucial
2
The problem
Given a database, a transaction
workload, and k processing elements,
horizontally split the database into k
partitions such that the number of
distributed transactions is minimized.
Tuesday, June 24, 2014 3
Work done in the past: parallel
database partitioning
Tuesday, June 24, 2014 4
OLTP Parallel database
Goal
Wants local
execution, avoids
splitting xacts
Wants parallelism
power by splitting
query/xact
Workload
properties
Simple queries,
touching few
records
Complex queries,
large scans
Old techniques may not be
applicable for the new problem
State of the art 1 - Schism
 A statistic-based fine-granularity
approach:
 Models training transactions as a tuple graph:
tuples nodes and transactions  edges
connecting the tuples
 Uses graph min-cut algorithms to partition the
graph into k partitions
 Mapping tuple  partition id
 Uses machine learning techniques to infer
partitioning attributes from the sample
mapping
 May not scale well for large databasesTuesday, June 24, 2014 5
State of the art 2 -
Horticulture
 A schema-based approach
 Uses schema to search for the best
partitioning attribute for each table
 Employs an advanced search technique
(large-neighborhood search) to reduce the
search space
 Uses training trace and a data skew-aware
cost function to evaluate the partitioning
quality
 May not find good partitioning solutions for
complicated workloadsTuesday, June 24, 2014 6
Our approach - JECB
 Join-extension (idea):
 Explicitly uses key-foreign key relationships from
schema to partition through joins
 Code-based (algorithm):
 Analyze SQL code from store procedures to find the
best way joining tables together
 Divide-and-conquer (optimization):
 Consider each class of transactions separately and
combine results later
Tuesday, June 24, 2014 7
Simple example for Join
Extension
Two tables: R(X, Y) and S(Y, Z)
Query: SELECT * FROM R, S
WHERE R.Y = S.Y AND S.Z = 100
The best way to partition R ?
- Not by any columns of R
- By S.Z and using join R.Y = S.Y
Tuesday, June 24, 2014 8
Our data partitioning
framework
 Phase 1: Pre-processing
 Collecting workload trace (sample run) and
identifying read-only tables
 Phase 2: Partitioning individual transaction
types
 Finding the best partitioning solution for each
type of transactions using join extensions in
transaction source code
 Phase 3: Merging
 Combining results from all transaction types to
find the best global solution.
Tuesday, June 24, 2014 9
Phase 2 demo on TPC-E Customer-Position
transactions
IF (@cust_id = 0)
SELECT @cust_id = C_ID
FROM CUSTOMER
WHERE C_TAX_ID = @tax_id
SELECT CA_ID, CA_BAL,
SUM(ISNULL(HS_QTY, 0) * ISNULL(LT_PRICE, 0))
FROM CUSTOMER_ACCOUNT WITH (UPDLOCK) LEFT LOOP JOIN
HOLDING_SUMMARY ON CA_ID=HS_CA_ID LEFT LOOP JOIN LAST_TRADE ON HS_S_SYMB = LT_S_SYMB
WHERE CA_C_ID = @cust_id
GROUP BY CA_ID,CA_BAL
ORDER BY 3 ASC
SELECT TOP 30
TH_DTS,
T_QTY,
T_S_SYMB,
T_ID,
ST_NAME
FROM (SELECT TOP 10 T_ID AS ID
FROM TRADE
WHERE T_CA_ID = @acct_id
ORDER BY T_DTS DESC) AS T
JOIN TRADE ON T_ID=ID
INNER LOOP JOIN TRADE_HISTORY ON TH_T_ID=T_ID,
STATUS_TYPE
WHERE ST_ID = TH_ST_ID
ORDER BY TH_DTS DESC
Tuesday, June 24, 2014 10
CA_ID CA_C_ID
CUSTOMER_ACCOUNT
HS_S_SYMB, HS_CA_ID HS_CA_ID
HOLDING_SUMMARY
T_ID T_CA_ID
TRADE
HS_S_SYMB
LT_S_SYMB
LAST_TRADE
C_ID C_TAX_ID
CUSTOMER
TH_T_ID, TH_ST_ID TH_T_ID
TRADE_HISTORY
ST_ID
STATUS
TH_ST_ID
CA_ID CA_C_IDCA_C_ID
C_IDC_ID C_TAX_ID
CUSTOMER
Step 1: Identify accessed tables and accessed attributes from SQL codeStep 2: Connect tables with key-foreign joins to form a join graphStep 3: List all possible join trees from the join graphStep 4: Use the workload trace to select the best join tree
Implementation
Tuesday, June 24, 2014 11
Workload
driver
Partitioning
Algorithm
P
Simulation
framework
CostCost Model
Trace
Collector
Training
trace
Database schema
Transaction SQL code
Testing
trace
Partitioning solution
SQL Server 2012
Scalability experiments with
TPC-C
Tuesday, June 24, 2014 12
0
50
100
150
200
250
0
1
2
3
4
5
6
0.2 0.4 0.8 1.6 3.2 Thousand
transactions
GB
Database size (GB)
Memory
Traning
size
Schism JECB
0
50
100
150
200
250
0
1
2
3
4
5
6
0.2 0.4 0.8 1.6 3.2
Thousand
transactions
GB
Database size (GB)
Memory
Quality experiments
Tuesday, June 24, 2014 13
0
10
20
30
40
50
60
%ofdist.transactions
Schism 10%
coverage
Horticulture
JECB
TPC-E partitioning scheme
14
Table Horticulture JECB
ACCOUNT_PERMISSION AP_CA_ID replicated
CUSTOMER_TAXRATE CX_C_ID replicated
DAILY_MARKET DM_DATE replicated
WATCH_LIST WL_C_ID replicated
CASH_TRANSACTION CT_T_ID CT_T_ID = T_ID, T_CA_ID = CA_ID,
CA_C_ID
CUSTOMER_ACCOUNT replicated CA_C_ID
HOLDING H_CA_ID H_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID
HOLDING_HISTORY HH_T_ID HH_T_ID=T_ID, T_CA_ID = CA_ID,
CA_C_ID
HOLDING_SUMMARY HS_CA_ID HS_CA_ID = CA_ID, CA_C_ID
SETTLEMENT SE_T_ID SE_T_ID = T_ID, T_CA_ID = CA_ID,
CA_C_ID
TRADE T_CA_ID T_CA_ID = CA_ID, CA_C_ID
TRADE_HISTORY TH_T_ID TH_T_ID = T_ID, T_CA_ID = CA_ID,
CA_C_ID
TRADE_REQUEST replicated TR_T_ID = T_ID, T_CA_ID = CA_ID,
CA_C_ID
Join extensions help to deliver
better results on complicated
workloads
What we learnt from
Experiments
Tuesday, June 24, 2014 15
Schema-based approach
Statistics-based
approach
JECB:
 Divide-and-conquer and join
extensions to break complex
workloads
 Restrictions: all joins are key-
foreign key joins
Workload complexity
SimpleComplicated
Databasesize
SmallLarge
?
Conclusion and future work
 We developed a promising approach
for partitioning complicated OLTP
databases.
 Future work:
 Address the data skew problem
 Try more complicated cost models rather
than % of distributed transactions
 End-to-end performance experiments
Tuesday, June 24, 2014 16
Phase 2 (Background)
 Join path:
 Sequence of key-foreign key joins connecting tables to partitioning attributes
 Allow a table to be partitioned by attribute from another table
Example:
Tuesday, June 24, 2014 17
CA_ID C_ID
1 1
2 1
6 2
9 2
T_ID T_CA_ID
1 1
2 1
3 2
4 2
5 6
6 6
7 9
8 9
TRADE
CUSTOMER_ACCOUNT SELECT *
FROM TRADE,
CUSTOMER_ACCOUNT
WHERE T_CA_ID = CA_ID AND
C_ID = @cust_id
Join path from TRADE to C_ID:
T_ID T_CA_ID CA_ID CA_C_ID
TRADE CUSTOMER_ACCOUNT
Blue: source
Red: destination
 Join tree:
 Combination of join paths sharing the same destination (the root)
TPC-E transaction class
breakdown
18

More Related Content

Jecb sigmod2014

  • 1. JECB: a Join-Extension, Code-Based Approach to OLTP Data Partitioning Khai Q. Tran -- Oracle Labs1 Jeffrey F. Naughton UW Madison Bruhathi Sundarmurthy -- UW Madison Dimitris Tsirogiannis -- Cloudera2 1 1: Work done at UW Madison 2: Work done at the Microsoft Gray Systems Lab
  • 2. Motivation Modern OLTP applications often require high transaction throughput Scaling out OLTP transactions in distributed systems is challenging Communication protocols (2PC, Paxos) are expensive A smart way to partition the database to avoid distributed transactions is crucial 2
  • 3. The problem Given a database, a transaction workload, and k processing elements, horizontally split the database into k partitions such that the number of distributed transactions is minimized. Tuesday, June 24, 2014 3
  • 4. Work done in the past: parallel database partitioning Tuesday, June 24, 2014 4 OLTP Parallel database Goal Wants local execution, avoids splitting xacts Wants parallelism power by splitting query/xact Workload properties Simple queries, touching few records Complex queries, large scans Old techniques may not be applicable for the new problem
  • 5. State of the art 1 - Schism A statistic-based fine-granularity approach: Models training transactions as a tuple graph: tuples nodes and transactions edges connecting the tuples Uses graph min-cut algorithms to partition the graph into k partitions Mapping tuple partition id Uses machine learning techniques to infer partitioning attributes from the sample mapping May not scale well for large databasesTuesday, June 24, 2014 5
  • 6. State of the art 2 - Horticulture A schema-based approach Uses schema to search for the best partitioning attribute for each table Employs an advanced search technique (large-neighborhood search) to reduce the search space Uses training trace and a data skew-aware cost function to evaluate the partitioning quality May not find good partitioning solutions for complicated workloadsTuesday, June 24, 2014 6
  • 7. Our approach - JECB Join-extension (idea): Explicitly uses key-foreign key relationships from schema to partition through joins Code-based (algorithm): Analyze SQL code from store procedures to find the best way joining tables together Divide-and-conquer (optimization): Consider each class of transactions separately and combine results later Tuesday, June 24, 2014 7
  • 8. Simple example for Join Extension Two tables: R(X, Y) and S(Y, Z) Query: SELECT * FROM R, S WHERE R.Y = S.Y AND S.Z = 100 The best way to partition R ? - Not by any columns of R - By S.Z and using join R.Y = S.Y Tuesday, June 24, 2014 8
  • 9. Our data partitioning framework Phase 1: Pre-processing Collecting workload trace (sample run) and identifying read-only tables Phase 2: Partitioning individual transaction types Finding the best partitioning solution for each type of transactions using join extensions in transaction source code Phase 3: Merging Combining results from all transaction types to find the best global solution. Tuesday, June 24, 2014 9
  • 10. Phase 2 demo on TPC-E Customer-Position transactions IF (@cust_id = 0) SELECT @cust_id = C_ID FROM CUSTOMER WHERE C_TAX_ID = @tax_id SELECT CA_ID, CA_BAL, SUM(ISNULL(HS_QTY, 0) * ISNULL(LT_PRICE, 0)) FROM CUSTOMER_ACCOUNT WITH (UPDLOCK) LEFT LOOP JOIN HOLDING_SUMMARY ON CA_ID=HS_CA_ID LEFT LOOP JOIN LAST_TRADE ON HS_S_SYMB = LT_S_SYMB WHERE CA_C_ID = @cust_id GROUP BY CA_ID,CA_BAL ORDER BY 3 ASC SELECT TOP 30 TH_DTS, T_QTY, T_S_SYMB, T_ID, ST_NAME FROM (SELECT TOP 10 T_ID AS ID FROM TRADE WHERE T_CA_ID = @acct_id ORDER BY T_DTS DESC) AS T JOIN TRADE ON T_ID=ID INNER LOOP JOIN TRADE_HISTORY ON TH_T_ID=T_ID, STATUS_TYPE WHERE ST_ID = TH_ST_ID ORDER BY TH_DTS DESC Tuesday, June 24, 2014 10 CA_ID CA_C_ID CUSTOMER_ACCOUNT HS_S_SYMB, HS_CA_ID HS_CA_ID HOLDING_SUMMARY T_ID T_CA_ID TRADE HS_S_SYMB LT_S_SYMB LAST_TRADE C_ID C_TAX_ID CUSTOMER TH_T_ID, TH_ST_ID TH_T_ID TRADE_HISTORY ST_ID STATUS TH_ST_ID CA_ID CA_C_IDCA_C_ID C_IDC_ID C_TAX_ID CUSTOMER Step 1: Identify accessed tables and accessed attributes from SQL codeStep 2: Connect tables with key-foreign joins to form a join graphStep 3: List all possible join trees from the join graphStep 4: Use the workload trace to select the best join tree
  • 11. Implementation Tuesday, June 24, 2014 11 Workload driver Partitioning Algorithm P Simulation framework CostCost Model Trace Collector Training trace Database schema Transaction SQL code Testing trace Partitioning solution SQL Server 2012
  • 12. Scalability experiments with TPC-C Tuesday, June 24, 2014 12 0 50 100 150 200 250 0 1 2 3 4 5 6 0.2 0.4 0.8 1.6 3.2 Thousand transactions GB Database size (GB) Memory Traning size Schism JECB 0 50 100 150 200 250 0 1 2 3 4 5 6 0.2 0.4 0.8 1.6 3.2 Thousand transactions GB Database size (GB) Memory
  • 13. Quality experiments Tuesday, June 24, 2014 13 0 10 20 30 40 50 60 %ofdist.transactions Schism 10% coverage Horticulture JECB
  • 14. TPC-E partitioning scheme 14 Table Horticulture JECB ACCOUNT_PERMISSION AP_CA_ID replicated CUSTOMER_TAXRATE CX_C_ID replicated DAILY_MARKET DM_DATE replicated WATCH_LIST WL_C_ID replicated CASH_TRANSACTION CT_T_ID CT_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID CUSTOMER_ACCOUNT replicated CA_C_ID HOLDING H_CA_ID H_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID HOLDING_HISTORY HH_T_ID HH_T_ID=T_ID, T_CA_ID = CA_ID, CA_C_ID HOLDING_SUMMARY HS_CA_ID HS_CA_ID = CA_ID, CA_C_ID SETTLEMENT SE_T_ID SE_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID TRADE T_CA_ID T_CA_ID = CA_ID, CA_C_ID TRADE_HISTORY TH_T_ID TH_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID TRADE_REQUEST replicated TR_T_ID = T_ID, T_CA_ID = CA_ID, CA_C_ID Join extensions help to deliver better results on complicated workloads
  • 15. What we learnt from Experiments Tuesday, June 24, 2014 15 Schema-based approach Statistics-based approach JECB: Divide-and-conquer and join extensions to break complex workloads Restrictions: all joins are key- foreign key joins Workload complexity SimpleComplicated Databasesize SmallLarge ?
  • 16. Conclusion and future work We developed a promising approach for partitioning complicated OLTP databases. Future work: Address the data skew problem Try more complicated cost models rather than % of distributed transactions End-to-end performance experiments Tuesday, June 24, 2014 16
  • 17. Phase 2 (Background) Join path: Sequence of key-foreign key joins connecting tables to partitioning attributes Allow a table to be partitioned by attribute from another table Example: Tuesday, June 24, 2014 17 CA_ID C_ID 1 1 2 1 6 2 9 2 T_ID T_CA_ID 1 1 2 1 3 2 4 2 5 6 6 6 7 9 8 9 TRADE CUSTOMER_ACCOUNT SELECT * FROM TRADE, CUSTOMER_ACCOUNT WHERE T_CA_ID = CA_ID AND C_ID = @cust_id Join path from TRADE to C_ID: T_ID T_CA_ID CA_ID CA_C_ID TRADE CUSTOMER_ACCOUNT Blue: source Red: destination Join tree: Combination of join paths sharing the same destination (the root)