際際滷

際際滷Share a Scribd company logo
DBMS 2001 Notes 6: Query Compilation 1
Principles of Database
Management Systems
6: Query Compilation and
Optimization
Pekka Kilpel辰inen
(partially based on Stanford CS245 slide
originals by Hector Garcia-Molina, Jeff Ullman
and Jennifer Widom)
DBMS 2001 Notes 6: Query Compilation 2
Overview
 We have studied recently:
 Algorithms for selections and joins, and their costs
(in terms of disk I/O)
 Next: A closer look at query compilation
 Parsing
 Algebraic optimization of logical query plans
 Estimating sizes of intermediate results
 Focus still on selections and joins
 Remember the overall process of query
execution:
DBMS 2001 Notes 6: Query Compilation 3
{P1,P2,..}
{P1,C1>...}
parse
convert
apply laws
estimate result sizes
consider physical plans estimate costs
pick best
execute
Pi
answer
SQL query
parse tree
logical query plan
improved l.q.p
l.q.p. +sizes
statistics
DBMS 2001 Notes 6: Query Compilation 4
Step 1: Parsing
 Check syntactic correctness of the query, and
transform it into a parse tree
 Based on the formal grammar for SQL
 Inner nodes nonterminal symbols (syntactic
categories for things like <Query>, <RelName>, or
<Condition>)
 Leaves terminal symbols: names of relations or
attributes, keywords (SELECT, FROM, ),
operators (+, AND, OR, LIKE, ), operands (10,
'%1960', )
 Also check semantic correctness: Relations and
attributes exist, operands compatible with
operators,
DBMS 2001 Notes 6: Query Compilation 5
Example: SQL query
Consider querying a movie database with relations
StarsIn(title, year, starName)
MovieStar(name, address, gender, birthdate)
SELECT title
FROM StarsIn, MovieStar
WHERE starName = name AND birthdate LIKE %1960 ;
(Find the titles for movies with stars born in 1960)
DBMS 2001 Notes 6: Query Compilation 6
Example: Parse Tree
<Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> , <FromList> AND
title StarsIn <RelName>
<Condition> <Condition>
<Attribute> = <Attribute> <Attribute> LIKE <Pattern>
starName name birthdate %1960
MovieStar
DBMS 2001 Notes 6: Query Compilation 7
Step 2:
Parse Tree > Logical Query
Plan Basic strategy:
SELECT A, B, C
FROM R1, R2
WHERE Cond ;
becomes
A,B,C[Cond(R1 x R2)]
DBMS 2001 Notes 6: Query Compilation 8
Example: Logical Query Plan
title
starName=name and birthdate LIKE %1960
StarsIn MovieStar
DBMS 2001 Notes 6: Query Compilation 9
Step 3: Improving the L.Q.P
 Transform the logical query plan into an
equivalent form expected to lead to
better performance
 Based on laws of relational algebra
 Normal to produce a single optimized form,
which acts as input for the generation of
physical query plans
DBMS 2001 Notes 6: Query Compilation 10
Example: Improved Logical Query Plan
title
starName=name
StarsIn
birthdate LIKE %1960
MovieStar
DBMS 2001 Notes 6: Query Compilation 11
Step 4: Estimate Result Sizes
 Cost estimates of database algorithms
depend on sizes of input relations
> Need to estimate sizes of
intermediate results
 Estimates based on statistics about
relations, gathered periodically or
incrementally
DBMS 2001 Notes 6: Query Compilation 12
Example: Estimate Result Sizes
Need expected size
StarsIn
MovieStar
DBMS 2001 Notes 6: Query Compilation 13
Steps 5, 6, ...
 Generate and compare query plans
 generate alternate query plans P1, , Pk by
selecting algorithms and execution orders
for relational operations
 Estimate the cost of each plan
 Choose the plan Pi estimated to be "best"
 Execute plan Pi and return its result
DBMS 2001 Notes 6: Query Compilation 14
Example: One Physical Plan
Parameters: join order,
memory size, project attributes,...
Hash join
SEQ scan index scan Parameters:
Select Condition,...
StarsIn MovieStar
DBMS 2001 Notes 6: Query Compilation 15
Next: Closer look at ...
 Transformation rules
 Estimating result sizes
DBMS 2001 Notes 6: Query Compilation 16
Relational algebra optimization
 Transformation rules
(preserve equivalence)
 What are good transformations?
DBMS 2001 Notes 6: Query Compilation 17
Rules: Natural joins
R S = S R (commutative)
(R S) T = R (S T) (associative)
 Carry attribute names in results, so order
is not important
> Can evaluate in any order
DBMS 2001 Notes 6: Query Compilation 18
(R x S) x T = R x (S x T)
R x S = S x R
R U (S U T) = (R U S) U T
R U S = S U R
Rules:
Cross products & union similarly
(both associative & commutative):
DBMS 2001 Notes 6: Query Compilation 19
Rules: Selects
1. p1p2(R) =
2. p1vp2(R) =
p1 [ p2 (R)]
[ p1 (R)] U [ p2 (R)]
1. Especially useful (applied left-to-right):
Allows compound select conditions to be
split and moved to suitable positions
(See next)
NB:  = AND; v = OR
DBMS 2001 Notes 6: Query Compilation 20
Rules: Products to Joins
Definition of Natural Join:
R S = L[C(R x S)] ;
Condition C equates attributes common to R
and S, and L projects one copy of them out
Applied right-to-left
- definition of general join applied similarly
DBMS 2001 Notes 6: Query Compilation 21
Let p = predicate with only R attribs
q = predicate with only S attribs
m = predicate with attribs common to R,S
p (R S) =
q (R S) =
Rules:  + combined
[p (R)] S
R [q (S)]
More rules can be derived...
DBMS 2001 Notes 6: Query Compilation 22
pq (R S) = [p (R)] [q (S)]
pqm (R S) =
m [(p R) (q S)]
pvq (R S) =
[(p R) S] U [R (q S)]
Derived rules for  +
DBMS 2001 Notes 6: Query Compilation 23
Derivation for first one;
Others for homework:
pq (R S) =
p [q (R S) ] =
p [(R q (S) ] =
[p (R)] [q (S)]
DBMS 2001 Notes 6: Query Compilation 24
Rules for ,  combined with X
similar...
e.g., p (R X S) = ?
DBMS 2001 Notes 6: Query Compilation 25
p1p2 (R)  p1 [p2 (R)]
p (R S)  [p (R)] S
R S  S R
Some good transformations:
 No transformation is always good
 Usually good: early selections
DBMS 2001 Notes 6: Query Compilation 26
Outline - Query Processing
 Relational algebra level
 transformations
 good transformations
 Detailed query plan level
 estimate costs
 generate and compare plans
next
DBMS 2001 Notes 6: Query Compilation 27
 Estimating cost of query plan
(1) Estimating size of intermediate results
>
(2) Estimating # of I/Os
(considered last week)
DBMS 2001 Notes 6: Query Compilation 28
Estimating result size
 Maintain statistics for relation R
 T(R) : # tuples in R
 L(R) : Length of rows,
# of bytes in each R tuple
 B(R): # of blocks to hold all R tuples
 V(R, A) :
# distinct values in R for attribute A
DBMS 2001 Notes 6: Query Compilation 29
Example
R A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string
A B C D
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
bat 1 50 d
T(R) = 5 L(R) = 37
V(R,A) = 3 V(R,C) = 5
V(R,B) = 1 V(R,D) = 4
DBMS 2001 Notes 6: Query Compilation 30
Size estimates for product W = R1xR2
T(W) =
L(W) =
T(R1)  T(R2)
L(R1) + L(R2)
DBMS 2001 Notes 6: Query Compilation 31
L(W) = L(R)
T(W) = ?
Estimates for selection W = A=a (R)
= AVG number of tuples that satisfy
an equality condition on R.A
=
DBMS 2001 Notes 6: Query Compilation 32
Example
R V(R,A)=3
V(R,B)=1
V(R,C)=5
V(R,D)=4
AVG size of, say, A=val(R)?
T(A=cat(R))=2, T(A=dog(R))=2, T(A=bat(R))=1
=> (2+2+1)/3 = 5/3
A B C D
cat 1 10 a
cat 1 20 b
dog 1 30 a
dog 1 40 c
bat 1 50 d
DBMS 2001 Notes 6: Query Compilation 33
Size of W = Z=a(R) in general:
Assume: Only existing Z values a1, a2, 
are used in a select expression Z= ai,
each with equal probalility 1/V(R,Z)
=> the expected size of Z=val(R) is
E(T(W)) = 裡i 1/V(R,Z) * T(Z=a
i (R))
= T(R)/V(R,Z)
DBMS 2001 Notes 6: Query Compilation 34
What about selection W=z  val (R) ?
T(W) = ?
 Estimate 1: T(W) = T(R)/2
 Rationale: On avg, 1/2 of tuples satisfy the condition
 Estimate 2: T(W) = T(R)/3
 Rationale: Acknowledges the tendency of
selecting "interesting" (e.g., rare tuples) more
frequently
<,  and < similary
DBMS 2001 Notes 6: Query Compilation 35
Size estimates for W = R1 R2
Let X = attributes of R1
Y = attributes of R2
X  Y = 
Same as R1 x R2
Case 1
DBMS 2001 Notes 6: Query Compilation 36
W = R1 R2 X  Y = A
R1 A B C R2 A D
Case 2
Assumption:
V(R1,A)  V(R2,A)  A(R1)  A(R2)
V(R2,A)  V(R1,A)  A(R2)  A(R1)
Containment of value sets [Sec. 7.4.4]
DBMS 2001 Notes 6: Query Compilation 37
Why should containment of value sets hold?
Consider joining relations
Faculties(FName, ) and
Depts(DName, FName, )
where Depts.FName is a foreign key, and
faculties can have 0,,n departments.
Now V(Depts, FName)  V(Faculties,
FName), and referential integrity requires that
FName(Depts)  FName(Faculties)
DBMS 2001 Notes 6: Query Compilation 38
R1 A B C R2 A D
a
Estimating T(W) when V(R1,A)  V(R2,A)
Take
1 tuple
Match
Each estimated to match with T(R2)/V(R2,A)
tuples ...
so T(W) = T(R1)T(R2)/V(R2, A)
DBMS 2001 Notes 6: Query Compilation 39
 V(R1,A)  V(R2,A): T(W) = T(R1)T(R2)/V(R2,A)
 V(R2,A)  V(R1,A): T(W) = T(R2)T(R1)/V(R1,A)
[A is the join attribute]
=> in general:
T(W) = T(R1)T(R2)/max{V(R1,A), V(R2,A) }
DBMS 2001 Notes 6: Query Compilation 40
With similar ideas, can estimate sizes of:
 W = AB (R) .. [Sec. 7.4.2]
 W = A=aB=b (R) = A=a (B=b (R));
Ass. A and B independent =>
T(W) = T(R)/(V(R, A) x V(R, B))
 W=R(A,X,Y) S(X,Y,B); [Sec. 7.4.5]
Ass. X and Y independent => 
T(W) = T(R)T(S)/(max{V(R,X), V(S,X)} x
max{V(R,Y), V(S,Y)})
 Union, intersection, diff, . [Sec. 7.4.7]
DBMS 2001 Notes 6: Query Compilation 41
Note: for complex expressions, need
intermediate estimates.
E.g. W = [A=a (R1) ] R2
Treat as relation U
T(U) = T(R1)/V(R1,A) L(U) = L(R1)
Also need an estimate for V(U, Ai) !
DBMS 2001 Notes 6: Query Compilation 42
To estimate V (U, Ai)
E.g., U = A=a (R)
Say R has attributes A,B,C,D
V(U, A) = ?
V(U, B) = ?
V(U, C) = ?
V(U, D) = ?
DBMS 2001 Notes 6: Query Compilation 43
Example
R V(R,A)=3
V(R,B)=1
V(R,C)=T(R)=5
V(R,D)=3
U = A=x (R)
A B C D
cat 1 10 10
cat 1 20 20
dog 1 30 10
dog 1 40 30
bat 1 50 10
V(U, D) = V(R,D)/T(R,A) ... V(R,D)
V(U,A) =1 V(U,B) =1 V(U,C) = T(R)
V(R,A)
DBMS 2001 Notes 6: Query Compilation 44
V(U,Ai) for Joins U = R1(A,B) R2(A,C)
V(U,A) = min { V(R1, A), V(R2, A) }
V(U,B) = V(R1, B)
V(U,C) = V(R2, C)
[Assumption called
preservation of value sets, sect. 7.4.4]
Values of non-join
attributes preserved
DBMS 2001 Notes 6: Query Compilation 45
Example:
Z = R1(A,B) R2(B,C) R3(C,D)
 R1: T(R1) = 1000 V(R1,A)=50 V(R1,B)=100
 R2: T(R2) = 2000 V(R2,B)=200 V(R2,C)=300
 R3: T(R3) = 3000 V(R3,C)=90 V(R3,D)=500
DBMS 2001 Notes 6: Query Compilation 46
T(U) = T(R1) T(R2)/max{V(R1,B), V(R2,B)}
= 10002000/200 = 10,000;
V(U,A) = V(R1,A) = 50
V(U,C) = V(R2,C) = 300
V(U,B) = min{V(R1,B), V(R2,B)}= 100
Intermediate Result:
U(A,B,C)= R1(A,B) R2(B,C)
DBMS 2001 Notes 6: Query Compilation 47
Z = U(A,B,C) R3(C,D)
T(Z) = T(U)T(R3)/ max{V(U,C), V(R3,C)}
= 10,0003000/300
= 100,000
V(Z,A) = V(U,A) = 50
V(Z,B) = V(U,B) = 100
V(Z,C) = min{V(U,C), V(R3,C)} = 90
V(Z,D) = V(R3,D) =500
DBMS 2001 Notes 6: Query Compilation 48
Outline/Summary
 Estimating cost of query plan
 Estimating size of results done!
 Estimating # of IOs last week
 Generate and compare plans
 skip this
 Execute physical operations of query plan
 Sketched last week

More Related Content

What's hot (9)

2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
DB Tsai
Lecture 10 data structures and algorithms
Lecture 10 data structures and algorithmsLecture 10 data structures and algorithms
Lecture 10 data structures and algorithms
Aakash deep Singhal
Efficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
Efficient Hill Climber for Multi-Objective Pseudo-Boolean OptimizationEfficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
Efficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
jfrchicanog
Spark summit talk, july 2014 powered by reveal
Spark summit talk, july 2014 powered by revealSpark summit talk, july 2014 powered by reveal
Spark summit talk, july 2014 powered by reveal
Debasish Das
Advance control theory
Advance control theoryAdvance control theory
Advance control theory
SHIMI S L
[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13
AnusAhmad
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
Krish_ver2
Lecture23
Lecture23Lecture23
Lecture23
Dr Sandeep Kumar Poonia
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
J Singh
2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark2014-06-20 Multinomial Logistic Regression with Apache Spark
2014-06-20 Multinomial Logistic Regression with Apache Spark
DB Tsai
Lecture 10 data structures and algorithms
Lecture 10 data structures and algorithmsLecture 10 data structures and algorithms
Lecture 10 data structures and algorithms
Aakash deep Singhal
Efficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
Efficient Hill Climber for Multi-Objective Pseudo-Boolean OptimizationEfficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
Efficient Hill Climber for Multi-Objective Pseudo-Boolean Optimization
jfrchicanog
Spark summit talk, july 2014 powered by reveal
Spark summit talk, july 2014 powered by revealSpark summit talk, july 2014 powered by reveal
Spark summit talk, july 2014 powered by reveal
Debasish Das
Advance control theory
Advance control theoryAdvance control theory
Advance control theory
SHIMI S L
[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13[Www.pkbulk.blogspot.com]dbms13
[Www.pkbulk.blogspot.com]dbms13
AnusAhmad
5.4 randomized datastructures
5.4 randomized datastructures5.4 randomized datastructures
5.4 randomized datastructures
Krish_ver2
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
J Singh

Viewers also liked (15)

Agenda
AgendaAgenda
Agenda
tutorialsruby
Relazioni brixen 2016
Relazioni brixen 2016Relazioni brixen 2016
Relazioni brixen 2016
IstitutoRete
Berkeley VC Course Completion Certificate 2016
Berkeley VC Course Completion Certificate 2016Berkeley VC Course Completion Certificate 2016
Berkeley VC Course Completion Certificate 2016
Mitchell Weinstock
How to Send Fax From BlackBerry Smartphones
How to Send Fax From BlackBerry SmartphonesHow to Send Fax From BlackBerry Smartphones
How to Send Fax From BlackBerry Smartphones
alooftwaddle1646
The Adbrand Battles Presentation Final Draft
The Adbrand Battles Presentation Final DraftThe Adbrand Battles Presentation Final Draft
The Adbrand Battles Presentation Final Draft
Eric Floresca
CV Iain Lewis
CV Iain LewisCV Iain Lewis
CV Iain Lewis
Iain Lewis
eng
engeng
eng
Andrei Ciprian Maxinese
Sitios webSitios web
Sitios web
Camilo Dorado
Catalogo 2013 aplic_por_correias_poly_vCatalogo 2013 aplic_por_correias_poly_v
Catalogo 2013 aplic_por_correias_poly_v
carlosmendelski
Onlinechat
OnlinechatOnlinechat
Onlinechat
shavetaverma
Tabela de-parafusos1Tabela de-parafusos1
Tabela de-parafusos1
lauromatozinho
The Seven Kinds of Security
The Seven Kinds of SecurityThe Seven Kinds of Security
The Seven Kinds of Security
Veracode
PLANTAS VASCULARES SIN SEMILLAPLANTAS VASCULARES SIN SEMILLA
PLANTAS VASCULARES SIN SEMILLA
william tito nina
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e ArruelasGabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
RH Indufix Fixadores
Back-to-School market trends + recommendations
Back-to-School market trends + recommendations Back-to-School market trends + recommendations
Back-to-School market trends + recommendations
VaynerMedia
Relazioni brixen 2016
Relazioni brixen 2016Relazioni brixen 2016
Relazioni brixen 2016
IstitutoRete
Berkeley VC Course Completion Certificate 2016
Berkeley VC Course Completion Certificate 2016Berkeley VC Course Completion Certificate 2016
Berkeley VC Course Completion Certificate 2016
Mitchell Weinstock
How to Send Fax From BlackBerry Smartphones
How to Send Fax From BlackBerry SmartphonesHow to Send Fax From BlackBerry Smartphones
How to Send Fax From BlackBerry Smartphones
alooftwaddle1646
The Adbrand Battles Presentation Final Draft
The Adbrand Battles Presentation Final DraftThe Adbrand Battles Presentation Final Draft
The Adbrand Battles Presentation Final Draft
Eric Floresca
CV Iain Lewis
CV Iain LewisCV Iain Lewis
CV Iain Lewis
Iain Lewis
Sitios webSitios web
Sitios web
Camilo Dorado
Catalogo 2013 aplic_por_correias_poly_vCatalogo 2013 aplic_por_correias_poly_v
Catalogo 2013 aplic_por_correias_poly_v
carlosmendelski
Tabela de-parafusos1Tabela de-parafusos1
Tabela de-parafusos1
lauromatozinho
The Seven Kinds of Security
The Seven Kinds of SecurityThe Seven Kinds of Security
The Seven Kinds of Security
Veracode
PLANTAS VASCULARES SIN SEMILLAPLANTAS VASCULARES SIN SEMILLA
PLANTAS VASCULARES SIN SEMILLA
william tito nina
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e ArruelasGabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
Gabaritos para Imprimir - Tamanho Real de Parafusos, Porcas e Arruelas
RH Indufix Fixadores
Back-to-School market trends + recommendations
Back-to-School market trends + recommendations Back-to-School market trends + recommendations
Back-to-School market trends + recommendations
VaynerMedia

Similar to Compilation (20)

Intro to relational model
Intro to relational modelIntro to relational model
Intro to relational model
ATS SBGI MIRAJ
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
J Singh
dld 01-introduction
dld 01-introductiondld 01-introduction
dld 01-introduction
United International University
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
Khaled Al-Shamaa
Query Optimization - Brandon Latronica
Query Optimization - Brandon LatronicaQuery Optimization - Brandon Latronica
Query Optimization - Brandon Latronica
"FENG "GEORGE"" YU
Query Optimization By Example on Database Apps
Query Optimization By Example on Database AppsQuery Optimization By Example on Database Apps
Query Optimization By Example on Database Apps
Sigit Priyanta
S1 - Process product optimization using design experiments and response surfa...
S1 - Process product optimization using design experiments and response surfa...S1 - Process product optimization using design experiments and response surfa...
S1 - Process product optimization using design experiments and response surfa...
CAChemE
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraDBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
LinaCovington707
Algebra
AlgebraAlgebra
Algebra
himanshu211
relational algebra and it's implementation
relational algebra and it's implementationrelational algebra and it's implementation
relational algebra and it's implementation
dbmscse61
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptx
ssuser01e301
learned optimizer.pptx
learned optimizer.pptxlearned optimizer.pptx
learned optimizer.pptx
Qingsong Guo
lecture14.ppt
lecture14.pptlecture14.ppt
lecture14.ppt
SivaSankaran81
Adbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimizationAdbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimization
Vaibhav Khanna
Lecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculusLecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculus
emailharmeet
Relational Algebra1.ppt
Relational Algebra1.pptRelational Algebra1.ppt
Relational Algebra1.ppt
20DCE031RAJGANATRA
Relational Algebra
Relational AlgebraRelational Algebra
Relational Algebra
Amin Omi
MOD2-DBMS.pdf
MOD2-DBMS.pdfMOD2-DBMS.pdf
MOD2-DBMS.pdf
RohitKumarSahoo5
relalgebraasssssssssssssssssssssssss.ppt
relalgebraasssssssssssssssssssssssss.pptrelalgebraasssssssssssssssssssssssss.ppt
relalgebraasssssssssssssssssssssssss.ppt
bambangryanChannelYT
R for Statistical Computing
R for Statistical ComputingR for Statistical Computing
R for Statistical Computing
Mohammed El Rafie Tarabay
Intro to relational model
Intro to relational modelIntro to relational model
Intro to relational model
ATS SBGI MIRAJ
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
J Singh
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
Khaled Al-Shamaa
Query Optimization - Brandon Latronica
Query Optimization - Brandon LatronicaQuery Optimization - Brandon Latronica
Query Optimization - Brandon Latronica
"FENG "GEORGE"" YU
Query Optimization By Example on Database Apps
Query Optimization By Example on Database AppsQuery Optimization By Example on Database Apps
Query Optimization By Example on Database Apps
Sigit Priyanta
S1 - Process product optimization using design experiments and response surfa...
S1 - Process product optimization using design experiments and response surfa...S1 - Process product optimization using design experiments and response surfa...
S1 - Process product optimization using design experiments and response surfa...
CAChemE
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStoraDBMS ArchitectureQuery ExecutorBuffer ManagerStora
DBMS ArchitectureQuery ExecutorBuffer ManagerStora
LinaCovington707
relational algebra and it's implementation
relational algebra and it's implementationrelational algebra and it's implementation
relational algebra and it's implementation
dbmscse61
Unit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptxUnit-1 Basic Concept of Algorithm.pptx
Unit-1 Basic Concept of Algorithm.pptx
ssuser01e301
learned optimizer.pptx
learned optimizer.pptxlearned optimizer.pptx
learned optimizer.pptx
Qingsong Guo
Adbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimizationAdbms 40 heuristics in query optimization
Adbms 40 heuristics in query optimization
Vaibhav Khanna
Lecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculusLecture 06 relational algebra and calculus
Lecture 06 relational algebra and calculus
emailharmeet
Relational Algebra
Relational AlgebraRelational Algebra
Relational Algebra
Amin Omi
relalgebraasssssssssssssssssssssssss.ppt
relalgebraasssssssssssssssssssssssss.pptrelalgebraasssssssssssssssssssssssss.ppt
relalgebraasssssssssssssssssssssssss.ppt
bambangryanChannelYT

Compilation

  • 1. DBMS 2001 Notes 6: Query Compilation 1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpel辰inen (partially based on Stanford CS245 slide originals by Hector Garcia-Molina, Jeff Ullman and Jennifer Widom)
  • 2. DBMS 2001 Notes 6: Query Compilation 2 Overview We have studied recently: Algorithms for selections and joins, and their costs (in terms of disk I/O) Next: A closer look at query compilation Parsing Algebraic optimization of logical query plans Estimating sizes of intermediate results Focus still on selections and joins Remember the overall process of query execution:
  • 3. DBMS 2001 Notes 6: Query Compilation 3 {P1,P2,..} {P1,C1>...} parse convert apply laws estimate result sizes consider physical plans estimate costs pick best execute Pi answer SQL query parse tree logical query plan improved l.q.p l.q.p. +sizes statistics
  • 4. DBMS 2001 Notes 6: Query Compilation 4 Step 1: Parsing Check syntactic correctness of the query, and transform it into a parse tree Based on the formal grammar for SQL Inner nodes nonterminal symbols (syntactic categories for things like <Query>, <RelName>, or <Condition>) Leaves terminal symbols: names of relations or attributes, keywords (SELECT, FROM, ), operators (+, AND, OR, LIKE, ), operands (10, '%1960', ) Also check semantic correctness: Relations and attributes exist, operands compatible with operators,
  • 5. DBMS 2001 Notes 6: Query Compilation 5 Example: SQL query Consider querying a movie database with relations StarsIn(title, year, starName) MovieStar(name, address, gender, birthdate) SELECT title FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE %1960 ; (Find the titles for movies with stars born in 1960)
  • 6. DBMS 2001 Notes 6: Query Compilation 6 Example: Parse Tree <Query> <SFW> SELECT <SelList> FROM <FromList> WHERE <Condition> <Attribute> <RelName> , <FromList> AND title StarsIn <RelName> <Condition> <Condition> <Attribute> = <Attribute> <Attribute> LIKE <Pattern> starName name birthdate %1960 MovieStar
  • 7. DBMS 2001 Notes 6: Query Compilation 7 Step 2: Parse Tree > Logical Query Plan Basic strategy: SELECT A, B, C FROM R1, R2 WHERE Cond ; becomes A,B,C[Cond(R1 x R2)]
  • 8. DBMS 2001 Notes 6: Query Compilation 8 Example: Logical Query Plan title starName=name and birthdate LIKE %1960 StarsIn MovieStar
  • 9. DBMS 2001 Notes 6: Query Compilation 9 Step 3: Improving the L.Q.P Transform the logical query plan into an equivalent form expected to lead to better performance Based on laws of relational algebra Normal to produce a single optimized form, which acts as input for the generation of physical query plans
  • 10. DBMS 2001 Notes 6: Query Compilation 10 Example: Improved Logical Query Plan title starName=name StarsIn birthdate LIKE %1960 MovieStar
  • 11. DBMS 2001 Notes 6: Query Compilation 11 Step 4: Estimate Result Sizes Cost estimates of database algorithms depend on sizes of input relations > Need to estimate sizes of intermediate results Estimates based on statistics about relations, gathered periodically or incrementally
  • 12. DBMS 2001 Notes 6: Query Compilation 12 Example: Estimate Result Sizes Need expected size StarsIn MovieStar
  • 13. DBMS 2001 Notes 6: Query Compilation 13 Steps 5, 6, ... Generate and compare query plans generate alternate query plans P1, , Pk by selecting algorithms and execution orders for relational operations Estimate the cost of each plan Choose the plan Pi estimated to be "best" Execute plan Pi and return its result
  • 14. DBMS 2001 Notes 6: Query Compilation 14 Example: One Physical Plan Parameters: join order, memory size, project attributes,... Hash join SEQ scan index scan Parameters: Select Condition,... StarsIn MovieStar
  • 15. DBMS 2001 Notes 6: Query Compilation 15 Next: Closer look at ... Transformation rules Estimating result sizes
  • 16. DBMS 2001 Notes 6: Query Compilation 16 Relational algebra optimization Transformation rules (preserve equivalence) What are good transformations?
  • 17. DBMS 2001 Notes 6: Query Compilation 17 Rules: Natural joins R S = S R (commutative) (R S) T = R (S T) (associative) Carry attribute names in results, so order is not important > Can evaluate in any order
  • 18. DBMS 2001 Notes 6: Query Compilation 18 (R x S) x T = R x (S x T) R x S = S x R R U (S U T) = (R U S) U T R U S = S U R Rules: Cross products & union similarly (both associative & commutative):
  • 19. DBMS 2001 Notes 6: Query Compilation 19 Rules: Selects 1. p1p2(R) = 2. p1vp2(R) = p1 [ p2 (R)] [ p1 (R)] U [ p2 (R)] 1. Especially useful (applied left-to-right): Allows compound select conditions to be split and moved to suitable positions (See next) NB: = AND; v = OR
  • 20. DBMS 2001 Notes 6: Query Compilation 20 Rules: Products to Joins Definition of Natural Join: R S = L[C(R x S)] ; Condition C equates attributes common to R and S, and L projects one copy of them out Applied right-to-left - definition of general join applied similarly
  • 21. DBMS 2001 Notes 6: Query Compilation 21 Let p = predicate with only R attribs q = predicate with only S attribs m = predicate with attribs common to R,S p (R S) = q (R S) = Rules: + combined [p (R)] S R [q (S)] More rules can be derived...
  • 22. DBMS 2001 Notes 6: Query Compilation 22 pq (R S) = [p (R)] [q (S)] pqm (R S) = m [(p R) (q S)] pvq (R S) = [(p R) S] U [R (q S)] Derived rules for +
  • 23. DBMS 2001 Notes 6: Query Compilation 23 Derivation for first one; Others for homework: pq (R S) = p [q (R S) ] = p [(R q (S) ] = [p (R)] [q (S)]
  • 24. DBMS 2001 Notes 6: Query Compilation 24 Rules for , combined with X similar... e.g., p (R X S) = ?
  • 25. DBMS 2001 Notes 6: Query Compilation 25 p1p2 (R) p1 [p2 (R)] p (R S) [p (R)] S R S S R Some good transformations: No transformation is always good Usually good: early selections
  • 26. DBMS 2001 Notes 6: Query Compilation 26 Outline - Query Processing Relational algebra level transformations good transformations Detailed query plan level estimate costs generate and compare plans next
  • 27. DBMS 2001 Notes 6: Query Compilation 27 Estimating cost of query plan (1) Estimating size of intermediate results > (2) Estimating # of I/Os (considered last week)
  • 28. DBMS 2001 Notes 6: Query Compilation 28 Estimating result size Maintain statistics for relation R T(R) : # tuples in R L(R) : Length of rows, # of bytes in each R tuple B(R): # of blocks to hold all R tuples V(R, A) : # distinct values in R for attribute A
  • 29. DBMS 2001 Notes 6: Query Compilation 29 Example R A: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d T(R) = 5 L(R) = 37 V(R,A) = 3 V(R,C) = 5 V(R,B) = 1 V(R,D) = 4
  • 30. DBMS 2001 Notes 6: Query Compilation 30 Size estimates for product W = R1xR2 T(W) = L(W) = T(R1) T(R2) L(R1) + L(R2)
  • 31. DBMS 2001 Notes 6: Query Compilation 31 L(W) = L(R) T(W) = ? Estimates for selection W = A=a (R) = AVG number of tuples that satisfy an equality condition on R.A =
  • 32. DBMS 2001 Notes 6: Query Compilation 32 Example R V(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 AVG size of, say, A=val(R)? T(A=cat(R))=2, T(A=dog(R))=2, T(A=bat(R))=1 => (2+2+1)/3 = 5/3 A B C D cat 1 10 a cat 1 20 b dog 1 30 a dog 1 40 c bat 1 50 d
  • 33. DBMS 2001 Notes 6: Query Compilation 33 Size of W = Z=a(R) in general: Assume: Only existing Z values a1, a2, are used in a select expression Z= ai, each with equal probalility 1/V(R,Z) => the expected size of Z=val(R) is E(T(W)) = 裡i 1/V(R,Z) * T(Z=a i (R)) = T(R)/V(R,Z)
  • 34. DBMS 2001 Notes 6: Query Compilation 34 What about selection W=z val (R) ? T(W) = ? Estimate 1: T(W) = T(R)/2 Rationale: On avg, 1/2 of tuples satisfy the condition Estimate 2: T(W) = T(R)/3 Rationale: Acknowledges the tendency of selecting "interesting" (e.g., rare tuples) more frequently <, and < similary
  • 35. DBMS 2001 Notes 6: Query Compilation 35 Size estimates for W = R1 R2 Let X = attributes of R1 Y = attributes of R2 X Y = Same as R1 x R2 Case 1
  • 36. DBMS 2001 Notes 6: Query Compilation 36 W = R1 R2 X Y = A R1 A B C R2 A D Case 2 Assumption: V(R1,A) V(R2,A) A(R1) A(R2) V(R2,A) V(R1,A) A(R2) A(R1) Containment of value sets [Sec. 7.4.4]
  • 37. DBMS 2001 Notes 6: Query Compilation 37 Why should containment of value sets hold? Consider joining relations Faculties(FName, ) and Depts(DName, FName, ) where Depts.FName is a foreign key, and faculties can have 0,,n departments. Now V(Depts, FName) V(Faculties, FName), and referential integrity requires that FName(Depts) FName(Faculties)
  • 38. DBMS 2001 Notes 6: Query Compilation 38 R1 A B C R2 A D a Estimating T(W) when V(R1,A) V(R2,A) Take 1 tuple Match Each estimated to match with T(R2)/V(R2,A) tuples ... so T(W) = T(R1)T(R2)/V(R2, A)
  • 39. DBMS 2001 Notes 6: Query Compilation 39 V(R1,A) V(R2,A): T(W) = T(R1)T(R2)/V(R2,A) V(R2,A) V(R1,A): T(W) = T(R2)T(R1)/V(R1,A) [A is the join attribute] => in general: T(W) = T(R1)T(R2)/max{V(R1,A), V(R2,A) }
  • 40. DBMS 2001 Notes 6: Query Compilation 40 With similar ideas, can estimate sizes of: W = AB (R) .. [Sec. 7.4.2] W = A=aB=b (R) = A=a (B=b (R)); Ass. A and B independent => T(W) = T(R)/(V(R, A) x V(R, B)) W=R(A,X,Y) S(X,Y,B); [Sec. 7.4.5] Ass. X and Y independent => T(W) = T(R)T(S)/(max{V(R,X), V(S,X)} x max{V(R,Y), V(S,Y)}) Union, intersection, diff, . [Sec. 7.4.7]
  • 41. DBMS 2001 Notes 6: Query Compilation 41 Note: for complex expressions, need intermediate estimates. E.g. W = [A=a (R1) ] R2 Treat as relation U T(U) = T(R1)/V(R1,A) L(U) = L(R1) Also need an estimate for V(U, Ai) !
  • 42. DBMS 2001 Notes 6: Query Compilation 42 To estimate V (U, Ai) E.g., U = A=a (R) Say R has attributes A,B,C,D V(U, A) = ? V(U, B) = ? V(U, C) = ? V(U, D) = ?
  • 43. DBMS 2001 Notes 6: Query Compilation 43 Example R V(R,A)=3 V(R,B)=1 V(R,C)=T(R)=5 V(R,D)=3 U = A=x (R) A B C D cat 1 10 10 cat 1 20 20 dog 1 30 10 dog 1 40 30 bat 1 50 10 V(U, D) = V(R,D)/T(R,A) ... V(R,D) V(U,A) =1 V(U,B) =1 V(U,C) = T(R) V(R,A)
  • 44. DBMS 2001 Notes 6: Query Compilation 44 V(U,Ai) for Joins U = R1(A,B) R2(A,C) V(U,A) = min { V(R1, A), V(R2, A) } V(U,B) = V(R1, B) V(U,C) = V(R2, C) [Assumption called preservation of value sets, sect. 7.4.4] Values of non-join attributes preserved
  • 45. DBMS 2001 Notes 6: Query Compilation 45 Example: Z = R1(A,B) R2(B,C) R3(C,D) R1: T(R1) = 1000 V(R1,A)=50 V(R1,B)=100 R2: T(R2) = 2000 V(R2,B)=200 V(R2,C)=300 R3: T(R3) = 3000 V(R3,C)=90 V(R3,D)=500
  • 46. DBMS 2001 Notes 6: Query Compilation 46 T(U) = T(R1) T(R2)/max{V(R1,B), V(R2,B)} = 10002000/200 = 10,000; V(U,A) = V(R1,A) = 50 V(U,C) = V(R2,C) = 300 V(U,B) = min{V(R1,B), V(R2,B)}= 100 Intermediate Result: U(A,B,C)= R1(A,B) R2(B,C)
  • 47. DBMS 2001 Notes 6: Query Compilation 47 Z = U(A,B,C) R3(C,D) T(Z) = T(U)T(R3)/ max{V(U,C), V(R3,C)} = 10,0003000/300 = 100,000 V(Z,A) = V(U,A) = 50 V(Z,B) = V(U,B) = 100 V(Z,C) = min{V(U,C), V(R3,C)} = 90 V(Z,D) = V(R3,D) =500
  • 48. DBMS 2001 Notes 6: Query Compilation 48 Outline/Summary Estimating cost of query plan Estimating size of results done! Estimating # of IOs last week Generate and compare plans skip this Execute physical operations of query plan Sketched last week