9. 9
OPTIMIZER BASICS
• Three main questions you should ask when
looking for an efficient execution plan:
1. How much data? How many rows? Volume?
2. How scattered / clustered is the data?
3. Caching?
=> Know your data!
10. 10
OPTIMIZER BASICS
• Why are these questions so important?
• Two main strategies:
1. One “Big Job”
=> How much data? Volume?
2. Few/many “Small Jobs”
=> How many times / rows?
=> Effort per iteration? Clustering / caching
11. 11
OPTIMIZER BASICS
• The optimizer’s cost estimate is based on:
  • How much data? How many rows?
  • How scattered / clustered? (partially)
  • Caching? Not at all (as of 11g)
12. 12
SUMMARY
• Cardinality and clustering determine
whether the “Big Job” or the “Small Job”
strategy should be preferred
• If the optimizer gets these estimates right,
the resulting execution plan will be
efficient within the boundaries of the given
access paths
• Know your data and business questions
• Help your optimizer. (Oracle doesn’t know
the data the way you know it.)
13. 13
Today’s LEMMA
• Oracle doesn’t know the
data the way you know it!!
• Inefficient execution plans:
50%: Oracle does not know the data.
50%: SQL writers do not know the
optimizer.
15. 15
HOW SCATTERED / CLUSTERED?
• INDEX SCAN => TABLE BLOCK
• Worst case
1,000 rows => visit 1,000 table blocks:
1,000 * 5 ms = 5 s
• Good case
1,000 rows => visit 10 table blocks: 10 * 5 ms = 50 ms
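The slide's arithmetic can be reproduced with a tiny Python helper. It is only a sketch that assumes, like the slide, a flat 5 ms per table-block read and no buffer-cache hits; `table_access_ms` is a hypothetical name.

```python
# Assumption taken from the slide: one table-block read costs about 5 ms and
# every block visit is a physical read (no buffer-cache hits).
MS_PER_BLOCK_READ = 5.0


def table_access_ms(blocks_visited: int, ms_per_read: float = MS_PER_BLOCK_READ) -> float:
    """Estimated time to fetch rows that live in `blocks_visited` table blocks."""
    return blocks_visited * ms_per_read


worst = table_access_ms(1_000)  # 1,000 rows scattered over 1,000 blocks
good = table_access_ms(10)      # the same 1,000 rows packed into 10 blocks
print(f"worst case: {worst / 1000:.1f} s, good case: {good:.0f} ms")
```

The row count is identical in both cases; only the number of distinct blocks changes, which is why the clustering of the data matters as much as its volume.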
16. 16
HOW SCATTERED / CLUSTERED?
• There is only a single measure of clustering
in Oracle:
the index clustering factor
• The index clustering factor is represented
by a single value
• By default, the logic measuring the clustering factor
does not cater for data clustered
across a few blocks (ASSM!)
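A small Python model (not Oracle's implementation) shows why rows spread over only a few blocks can still get a bad clustering factor. It assumes the commonly described rule: walk the index entries in key order and add 1 whenever the table block differs from the previous entry's block. The `remembered_blocks` parameter and the sample data (two sessions inserting alternately, so consecutive keys land in different blocks, a pattern the slide links to ASSM) are hypothetical, chosen for illustration.

```python
from typing import List, Sequence


def clustering_factor(blocks_in_key_order: Sequence[int], remembered_blocks: int = 1) -> int:
    """Count table-block visits while walking index entries in key order.

    blocks_in_key_order[i] is the table block holding the row of the i-th
    index entry. remembered_blocks=1 is the classic rule (count every block
    change). A larger value is a hypothetical knob: a block is only counted
    if it is not among the last `remembered_blocks` distinct blocks visited.
    """
    if remembered_blocks < 1:
        raise ValueError("remembered_blocks must be at least 1")
    recent: List[int] = []  # recently visited distinct blocks, oldest first
    cf = 0
    for block in blocks_in_key_order:
        if block in recent:
            recent.remove(block)  # still remembered: no new visit counted
        else:
            cf += 1               # a new block visit
            if len(recent) == remembered_blocks:
                recent.pop(0)     # forget the oldest remembered block
        recent.append(block)      # this block is now the most recent one
    return cf


# Hypothetical data: keys 1..8 inserted alternately by two sessions, so the
# rows occupy only two table blocks but consecutive keys switch blocks.
assm_like = [1, 2, 1, 2, 1, 2, 1, 2]
print(clustering_factor(assm_like))                       # 8
print(clustering_factor(assm_like, remembered_blocks=2))  # 2
```

With the classic rule the CF equals the number of rows, as if every row lived in its own block, although only two blocks are involved; remembering just two recent blocks brings it down to the number of blocks actually touched.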
17. 17
HOW SCATTERED / CLUSTERED?
• Challenges
  • Getting the index clustering factor right
  • There are various reasons why the index
  clustering factor measured by Oracle might not
  be representative:
    - Multiple freelists / freelist groups
    - ASSM (Automatic Segment Space Management)
    - Partitioning
    - SHRINK SPACE efforts
20. 20
HOW SCATTERED / CLUSTERED?
• In the case of an index range scan with
table access involved, the CF represents the
largest fraction of the cost associated
with the operation. (See the 10053 trace file.)
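The commonly cited I/O cost formula for an index range scan followed by table access (as described, for example, in Jonathan Lewis's Cost-Based Oracle Fundamentals) makes the point concrete: the table part is the clustering factor scaled by the selectivity. The Python sketch below uses that traditional I/O-only formula and ignores CPU costing and the other adjustments the real optimizer applies; all statistics values are hypothetical.

```python
import math
from typing import Dict


def index_range_scan_cost(blevel: int, leaf_blocks: int, clustering_factor: int,
                          index_selectivity: float, table_selectivity: float) -> Dict[str, int]:
    """I/O-only cost of an index range scan plus table access by rowid.

    cost = blevel
         + ceil(leaf_blocks       * effective index selectivity)
         + ceil(clustering_factor * effective table selectivity)
    """
    index_part = blevel + math.ceil(leaf_blocks * index_selectivity)  # descend + scan leaves
    table_part = math.ceil(clustering_factor * table_selectivity)     # table block visits
    return {"index": index_part, "table": table_part, "total": index_part + table_part}


# Hypothetical index on a 100,000-row table, selecting 1% of the rows.
badly_clustered = index_range_scan_cost(2, 2_000, 80_000, 0.01, 0.01)
well_clustered = index_range_scan_cost(2, 2_000, 1_000, 0.01, 0.01)
print(badly_clustered)  # the CF-driven table part dominates the total
print(well_clustered)
```

Only the clustering factor differs between the two calls, yet the total cost changes by more than an order of magnitude.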
22. 22
Statistics
• Controlling column statistics via METHOD_OPT
  • FOR ALL INDEXED COLUMNS SIZE > 1:
    • Nonsense without basic column
    statistics
  • Default from 10g on:
    FOR ALL COLUMNS SIZE AUTO:
    basic column statistics for all columns,
    histograms where Oracle decides they are needed
23. 23
HISTOGRAMS
• Basic column statistics get generated
along with table statistics in a single pass
• Each histogram requires a separate pass
• Therefore Oracle resorts to aggressive
sampling if allowed => AUTO_SAMPLE_SIZE
• This limits the quality of histograms and
their significance
(basic column statistics are computed from all rows,
whereas histograms are computed from sampled rows)
Check user_tab_col_statistics
24. 24
HISTOGRAMS
• Limited resolution: 255 value pairs
maximum
• Fewer than 255 distinct column values =>
frequency histogram
• More than 255 distinct column values
=> height-balanced histogram
• A height-balanced histogram is always a sampling of the
data, even when computing statistics!
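The choice between the two histogram types, and why a height-balanced histogram is always a sample of the data, can be sketched in Python. This is a simplified model, not Oracle's code: it assumes a frequency histogram when the number of distinct values fits in the buckets, and otherwise keeps only the value at each equal-height bucket boundary. A tiny bucket count replaces the slide's limit of 255 to keep the output readable; the data is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def build_histogram(values: List[int], buckets: int) -> Tuple[str, list]:
    """Simplified model of the two histogram types (not Oracle's code).

    NDV <= buckets: frequency histogram, one (value, row count) pair per value.
    NDV >  buckets: height-balanced histogram; the sorted data is cut into
    `buckets` groups of (roughly) equal size and only the value found at each
    group boundary is kept, with endpoint 0 holding the minimum value.
    """
    counts = Counter(values)
    if len(counts) <= buckets:
        return "FREQUENCY", sorted(counts.items())
    data = sorted(values)
    n = len(data)
    endpoints = [data[0]] + [data[(i * n) // buckets - 1] for i in range(1, buckets + 1)]
    return "HEIGHT BALANCED", endpoints


def popular_values(endpoints: List[int]) -> List[int]:
    """A value that closes two or more buckets is treated as 'popular'."""
    hits = Counter(endpoints[1:])
    return sorted(value for value, count in hits.items() if count >= 2)


print(build_histogram([1, 1, 2, 3, 3, 3], buckets=4))
skewed = list(range(1, 11)) + [5] * 10  # 10 distinct values, 5 is frequent
kind, endpoints = build_histogram(skewed, buckets=4)
print(kind, endpoints, popular_values(endpoints))
```

In the height-balanced case the values 2, 3, 4 and 6 to 9 do not appear among the endpoints at all; the histogram only "sees" the boundary values, even though every row was read.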
25. 25
HISTOGRAMS
• Aggressive sampling
• Oracle doesn’t trust its own histogram
information when calculating estimated
cardinality.
• Very bad cardinality estimation =>
inefficient execution plan
26. 26
Frequency Histograms
• When the column consists of only a few popular
values
• Very popular and non-popular values
• Dynamic sampling is not representative
either
• Statistics are sometimes inconsistent
27. 27
Height-Balanced Histograms
• Rounding effects
• They cannot cover all values.
• Histogram values are unstable
(each time you gather histograms, the values
can be different).
• Oracle doesn’t know the data the way
you know it.
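The rounding effect can be illustrated with the same simplified Python model of height-balanced endpoints (an assumption, not Oracle's exact algorithm). In the two hypothetical 20-row columns below, value 5 has exactly the same frequency (6 rows, 30%), but whether it closes two buckets, and is therefore treated as popular, depends only on where the bucket boundaries happen to fall.

```python
from collections import Counter
from typing import List


def hb_endpoints(values: List[int], buckets: int) -> List[int]:
    """Height-balanced endpoints: the value at each equal-height bucket boundary."""
    data = sorted(values)
    n = len(data)
    return [data[0]] + [data[(i * n) // buckets - 1] for i in range(1, buckets + 1)]


def is_popular(value: int, endpoints: List[int]) -> bool:
    """A value is treated as popular if it closes two or more buckets."""
    return Counter(endpoints[1:])[value] >= 2


# Hypothetical columns: same count of 5s, neighbouring values differ slightly.
a = [1, 2, 3, 4] + [5] * 6 + list(range(6, 16))
b = [0, 1, 2, 3, 4] + [5] * 6 + list(range(6, 15))
for name, col in (("a", a), ("b", b)):
    ep = hb_endpoints(col, buckets=4)
    print(name, ep, "5 popular:", is_popular(5, ep))
```

A small shift in the surrounding data (or a different sample when statistics are regathered) moves the boundaries, so the same value can flip between popular and non-popular.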
30. 30
SUMMARY
• Check the correctness of the CF for your
critical indexes
• Oracle does not know the questions you
ask about the data
• You may want to use FOR ALL COLUMNS
SIZE 1 as the default and only generate
histograms where really necessary
• You may get better results with the old
histogram behavior, but not always
31. 31
SUMMARY
• There are data patterns that don’t work well
with histograms
• => You may need to manually generate
histograms using
DBMS_STATS.SET_COLUMN_STATS for
critical columns
• Don’t forget about dynamic
sampling / FBIs (function-based indexes) / virtual columns / extended
statistics
• Know your data and business questions!
32. 32
10053 Trace File
• SYSTEM statistics information
- CPU SPEED, SBRDTime, MBRDTime, MBRC
• Table/index statistical information
- Base Cardinality, Density, CLUF
• Cardinality estimation
• Cost estimation
- Access type, join type, join order
? ??? ?? ? ?? ??.
33. 33
COST?
• Jonathan Lewis:
“The cost represents (and has always
represented) the optimizer’s best
estimate of the time it will take to
execute the statement.”
• i.e., cost = the query’s estimated execution time
34. 34
Cost in Terms of Time
• Total Time = CPU Time + I/O Time + Wait Time
• Estimated Time
= Estimated CPU Time + Estimated I/O Time
= Estimated CPU Time + Single Block I/O Time
+ Multi Block I/O Time
36. 36
Cost in Terms of Time
• Total Time = CPU Time + I/O Time + Wait Time
• Estimated Time
= Estimated CPU Time + Estimated I/O Time
= Estimated CPU Time + Single Block I/O Time +
Multi Block I/O Time
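• COST = (Estimated Time / Single Block I/O Time)
• The unit of cost is therefore one single block read:
$$\text{COST} = \#\text{SB reads} + \#\text{MB reads}\cdot\frac{t_{\text{MB}}}{t_{\text{SB}}} + \frac{\#\text{CPU cycles}}{\text{CPU speed}\cdot t_{\text{SB}}}$$
where $t_{\text{SB}}$ and $t_{\text{MB}}$ are the average times of one single block read and one multiblock read.
• In other words, dividing the total estimated time by the single block I/O time
expresses every component as an equivalent number of single block reads, i.e., in I/O cost units.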
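The conversion from estimated time to cost can be checked numerically with a short Python sketch. The parameter names mirror Oracle's system statistics SREADTIM, MREADTIM and CPUSPEED (listed on the 10053 slide as SBRDTime, MBRDTime and CPU SPEED); the statement's read counts and CPU cycles are hypothetical, and the real optimizer's CPU costing involves more detail than this.

```python
def cost_from_time(sb_reads: float, mb_reads: float, cpu_cycles: float,
                   sreadtim_ms: float, mreadtim_ms: float, cpuspeed_mhz: float) -> float:
    """Cost = estimated time divided by the time of one single-block read.

    sreadtim_ms / mreadtim_ms: average single-/multiblock read time in ms.
    cpuspeed_mhz: CPU speed in millions of cycles per second.
    """
    cpu_ms = cpu_cycles / (cpuspeed_mhz * 1_000.0)  # 1 MHz = 1,000 cycles per ms
    estimated_ms = sb_reads * sreadtim_ms + mb_reads * mreadtim_ms + cpu_ms
    return estimated_ms / sreadtim_ms


# Hypothetical statement: 100 single-block reads, 50 multiblock reads and
# 50 million CPU cycles, with sreadtim = 5 ms, mreadtim = 10 ms, 1,000 MHz CPU.
cost = cost_from_time(100, 50, 50_000_000, 5.0, 10.0, 1_000.0)
print(cost)  # 210.0 = 100 + 50 * (10 / 5) + 50 ms / 5 ms
```

Each multiblock read counts as two single block reads here because it is assumed to take twice as long, and the 50 ms of CPU time is worth 10 single block reads.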
55. 55
Clustering Factor
• A measure of the orderedness of an
index in comparison to the table that it
is based upon. It is used as an indicator
for computing the estimated cost of the
table lookup following an index access.
• i.e., how closely the order of the index keys
matches the order of the rows in the table
56. 56
Clustering Factor
• Good clustering factor
Index = 1,2,3,4,5,6,7,8,9,10,…
Table = 1,2,3,4,5,6,7,8,9,10,…
• Bad clustering factor
Index = 1,2,3,4,5,6,7,8,9,10,…
Table = 3,8,7,1,4,5,10,2,6,9,…
• Index = 1,2,3,4,5,6,7,8
Table = 8,7,6,5,4,3,2,1 Good or bad?
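The three examples can be evaluated with a small Python model. Assumptions (hypothetical, chosen to keep the example tiny): rows are stored in the table in the listed order, two rows per block, and the CF is computed with the commonly described rule of counting each change of table block while walking the index in key order.

```python
from typing import Dict, List


def cf_from_layout(index_keys: List[int], table_blocks: Dict[int, int]) -> int:
    """Walk the index in key order and count each change of table block."""
    cf, previous = 0, None
    for key in index_keys:
        block = table_blocks[key]
        if block != previous:
            cf += 1
            previous = block
    return cf


def layout(table_order: List[int], rows_per_block: int) -> Dict[int, int]:
    """Map each key to its table block, filling blocks in the given row order."""
    return {key: pos // rows_per_block for pos, key in enumerate(table_order)}


index = list(range(1, 11))  # index entries are always in key order
good = layout([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], rows_per_block=2)
bad = layout([3, 8, 7, 1, 4, 5, 10, 2, 6, 9], rows_per_block=2)
reverse = layout([8, 7, 6, 5, 4, 3, 2, 1], rows_per_block=2)
print(cf_from_layout(index, good))          # 5: equals the number of blocks
print(cf_from_layout(index, bad))           # 9: close to the number of rows
print(cf_from_layout(index[:8], reverse))   # 4: equals the number of blocks
```

Under these assumptions the reversed table is "good": the clustering factor only checks whether consecutive index entries point into the same block, not the direction in which the rows are stored.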
57. 57
CF ?? ??
• The CF records the number of data
blocks that will be accessed when
scanning an index.
? ????
1. ? ??? Table Data Block??
Memory? Cache? ? ??.
2. CF? Physical Reads, ? Cache? ???
? ?? Disk?? ?? ??? ????.
58. 58
CF ?? ??
[Figure: index blocks holding keys 1, 2, 3 and the table blocks holding the corresponding rows 1, 2, 3]
81. 81
Subquery Types
• Scalar subquery
• Inline view
• Nested subquery
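Select t1.c1, t2.c2,
       (select fun_nm(s.c1) from t2 s where s.c1 = t1.c1) nm  -- scalar subquery
From   t1, t2,
       (select t3.c1 from t3) v                                -- inline view
Where  t1.c1 = t2.c1
and    t1.c1 = v.c1
and    t2.c2 not in (select c2 from t1)                        -- nested subquery
;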
93. 93
Join vs. Scalar Subquery
Q: Compared to the join method, are
scalar subqueries always inefficient?
A: No, sometimes scalar subqueries are
more desirable.
Q: When are scalar subqueries more efficient?
94. 94
Join vs. scalar subquery
select rownum rnum, x.*
from (select /*+ leading(T1) USE_NL(T1 T2 T3) */
t1.c1, t1.c2, t1.c3,
t2.c3 as t2_c3,
t3.c3 as t3_c3
from scalar_t1 t1, scalar_t2 t2, scalar_t3 t3
where t1.c1 = t2.c1(+)
and t1.c1 = t3.c1(+)
order by t1.c1, t1.c2) x
where rownum <= 10;
? ??? ????? ??? ??, ??? ???? ???? ??? ?
???, ??? ??? GURU?.
140. 140
???? ?? ?? ??1
Select *
From (
      select rownum no, a.*
      from (
            -- table/column names are placeholders
            select memb_nm(memb_id1) memb_nm1,  -- called for every qualifying row
                   memb_nm(memb_id2) memb_nm2,
                   col1, col2, col3, col4, col5
            from tab1
            where key1 = :key1
            and key2 = :key2
            and dt_col between sysdate - 10/24 and sysdate
            order by sort_col desc
           ) a
      where rownum <= 30
     )
Where no between 21 and 30;
? ????? ??? ??? ?????
141. 141
???? ?? ?? ??1
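• In this query, memb_nm is called for every row that satisfies the WHERE clause,
before the sort. After sorting, only the top 30 rows are read and rows 21 to 30
are returned. If the function calls are moved to the outermost query,
memb_nm runs only for the 10 rows that are actually returned.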
Select memb_nm(memb_id1) memb_nm1,  -- called only for the rows returned
       memb_nm(memb_id2) memb_nm2,
       col1, col2, col3, col4, col5
From (
      select rownum no, a.*
      from (
            -- table/column names are placeholders
            select memb_id1, memb_id2,
                   col1, col2, col3, col4, col5
            from tab1
            where key1 = :key1
            and key2 = :key2
            and dt_col between sysdate - 10/24 and sysdate
            order by sort_col desc
           ) a
      where rownum <= 30
     )
Where no between 21 and 30;
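The effect of moving memb_nm to the outermost query can be modelled in Python (this is not SQL and not Oracle's execution engine). Assumptions: a hypothetical counting function stands in for memb_nm, there are 1,000 hypothetical qualifying rows, and each row triggers one call (the slides' queries call memb_nm twice per row, which simply doubles both counts).

```python
from typing import Dict, List


class CountingLookup:
    """Hypothetical stand-in for memb_nm: maps an ID to a name and counts calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, member_id: int) -> str:
        self.calls += 1
        return f"member-{member_id}"


def lookup_then_page(rows: List[Dict[str, int]], lookup: CountingLookup,
                     first: int, last: int) -> List[dict]:
    """Original shape: call the function for every qualifying row, then sort and page."""
    enriched = [dict(r, name=lookup(r["member_id"])) for r in rows]
    enriched.sort(key=lambda r: r["sort_key"], reverse=True)
    return enriched[first - 1:last]


def page_then_lookup(rows: List[Dict[str, int]], lookup: CountingLookup,
                     first: int, last: int) -> List[dict]:
    """Tuned shape: sort and page first, then call the function for the page only."""
    page = sorted(rows, key=lambda r: r["sort_key"], reverse=True)[first - 1:last]
    return [dict(r, name=lookup(r["member_id"])) for r in page]


rows = [{"member_id": i, "sort_key": i} for i in range(1_000)]  # hypothetical data
slow, fast = CountingLookup(), CountingLookup()
same = lookup_then_page(rows, slow, 21, 30) == page_then_lookup(rows, fast, 21, 30)
print(same, slow.calls, fast.calls)  # True 1000 10
```

Both versions return the same page; only the number of function calls changes, which is the whole point of deferring the lookup until after pagination.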