際際滷

際際滷Share a Scribd company logo
GPU strategies
for evolutionary machine learning
Master :
Dr. Ghaderi
Mohammad Amin Amjadi
Spring 2016
悋悋惠
1. Large-scale experimental evaluation of GPU
strategies for evolutionary machine learning
Mar鱈a A. Franco, Jaume Bacardit
2016 Elsevier B.V. All rights reserved
2. Improving the scalability of rule-based
evolutionary learning
Jaume Bacardit, Edmund K. Burke, Natalio Krasnogor
2009
2
愀悋惡
1.Evolutionary Algorithm
2.Machine Learning
3.Problem
4.Goal
5.BioHEL Algorithm
6.GPUs and CUDA
7. CUDA-based fitness computation for
BioHEL
8. Coarse-grained parallel approach
9. fine-grained parallelization
10. Experimental design
11. Conclusions
12. Recommendation
3
Evolutionary Algorithm
≒惡惘惘愕 惘悋 忰悋悋惠 悋慍 擧愕惘 惠惶悋惆 惡惶惘惠
悋悧
≒惠悴 惡惘惘愕  惠惶悋惆 悋惠悽悋惡 惘悛惆
悋悧 惠擧惘悋惘 惡悋惘 惆 惘悋
≒ 悋惠悽悋惡 悴愆 惡惘 惺擯惘悋 悋慍...
悋悧 悋愕惠悋惆 惘忰 惘 惆惘
悋悋擯惘惠 悋慍 惡惘悽
愕惠惆 惡惘慍悋  拆惆
≒忰悋悋惠 惠悋 悽悋
悋悧 惡惘惘愕 惘悋
≒惠惘惡 擧 悽悋 愀
惡慍
4
 , ,  = reward = fitness
reward  cost
Machine Learning
5
Problem
悋擯惘 愆惆 慍悋惆 悋惆擯惘 慍悋:
悋)dataset惡悋愆惆 惡慍惘擯
惡)惡悋愆惆 拆惆 悋惆擯惘 惆( 惴悋惘惠 惡悋 悋惆擯惘 悋惆)...
悋慍 悋愕惠悋惆 悋惘愆 惠惘愆悋惺 悋慍 擧 悋惘慍gpu悋愕 惡惡惆 惡惘悋 悽惶惶悋 悋愕惠
惆悋惆 悋悋擯惘惠 拆悵惘-擧悋
悋慍 惠擧悋 悋悋擯惘惠 悋慍 愀惡惺惠 惡惆gpu悋愕惠悋惆 惘惡惘 惡惡惆 惡惘悋
悋愕惠 愆惆
6
Goal
惠擧悋 悋惆擯惘 愕愕惠 悋慍 悋愕惠悋惆BioHEL惆悋惆 惡惘悋 悽悋惶 惡愀惘 擧-擧悋
dataset悋愕惠 愆惆 愀惘悋忰 惡慍惘擯 悋
愕愕惠BioHEL擧惆 惘悋惡惠 悋愆 悋惆擯惘 悋悋擯惘惠 愕悋惘 惡悋
擧 悋慍 愆惆 拆愆悋惆gpu悋愕惡BioHEL(nvidia悋慍 擧cuda悋惆 拆愆惠惡悋)
愆惆 悋愕惠悋惆
悋慍 惡愆 惠愕惘惺 惡慍惘擯 愕悋悧  悛慍悋愆悋惠 惆惘60悋愕惠 惆悋愆惠 惡惘悋惡惘
忰悋惷惘 忰悋 惆惘 惘 惡惘 悋慍悋愆 擧悋慍悋 惠惘擧惡 惡悋BioHEL悋慍 惡愆750惡惘悋惡惘
悋愕惠 惆悋愆惠 惠愕惘惺
7
BioHEL
BioHEL惡慍惘擯 悋愕 悋惆悋惆 悴惺 惆惘惠 惆 惡悋 惠擧悋 悋惆擯惘 愕愕惠 擧
惡悋愆惆 
悋 悋惆擯惘 惘愆 悋慍 拆愕 愕愕惠 悋(rule)悋悋 悴惺 惠擧惘悋惘愆惆
擧惆 惠惆 惘悋 惘悋惠惡 愕愕
悋慍 悛惘惆 惆愕惠 惡 拆 惆惘 拆 擧 悋悋 悋 惘 悋惆擯惘 惡惘悋 惘愆 悋 惆惘
擧惆 悋愕惠悋惆 悋愕惠悋惆悋惘惆 惠擧 悋擯惘惠 擧
8
BioHEL
Procedure BioHEL general workflow
Input : TrainingSet
RuleSet = 
stop = false
Do
BestRule = null
For repetition=1 to NumRepetitionsRuleLearning
CandidateRule = RunGA(TrainingSet)
If CandidateRule is better than BestRule
BestRule = CandidateRule
EndIf
EndFor
Matched = Examples from TrainingSet matched by BestRule
If class of BestRule is the majority class in Matched
Remove Matched from TrainingSet
Add BestRule to RuleSet
Else
stop = true
EndIf
While stop is false
Output : RuleSet
9
BioHEL
Fitness = TL 揃 W + EL
TL = 0.25 : theory length : 忰 惘悋 拆惆擯
EL : exceptions length : 忰 惘悋 惆惠
W = 0.9 : weight : TL EL惡 惘悋惡愀 惠惴
  =
=1
=
1 
ю()
ю(倹)

R:悋 擧
NA:惆悋 悋擯 惠惺惆悋惆
Size(Ri):擯 惡悋 惘惠惡愀 悋惶 愀i悋R
Size(Di):擯 愀i惆悋 悋
10
BioHEL
EL(R) = 2  ACC(R)  COV(R)
ACC R =
()
$()
駒  =
駒.

駒
  < 駒
駒 + 1  駒 .
  駒
1  
   駒
 =
$()
|| 11
BioHEL
BioHEL悋慍ALKR擧惆 悋愕惠悋惆 惠擧 悋擯惘惠 惆惘 悋 惘慍擧惘惆 惡惘悋(惆惘 悴惺惠
惠擧 悋擯惘惠)
12
numAtt : 擯 惠惺惆悋惆
whichAtt : 悋擯
predicates : 悋惆惘
悋擯
offsetPred : 忰悋惴 擧悋
擯 惘
class : 悋 擧悋愕
BioHEL
BioHEL擧 悋拆悴惘 擧悋慍 悋慍 惘 惡惘 悋慍悋愆  悋悴惘悋 慍悋 擧悋愆 惡惘悋ILAS
擧惆 悋愕惠悋惆 愆惆 悋惆
Training set悋慍 愕惠 惠擧 悋擯惘惠 惘惘忰 惆惘  愆惆 惠愕 愕惠 惆 惡
愆惆 忰悋惴 悛
愆惆 惆悋惆 悋愆 惶惘惠 惡惆 悋 擧:
13
GPUs and CUDA
14
GPUs and CUDA
惠悽惶惶忰悋惴慍悋惆惆惘CUDA悋惠悋悛
慍慍悋惆惆悋惘惆
悋愕惠悋惆悋慍忰悋惴惡惘悋CUDA惡忰惘悋悋愕惠
擧惠悋惆惡悋惺惓悋惠擧悋惘悋愆惆
Coalescence惡愕悋惘悋愕惠
惡慍悋惡惘愕擧Thread悋拆愆惠愕惘惆惘擧
block惡悋擧悋拆愆惠愕惘忰悋惴惆愕惠惘愕
惆悋愆惠惡悋愆惆
惺惆Coalescence惡悋惺惓悋慍悋愆慍悋悋悴惘悋
Kernel愆惆
悋愕惠悋惆悋愕惡悋慍忰悋惴惠悋惆惡悋惺惓
悋慍悋愆擧悋惘悋愆惆
15
Host Device
1. Host to Device
2. Launch Kernel
3. Device to Host
types of memory in CUDA
Device mem
Texture
mem
Shared mem 惡Thread擧 悋Block悋愕惠 愆惠惘擧
Constant
mem
惠愕愀 愀Host悋愕惠 愆惠 悋惡
Local thread
mem
惘 悽惶惶Thread
16
CUDA-based fitness computation for BioHEL
愆悋愕惠擯 忰悋愕惡 惡惘悋(Fitness)愆惆 愆悋惘愆 惺悋惘 愕 悋 惘:惠惺惆悋惆
擧 悋
.1惡悋condition悋match愆惆
.2惡悋class悋match愆惆
.3惡悋 慍悋classcondition悋match愆惆
惠悋惡惺 悋慍 惆CUDA fitness悋愕惠 惺悋惘 愕 忰悋愕惡  惠愀惡 惘悛惆 悋慍 悋悴悋 
愆惆 悋悴悋 擯悋 惆 惆惘 擧:
.1 惡 惠愀惡 忰悋愕惡classifier悋   悋
.2悋慍 惡惶惘惠 悋 惘 惡惘悋 惺悋惘 愕 忰悋愕惡
忰悋愕惡fitness愆惆 惺悋 悋 悋 惆惘 惘愆 惆 惡:
.1惆惘愆惠 悋慍 惘愆-惆悋:coarse-grained parallelisation:悋 悋惶 惘愆
.2惘慍 悋慍 惘愆-惆悋:fine-grained parallelisation:拆愆悋惆 惘愆
17
CUDA-based fitness computation for BioHEL
18
coarse-grained parallelisation
≒悋愕惠 惆惡惺惆
,
≒ 惠愀惡i悋  悋j悋
fine-grained parallelisation
≒悋愕惠 惡惺惆 愕:惠惺惆悋惆 惡
愕悧 悋擯
,,
≒擯 惠愀惡k 悋慍 悋i悋
悋 j悋
Coarse-grained parallel approach
Kernel 1
≒ 悋悋  惠愀惡
悋
≒惘Thread:
1.擧  悋 擧 惠愀惡

2.惺悋惘 愕 忰悋愕惡(惠愀惡
conditionclass惘惆 )
3.惆惘 惠悋悴 悵悽惘shareMem
Kernel 2
≒惠愀惡 惺惆  惠愀惡 惠惺惆悋惆 愆悋惘愆
悋 擧
19
Kernel 1
20
Kernel 2
21
fine-grained parallelisation
Kernel 1:惠愀惡 忰悋愕惡Condition擯愕愕惠 悋擯 惡惘悋
Kernel 2:惠愀惡 忰悋愕惡Condition拆愕惠 悋擯 惡惘悋
Kernel 3:惠愀惡 忰悋愕惡class
Kernel 4:忰悋愕惡AND惠愀惡Condition惠愀惡 Class
惠擧惘悋惘 惘 惆惘 惠悋悴 惠惺惆悋惆 擧悋愆
22
Kernel 1,2
23
Experimental design
24
Name:悋Dataset
T:悴惺 愕悋慍
Training
#Att:悋擯 惠惺惆悋惆
#Dis:悋擯 惠惺惆悋惆
擯愕愕惠
#Con:悋擯 惠惺惆悋惆
拆愕惠
#Cl:惠惺惆悋惆Class悋
Experimental design
 Intel(R) Core(TM) i7 CPU
8 cores
3.07 GHz
12 GB of RAM
 Tesla C2070
6 GB of internal memory
448 CUDA cores
14 multiprocessors x 32 CUDA Cores/MP
25
Experimental design
 the serial experiments are run in Intel Xeon E5472 processors at 3.0 GHz
26
Experimental design
 Pentium 4
3.6 GHz
hyper-threading
2 GB of RAM
 Tesla C1060
4 GB of global memory
30 multiprocessors
27
Synthetic datasets Result
28
Run-time of the two GPU strategies on
synthetic problems
#Windows=1 and population size=500
Log scale is used for both axis
disc = discrete attributes
real = continuous attributes
Synthetic datasets Result
29
Run-time per instance of the two GPU strategies on synthetic problems
#Attributes=300, #Windows=1 and population size=500
Log scale in x axis
disc = discrete attributes
real = continuous attributes
Synthetic datasets Result
30
Run-time per attribute of the two GPU strategies on synthetic problems
#Instances=1 M, #Windows=1 and population size=500
Log scale in x axis
disc = discrete attributes
real = continuous attributes
Synthetic datasets Result
31
Run-time per individual of the two GPU strategies on synthetic problems
#Instances=1 M, #Attributes = 300 and #Windows=1
disc = discrete attributes
real = continuous attributes
Synthetic datasets Result
32
Run-time per individual of the two GPU strategies on synthetic problems
#Instances=1 M, #Attributes = 300 and #Windows=1
disc = discrete attributes
real = continuous attributes
Synthetic datasets Result
33
Execution time in seconds of the evaluation process of the serial version and
both CUDA fitness functions with windowing disabled
Experiments on real-world datasets
34
Speedup against the serial algorithm without using
windowing of the different parallelisation
approaches ran on different architectures
Experiments on real-world datasets
35
Independent evaluation process, coarse-grained
strategy on the C1060 GPU Card
Problems:
black = continuous
red = mixed
blue = discrete
(For interpretation of the references to colour in this
figure legend, the reader is referred to the web
version of this article)
Experiments on real-world datasets
36
Independent evaluation process, coarse-grained
strategy on the C2070 GPU Card
Problems:
black = continuous
red = mixed
blue = discrete
(For interpretation of the references to colour in this
figure legend, the reader is referred to the web
version of this article)
Experiments on real-world datasets
37
Independent evaluation process, fine-grained
strategy on the C2070 GPU Card
Problems:
black = continuous
red = mixed
blue = discrete
(For interpretation of the references to colour in this
figure legend, the reader is referred to the web
version of this article)
Conclusions
.1惡惘悋 惠擧悋 悋愆 悋惆擯惘 悋愕惠惘悋惠 惆gpu愆惆 愕悋慍 拆悋惆
.2惆惘愆惠 惘愆-惘慍 惘愆  惡惆 惠惘 擧悋惘悋 忰悋惴 惆悋-惡愆惠惘 愕悋慍 悋慍 惆悋
惆悋愆惠
.3Kernel惘慍 -惆惘 惡惆 惠惘愕惘惺 惆悋Dataset惡悋10悋15悋惘惆 愕悋惘 惆惘  擯
惆惘愆惠-惡惆 惠惘 愕惘惺 惆悋
.4悋慍 悋愕惠悋惆 擧擧 愕悋悧 惆惘gpu悋愕惠 惆悋愆惠 惡惆惠惘 惠悋悴
.5擧悋惘忰悴  愕悋慍 悋慍 惡 惠惺悋惆Thread惆悋愆惠 悽悋惆 悋愕惡 惠悋悴 悋
.6惆惘愆惠 悋擧-愕悋慍 悋慍 惺 惆惘擧 惡悋 惠惷悋惆 悋惆惆悋愆惠 惡惠惘 惠悋悴 悋惆悋
悋愕惠 惠愕惘惺  惡愆惠惘
.7惘悋 拆悋惘悋惠惘悋 惡惆惘愕惠 惠悋 惡惘惘愕  惆惠 惡悋Tune惆
.8Gpu惡 愕惘惺 悋惘惠惡悋愀悋惠 悋擧悋 擧 惆悋惘惆 悋慍悋惘 愕悽惠 拆愆惘惠 悋擧悋悋惠 悴惆惆 悋
Wrap擧惆 惘悋 惘悋 悋(惓wrapvoteshuffle)愆惆 惡悋惺惓 惠悋惆 擧
Kernel惘慍 悋-愆惆 惠惘 愕惘惺 惆悋
38
Recommendation
.1拆悋惘悋惠惘悋 悋愕惡 悋惠悽悋惡
.2惠悋愆 悛 惡惡惆 惡惘悋 惠悋 擧 悋愕惠 愆惆 惡忰惓  惠擧 悋擯惘惠 惘惆 惆惘
惆
.3惠惘悋愕惡 愆惘悋愀 惡 惡愕惠 悋擧悋 惶惘惠 惆惘 擧 擯惘惆 惠惶 惠悋惡惺 擧 惠愕惺
悋惆 忰悋惴 惘悋 拆悋惘悋惠惘悋  悋擯惘惠
.4悋慍 悋愕惠悋惆co-processor愕悋慍 惡 gpu悋悋擯惘惠  悋愆 悋惆擯惘 惡惘悋
惠擧悋
39
Question ?
Thanks
40
Ad

Recommended

Akporesiri Omene BP PPT.pptx
Akporesiri Omene BP PPT.pptx
Akporesiri Omene, E.I.T.
Online Rapid Response Strategies
Online Rapid Response Strategies
Steve MacLaughlin
Soa bpel-123
Soa bpel-123
Priyanka Bansal
Herdem360 Toplant脹lar脹
Herdem360 Toplant脹lar脹
Safak Herdem
Headphone tangling
Headphone tangling
J_Anderson186020
DDevonshire MSN-ED Comprehensive Essay Exam
DDevonshire MSN-ED Comprehensive Essay Exam
Denise Devonshire,
An introduction to genetic algorithms
An introduction to genetic algorithms
Hamideh Iraj
悋擯惘惠 惠擧
悋擯惘惠 惠擧
saeedeh ezzati
悋擯惘惠 惠擧
悋擯惘惠 惠擧
saeedeh ezzati
Genetic
Genetic
saeedeh ezzati
Genetic Algoritm
Genetic Algoritm
saeedeh ezzati
悛慍愆 愆 惶惺 - 惡悽愆 悋惘
悛慍愆 愆 惶惺 - 惡悽愆 悋惘
faradars
shape optimization
shape optimization
Morteza Dalil
05 mpi fundamentals_of_parallelism_and_code_optimization-www.astek.ir
05 mpi fundamentals_of_parallelism_and_code_optimization-www.astek.ir
aminnezarat
悋惘慍悋惡 愕悋悋悋 惘悋悋悋 惡悋 擧擧 愆惡愕悋慍
悋惘慍悋惡 愕悋悋悋 惘悋悋悋 惡悋 擧擧 愆惡愕悋慍
Sadegh Dorri N.
Id3
Id3
khatereh asadi
悋惆擯惘 惆惘悽惠 惠惶
悋惆擯惘 惆惘悽惠 惠惶
avissco
Big data HPC Convergence-Dr. Amin-Nezarat-(aminnezarat@gmail.com)-2019
Big data HPC Convergence-Dr. Amin-Nezarat-(aminnezarat@gmail.com)-2019
aminnezarat
惆悋惆 惘悋擯悋 悛慍愆 忰 愕悋 慍悋惡惆 拆惘 惡悋 忰惆惆惠 悋惡惺 悋 RCPSP ...
惆悋惆 惘悋擯悋 悛慍愆 忰 愕悋 慍悋惡惆 拆惘 惡悋 忰惆惆惠 悋惡惺 悋 RCPSP ...
MatlabHome.ir MatlabHome.ir
Software reliability model(惘愆 悋 悋惆悋慍 擯惘 悋惡惠 悋愀悋 惘 悋慍悋惘)
Software reliability model(惘愆 悋 悋惆悋慍 擯惘 悋惡惠 悋愀悋 惘 悋慍悋惘)
amirbabol
Life of Points (Machine learning with Orange flavor)
Life of Points (Machine learning with Orange flavor)
khalooei
Iwd
Iwd
hossin2012
Nabz AI Academy - Watson Course - Session 1
Nabz AI Academy - Watson Course - Session 1
Sahand Samiei
Image Cryptography and Steganography
Image Cryptography and Steganography
Mohammad Amin Amjadi
memetic algorithm
memetic algorithm
Mohammad Amin Amjadi

More Related Content

Similar to Seminar-Parallel Processing (20)

An introduction to genetic algorithms
An introduction to genetic algorithms
Hamideh Iraj
悋擯惘惠 惠擧
悋擯惘惠 惠擧
saeedeh ezzati
悋擯惘惠 惠擧
悋擯惘惠 惠擧
saeedeh ezzati
Genetic
Genetic
saeedeh ezzati
Genetic Algoritm
Genetic Algoritm
saeedeh ezzati
悛慍愆 愆 惶惺 - 惡悽愆 悋惘
悛慍愆 愆 惶惺 - 惡悽愆 悋惘
faradars
shape optimization
shape optimization
Morteza Dalil
05 mpi fundamentals_of_parallelism_and_code_optimization-www.astek.ir
05 mpi fundamentals_of_parallelism_and_code_optimization-www.astek.ir
aminnezarat
悋惘慍悋惡 愕悋悋悋 惘悋悋悋 惡悋 擧擧 愆惡愕悋慍
悋惘慍悋惡 愕悋悋悋 惘悋悋悋 惡悋 擧擧 愆惡愕悋慍
Sadegh Dorri N.
Id3
Id3
khatereh asadi
悋惆擯惘 惆惘悽惠 惠惶
悋惆擯惘 惆惘悽惠 惠惶
avissco
Big data HPC Convergence-Dr. Amin-Nezarat-(aminnezarat@gmail.com)-2019
Big data HPC Convergence-Dr. Amin-Nezarat-(aminnezarat@gmail.com)-2019
aminnezarat
惆悋惆 惘悋擯悋 悛慍愆 忰 愕悋 慍悋惡惆 拆惘 惡悋 忰惆惆惠 悋惡惺 悋 RCPSP ...
惆悋惆 惘悋擯悋 悛慍愆 忰 愕悋 慍悋惡惆 拆惘 惡悋 忰惆惆惠 悋惡惺 悋 RCPSP ...
MatlabHome.ir MatlabHome.ir
Software reliability model(惘愆 悋 悋惆悋慍 擯惘 悋惡惠 悋愀悋 惘 悋慍悋惘)
Software reliability model(惘愆 悋 悋惆悋慍 擯惘 悋惡惠 悋愀悋 惘 悋慍悋惘)
amirbabol
Life of Points (Machine learning with Orange flavor)
Life of Points (Machine learning with Orange flavor)
khalooei
Iwd
Iwd
hossin2012
Nabz AI Academy - Watson Course - Session 1
Nabz AI Academy - Watson Course - Session 1
Sahand Samiei
An introduction to genetic algorithms
An introduction to genetic algorithms
Hamideh Iraj
悋擯惘惠 惠擧
悋擯惘惠 惠擧
saeedeh ezzati
悋擯惘惠 惠擧
悋擯惘惠 惠擧
saeedeh ezzati
悛慍愆 愆 惶惺 - 惡悽愆 悋惘
悛慍愆 愆 惶惺 - 惡悽愆 悋惘
faradars
shape optimization
shape optimization
Morteza Dalil
05 mpi fundamentals_of_parallelism_and_code_optimization-www.astek.ir
05 mpi fundamentals_of_parallelism_and_code_optimization-www.astek.ir
aminnezarat
悋惘慍悋惡 愕悋悋悋 惘悋悋悋 惡悋 擧擧 愆惡愕悋慍
悋惘慍悋惡 愕悋悋悋 惘悋悋悋 惡悋 擧擧 愆惡愕悋慍
Sadegh Dorri N.
悋惆擯惘 惆惘悽惠 惠惶
悋惆擯惘 惆惘悽惠 惠惶
avissco
Big data HPC Convergence-Dr. Amin-Nezarat-(aminnezarat@gmail.com)-2019
Big data HPC Convergence-Dr. Amin-Nezarat-(aminnezarat@gmail.com)-2019
aminnezarat
惆悋惆 惘悋擯悋 悛慍愆 忰 愕悋 慍悋惡惆 拆惘 惡悋 忰惆惆惠 悋惡惺 悋 RCPSP ...
惆悋惆 惘悋擯悋 悛慍愆 忰 愕悋 慍悋惡惆 拆惘 惡悋 忰惆惆惠 悋惡惺 悋 RCPSP ...
MatlabHome.ir MatlabHome.ir
Software reliability model(惘愆 悋 悋惆悋慍 擯惘 悋惡惠 悋愀悋 惘 悋慍悋惘)
Software reliability model(惘愆 悋 悋惆悋慍 擯惘 悋惡惠 悋愀悋 惘 悋慍悋惘)
amirbabol
Life of Points (Machine learning with Orange flavor)
Life of Points (Machine learning with Orange flavor)
khalooei
Nabz AI Academy - Watson Course - Session 1
Nabz AI Academy - Watson Course - Session 1
Sahand Samiei

More from Mohammad Amin Amjadi (15)

Image Cryptography and Steganography
Image Cryptography and Steganography
Mohammad Amin Amjadi
memetic algorithm
memetic algorithm
Mohammad Amin Amjadi
Amjadi - Ebook 6 - Ref,Out - v1
Amjadi - Ebook 6 - Ref,Out - v1
Mohammad Amin Amjadi
Amjadi - Ebook 5 - Function - v1
Amjadi - Ebook 5 - Function - v1
Mohammad Amin Amjadi
rivercode.PDF
rivercode.PDF
Mohammad Amin Amjadi
Ad

Seminar-Parallel Processing

  • 1. GPU strategies for evolutionary machine learning Master : Dr. Ghaderi Mohammad Amin Amjadi Spring 2016
  • 2. 悋悋惠 1. Large-scale experimental evaluation of GPU strategies for evolutionary machine learning Mar鱈a A. Franco, Jaume Bacardit 2016 Elsevier B.V. All rights reserved 2. Improving the scalability of rule-based evolutionary learning Jaume Bacardit, Edmund K. Burke, Natalio Krasnogor 2009 2
  • 3. 愀悋惡 1.Evolutionary Algorithm 2.Machine Learning 3.Problem 4.Goal 5.BioHEL Algorithm 6.GPUs and CUDA 7. CUDA-based fitness computation for BioHEL 8. Coarse-grained parallel approach 9. fine-grained parallelization 10. Experimental design 11. Conclusions 12. Recommendation 3
  • 4. Evolutionary Algorithm ≒惡惘惘愕 惘悋 忰悋悋惠 悋慍 擧愕惘 惠惶悋惆 惡惶惘惠 悋悧 ≒惠悴 惡惘惘愕 惠惶悋惆 悋惠悽悋惡 惘悛惆 悋悧 惠擧惘悋惘 惡悋惘 惆 惘悋 ≒ 悋惠悽悋惡 悴愆 惡惘 惺擯惘悋 悋慍... 悋悧 悋愕惠悋惆 惘忰 惘 惆惘 悋悋擯惘惠 悋慍 惡惘悽 愕惠惆 惡惘慍悋 拆惆 ≒忰悋悋惠 惠悋 悽悋 悋悧 惡惘惘愕 惘悋 ≒惠惘惡 擧 悽悋 愀 惡慍 4 , , = reward = fitness reward cost
  • 6. Problem 悋擯惘 愆惆 慍悋惆 悋惆擯惘 慍悋: 悋)dataset惡悋愆惆 惡慍惘擯 惡)惡悋愆惆 拆惆 悋惆擯惘 惆( 惴悋惘惠 惡悋 悋惆擯惘 悋惆)... 悋慍 悋愕惠悋惆 悋惘愆 惠惘愆悋惺 悋慍 擧 悋惘慍gpu悋愕 惡惡惆 惡惘悋 悽惶惶悋 悋愕惠 惆悋惆 悋悋擯惘惠 拆悵惘-擧悋 悋慍 惠擧悋 悋悋擯惘惠 悋慍 愀惡惺惠 惡惆gpu悋愕惠悋惆 惘惡惘 惡惡惆 惡惘悋 悋愕惠 愆惆 6
  • 7. Goal 惠擧悋 悋惆擯惘 愕愕惠 悋慍 悋愕惠悋惆BioHEL惆悋惆 惡惘悋 悽悋惶 惡愀惘 擧-擧悋 dataset悋愕惠 愆惆 愀惘悋忰 惡慍惘擯 悋 愕愕惠BioHEL擧惆 惘悋惡惠 悋愆 悋惆擯惘 悋悋擯惘惠 愕悋惘 惡悋 擧 悋慍 愆惆 拆愆悋惆gpu悋愕惡BioHEL(nvidia悋慍 擧cuda悋惆 拆愆惠惡悋) 愆惆 悋愕惠悋惆 悋慍 惡愆 惠愕惘惺 惡慍惘擯 愕悋悧 悛慍悋愆悋惠 惆惘60悋愕惠 惆悋愆惠 惡惘悋惡惘 忰悋惷惘 忰悋 惆惘 惘 惡惘 悋慍悋愆 擧悋慍悋 惠惘擧惡 惡悋BioHEL悋慍 惡愆750惡惘悋惡惘 悋愕惠 惆悋愆惠 惠愕惘惺 7
  • 8. BioHEL BioHEL惡慍惘擯 悋愕 悋惆悋惆 悴惺 惆惘惠 惆 惡悋 惠擧悋 悋惆擯惘 愕愕惠 擧 惡悋愆惆 悋 悋惆擯惘 惘愆 悋慍 拆愕 愕愕惠 悋(rule)悋悋 悴惺 惠擧惘悋惘愆惆 擧惆 惠惆 惘悋 惘悋惠惡 愕愕 悋慍 悛惘惆 惆愕惠 惡 拆 惆惘 拆 擧 悋悋 悋 惘 悋惆擯惘 惡惘悋 惘愆 悋 惆惘 擧惆 悋愕惠悋惆 悋愕惠悋惆悋惘惆 惠擧 悋擯惘惠 擧 8
  • 9. BioHEL Procedure BioHEL general workflow Input : TrainingSet RuleSet = stop = false Do BestRule = null For repetition=1 to NumRepetitionsRuleLearning CandidateRule = RunGA(TrainingSet) If CandidateRule is better than BestRule BestRule = CandidateRule EndIf EndFor Matched = Examples from TrainingSet matched by BestRule If class of BestRule is the majority class in Matched Remove Matched from TrainingSet Add BestRule to RuleSet Else stop = true EndIf While stop is false Output : RuleSet 9
  • 10. BioHEL Fitness = TL 揃 W + EL TL = 0.25 : theory length : 忰 惘悋 拆惆擯 EL : exceptions length : 忰 惘悋 惆惠 W = 0.9 : weight : TL EL惡 惘悋惡愀 惠惴 = =1 = 1 ю() ю(倹) R:悋 擧 NA:惆悋 悋擯 惠惺惆悋惆 Size(Ri):擯 惡悋 惘惠惡愀 悋惶 愀i悋R Size(Di):擯 愀i惆悋 悋 10
  • 11. BioHEL EL(R) = 2 ACC(R) COV(R) ACC R = () $() 駒 = 駒. 駒 < 駒 駒 + 1 駒 . 駒 1 駒 = $() || 11
  • 12. BioHEL BioHEL悋慍ALKR擧惆 悋愕惠悋惆 惠擧 悋擯惘惠 惆惘 悋 惘慍擧惘惆 惡惘悋(惆惘 悴惺惠 惠擧 悋擯惘惠) 12 numAtt : 擯 惠惺惆悋惆 whichAtt : 悋擯 predicates : 悋惆惘 悋擯 offsetPred : 忰悋惴 擧悋 擯 惘 class : 悋 擧悋愕
  • 13. BioHEL BioHEL擧 悋拆悴惘 擧悋慍 悋慍 惘 惡惘 悋慍悋愆 悋悴惘悋 慍悋 擧悋愆 惡惘悋ILAS 擧惆 悋愕惠悋惆 愆惆 悋惆 Training set悋慍 愕惠 惠擧 悋擯惘惠 惘惘忰 惆惘 愆惆 惠愕 愕惠 惆 惡 愆惆 忰悋惴 悛 愆惆 惆悋惆 悋愆 惶惘惠 惡惆 悋 擧: 13
  • 16. types of memory in CUDA Device mem Texture mem Shared mem 惡Thread擧 悋Block悋愕惠 愆惠惘擧 Constant mem 惠愕愀 愀Host悋愕惠 愆惠 悋惡 Local thread mem 惘 悽惶惶Thread 16
  • 17. CUDA-based fitness computation for BioHEL 愆悋愕惠擯 忰悋愕惡 惡惘悋(Fitness)愆惆 愆悋惘愆 惺悋惘 愕 悋 惘:惠惺惆悋惆 擧 悋 .1惡悋condition悋match愆惆 .2惡悋class悋match愆惆 .3惡悋 慍悋classcondition悋match愆惆 惠悋惡惺 悋慍 惆CUDA fitness悋愕惠 惺悋惘 愕 忰悋愕惡 惠愀惡 惘悛惆 悋慍 悋悴悋 愆惆 悋悴悋 擯悋 惆 惆惘 擧: .1 惡 惠愀惡 忰悋愕惡classifier悋 悋 .2悋慍 惡惶惘惠 悋 惘 惡惘悋 惺悋惘 愕 忰悋愕惡 忰悋愕惡fitness愆惆 惺悋 悋 悋 惆惘 惘愆 惆 惡: .1惆惘愆惠 悋慍 惘愆-惆悋:coarse-grained parallelisation:悋 悋惶 惘愆 .2惘慍 悋慍 惘愆-惆悋:fine-grained parallelisation:拆愆悋惆 惘愆 17
  • 18. CUDA-based fitness computation for BioHEL 18 coarse-grained parallelisation ≒悋愕惠 惆惡惺惆 , ≒ 惠愀惡i悋 悋j悋 fine-grained parallelisation ≒悋愕惠 惡惺惆 愕:惠惺惆悋惆 惡 愕悧 悋擯 ,, ≒擯 惠愀惡k 悋慍 悋i悋 悋 j悋
  • 19. Coarse-grained parallel approach Kernel 1 ≒ 悋悋 惠愀惡 悋 ≒惘Thread: 1.擧 悋 擧 惠愀惡 2.惺悋惘 愕 忰悋愕惡(惠愀惡 conditionclass惘惆 ) 3.惆惘 惠悋悴 悵悽惘shareMem Kernel 2 ≒惠愀惡 惺惆 惠愀惡 惠惺惆悋惆 愆悋惘愆 悋 擧 19
  • 22. fine-grained parallelisation Kernel 1:惠愀惡 忰悋愕惡Condition擯愕愕惠 悋擯 惡惘悋 Kernel 2:惠愀惡 忰悋愕惡Condition拆愕惠 悋擯 惡惘悋 Kernel 3:惠愀惡 忰悋愕惡class Kernel 4:忰悋愕惡AND惠愀惡Condition惠愀惡 Class 惠擧惘悋惘 惘 惆惘 惠悋悴 惠惺惆悋惆 擧悋愆 22
  • 24. Experimental design 24 Name:悋Dataset T:悴惺 愕悋慍 Training #Att:悋擯 惠惺惆悋惆 #Dis:悋擯 惠惺惆悋惆 擯愕愕惠 #Con:悋擯 惠惺惆悋惆 拆愕惠 #Cl:惠惺惆悋惆Class悋
  • 25. Experimental design Intel(R) Core(TM) i7 CPU 8 cores 3.07 GHz 12 GB of RAM Tesla C2070 6 GB of internal memory 448 CUDA cores 14 multiprocessors x 32 CUDA Cores/MP 25
  • 26. Experimental design the serial experiments are run in Intel Xeon E5472 processors at 3.0 GHz 26
  • 27. Experimental design Pentium 4 3.6 GHz hyper-threading 2 GB of RAM Tesla C1060 4 GB of global memory 30 multiprocessors 27
  • 28. Synthetic datasets Result 28 Run-time of the two GPU strategies on synthetic problems #Windows=1 and population size=500 Log scale is used for both axis disc = discrete attributes real = continuous attributes
  • 29. Synthetic datasets Result 29 Run-time per instance of the two GPU strategies on synthetic problems #Attributes=300, #Windows=1 and population size=500 Log scale in x axis disc = discrete attributes real = continuous attributes
  • 30. Synthetic datasets Result 30 Run-time per attribute of the two GPU strategies on synthetic problems #Instances=1 M, #Windows=1 and population size=500 Log scale in x axis disc = discrete attributes real = continuous attributes
  • 31. Synthetic datasets Result 31 Run-time per individual of the two GPU strategies on synthetic problems #Instances=1 M, #Attributes = 300 and #Windows=1 disc = discrete attributes real = continuous attributes
  • 32. Synthetic datasets Result 32 Run-time per individual of the two GPU strategies on synthetic problems #Instances=1 M, #Attributes = 300 and #Windows=1 disc = discrete attributes real = continuous attributes
  • 33. Synthetic datasets Result 33 Execution time in seconds of the evaluation process of the serial version and both CUDA fitness functions with windowing disabled
  • 34. Experiments on real-world datasets 34 Speedup against the serial algorithm without using windowing of the different parallelisation approaches ran on different architectures
  • 35. Experiments on real-world datasets 35 Independent evaluation process, coarse-grained strategy on the C1060 GPU Card Problems: black = continuous red = mixed blue = discrete (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article)
  • 36. Experiments on real-world datasets 36 Independent evaluation process, coarse-grained strategy on the C2070 GPU Card Problems: black = continuous red = mixed blue = discrete (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article)
  • 37. Experiments on real-world datasets 37 Independent evaluation process, fine-grained strategy on the C2070 GPU Card Problems: black = continuous red = mixed blue = discrete (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article)
  • 38. Conclusions .1惡惘悋 惠擧悋 悋愆 悋惆擯惘 悋愕惠惘悋惠 惆gpu愆惆 愕悋慍 拆悋惆 .2惆惘愆惠 惘愆-惘慍 惘愆 惡惆 惠惘 擧悋惘悋 忰悋惴 惆悋-惡愆惠惘 愕悋慍 悋慍 惆悋 惆悋愆惠 .3Kernel惘慍 -惆惘 惡惆 惠惘愕惘惺 惆悋Dataset惡悋10悋15悋惘惆 愕悋惘 惆惘 擯 惆惘愆惠-惡惆 惠惘 愕惘惺 惆悋 .4悋慍 悋愕惠悋惆 擧擧 愕悋悧 惆惘gpu悋愕惠 惆悋愆惠 惡惆惠惘 惠悋悴 .5擧悋惘忰悴 愕悋慍 悋慍 惡 惠惺悋惆Thread惆悋愆惠 悽悋惆 悋愕惡 惠悋悴 悋 .6惆惘愆惠 悋擧-愕悋慍 悋慍 惺 惆惘擧 惡悋 惠惷悋惆 悋惆惆悋愆惠 惡惠惘 惠悋悴 悋惆悋 悋愕惠 惠愕惘惺 惡愆惠惘 .7惘悋 拆悋惘悋惠惘悋 惡惆惘愕惠 惠悋 惡惘惘愕 惆惠 惡悋Tune惆 .8Gpu惡 愕惘惺 悋惘惠惡悋愀悋惠 悋擧悋 擧 惆悋惘惆 悋慍悋惘 愕悽惠 拆愆惘惠 悋擧悋悋惠 悴惆惆 悋 Wrap擧惆 惘悋 惘悋 悋(惓wrapvoteshuffle)愆惆 惡悋惺惓 惠悋惆 擧 Kernel惘慍 悋-愆惆 惠惘 愕惘惺 惆悋 38
  • 39. Recommendation .1拆悋惘悋惠惘悋 悋愕惡 悋惠悽悋惡 .2惠悋愆 悛 惡惡惆 惡惘悋 惠悋 擧 悋愕惠 愆惆 惡忰惓 惠擧 悋擯惘惠 惘惆 惆惘 惆 .3惠惘悋愕惡 愆惘悋愀 惡 惡愕惠 悋擧悋 惶惘惠 惆惘 擧 擯惘惆 惠惶 惠悋惡惺 擧 惠愕惺 悋惆 忰悋惴 惘悋 拆悋惘悋惠惘悋 悋擯惘惠 .4悋慍 悋愕惠悋惆co-processor愕悋慍 惡 gpu悋悋擯惘惠 悋愆 悋惆擯惘 惡惘悋 惠擧悋 39