Learning how to explain neural networks
PatternNet and PatternAttribution
PJ Kindermans et al. 2017
蠏觜

Korea University, School of Industrial Management Engineering

Data Science & Business Analytics Lab
Contents
1. Problems with previous methods
2. Analysis of previous methods
1) DeConvNet
2) Guided BackProp
3. Exploring the linear model
1) Deterministic distractor
2) Additive isotropic Gaussian noise
4. Approaches
5. Quality criterion for signal estimator
6. Learning to estimate signal
1) Existing estimators
2) PatternNet & PatternAttribution
7. Experiments
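0. Summary
Data consists of a Signal, which carries the meaning, and a Distractor, which is everything else.
A visualization should focus on the Signal.
Model weights are strongly influenced by the Distractor, so a visualization that relies on the weights alone gives poor results.
The quality of a signal estimate can be evaluated through the correlation between the output y and the estimated distractor.
From the { weight, input, output } values of a sufficiently trained model, the signal can be estimated with either a linear or a non-linear estimator and visualized directly.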
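1. Problems with previous methods
There have been various attempts to explain deep learning:
DeConvNet, Guided BackProp, LRP (Layer-wise Relevance Propagation)
The main assumption of these methods is that the output holds the extracted important information, so backpropagating from the output shows how the important information in the input was extracted.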
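1. Problems with previous methods (Contd)
However, the saliency maps produced by previous methods only look plausible; in theory they do not correctly extract the important parts of the input data.
Data can be split into two parts, Signal and Distractor.
Relevant signal: the part of the data that carries the content of interest.
Distracting component: noise that is unrelated to the task.
Previous methods explain the model through the weight vector, which is largely determined by the Distractor rather than by the Signal.
What a deep learning model generally does is remove the Distractor from the data.
That is why the weight vector is called a filter.
- The direction of the weight vector changes considerably with the distractor direction.
- This is expected, because the weight vector has to remove the distractor.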
2.1 Previous methods - DeConvNet by Zeiler & Fergus
Before DeConvNet
Only the first layer, directly adjacent to the input layer, could be explained -> insufficient explanation
Hessian matrices were used to explain the results of other layers as well
-> layers become more complex with depth, so a quadratic approximation has its limits
Main concept
Use gradient descent to find the pattern in input space that activates a specific neuron
Similar studies exist
Simonyan et al. -> finding saliency maps from the fully connected layer
Girshick et al. -> finding which image patches activate a filter
(1) Adaptive Deconvolutional Networks for Mid and High Level Feature Learning - Matthew D. Zeiler and Rob Fergus, 2011

(2) Visualizing and Understanding Convolutional Networks - Matthew D. Zeiler and Rob Fergus, 2014
2.1 Previous methods - DeConvNet by Zeiler & Fergus (Contd)
Network architecture used (AlexNet)
1. Each layer is a convolution layer
2. Every convolution layer is followed by a ReLU
3. (Optionally) a max pooling layer is used
DeConvNet procedure
1. Pass the image through the already trained ConvNet model (right)
2. Pick the feature map to inspect, set all other feature maps to 0, and pass the result through the DeConvNet layers (left)
(i) Unpool
(ii) Rectify
(iii) Reconstruction
2.1 Previous methods - DeConvNet by Zeiler & Fergus (Contd)
Unpooling
Max pooling is non-invertible -> Approximation
The position of the maximum in each pooling region is recorded during pooling, and the DeConvNet restores it with these switches.
Rectification
Because ReLU is used, the feature maps are always 0 or greater.
To keep this property, the DeConvNet also passes its signal through a ReLU.
Convolutional Filtering (Reconstruction)
The learned conv filters are flipped vertically and horizontally (transposed) and applied.
In effect, this is the same as backpropagating only the selected activation.
< Figure from paper (1) >
< from towardsdatascience.com >
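The unpool and rectify steps above can be sketched in NumPy. This is a minimal illustration, assuming non-overlapping k x k windows on a single 2-D feature map whose sides are divisible by k; the function names are hypothetical and not taken from the DeConvNet papers.

```python
import numpy as np

def max_pool_with_switches(fmap, k=2):
    """Non-overlapping k x k max pooling that also records the argmax (switch) of each window."""
    H, W = fmap.shape
    # blocks[i, j] holds the k*k values of window (i, j), flattened row by row.
    blocks = (fmap.reshape(H // k, k, W // k, k)
                  .transpose(0, 2, 1, 3)
                  .reshape(H // k, W // k, k * k))
    return blocks.max(axis=-1), blocks.argmax(axis=-1)

def unpool_with_switches(pooled, switches, k=2):
    """Approximate inverse: put each pooled value back at its recorded position, zeros elsewhere."""
    Hp, Wp = pooled.shape
    blocks = np.zeros((Hp, Wp, k * k), dtype=pooled.dtype)
    rows, cols = np.indices((Hp, Wp))
    blocks[rows, cols, switches] = pooled
    return (blocks.reshape(Hp, Wp, k, k)
                  .transpose(0, 2, 1, 3)
                  .reshape(Hp * k, Wp * k))

fmap = np.array([[1., 2., 0., 0.],
                 [3., 4., 0., 5.],
                 [0., 0., 1., 0.],
                 [7., 0., 0., 0.]])
pooled, switches = max_pool_with_switches(fmap)
# Unpool with the switches, then rectify as the DeConvNet does.
restored = np.maximum(unpool_with_switches(pooled, switches), 0.0)
```

Only the maxima survive in `restored`; every other position is zero, which is why unpooling is an approximation rather than a true inverse.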
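2.1 Previous methods - DeConvNet by Zeiler & Fergus (Contd)
Occlusion Sensitivity test
Cover a specific region of the image with a gray square and observe what happens when each region is hidden:
(b) change in total activation
(d) change in probability score
(e) change in most probable class
(c) projection result: the black-bordered one comes from this image, the other three come from other images

Takeaway: the convolution filters capture specific parts (objects) of the image well.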
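2.2 Previous methods - Guided BackProp by Springenberg
Goal of the paper
New models are built on the same basis, but their structure keeps getting more complex.
Find out which network components matter most for reaching the best performance.
The models are analyzed with a deconvolution approach (Guided BackProp).
Conclusion of the paper
Build the whole network from convolution layers only.
Do not use max-pooling layers -> Strided convolution layer (stride 2, 3x3 filter)
(3) Striving for Simplicity: The All Convolutional Net - Jost Tobias Springenberg et al. 2014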
2.2 Previous methods - Guided BackProp by Springenberg (Contd)
Why max-pooling can be replaced by a strided convolution layer
Pooling equation:
$s_{i,j,u}(f) = \left( \sum_{h=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} \sum_{w=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} \left| f_{g(h,w,i,j,u)} \right|^{p} \right)^{1/p}$
Convolution equation:
$c_{i,j,o}(f) = \sigma\left( \sum_{h=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} \sum_{w=-\lfloor k/2 \rfloor}^{\lfloor k/2 \rfloor} \sum_{u=1}^{N} \theta_{h,w,u,o} \cdot f_{g(h,w,i,j,u)} \right)$
=> Pooling and convolution both take the same inputs; pooling is like a convolution whose activation function is the p-norm.
Notation
- f : feature map
- W, H : width, height
- N : number of channels
- k : pooling size
- r : stride
- g(h,w,i,j,u) = (r*i + h, r*j + w, u)
- p : order of the p-norm; as p goes to infinity this becomes max-pooling
- theta : convolutional weight
- sigma : activation function
- o : index of the output channel
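The claim that the p-norm turns into max-pooling for large p can be checked numerically. This is a small sketch on one made-up pooling window; `p_norm_pool` is an illustrative helper, not code from the paper.

```python
import numpy as np

def p_norm_pool(window, p):
    """p-norm of all values in one pooling window: (sum |f|^p)^(1/p)."""
    return np.sum(np.abs(window) ** p) ** (1.0 / p)

window = np.array([[0.2, 1.0],
                   [0.5, 3.0]])

# The result moves from the plain sum (p=1) toward the maximum as p grows.
values = {p: p_norm_pool(window, p) for p in (1, 2, 8, 64)}
```

With p = 1 the window is simply summed, while for p = 64 the result is numerically indistinguishable from the maximum of the window.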
2.2 Previous methods - Guided BackProp by Springenberg (Contd)
Networks compared in the experiments
A, B, C : the convolutional filter sizes differ
C is further modified into three variants; the comparison focuses on the models highlighted on the right of the slide.
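2.2 Previous methods - Guided BackProp by Springenberg (Contd)
Visualizing the All-CNN-C model with a deconvolution approach
The Deconvolution method of Zeiler & Fergus (2014) introduced just before performs poorly here, because this model has no max-pooling layers.
Guided BackProp
There are only convolution layers, so there are no switches.
Only positions where the activation is above 0 and the gradient is also above 0 are passed backward.
a) Zeiler & Fergus (2014)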
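The difference between plain backpropagation, the DeConvNet rule and Guided BackProp lies only in how a ReLU is handled on the backward pass. The sketch below shows the three masking rules on a toy vector; `relu_backward` and its `mode` argument are illustrative names.

```python
import numpy as np

def relu_backward(grad_out, pre_activation, mode="guided"):
    """Backward pass through a ReLU under three different rules."""
    forward_mask = pre_activation > 0   # the unit was active in the forward pass
    backward_mask = grad_out > 0        # the incoming backward signal is positive
    if mode == "backprop":
        return grad_out * forward_mask
    if mode == "deconvnet":
        return grad_out * backward_mask
    if mode == "guided":
        return grad_out * forward_mask * backward_mask
    raise ValueError(f"unknown mode: {mode}")

pre_activation = np.array([1.0, -1.0, 2.0, -2.0])
grad_out = np.array([0.5, 0.5, -0.5, -0.5])

g_backprop = relu_backward(grad_out, pre_activation, "backprop")
g_deconv = relu_backward(grad_out, pre_activation, "deconvnet")
g_guided = relu_backward(grad_out, pre_activation, "guided")
```

Guided BackProp keeps a value only where both masks agree, so it passes the fewest entries of the three.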
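2.3 Previous methods - focus only on the weight vector
Both DeConvNet and Guided BackProp look at results that depend on the weights (conv filters).
The weights do not follow the Signal in the data, so these methods may fail to capture the important parts of the data.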
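3.1 Exploring the linear model - Deterministic distractor
Observe how signal and distractor behave in a simple linear model.
Notation
- w : filter or weight
- x : data
- y : condensed output
- s : relevant signal
- d : distracting component, the part that carries no information about the output
- a_s : direction of signal, the pattern in which the output appears in the data
- a_d : direction of distractor
$x = s + d$, $s = a_s y$, $d = a_d \varepsilon$
$a_s = (1, 0)^T$, $a_d = (1, 1)^T$
$y \in [-1, 1]$, $\varepsilon \sim \mathcal{N}(\mu, \sigma^2)$
The data x is the sum of the signal s and the distractor d.
To satisfy $w^T x = y$ the weight must be $w = [1, -1]^T$, because it needs
$w^T a_s y = y$ and $w^T a_d \varepsilon = 0$.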
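3.1 Exploring the linear model - Deterministic distractor
Both $w^T a_s y = y$ and $w^T a_d \varepsilon = 0$ have to hold.
The weight has to remove the distractor, so it must be orthogonal to the distractor direction.
That is, w is not aligned with the signal direction.
While staying orthogonal to the distractor, the weight is rescaled so that $w^T a_s = 1$ still holds.
The weight vector is largely determined by the distractor.
From the weight vector alone we cannot tell which input pattern influences the output.
- If the signal direction stays the same
- and the distractor direction changes,
the weight direction changes as well.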
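The two constraints on w can be solved directly, which makes the dependence on the distractor visible. The sketch below uses the a_s and a_d from the slide plus one made-up alternative distractor direction; `filter_for` is an illustrative helper.

```python
import numpy as np

def filter_for(a_s, a_d):
    """Solve w^T a_s = 1 and w^T a_d = 0 for a 2-D weight vector w."""
    directions = np.stack([a_s, a_d])          # one constraint per row
    return np.linalg.solve(directions, np.array([1.0, 0.0]))

a_s = np.array([1.0, 0.0])
w_slide = filter_for(a_s, np.array([1.0, 1.0]))    # a_d from the slide
w_other = filter_for(a_s, np.array([1.0, -2.0]))   # a different, made-up distractor direction

# Check on sampled data that the filter recovers y exactly despite the distractor.
rng = np.random.default_rng(0)
y = rng.uniform(-1.0, 1.0, size=1000)
eps = rng.normal(0.0, 1.0, size=1000)
x = np.outer(y, a_s) + np.outer(eps, np.array([1.0, 1.0]))   # x = a_s*y + a_d*eps
```

The signal direction is identical in both cases, yet the filter changes from (1, -1) to (1, 0.5) because only the distractor moved.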
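3.2 Exploring the linear model - No distractor, additive isotropic Gaussian noise
Why isotropic Gaussian noise is chosen
- Zero mean: a noise mean can always be compensated by the bias, so it is simply set to 0.
- The noise has no correlation or structure, so learning the weight vector well does not remove it.
- Adding Gaussian noise has the same effect as L2 regularization, i.e. it shrinks the weights.
Under these conditions, the smallest weight vector that satisfies $w^T a_s = 1$ points in the same direction as $a_s$.
< Gaussian pattern >
(Figure: $a_s$, $w$ and the constraint $w^T a_s = 1$; w is aligned with $a_s$.)
< Gaussian noise & L2 regularization >
Model: $y_n = \beta x_n + \varepsilon$
Likelihood: $\prod_{n=1}^{N} \mathcal{N}(y_n \mid \beta x_n, \sigma^2)$
Posterior (up to normalization): $\prod_{n=1}^{N} \mathcal{N}(y_n \mid \beta x_n, \sigma^2)\, \mathcal{N}(\beta \mid 0, \lambda^{-1})$
Logarithm: $-\frac{1}{2\sigma^2} \sum_{n=1}^{N} (y_n - \beta x_n)^2 - \frac{\lambda}{2} \beta^2 + \text{const}$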
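A small simulation can illustrate both statements: with isotropic noise and no distractor, the fitted filter lines up with a_s, and the noise shrinks it. The signal direction, noise level and sample size below are made-up values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
a_s = np.array([2.0, 1.0])                    # assumed signal direction (illustrative)
y = rng.uniform(-1.0, 1.0, size=n)
noise = rng.normal(0.0, 0.5, size=(n, 2))     # isotropic Gaussian noise, no structure
x = np.outer(y, a_s) + noise

# Least-squares filter (no bias term needed, the data is zero mean).
w, *_ = np.linalg.lstsq(x, y, rcond=None)

cosine = w @ a_s / (np.linalg.norm(w) * np.linalg.norm(a_s))
gain = w @ a_s                                # would be 1 without noise
```

`cosine` is close to 1, so w points along a_s, while `gain` stays below 1, which is the shrinkage effect the slide compares to L2 regularization.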
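4. Approaches
Functions
Explain the function that maps data x to output y. ex) gradients, saliency map
Differentiate y with respect to x to see how a change in the input changes the output.
This means looking at the model gradient, which for a linear model is just the weight:
$y = w^T x$, $\partial y / \partial x = w$
Signal (PatternNet)
Signal: the component of the data that activates the model's neurons.
Observe the change by backpropagating the gradient from the output to the input space.
DeConvNet and Guided BackProp cannot guarantee that what they propagate is the signal.
Attribution (PatternAttribution)
An indicator of how much a given signal contributes to the output.
In a linear model it is the element-wise product of the signal and the weight vector.
The deep Taylor decomposition paper decomposes the activation into contributions of the inputs; LRP calls this relevance.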
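5. Quality criterion for signal estimator
Derivation
$w^T x = y$
$w^T (s + d) = y$  (since $x = s + d$)
$w^T s + w^T d = y$
$w^T s = y$  (since $w^T d = 0$)
Formally inverting $w^T$: $(w^T)^{-1} w^T s = (w^T)^{-1} y$
$s = u u^{-1} (w^T)^{-1} y = u (w^T u)^{-1} y$
where u is a random vector with $w^T u \neq 0$.
Ill-posed problem: it cannot be solved as it stands, so a different approach is needed.
Quality measure
$S(x) = \hat{s}$, $\hat{d} = x - S(x)$, $y = w^T x$
$\rho(S) = 1 - \max_v \operatorname{corr}\left(w^T x, v^T (x - S(x))\right) = 1 - \max_v \frac{v^T \operatorname{cov}[y, \hat{d}]}{\sqrt{\sigma^2_{v^T \hat{d}}\, \sigma^2_y}}$
A good signal estimator drives the correlation to 0 -> $\rho(S)$ becomes large.
w is assumed to be the weight of an already well-trained model.
Because correlation is scale invariant, a constraint is added: the variance of $v^T \hat{d}$ equals the variance of y.
With S(x) fixed, the optimal v is found by least-squares regression from $\hat{d}$ to y.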
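The criterion can be evaluated on the toy linear model from section 3.1. This is a hedged sketch: `rho` fits v by least squares, the noise level is a made-up value, and the "oracle" signal is only available because the data is synthetic.

```python
import numpy as np

def rho(x, y, s_hat):
    """Quality criterion: 1 - max_v corr(y, v^T d_hat), with v fit by least squares."""
    d_hat = x - s_hat
    d_c = d_hat - d_hat.mean(axis=0)
    y_c = y - y.mean()
    v, *_ = np.linalg.lstsq(d_c, y_c, rcond=None)
    proj = d_c @ v
    norm_p, norm_y = np.linalg.norm(proj), np.linalg.norm(y_c)
    if norm_p <= 1e-9 * norm_y:       # nothing in d_hat predicts y
        return 1.0
    return 1.0 - (proj @ y_c) / (norm_p * norm_y)

rng = np.random.default_rng(0)
n = 5000
a_s, a_d, w = np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([1.0, -1.0])
y = rng.uniform(-1.0, 1.0, size=n)
eps = rng.normal(0.0, 0.5, size=n)
x = np.outer(y, a_s) + np.outer(eps, a_d)

s_oracle = np.outer(y, a_s)                 # true signal, known only in this toy setup
s_filter = np.outer(x @ w, w) / (w @ w)     # filter-based estimate along w

rho_oracle = rho(x, y, s_oracle)
rho_filter = rho(x, y, s_filter)
```

The oracle signal leaves a residual that is almost uncorrelated with y, so its score is close to 1, while the filter-based estimate leaves part of y in the residual and scores clearly lower.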
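6.1 Existing signal estimators
The identity estimator
$S_x(x) = x$
- Assumes the data contains no distractor, only signal.
- If the data is an image, the signal is the image itself.
- Attribution for a simple linear model (a distractor, if present, is included in the attribution):
$r = w \odot x = w \odot s + w \odot d$
- Real data does contain a distractor. It is removed in the forward pass, but the element-wise product keeps it in the backward pass, so the visualization looks noisy (LRP).
The filter based estimator
$S_w(x) = \frac{w}{w^T w} w^T x$
- Assumes the observed signal lies in the direction of the weight.
- ex) DeConvNet, Guided BackProp
- The weight has to be normalized.
- The attribution for a linear model is as follows, and the signal is not reconstructed properly:
$r = \frac{w \odot w}{w^T w} y$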
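Both existing estimators are one-liners for a single linear neuron. The sketch below applies them to one hand-picked sample from the toy model (y = 0.5, eps = 0.3, so the true signal is (0.5, 0)); the function names are illustrative.

```python
import numpy as np

def identity_estimator(x, w):
    """S_x(x) = x: treat the whole input as signal."""
    return x

def filter_estimator(x, w):
    """S_w(x) = w (w^T x) / (w^T w): assume the signal lies along the weight."""
    return w * (w @ x) / (w @ w)

w = np.array([1.0, -1.0])
x = np.array([0.8, 0.3])      # a_s*y + a_d*eps with y = 0.5, eps = 0.3
y = w @ x

r_identity = w * identity_estimator(x, w)   # w ⊙ x
r_filter = w * filter_estimator(x, w)       # (w ⊙ w) / (w^T w) * y
```

Both attributions sum to y = 0.5, but neither matches the true signal (0.5, 0): the identity estimator spreads relevance onto the distractor, and the filter estimator splits it evenly across both inputs.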
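6.2 PatternNet & PatternAttribution
Proposed approach
Learn the estimator by optimizing the criterion $\rho$.
A signal estimator S is optimal when the correlation between y and $\hat{d}$ is 0 for every possible vector v.
For a linear model this means the covariance between y and $\hat{d}$ is 0:
$\operatorname{cov}[y, \hat{d}] = 0$
$\operatorname{cov}[y, x] - \operatorname{cov}[y, S(x)] = 0$
$\operatorname{cov}[y, x] = \operatorname{cov}[y, S(x)]$
Quality measure
$\rho(S) = 1 - \max_v \operatorname{corr}\left(w^T x, v^T (x - S(x))\right) = 1 - \max_v \frac{v^T \operatorname{cov}[y, \hat{d}]}{\sqrt{\sigma^2_{v^T \hat{d}}\, \sigma^2_y}}$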
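6.2 PatternNet & PatternAttribution
The linear estimator
A linear neuron can only extract a linear signal from the data x, so the signal is obtained as a linear function of y:
$S_a(x) = a w^T x = a y$
For a linear model the covariance between y and $\hat{d}$ is 0, therefore
$\operatorname{cov}[x, y] = \operatorname{cov}[S(x), y] = \operatorname{cov}[a w^T x, y] = a \operatorname{cov}[y, y]$
$\Rightarrow a = \frac{\operatorname{cov}[x, y]}{\sigma_y^2}$
If d and s are orthogonal, this coincides with filter-based methods such as DeConvNet.
It works very well on convolution layers.
In FC layers, where a ReLU is attached, the correlation cannot be removed completely, so the criterion score is low there.
(Figure: criterion compared on VGG16; bars in order: random, S_w, S_a, S_a+-)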
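The formula for a needs only samples of x and y. The sketch below estimates it on synthetic data from the toy model of section 3.1 (the sample size and noise level are made-up values) and compares it with the filter w; `linear_pattern` is an illustrative name.

```python
import numpy as np

def linear_pattern(x, y):
    """a = cov[x, y] / var[y], estimated from samples (rows of x)."""
    x_c = x - x.mean(axis=0)
    y_c = y - y.mean()
    return (x_c * y_c[:, None]).mean(axis=0) / (y_c ** 2).mean()

rng = np.random.default_rng(0)
n = 100000
a_s, a_d, w = np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([1.0, -1.0])
y = rng.uniform(-1.0, 1.0, size=n)
eps = rng.normal(0.0, 0.5, size=n)
x = np.outer(y, a_s) + np.outer(eps, a_d)

a = linear_pattern(x, x @ w)
signal_estimate = np.outer(x @ w, a)     # S_a(x) = a * y for every sample
```

The estimated pattern is close to the signal direction a_s = (1, 0), whereas the filter w = (1, -1) points elsewhere, and w^T a equals 1 by construction.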
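6.2 PatternNet & PatternAttribution
The two-component (Non-linear) estimator
It uses the same trick as the linear estimator above, but handles the two signs of y separately.
Information about inputs for which the neuron is not active is also present in the distractor, so the part where y is negative is considered as well.
Because of the ReLU only the positive domain is updated locally, and this estimator corrects for that.
The covariance is computed separately for each sign and combined with weights:
$S_{a+-}(x) = \begin{cases} a_+ w^T x & \text{if } w^T x > 0 \\ a_- w^T x & \text{otherwise} \end{cases}$
$x = \begin{cases} s_+ + d_+ & \text{if } y > 0 \\ s_- + d_- & \text{otherwise} \end{cases}$
$\operatorname{cov}(x, y) = E[xy] - E[x]E[y]$
$\operatorname{cov}(x, y) = \pi_+ \left(E_+[xy] - E_+[x]E[y]\right) + (1 - \pi_+)\left(E_-[xy] - E_-[x]E[y]\right)$
$\operatorname{cov}(s, y) = \pi_+ \left(E_+[sy] - E_+[s]E[y]\right) + (1 - \pi_+)\left(E_-[sy] - E_-[s]E[y]\right)$
Here $\pi_+$ is the fraction of inputs with y > 0.
Setting cov(x, y) equal to cov(s, y) and solving for the positive part gives
$a_+ = \frac{E_+[xy] - E_+[x]E[y]}{w^T E_+[xy] - w^T E_+[x]E[y]}$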
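The formula for a_+ (and its mirror image for a_-) translates directly into sample averages. This is a minimal sketch for a single neuron on random made-up data; the function names are hypothetical, and a_- is assumed to follow the same formula on the samples with y <= 0.

```python
import numpy as np

def two_component_patterns(x, w):
    """Estimate a_plus and a_minus from samples x (rows) for a neuron with weight w."""
    y = x @ w
    pos = y > 0
    e_y = y.mean()                                   # E[y] over all samples

    def pattern(mask):
        e_xy = (x[mask] * y[mask, None]).mean(axis=0)   # E_+[xy] or E_-[xy]
        e_x = x[mask].mean(axis=0)                      # E_+[x]  or E_-[x]
        numerator = e_xy - e_x * e_y
        return numerator / (w @ numerator)

    return pattern(pos), pattern(~pos)

def two_component_signal(x, w, a_plus, a_minus):
    """S_{a+-}(x): use a_plus where w^T x > 0 and a_minus otherwise."""
    y = (x @ w)[:, None]
    return np.where(y > 0, y * a_plus, y * a_minus)

rng = np.random.default_rng(0)
x = rng.normal(size=(5000, 3))
w = np.array([0.5, -1.0, 2.0])
a_plus, a_minus = two_component_patterns(x, w)
s_hat = two_component_signal(x, w, a_plus, a_minus)
```

By construction w^T a_+ = w^T a_- = 1, so applying the filter to the estimated signal returns the original output y.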
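6.2 PatternNet & PatternAttribution
PatternNet and PatternAttribution
PatternNet, Linear
- Assumes cov[x, y] and cov[s, y] are equal.
- a can be computed from x and y alone.
$a = \frac{\operatorname{cov}[x, y]}{\sigma_y^2}$
PatternNet, Non-linear
- Takes the behavior of the ReLU activation into account.
- Computes a separately for the positive and negative parts.
- Performs well on non-linear models.
$a_+ = \frac{E_+[xy] - E_+[x]E[y]}{w^T E_+[xy] - w^T E_+[x]E[y]}$
PatternAttribution
- Multiplies a element-wise with w, which gives heat maps that are cleaner and easier to interpret.
$r = w \odot a_+$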
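For a single linear neuron the attribution weights are just the element-wise product of w and the pattern. The sketch below uses the linear pattern a instead of a_+ to stay short, on synthetic data from the toy model of section 3.1; all values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100000
a_s, a_d, w = np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([1.0, -1.0])
y = rng.uniform(-1.0, 1.0, size=n)
eps = rng.normal(0.0, 0.5, size=n)
x = np.outer(y, a_s) + np.outer(eps, a_d)
out = x @ w                                   # neuron output, equals y here

# Linear pattern a = cov[x, out] / var[out] (stand-in for a_+ in this sketch).
a = ((x - x.mean(axis=0)) * (out - out.mean())[:, None]).mean(axis=0) / out.var()

r_weights = w * a                             # w ⊙ a
attribution = out[:, None] * r_weights        # per-sample attribution
naive = x * w                                 # w ⊙ x, the identity-estimator attribution
```

`r_weights` is close to (1, 0), so the attribution goes almost entirely to the input dimension that carries the signal, whereas `naive` also assigns relevance to the distractor dimension.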
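7. Experiments
Quantitative comparison, VGG16 on ImageNet(S)
Convolution layer
- The linear estimator proposed in the paper is simple yet performs well.
- The paper's non-linear estimator also outperforms the filter-based and random baselines.
FC layer with ReLU
- The performance of the linear estimator drops sharply.
- The non-linear estimator maintains its performance.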
7. Experiments
Qualitative evaluation
Comparison of visualizations for a specific image
Methods
- S_x : Identity estimator
- S_w : DeConvNet, Guided BackProp
- S_a : Linear
- S_a+- : Non-linear
Moving to the right, the methods capture the structure in finer and sharper detail.
Thank you.
PatternNet & PatternAttribution