This document discusses variational Bayes methods for deep learning, including:
1. Variational Bayes approximates the posterior distribution p(Z|X) with a variational distribution q(Z;ξ) by minimizing the Kullback-Leibler divergence KL[q(Z;ξ) ‖ p(Z|X)].
2. For linear dimensionality reduction, it presents a model with observation model p(X|Z,W) and priors p(Z) and p(W), and approximates the posterior p(Z,W|X) with the factorized variational posterior q(Z)q(W).
3. It derives the evidence lower bound L, the objective to maximize, which consists of the expected log-likelihood minus the KL divergences of the variational posteriors from their priors.
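The decomposition behind items 1 and 3, ln p(X) = L + KL[q ‖ p(Z|X)], can be checked numerically on a toy model. The sketch below assumes a 1-D conjugate model, p(z) = N(z|0,1) and p(x|z) = N(x|z,1), chosen only because its posterior and evidence have closed forms; the function names `kl_gauss`, `elbo`, and `log_evidence` are illustrative. L is written as the expected log-likelihood minus KL[q(z) ‖ p(z)].

```python
import numpy as np

# Toy conjugate model (an assumption for illustration only):
#   prior       p(z)   = N(z | 0, 1)
#   likelihood  p(x|z) = N(x | z, 1)
# Exact posterior p(z|x) = N(z | x/2, 1/2); evidence p(x) = N(x | 0, 2).

def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for 1-D Gaussians, with v the variance."""
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def elbo(x, m, v):
    """ELBO for q(z) = N(m, v): E_q[ln p(x|z)] - KL(q(z) || p(z))."""
    expected_loglik = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m) ** 2 + v)
    return expected_loglik - kl_gauss(m, v, 0.0, 1.0)

def log_evidence(x):
    """ln N(x | 0, 2), the exact log marginal likelihood."""
    return -0.5 * np.log(2 * np.pi * 2.0) - x ** 2 / 4.0

x = 1.3
for m, v in [(0.0, 1.0), (0.65, 0.5), (-1.0, 0.2)]:
    gap = kl_gauss(m, v, x / 2, 0.5)  # KL(q || exact posterior)
    # ELBO + gap equals ln p(x) for every q
    print(m, v, elbo(x, m, v) + gap, log_evidence(x))
```

Because the KL term is non-negative, L never exceeds ln p(x), and the gap closes exactly when q equals the posterior N(x/2, 1/2).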
3. Linear dimensionality reduction
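Observation model
$$p(X \mid Z, W) = \prod_{n=1}^{N} p(x_n \mid z_n, W) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid W z_n, \sigma_x^2 I)$$
Prior
$$p(Z) = \prod_{n=1}^{N} \mathcal{N}(z_n \mid 0, I), \qquad p(W) = \prod_{i,j} \mathcal{N}(w_{ij} \mid 0, \sigma_w^2)$$
Variational posterior
$$p(Z, W \mid X) \approx q(Z)\, q(W)$$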
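A minimal sketch of coordinate-ascent updates for this model under the mean-field assumption q(Z)q(W). It assumes σ_x and σ_w are fixed, known hyperparameters, stores X as an N×D array (row n is x_n), and uses the fact that the optimal q(z_n) are Gaussians sharing one covariance while the rows of W get Gaussian factors sharing another. The function name `vb_linear_dimred` and the synthetic data are hypothetical.

```python
import numpy as np

def vb_linear_dimred(X, M, sigma_x=0.1, sigma_w=1.0, n_iter=100, seed=0):
    """Mean-field VB for x_n ~ N(W z_n, sigma_x^2 I), z_n ~ N(0, I), w_ij ~ N(0, sigma_w^2).

    X: (N, D). Returns the means of q(Z) (N, M) and q(W) (D, M), the covariance
    Sigma_z shared by every q(z_n), and the covariance Sigma_w shared by every row of W.
    """
    N, D = X.shape
    rng = np.random.default_rng(seed)
    W_mean = rng.normal(size=(D, M))  # random init breaks the rotational symmetry
    Sigma_w = np.eye(M)
    prec_x = 1.0 / sigma_x**2
    for _ in range(n_iter):
        # q(Z): needs E[W^T W] = sum_d E[w_d w_d^T] = D Sigma_w + W_mean^T W_mean
        EWtW = D * Sigma_w + W_mean.T @ W_mean
        Sigma_z = np.linalg.inv(prec_x * EWtW + np.eye(M))
        Z_mean = prec_x * X @ W_mean @ Sigma_z  # row n: prec_x Sigma_z W_mean^T x_n
        # q(W): needs sum_n E[z_n z_n^T] = N Sigma_z + Z_mean^T Z_mean
        EZtZ = N * Sigma_z + Z_mean.T @ Z_mean
        Sigma_w = np.linalg.inv(prec_x * EZtZ + np.eye(M) / sigma_w**2)
        W_mean = prec_x * X.T @ Z_mean @ Sigma_w  # row d: prec_x Sigma_w sum_n x_dn E[z_n]
    return Z_mean, W_mean, Sigma_z, Sigma_w

# Synthetic data drawn from the model itself (hypothetical sizes)
rng = np.random.default_rng(1)
N, D, M = 200, 5, 2
W_true = rng.normal(size=(D, M))
Z_true = rng.normal(size=(N, M))
X = Z_true @ W_true.T + 0.1 * rng.normal(size=(N, D))

Z_mean, W_mean, _, _ = vb_linear_dimred(X, M)
recon_mse = np.mean((X - Z_mean @ W_mean.T) ** 2)  # should sit near the noise level 0.01
```

The factorized posterior recovers W only up to a rotation, so the check compares the reconstruction E[Z]E[W]^⊤ with the data rather than W with W_true.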
6. Gaussian mixture model
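Observation model
$$p(X \mid S, W) = \prod_{n=1}^{N} p(x_n \mid s_n, W) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid W s_n, \sigma_x^2 I), \qquad s_n \in \{0,1\}^K, \quad \sum_{k=1}^{K} s_{n,k} = 1$$
Because s_n is one-hot, W s_n selects one column of W, so the k-th column is the mean of cluster k.
Prior
$$p(S) = \prod_{n=1}^{N} \mathrm{Cat}(s_n \mid \pi)$$
Variational posterior
$$p(S, W \mid X) \approx q(S)\, q(W)$$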
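The slide lists only p(S); the sketch below additionally assumes the Gaussian prior on W from the linear model, w_ij ~ N(0, σ_w²), and treats π as known and fixed. Under these assumptions the mean-field factors have closed forms: q(s_n) is categorical with responsibilities η_n, and q(W) factorizes over the columns w_k as N(w_k | m_k, v_k I). The name `vb_gmm`, the farthest-first initialization, and the test data are illustrative choices, not from the source.

```python
import numpy as np
from scipy.special import logsumexp

def vb_gmm(X, K, pi=None, sigma_x=1.0, sigma_w=10.0, n_iter=50):
    """Mean-field VB for x_n ~ N(W s_n, sigma_x^2 I), s_n ~ Cat(pi), w_ij ~ N(0, sigma_w^2).

    Assumes pi is known and fixed. Returns responsibilities eta (N, K),
    means m (K, D) and variances v (K,) of q(w_k) = N(m_k, v_k I).
    """
    N, D = X.shape
    pi = np.full(K, 1.0 / K) if pi is None else np.asarray(pi)
    # Farthest-first initialization (deterministic; keeps initial means apart)
    idx = [0]
    for _ in range(1, K):
        d2 = np.min(((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1), axis=1)
        idx.append(int(np.argmax(d2)))
    m = X[idx].copy()  # row k: mean of q(w_k)
    v = np.zeros(K)    # variances of q(w_k), zero before the first update
    for _ in range(n_iter):
        # q(S): ln eta_nk = ln pi_k - E||x_n - w_k||^2 / (2 sigma_x^2) + const,
        # with E||x_n - w_k||^2 = ||x_n - m_k||^2 + D v_k
        sq = ((X[:, None, :] - m[None, :, :]) ** 2).sum(-1) + D * v[None, :]
        log_eta = np.log(pi)[None, :] - sq / (2 * sigma_x**2)
        eta = np.exp(log_eta - logsumexp(log_eta, axis=1, keepdims=True))
        # q(W): Gaussian posterior of each cluster mean given soft counts
        Nk = eta.sum(axis=0)
        v = 1.0 / (Nk / sigma_x**2 + 1.0 / sigma_w**2)
        m = (v / sigma_x**2)[:, None] * (eta.T @ X)
    return eta, m, v

# Three well-separated synthetic clusters (hypothetical data)
rng = np.random.default_rng(0)
true_means = np.array([[-5.0, -5.0], [5.0, 5.0], [5.0, -5.0]])
X = np.vstack([mu + rng.normal(size=(100, 2)) for mu in true_means])
eta, m, v = vb_gmm(X, K=3)
```

Learning π as well would add a Dirichlet factor q(π) and replace ln π_k with its expectation; that extension is not shown on the slide.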
7. Laplace approximation
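$$p(Z \mid X) \approx \mathcal{N}\big(Z \mid Z_{\mathrm{MAP}}, \Lambda(Z_{\mathrm{MAP}})^{-1}\big), \qquad \Lambda(Z) = -\nabla_Z^2 \ln p(Z \mid X)$$
Quadratic approximation of the posterior around Z_MAP:
$$\ln p(Z \mid X) \approx \ln p(Z_{\mathrm{MAP}} \mid X) + \frac{1}{2} (Z - Z_{\mathrm{MAP}})^{\top} \left. \nabla_Z^2 \ln p(Z \mid X) \right|_{Z = Z_{\mathrm{MAP}}} (Z - Z_{\mathrm{MAP}})$$
The first-order term vanishes because Z_MAP is a stationary point:
$$\left. \nabla_Z \ln p(Z \mid X) \right|_{Z = Z_{\mathrm{MAP}}} = 0$$
Exponentiating the quadratic gives a Gaussian with mean Z_MAP and precision Λ(Z_MAP).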
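A generic sketch of the Laplace approximation: Newton's method locates Z_MAP where the gradient vanishes, and the negative Hessian there gives the precision Λ(Z_MAP). It assumes the caller supplies the gradient and Hessian of ln p(Z|X) for a vectorized Z and that the Hessian stays negative definite along the Newton path; `laplace_approx` is a hypothetical helper. The usage example takes a Gamma-shaped log-posterior (a−1) ln z − b z, whose mode and curvature are known analytically.

```python
import numpy as np

def laplace_approx(grad, hess, z0, n_iter=50, tol=1e-10):
    """Return (z_map, covariance) of the Laplace approximation N(z | z_map, Lambda^{-1}).

    grad(z), hess(z): gradient and Hessian of ln p(z|X), up to an additive constant.
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(n_iter):
        step = np.linalg.solve(hess(z), grad(z))  # Newton step toward grad = 0
        z = z - step
        if np.linalg.norm(step) < tol:
            break
    Lam = -hess(z)  # precision Lambda(z_map) = -Hessian at the mode
    return z, np.linalg.inv(Lam)

# Gamma-shaped log-posterior: ln p(z|X) = (a-1) ln z - b z + const, z > 0
a, b = 5.0, 2.0
grad = lambda z: np.array([(a - 1) / z[0] - b])
hess = lambda z: np.array([[-(a - 1) / z[0] ** 2]])
z_map, cov = laplace_approx(grad, hess, z0=[1.0])
```

For a = 5, b = 2 the mode is (a−1)/b = 2 and the Laplace variance is z_MAP²/(a−1) = 1.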
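9. Assumed density filtering
Consider estimation for a sequence of data 𝒟_1, 𝒟_2, …
With a conjugate prior, the exact recursive update
$$p_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, p_i(\theta)$$
keeps p_i(θ) (i = 0, 1, …) in the same family.
With a non-conjugate prior, the update leaves the family, so each step is approximated:
$$q_{i+1}(\theta) \approx r_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, q_i(\theta) = Z_{i+1}^{-1}\, f_{i+1}(\theta)\, q_i(\theta), \qquad q_0(\theta) = p_0(\theta)$$
Moment matching: q_{i+1}(θ) is chosen within the family so that its moments match those of r_{i+1}(θ).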
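The ADF recursion with moment matching can be sketched for a scalar θ and a Gaussian family. This version computes the mean and variance of r_{i+1} by brute-force quadrature on a grid, an assumption that only scales to very low dimensions; `adf_1d` and its grid parameters are hypothetical. As a sanity check it uses Gaussian factors, the conjugate case, where the projection loses nothing and ADF must reproduce the exact posterior.

```python
import numpy as np

def adf_1d(factors, mu0, v0, n_grid=4001, width=10.0):
    """ADF with q_i(theta) = N(mu_i, v_i); factors are callables f_i(theta) = p(D_i | theta)."""
    mu, v = mu0, v0
    for f in factors:
        sd = np.sqrt(v)
        theta = np.linspace(mu - width * sd, mu + width * sd, n_grid)
        q = np.exp(-0.5 * (theta - mu) ** 2 / v) / np.sqrt(2 * np.pi * v)
        w = f(theta) * q                   # unnormalized r_{i+1} on the grid
        dtheta = theta[1] - theta[0]
        Z = w.sum() * dtheta               # Riemann-sum estimate of Z_{i+1}
        mu_new = (w * theta).sum() * dtheta / Z             # match the mean
        v = (w * (theta - mu_new) ** 2).sum() * dtheta / Z  # match the variance
        mu = mu_new
    return mu, v

# Conjugate check: f_i(theta) = N(y_i | theta, s2) with prior N(0, 1)
s2 = 0.25
ys = [0.3, -0.1, 0.8, 0.5]
factors = [lambda t, y=y: np.exp(-0.5 * (y - t) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
           for y in ys]
mu, v = adf_1d(factors, mu0=0.0, v0=1.0)
# Exact posterior: precision 1 + 4/0.25 = 17, mean (1.5/0.25)/17 = 6/17
```

With a non-Gaussian f_i, the same loop performs a genuine projection, and the result then depends on the order of the data.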
10. MM with 1-dim Gaussian distribution
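$$q_i(\theta) = \mathcal{N}(\theta \mid \mu_i, v_i)$$
Normalization constant:
$$Z_{i+1} = \int f_{i+1}(\theta)\, \frac{1}{\sqrt{2\pi v_i}} \exp\!\left(-\frac{(\theta - \mu_i)^2}{2 v_i}\right) d\theta$$
Differentiate ln Z_{i+1} w.r.t. μ_i:
$$\frac{\partial}{\partial \mu_i} \ln Z_{i+1} = \frac{1}{Z_{i+1}} \int f_{i+1}(\theta)\, \mathcal{N}(\theta \mid \mu_i, v_i)\, \frac{\theta - \mu_i}{v_i}\, d\theta = \frac{\mathbb{E}_{r_{i+1}}[\theta] - \mu_i}{v_i}$$
$$\Rightarrow \quad \mu_{i+1} = \mathbb{E}_{r_{i+1}}[\theta] = \mu_i + v_i \frac{\partial}{\partial \mu_i} \ln Z_{i+1}$$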
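The identity μ_{i+1} = μ_i + v_i ∂ ln Z_{i+1}/∂μ_i can be verified numerically. The sketch assumes a single probit-type factor f(θ) = Φ(θ), for which Z = Φ(μ/√(1+v)) in closed form, and compares the mean obtained by differentiating ln Z (via central differences) with E_r[θ] computed directly by quadrature. The function names and the values μ = −0.5, v = 2 are illustrative.

```python
import numpy as np
from scipy.stats import norm

def log_Z(mu, v):
    """ln Z for f(theta) = Phi(theta), q(theta) = N(mu, v): Z = Phi(mu / sqrt(1 + v))."""
    return norm.logcdf(mu / np.sqrt(1.0 + v))

def mm_mean_from_logZ(mu, v, eps=1e-5):
    """mu_{i+1} = mu_i + v_i * d ln Z / d mu_i, derivative by central differences."""
    dlogZ = (log_Z(mu + eps, v) - log_Z(mu - eps, v)) / (2 * eps)
    return mu + v * dlogZ

def mm_mean_by_quadrature(mu, v, n_grid=20001):
    """E_r[theta] with r proportional to Phi(theta) N(theta | mu, v), by a Riemann sum."""
    theta = np.linspace(mu - 12 * np.sqrt(v), mu + 12 * np.sqrt(v), n_grid)
    w = norm.cdf(theta) * norm.pdf(theta, loc=mu, scale=np.sqrt(v))
    return (w * theta).sum() / w.sum()

mu, v = -0.5, 2.0
print(mm_mean_from_logZ(mu, v), mm_mean_by_quadrature(mu, v))  # the two should agree
```

The derivative route needs only ln Z_{i+1} as a function of μ_i, which is convenient whenever Z_{i+1} has a closed form, as in the probit case.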
13. MM for probit regression
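Model (written here for labels y_n ∈ {−1, +1}, with Φ the standard normal CDF):
$$p(Y \mid X, w) = \prod_{n=1}^{N} p(y_n \mid x_n, w) = \prod_{n=1}^{N} \Phi(y_n\, x_n^{\top} w), \qquad p(w) = \mathcal{N}(w \mid 0, v_0 I)$$
The marginal likelihood is intractable:
$$Z = \int p(Y \mid X, w)\, p(w)\, dw$$
Instead, apply the recursive ADF update with θ = w:
$$q_{i+1}(\theta) \approx Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1} \mid \theta)\, q_i(\theta)$$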
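A sketch of ADF for probit regression with a full-covariance Gaussian q_i(w) = N(w | m, V), processing one observation per step. It assumes labels y_n ∈ {−1, +1} and likelihood Φ(y_n x_n^⊤ w). Because the factor depends on w only through x_n^⊤ w, Z_{i+1} = Φ(z) with z = y_n x_n^⊤ m / √(1 + x_n^⊤ V x_n). The mean update is the multivariate form of the 1-D result, m + V ∇_m ln Z_{i+1}, and the covariance update is the corresponding second-moment matching formula. `adf_probit` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.special import log_ndtr
from scipy.stats import norm

def adf_probit(X, y, v0=1.0):
    """ADF for p(y_n | x_n, w) = Phi(y_n x_n^T w), y_n in {-1, +1}, prior N(w | 0, v0 I)."""
    N, D = X.shape
    m = np.zeros(D)
    V = v0 * np.eye(D)
    for x_n, y_n in zip(X, y):
        Vx = V @ x_n
        s = x_n @ Vx                                  # variance of x_n^T w under q_i
        z = y_n * (x_n @ m) / np.sqrt(1.0 + s)        # Z_{i+1} = Phi(z)
        ratio = np.exp(norm.logpdf(z) - log_ndtr(z))  # phi(z) / Phi(z), computed stably
        alpha = y_n * ratio / np.sqrt(1.0 + s)        # d ln Z / d (x_n^T m)
        beta = ratio * (ratio + z) / (1.0 + s)        # shrinkage of variance along x_n
        m = m + alpha * Vx                            # m + V grad_m ln Z
        V = V - beta * np.outer(Vx, Vx)               # moment-matched covariance
    return m, V

# Noise-free synthetic labels (hypothetical data) for a clean check
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(500, 2))
y = np.sign(X @ w_true)
m, V = adf_probit(X, y)
cos = m @ w_true / (np.linalg.norm(m) * np.linalg.norm(w_true))
acc = np.mean(np.sign(X @ m) == y)
```

The ratio φ(z)/Φ(z) is computed in log space with `scipy.special.log_ndtr`, which avoids underflow of Φ(z) for strongly misclassified points. Unlike the exact posterior, the ADF result depends on the order in which the data are processed.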