Bayesian Deep Learning
Sections 4.2–4.6
Variational Bayes
\[
\begin{aligned}
D_{\mathrm{KL}}\!\left[q(Z;\xi)\,\|\,p(Z\mid X)\right]
&= -\int q(Z;\xi)\,\ln\frac{p(Z\mid X)}{q(Z;\xi)}\,dZ
 = -\int q(Z;\xi)\,\ln\frac{p(Z,X)}{q(Z;\xi)\,p(X)}\,dZ \\
&= \ln p(X) - \int q(Z;\xi)\,\ln\frac{p(Z,X)}{q(Z;\xi)}\,dZ
 = \ln p(X) - \mathcal{L}(\xi)
\end{aligned}
\]
• Maximize ELBO ⇔ Minimize KL divergence
• Normalization constant not required for computing the ELBO
\[
D_{\mathrm{KL}}(q\,\|\,p) \ge 0
\;\Rightarrow\;
\ln p(X) \ge \mathcal{L}(\xi)
\qquad \text{Evidence lower bound (ELBO)}
\]
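As a numeric sanity check, here is a minimal sketch on a toy conjugate model of my choosing (not from the slides): with z ~ N(0, 1) and x | z ~ N(z, 1), the evidence ln p(x) = ln N(x; 0, 2) is available in closed form, so a Monte Carlo estimate of the ELBO can be compared against it directly.

```python
# Toy check that ln p(x) >= ELBO for any variational q(z; xi) = N(m, s^2).
# Model (illustrative): z ~ N(0, 1), x | z ~ N(z, 1), so p(x) = N(x; 0, 2).
import numpy as np
from scipy.stats import norm

x = 1.3                                   # a single observation
m, s = 0.2, 0.9                           # variational parameters xi = (m, s)

rng = np.random.default_rng(0)
z = rng.normal(m, s, size=200_000)        # samples from q(z; xi)

# Monte Carlo ELBO: E_q[ln p(x, z) - ln q(z; xi)]
log_joint = norm.logpdf(x, z, 1.0) + norm.logpdf(z, 0.0, 1.0)
elbo = np.mean(log_joint - norm.logpdf(z, m, s))

log_evidence = norm.logpdf(x, 0.0, np.sqrt(2.0))
print(f"ln p(x) = {log_evidence:.4f} >= ELBO = {elbo:.4f}")
# The gap is exactly KL[q || p(z|x)]; it vanishes at m = x/2, s^2 = 1/2.
```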
Linear dimensionality reduction
Observation model
\[
p(X\mid Z,W) = \prod_{n=1}^{N} p(x_n \mid z_n, W)
= \prod_{n=1}^{N} \mathcal{N}\!\left(x_n \mid W z_n,\, \sigma_x^2 I\right)
\]
Prior
\[
p(Z) = \prod_{n=1}^{N} \mathcal{N}(z_n \mid 0, I),
\qquad
p(W) = \prod_{i,j} \mathcal{N}(w_{ij} \mid 0, \sigma_w^2)
\]
Variational posterior
\[
p(Z, W \mid X) \approx q(Z)\,q(W)
\]
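A short sketch of sampling from this model; the dimensions and noise scales below are illustrative assumptions, not values from the slides.

```python
# Generate data from the linear dimensionality reduction model:
# X = W Z + noise, with Gaussian priors on Z and W.
import numpy as np

rng = np.random.default_rng(1)
N, D, M = 500, 10, 3                  # samples, observed dim, latent dim
sigma_x, sigma_w = 0.1, 1.0

W = rng.normal(0.0, sigma_w, size=(D, M))          # w_ij ~ N(0, sigma_w^2)
Z = rng.normal(0.0, 1.0, size=(M, N))              # z_n ~ N(0, I)
X = W @ Z + rng.normal(0.0, sigma_x, size=(D, N))  # x_n ~ N(W z_n, sigma_x^2 I)
```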
Objective
\[
\mathcal{L} = \mathbb{E}_{q(Z)q(W)}\!\left[\ln p(X\mid Z,W)\right]
- D_{\mathrm{KL}}\!\left[q(Z)\,\|\,p(Z)\right]
- D_{\mathrm{KL}}\!\left[q(W)\,\|\,p(W)\right]
\]
The first term is the expected likelihood; the KL terms act as regularizers.

Variational M-step: maximize \(\mathcal{L}_{q_i(Z)}\) w.r.t. \(q_{i+1}(W)\)
\[
\begin{aligned}
\mathcal{L}_{q_i(Z)}
&= \mathbb{E}_{q_i(Z)\,q_{i+1}(W)}\!\left[\ln p(X\mid Z,W)\right]
 - D_{\mathrm{KL}}\!\left[q_{i+1}(W)\,\|\,p(W)\right] + \text{const.} \\
&= \mathbb{E}_{q_{i+1}(W)}\!\left[\ln
   \frac{\exp\!\left(\mathbb{E}_{q_i(Z)}\!\left[\ln p(X\mid Z,W)\right]\right) p(W)}
        {q_{i+1}(W)}\right] + \text{const.} \\
&= -D_{\mathrm{KL}}\!\left[q_{i+1}(W)\,\|\,r_i(W)\right] + \text{const.}
\end{aligned}
\]
\[
\Rightarrow\;
q_{i+1}(W) = r_i(W) \propto
\exp\!\left(\mathbb{E}_{q_i(Z)}\!\left[\ln p(X\mid Z,W)\right]\right) p(W)
\]
Variational E-step: maximize \(\mathcal{L}_{q_i(W)}\) w.r.t. \(q_{i+1}(Z)\)
\[
q_{i+1}(Z) = r_i(Z) \propto
\exp\!\left(\mathbb{E}_{q_i(W)}\!\left[\ln p(X\mid Z,W)\right]\right) p(Z)
\]
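A sketch of how this alternation looks in code: a minimal mean-field loop for the linear dimensionality reduction model above, assuming Gaussian factors q(z_n) = N(m_n, S_z) and row-wise q(w_d) = N(m_d, S_w) with shared covariances, and known σ_x, σ_w. The closed-form updates are the standard mean-field results for a linear-Gaussian model; the function and variable names are mine.

```python
# Coordinate-ascent variational inference (E-step / M-step alternation)
# for X ~= W Z with Gaussian priors, as in the model above.
import numpy as np

def cavi_ldr(X, M, sigma_x, sigma_w, n_iter=50, seed=0):
    D, N = X.shape
    rng = np.random.default_rng(seed)
    Mw = rng.normal(size=(D, M))           # E[W], rows are E[w_d]
    Sw = np.eye(M)                         # shared row covariance of q(W)
    for _ in range(n_iter):
        # E-step: update q(Z) given the current moments of q(W)
        EWtW = Mw.T @ Mw + D * Sw          # E[W^T W]
        Sz = np.linalg.inv(np.eye(M) + EWtW / sigma_x**2)
        Mz = Sz @ Mw.T @ X / sigma_x**2    # column n = E[z_n]
        # M-step: update q(W) given the current moments of q(Z)
        EZZt = Mz @ Mz.T + N * Sz          # sum_n E[z_n z_n^T]
        Sw = np.linalg.inv(np.eye(M) / sigma_w**2 + EZZt / sigma_x**2)
        Mw = (X @ Mz.T / sigma_x**2) @ Sw  # row d = E[w_d]
    return Mz, Mw
```

With the X generated in the previous sketch, `cavi_ldr(X, 3, 0.1, 1.0)` should recover the latent subspace up to rotation, the usual indeterminacy of this model.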
Gaussian mixture model
Observation model
\[
p(X\mid S,W) = \prod_{n=1}^{N} p(x_n \mid s_n, W)
= \prod_{n=1}^{N} \mathcal{N}\!\left(x_n \mid W s_n,\, \sigma_x^2 I\right),
\qquad
s_n \in \{0,1\}^K,\;\; \sum_{k=1}^{K} s_{n,k} = 1
\]
Prior
\[
p(S) = \prod_{n=1}^{N} \mathrm{Cat}(s_n \mid \pi)
\]
Variational posterior
\[
p(S, W \mid X) \approx q(S)\,q(W)
\]
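A short sketch of sampling from this observation model and prior; K, D, and the parameter values are illustrative assumptions.

```python
# Generate data from the mixture model: each one-hot s_n picks a
# column of W as the component mean.
import numpy as np

rng = np.random.default_rng(2)
N, D, K = 300, 2, 3
sigma_x = 0.3
pi = np.array([0.5, 0.3, 0.2])            # mixture weights
W = rng.normal(size=(D, K))               # column k = mean of component k

k = rng.choice(K, size=N, p=pi)           # s_n ~ Cat(pi), as an index
S = np.eye(K)[k].T                        # (K, N), one-hot columns
X = W @ S + rng.normal(0.0, sigma_x, size=(D, N))  # x_n ~ N(W s_n, sigma_x^2 I)
```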
Laplace approximation
\[
p(Z\mid X) \approx
\mathcal{N}\!\left(Z \mid Z_{\mathrm{MAP}},\, \Lambda(Z_{\mathrm{MAP}})^{-1}\right),
\qquad
\Lambda(Z) = -\nabla_Z^2 \ln p(Z\mid X)
\]
Quadratic approximation of the log posterior around \(Z_{\mathrm{MAP}}\):
\[
\ln p(Z\mid X) \approx \ln p(Z_{\mathrm{MAP}}\mid X)
+ \frac{1}{2}\,(Z - Z_{\mathrm{MAP}})^{\top}
\left.\nabla_Z^2 \ln p(Z\mid X)\right|_{Z = Z_{\mathrm{MAP}}}
(Z - Z_{\mathrm{MAP}})
\]
since the first-order term vanishes at the mode:
\[
\left.\nabla_Z \ln p(Z\mid X)\right|_{Z = Z_{\mathrm{MAP}}} = 0
\]
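A minimal 1-D sketch, assuming an illustrative unnormalized log posterior: locate Z_MAP numerically, estimate Λ by a finite-difference second derivative, and read off the Gaussian.

```python
# Laplace approximation in one dimension: q(z) = N(z | z_MAP, 1/Lambda).
import numpy as np
from scipy.optimize import minimize_scalar

def log_post(z):                      # unnormalized ln p(z | X), toy example
    return -0.25 * z**4 - 0.5 * (z - 1.0)**2

z_map = minimize_scalar(lambda z: -log_post(z)).x

# Lambda = -d^2/dz^2 ln p(z|X) at z_MAP, via central finite differences
h = 1e-4
lam = -(log_post(z_map + h) - 2 * log_post(z_map) + log_post(z_map - h)) / h**2

print(f"q(z) = N(z | {z_map:.3f}, {1.0 / lam:.3f})")   # variance = Lambda^{-1}
```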
Moment matching
Approximate \(p(z)\) by an exponential-family distribution
\[
q(z;\eta) = h(z)\exp\!\left(\eta^{\top} t(z) - a(\eta)\right)
\]
\[
\begin{aligned}
D_{\mathrm{KL}}\!\left(p(z)\,\|\,q(z;\eta)\right)
&= -\mathbb{E}_{p(z)}[\ln q(z;\eta)] + \mathbb{E}_{p(z)}[\ln p(z)] \\
&= -\eta^{\top}\mathbb{E}_{p(z)}[t(z)] + a(\eta) + \text{const.}
\end{aligned}
\]
\[
\nabla_\eta D_{\mathrm{KL}}\!\left(p(z)\,\|\,q(z;\eta)\right)
= -\mathbb{E}_{p(z)}[t(z)] + \nabla_\eta a(\eta)
= -\mathbb{E}_{p(z)}[t(z)] + \mathbb{E}_{q(z;\eta)}[t(z)] = 0
\]
\[
\Rightarrow\;
\mathbb{E}_{q(z;\eta)}[t(z)] = \mathbb{E}_{p(z)}[t(z)]
\]
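Concretely, for a Gaussian q the sufficient statistics are t(z) = (z, z²), so moment matching reduces to matching the mean and second moment. A sketch against an illustrative two-component mixture target p(z):

```python
# Moment-match a single Gaussian q to p(z) = 0.5 N(-2, 1) + 0.5 N(2, 1).
import numpy as np

w = np.array([0.5, 0.5])                  # mixture weights
mu = np.array([-2.0, 2.0])                # component means
var = np.array([1.0, 1.0])                # component variances

m = np.sum(w * mu)                        # E_p[z]
v = np.sum(w * (var + mu**2)) - m**2      # Var_p[z] via E_p[z^2]
print(f"q(z) = N(z | {m}, {v})")          # N(0, 5): broad, covers both modes
```

Note the characteristic behavior of minimizing this direction of the KL: q spreads over all the mass of p rather than locking onto one mode.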
Assumed density filtering
Consider estimation from a sequence of data \(\mathcal{D}_1, \mathcal{D}_2, \ldots\)

With a conjugate prior, the posteriors \(p_i(\theta)\) \((i = 0, 1, \ldots)\) all stay in the same family:
\[
p_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1}\mid\theta)\, p_i(\theta)
\]
With a non-conjugate prior this breaks down, so instead approximate recursively \((q_0(\theta) = p_0(\theta))\):
\[
q_{i+1}(\theta) \approx r_{i+1}(\theta)
= Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1}\mid\theta)\, q_i(\theta)
= Z_{i+1}^{-1}\, f_{i+1}(\theta)\, q_i(\theta)
\]
where each \(q_{i+1}\) is obtained from \(r_{i+1}\) by moment matching.
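To illustrate the conjugate branch, where no projection is ever needed: with a Beta prior and Bernoulli likelihood factors (the data sequence below is illustrative), the recursion stays in the Beta family exactly.

```python
# Exact recursive update p_{i+1} ∝ p(D_{i+1} | theta) p_i for a
# Beta-Bernoulli model: each datum just increments a count.
a, b = 1.0, 1.0                       # p_0(theta) = Beta(1, 1)
for x in [1, 0, 1, 1, 0, 1]:          # data sequence D_1, D_2, ...
    a, b = a + x, b + (1 - x)
print(f"p_6(theta) = Beta({a}, {b})")
```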
MM with 1-dim Gaussian distribution
\[
q_i(\theta) = \mathcal{N}(\theta \mid \mu_i, v_i)
\]
Normalization constant
\[
Z_{i+1} = \int f_{i+1}(\theta)\,\frac{1}{\sqrt{2\pi v_i}}
\exp\!\left(-\frac{(\theta-\mu_i)^2}{2 v_i}\right) d\theta
\]
Differentiate \(\ln Z_{i+1}\) w.r.t. \(\mu_i\):
\[
\frac{\partial}{\partial \mu_i} \ln Z_{i+1}
= \frac{1}{Z_{i+1}} \int f_{i+1}(\theta)\,\mathcal{N}(\theta \mid \mu_i, v_i)\,
\frac{\theta - \mu_i}{v_i}\, d\theta
= \frac{\mathbb{E}_{r_{i+1}}[\theta] - \mu_i}{v_i}
\]
\[
\Rightarrow\;
\mu_{i+1} = \mathbb{E}_{r_{i+1}}[\theta]
= \mu_i + v_i \frac{\partial}{\partial \mu_i} \ln Z_{i+1}
\]
Differentiating \(\ln Z_{i+1}\) w.r.t. \(v_i\) similarly yields the update for \(v_{i+1}\).
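A numeric check of the mean update, assuming an illustrative factor f_{i+1}(θ) = Φ(θ) (the standard normal CDF) and evaluating both sides by quadrature:

```python
# Verify mu_{i+1} = E_{r_{i+1}}[theta] = mu_i + v_i * d(ln Z_{i+1})/d(mu_i).
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, v = 0.3, 0.8                     # q_i(theta) = N(mu, v), v = variance
f = norm.cdf                         # f_{i+1}(theta), illustrative choice

def Z(m):                            # Z_{i+1} as a function of mu_i
    return quad(lambda t: f(t) * norm.pdf(t, m, np.sqrt(v)), -10, 10)[0]

# Left side: E_{r_{i+1}}[theta] by direct quadrature
e_r = quad(lambda t: t * f(t) * norm.pdf(t, mu, np.sqrt(v)), -10, 10)[0] / Z(mu)

# Right side: mu_i + v_i * d(ln Z)/d(mu_i), by central differences
h = 1e-5
dlnZ = (np.log(Z(mu + h)) - np.log(Z(mu - h))) / (2 * h)
print(e_r, mu + v * dlnZ)            # the two agree to numerical precision
```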
MM with Gamma distribution
MM for probit regression
\[
p(Y\mid X, w) = \prod_{n=1}^{N} \Phi(y_n \mid x_n, w),
\qquad
p(w) = \mathcal{N}(w \mid 0, v_0)
\]
The marginal likelihood is intractable:
\[
Z = \int p(Y\mid X, w)\, p(w)\, dw
\]
Instead, apply the recursive update
\[
q_{i+1}(\theta) = Z_{i+1}^{-1}\, p(\mathcal{D}_{i+1}\mid\theta)\, q_i(\theta)
\]
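A minimal sketch of this recursion for a scalar weight w, interpreting the slides' Φ(y_n | x_n, w) as the probit factor Φ(y_n x_n w) with y_n ∈ {−1, +1} (an assumption on my part). For a Gaussian q and a probit factor, Z_{i+1} = Φ(u) with u = y_n x_n μ_i / √(1 + x_n² v_i) in closed form, and the mean and variance updates follow the moment-matching formulas above. The data and v_0 below are illustrative.

```python
# Assumed density filtering for 1-D probit regression: process the
# data once, moment-matching a Gaussian q(w) after each factor.
import numpy as np
from scipy.stats import norm

def adf_probit(x, y, v0=10.0):
    mu, v = 0.0, v0                        # q_0(w) = N(0, v_0)
    for xn, yn in zip(x, y):
        s = np.sqrt(1.0 + xn**2 * v)
        u = yn * xn * mu / s               # Z_{i+1} = Phi(u) in closed form
        beta = norm.pdf(u) / norm.cdf(u)
        alpha = yn * xn * beta / s         # d(ln Z_{i+1})/d(mu_i)
        mu = mu + v * alpha                # moment-matched mean
        v = v - v**2 * (xn / s)**2 * beta * (u + beta)  # matched variance
    return mu, v

# Illustrative synthetic data with true weight w = 2
rng = np.random.default_rng(3)
x = rng.normal(size=100)
y = np.where(norm.cdf(2.0 * x) > rng.uniform(size=100), 1, -1)
print(adf_probit(x, y))                    # posterior mean near 2, small v
```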
Expectation propagation
