Introduction to Auto-Encoders, Denoising Auto-Encoders and Variational Auto-Encoders
2. In this chapter, we will discuss:
• What is an autoencoder?
- a neural network whose input and output dimensions are the same
- if the autoencoder uses only linear activations and the cost
function is MSE, then it is equivalent to PCA
- the architecture of a stacked autoencoder is typically symmetrical
(a minimal sketch follows this list)
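To make the architecture concrete, here is a minimal NumPy sketch (my own illustration, not from the slides) of a symmetric, single-hidden-layer autoencoder. The layer sizes, the variable names (`W_enc`, `W_dec`, `encode`, `decode`), and the choice of linear activations are assumptions, picked to echo the PCA remark above.

```python
import numpy as np

# Illustrative sketch: a symmetric autoencoder whose input and output
# dimensions are the same. With linear activations and an MSE objective,
# the learned code spans the same subspace as PCA (the remark above).
d, d_hidden = 8, 3                                   # assumed input / hidden sizes
rng = np.random.default_rng(0)

W_enc = rng.normal(scale=0.1, size=(d_hidden, d))    # encoder weights (d' x d)
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(scale=0.1, size=(d, d_hidden))    # decoder weights (d x d')
b_dec = np.zeros(d)

def encode(x):
    return W_enc @ x + b_enc        # linear activation

def decode(y):
    return W_dec @ y + b_dec        # linear activation

x = rng.normal(size=d)
z = decode(encode(x))               # reconstruction has the same dimension as x
print(x.shape == z.shape)           # True
```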
8. Cost function
• [VLBM, p.2] An autoencoder takes an input vector $x \in [0,1]^d$ and first
maps it to a hidden representation $y \in [0,1]^{d'}$ through a deterministic
mapping $y = f_\theta(x) = s(Wx + b)$, parameterized by $\theta = \{W, b\}$.
$W$ is a $d' \times d$ weight matrix and $b$ is a bias vector. The resulting latent
representation $y$ is then mapped back to a "reconstructed" vector
$z \in [0,1]^d$ in input space, $z = g_{\theta'}(y) = s(W'y + b')$, with $\theta' = \{W', b'\}$.
The weight matrix $W'$ of the reverse mapping may optionally be
constrained by $W' = W^T$, in which case the autoencoder is said to
have tied weights.
• The parameters of this model are optimized to minimize
the average reconstruction error (a small sketch follows this slide):
$$
\theta^*, \theta'^* = \arg\min_{\theta,\theta'} \frac{1}{n} \sum_{i=1}^{n} L\!\left(x^{(i)}, z^{(i)}\right)
= \arg\min_{\theta,\theta'} \frac{1}{n} \sum_{i=1}^{n} L\!\left(x^{(i)}, g_{\theta'}\!\big(f_\theta(x^{(i)})\big)\right) \qquad (1)
$$
where $L(x, z) = \lVert x - z \rVert^2$.
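As a concrete reading of the mapping and of Equation (1), below is a small NumPy sketch (my own, not code from [VLBM]). The sigmoid nonlinearity for $s$, the tied weights $W' = W^T$, the toy dimensions, and the function names are assumptions made for illustration.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
d, d_hidden = 6, 2
W = rng.normal(scale=0.1, size=(d_hidden, d))   # d' x d weight matrix
b = np.zeros(d_hidden)                          # encoder bias
b_prime = np.zeros(d)                           # decoder bias

def f_theta(x):
    """Encoder: y = s(Wx + b)."""
    return sigmoid(W @ x + b)

def g_theta_prime(y):
    """Decoder with tied weights: z = s(W^T y + b')."""
    return sigmoid(W.T @ y + b_prime)

def average_reconstruction_error(X):
    """Objective of Eq. (1) with squared-error loss L(x, z) = ||x - z||^2."""
    losses = [np.sum((x - g_theta_prime(f_theta(x))) ** 2) for x in X]
    return np.mean(losses)

X = rng.uniform(0.0, 1.0, size=(5, d))          # 5 toy training inputs in [0,1]^d
print(average_reconstruction_error(X))
```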
9. Cost function
• [VLBM, p.2] An alternative loss, suggested by the interpretation of $x$
and $z$ as either bit vectors or vectors of bit probabilities (Bernoullis), is
the reconstruction cross-entropy (a sketch of this loss follows this slide):
$$
L_H(x, z) = \mathbb{H}(B_x \,\|\, B_z)
= -\sum_{k=1}^{d} \left[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \right]
$$
where $B_\mu(x) = \big(B_{\mu_1}(x), \cdots, B_{\mu_d}(x)\big)$ is a Bernoulli distribution.
• [VLBM, p.2] Equation (1) with $L = L_H$ can be written
$$
\theta^*, \theta'^* = \arg\min_{\theta,\theta'} \mathbb{E}_{q^0(X)}\!\left[ L_H\!\left(X, g_{\theta'}(f_\theta(X))\right) \right]
$$
where $q^0(X)$ denotes the empirical distribution associated with our $n$
training inputs.
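Continuing the sketch above (again my own illustration, not the authors' code), the cross-entropy reconstruction loss $L_H$ can be written directly from the formula. The clipping constant `eps` is an added numerical-stability assumption, not part of the formula in [VLBM].

```python
import numpy as np

def cross_entropy_reconstruction(x, z, eps=1e-12):
    """L_H(x, z) = -sum_k [ x_k log z_k + (1 - x_k) log(1 - z_k) ].

    x: input bit vector (or bit probabilities) in [0, 1]^d
    z: reconstructed bit probabilities in (0, 1)^d
    eps: clipping constant for numerical stability (an illustrative assumption).
    """
    z = np.clip(z, eps, 1.0 - eps)
    return -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z))

x = np.array([1.0, 0.0, 1.0, 1.0])
z = np.array([0.9, 0.2, 0.8, 0.6])
print(cross_entropy_reconstruction(x, z))
```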
15. Reference
• Reference: Diederik P. Kingma and Max Welling, "Auto-Encoding Variational Bayes", 2013.
• [Assumptions of the paper] We will restrict ourselves here to the common case
where we have an i.i.d. dataset with latent variables per datapoint,
and where we like to perform maximum likelihood (ML) or
maximum a posteriori (MAP) inference on the (global) parameters,
and variational inference on the latent variables.