2. 12.1 Factor analysis
• Mixture models use a single discrete latent variable z ∈ {1, 2, ..., K} to generate each observation.
• An alternative is to use a vector of real-valued latent variables, zi ∈ R^L.
• The model is p(zi) = N(zi | μ0, Σ0) and p(xi | zi, θ) = N(xi | Wzi + μ, Ψ), where W is a D×L matrix, known as the factor loading matrix, and Ψ is a D×D covariance matrix.
• We take Ψ to be diagonal, since the whole point of the model is to "force" zi to explain the correlation, rather than "baking it in" to the observation's covariance.
• The special case in which Ψ = σ²I is called probabilistic principal components analysis, or PPCA.
• The reason for this name will become apparent later.
3. 12.1.1 FA is a low rank parameterization of an MVN
• FA can be thought of as a way of specifying a joint density model on x using a small number of parameters.
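For reference, marginalizing out zi gives the induced distribution on the visible variables (the standard FA result, using the prior p(zi) = N(zi | μ0, Σ0) from above):

$$p(x_i \mid \theta) = \int \mathcal{N}(x_i \mid W z_i + \mu, \Psi)\,\mathcal{N}(z_i \mid \mu_0, \Sigma_0)\,dz_i = \mathcal{N}(x_i \mid W\mu_0 + \mu, \Psi + W\Sigma_0 W^{T})$$

With the usual convention μ0 = 0 and Σ0 = I, cov[x] = WW^T + Ψ, a low-rank-plus-diagonal matrix that needs only O(LD) parameters instead of O(D²) for a full covariance.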
4. 12.1 Factor analysis
• The generative process, where L=1, D=2 and Ψ is diagonal, is illustrated in Figure 12.1.
• We take an isotropic Gaussian "spray can" and slide it along the 1d line defined by wzi + μ.
• This induces an elongated (and hence correlated) Gaussian in 2d.
5. 12.1.2 Inference of the latent factors
• We hope that the latent factors z will reveal something interesting about the data.
• Each xi (D-dimensional) can be represented by its L-dimensional latent score, the posterior mean of zi (see the formulas below).
• The training set is thus reduced from D dimensions to L dimensions.
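For reference, the low-dimensional scores are the posterior means mi, obtained by Gaussian conditioning:

$$p(z_i \mid x_i, \theta) = \mathcal{N}(z_i \mid m_i, \Sigma_i), \quad \Sigma_i \triangleq (\Sigma_0^{-1} + W^{T}\Psi^{-1}W)^{-1}, \quad m_i \triangleq \Sigma_i\bigl(W^{T}\Psi^{-1}(x_i - \mu) + \Sigma_0^{-1}\mu_0\bigr)$$

Note that Σi is the same for every data point, so only the means mi need to be stored.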
6. 12.1.2 Inference of the latent factors
• Example
• D = 11 variables describing each car, N = 328 examples (car models), L = 2.
• Each of the 11 variables, represented by the unit vectors e1 = (1,0,...,0), e2 = (0,1,0,...,0), ..., can also be projected into the low-dimensional space (this is known as a biplot).
• The biplot shows how each variable relates to the two latent dimensions.
• The training set is reduced from D dimensions to L dimensions (the points in the figure).
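A minimal sketch of this kind of 2d projection using scikit-learn's FactorAnalysis; the synthetic data below is only a stand-in, since the 11-variable car dataset from the slide is not reproduced here:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic stand-in for the D = 11 variable, N = 328 example car dataset.
rng = np.random.default_rng(0)
X = rng.standard_normal((328, 11))

fa = FactorAnalysis(n_components=2)   # L = 2 latent factors
Z = fa.fit_transform(X)               # (328, 2) latent scores: posterior means m_i

# fa.components_ has shape (2, 11); column j shows how variable j loads on the
# two latent dimensions, which is what a biplot draws as an arrow per variable.
print(Z.shape, fa.components_.shape)
```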
7. 12.1.3 Unidentifiability
• Just like with mixture models, FA is also unidentifiable.
• As with interpreting topics in LDA, the interpretation of z (the factors) is ambiguous.
• Rotating W changes the factors and their interpretation, but leaves the fit to the data unchanged (see the identity after this list).
• Possible solutions:
• Forcing W to be orthonormal: Perhaps the cleanest solution to the identifiability problem is to force W to be orthonormal, and to order the columns by decreasing variance of the corresponding latent factors. This is the approach adopted by PCA, which we will discuss in Section 12.2.
• The orthonormality constraint does not necessarily make the factors more interpretable, but it does make the solution unique.
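To make the rotation argument from the start of this slide explicit, note that for any orthogonal R the marginal covariance of x is unchanged:

$$\tilde{W} = WR,\ RR^{T} = I \;\Rightarrow\; \tilde{W}\tilde{W}^{T} + \Psi = WRR^{T}W^{T} + \Psi = WW^{T} + \Psi$$

so W and WR assign the same likelihood to the data and cannot be distinguished from observations of x alone.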
9. 12.1.4 Mixtures of factor analysers
• Let the k'th linear subspace of dimensionality Lk be represented by Wk, for k = 1:K.
• Suppose we have a latent indicator qi ∈ {1, ..., K} specifying which subspace we should use to generate the data.
• We then sample zi from a Gaussian prior and pass it through the Wk matrix (where k = qi), and add noise.
• That is, each xi is generated from the k'th factor analyser (analogous to a GMM); the full model is written out below.
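Written out, the mixture of factor analysers corresponds to

$$p(x_i \mid z_i, q_i = k, \theta) = \mathcal{N}(x_i \mid \mu_k + W_k z_i, \Psi), \qquad p(z_i \mid \theta) = \mathcal{N}(z_i \mid 0, I), \qquad p(q_i \mid \theta) = \mathrm{Cat}(q_i \mid \pi)$$

Marginalizing out zi gives a GMM whose k'th covariance is Wk Wk^T + Ψ, i.e. low-rank plus diagonal.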
10. 12.1.5 EM for factor analysis models
Expected log likelihood
ESS (Expected Sufficient Statistics)
11. 12.1.5 EM for factor analysis models
• E-step
• M-step
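A minimal, self-contained sketch of these two steps (the standard EM updates for FA with a N(0, I) prior on zi and the data centered at its sample mean; the function name and the synthetic data are my own, not from the slides):

```python
import numpy as np

def fa_em(X, L, n_iter=100, seed=0):
    """EM for factor analysis: x ~ N(W z + mu, Psi) with z ~ N(0, I), Psi diagonal.
    X is an N x D data matrix; returns the estimates (W, mu, psi)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu                           # center the data; mu is the MLE of the mean
    S_diag = (Xc ** 2).mean(axis=0)       # diagonal of the sample covariance
    W = 0.01 * rng.standard_normal((D, L))
    psi = S_diag.copy()                   # diagonal entries of Psi

    for _ in range(n_iter):
        # E-step: posterior moments of z_i given x_i (covariance G shared by all i)
        PiW = W / psi[:, None]                       # Psi^{-1} W
        G = np.linalg.inv(np.eye(L) + W.T @ PiW)     # posterior covariance of z_i
        M = Xc @ PiW @ G                             # row i is E[z_i | x_i]
        Ezz = N * G + M.T @ M                        # sum_i E[z_i z_i^T | x_i]

        # M-step: W is a multi-output regression of x on E[z]; Psi keeps the residual
        XM = Xc.T @ M                                # sum_i x_i E[z_i | x_i]^T
        W = XM @ np.linalg.inv(Ezz)
        psi = np.maximum(S_diag - np.sum(W * XM, axis=1) / N, 1e-6)

    return W, mu, psi

# Usage on synthetic data (the true loading matrix is hypothetical, just for the demo).
rng = np.random.default_rng(1)
Z = rng.standard_normal((500, 2))
W_true = rng.standard_normal((5, 2))
X = Z @ W_true.T + 0.1 * rng.standard_normal((500, 5))
W_hat, mu_hat, psi_hat = fa_em(X, L=2)
print(W_hat.shape, psi_hat)
```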
12. 12.2 Principal components analysis (PCA)
• Consider the FA model where we constrain Ψ = σ²I, and W to be orthonormal.
• It can be shown (Tipping and Bishop 1999) that, as σ² → 0, this model reduces to classical (nonprobabilistic) principal components analysis (PCA).
• The version where σ² > 0 is known as probabilistic PCA (PPCA).
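For reference, Tipping and Bishop (1999) also give a closed-form MLE for PPCA:

$$\hat{W} = V(\Lambda - \sigma^{2}I)^{1/2}R, \qquad \hat{\sigma}^{2} = \frac{1}{D - L}\sum_{j=L+1}^{D}\lambda_{j}$$

where V holds the first L eigenvectors of the sample covariance, Λ the corresponding eigenvalues λj, and R is an arbitrary orthogonal matrix; the columns of Ŵ span the same subspace as the leading eigenvectors, which is how the σ² → 0 limit recovers classical PCA (up to scaling and rotation).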
15. proof of PCA
• We use wj ∈ R^D to denote the j'th principal direction,
• xi ∈ R^D to denote the i'th high-dimensional observation,
• zi ∈ R^L to denote the i'th low-dimensional representation.
• Let us start by estimating the best 1d solution, w1 ∈ R^D, and the corresponding projected points z̃1 ∈ R^N.
• So the optimal reconstruction weights are obtained by orthogonally projecting the data onto the first principal direction (derived below).
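Concretely, for centered data and a unit-norm w1, the reconstruction objective and its optimal weights are

$$J(w_1, z_1) = \frac{1}{N}\sum_{i=1}^{N}\lVert x_i - z_{i1}w_1\rVert^{2}, \qquad \frac{\partial J}{\partial z_{i1}} = 0 \;\Rightarrow\; z_{i1} = w_1^{T}x_i$$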
16. proof of PCA
• Project x onto the direction w to get z = w^T x.
• Minimizing the reconstruction error is therefore equivalent to maximizing the variance of the projected data.
• The direction that maximizes the variance is an eigenvector of the covariance matrix (see below).
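Substituting z_{i1} = w1^T xi back into J shows the equivalence and the eigenvector condition (Σ̂ is the empirical covariance of the centered data, and λ1 is a Lagrange multiplier enforcing w1^T w1 = 1):

$$J(w_1) = \mathrm{const} - w_1^{T}\hat{\Sigma}w_1, \qquad \frac{\partial}{\partial w_1}\Bigl[-w_1^{T}\hat{\Sigma}w_1 + \lambda_1(w_1^{T}w_1 - 1)\Bigr] = 0 \;\Rightarrow\; \hat{\Sigma}w_1 = \lambda_1 w_1$$

so minimizing the reconstruction error means choosing the eigenvector with the largest eigenvalue, i.e. the direction of maximum variance.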
17. proof of PCA
Optimizing wrt w1 and z1 gives the same solution as before.
The proof continues in this way. (Formally one can use induction.)
26. 12.2.5 EM algorithm for PCA
• M-step
• The M-step is equivalent to multi-output linear regression (Equation 7.89).
• Multi-output linear regression is linear regression where the output y is a vector rather than a scalar.
• Here the expected latent scores zi play the role of the inputs, and the xi are the outputs.
• In other words, given the expected zi, we find the mapping W that brings the reconstructions as close as possible to the observed x (see the updates below).
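For reference, in the zero-noise (PCA) limit the two steps take a simple matrix form, with X̃ the D×N centered data matrix and Z̃ the L×N matrix of latent scores:

$$\text{E-step: } \tilde{Z} = (W^{T}W)^{-1}W^{T}\tilde{X}, \qquad \text{M-step: } W = \tilde{X}\tilde{Z}^{T}(\tilde{Z}\tilde{Z}^{T})^{-1}$$

The E-step is an orthogonal projection of the data onto the current subspace, and the M-step is exactly the multi-output least-squares fit with Z̃ as inputs and X̃ as outputs.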
27. 12.2.5 EM algorithm for PCA
• Advantages of EM:
• EM can be faster.
• EM can be implemented in an online fashion, i.e., we can update our estimate of W as the data streams in.