This is a presentation that I gave to my research group. It is about probabilistic extensions to Principal Components Analysis, as proposed by Tipping and Bishop.
2. 12.1 Factor analysis
• In a mixture model, each observation is generated from a single discrete latent variable zi ∈ {1, 2, ..., K}.
• An alternative is to use a vector of real-valued latent variables, zi ∈ R^L.
• The observations are then modelled as p(xi | zi, θ) = N(xi | Wzi + μ, Ψ), where W is a D×L matrix, known as the factor loading matrix, and Ψ is a D×D covariance matrix.
• We take Ψ to be diagonal, since the whole point of the model is to "force" zi to explain the correlation, rather than "baking it in" to the observation's covariance.
• The special case in which Ψ = σ²I is called probabilistic principal components analysis or PPCA.
• The reason for this name will become apparent later.
3. 12.1.1 FA is a low rank parameterization of an MVN
• FA can be thought of as a way of specifying a joint density model on x using a small number of parameters.
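• Integrating out zi (with the prior p(zi) = N(zi | 0, I)) gives the induced marginal p(xi | θ) = N(xi | μ, WWᵀ + Ψ): a "low rank plus diagonal" covariance that needs only O(LD) parameters, compared with O(D²) for a full covariance matrix.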
4. 12.1 Factor analysis
• The generative process, where L=1, D=2 and Ψ is diagonal, is illustrated in Figure 12.1.
• We take an isotropic Gaussian "spray can" and slide it along the 1d line defined by wzi + μ.
• This induces an elongated (and hence correlated) Gaussian in 2d.
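A minimal numpy sketch of this generative process; the particular values of w, μ, and Ψ below are illustrative choices, not taken from the book:

    import numpy as np

    rng = np.random.default_rng(0)
    N, D, L = 500, 2, 1
    w = np.array([[2.0], [1.0]])        # D x L factor loading matrix (illustrative)
    mu = np.array([5.0, 3.0])           # mean of the observations
    Psi = np.diag([0.5, 0.5])           # diagonal (here isotropic) observation noise

    z = rng.standard_normal((N, L))                       # zi ~ N(0, I)
    eps = rng.multivariate_normal(np.zeros(D), Psi, N)    # observation noise
    x = z @ w.T + mu + eps                                # xi = W zi + mu + noise
    # x is an elongated (correlated) Gaussian cloud in 2d, even though Psi is diagonal.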
5. 12.1.2 Inference of the latent factors
• We hope that the latent factors z will reveal something interesting about the data.
• The posterior over zi summarizes each D-dimensional observation xi by an L-dimensional representation.
• This lets us represent (and visualize) the training set in L dimensions instead of D.
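• With the prior p(zi) = N(0, I), this posterior is Gaussian: p(zi | xi, θ) = N(zi | mi, Σz), where Σz = (I + Wᵀ Ψ⁻¹ W)⁻¹ and mi = Σz Wᵀ Ψ⁻¹ (xi − μ); the posterior means mi are the low-dimensional scores used in the example on the next slide.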
6. 12.1.2 Inference of the latent factors
• Example
• D = 11 features describing each car, N = 328 examples (cars), L = 2.
• Each of the 11 feature dimensions corresponds to a unit vector e1 = (1,0,...,0), e2 = (0,1,0,...,0), ..., which is projected into the 2d latent space and drawn as a line or arrow (this is what makes the plot a biplot).
• The biplot shows how each feature relates to the two latent factors, and hence which features vary together.
• The training examples themselves are also projected from D dimensions down to L = 2 and plotted as points (see the sketch below).
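A hypothetical sketch of how such a biplot could be produced with scikit-learn's FactorAnalysis; the data matrix X below is a random placeholder, not the car dataset from the book:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.decomposition import FactorAnalysis

    X = np.random.randn(328, 11)             # placeholder for an N=328 x D=11 data matrix
    fa = FactorAnalysis(n_components=2).fit(X)

    Z = fa.transform(X)                       # posterior means of zi: the points of the biplot
    W = fa.components_.T                      # D x 2 loading matrix: one arrow per feature

    plt.scatter(Z[:, 0], Z[:, 1], s=5)                        # projected examples
    for j in range(W.shape[0]):
        plt.arrow(0, 0, W[j, 0], W[j, 1], head_width=0.05)    # loading of feature j
    plt.xlabel("factor 1"); plt.ylabel("factor 2")
    plt.show()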
7. 12.1.3 Unidentifiability
• Just like with mixture models, FA is also unidentifiable.
• As with topic models such as LDA, the latent factors z are not uniquely identified, so their meaning is ambiguous.
• This does not affect predictive performance, but it does affect how we interpret the loading matrix and the latent factors.
• Possible solutions:
• Forcing W to be orthonormal: perhaps the cleanest solution to the identifiability problem is to force W to be orthonormal, and to order the columns by decreasing variance of the corresponding latent factors. This is the approach adopted by PCA, which we will discuss in Section 12.2.
• Forcing W to be orthonormal does not necessarily make the factors more interpretable, but at least the solution is unique.
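• Concretely: for any orthogonal matrix R (RRᵀ = I), replacing W with WR leaves the likelihood unchanged, since cov[x] = WRRᵀWᵀ + Ψ = WWᵀ + Ψ; geometrically this corresponds to rotating z before generating x.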
9. 12.1.4 Mixtures of factor analysers
• Let the k'th linear subspace, of dimensionality Lk, be represented by Wk, for k = 1:K.
• Suppose we have a latent indicator qi ∈ {1,...,K} specifying which subspace we should use to generate the data.
• We then sample zi from a Gaussian prior and pass it through the Wk matrix (where k = qi), and add noise.
• In other words, each xi is generated by the k'th factor analyser, analogous to how a GMM generates each point from one of K Gaussians.
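• The corresponding model is p(qi | θ) = Cat(qi | π), p(zi | θ) = N(zi | 0, I), and p(xi | zi, qi = k, θ) = N(xi | μk + Wk zi, Ψ), with Ψ diagonal as before.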
10. 12.1.5 EM for factor analysis models
Expected log likelihood
ESS(Expected Sufficient Statistics)
11. 12.1.5 EM for factor analysis models
• E-step
• M-step
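A sketch of the two steps in a standard form, written here assuming the data have been centered so that μ can be dropped:
• E-step: for each i compute the posterior moments E[zi] = Σz Wᵀ Ψ⁻¹ xi and E[zi ziᵀ] = Σz + E[zi] E[zi]ᵀ, where Σz = (I + Wᵀ Ψ⁻¹ W)⁻¹.
• M-step: set W = [Σi xi E[zi]ᵀ] [Σi E[zi ziᵀ]]⁻¹ and Ψ = (1/N) diag{ Σi (xi − W E[zi]) xiᵀ }.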
12. 12.2 Principal components analysis (PCA)
• Consider the FA model where we constrain Ψ = σ²I, and W to be orthonormal.
• It can be shown (Tipping and Bishop 1999) that, as σ² → 0, this model reduces to classical (nonprobabilistic) principal components analysis (PCA).
• The version where σ² > 0 is known as probabilistic PCA (PPCA).
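• For PPCA, Tipping and Bishop (1999) show that the MLE is W = V(Λ − σ²I)^(1/2) R, where the columns of V are the first L eigenvectors of the sample covariance S, Λ holds the corresponding eigenvalues, and R is an arbitrary orthogonal matrix; the MLE of the noise variance is the average discarded eigenvalue, σ² = (1/(D−L)) Σ_{j=L+1..D} λj. As σ² → 0 the model projects onto the same subspace as classical PCA.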
15. proof of PCA
• wj ∈ R^D denotes the j'th principal direction
• xi ∈ R^D denotes the i'th high-dimensional observation
• zi ∈ R^L denotes the i'th low-dimensional representation
• Let us start by estimating the best 1d solution, w1 ∈ R^D, and the corresponding projected points z̃1 ∈ R^N.
• So the optimal reconstruction weights are obtained by orthogonally projecting the data onto the first principal direction (see the derivation below).
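• In more detail: the 1d objective is the reconstruction error J(w1, z1) = (1/N) Σi ‖xi − zi1 w1‖², subject to ‖w1‖ = 1; setting ∂J/∂zi1 = 0 gives zi1 = w1ᵀ xi, which is exactly this orthogonal projection.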
16. proof of PCA
• Plugging the optimal weights zi1 = w1ᵀ xi back into the objective shows that minimizing the reconstruction error is equivalent to maximizing the variance of the projected data.
• The direction that maximizes the variance is an eigenvector of the covariance matrix.
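• To see this: J(w1) = const − w1ᵀ S w1, where S = (1/N) Σi xi xiᵀ is the empirical covariance of the (centered) data. Maximizing w1ᵀ S w1 subject to w1ᵀ w1 = 1 with a Lagrange multiplier λ1 gives S w1 = λ1 w1, so w1 is an eigenvector of S; since the projected variance equals λ1, the best choice is the eigenvector with the largest eigenvalue.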
17. proof of PCA
• For the second principal direction w2 we minimize (1/N) Σi ‖xi − zi1 w1 − zi2 w2‖², subject to w1ᵀw1 = 1, w2ᵀw2 = 1 and w1ᵀw2 = 0. Optimizing wrt w1 and z1 gives the same solution as before.
• The proof continues in this way. (Formally one can use induction.)
26. 12.2.5 EM algorithm for PCA
• M-step: multi-output linear regression (Equation 7.89).
• Multi-output linear regression is linear regression where the output y is a vector rather than a scalar.
• Here the expected latent scores zi play the role of the inputs and the observations xi the role of the outputs.
• In other words, the M-step finds the mapping W from the expected zi to the observed x that minimizes the squared reconstruction error (see the sketch below).
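A minimal numpy sketch of this EM algorithm in the σ² → 0 limit, using the updates Z = (WᵀW)⁻¹ Wᵀ X̃ (E-step) and W = X̃ Zᵀ (Z Zᵀ)⁻¹ (M-step); the function name, initialization and iteration count below are my own choices:

    import numpy as np

    def em_pca(X, L, n_iters=200, seed=0):
        """EM for PCA on an (N, D) data matrix X; returns a D x L orthonormal basis
        spanning the principal subspace."""
        rng = np.random.default_rng(seed)
        Xc = (X - X.mean(axis=0)).T                # D x N centered data, one column per example
        W = rng.standard_normal((Xc.shape[0], L))  # random initial loading matrix
        for _ in range(n_iters):
            Z = np.linalg.solve(W.T @ W, W.T @ Xc)   # E-step: L x N latent scores
            W = Xc @ Z.T @ np.linalg.inv(Z @ Z.T)    # M-step: multi-output linear regression
        W, _ = np.linalg.qr(W)                       # orthonormalize; spans same subspace as top L eigenvectors
        return W

    # usage: W = em_pca(np.random.randn(500, 10), L=2)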
27. 12.2.5 EM algorithm for PCA
• Advantages of EM:
• EM can be faster.
• EM can be implemented in an online fashion, i.e., we can update our estimate of W as the data streams in.