ML study
Murphy's Machine Learning: Latent Linear Model
12.1 Factor analysis
- In mixture models, the latent variable is discrete: zi ∈ {1, 2, ..., K}.
- An alternative is to use a vector of real-valued latent variables, zi ∈ R^L.

- where W is a D×L matrix, known as the factor loading matrix, and Ψ is a D×D covariance matrix.
- We take Ψ to be diagonal, since the whole point of the model is to "force" zi to explain the correlation, rather than "baking it in" to the observation's covariance.
- The special case in which Ψ = σ²I is called probabilistic principal components analysis or PPCA.
- The reason for this name will become apparent later.
12.1.1 FA is a low rank parameterization of an MVN
- FA can be thought of as a way of specifying a joint density model on x using a small number of parameters.
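To make the low-rank point explicit (restating the book's result, with the conventional prior zi ~ N(0, I)), the induced marginal on xi is

p(x_i \mid \theta) = \int \mathcal{N}(x_i \mid W z_i + \mu, \Psi)\, \mathcal{N}(z_i \mid 0, I)\, dz_i = \mathcal{N}(x_i \mid \mu,\; W W^\top + \Psi)

so with a diagonal Ψ the covariance is described by DL + D parameters instead of the D(D+1)/2 needed for a full covariance matrix.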
12.1 Factor analysis
- The generative process, where L=1, D=2 and Ψ is diagonal, is illustrated in Figure 12.1.
- We take an isotropic Gaussian "spray can" and slide it along the 1d line defined by wzi + μ.
- This induces an elongated (and hence correlated) Gaussian in 2d.
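A minimal numpy sketch of this generative process; the specific values of W, μ and Ψ below are illustrative, not the ones used in the book's figure.

```python
import numpy as np

rng = np.random.default_rng(0)
D, L, N = 2, 1, 500
W = np.array([[2.0], [1.0]])        # factor loadings (illustrative values)
mu = np.array([5.0, 3.0])           # mean offset
psi = np.array([0.5, 0.5])          # diagonal of Psi (isotropic here)

z = rng.standard_normal((N, L))                     # z_i ~ N(0, I)
eps = rng.standard_normal((N, D)) * np.sqrt(psi)    # observation noise ~ N(0, Psi)
x = z @ W.T + mu + eps                              # x_i = W z_i + mu + noise

# The induced 2d covariance is W W^T + Psi: an elongated (correlated) Gaussian.
print(np.cov(x, rowvar=False))
print(W @ W.T + np.diag(psi))
```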
12.1.2 Inference of the latent factors
- We hope that the latent factors z will reveal something interesting about the data.
- Each xi (D-dimensional) can be represented by a low-dimensional latent vector zi (L-dimensional).
- The training set is thereby reduced from D dimensions to L dimensions.
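For reference, with the usual prior zi ~ N(0, I) the posterior over the latent factors (the latent scores used in the example below) is Gaussian and available in closed form:

\Sigma = (I + W^\top \Psi^{-1} W)^{-1}, \qquad p(z_i \mid x_i, \theta) = \mathcal{N}\big(z_i \mid \Sigma\, W^\top \Psi^{-1} (x_i - \mu),\; \Sigma\big)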
12.1.2 Inference of the latent factors
- Example
- D = 11 variables describing each car, N = 328 examples (cars), L = 2.

- For each of the 11 variables we project its unit vector e1 = (1,0,...,0), e2 = (0,1,0,...,0), ... into the 2d latent space and plot it together with the projected data points (this is called a biplot).
- On the biplot, similar cars (points) end up near each other and near the variables that characterize them.

- The training set is reduced from D dimensions to L dimensions (see the figure).
12.1.3 Unidentifiability
- Just like with mixture models, FA is also unidentifiable.
- As with LDA, the problem is interpreting the latent factors z.
- Unidentifiability does not hurt the model's predictive performance, but it changes the loading matrix and hence the interpretation of the factors.
- Solutions
- Forcing W to be orthonormal. Perhaps the cleanest solution to the identifiability problem is to force W to be orthonormal, and to order the columns by decreasing variance of the corresponding latent factors. This is the approach adopted by PCA, which we will discuss in Section 12.2.
- The orthonormality constraint removes the rotational ambiguity; the book also discusses other fixes, such as forcing W to be lower triangular, sparsity-promoting priors on the weights, and non-Gaussian priors on the latent factors.
12.1.4 Mixtures of factor analysers
- Let the k'th linear subspace of dimensionality Lk be represented by Wk, for k = 1:K.

- Suppose we have a latent indicator qi ∈ {1, ..., K} specifying which subspace we should use to generate the data.

- We then sample zi from a Gaussian prior and pass it through the Wk matrix (where k = qi), and add noise.

- In other words, each xi is generated by the k'th factor analyser (analogous to a GMM).
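In equations (following the chapter), the mixture of factor analysers is

p(q_i = k \mid \theta) = \pi_k, \qquad p(z_i \mid \theta) = \mathcal{N}(z_i \mid 0, I), \qquad p(x_i \mid z_i, q_i = k, \theta) = \mathcal{N}(x_i \mid \mu_k + W_k z_i,\; \Psi)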
12.1.5 EM for factor analysis models
Expected log likelihood

ESS(Expected Sufficient Statistics)
12.1.5 EM for factor analysis models
- E-step

- M-step
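Since the update equations appear on the slide only as images, here is a minimal numpy sketch of one EM iteration for FA on centered data, using the standard updates; the function and variable names are mine, not the book's.

```python
import numpy as np

def fa_em_step(X, W, psi):
    """One EM iteration for factor analysis.
    X: (N, D) centered data, W: (D, L) factor loadings, psi: (D,) diagonal of Psi."""
    N, D = X.shape
    L = W.shape[1]

    # E-step: posterior p(z_i | x_i) = N(m_i, Sigma); Sigma is shared across i
    Psi_inv_W = W / psi[:, None]                        # Psi^{-1} W
    Sigma = np.linalg.inv(np.eye(L) + W.T @ Psi_inv_W)  # (I + W^T Psi^{-1} W)^{-1}
    M = X @ Psi_inv_W @ Sigma                           # rows are E[z_i | x_i]
    Ezz = N * Sigma + M.T @ M                           # sum_i E[z_i z_i^T | x_i]

    # M-step: plug the expected sufficient statistics into the usual MLE formulas
    W_new = (X.T @ M) @ np.linalg.inv(Ezz)
    psi_new = np.mean(X**2, axis=0) - np.sum((X.T @ M) * W_new, axis=1) / N
    return W_new, np.maximum(psi_new, 1e-8)             # keep Psi positive
```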
12.2 Principal components analysis (PCA)
- Consider the FA model where we constrain Ψ = σ²I, and W to be orthonormal.
- It can be shown (Tipping and Bishop 1999) that, as σ² → 0, this model reduces to classical (nonprobabilistic) principal components analysis (PCA).
- The version where σ² > 0 is known as probabilistic PCA (PPCA).
proof sketch
- Finding the W that minimizes the reconstruction error = finding the W that maximizes the variance of z (the variance of the projected data).
- The W that maximizes the projected variance can be found with Lagrange multipliers (to enforce the orthonormality constraint).

- The result: the optimal W consists of the eigenvectors of the empirical covariance matrix with the [largest, second largest, third largest, ...] eigenvalues.
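A small numpy sketch of this recipe on synthetic data: form the empirical covariance, take its top-L eigenvectors, and use them as W.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # synthetic correlated data
Xc = X - X.mean(axis=0)                  # center the data first

S = np.cov(Xc, rowvar=False)             # empirical covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # reorder by decreasing eigenvalue

L = 2
W = eigvecs[:, order[:L]]                # largest, second largest, ... eigenvectors
Z = Xc @ W                               # low-dimensional projections (the z's)
```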
proof of PCA
- wj ∈ R^D denotes the j'th principal direction,
- xi ∈ R^D denotes the i'th high-dimensional observation,

- zi ∈ R^L denotes the i'th low-dimensional representation.

- Let us start by estimating the best 1d solution, w1 ∈ R^D, and the corresponding projected points z̃1 ∈ R^N.

- So the optimal reconstruction weights are obtained by orthogonally projecting the data onto the first principal direction.
proof of PCA
We approximate each x by reducing it to z = wᵀx.

Rearranging, minimizing the reconstruction error is equivalent to maximizing the variance of the projected data,

and the direction that maximizes the variance is an eigenvector of the covariance matrix.
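The last step can be spelled out with a Lagrange multiplier enforcing the constraint w1ᵀw1 = 1 (this is the argument used in the book's proof):

J(w_1) = w_1^\top \hat{\Sigma} w_1 + \lambda_1 (1 - w_1^\top w_1), \qquad \frac{\partial J}{\partial w_1} = 2 \hat{\Sigma} w_1 - 2 \lambda_1 w_1 = 0 \;\Rightarrow\; \hat{\Sigma} w_1 = \lambda_1 w_1

so w1 must be an eigenvector of the empirical covariance Σ̂, and since the projected variance equals w1ᵀΣ̂w1 = λ1, the best choice is the eigenvector with the largest eigenvalue.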
proof of PCA

Optimizing wrt w1 and z1 gives the same solution as before.

The proof continues in this way. (Formally one can use induction.)
12.2.3 Singular value decomposition (SVD)
- PCA is closely related to the SVD.
- The SVD can be used to compute the PCA solution W.
- The PCA approximation is the same as the truncated SVD approximation.

thin SVD
SVD: example
Reconstructions that keep different numbers of the largest singular values, for comparison.
SVD: example
12.2.3 Singular value decomposition (SVD)

The PCA solution W is given by the eigenvectors of XᵀX; in the SVD X = USVᵀ these are the columns of V, so W = V.
The SVD therefore gives an easy way to compute PCA.
The rank-L PCA approximation is the same as the truncated SVD approximation.
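A short numpy sketch of both facts (W from the right singular vectors, and the rank-L truncated SVD as the PCA reconstruction), again on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 6)) @ rng.standard_normal((6, 6))
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)   # thin SVD: Xc = U diag(s) V^T

L = 2
W = Vt[:L].T                   # principal directions = first L right singular vectors (W = V)
Z = Xc @ W                     # same projections as the eigendecomposition route

X_hat = U[:, :L] @ np.diag(s[:L]) @ Vt[:L]   # rank-L truncated SVD = PCA reconstruction
print(np.linalg.norm(Xc - X_hat))            # error shrinks as more singular values are kept
```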
12.2.4 Probabilistic PCA
- Consider an FA model in which x has zero mean, Ψ = σ²I, and W is orthogonal.

- The MLE can then be computed in closed form.
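A restatement of that closed form (Tipping and Bishop 1999), since the slide shows the equations only as an image:

\hat{W} = V (\Lambda - \sigma^2 I)^{1/2} R, \qquad \hat{\sigma}^2 = \frac{1}{D - L} \sum_{j = L + 1}^{D} \lambda_j

where the columns of V are the first L eigenvectors of the sample covariance, Λ holds the corresponding eigenvalues, and R is an arbitrary L×L rotation; as σ² → 0 (and with R = I) this recovers classical PCA.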
12.2.5 EM algorithm for PCA
- In PCA, the E-step infers the latent variables Z; as in the E-step of EM for FA, we compute the posterior over Z.

- In the σ² → 0 limit this amounts to orthogonally projecting X onto the space spanned by W.

- The book also gives a physical interpretation of this step (the data points pull on the subspace like springs attached to a rigid rod).
- The E and M steps are then iterated.
12.2.5 EM algorithm for PCA
- The E-step can also be understood from a linear regression point of view.

- Linear regression orthogonally projects y onto the space spanned by the input columns, i.e., it finds the best approximation of y in terms of that basis (Section 7.3.2).

- Likewise, the E-step projects X onto the space spanned by the previous estimate of the loading matrix, W_{t-1}.
12.2.5 EM algorithm for PCA
- M-step

multi-output linear regression (Equation 7.89)

- Multi-output linear regression is ordinary linear regression with a vector-valued output y.

- Here it is a multi-output linear regression in which the zi are the inputs and the xi are the outputs.
- The difference is that the inputs zi are not observed: we plug in their estimates from the E-step and then minimize the error against the observed values x.
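Putting the two steps together, a minimal numpy sketch of EM for PCA (the σ² → 0 case) on centered data stored as a D×N matrix, following the chapter's convention; W is initialized randomly here.

```python
import numpy as np

def em_pca(X, L, n_iters=50, seed=0):
    """EM for PCA (sigma^2 -> 0 limit). X: (D, N) centered data matrix."""
    D, N = X.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((D, L))                    # random initial loading matrix
    for _ in range(n_iters):
        # E-step: orthogonally project X onto the subspace spanned by the current W
        Z = np.linalg.solve(W.T @ W, W.T @ X)          # Z = (W^T W)^{-1} W^T X
        # M-step: multi-output linear regression of X onto the estimated inputs Z
        W = X @ Z.T @ np.linalg.inv(Z @ Z.T)           # W = X Z^T (Z Z^T)^{-1}
    return W, Z
```

W converges to a basis of the same subspace as the top-L principal directions; an orthonormal basis can be recovered afterwards, e.g. with np.linalg.qr(W).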
12.2.5 EM algorithm for PCA
- Advantages of EM
- EM can be faster.
- EM can be implemented in an online fashion, i.e., we can update our estimate of W as the data streams in.
12.3.1 Model selection for FA/PPCA
12.3.2 Model selection for PCA
Conclusion
- FA models the covariance of x with a small number of parameters (D×L) instead of the full D×D set of parameters.
- PCA is a special case of FA.
- In PCA,
- the solution W maximizes the variance of Z (the variance of the projected data),
- and is given by the eigenvectors of the covariance matrix with the largest eigenvalues.
- In the SVD (X = USV'), V contains the eigenvectors of the covariance of X; therefore W = V.
