2. Acknowledgements
This set of lecture notes has been adapted from materials originally provided by Dr. Gan Hong Seng and from Christopher M. Bishop's lecture notes.
3. Course Outline
- What is a GMM?
- The concept of a mixture of Gaussians
- The EM algorithm and latent variables
4. What is a Gaussian Mixture Model?
- A probabilistic model used for clustering and classification tasks.
- Assumption: the data is generated by a mixture of several Gaussian distributions, each with its own mean and variance.
- Application: by fitting a GMM to the data we can (see the sketch below):
  - Identify underlying clusters.
  - Make predictions on new data points through probabilistic assignments to each cluster.
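As a concrete illustration of this workflow, here is a minimal sketch using scikit-learn's GaussianMixture; the library choice and the toy data are assumptions, not part of the slides.

```python
# Minimal sketch (assumed toolchain: NumPy + scikit-learn, not named in the slides).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy data: two well-separated 2D Gaussian blobs (illustrative only).
X = np.vstack([rng.normal(-2.0, 0.5, size=(200, 2)),
               rng.normal(2.0, 1.0, size=(200, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

labels = gmm.predict(X)       # most likely cluster for each point
probs = gmm.predict_proba(X)  # probabilistic assignment to each component
print(gmm.means_)             # estimated component means (the "underlying clusters")
```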
5. Example of a Gaussian Distribution
X-axis: data values
Y-axis: frequency or probability of occurrence
- Bell-shaped curve: illustrates that most data is clustered around the mean.
- The mean is depicted by the vertical line at the center.
- The standard deviation measures the spread of the data.
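The bell curve described above is the standard univariate Gaussian density; the formula itself is not reproduced in the transcript, so it is restated here for reference:

```latex
\mathcal{N}(x \mid \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```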
8. Likelihood Function
- Data set
- The probability of observing x given the Gaussian distribution:
- Assume the observed data points are generated independently.
- Viewed as a function of the parameters, this probability is known as the likelihood function.
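The equations on this slide are not reproduced in the transcript. In the standard formulation (e.g. Bishop), for a data set X = {x_1, ..., x_N} of independently generated points, the likelihood of a single Gaussian is:

```latex
p(\mathbf{X} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \prod_{n=1}^{N} \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
```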
9. Maximum Likelihood
- Estimate the parameters from the given data set by maximizing the likelihood function.
- Equivalently, maximize the log likelihood.
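Presumably the missing equation is the usual log likelihood of a single Gaussian:

```latex
\ln p(\mathbf{X} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \sum_{n=1}^{N} \ln \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
```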
10. Maximum Likelihood Solution
- Maximizing w.r.t. the mean gives the sample mean.
- Maximizing w.r.t. the covariance gives the sample covariance.
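The corresponding closed-form solutions (standard results, restated here because the slide's equations are missing) are:

```latex
\boldsymbol{\mu}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} \mathbf{x}_n,
\qquad
\boldsymbol{\Sigma}_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} (\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})(\mathbf{x}_n - \boldsymbol{\mu}_{\mathrm{ML}})^{\mathsf{T}}
```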
11. Mixture Models
- So estimating the parameters of a single Gaussian is simple.
- What about modelling non-Gaussian data?
- Mixture models are a powerful way to handle many non-Gaussian data distributions!
12. Mixture Model
A mixture model is a weighted sum of a number of probability density functions (PDFs), where the weights are determined by a mixing distribution.
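In symbols (a standard form, added here because the slide shows it only as a figure), a mixture of K component densities p_k with weights w_k is:

```latex
p(x) = \sum_{k=1}^{K} w_k\, p_k(x), \qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1
```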
14. Hard Assignments (K-Means Clustering)
- Exclusive assignment: each data point is assigned to a single cluster.
- Cluster membership: data points belong to one, and only one, cluster.
15. Soft Assignments (GMM)
- Probabilistic assignment: assigns each data point a probability of belonging to each Gaussian distribution in the mixture.
- Partial membership: a single data point can have partial membership in multiple Gaussian distributions (see the code sketch below).
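To make the contrast concrete, the following sketch compares the two kinds of assignment with scikit-learn (an assumed toolchain; the slides do not prescribe one): K-means returns a single label per point, while a GMM returns a probability per component.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Two overlapping 2D blobs (illustrative only).
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(3.0, 1.0, size=(100, 2))])

hard = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
soft = GaussianMixture(n_components=2, random_state=0).fit(X).predict_proba(X)

print(hard[:3])           # exclusive membership, e.g. [0 0 1]
print(soft[:3].round(2))  # partial membership, e.g. [[0.97 0.03] ...]
```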
16. Q&A
- When to use hard assignment and when to use soft assignment?
17. Hard vs Soft Assignments
- When to use hard assignments:
  - Ideal for data with clearly separable, distinct clusters.
  - Most effective when there is minimal overlap between clusters.
- When to use soft assignments:
  - Suitable for data that is not easily separable into distinct clusters.
  - Ideal for handling data with significant overlap between clusters.
20. Mixture of Gaussians in 2D
- Model assumption: data points are generated by a combination of several 2D Gaussian distributions.
- Distinct parameters: each distribution has its own mean (center point) and covariance matrix (shape and orientation).
26. Gaussian Mixtures
- Linear superposition of Gaussians (written out below).
- Normalization and positivity constraints on the mixing coefficients.
- The mixing coefficients can be interpreted as prior probabilities.
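The superposition this slide refers to is, in the standard notation:

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k),
\qquad 0 \le \pi_k \le 1, \quad \sum_{k=1}^{K} \pi_k = 1
```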
27. Sampling from the Gaussian Mixture
- To generate a data point:
  - first pick one of the components with probability equal to its mixing coefficient,
  - then draw a sample from that component.
- Repeat these two steps for each new data point.
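A minimal NumPy sketch of this two-step (ancestral) sampling procedure, with made-up parameters for illustration:

```python
import numpy as np

def sample_gmm(pis, mus, covs, n, seed=0):
    """Ancestral sampling: pick a component with probability pi_k,
    then draw from that component's Gaussian."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(pis), size=n, p=pis)              # step 1: pick components
    return np.array([rng.multivariate_normal(mus[k], covs[k])  # step 2: draw a sample
                     for k in comps])

# Example: a two-component mixture in 2D (parameters are illustrative).
pis = [0.3, 0.7]
mus = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
X = sample_gmm(pis, mus, covs, n=500)
```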
28. Fitting the Gaussian Mixture
- We wish to invert this process: given the data set, find the corresponding parameters:
  - mixing coefficients
  - means
  - covariances
- If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster.
- Problem: the data set is unlabelled.
- We shall refer to the labels as latent (= hidden) variables.
30. Posterior Probabilities
- We can think of the mixing coefficients as prior probabilities for the components.
- For a given value of x we can evaluate the corresponding posterior probabilities, called responsibilities.
- These are given by Bayes' theorem:
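The Bayes'-theorem expression the slide refers to is, in the usual notation:

```latex
\gamma_k(\mathbf{x}) \equiv p(k \mid \mathbf{x})
= \frac{\pi_k\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)}
       {\sum_{j=1}^{K} \pi_j\, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}
```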
33. Maximum Likelihood for the GMM
- The log likelihood function takes the form shown below.
- Note: the sum over components appears inside the log.
- There is no closed-form solution for maximum likelihood.
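In the standard notation, that form is:

```latex
\ln p(\mathbf{X} \mid \boldsymbol{\pi}, \boldsymbol{\mu}, \boldsymbol{\Sigma})
= \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k) \right\}
```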
34. Problems and Solutions
- How to maximize the log likelihood
  - solved by the expectation-maximization (EM) algorithm
  - this is the topic of our lecture
- How to avoid singularities in the likelihood function
  - solved by a Bayesian treatment
- How to choose the number K of components
  - also solved by a Bayesian treatment
35. EM Algorithm – Informal Derivation
- Let us proceed by simply differentiating the log likelihood.
- Setting the derivative with respect to the means equal to zero gives an update which is simply the weighted mean of the data.
36. EM Algorithm – Informal Derivation
- Similarly for the covariances.
- For the mixing coefficients, use a Lagrange multiplier (to enforce that they sum to one), giving the update below.
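The resulting update equations (the standard M-step results these two slides refer to, with gamma(z_nk) the responsibilities and N_k the effective number of points assigned to component k) are:

```latex
N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \qquad
\boldsymbol{\mu}_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})\,\mathbf{x}_n,
\qquad
\boldsymbol{\Sigma}_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})\,
  (\mathbf{x}_n - \boldsymbol{\mu}_k)(\mathbf{x}_n - \boldsymbol{\mu}_k)^{\mathsf{T}},
\qquad
\pi_k = \frac{N_k}{N}
```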
40. EM Algorithm – Informal Derivation
- An iterative scheme for solving these coupled equations:
  - Make initial guesses for the parameters.
  - Alternate between the following two stages:
    1. E-step: evaluate the responsibilities using the current parameters.
    2. M-step: update the parameters using the maximum likelihood results.
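Putting the E-step and M-step together, here is a compact NumPy/SciPy sketch of the iteration outlined above (a bare-bones illustration under assumed full covariances; it includes none of the safeguards against the singularities mentioned under Problems and Solutions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    """Bare-bones EM for a Gaussian mixture with full covariances (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Initial guesses: random data points as means, shared sample covariance, uniform weights.
    mus = X[rng.choice(N, size=K, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities gamma(z_nk) via Bayes' theorem.
        dens = np.column_stack([multivariate_normal.pdf(X, mus[k], covs[k]) for k in range(K)])
        weighted = dens * pis                          # shape (N, K)
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibility-weighted data.
        Nk = resp.sum(axis=0)                          # effective number of points per component
        mus = (resp.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pis = Nk / N
    return pis, mus, covs, resp
```

For example, running em_gmm(X, K=2) on the toy two-blob data from the earlier sketch should recover means close to (-2, -2) and (2, 2).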
47. GMM and K-Means Differences
K-means clustering:
- Assumption: spherical clusters with equal probability.
- Cluster assignment: hard assignment (points belong to one cluster).
- Cluster shape: only identifies circular clusters.
- Algorithm: minimizes within-cluster variance.
- Outlier sensitivity: high, due to the mean calculation.
Gaussian Mixture Models (GMM):
- Assumption: data comes from multiple Gaussian distributions.
- Cluster assignment: soft assignment (probabilistic cluster membership).
- Cluster shape: identifies elliptical clusters.
- Algorithm: maximizes likelihood using expectation-maximization.
- Outlier sensitivity: lower, due to the probabilistic framework.
48. GMM and K-Means Differences
- Flexibility in cluster shapes: GMM can model elliptical clusters of varying size, not just spherical ones.
- Soft clustering and uncertainty: provides membership probabilities, offering a nuanced understanding of cluster belonging.
- Density estimation: GMM estimates the density distribution of each cluster, not just its central tendency.
- Model complexity: GMM captures complex cluster structures but requires more data and computational power.
49. GMM and K-Means Differences
Use K-means when:
- You need a fast, simple, and interpretable model.
- Your data is expected to form spherical clusters.
- Computational resources are limited.
Use GMM when:
- You suspect clusters are non-spherical or have different sizes.
- You need a measure of uncertainty in cluster assignments.
- You have enough data to estimate the additional parameters reliably.
Takeaway:
- K-means is efficient for well-separated, spherical clusters.
- GMM is more flexible, capturing complex cluster shapes and providing probabilistic cluster assignments.
Editor's Notes
#4: In the realm of statistical analysis, the Gaussian Mixture Model (GMM) is a versatile probabilistic tool that serves both for clustering and classification tasks. It operates under the assumption that the data points are produced by a blend of multiple Gaussian distributions, each characterized by distinct parameters, mean and variance, that define their centers and spreads, respectively. By applying a GMM to a dataset, we can uncover latent groupings inherent in the data, revealing the underlying structure. Furthermore, the model empowers us to make informed predictions about where new data points might belong within these clusters, not through rigid assignment but by calculating the likelihood of membership in each cluster, thereby yielding a more nuanced, probabilistic classification.
#47: K-means operates on the assumption that each cluster is spherical and all clusters are equally likely, assigning each data point to a single cluster in a 'hard' manner, meaning points are fully in one cluster or another. This algorithm seeks to make the variation within each cluster as small as possible, but it tends to be sensitive to outliers because it uses the mean of the points to determine cluster centers and can only identify circular-shaped clusters. On the other hand, GMM assumes that data points are drawn from several Gaussian distributions, which allows for 'soft' cluster assignment. This means that it assigns points to clusters based on the probability of membership, making it more flexible in accommodating elliptical cluster shapes. The GMM algorithm uses an expectation-maximization process to maximize the likelihood of the data points given the model, and it is generally less sensitive to outliers due to its probabilistic nature.
#48: In academic discourse, the Gaussian Mixture Model (GMM) is prized for its flexibility in capturing a wide variety of cluster shapes, including elliptical forms and clusters of different sizes, rather than being confined to identifying only spherical clusters as some other methods are. GMM extends beyond simple cluster assignment by providing membership probabilities for each data point, thereby offering a more sophisticated and nuanced view of how data points relate to potential clusters. This model excels in estimating the density distribution within each cluster, which provides a richer understanding than merely pinpointing the central tendency. However, the intricacy of GMM in modeling complex cluster configurations comes at a cost; it necessitates a larger dataset and more computational resources to perform effectively.
#49: Choose K-means if you're looking for a quick, straightforward method that's easy to explain and when you think your data naturally splits into neat, round groups. It's also a good pick when you don't have a lot of computing power. On the other hand, go for the Gaussian Mixture Model (GMM) when you have a hunch that your clusters aren't just simple spheres or when they come in different sizes. GMM is also helpful when you want to know how sure the model is about which group each piece of data belongs to, but remember, it needs a good amount of data to work properly. To sum it up, K-means is your go-to for quick and clean clustering of round groups, while GMM is the choice for more complex situations and gives you insights into the probability of each data point's membership in a cluster.