�ݺ�ߣ

PRESENTATION ON AN OVERVIEW OF
CLUSTERING ALGORITHMS & COMPLEXITY
ANALYSIS
PRESENTED BY
Mohammad Ashraful Islam ; ID: 15162103013 ; SEC: 01 ; INTAKE: 32ND
Salahuddin Gazi Fahim ; ID: 15162103019 ; SEC: 01 ; INTAKE: 32ND
Dween Mohammad ; ID: 15162103038 ; SEC: 01 ; INTAKE: 32ND
DEPARTMENT OF B.Sc. in CSE
FACULTY OF ENGINEERING & APPLIED SCIENCES
BANGLADESH UNIVERSITY OF BUSINESS AND TECHNOLOGY

INDEX
o What is clustering
o What is good clustering
o Requirements of Clustering in Data Mining
o Major Clustering Approaches
o Cluster algorithms
oK-means
oExpectation maximization
oPAM Algorithm

CLUSTERING
─ Process of grouping similar items together
─ Clusters should be very similar to each other but…
─ Should be very different from the objects of other clusters/ other
clusters

WHAT IS GOOD CLUSTERING
o A good clustering method will produce high quality clusters with
o high intra-class similarity
o low inter-class similarity
Intra-cluster
distances are
minimized
Inter-cluster
distances are
maximized

REQUIREMENTS OF CLUSTERING IN DATA
MINING
o Scalability
o Ability to deal with different types of attributes
o Discovery of clusters with arbitrary shape
o Minimal requirements for domain knowledge to determine input parameters
o Able to deal with noise and outliers
o Insensitive to order of input records
o High dimensionality
o Incorporation of user-specified constraints
o Interpretability and usability

MAJOR CLUSTERING APPROACHES
o Partitioning algorithms
o Hierarchy algorithms
o Density-based
o Grid-based
o Model-based

CLUSTER ALGORITHMS
─ Clustering has been a popular area of research
─ Several methods and techniques have been developed to
determine natural grouping among the objects

ROAD MAP
K-means
Expectation Maximization Method
PAM algorithm

K-MEANS
o k-means clustering is a method of vector quantization,
originally from signal processing, that is popular for cluster
analysis in data mining.

ADVANTAGES & DISADVANTAGES
o Advantages
 Fast, robust and easier to understand.
 Relatively efficient.
 Gives best result when data set are distinct or well separated from
each other.

CONTINUE…..
o Disadvantages
 Applicable only when mean is defined i.e. fails for categorical data.
 Unable to handle noisy data and outliers.
 Algorithm fails for non-linear data set.

TIME AND SPACE COMPLEXITY
o The space requirements for K-means are modest.
o The storage required is O((m+k)n).
o The time requirements for K-means are also modest.
o The time required is O(I*k*m*n).

EXPECTATION MAXIMIZATION
o The EM (expectation maximization) technique is similar to the K-
Means technique.

o Advantage
 Gives extremely useful result for the real world data set.
o Disadvantage
 Algorithm is highly complex in nature.

Time computational complexity is linear in d (the number of input
features), n (the number of objects) and t (the number of iterations).

PAM ALGORITHM
o PAM stands for “partition around medoids”. The algorithm is
intended to find a sequence of objects called medoids that are
centrally located in clusters.

o Advantages
 Relatively scalable and simple.
 Suitable for datasets with compact spherical clusters that are well-
separated.

CONTINUE….
o Disadvantages
 Poor cluster descriptors.
 Reliance on the user to specify the number of clusters in advance.
 High sensitivity to initialization phase, noise and outliers.
 Frequent entrapments into local optima.
 Inability to deal with non-convex clusters of varying size and
density.

o The time complexity of the PAM algorithm is O(K(N − K)2 I).
o PAM is not scalable for large dataset.

It is not our
Ending …..It is
just Our
Performance 
THANK YOU :|

�ݺ�ߣ

Presentation on an overview of clustering algorithms & complexity analysis

More Related Content

Presentation on an overview of clustering algorithms & complexity analysis