It's a research base power point presentation on clustering algorithm.In Shaa Allah It may help you lot for better understanding about clustering.
1 of 20
Download to read offline
More Related Content
Presentation on an overview of clustering algorithms & complexity analysis
1. PRESENTATION ON AN OVERVIEW OF
CLUSTERING ALGORITHMS & COMPLEXITY
ANALYSIS
PRESENTED BY
Mohammad Ashraful Islam ; ID: 15162103013 ; SEC: 01 ; INTAKE: 32ND
Salahuddin Gazi Fahim ; ID: 15162103019 ; SEC: 01 ; INTAKE: 32ND
Dween Mohammad ; ID: 15162103038 ; SEC: 01 ; INTAKE: 32ND
DEPARTMENT OF B.Sc. in CSE
FACULTY OF ENGINEERING & APPLIED SCIENCES
BANGLADESH UNIVERSITY OF BUSINESS AND TECHNOLOGY
2. INDEX
o What is clustering
o What is good clustering
o Requirements of Clustering in Data Mining
o Major Clustering Approaches
o Cluster algorithms
oK-means
oExpectation maximization
oPAM Algorithm
3. CLUSTERING
Process of grouping similar items together
Clusters should be very similar to each other but
Should be very different from the objects of other clusters/ other
clusters
4. WHAT IS GOOD CLUSTERING
o A good clustering method will produce high quality clusters with
o high intra-class similarity
o low inter-class similarity
Intra-cluster
distances are
minimized
Inter-cluster
distances are
maximized
5. REQUIREMENTS OF CLUSTERING IN DATA
MINING
o Scalability
o Ability to deal with different types of attributes
o Discovery of clusters with arbitrary shape
o Minimal requirements for domain knowledge to determine input parameters
o Able to deal with noise and outliers
o Insensitive to order of input records
o High dimensionality
o Incorporation of user-specified constraints
o Interpretability and usability
6. MAJOR CLUSTERING APPROACHES
o Partitioning algorithms
o Hierarchy algorithms
o Density-based
o Grid-based
o Model-based
7. CLUSTER ALGORITHMS
Clustering has been a popular area of research
Several methods and techniques have been developed to
determine natural grouping among the objects
9. K-MEANS
o k-means clustering is a method of vector quantization,
originally from signal processing, that is popular for cluster
analysis in data mining.
10. ADVANTAGES & DISADVANTAGES
o Advantages
Fast, robust and easier to understand.
Relatively efficient.
Gives best result when data set are distinct or well separated from
each other.
11. CONTINUE..
o Disadvantages
Applicable only when mean is defined i.e. fails for categorical data.
Unable to handle noisy data and outliers.
Algorithm fails for non-linear data set.
12. TIME AND SPACE COMPLEXITY
o The space requirements for K-means are modest.
o The storage required is O((m+k)n).
o The time requirements for K-means are also modest.
o The time required is O(I*k*m*n).
14. ADVANTAGES & DISADVANTAGES
o Advantage
Gives extremely useful result for the real world data set.
o Disadvantage
Algorithm is highly complex in nature.
15. TIME AND SPACE COMPLEXITY
Time computational complexity is linear in d (the number of input
features), n (the number of objects) and t (the number of iterations).
16. PAM ALGORITHM
o PAM stands for partition around medoids. The algorithm is
intended to find a sequence of objects called medoids that are
centrally located in clusters.
17. ADVANTAGES & DISADVANTAGES
o Advantages
Relatively scalable and simple.
Suitable for datasets with compact spherical clusters that are well-
separated.
18. CONTINUE.
o Disadvantages
Poor cluster descriptors.
Reliance on the user to specify the number of clusters in advance.
High sensitivity to initialization phase, noise and outliers.
Frequent entrapments into local optima.
Inability to deal with non-convex clusters of varying size and
density.
19. TIME AND SPACE COMPLEXITY
o The time complexity of the PAM algorithm is O(K(N K)2 I).
o PAM is not scalable for large dataset.
20. It is not our
Ending ..It is
just Our
Performance
THANK YOU :|