際際滷

際際滷Share a Scribd company logo
PRESENTATION ON AN OVERVIEW OF
CLUSTERING ALGORITHMS & COMPLEXITY
ANALYSIS
PRESENTED BY
Mohammad Ashraful Islam ; ID: 15162103013 ; SEC: 01 ; INTAKE: 32ND
Salahuddin Gazi Fahim ; ID: 15162103019 ; SEC: 01 ; INTAKE: 32ND
Dween Mohammad ; ID: 15162103038 ; SEC: 01 ; INTAKE: 32ND
DEPARTMENT OF B.Sc. in CSE
FACULTY OF ENGINEERING & APPLIED SCIENCES
BANGLADESH UNIVERSITY OF BUSINESS AND TECHNOLOGY
INDEX
o What is clustering
o What is good clustering
o Requirements of Clustering in Data Mining
o Major Clustering Approaches
o Cluster algorithms
oK-means
oExpectation maximization
oPAM Algorithm
CLUSTERING
 Process of grouping similar items together
 Clusters should be very similar to each other but
 Should be very different from the objects of other clusters/ other
clusters
WHAT IS GOOD CLUSTERING
o A good clustering method will produce high quality clusters with
o high intra-class similarity
o low inter-class similarity
Intra-cluster
distances are
minimized
Inter-cluster
distances are
maximized
REQUIREMENTS OF CLUSTERING IN DATA
MINING
o Scalability
o Ability to deal with different types of attributes
o Discovery of clusters with arbitrary shape
o Minimal requirements for domain knowledge to determine input parameters
o Able to deal with noise and outliers
o Insensitive to order of input records
o High dimensionality
o Incorporation of user-specified constraints
o Interpretability and usability
MAJOR CLUSTERING APPROACHES
o Partitioning algorithms
o Hierarchy algorithms
o Density-based
o Grid-based
o Model-based
CLUSTER ALGORITHMS
 Clustering has been a popular area of research
 Several methods and techniques have been developed to
determine natural grouping among the objects
ROAD MAP
K-means
Expectation Maximization Method
PAM algorithm
K-MEANS
o k-means clustering is a method of vector quantization,
originally from signal processing, that is popular for cluster
analysis in data mining.
ADVANTAGES & DISADVANTAGES
o Advantages
 Fast, robust and easier to understand.
 Relatively efficient.
 Gives best result when data set are distinct or well separated from
each other.
CONTINUE..
o Disadvantages
 Applicable only when mean is defined i.e. fails for categorical data.
 Unable to handle noisy data and outliers.
 Algorithm fails for non-linear data set.
TIME AND SPACE COMPLEXITY
o The space requirements for K-means are modest.
o The storage required is O((m+k)n).
o The time requirements for K-means are also modest.
o The time required is O(I*k*m*n).
EXPECTATION MAXIMIZATION
o The EM (expectation maximization) technique is similar to the K-
Means technique.
ADVANTAGES & DISADVANTAGES
o Advantage
 Gives extremely useful result for the real world data set.
o Disadvantage
 Algorithm is highly complex in nature.
TIME AND SPACE COMPLEXITY
Time computational complexity is linear in d (the number of input
features), n (the number of objects) and t (the number of iterations).
PAM ALGORITHM
o PAM stands for partition around medoids. The algorithm is
intended to find a sequence of objects called medoids that are
centrally located in clusters.
ADVANTAGES & DISADVANTAGES
o Advantages
 Relatively scalable and simple.
 Suitable for datasets with compact spherical clusters that are well-
separated.
CONTINUE.
o Disadvantages
 Poor cluster descriptors.
 Reliance on the user to specify the number of clusters in advance.
 High sensitivity to initialization phase, noise and outliers.
 Frequent entrapments into local optima.
 Inability to deal with non-convex clusters of varying size and
density.
TIME AND SPACE COMPLEXITY
o The time complexity of the PAM algorithm is O(K(N  K)2 I).
o PAM is not scalable for large dataset.
It is not our
Ending ..It is
just Our
Performance 
THANK YOU :|

More Related Content

Presentation on an overview of clustering algorithms & complexity analysis

  • 1. PRESENTATION ON AN OVERVIEW OF CLUSTERING ALGORITHMS & COMPLEXITY ANALYSIS PRESENTED BY Mohammad Ashraful Islam ; ID: 15162103013 ; SEC: 01 ; INTAKE: 32ND Salahuddin Gazi Fahim ; ID: 15162103019 ; SEC: 01 ; INTAKE: 32ND Dween Mohammad ; ID: 15162103038 ; SEC: 01 ; INTAKE: 32ND DEPARTMENT OF B.Sc. in CSE FACULTY OF ENGINEERING & APPLIED SCIENCES BANGLADESH UNIVERSITY OF BUSINESS AND TECHNOLOGY
  • 2. INDEX o What is clustering o What is good clustering o Requirements of Clustering in Data Mining o Major Clustering Approaches o Cluster algorithms oK-means oExpectation maximization oPAM Algorithm
  • 3. CLUSTERING Process of grouping similar items together Clusters should be very similar to each other but Should be very different from the objects of other clusters/ other clusters
  • 4. WHAT IS GOOD CLUSTERING o A good clustering method will produce high quality clusters with o high intra-class similarity o low inter-class similarity Intra-cluster distances are minimized Inter-cluster distances are maximized
  • 5. REQUIREMENTS OF CLUSTERING IN DATA MINING o Scalability o Ability to deal with different types of attributes o Discovery of clusters with arbitrary shape o Minimal requirements for domain knowledge to determine input parameters o Able to deal with noise and outliers o Insensitive to order of input records o High dimensionality o Incorporation of user-specified constraints o Interpretability and usability
  • 6. MAJOR CLUSTERING APPROACHES o Partitioning algorithms o Hierarchy algorithms o Density-based o Grid-based o Model-based
  • 7. CLUSTER ALGORITHMS Clustering has been a popular area of research Several methods and techniques have been developed to determine natural grouping among the objects
  • 9. K-MEANS o k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining.
  • 10. ADVANTAGES & DISADVANTAGES o Advantages Fast, robust and easier to understand. Relatively efficient. Gives best result when data set are distinct or well separated from each other.
  • 11. CONTINUE.. o Disadvantages Applicable only when mean is defined i.e. fails for categorical data. Unable to handle noisy data and outliers. Algorithm fails for non-linear data set.
  • 12. TIME AND SPACE COMPLEXITY o The space requirements for K-means are modest. o The storage required is O((m+k)n). o The time requirements for K-means are also modest. o The time required is O(I*k*m*n).
  • 13. EXPECTATION MAXIMIZATION o The EM (expectation maximization) technique is similar to the K- Means technique.
  • 14. ADVANTAGES & DISADVANTAGES o Advantage Gives extremely useful result for the real world data set. o Disadvantage Algorithm is highly complex in nature.
  • 15. TIME AND SPACE COMPLEXITY Time computational complexity is linear in d (the number of input features), n (the number of objects) and t (the number of iterations).
  • 16. PAM ALGORITHM o PAM stands for partition around medoids. The algorithm is intended to find a sequence of objects called medoids that are centrally located in clusters.
  • 17. ADVANTAGES & DISADVANTAGES o Advantages Relatively scalable and simple. Suitable for datasets with compact spherical clusters that are well- separated.
  • 18. CONTINUE. o Disadvantages Poor cluster descriptors. Reliance on the user to specify the number of clusters in advance. High sensitivity to initialization phase, noise and outliers. Frequent entrapments into local optima. Inability to deal with non-convex clusters of varying size and density.
  • 19. TIME AND SPACE COMPLEXITY o The time complexity of the PAM algorithm is O(K(N K)2 I). o PAM is not scalable for large dataset.
  • 20. It is not our Ending ..It is just Our Performance THANK YOU :|