CLUSTERING
CHAPTER 7: MACHINE LEARNING - THEORY & PRACTICE
Introduction
• Clustering refers to the process of arranging or organizing objects according to specific criteria.
• Partitioning of data
  o Grouping data in database applications improves data access.
• Data reorganization
• Data compression
Introduction
• Summarization
  o Statistical measures such as the mean, mode, and median can provide summarized information.
• Matrix factorization
  o Let there be n data points in an l-dimensional space. We can represent them as a matrix $X_{n \times l}$.
  o It is possible to approximate X as a product of two matrices, $B_{n \times K}$ and $C_{K \times l}$.
  o So $X \approx BC$, where B is the cluster assignment matrix and C is the matrix of cluster representatives.
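The factorization above can be made concrete with a small sketch. The toy data, the hard assignment in `labels`, and the helper names `centroid` and `matmul` are illustrative assumptions, and centroids are used as the representatives only as one possible choice.

```python
# Hypothetical toy data: n = 4 points in l = 2 dimensions, K = 2 clusters.
X = [[1.0, 1.0], [1.0, 3.0], [9.0, 8.0], [11.0, 8.0]]
labels = [0, 0, 1, 1]  # assumed hard assignment of each point to a cluster
K = 2

# B (n x K): row i has a 1 in the column of the cluster that point i belongs to.
B = [[1.0 if labels[i] == k else 0.0 for k in range(K)] for i in range(len(X))]


def centroid(points):
    """Sample mean of a list of points, computed feature by feature."""
    return [sum(col) / len(points) for col in zip(*points)]


# C (K x l): row k is the representative (here the centroid) of cluster k.
C = [centroid([x for x, lab in zip(X, labels) if lab == k]) for k in range(K)]


def matmul(A, M):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * m for a, m in zip(row, col)) for col in zip(*M)] for row in A]


# Each row of BC is the representative of the cluster the point belongs to.
X_approx = matmul(B, C)
```

With a hard assignment, every row of B contains a single 1, so the product BC simply replaces each point by its cluster representative.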
Clustering process
• The clustering process aims to ensure that the distance between any two points within the same cluster (intra-cluster distance), as measured by a dissimilarity measure such as Euclidean distance, is smaller than the distance between any two points belonging to different clusters (inter-cluster distance).
• Any two points are placed in the same cluster if the distance between them is lower than a certain threshold (given as input). Squared Euclidean distance is one of the measures used to compute the distance between points.
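As a minimal sketch of the threshold rule, the snippet below computes the squared Euclidean distance and compares it with a threshold. The function names are illustrative, and applying the threshold to the squared distance rather than to the distance itself is an assumption.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def same_cluster(p, q, threshold):
    """True if the two points are closer than the (squared-distance) threshold."""
    return squared_euclidean(p, q) < threshold


print(squared_euclidean((0, 0), (3, 4)))
print(same_cluster((0, 0), (1, 1), threshold=4.0))
```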
Hard and soft clustering
Data abstraction
• Clustering is a useful method for data abstraction: each cluster of data points can be represented by its centroid, medoid, leader, or some other suitable entity.
• The centroid is computed as the sample mean of the data points in a cluster.
• The medoid is the data point in the cluster that minimizes the sum of distances to all other points in the cluster.
• Note: The centroid can shift dramatically depending on the position of an outlier, while the medoid remains within the boundaries of the original cluster.
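The effect of an outlier on the two representatives can be checked with a small sketch. The four-point cluster and the outlier at (20, 20) are made-up values, and Euclidean distance is assumed for the medoid.

```python
import math


def centroid(points):
    """Sample mean of the points, feature by feature."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def medoid(points):
    """The data point with the smallest total distance to all other points."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))


cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
with_outlier = cluster + [(20.0, 20.0)]

print(centroid(cluster), centroid(with_outlier))
print(medoid(with_outlier))
```

In this example the centroid moves well outside the original four points once the outlier is added, while the medoid is still one of them.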
Clustering algorithms
Divisive clustering
• Divisive algorithms are either polythetic, where the division is based on more than one feature, or monothetic, where only one feature is considered at a time.
• The polythetic scheme is based on finding all possible 2-partitions of the data and choosing the best among them. If there are n patterns, the number of distinct 2-partitions is $(2^n - 2)/2 = 2^{n-1} - 1$.
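The count of 2-partitions can be verified by enumeration. The generator below is an illustrative sketch (the name `two_partitions` is hypothetical) and assumes the items are distinct.

```python
from itertools import combinations


def two_partitions(items):
    """Yield every split of items into two non-empty, unordered groups."""
    items = list(items)
    first, rest = items[0], items[1:]
    # Keeping the first item in group A avoids counting (A, B) and (B, A) twice.
    # r stops at len(rest) - 1 so that group B is never empty.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            group_a = [first, *extra]
            group_b = [x for x in rest if x not in extra]
            yield group_a, group_b


for n in range(2, 7):
    print(n, len(list(two_partitions(range(n)))), 2 ** (n - 1) - 1)
```

The exponential growth of this count is why the exhaustive polythetic scheme is practical only for very small n.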
Divisive clustering
• Among all possible 2-partitions, the one with the least sum of the sample variances of the two clusters is chosen as the best.
• From the resulting partition, the cluster with the maximum sample variance is selected and split into an optimal 2-partition.
• This process is repeated until only singleton clusters remain.
• If a collection of patterns (data points) is split into two clusters, with p patterns $x_1, \dots, x_p$ in one cluster and q patterns $y_1, \dots, y_q$ in the other, and the centroids of the two clusters are $C_1$ and $C_2$ respectively, then the sum of the sample variances is
  $\sum_{i=1}^{p} \lVert x_i - C_1 \rVert^2 + \sum_{j=1}^{q} \lVert y_j - C_2 \rVert^2$.
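The selection of the best 2-partition can be sketched by brute force. This is only an illustration: it assumes that the "sum of the sample variances" is the unnormalized sum of squared distances from each point to its cluster centroid, and the names `sse` and `best_two_partition` are hypothetical.

```python
from itertools import combinations


def sse(points):
    """Sum of squared distances from each point to the centroid of the group."""
    c = [sum(col) / len(points) for col in zip(*points)]
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points)


def best_two_partition(points):
    """Try all 2^(n-1) - 1 two-way splits and keep the one with the least cost."""
    n = len(points)
    best = None
    for r in range(n - 1):  # point 0 always stays in group a
        for extra in combinations(range(1, n), r):
            in_a = {0, *extra}
            a = [points[i] for i in range(n) if i in in_a]
            b = [points[i] for i in range(n) if i not in in_a]
            cost = sse(a) + sse(b)
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best


data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
cost, a, b = best_two_partition(data)
print(cost, a, b)
```

A full divisive procedure would apply the same search again to whichever resulting cluster has the larger cost, until every cluster is a singleton.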
Divisive clustering
Monothetic clustering
• Monothetic clustering considers each feature direction individually and divides the data into two clusters based on the gap in projected values along that feature direction.
• Specifically, the dataset is split into two parts at the midpoint of the largest gap between successive feature values, that is, the mean of the two values on either side of that gap.
• This process is then repeated sequentially for the remaining features, further partitioning each cluster.
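A single monothetic split can be sketched as follows. The function name `split_on_feature` and the sample points are illustrative assumptions; the sketch expects at least two points and does not handle the case where all values of the feature are equal.

```python
def split_on_feature(points, feature):
    """Split points at the midpoint of the largest gap along one feature."""
    values = sorted(p[feature] for p in points)
    # Gaps between consecutive sorted values, stored with their two endpoints.
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(values, values[1:])]
    _, lo, hi = max(gaps)
    cut = (lo + hi) / 2  # mean of the two values that bound the largest gap
    left = [p for p in points if p[feature] <= cut]
    right = [p for p in points if p[feature] > cut]
    return cut, left, right


points = [(1, 5), (2, 9), (3, 6), (10, 8), (11, 4)]
print(split_on_feature(points, feature=0))
print(split_on_feature(points, feature=1))
```

Repeating the split on each resulting cluster with the next feature gives the sequential partitioning described above.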
Monothetic clustering
Agglomerative clustering
• An agglomerative clustering algorithm generally proceeds as follows:
1. Compute the proximity matrix for all pairs of patterns in the dataset.
2. Find the closest pair of clusters based on the computed proximity measure and merge them into a single cluster. Update the proximity matrix to reflect the merge, adjusting the distances between the newly formed cluster and the remaining clusters.
3. If all patterns belong to a single cluster, terminate the algorithm. Otherwise, go back to Step 2.
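The steps above can be sketched with a deliberately naive implementation. Two assumptions are made here: single-link (minimum) distance is used as the proximity between clusters, and distances are recomputed on every pass instead of maintaining and updating a proximity matrix, which keeps the code short but is slower.

```python
import math


def agglomerative(points, num_clusters=1):
    """Naive single-link agglomerative clustering over point indices."""
    clusters = [[i] for i in range(len(points))]  # start from singletons

    def linkage(a, b):
        # Single link: distance between the closest pair across two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    merges = []
    while len(clusters) > num_clusters:
        # Find the closest pair of clusters (ties broken by position).
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        merges.append((sorted(merged), d))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerative(points, num_clusters=2)
print(clusters)
print(merges)
```

Stopping at `num_clusters=1` reproduces the termination condition in Step 3, and the list of merges then describes the full hierarchy.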
Agglomerative clustering
k-Means clustering
Elbow method to select k
k-Means++ clustering
• The k-means++ algorithm is mainly used to choose the initial cluster centers for k-means.
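A minimal sketch of the seeding step is given below, following the usual description of k-means++: the first center is picked uniformly at random, and each further center is sampled with probability proportional to its squared distance from the nearest center chosen so far. The function names and the `seed` parameter are illustrative, and the sketch assumes k does not exceed the number of distinct points.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Choose k initial centers by squared-distance-weighted sampling."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        # Points far from every current center are more likely to be picked;
        # already chosen points have weight 0 and cannot be picked again.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
print(kmeans_pp_init(points, k=3))
```

The chosen centers would then be handed to the ordinary k-means iterations as their starting point.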
Agglomerative clustering
Soft partitioning
1. Fuzzy clustering: Each data point can belong to more than one cluster, with a membership value for each. The value is computed using the data point and the corresponding cluster centroid.
2. Rough clustering: Each cluster is assumed to have both a non-overlapping part and an overlapping part. Data points in the non-overlapping part belong exclusively to that cluster, while data points in the overlapping part may belong to multiple clusters.
3. Neural network-based clustering: In this method, varying weights associated with the data points are used to obtain a soft partition.
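As an illustration of how a membership value can be computed from a data point and the cluster centroids, the sketch below uses the standard fuzzy c-means membership formula with fuzzifier m. Whether the slides use exactly this formula is an assumption, and the function name is hypothetical.

```python
import math


def fuzzy_memberships(point, centroids, m=2.0):
    """Membership of one point in every cluster, fuzzy c-means style."""
    dists = [math.dist(point, c) for c in centroids]
    # A point lying exactly on a centroid belongs fully to that cluster.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); the memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


centroids = [(0.0, 0.0), (4.0, 0.0)]
print(fuzzy_memberships((1.0, 0.0), centroids))
print(fuzzy_memberships((2.0, 0.0), centroids))
```

A point closer to one centroid gets a larger membership in that cluster, and a point midway between two centroids is shared equally.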
Soft partitioning (contd.)
1. Simulated annealing: In this case, the current solution is randomly perturbed. If the resulting solution is better than the current solution, it is accepted; otherwise, it is accepted with a probability between 0 and 1.
2. Tabu search: Unlike simulated annealing, multiple solutions are stored, and the current solution is perturbed in various ways to determine the next configuration.
3. Evolutionary algorithms: This method maintains a population of solutions. In addition to the fitness values of individuals, a random search based on the interaction among solutions with mutation is employed to generate the next population.
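The acceptance rule of simulated annealing can be sketched as below. Several choices here are assumptions not stated in the slides: the acceptance probability for a worse solution is taken as exp(-increase / temperature), the cost is the sum of squared distances to cluster centroids, a move relabels one random point, and the cooling schedule is geometric.

```python
import math
import random


def accept(current_cost, new_cost, temperature, rng):
    """Always accept improvements; accept worse moves with some probability."""
    if new_cost <= current_cost:
        return True
    return rng.random() < math.exp(-(new_cost - current_cost) / temperature)


def sse(points, labels, k):
    """Sum of squared distances of points to the centroid of their cluster."""
    total = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            center = [sum(col) / len(members) for col in zip(*members)]
            total += sum(sum((a - b) ** 2 for a, b in zip(p, center)) for p in members)
    return total


def anneal(points, k, steps=2000, t0=10.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]
    cost = sse(points, labels, k)
    best_labels, best_cost = labels[:], cost
    t = t0
    for _ in range(steps):
        candidate = labels[:]
        candidate[rng.randrange(len(points))] = rng.randrange(k)  # random update
        new_cost = sse(points, candidate, k)
        if accept(cost, new_cost, t, rng):
            labels, cost = candidate, new_cost
            if cost < best_cost:
                best_labels, best_cost = labels[:], cost
        t *= cooling  # lower temperature makes worse moves less likely
    return best_labels, best_cost
```

Accepting some worse solutions early on is what lets the search leave poor local optima; as the temperature falls, the method behaves more and more like a plain greedy search.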
Fuzzy c-means clustering
Fuzzy c-means clustering (contd.)
Rough clustering
Rough k-means clustering algorithm
Rough k-means clustering algorithm
Clustering large datasets
• Issues:
  o Number of dataset scans
  o Incremental changes in the dataset
• Solutions:
  o Single dataset scan clustering algorithms
  o Incremental clustering algorithms
  o Abstraction-based clustering
• Examples:
  o PC-Clustering algorithm
  o Leader clustering algorithm
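The leader algorithm is a convenient example of single-scan, incremental clustering. The sketch below follows one common variant, in which each point joins the first leader within a distance threshold and otherwise becomes a new leader; the function name, the use of Euclidean distance, and the sample data are assumptions.

```python
import math


def leader_clustering(points, threshold):
    """Single-scan leader clustering; returns (leaders, assignments)."""
    leaders = []
    assignments = []
    for p in points:
        for idx, leader in enumerate(leaders):
            if math.dist(p, leader) <= threshold:
                assignments.append(idx)  # join the first leader that is close enough
                break
        else:
            leaders.append(p)  # no leader is close enough: p starts a new cluster
            assignments.append(len(leaders) - 1)
    return leaders, assignments


points = [(0, 0), (1, 0), (10, 10), (0, 1), (11, 10)]
print(leader_clustering(points, threshold=2.0))
```

Each point is examined once and only the leaders need to be kept in memory, but the result depends on the order in which the points arrive.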
Divide-and-conquer method
• The divide-and-conquer approach is an effective strategy for clustering large datasets that cannot be stored entirely in main memory.
• To overcome this limitation, a common solution is to process a portion of the dataset at a time and store the relevant cluster representatives in memory.
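A two-level version of this idea can be sketched as follows: each block of the data is clustered on its own, only the resulting representatives are kept, and the representatives are then clustered again. The plain k-means routine with naive initialisation, the function names, and the unweighted second level are all simplifying assumptions made for illustration.

```python
def kmeans(points, k, iters=20):
    """Basic k-means; the first k points serve as initial centers (assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            groups[nearest].append(p)
        # Recompute each center; an empty group keeps its previous center.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers


def divide_and_conquer(points, k, chunk_size):
    """Cluster each block separately, then cluster the block representatives."""
    representatives = []
    for start in range(0, len(points), chunk_size):
        chunk = points[start:start + chunk_size]  # only this block is processed
        representatives.extend(kmeans(chunk, min(k, len(chunk))))
    return kmeans(representatives, k)


data = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10), (1, 1), (11, 11)]
print(divide_and_conquer(data, k=2, chunk_size=4))
```

In a real setting each block would be read from disk in turn, so that only one block and the accumulated representatives occupy main memory at any time.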
Divide-and-conquer method
Divide-and-conquer method
