Data stream classification by incremental semi-supervised fuzzy clustering
G. Casalino, G. Castellano, C. Castiello, A.M. Fanelli, C. Mencar
CVPL2018
gabriella.casalino@uniba.it
CVPL2018 - Vico Equense,
Italy, 30-31 August 2018
Data streams
• Continuous flow of data
  • sensors, online transactions, health monitoring, network traffic, …
• Impractical to store and use all data
• Need for new techniques that:
  • Process a finite number of data items at a time
  • Use a limited amount of memory
  • Predict/classify at any time and within a limited amount of time
  • Take into account the evolution of the data
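The first two requirements above (a finite number of samples at a time, bounded memory) can be illustrated with a small chunking helper. This is a minimal sketch, not code from the presentation: the function name `iter_chunks` and the parameter `chunk_size` are assumptions introduced for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive, non-overlapping chunks from a possibly unbounded stream.

    Only the current chunk is held in memory; earlier chunks can be discarded
    by the caller once they have been processed.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(stream)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return  # stream exhausted
        yield chunk
```

A consumer would loop over `iter_chunks(stream, chunk_size)` and update its model once per chunk, so a prediction can be made from the current model at any time.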
Proposed method
• DISSFCM: Dynamic Incremental Semi-Supervised Fuzzy C-Means
• a method for data stream classification that
  • works in an incremental way
  • dynamically adapts the number of clusters:
    • a fixed number of clusters may not adequately capture the evolving structure of streaming data
  • uses both unlabeled and labeled data (semi-supervised)
  • uses fuzzy logic to describe patterns in data
Proposed method
• Based on a semi-supervised fuzzy clustering algorithm
• Applied to subsequent, non-overlapping chunks of data so as to enable continuous update of the clusters
• SSFCM - Semi-Supervised FCM (Pedrycz and Waletzky, 1997)
[Formula image not recovered; the surviving annotation marks its "Supervised component"]
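For reference, the SSFCM objective function is commonly written in the literature in the form below. This is a reconstruction under assumptions, not the formula as it appeared on the slide: the symbols (c clusters, N samples, fuzzifier m, weight α) are chosen here for illustration.

```latex
J = \sum_{j=1}^{c}\sum_{k=1}^{N} u_{jk}^{m}\, d_{jk}^{2}
  \;+\; \alpha \sum_{j=1}^{c}\sum_{k=1}^{N} \left(u_{jk} - b_{k} f_{jk}\right)^{m} d_{jk}^{2}
```

Here u_{jk} is the membership degree of sample x_k in cluster j, d_{jk} is the distance between x_k and the prototype of cluster j, b_k is 1 if x_k is labeled and 0 otherwise, f_{jk} is 1 if the label of x_k corresponds to cluster j and 0 otherwise, and α ≥ 0 weighs the second term. The first term is the standard (unsupervised) FCM objective; the second term is the supervised component, which penalizes memberships that disagree with the available labels.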
Split
• When the cluster quality deteriorates from one data chunk to another, the number of clusters is increased (by splitting some clusters)
• The cluster quality is evaluated in terms of the reconstruction error (Pedrycz, 2008)
• The cluster with the highest reconstruction error is split into two clusters
• To find the two new prototypes, conditional fuzzy clustering is applied to the data belonging to that cluster
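The choice of the cluster to split can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes the usual fuzzy decoding scheme in which a sample is reconstructed as the membership-weighted mean of the prototypes, and it assumes that each sample's squared error is attributed to its highest-membership cluster. All function names are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def reconstruct(memberships: Sequence[float],
                prototypes: Sequence[Vector],
                m: float = 2.0) -> List[float]:
    """Decode one sample as sum_j u_j^m * v_j / sum_j u_j^m."""
    weights = [u ** m for u in memberships]
    total = sum(weights)
    dim = len(prototypes[0])
    return [sum(w * v[d] for w, v in zip(weights, prototypes)) / total
            for d in range(dim)]


def reconstruction_errors(data: Sequence[Vector],
                          membership_rows: Sequence[Sequence[float]],
                          prototypes: Sequence[Vector],
                          m: float = 2.0) -> Dict[int, float]:
    """Accumulate the squared reconstruction error per cluster.

    Assumption: a sample contributes to the cluster in which it has
    the highest membership degree.
    """
    errors = {j: 0.0 for j in range(len(prototypes))}
    for x, u in zip(data, membership_rows):
        x_hat = reconstruct(u, prototypes, m)
        squared_error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        winner = max(range(len(u)), key=lambda j: u[j])
        errors[winner] += squared_error
    return errors


def cluster_to_split(errors: Dict[int, float]) -> int:
    """Return the index of the cluster with the highest reconstruction error."""
    return max(errors, key=lambda j: errors[j])
```

The sketch covers only the selection step; the trigger (quality deterioration between chunks) and the conditional fuzzy clustering that produces the two new prototypes are not shown.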
Merge
• The two nearest clusters whose prototypes share the same label are merged into one if:
  • the number of clusters exceeds a predefined threshold
  • the number of data items belonging to a cluster is below a predefined threshold
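Selecting the merge candidates can be sketched as below. This is an illustrative sketch under assumptions: prototypes are compared with the Euclidean distance, and the function name `nearest_same_label_pair` is hypothetical. How the merged prototype is computed is not stated in the slides, so it is left out.

```python
import math
from itertools import combinations
from typing import Hashable, Optional, Sequence, Tuple


def nearest_same_label_pair(
    prototypes: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
) -> Optional[Tuple[int, int]]:
    """Return the indices of the two closest prototypes that share a label.

    Returns None when no two prototypes carry the same label.
    Assumption: closeness is measured with the Euclidean distance.
    """
    best_pair: Optional[Tuple[int, int]] = None
    best_distance = math.inf
    for i, j in combinations(range(len(prototypes)), 2):
        if labels[i] != labels[j]:
            continue  # only clusters with the same label may be merged
        distance = math.dist(prototypes[i], prototypes[j])
        if distance < best_distance:
            best_pair, best_distance = (i, j), distance
    return best_pair
```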
DISSFCM
Experimental results
• Optical Recognition of Handwritten Digits dataset
  • 5620 samples, 10 classes
  • Training set: 90%, Test set: 10%
• #Chunk: 5, 10, 15, 20
• %Labeling: 75%
• Splitting tolerance: 25, 50, 100
• Evaluation measure: classification accuracy
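To give a sense of scale, the setup above can be turned into approximate chunk sizes. This is a back-of-the-envelope sketch under assumptions: it reads "#Chunk" as the number of chunks, splits the training set as evenly as possible, and applies the 75% labeling rate per chunk. The slides do not state the actual rounding or chunk composition.

```python
from typing import List

TOTAL_SAMPLES = 5620
TRAIN_PERCENT = 90
LABELED_PERCENT = 75

# Integer arithmetic avoids floating-point rounding surprises.
train_size = TOTAL_SAMPLES * TRAIN_PERCENT // 100
test_size = TOTAL_SAMPLES - train_size


def chunk_sizes(n_samples: int, n_chunks: int) -> List[int]:
    """Split n_samples into n_chunks sizes that differ by at most one."""
    base, extra = divmod(n_samples, n_chunks)
    return [base + 1 if i < extra else base for i in range(n_chunks)]


if __name__ == "__main__":
    print(f"training samples: {train_size}, test samples: {test_size}")
    for n_chunks in (5, 10, 15, 20):
        sizes = chunk_sizes(train_size, n_chunks)
        labeled = sizes[0] * LABELED_PERCENT // 100
        print(f"#Chunk={n_chunks}: about {sizes[0]} samples per chunk, "
              f"about {labeled} of them labeled")
```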
Trend of the reconstruction error
[Figure: #Chunk=20, %Labeling=75%, SplitTol=25]
Accuracy values
[Figure panels: #Chunk=5, #Chunk=10, #Chunk=15, #Chunk=20]
Conclusions
• DISSFCM
  • learns incrementally from data
  • adapts the number of clusters
  • injects a priori knowledge into the process
• Future work:
  • the merge activation conditions
  • the influence of the chunk composition
  • a mechanism to detect outliers, concept drift and the emergence of new classes
http://www.di.uniba.it/~cilab
