Presentation of the CILAB research activity at the CVPL 2018 congress (CVPL, formerly GIRPR: Associazione Italiana per la ricerca in Computer Vision, Pattern recognition e machine Learning).
The document outlines the development of a protein identification algorithm using string matching between field-programmable gate arrays (FPGAs). It describes completing an analysis of the state-of-the-art consumption and successfully enabling communication between FPGAs. Various milestones are shown in a table, including handling the full database and completing result analysis. The document also discusses helpful and harmful factors: for example, the innovative FPGA design requires less outlay than standard instruments, but its applicability is specific to mass spectrometry.
Big Data Quality Panel: Diachron Workshop @EDBT
Paolo Missier
1) Traditional approaches to ensuring data quality such as quality assurance and curation face challenges from big data's volume, velocity, and variety characteristics.
2) It is difficult to determine general thresholds for when data quality issues can be ignored as the importance varies between different analytics algorithms.
3) The ReComp decision support system aims to use metadata about past analytics tasks to determine when knowledge needs to be refreshed due to changes in big data or models.
Satwik Mishra is seeking a position in data science. He has a Master's in Computer Science expected in 2020 from Rochester Institute of Technology with a 4.0 GPA. He received a Bachelor's in Information Technology in 2018 from Manipal Institute of Technology with an 8.29/10 GPA. His skills include Python, Java, C++, Git, MySQL, R, and he has experience with machine learning algorithms like random forest and XGBoost. He has internship experience in data science and published a paper on bagging and boosting algorithms. His projects include lung cancer detection using deep learning and a movie recommendation system using collaborative filtering with MapReduce.
The document describes using linear regression and PHP to optimize media buying by predicting the optimal cost and revenue based on campaign data from Google Analytics, cost, and revenue data from native ad networks. The algorithm uses linear regression to model the relationships between cost and clicks and revenue and clicks. The PHP application loads the data from a CSV file, calculates the correlation coefficients, and uses the linear regression models to predict the optimal cost and revenue for a given number of clicks.
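The regression step described above can be sketched as follows. This is a minimal illustration in Python rather than the PHP of the original document, and it is not the author's code: the function names and the sample figures are hypothetical, and the CSV loading step is replaced by in-memory lists.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical campaign data (not taken from the original document).
clicks = [100, 200, 300, 400]
cost = [12.0, 22.0, 32.0, 42.0]
revenue = [20.0, 40.0, 60.0, 80.0]

cost_model = fit_line(clicks, cost)
revenue_model = fit_line(clicks, revenue)

print("corr(clicks, cost):", pearson_r(clicks, cost))
print("predicted cost at 500 clicks:", predict(cost_model, 500))
print("predicted revenue at 500 clicks:", predict(revenue_model, 500))
```

One model is fitted per target (cost and revenue), each with clicks as the single predictor, which mirrors the two relationships named in the summary.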
Incremental adaptive semi-supervised fuzzy clustering for data stream classif...
Gabriella Casalino
Presentation of the article "Incremental adaptive semi-supervised fuzzy clustering for data stream classification" at the IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS 2018), Rhodes 25-29 May 2018
Joint work with Giovanna Castellano and Corrado Mencar
Dynamic Incremental Semi-Supervised Fuzzy Clustering for Data Stream Classifi...
Gabriella Casalino
International FDP on "Advances in technologies, evolving new dimensions in e-society" organised by the Department of CSE, JIS College of Engineering, West Bengal
https://youtu.be/VXm9jaKj0sg
Qu speaker series 14: Synthetic Data Generation in Finance
QuantUniversity
In this master class, Stefan shows how to create synthetic time-series data using generative adversarial networks (GAN). GANs train a generator and a discriminator network in a competitive setting so that the generator learns to produce samples that the discriminator cannot distinguish from a given class of training data. The goal is to yield a generative model capable of producing synthetic samples representative of this class. While most popular with image data, GANs have also been used to generate synthetic time-series data in the medical domain. Subsequent experiments with financial data explored whether GANs can produce alternative price trajectories useful for ML training or strategy backtests.
This presentation introduces clustering analysis and the k-means clustering technique. It defines clustering as an unsupervised method to segment data into groups with similar traits. The presentation outlines different clustering types (hard vs soft), techniques (partitioning, hierarchical, etc.), and describes the k-means algorithm in detail through multiple steps. It discusses requirements for clustering, provides examples of applications, and reviews advantages and disadvantages of k-means clustering.
The benefits of fine-grained synchronization in deterministic and efficient ...
Vincenzo Gulisano
This talk, given by Vincenzo Gulisano and Yiannis Nikolakopoulos at Yahoo!, discusses some of their latest research results in the field of deterministic and efficient parallelization of data streaming operators. It also presents ScaleGate, the abstract data type at the core of their research, whose Java-based lock-free implementation is available at https://github.com/dcs-chalmers/ScaleGate_Java
1) The document describes a final semester project analyzing agricultural sector data using hybrid algorithms and machine learning techniques.
2) It involves collecting cost and capital logs, applying algorithms like genetic, fuzzy logic, and neural networks to generate mean cost values and predict commodity prices.
3) Validation techniques like internal and external clustering are used to improve the analysis and resulting prediction, which is subject to change with new data but provides an accurate forecast.
Our vision for the selective re-computation of genomics pipelines in reaction to changes to tools and reference datasets.
How do you prioritise patients for re-analysis on a given budget?
This document summarizes Yves Sucaet's presentation on whole slide imaging and digital pathology. It discusses the history of digital pathology, how digital pathology can improve biobanks by allowing remote querying and analysis of virtual slides, and the future of intelligent querying of biobanks using digital pathology and bioinformatics tools. The presentation concludes by encouraging attendees to implement digital pathology workflows and continue the conversation around computational pathology.
The data deluge we are living through today is fostering the development of new, effective and efficient methods for analysing data and extracting knowledge and insights from them. The Big Data paradigm, in particular its Volume and Velocity features, requires us to change our habits for treating data and for extracting information that is useful for discovering patterns and insights. Exploratory data analysis must also reformulate its aims. How many groups are in the data? How to deal with data that do not fit in my PC's memory? How to represent aggregated data or repeated measures on individuals? How do the data correlate? The Symbolic Data Analysis approach tries to reformulate statistical thinking for this case. In this talk, we present some tools for working with aggregated data described by empirical distributions of values.
Using some real cases from different fields (data from sensors, official statistics or data streams) and the HistDAWass R package, we show some recent solutions for unsupervised classification and for feature selection in a subspace clustering context, and how to interpret the results.
Mining System Logs to Learn Error Predictors, Universität Stuttgart, Stuttgar...
Barbara Russo
Predicting system failures can be of great benefit to managers, who gain better command over system performance. The data that systems generate in the form of logs are a valuable source of information for predicting system reliability. As such, there is an increasing demand for tools to mine logs and provide accurate predictions. However, interpreting the information in logs poses some challenges. This talk presents how to effectively mine sequences of logs and provide correct predictions. The approach integrates different machine learning techniques to control for data brittleness, provide accuracy of model selection and validation, and increase robustness of classification results. We apply the proposed approach to log sequences of 25 different applications of a software system for telemetry of cars.
Big data and macroeconomic nowcasting from data access to modelling
Dario Buono
Parallel advances in IT and in the social use of Internet-related applications provide the general public with access to a vast amount of information. The associated Big Data are potentially very useful for a variety of applications, ranging from marketing to reducing fiscal evasion.
From the point of view of official statistics, the main question is whether and to what extent Big Data are a field worth investing in to expand, check and improve the data production process, and which types of partnerships will have to be formed for this purpose. Nowcasting of macroeconomic indicators represents a well-identified field where Big Data has the potential to play a decisive role in the future.
In this paper we present the results and main recommendations from the Eurostat-funded project "Big Data and macroeconomic nowcasting", implemented by GOPA Consultants, which benefits from the cooperation and work of the Eurostat task force on Big Data and a few external academic experts.
K-means clustering method based Data Mining of Network Shared Resources.pptx
SaiPragnaKancheti
K-means clustering is an unsupervised machine learning algorithm that is useful for clustering and categorizing unlabeled data points. It works by assigning data points to a set number of clusters, K, where each data point belongs to the cluster with the nearest mean. The document discusses how k-means clustering can be applied to network shared resources mining to overcome limitations of existing methods. It provides details on how k-means clustering works, compares it to other clustering algorithms, and demonstrates how it can accurately and efficiently cluster network resource data into groups within 0.6 seconds on average.
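The assignment rule described above (each point joins the cluster with the nearest mean, then the means are recomputed) can be sketched as follows. This is a minimal illustration of the standard Lloyd iteration, not the method of the summarized document: the function names, the sample points and the choice of the first K points as initial centroids are assumptions made for the example.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iterations=100):
    """Plain k-means; returns (centroids, labels)."""
    # Assumption: take the first k points as initial centroids for determinism.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to the cluster with the nearest mean.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical two-dimensional data with two well-separated groups.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

In practice the result depends on the initial centroids, so randomized or k-means++ style initialization with several restarts is commonly preferred over the fixed choice used here.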
This document provides an overview of big data analytics applications in electric power distribution systems. Large amounts of both structured and unstructured data are generated daily from various sources like smart meters, weather data, and asset management systems. These data resources can be analyzed using techniques like machine learning and predictive modeling to provide insights for better decision making, predictive analysis, and strategic business objectives. Specifically, the training module will focus on applying big data analytics methods to problems in electric distribution systems, such as load forecasting, anomaly detection, predictive maintenance, and more. The training will cover topics over 4 sessions and cost Rs. 50,000 for up to 30 participants from one company.
The document proposes a distributed monitoring system to manage energy efficiency and quality of service in cloud applications. It addresses the issues that monitoring data becomes too large to analyze centrally due to volume and velocity. The system distributes data collection and analysis across nodes to reduce network usage and improve scalability. A distributed algorithm learns relationships between monitored indicators using Bayesian networks, providing energy efficiency analysis and improvements in a way that grows linearly rather than exponentially with data volume.
This document discusses a hardware acceleration of a protein folding algorithm called ProFAX. ProFAX aims to solve the high computational needs and power consumption of protein folding algorithms by implementing the functions in hardware. Initial results show a speed-up of up to 1.61x compared to a software implementation. A demo is proposed that would allow a user to input a protein sequence on a web app, have an FPGA compute the 3D structure, and return the results.
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE
cscpconf
Traditional medical analysis is based on static data: the medical data are analysed only after the collection of the data sets is complete, which falls far short of actual demand. Large amounts of medical data are generated in real time, so real-time analysis can yield more value. This paper introduces the design of Sentinel, a real-time analysis system based on a clustering algorithm. Sentinel performs clustering analysis of real-time data and issues early alerts.
Big&open data challenges for smartcity-PIC2014 Shanghai
Victoria López
This talk is about how both private enterprise and government wish to improve the value of their data and how they deal with this issue. The talk summarizes the ways we think about Big Data, Open Data and their use by organizations or individuals. Big Data is explained in terms of collection, storage, analysis and valuation. This data is collected from numerous sources including networks of sensors, government data holdings, company market databases, and public profiles on social networking sites. Organizations use many data analysis techniques to study both structured and unstructured data. Due to the volume, velocity and variety of data, some specific techniques have been developed. MapReduce, Hadoop and related tools such as RHadoop are trendy topics nowadays.
In this talk several applications and case studies are presented as examples. Data which come from government sources must be open. Every day more and more cities and countries are opening their data. Open Data is then presented as a specific case of public data with a special role in Smartcity. The main goal of Big and Open Data in Smartcity is to develop systems which can be useful for citizens. In this sense RMap (Mapa de Recursos) is shown as an Open Data application, an open system for Madrid City Council, available for smartphones and developed entirely by the research group G-TeC (www.tecnologiaUCM.es).
Semi-Supervised Fuzzy C-Means for Regression
We propose a method to perform regression on partially labeled data, which is based on SSFCM (Semi-Supervised Fuzzy C-Means), an algorithm for semi-supervised classification based on fuzzy clustering. The proposed method, called SSFCM-R, precedes the application of SSFCM with a relabeling module based on target discretization. After the application of SSFCM, regression is carried out according to one out of two possible schemes: (i) the output corresponds to the label of the closest cluster; (ii) the output is a linear combination of the cluster labels weighted by the membership degree of the input. Some experiments on synthetic data are reported to compare both approaches.
IJCCI 15th International joint Conference on Computational Intelligence, 13-15 November, 2023, Rome, Italy
full paper: https://www.researchgate.net/publication/375671573_Semi-Supervised_Fuzzy_C-Means_for_Regression
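The two output schemes named in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: it skips SSFCM training and the relabeling module entirely, assumes the cluster prototypes and their numeric labels are already available (the values below are hypothetical), and assumes that membership degrees follow the standard fuzzy c-means membership formula with fuzzifier m.

```python
import math


def fcm_memberships(x, prototypes, m=2.0):
    """Standard fuzzy c-means membership of x in each cluster (assumed formula)."""
    dists = [math.dist(x, p) for p in prototypes]
    # If x coincides with one or more prototypes, share full membership among them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]


def predict_closest(x, prototypes, labels):
    """Scheme (i): output the label of the closest cluster."""
    j = min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))
    return labels[j]


def predict_weighted(x, prototypes, labels, m=2.0):
    """Scheme (ii): combine cluster labels weighted by membership degrees."""
    memberships = fcm_memberships(x, prototypes, m)
    return sum(u * label for u, label in zip(memberships, labels))


# Hypothetical prototypes and cluster labels (e.g. discretized target values).
prototypes = [(0.0,), (10.0,)]
cluster_labels = [1.0, 5.0]
x = (2.0,)

print(predict_closest(x, prototypes, cluster_labels))
print(predict_weighted(x, prototypes, cluster_labels))
```

Scheme (i) can only return one of the cluster labels, while scheme (ii) interpolates between them, so its output varies continuously with the input.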
A mHealth solution for contact-less self-monitoring of vital sign parameters
Gabriella Casalino
https://sites.google.com/site/cilabuniba/people/gabriella-casalino
https://www.amity.edu/aset/confluence2021/index.html
Confluence-2021 - 11th International Conference on Cloud Computing, Data Science & Engineering
IEEE sponsored
Similar to Data stream classification by incremental semi-supervised fuzzy clustering
Text mining through Non Negative Matrix Factorizations
Gabriella Casalino
The 2nd International Conference on Machine Learning and Intelligent Systems (MLIS2020)
October 25-28, 2020, Online Conference
References:
G. Casalino, C. Castiello, N. Del Buono, C. Mencar, (2018) A framework for intelligent Twitter data analysis with non-negative matrix factorization, International Journal of Web Information Systems, Vol. 14 Issue: 3, pp.334-356, https://doi.org/10.1108/IJWIS-11-2017-0081
Casalino G., Castiello C., Del Buono N., Mencar C. (2017) Intelligent Twitter Data Analysis Based on Nonnegative Matrix Factorizations. In: Gervasi O. et al. (eds) Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10404, pages 188-202. Springer
G. Casalino, N. Del Buono, C. Mencar, (2016), Non Negative Matrix Factorisations for Intelligent Data Analysis, in G.R. Naik (ed.), Nonnegative Matrix Factorization Techniques, Signals and Communication Technology, ISBN: 978-3-662-48330-5, http://dx.doi.org/10.1007/978-3-662-48331-2_2.
Dynamic Incremental Semi-supervised Fuzzy Clustering for Bipolar Disorder Epi...
Gabriella Casalino
23rd International Conference on Discovery Science
Full paper: https://www.researchgate.net/publication/343254555_Dynamic_Incremental_Semi-Supervised_Fuzzy_Clustering_for_Bipolar_Disorder_Episode_Prediction
A mHealth solution for contact-less self-monitoring of vital signs parameters
Gabriella Casalino
This document describes a contactless mHealth solution for self-monitoring vital signs using a webcam. The solution extracts photoplethysmography signals from video of a person's face to estimate blood oxygen saturation levels. It uses face detection, tracking of regions of interest, and signal processing techniques. The estimated vital signs are then used in a fuzzy inference system to predict cardiovascular risk levels. The goal is to provide an affordable, easy to use method for remote patient monitoring.
The use of an Explainable Artificial Intelligence Tool for Decision-making Su...
Gabriella Casalino
A joint work of Jose Maria Alonso (Universidade de Santiago de Compostela, Spain) and Gabriella Casalino (University of Bari Aldo Moro, Italy)
Presented at HELMeTO 2019 - International Workshop on Higher Education Learning Methodologies and Technologies Online, June 6-7, 2019, Novedrate (CO), Italy
full text: https://link.springer.com/chapter/10.1007/978-3-030-31284-8_10
Non-negative factorization methods for extracting semantically relevant featu...
Gabriella Casalino
This document discusses non-negative matrix factorization methods for dimensionality reduction and feature extraction in intelligent data analysis. It begins with an outline of non-negative matrix factorization background and applications. Next, it describes how non-negative matrix factorization can be used for tasks like document clustering by discovering semantic features in text while preserving data non-negativity. Finally, it proposes using subtractive clustering to provide the initial matrices for non-negative matrix factorization, which helps guide the number of clusters.
Gabriella Casalino, Nicoletta Del Buono, Corrado Mencar (2014) Part-Based Data Analysis with Masked Non-negative Matrix Factorization, 440-454. In Computational Science and Its Applications ¨C ICCSA 2014 SE - 33.
The 14th International Conference on Computational Science and Its Applications (ICCSA 2014), June 30 - July 03 2014, Guimar?es Portugal.
Gabriella Casalino, Ciro Castiello, Nicoletta Del Buono et al. (2012) Fattorizzazioni matriciali non negative per l'analisi dei dati nell'Educational Data Mining. In DIDAMATICA 2012.
DIDAMATICA 2012, informatica per la didattica, Taranto, 14-16 Maggio 2012
Gabriella Casalino, Nicoletta Del Buono, Corrado Mencar (2011) Subtractive Initialization of Nonnegative Matrix Factorizations for Document Clustering, 188-195. In Fuzzy Logic and Applications (WILF 2011).
The 9th International Workshop on Fuzzy Logic and Applications, August 29-31 2011, Trani
Data stream classification by incremental semi-supervised fuzzy clustering
1. Data stream classification by incremental semi-supervised fuzzy clustering
G. Casalino, G. Castellano, C. Castiello, A. M. Fanelli, C. Mencar
CVPL2018
gabriella.casalino@uniba.it
2. CVPL2018 - Vico Equense, Italy, 30-31 August 2018
gabriella.casalino@uniba.it
Data stream classification by incremental semi-supervised fuzzy clustering
Data streams
• Continuous flow of data
  • sensors, online transactions, health monitoring, network traffic, …
• Impractical to store and use all data
• Need for new techniques that:
  • Process a finite number of data points at a time
  • Use a limited amount of memory
  • Predict/classify at any time and in a limited amount of time
  • Take into account the evolution of data
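The requirement to process a finite number of data points at a time with bounded memory can be illustrated with a small chunking helper. This is a minimal sketch, not the authors' code: the name `iter_chunks` and the `chunk_size` parameter are hypothetical, introduced only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(stream: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive, non-overlapping chunks from a possibly unbounded stream.

    Only one chunk is held in memory at a time.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    it = iter(stream)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return  # stream exhausted
        yield chunk


if __name__ == "__main__":
    for chunk in iter_chunks(range(7), 3):
        print(chunk)
```

Because the helper consumes a plain iterator, the same loop works for a file, a socket reader, or a sensor feed, and a model can be updated after each chunk is yielded.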
3. Proposed method
• DISSFCM: Dynamic Incremental Semi-Supervised Fuzzy C-Means
  • a method for data stream classification that
    • works in an incremental way
    • dynamically adapts the number of clusters:
      • a fixed number of clusters may not adequately capture the evolving structure of streaming data
    • uses both unlabeled and labeled data (semi-supervised)
    • uses fuzzy logic to describe patterns in data
4. Proposed method
• Based on a semi-supervised fuzzy clustering algorithm
• Applied to subsequent, non-overlapping chunks of data so as to enable continuous update of clusters
• SSFCM - Semi-Supervised FCM (Pedrycz and Waletzky, 1997)
[Formula not preserved in the transcript; its annotation reads: Supervised component]
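Since the slide's formula is not preserved, the following sketch relies on an assumed form of the semi-supervised FCM objective with fuzzifier m = 2: J = sum_k sum_j u_jk^2 d_jk^2 + alpha * sum_k sum_j (u_jk - b_j f_jk)^2 d_jk^2, where the second term plays the role of the supervised component. Minimizing it under the constraint that each sample's memberships sum to 1 gives the membership update coded below. The symbols `alpha`, `b` and `f`, and the function name, are assumptions for illustration and may differ from the authors' notation.

```python
from typing import List, Sequence


def ssfcm_memberships(
    x: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    f: Sequence[float],
    b: int,
    alpha: float,
) -> List[float]:
    """Membership degrees of one sample in each cluster (fuzzifier m = 2).

    f: known membership vector of the sample, one entry per cluster
       (all zeros when the sample is unlabeled).
    b: 1 if the sample is labeled, 0 otherwise.
    alpha: weight of the supervised term in the assumed objective.
    """
    # Squared Euclidean distance from the sample to every prototype.
    d2 = [sum((xi - vi) ** 2 for xi, vi in zip(x, v)) for v in prototypes]

    # Simplification: a sample lying exactly on a prototype belongs fully to it.
    for k, dk in enumerate(d2):
        if dk == 0.0:
            return [1.0 if j == k else 0.0 for j in range(len(prototypes))]

    inv_sum = sum(1.0 / dk for dk in d2)
    # Membership mass shared out by distance, as in plain FCM.
    free_mass = 1.0 + alpha * (1.0 - b * sum(f))
    return [
        (free_mass / (dk * inv_sum) + alpha * b * fk) / (1.0 + alpha)
        for dk, fk in zip(d2, f)
    ]


if __name__ == "__main__":
    protos = [[-1.0], [1.0]]
    print(ssfcm_memberships([0.0], protos, [0.0, 0.0], 0, 1.0))  # unlabeled
    print(ssfcm_memberships([0.0], protos, [1.0, 0.0], 1, 1.0))  # labeled
```

With `b = 0` the update reduces to the standard FCM membership formula, so unlabeled samples are clustered purely by distance while labeled ones are pulled toward their known cluster.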
5. Split
• When the cluster quality deteriorates from one data chunk to the next, the number of clusters is increased (by splitting some clusters)
• The cluster quality is evaluated in terms of the reconstruction error (Pedrycz, 2008)
• The cluster with the highest reconstruction error is split into two clusters
• To find the two new prototypes, conditional fuzzy clustering is applied to the data belonging to that cluster
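One way to pick the cluster to split is sketched below. It assumes the usual reconstruction of a sample from its memberships, x_hat = sum_k u_k^m v_k / sum_k u_k^m, and it assumes that each sample's squared error is charged to the cluster in which the sample has its highest membership. Both choices, and all function names, are illustrative assumptions; the slides do not spell out the exact definition.

```python
from typing import List, Sequence

Vector = Sequence[float]


def reconstruct(u: Sequence[float], prototypes: Sequence[Vector], m: float = 2.0) -> List[float]:
    """Rebuild a sample from its memberships and the cluster prototypes."""
    weights = [uk ** m for uk in u]
    total = sum(weights)
    dim = len(prototypes[0])
    return [
        sum(w * v[i] for w, v in zip(weights, prototypes)) / total
        for i in range(dim)
    ]


def cluster_reconstruction_errors(
    data: Sequence[Vector],
    memberships: Sequence[Sequence[float]],
    prototypes: Sequence[Vector],
    m: float = 2.0,
) -> List[float]:
    """Accumulate each sample's squared reconstruction error in its dominant cluster."""
    errors = [0.0] * len(prototypes)
    for x, u in zip(data, memberships):
        x_hat = reconstruct(u, prototypes, m)
        owner = max(range(len(u)), key=lambda k: u[k])  # highest membership
        errors[owner] += sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))
    return errors


def cluster_to_split(errors: Sequence[float]) -> int:
    """Index of the cluster with the highest reconstruction error."""
    return max(range(len(errors)), key=lambda k: errors[k])


if __name__ == "__main__":
    protos = [[0.0], [10.0]]
    data = [[0.0], [10.0], [4.0]]
    u = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.4]]
    errs = cluster_reconstruction_errors(data, u, protos)
    print(errs, cluster_to_split(errs))
```

The sketch covers only the selection step; deciding when quality has deteriorated enough to trigger a split, and the conditional clustering that produces the two new prototypes, are left out.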
6. Merge
• The two nearest clusters whose prototypes share the same label are merged into one if:
  • the number of clusters exceeds a predefined threshold
  • the number of data points belonging to a cluster is below a predefined threshold
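The search for merge candidates can be sketched as follows, assuming Euclidean distance between prototypes and one class label per prototype. The function name is hypothetical. The activation conditions are deliberately not coded, because the slide does not state how the two thresholds combine, and the rule for computing the merged prototype is not given either.

```python
from itertools import combinations
from typing import Hashable, Optional, Sequence, Tuple


def nearest_same_label_pair(
    prototypes: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
) -> Optional[Tuple[int, int]]:
    """Return the indices of the two closest prototypes that share a label.

    Returns None when no two prototypes carry the same label.
    """
    best_pair: Optional[Tuple[int, int]] = None
    best_dist = float("inf")
    for i, j in combinations(range(len(prototypes)), 2):
        if labels[i] != labels[j]:
            continue  # only clusters of the same class may be merged
        # Squared distance is enough for ranking pairs.
        dist = sum((a - b) ** 2 for a, b in zip(prototypes[i], prototypes[j]))
        if dist < best_dist:
            best_pair, best_dist = (i, j), dist
    return best_pair


if __name__ == "__main__":
    protos = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [5.0, 5.0]]
    print(nearest_same_label_pair(protos, [0, 1, 0, 1]))
```

Restricting the search to same-label pairs keeps a merge from mixing clusters that represent different classes.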
7. DISSFCM
8. Experimental results
• Optical Recognition of Handwritten Digits dataset
  • 5620 samples, 10 classes
• Training set: 90%, Test set: 10%
• #Chunk: 5, 10, 15, 20
• %Labeling: 75%
• Splitting tolerance: 25, 50, 100
• Evaluation measure: classification accuracy
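The size of this setup can be worked out from the listed numbers. The sketch below assumes that the chunk counts and splitting tolerances are fully crossed and that chunks are of nearly equal size; neither assumption is stated on the slide, so the chunk sizes are approximate.

```python
from itertools import product

N_SAMPLES = 5620                  # from the slide
TRAIN_FRACTION = 0.90             # from the slide
N_CHUNKS = (5, 10, 15, 20)        # from the slide
SPLIT_TOLERANCES = (25, 50, 100)  # from the slide

n_train = round(N_SAMPLES * TRAIN_FRACTION)
n_test = N_SAMPLES - n_train

# Assumption: every chunk count is combined with every splitting tolerance.
configurations = list(product(N_CHUNKS, SPLIT_TOLERANCES))

# Assumption: chunks are nearly equal in size (integer division, remainder ignored).
approx_chunk_size = {n: n_train // n for n in N_CHUNKS}

if __name__ == "__main__":
    print(n_train, n_test, len(configurations))
    print(approx_chunk_size)
```

Under these assumptions there are 5058 training and 562 test samples, 12 configurations, and chunks ranging from roughly 1011 samples (5 chunks) down to roughly 252 samples (20 chunks).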
9. Trend of the reconstruction error
[Plot not preserved; setting: #Chunk=20, %Labeling=75%, SplitTol=25]
10. Accuracy values
[Plots not preserved; panels: #Chunk=5, #Chunk=10, #Chunk=15, #Chunk=20]
11. Conclusions
• DISSFCM
  • learns incrementally from data
  • adapts the number of clusters
  • injects a priori knowledge into the process
• Future work:
  • the merge activation conditions
  • the influence of the chunk composition
  • a mechanism to detect outliers, concept drift and the emergence of new classes
12. http://www.di.uniba.it/~cilab