SlideShare feed for Slideshows by User: abifet

Artificial intelligence and data stream mining
Big Data and Artificial Intelligence have the potential to fundamentally shift the way we interact with our surroundings. Deriving insights from data streams has been recognized as one of the most exciting and important opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of artificial intelligence research as the number of applications requiring such processing grows. Dealing with the evolution of such data streams over time, i.e., with concepts that drift or change completely, is one of the core issues in stream mining. In this talk, I will present an overview of data stream mining, its industrial applications, open source tools, and the current challenges of the field.
Posted: Fri, 16 Mar 2018 20:59:17 GMT

Artificial intelligence and data stream mining from Albert Bifet
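The abstract names concept drift as a core issue in stream mining. As a rough illustration only, a stream learner can monitor its own error rate and flag a possible change when recent errors rise well above older ones. The Python sketch below is a hypothetical two-window heuristic: the class name, window size, and threshold are assumptions chosen for the example, and it is not one of the drift detectors shipped with MOA.

```python
from collections import deque


class WindowDriftMonitor:
    """Hypothetical two-window drift heuristic (illustration only, not a MOA detector).

    The "recent" window holds the last `window_size` 0/1 errors and the
    "reference" window holds the `window_size` errors just before them.
    A drift is suspected when the recent error rate exceeds the reference
    error rate by more than `threshold`.
    """

    def __init__(self, window_size: int = 100, threshold: float = 0.2) -> None:
        self.window_size = window_size
        self.threshold = threshold
        self.reference: deque = deque(maxlen=window_size)
        self.recent: deque = deque(maxlen=window_size)

    def update(self, error: int) -> bool:
        """Record one prediction error (1 = wrong, 0 = right); return True on suspected drift."""
        if len(self.recent) == self.window_size:
            # The oldest recent error is about to be evicted: move it to the reference window.
            self.reference.append(self.recent[0])
        self.recent.append(error)
        if len(self.reference) < self.window_size:
            return False  # not enough history to compare yet
        # O(window_size) per update for clarity; a real stream method would keep running sums.
        ref_rate = sum(self.reference) / self.window_size
        recent_rate = sum(self.recent) / self.window_size
        return recent_rate - ref_rate > self.threshold


monitor = WindowDriftMonitor(window_size=100, threshold=0.2)
errors = [0] * 200 + [1] * 100  # a model that suddenly starts failing after 200 instances
flags = [monitor.update(e) for e in errors]
print(flags.index(True))  # 220
```

The fixed windows keep memory bounded, in the spirit of stream mining, but the window size trades detection delay against false alarms; adaptive-window methods exist precisely to avoid fixing that size by hand.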

MOA for the IoT at ACML 2016
Big Data and the Internet of Things (IoT) have the potential to fundamentally shift the way we interact with our surroundings. Deriving insights from the IoT has been recognized as one of the most exciting and important opportunities for both academia and industry. Advanced analysis of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing grows. Dealing with the evolution of such data streams over time, i.e., with concepts that drift or change completely, is one of the core issues in stream mining. In this talk, I will present an overview of data stream mining and introduce some popular open source tools for mining data streams.
Posted: Fri, 16 Mar 2018 20:55:14 GMT

MOA for the IoT at ACML 2016 from Albert Bifet

Mining Big Data Streams with APACHE SAMOA
In this talk, we present Apache SAMOA, an open-source platform for mining big data streams with Apache Flink, Storm, and Samza. Real-time analytics is becoming the fastest and most efficient way to obtain useful knowledge about what is happening now, allowing organizations to react quickly when problems appear or to detect new trends that help improve their performance. Apache SAMOA includes algorithms for the most common machine learning tasks, such as classification and clustering. Its pluggable architecture allows it to run not only on Apache Flink but also on several other distributed stream processing engines, such as Storm and Samza.
Posted: Fri, 16 Mar 2018 20:48:12 GMT

Mining Big Data Streams with APACHE SAMOA from Albert Bifet

Efficient Online Evaluation of Big Data Stream Classifiers
Evaluating classifiers on data streams is essential so that poorly performing models can be identified and either improved or replaced by better-performing ones. This task is increasingly relevant as stream data is generated by more sources, in real time and in large quantities, and is now considered the largest source of big data. Both researchers and practitioners need to be able to evaluate the performance of the methods they employ effectively. However, evaluation on a stream poses major challenges. Instances arriving in a data stream are usually time-dependent, and the underlying concept they represent may evolve over time. Furthermore, the massive quantity of data tends to exacerbate issues such as class imbalance. Current frameworks for evaluating streaming and online algorithms can give predictions in real time, but because they use a prequential setting, they build only one model and thus cannot compute the statistical significance of results in real time. In this paper, we propose a new evaluation methodology for big data streams. This methodology addresses imbalanced data streams, data where change occurs on different time scales, and the question of how to split the data between training and testing across multiple models.
Posted: Fri, 11 Mar 2016 08:20:34 GMT

Efficient Online Evaluation of Big Data Stream Classifiers from Albert Bifet
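The prequential (test-then-train) setting the abstract refers to can be sketched in a few lines: each arriving instance is first used to test the current model and only then to train it, so a single model is evaluated on the whole stream. This is a minimal Python sketch, assuming a hypothetical majority-class learner with `predict`/`learn` methods as a stand-in for a real stream classifier. It illustrates the single-model baseline the abstract describes, not the new multi-model methodology the paper proposes.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple


class MajorityClass:
    """Hypothetical incremental learner: predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self.counts: Dict[Hashable, int] = {}

    def predict(self, x: object) -> Optional[Hashable]:
        if not self.counts:
            return None  # nothing learned yet
        return max(self.counts, key=self.counts.get)

    def learn(self, x: object, y: Hashable) -> None:
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential_accuracy(model, stream: Iterable[Tuple[object, Hashable]]) -> float:
    """Test-then-train over a stream with a single model."""
    correct = 0
    seen = 0
    for x, y in stream:
        if model.predict(x) == y:  # 1) test on the instance first
            correct += 1
        model.learn(x, y)          # 2) then train on the same instance
        seen += 1
    return correct / seen if seen else 0.0


stream = [(None, "a"), (None, "a"), (None, "b"), (None, "a")]
print(prequential_accuracy(MajorityClass(), stream))  # 0.5
```

Because only one model sees each instance, the loop yields a single accuracy figure per stream, which is why this setting alone cannot provide the statistical significance estimates the abstract mentions.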

Apache Samoa: Mining Big Data Streams with Apache Flink
Flink Forward 2015
Posted: Thu, 17 Dec 2015 23:14:26 GMT

Apache Samoa: Mining Big Data Streams with Apache Flink from Albert Bifet

Introduction to Big Data Science
Data Science, Data Mining
Posted: Thu, 17 Dec 2015 23:08:11 GMT

Introduction to Big Data Science from Albert Bifet

Introduction to Big Data
Hadoop ecosystem, Big Data
Posted: Thu, 17 Dec 2015 23:06:29 GMT

Introduction to Big Data from Albert Bifet

Internet of Things Data Science
Big Data Stream Mining
Posted: Thu, 17 Dec 2015 23:01:50 GMT

Internet of Things Data Science from Albert Bifet

Real Time Big Data Management
Storm, Samza, Flink
Posted: Thu, 17 Dec 2015 23:01:48 GMT

Real Time Big Data Management from Albert Bifet

A Short Course in Data Stream Mining
Slides for a short course in data stream mining, covering classification, adaptive methods, clustering, and frequent pattern methods.
Posted: Sun, 22 Mar 2015 20:47:07 GMT

A Short Course in Data Stream Mining from Albert Bifet

Real-Time Big Data Stream Analytics
Big Data is a new term used in Business Analytics to identify datasets that cannot be managed with current methodologies or data mining software tools because of their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data. In this talk, we will focus on advanced techniques for real-time Big Data mining based on evolving data stream methods, which use small amounts of time and memory and are able to adapt to changes. We will discuss a social network application of data stream mining to compute user influence probabilities. Finally, we will present the MOA software framework, with its classification, regression, and frequent pattern methods, and SAMOA, distributed streaming software that runs on top of Storm, Samza, and S4.
Posted: Thu, 18 Dec 2014 08:03:13 GMT

Real-Time Big Data Stream Analytics from Albert Bifet

Multi-label Classification with Meta-labels
The area of multi-label classification has developed rapidly in recent years. It has become widely known that the baseline binary relevance approach suffers from class imbalance and a restricted hypothesis space, which negatively affect its predictive performance, and that it can easily be outperformed by methods that learn labels together. A number of methods have grown around the label powerset approach, which models label combinations together as class values in a multi-class problem. We describe the label-powerset-based solutions under a general framework of meta-labels. We provide a theoretical justification for this framework, which has so far been lacking, by viewing meta-labels as a hidden layer in an artificial neural network. We explain how meta-labels essentially allow a random projection into a space where non-linearities can easily be tackled with established linear learning algorithms. The proposed framework enables comparison and combination of related approaches to different multi-label problems. We also present a novel model in the framework and evaluate it empirically against several high-performing methods, with respect to predictive performance and scalability, on a number of datasets and evaluation metrics. Our ensemble of meta-label classifiers obtains competitive accuracy for a fraction of the computation required by current meta-label methods for multi-label classification.
Posted: Thu, 18 Dec 2014 07:58:55 GMT

Multi-label Classification with Meta-labels from Albert Bifet

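To make the two baseline transformations named in the abstract concrete, here is a minimal Python sketch; it is an illustration only, not code from the talk or paper, and the meta-label method itself is not shown. Binary relevance (BR) turns an N x L 0/1 label matrix into L independent binary targets, while label powerset (LP) maps each distinct label combination to one class of a single multi-class problem. The function names `binary_relevance_transform` and `label_powerset_fit_transform` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

LabelRow = Tuple[int, ...]


def binary_relevance_transform(Y: Sequence[Sequence[int]]) -> List[List[int]]:
    """Binary relevance (BR): one independent 0/1 target column per label."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]


def label_powerset_fit_transform(
    Y: Sequence[Sequence[int]],
) -> Tuple[List[int], Dict[int, LabelRow]]:
    """Label powerset (LP): each distinct label combination becomes one class.

    Returns the multi-class target and a class-id -> label-vector decoder.
    """
    combo_to_class: Dict[LabelRow, int] = {}
    y_multiclass: List[int] = []
    for row in Y:
        key = tuple(row)
        if key not in combo_to_class:
            combo_to_class[key] = len(combo_to_class)  # assign the next class id
        y_multiclass.append(combo_to_class[key])
    decoder = {c: combo for combo, c in combo_to_class.items()}
    return y_multiclass, decoder


if __name__ == "__main__":
    # Toy data: 4 examples, 3 labels.
    Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]
    print(binary_relevance_transform(Y))  # [[1, 0, 1, 1], [0, 1, 0, 1], [1, 0, 1, 0]]
    y, decoder = label_powerset_fit_transform(Y)
    print(y)           # [0, 1, 0, 2]
    print(decoder[2])  # (1, 1, 0)
```

Any multi-class learner can then be trained on `y` and its predictions decoded back into label vectors with `decoder`. LP can only predict label combinations seen during training, and its number of classes can grow up to min(N, 2^L); BR avoids that blow-up but models each label independently.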

Pitfalls in benchmarking data stream classification and how to avoid them
Wed, 02 Oct 2013 06:46:48 GMT /slideshow/pitfalls-in-benchmarking-data-stream-classification-and-how-to-avoid-them/26781777

Pitfalls in benchmarking data stream classification and how to avoid them from Albert Bifet

STRIP: stream learning of influence probabilities.
Influence-driven diffusion of information is a fundamental process in social networks. Learning the latent variables of such a process, i.e., the influence strength along each link, is a central question for understanding the structure and function of complex networks, modeling information cascades, and developing applications such as viral marketing. Motivated by modern microblogging platforms such as Twitter, in this paper we study the problem of learning influence probabilities in a data-stream scenario, in which the network topology is relatively stable and the challenge for a learning algorithm is to keep up with a continuous stream of tweets using a small amount of time and memory. Our contribution is a number of randomized approximation algorithms, categorized according to the available space (superlinear, linear, and sublinear in the number of nodes n) and according to different models (landmark and sliding window). Among several results, we show that we can learn influence probabilities with one pass over the data, using O(n log n) space, in both the landmark model and the sliding-window model, and we further show that our algorithm is within a logarithmic factor of optimal. For truly large graphs, when one needs to operate with sublinear space, we show that we can still learn influence probabilities in one pass, assuming that we restrict our attention to the most active users. Our thorough experimental evaluation on large social graphs demonstrates that the empirical performance of our algorithms agrees with that predicted by the theory.
Wed, 02 Oct 2013 06:43:09 GMT /slideshow/strip-26781683/26781683

STRIP: stream learning of influence probabilities. from Albert Bifet

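For orientation, the quantity being learned can be illustrated with an exact, one-pass counting baseline in Python. This is a sketch under stated assumptions, not the STRIP algorithm: it estimates the influence probability of an edge (u, v) as the fraction of u's actions (e.g. tweets) that v later repeats, keeps exact counts, and needs memory proportional to the number of edges plus distinct (user, action) pairs, which is the cost the randomized landmark and sliding-window algorithms described above are designed to avoid. The function `estimate_influence` and the (user, action_id) event format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

User = Hashable
Edge = Tuple[User, User]


def estimate_influence(
    edges: Iterable[Edge], events: Iterable[Tuple[User, Hashable]]
) -> Dict[Edge, float]:
    """Exact one-pass baseline (not STRIP): p(u, v) = propagations u->v / actions by u.

    edges:  directed pairs (u, v) meaning u can influence v (e.g. v follows u).
    events: (user, action_id) pairs in time order (hypothetical format).
    """
    edges = list(edges)
    in_nbrs: Dict[User, Set[User]] = defaultdict(set)
    for u, v in edges:
        in_nbrs[v].add(u)

    performed: Dict[Hashable, Set[User]] = defaultdict(set)  # action -> users who did it so far
    actions_by: Dict[User, int] = defaultdict(int)           # user -> distinct actions performed
    propagated: Dict[Edge, int] = defaultdict(int)           # (u, v) -> actions that went u -> v

    for user, action in events:
        if user in performed[action]:
            continue  # count each (user, action) pair only once
        # Credit every in-neighbour that already performed this action.
        for u in in_nbrs[user] & performed[action]:
            propagated[(u, user)] += 1
        performed[action].add(user)
        actions_by[user] += 1

    return {
        (u, v): propagated[(u, v)] / actions_by[u]
        for u, v in edges
        if actions_by[u] > 0
    }


if __name__ == "__main__":
    edges = [("a", "b"), ("a", "c")]
    events = [("a", 1), ("b", 1), ("a", 2), ("c", 2), ("b", 2)]
    print(estimate_influence(edges, events))  # {('a', 'b'): 1.0, ('a', 'c'): 0.5}
```

Here b repeats both of a's actions and c repeats one of two, giving 1.0 and 0.5. A sliding-window variant would additionally have to expire old actions, which this sketch does not attempt.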

Efficient Data Stream Classification via Probabilistic Adaptive Windows
Wed, 02 Oct 2013 06:36:25 GMT /slideshow/slides-26781490/26781490

Efficient Data Stream Classification via Probabilistic Adaptive Windows from Albert Bifet

Mining Big Data in Real Time
Big Data is a new term used to identify datasets that we cannot manage with current methodologies or data mining software tools due to their large size and complexity. Big Data mining is the capability of extracting useful information from these large datasets or streams of data. New mining techniques are necessary due to the volume, variability, and velocity of such data.
Sat, 06 Jul 2013 12:10:09 GMT /slideshow/mining-bigdata/23974916

Mining Big Data in Real Time from Albert Bifet

Mining Big Data in Real Time
Streaming data analysis in real time is becoming the fastest and most efficient way to obtain useful knowledge from what is happening now, allowing organizations to react quickly when problems appear or to detect new trends that help improve their performance. Evolving data streams are contributing to the growth of data created over the last few years: we now create as much data every two days as we created from the dawn of time up until 2003. Methods for evolving data streams are becoming a low-cost, green methodology for real-time online prediction and analysis. We discuss the current and future trends of mining evolving data streams, and the challenges that the field will have to overcome in the coming years.
Thu, 06 Dec 2012 13:26:21 GMT /slideshow/mining-big-data-in-real-time/15522780

Mining Big Data in Real Time from Albert Bifet

Mining Frequent Closed Graphs on Evolving Data Streams
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real time. Data stream mining faces hard constraints on processing time and space, and must also detect concept drift. In this talk we present a framework for studying graph pattern mining on time-varying streams and large datasets.
Thu, 25 Aug 2011 12:44:05 GMT /slideshow/mining-frequent-closed-graphs-on-evolving-data-streams/9009102

Mining Frequent Closed Graphs on Evolving Data Streams from Albert Bifet

PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
Presenters: Indrė Žliobaitė, João Gama, Albert Bifet, and Mykola Pechenizkiy.
Tue, 31 May 2011 02:24:51 GMT /slideshow/pakdd-2011-tutorial-handling-concept-drift-importance-challenges-and-solutions/8158115

PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions from Albert Bifet

Sentiment Knowledge Discovery in Twitter Streaming Data
This talk shows how to use the new Twitter Streaming API to discover knowledge with data mining methods for evolving data streams.
Mon, 01 Nov 2010 16:50:43 GMT /slideshow/ds10/5637548

Sentiment Knowledge Discovery in Twitter Streaming Data from Albert Bifet

Albert Bifet is a Big Data Scientist working on data stream mining. albertbifet.com/