際際滷shows by User: ds_mi (Data Science Milan)
SlideShare feed, last updated Thu, 16 Feb 2023 11:35:50 GMT

ML & Graph algorithms to prevent financial crime in digital payments
Published: Thu, 16 Feb 2023 11:35:50 GMT
A Data Science Milan talk on using machine learning and graph algorithms to prevent financial crime in digital payments.

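Only the title is syndicated for this talk, but as a rough illustration of the kind of technique it names, here is a minimal graph-analysis sketch with networkx; the transfer data, the amount threshold, and the "cycles are suspicious" rule are all toy assumptions, not the speakers' method.

```python
# Toy transaction graph: flag tightly connected groups of accounts that move
# money in a cycle, a structure often screened for in financial-crime detection.
import networkx as nx

transfers = [("a", "b", 900), ("b", "c", 850), ("c", "a", 800),  # a cycle
             ("d", "e", 120)]                                    # ordinary payment

G = nx.DiGraph()
for src, dst, amount in transfers:
    G.add_edge(src, dst, amount=amount)

# Keep only high-value edges, then look for simple cycles of length >= 3.
high_value = G.edge_subgraph([(u, v) for u, v, d in G.edges(data=True)
                              if d["amount"] > 500])
suspicious = [cycle for cycle in nx.simple_cycles(high_value) if len(cycle) >= 3]
print(suspicious)  # e.g. [['a', 'b', 'c']]
```

In a production system, graph features such as cycle membership, centrality, or community labels would typically feed a supervised ML model rather than act as rules on their own.
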
How to use the Economic Complexity Index to guide innovation plans
Published: Sun, 30 Oct 2022 21:48:05 GMT
In this talk Mauro Pelucchi will present the Economic Complexity Index (ECI) and the Product Complexity Index (PCI), two network measures that provide unique insights into economic development patterns. We will show how to compute these metrics and explore the network theory behind them (Hidalgo and Hausmann, 2009). The measures are also related to various dimensionality-reduction methods and can be used to determine distances between nodes based on their similarity. Finally, we will see how to interpret these metrics to compare countries, markets, and products, and to guide innovation plans in a data-driven context.

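As a companion to the abstract, here is a minimal sketch of the eigenvector formulation of the ECI (equivalent to Hidalgo and Hausmann's method of reflections), assuming a binary country-product matrix; the toy matrix is an illustration, not real trade data, and the PCI follows symmetrically by transposing M.

```python
# Economic Complexity Index via the eigenvector formulation.
import numpy as np

def eci(M):
    k_c = M.sum(axis=1)   # diversity: number of products per country
    k_p = M.sum(axis=0)   # ubiquity: number of countries per product
    # Country-to-country matrix: step to a product, then back to a country.
    M_cc = (M / k_c[:, None]) @ (M.T / k_p[:, None])
    vals, vecs = np.linalg.eig(M_cc)
    v = vecs[:, np.argsort(-vals.real)[1]].real  # 2nd-largest eigenvalue's vector
    v = (v - v.mean()) / v.std()                 # standardize, per convention
    if np.corrcoef(v, k_c)[0, 1] < 0:            # fix the arbitrary sign
        v = -v
    return v

# Toy binary country-product matrix (3 countries x 4 products).
M = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 1]], dtype=float)
print(eci(M))
```
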
Robustness Metrics for ML Models based on Deep Learning Methods
Published: Tue, 02 Aug 2022 05:35:09 GMT
A talk by Davide Posillipo (Alkemy) on robustness metrics for machine learning models based on deep learning methods, presented at the Data Science Milan meetup with the AI Guild.

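The feed carries only the title and credits, but a common family of robustness metrics measures how accuracy degrades as inputs are perturbed. A minimal, model-agnostic sketch, where the classifier and data are assumptions (any object with a .predict(X) method would do):

```python
# Robustness as accuracy under Gaussian input noise of increasing strength;
# a flatter curve indicates a more robust model.
import numpy as np

def accuracy(model, X, y):
    return float(np.mean(model.predict(X) == y))

def robustness_curve(model, X, y, noise_levels=(0.0, 0.05, 0.1, 0.2)):
    rng = np.random.default_rng(0)
    return {s: accuracy(model, X + rng.normal(0.0, s, X.shape), y)
            for s in noise_levels}
```

Deep-learning-based variants of this idea replace the random noise with learned or adversarial perturbations.
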
"You don't need a bigger boat": serverless MLOps for reasonable companies /slideshow/you-dont-need-a-bigger-boat-serverless-mlops-for-reasonable-companies/249486836 biggerboatdatasciencemilanjune2021-210625085045
It is indeed a wonderful time to build machine learning systems, as the growing ecosystems of tools and shared best practices make even small teams incredibly productive at scale. In this talk, we present our philosophy for modern, no-nonsense data pipelines, highlighting the advantages of a (almost) pure serverless and open-source approach, and showing how the entire toolchain works - from raw data to model serving - on a real-world dataset. Finally, we argue that the crucial component for analyzing data pipelines is not the model per se, but the surrounding DAG, and present our proposal for producing automated "DAG cards" from Metaflow classes. Bio: Jacopo Tagliabue was co-founder and CTO of Tooso, an A.I. company in San Francisco acquired by Coveo in 2019. Jacopo is currently the Lead A.I. Scientist at Coveo. When not busy building A.I. products, he is exploring research topics at the intersection of language, reasoning and learning, with several publications at major conferences (e.g. WWW, SIGIR, RecSys, NAACL). In previous lives, he managed to get a Ph.D., do scienc-y things for a pro basketball team, and simulate a pre-Columbian civilization. Topics: MLOps, Metaflow, model cards.]]>

It is indeed a wonderful time to build machine learning systems, as the growing ecosystems of tools and shared best practices make even small teams incredibly productive at scale. In this talk, we present our philosophy for modern, no-nonsense data pipelines, highlighting the advantages of a (almost) pure serverless and open-source approach, and showing how the entire toolchain works - from raw data to model serving - on a real-world dataset. Finally, we argue that the crucial component for analyzing data pipelines is not the model per se, but the surrounding DAG, and present our proposal for producing automated "DAG cards" from Metaflow classes. Bio: Jacopo Tagliabue was co-founder and CTO of Tooso, an A.I. company in San Francisco acquired by Coveo in 2019. Jacopo is currently the Lead A.I. Scientist at Coveo. When not busy building A.I. products, he is exploring research topics at the intersection of language, reasoning and learning, with several publications at major conferences (e.g. WWW, SIGIR, RecSys, NAACL). In previous lives, he managed to get a Ph.D., do scienc-y things for a pro basketball team, and simulate a pre-Columbian civilization. Topics: MLOps, Metaflow, model cards.]]>
Fri, 25 Jun 2021 08:50:45 GMT /slideshow/you-dont-need-a-bigger-boat-serverless-mlops-for-reasonable-companies/249486836 ds_mi@slideshare.net(ds_mi) "You don't need a bigger boat": serverless MLOps for reasonable companies ds_mi It is indeed a wonderful time to build machine learning systems, as the growing ecosystems of tools and shared best practices make even small teams incredibly productive at scale. In this talk, we present our philosophy for modern, no-nonsense data pipelines, highlighting the advantages of a (almost) pure serverless and open-source approach, and showing how the entire toolchain works - from raw data to model serving - on a real-world dataset. Finally, we argue that the crucial component for analyzing data pipelines is not the model per se, but the surrounding DAG, and present our proposal for producing automated "DAG cards" from Metaflow classes. Bio: Jacopo Tagliabue was co-founder and CTO of Tooso, an A.I. company in San Francisco acquired by Coveo in 2019. Jacopo is currently the Lead A.I. Scientist at Coveo. When not busy building A.I. products, he is exploring research topics at the intersection of language, reasoning and learning, with several publications at major conferences (e.g. WWW, SIGIR, RecSys, NAACL). In previous lives, he managed to get a Ph.D., do scienc-y things for a pro basketball team, and simulate a pre-Columbian civilization. Topics: MLOps, Metaflow, model cards. <img style="border:1px solid #C3E6D8;float:right;" alt="" src="https://cdn.slidesharecdn.com/ss_thumbnails/biggerboatdatasciencemilanjune2021-210625085045-thumbnail.jpg?width=120&amp;height=120&amp;fit=bounds" /><br> It is indeed a wonderful time to build machine learning systems, as the growing ecosystems of tools and shared best practices make even small teams incredibly productive at scale. In this talk, we present our philosophy for modern, no-nonsense data pipelines, highlighting the advantages of a (almost) pure serverless and open-source approach, and showing how the entire toolchain works - from raw data to model serving - on a real-world dataset. Finally, we argue that the crucial component for analyzing data pipelines is not the model per se, but the surrounding DAG, and present our proposal for producing automated &quot;DAG cards&quot; from Metaflow classes. Bio: Jacopo Tagliabue was co-founder and CTO of Tooso, an A.I. company in San Francisco acquired by Coveo in 2019. Jacopo is currently the Lead A.I. Scientist at Coveo. When not busy building A.I. products, he is exploring research topics at the intersection of language, reasoning and learning, with several publications at major conferences (e.g. WWW, SIGIR, RecSys, NAACL). In previous lives, he managed to get a Ph.D., do scienc-y things for a pro basketball team, and simulate a pre-Columbian civilization. Topics: MLOps, Metaflow, model cards.
"You don't need a bigger boat": serverless MLOps for reasonable companies from Data Science Milan
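Since the talk centers on Metaflow, here is a minimal flow sketching the DAG structure from which "DAG cards" could be generated; the step names and logic are placeholders, not the authors' actual pipeline.

```python
# A three-step Metaflow DAG; run locally with: python train_flow.py run
from metaflow import FlowSpec, step

class TrainFlow(FlowSpec):

    @step
    def start(self):
        # Load raw data here (e.g., from S3 in a serverless setup).
        self.data = list(range(10))
        self.next(self.train)

    @step
    def train(self):
        # Fit a model on self.data; artifacts are versioned automatically.
        self.model = sum(self.data)  # placeholder "model"
        self.next(self.end)

    @step
    def end(self):
        # A "DAG card" could be rendered here from the flow's own structure.
        print("model:", self.model)

if __name__ == "__main__":
    TrainFlow()
```

The same flow can be pushed to batch or serverless infrastructure with little or no code change, which is the kind of property the talk leans on.
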
Question generation using Natural Language Processing by QuestGen.AI
Published: Tue, 08 Jun 2021 19:01:26 GMT
Manual question generation (worksheets and quizzes) in edtech does not scale for the shift to online learning and, during the pandemic, has led to an increased workload for teachers. In this session, we will explore natural language processing (NLP) techniques to generate multiple-choice questions automatically from any text content using the T5 transformer model. We will also explore methods to deploy the T5 question generation model for fast CPU inference using ONNX conversion and quantization.

Bio: Ramsri is a Lead Data Scientist with 8+ years of work experience across Silicon Valley, Singapore, and India. Most recently he was co-founder and CTO of a funded AI-assisted assessments startup. He has spent the last 2 years developing question generation models in edtech and has released an open-source library on the topic.

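As a rough illustration of the generation step, here is a hedged sketch using Hugging Face transformers; the checkpoint name and the "generate question:" prompt format are assumptions (a real deployment would use a T5 variant fine-tuned for question generation, then export it to ONNX and quantize it for CPU inference).

```python
# T5-style question generation from a passage of text.
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model_name = "t5-small"  # hypothetical stand-in for a fine-tuned QG checkpoint
tokenizer = T5TokenizerFast.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

context = "The mitochondrion is the powerhouse of the cell."
inputs = tokenizer("generate question: " + context, return_tensors="pt")
outputs = model.generate(**inputs, max_length=48, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
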
Speed up data preparation for ML pipelines on AWS
Published: Sat, 08 May 2021 07:18:44 GMT
Abstract: Data preparation and modelling are the activities that take up most of a typical data scientist's workday. In this session we'll see how AWS services for analytics and data management can be effectively used and integrated in AI/ML pipelines. We'll focus on AWS Glue, AWS Glue DataBrew, and AWS Data Wrangler, with a bit of theory and hands-on demos.

Bio: Francesco Marelli is a senior solutions architect at Amazon Web Services. He has lived and worked in the UK, Italy, Switzerland, and other countries in EMEA. He specializes in the design and implementation of analytics, data management, and big data systems, and has strong experience in systems integration and in the design and implementation of applications.

Topics: machine learning pipelines, AWS, cloud.

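For a taste of the AWS Data Wrangler piece (since renamed the AWS SDK for pandas, imported as awswrangler), here is a minimal sketch; the bucket, database, and table names are assumptions.

```python
# Read a raw CSV from S3, apply a simple preparation step, and write a
# partitioned Parquet dataset registered in the Glue Data Catalog.
import awswrangler as wr

df = wr.s3.read_csv("s3://my-raw-bucket/events/2021-05.csv")  # hypothetical path
df = df.dropna(subset=["user_id"])                            # basic cleaning

wr.s3.to_parquet(
    df=df,
    path="s3://my-clean-bucket/events/",  # hypothetical output location
    dataset=True,
    database="analytics",                 # assumed Glue database
    table="events_clean",
    partition_cols=["event_date"],
)
```
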
Serverless machine learning architectures at Helixa
Published: Wed, 16 Dec 2020 09:21:37 GMT
An overview of the Helixa ML end-to-end system, including serverless architectures, useful tools, and engineering practices.

MLOps with a Feature Store: Filling the Gap in ML Infrastructure
Published: Fri, 10 Jul 2020 13:02:53 GMT
A feature store enables machine learning (ML) features to be registered, discovered, and used as part of ML pipelines, making it easier to transform and validate the training data fed into machine learning systems. Feature stores can also enable consistent engineering of features between training and inference, but to do so they need a common data processing platform. The first feature stores, developed at hyperscale AI companies such as Uber, Airbnb, and Facebook, enabled feature engineering using domain-specific languages, providing abstractions tailored to each company's feature engineering domain. A general-purpose feature store, however, needs a general-purpose platform for feature engineering, feature selection, and feature transformation. In this talk, we describe how we built a general-purpose, open-source feature store for ML around dataframes and Apache Spark. We will demonstrate how data engineers can transform and engineer features from backend databases and data lakes, while data scientists can use PySpark to select and transform features into train/test data in a file format of choice (.tfrecords, .npy, .petastorm, etc.) on a file system of choice (S3, HDFS). Finally, we will show how the feature store enables end-to-end ML pipelines to be factored into feature engineering and data science stages, each of which can run at a different cadence.

Bio: Fabio Buso is the head of engineering at Logical Clocks AB, where he leads the Feature Store development. Fabio holds a master's degree in cloud computing and services with a focus on data-intensive applications, awarded by a joint program between KTH Stockholm and TU Berlin.

Topics: feature store, MLOps.

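The dataframe-centric pattern the abstract describes can be sketched in plain PySpark; the table names, columns, and storage paths below are assumptions, and a real feature store such as Hopsworks wraps these steps behind its own API.

```python
# Join features from two feature groups into a training dataset, then split.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("feature-store-sketch").getOrCreate()

users = spark.table("fs.user_features")        # hypothetical feature group
txns = spark.table("fs.transaction_features")  # hypothetical feature group

training = (users.join(txns, on="user_id")
                 .select("user_id", "avg_spend", "txn_count", "label"))

train, test = training.randomSplit([0.8, 0.2], seed=42)
train.write.mode("overwrite").parquet("s3://bucket/train/")  # assumed location
test.write.mode("overwrite").parquet("s3://bucket/test/")
```
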
Reinforcement Learning Overview | Marco Del Pra
Published: Fri, 03 Jul 2020 07:53:03 GMT
Reinforcement learning is a growing subset of machine learning and one of the most important frontiers of artificial intelligence. Its goal is to capture higher-level logic and use more adaptable algorithms than classical machine learning. Formally, it denotes a set of algorithms that deal with sequential decision-making and have the potential to make highly intelligent decisions depending on their local environment. A reinforcement learning problem can be described as an agent that has to make decisions in its environment in order to optimize a cumulative reward, and it is clear that this formalization applies to a great variety of tasks in many different fields. In this talk, the main features of the most important reinforcement learning algorithms will be illustrated and explored in depth, with some concrete and explanatory examples.

Bio: Marco Del Pra was born in Venice 41 years ago, holds two master's degrees (Computer Science and Mathematics), and has two important publications in applied mathematics. He has been working in artificial intelligence for 10 years, mainly as a freelancer. Among others, he worked for the European Commission's Joint Research Centre, for Cuebiq, and as Data Science Lead for Microsoft's artificial intelligence projects in Italy.

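To make the agent/environment loop concrete, here is a minimal tabular Q-learning sketch; the five-state chain environment and all hyperparameters are toy assumptions for illustration.

```python
# Tabular Q-learning on a chain MDP: reward 1 only at the rightmost state.
import numpy as np

n_states, n_actions = 5, 2  # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == n_states - 1 else 0.0
    return s2, reward, s2 == n_states - 1

for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2, r, done = step(s, a)
        # Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a').
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1))  # learned policy: prefers "right" in every state
```
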
Time Series Classification with Deep Learning | Marco Del Pra
Published: Wed, 06 May 2020 10:53:41 GMT
A great deal of data is stored today in the form of time series, and with the wide diffusion of real-time applications, interest in this kind of data is growing rapidly in many areas: finance, advertising, marketing, health care, automated disease detection, biometrics, retail, and anomaly detection of any kind. It is therefore very interesting to understand the role and potential of machine learning in this sector. Many methods can be used to classify time series, but all of them, apart from deep learning, require some kind of feature engineering as a separate stage before classification, which can mean losing important information and increasing development and test time. Deep learning models such as recurrent and convolutional neural networks, by contrast, incorporate this feature engineering internally, optimizing it and eliminating the need to do it manually. They are therefore able to extract information from time series in a faster, more direct, and more complete way.

Bio: Marco Del Pra is 41 years old, was born in Venice, and holds two master's degrees (Computer Science and Mathematics). He has been working for about 10 years in artificial intelligence, first as a data scientist, then as a team leader, and finally as Head of Data. Among others, he worked for Microsoft, for the European Commission (JRC of Ispra), and for Cuebiq. He is currently working as a freelancer and is creating an innovative AI startup with two other co-founders. He has two important publications in applied mathematics.

Topics: recurrent and convolutional neural networks, deep learning, time series.

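To illustrate the "no manual feature engineering" point, here is a minimal 1D-CNN classifier in Keras; the shapes, random data, and hyperparameters are assumptions, not the speaker's architecture.

```python
# A small 1D convolutional network that learns features from raw series.
import numpy as np
from tensorflow import keras

n_samples, timesteps, n_classes = 200, 128, 3
X = np.random.randn(n_samples, timesteps, 1).astype("float32")  # toy series
y = np.random.randint(0, n_classes, n_samples)                  # toy labels

model = keras.Sequential([
    keras.layers.Input(shape=(timesteps, 1)),
    keras.layers.Conv1D(32, kernel_size=7, activation="relu"),
    keras.layers.MaxPooling1D(2),
    keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
    keras.layers.GlobalAveragePooling1D(),  # learned features, no manual stage
    keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```
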
Ludwig: A code-free deep learning toolbox | Piero Molino, Uber AI
Published: Wed, 18 Dec 2019 12:55:47 GMT
The talk will introduce Ludwig, a deep learning toolbox that allows training models and using them for prediction without the need to write code. It is unique in its ability to make deep learning easier to understand for non-experts and to enable faster model-improvement iteration cycles for experienced machine learning developers and researchers alike. By using Ludwig, experts and researchers can simplify the prototyping process and streamline data processing so that they can focus on developing deep learning architectures.

Bio: Piero Molino is a Senior Research Scientist at Uber AI with a focus on machine learning for language and dialogue. Piero completed a PhD on question answering at the University of Bari, Italy, and founded QuestionCube, a startup that built a framework for semantic search and QA. He worked for Yahoo Labs in Barcelona on learning to rank and at IBM Watson in New York on natural language processing with deep learning, then joined Geometric Intelligence, where he worked on grounded language understanding. After Uber acquired Geometric Intelligence, he became one of the founding members of Uber AI Labs.

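The declarative idea can be sketched through Ludwig's Python API; the column names and dataset file are assumptions, and in the fully code-free workflow the same configuration lives in a YAML file passed to the ludwig train CLI.

```python
# Declare inputs and outputs; Ludwig assembles and trains the model.
from ludwig.api import LudwigModel

config = {
    "input_features": [{"name": "review_text", "type": "text"}],     # assumed column
    "output_features": [{"name": "sentiment", "type": "category"}],  # assumed column
}

model = LudwigModel(config)
model.train(dataset="reviews.csv")             # hypothetical CSV with those columns
predictions = model.predict(dataset="reviews.csv")
```
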
Audience projection of target consumers over multiple domains a ner and bayesian approach, Gianmario Spacagna, Alberto Pirovano /slideshow/aaaa-188132680/188132680 audienceprojectionoftargetconsumersovermultipledomainsanerandbayesianapproach4-191029093807
Traditional market research is generally conducted by questionnaires or other forms of explicit feedback, directly asked to an ad hoc panel of individuals that in aggregate are representative of a larger group of people. Unfortunately, those traditional approaches are often invasive, nonscalable, and biased. Indirect approaches based on sparse and implicit consumer feedback (e.g., social network interactions, web browsing, or online purchases) are more scalable, authentic, and more suitable for real-time consumer insights. Although those sources of implicit consumer feedback provide relevant and detailed pictures of the population, they individually provide only a limited set of observable behaviors. The Holy Grail of market research is the ability to merge different sources of consumers interests into an augmented view that connects all the dots across multiple domains. Unfortunately, user-centric "fusion" algorithms present many limitations in the case of heterogeneous datasets strongly differing in terms of size and density and when the number of sources to merge increases. We propose a novel approach of Audience Projection able to define a target audience as a subset of the population in a source domain and to project this target to a set of users into a destination dataset. We will show how libraries such as spaCy can provide Deep Learning implementations for Named Entity Recognition (NER) to match related brands and we will use Bayesian Inference to transfer knowledge from the source domain. This way, we can estimate the probability of the user to belong to the target using the source distribution of volume of interests of common entities as model evidence and the source target size as prior probability. Bio: Gianmario Spacagna is the chief scientist and head of AI at Helixa. His teams mission is building the next generation of behavior algorithms and models of human decision making with careful attention to their potential and effects on society. His experience covers a diverse portfolio of machine learning algorithms and data products across different industries. Previously, he worked as a data scientist in IoT automotive (Pirelli Cyber Technology), retail and business banking (Barclays Analytics Centre of Excellence), threat intelligence (Cisco Talos), predictive marketing (AgilOne), plus some occasional freelancing. Hes a co-author of the book Python Deep Learning, contributor to the Professional Manifesto for Data Science, and founder of the Data Science Milan community. Gianmario holds a masters degree in telematics (Polytechnic of Turin) and software engineering of distributed systems (KTH of Stockholm). After having spent half of his career abroad, he now lives in Milan. His favorite hobbies include home cooking, hiking, and exploring the surrounding nature on his motorcycle. ]]>

Traditional market research is generally conducted by questionnaires or other forms of explicit feedback, directly asked to an ad hoc panel of individuals that in aggregate are representative of a larger group of people. Unfortunately, those traditional approaches are often invasive, nonscalable, and biased. Indirect approaches based on sparse and implicit consumer feedback (e.g., social network interactions, web browsing, or online purchases) are more scalable, authentic, and more suitable for real-time consumer insights. Although those sources of implicit consumer feedback provide relevant and detailed pictures of the population, they individually provide only a limited set of observable behaviors. The Holy Grail of market research is the ability to merge different sources of consumers interests into an augmented view that connects all the dots across multiple domains. Unfortunately, user-centric "fusion" algorithms present many limitations in the case of heterogeneous datasets strongly differing in terms of size and density and when the number of sources to merge increases. We propose a novel approach of Audience Projection able to define a target audience as a subset of the population in a source domain and to project this target to a set of users into a destination dataset. We will show how libraries such as spaCy can provide Deep Learning implementations for Named Entity Recognition (NER) to match related brands and we will use Bayesian Inference to transfer knowledge from the source domain. This way, we can estimate the probability of the user to belong to the target using the source distribution of volume of interests of common entities as model evidence and the source target size as prior probability. Bio: Gianmario Spacagna is the chief scientist and head of AI at Helixa. His teams mission is building the next generation of behavior algorithms and models of human decision making with careful attention to their potential and effects on society. His experience covers a diverse portfolio of machine learning algorithms and data products across different industries. Previously, he worked as a data scientist in IoT automotive (Pirelli Cyber Technology), retail and business banking (Barclays Analytics Centre of Excellence), threat intelligence (Cisco Talos), predictive marketing (AgilOne), plus some occasional freelancing. Hes a co-author of the book Python Deep Learning, contributor to the Professional Manifesto for Data Science, and founder of the Data Science Milan community. Gianmario holds a masters degree in telematics (Polytechnic of Turin) and software engineering of distributed systems (KTH of Stockholm). After having spent half of his career abroad, he now lives in Milan. His favorite hobbies include home cooking, hiking, and exploring the surrounding nature on his motorcycle. ]]>
Tue, 29 Oct 2019 09:38:07 GMT /slideshow/aaaa-188132680/188132680 ds_mi@slideshare.net(ds_mi) Audience projection of target consumers over multiple domains: a NER and Bayesian approach, Gianmario Spacagna, Alberto Pirovano ds_mi
Audience projection of target consumers over multiple domains: a NER and Bayesian approach, Gianmario Spacagna, Alberto Pirovano from Data Science Milan
]]>
418 3 https://cdn.slidesharecdn.com/ss_thumbnails/audienceprojectionoftargetconsumersovermultipledomainsanerandbayesianapproach4-191029093807-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
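A minimal sketch of the two steps described in this abstract, under stated assumptions: spaCy with the en_core_web_sm model installed (python -m spacy download en_core_web_sm), ORG entities as a rough stand-in for brand mentions, and invented interest volumes and prior. This is illustrative only, not the authors' implementation.

```python
# Step 1: spaCy NER to extract brand-like entities.
# Step 2: a naive Bayes update of the probability that a user
# belongs to the target audience. All numbers below are invented.
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the model is installed

def extract_brands(text):
    """Return ORG entities as a rough proxy for brand mentions."""
    return [ent.text for ent in nlp(text).ents if ent.label_ == "ORG"]

def posterior_target_membership(prior, vol_target, vol_pop, user_entities):
    """prior = source target size / population size; likelihoods come from
    per-entity interest volumes in the source domain."""
    p_target, p_pop = prior, 1.0 - prior
    for entity in user_entities:
        p_target *= vol_target.get(entity, 1e-6)
        p_pop *= vol_pop.get(entity, 1e-6)
    return p_target / (p_target + p_pop)

brands = extract_brands("She follows Ferrari and Ducati on social media.")
prob = posterior_target_membership(
    prior=0.05,  # source target size as prior probability
    vol_target={"Ferrari": 0.4, "Ducati": 0.3},
    vol_pop={"Ferrari": 0.1, "Ducati": 0.05},
    user_entities=brands,
)
print(brands, round(prob, 3))
```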
Weak supervised learning - Kristina Khvatova /slideshow/weak-supervised-learning-kristina-khvatova/153731713 weaksupervisedlearningpresentation-190705065445
Weakly Supervised Learning: Introduction and Best Practices. In the talk we will introduce the three main types of weakly supervised learning (incomplete, inexact, and inaccurate), examine how models can be trained under weak supervision, and review real applications of weakly supervised learning, showing how it can improve results and reduce costs (a minimal self-training sketch follows this entry). Bio: Kristina Khvatova works as a Software Engineer at Softec S.p.A. She is currently involved in the development of a project for data analysis and visualisation, which includes quantitative and qualitative analysis based on classification, optimisation, time-series prediction, and anomaly-detection techniques. She obtained a master's degree in Mathematics from Saint Petersburg State University and a master's degree in Computer Science from the University of Milano-Bicocca. ]]>

Fri, 05 Jul 2019 06:54:45 GMT /slideshow/weak-supervised-learning-kristina-khvatova/153731713 ds_mi@slideshare.net(ds_mi) Weak supervised learning - Kristina Khvatova ds_mi
Weak supervised learning - Kristina Khvatova from Data Science Milan
]]>
630 4 https://cdn.slidesharecdn.com/ss_thumbnails/weaksupervisedlearningpresentation-190705065445-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
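The "incomplete" flavour of weak supervision named in the abstract can be illustrated with scikit-learn's SelfTrainingClassifier (available since scikit-learn 0.24); the dataset and the 90% masking ratio below are invented for the sketch.

```python
# Incomplete supervision: only a few labels are known, the rest are
# marked -1, and the classifier pseudo-labels the unlabeled pool.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Simulate weak supervision: hide 90% of the labels.
rng = np.random.default_rng(0)
y_weak = y.copy()
y_weak[rng.random(len(y)) < 0.9] = -1  # -1 marks "unlabeled"

model = SelfTrainingClassifier(LogisticRegression(), threshold=0.8)
model.fit(X, y_weak)
print("accuracy on full ground truth:", model.score(X, y))
```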
GANs beyond nice pictures: real value of data generation, Alex Honchar /slideshow/gans-beyond-nice-pictures-real-value-of-data-generation-alex-honchar/133903841 dsm19-190301103616-190301180126
GANs beyond nice pictures: real value of data generation (theory and business applications). A toy generation sketch follows this entry. About the speaker, Alex Honchar: I am a machine learning expert currently applying AI in medtech, fintech, and other areas. I also enjoy teaching and blogging (50k+ views monthly) about deep learning applications. As a member of academia, I also have a track record of scientific publications. Besides science, I travel, do sports, and perform card magic.]]>

Fri, 01 Mar 2019 18:01:26 GMT /slideshow/gans-beyond-nice-pictures-real-value-of-data-generation-alex-honchar/133903841 ds_mi@slideshare.net(ds_mi) GANs beyond nice pictures: real value of data generation, Alex Honchar ds_mi
GANs beyond nice pictures: real value of data generation, Alex Honchar from Data Science Milan
]]>
574 3 https://cdn.slidesharecdn.com/ss_thumbnails/dsm19-190301103616-190301180126-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
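As a hedged toy illustration of data generation, not material from the talk itself, the sketch below trains a minimal Keras GAN to reproduce samples from a 1-D Gaussian; the architecture sizes and target distribution are arbitrary.

```python
# Toy GAN: generator maps latent noise to 1-D samples, discriminator
# scores real vs. fake; the usual alternating training loop follows.
import numpy as np
from tensorflow.keras import layers, models

latent_dim, batch = 8, 64

generator = models.Sequential([
    layers.Dense(16, activation="relu", input_shape=(latent_dim,)),
    layers.Dense(1),  # one generated "data point" per latent vector
])
discriminator = models.Sequential([
    layers.Dense(16, activation="relu", input_shape=(1,)),
    layers.Dense(1, activation="sigmoid"),  # real vs. fake probability
])
discriminator.compile(optimizer="adam", loss="binary_crossentropy")

# Combined model: the discriminator is frozen while the generator trains.
discriminator.trainable = False
gan = models.Sequential([generator, discriminator])
gan.compile(optimizer="adam", loss="binary_crossentropy")

for step in range(1000):
    real = np.random.normal(4.0, 1.25, size=(batch, 1))  # target data
    noise = np.random.normal(size=(batch, latent_dim))
    fake = generator.predict(noise, verbose=0)
    discriminator.train_on_batch(
        np.vstack([real, fake]),
        np.vstack([np.ones((batch, 1)), np.zeros((batch, 1))]))
    # Generator tries to make the discriminator output "real" (1).
    gan.train_on_batch(np.random.normal(size=(batch, latent_dim)),
                       np.ones((batch, 1)))

print(generator.predict(np.random.normal(size=(5, latent_dim)), verbose=0).ravel())
```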
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco /slideshow/continuallifelong-learning-with-deep-architectures-vincenzo-lomonaco/129827800 datasciencemilan-190130090802
Humans have the extraordinary ability to learn continually from experience. Not only can we apply previously learned knowledge and skills to new situations, we can also use them as the foundation for later learning. One of the grand goals of AI is building an artificial continually learning agent that constructs a sophisticated understanding of the world from its own experience, through the autonomous incremental development of ever more complex skills and knowledge. "Continual Learning" (CL) is indeed a fast-emerging topic in AI concerning the ability to efficiently improve the performance of a deep model over time, dealing with a long (and possibly unlimited) sequence of data/tasks. In this workshop, after a brief introduction to the topic, we'll implement different Continual Learning strategies and assess them on common vision benchmarks (a minimal rehearsal-strategy sketch follows this entry). We'll conclude the workshop with a look at possible real-world applications of CL. Vincenzo Lomonaco is a Deep Learning PhD student at the University of Bologna and the founder of ContinualAI.org. He is also the PhD students' representative at the Department of Computer Science and Engineering (DISI) and a teaching assistant for the courses Machine Learning and Computer Architectures in the same department. Previously, he was a Machine Learning software engineer at IDL in-line Devices and a master's student at the University of Bologna, where he graduated cum laude in 2015 with the dissertation "Deep Learning for Computer Vision: a Comparison Between CNNs and HTMs on Object Recognition Tasks". ]]>

Wed, 30 Jan 2019 09:08:02 GMT /slideshow/continuallifelong-learning-with-deep-architectures-vincenzo-lomonaco/129827800 ds_mi@slideshare.net(ds_mi) Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco ds_mi
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco from Data Science Milan
]]>
313 4 https://cdn.slidesharecdn.com/ss_thumbnails/datasciencemilan-190130090802-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
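One simple CL strategy that such workshops typically cover is rehearsal: replaying a small memory of past-task samples while learning a new task, to reduce catastrophic forgetting. The sketch below is a hypothetical illustration using scikit-learn's SGDClassifier.partial_fit (scikit-learn >= 1.1 for loss="log_loss"); the two "tasks" are synthetic distribution shifts.

```python
# Rehearsal: mix a stored subset of earlier tasks into each new task's
# incremental update, so the model does not forget the old tasks.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(loss="log_loss", random_state=0)
memory_X, memory_y = [], []

# Two "tasks": same label space, shifted input distributions.
tasks = []
for shift in (0.0, 3.0):
    X, y = make_classification(n_samples=400, n_features=8,
                               random_state=int(shift))
    tasks.append((X + shift, y))

for t, (X, y) in enumerate(tasks):
    # Mix the new task with replayed samples from the memory buffer.
    if memory_X:
        X = np.vstack([X, np.vstack(memory_X)])
        y = np.concatenate([y, np.concatenate(memory_y)])
    clf.partial_fit(X, y, classes=np.array([0, 1]))
    # Store a small random subset of this task for future rehearsal.
    idx = rng.choice(len(tasks[t][0]), size=50, replace=False)
    memory_X.append(tasks[t][0][idx])
    memory_y.append(tasks[t][1][idx])

for t, (X, y) in enumerate(tasks):
    print(f"accuracy on task {t}:", round(clf.score(X, y), 3))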
3D Point Cloud analysis using Deep Learning /slideshow/3d-point-cloud-analysis-using-deep-learning/129479172 makingsensefrompointcloudsturinooct2018pdf-190127205644
Processing 3D images has many use cases: improving autonomous driving, enabling digital conversions of old factory buildings, enabling augmented-reality solutions for medical surgery, and more. 3D images also help in 3D modeling and in the safety evaluation of products. 3D image processing brings enormous benefits but also amplifies computing cost. The size of the point cloud, the number of points, sparse and irregular point clouds, and the adverse impact of light reflections, (partial) occlusions, etc., make it difficult for engineers to process point clouds. Moving from hand-crafted features to deep learning techniques for semantic segmentation, object classification, object detection, and action detection in 3D videos, we have come a long way in 3D image processing (a minimal PointNet-style sketch follows this entry). 3D point cloud processing is increasingly used to solve Industry 4.0 use cases and to help architects, builders, and product managers. I will share some of the innovations driving the progress of 3D point cloud processing, as well as the practical implementation issues we faced while developing deep learning models to make sense of 3D point clouds. Attendees: beginners and intermediate practitioners in image processing and 3D point clouds. Profile of the speaker: SK Reddy is the Chief Product Officer AI at Hexagon (www.hexagon.com). He is an AI and ML expert and a successful two-time startup entrepreneur. He also advises AI startups, speaks frequently at conferences, and blogs about AI.]]>

Sun, 27 Jan 2019 20:56:44 GMT /slideshow/3d-point-cloud-analysis-using-deep-learning/129479172 ds_mi@slideshare.net(ds_mi) 3D Point Cloud analysis using Deep Learning ds_mi
3D Point Cloud analysis using Deep Learning from Data Science Milan
]]>
1052 3 https://cdn.slidesharecdn.com/ss_thumbnails/makingsensefrompointcloudsturinooct2018pdf-190127205644-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
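A hedged, minimal sketch of the PointNet-style idea commonly used for consuming raw point clouds (not necessarily the speaker's method): a shared per-point MLP followed by an order-invariant max-pool, in Keras. The shapes and layer sizes are illustrative.

```python
# PointNet-style classifier: Conv1D with kernel size 1 acts as a shared
# MLP applied to every point; GlobalMaxPooling1D is the symmetric
# aggregation that makes the network invariant to point ordering.
from tensorflow.keras import layers, models

num_points, num_classes = 1024, 10

model = models.Sequential([
    layers.Input(shape=(num_points, 3)),       # x, y, z per point
    layers.Conv1D(64, 1, activation="relu"),   # shared per-point MLP
    layers.Conv1D(128, 1, activation="relu"),
    layers.GlobalMaxPooling1D(),               # order-invariant pooling
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```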
Deep time-to-failure: predicting failures, churns and customer lifetime with RNN by Gianmario Spacagna, Chief Scientist at Cubeyou AI /slideshow/deep-timetofailure/123779804 deeptime-to-failure-181123095551
The notebook and documentation of the original tutorial are available at https://github.com/gm-spacagna/deep-ttf. Deep Time-to-Failure: predicting failures, churns and customer lifetime using recurrent neural networks. Machinery and customers are among the most valuable assets for many businesses. A common trait of these assets is that sooner or later they will fail or, in the case of customers, they will churn. In order to catch those failure events, we would ideally consider the whole history of available machine/customer information and learn smart representations of the system status over time. Traditional machine learning and statistical models approach the prediction of time-to-failure, a.k.a. expected lifetime, as a supervised regression problem using handcrafted features. Training those models is hard for three main reasons: the complexity of extracting predictive features from time series without overfitting; the difficulty of modeling uncertainty and confidence levels in the predictions; and the scarcity of labeled data, since failure events are by definition rare, which results in highly unbalanced training datasets. The first issue can be solved by adopting recurrent neural architectures. A solution to the last two problems is to exploit censored data and build survival regression models. In this talk we will present a novel technique based on recurrent neural networks that can turn any variable-length sequence of data into a probability distribution representing the estimated remaining time to the failure event (a minimal sketch of the core model follows this entry). The network will be trained in the presence of ground truth as well as with right-censored data. We will demonstrate the approach on a NASA case study simulating the degradation of 100 jet engines. During the tutorial you will learn: what Survival Analysis is and what the most popular survival regression techniques are; how a Weibull distribution can be used as a generic distribution for modeling time-to-failure events; and how to build a deep learning algorithm in Keras, leveraging recurrent units (LSTM or GRU), that can map raw time series of covariates into Weibull probability distributions. The tutorial will also cover a few common pitfalls, visualizations, and evaluation tools useful for testing and adapting this approach to generic use cases. You are free to bring your laptop if you would like to do some live coding and experiment yourself. In this case we strongly encourage you to check that you have all of the requirements installed on your machine. More details on the required packages can be found in the GitHub repository gm-spacagna/deep-ttf.]]>

Fri, 23 Nov 2018 09:55:51 GMT /slideshow/deep-timetofailure/123779804 ds_mi@slideshare.net(ds_mi) Deep time-to-failure: predicting failures, churns and customer lifetime with RNN by Gianmario Spacagna, Chief Scientist at Cubeyou AI ds_mi
Deep time-to-failure: predicting failures, churns and customer lifetime with RNN by Gianmario Spacagna, Chief Scientist at Cubeyou AI from Data Science Milan
]]>
8194 4 https://cdn.slidesharecdn.com/ss_thumbnails/deeptime-to-failure-181123095551-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
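The core of the approach, a recurrent network emitting Weibull parameters trained with a censoring-aware negative log-likelihood, can be sketched as below. Assumptions: TensorFlow/Keras 2.x, observed times t > 0, and illustrative tensor shapes; see the gm-spacagna/deep-ttf repository for the actual tutorial code.

```python
# A GRU maps each covariate sequence to the (alpha, beta) parameters of a
# Weibull distribution over the remaining time-to-failure. The loss uses
# log f(t) for observed events and log S(t) for right-censored ones:
#   loglik = u * log h(t) - H(t),  with h the hazard, H the cumulative hazard.
import tensorflow as tf
from tensorflow.keras import layers, models

def weibull_nll(y_true, y_pred):
    """y_true = [t, u]: time and event flag (u=1 failure observed,
    u=0 right-censored). y_pred = [alpha, beta], both positive."""
    t, u = y_true[:, 0], y_true[:, 1]
    alpha, beta = y_pred[:, 0], y_pred[:, 1]
    log_hazard = (tf.math.log(beta) - tf.math.log(alpha)
                  + (beta - 1.0) * (tf.math.log(t) - tf.math.log(alpha)))
    cum_hazard = (t / alpha) ** beta
    return -tf.reduce_mean(u * log_hazard - cum_hazard)

seq_len, n_features = 50, 4
inputs = layers.Input(shape=(seq_len, n_features))
h = layers.GRU(32)(inputs)
raw = layers.Dense(2)(h)
# Constrain parameters to be positive: exp for alpha, softplus for beta.
params = layers.Lambda(lambda x: tf.stack(
    [tf.exp(x[:, 0]), tf.nn.softplus(x[:, 1]) + 1e-6], axis=1))(raw)

model = models.Model(inputs, params)
model.compile(optimizer="adam", loss=weibull_nll)
model.summary()  # fit with targets shaped (batch, 2) = [t, u]
```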
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro Panebianco /ds_mi/50-shades-of-text-leveraging-natural-language-processing-nlp-alessandro-panebianco nlp-180624170025
50 Shades of Text - Leveraging Natural Language Processing (NLP) to validate, improve, and expand the functionalities of a product. Nowadays, every company either stores or produces text data, from web logs and user queries to translations and support tickets, yet not everyone knows how to extract valuable insights from it. In this session, we will present a practical case of moving from raw text data to a valuable business application, leveraging some of the major NLP methodologies (word embeddings, word2vec, doc2vec, fastText, etc.; a minimal embedding sketch follows this entry). Bio: Alessandro is a data veteran. He holds two master's degrees in computer engineering, one from Politecnico di Milano and the other from the University of Illinois at Chicago (UIC). He started his career in data consultancy, where he mastered Apache Spark for machine learning projects, and subsequently joined WW Grainger, one of the largest MRO e-commerce companies in the United States. In September 2017, after more than 5 years in the USA, Alessandro returned to his native country, Italy, where he is now leading a team of data scientists. His current work focuses on achieving energy efficiency through the automation of energy management processes for commercial customers.]]>

Sun, 24 Jun 2018 17:00:25 GMT /ds_mi/50-shades-of-text-leveraging-natural-language-processing-nlp-alessandro-panebianco ds_mi@slideshare.net(ds_mi) 50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro Panebianco ds_mi
50 Shades of Text - Leveraging Natural Language Processing (NLP), Alessandro Panebianco from Data Science Milan
]]>
569 4 https://cdn.slidesharecdn.com/ss_thumbnails/nlp-180624170025-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
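A minimal word2vec sketch of the embedding step listed above, using gensim's Word2Vec (gensim >= 4 API); the four-sentence corpus is a stand-in for real web logs, queries, or support tickets.

```python
# Train skip-gram word embeddings on a tiny tokenized corpus and query
# nearest neighbours, which act as "related terms" for downstream use.
from gensim.models import Word2Vec

corpus = [
    ["replace", "hydraulic", "pump", "seal"],
    ["order", "hydraulic", "pump", "gasket"],
    ["reset", "router", "password"],
    ["change", "router", "admin", "password"],
]

model = Word2Vec(sentences=corpus, vector_size=32, window=3,
                 min_count=1, sg=1, epochs=200, seed=0)

print(model.wv.most_similar("pump", topn=3))
```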
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply /slideshow/pricing-optimization-closeout-online-and-renewal-strategies-data-reply/97722063 20171211dsmeetupdatareplyclean-180520074524
Product close-out strategy by Ilaria Gianoli, Data Scientist, Data Reply. Abstract: How to deal with products in their decline phase? Ilaria will share her experience in optimizing the close-out strategy for a multinational retail leader, with a particular focus on price optimization. Bio: Ilaria is a Data Scientist at Data Reply, where she works as a consultant across different industries, in particular retail. She uses her mathematical, statistical, and machine learning background to turn data into business opportunities. She also works closely with the business to provide quantitative support for decision making, adapting the complexity of the mathematical models to customer needs. She holds an MSc in Applied Statistics - Mathematical Engineering from Politecnico di Milano. Online pricing: from theory to application by Giovanni Corradini, Data Scientist, Data Reply. Abstract: Multi-armed bandit algorithms are populating the world of e-commerce. How do they work? Giovanni will share the basics of this field and an application of a state-of-the-art algorithm on a real-world simulation from the ticketing industry (a minimal Thompson-sampling sketch follows this entry). Bio: Giovanni is a Data Scientist at Data Reply. He holds an MSc in Applied Statistics - Mathematical Engineering from Politecnico di Milano. He has a background in statistics, machine learning, and data mining, and he provides decision-making support to industries in many different fields. Renewal Price Optimization for Subscription products by Riccardo Lorenzon, Data Scientist, Data Reply. Abstract: We are observing a huge shift in the modern economy from a pay-per-product model to a subscription-based model. When it comes to pricing strategies, it is important both to close the single deal and to monetize long-term relationships with the customer. Riccardo will present an application of subscription renewal pricing optimization models for a company in the publishing industry. Bio: Riccardo holds an MSc in Mathematical Models for Decision Making from Politecnico di Milano. He has developed hands-on experience with end-to-end data projects across multiple industries. His proactive creativity makes him very effective in business case design and the early stages of projects.]]>

Sun, 20 May 2018 07:45:24 GMT /slideshow/pricing-optimization-closeout-online-and-renewal-strategies-data-reply/97722063 ds_mi@slideshare.net(ds_mi) Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply ds_mi
Pricing Optimization: Close-out, Online and Renewal strategies, Data Reply from Data Science Milan
]]>
371 2 https://cdn.slidesharecdn.com/ss_thumbnails/20171211dsmeetupdatareplyclean-180520074524-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
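A hypothetical Thompson-sampling sketch of the multi-armed bandit idea from Giovanni's talk: each candidate price is an "arm" with a Beta posterior over its conversion rate, and we pick the price that maximizes the sampled expected revenue. The prices and the "true" conversion rates below are invented for simulation only.

```python
# Thompson sampling for online price selection against simulated customers.
import numpy as np

rng = np.random.default_rng(0)
prices = np.array([19.0, 24.0, 29.0])
true_conv = np.array([0.30, 0.22, 0.12])   # unknown in real life
alpha = np.ones(len(prices))               # Beta posterior: successes + 1
beta = np.ones(len(prices))                # Beta posterior: failures + 1

revenue = 0.0
for _ in range(10_000):
    sampled_conv = rng.beta(alpha, beta)          # one draw per arm
    arm = int(np.argmax(prices * sampled_conv))   # sampled expected revenue
    sold = rng.random() < true_conv[arm]          # simulated customer
    alpha[arm] += sold
    beta[arm] += 1 - sold
    revenue += prices[arm] * sold

print("posterior mean conversion:", np.round(alpha / (alpha + beta), 3))
print("total revenue:", round(revenue, 2))
```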
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrigoni, Senior Data Scientist, Pirelli (pirelli.com) /slideshow/how-pirelli-uses-domino-and-plotly-for-smart-manufacturing-by-alberto-arrigoni-senior-data-scientist-pirelli-pirellicom/80661467 smartmanpirellidominoplotly-171010172744
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrigoni, Senior Data Scientist, Pirelli (pirelli.com) Abstract: Pirelli, a global performance tire manufacturer, uses data science in its 20 factories to improve quality and efficiency, and reduce energy consumption. For this Smart Manufacturing initiative, Pirellis data science team has developed predictive models and analytics tools to monitor processes, machines and materials on the factory floors. In this talk we will show some of the solutions we deploy, demonstrate how we used Dominos data science platform and Plot.ly to build these solutions, and discuss the next steps in this journey towards predictive maintenance. Bio: Alberto Arrigoni is a data scientist at Pirelli, where he works to process sensors and telemetry data for IoT, Smart Factories and connected-vehicle applications. He works closely with all major business units such as R&D, industrial engineering and BI to develop tailored machine learning algorithms and production systems. He holds a PhD in biostatistics from the University of Milan Bicocca and prior to joining Pirelli was a staff data scientist at the National Institute of Molecular Genetics (Milan), as well as a Fulbright student at the Santa Clara University and visiting PhD student at Pacific Biosciences (Menlo Park, CA).]]>

"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrigoni, Senior Data Scientist, Pirelli (pirelli.com) Abstract: Pirelli, a global performance tire manufacturer, uses data science in its 20 factories to improve quality and efficiency, and reduce energy consumption. For this Smart Manufacturing initiative, Pirellis data science team has developed predictive models and analytics tools to monitor processes, machines and materials on the factory floors. In this talk we will show some of the solutions we deploy, demonstrate how we used Dominos data science platform and Plot.ly to build these solutions, and discuss the next steps in this journey towards predictive maintenance. Bio: Alberto Arrigoni is a data scientist at Pirelli, where he works to process sensors and telemetry data for IoT, Smart Factories and connected-vehicle applications. He works closely with all major business units such as R&D, industrial engineering and BI to develop tailored machine learning algorithms and production systems. He holds a PhD in biostatistics from the University of Milan Bicocca and prior to joining Pirelli was a staff data scientist at the National Institute of Molecular Genetics (Milan), as well as a Fulbright student at the Santa Clara University and visiting PhD student at Pacific Biosciences (Menlo Park, CA).]]>
Tue, 10 Oct 2017 17:27:44 GMT /slideshow/how-pirelli-uses-domino-and-plotly-for-smart-manufacturing-by-alberto-arrigoni-senior-data-scientist-pirelli-pirellicom/80661467 ds_mi@slideshare.net(ds_mi) "How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrigoni, Senior Data Scientist, Pirelli (pirelli.com) ds_mi
"How Pirelli uses Domino and Plotly for Smart Manufacturing" by Alberto Arrigoni, Senior Data Scientist, Pirelli (pirelli.com) from Data Science Milan
]]>
1367 2 https://cdn.slidesharecdn.com/ss_thumbnails/smartmanpirellidominoplotly-171010172744-thumbnail.jpg?width=120&height=120&fit=bounds presentation Black http://activitystrea.ms/schema/1.0/post http://activitystrea.ms/schema/1.0/posted 0
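A minimal, hypothetical Plotly sketch of the kind of factory-floor monitoring chart described above; the sensor series and control limits are simulated, and this is not Pirelli's actual code.

```python
# Plot a drifting sensor time series against static control limits and
# export it as a shareable HTML artifact.
import numpy as np
import plotly.graph_objects as go

rng = np.random.default_rng(0)
t = np.arange(500)
temperature = 80 + np.cumsum(rng.normal(0, 0.1, size=t.size))  # drifting sensor
upper, lower = 85.0, 75.0

fig = go.Figure()
fig.add_trace(go.Scatter(x=t, y=temperature, name="extruder temperature"))
fig.add_trace(go.Scatter(x=t, y=np.full(t.size, upper), name="upper limit",
                         line=dict(dash="dash")))
fig.add_trace(go.Scatter(x=t, y=np.full(t.size, lower), name="lower limit",
                         line=dict(dash="dash")))
fig.update_layout(title="Process monitoring (simulated)",
                  xaxis_title="cycle", yaxis_title="deg C")
fig.write_html("monitoring.html")  # dashboard-style output file
```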
Data Science Milan aims to gather a community of data scientists, practitioners and enthusiasts based in the Milan area to discuss and promote methodologies, algorithms, tools & technologies. We are an independent group with the only goal of promoting and pioneering knowledge and innovation of the data-driven revolution in the Italian peninsula and beyond. We encourage international collaboration, sharing and open-source tools. The official language of our events, talks and communication is English. Everyone who is involved in data science projects or wants to undertake this career is welcome to join. Feel free to submit your talk proposals, initiatives, projects or topic discussions. www.meetup.com/Data-Science-Milan/