1. Copyright © 2012, SAS Institute Inc. All rights reserved.
GOEDE TIJDEN SLECHTE TIJDEN, RESTAURANT REVIEWS,
BRAD PITT AND THE IKEA BILLY INDEX
Longhow Lam – Freelance Data Scientist
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com
@longhowlam
2. Data Science in Action
AGENDA
- TEXT MINING AND MACHINE LEARNING
- SOME CRAZY EXAMPLES
  - Goede tijden Slechte tijden
  - IENS Restaurant Reviews
  - Who looks like Brad Pitt?
  - The IKEA Billy Index
4. Text mining: simple example
Doc 1 “I walked accross the street in Amsterdam, 1057DK, with my bike”
Doc 2 “She didn’t walk but cycled with her blue biike, //bitly.com/sdrtw”
Doc 3 “My bicycle is broken, what a piece of junk, @#$%$@!”
Terms                     Doc 1  Doc 2  Doc 3
+Bicycle (noun)             1      1      1
Cycling (verb)              0      1      0
Blue (adjective)            0      1      0
Amsterdam (location)        1      0      0
+Walk (verb)                1      1      0
Street (noun)               1      0      0
Broken (adjective)          0      0      1
Piece of junk (noun)        0      0      1
1057DK (postal code)        1      0      0
//bitly.com/sdrtw           0      1      0
TERM DOCUMENT MATRIX: A
• Every text document becomes a (very) long string of numbers (with many zeros!)
• Data mining techniques are applied to this matrix A
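The matrix above can be approximated with a few lines of plain Python. This is a minimal sketch, not the SAS text mining used in the talk: it only lowercases and splits on non-alphanumeric characters, so it does not merge "bike", "biike" and "bicycle" into one term, tag parts of speech, or recognise postal codes and URLs as entities. The function names `tokenize` and `term_document_matrix` are made up for this illustration.

```python
import re
from collections import Counter

# The three example documents from the slide (typos kept on purpose).
docs = [
    "I walked accross the street in Amsterdam, 1057DK, with my bike",
    "She didn't walk but cycled with her blue biike, //bitly.com/sdrtw",
    "My bicycle is broken, what a piece of junk, @#$%$@!",
]

def tokenize(text):
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def term_document_matrix(documents):
    """Return (terms, matrix) with one row per term and one column per document."""
    counts = [Counter(tokenize(d)) for d in documents]
    terms = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix

terms, A = term_document_matrix(docs)

# Print each term with its counts in Doc 1, Doc 2, Doc 3.
for term, row in zip(terms, A):
    print(f"{term:12s} {row}")
```

Even on three short sentences most cells are zero, which is the sparsity the slide refers to.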
5. Data Science in Action
TEXT MINING PREDICT OR CLUSTER
Combine texts and “normal data” to predict behaviour (churn / fraud)
Use machine learning to train a
learner f to predict the TARGET
Automatically create topics / clusters in huge piles of documents
Apply cluster techniques to divide
documents into topics
Topic 1 Topic 2 Topic 3
6. Data Science in Action
MACHINE LEARNING SOME ALGORITHMS
Predict
Linear regression: y = f(x) = a_0 + a_1 x_1 + a_2 x_2 + … + a_n x_n
Trees
Random Forests
Neural networks: y = f(g(h(x)))
Cluster
K-means
Hierarchical clustering
DBSCAN
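The two formulas on this slide can be read as code. The sketch below is only an illustration of the notation: the coefficients, weights and layer sizes are arbitrary numbers chosen for the example, nothing is trained, and the helper names (`linear_model`, `dense`, `network`) are hypothetical.

```python
def linear_model(coeffs, x):
    """y = a_0 + a_1*x_1 + ... + a_n*x_n, where coeffs[0] is the intercept a_0."""
    return coeffs[0] + sum(a * xi for a, xi in zip(coeffs[1:], x))

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(weights, biases, activation):
    """Build one layer: a function from an input vector to an output vector."""
    def layer(x):
        return [
            activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return layer

# Three layers with arbitrary example weights; h is applied first, f last.
h = dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu)
g = dense([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0], relu)
f = dense([[2.0, 1.0]], [0.5], identity)

def network(x):
    """A neural network as nested functions: y = f(g(h(x)))."""
    return f(g(h(x)))[0]

print(linear_model([1.0, 2.0, 3.0], [4.0, 5.0]))  # 1 + 2*4 + 3*5
print(network([3.0, 1.0]))
```

The linear model is a single weighted sum, while the network stacks several weighted sums with a non-linearity in between, which is what lets it fit more complex relationships.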
8. Data Science in Action
GTST ANALYSIS TEXT ANALYTICS
Business pain
Looking at GTST (a Dutch soap opera): what the heck is this all about?
Are there trends in the series, or is it all the same?
Approach
Take the 5000 summaries and apply text mining in SAS
9. Data Science in Action
GTST ANALYSIS RESULTS
Main topics in 5000 episodes
10. Data Science in Action
GTST ANALYSIS DISTANCES BETWEEN TOPICS
12. Data Science in Action
GTST ANALYSIS ZOOMING IN ON A TOPIC
Sub-topics of main topic: topic 16 (Ludo, Isabelle, Martine, Janine)
• Harmsen feeling lonely
• Plan by Jack, dangerous
• Writing a farewell letter
• Panic, fear
• Questions about giving kid assignment
• Getting money back, paying
IMPORTANT: Business validation!
I asked my wife; she used to be a loyal GTST watcher
13. Data Science in Action
GTST ANALYSIS TREND RESULTS
Trends over time with SAS text profile feature
15. Data Science in Action
GTST ANALYSIS SIMILARITY OF EPISODES THROUGH THE YEARS
16. Data Science in Action
Can you shake hands with your neighbor?
A LITTLE STATISTICAL EXPERIMENT
Two statistics that I'd like to share:
17. Data Science in Action
Can you shake hands with your neighbor?
A LITTLE STATISTICAL EXPERIMENT
50.1% of people don’t
wash their hands
after visiting the toilet
18. Data Science in Action
Can you shake hands with your neighbor?
A LITTLE STATISTICAL EXPERIMENT
50.1% of people don’t
wash their hands
after visiting the toilet
84.6% of all statistics are
just made up on the spot!!
20. Data Science in Action
IENS RESTAURANT PATH ANALYTICS
Business pain
I have eaten Chinese, where should I go next?
Approach
Look at what others do, IENS restaurant reviewers!
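One simple way to "look at what others do" is to count which cuisine reviewers visit right after a given cuisine. The sketch below is an assumption about how such a path analysis could work, not the SAS method used in the talk; the reviewer histories are invented for the example and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Invented example data: the ordered cuisines each reviewer wrote about.
histories = [
    ["Chinese", "Italian", "French"],
    ["Chinese", "Italian"],
    ["Chinese", "Fish", "Italian"],
]

def transition_counts(visit_histories):
    """Count how often each cuisine is directly followed by another one."""
    counts = defaultdict(Counter)
    for visits in visit_histories:
        for current, following in zip(visits, visits[1:]):
            counts[current][following] += 1
    return counts

def suggest_next(counts, current):
    """Return the most frequent follow-up for `current`, or None if unseen."""
    if current not in counts:
        return None
    return counts[current].most_common(1)[0][0]

counts = transition_counts(histories)
print(suggest_next(counts, "Chinese"))
```

With real review data the same counts could be normalised into probabilities, or extended to longer paths than a single step.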
21. Data Science in Action
A FEW FACTS… IENS DATA (TRADITIONAL BI)
Most occurring restaurant name (39 times)
Among “Dutch” restaurants (6 times)
% Sustainable kitchens
Organic (67%)
French (58%)
Fish (44%)
Vegetarian (39%)
…
Chinese (3%)
700 reviews on a “normal” Saturday
Valentine's Day 2015: 1200 reviews (1.7 times as many)
23 times
12 times
22. Data Science in Action
IENS RESTAURANT PATH ANALYSIS: GENERATED PATHS
23. Data Science in Action
IENS REVIEWS CAN SENTIMENT BE PREDICTED?
• Translate the reviews into a term document matrix
• Apply machine learning to predict scores
• Why would you do this?
24. Data Science in Action
IENS REVIEWS CAN I PREDICT THE SENTIMENT?
25. Data Science in Action
IENS REVIEWS PREDICT THE ‘EAT’ SCORE
Neural network (2 x 20): R² of 0.65
Linear regression model: R² of 0.56
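For reference, R² compares a model's squared errors with those of always predicting the mean score. The sketch below shows the standard formula; the scores in it are made-up numbers and not the IENS data.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot (coefficient of determination)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up review scores and model predictions, for illustration only.
given = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 7.0, 8.0, 8.5]
print(r_squared(given, predicted))
```

An R² of 0.65 therefore means the model removes about 65% of the squared error that a constant "average score" prediction would make.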
26. Data Science in Action
Predicted review score vs. Given review score
IENS REVIEWS PREDICTING THE ‘EAT’ SCORE
27. Data Science in Action
IENS REVIEWS SENTIMENT ANALYSIS / PREDICTIVE MODELING
29. Data Science in Action
OUTLIERS IN FACES DATA MINING & MACHINE LEARNING
Business pain
Tell me: Who has a strange face at SAS Netherlands?
Approach
Take SAS photos, translate them into data, and apply machine learning
30. Data Science in Action
OUTLIERS IN FACES DATA MINING & MACHINE LEARNING
31. Data Science in Action
STRANGE FACE
DETECTION
COMBO OF OPEN API & SAS
• Use Face++ to do facial landmarking (no deep learning!!)
• Import all landmarks into SAS as an ABT (analytical base table)
Now you can solve some funny business issues with machine learning:
• Which persons are look-alikes?
  Hierarchical clustering
• Are there any account managers?
  Predictive modeling / machine learning
• Who is the Brad Pitt at SAS?
  Nearest neighbor
• Funny faces
  Anomaly / outlier detection
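Once every face is a row of landmark numbers, two of these questions reduce to distance calculations. The sketch below assumes each face is a short list of landmark coordinates; the names and values are invented, and "distance to the average face" is just one simple outlier score, not necessarily the one used in the talk.

```python
import math

# Invented landmark vectors; real ones would have many more coordinates.
faces = {
    "anna": [0.30, 0.40, 0.50],
    "bob": [0.32, 0.41, 0.52],
    "carl": [0.80, 0.10, 0.90],
}
reference_face = [0.31, 0.40, 0.51]  # hypothetical landmarks of a celebrity photo

def distance(a, b):
    """Euclidean distance between two landmark vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(target, candidates):
    """Name of the face whose landmarks are closest to the target."""
    return min(candidates, key=lambda name: distance(candidates[name], target))

def outlier_scores(candidates):
    """Distance of every face to the average face; larger means stranger."""
    n = len(candidates)
    dims = len(next(iter(candidates.values())))
    centroid = [sum(v[i] for v in candidates.values()) / n for i in range(dims)]
    return {name: distance(v, centroid) for name, v in candidates.items()}

print(nearest_neighbor(reference_face, faces))
scores = outlier_scores(faces)
print(max(scores, key=scores.get))
```

In practice the landmarks would need to be normalised for photo size and head position before distances are meaningful.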
32. Data Science in Action
STRANGE FACE
DETECTION
HIERARCHICAL CLUSTERING
33. Data Science in Action
STRANGE FACE
DETECTION
BRAD PITT LOOK-A-LIKES…
34. Data Science in Action
STRANGE FACE
DETECTION
OUTLIER DETECTION
36. Data Science in Action
IKEA WEBSITE KEEP TRACK OF BILLY STOCK
Define the IKEA Billy Index
as the change in stock over time
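As a sketch of that definition, the index can be computed as the difference between consecutive stock readings. The stock numbers below are invented, and the function name `billy_index` is only a label for this example.

```python
def billy_index(stock_levels):
    """Change in stock between consecutive readings (later minus earlier)."""
    return [later - earlier for earlier, later in zip(stock_levels, stock_levels[1:])]

# Invented daily stock readings for one store.
stock = [120, 112, 105, 130, 121]
print(billy_index(stock))
```

Negative values suggest units sold since the previous reading, while a positive jump most likely reflects a restock rather than negative sales.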
39. Data Science in Action
Every extra unit of wind speed results in 19 fewer Billys sold
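A statement like this is the slope of a simple linear regression of units sold on wind speed. The sketch below fits such a line by ordinary least squares; the data points are made up and deliberately chosen so that the slope comes out at -19, purely to show how the coefficient is read.

```python
def least_squares_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up observations: wind speed and Billys sold that day.
wind_speed = [1, 2, 3, 4, 5]
billys_sold = [181, 162, 143, 124, 105]

intercept, slope = least_squares_line(wind_speed, billys_sold)
print(intercept, slope)
```

The slope says how much the predicted number of units changes per extra unit of wind speed; it describes an association in the data and does not by itself show that wind causes fewer sales.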
40. Copyright © 2012, SAS Institute Inc. All rights reserved.
Thanks for your attention, QUESTIONS?
Freelance Data Scientist, I'm open to having a cup of coffee sometime
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com/
@longhowlam