Copyright © 2012, SAS Institute Inc. All rights reserved.
GOEDE TIJDEN SLECHTE TIJDEN, RESTAURANT REVIEWS,
BRAD PITT AND THE IKEA BILLY INDEX
Longhow Lam, Freelance Data Scientist
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com
@longhowlam
Data Science in Action
AGENDA
 TEXT MINING AND MACHINE LEARNING
 SOME CRAZY EXAMPLES
 Goede tijden Slechte tijden
 IENS Restaurant Reviews
 Who looks like Brad Pitt?
 The IKEA Billy Index
Text mining and
Machine Learning
Text mining: simple example
Doc 1 I walked accross the street in Amsterdam, 1057DK, with my bike
Doc 2 She didnt walk but cycled with her blue biike, //bitly.com/sdrtw
Doc 3 My bicycle is broken, what a piece of junk, @#$%$@!
Terms                     | Doc 1 | Doc 2 | Doc 3
+Bicycle (noun)           |   1   |   1   |   1
Cycling (verb)            |   0   |   1   |   0
Blue (adjective)          |   0   |   1   |   0
Amsterdam (location)      |   1   |   0   |   0
+Walk (verb)              |   1   |   1   |   0
Street (noun)             |   1   |   0   |   0
Broken (adjective)        |   0   |   0   |   1
Piece of junk (noun)      |   0   |   0   |   1
1057DK (postal code)      |   1   |   0   |   0
//bitly.com/sdrtw         |   0   |   1   |   0
TERM DOCUMENT MATRIX: A
 Every text document is a (very)
long string (with many zeros!)
 Data mining techniques are
applied to this matrix A
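As a minimal sketch of how such a term document matrix could be built, the Python below processes the three example documents. The `NORMALISE` table is a hypothetical stand-in for the stemming, synonym and spelling-correction steps a text mining tool would apply (it is what maps "bike" and "biike" onto "bicycle"); it is an assumption for illustration, not the method used in the talk.

```python
import re

# The three example documents from the slide, typos included.
docs = [
    "I walked accross the street in Amsterdam, 1057DK, with my bike",
    "She didnt walk but cycled with her blue biike, //bitly.com/sdrtw",
    "My bicycle is broken, what a piece of junk, @#$%$@!",
]

# Hypothetical normalisation table: maps surface forms (including the
# misspelling "biike") onto one parent term.
NORMALISE = {
    "bike": "bicycle",
    "biike": "bicycle",
    "walked": "walk",
    "cycled": "cycling",
}


def tokenize(text):
    """Lower-case, keep alphanumeric tokens, and map them to parent terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [NORMALISE.get(t, t) for t in tokens]


def term_document_matrix(documents, terms):
    """Return {term: [1 if the term occurs in the document else 0, ...]}."""
    token_sets = [set(tokenize(d)) for d in documents]
    return {t: [int(t in s) for s in token_sets] for t in terms}


terms = ["bicycle", "cycling", "blue", "amsterdam", "walk", "street", "broken"]
matrix = term_document_matrix(docs, terms)
for term, row in matrix.items():
    print(f"{term:<10} {row}")
```

Each row is one term and each column one document, so a document is the (mostly zero) column vector that later modelling steps work on.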
TEXT MINING PREDICT OR CLUSTER
Combine texts and normal data to predict behaviour (churn / fraud)
Use machine learning to train a
learner f to predict the TARGET
Automatically create topics / clusters in huge piles of documents
Apply cluster techniques to divide
documents into topics
Topic 1 Topic 2 Topic 3
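To illustrate the clustering side, here is a small k-means sketch in plain Python that groups document vectors into topics. The document vectors and the four terms are hypothetical, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def k_means(points, k, iterations=10):
    """Plain k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels


# Hypothetical document vectors over the terms
# [bicycle, broken, restaurant, pasta].
doc_vectors = [
    [1, 1, 0, 0],  # a broken bicycle
    [0, 0, 1, 1],  # pasta in a restaurant
    [1, 0, 0, 0],  # a bicycle
    [0, 0, 1, 0],  # a restaurant
]
labels = k_means(doc_vectors, k=2)
print(labels)
```

Documents that share a label fall into the same topic; here the two bicycle documents end up together, as do the two restaurant documents.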
MACHINE LEARNING SOME ALGORITHMS
Predict
 Linear regression: $y = f(x) = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$
 Neural networks: $y = f(g(h(x)))$
 Trees
 Random Forests
Cluster
 K-means
 Hierarchical clustering
 DBSCAN
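As a rough illustration of the linear regression formula, the sketch below fits the single-feature case y = a0 + a1*x1 by ordinary least squares in plain Python. The data points are made up for the example; the general n-feature case would normally be handled by a statistics package.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates (a0, a1) for the model y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Hypothetical data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a0, a1 = fit_simple_linear_regression(xs, ys)
print(f"y = {a0:.2f} + {a1:.2f} * x")
```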
GTST ANALYSIS TEXT ANALYTICS
Business pain
Looking at GTST (Dutch soap): what the heck is this all about?
Are there trends in the series, or is it all the same?
Approach
Take the 5000 summaries and apply text mining in SAS
GTST ANALYSIS RESULTS
Main topics in 5000 episodes
GTST ANALYSIS DISTANCES BETWEEN TOPICS
GTST ANALYSIS ZOOMING IN ON A TOPIC
GTST ANALYSIS ZOOMING IN ON A TOPIC
Sub-topics of main topic: topic 16 (Ludo, Isabelle, Martine, Janine)
 Harmsen feeling lonely.
 Plan by Jack, dangerous
 Writing a farewell letter
 Panic, fear,
 Questions about giving kid assignment
 Getting money back, paying
IMPORTANT: Business validation!
I asked my wife, she used to be a loyal GTST watcher
GTST ANALYSIS TREND RESULTS
Trends over time with SAS text profile feature
GTST ANALYSIS TRENDS OVER TIME
GTST ANALYSIS SIMILARITY OF EPISODES THROUGH THE YEARS
Can you shake hands with your neighbor?
A LITTLE STATISTICAL EXPERIMENT
Two statistics that I like to share:
50.1% of people don't
wash their hands
after visiting the toilet
84.6% of all statistics are
just made up on the spot !!
IENS RESTAURANT PATH ANALYTICS
Business pain
I have eaten Chinese, where should I go next?
Approach
Look at what others do, IENS restaurant reviewers!
A FEW FACTS IENS DATA (TRADITIONAL BI)
Most occurring restaurant name (39 times)
Among Dutch restaurants (6 times)
% Sustainable kitchens
 Organic (67%)
 French (58%)
 Fish (44%)
 Vegetarian (39%)
 Chinese (3%)
700 reviews on a normal Saturday
Valentine's Day 2015: 1200 reviews (1.7 times as many)
23 times
12 times
IENS RESTAURANT PATH ANALYSIS: GENERATED PATHS
IENS REVIEWS CAN SENTIMENT BE PREDICTED?
 Translate the reviews into a term document matrix
 Apply machine learning to predict scores
 Why would you do this?
IENS REVIEWS CAN I PREDICT THE SENTIMENT?
IENS REVIEWS PREDICT THE EAT SCORE
Neural network (2 × 20): R² of 0.65
Linear regression model: R² of 0.56
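For readers unfamiliar with R², the sketch below shows how the coefficient of determination is commonly computed from given and predicted scores. The review scores are invented for illustration and are not the IENS data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    # Unexplained variation: squared errors of the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation: squared deviations from the mean score.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical given and predicted review scores.
given_scores = [6.0, 7.0, 8.0, 9.0]
predicted_scores = [6.5, 7.0, 7.5, 9.0]
score = r_squared(given_scores, predicted_scores)
print(f"R2 = {score:.2f}")
```

An R² of 0.65 therefore means the model explains roughly 65% of the variation in the given scores, compared with always predicting the mean.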
Predicted review score vs. Given review score
IENS REVIEWS PREDICTION THE EAT SCORE
IENS REVIEWS SENTIMENT ANALYSIS / PREDICTIVE MODELING
OUTLIERS IN FACES DATA MINING & MACHINE LEARNING
Business pain
Tell me: Who has a strange face at SAS Netherlands?
Approach
Take SAS photos and translate to data and apply machine learning
OUTLIERS IN FACES DATA MINING & MACHINE LEARNING
STRANGE FACE
DETECTION
COMBO OF OPEN API & SAS
 Use Face++ to do facial landmarking (no deep learning!!)
 Import all landmarks in SAS as an ABT
Now you can solve some funny business issues with machine learning:
 Which persons are look-alikes? Hierarchical clustering
 Are there any account managers? Predictive modeling / machine learning
 Who is the Brad Pitt at SAS? Nearest neighbor
 Funny faces: Anomaly / outlier detection
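The "Who is the Brad Pitt at SAS?" question can be sketched as a nearest-neighbor search over landmark vectors. The feature values and employee names below are hypothetical placeholders, not real Face++ output, and Euclidean distance is an assumed choice of metric.

```python
import math


def nearest_neighbour(target, candidates):
    """Return the name whose landmark vector is closest to the target."""
    return min(candidates, key=lambda name: math.dist(target, candidates[name]))


# Hypothetical landmark-derived features, e.g. scaled distances between
# eyes, nose and mouth. Real landmark data would have many more columns.
reference_face = [0.42, 0.31, 0.55]
employees = {
    "employee_a": [0.40, 0.30, 0.57],
    "employee_b": [0.60, 0.25, 0.40],
    "employee_c": [0.35, 0.45, 0.50],
}

look_alike = nearest_neighbour(reference_face, employees)
print(look_alike)
```

The same distances can be reused for the outlier question: a face whose nearest neighbor is still far away is a candidate "funny face".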
STRANGE FACE
DETECTION
HIERARCHICAL CLUSTERING
STRANGE FACE
DETECTION
BRAD PITT LOOK-A-LIKES
STRANGE FACE
DETECTION
OUTLIER DETECTION
IKEA WEBSITE KEEP TRACK OF BILLY STOCK
Define the IKEA Billy Index
as the change in stock over time
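One way to read this definition is as the difference between successive stock readings. The sketch below assumes one reading per day; the stock numbers are invented, and how the readings were actually collected from the IKEA website is not shown here.

```python
def billy_index(stock_levels):
    """Change in stock between consecutive readings.

    Negative values mean units left the shelf; positive values
    indicate a restock.
    """
    return [
        today - yesterday
        for yesterday, today in zip(stock_levels, stock_levels[1:])
    ]


# Hypothetical daily stock readings for one store.
daily_stock = [120, 101, 95, 140, 133]
index = billy_index(daily_stock)
print(index)
```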
IKEA WEBSITE THE IKEA BILLY INDEX
THE BILLY INDEX SOME STATISTICS
Every extra unit increase in wind speed results in 19 fewer Billys sold
Thanks for your attention, QUESTIONS?
Freelance Data Scientist, always open to meeting for a cup of coffee
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com/
@longhowlam
