Copyright © 2012, SAS Institute Inc. All rights reserved.
GOEDE TIJDEN SLECHTE TIJDEN, RESTAURANT REVIEWS,
BRAD PITT AND THE IKEA BILLY INDEX
Longhow Lam – Freelance Data Scientist
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com
@longhowlam
Data Science in Action
AGENDA
• TEXT MINING AND MACHINE LEARNING
• SOME CRAZY EXAMPLES
  • Goede tijden Slechte tijden
  • IENS Restaurant Reviews
  • Who looks like Brad Pitt?
  • The IKEA Billy Index
Text mining and
Machine Learning
Text mining: simple example
Doc 1 “I walked accross the street in Amsterdam, 1057DK, with my bike”
Doc 2 “She didn’t walk but cycled with her blue biike, //bitly.com/sdrtw”
Doc 3 “My bicycle is broken, what a piece of junk, @#$%$@!”
Terms                    Doc 1  Doc 2  Doc 3
+Bicycle (noun)            1      1      1
Cycling (verb)             0      1      0
Blue (adjective)           0      1      0
Amsterdam (location)       1      0      0
+Walk (verb)               1      1      0
Street (noun)              1      0      0
Broken (adjective)         0      0      1
Piece of junk (noun)       0      0      1
1057DK (postal code)       1      0      0
//bitly.com/sdrtw          0      1      0
TERM DOCUMENT MATRIX: A
• Every text document becomes a (very) long vector of term counts (with many zeros!)
• Data mining techniques are applied to this matrix A
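As a minimal sketch of how such a matrix could be built, the Python below counts terms per document for the three example sentences. The tokenizer and the small `NORMALIZE` lookup (mapping variants such as "bike", "biike" and "walked" to a parent term) are illustrative assumptions; they stand in for the stemming, spelling correction and entity detection that a real text mining tool performs, and multi-word terms and entities from the slide are left out.

```python
import re
from collections import Counter

docs = {
    "Doc 1": "I walked accross the street in Amsterdam, 1057DK, with my bike",
    "Doc 2": "She didn't walk but cycled with her blue biike, //bitly.com/sdrtw",
    "Doc 3": "My bicycle is broken, what a piece of junk, @#$%$@!",
}

# Hypothetical lookup: maps surface forms (including the misspelling) to a parent term.
NORMALIZE = {"bike": "bicycle", "biike": "bicycle", "walked": "walk", "cycled": "cycling"}

# Single-word terms from the slide's table that this sketch tracks.
TERMS = ["bicycle", "cycling", "blue", "amsterdam", "walk", "street", "broken"]


def tokenize(text):
    """Lowercase, keep alphanumeric tokens, and map variants to their parent term."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [NORMALIZE.get(token, token) for token in tokens]


def term_document_matrix(docs, terms):
    """Return {term: [count per document]} with documents in insertion order."""
    counts = [Counter(tokenize(text)) for text in docs.values()]
    return {term: [c[term] for c in counts] for term in terms}


matrix = term_document_matrix(docs, TERMS)
for term, row in matrix.items():
    print(f"{term:<10} {row}")
```

Each list is one row of A; reading the same position across all rows gives the (mostly zero) vector for one document.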
TEXT MINING PREDICT OR CLUSTER
Combine texts and “normal data” to predict behaviour (churn / fraud)
Use machine learning to train a learner f to predict the TARGET
Automatically create topics / clusters in huge piles of documents
Apply clustering techniques to divide documents into topics
Topic 1 Topic 2 Topic 3
MACHINE LEARNING SOME ALGORITHMS
Predict
• Trees
• Random Forests
• Linear regression: $y = f(x) = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$
• Neural networks: $y = f(g(h(x)))$
Cluster
• K-means
• Hierarchical clustering
• DBSCAN
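The two formulas on this slide can be sketched in a few lines of Python. This is only an illustration of the shapes of the functions: all weights below are made-up numbers, and the helper names (`linear`, `compose`, `h`, `g`, `f`) are hypothetical, not taken from the talk.

```python
import math


def linear(x, a0, a):
    """Linear model: y = a0 + a1*x1 + ... + an*xn."""
    return a0 + sum(ai * xi for ai, xi in zip(a, x))


def compose(*layers):
    """Build y = f(g(h(x))) from layers listed outermost first."""
    def model(x):
        for layer in reversed(layers):
            x = layer(x)
        return x
    return model


def h(x):
    """Inner layer: two linear combinations of the inputs (made-up weights)."""
    return [linear(x, 0.0, [1.0, -1.0]), linear(x, 0.5, [0.5, 0.5])]


def g(z):
    """Middle layer: a non-linear activation applied element-wise."""
    return [math.tanh(v) for v in z]


def f(z):
    """Outer layer: a final linear combination (made-up weights)."""
    return linear(z, 0.1, [2.0, 3.0])


network = compose(f, g, h)

y_lin = linear([2.0, 3.0], 1.0, [0.5, -0.25])
y_net = network([1.0, 1.0])
print(y_lin, y_net)
```

The point of the sketch is that a neural network is a nesting of simple functions, whereas linear regression is a single weighted sum.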
GTST ANALYSIS TEXT ANALYTICS
Business pain
Looking at GTST (a Dutch soap opera): what the heck is this all about?
Are there trends in the series, or is it all the same?
Approach
Take the 5000 summaries and apply text mining in SAS
GTST ANALYSIS RESULTS
Main topics in 5000 episodes
GTST ANALYSIS DISTANCES BETWEEN TOPICS
GTST ANALYSIS ZOOMING IN ON A TOPIC
Sub-topics of main topic: topic 16 (Ludo, Isabelle, Martine, Janine)
• Harmsen feeling lonely
• Plan by Jack, dangerous
• Writing a farewell letter
• Panic, fear
• Questions about giving kid assignment
• Getting money back, paying
IMPORTANT: Business validation!
I asked my wife; she used to be a loyal GTST watcher
GTST ANALYSIS TREND RESULTS
Trends over time with SAS text profile feature
GTST ANALYSIS TRENDS OVER TIME
GTST ANALYSIS SIMILARITY OF EPISODES THROUGH THE YEARS
Can you shake hands with your neighbor?
A LITTLE STATISTICAL EXPERIMENT
Two statistics that I would like to share:
50.1% of people don’t wash their hands after visiting the toilet
84.6% of all statistics are just made up on the spot!!
IENS RESTAURANT PATH ANALYTICS
Business pain
I have eaten Chinese, where should I go next?
Approach
Look at what others do, IENS restaurant reviewers!
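One simple way to "look at what others do" is to count which cuisine reviewers visit directly after another one. The sketch below is an assumption about how such a path analysis could start, not the method used in the talk; the reviewer names and histories are made-up illustration data.

```python
from collections import Counter, defaultdict

# Hypothetical review histories: reviewer -> cuisines in chronological order.
histories = {
    "reviewer_a": ["Chinese", "Italian", "French"],
    "reviewer_b": ["Chinese", "Italian", "Chinese", "Thai"],
    "reviewer_c": ["French", "Chinese", "Italian"],
}


def transition_counts(histories):
    """Count how often cuisine B directly follows cuisine A in a reviewer's path."""
    counts = defaultdict(Counter)
    for path in histories.values():
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return counts


def suggest_next(counts, current):
    """Return the most frequent follow-up cuisine, or None if the cuisine was never seen."""
    followers = counts.get(current)
    return followers.most_common(1)[0][0] if followers else None


counts = transition_counts(histories)
print(suggest_next(counts, "Chinese"))
```

Counting only direct successors keeps the example short; longer paths would need sequences of length three or more as keys.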
A FEW FACTS… IENS DATA (TRADITIONAL BI)
Most frequently occurring restaurant name (39 times)
Among “Dutch” restaurants (6 times)
% Sustainable kitchens
• Organic (67%)
• French (58%)
• Fish (44%)
• Vegetarian (39%)
• …
• Chinese (3%)
700 reviews on a “normal” Saturday
Valentine’s Day 2015: 1,200 reviews (1.7 times as many)
23 times
12 times
IENS RESTAURANT PATH ANALYSIS: GENERATED PATHS
IENS REVIEWS CAN SENTIMENT BE PREDICTED?
• Translate the reviews into a term document matrix
• Apply machine learning to predict scores
• Why would you do this?
IENS REVIEWS CAN I PREDICT THE SENTIMENT?
IENS REVIEWS PREDICT THE ‘EAT’ SCORE
Neural network (2 × 20): R² of 0.65
Linear regression model: R² of 0.56
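For readers unfamiliar with R², it is the share of variance in the given scores that the predictions explain: one minus the ratio of the residual sum of squares to the total sum of squares. A minimal sketch follows; the scores are made-up numbers, not IENS data.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up review scores and model predictions, for illustration only.
given_scores = [6.0, 7.0, 8.0, 9.0]
predicted_scores = [6.5, 7.0, 8.0, 8.5]

print(r_squared(given_scores, predicted_scores))
```

A model that always predicts the mean score gets an R² of 0, and a perfect model gets 1, which puts the reported 0.56 and 0.65 in context.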
IENS REVIEWS PREDICTING THE ‘EAT’ SCORE
Predicted review score vs. given review score
IENS REVIEWS SENTIMENT ANALYSIS / PREDICTIVE MODELING
OUTLIERS IN FACES DATA MINING & MACHINE LEARNING
Business pain
Tell me: Who has a strange face at SAS Netherlands?
Approach
Take SAS photos, translate them into data, and apply machine learning
STRANGE FACE DETECTION
COMBO OF OPEN API & SAS
• Use Face++ to do facial landmarking (no deep learning!!)
• Import all landmarks into SAS as an ABT (analytical base table)
Now you can solve some funny business issues with machine learning:
• Which persons are look-alikes? → Hierarchical clustering
• Are there any account managers? → Predictive modeling / machine learning
• Who is the Brad Pitt at SAS? → Nearest neighbor
• Funny faces → Anomaly / outlier detection
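Once every face is a row of landmark-derived numbers, the look-alike and funny-face questions reduce to distance calculations. The sketch below assumes each face has already been turned into a short numeric vector; the three features and all values are made up, and scoring outliers by distance to the nearest other face is just one simple choice, not necessarily what was done in SAS.

```python
import math

# Hypothetical landmark-derived features per face (made-up values).
faces = {
    "reference": [0.30, 0.42, 0.55],
    "person_a": [0.31, 0.40, 0.56],
    "person_b": [0.25, 0.50, 0.60],
    "person_c": [0.60, 0.10, 0.90],
}


def nearest_neighbor(faces, target):
    """Return the name of the face closest (Euclidean distance) to the target face."""
    others = [name for name in faces if name != target]
    return min(others, key=lambda name: math.dist(faces[target], faces[name]))


def outlier_scores(faces):
    """Score each face by the distance to its nearest other face; larger means stranger."""
    return {
        name: math.dist(faces[name], faces[nearest_neighbor(faces, name)])
        for name in faces
    }


look_alike = nearest_neighbor(faces, "reference")
scores = outlier_scores(faces)
strangest = max(scores, key=scores.get)
print(look_alike, strangest)
```

In practice the features would need to be scaled so that no single landmark distance dominates the result.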
STRANGE FACE DETECTION HIERARCHICAL CLUSTERING
STRANGE FACE DETECTION BRAD PITT LOOK-A-LIKES…
STRANGE FACE DETECTION OUTLIER DETECTION
IKEA WEBSITE KEEP TRACK OF BILLY STOCK
Define the IKEA Billy Index as the change in stock over time
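Read literally, this definition is a first difference of the stock series. A minimal sketch, assuming one stock reading per observation moment; the stock numbers are made up, and treating every decrease as units sold (and every increase as a restock) is an added assumption, not something the slide states.

```python
def billy_index(stock_levels):
    """Change in stock between consecutive observations (current minus previous)."""
    return [curr - prev for prev, curr in zip(stock_levels, stock_levels[1:])]


# Made-up stock readings from consecutive observations.
stock = [120, 112, 101, 140, 133]

changes = billy_index(stock)

# Assumption: a drop in stock approximates units sold; a rise is a restock and is ignored.
estimated_sold = [-change for change in changes if change < 0]

print(changes, estimated_sold)
```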
IKEA WEBSITE THE IKEA BILLY INDEX
THE BILLY INDEX SOME STATISTICS
Every one-unit increase in wind speed results in 19 fewer Billys sold?
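A statement like this is what the slope of a simple linear regression expresses. The sketch below fits an ordinary least squares line by hand. The data points are not the author's data: they are constructed so that the slope comes out at -19, purely to mirror the number on the slide, and such a slope says nothing about causation.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Constructed illustration data (wind speed units, Billys sold); not real measurements.
wind_speed = [1.0, 2.0, 3.0, 4.0]
billys_sold = [100.0, 81.0, 62.0, 43.0]

intercept, slope = fit_line(wind_speed, billys_sold)
print(intercept, slope)
```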
Thanks for your attention, QUESTIONS?
Freelance Data Scientist, open to meeting for a cup of coffee
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com/
@longhowlam

