Data teams know that a machine learning prototype in a Jupyter notebook is still far from a deployed product. This summer, our team won the Rotterdam.ai AI challenge with Reviewscan, a new product that uses AI to unlock the value of online review data. Since then we have moved beyond prototyping: we have been validating our business proposition and building an automated pipeline on GCP. In this talk, I will explore what it takes to build an ML product by giving a demo of the Reviewscan API. I will also discuss the challenges faced when operationalising machine learning.
4. Problem/Solution
The importance of reviews will increase more than ever
86% read reviews before booking*
91% trust online reviews as much as personal recommendations**
Reading an average of 10 online reviews
*TripBarometer 2017/2018 global study
**BrightLocal research USA 2018
5. WHAT IS THIS TALK ABOUT
• Our journey
• Prototyping
• Choosing the ML model and applying NLP
• Challenges
• Validating Business Value Proposition
• Experiments
• Implementation
• Product deployment
• Google Cloud Backend Tooling
• Building an automated pipeline
• Lessons learned
6. OUR JOURNEY
Winning the AI-challenge
Python for Data Science Rotterdam.ai at CGI Summer 2019
Lu Zheng
New Business
Data Engineer
Elien van Riet
Analytics Translator
Data Scientist
David Fortini
Machine Learning Engineer
Xu Zhang
PhD
Data Scientist
12. SENTIMENT ANALYSIS
• Opinion miner
• Extracts polarity, i.e. positive or negative
• Classifies reviews by the sentiment they express
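To make the idea of polarity extraction concrete, here is a minimal sketch of a lexicon-based sentiment classifier. The slides do not say which model Reviewscan uses, so the word lists and the function names `polarity_score` and `classify` are assumptions made purely for illustration.

```python
import re

# Tiny illustrative lexicons (assumptions, not Reviewscan's actual vocabulary).
POSITIVE = {"nice", "great", "clean", "friendly", "good"}
NEGATIVE = {"dirty", "noisy", "rude", "bad", "poor"}


def polarity_score(review):
    """Count positive words minus negative words in a review."""
    tokens = re.findall(r"[a-z']+", review.lower())
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def classify(review):
    """Map the polarity score to a sentiment label."""
    score = polarity_score(review)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("Friendly staff and a clean room"))
print(classify("Noisy street and rude reception"))
```

A word-counting approach like this is easy to start with, but it ignores context: a review such as "Only nice thing is the nearby park." is labelled positive because of the single word "nice".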
16. OVERCOMING CHALLENGES
Review data access
NLP model losing relevant information
• Filtering out stopwords is too discriminative:
"Only nice thing is the nearby park."
• Bigrams and trigrams help to better understand reviews
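The stopword problem on this slide can be sketched in a few lines. The stopword list below is a small hypothetical one (not the list used in Reviewscan), and `tokenize`, `remove_stopwords` and `ngrams` are illustrative helper names. The sketch shows how dropping "only" changes the meaning of the example review, and how bigrams and trigrams built from the unfiltered tokens keep that context.

```python
import re

# Hypothetical stopword list for illustration; real lists are much longer.
STOPWORDS = {"only", "is", "the", "a", "an"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword list."""
    return [token for token in tokens if token not in stopwords]


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


review = "Only nice thing is the nearby park."
tokens = tokenize(review)

filtered = remove_stopwords(tokens)  # "only" is lost here
bigrams = ngrams(tokens, 2)          # keeps "only nice"
trigrams = ngrams(tokens, 3)         # keeps "only nice thing"

print(filtered)
print(bigrams)
print(trigrams)
```

After filtering, the review reduces to "nice thing nearby park", which reads as praise. The trigram "only nice thing" preserves the hint that everything else was disappointing.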
29. 5 LESSONS LEARNED
1. Getting started is more important than being right
2. Building a model that is not too discriminative
3. Choosing the right Cloud Platform for optimal ROI
4. Structured lean startup methodology
5. Diverse team