Customer segmentation
an excuse to use Machine Learning ;-)

Who am I?
• Julio Martinez
• Web developer since 2001
• 2 years working at Ulabox
• Machine Learning hobbyist
• Find me: @liopic

Preparing the workshop
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/

My 2017 objective: M.L.
• Motivation
  ◦ It's the new hot thing
  ◦ AlphaGo beat Lee Sedol, March 2016
• Some background, but need to learn more

Learning about Machine Learning
1. Choose the way
  • Coursera courses vs. books vs. workshops vs. posts
2. Find an excuse to apply it
  • @work is better than @home

Customer clusters @work, aka "the excuse"
• There is a non-programmer Business Analysis Department
• Groups of customers based on periodicity + amount spent
  ◦ Example: people who buy once per month, with a €100 ticket
  ◦ Useful for business reports
  ◦ Not so useful for UX, CRM
• Groups by behavior? Clustering orders!
Boring!

101 Machine Learning: the method
1. With past data -> make an ML model
  • clean the data
  • choose one or more ML algorithms
  • tune the algorithm, with testing
2. With new data -> use the model to predict (or give new info)
  • deploy the pipeline
  • update the model
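The two phases above can be sketched in a few lines. This is only an illustration, not the workshop's code: the nearest-centroid "model", the feature values and the label names are all assumptions made up for the example.

```python
import json
from statistics import mean

def fit(samples, labels):
    """Phase 1: learn one centroid per label from past (labeled) data."""
    grouped = {}
    for features, label in zip(samples, labels):
        grouped.setdefault(label, []).append(features)
    # The "model" is simply the per-label mean of each feature column.
    return {label: [mean(col) for col in zip(*rows)] for label, rows in grouped.items()}

def predict(model, features):
    """Phase 2: give new data the label of the closest centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical past data: (purchase frequency, ticket size), both scaled to [0, 1].
past_samples = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.3]]
past_labels = ["monthly", "monthly", "weekly", "weekly"]

model = fit(past_samples, past_labels)
# "Deploying" can be as simple as serializing the model and reloading it later.
deployed = json.loads(json.dumps(model))
prediction = predict(deployed, [0.7, 0.25])
print(prediction)
```

Updating the model then means calling `fit` again once enough new data has accumulated.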

101 Machine Learning: types of problems
• Supervised
  ◦ data + labels (result)
• Unsupervised
  ◦ just data
• Reinforcement
  ◦ function to optimize

Supervised learning
TRAINING SET
cat cat person
TEST SET
???

Unsupervised learning
TRAINING SET
TEST SET
There is NO test

Unsupervised learning
• Try to extract features (information, shapes): what is similar and what is different
• Uses:
  ◦ Clustering
  ◦ Anomaly detection (it doesn't look "normal")
  ◦ Dimensionality reduction
  ◦ Transfer features, projections ...
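Anomaly detection, one of the uses listed above, can be illustrated without any labels. The sketch below uses a plain z-score rule, which is just one simple technique and not necessarily what the workshop uses; the order totals and the threshold are made-up assumptions.

```python
from statistics import mean, stdev

def find_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values are identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical order totals; one of them does not look "normal".
order_totals = [52, 48, 50, 55, 47, 51, 49, 53, 400]
anomalies = find_anomalies(order_totals)
print(anomalies)
```

With so few samples a single outlier inflates the standard deviation, so the threshold has to stay low; robust statistics such as the median are a common alternative.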

Clustering
• Use:
  ◦ grouping
  ◦ quantization
• Algorithms:
  ◦ k-means
  ◦ DBSCAN

k-means
• needs: the number of clusters, chosen up front
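To show why k-means needs the number of clusters, here is a dependency-free sketch of the classic assign/update loop (Lloyd's algorithm). It seeds the centroids with the first k points to stay deterministic, which is a simplification; library implementations such as scikit-learn's `KMeans` use smarter seeding. The sample points are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )

def kmeans(points, k, max_iterations=100):
    """Return (centroids, labels) after running the assign/update loop."""
    centroids = [list(p) for p in points[:k]]  # naive, deterministic seeding
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: nothing changed since the last pass
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical customers as (periodicity, amount spent), both scaled to [0, 1].
points = [[0.1, 0.1], [0.9, 0.8], [0.15, 0.2], [0.8, 0.9], [0.2, 0.1], [0.85, 0.85]]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Nothing in the loop can discover k by itself: a different k simply gives a different partition, so the choice has to come from the data exploration or from the business question.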

DBSCAN: Density-based spatial clustering of applications with noise
• needs: the minimum number of samples that form a dense region, plus other parameters to tune
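A minimal sketch of the DBSCAN idea follows, assuming Euclidean distance and two parameters named here `eps` (neighborhood radius) and `min_samples` (points needed for a dense region). It is written for readability, not speed, and the sample points are hypothetical.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    def neighbors(i):
        # The point itself counts as part of its own neighborhood.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not dense
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels

# Two dense groups and one isolated point (hypothetical data).
points = [[0.1, 0.1], [0.12, 0.1], [0.1, 0.12],
          [0.9, 0.9], [0.92, 0.9], [0.9, 0.92],
          [0.5, 0.5]]
labels = dbscan(points, eps=0.05, min_samples=3)
print(labels)
```

Unlike k-means, the number of clusters is an output here, and points that belong to no dense region are reported as noise.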

So, ready to hack?
But wait a moment!

Before algorithms: data!
• Data preparation
  ◦ Keep features in the same order of magnitude, usually [0,1]
  ◦ Remove noise
• Other processes
  ◦ Binarize data, categorical features
    - weekday, e.g. 4 -> 0, 0, 0, 1, 0, 0, 0
  ◦ Process missing data
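Three of the preparation steps above can be sketched with the standard library alone. Assumptions to note: missing values are represented as `None` and filled with the mean (one strategy among many), weekdays are numbered 1 to 7 as the slide's example suggests, and the order totals are invented.

```python
from statistics import mean

def fill_missing(values):
    """Replace None with the mean of the known values."""
    known = [v for v in values if v is not None]
    fallback = mean(known)
    return [fallback if v is None else v for v in values]

def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / (hi - lo) for v in values]

def one_hot_weekday(day):
    """Encode weekday 1..7 as seven 0/1 flags."""
    if not 1 <= day <= 7:
        raise ValueError("weekday must be between 1 and 7")
    return [1 if position == day else 0 for position in range(1, 8)]

# Hypothetical order totals with one missing value.
totals = [20, None, 60, 100]
filled = fill_missing(totals)    # [20, 60, 60, 100]
scaled = min_max_scale(filled)   # [0.0, 0.5, 0.5, 1.0]
encoded = one_hot_weekday(4)     # [0, 0, 0, 1, 0, 0, 0]
```

Scaling matters for distance-based clustering: without it, a feature measured in euros would dominate one measured in orders per month.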

Before algorithms: data!
• Explore the data
  ◦ Images are richer than numbers
    - "We get more orders at 22h" vs. [chart]
• Ask domain experts
  ◦ Understand normal & edge cases
    - The step at 14h is the web cutoff time
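Even a crude picture says more than a single number. The sketch below draws a text histogram of orders per hour; the hours are made up purely to exercise the function, and in a notebook a plotting library such as matplotlib would be the more natural tool.

```python
from collections import Counter

def hourly_histogram(order_hours, width=40):
    """Return one text bar per hour of the day (0-23), scaled to `width` characters."""
    counts = Counter(order_hours)
    if not counts:
        return []
    peak = max(counts.values())
    lines = []
    for hour in range(24):
        bar = "#" * round(counts[hour] * width / peak)
        lines.append(f"{hour:02d}h {bar} {counts[hour]}")
    return lines

# Made-up order hours, not real data.
order_hours = [9, 10, 10, 13, 13, 13, 14, 21, 22, 22, 22, 22, 23]
histogram = hourly_histogram(order_hours)
print("\n".join(histogram))
```

A step or a peak in such a picture is exactly the kind of thing to take to a domain expert.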

Lessons learned
• Explore and optimize the data
  ◦ Features that count, feature engineering
  ◦ Avoid the "curse of dimensionality"
• Start small, understandable, useful
• Find excuses to try it, and sell it!

Let's hack
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
5. cd scbcn17-customer-segmentation
6. ./jupyter.sh
7. Open the link in your browser and open the Workshop.ipynb file

Customer segmentation scbcn17
