This document gives an overview of customer segmentation with machine-learning clustering algorithms. It introduces the speaker, Julio Martinez, and his goal of learning and applying machine learning. He uses customer segmentation at his company, Ulabox, as an "excuse" to do so. It then gives a high-level overview of clustering algorithms such as k-means and DBSCAN and stresses the importance of data preparation. Finally, it explores clustering customers' order data to group customers by behavior.
5. My 2017 objective: M.L.
• Motivation
– It's the new hot thing
– AlphaGo beat Lee Sedol, March 2016
• Some background, but I need to learn more
6. 1. Choose the way to learn
– Coursera courses vs. books vs. workshops vs. posts
2. Find an excuse to apply it
– @work is better than @home
Learning about Machine Learning
7. Customer clusters @work, aka "the excuse"
• There is a non-programmer Business Analysis Department
• Groups of customers based on periodicity + amount spent
– Example: people who buy once per month, with an average ticket of 100
– Useful for business reports
– Not so useful for UX, CRM
• Groups by behavior? Clustering orders!
Boring!
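Here is a minimal sketch of the "boring" periodicity + amount-spent grouping, written in Python because the jupyter/scipy-notebook image used in the workshop is a Python environment. The Order record, the 35-day and 100 thresholds, and the segment names are hypothetical. They only illustrate the idea and are not Ulabox's actual rules.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Order:
    customer_id: str
    day: date
    total: float  # order amount ("ticket")


def periodicity_and_spend(orders):
    """Return {customer_id: (average days between orders, average ticket)}."""
    by_customer = {}
    for order in orders:
        by_customer.setdefault(order.customer_id, []).append(order)
    result = {}
    for cid, cust_orders in by_customer.items():
        cust_orders.sort(key=lambda o: o.day)
        gaps = [(b.day - a.day).days for a, b in zip(cust_orders, cust_orders[1:])]
        # A single order gives no periodicity information.
        avg_gap = mean(gaps) if gaps else None
        result[cid] = (avg_gap, mean(o.total for o in cust_orders))
    return result


def segment(avg_gap, avg_ticket, monthly_days=35, big_ticket=100):
    """Map a customer to a hand-made bucket (hypothetical thresholds)."""
    if avg_gap is None:
        return "one-off"
    frequency = "monthly or more" if avg_gap <= monthly_days else "occasional"
    size = "big basket" if avg_ticket >= big_ticket else "small basket"
    return f"{frequency}, {size}"


orders = [
    Order("a", date(2017, 1, 5), 110.0),
    Order("a", date(2017, 2, 4), 95.0),
    Order("a", date(2017, 3, 6), 105.0),
    Order("b", date(2017, 1, 10), 30.0),
]
for cid, (gap, ticket) in periodicity_and_spend(orders).items():
    print(cid, segment(gap, ticket))
```

Every customer lands in a fixed, hand-made bucket. That makes the grouping easy to report on, but it says little about how customers actually shop. Clustering orders addresses that gap.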
8. 1. With past data -> build an ML model
– clean the data
– choose one or more ML algorithms
– tune the algorithm(s), with testing
2. With new data -> use the model to predict (or to give new information)
– deploy a pipeline
– update the model
Machine Learning 101: the method
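The two steps of the method can be sketched with k-means clustering. It is written here in plain Python rather than with a library so that each step is visible. Fitting learns centroids from past data, and predicting assigns new data to the nearest learned centroid. The function names and the toy points are illustrative assumptions. Initialization simply samples k existing points, not the k-means++ scheme that libraries usually default to.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def fit_kmeans(points, k, iterations=100, seed=0):
    """Step 1, past data: learn k centroids with plain Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = [
            [sum(dim) / len(members) for dim in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments stopped changing: converged
        centroids = new_centroids
    return centroids


def predict(points, centroids):
    """Step 2, new data: assign each point to its closest learned centroid."""
    return [nearest(p, centroids) for p in points]


# Toy, already-scaled features for six past orders (made-up numbers).
past = [(0.1, 0.2), (0.15, 0.1), (0.2, 0.15), (0.8, 0.9), (0.85, 0.8), (0.9, 0.85)]
centroids = fit_kmeans(past, k=2)
print(predict([(0.12, 0.18), (0.88, 0.86)], centroids))
```

In practice a library implementation, such as scikit-learn's KMeans with its fit and predict methods, would normally replace the hand-written loop. The split between learning on past data and applying the result to new data stays the same.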
9. • Supervised
– data + labels (results)
• Unsupervised
– just data
• Reinforcement
– a reward function to optimize
Machine Learning 101: types of problems
17. • Data preparation
– Keep features on the same order of magnitude, usually [0, 1]
– Remove noise
– Other processes
* Binarize categorical features
• weekday, e.g. 4 -> 0, 0, 0, 1, 0, 0, 0
* Handle missing data
Before algorithms: data!
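The preparation steps above can be sketched as three small helpers. The sketch assumes weekdays are numbered 1 to 7, so that 4 sets the fourth position as in the slide's example. Filling missing values with the column mean is only one common strategy; the slides do not say which one was used. The function names are hypothetical.

```python
from statistics import mean


def min_max_scale(values):
    """Rescale a numeric column to [0, 1] so all features share one order of magnitude."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_weekday(day):
    """Binarize a 1-based weekday (1..7), e.g. 4 -> [0, 0, 0, 1, 0, 0, 0]."""
    if not 1 <= day <= 7:
        raise ValueError(f"weekday must be in 1..7, got {day}")
    return [1 if i == day else 0 for i in range(1, 8)]


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]


print(min_max_scale([10, 20, 30]))
print(one_hot_weekday(4))
print(fill_missing_with_mean([1.0, None, 3.0]))
```

One-hot encoding avoids telling a distance-based algorithm that Sunday (7) is "far" from Monday (1). The trade-off is that it adds six extra dimensions for a single feature.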
18. • Explore the data
– Images are richer than numbers
* "We get more orders at 22h" vs.
• Ask domain experts
– Understand normal & edge cases
* The step at 14h is the web cutoff time
Before algorithms: data!
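A first exploration step is to count orders per hour of day and look at the shape of the result rather than a single number. This sketch draws a quick text histogram. In a notebook, a real plot would show patterns such as a peak at 22h or a step at 14h even more clearly. The timestamps below are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def orders_per_hour(timestamps):
    """Count orders by hour of day, returning a list indexed 0..23."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def text_histogram(counts, width=40):
    """Render hourly counts as bars scaled so the busiest hour is `width` wide."""
    peak = max(counts) or 1  # avoid dividing by zero when there are no orders
    lines = []
    for hour, n in enumerate(counts):
        bar = "#" * round(width * n / peak)
        lines.append(f"{hour:02d}h {bar} {n}")
    return "\n".join(lines)


sample = [
    datetime(2017, 3, 1, 22, 5),
    datetime(2017, 3, 1, 22, 40),
    datetime(2017, 3, 2, 13, 55),
    datetime(2017, 3, 2, 9, 0),
]
counts = orders_per_hour(sample)
print(text_histogram(counts))
```

Explaining why a bar is where it is (a cutoff time, a promotion, a bug) is the part that needs the domain experts.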
19. • Explore and optimize the data
– Keep the features that count; use feature engineering
– Avoid the "curse of dimensionality"
• Start small, understandable, and useful
• Find excuses to try it, and sell it!
Lessons learned
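One way to see the "curse of dimensionality" is to measure nearest and farthest neighbors as dimensions are added. With uniformly random points, the gap between them shrinks relative to the distances themselves. Distance-based clustering such as k-means then has less to work with. The sketch below measures that ratio on synthetic random data, not customer data. The function name, point count, and dimension sizes are arbitrary choices.

```python
import math
import random


def distance_contrast(n_points, dims, seed=0):
    """(max - min) / min distance from a random query point to uniform random points.

    A large value means "nearest" and "farthest" are clearly different;
    the ratio shrinks as dims grows.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dims)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


for dims in (2, 10, 100, 1000):
    print(dims, round(distance_contrast(500, dims), 2))
```

This is one reason to keep only the features that count rather than feeding every available column to the algorithm.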
21. 1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
5. cd scbcn17-customer-segmentation
6. ./jupyter.sh
7. Open the link in your browser and open the Workshop.ipynb file
Let's hack