Customer segmentation
an excuse to use Machine Learning ;-)
Customer segmentation scbcn17

Who am I?
• Julio Martinez
• Web developer since 2001
• 2 years working at Ulabox
• Machine Learning hobbyist
• Find me: @liopic

Preparing the workshop
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/

My 2017 objective: M.L.
• Motivation
  - It's the new hot thing
  - AlphaGo beat Lee Sedol, March 2016
• Some background, but I need to learn more

Learning about Machine Learning
1. Choose the way
   - Coursera courses vs. books vs. workshops vs. posts
2. Find an excuse to apply it
   - @work is better than @home

Customer clusters @work, aka "the excuse"
• There is a non-programmer Business Analysis Department
• Groups of customers based on periodicity + amount spent
  - Example: people who buy once per month, spending 100 per ticket
  - Useful for business reports
  - Not so useful for UX, CRM
• Groups by behavior? Clustering orders!
Boring!
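
As a point of comparison, the periodicity + amount-spent grouping above can be sketched in a few lines of pandas. This is a minimal illustration, not Ulabox's actual report: the column names (`customer_id`, `order_date`, `total`), the toy orders, the 30-day month and the thresholds in `segment` are all hypothetical.

```python
import pandas as pd

# Hypothetical order log: one row per order (column names are assumptions).
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 3, 3],
    "order_date": pd.to_datetime([
        "2017-01-05", "2017-02-03", "2017-03-06",
        "2017-01-20", "2017-01-02", "2017-01-09",
    ]),
    "total": [100.0, 95.0, 110.0, 30.0, 20.0, 25.0],
})


def customer_profile(orders: pd.DataFrame) -> pd.DataFrame:
    """Periodicity (orders per month) and amount spent (average ticket) per customer."""
    g = orders.groupby("customer_id")
    n_orders = g.size()
    # Active period in ~30-day months; at least 1 so single-order customers don't divide by zero.
    span_days = (g["order_date"].max() - g["order_date"].min()).dt.days
    months = (span_days / 30.0).clip(lower=1.0)
    return pd.DataFrame({
        "n_orders": n_orders,
        "orders_per_month": n_orders / months,
        "avg_ticket": g["total"].mean(),
    })


def segment(profile: pd.DataFrame) -> pd.Series:
    """Rule-based groups; the thresholds are hypothetical, for illustration only."""
    freq = pd.cut(profile["orders_per_month"], [0, 1, float("inf")],
                  labels=["occasional", "frequent"])
    ticket = pd.cut(profile["avg_ticket"], [0, 50, float("inf")],
                    labels=["low", "high"])
    return freq.astype(str) + "/" + ticket.astype(str)


print(segment(customer_profile(orders)))
```

Every customer lands in a fixed grid cell chosen by hand, which is what makes this approach useful for reports but says little about how people actually shop.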

101 Machine Learning: the method
1. With past data -> make an ML model
   - clean the data
   - choose one or more ML algorithms
   - tune the algorithm, with testing
2. With new data -> use the model to predict (or give new info)
   - deploy the pipeline
   - update the model
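
The two phases can be sketched with scikit-learn (preinstalled in the `jupyter/scipy-notebook` image). This is a minimal sketch on synthetic labeled data, not the workshop's model: the k-nearest-neighbors classifier, the `[0, 1]` scaling and the grid of `n_neighbors` values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Synthetic "past data": two noisy groups with known labels (stand-in for real history).
X_past = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y_past = np.array([0] * 100 + [1] * 100)

# Keep a test set aside to check the tuned model on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X_past, y_past, test_size=0.25, random_state=0, stratify=y_past
)

# 1. Cleaning (scaling) + algorithm in one pipeline, so new data gets the same preprocessing.
pipeline = make_pipeline(MinMaxScaler(), KNeighborsClassifier())

# Tune a hyperparameter with cross-validation on the training set only.
search = GridSearchCV(pipeline, {"kneighborsclassifier__n_neighbors": [1, 3, 5, 7]}, cv=5)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))

# 2. New data -> use the fitted model to predict.
X_new = np.array([[0.2, -0.1], [5.1, 4.8]])
print(search.predict(X_new))
```

Updating the model then amounts to calling `search.fit` again on the enlarged history; because the scaler lives inside the pipeline, training and prediction always share the same preprocessing.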

101 Machine Learning: types of problems
• Supervised
  - data + labels (results)
• Unsupervised
  - just data
• Reinforcement
  - a reward function to optimize

Supervised learning
[Figure: TRAINING SET of images labeled "cat", "cat", "person"; TEST SET of unlabeled images, "???"]

Unsupervised learning
[Figure: TRAINING SET of unlabeled images; empty TEST SET]
There is NO test
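
The difference between the two settings shows up directly in the scikit-learn API: a supervised model is fitted with `fit(X, y)`, an unsupervised one with `fit(X)` only. A minimal sketch on synthetic blobs (centers and sizes are assumptions); because this toy data was generated with known groups, the k-means result can be compared with them via `adjusted_rand_score`, which is exactly the answer key that real customer data does not provide.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=1.0, random_state=0)

# Supervised: fit(X, y) learns from the labels, so predictions can be tested against them.
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(clf.predict([[9.5, 0.5]]))

# Unsupervised: fit(X) sees only the data; labels_ are arbitrary group ids (any order).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Only possible here because we generated y ourselves: 1.0 means the same partition.
print(adjusted_rand_score(y, km.labels_))
```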

Unsupervised learning
• Tries to extract features (information, shapes): what is similar and what is different
• Uses:
  - Clustering
  - Anomaly detection (it doesn't look "normal")
  - Dimensionality reduction
  - Transfer features, projections ...
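
Dimensionality reduction can be illustrated with principal component analysis (PCA) from scikit-learn. The data below is a synthetic assumption: five observed features that are really driven by two hidden factors plus a little noise, so two components should retain almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Synthetic data: 5 observed features generated from 2 hidden factors plus small noise.
factors = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.01 * rng.normal(size=(300, 5))

# Project the 5-D points onto the 2 directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)
print(pca.explained_variance_ratio_.sum())  # fraction of variance kept by 2 components
```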

Clustering
• Uses:
  - grouping
  - quantization
• Algorithms:
  - k-means
  - DBSCAN
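
One common reading of "quantization" is vector quantization: replacing every value by the center of its cluster, so a few representative levels (a codebook) stand in for many raw values. A minimal k-means sketch in one dimension with hypothetical order totals; `quantize` is an illustrative helper, not a library function.

```python
import numpy as np
from sklearn.cluster import KMeans


def quantize(values, n_levels, seed=0):
    """Replace each value by the centroid of its k-means cluster (a 1-D codebook)."""
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n_levels, n_init=10, random_state=seed).fit(X)
    codebook = km.cluster_centers_.ravel()
    return codebook[km.labels_], codebook


# Hypothetical order totals, compressed to 3 representative levels.
totals = [10, 11, 12, 50, 52, 98, 100]
quantized, codebook = quantize(totals, n_levels=3)
print(np.sort(codebook))
print(quantized)
```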

k-means
• Needs: the number of clusters (k)
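
Since k-means needs k up front, one heuristic (among others, such as looking for an "elbow" in `inertia_`) is to try several values and keep the one with the best silhouette score. A sketch on synthetic data with three well-separated groups; the candidate range and the `pick_k` helper are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=1.0, random_state=0)


def pick_k(X, candidates=range(2, 7)):
    """Fit k-means for each candidate k and return the k with the highest silhouette."""
    scores = {}
    for k in candidates:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        scores[k] = silhouette_score(X, km.labels_)
    return max(scores, key=scores.get), scores


best_k, scores = pick_k(X)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

On real order data the scores are usually much closer to each other, so the chosen k is a starting point to discuss with the business, not a final answer.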

DBSCAN: Density-based spatial clustering of applications with noise
• Needs: the minimum number of samples per dense region; other parameters (such as the neighborhood radius) must be tuned
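
A minimal DBSCAN sketch with scikit-learn, where `eps` is the neighborhood radius and `min_samples` the minimum number of points (including the point itself) needed to form a dense region. The two synthetic blobs, the two far-away outliers and the parameter values are assumptions; points that belong to no dense region get the label `-1` (noise), and the number of clusters is not given in advance.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0, 0), scale=0.3, size=(100, 2))
blob_b = rng.normal(loc=(5, 5), scale=0.3, size=(100, 2))
outliers = np.array([[10.0, -10.0], [-10.0, 10.0]])
X = np.vstack([blob_a, blob_b, outliers])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)
labels = db.labels_

# Clusters are discovered from density; -1 marks noise, not a cluster.
n_clusters = len(set(labels) - {-1})
print(n_clusters)
print(labels[-2:])  # the two outliers
```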

So, ready to hack?
But wait a moment!

Before algorithms: data!
• Data preparation
  - Keep features on the same order of magnitude, usually scaled to [0, 1]
  - Remove noise
  - Other steps
    * Binarize categorical features
      - weekday, e.g. 4 -> 0, 0, 0, 1, 0, 0, 0
    * Process missing data
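
The preparation steps above can be sketched with pandas and scikit-learn. Assumptions: a hypothetical orders table with `total`, `n_items` and `weekday` columns, weekdays encoded 1 (Monday) to 7 (Sunday), and median imputation as the way of processing missing data.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical orders table; column names and the weekday encoding are assumptions.
orders = pd.DataFrame({
    "total": [30.0, 120.0, np.nan, 75.0],   # one missing value
    "n_items": [5, 40, 12, 20],
    "weekday": [4, 1, 7, 4],                # 1 = Monday ... 7 = Sunday
})


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Process missing data: fill with the column median (one simple choice among many).
    df["total"] = df["total"].fillna(df["total"].median())
    # Binarize the categorical feature: one 0/1 column per weekday, so 4 -> 0,0,0,1,0,0,0.
    weekday = df["weekday"].astype(pd.CategoricalDtype(categories=list(range(1, 8))))
    onehot = pd.get_dummies(weekday, prefix="wd", dtype=int)
    # Same order of magnitude: rescale numeric features to [0, 1].
    numeric_cols = ["total", "n_items"]
    scaled = MinMaxScaler().fit_transform(df[numeric_cols])
    numeric = pd.DataFrame(scaled, columns=numeric_cols, index=df.index)
    return pd.concat([numeric, onehot], axis=1)


print(prepare(orders))
```

In a real pipeline the scaler would be fitted on the training data only and reused on new data, so both are scaled the same way.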

Before algorithms: data!
• Explore the data
  - Images are richer than numbers
    * "We get more orders at 22h" vs. [image]
• Ask domain experts
  - Understand normal & edge cases
    * The step at 14h is the web cutoff time
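
Counting orders per hour is the kind of quick exploration that makes patterns like a 22h peak or a 14h step visible. A minimal sketch with hypothetical order hours and illustrative helper names; in a notebook, `counts.plot.bar()` (with matplotlib installed) would give the image version of the same numbers.

```python
import pandas as pd


def orders_per_hour(hours) -> pd.Series:
    """Number of orders for each hour of the day (0-23), including empty hours."""
    counts = pd.Series(hours).value_counts()
    return counts.reindex(range(24), fill_value=0)


def text_histogram(counts: pd.Series, width: int = 40) -> str:
    """One bar per hour, scaled so the busiest hour gets `width` characters."""
    peak = int(counts.max())
    lines = []
    for hour, n in counts.items():
        bar = "#" * (int(n) * width // peak) if peak else ""
        lines.append(f"{hour:02d}h {bar} {int(n)}")
    return "\n".join(lines)


# Hypothetical order hours, standing in for a real order log.
hours = [9, 10, 13, 13, 13, 14, 20, 21, 22, 22, 22, 22]
counts = orders_per_hour(hours)
print(text_histogram(counts))
```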

Lessons learned
• Explore and optimize the data
  - Features that count: feature engineering
  - Avoid the "curse of dimensionality"
• Start small, understandable, useful
• Find excuses to try it, and sell it!
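
The "curse of dimensionality" can be made concrete with a short numpy experiment: as the number of dimensions grows, the nearest and farthest random points end up at almost the same distance from a query, so distance-based methods such as k-means and DBSCAN lose contrast. The uniform random data, the 500 points and the chosen dimensions are assumptions for illustration.

```python
import numpy as np


def relative_contrast(n_dims, n_points=500, seed=0):
    """(max - min) / min distance from a random query to uniform random points."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()


# Contrast shrinks as dimensions grow: "near" and "far" stop meaning much.
for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

This is one reason to prefer a few well-engineered features over many raw ones.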

Now, let's hack!

Let's hack
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
5. cd scbcn17-customer-segmentation
6. ./jupyter.sh
7. Open the link in your browser and open the Workshop.ipynb file

Thank you!
