Machine Learning in Production: Manu Mukerji, Strata CA March 2018

The document outlines a talk by Manu Mukerji on machine learning in production, covering the standard ML flow, data considerations, model testing, and deployment. It emphasizes the importance of training data, the challenges of model accuracy, and the need for a structured team to implement ML solutions effectively. The talk also highlights the application of ML in categorizing products across multiple countries and presents real-world examples, such as self-driving cars.

1. Machine Learning in Production
Manu Mukerji

2. What is this talk about?
• Agenda:
- Introduction to the business problem
- Normal ML Flow
- Training Data
- Test Data
- Model Creation
- Testing
- How it ties together: The production flow
- Team Setup
- ML Production Examples
- Questions?
- Live Demo!

3. Typical ML Example
• This is the “hello world” equivalent of what you find online:

4. Generalized ML Flow
1. Gather data
2. Train model
3. Test for accuracy (most examples end here)
4. Save model for external consumption
5. Use saved model for prediction
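The five steps can be sketched end to end with nothing but the Python standard library. Everything here is hypothetical: the toy token-vote "model", the product titles, and the file name `category_model.json` are illustrations, not the model from the talk. The point is that steps 4 and 5 (save, then load in a separate consumer) are where production work starts.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict

# 1. Gather data: (product title, category) pairs. Toy, made-up examples.
train_rows = [
    ("iphone case black", "phone accessories"),
    ("galaxy s8 case", "phone accessories"),
    ("wool sweater", "clothing"),
    ("cotton sweater red", "clothing"),
]
test_rows = [("red phone case", "phone accessories"), ("blue wool sweater", "clothing")]


def fit(rows):
    """2. Train: count how often each token appears under each category."""
    votes = defaultdict(Counter)
    for title, category in rows:
        for token in title.lower().split():
            votes[token][category] += 1
    return {token: dict(counts) for token, counts in votes.items()}


def predict(model, title, default="unknown"):
    """Sum the per-category votes of every known token in the title."""
    tally = Counter()
    for token in title.lower().split():
        tally.update(model.get(token, {}))
    return tally.most_common(1)[0][0] if tally else default


def accuracy(model, rows):
    """3. Test: fraction of rows predicted correctly (correct / total)."""
    return sum(predict(model, t) == c for t, c in rows) / len(rows)


model = fit(train_rows)
print(f"accuracy: {accuracy(model, test_rows):.2f}")  # most tutorials stop here

# 4. Save the model for external consumption (JSON keeps it language-neutral).
path = os.path.join(tempfile.gettempdir(), "category_model.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(model, f)

# 5. A separate consumer loads the saved model and predicts.
with open(path, encoding="utf-8") as f:
    served = json.load(f)
print(predict(served, "leather phone case"))
```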

5. Our use case
• Categorization of products
• Categorization of products into ~4800 categories
• Categorization of 6B products into ~4800 categories across 30 countries
• Categorization of 6B products into ~4800 categories across 30 countries every day!
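This back-of-envelope check shows the load that "every day" implies. It makes a simplifying assumption that the 6B predictions are spread evenly over 24 hours and across the 30 countries. Real batch jobs will be burstier, so treat this as a lower bound on peak throughput:

```latex
\frac{6 \times 10^{9}\ \text{products/day}}{86{,}400\ \text{s/day}} \approx 6.9 \times 10^{4}\ \text{predictions/s},
\qquad
\frac{6 \times 10^{9}\ \text{products/day}}{30\ \text{countries}} = 2 \times 10^{8}\ \text{products per country per day}
```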

6. Training Data
• Data is the asset!

7. Training data
…lack of training data

8. Gathering Training Data
(Diagram: Annotation UI, HDFS, Elasticsearch)
• Annotate at the category level (fewer than 10K per country)
• Try to map each customer category to a Google Category (GCAT)
• Expand the training data by inferring the GCAT from the mapping (this brings it into the millions of products)
• What else did we try:
- Mechanical Turk
- External companies that provide data
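The expansion step is what turns fewer than 10K category-level annotations into millions of labeled products. A category is labeled once, and every product in it inherits the label. This is a minimal sketch under stated assumptions. The mapping keys, the `gcat:` identifiers and the `expand_training_data` helper are hypothetical names used only for illustration. It also shows why "bad data, bad predictions" bites hard here: one wrong customer category mislabels every product filed under it.

```python
from collections import Counter

# Hypothetical annotations: one human label per (country, retailer category),
# not per product. The GCAT ids are made-up placeholders.
category_map = {
    ("US", "Cell Phone Cases"): "gcat:phone-cases",
    ("US", "Sweaters & Cardigans"): "gcat:sweaters",
    ("DE", "Handyhüllen"): "gcat:phone-cases",
}

# Hypothetical product feed rows: (country, retailer category, title).
products = [
    ("US", "Cell Phone Cases", "Slim case for iPhone 8"),
    ("US", "Cell Phone Cases", "Rugged Galaxy S8 case"),
    ("US", "Sweaters & Cardigans", "Merino crew-neck sweater"),
    ("DE", "Handyhüllen", "Silikonhülle für iPhone X"),
    ("US", "Misc", "Gift card"),  # unmapped category: stays unlabeled
]


def expand_training_data(products, category_map):
    """Infer a GCAT label for every product whose category has been mapped."""
    labeled, unlabeled = [], []
    for country, customer_category, title in products:
        gcat = category_map.get((country, customer_category))
        if gcat is None:
            unlabeled.append(title)
        else:
            labeled.append((title, gcat))
    return labeled, unlabeled


labeled, unlabeled = expand_training_data(products, category_map)
print(len(labeled), "inferred labels;", len(unlabeled), "still need annotation")
print(Counter(gcat for _, gcat in labeled))
```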

9. More About Training Data
• Bad data, bad predictions
• Overfitting: if you have a hammer, everything looks like a nail
• When to retrain?
• When to add more data?
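The speaker's notes say category coverage matters more than total volume. They also suggest trying more data for a category that keeps scoring poorly. One way to make "when to add more data?" concrete is to check each category's training count and held-out accuracy. The sketch below assumes that approach. The thresholds `min_examples=50` and `min_accuracy=0.8` are arbitrary placeholders, not values from the talk. The data loosely echoes the hard-drive vs. laptop confusion mentioned in the notes. It does not address retraining for seasonality.

```python
from collections import Counter


def per_category_accuracy(pairs):
    """pairs: iterable of (true_category, predicted_category)."""
    hits, totals = Counter(), Counter()
    for truth, pred in pairs:
        totals[truth] += 1
        hits[truth] += truth == pred
    return {category: hits[category] / totals[category] for category in totals}


def categories_needing_data(train_labels, eval_pairs, min_examples=50, min_accuracy=0.8):
    """Flag categories that are thin in the training data or score poorly.

    Both thresholds are arbitrary placeholders, not values from the talk.
    """
    counts = Counter(train_labels)
    accuracy = per_category_accuracy(eval_pairs)
    flagged = {}
    for category in sorted(set(counts) | set(accuracy)):
        reasons = []
        if counts[category] < min_examples:
            reasons.append(f"only {counts[category]} training examples")
        if category in accuracy and accuracy[category] < min_accuracy:
            reasons.append(f"accuracy {accuracy[category]:.0%}")
        if reasons:
            flagged[category] = reasons
    return flagged


# Hypothetical numbers: plenty of hard drives, few laptops, laptops misread as drives.
train_labels = ["hard drives"] * 400 + ["laptops"] * 30
eval_pairs = (
    [("laptops", "hard drives")] * 3
    + [("laptops", "laptops")] * 7
    + [("hard drives", "hard drives")] * 10
)
flagged = categories_needing_data(train_labels, eval_pairs)
print(flagged)  # {'laptops': ['only 30 training examples', 'accuracy 70%']}
```

As the notes warn, adding data to a flagged category doesn't always help, and it's expensive. A report like this only tells you where to look.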

10. Test Set
• Test set selection
- Normal method: split off 10-20% of the training data
- Production method: custom test set based on business value
• Scoring of test set
- Normal method: correct/total
- Problem that can occur with test set scoring
- Advanced version: negative points for negative customer value
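The sketch below covers both ideas on this slide under simple assumptions. First, the test set is sampled in proportion to revenue rather than uniformly, which the speaker's notes describe as basing it on revenue. Second, the set is scored with a penalty table so that mistakes customers care about cost points. The category paths, revenue figures and penalty of -5 are hypothetical. The penalty rows follow the notes' example: Apple won't care about apple-the-fruit being confused with Apple products, but Samsung will care if a Galaxy S8 is labeled as an iPhone.

```python
import random

# Hypothetical penalty table: (true category, predicted category) -> points.
# Mistakes nobody cares about cost nothing; mistakes a customer notices cost points.
PENALTIES = {
    ("electronics/phones/samsung", "electronics/phones/apple"): -5,
    ("electronics/phones/apple", "electronics/phones/samsung"): -5,
}


def revenue_weighted_sample(products, k, seed=0):
    """Draw k product ids with probability proportional to revenue.

    products: list of (product_id, revenue). Sampling is with replacement to keep
    the sketch short; a real test set would deduplicate and keep some coverage of
    low-revenue categories so it still has a good mix.
    """
    rng = random.Random(seed)
    ids = [pid for pid, _ in products]
    weights = [revenue for _, revenue in products]
    return rng.choices(ids, weights=weights, k=k)


def plain_accuracy(pairs):
    """Normal method: correct / total."""
    return sum(truth == pred for truth, pred in pairs) / len(pairs)


def business_score(pairs, penalties, default_wrong=0):
    """+1 per correct prediction; a wrong one costs whatever the table says."""
    score = 0
    for truth, pred in pairs:
        if truth == pred:
            score += 1
        else:
            score += penalties.get((truth, pred), default_wrong)
    return score


samsung = "electronics/phones/samsung"
# Two hypothetical models, each right on 4 of 5 products but wrong in different ways.
model_a = [(samsung, samsung)] * 4 + [("food/fruit/apples", "electronics/phones/apple")]
model_b = [(samsung, samsung)] * 4 + [(samsung, "electronics/phones/apple")]

print(plain_accuracy(model_a), plain_accuracy(model_b))  # 0.8 0.8
print(business_score(model_a, PENALTIES), business_score(model_b, PENALTIES))  # 4 -1

test_ids = revenue_weighted_sample([("p1", 120_000.0), ("p2", 15.0), ("p3", 8_000.0)], k=10)
print(test_ids)
```

Both models score 4/5 under correct/total, but only the penalty-based score separates the harmless mistake from the costly one.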

11. Manual Overrides… Why?
• Predictions will never be 100% accurate
• When a prediction is wrong, it impacts the business
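This sketch shows one plausible way to wire overrides in, not necessarily how the talk's system did it. Overrides live outside the model and are checked before it, so a human correction takes effect immediately and survives retraining. The `make_categorizer` wrapper, the toy model and the SKU ids are hypothetical. Returning the source of each answer makes it easy to audit, or to feed corrected items back in as training data.

```python
from typing import Callable, Dict, Tuple


def make_categorizer(
    model_predict: Callable[[str], str], overrides: Dict[str, str]
) -> Callable[[str, str], Tuple[str, str]]:
    """Wrap a model so human overrides always take precedence over predictions.

    Returns (category, source) so downstream reports can tell overrides
    from model output.
    """

    def categorize(product_id: str, title: str) -> Tuple[str, str]:
        if product_id in overrides:
            return overrides[product_id], "override"
        return model_predict(title), "model"

    return categorize


def toy_model(title: str) -> str:
    """Stand-in for a real model: anything containing 'case' is a phone accessory."""
    return "phone accessories" if "case" in title.lower() else "other"


# Hypothetical override: a curator fixed a luggage item the model gets wrong.
overrides = {"sku-123": "luggage"}
categorize = make_categorizer(toy_model, overrides)

print(categorize("sku-123", "Hard-shell carry-on case"))  # ('luggage', 'override')
print(categorize("sku-456", "Slim iPhone case"))          # ('phone accessories', 'model')
```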

12. Testing your model
• Training takes time; this is why your test set is really important
• Automate the build pipeline to run evaluation and deploy your model only if it's better than the existing one
• Canary test the whole pipeline
• Advanced resource:
- Chase Roberts: https://medium.com/@keeper6928/how-to-unit-test-machine-learning-code-57cf6fd81765
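The speaker's notes describe the canary as a fake retailer with known products and some overrides, run through the pipeline and checked against the expected results. The sketch below pairs that canary with an "only deploy if better" gate. The function names, the canary products and the scores 0.91 and 0.89 are hypothetical. The scores stand in for whatever the fixed test set produces, such as the business score from the test-set slide.

```python
def should_deploy(candidate_score, production_score, min_gain=0.0):
    """Promote the candidate only if it beats the current model on the fixed test set."""
    return candidate_score > production_score + min_gain


def canary_check(pipeline, canary_products, expected):
    """Run the full pipeline on a fake retailer whose answers are known.

    pipeline: callable taking [(product_id, title)] and returning {product_id: category}.
    Returns a list of (product_id, expected, actual) mismatches; empty means pass.
    """
    results = pipeline(canary_products)
    return [
        (pid, expected[pid], results.get(pid))
        for pid in expected
        if results.get(pid) != expected[pid]
    ]


# Hypothetical pipeline: apply overrides first, then a toy keyword rule.
OVERRIDES = {"canary-3": "luggage"}


def toy_pipeline(batch):
    out = {}
    for pid, title in batch:
        if pid in OVERRIDES:
            out[pid] = OVERRIDES[pid]
        elif "case" in title:
            out[pid] = "phone accessories"
        else:
            out[pid] = "other"
    return out


canary_products = [("canary-1", "iphone case"), ("canary-2", "gift card"), ("canary-3", "carry-on case")]
expected = {"canary-1": "phone accessories", "canary-2": "other", "canary-3": "luggage"}

mismatches = canary_check(toy_pipeline, canary_products, expected)
deploy = not mismatches and should_deploy(candidate_score=0.91, production_score=0.89)
print("deploy" if deploy else "keep current model", mismatches)
```

Training can fail silently, because the code doesn't break and you just see poor results. Both checks therefore run on every build instead of relying on someone noticing.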

13. Scaling Out
• Now we need to do this in 30 countries
• Easier with Latin-alphabet languages
• Deep learning to the rescue: this isn't just a cool thing that folks are talking about
4.7 英寸 Retina HD 显示器。一款采用 64 位桌面级架构的 A8 芯片。带焦点像素的 8MP iSight 相机。触摸 ID ...
4.7 inch Retina HD display. A 64-bit desktop architecture with the A8 chip. 8MP iSight with focus pixel. Touch ID
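One concrete reason Latin-alphabet languages are easier is that whitespace already separates words. Chinese text, like the product description on this slide, has no spaces between words. The talk's answer is deep learning. The character n-gram sketch below is a simpler, swapped-in illustration of a language-agnostic input representation, not the model used in the talk. The sample phrase is adapted from the slide's example ("camera with focus pixels").

```python
def whitespace_tokens(text):
    """Word tokens: reasonable for many Latin-alphabet languages."""
    return text.lower().split()


def char_ngrams(text, n=2):
    """Character n-grams: need no word segmentation, so they work for any script."""
    chars = "".join(text.lower().split())  # remove whitespace before slicing
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


english = "8MP iSight camera"
chinese = "带焦点像素的相机"  # "camera with focus pixels", adapted from the slide

print(whitespace_tokens(english))  # ['8mp', 'isight', 'camera']
print(whitespace_tokens(chinese))  # ['带焦点像素的相机']: one opaque token
print(char_ngrams(chinese))        # ['带焦', '焦点', '点像', '像素', '素的', '的相', '相机']
```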

14. Accuracy over time
• 70% accuracy: 1 month
• 80% accuracy: 3-6 months
• ~90% accuracy: 1 year
• The last mile: the rest of your life!

15. Team setup for ML
• Team setup
- UI/API Team
- ML/AI Engineers
- ML/AI Research

16. Knowing when to stop?
• This is hard!
• Get something out!
• Don’t work in a vacuum
• Get the circular data flow working
• Remember business value; don’t over-engineer it

17. How it all ties together

18. Examples of interesting AI in production
• Self-driving cars
- Some components:
  - Lots of sensors, cameras
  - Object detection, and distinguishing what can move vs. not
  - Lane detection
  - Red light vs. green

19. Self Driving… continued
• If I trained a self-driving car model with just 50 hours of data, would you trust it?
• Probably not…
• According to the DMV, in order to get a license you must “Have completed 50 hours of practice with an adult 25 years of age or older.” [1]
• 50 hours at 60 MPH is 3,000 miles…
• Would you trust 1,300,000,000 miles? [2]
[1] https://www.dmv.ca.gov/portal/dmv/detail/teenweb/permit_btn1/permit
[2] https://www.bloomberg.com/news/articles/2016-12-20/the-tesla-advantage-1-3-billion-miles-of-data
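The slide's comparison can be written as a ratio. It assumes, as the slide does, a 60 MPH average over the 50 practice hours. On that basis, the cited 1.3 billion miles is roughly 430,000 times the mileage of a newly licensed driver's required practice:

```latex
50\ \text{h} \times 60\ \tfrac{\text{mi}}{\text{h}} = 3{,}000\ \text{mi},
\qquad
\frac{1.3 \times 10^{9}\ \text{mi}}{3 \times 10^{3}\ \text{mi}} \approx 4.3 \times 10^{5}
```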

21. The future

22. Questions?

23. Live Demo: Help me with training data
• Remember how I said training data is hard to get!

24. Thank You
• TensorFlow
• Pandas, Scikit-learn
• Andrew Ng!

Editor's Notes

#2: About me
#4: This is a typical ML tutorial you can find online. There is no problem with it per se, but this isn’t what happens in production. My goal in this talk is to bridge the gap between ML in production and ML on paper.
#7: Data is the asset! In the world of online advertising we have a saying: “If you are not buying a product online, you are the product being sold,” and nowhere is this truer than in machine learning. Alexa is a good example. Apple had a head start on the personal assistant with Siri, and Amazon was able to come in and break that monopoly; now products say “Works with Alexa,” not “Works with Siri.” Now imagine you have a great idea for a text-to-speech engine using deep learning. Even if your algorithm is far better than Amazon’s, without that data your model will never perform as well.
#8: These are the kinds of labels we need, and they are hard to get right. For example, if you see a photo of a phone case, how do you know whether “case” is its own category or falls under phone accessories?
#9: When a user came in, they would see 50 random products from the category, and we would ask them to confirm that everything looked fine. We did this 3 times per category. If something wasn’t correct, they could remove a product or change its category.
#10: Bad data, bad predictions: wrong categories from customers would mess up the category mappings. Overfitting: the Fry’s example (hard drive vs. laptop). When to retrain: the seasonality example (sweaters). When to add more data: category coverage is more important than total volume; if we could label all 6B products, we wouldn’t need ML. If you realize you are not getting good results in a particular category, you can try adding more data, but this doesn’t always work and it’s expensive.
#11: Test set selection: the normal approach is a random sample; we based ours on revenue. You still want a good mix, but with more products from that high-revenue group. Scoring of test set: the earthquake warning app example (4/5 points). Advanced version: if you confuse apple the fruit with Apple products, Apple won’t care; but with Samsung/Apple, if you categorize a Galaxy S8 as an iPhone, they will care.
#12: Click through to the example: buying soup.
#13: Training takes a long time; if you are using deep learning it can take days, and the code doesn’t break or complain, you just see poor results. Automate the pipeline. Canary test: in our case we made a fake retailer with known products and some overrides, and made sure that when we ran those we got the expected results. Advanced resource: I won’t get into it here, but Chase talks about some very good ways of adding unit tests.
#15: I don’t really have data points for these numbers… Walking is an example. Driving is an example.
#16: UI/API team: front-end work, the annotation UI, Elasticsearch, getting data in and out. ML engineers: working on the model pipeline, scale issues, data engineering. Research: feature engineering, trying new weights, new types of models, new papers; image learning is an example. A pizza-box team… with a side of salad. The important distinction is that research work doesn’t fit into a sprint.
#17: Get something out. Don’t work in a vacuum: researchers can end up working on the same dataset for years. Iterate after. Get the circular data flow working.
#20: Self-driving is getting a lot of press. It’s overhyped on one end, and on the other end people are scared of it. This is my attempt to generalize it.
#21: Child walking: experience is training your neural net with data. The Matrix helicopter example.
