Machine Learning Goes Production!
Engineering, maintenance costs, technical debt
Michał Łopuszyński
ICM, Warsaw, 2017.01.31
Hmmm... My telly says
machine learning is amazingly cool.
Should I care about all this
engineering, maintenance costs,
technical debt?
Oh yes! You'd better!
Example - Hooray! We can predict flu!
Example - Fast forward 5 years. Hey, can we?!?
doi:10.1126/science.1248506
Great supplementary material is available for this paper (see the DOI above).
What to do?
Not good.
It's engineering, stupid!
ML engineering - reading list
[Sculley]
Software Engineering for Machine Learning, NIPS 2014 Workshop
ML engineering - reading list
[Sculley]
NIPS 2015
ML engineering - reading list
[Zinkevich]
Reliable Machine Learning in the Wild - NIPS 2016 Workshop
ML engineering - reading list
[Breck]
Reliable Machine Learning in the Wild - NIPS 2016 Workshop
There is also a presentation on this topic:
https://sites.google.com/site/wildml2016nips/SculleySlides1.pdf
One more cool thing about the above papers
[Figure: the hype curve (visibility vs. time), marking where ML is now and where the discussed papers sit]
So, what do they say?
Wisdom learnt the hard way [Sculley]
As the machine learning (ML) community continues
to accumulate years of experience with live systems,
a wide-spread and uncomfortable trend has emerged:
developing and deploying ML systems is relatively
fast and cheap, but maintaining them over time is
difficult and expensive.
This dichotomy can be understood through the lens of
technical debt (...)
Technical debt?
What does it even mean?
Technical debt
Sources of technical debt in ML [Sculley]
Complex models and boundary erosion
Expensive data dependencies
Feedback loops
Common anti-patterns
Configuration management deficiencies
Changes in the external world
Complex models, boundary erosion [Sculley]
In programming we strive for separation of concerns, isolation, and
encapsulation. More often than not, ML makes that difficult.

Entanglement
CACE principle = changing anything changes everything (illustrated below)

Correction cascades

Undeclared consumers
Undeclared consumers are expensive at best and dangerous at worst
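To make CACE concrete, here is a minimal synthetic sketch, assuming numpy and scikit-learn are available: dropping one input feature shifts all the remaining learned weights, not just the one that disappeared.

    # CACE in miniature: remove one (correlated) feature and every
    # remaining weight moves. Data is synthetic; nothing here is from the paper.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 4))
    X[:, 3] = 0.7 * X[:, 0] + 0.3 * rng.normal(size=1000)  # correlated features
    y = (X @ np.array([1.0, -2.0, 0.5, 1.5]) + rng.normal(size=1000) > 0).astype(int)

    full = LogisticRegression().fit(X, y)            # all four features
    reduced = LogisticRegression().fit(X[:, :3], y)  # drop the last feature

    print("weights, 4 features:", full.coef_.round(2))
    print("weights, 3 features:", reduced.coef_.round(2))
    # The first three weights differ between the two models:
    # changing anything changes everything.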
Expensive data dependencies [Sculley]
Data dependencies cost more than code dependencies.
Unstable data dependencies
Underutilized data dependencies
Legacy features
Bundled features
Epsilon features
Correlated features, esp. with one root-cause feature
Static analysis of data dependencies is extremely helpful
Think workflow tools and provenance tracking!
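The transitive closure such static analysis computes fits in a few lines. A sketch, with made-up feature and signal names (this is not a real provenance tool's API):

    # Sketch: declared data dependencies as a graph, plus the transitive
    # closure of everything a feature pulls in. Names are illustrative.
    def transitive_deps(graph, feature):
        """Return all upstream signals that `feature` (transitively) depends on."""
        seen, stack = set(), [feature]
        while stack:
            node = stack.pop()
            for dep in graph.get(node, ()):
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

    deps = {
        "ctr_model_input": ["user_profile", "item_stats"],
        "user_profile": ["raw_clicks"],
        "item_stats": ["raw_clicks", "legacy_inventory_feed"],
    }
    print(transitive_deps(deps, "ctr_model_input"))
    # {'user_profile', 'item_stats', 'raw_clicks', 'legacy_inventory_feed'}

With such a graph, an unstable or legacy signal can be traced to every model that consumes it before it breaks something.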
Feedback loops [Sculley]
Direct feedback loops
Hidden feedback loops
Indirect feedback loops are especially difficult to track!
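A purely synthetic toy illustration of a direct feedback loop (not from the paper): a ranker that only observes outcomes for items it chooses to show never corrects its underestimate of a good item.

    # A model that always exploits its current estimates only collects
    # feedback on what it already shows, so a wrongly low estimate is frozen in.
    import random

    random.seed(0)
    true_quality = {"a": 0.9, "b": 0.8, "c": 0.1}
    estimated = {"a": 0.5, "b": 0.1, "c": 0.5}   # "b" starts underrated

    for _ in range(1000):
        shown = max(estimated, key=estimated.get)      # always show the top item
        click = random.random() < true_quality[shown]  # feedback only for `shown`
        estimated[shown] += 0.01 * (click - estimated[shown])

    print(estimated)  # "b" is never shown, so its estimate never recovers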
Common anti-patterns [Sculley]
Glue code
Real systems = 5% ML code + 95% glue code
Rewrite general-purpose packages or wrap them in a common API (see the
sketch after this list)

Pipeline jungles
Data preparation evolves organically into a jungle of scrapes, joins, and
intermediate steps that is costly to test, debug, and recover from

Dead experimental code paths
Knight Capital case: $465M lost in 45 min. due to obsolete experimental code

Abstraction debt
ML abstractions are much less developed than, e.g., in relational databases

Bad code smells (less severe anti-patterns)
Plain-old-data smell
Multi-language smell
Prototype smell
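One hedged sketch of the "wrap in a common API" advice: a thin interface so the glue lives in exactly one place per package. The class names and constructor arguments below are illustrative, not a real library's API.

    # Callers depend only on Model.predict; swapping ML packages then
    # touches a single adapter class instead of every call site.
    from abc import ABC, abstractmethod

    class Model(ABC):
        @abstractmethod
        def predict(self, rows):
            """Return one score per input row."""

    class SkLearnModel(Model):
        def __init__(self, estimator):       # a fitted sklearn classifier
            self._est = estimator
        def predict(self, rows):
            return self._est.predict_proba(rows)[:, 1]

    class XgbModel(Model):
        def __init__(self, booster, to_dmatrix):  # booster + row converter
            self._booster = booster
            self._to_dmatrix = to_dmatrix
        def predict(self, rows):
            return self._booster.predict(self._to_dmatrix(rows))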
Configuration debt [Sculley]
Another potentially surprising area where debt can accumulate is
in the configuration of ML systems. (...) In a mature system which is
being actively developed, the number of lines of configuration can far
exceed the number of lines of the traditional code. Each configuration
line has a potential for mistakes.

It should be easy to specify a configuration as a small change from a
previous configuration (sketched below)

Configurations should undergo a full code review and be checked into a
repository

It should be hard to make manual errors, omissions, or oversights

It should be easy to see, visually, the difference in configuration between
two models

It should be easy to automatically assert and verify basic facts about the
configuration: features used, transitive closure of data dependencies, etc.

It should be possible to detect unused or redundant settings
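A minimal sketch of several of these desiderata, with made-up config keys: an experiment is a small override of a reviewed base config, unknown keys are rejected, and the diff between two models is trivial to display.

    # Configuration as a small change from a previous configuration.
    BASE = {
        "features": ["age", "country", "clicks_7d"],
        "learning_rate": 0.1,
        "regularization": 0.01,
    }

    def with_overrides(base, **overrides):
        unknown = set(overrides) - set(base)
        assert not unknown, f"unknown config keys: {unknown}"  # catches typos
        return {**base, **overrides}

    experiment = with_overrides(BASE, learning_rate=0.05)

    # The reviewable diff between two models:
    changed = {k: (BASE[k], experiment[k]) for k in BASE if BASE[k] != experiment[k]}
    print(changed)  # {'learning_rate': (0.1, 0.05)}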
Changes in the external world [Sculley]
The external world is not stable and is beyond the control of ML system
maintainers

Comprehensive live monitoring of the system is crucial for maintenance

Prediction bias (sketched below)
Action limits
Up-stream producers

What to monitor? Sample sources of problems:
Fixed or manually updated thresholds in configuration
Spurious/vanishing correlations
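A hedged sketch of a prediction-bias monitor: the mean of the predictions should roughly track the mean of the observed labels, and a sudden gap is worth an alert. The tolerance and the alerting channel are placeholders.

    def check_prediction_bias(predictions, labels, tolerance=0.05):
        """Alert when predicted and observed label means drift apart."""
        mean_pred = sum(predictions) / len(predictions)
        mean_label = sum(labels) / len(labels)
        bias = abs(mean_pred - mean_label)
        if bias > tolerance:
            # In a real system: raise a metric or page someone, not print.
            print(f"ALERT: prediction bias {bias:.3f} exceeds {tolerance}")
        return bias

    check_prediction_bias([0.9, 0.8, 0.95], [1, 0, 0])  # fires the alert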
Monitoring [Zinkevich]
Rule #8: Know the freshness requirements of your system
Rule #9: Detect problems before exporting models
Rule #10: Watch for silent failures
Rule #11: Give feature sets owners and documentation
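Rules #8 and #10 combine naturally into a pre-serving check. A sketch, assuming the model artifact's file modification time reflects its training time; the 24-hour limit is made up:

    import os, time

    MAX_MODEL_AGE_SECONDS = 24 * 3600  # hypothetical freshness requirement

    def assert_model_fresh(model_path):
        """Refuse to serve a stale model instead of silently degrading."""
        age = time.time() - os.path.getmtime(model_path)
        if age > MAX_MODEL_AGE_SECONDS:
            raise RuntimeError(
                f"model {model_path} is {age / 3600:.1f}h old; "
                "the retraining pipeline may have silently failed")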
What should be tested/monitored in ML sys. [Breck]
Testing features and data
Test distribution, correlation, other statistical properties, cost of each feature ...
Testing model development
Test off-line scores vs. on-line performance (e.g., via A/B test), impact of
hyperparameters, impact of model freshness, quality on data slices,
comparison with simple baseline, ...
Testing ML infrastructure
Reproducibility of training, model quality before serving, fast roll-backs to
previous versions, ...
Monitoring ML in production
NaNs or infinities in the output, computational performance problems or RAM
usage, decrease in quality of results, ...
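Two of these checks written as unit tests, as one possible sketch (pytest-style; assumes numpy; model, X_val, and y_val stand for fixtures you would provide):

    import numpy as np

    def test_predictions_are_finite(model, X_val):
        preds = model.predict(X_val)
        assert np.all(np.isfinite(preds)), "NaN/inf in model output"

    def test_beats_majority_class_baseline(model, X_val, y_val):
        acc = (model.predict(X_val) == y_val).mean()
        baseline = max(np.mean(y_val == c) for c in np.unique(y_val))
        assert acc > baseline, f"model {acc:.3f} <= baseline {baseline:.3f}"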
Other areas of ML-related debt [Sculley]
Culture
Deletion of features, reduction of complexity, and improvements in
reproducibility, stability, and monitoring are valued the same as (or more
than!) improvements in accuracy

(...) This is most likely to occur within heterogeneous teams with strengths in
both ML research and engineering

Reproducibility debt
ML-system behaviour is difficult to reproduce exactly because of randomized
algorithms, non-determinism inherent in parallel processing, reliance on initial
conditions, interactions with the external world, ...

Data testing debt
ML converts data into code. For that code to be correct, the data need to be
correct. But how do you test data? (one sketch below)

Process management debt
How are deployment, maintenance, configuration, and recovery of the
infrastructure handled? Bad smell: a lot of manual work
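One possible answer to the data-testing question, sketched with made-up field names and bounds: validate every batch against an explicit schema of types and plausible ranges before it reaches training.

    SCHEMA = {
        "age": (int, 0, 120),
        "clicks_7d": (int, 0, 10_000),
        "ctr": (float, 0.0, 1.0),
    }

    def validate_row(row):
        """Return a list of schema violations for one input row."""
        errors = []
        for field, (typ, lo, hi) in SCHEMA.items():
            value = row.get(field)
            if not isinstance(value, typ):
                errors.append(f"{field}: expected {typ.__name__}, got {value!r}")
            elif not lo <= value <= hi:
                errors.append(f"{field}: {value} outside [{lo}, {hi}]")
        return errors

    print(validate_row({"age": 250, "clicks_7d": 12, "ctr": 0.3}))
    # ['age: 250 outside [0, 120]']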
Measuring technical debt [Sculley]
Does improving one model or signal degrade others?
What is the transitive closure of all data dependencies?
How easily can an entirely new algorithmic approach be tested at full scale?
How precisely can the impact of a new change to the system be measured?
How quickly can new members of the team be brought up to speed?
Thank you!
Questions?
@lopusz