When a machine learning researcher and a software engineer walk into a bar
Presented at Data Science Slovenia, May 2016
by Paul Lam (@quantisan)
Multi-disciplinary to Inter-disciplinary
Data Scientist
Engineers
Designers
Product Managers
Users
Analysts
Ops
Project Managers
User Experience
Marketing
Sales
Marilyn Stember, "Advancing the social sciences through the interdisciplinary enterprise," The Social Science Journal, vol. 28, no. 1, pp. 1-14, 1991.
Real engagement makes people happy, fulfilled, heard, and motivated.
Researcher, Engineer, Designer, and Product Manager
[Diagram: Machine Learning System]
Problem 1: Research Code to Production Code
[Diagram: Machine Learning System]
Research Code to Production Code - Our Options
Use a framework, e.g. TensorFlow
 - Pros: Leverage community engagement
 - Cons: Too early for us to be tied to a framework
Rewrite in Java/Scala/Clojure
 - Pros: Performance, the engineering team's experience
 - Cons: Performance is not an issue for now; double the work
Refactor
 - Pros: Least immediate effort for both researcher and engineer
 - Cons: Best of both worlds, or worst of both worlds?
Who calls our machine learning component?
[Diagram: Machine Learning System, with its callers marked "here" and "here"]
Code sharing with an ML Python module
Jupyter Notebook:
    $ pip install burt
    import burt
    b = burt.Burt(coll)
    b.getExperimentBatch()
    ...

Production Server:
    $ pip install burt
    import burt
    b = burt.Burt(coll)
    b.getExperimentBatch()

Private PyPI on S3 - https://github.com/novemberfiveco/s3pypi
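The point of the slide is that the notebook and the production server install and import the same package. The slides do not show what is inside `burt`, so the following is only a hypothetical stand-in: the class name `ExperimentBatcher`, the assumption that `coll` is a sequence of records with an `"id"` key, and the hash-based variant assignment are all illustrative. Method names follow PEP 8 rather than the slide's camelCase.

```python
"""Hypothetical stand-in for a shared ML module such as the private `burt` package.

The notebook and the production server would both `pip install` and import
this same code, so research behaviour and production behaviour cannot drift apart.
"""
import hashlib
from typing import Dict, List, Sequence


class ExperimentBatcher:
    """Assigns contacts to experiment variants (illustrative only)."""

    def __init__(self, coll: Sequence[Dict[str, str]],
                 variants: Sequence[str] = ("A", "B")) -> None:
        # Assumption: `coll` is a sequence of contact records, each with an "id" key.
        self.coll = list(coll)
        self.variants = list(variants)

    def _variant_for(self, contact_id: str) -> str:
        # A stable hash (unlike Python's per-process salted hash()) puts a
        # contact in the same variant in every process, notebook or server.
        digest = hashlib.sha256(contact_id.encode("utf-8")).hexdigest()
        return self.variants[int(digest, 16) % len(self.variants)]

    def get_experiment_batch(self, size: int) -> List[Dict[str, str]]:
        """Return the first `size` contacts, each tagged with its variant."""
        return [
            {"id": record["id"], "variant": self._variant_for(record["id"])}
            for record in self.coll[:size]
        ]
```

In a notebook, `ExperimentBatcher([{"id": "contact-1"}, {"id": "contact-2"}]).get_experiment_batch(2)` gives exactly the batch the production server computes for the same input, which is the property a shared package buys.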
Engineering practices to consider when refactoring
 - Libraries vs Services
 - Data persistence
 - States vs functions
 - Single responsibility principle
 - Don't repeat yourself (DRY)
 - Documentation
 - Continuous integration
 - Style standards
 - Testing
Share best practices
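To make "States vs functions" and "Testing" concrete, here is a hypothetical example that is not from the talk: add-alpha (Laplace) smoothing is only a stand-in computation. The stateful version behaves differently depending on how many times a notebook cell was run; the function version always returns the same output for the same input and is trivial to unit test.

```python
from typing import Dict


class SmoothedCounts:
    """Stateful, notebook style: smoothing mutates the stored counts in place."""

    def __init__(self, counts: Dict[str, int]) -> None:
        self.counts = counts  # shares (and will mutate) the caller's dict

    def smooth(self) -> None:
        # Re-running this adds 1 again: the result depends on call history.
        for key in self.counts:
            self.counts[key] += 1


def smoothed_rates(counts: Dict[str, int], alpha: int = 1) -> Dict[str, float]:
    """Function style: add-alpha smoothing that leaves `counts` untouched."""
    total = sum(counts.values()) + alpha * len(counts)
    return {key: (value + alpha) / total for key, value in counts.items()}


def test_smoothed_rates() -> None:
    counts = {"A": 0, "B": 2}
    assert smoothed_rates(counts) == {"A": 0.25, "B": 0.75}
    assert smoothed_rates(counts) == {"A": 0.25, "B": 0.75}  # same input, same output
    assert counts == {"A": 0, "B": 2}  # input was not mutated
```

Calling `smooth()` twice on `SmoothedCounts({"A": 0, "B": 2})` leaves `{"A": 2, "B": 4}`, a bug that only shows up when cells are re-run; the pure function has no such history.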
Problem 2: Interfacing with the ML module
[Diagram: Machine Learning System]
Machine learning as an internal API
[Diagram: Client → RESTful API → Service (Machine Learning System)]
Microservice - http://martinfowler.com/articles/microservices.html
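A minimal sketch of "machine learning as an internal API", using only the standard library's WSGI interface (PEP 3333) rather than the frameworks the talk cites. The route `/experiment-batch`, the `size` query parameter, and the placeholder `get_experiment_batch` are assumptions for illustration; the talk does not show its real endpoint.

```python
import json
from typing import Callable, Dict, List, Sequence, Tuple
from urllib.parse import parse_qs


def get_experiment_batch(size: int) -> List[Dict[str, str]]:
    """Placeholder for the ML module call; the real logic is not shown in the talk."""
    return [{"contact": f"contact-{i}", "variant": "AB"[i % 2]} for i in range(size)]


def _respond(start_response: Callable, status: str, payload: dict,
             extra_headers: Sequence[Tuple[str, str]] = ()) -> List[bytes]:
    body = json.dumps(payload).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))] + list(extra_headers)
    start_response(status, headers)
    return [body]


def app(environ: dict, start_response: Callable) -> List[bytes]:
    """WSGI app exposing the ML component as a small internal REST service."""
    if environ.get("PATH_INFO") != "/experiment-batch":
        return _respond(start_response, "404 Not Found", {"error": "not found"})
    if environ.get("REQUEST_METHOD") != "GET":
        return _respond(start_response, "405 Method Not Allowed",
                        {"error": "use GET"}, [("Allow", "GET")])
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        size = int(query.get("size", ["10"])[0])
    except ValueError:
        return _respond(start_response, "400 Bad Request", {"error": "size must be an integer"})
    if size < 1:
        return _respond(start_response, "400 Bad Request", {"error": "size must be positive"})
    return _respond(start_response, "200 OK", {"experiment": get_experiment_batch(size)})


def call(method: str, path: str, query: str = "") -> Tuple[str, dict]:
    """Invoke the app in-process, as a unit test can, without opening a socket."""
    captured = {}

    def start_response(status: str, headers: list) -> None:
        captured["status"] = status

    environ = {"REQUEST_METHOD": method, "PATH_INFO": path, "QUERY_STRING": query}
    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

For local experiments, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` serves it over HTTP; the design choice is that clients depend only on the HTTP contract, so the ML internals can change without touching callers.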
Sequence Diagram - activate campaign
Participants: System, ML, External
1. Activate email campaign
2. loadContacts() returns contacts
3. initializeCampaign() returns campaign ID
4. getExperimentBatch() returns experiment
5. Success
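The diagram lists the calls in order but not which participant handles each one. A minimal sketch, assuming that "External" supplies the contacts, "ML" owns initializeCampaign() and getExperimentBatch(), and "System" orchestrates the sequence; every class name, the campaign ID format, and the alternating two-variant split are hypothetical.

```python
from typing import Dict, List


class ExternalContactsSource:
    """Stand-in for the 'External' participant (assumed to supply contacts)."""

    def load_contacts(self) -> List[str]:
        return ["ana@example.com", "bor@example.com", "cene@example.com"]


class MLService:
    """Stand-in for the 'ML' participant; its internals are hypothetical."""

    def __init__(self) -> None:
        self._campaigns: Dict[str, List[str]] = {}

    def initialize_campaign(self, contacts: List[str]) -> str:
        campaign_id = f"campaign-{len(self._campaigns) + 1}"
        self._campaigns[campaign_id] = list(contacts)
        return campaign_id

    def get_experiment_batch(self, campaign_id: str) -> Dict[str, List[str]]:
        contacts = self._campaigns[campaign_id]
        # Alternate contacts between two variants (illustrative split only).
        return {"A": contacts[0::2], "B": contacts[1::2]}


def activate_email_campaign(external: ExternalContactsSource, ml: MLService) -> Dict[str, object]:
    """The 'System' participant: makes the calls in the diagram's order."""
    contacts = external.load_contacts()                # loadContacts() -> contacts
    campaign_id = ml.initialize_campaign(contacts)     # initializeCampaign() -> campaign ID
    experiment = ml.get_experiment_batch(campaign_id)  # getExperimentBatch() -> experiment
    return {"status": "success", "campaign_id": campaign_id, "experiment": experiment}
```

Keeping the orchestration in one function mirrors the diagram line for line, which makes it easy for researcher and engineer to check that the code and the agreed sequence match.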
API definition: getExperimentBatch()
[Table: Input | Output]
Our internal machine learning API
[1] swagger.io, [2] flask-restplus
Specifications over documentation
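"Specifications over documentation" means the API description is machine-readable, so it can be checked rather than just read. A minimal sketch: the top-level keys and Parameter Object fields below follow the Swagger 2.0 specification, but the path, the `campaign_id` and `size` parameters, and the small `validate_query` checker are assumptions for illustration, not the talk's actual spec or tooling.

```python
from typing import Dict, List

# Swagger 2.0-style description of a hypothetical endpoint.
SPEC = {
    "swagger": "2.0",
    "info": {"title": "Internal ML API", "version": "0.1"},
    "paths": {
        "/experiment-batch": {
            "get": {
                "parameters": [
                    {"name": "campaign_id", "in": "query", "type": "string", "required": True},
                    {"name": "size", "in": "query", "type": "integer", "required": False},
                ],
                "responses": {"200": {"description": "An experiment batch"}},
            }
        }
    },
}


def _is_int(value: str) -> bool:
    try:
        int(value)
    except ValueError:
        return False
    return True


_TYPE_CHECKS = {"string": lambda value: True, "integer": _is_int}


def validate_query(path: str, method: str, query: Dict[str, str]) -> List[str]:
    """List the ways `query` violates SPEC; an empty list means it conforms."""
    operation = SPEC["paths"].get(path, {}).get(method.lower())
    if operation is None:
        return [f"{method.upper()} {path} is not in the spec"]
    params = [p for p in operation.get("parameters", []) if p["in"] == "query"]
    errors = []
    for param in params:
        name = param["name"]
        if name not in query:
            if param.get("required", False):
                errors.append(f"missing required parameter '{name}'")
        elif not _TYPE_CHECKS[param["type"]](query[name]):
            errors.append(f"parameter '{name}' should be {param['type']}")
    known = {p["name"] for p in params}
    errors.extend(f"unknown parameter '{name}'" for name in sorted(set(query) - known))
    return errors
```

For example, `validate_query("/experiment-batch", "GET", {"size": "five"})` reports both the missing `campaign_id` and the non-integer `size`, the kind of disagreement that prose documentation lets slip through.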
Problem 3: We are used to our own jargon
[Diagram: Machine Learning System]
Mysteriously failing unit test on getExperimentBatch()
When GET is more than GET
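The slides do not explain why the test failed. One plausible reading, offered purely as a hypothetical illustration: to an engineer "get" means a side-effect-free read (HTTP semantics, RFC 7231 and now RFC 9110, define GET as a safe method), while a researcher's getExperimentBatch() may also update model state, so calling it twice returns different results. The `ExperimentModel` class and its least-pulled-first rule are stand-ins, not the talk's model.

```python
from typing import Dict, List


def select_arms(pulls: Dict[str, int], size: int) -> List[str]:
    """Pure selection: pick `size` arms, least-pulled first, on a private copy."""
    counts = dict(pulls)
    batch = []
    for _ in range(size):
        arm = min(counts, key=counts.get)  # ties go to the earliest arm in dict order
        counts[arm] += 1
        batch.append(arm)
    return batch


class ExperimentModel:
    """Hypothetical model; the arm-selection rule is a stand-in, not the talk's."""

    def __init__(self, arms: List[str]) -> None:
        self.pulls: Dict[str, int] = {arm: 0 for arm in arms}

    def get_experiment_batch(self, size: int) -> List[str]:
        """Researcher's 'get': select arms AND record them, so state changes."""
        batch = select_arms(self.pulls, size)
        self.record(batch)
        return batch

    def peek_experiment_batch(self, size: int) -> List[str]:
        """Engineer's 'get': a safe read that never changes the model."""
        return select_arms(self.pulls, size)

    def record(self, batch: List[str]) -> None:
        """The state change, made explicit and separately testable."""
        for arm in batch:
            self.pulls[arm] += 1
```

With two arms, two successive `get_experiment_batch(3)` calls return `["A", "B", "A"]` and then `["B", "A", "B"]`, so a unit test that expects a getter to be repeatable fails; `peek_experiment_batch` returns the same batch every time. Splitting the read from the write lets each side keep its own meaning of "get".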
Pair Programming
Pair up
Summary
[Diagram: Machine Learning System]
