This document discusses challenges that arise when machine learning researchers collaborate with software engineers. It presents three key problems: 1) Transitioning research code to production-ready code, 2) Interfacing machine learning models with software, and 3) Differences in terminology between machine learning and software engineering. The document explores options for refactoring code to be shared between teams, proposes designing machine learning as an internal API, and emphasizes the importance of pair programming to overcome differences in backgrounds and jargon.
When a machine learning researcher and a software engineer walk into a bar
1. When a machine learning researcher and a software engineer walk into a bar
Presented at Data Science Slovenia, May 2016, by Paul Lam (@quantisan)
2. Multi-disciplinary to Inter-disciplinary
Data Scientist, Engineers, Designers, Product Managers, Users, Analysts, Ops, Project Managers, User Experience, Marketing, Sales
Marilyn Stember, "Advancing the social sciences through the interdisciplinary enterprise," The Social Science Journal, vol. 28, no. 1, pp. 1-14, 1991
6. Research Code to Production Code - Our Options
Use a framework, e.g. TensorFlow
Pros: Leverage community engagement
Cons: Too early for us to be tied to a framework
Re-write to Java/Scala/Clojure
Pros: Performance, engineering team experience
Cons: Performance not an issue for now, double the work
Refactor
Pros: Least immediate effort for both researcher and engineer
Cons: Best of both worlds or worst of both worlds?
7. Who calls our machine learning component?
Diagram: Machine Learning, System (call sites marked "here and here")
8. Code sharing with an ML Python module
Jupyter Notebook
$ pip install burt
import burt
b = burt.Burt(coll)
b.getExperimentBatch()
...
Production Server
$ pip install burt
import burt
b = burt.Burt(coll)
b.getExperimentBatch()
Private PyPI on S3 - https://github.com/novemberfiveco/s3pypi
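To make the shared-module idea concrete, here is a minimal sketch of what a package installed in both the notebook and the production server might contain. It is not the real `burt` package, whose internals the slides do not show: `ExperimentPlanner`, `Experiment` and `get_experiment_batch` are hypothetical names, and uniform random assignment is only a placeholder for an actual model.

```python
"""experiments.py: a hypothetical shared module, NOT the real burt package.

The research notebook and the production server would both import this one
module instead of maintaining two copies of the same logic.
"""
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Experiment:
    """One contact assigned to one variant (illustrative data model)."""
    contact_id: str
    variant: str


class ExperimentPlanner:
    """Assigns contacts to variants; every name here is hypothetical."""

    def __init__(self, variants: Sequence[str], seed: int = 0) -> None:
        if not variants:
            raise ValueError("need at least one variant")
        self.variants = list(variants)
        # A private, seeded RNG makes the same call sequence reproducible
        # in the notebook and on the server.
        self._rng = random.Random(seed)

    def get_experiment_batch(self, contact_ids: Sequence[str]) -> List[Experiment]:
        # Placeholder for the real model: uniform random assignment.
        return [Experiment(cid, self._rng.choice(self.variants)) for cid in contact_ids]
```

Both environments would then make the identical call, e.g. `ExperimentPlanner(["A", "B"], seed=1).get_experiment_batch(["c1", "c2"])`, so a fix by either the researcher or the engineer reaches both places on the next upgrade of the package.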
9. Engineering practices to consider when refactoring
Libraries vs Services
Data persistence
States vs functions
Single responsibility principle
Don't repeat yourself (DRY)
Documentation
Continuous integration
Style standards
Testing
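As an illustration of "states vs functions" and of testing, the sketch below computes the same per-variant click rate twice: once with a stateful class and once with a pure function. The scenario (variants and click events) is hypothetical. The pure function is easier to unit-test and to share between notebook and server, because its result depends only on its arguments rather than on which calls happened earlier.

```python
from typing import Dict, List, Tuple


# Stateful style: the answer depends on the history of record() calls.
class RunningScorer:
    def __init__(self) -> None:
        self.counts: Dict[str, Tuple[int, int]] = {}  # variant -> (shown, clicks)

    def record(self, variant: str, clicked: bool) -> None:
        shown, clicks = self.counts.get(variant, (0, 0))
        self.counts[variant] = (shown + 1, clicks + int(clicked))

    def rate(self, variant: str) -> float:
        shown, clicks = self.counts.get(variant, (0, 0))
        return clicks / shown if shown else 0.0


# Functional style: the same events always produce the same rates.
def click_rates(events: List[Tuple[str, bool]]) -> Dict[str, float]:
    shown: Dict[str, int] = {}
    clicks: Dict[str, int] = {}
    for variant, clicked in events:
        shown[variant] = shown.get(variant, 0) + 1
        clicks[variant] = clicks.get(variant, 0) + int(clicked)
    return {v: clicks[v] / shown[v] for v in shown}


# A unit test needs no setup or teardown for the pure function.
def test_click_rates() -> None:
    events = [("A", True), ("A", False), ("B", True)]
    assert click_rates(events) == {"A": 0.5, "B": 1.0}
```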
12. Machine learning as an internal API
Diagram: Client and Service, with the Machine Learning System behind a RESTful API
Microservice - http://martinfowler.com/articles/microservices.html
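A minimal sketch of the model behind a JSON-over-HTTP endpoint, using only Python's standard `wsgiref` module so it has no dependencies; a team would more likely pick a web framework. The `/predict` route, the `{"features": [...]}` payload and `predict()` itself are hypothetical stand-ins for whatever the ML system actually exposes.

```python
"""A minimal JSON-over-HTTP endpoint for a model, standard library only.

The /predict route, the {"features": [...]} payload and predict() are all
hypothetical placeholders.
"""
import json
from wsgiref.simple_server import make_server


def predict(features):
    # Placeholder model: the mean of the inputs. The shared ML module goes here.
    return {"score": sum(features) / len(features) if features else 0.0}


def _respond(start_response, status, body):
    # Serialize a dict as a JSON response with the given HTTP status line.
    payload = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(payload)))])
    return [payload]


def app(environ, start_response):
    """WSGI application: POST /predict with {"features": [numbers]}."""
    if environ.get("PATH_INFO") != "/predict":
        return _respond(start_response, "404 Not Found", {"error": "not found"})
    if environ.get("REQUEST_METHOD") != "POST":
        return _respond(start_response, "405 Method Not Allowed", {"error": "use POST"})
    length = int(environ.get("CONTENT_LENGTH") or 0)
    raw = environ["wsgi.input"].read(length)
    try:
        result = predict(json.loads(raw)["features"])
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing "features" key, or non-numeric features.
        return _respond(start_response, "400 Bad Request",
                        {"error": "expected {\"features\": [numbers]}"})
    return _respond(start_response, "200 OK", result)


if __name__ == "__main__":
    # Serve on localhost:8000 until interrupted.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

A client service would send `POST /predict` with the body `{"features": [1, 2, 3]}` and receive `{"score": 2.0}`. Keeping the contract this narrow lets the researcher change the model behind `predict()` without the calling service noticing.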
13. Sequence Diagram - activate campaign
Participants: System, ML, External
Activate email campaign
loadContacts() returns contacts
initializeCampaign() returns campaign ID
getExperimentBatch() returns experiment
Success
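The "activate campaign" sequence can be sketched as an orchestration function that talks to the ML system only through a narrow interface. The slide does not show arrow directions, so this assumes the System drives every call, `loadContacts()` is served by a contact store, and `initializeCampaign()` and `getExperimentBatch()` are served by the ML system. The snake_case names, `ContactStore`, `MLService` and both in-memory fakes are hypothetical.

```python
"""Sketch of the 'activate campaign' sequence with ML behind an interface.

Assumed (not stated on the slide): the System orchestrates every call;
load_contacts() comes from a contact store; initialize_campaign() and
get_experiment_batch() come from the ML system.
"""
from typing import Dict, List, Protocol


class ContactStore(Protocol):
    def load_contacts(self) -> List[str]: ...


class MLService(Protocol):
    def initialize_campaign(self, contacts: List[str]) -> str: ...
    def get_experiment_batch(self, campaign_id: str) -> List[Dict[str, str]]: ...


def activate_campaign(store: ContactStore, ml: MLService) -> str:
    """Run the calls in the order shown in the sequence diagram."""
    contacts = store.load_contacts()                   # loadContacts() -> contacts
    campaign_id = ml.initialize_campaign(contacts)     # initializeCampaign() -> campaign ID
    experiment = ml.get_experiment_batch(campaign_id)  # getExperimentBatch() -> experiment
    if not experiment:
        raise RuntimeError(f"empty experiment batch for {campaign_id}")
    return "Success"


# Hypothetical in-memory fakes, so the flow can be tested without a real model.
class InMemoryContactStore:
    def __init__(self, contacts: List[str]) -> None:
        self._contacts = list(contacts)

    def load_contacts(self) -> List[str]:
        return list(self._contacts)


class FakeMLService:
    def __init__(self) -> None:
        self.campaigns: Dict[str, List[str]] = {}

    def initialize_campaign(self, contacts: List[str]) -> str:
        campaign_id = f"campaign-{len(self.campaigns) + 1}"
        self.campaigns[campaign_id] = list(contacts)
        return campaign_id

    def get_experiment_batch(self, campaign_id: str) -> List[Dict[str, str]]:
        # Every contact gets variant "A"; a real service would choose per contact.
        return [{"contact": c, "variant": "A"} for c in self.campaigns[campaign_id]]
```

Because `typing.Protocol` checks structure rather than inheritance, any object with matching methods fits: the engineer can test `activate_campaign(InMemoryContactStore(["ana@example.com"]), FakeMLService())`, which returns `"Success"`, while the researcher swaps in the real ML client later.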