Machine Learning
In easy pieces
Sakshi Ganeriwal
Behavioral Analytics
©2018 PayPal Inc. Confidential and proprietary.
I am planning to buy
Hmm! It takes a lot of time
Why do I get weird suggestions
Got it! Let me place the order
Transaction Failed
Artificial Intelligence
Where have we seen AI?
Narrow AI
A computer system that uses learning or other methods to solve a particular problem.
Specific
Learning mechanisms
Not extensive
Example: personal assistants
General AI
A computer system that operates like a human brain. Solves new problems on the spot.
Learns context
Consciousness
Adaptive
ARTIFICIAL INTELLIGENCE
MACHINE LEARNING
DEEP MACHINE LEARNING
NATURAL LANGUAGE PROCESSING
SPEECH RECOGNITION
EXPERT SYSTEMS
Over the years
1950 - TURING TEST: a test to judge whether machines exhibit human intelligence
1956 - DARTMOUTH CONFERENCE: the first Artificial Intelligence conference
1997 - DEEP BLUE: IBM's computer beat the world chess champion, after losing to him the year before
2004 - SPIRIT AND OPPORTUNITY: NASA's robotic exploration rovers autonomously navigate the surface of Mars
2011 - WATSON WINS JEOPARDY: Watson beats two previous winners of the contest
2012 - LET'S FIND CATS: Google Brain's neural network learned to identify cats with about 75% accuracy after being fed 10 million images from YouTube videos
2016 - LITERATURE: the screenplay Sunspring, the pop song Daddy's Car, a Japanese AI novel and dark poems
Why are AI & ML famous now?
3 PILLARS:
COMPUTATION
DATA
STATISTICAL MODELS
What is Machine Learning?
Humans: learn from experience
Computers: follow instructions
Machine Learning: computers learn from experience (data)
Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
How does machine learning work?
BREAK DOWN THE PROCESS INTO THREE COMPONENTS:
INPUT (AKA DATA SET)
ALGORITHM (AKA MODELS)
OUTPUT (AKA TARGET LABELS / GROUPED OUTPUT)
Inputs: the data that powers ML
FROM SOURCE CODE TO STATISTICS, DATA SETS CAN CONTAIN JUST ABOUT ANYTHING
GSA / data - Assorted data from the General Services Administration.
GoogleTrends / data - An index of all open-source data
nationalparkservice / data - An unofficial repository of National Park Service data.
fivethirtyeight / data - Data and code behind the stories and interactives at FiveThirtyEight
beamandrew / medical-data
src-d / awesome-machine-learning-on-source-code - Interesting links & research papers related to Machine Learning applied to source code
ImageNet - large visual database designed for use in visual object recognition software research
Algorithms: how data is processed and analyzed
CATEGORIES:
SUPERVISED LEARNING
UNSUPERVISED LEARNING
REINFORCEMENT LEARNING
Supervised Learning
A target output is aimed for, and the system learns from the example data provided.
"We can give examples, but we cannot give an algorithm to get from input to output."
Unsupervised Learning
No particular end goal. Complex structured data is provided to the system, which extracts insights from it.
"We have some data, but we have no idea where to start looking for useful/interesting stuff."
Reinforcement Learning
Feedback is provided on the actions taken by the system, which uses it to learn further.
"We have no idea how to do something, but we can say whether it has been done right or wrong."
Reinforcement Learning
Supervised Learning
Which algorithm to use?
Price of House
[Figure: house price (in units of 10,00,000) plotted against the size of the house (in units of 1000 sq ft). Two houses are priced at 70 lakhs and 1.6 crore; the price of a third house is marked "?".]
What is the best estimate for the price of the house?
Linear Regression
[Figure: the same plot with a straight line fitted through the known prices, with 11 marked on the price axis.]
ERROR: the sum of the per-point errors
Gradient Descent
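As a rough illustration of how gradient descent can fit the line in the house-price example, here is a minimal sketch. It assumes mean squared error as the error measure (the slide does not say which one is used), and the sizes and prices are made-up values in the slide's units, not data from the talk.

```python
def fit_line(xs, ys, learning_rate=0.005, steps=50_000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Signed error of the current line on every training point.
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        # Step downhill.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical data: size in 1000 sq ft, price in units of 10,00,000.
sizes = [5.0, 8.0, 10.0, 12.0]
prices = [7.0, 10.6, 13.0, 15.4]

slope, intercept = fit_line(sizes, prices)
estimate = slope * 9.0 + intercept  # estimated price of a size-9 house
print(f"price = {slope:.2f} * size + {intercept:.2f}; estimate = {estimate:.2f}")
```

The learning rate is kept small because the sizes are not rescaled; a larger step would make the updates diverge on this data.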
Detecting Spam e-mails
100 emails: 25 spam, 75 non-spam
If an email contains the word "cheap", what is the probability of it being spam?
Emails containing "cheap": 20 spam, 5 non-spam
So an email containing "cheap" is spam with probability 80% and non-spam with probability 20%.
Probability of spam given each feature:
Cheap: 80%
Spelling mistake: 70%
Missing title: 95%
etc.
Naive Bayes Algorithm
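The arithmetic behind the spam example can be sketched as follows. The first function reproduces the 80% figure from the counts (20 spam and 5 non-spam emails containing "cheap"). The second shows one way the per-feature probabilities could be combined, assuming the features are conditionally independent given the class (the "naive" assumption) and that the 25-in-100 spam rate is the prior. The function names are illustrative, not taken from the talk.

```python
def posterior_from_counts(spam_with_feature, non_spam_with_feature):
    """P(spam | feature), estimated directly from counts."""
    return spam_with_feature / (spam_with_feature + non_spam_with_feature)


def combine_naive_bayes(prior, posteriors):
    """Combine single-feature posteriors P(spam | f_i) into P(spam | f_1, ..., f_n).

    Under the naive independence assumption, each feature multiplies the
    prior odds by (posterior odds for that feature) / (prior odds).
    """
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in posteriors:
        odds *= (p / (1.0 - p)) / prior_odds
    return odds / (1.0 + odds)


p_cheap = posterior_from_counts(20, 5)  # 20 / 25
p_all = combine_naive_bayes(prior=25 / 100, posteriors=[0.80, 0.70, 0.95])
print(f"P(spam | cheap) = {p_cheap:.2f}")
print(f"P(spam | cheap, spelling mistake, missing title) = {p_all:.4f}")
```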
Acceptance at a University
Each applicant has two scores: Test and Grades.
Student 1 - Test: 9/10, Grades: 8/10
Student 2 - Test: 3/10, Grades: 4/10
Student 3 - Test: 7/10, Grades: 6/10
[Figure: applicants plotted with Test (1 to 10) on the horizontal axis and Grades (1 to 10) on the vertical axis.]
Quiz: Does Student 3 (Test: 7/10, Grades: 6/10) get accepted? Yes / No
Logistic Regression
[Figure: the same Test vs. Grades plot and quiz, now titled Logistic Regression.]
ERROR: 2
Gradient Descent
Log-loss function
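A minimal sketch of logistic regression trained by gradient descent on the log-loss, in the spirit of the acceptance example. The applicants and their accept/reject labels below are invented for illustration (the slides do not list the training data), and rescaling the scores to the range -1 to 1 is a convenience chosen here, not something the talk prescribes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scale(score):
    """Map a 0-10 score to roughly -1..1 so gradient descent behaves well."""
    return (score - 5.0) / 5.0


def train_logistic(rows, labels, learning_rate=0.5, steps=10_000):
    """Gradient descent on the mean log-loss of a two-feature logistic model."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(rows)
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for (x1, x2), y in zip(rows, labels):
            p = sigmoid(weights[0] * x1 + weights[1] * x2 + bias)
            error = p - y  # derivative of the log-loss w.r.t. the linear score
            grad_w[0] += error * x1 / n
            grad_w[1] += error * x2 / n
            grad_b += error / n
        weights[0] -= learning_rate * grad_w[0]
        weights[1] -= learning_rate * grad_w[1]
        bias -= learning_rate * grad_b
    return weights, bias


# Hypothetical past applicants as (test, grades) with 1 = accepted, 0 = rejected.
applicants = [(9, 8), (8, 7), (7, 9), (6, 8), (3, 4), (2, 5), (4, 2), (5, 3)]
accepted = [1, 1, 1, 1, 0, 0, 0, 0]

rows = [(scale(t), scale(g)) for t, g in applicants]
weights, bias = train_logistic(rows, accepted)


def predict(test, grades):
    """Estimated probability of acceptance."""
    return sigmoid(weights[0] * scale(test) + weights[1] * scale(grades) + bias)


print(f"P(accepted | test=7, grades=6) = {predict(7, 6):.3f}")
```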
Acceptance at a University
[Figure: Test (0 to 10) on the horizontal axis against Grades (0 to 10) on the vertical axis, repeated over six slides.]
Student 3 - Test: 9/10, Grades: 6/10
Student 4 - Test: 9/10, Grades: 1/10
Gradient Descent
Neural Network
Logistic Regression & Neural Networks
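One way to picture the link between logistic regression and neural networks is a tiny network whose hidden units are themselves logistic regressions. The sketch below is an assumption-heavy illustration: the weights are set by hand rather than learned, and the rule it encodes (an applicant needs a reasonable score on both Test and Grades) is a guess at what the slides depict, not something stated in the talk.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def accept_probability(test, grades):
    # Hidden unit 1: a logistic regression on the test score alone
    # (hypothetical threshold at 5/10).
    good_test = sigmoid(2.0 * (test - 5.0))
    # Hidden unit 2: a logistic regression on the grades alone
    # (hypothetical threshold at 5/10).
    good_grades = sigmoid(2.0 * (grades - 5.0))
    # Output unit: a logistic regression on the two hidden outputs,
    # firing only when both are high (a soft AND).
    return sigmoid(10.0 * (good_test + good_grades - 1.5))


p_student_3 = accept_probability(9, 6)  # Student 3: Test 9/10, Grades 6/10
p_student_4 = accept_probability(9, 1)  # Student 4: Test 9/10, Grades 1/10
print(f"Student 3: {p_student_3:.3f}, Student 4: {p_student_4:.3f}")
```

A single straight line cannot express "high on both scores" this sharply; stacking logistic units gives the bent decision boundary.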
Convolutional NN
• Fixed-size inputs and outputs.
• Feed-forward artificial neural network.
• Uses a connectivity pattern.
• Learns to recognize patterns across space, e.g. when studying images.
• Breaks a component into subcomponents.
Recurrent NN
• Handles arbitrary input/output lengths.
• Internal memory to process arbitrary sequences of inputs.
• Uses time-series information, i.e. what I said last will impact what I say next.
• Ideal for text and speech analysis.
• Creates combinations of subcomponents (image captioning, text generation, language translation, etc.)
Did anyone order pizza?
[Figure: sequence of ten slides under this title.]
K-means clustering
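A minimal sketch of k-means clustering, assuming the pizza slides pose the usual problem of placing a few pizza parlours among customer locations. The coordinates and starting positions are invented for illustration.

```python
import math


def kmeans(points, centers, iterations=10):
    """Lloyd's algorithm: alternate between assigning points and moving centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: every point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: every center moves to the mean of its points
        # (a center with no points stays where it is).
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Hypothetical customer locations and initial parlour positions.
houses = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
parlours, served = kmeans(houses, centers=[(0, 0), (10, 10)])
print(parlours)
```

The result depends on the starting centers; in practice the algorithm is often run from several random starts.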
Did anyone order pizza?
[Figure: sequence of eight slides under this title; the last one is labeled "STOP" and "Too Big".]
Hierarchical clustering
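A minimal sketch of agglomerative hierarchical clustering with a stopping rule, which is one plausible reading of the "STOP / Too Big" slide: keep merging the two closest groups until the smallest remaining gap is too big. Single linkage (distance between the closest members) and the threshold value are assumptions made here, and the points are invented.

```python
import math


def agglomerate(points, max_distance):
    """Merge the two closest clusters repeatedly; stop when the gap is too big."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break  # STOP: the closest remaining pair is too far apart
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Hypothetical customer locations.
houses = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
groups = agglomerate(houses, max_distance=3.0)
print(groups)
```

Unlike k-means, the number of clusters is not chosen in advance; it falls out of the distance threshold.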
Supervised learning VS. Unsupervised learning
Source: Quora
Supervised learning PLUS Unsupervised learning
Unsupervised learning as feature engineering
E.g.: clustering + KNN, Matrix Factorization
One of the tricks in Deep Learning is how it combines unsupervised/supervised learning
E.g.: Stacked Autoencoders, training of CNN
OUTPUT
A FEW APPROACHES TO FINDING OUTPUTS INCLUDE:
CLASSIFICATION: ASSIGN A CATEGORY (LABEL) TO EACH ITEM IN A DATA SET
REGRESSION: GIVEN THE DATA, PREDICT THE MOST LIKELY VALUE FOR THE VARIABLE UNDER CONSIDERATION
CLUSTERING: GROUP THE DATA INTO SETS OF SIMILAR ITEMS
Machine Learning Infrastructure
ML infrastructure: Experimentation & Production
Option 1:
Favor experimentation and only invest in productionizing once something
shows results. E.g. Have ML researchers use R and then ask Engineers to
implement things in production when they work
Option 2:
Favor production and have researchers struggle to figure out how to run
experiments. E.g. Implement highly optimized code and have ML researchers
experiment only through data available in logs/DB
The two faces of your ML infrastructure
Optimal solution:
Have ML researchers experiment on iPython Notebooks using Python tools
(scikit-learn, Theano). Use same tools in production whenever possible,
implement optimized versions only when needed.
Implement abstraction layers on top of optimized implementations so they can
be accessed from regular/friendly experimentation tools
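The idea of an abstraction layer over optimized implementations might look like the following sketch. Everything here is hypothetical: the `Scorer` interface, both classes, and the `mean_score` helper are invented names, and the "optimized" variant is only a stand-in (it skips zero weights) for whatever production code a team actually has.

```python
from typing import Dict, Protocol, Sequence


class Scorer(Protocol):
    """Hypothetical interface shared by experimentation and production code."""

    def score(self, features: Sequence[float]) -> float: ...


class DenseLinearScorer:
    """Straightforward implementation, convenient in a notebook."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def score(self, features: Sequence[float]) -> float:
        return self.bias + sum(w * x for w, x in zip(self.weights, features))


class SparseLinearScorer:
    """Stand-in for an optimized version: only non-zero weights are stored."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.nonzero: Dict[int, float] = {i: w for i, w in enumerate(weights) if w != 0.0}
        self.bias = bias

    def score(self, features: Sequence[float]) -> float:
        return self.bias + sum(w * features[i] for i, w in self.nonzero.items())


def mean_score(scorer: Scorer, rows: Sequence[Sequence[float]]) -> float:
    """Experiment code depends only on the interface, not on the implementation."""
    return sum(scorer.score(row) for row in rows) / len(rows)


weights, bias = [0.5, 0.0, 2.0], 1.0
rows = [[1.0, 5.0, 1.0], [2.0, 7.0, 0.0]]
dense_result = mean_score(DenseLinearScorer(weights, bias), rows)
sparse_result = mean_score(SparseLinearScorer(weights, bias), rows)
print(dense_result, sparse_result)
```

Because the experiment code only calls `score`, swapping the implementation does not require rewriting the experiment.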
The untold story of Data Science and ML engineering
Is ML at a point at which you don't have to be a data scientist to take advantage of it?
There are good tools to get started, BUT
for state-of-the-art performance, one needs rigorous quantitative understanding
The data-driven ML innovation funnel
Data Research: data research & hypothesis building -> Data Science
AB Testing: online experimentation, AB testing analysis -> Data Science
ML Exploration / Product Design: ML solution building & implementation -> ML Engineering
My Go-To Sources
Coursera Deep Learning Specialization by deeplearning.ai
Luis Serrano YouTube video lecture [Udacity course]
DEMYSTIFYING DEEP LEARNING & AI
Data Skeptic Podcast
Talking Machines Podcast
ML Hackerearth
Kaggle Tutorial
https://github.com/collections/machine-learning
Libraries and tools
scikit-learn / scikit-learn - machine learning in Python
tensorflow / tensorflow - Computation using data flow graphs for scalable machine learning
Theano / Theano - Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
davisking / dlib - A toolkit for making real world machine learning and data analysis applications in C++
apache / predictionio - a machine learning server for developers and ML engineers. Built on Apache Spark, HBase and Spray.
Machine Learning Models & Algorithms | Amazon SageMaker on AWS - Build, train, and deploy machine learning models & algorithms at scale
KNIME - open source data analytics, reporting and integration platform.
MLlib | Apache Spark
Word2vec - group of related models that are used to produce word embeddings
GloVe - Unsupervised learning algorithm for obtaining vector representations for words
shogun-toolbox / shogun
Supervised Learning
Most industrialized.
igrigorik / decisiontree - ID3-based implementation of the ML Decision Tree algorithm
Unsupervised Learning
Popular approach in natural language processing (NLP).
keon / awesome-nlp - A curated list of resources dedicated to NLP
Reinforcement Learning
Used to develop self-driving cars or teach a robot how to manufacture an item.
openai / gym - A toolkit for developing and comparing reinforcement learning algorithms.
aikorea / awesome-rl - Reinforcement learning resources curated
EXAMPLES IN PRACTICE:
umutisik / Eigentechno - Principal Component Analysis on music loops
jpmckinney / tf-idf-similarity - Ruby gem to calculate the similarity between texts using tf*idf
scikit-learn-contrib / lightning - Large-scale linear classification, regression and ranking in Python
gwding / draw_convnet
Data Analytics - Program
Certificate in Statistics and Computational Data Science
Certificate programs in R-Programming and Statistics
Certificate Programs in Data Science (Microsoft Professional Program)
Certificate Programs in Machine Learning
CSCI E-81 Machine Learning and Data Mining (Harvard University)
Certificate in Machine Learning (University of Washington)
Artificial Intelligence Certifications
Artificial Intelligence Graduate Certificate (Stanford University)
Machine Learning at Columbia University (free Content, certification option)
Machine Learning at Georgia Tech (Free Content, certification option)
IBM Watson Certifications
Microsoft Machine Learning & AI Certification
PG Diploma in Machine Learning and AI - Upgrad and IIIT-B
Certified Artificial Intelligence Professional (Govt. of India with V Skills)
NVIDIA Deep Learning Programs
Getting started
Machine Learning
josephmisiti / awesome-machine-learning - A curated list of awesome Machine Learning frameworks, libraries and software.
ujjwalkarn / Machine-Learning-Tutorials - machine learning and deep learning tutorials, articles and other resources
Deep Learning
ChristosChristofidis / awesome-deep-learning - A curated list of awesome Deep Learning tutorials, projects and communities.
fastai / courses - fast.ai Courses
TensorFlow
jtoy / awesome-tensorflow - A curated list of dedicated TensorFlow resources (http://tensorflow.org)
nlintz / TensorFlow-Tutorials - Simple tutorials using Google's TensorFlow Framework
pkmital / tensorflow_tutorials - From the basics to slightly more interesting applications of Tensorflow
Reading Resources
Keras Document
Learn about image classification and neural networks
Visualizing and Understanding CNNs
AlexNet
VGGNet
GoogLeNet
Text Classification with Keras
IF YOU ONLY REMEMBER ONE THING FROM THIS TALK:
JUST BUILD SOMETHING WITH DATA
AND EXPECT TO GET STUCK AT IT FOR A WHILE
THE (SOMEWHAT) UNFORTUNATE TRUTH
Math & Machine Learning
• Linear Algebra
• Calculus
• Statistics
• Probability
• MIT Linear Algebra Open Course
• MIT Calculus Open Course
• MIT Stats and Probability Course
Thank You!!
sganeriwal@paypal.com
saganeriwal@gmail.com

More Related Content

Machine Learning in easy pieces

  • 1. Machine Learning In easy pieces Sakshi Ganeriwal
  • 2. Behavioral Analytics 息2018 PayPal Inc. Confidential and proprietary. I am planning to buy Hmm! It takes a lot of time Why do I get weird suggestions Got it! Let me place the order Transaction Failed
  • 4. Where have we seen AI?
  • 5. General AI 息2018 PayPal Inc. Confidential and proprietary. A computer system that uses learning or other methods to solve a particular problem. Specific Learning mechanisms Not Extensive A computer system that operates like a human brain. Solves new problems on the spot. Learn Context Consciousness Adaptive Narrow AI Personal Assistants
  • 6. 息2018 PayPal Inc. Confidential and proprietary. MACHINE LEARNING DEEP MACHINE LEARNING NATURAL LANGUAGE PROCESSING ARTIFICIAL INTELLIGENCE SPEECH RECOGNITION EXPERT SYSTEMS
  • 7. Over the years 息2018 PayPal Inc. Confidential and proprietary. test to judge whether machines exhibit human intelligence TURING TEST IBMs computer beat the world chess champion after losing 5 - till the year before DEEP BLUE Watson beats two previous winners of the contest WATSON WINS JEOPARDY first Artificial Intelligence conference DARTMOUTH CONFERENCE NASA's robotic exploration rovers autonomously navigate the surface of Mars SPIRIT AND OPPORTUNITY 20101950 1997 1956 2004 screenplay Sunspring, Daddys Car pop song, Japanese AI novel and dark poems LITERATURE Googles Deep Mind was able to identify cats with 75% accuracy after being fed 10 million YouTube videos LETS FIND CATS 2016 2012
  • 8. Why is AI & ML famous now? 息2018 PayPal Inc. Confidential and proprietary. COMPUTATION DATA STATISTICAL MODELS 3 PILLARS:
  • 9. What is Machine Learning? 息2018 PayPal Inc. Confidential and proprietary. Learn from experience Follow instructionsLearn from experience Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 10. What is Machine Learning? 息2018 PayPal Inc. Confidential and proprietary. Learn from experience Follow instructionsLearn from experience data Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 12. How does machine learning work? 息2018 PayPal Inc. Confidential and proprietary. INPUT ALGORITHM OUTPUT BREAK DOWN THE PROCESS INTO THREE COMPONENTS: AKA DATA SET AKA MODELS AKA TARGET LABELS/ GROUPED OUTPUT
  • 13. Inputs: the data that powers ML 息2018 PayPal Inc. Confidential and proprietary. FROM SOURCE CODE TO STATISTICS, DATA SETS CAN CONTAIN JUST ABOUT ANYTHING GSA / data - Assorted data from the General Services Administration. GoogleTrends / data - An index of all open-source data nationalparkservice / data - An unofficial repository of National Park Service data. fivethirtyeight / data - Data and code behind the stories and interactives at FiveThirtyEight beamandrew / medical-data src-d / awesome-machine-learning-on-source-code Interesting links & research papers related to Machine Learning applied to source code ImageNet - large visual database designed for use in visual object recognition software research
  • 14. Algorithms: how data is processed and analyzed SUPERVISED LEARNING UNSUPERVISED LEARNING REINFORCED LEARNING CATEGORIES: 息2018 PayPal Inc. Confidential and proprietary.
  • 15. Supervised Learning Unsupervised Learning Reinforced Learning A target output is aimed for and the system learns from the data model provided We can give examples, but we cannot give an algorithm to get from input to output No particular end goal. A structured complex data is provided to the system to provide insights We have some data, but we have no idea where to start looking for useful/interesting stuff Provide feedback on the action taken by the system which uses it to learn further We have no idea how to do something, but we can say whether it has been done right or wrong 息2018 PayPal Inc. Confidential and proprietary.
  • 21. Price of House 息2018 PayPal Inc. Confidential and proprietary. 70 lakhs 1.6 Crore ? 15 10 5 20 Price(10,00,000) Size of the house (1000 ft squared) 7 5 12 What is the best estimate for the price of the house? Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 22. Price of House 息2018 PayPal Inc. Confidential and proprietary. 70 lakhs 1.6 Crore ? 15 10 20 Price(10,00,000) 7 5 12 Size of the house (1000 ft squared) 5 Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 23. Linear Regression Size of the house (1000 ft squared) 70 lakhs 1.6 Crore ? 15 10 20 7 5 12 5 11 Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 24. Linear Regression 息2018 PayPal Inc. Confidential and proprietary. ERROR: + ++ Gradient Descent Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 25. Detecting Spam e-mails Spam Non-SpamCheap 息2018 PayPal Inc. Confidential and proprietary. 100 emails 25 spam 75 Non-spam Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 26. Detecting Spam e-mails Spam Non-SpamCheap If an email contains the word cheap, what is the probability of it being spam? 息2018 PayPal Inc. Confidential and proprietary. 20 5 Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 27. Detecting Spam e-mails Spam Non-SpamCheap 20 5 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 28. Detecting Spam e-mails Spam Non-SpamCheap 80% 20% 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 29. Detecting Spam e-mails Cheap Spelling Mistake Missing title etc.. 80% 70% 95% Naive Bayes Algorithm 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 30. Acceptance at a University Test Grades 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 31. Acceptance at a University GradesTest Student 1 Test: 9/10 Grades: 8/10 Student 2 Test: 3/10 Grades: 4/10 Student 3 Test: 7/10 Grades: 6/10 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 32. Acceptance at a UniversityGrades 10 9 8 7 6 5 4 3 2 1 1 2 3 4 5 Test 6 7 8 9 10 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 33. Acceptance at a UniversityGrades 10 9 8 7 6 5 4 3 2 1 1 2 3 4 5 Test 6 7 8 9 10 No Student 3 Test: 7/10 Grades: 6/10 Quiz: Does the student get Accepted? Yes 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 34. Acceptance at a UniversityGrades 10 9 8 7 6 5 4 3 2 1 1 2 3 4 5 Test 6 7 8 9 10 No Student 3 Test: 7/10 Grades: 6/10 Quiz: Does the student get Accepted? Yes 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 35. Acceptance at a UniversityGrades 10 9 8 7 6 5 4 3 2 1 1 2 3 4 5 Test 6 7 8 9 10 No Student 3 Test: 7/10 Grades: 6/10 Quiz: Does the student get Accepted? Yes 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 36. Logistic RegressionGrades 10 9 8 7 6 5 4 3 2 1 1 2 3 4 5 Test 6 7 8 9 10 No Student 3 Test: 7/10 Grades: 6/10 Quiz: Does the student get Accepted? Yes 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 37. Logistic Regression 息2018 PayPal Inc. Confidential and proprietary. ERROR: 2 Gradient Descent Log-loss function Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 38. Grades 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 Test 6 7 8 9 10 Acceptance at a University Student 3 Test: 9/10 Grades: 6/10 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 39. Grades 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 Test 6 7 8 9 10 Acceptance at a University Student 4 Test: 9/10 Grades: 1/10 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 40. Grades 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 Test 6 7 8 9 10 Acceptance at a University Student 4 Test: 9/10 Grades: 1/10 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 41. Grades 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 Test 6 7 8 9 10 Acceptance at a University 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 42. Grades 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 Test 6 7 8 9 10 Acceptance at a University 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 43. Grades 10 9 8 7 6 5 4 3 2 1 0 0 1 2 3 4 5 Test 6 7 8 9 10 Acceptance at a University Gradient Descent 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 44. Neural Network 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 46. Logistic Regression & Neural Networks 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 47. Convolutional NN Recurrent NN Fixed size input and outputs. feed-forward artificial neural network Use connectivity pattern Learns to recognize patterns across like study images break a component into subcomponents Handle arbitrary input/output lengths. Internal memory to process arbitrary sequences of inputs. Use time-series information i.e. what I spoke last will impact what I will speak next. Ideal for text and speech analysis. Create combinations of subcomponents (image captioning, text generation, language translation, etc.) 息2018 PayPal Inc. Confidential and proprietary.
  • 48. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 49. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 50. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 51. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 52. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 53. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 54. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 55. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 56. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 57. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 58. K-means clustering 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 59. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 60. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 61. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 62. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 63. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 64. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 65. Did anyone order pizza? 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 66. Did anyone order pizza? STOP Too Big 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 67. Hierarchical clustering 息2018 PayPal Inc. Confidential and proprietary. Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
  • 68. Supervised learning VS. Unsupervised learning Source: Quora
  • 69. Supervised learning VS. Unsupervised learning Source: Quora
  • 70. Supervised learning VS. PLUS Unsupervised learning Unsupervised learning as feature engineering E.g.: clustering + KNN, Matrix Factorization One of the Tricks in Deep Learning is how it combines unsupervised/supervised learning Stacked Autoencoders, training of CNN Source: Quora
  • 71. OUTPUT 息2018 PayPal Inc. Confidential and proprietary. CLASSIFICATION: GENERATE AN OUTPUT VALUE FOR EACH ITEM IN A DATA SET REGRESSION: GIVEN THE DATA, PREDICT THE MOST LIKELY VALUE FOR VARIABLE UNDER CONSIDERATION CLUSTERING: GROUP THE DATA INTO SIMILAR PATTERNS A FEW APPROACHES TO FINDING OUTPUTS INCLUDE:
  • 73. ML infrastructure: Experimentation & Production Option 1: Favor experimentation and only invest in productionizing once something shows results. E.g. Have ML researchers use R and then ask Engineers to implement things in production when they work Option 2: Favor production and have researchers struggle to figure out how to run experiments. E.g. Implement highly optimized code and have ML researchers experiment only through data available in logs/DB 息2018 PayPal Inc. Confidential and proprietary. Source: Quora
  • 74. The two faces of your ML infrastructure. Optimal solution: Have ML researchers experiment in IPython notebooks using Python tools (scikit-learn, Theano). Use the same tools in production whenever possible, and implement optimized versions only when needed. Implement abstraction layers on top of optimized implementations so they can be accessed from regular/friendly experimentation tools. Source: Quora
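An abstraction layer of the kind described on this slide might look like the sketch below: experiment code is written against a small interface, so a readable notebook implementation and an optimized production one can be swapped freely. All class and function names here are hypothetical, and the "production" class is only a stand-in for a genuinely optimized backend.

```python
from typing import Protocol, Sequence


class Model(Protocol):
    """Hypothetical interface that experiments and production both code against."""

    def predict(self, features: Sequence[float]) -> float: ...


class NotebookLinearModel:
    """Readable reference version, the kind a researcher might write in a notebook."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0):
        self.weights = list(weights)
        self.bias = bias

    def predict(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


class ProductionLinearModel:
    """Stand-in for an optimized implementation; a real one might wrap native code."""

    def __init__(self, weights: Sequence[float], bias: float = 0.0):
        self._weights = tuple(weights)
        self._bias = bias

    def predict(self, features: Sequence[float]) -> float:
        total = self._bias
        for w, x in zip(self._weights, features):
            total += w * x
        return total


def mean_absolute_error(model: Model, rows, targets) -> float:
    """Evaluation code depends only on the interface, not on the backend."""
    errors = [abs(model.predict(r) - t) for r, t in zip(rows, targets)]
    return sum(errors) / len(errors)


# Hypothetical evaluation data.
rows = [(1.0, 2.0), (3.0, 4.0)]
targets = [5.0, 11.0]
weights = (1.0, 2.0)
notebook_error = mean_absolute_error(NotebookLinearModel(weights), rows, targets)
production_error = mean_absolute_error(ProductionLinearModel(weights), rows, targets)
print(notebook_error, production_error)
```

Because `mean_absolute_error` only calls `predict`, the same experiment runs unchanged against either backend.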
  • 75. The untold story of Data Science and (rather than vs.) ML engineering. Is ML at a point at which you don't have to be a data scientist to take advantage of it? There are good tools to get started, BUT for state-of-the-art performance, one needs rigorous quantitative understanding. Source: Quora
  • 76. The data-driven ML innovation funnel. Data Research: data research & hypothesis building -> Data Science. AB Testing: online experimentation, AB testing analysis -> Data Science. ML Exploration / Product Design: ML solution building & implementation -> ML Engineering. Source: Quora
  • 78. Coursera Deep Learning Specialization by deeplearning.ai; Luis Serrano YouTube video lecture [Udacity course]; Demystifying Deep Learning & AI; Data Skeptic podcast; Talking Machines podcast; ML HackerEarth; Kaggle tutorial; https://github.com/collections/machine-learning
  • 81. scikit-learn / scikit-learn - machine learning in Python; tensorflow / tensorflow - computation using data flow graphs for scalable machine learning; Theano / Theano - Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently; davisking / dlib - a toolkit for making real-world machine learning and data analysis applications in C++; apache / predictionio - a machine learning server for developers and ML engineers, built on Apache Spark, HBase and Spray; Amazon SageMaker on AWS - build, train, and deploy machine learning models and algorithms at scale; KNIME - open source data analytics, reporting and integration platform; MLlib | Apache Spark; Word2vec - group of related models that are used to produce word embeddings; GloVe - unsupervised learning algorithm for obtaining vector representations for words; shogun-toolbox / shogun
  • 82. Supervised Learning: most industrialized. igrigorik / decisiontree - ID3-based implementation of the ML decision tree algorithm. Unsupervised Learning: popular approach in natural language processing (NLP). keon / awesome-nlp - a curated list of resources dedicated to NLP. Reinforcement Learning: develop self-driving cars or teach a robot how to manufacture an item. openai / gym - a toolkit for developing and comparing reinforcement learning algorithms; aikorea / awesome-rl - curated reinforcement learning resources. EXAMPLES IN PRACTICE: umutisik / Eigentechno - principal component analysis on music loops; jpmckinney / tf-idf-similarity - Ruby gem to calculate the similarity between texts using tf*idf; scikit-learn-contrib / lightning - large-scale linear classification, regression and ranking in Python; gwding / draw_convnet
  • 83. Data Analytics - Program
  • 84. Certificate in Statistics and Computational Data Science; Certificate programs in R-Programming and Statistics; Certificate Programs in Data Science (Microsoft Professional Program); Certificate Programs in Machine Learning; CSCI E-81 Machine Learning and Data Mining (Harvard University); Certificate in Machine Learning (University of Washington)
  • 86. Artificial Intelligence Graduate Certificate (Stanford University); Machine Learning at Columbia University (free content, certification option); Machine Learning at Georgia Tech (free content, certification option); IBM Watson Certifications; Microsoft Machine Learning & AI Certification; PG Diploma in Machine Learning and AI - Upgrad and IIIT-B; Certified Artificial Intelligence Professional (Govt. of India with V Skills); NVIDIA Deep Learning Programs
  • 87. Getting started. Machine Learning: josephmisiti / awesome-machine-learning - a curated list of awesome machine learning frameworks, libraries and software; ujjwalkarn / Machine-Learning-Tutorials - machine learning and deep learning tutorials, articles and other resources. Deep Learning: ChristosChristofidis / awesome-deep-learning - a curated list of awesome deep learning tutorials, projects and communities; fastai / courses - fast.ai courses. TensorFlow: jtoy / awesome-tensorflow - a curated list of dedicated TensorFlow resources (http://tensorflow.org); nlintz / TensorFlow-Tutorials - simple tutorials using Google's TensorFlow framework; pkmital / tensorflow_tutorials - from the basics to slightly more interesting applications of TensorFlow
  • 88. Reading Resources: Keras documentation; Learn about image classification and neural networks; Visualizing and Understanding CNNs; AlexNet; VGGNet; GoogLeNet; Text Classification with Keras
  • 89. IF YOU ONLY REMEMBER ONE THING FROM THIS TALK
  • 90. JUST BUILD SOMETHING WITH DATA AND EXPECT TO GET STUCK AT IT FOR A WHILE
  • 93. Math & Machine Learning: Linear Algebra, Calculus, Statistics, Probability. Resources: MIT Linear Algebra Open Course; MIT Calculus Open Course; MIT Stats and Probability Course