The document provides an overview of machine learning concepts:
1) It describes the three main components of machine learning: inputs (data), algorithms (models), and outputs (predictions or classifications).
2) It discusses the main types of machine learning (supervised learning, unsupervised learning, and reinforcement learning) and provides examples.
3) It applies algorithms such as linear regression, logistic regression, Naive Bayes, and neural networks to problems such as predicting housing prices, spam detection, and university acceptance.
2. Behavioral Analytics
©2018 PayPal Inc. Confidential and proprietary.
I am planning to buy
Hmm! It takes a lot of time
Why do I get weird suggestions
Got it! Let me place the order
Transaction Failed
5. Narrow AI and General AI
Narrow AI: A computer system that uses learning or other methods to solve a particular problem.
Specific
Learning mechanisms
Not extensive
Example: personal assistants
General AI: A computer system that operates like a human brain. Solves new problems on the spot.
Learns context
Consciousness
Adaptive
6. ARTIFICIAL INTELLIGENCE
MACHINE LEARNING
DEEP MACHINE LEARNING
NATURAL LANGUAGE PROCESSING
SPEECH RECOGNITION
EXPERT SYSTEMS
7. Over the years
1950 TURING TEST: test to judge whether machines exhibit human intelligence
1956 DARTMOUTH CONFERENCE: first Artificial Intelligence conference
1997 DEEP BLUE: IBM's computer beat the world chess champion after losing to him the year before
2004 SPIRIT AND OPPORTUNITY: NASA's robotic exploration rovers autonomously navigate the surface of Mars
2011 WATSON WINS JEOPARDY: Watson beats two previous winners of the contest
2012 LET'S FIND CATS: Google Brain's neural network was able to identify cats with 75% accuracy after being fed 10 million images from YouTube videos
2016 LITERATURE: screenplay Sunspring, "Daddy's Car" pop song, Japanese AI novel and dark poems
8. Why is AI & ML famous now?
3 PILLARS:
COMPUTATION
DATA
STATISTICAL MODELS
9. What is Machine Learning?
Humans: learn from experience
Computers: follow instructions
Machine learning: computers learn from experience
Source: https://www.youtube.com/watch?v=IpGxLWOIZy4
10. What is Machine Learning?
Humans: learn from experience
Computers: follow instructions
Machine learning: computers learn from experience, i.e. from data
12. How does machine learning work?
BREAK DOWN THE PROCESS INTO THREE COMPONENTS:
INPUT (AKA DATA SET)
ALGORITHM (AKA MODELS)
OUTPUT (AKA TARGET LABELS / GROUPED OUTPUT)
13. Inputs: the data that powers ML
FROM SOURCE CODE TO STATISTICS, DATA SETS CAN CONTAIN JUST ABOUT ANYTHING
GSA / data - Assorted data from the General Services Administration.
GoogleTrends / data - An index of all open-source data
nationalparkservice / data - An unofficial repository of National Park Service data.
fivethirtyeight / data - Data and code behind the stories and interactives at FiveThirtyEight
beamandrew / medical-data
src-d / awesome-machine-learning-on-source-code - Interesting links & research papers related to Machine Learning applied to source code
ImageNet - large visual database designed for use in visual object recognition software research
14. Algorithms: how data is processed and analyzed
CATEGORIES:
SUPERVISED LEARNING
UNSUPERVISED LEARNING
REINFORCEMENT LEARNING
15. Supervised Learning, Unsupervised Learning, Reinforcement Learning
Supervised Learning: A target output is aimed for, and the system learns from the data provided. We can give examples, but we cannot give an algorithm to get from input to output.
Unsupervised Learning: No particular end goal. Complex structured data is provided to the system so that it can provide insights. We have some data, but we have no idea where to start looking for useful or interesting stuff.
Reinforcement Learning: Feedback is provided on the action taken by the system, which uses it to learn further. We have no idea how to do something, but we can say whether it has been done right or wrong.
21. Price of House
[Figure: plot of Price (in units of 10,00,000) against Size of the house (in 1000 ft squared), with houses at sizes 5, 7, and 12. Two prices are labeled, 70 lakhs and 1.6 Crore, and one is marked "?".]
What is the best estimate for the price of the house?
22. Price of House
[Figure: the same plot of Price (in units of 10,00,000) against Size of the house (in 1000 ft squared), with prices 70 lakhs, 1.6 Crore, and "?".]
23. Linear Regression
[Figure: the plot of price against Size of the house (in 1000 ft squared) with prices 70 lakhs, 1.6 Crore, and "?"; the value 11 is now marked on the plot.]
24. Linear Regression
ERROR: sum of the distances from each point to the line
Gradient Descent
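The linear regression and gradient descent steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the two labeled houses from the price plot (size 5 at 70 lakhs and size 12 at 1.6 crore, both in the plot's units), it assumes mean squared error as the error measure, and the function name, learning rate, and step count are hypothetical choices, not taken from the slides.

```python
def fit_line(sizes, prices, learning_rate=0.01, steps=20000):
    """Fit price = slope * size + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(sizes)
    for _ in range(steps):
        # Error of the current line at every data point.
        errors = [slope * x + intercept - y for x, y in zip(sizes, prices)]
        # Gradient of the mean squared error with respect to each parameter.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, sizes))
        grad_intercept = 2.0 / n * sum(errors)
        # Take a small step downhill.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Assumed reading of the plot: size in 1000 sq ft, price in units of 10,00,000.
sizes = [5.0, 12.0]
prices = [7.0, 16.0]

slope, intercept = fit_line(sizes, prices)
estimate = slope * 7.0 + intercept  # estimate for the house of size 7
print(f"slope={slope:.3f}, intercept={intercept:.3f}, estimate={estimate:.2f}")
```

With only two points the fitted line passes through both of them; a real data set would have many houses, and the line would balance the errors across all of them.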
25. Detecting Spam e-mails
100 emails: 25 spam, 75 non-spam
26. Detecting Spam e-mails
If an email contains the word "cheap", what is the probability of it being spam?
Emails containing "cheap": 20 spam, 5 non-spam
27. Detecting Spam e-mails
Emails containing "cheap": 20 spam, 5 non-spam
28. Detecting Spam e-mails
Of the emails containing "cheap": 80% spam, 20% non-spam
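The arithmetic behind the 80% figure can be written out as a short Python sketch. The counts come from the slides (20 spam and 5 non-spam emails contain the word "cheap"); the function and variable names are hypothetical.

```python
def probability_spam_given_word(spam_with_word, non_spam_with_word):
    """Share of emails containing the word that are spam."""
    return spam_with_word / (spam_with_word + non_spam_with_word)


spam_with_cheap = 20      # spam emails containing "cheap"
non_spam_with_cheap = 5   # non-spam emails containing "cheap"

p_spam = probability_spam_given_word(spam_with_cheap, non_spam_with_cheap)
print(f"P(spam | 'cheap') = {p_spam:.0%}")  # prints: P(spam | 'cheap') = 80%
```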
29. Detecting Spam e-mails
Probability of spam given each feature:
Cheap: 80%
Spelling mistake: 70%
Missing title: 95%
etc.
Combining the features: Naive Bayes algorithm
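One way the per-feature percentages might be combined is sketched below. The slides do not show the combination step, so this is an assumption: it applies the Naive Bayes rule (features treated as independent given the class), takes the prior from the earlier count of 25 spam emails out of 100, and uses hypothetical function and variable names.

```python
def combine_naive_bayes(prior_spam, spam_given_feature):
    """Combine P(spam | feature) values, assuming features are independent given the class."""
    prior_odds = prior_spam / (1.0 - prior_spam)
    odds = prior_odds
    for p in spam_given_feature:
        feature_odds = p / (1.0 - p)
        # Each feature multiplies the odds by its likelihood ratio,
        # which equals feature_odds / prior_odds by Bayes' rule.
        odds *= feature_odds / prior_odds
    return odds / (1.0 + odds)


prior = 25 / 100  # 25 spam emails out of 100
signals = {"cheap": 0.80, "spelling mistake": 0.70, "missing title": 0.95}

p_all = combine_naive_bayes(prior, signals.values())
print(f"P(spam | all three features) = {p_all:.4f}")
```

With a single feature the function returns that feature's own percentage, so the "cheap" case still gives 80%; several spam signals together push the probability much closer to 100%.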
30. Acceptance at a University
[Figure: students described by two values, Test and Grades.]
31. Acceptance at a University
Student 1: Test 9/10, Grades 8/10
Student 2: Test 3/10, Grades 4/10
Student 3: Test 7/10, Grades 6/10
32. Acceptance at a University
[Figure: plot of Grades (1 to 10) against Test (1 to 10).]
33. Acceptance at a University
[Figure: plot of Grades (1 to 10) against Test (1 to 10), with Student 3 (Test: 7/10, Grades: 6/10) marked.]
Quiz: Does the student get accepted? Yes / No
36. Logistic Regression
[Figure: plot of Grades (1 to 10) against Test (1 to 10), with Student 3 (Test: 7/10, Grades: 6/10) marked.]
Quiz: Does the student get accepted? Yes / No
37. Logistic Regression
ERROR: 2
Gradient Descent
Log-loss function
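The logistic regression idea (a line separating accepted from rejected students, tuned by gradient descent on the log-loss) can be sketched as follows. This is a minimal illustration, not the deck's method: the training students and their accept/reject labels are hypothetical (the slides only give scores, not outcomes), scores are rescaled to the range 0 to 1, and the learning rate and step count are arbitrary choices.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(points, labels, learning_rate=1.0, steps=20000):
    """Fit weights for (test, grades) and a bias by gradient descent on the log-loss."""
    w_test, w_grades, bias = 0.0, 0.0, 0.0
    n = len(points)
    for _ in range(steps):
        grad_test = grad_grades = grad_bias = 0.0
        for (test, grades), label in zip(points, labels):
            predicted = sigmoid(w_test * test + w_grades * grades + bias)
            difference = predicted - label  # derivative of the log-loss w.r.t. the score
            grad_test += difference * test
            grad_grades += difference * grades
            grad_bias += difference
        w_test -= learning_rate * grad_test / n
        w_grades -= learning_rate * grad_grades / n
        bias -= learning_rate * grad_bias / n
    return w_test, w_grades, bias


def log_loss(points, labels, weights):
    """Mean negative log-likelihood of the labels under the model."""
    w_test, w_grades, bias = weights
    total = 0.0
    for (test, grades), label in zip(points, labels):
        p = sigmoid(w_test * test + w_grades * grades + bias)
        total -= math.log(p if label == 1 else 1.0 - p)
    return total / len(points)


# Hypothetical training data: (test, grades) out of 10, label 1 = accepted.
raw_students = [(9, 8), (6, 5), (6, 7), (8, 5), (3, 4), (2, 2), (4, 1), (1, 5)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
points = [(t / 10.0, g / 10.0) for t, g in raw_students]

weights = train_logistic_regression(points, labels)
w_test, w_grades, bias = weights

# Student 3 from the quiz: Test 7/10, Grades 6/10.
acceptance_probability = sigmoid(w_test * 0.7 + w_grades * 0.6 + bias)
print(f"log-loss: {log_loss(points, labels, weights):.4f}")
print(f"P(accepted) for Student 3: {acceptance_probability:.3f}")
```

Under these made-up labels the model places Student 3 on the accepted side of the line; with different training data the answer could differ.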
38. Acceptance at a University
[Figure: plot of Grades (0 to 10) against Test (0 to 10), with Student 3 (Test: 9/10, Grades: 6/10) marked.]
39. Acceptance at a University
[Figure: plot of Grades (0 to 10) against Test (0 to 10), with Student 4 (Test: 9/10, Grades: 1/10) marked.]
41. Acceptance at a University
[Figure: plot of Grades (0 to 10) against Test (0 to 10).]
43. Acceptance at a University
[Figure: plot of Grades (0 to 10) against Test (0 to 10).]
Gradient Descent
44. Neural Network
46. Logistic Regression & Neural Networks
47. Convolutional NN vs. Recurrent NN
Convolutional NN:
Fixed-size inputs and outputs.
Feed-forward artificial neural network.
Uses a connectivity pattern.
Learns to recognize patterns across space, as when studying images.
Breaks a component into subcomponents.
Recurrent NN:
Handles arbitrary input/output lengths.
Uses internal memory to process arbitrary sequences of inputs.
Uses time-series information, i.e. what I spoke last will impact what I will speak next.
Ideal for text and speech analysis.
Creates combinations of subcomponents (image captioning, text generation, language translation, etc.)
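The "internal memory" of a recurrent network can be illustrated with a toy single-unit sketch. Everything here is hypothetical (the function name, the fixed weights, the scalar inputs); a real recurrent network learns its weights and uses vectors rather than single numbers.

```python
import math


def run_rnn(inputs, w_input=1.0, w_hidden=0.5):
    """Feed a sequence of any length through one recurrent unit and return its final state."""
    hidden = 0.0  # internal memory, carried from one step to the next
    for x in inputs:
        # The new state depends on the current input AND on the previous state.
        hidden = math.tanh(w_input * x + w_hidden * hidden)
    return hidden


state_a = run_rnn([1.0, 0.0])
state_b = run_rnn([0.0, 1.0])
state_long = run_rnn([1.0] * 50)
print(state_a, state_b, state_long)
```

The same two inputs in a different order lead to different final states, which is the sense in which earlier inputs influence later ones, and the loop accepts sequences of any length.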
48. Did anyone order pizza?
70. Supervised learning PLUS (not VS.) Unsupervised learning
Unsupervised learning as feature engineering
E.g.: clustering + KNN, Matrix Factorization
One of the "tricks" in Deep Learning is how it combines unsupervised and supervised learning
E.g.: Stacked Autoencoders, training of CNNs
Source: Quora
71. OUTPUT
A FEW APPROACHES TO FINDING OUTPUTS INCLUDE:
CLASSIFICATION: GENERATE AN OUTPUT VALUE FOR EACH ITEM IN A DATA SET
REGRESSION: GIVEN THE DATA, PREDICT THE MOST LIKELY VALUE FOR THE VARIABLE UNDER CONSIDERATION
CLUSTERING: GROUP THE DATA INTO SIMILAR PATTERNS
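Clustering, the least familiar of the three output types, can be sketched with k-means on one-dimensional data. k-means is one common clustering technique chosen here for illustration; the slides do not name a specific algorithm, and the data, starting centroids, and function name are hypothetical.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid to its group's mean."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            # Assign the value to the closest centroid.
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(values, centroids=[1.0, 12.0])
print(centroids)  # [2.0, 11.0]
print(clusters)   # [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```

No labels are supplied: the two groups emerge from the data alone, which is what distinguishes clustering from classification and regression.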
73. ML infrastructure: Experimentation & Production
Option 1: Favor experimentation and only invest in productionizing once something shows results. E.g. have ML researchers use R and then ask engineers to implement things in production when they work.
Option 2: Favor production and have researchers struggle to figure out how to run experiments. E.g. implement highly optimized code and have ML researchers experiment only through data available in logs/DB.
74. The two faces of your ML infrastructure
Optimal solution:
Have ML researchers experiment on IPython Notebooks using Python tools (scikit-learn, Theano). Use the same tools in production whenever possible, and implement optimized versions only when needed.
Implement abstraction layers on top of optimized implementations so they can be accessed from regular/friendly experimentation tools.
75. The untold story of Data Science and (not vs.) ML engineering
Is ML at a point at which you don't have to be a data scientist to take advantage of it?
There are good tools to get started, BUT for state-of-the-art performance, one needs rigorous quantitative understanding.
76. The data-driven ML innovation funnel
Data Research: data research & hypothesis building -> Data Science
AB Testing: online experimentation, AB Testing analysis -> Data Science
ML Exploration / Product Design: ML solution building & implementation -> ML Engineering
78. Coursera Deep Learning Specialization by deeplearning.ai
Luis Serrano YouTube video lecture [Udacity course]
DEMYSTIFYING DEEP LEARNING & AI
Data Skeptic Podcast
Talking Machines Podcast
ML HackerEarth
Kaggle Tutorial
https://github.com/collections/machine-learning
81.
scikit-learn / scikit-learn - machine learning in Python
tensorflow / tensorflow - Computation using data flow graphs for scalable machine learning
Theano / Theano - Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
davisking / dlib - A toolkit for making real world machine learning and data analysis applications in C++
apache / predictionio - a machine learning server for developers and ML engineers. Built on Apache Spark, HBase and Spray.
Machine Learning Models & Algorithms | Amazon SageMaker on AWS - Build, train, and deploy machine learning models & algorithms at scale
KNIME - open source data analytics, reporting and integration platform.
MLlib | Apache Spark
Word2vec - group of related models that are used to produce word embeddings
GloVe - Unsupervised learning algorithm for obtaining vector representations for words
shogun-toolbox / shogun
82. Supervised Learning, Unsupervised Learning, Reinforcement Learning
Supervised Learning: Most industrialized.
igrigorik / decisiontree - ID3-based implementation of the ML Decision Tree algorithm
Unsupervised Learning: Popular approach in natural language processing (NLP).
keon / awesome-nlp - A curated list of resources dedicated to NLP
Reinforcement Learning: Used to develop self-driving cars or teach a robot how to manufacture an item.
openai / gym - A toolkit for developing and comparing reinforcement learning algorithms.
aikorea / awesome-rl - Reinforcement learning resources curated
EXAMPLES IN PRACTICE:
umutisik / Eigentechno - Principal Component Analysis on music loops
jpmckinney / tf-idf-similarity - Ruby gem to calculate the similarity between texts using tf*idf
scikit-learn-contrib / lightning - Large-scale linear classification, regression and ranking in Python
gwding / draw_convnet
84. Certificate in Statistics and Computational Data Science
Certificate programs in R-Programming and Statistics
Certificate Programs in Data Science (Microsoft Professional Program)
Certificate Programs in Machine Learning
CSCI E-81 Machine Learning and Data Mining (Harvard University)
Certificate in Machine Learning (University of Washington)
86. Artificial Intelligence Graduate Certificate (Stanford University)
Machine Learning at Columbia University (free content, certification option)
Machine Learning at Georgia Tech (free content, certification option)
IBM Watson Certifications
Microsoft Machine Learning & AI Certification
PG Diploma in Machine Learning and AI (upGrad and IIIT-B)
Certified Artificial Intelligence Professional (Govt. of India with V Skills)
NVIDIA Deep Learning Programs
87. Getting started
Machine Learning:
josephmisiti / awesome-machine-learning - A curated list of awesome Machine Learning frameworks, libraries and software.
ujjwalkarn / Machine-Learning-Tutorials - machine learning and deep learning tutorials, articles and other resources
Deep Learning:
ChristosChristofidis / awesome-deep-learning - A curated list of awesome Deep Learning tutorials, projects and communities.
fastai / courses - fast.ai Courses
TensorFlow:
jtoy / awesome-tensorflow - TensorFlow - A curated list of dedicated resources http://tensorflow.org
nlintz / TensorFlow-Tutorials - Simple tutorials using Google's TensorFlow Framework
pkmital / tensorflow_tutorials - From the basics to slightly more interesting applications of Tensorflow
88. Reading Resources
Keras Document
Learn about image classification and neural networks
Visualizing and Understanding CNNs
AlexNet
VGGNet
GoogLeNet
Text Classification with Keras
93. Math & Machine Learning
Linear Algebra
Calculus
Statistics
Probability
MIT Linear Algebra Open Course
MIT Calculus Open Course
MIT Stats and Probability Course