A General Overview of Machine Learning
Boise Data Science Meetup -- September 18, 2018
Ashish Sharma
● Software Systems Engineer -- HomeCU, LLC. (2017 - present)
● Founder -- AI Developers, Boise
● City Ambassador -- AI Saturdays (global initiative of nurture.ai)
● Alumnus -- Boise State University (MS in Computer Science, 2015-2017)
1
Overview
● AI and Applications
● Intro to Machine Learning
● Types of Machine Learning
● Which algorithm should I use?
● Effective Machine Learning
Image Source: Cousins of Artificial Intelligence -- Towards Data Science
2
AI Resurgence
● Computational power (GPUs, cloud computing, distributed systems)
● Availability of large amounts of data (e.g., ImageNet)
● Better theoretical understanding of the underlying techniques/algorithms
● Open and easily accessible research culture in academia and industry (NIPS, ICML, arXiv.org)
3
AI Resurgence (contd.)
● Netflix Challenge (2009): $1 million prize for predicting user ratings of films
● Kaggle (2010): more than a million users today
● Fei-Fei Li and team at Stanford open sourced ImageNet (2008-2010)
◆ ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
● Geoffrey Hinton's deep learning team wins the ImageNet challenge in 2012 (AlexNet)
4
Common Applications
● Speech recognition (virtual assistants)
● Advanced machine translation and natural language intelligence
● Strategic gaming algorithms (AlphaGo, chess)
● Computer vision (image classification and object detection)
● Autonomous vehicles
● Manufacturing companies (landing.ai)
● Healthcare (Google's research on diabetic retinopathy -- with an F-score of 0.95, surpassing the accuracy of 8 expert ophthalmologists)
5
Machine Learning
● A form of applied statistics with emphasis on the use of computers to learn complex mathematical functions.
● More formally, "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Tom Mitchell, 1997)
Image Source: xkcd
6
Types of Machine Learning
● Supervised Learning
● Unsupervised Learning
● Reinforcement Learning
7
Supervised Learning
Terminology:
● Input variable(s)
◆ Independent variable(s)
◆ Feature(s)/characteristic(s) of a single input object
◆ Numerical -- continuous (height, area of house), discrete (grades, age)
◆ Categorical (race, sex) -- nominal, ordinal
● Target variable(s)
◆ Dependent variable(s), a number or vector (e.g., price of house, whether a patient is diabetic, etc.)
8
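The terminology above can be made concrete with a small sketch. It turns one hypothetical input object (a house) into a numeric feature vector: numerical features are passed through, a nominal feature is one-hot encoded, and an ordinal feature is mapped to its rank. The field names, category lists, and price are illustrative assumptions, not data from the talk.

```python
# Hypothetical categories, chosen only for illustration.
NEIGHBORHOODS = ["downtown", "north_end", "suburb"]  # nominal: no natural order
CONDITIONS = ["poor", "fair", "good"]                # ordinal: ordered categories


def encode(house):
    """Turn one input object (a dict) into a list of numeric features."""
    # Numerical features: continuous (area) and discrete (bedroom count).
    features = [float(house["area_sqft"]), float(house["bedrooms"])]
    # Nominal feature: one-hot encoding, one 0/1 column per category.
    features += [1.0 if house["neighborhood"] == n else 0.0 for n in NEIGHBORHOODS]
    # Ordinal feature: the rank in the ordered list keeps the ordering.
    features.append(float(CONDITIONS.index(house["condition"])))
    return features


house = {"area_sqft": 1500, "bedrooms": 3,
         "neighborhood": "north_end", "condition": "good"}
x = encode(house)   # input variables (features)
y = 250000.0        # target variable: a made-up price for this house
print(x, y)
```

One-hot encoding avoids implying an order between neighborhoods, while the rank encoding deliberately keeps the order poor < fair < good.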
Supervised Learning
● Function approximation
◆ Mathematically: solve for the coefficient(s) of a function
◆ Search for the best performing model in a hypothesis space
◆ Make predictions based on historical (labeled) data
● Regression (predict a continuous target variable)
◆ Univariate Regression (1 input variable, 1 output variable)
◆ Multiple Regression (>=2 input variables, 1 output variable)
◆ Multivariate Regression (>=2 output variables)
● Classification (predict a discrete/categorical target variable)
◆ Email: Spam or not?
◆ Is this image a dog or a cat?
9
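As a minimal sketch of "solve for the coefficient(s) of a function", the univariate regression case has a closed-form least-squares solution for the slope and intercept. The data below is made up and perfectly linear so the result is easy to check by hand; `fit_line` and `predict` are illustrative names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one input variable)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical labeled data: area (thousand sq ft) -> price (thousand dollars).
areas = [1.0, 2.0, 3.0, 4.0]
prices = [150.0, 250.0, 350.0, 450.0]

slope, intercept = fit_line(areas, prices)


def predict(area):
    """Use the learned coefficients on an unseen input."""
    return slope * area + intercept


print(slope, intercept, predict(5.0))
```

Here the hypothesis space is the set of all straight lines, and "learning" means picking the line with the smallest squared error on the labeled data.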
Unsupervised Learning
● Unsupervised Learning
◆ Find hidden patterns and draw inferences from (unlabeled) data
◆ Essential for preliminary data analysis and visualization
● Clustering (grouping of similar data points)
◆ K-Means, DBSCAN
● Dimensionality Reduction
◆ Principal Component Analysis
◆ Autoencoders
10
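A rough sketch of K-Means clustering on unlabeled 2-D points follows. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The points and the fixed starting centroids are assumptions made so the run is reproducible; practical implementations usually choose the starting centroids randomly.

```python
def kmeans(points, centroids, iterations=10):
    """Plain K-Means with caller-supplied initial centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

No labels are used anywhere: the grouping comes only from distances between the points themselves.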
Reinforcement Learning
● AI, Animal Psychology, Control Theory
● Agents, Actions, Environment, Change in State, Reward/Punishment
● E.g., Deep Atari:
◆ Input: Snapshots of Atari game screen images (State and Actions)
◆ Algorithm: Convolutional NNs with no pooling
◆ Output layer: tailored for a regression score (Maximize Reward)
11
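The agent/action/state/reward loop can be sketched with tabular Q-learning on a toy problem. This is a deliberately simpler stand-in for the deep convolutional approach on the slide, not a reproduction of it. The environment (five states on a line, reward only at the right end) and all constants are assumptions for illustration.

```python
import random

N_STATES = 5        # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
ALPHA = 0.5         # learning rate
GAMMA = 0.9         # discount factor


def step(state, action):
    """Environment: return (next_state, reward, done) for one action."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore uniformly at random; Q-learning is off-policy, so it
            # can still learn the values of the greedy policy.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Move the estimate toward reward + discounted best future value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q = train()
# Greedy policy: the highest-valued action in each non-terminal state.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The learned policy moves right in every state, since that is the shortest path to the only reward. Deep RL replaces the table `q` with a neural network that maps screen images to action values.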
Beginner's Question!
● (Q)* Which algorithm should I use?
● (A) The answer varies depending on many factors, including:
◆ The size, quality, and nature of the data;
◆ The available computational time;
◆ The urgency of the task; and
◆ What you want to do with the data (the problem).
* towardsdatascience.com
12
Which algorithm should I use?
◆ No one algorithm works best for every problem (Yes, not even neural networks!)
13
Important Concepts
● Model Selection:
◆ K-fold cross-validation
◆ Train/Test/Evaluation datasets
● Loss functions
● Convex Optimization
● Gradient Descent
● Model Complexity, Overfitting and Underfitting
● Regularization
● Training and Generalization Errors
14
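Several of the concepts above (loss function, convex optimization, gradient descent, regularization) fit into one small sketch. It minimizes a mean-squared-error loss with an optional L2 penalty over a single weight `w`. The data, learning rate, and penalty strength are assumptions chosen so the result can be checked by hand.

```python
def gradient_descent(xs, ys, lam=0.0, lr=0.05, steps=200):
    """Minimize mean((w*x - y)^2) + lam * w^2 over a single weight w."""
    n = len(xs)
    w = 0.0
    for _ in range(steps):
        # Derivative of the loss with respect to w.
        grad = 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / n + 2.0 * lam * w
        # Step against the gradient; the loss is convex in w, so this
        # heads toward the single global minimum.
        w -= lr * grad
    return w


# Hypothetical data generated by y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w_plain = gradient_descent(xs, ys)           # no regularization
w_ridge = gradient_descent(xs, ys, lam=0.5)  # L2 penalty shrinks the weight
print(w_plain, w_ridge)
```

Without the penalty the weight converges to 2, the value that generated the data. With `lam=0.5` it settles at 15 / (7.5 + 0.5) = 1.875, which shows how regularization trades a little training error for a smaller weight.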
Questions to ask when working on an ML project!
● How much data do I have? What is the type/nature of the data?
● How skilled and knowledgeable am I in this domain?
◆ Will I be able to create more useful features from what I already have?
● How good am I at error analysis?
15
Questions to ask when working on an ML project!
● Assumptions, Limitations and Adoption (ALA rule) of the algorithm.
◆ Linear Regression (linear relationship, little or no multicollinearity, etc.)
◆ Why does this particular loss function make sense?
● How good am I at debugging the chosen learning algorithm?
16
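One way to check the "little or no multicollinearity" assumption before fitting a linear regression is to look at pairwise correlations between input features. The sketch below computes the Pearson correlation coefficient by hand. The feature columns are made up, and treating a high absolute correlation as a warning sign is a rule of thumb, not a rule from the talk.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equally long columns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical feature columns for four houses.
area_sqft = [1000.0, 1500.0, 2000.0, 2500.0]
area_sqm = [x * 0.0929 for x in area_sqft]  # same information, different unit
age_years = [30.0, 5.0, 40.0, 12.0]

r_redundant = pearson(area_sqft, area_sqm)      # essentially 1: collinear pair
r_independent = pearson(area_sqft, age_years)   # weak correlation
print(r_redundant, r_independent)
```

The two area columns carry the same information, so keeping both would make the regression coefficients unstable. Dropping one of them is the usual fix.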
Effective Machine Learning
● Reduce time spent programming (more experiments in less time)
◆ Use off-the-shelf tools
● Customize and Scale Products
◆ Start simple, scale as needed (again, choice of relevant toolsets)
● Think like a Scientist
◆ Use statistics, not logic, to make decisions from real-world observations
* Slide content referred from Google's Machine Learning Crash Course
17
Thank You
Ashish Sharma
Email: accssharma@gmail.com
/in/accssharma
@accssharma
AI Developers, Boise: https://github.com/aidevelopersboise/ai6-boise-materials
HomeCU is hiring Software Engineers and Mobile Developers.
https://www.homecu.net/company-jobs.html
18
Visual Demonstrations
● K nearest neighbor: http://vision.stanford.edu/teaching/cs231n-demos/knn/
● CIFAR-10 Image Classification: https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
19
