This document provides an overview of machine learning concepts including what machine learning is, common machine learning tasks like fraud detection and recommendation engines, and different machine learning techniques like supervised and unsupervised learning. It discusses neural networks and deep learning, and explains the machine learning process from data acquisition to model deployment. It also covers important concepts for evaluating machine learning models like overfitting, accuracy, recall, precision, F1 score, confusion matrices, and regression metrics like mean absolute error, mean squared error and root mean squared error.
2. Machine Learning
Before we jump into Neural Networks,
TensorFlow, the Keras API, etc., it's a good idea
to understand a few fundamental ideas
regarding machine learning.
In this section we'll cover some important
theory and concepts surrounding machine
learning.
3. Machine Learning
Section Overview:
What is Machine Learning?
What is Deep Learning?
Difference between Supervised and
Unsupervised Learning
Supervised Learning Process
Evaluating performance
Overfitting
4. What is Machine Learning?
Machine learning is a method of data analysis
that automates analytical model building.
Using algorithms that iteratively learn from
data, machine learning allows computers to
find hidden insights without being explicitly
programmed where to look.
5. What is it used for?
Fraud detection.
Web search results.
Real-time ads on web pages
Credit scoring.
Prediction of equipment failures.
New pricing models.
Network intrusion detection.
Recommendation Engines
Customer Segmentation
Text Sentiment Analysis
Customer Churn
Pattern and image
recognition.
Email spam filtering.
6. What are Neural Networks?
Neural Networks are a way of modeling
biological neuron systems mathematically.
These networks can then be used to solve
tasks that many other types of algorithms
cannot (e.g. image classification).
Deep Learning simply refers to neural
networks with more than one hidden layer.
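To make "more than one hidden layer" concrete, here is a minimal sketch of a forward pass through a tiny network with two hidden layers, written in plain Python. The layer sizes, random weights, and activation choices are assumptions made for illustration; they are not taken from the course, and the network is untrained.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b) for each output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def random_layer(n_in, n_out, rng):
    """Untrained layer with random weights and zero biases (illustrative only)."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(0)
# 3 inputs -> hidden layer (4 units) -> hidden layer (4 units) -> 1 output
layer_sizes = [3, 4, 4, 1]
layers = [random_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

def forward(x):
    """Pass the input through every layer; the last layer uses a sigmoid."""
    for i, (weights, biases) in enumerate(layers):
        is_output = i == len(layers) - 1
        x = dense(x, weights, biases, sigmoid if is_output else relu)
    return x

n_hidden_layers = len(layers) - 1   # every layer except the output layer
output = forward([0.5, -1.2, 3.0])
print(n_hidden_layers, output)
```

With `layer_sizes = [3, 4, 1]` the same code would describe a network with a single hidden layer, which falls outside the definition of deep learning given above.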
7. Machine Learning
There are different types of machine learning
we will focus on during the next sections of
the course:
Supervised Learning
Unsupervised Learning
8. Machine Learning
Machine Learning
Automated analytical models.
Neural Networks
A type of machine learning architecture
modeled after biological neurons.
Deep Learning
A neural network with more than one
hidden layer.
9. Machine Learning
Let's begin by learning about one of the most
common machine learning tasks: Supervised
Learning!
11. Supervised Learning
Supervised learning algorithms are trained
using labeled examples, such as an input
where the desired output is known.
For example, a segment of text could have a
category label, such as:
Spam vs. Legitimate Email
Positive vs. Negative Movie Review
12. Supervised Learning
The network receives a set of inputs along
with the corresponding correct outputs, and
the algorithm learns by comparing its actual
output with correct outputs to find errors.
It then modifies the model accordingly.
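The loop described here (predict, compare with the correct output, adjust the model) can be sketched with a one-variable linear model trained by gradient descent. The toy data, the learning rate, and the number of steps are assumptions chosen for illustration, not values from the course.

```python
# Labeled examples: each input comes with its known correct output (y = 2x + 1).
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = 0.0, 0.0        # model parameters, initially untrained
learning_rate = 0.05   # assumed value, small enough for this data to converge

for step in range(2000):
    grad_w = grad_b = 0.0
    for x, y_true in zip(inputs, targets):
        y_pred = w * x + b        # the model's actual output
        error = y_pred - y_true   # compare with the correct output
        # Gradient of the mean squared error with respect to w and b.
        grad_w += 2 * error * x / len(inputs)
        grad_b += 2 * error / len(inputs)
    # Modify the model in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)
```

Because the targets were generated from y = 2x + 1 without noise, the learned parameters end up very close to w = 2 and b = 1.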
13. Supervised Learning
Supervised learning is commonly used in
applications where historical data predicts
likely future events.
22. Supervised Learning
What we just showed is a simplified approach
to supervised learning, and it contains an issue!
Is it fair to use our single split of the data to
evaluate our model's performance?
After all, we were given the chance to update
the model parameters again and again.
23. Supervised Learning
To fix this issue, data is often split into 3 sets
Training Data
Used to train model parameters
Validation Data
Used to determine what model
hyperparameters to adjust
Test Data
Used to get some final performance metric
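A three-way split like the one above could be sketched as follows. The helper name `train_val_test_split` and the 70/15/15 proportions are assumptions made for illustration only; the slides do not prescribe any particular sizes.

```python
import random

def train_val_test_split(rows, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle the rows, then cut them into training, validation and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test

rows = list(range(100))  # stand-in for 100 labeled examples
train, validation, test = train_val_test_split(rows)
print(len(train), len(validation), len(test))
```

Every example lands in exactly one of the three sets, so the test set stays untouched while parameters and hyperparameters are being adjusted.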
24. Supervised Learning
This means after we see the results on the final
test set we don't get to go back and adjust any
model parameters!
This final measure is what we label the true
performance of the model to be.
25. Supervised Learning
In this course, in general we will simplify our
data by using a simple train/test split.
We will simply train and then evaluate on a test
set (leaving the option to students to go back
and adjust parameters).
After going through the course, you will be able
to easily perform another split to get 3 data sets
if you desire.
27. Machine Learning
Now that we understand the full process for
supervised learning, let's touch upon the
important topics of overfitting and
underfitting.
28. Machine Learning
Overfitting
The model fits too much to the noise from
the data.
This often results in low error on training
sets but high error on test/validation sets.
35. Machine Learning
Underfitting
Model does not capture the underlying
trend of the data and does not fit the data
well enough.
Low variance but high bias.
Underfitting is often a result of an
excessively simple model.
38. Machine Learning
This data was easy to visualize, but how can
we see underfitting and overfitting when
dealing with multi-dimensional data sets?
First, let's imagine we trained a model and then
measured its error over training time.
42. Machine Learning
When thinking about overfitting and
underfitting we want to keep in mind the
relationship of model performance on the
training set versus the test/validation set.
46. Machine Learning
Ideally the model would perform well on both,
with similar behavior.
[Plot of error versus epochs]
47. Machine Learning
But what happens if we overfit on the training
data? That means we would perform poorly on
new test data!
[Plot of error versus epochs]
49. Machine Learning
A test error that rises while the training
error keeps falling is a good indication of
training too much on the training data. You
should look for the point to cut off training time!
[Plot of error versus epochs]
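One way to look for that cut-off point is to record the error on held-out data after every epoch and pick the epoch where it is lowest. The numbers below are invented to mimic the shape of the curves described above; they are not measurements from any real model.

```python
# Hypothetical per-epoch errors, made up purely for illustration.
train_error = [0.90, 0.60, 0.40, 0.30, 0.22, 0.17, 0.13, 0.10]
val_error = [0.92, 0.65, 0.47, 0.40, 0.38, 0.41, 0.46, 0.52]

# Training error keeps falling, but validation error turns upward at some point.
best_epoch = min(range(len(val_error)), key=lambda epoch: val_error[epoch])

print("stop after epoch index:", best_epoch)
print("validation error there:", val_error[best_epoch])
```

Past that epoch the training error still improves while the validation error gets worse, which is the overfitting pattern to watch for.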
50. Machine Learning
We'll check on this idea again when we
actually begin creating models!
For now just be aware of this possible issue!
52. Model Evaluation
We just learned that after our machine
learning process is complete, we will use
performance metrics to evaluate how our
model did.
Let's discuss classification metrics in more
detail!
53. Model Evaluation
The key classification metrics we need to
understand are:
Accuracy
Recall
Precision
F1-Score
54. Model Evaluation
But first, we should understand the
reasoning behind these metrics and how
they will actually work in the real world!
55. Model Evaluation
Typically in any classification task your
model can only achieve two results:
Either your model was correct in its
prediction.
Or your model was incorrect in its
prediction.
56. Model Evaluation
Fortunately, correct vs. incorrect also extends to
situations where you have multiple classes.
For the purposes of explaining the metrics,
let's imagine a binary classification
situation, where we only have two
available classes.
57. Model Evaluation
In our example, we will attempt to predict
if an image is a dog or a cat.
Since this is supervised learning, we will
first fit/train a model on training data, then
test the model on testing data.
Once we have the model's predictions
from the X_test data, we compare them to the
true y values (the correct labels).
62. Model Evaluation
Test image from X_test, with correct label from y_test: DOG
Trained model's prediction on the test image: DOG
Compare prediction to correct label: DOG == DOG ?
63. Model Evaluation
Test image from X_test, with correct label from y_test: DOG
Trained model's prediction on the test image: CAT
Compare prediction to correct label: DOG == CAT ?
64. Model Evaluation
We repeat this process for all the images in
our X_test data.
At the end we will have a count of correct
matches and a count of incorrect matches.
The key realization we need to make is
that in the real world, not all incorrect or
correct matches hold equal value!
65. Model Evaluation
Also, in the real world, a single metric won't
tell the complete story!
To understand all of this, let's bring back
the 4 metrics we mentioned and see how
they are calculated.
We could organize our predicted values
compared to the real values in a confusion
matrix.
66. Model Evaluation
Accuracy
Accuracy in classification problems is
the number of correct predictions made
by the model divided by the total
number of predictions.
67. Model Evaluation
Accuracy
For example, if the X_test set was 100
images and our model correctly
predicted 80 images, then we have
80/100.
0.8 or 80% accuracy.
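The 80/100 calculation above can be reproduced with a few lines of Python. This is only a sketch: the `accuracy` helper and the label lists are made up so that exactly 80 of the 100 predictions match.

```python
def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the total number of predictions."""
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)

# Hypothetical test labels: 50 dogs followed by 50 cats.
y_test = ["dog"] * 50 + ["cat"] * 50
# Hypothetical predictions: 40 of the dogs and 40 of the cats are labeled correctly.
predictions = ["dog"] * 40 + ["cat"] * 10 + ["cat"] * 40 + ["dog"] * 10

print(accuracy(y_test, predictions))
```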
68. Model Evaluation
Accuracy
Accuracy is useful when target classes
are well balanced
In our example, we would have roughly
the same amount of cat images as we
have dog images.
69. Model Evaluation
Accuracy
Accuracy is not a good choice with
unbalanced classes!
Imagine we had 99 images of dogs and 1
image of a cat.
If our model was simply a line that
always predicted dog we would get 99%
accuracy!
70. Model Evaluation
Accuracy
In this situation we'll want to
understand recall and precision.
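The 99-dogs-and-1-cat situation is easy to simulate. The sketch below assumes a "model" that ignores its input and always answers dog, then checks both its accuracy and how many of the cats it actually found.

```python
# 99 dog images and 1 cat image, as in the example above.
y_test = ["dog"] * 99 + ["cat"]
# A model that always predicts dog, no matter what the image shows.
predictions = ["dog"] * len(y_test)

accuracy = sum(t == p for t, p in zip(y_test, predictions)) / len(y_test)

# Fraction of the real cats that the model managed to label as cats.
cats_found = sum(1 for t, p in zip(y_test, predictions) if t == "cat" and p == "cat")
fraction_of_cats_found = cats_found / y_test.count("cat")

print(accuracy, fraction_of_cats_found)
```

The accuracy looks excellent even though the model never finds a single cat, which is why accuracy alone can mislead on unbalanced classes.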
71. Model Evaluation
Recall
Ability of a model to find all the relevant
cases within a dataset.
The precise definition of recall is the
number of true positives divided by the
number of true positives plus the
number of false negatives.
72. Model Evaluation
Precision
Ability of a classification model to
identify only the relevant data points.
Precision is defined as the number of
true positives divided by the number of
true positives plus the number of false
positives.
73. Model Evaluation
Recall and Precision
Often you have a trade-off between
Recall and Precision.
While recall expresses the ability to find
all relevant instances in a dataset,
precision expresses the proportion of
the data points our model says were
relevant that actually were relevant.
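Both definitions can be computed directly from counts of true positives, false positives and false negatives. The helper below is a sketch with made-up labels; treating "cat" as the positive class and returning 0.0 when a denominator is zero are assumptions of this example, not rules from the slides.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the class named by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumed convention for 0/0
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # assumed convention for 0/0
    return precision, recall

# Hypothetical labels: 4 cats and 6 dogs.
y_true = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "dog", "cat", "dog", "dog", "dog", "dog", "dog"]

precision, recall = precision_recall(y_true, y_pred, positive="cat")
print(precision, recall)
```

Here the model finds 2 of the 4 cats (recall) and 2 of its 3 cat predictions are right (precision), so the two numbers tell different parts of the story.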
74. Model Evaluation
F1-Score
In cases where we want to find an
optimal blend of precision and recall we
can combine the two metrics using
what is called the F1 score.
75. Model Evaluation
F1-Score
The F1 score is the harmonic mean of
precision and recall, taking both metrics
into account in the following equation:
F1 = 2 * (precision * recall) / (precision + recall)
76. Model Evaluation
F1-Score
We use the harmonic mean instead of a
simple average because it punishes
extreme values.
A classifier with a precision of 1.0 and a
recall of 0.0 has a simple average of 0.5
but an F1 score of 0.
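The comparison above is quick to verify in code. In this sketch the `f1_score` helper is a hand-written function (not a library call), and returning 0.0 when both inputs are zero is an assumption made to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention when both metrics are zero
    return 2 * precision * recall / (precision + recall)

precision, recall = 1.0, 0.0
simple_average = (precision + recall) / 2

print("simple average:", simple_average)
print("F1 score:", f1_score(precision, recall))
```

The simple average hides the useless recall, while the harmonic mean drops all the way to zero.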
77. Model Evaluation
We can also view all correctly classified
versus incorrectly classified images in the
form of a confusion matrix.
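A confusion matrix is just a table of counts, so it can be built with the standard library. The layout below (rows for the true label, columns for the predicted label) and the example labels are assumptions for illustration; other sources may arrange the table differently.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in the order of `labels`."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(truth, guess)] for guess in labels] for truth in labels]

# Hypothetical test labels and predictions.
y_true = ["dog", "dog", "dog", "cat", "cat", "cat"]
y_pred = ["dog", "dog", "cat", "cat", "cat", "dog"]

matrix = confusion_matrix(y_true, y_pred, labels=["dog", "cat"])
for label, row in zip(["dog", "cat"], matrix):
    print(label, row)
```

The diagonal holds the correctly classified images; everything off the diagonal is a mistake, split by which kind of mistake it was.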
80. Model Evaluation
The main point to remember with the
confusion matrix and the various
calculated metrics is that they are all
fundamentally ways of comparing the
predicted values versus the true values.
What constitutes good metrics will
really depend on the specific situation!
81. Model Evaluation
Still confused on the confusion matrix?
No problem! Check out the Wikipedia
page for it, it has a really good diagram
with all the formulas for all the metrics.
Throughout the training, we'll usually just
print out metrics (e.g. accuracy).
82. Model Evaluation
Let's think back on this idea of:
What is a good enough accuracy?
This all depends on the context of the
situation!
Did you create a model to predict
presence of a disease?
Is the disease presence well balanced in
the general population? (Probably not!)
84. Model Evaluation
Often we have a precision/recall trade-off.
We need to decide if the model should
focus on fixing False Positives vs.
False Negatives.
In disease diagnosis, it is probably better
to go in the direction of False Positives,
so we make sure we correctly classify as
many cases of disease as possible!
87. Evaluating Regression
Let's take a moment now to discuss
evaluating Regression Models.
Regression is a task where a model
attempts to predict continuous values
(unlike categorical values, which is
classification).
88. Evaluating Regression
You may have heard of some evaluation
metrics like accuracy or recall.
These sorts of metrics aren't useful for
regression problems. We need metrics
designed for continuous values!
89. Evaluating Regression
For example, attempting to predict the
price of a house given its features is a
regression task.
Attempting to predict the country a
house is in given its features would be a
classification task.
90. Evaluating Regression
Let's discuss some of the most common
evaluation metrics for regression:
Mean Absolute Error
Mean Squared Error
Root Mean Square Error
91. Evaluating Regression
Mean Absolute Error (MAE)
This is the mean of the absolute
value of errors.
Easy to understand
95. Evaluating Regression
Mean Squared Error (MSE)
This is the mean of the squared
errors.
Larger errors are penalized more
heavily than with MAE, making MSE
more popular.
96. Evaluating Regression
Root Mean Square Error (RMSE)
This is the root of the mean of the
squared errors.
Most popular (has same units as y)
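The three regression metrics can be computed side by side from the same list of errors. This is a minimal sketch using only the standard library; the `regression_metrics` helper and the house-price style numbers are invented for the example.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for paired true and predicted values."""
    errors = [pred - truth for truth, pred in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)  # mean of absolute errors
    mse = sum(e * e for e in errors) / len(errors)   # mean of squared errors
    rmse = math.sqrt(mse)                            # back in the same units as y
    return mae, mse, rmse

# Hypothetical true values and predictions; one prediction is off by much more.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 300.0, 440.0]

mae, mse, rmse = regression_metrics(y_true, y_pred)
print(mae, mse, rmse)
```

The single large miss pulls RMSE noticeably above MAE, which shows how squaring gives extra weight to large errors.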
97. Machine Learning
Most common question from students:
Is this value of RMSE good?
Context is everything!
An RMSE of $10 is fantastic for predicting
the price of a house, but horrible for
predicting the price of a candy bar!
98. Machine Learning
Compare your error metric to the average
value of the label in your data set to try to
get an intuition of its overall performance.
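One rough way to make that comparison is to divide the error metric by the mean label value. The `relative_error` helper and the prices below are hypothetical, chosen only to echo the house versus candy bar example.

```python
def relative_error(rmse, labels):
    """Express an RMSE as a fraction of the average label value."""
    mean_label = sum(labels) / len(labels)
    return rmse / mean_label

# Hypothetical label values, in dollars.
house_prices = [250_000.0, 310_000.0, 190_000.0]
candy_bar_prices = [1.0, 1.5, 2.0]

rmse = 10.0  # the same $10 error in both settings
print("houses:", relative_error(rmse, house_prices))
print("candy bars:", relative_error(rmse, candy_bar_prices))
```

The same $10 error is a tiny fraction of a typical house price but several times the price of a typical candy bar.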
Domain knowledge also plays an
important role here!
99. Machine Learning
The importance of the task is also necessary to
consider.
We may create a model to predict how
much medication to give, in which case
small fluctuations in RMSE may actually be
very significant.
100. Evaluating Regression
You should now feel comfortable with the
various methods of evaluating a
regression task.
102. Machine Learning
We've covered supervised learning, where
the label was known due to historical
labeled data.
But what happens when we don't have
historical labels?
103. Machine Learning
There are certain tasks that fall under
unsupervised learning:
Clustering
Anomaly Detection
Dimensionality Reduction
104. Machine Learning
Clustering
Grouping together unlabeled data
points into categories/clusters
Data points are assigned to a cluster
based on similarity
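To make "assigned to a cluster based on similarity" concrete, here is a small sketch of k-means on one-dimensional data. k-means is one common clustering technique chosen for this illustration; the slides do not name a specific algorithm, and the points and starting centers are made up.

```python
def k_means_1d(points, centers, iterations=10):
    """Repeat: assign each point to its nearest center, then move centers to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            clusters[nearest].append(x)
        # Keep the old center if a cluster happens to end up empty.
        centers = [sum(c) / len(c) if c else old for c, old in zip(clusters, centers)]
    return centers, clusters

# Unlabeled points that visibly form two groups.
points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])

print(centers)
print(clusters)
```

No labels were provided at any point; the grouping comes only from how close the points are to each other.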
105. Machine Learning
Anomaly Detection
Attempts to detect outliers in a
dataset
For example, fraudulent transactions
on a credit card.
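A very simple way to flag outliers is to measure how many standard deviations each value sits from the mean. This z-score rule is just one possible approach and is not taken from the slides; the transaction amounts and the threshold of 2.0 are assumptions for the example.

```python
import statistics

def find_outliers(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []                      # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical card transactions in dollars; one is far larger than the rest.
amounts = [12.0, 15.0, 11.0, 14.0, 13.0, 12.5, 950.0]

print(find_outliers(amounts))
```

Real fraud detection uses far richer features, but the idea is the same: look for points that do not resemble the rest of the data.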
106. Machine Learning
Dimensionality Reduction
Data processing techniques that
reduce the number of features in a
data set, either for compression, or to
better understand underlying trends
within a data set.
107. Machine Learning
Unsupervised Learning
It's important to note, these are
situations where we don't have the
correct answer for historical data!
Which means evaluation is much
harder and more nuanced!
109. Machine Learning
Later on in the course, we'll explore
unsupervised learning processes with
specialized neural network structures,
such as autoencoders.