Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
(An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Computer Engineering
(NBA Accredited)
Prof. S. A. Shivarkar
Assistant Professor
Contact No.8275032712
Email- shivarkarsandipcomp@sanjivani.org.in
Subject- Supervised Modeling and AI Technologies (CO9401)
Unit –I: Supervised Learning Naïve Bayes and K-NN
Content
 Bayesian classifier, Naïve Bayes classifier cases, constraints of Naïve Bayes,
advantages of Naïve Bayes, comparison of Naïve Bayes with other
classifiers,
 K-nearest neighbor classifier, K-nearest neighbor classifier selection criteria,
Constraints of K-nearest neighbor, Advantages and Disadvantages of K-
nearest neighbor algorithms, controlling complexity of K-NN.
Supervised vs. Unsupervised Learning
 Supervised learning (classification): prediction of a label, e.g., Yes or No
 Supervision: The training data (observations, measurements, etc.) are
accompanied by labels indicating the class of the observations.
 New data is classified based on the training set.
 Unsupervised learning (clustering)
 The class labels of the training data are unknown.
 Given a set of measurements, observations, etc. with the aim of
establishing the existence of classes or clusters in the data.
Prediction Problems: Classification vs. Numeric Prediction
 Classification
 Predicts categorical class labels (discrete or nominal)
 Classifies data (constructs a model) based on the training set and the
values (class labels) of a classifying attribute, and uses the model to classify new
data.
 Numeric Prediction
 Models continuous-valued functions, i.e., predicts unknown or missing
values.
 Typical applications
 Credit/loan approval: Loan approved Yes or No
 Medical diagnosis: if a tumor is cancerous or benign
 Fraud detection: if a transaction is fraudulent
 Web page categorization: which category it is
Classification—A Two-Step Process
 Model construction: describing a set of predetermined classes
 Each tuple/sample is assumed to belong to a predefined class, as determined by the class
label attribute
 The set of tuples used for model construction is training set
 The model is represented as classification rules, decision trees, or mathematical formulae
 Model usage: for classifying future or unknown objects
 Estimate accuracy of the model
 The known label of test sample is compared with the classified result from the model
 Accuracy rate is the percentage of test set samples that are correctly classified by the
model
 Test set is independent of training set (otherwise overfitting)
 If the accuracy is acceptable, use the model to classify new data
 Note: If the test set is used to select models, it is called validation (test) set
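As an illustration of the two steps (model construction on a training set, then accuracy estimation on an independent test set), the sketch below uses scikit-learn with a toy dataset; it is an assumed example for illustration only, not part of the original slides.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                                    # toy dataset standing in for real data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)               # Step 1: model construction on the training set
y_pred = model.predict(X_test)                                       # Step 2: model usage on an independent test set
print(accuracy_score(y_test, y_pred))                                # accuracy rate on the test set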
Step 1: Model Construction
Step 2: Model Usage
Issues: Data Preparation
 Data cleaning
 Preprocess data in order to reduce noise and handle
missing values
 Relevance analysis (feature selection)
 Remove the irrelevant or redundant attributes
 Data transformation
 Generalize and/or normalize data
Issues: Evaluating Classification Methods
 Accuracy
 classifier accuracy: predicting class label
 predictor accuracy: estimating the value of predicted attributes
 Speed
 time to construct the model (training time)
 time to use the model (classification/prediction time)
 Robustness: handling noise and missing values
 Scalability: efficiency in disk-resident databases
 Interpretability
 understanding and insight provided by the model
 Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules
Issues: Evaluating Classification Methods: Accuracy
 Accuracy simply measures how often the classifier predicts correctly.
 We can define accuracy as the ratio of the number of correct predictions to the total number of
predictions.
 For binary classification (only two class labels): Accuracy = (TP + TN) / (TP + TN + FP + FN),
where TP, TN, FP, FN are the counts of true positives, true negatives, false positives, and false negatives.
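As a small illustration (not part of the original slides), the same accuracy formula can be computed directly from the four confusion-matrix counts discussed on the next slide; the counts below are hypothetical.

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = correct predictions / total predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=50, tn=40, fp=5, fn=5))  # 0.9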
Accuracy: Confusion Matrix
Bayesian Classification: Why?
 A statistical classifier: performs probabilistic prediction, i.e., predicts class
membership probabilities
 Foundation: Based on Bayes’ Theorem.
 Performance: A simple Bayesian classifier, naïve Bayesian classifier, has comparable
performance with decision tree and selected neural network classifiers
 Incremental: Each training example can incrementally increase/decrease the
probability that a hypothesis is correct — prior knowledge can be combined with
observed data
 Standard: Even when Bayesian methods are computationally intractable, they can
provide a standard of optimal decision making against which other methods can be
measured
Bayes’ Theorem: Basics
 Total probability Theorem:
 Bayes’ Theorem:
 Let X be a data sample (“evidence”): class label is unknown
 Let H be a hypothesis that X belongs to class C
 Classification is to determine P(H|X), (i.e., posteriori probability): the probability that the
hypothesis holds given the observed data sample X
 P(H) (prior probability): the initial probability
 E.g., X will buy computer, regardless of age, income, …
 P(X): probability that sample data is observed
 P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds
 E.g., Given that X will buy computer, the prob. that X is 31..40, medium income
Total probability theorem: $P(B) = \sum_{i=1}^{M} P(B \mid A_i)\, P(A_i)$
Bayes’ theorem: $P(H \mid X) = \dfrac{P(X \mid H)\, P(H)}{P(X)}$
Prediction Based on Bayes’ Theorem
 Given training data X, posteriori probability of a hypothesis H,
P(H|X), follows the Bayes’ theorem
 Informally, this can be viewed as
posteriori = likelihood x prior/evidence
 Predicts X belongs to Ci iff the probability P(Ci|X) is the highest
among all the P(Ck|X) for all the k classes
 Practical difficulty: It requires initial knowledge of many
probabilities, involving significant computational cost
$P(H \mid X) = \dfrac{P(X \mid H)\, P(H)}{P(X)}$
Classification Is to Derive the Maximum Posteriori
 Let D be a training set of tuples and their associated class labels, and each tuple is
represented by an n-D attribute vector X = (x1, x2, …, xn)
 Suppose there are m classes C1, C2, …, Cm.
 Classification is to derive the maximum posteriori, i.e., the maximal P(Ci|X)
 This can be derived from Bayes’ theorem: $P(C_i \mid X) = \dfrac{P(X \mid C_i)\, P(C_i)}{P(X)}$
 Since $P(X)$ is constant for all classes, only $P(X \mid C_i)\, P(C_i)$ needs to be maximized
Naïve Bayes Classifier
 A simplified assumption: attributes are conditionally independent (i.e., no
dependence relation between attributes):
$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i) = P(x_1 \mid C_i) \times P(x_2 \mid C_i) \times \cdots \times P(x_n \mid C_i)$
 This greatly reduces the computation cost: only the class distribution needs to be counted
 If Ak is categorical, P(xk|Ci) is the # of tuples in Ci having value xk for Ak divided by
|Ci, D| (# of tuples of Ci in D)
 If Ak is continuous-valued, P(xk|Ci) is usually computed based on a Gaussian
distribution with mean μ and standard deviation σ:
$g(x, \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
and $P(x_k \mid C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})$
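The counting procedure described above can be sketched in a few lines of Python. This is only a minimal illustration for categorical attributes (no smoothing, no Gaussian handling), with made-up toy data; it is not the implementation used in the course.

from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Estimate P(Ci) and per-attribute value counts by simple counting."""
    priors = {c: cnt / len(labels) for c, cnt in Counter(labels).items()}
    counts = defaultdict(lambda: defaultdict(Counter))  # counts[class][attr_index][value]
    for row, c in zip(rows, labels):
        for k, value in enumerate(row):
            counts[c][k][value] += 1
    return priors, counts, Counter(labels)

def predict(x, priors, counts, class_sizes):
    """Return the class maximizing P(Ci) * prod_k P(xk|Ci)."""
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for k, value in enumerate(x):
            score *= counts[c][k][value] / class_sizes[c]  # 0 if the value was never seen for class c
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score

# Toy usage (attribute tuples and labels are hypothetical):
rows = [("<=30", "high"), ("31..40", "high"), (">40", "medium"), ("<=30", "low")]
labels = ["no", "yes", "yes", "yes"]
priors, counts, class_sizes = train_naive_bayes(rows, labels)
print(predict(("<=30", "medium"), priors, counts, class_sizes))  # ('yes', ...)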
Naïve Bayes Classifier Example 1
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data to be classified:
X = (age <= 30,
Income = medium,
Student = yes,
Credit_rating = Fair)
Naïve Bayes Classifier Example 1 Solution
Age P(Y) P(N)
<=30 2/9=0.22 3/5=0.6
31…40 4/9=0.44 0/5=0
>40 3/9=0.33 2/5=0.4
 Prior Probability
 P(Buys Computer= Yes)=9/14=0.642
 P(Buys Computer= No)=5/14=0.357
 Posterior/ Conditional Probability
Income P(Y) P(N)
High 2/9=0.22 2/5=0.4
Medium 4/9=0.44 2/5=0.4
Low 3/9=0.33 1/5=0.2
Credit
Rating
P(Y) P(N)
Fair 6/9=0.67 2/5=0.4
Excellent 3/9=0.33 3/5=0.6
Student P(Y) P(N)
Yes 6/9=0.67 3/5=0.6
No 3/9=0.33 2/5=0.4
Naïve Bayes Classifier Example 1 Solution
Age P(Y) P(N)
<=30 2/9=0.22 3/5=0.6
31…40 4/9=0.44 0/5=0
>40 3/9=0.33 2/5=0.4
 Prior Probability
 P(Buys Computer= Yes)=9/14=0.642
 P(Buys Computer= No)=5/14=0.357
 Posterior/ Conditional Probability
Income P(Y) P(N)
High 2/9=0.22 2/5=0.4
Medium 4/9=0.44 2/5=0.4
Low 3/9=0.33 1/5=0.2
Credit
Rating
P(Y) P(N)
Fair 6/9=0.67 2/5=0.4
Excellent 3/9=0.33 3/5=0.6
Student P(Y) P(N)
Yes 6/9=0.67 3/5=0.6
No 3/9=0.33 2/5=0.4
 P(Yes)=0.22*0.44*0.67*0.67*0.642 ≈ 0.0279
 P(No)=0.6*0.4*0.6*0.4*0.357 ≈ 0.0206
 As P(Yes) > P(No), Buys Computer = Yes
Naïve Bayes Classifier Example 1 Solution
Age P(Y) P(N)
<=30 2/9=0.22 3/5=0.6
31…40 4/9=0.44 0/5=0
>40 3/9=0.33 2/5=0.4
 Data to be classified
 Age=31…40, Income= High, Student = No,
Credit Rating= Excellent, Buys Computer?
Income P(Y) P(N)
High 2/9=0.22 2/5=0.4
Medium 4/9=0.44 2/5=0.4
Low 3/9=0.33 1/5=0.2
Credit
Rating
P(Y) P(N)
Fair 6/9=0.67 2/5=0.4
Excellent 3/9=0.33 3/5=0.6
Student P(Y) P(N)
Yes 6/9=0.67 3/5=0.6
No 3/9=0.33 2/5=0.4
 P(Yes)=0.44*0.22*0.33*0.33*0.642=0.0067
 P(No)=0*0.4*0.4*0.6*0.357=0
As P(Yes) > P(No), Buys Computer = Yes
Naïve Bayes Classifier Example 2
Class:
C1: Play Tennis = ‘yes’
C2: Play Tennis = ‘no’
Data to be classified:
Outlook=Rainy,
Temp=Hot,
Humidity=High, Windy=
Strong, Play= ?
Day Outlook Temp Humidity Windy Play
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rainy Mild High Weak Yes
D5 Rainy Cool Normal Weak Yes
D6 Rainy Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rainy Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rainy Mild High Strong No
Naïve Bayes Classifier Example 2 Solution
Outlook P(Y) P(N)
Sunny 2/9=0.22 3/5=0.6
Overcast 4/9=0.44 0/5=0
Rainy 3/9=0.33 2/5=0.4
 Prior Probability
 P(Y)=9/14=0.6428
 P(N)=5/14=0.3571
 Posterior/ Conditional Probability
Temp P(Y) P(N)
Hot 2/9=0.22 2/5=0.4
Mild 4/9=0.44 2/5=0.4
Cool 3/9=0.33 1/5=0.2
Wind P(Y) P(N)
Weak 6/9=0.67 2/5=0.4
Strong 3/9=0.33 3/5=0.6
Humidity P(Y) P(N)
High 3/9=0.33 4/5=0.8
Normal 6/9=0.66 1/5=0.2
 P(Y)=0.333*0.222*0.33*0.33*0.6428=0.0052
 P(N)=0.4*0.4*0.8*0.6*0.3571=0.0274
As P(Y) < P(N), Play = No (will not play)
Naïve Bayes Classifier Example 3
Given the training set for a
classification problem with two
classes, “fraud” and “normal”. There
are two attributes, A1 and A2, each
taking values 0 or 1. Into which class
does the Bayes classifier classify the
instance (A1=1, A2=1)?
A1 A2 Class
1 0 fraud
1 1 fraud
1 1 fraud
1 0 normal
1 1 fraud
0 0 normal
0 0 normal
0 0 normal
1 1 normal
1 0 normal
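Below is a sketch of how Example 3 could be worked out, assuming the same naïve Bayes counting procedure as in the earlier examples; the code itself is only illustrative.

data = [
    (1, 0, "fraud"), (1, 1, "fraud"), (1, 1, "fraud"), (1, 0, "normal"), (1, 1, "fraud"),
    (0, 0, "normal"), (0, 0, "normal"), (0, 0, "normal"), (1, 1, "normal"), (1, 0, "normal"),
]

def naive_bayes_score(cls, a1, a2):
    """P(cls) * P(A1=a1 | cls) * P(A2=a2 | cls), estimated by counting."""
    in_cls = [d for d in data if d[2] == cls]
    prior = len(in_cls) / len(data)
    p_a1 = sum(1 for d in in_cls if d[0] == a1) / len(in_cls)
    p_a2 = sum(1 for d in in_cls if d[1] == a2) / len(in_cls)
    return prior * p_a1 * p_a2

for cls in ("fraud", "normal"):
    print(cls, round(naive_bayes_score(cls, 1, 1), 3))
# fraud:  0.4 * 4/4 * 3/4 = 0.3
# normal: 0.6 * 3/6 * 1/6 = 0.05  ->  (A1=1, A2=1) is classified as fraud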
Benefits of Naïve Bayes Classifier
 It is simple and easy to implement
 It requires relatively little training data
 It handles both continuous and discrete data
 It is highly scalable with the number of predictors and data
points
 It is fast and can be used to make real-time predictions
 It is not sensitive to irrelevant features
Limitations of Naïve Bayes Classifier
 Naïve Bayes assumes that all predictors (or features) are
independent, which rarely holds in real life. This limits the
applicability of the algorithm in real-world use cases.
 The algorithm faces the ‘zero-frequency problem’: it assigns
zero probability to a categorical value that appears in the test
set but was not present for a given class in the training set. A
smoothing technique (e.g., Laplace smoothing) should be used to
overcome this issue.
 Its probability estimates can be poorly calibrated, so its raw
probability outputs should not be taken too literally.
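As a brief illustration of the smoothing mentioned above (and of the zero factor that appeared for the unseen 31…40/No combination in Example 1), Laplace (add-one) smoothing shifts every category count by one when estimating the conditional probabilities. The sketch below uses assumed counts.

def laplace_conditional(value_count_in_class, class_count, n_distinct_values):
    """Add-one (Laplace) estimate of P(xk | Ci)."""
    return (value_count_in_class + 1) / (class_count + n_distinct_values)

# Assumed numbers: a category that never occurs among 5 'No' tuples,
# for an attribute with 3 possible values.
print(laplace_conditional(0, 5, 3))  # 0.125 instead of 0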
Types of Distances used in Machine Learning Algorithms
 Distance metrics are used to measure the distance between any two
data points.
 There are many distance metrics:
1. Euclidean Distance
2. Manhattan Distance
3. Minkowski Distance
4. Hamming Distance
Types of Distances used in Machine Learning Algorithms
Euclidean distance: √((X₂-X₁)² + (Y₂-Y₁)²)
Let's calculate the distance between {2, 3} and {3, 5}:
= √((3-2)² + (5-3)²)
= √(1² + 2²)
= √(1 + 4)
= √5 ≈ 2.24
Exercise: calculate the distance between {40, 20} and {20, 35}.
Types of Distances used in Machine Learning Algorithms
Manhattan Distance
The Manhattan distance is the sum of absolute differences: |x₁ - x₂| + |y₁ - y₂|
Let's calculate the distance between {2, 3} and {3, 5}:
= |2 - 3| + |3 - 5|
= |-1| + |-2|
= 1 + 2
= 3
Exercise: calculate the distance between {40, 20} and {20, 35}.
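The two calculations above can be reproduced with a short Python sketch (illustrative only):

import math

def euclidean(p, q):
    """Straight-line distance: sqrt((X2-X1)^2 + (Y2-Y1)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute differences: |x1-x2| + |y1-y2|."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((2, 3), (3, 5)))   # 2.236... (= sqrt(5))
print(manhattan((2, 3), (3, 5)))   # 3
print(euclidean((40, 20), (20, 35)), manhattan((40, 20), (20, 35)))  # 25.0 35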
The k-Nearest Neighbor Algorithm
 The k-nearest neighbors (KNN) algorithm is a non-parametric,
supervised learning classifier
 It uses proximity to make classifications or predictions about the grouping of an
individual data point
 One of the most popular and simplest classification and regression algorithms used in machine
learning today
Mostly suited for Binary Classification
The k-Nearest Neighbor Algorithm
 All instances correspond to points in the n-D space
 The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)
 Target function could be discrete- or real- valued
 For discrete-valued, k-NN returns the most common value among the k training
examples nearest to xq
 Voronoi diagram: the decision surface induced by 1-NN for a typical set of
training examples
[Figure: 1-NN decision surface (Voronoi diagram) for a set of + and - training examples around the query point xq]
 Step #1 - Assign a value to K.
 Step #2 - Calculate the distance between the new data entry and all
other existing data entries. Arrange them in ascending order.
 Step #3 - Find the K nearest neighbors to the new entry based on the
calculated distances.
 Step #4 - Assign the new data entry to the majority class in the
nearest neighbors.
The k-Nearest Neighbor Algorithm Steps
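A minimal sketch of these four steps in Python (no tie handling or feature scaling; the toy points are hypothetical):

import math
from collections import Counter

def knn_classify(query, dataset, k):
    """dataset: list of (features, label) pairs. Returns the majority label among the k nearest points."""
    # Step 2: distance from the new entry to every stored example, in ascending order
    by_distance = sorted(dataset, key=lambda item: math.dist(query, item[0]))
    # Step 3: keep the K nearest neighbors
    nearest = by_distance[:k]
    # Step 4: majority vote over their labels
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Step 1: choose K, then classify a hypothetical query point
points = [((1.0, 1.0), "A"), ((2.0, 2.0), "A"), ((8.0, 9.0), "B")]
print(knn_classify((1.5, 1.2), points, k=3))  # "A" (two of the three neighbors are labeled A)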
Types of Distances used in Machine Learning Algorithms
Euclidean distance: √((X₂-X₁)² + (Y₂-Y₁)²)
Let's calculate the distance between {2, 3} and {3, 5}:
= √((3-2)² + (5-3)²)
= √(1² + 2²)
= √(1 + 4)
= √5 ≈ 2.24
Exercise: calculate the distance between {40, 20} and {20, 35}.
Types of Distances used in Machine Learning Algorithms
Manhattan Distance
The Manhattan distance is the sum of absolute differences: |x₁ - x₂| + |y₁ - y₂|
Let's calculate the distance between {2, 3} and {3, 5}:
= |2 - 3| + |3 - 5|
= |-1| + |-2|
= 1 + 2
= 3
Exercise: calculate the distance between {40, 20} and {20, 35}.
 For the Barbie movie with IMDb Rating = 7.4 and Duration = 114, what is the Genre?
Assume K=3 and use Euclidean distance: √((X₂-X₁)² + (Y₂-Y₁)²)
The k-Nearest Neighbor Algorithm
IMDb Rating Duration Genre
8.0 ( Mission Impossible) 160 Action
6.2 (Gadar 2) 170 Action
7.2 (Rocky and Rani) 168 Comedy
8.2 ( OMG 2) 155 Comedy
 Step 1: Calculate the distances.
 Calculate the distance between the new movie and each movie in the dataset.
Distance to (8.0, 160) = √((7.4-8.0)² + (114-160)²) = √(0.36 + 2116) ≈ 46.00
Distance to (6.2, 170) = √((7.4-6.2)² + (114-170)²) = √(1.44 + 3136) ≈ 56.01
Distance to (7.2, 168) = √((7.4-7.2)² + (114-168)²) = √(0.04 + 2916) ≈ 54.00
Distance to (8.2, 155) = √((7.4-8.2)² + (114-155)²) = √(0.64 + 1681) ≈ 41.01
The k-Nearest Neighbor Algorithm
(X₁, Y₁) (X₂, Y₂)
 Step 2: Select the K Nearest Neighbors.
 For K=1, the shortest distance is 41.01 (OMG 2, Comedy)
 So, the Barbie movie Genre is Comedy.
The k-Nearest Neighbor Algorithm
 Step 3: Majority Voting (Classification)
 For K=3, the shortest distances are 41.01, 46.00, and 54.00
→ Comedy, Action, Comedy
 So, by majority vote, the Barbie movie Genre is Comedy.
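The distances and the K=3 vote above can be double-checked with a few lines of Python (a sketch; the movie data are taken from the table on the earlier slide):

import math
from collections import Counter

barbie = (7.4, 114)  # (IMDb rating, duration)
movies = [("Mission Impossible", (8.0, 160), "Action"),
          ("Gadar 2",            (6.2, 170), "Action"),
          ("Rocky and Rani",     (7.2, 168), "Comedy"),
          ("OMG 2",              (8.2, 155), "Comedy")]

# Step 1: Euclidean distance to every movie, sorted ascending
distances = sorted((math.dist(barbie, feats), genre, name) for name, feats, genre in movies)
for d, genre, name in distances:
    print(f"{name}: {d:.2f} ({genre})")   # 41.01, 46.00, 54.00, 56.01

# Steps 2-3: take the K=3 nearest and vote
print(Counter(genre for _, genre, _ in distances[:3]).most_common(1)[0][0])  # Comedy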
 For the given data, classify the test tuple Brightness=20, Saturation=35.
Assume K=5 and use Euclidean distance.
BRIGHTNESS SATURATION CLASS
40 20 Red
50 50 Blue
60 90 Blue
10 25 Red
70 70 Blue
60 10 Red
25 80 Blue
Euclidean distance: √((X₂-X₁)² + (Y₂-Y₁)²)
The k-Nearest Neighbor Algorithm
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 38
Reference
 Jiawei Han, Micheline Kamber, and Jian Pei, “Data Mining: Concepts and
Techniques”, Elsevier, ISBN: 9780123814791, 9780123814807.
 https://onlinecourses.nptel.ac.in/noc24_cs22
 https://medium.com/analytics-vidhya/type-of-distances-used-in-machine-
learning-algorithm-c873467140de
 https://www.freecodecamp.org/news/k-nearest-neighbors-algorithm-
classifiers-and-model-example/
