UNIT-4
SML
SVM
 Support Vector Machine (SVM) is a supervised machine learning algorithm
that can be used for both classification and regression challenges.
 However, it is mostly used in classification problems.
 In the SVM algorithm, we plot each data item as a point in n-dimensional
space (where n is the number of features you have) with the value of each
feature being the value of a particular coordinate.
 Then, we perform classification by finding the hyper-plane that
differentiates the two classes very well.
SVM
 An SVM can be imagined as a surface that maximizes the boundary between
the different classes of data points represented in multidimensional space.
This surface, known as a hyperplane, creates the most homogeneous points in
each subregion.
 Support vector machines can be used on any type of data, but offer special
advantages for data with very high dimensionality relative to the number of
observations, for example:
Text classification, in which word vectors have very high dimensionality
Quality control of DNA sequencing, by labeling chromatograms correctly
Support vector machines working principles
 Support vector machines are mainly classified into three types based on
their working principles:
- Maximum margin classifiers
- Support vector classifiers
- Support vector machines
Maximum margin classifier
 People usually equate support vector machines with maximum margin
classifiers. However, there is much more to SVMs than the maximum margin
classifier.
 It is feasible to draw infinitely many hyperplanes to classify the same set
of data, but the million-dollar question is: which one should be considered
the ideal hyperplane?
 The maximum margin classifier provides an answer to that: the
hyperplane with the maximum margin of separation width.
Hyperplane
 Hyperplanes: Before going forward, let us quickly review what a hyperplane
is.
 In n-dimensional space, a hyperplane is a flat affine subspace of dimension
n-1.
 This means that in 2-dimensional space, the hyperplane is a straight line
which separates the 2-dimensional space into two halves.
 Observations can fall on either side of the hyperplane; these are also
called the regions of the classes.
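In standard notation (the slides' accompanying figure is not reproduced here), a hyperplane in n-dimensional space is the set of points satisfying

$$\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n = 0$$

An observation falls into one region or the other according to the sign of the left-hand side when its coordinates are substituted in.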
SVM
 The mathematical representation of the maximum margin classifier is as
follows; it is an optimization problem:
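The slide's formula image is missing here; the standard formulation from the literature, written in LaTeX, is

$$\max_{\beta_0,\ldots,\beta_n,\,M} \; M$$

subject to

$$\sum_{j=1}^{n} \beta_j^2 = 1 \quad \text{(constraint 1)}$$

$$y_i\,(\beta_0 + \beta_1 x_{i1} + \dots + \beta_n x_{in}) \ge M \quad \forall\, i \quad \text{(constraint 2)}$$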
SVM
 Constraint 2 ensures that observations will be on the correct side of the
hyperplane by taking the product of the coefficients with the x variables
and, finally, with the class indicator variable.
 In non-separable cases, the maximum margin classifier has no separating
hyperplane, which is also known as having no feasible solution.
 This issue is solved by support vector classifiers.
Maximum Margin Classifier
SVM
How does it work?
 Consider the process of segregating the two classes with a hyper-plane.
 How can we identify the right hyper-plane?
Identify the right hyper-plane (Scenario-1):
 Here, we have three hyper-planes (A, B, and C). Now, identify the right hyper-
plane to classify stars and circles.
 You need to remember a thumb rule to identify the right hyper-plane: select
the hyper-plane which segregates the two classes better. In this scenario,
hyper-plane B has performed this job excellently.
Identify the right hyper-plane (Scenario-2)
 Here, we have three hyper-planes (A, B, and C) and all are segregating the
classes well. Now, how can we identify the right hyper-plane?
 Here, maximizing the distance between the nearest data point (of either
class) and the hyper-plane will help us decide the right hyper-plane. This
distance is called the margin.
You can see that the margin for hyper-plane C is higher than for
both A and B. Hence, we name C as the right hyper-plane. Another
compelling reason for selecting the hyper-plane with the higher
margin is robustness: if we select a hyper-plane with a low
margin, there is a high chance of misclassification.
Identify the right hyper-plane (Scenario-3):
 Hint: Use the rules discussed in the previous scenarios to identify the
right hyper-plane.
 Some of you may have selected hyper-plane B as it has a higher margin
compared to A. But here is the catch: SVM selects the hyper-plane which
classifies the classes accurately before maximizing the margin. Here, hyper-
plane B has a classification error and A has classified all points correctly.
Therefore, the right hyper-plane is A.
Can we classify two classes (Scenario-4)?
 Below, we are unable to segregate the two classes using a straight line, as
one of the stars lies in the territory of the other (circle) class as an
outlier.
Find the hyper-plane to segregate two classes (Scenario-5):
 In the scenario below, we can't have a linear hyper-plane between the two
classes, so how does SVM classify these two classes? Till now, we have only
looked at linear hyper-planes.
SVM
 SVM can solve this problem easily! It does so by introducing an additional
feature. Here, we will add a new feature z = x^2 + y^2. Now, let's plot the
data points on the x and z axes, as in the sketch below:
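A minimal sketch of this idea in Python (the data and the separating threshold are made up for illustration; they are not from the original slides):

```python
# Toy data: class 0 on an inner disc, class 1 on an outer ring, so no
# straight line in the (x, y) plane separates them.
import numpy as np

rng = np.random.default_rng(0)
r = np.r_[rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)]
theta = rng.uniform(0.0, 2.0 * np.pi, 100)
x, y = r * np.cos(theta), r * np.sin(theta)
labels = np.r_[np.zeros(50), np.ones(50)]

# The new feature from the slide: z = x^2 + y^2 (squared distance from origin)
z = x**2 + y**2

# In the (x, z) plane, a horizontal line separates the classes perfectly
pred = (z > 1.5**2).astype(int)
print("accuracy:", (pred == labels).mean())  # 1.0 on this toy data
```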
Support vector classifier
 Support vector classifiers are an extended version of maximum margin
classifiers, in which some violations are tolerated for non-separable cases in
order to create the best fit, even with slight errors within the threshold limit.
 In fact, in real-life scenarios, we hardly find any data with purely
separable classes; most classes have at least a few overlapping observations.
 The mathematical representation of the support vector classifier is as
follows; it is a slight modification of the constraints to accommodate error
terms:
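The slide's formula image is missing here; the standard formulation, which adds slack variables to the maximum margin problem above, is

$$\max_{\beta_0,\ldots,\beta_n,\,\varepsilon_1,\ldots,\varepsilon_m,\,M} \; M$$

subject to

$$\sum_{j=1}^{n} \beta_j^2 = 1, \qquad y_i\,(\beta_0 + \beta_1 x_{i1} + \dots + \beta_n x_{in}) \ge M(1 - \varepsilon_i),$$

$$\varepsilon_i \ge 0, \qquad \sum_{i=1}^{m} \varepsilon_i \le C \quad \text{(constraint 4)}$$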
Support Vector Classifier
 In constraint 4, the C value is a non-negative tuning parameter that
accommodates more or fewer overall errors in the model.
 A high value of C leads to a more robust model, whereas a lower value
creates a more flexible model, because fewer violations of the error terms
are tolerated.
 In practice, the C value is chosen by tuning, as is usual with all machine
learning models.
Support Vector Classifier
 With a high value of C, the model is more tolerant and has room for
violations (errors), as in the left diagram,
 whereas with a lower value of C, there is no scope for accepting violations,
which leads to a reduction in margin width.
 C is a tuning parameter in support vector classifiers; see the sketch below.
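A minimal sketch of tuning C with scikit-learn (the dataset and grid values are illustrative). Note that scikit-learn's C is a penalty on margin violations, so it behaves inversely to the error-budget C described in these slides: a large scikit-learn C tolerates fewer violations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative dataset; in practice use your own features and labels
X, y = make_classification(n_samples=200, n_features=5, random_state=42)

# Cross-validate over a grid of C values and keep the best
grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid={"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```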
Support vector machines
 Support vector machines are used when the decision boundary is non-linear
and the data would not be separable with support vector classifiers whatever
the cost value is.
 The following diagram explains the non-linearly separable cases for both 1-
dimension and 2-dimensions.
1-Dimensional Data Transformation
 We cannot classify this data using support vector classifiers, whatever the
cost value is.
 Another way of handling the data, called the kernel trick, uses a kernel
function to work with non-linearly separable data.
 A polynomial kernel with degree 2 has been applied to transform the data
from 1-dimensional to 2-dimensional data, as the mapping below shows.
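Concretely (standard notation, not shown in the original slides), the degree-2 map sends each 1-dimensional point to

$$\phi(x) = (x,\; x^2)$$

so a class occupying two disjoint intervals on the line becomes linearly separable by a straight line in the $(x, x^2)$ plane.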
1-Dimensional Data Transformation
 The degree of the polynomial kernel is a tuning parameter.
 The practitioner needs to try various values to check where higher accuracy
is possible with the model.
2-Dimensional Data Transformation
 In the 2-dimensional case, the kernel trick is applied as below with a
polynomial kernel of degree 2.
 It seems that observations have been classified successfully using a linear
plane after projecting the data into higher dimensions.
Kernel Functions
 A kernel function, applied to the original feature vectors, returns the
same value as the dot product of their corresponding mapped feature vectors.
 Kernel functions do not explicitly map the feature vectors to a higher
dimensional space, or calculate the dot product of the mapped vectors.
 Kernels produce the same value through a different series of operations that
can often be computed more efficiently.
REASON
To eliminate the computational requirement of deriving the higher-
dimensional vector space from the given basic vector space, so that
observations can be separated linearly in higher dimensions.
 The derived vector space grows exponentially with the increase in
dimensions, and computation becomes almost infeasible even with only
around 30 variables.
Kernel Functions
 The following example shows how the number of derived features grows.
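A short sketch of both points in Python (my own illustration, using a standard explicit map for 2-dimensional inputs): the number of degree-2 monomial features is $\binom{n+2}{2}$, and the kernel reproduces the mapped dot product without ever building the map.

```python
from math import comb
import numpy as np

# (1) Feature-space size: degree-2 polynomial features of n variables
for n in (2, 10, 30, 100):
    print(n, "variables ->", comb(n + 2, 2), "degree-2 features")
# 30 variables already yield 496 derived features

# (2) Kernel trick: K(a, b) = (1 + a.b)^2 equals the dot product of the
# explicitly mapped vectors.
def phi(v):
    x1, x2 = v
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2, np.sqrt(2) * x1 * x2])

a, b = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.isclose(phi(a) @ phi(b), (1 + a @ b) ** 2))  # True
```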
(A) Polynomial Kernel:
 Polynomial kernels are popularly used, especially with degree 2.
 In fact, Vladimir N. Vapnik, the inventor of support vector machines, used
a degree-2 kernel for classifying handwritten digits.
 Polynomial kernels are given by the following equation:
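The equation image is missing from the slide; in one standard form, the polynomial kernel of degree d is

$$K(x, x') = \left(1 + \langle x, x' \rangle\right)^d$$

where $\langle x, x' \rangle$ is the dot product of the two feature vectors, and d = 2 gives the quadratic kernel discussed above.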
(B) Radial Basis Function (RBF) / Gaussian Kernel:
 RBF kernels are a good first choice for problems requiring nonlinear models.
 A decision boundary that is a hyperplane in the mapped feature space is
similar to a decision boundary that is a hypersphere in the original space.
 The feature space produced by the Gaussian kernel can have an infinite
number of dimensions, a feat that would be impossible otherwise.
Its simplified equation is as follows:
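The equation image is missing from the slide; the standard Gaussian (RBF) kernel form is

$$K(x, x') = \exp\left(-\gamma \,\lVert x - x' \rVert^2\right), \qquad \gamma > 0$$

where $\gamma$ is often written as $1/(2\sigma^2)$.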
RBF Kernel Model
Artificial Neural Networks (ANN)
 An ANN models the relationship between a set of input signals and output
signals using a model derived from the biological brain, which responds to
stimuli from its sensory inputs.
 ANN methods try to model problems using interconnected artificial neurons (or
nodes) to solve machine learning problems.
 Incoming signals are received by the cell's dendrites through a biochemical process
that allows the impulses to be weighted according to their relative importance.
 The cell body accumulates the incoming signals until a threshold is
reached, at which point the cell fires and the output signal is transmitted
via an electrochemical process down the axon.
Artificial Neural Networks (ANN)
 At the axon terminal, the electrical signal is again processed as a
chemical signal to be passed to neighboring neurons through their dendrites.
 A similar working principle is loosely used in building an artificial neural
network, in which each neuron has a set of inputs, each of which is given a
specific weight.
 The neuron computes a function on these weighted inputs.
 A linear neuron takes a linear combination of weighted inputs and applies
an activation function (sigmoid, tanh, ReLU, and so on) to the aggregated sum.
The details are shown in the following diagram.
Artificial Neural Networks (ANN)
 The network feeds the weighted sum of the inputs into the logistic function
(in the case of a sigmoid activation).
 The logistic function returns a value between 0 and 1, which is compared
against the set threshold;
for example, here we set the threshold as 0.7.
 Any accumulated signal greater than 0.7 gives a signal of 1; any
accumulated signal less than 0.7 returns the value 0, as in the sketch below:
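A minimal sketch of the thresholded sigmoid neuron described above (the weights, bias, and inputs are made up for illustration):

```python
import numpy as np

def neuron(x, w, b, threshold=0.7):
    z = np.dot(w, x) + b              # weighted sum of inputs plus bias
    a = 1.0 / (1.0 + np.exp(-z))      # logistic (sigmoid) activation
    return 1 if a > threshold else 0  # fire only above the 0.7 threshold

x = np.array([0.5, 0.9, 0.2])
w = np.array([0.8, 1.1, -0.4])
print(neuron(x, w, b=0.1))  # -> 1, since sigmoid(1.41) ~ 0.80 > 0.7
```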
Biological and Artificial Neurons
Neural Network Model
 Neural network models are considered universal approximators, which means
that, by using the neural network methodology with a fine-tuned architecture,
we can model almost any type of problem.
 Hence, neural networks are a branch of study in their own right, and
special care is needed.
 In fact, deep learning is a branch of machine learning in which every
problem is modeled with artificial neural networks.
Artificial Neural Network Model
 A typical artificial neuron with n input dendrites can be represented by
the following formula.
 The w weights allow each of the n inputs of x to contribute a greater or
lesser amount to the sum of the input signals.
 The accumulated value is passed to the activation function, f(x), and the
resulting signal, y(x), is the output axon:
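The formula image is missing from the slide; in standard notation, the description above corresponds to

$$y(x) = f\!\left(\sum_{i=1}^{n} w_i \, x_i\right)$$

with a bias term b often added inside the parentheses.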
Parameters- Building neural networks
 Activation function:
Choosing an activation function plays a major role in aggregating
signals into the output signal to be propagated to the other neurons of the
network.
 Network architecture or topology:
This represents the number of layers required and the number of
neurons in each layer. More layers and neurons will create a highly non-linear
decision boundary, whereas if we reduce the architecture, the model will be
less flexible and more robust.
 Training optimization algorithm:
The selection of an optimization algorithm plays a critical role as well,
in order to converge quickly and accurately to the optimal solution.
Parameters- Building neural networks
 Applications of Neural Networks:
In recent years, neural networks (the basis of deep learning) have gained
huge attention for their applications in artificial intelligence: speech, text,
vision, and many other areas.
 Images and videos:
To identify an object in an image or to classify whether it is a dog or a cat
 Text processing (NLP):
Deep-learning-based chatbot and so on
 Speech:
Recognize speech
 Structured data processing:
Building highly powerful models to obtain a non-linear decision boundary
Forward propagation and backpropagation
Forward and Backward Propagation-Intro
 Forward propagation and backpropagation are illustrated with a deep neural
network with two hidden layers in the following example, in which both
hidden layers have three neurons each, in addition to the input and output
layers.
 The number of neurons in the input layer is based on the number of x
(independent) variables, whereas the number of neurons in the output layer
is decided by the number of classes the model needs to predict.
 For simplicity, calculations are shown for only one neuron in each layer;
the reader can attempt the same for the other neurons in each layer. Weights
and biases are initialized with random numbers, so that in both forward and
backward passes these can be updated in order to minimize the errors.
Forward and Backward Propagation-Intro
 During forward propagation, features are input to the network and fed
through the succeeding layers to produce the output activations.
 If we look at hidden layer 1, the activation obtained is the combination of
the bias weight and the weighted combination of input values; if the overall
value crosses the threshold, it triggers the next layer, else the signal
passed to the next layer is 0.
 Bias values are necessary to control the trigger points.
 In some cases, the weighted combination of signals is low; in those cases,
the bias compensates by adjusting the aggregated value so that it can
trigger the next level. A sketch of a forward pass follows.
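A minimal sketch of one forward pass through such a network in Python (layer sizes follow the example above; the random initialization is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=4)                                 # 4 input features
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)   # hidden layer 1
W2, b2 = rng.normal(size=(3, 3)), rng.normal(size=3)   # hidden layer 2
W3, b3 = rng.normal(size=(1, 3)), rng.normal(size=1)   # output layer

a1 = sigmoid(W1 @ x + b1)   # bias shifts the trigger point, as noted above
a2 = sigmoid(W2 @ a1 + b2)
y_hat = sigmoid(W3 @ a2 + b3)
print(y_hat)
```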
Forward and Backward Propagation-Intro
 In the last layer (also known as the output layer), outputs are calculated
in the same way, by taking the weighted combination of the weights and the
outputs obtained from hidden layer 2.
 Once we obtain the output from the model, a comparison is made with the
actual value, and we backpropagate the errors across the network backward
in order to correct the weights of the entire neural network.
Forward and Backward Propagation
 We take the derivative of the output value and multiply it by the error
component, which is obtained by differencing the actual value and the model
output:
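In standard notation (the slide's diagram is missing, so this is an assumption about the intended form), the output-layer error term is

$$\delta_{\text{out}} = (y - \hat{y}) \cdot f'(z_{\text{out}}), \qquad f'(z) = f(z)\,\big(1 - f(z)\big) \text{ for the sigmoid}$$

where $y$ is the actual value, $\hat{y}$ is the model output, and $z_{\text{out}}$ is the weighted input to the output neuron.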
Forward and Backward Propagation
 We then backpropagate the error through the second hidden layer as well.
 In the following diagram, errors are computed for the Hidden 4 neuron in
the second hidden layer.
Forward and Backward Propagation
 Once all the neurons in hidden layer 1 are updated, the weights between
the inputs and the hidden layer also need to be updated, as we cannot
update anything on the input variables themselves.
 We update the weights from the inputs and, at the same time, the neurons
in hidden layer 1, as the neurons in layer 1 utilize only the weights from
the inputs. The update rule is sketched below.
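A hedged sketch of the update in standard notation (the slide's diagrams are missing): each weight moves against the gradient of the error,

$$w_{\text{new}} = w_{\text{old}} + \eta \,\delta\, a_{\text{in}}$$

where $\eta$ is the learning rate, $\delta$ is the backpropagated error term at the receiving neuron, and $a_{\text{in}}$ is the activation feeding that weight.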
Forward and Backward Propagation
 We have not shown the next iteration, in which the neurons in the output
layer are updated with new errors and backpropagation starts again.
 In a similar way, all the weights get updated until the solution converges
or the maximum number of iterations is reached.
Optimization of neural networks
Various techniques have been used for optimizing the weights of neural
networks:
 Stochastic gradient descent (SGD)
 Momentum
 Nesterov accelerated gradient (NAG)
 Adaptive gradient (Adagrad)
 Adadelta
 RMSprop
 Adaptive moment estimation (Adam)
 Limited memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)
Optimization of neural networks
 Adam is a good default choice; we will be covering its working
methodology in this section. If you cannot afford full-batch updates, then
try out L-BFGS.
Stochastic gradient descent (SGD)
 Gradient descent is a way to minimize an objective function J(θ),
parameterized by a model's parameters θ ∈ R^d, by updating the parameters
in the opposite direction of the gradient of the objective function with
respect to the parameters.
 The learning rate determines the size of the steps taken to reach the
minimum.
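In symbols (standard notation), each update step is

$$\theta \leftarrow \theta - \eta \,\nabla_{\theta} J(\theta)$$

where $\eta$ is the learning rate. The variants below differ only in how many observations are used to estimate the gradient at each step.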
 Batch gradient descent (all training observations utilized in each
iteration)
 SGD (one observation per iteration)
 Mini-batch gradient descent (about 50 training observations per iteration)
These three variants are contrasted in the sketch below.
Gradient Descent
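A minimal sketch contrasting the three schemes on a linear least-squares model (data, model, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=500)
w, lr = np.zeros(3), 0.1

def grad(Xb, yb, w):  # gradient of mean squared error on a batch
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w -= lr * grad(X, y, w)                    # batch: all 500 observations
i = rng.integers(len(y))
w -= lr * grad(X[i:i + 1], y[i:i + 1], w)  # SGD: one observation
idx = rng.choice(len(y), size=50, replace=False)
w -= lr * grad(X[idx], y[idx], w)          # mini-batch: ~50 observations
print(w)
```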