Machine Learning
Neural Networks
Slides mostly adapted from Tom Mitchell, Han and Kamber

Artificial Neural Networks
• Computational models inspired by the human brain:
  • Algorithms that try to mimic the brain.
  • Massively parallel, distributed system, made up of simple processing units (neurons)
  • Synaptic connection strengths among neurons are used to store the acquired knowledge.
  • Knowledge is acquired by the network from its environment through a learning process

History
• Late 1800s: Neural networks appear as an analogy to biological systems
• 1960s and 70s: Simple neural networks appear
  • Fall out of favor because the perceptron is not effective by itself, and there were no good algorithms for multilayer nets
• 1986: Backpropagation algorithm appears
  • Neural networks have a resurgence in popularity
  • More computationally expensive

Applications of ANNs
• ANNs have been widely used in various domains for:
  • Pattern recognition
  • Function approximation
  • Associative memory

Properties
• Inputs are flexible
  • Any real values
  • Highly correlated or independent
• Target function may be discrete-valued, real-valued, or vectors of discrete or real values
  • Outputs are real numbers between 0 and 1
• Resistant to errors in the training data
• Long training time
• Fast evaluation
• The function produced can be difficult for humans to interpret

When to consider neural networks
• Input is high-dimensional discrete or real-valued (e.g., raw sensor input)
• Output is discrete or real-valued
• Output is a vector of values
• Possibly noisy data
• Form of target function is unknown
• Human readability of the result is not important
Examples:
• Speech phoneme recognition
• Image classification
• Financial prediction

September 18, 2024
Data Mining: Concepts and
Techniques 7
A Neuron (= a perceptron)
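• The n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping

[Figure: a single neuron. The input vector x = (x0, x1, ..., xn) is combined with the weight vector w = (w0, w1, ..., wn) into a weighted sum, the threshold t is subtracted, and the activation function f produces the output y.]

For example:

$$y = \operatorname{sign}\left(\sum_{i=0}^{n} w_i x_i - t\right)$$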

Perceptron
• Basic unit in a neural network
• Linear separator
• Parts
  • N inputs, x1 ... xn
  • Weights for each input, w1 ... wn
  • A bias input x0 (constant) and associated weight w0
  • Weighted sum of inputs, y = w0x0 + w1x1 + ... + wnxn
  • A threshold function or activation function, i.e., 1 if y > t, -1 if y <= t
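
The parts listed above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `perceptron_output` and the example weights are assumptions chosen for the demonstration.

```python
def perceptron_output(weights, inputs, threshold=0.0):
    """Return 1 if the weighted sum exceeds the threshold t, otherwise -1.

    weights[0] is the bias weight w0 and inputs[0] is the constant bias input x0.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else -1


# Hypothetical example: bias input x0 = 1 with w0 = -0.5, plus two real inputs.
example_weights = [-0.5, 1.0, 1.0]
print(perceptron_output(example_weights, [1.0, 0.0, 0.0]))  # weighted sum is -0.5
print(perceptron_output(example_weights, [1.0, 1.0, 0.0]))  # weighted sum is 0.5
```

Folding the bias into the weight vector through a constant input x0 keeps the unit a single dot product followed by a comparison.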

Artificial Neural Networks (ANN)
• Model is an assembly of inter-connected nodes and weighted links
• Output node sums up each of its input values according to the weights of its links
• Compare output node against some threshold t

[Figure: black-box model with input nodes X1, X2, X3 connected through weights w1, w2, w3 to an output node Y with threshold t]

Perceptron Model:

$$Y = I\left(\sum_i w_i X_i - t\right) \quad \text{or} \quad Y = \operatorname{sign}\left(\sum_i w_i X_i - t\right)$$

Types of connectivity
• Feedforward networks
  • These compute a series of transformations
  • Typically, the first layer is the input and the last layer is the output.
• Recurrent networks
  • These have directed cycles in their connection graph. They can have complicated dynamics.
  • More biologically realistic.

[Figure: layered network with input units, hidden units, and output units]

Different Network Topologies
• Single layer feed-forward networks
  • Input layer projecting into the output layer
  [Figure: single layer network with an input layer and an output layer]

• Multi-layer feed-forward networks
  • One or more hidden layers. Each layer receives input only from previous layers.
  [Figure: 2-layer (1-hidden-layer) fully connected network with input, hidden, and output layers]

• Multi-layer feed-forward networks
  [Figure: network with an input layer, several hidden layers, and an output layer]

• Recurrent networks
  • A network with feedback, where some of its inputs are connected to some of its outputs (discrete time).
  [Figure: recurrent network with an input layer and an output layer]

Algorithm for learning ANN
• Initialize the weights (w0, w1, ..., wk)
• Adjust the weights in such a way that the output of ANN is consistent with class labels of training examples
  • Error function:

    $$E = \sum_i \left[ Y_i - f(w_i, X_i) \right]^2$$

  • Find the weights wi that minimize the above error function
    • e.g., gradient descent, backpropagation algorithm
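
As a rough sketch of minimizing this error function, the code below runs batch gradient descent on a single linear unit. Several things here are assumptions made for illustration and do not come from the slides: f is taken to be a plain weighted sum, the weights start at zero, and the toy data, learning rate, and epoch count are hypothetical.

```python
def predict(weights, x):
    """Linear unit f(w, X): weighted sum of the inputs (x[0] is a constant bias input)."""
    return sum(w * xi for w, xi in zip(weights, x))


def squared_error(weights, data):
    """E = sum over training examples of (Y_i - f(w, X_i))^2."""
    return sum((y - predict(weights, x)) ** 2 for x, y in data)


def gradient_descent(data, learning_rate=0.02, epochs=2000):
    """Minimize E by repeatedly stepping against its gradient."""
    n = len(data[0][0])
    weights = [0.0] * n  # assumed initialization
    for _ in range(epochs):
        gradient = [0.0] * n
        for x, y in data:
            error = y - predict(weights, x)
            for j in range(n):
                gradient[j] += -2.0 * error * x[j]  # dE/dw_j for this example
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights


# Hypothetical toy data generated from y = 1 + 2 * x1; the first entry is the bias input.
toy_data = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0), ([1.0, 3.0], 7.0)]
learned = gradient_descent(toy_data)
print(learned, squared_error(learned, toy_data))
```

With a linear unit the error surface is convex, so gradient descent with a small enough step approaches the unique minimum; with sigmoid or multi-layer units the same loop applies but the surface may have local minima.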

Optimizing concave/convex function
• Maximizing a concave function f is equivalent to minimizing the convex function -f
• Gradient ascent (concave) / Gradient descent (convex)
Gradient ascent rule
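
In its standard form, sketched here with an assumed step size η > 0 (the symbol is not defined on the slide), the rule moves the weights a small step along the gradient to climb a concave objective f, and against the gradient to descend a convex error E:

$$w \leftarrow w + \eta \, \nabla_w f(w) \qquad \text{(gradient ascent)}$$

$$w \leftarrow w - \eta \, \nabla_w E(w) \qquad \text{(gradient descent)}$$

The two updates coincide when E = -f.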

Decision surface of a perceptron
• Decision surface is a hyperplane
  • Can capture linearly separable classes
• Non-linearly separable classes
  • Use a network of perceptrons
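
XOR is a common illustration of the last point: no single hyperplane separates its classes, but a small network of threshold units can. The sketch below uses hand-picked weights (an assumption for illustration, not learned values) to build XOR from an OR unit and an AND unit.

```python
def step(z):
    """Threshold activation: 1 if z is positive, else 0."""
    return 1 if z > 0 else 0


def unit(weights, bias, inputs):
    """A single perceptron: weighted sum plus bias, passed through the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    """Two hidden perceptrons feeding one output perceptron."""
    h_or = unit([1.0, 1.0], -0.5, [x1, x2])    # fires if at least one input is 1
    h_and = unit([1.0, 1.0], -1.5, [x1, x2])   # fires only if both inputs are 1
    return unit([1.0, -1.0], -0.5, [h_or, h_and])  # OR but not AND


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Each hidden unit draws one hyperplane, and the output unit combines the two half-spaces into a region that a single perceptron cannot represent.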

Multi-layer Networks
• Linear units inappropriate
  • A stack of linear layers is no more expressive than a single layer
  • So introduce non-linearity
• Threshold not differentiable
  • So use sigmoid function
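
A minimal sketch of the sigmoid and its derivative follows; the function names are illustrative. The derivative is what makes gradient-based training possible, since the hard threshold has no useful gradient.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sigmoid_derivative(z):
    """d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


print(sigmoid(0.0), sigmoid_derivative(0.0))
```

The sigmoid is a smooth approximation of the threshold: it saturates toward 0 and 1 for large negative and positive inputs and is steepest at z = 0.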

Backpropagation
• Iteratively process a set of training tuples and compare the network's prediction with the actual known target value
• For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value
• Modifications are made in the "backwards" direction: from the output layer, through each hidden layer down to the first hidden layer, hence "backpropagation"
• Steps
  • Initialize weights (to small random numbers) and biases in the network
  • Propagate the inputs forward (by applying activation function)
  • Backpropagate the error (by updating weights and biases)
  • Terminating condition (when error is very small, etc.)
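
The four steps can be sketched for a network with one hidden layer of sigmoid units and a single sigmoid output. This is a simplified illustration under stated assumptions, not the slides' exact procedure: the helper names, the toy OR data, the learning rate, the seed, and the stopping tolerance are all hypothetical choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, rng):
    """Step 1: small random weights; the last entry of each unit is its bias."""
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, x):
    """Step 2: propagate the inputs forward through the sigmoid units."""
    hidden, output = network
    h = [sigmoid(sum(w * xi for w, xi in zip(u[:-1], x)) + u[-1]) for u in hidden]
    y = sigmoid(sum(w * hi for w, hi in zip(output[:-1], h)) + output[-1])
    return h, y


def train_step(network, x, target, lr):
    """Step 3: backpropagate the error and update weights and biases in place."""
    hidden, output = network
    h, y = forward(network, x)
    delta_out = (target - y) * y * (1.0 - y)
    # Hidden error terms are computed from the output weights before they change.
    delta_hidden = [delta_out * output[j] * h[j] * (1.0 - h[j]) for j in range(len(hidden))]
    for j in range(len(hidden)):
        output[j] += lr * delta_out * h[j]
    output[-1] += lr * delta_out
    for j, u in enumerate(hidden):
        for i in range(len(x)):
            u[i] += lr * delta_hidden[j] * x[i]
        u[-1] += lr * delta_hidden[j]


def total_error(network, data):
    return sum((t - forward(network, x)[1]) ** 2 for x, t in data)


def train(network, data, lr=0.5, max_epochs=2000, tolerance=0.01):
    """Step 4: stop when the error is small or the epoch budget is used up."""
    for _ in range(max_epochs):
        for x, t in data:
            train_step(network, x, t, lr)
        if total_error(network, data) < tolerance:
            break
    return network


# Hypothetical toy data: the logical OR of two inputs.
or_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
net = init_network(2, 2, random.Random(0))
error_before = total_error(net, or_data)
train(net, or_data)
error_after = total_error(net, or_data)
print(error_before, error_after)
```

The weights are updated after every tuple, which matches the per-tuple description above; accumulating the updates over a whole epoch before applying them is a common alternative.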

How a Multi-Layer Neural Network Works
• The inputs to the network correspond to the attributes measured for each training tuple
• Inputs are fed simultaneously into the units making up the input layer
• They are then weighted and fed simultaneously to a hidden layer
• The number of hidden layers is arbitrary, although usually only one is used
• The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network's prediction
• The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer
• From a statistical point of view, networks perform nonlinear regression: given enough hidden units and enough training samples, they can closely approximate any function
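
The layer-by-layer flow described above might look like the following for an arbitrary number of layers. The data layout (a list of layers, each a list of `(weights, bias)` pairs), the sigmoid activation, and the example weights are assumptions made for this sketch.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feed_forward(layers, inputs):
    """Pass the inputs through each layer in turn and return the last layer's outputs.

    layers: list of layers; each layer is a list of (weights, bias) pairs, one per unit.
    """
    activations = list(inputs)
    for layer in layers:
        # Every unit in a layer sees all outputs of the previous layer at once.
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations


# Hypothetical network: 3 inputs, 2 hidden units, 1 output unit.
hidden_layer = [([0.2, -0.4, 0.1], 0.0), ([-0.3, 0.5, 0.2], 0.1)]
output_layer = [([0.7, -0.6], 0.05)]
prediction = feed_forward([hidden_layer, output_layer], [1.0, 0.0, 1.0])
print(prediction)
```

Because values only move from one layer to the next, a single loop over the layers is enough; no unit ever needs an output from a later layer.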

Defining a Network Topology
• First decide the network topology: number of units in the input layer, number of hidden layers (if > 1), number of units in each hidden layer, and number of units in the output layer
• Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0]
• For discrete-valued attributes, use one input unit per domain value, each initialized to 0
• For classification with more than two classes, one output unit per class is used
• If a trained network's accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights
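
The two input-preparation steps can be sketched as follows. Min-max scaling is assumed here as the normalization method (the slides only state the target range), and the function names and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a numeric attribute to the range [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant attribute carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, domain):
    """One input unit per domain value: all start at 0, the matching unit is set to 1."""
    units = [0] * len(domain)
    units[domain.index(value)] = 1
    return units


print(min_max_normalize([10, 20, 30]))
print(one_hot("green", ["red", "green", "blue"]))
```

Scaling keeps attributes with large numeric ranges from dominating the weighted sums, which tends to speed up training.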

Backpropagation and Interpretability
• Efficiency of backpropagation: each epoch (one iteration through the training set) takes O(|D| * w), with |D| tuples and w weights, but the number of epochs can be exponential in n, the number of inputs, in the worst case
• Rule extraction from networks: network pruning
  • Simplify the network structure by removing weighted links that have the least effect on the trained network
  • Then perform link, unit, or activation value clustering
  • The sets of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers
• Sensitivity analysis: assess the impact that a given input variable has on a network output. The knowledge gained from this analysis can be represented in rules
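
One simple way to approximate sensitivity analysis is to nudge a single input and measure how much the output moves. The finite-difference approach, the function names, and the stand-in `toy_network` below are assumptions for illustration; the slides do not prescribe a method.

```python
import math


def sensitivity(model, inputs, index, delta=1e-4):
    """Approximate change in the model output per unit change in inputs[index]."""
    perturbed = list(inputs)
    perturbed[index] += delta
    return (model(perturbed) - model(inputs)) / delta


def toy_network(x):
    """Hypothetical stand-in for a trained network with one sigmoid output."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 0.1 * x[1])))


point = [0.5, 0.5]
s0 = sensitivity(toy_network, point, 0)
s1 = sensitivity(toy_network, point, 1)
print(s0, s1)
```

A large positive value for one input and a small negative value for another could then be summarized as a rule such as "the output rises with the first input and is nearly unaffected by the second".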

Neural Network as a Classifier
• Weakness
  • Long training time
  • Require a number of parameters typically best determined empirically, e.g., the network topology or "structure"
  • Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of "hidden units" in the network
• Strength
  • High tolerance to noisy data
  • Ability to classify untrained patterns
  • Well-suited for continuous-valued inputs and outputs
  • Successful on a wide array of real-world data
  • Algorithms are inherently parallel
  • Techniques have recently been developed for the extraction of rules from trained neural networks

Artificial Neural Networks (ANN)
X1  X2  X3  Y
1   0   0   0
1   0   1   1
1   1   0   1
1   1   1   1
0   0   1   0
0   1   0   0
0   1   1   1
0   0   0   0

[Figure: black-box model with input nodes X1, X2, X3, each connected with weight 0.3 to an output node Y with threshold t = 0.4]

$$Y = I(0.3 X_1 + 0.3 X_2 + 0.3 X_3 - 0.4 > 0), \quad \text{where } I(z) = \begin{cases} 1 & \text{if } z \text{ is true} \\ 0 & \text{otherwise} \end{cases}$$
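
The model on this slide can be checked against its table with a short script. The weights 0.3 and threshold 0.4 come from the slide; the function and variable names are illustrative.

```python
def black_box(x1, x2, x3):
    """Y = I(0.3*X1 + 0.3*X2 + 0.3*X3 - 0.4 > 0)."""
    return 1 if 0.3 * x1 + 0.3 * x2 + 0.3 * x3 - 0.4 > 0 else 0


# Rows of the table above: (X1, X2, X3, Y).
table = [
    (1, 0, 0, 0), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1),
    (0, 0, 1, 0), (0, 1, 0, 0), (0, 1, 1, 1), (0, 0, 0, 0),
]

for x1, x2, x3, y in table:
    print(x1, x2, x3, "expected", y, "model", black_box(x1, x2, x3))
```

The unit outputs 1 exactly when at least two of the three inputs are 1, since one active input gives 0.3 (below 0.4) and two give 0.6 (above it).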

A Multi-Layer Feed-Forward Neural Network
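[Figure: multi-layer feed-forward network. The input vector X enters the input layer, passes through weighted links wij to the hidden layer, and the output layer emits the output vector.]

$$w_j^{(k+1)} = w_j^{(k)} + \lambda \left( y_i - \hat{y}_i^{(k)} \right) x_{ij}$$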
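
As a hedged illustration of this kind of update, where each weight is adjusted by a learning rate times the prediction error times the corresponding input, the sketch below applies it to a single threshold unit. The learning rate of 0.5, the zero initial weights, the treatment of the bias as a weight on a constant input of 1, and the toy data (1 when at least two of three inputs are 1) are all assumptions.

```python
def predict(weights, x):
    """Threshold unit: 1 if bias weight plus weighted sum is positive, else 0."""
    s = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return 1 if s > 0 else 0


def train(data, learning_rate=0.5, max_epochs=1000):
    """Repeat w_j <- w_j + rate * (y - y_hat) * x_j until an epoch has no mistakes."""
    weights = [0.0] * (len(data[0][0]) + 1)  # weights[0] is the bias weight
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            error = y - predict(weights, x)
            if error != 0:
                mistakes += 1
                weights[0] += learning_rate * error  # bias input is the constant 1
                for j, xj in enumerate(x):
                    weights[j + 1] += learning_rate * error * xj
        if mistakes == 0:
            break
    return weights


# Hypothetical toy data: the target is 1 when at least two of the three inputs are 1.
majority_data = [
    ((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),
    ((0, 0, 1), 0), ((0, 1, 0), 0), ((0, 1, 1), 1), ((0, 0, 0), 0),
]
learned_weights = train(majority_data)
print(learned_weights)
```

Because this toy data is linearly separable, the loop stops after a finite number of corrections; on data that is not linearly separable it would run until `max_epochs`.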

General Structure of ANN
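[Figure: neuron i receives inputs I1, I2, I3 through weights wi1, wi2, wi3, forms the sum Si, and applies the activation function g(Si) with threshold t to produce the output Oi. A second panel shows a network with inputs x1 to x5 in the input layer, a hidden layer, and an output layer producing y.]

Training ANN means learning the weights of the neurons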
