RNNs
Under the hood
On the Surface
Elvis Saravia
Icebreaker
Can we predict the future based on our current decisions?
“Which research direction should I take?”
Outline
● Part 1: Review of Neural Network Essentials
● Part 2: Sequential Modeling
● Part 3: Introduction to Recurrent Neural Networks
● Part 4: RNNs with TensorFlow
Part 1
The Neural Network
Perceptron Forward Pass
A single perceptron takes inputs x_0 … x_m, multiplies each by a weight w_0 … w_m, sums the weighted inputs together with a bias b, and passes the sum through a non-linearity f (e.g., tanh, ReLU, sigmoid) to produce the output y.

Computing the output:

y = f( Σ_{i=0}^{m} x_i w_i + b )

Vector form:

y = f(Wx + b),   where   x = [x_0, x_1, …, x_m]   and   W = [w_0, w_1, …, w_m]
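To make the forward pass concrete, here is a minimal NumPy sketch of a single perceptron; the input, weight, and bias values are made up purely for illustration.

import numpy as np

def perceptron(x, w, b, f=np.tanh):
    """Single perceptron: y = f(w · x + b)."""
    return f(np.dot(w, x) + b)

# Hypothetical values, for illustration only.
x = np.array([0.5, -1.0, 2.0])   # inputs x_0 … x_2
w = np.array([0.1, 0.4, -0.3])   # weights w_0 … w_2
b = 0.2                          # bias
y = perceptron(x, w, b)          # scalar output
print(y)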
Multi-Layer Perceptron (MLP)
[Diagram: an input layer (x_0 … x_3), a single hidden layer of units (weighted sum ∑ followed by a non-linearity f), and an output layer.]
Deep Neural Networks (DNN)
[Diagram: the same network with several hidden layers stacked between the input layer (x_0 … x_3) and the output layer.]
Neural Network (Summary)
● Neural networks learn features as the input is fed through the hidden layers, with the weights updated by backpropagation using stochastic gradient descent (SGD).
● Data and activations flow in one direction through the hidden layers, and neurons within a layer never interact with each other.

[Diagram: deep feed-forward network with input layer x_0 … x_3, stacked hidden layers, and an output layer.]
Drawbacks of NNs
● They lack sequence modeling capabilities: they don't keep track of past information (i.e., they have no memory), which is essential for modeling data with a sequential nature.
● The input is of fixed size (more on this later).

[Diagram: feed-forward network with a fixed-size input x_0 … x_n.]
Part 2
Sequential Modeling
Sequences
“I took the bus this morning because it was very cold.”
Examples of sequences: stock prices, speech waveforms, sentences.

● Current values depend on previous values (e.g., melody notes, language rules).
● Order needs to be maintained to preserve meaning.
● Sequences usually vary in length.
Modeling Sequences
How to represent a sequence?
Bag of Words (BOW): represent the sentence as a fixed-size vector over a vocabulary, marking which words occur.

"I love the coldness"  →  [ 1 1 0 1 0 0 1 ]

The fixed-size vector can then be fed into a standard feed-forward network.

What's the problem with the BOW representation?
Problem with BOW
Bag of words does not preserve order, so the meaning carried by word order is lost.

“The food was good, not bad at all”  vs.  “The food was bad, not good at all”

Both sentences map to the same vector: [ 1 1 0 1 1 0 1 0 1 0 0 1 1 ]
How do we differentiate the meanings of the two sentences? (See the sketch below.)
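A minimal sketch of why BOW cannot tell these apart: both sentences produce identical word counts (the simple tokenization here is an assumption for illustration).

from collections import Counter

def bow(sentence):
    """Bag-of-words counts: word order is ignored entirely."""
    return Counter(sentence.lower().replace(",", "").split())

a = "The food was good, not bad at all"
b = "The food was bad, not good at all"
print(bow(a) == bow(b))   # True: the two sentences are indistinguishable under BOW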
One-Hot Encoding
● Preserve order by maintaining each word's position within the feature vector.

"On Monday it was raining"  →
[ 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 ]

We preserved order, but what is the problem here?
Problem with One-Hot Encoding
● One-hot encoding cannot deal with variations of the same sequence.
"On Monday it was raining"  →  [ 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 ]
"It was raining on Monday"  →  [ 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 ]
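A sketch of the idea with a made-up toy vocabulary (the slide's actual word-to-index mapping is not shown, so the exact vectors below are only illustrative): each word becomes a one-hot block, and the blocks are concatenated in sentence order.

import numpy as np

# Hypothetical vocabulary and index assignment.
vocab = ["it", "monday", "on", "raining", "was"]
index = {w: i for i, w in enumerate(vocab)}

def encode(sentence):
    """Concatenate one one-hot vector per word, preserving word order."""
    vecs = []
    for word in sentence.lower().split():
        v = np.zeros(len(vocab), dtype=int)
        v[index[word]] = 1
        vecs.append(v)
    return np.concatenate(vecs)

print(encode("On Monday it was raining"))
print(encode("It was raining on Monday"))   # same words, different vector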
Solution
Relearn the rules of language at each point in the sentence in order to preserve meaning?

"On Monday it was raining"  /  "It was raining on Monday"

But there is no idea of state or of what comes next: what was learned at the beginning of the sentence will have to be relearned at the end of the sentence.
Markov Models
States and transitions can be modeled, so no matter where we are in the sentence we have an idea of what comes next, based on the transition probabilities (see the sketch below).

Problem: each state depends only on the previous state, so we can't model long-term dependencies!
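A tiny first-order Markov (bigram) sketch; the transition probabilities below are invented purely for illustration. The key property is that the next word depends only on the current word.

import random

# Hypothetical bigram transition probabilities.
transitions = {
    "on":     {"monday": 0.9, "it": 0.1},
    "monday": {"it": 0.8, "was": 0.2},
    "it":     {"was": 1.0},
    "was":    {"raining": 0.7, "cold": 0.3},
}

def next_word(current):
    """Sample the next word given only the current state (first-order Markov)."""
    words = list(transitions[current])
    probs = [transitions[current][w] for w in words]
    return random.choices(words, weights=probs, k=1)[0]

print(next_word("was"))   # e.g. "raining"; nothing before "was" is remembered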
Long-term dependencies
We need information from the far past and future to accurately model sequences.
"In Italy, I had a great time and I learnt some of the _____ language."

It's time for Recurrent Neural Networks (RNNs)!
Part 3
Recurrent Neural Networks (RNNs)
Recurrent Neural Networks (RNNs)
● RNNs model sequential information by assuming long-term dependencies between the elements of a sequence.
● RNNs maintain word order and share parameters across the sequence (i.e., there is no need to relearn the rules at each position).
● RNNs are "recurrent" because they perform the same task for every element of a sequence, with the output depending on the previous computations.
● RNNs keep a memory of what has been computed so far, so in principle they can capture long-term dependencies.
Applications of RNN
● Analyzing time-series data to predict the stock market
● Speech recognition (e.g., emotion recognition from acoustic features)
● Autonomous driving
● Natural language processing (e.g., machine translation, question answering)
Some examples
Google Magenta Project (Melody composer) – (https://magenta.tensorflow.org)
Sentence Generator - (http://goo.gl/onkPNd)
Image Captioning – (http://goo.gl/Nwx7Kh)
RNNs Main Components
- Recurrent neurons
- Unrolling recurrent neurons
- Layer of recurrent neurons
- Memory cell containing hidden state
Recurrent Neurons
A simple recurrent neuron:
- receives an input x
- produces an output y
- sends its output back to itself

Components:
- x   : input
- y   : output (usually a vector of probabilities)
- ∑   : weighted sum of the inputs plus a bias, sum(W · x) + b
- f   : activation function (e.g., ReLU, tanh, sigmoid)
- Wx  : weights for the inputs
- Wy  : weights for the outputs of the previous time step
- y_t : a function of the current input and the previous time step's output
Unrolling/Unfolding recurrent neuron
[Diagram: the recurrent neuron unrolled (unfolded) through time. Each copy corresponds to one time step (frame) t: it receives the input x_t and the previous output y_{t-1}, produces y_t, and all copies share the same weights Wx and Wy.]
RNNs remember previous state
At t = 0:
- x_0 : vector representing the first word
- W_x, W_s : weight matrices
- s_0 : output (state) at t = 0

s_0 = tanh(W_x x_0 + W_s s_{-1})

The state lets the network remember things from the past.
RNNs remember previous state
At t = 1:
- x_1 : vector representing the second word
- W_x, W_s : the same weight matrices, shared across the sequence
- s_1 : output (state) at t = 1

s_1 = tanh(W_x x_1 + W_s s_0),   where   s_0 = tanh(W_x x_0 + W_s s_{-1})

Because s_1 depends on s_0, the network can remember things from t = 0.
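A minimal NumPy sketch of this recurrence, s_t = tanh(W_x x_t + W_s s_{t-1}); the sizes, random weights, and word vectors are assumptions made only to show the state being carried forward.

import numpy as np

n_inputs, n_state = 3, 4
rng = np.random.default_rng(0)

Wx = rng.normal(size=(n_state, n_inputs))   # input-to-state weights (shared across time)
Ws = rng.normal(size=(n_state, n_state))    # state-to-state weights (shared across time)

def step(x_t, s_prev):
    """One time step: s_t = tanh(Wx x_t + Ws s_{t-1})."""
    return np.tanh(Wx @ x_t + Ws @ s_prev)

s = np.zeros(n_state)                                          # s_{-1}: initial state
for x_t in [np.array([1., 0., 0.]), np.array([0., 1., 0.])]:   # toy word vectors
    s = step(x_t, s)                                           # state carries past information forward
print(s)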
Overview
Unrolling the neuron through time, each time step computes

y_t = φ(W_x · x_t + W_y · y_{t-1} + b)

[Diagram: the unrolled neuron applied to a mini-batch of 3-dimensional input vectors, e.g. [0,1,2], [3,4,5], [6,7,8], [9,0,1] and [9,8,7], [0,0,0], [6,5,4], [3,2,1].]
Code Example
Where all the magic happens:

y_t = φ(W_x · x_t + W_y · y_{t-1} + b)

(The code shown on the original slide is an image and is not reproduced here; a sketch of the same computation follows.)
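A TensorFlow sketch of that computation. The original slide's code is not available, so this is a TensorFlow 2-style reconstruction; following the mini-batch example in the Hands-On Machine Learning book cited in the references, the two sets of vectors are assumed to be batches for time steps t = 0 and t = 1.

import tensorflow as tf

n_inputs, n_neurons = 3, 5

# Mini-batches of 4 instances at t = 0 and t = 1 (values from the slide).
X0 = tf.constant([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.], [9., 0., 1.]])
X1 = tf.constant([[9., 8., 7.], [0., 0., 0.], [6., 5., 4.], [3., 2., 1.]])

Wx = tf.Variable(tf.random.normal([n_inputs, n_neurons]))
Wy = tf.Variable(tf.random.normal([n_neurons, n_neurons]))
b  = tf.Variable(tf.zeros([1, n_neurons]))

# y_t = tanh(X_t · Wx + Y_{t-1} · Wy + b); there is no previous output at t = 0.
Y0 = tf.tanh(tf.matmul(X0, Wx) + b)
Y1 = tf.tanh(tf.matmul(X1, Wx) + tf.matmul(Y0, Wy) + b)
print(Y1.numpy())   # shape (4, 5): one output vector per instance in the batch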
Layer of Recurrent Neurons
[Diagram: a whole layer of recurrent neurons instead of a single one. The input x at each step is a vector, and the output y is now a vector as well (one component per neuron in the layer), rather than a scalar.]
Unrolling Layer
[Diagram: the layer unrolled through time over inputs x_0, x_1, x_2 (example 3-dimensional vectors such as [1,2,3], [4,5,6], [7,8,9], [0,0,0]), producing outputs y_0, y_1, y_2, with the weights W_x and W_y shared across time steps.]

Vector form, for a whole mini-batch X_t:

Y_t = φ(X_t · W_x + Y_{t-1} · W_y + b)
    = φ([X_t  Y_{t-1}] · W + b),   where   W = [W_x ; W_y]  (W_x stacked on top of W_y)
Variations of RNNs: Input / Output
● Sequence to sequence: an input at every time step and an output at every time step (e.g., stock prices and other time series).
● Sequence to vector: a full input sequence, with a single output at the end (e.g., sentiment analysis with a score in [-1, +1], or other classification tasks where the output is a vector of probabilities over classes, i.e. a softmax).
● Vector to sequence: a single input vector, with an output produced at every time step (e.g., image captioning).
● Encoder-Decoder: a sequence-to-vector encoder followed by a vector-to-sequence decoder (e.g., translation).

(A small sketch of the first two patterns follows.)
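As a rough illustration (not code from the slides), a Keras recurrent layer can be switched between sequence-to-sequence and sequence-to-vector behavior with the return_sequences flag; the layer and batch sizes below are arbitrary.

import tensorflow as tf

x = tf.random.normal([2, 4, 3])   # batch of 2 sequences, 4 time steps, 3 features each

# Sequence to sequence: one output per time step.
seq2seq = tf.keras.layers.SimpleRNN(5, return_sequences=True)
print(seq2seq(x).shape)           # (2, 4, 5)

# Sequence to vector: only the output at the final time step is returned.
seq2vec = tf.keras.layers.SimpleRNN(5, return_sequences=False)
print(seq2vec(x).shape)           # (2, 5)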
Training
[Diagram: an unrolled RNN over inputs x_0 … x_4 with outputs y_0 … y_4, the same parameters (W, b) reused at every time step, and a cost C(y_2, y_3, y_4) computed on the later outputs.]

- Forward pass
- Compute the loss via the cost function C
- Minimize the loss by backpropagation through time (BPTT)

The total gradient sums the per-step gradients:

∂C/∂W = Σ_t ∂C_t/∂W

For one single time step t (e.g., t = 4), we count the contributions of W in previous time steps to the error at time step t, using the chain rule:

∂C_4/∂W = Σ_{k=0}^{4} (∂C_4/∂y_4) (∂y_4/∂s_4) (∂s_4/∂s_k) (∂s_k/∂W)
Drawbacks
Main problem: vanishing gradients (the gradients get too small).

Intuition: as sequences get longer, the gradients tend to shrink during the backpropagation process.

∂C_t/∂W = Σ_{k=0}^{t} (∂C_t/∂y_t) (∂y_t/∂s_t) (∂s_t/∂s_k) (∂s_k/∂W)

The factor ∂s_t/∂s_k expands into a long product over intermediate time steps:

∂s_t/∂s_k = (∂s_t/∂s_{t-1}) (∂s_{t-1}/∂s_{t-2}) ⋯ (∂s_2/∂s_1) (∂s_1/∂s_0)

We are just multiplying a lot of small numbers together (see the sketch below).
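A back-of-the-envelope sketch of that intuition; the per-step derivative of 0.25 is an arbitrary assumption standing in for a small local derivative (e.g., a saturated tanh unit).

grad = 1.0
for step in range(1, 51):      # propagate the error back through 50 time steps
    grad *= 0.25               # assumed local derivative ∂s_{k+1}/∂s_k at each step
    if step in (5, 10, 25, 50):
        print(f"after {step} steps: {grad:.3e}")   # the gradient shrinks toward zero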
Solutions
Long Short-Term Memory (LSTM) networks deal with the vanishing gradient problem and are therefore more reliable for modeling long-term dependencies, especially for very long sequences.
RNN Extensions
Extended reading (a short sketch of the first two follows below):
○ Bidirectional RNNs – passing states in both directions
○ Deep (Bidirectional) RNNs – stacking RNN layers
○ LSTM networks – an adaptation of RNNs
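A possible Keras sketch of a deep bidirectional RNN, offered only as an illustration of the two ideas above; the layer sizes and input shape are arbitrary assumptions.

import tensorflow as tf

# Two stacked recurrent layers, each reading the sequence in both directions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 8)),                      # variable-length sequences of 8-dim vectors
    tf.keras.layers.Bidirectional(
        tf.keras.layers.SimpleRNN(16, return_sequences=True)),   # states passed in both directions
    tf.keras.layers.Bidirectional(tf.keras.layers.SimpleRNN(16)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()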
In general
● RNNs are great for analyzing sequences of arbitrary length.
● RNNs are considered "anticipatory" models.
● RNNs are also considered creative models: they can, for example, predict the set of musical notes that could come next in a melody and select an appropriate one to play.
Part 4
RNNs in TensorFlow
Demo
● Building RNNs in TensorFlow (a small sketch follows below)
● Training RNNs in TensorFlow
● Image classification
● Text classification
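The demo notebooks themselves are not part of this deck; as a stand-in, here is a minimal sketch of what building and training a simple RNN text classifier in TensorFlow/Keras could look like. The vocabulary size, sequence length, model sizes, and random data are all placeholders.

import numpy as np
import tensorflow as tf

vocab_size, seq_len = 1000, 20                       # placeholder sizes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 32),       # token ids -> dense word vectors
    tf.keras.layers.SimpleRNN(32),                   # sequence-to-vector RNN
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random stand-in data; replace with real tokenized text and labels.
X = np.random.randint(0, vocab_size, size=(256, seq_len))
y = np.random.randint(0, 2, size=(256,))
model.fit(X, y, epochs=2, batch_size=32)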
References
● Introduction to RNNs – http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
● NTHU Machine Learning Course – https://goo.gl/B4EqMi
● Hands-On Machine Learning with Scikit-Learn and TensorFlow (book)
