RNNs
Under the hood
On the Surface
Elvis Saravia
Icebreaker
Can we predict the future based on our current decisions?
“Which research direction should I take?”
Outline
● Part 1: Review of Neural Network Essentials
● Part 2: Sequential Modeling
● Part 3: Introduction to Recurrent Neural Networks
● Part 4: RNNs with TensorFlow
Part 1
The Neural Network
Perceptron Forward Pass
A single perceptron takes inputs x_0 … x_m, multiplies each by a weight w_0 … w_m, sums the weighted inputs together with a bias b, and passes the sum through a non-linearity f (e.g., tanh, ReLU, sigmoid) to produce the output y.

Computing the output:

y = f( Σ_{i=0}^{m} x_i w_i + b )

Vector form:

y = f(Wx + b),   where   x = [x_0, x_1, …, x_m]   and   W = [w_0, w_1, …, w_m]
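To make the forward pass concrete, here is a minimal NumPy sketch of a single perceptron; the input, weight, and bias values are made up purely for illustration.

import numpy as np

def perceptron(x, w, b, f=np.tanh):
    """Single perceptron: y = f(w · x + b)."""
    return f(np.dot(w, x) + b)

# Hypothetical values, for illustration only.
x = np.array([0.5, -1.0, 2.0])   # inputs x_0 … x_2
w = np.array([0.1, 0.4, -0.3])   # weights w_0 … w_2
b = 0.2                          # bias
y = perceptron(x, w, b)          # scalar output
print(y)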
Multi-Layer Perceptron (MLP)
[Diagram: an input layer (x_0 … x_3), a single hidden layer of units (weighted sum ∑ followed by a non-linearity f), and an output layer.]
Deep Neural Networks (DNN)
[Diagram: the same network with several hidden layers stacked between the input layer (x_0 … x_3) and the output layer.]
Neural Network (Summary)
● Neural networks learn features as the input is fed through the hidden layers, with the weights updated by backpropagation using stochastic gradient descent (SGD).
● Data and activations flow in one direction through the hidden layers, and neurons within a layer never interact with each other.

[Diagram: deep feed-forward network with input layer x_0 … x_3, stacked hidden layers, and an output layer.]
Drawbacks of NNs
● They lack sequence modeling capabilities: they don't keep track of past information (i.e., they have no memory), which is essential for modeling data with a sequential nature.
● The input is of fixed size (more on this later).

[Diagram: feed-forward network with a fixed-size input x_0 … x_n.]
Part 2
Sequential Modeling
Sequences
“I took the bus this morning because it was very cold.”
Examples of sequences: stock prices, speech waveforms, sentences.

● Current values depend on previous values (e.g., melody notes, language rules).
● Order needs to be maintained to preserve meaning.
● Sequences usually vary in length.
Modeling Sequences
How to represent a sequence?
Bag of Words (BOW): represent the sentence as a fixed-size vector over a vocabulary, marking which words occur.

"I love the coldness"  →  [ 1 1 0 1 0 0 1 ]

The fixed-size vector can then be fed into a standard feed-forward network.

What's the problem with the BOW representation?
Problem with BOW
Bag of words does not preserve order, so the meaning carried by word order is lost.

“The food was good, not bad at all”  vs.  “The food was bad, not good at all”

Both sentences map to the same vector: [ 1 1 0 1 1 0 1 0 1 0 0 1 1 ]
How do we differentiate the meanings of the two sentences? (See the sketch below.)
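A minimal sketch of why BOW cannot tell these apart: both sentences produce identical word counts (the simple tokenization here is an assumption for illustration).

from collections import Counter

def bow(sentence):
    """Bag-of-words counts: word order is ignored entirely."""
    return Counter(sentence.lower().replace(",", "").split())

a = "The food was good, not bad at all"
b = "The food was bad, not good at all"
print(bow(a) == bow(b))   # True: the two sentences are indistinguishable under BOW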
One-Hot Encoding
● Preserve order by maintaining each word's position within the feature vector.

"On Monday it was raining"  →
[ 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 ]

We preserved order, but what is the problem here?
Problem with One-Hot Encoding
● One-hot encoding cannot deal with variations of the same sequence.
"On Monday it was raining"  →  [ 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 ]
"It was raining on Monday"  →  [ 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 ]
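A sketch of the idea with a made-up toy vocabulary (the slide's actual word-to-index mapping is not shown, so the exact vectors below are only illustrative): each word becomes a one-hot block, and the blocks are concatenated in sentence order.

import numpy as np

# Hypothetical vocabulary and index assignment.
vocab = ["it", "monday", "on", "raining", "was"]
index = {w: i for i, w in enumerate(vocab)}

def encode(sentence):
    """Concatenate one one-hot vector per word, preserving word order."""
    vecs = []
    for word in sentence.lower().split():
        v = np.zeros(len(vocab), dtype=int)
        v[index[word]] = 1
        vecs.append(v)
    return np.concatenate(vecs)

print(encode("On Monday it was raining"))
print(encode("It was raining on Monday"))   # same words, different vector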
Solution
Relearn the rules of language at each point in the sentence in order to preserve meaning?

"On Monday it was raining"  /  "It was raining on Monday"

But there is no idea of state or of what comes next: what was learned at the beginning of the sentence will have to be relearned at the end of the sentence.
Markov Models
States and transitions can be modeled, so no matter where we are in the sentence we have an idea of what comes next, based on the transition probabilities (see the sketch below).

Problem: each state depends only on the previous state, so we can't model long-term dependencies!
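A tiny first-order Markov (bigram) sketch; the transition probabilities below are invented purely for illustration. The key property is that the next word depends only on the current word.

import random

# Hypothetical bigram transition probabilities.
transitions = {
    "on":     {"monday": 0.9, "it": 0.1},
    "monday": {"it": 0.8, "was": 0.2},
    "it":     {"was": 1.0},
    "was":    {"raining": 0.7, "cold": 0.3},
}

def next_word(current):
    """Sample the next word given only the current state (first-order Markov)."""
    words = list(transitions[current])
    probs = [transitions[current][w] for w in words]
    return random.choices(words, weights=probs, k=1)[0]

print(next_word("was"))   # e.g. "raining"; nothing before "was" is remembered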
Long-term dependencies
We need information from the far past and future to accurately model sequences.
"In Italy, I had a great time and I learnt some of the _____ language."

It's time for Recurrent Neural Networks (RNNs)!
Part 3
Recurrent Neural Networks (RNNs)
Recurrent Neural Networks (RNNs)
● RNNs model sequential information by assuming long-term dependencies between the elements of a sequence.
● RNNs maintain word order and share parameters across the sequence (i.e., there is no need to relearn the rules at each position).
● RNNs are "recurrent" because they perform the same task for every element of a sequence, with the output depending on the previous computations.
● RNNs keep a memory of what has been computed so far, so in principle they can capture long-term dependencies.
Applications of RNN
● Analyzing time-series data to predict the stock market
● Speech recognition (e.g., emotion recognition from acoustic features)
● Autonomous driving
● Natural language processing (e.g., machine translation, question answering)
Some examples
Google Magenta Project (Melody composer) – (https://magenta.tensorflow.org)
Sentence Generator - (http://goo.gl/onkPNd)
Image Captioning – (http://goo.gl/Nwx7Kh)
RNNs Main Components
- Recurrent neurons
- Unrolling recurrent neurons
- Layer of recurrent neurons
- Memory cell containing hidden state
Recurrent Neurons
A simple recurrent neuron:
- receives an input x
- produces an output y
- sends its output back to itself

Components:
- x   : input
- y   : output (usually a vector of probabilities)
- ∑   : weighted sum of the inputs plus a bias, sum(W · x) + b
- f   : activation function (e.g., ReLU, tanh, sigmoid)
- Wx  : weights for the inputs
- Wy  : weights for the outputs of the previous time step
- y_t : a function of the current input and the previous time step's output
Unrolling/Unfolding recurrent neuron
[Diagram: the recurrent neuron unrolled (unfolded) through time. Each copy corresponds to one time step (frame) t: it receives the input x_t and the previous output y_{t-1}, produces y_t, and all copies share the same weights Wx and Wy.]
RNNs remember previous state
At t = 0:
- x_0 : vector representing the first word
- W_x, W_s : weight matrices
- s_0 : output (state) at t = 0

s_0 = tanh(W_x x_0 + W_s s_{-1})

The state lets the network remember things from the past.
RNNs remember previous state
At t = 1:
- x_1 : vector representing the second word
- W_x, W_s : the same weight matrices, shared across the sequence
- s_1 : output (state) at t = 1

s_1 = tanh(W_x x_1 + W_s s_0),   where   s_0 = tanh(W_x x_0 + W_s s_{-1})

Because s_1 depends on s_0, the network can remember things from t = 0.
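A minimal NumPy sketch of this recurrence, s_t = tanh(W_x x_t + W_s s_{t-1}); the sizes, random weights, and word vectors are assumptions made only to show the state being carried forward.

import numpy as np

n_inputs, n_state = 3, 4
rng = np.random.default_rng(0)

Wx = rng.normal(size=(n_state, n_inputs))   # input-to-state weights (shared across time)
Ws = rng.normal(size=(n_state, n_state))    # state-to-state weights (shared across time)

def step(x_t, s_prev):
    """One time step: s_t = tanh(Wx x_t + Ws s_{t-1})."""
    return np.tanh(Wx @ x_t + Ws @ s_prev)

s = np.zeros(n_state)                                          # s_{-1}: initial state
for x_t in [np.array([1., 0., 0.]), np.array([0., 1., 0.])]:   # toy word vectors
    s = step(x_t, s)                                           # state carries past information forward
print(s)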
Overview
Unrolling the neuron through time, each time step computes

y_t = φ(W_x · x_t + W_y · y_{t-1} + b)

[Diagram: the unrolled neuron applied to a mini-batch of 3-dimensional input vectors, e.g. [0,1,2], [3,4,5], [6,7,8], [9,0,1] and [9,8,7], [0,0,0], [6,5,4], [3,2,1].]
Code Example
Where all the magic happens:

y_t = φ(W_x · x_t + W_y · y_{t-1} + b)

(The code shown on the original slide is an image and is not reproduced here; a sketch of the same computation follows.)
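A TensorFlow sketch of that computation. The original slide's code is not available, so this is a TensorFlow 2-style reconstruction; following the mini-batch example in the Hands-On Machine Learning book cited in the references, the two sets of vectors are assumed to be batches for time steps t = 0 and t = 1.

import tensorflow as tf

n_inputs, n_neurons = 3, 5

# Mini-batches of 4 instances at t = 0 and t = 1 (values from the slide).
X0 = tf.constant([[0., 1., 2.], [3., 4., 5.], [6., 7., 8.], [9., 0., 1.]])
X1 = tf.constant([[9., 8., 7.], [0., 0., 0.], [6., 5., 4.], [3., 2., 1.]])

Wx = tf.Variable(tf.random.normal([n_inputs, n_neurons]))
Wy = tf.Variable(tf.random.normal([n_neurons, n_neurons]))
b  = tf.Variable(tf.zeros([1, n_neurons]))

# y_t = tanh(X_t · Wx + Y_{t-1} · Wy + b); there is no previous output at t = 0.
Y0 = tf.tanh(tf.matmul(X0, Wx) + b)
Y1 = tf.tanh(tf.matmul(X1, Wx) + tf.matmul(Y0, Wy) + b)
print(Y1.numpy())   # shape (4, 5): one output vector per instance in the batch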
Layer of Recurrent Neurons
[Diagram: a whole layer of recurrent neurons instead of a single one. The input x at each step is a vector, and the output y is now a vector as well (one component per neuron in the layer), rather than a scalar.]
Unrolling Layer
[Diagram: the layer unrolled through time over inputs x_0, x_1, x_2 (example 3-dimensional vectors such as [1,2,3], [4,5,6], [7,8,9], [0,0,0]), producing outputs y_0, y_1, y_2, with the weights W_x and W_y shared across time steps.]

Vector form, for a whole mini-batch X_t:

Y_t = φ(X_t · W_x + Y_{t-1} · W_y + b)
    = φ([X_t  Y_{t-1}] · W + b),   where   W = [W_x ; W_y]  (W_x stacked on top of W_y)
Variations of RNNs: Input / Output
● Sequence to sequence: an input at every time step and an output at every time step (e.g., stock prices and other time series).
● Sequence to vector: a full input sequence, with a single output at the end (e.g., sentiment analysis with a score in [-1, +1], or other classification tasks where the output is a vector of probabilities over classes, i.e. a softmax).
● Vector to sequence: a single input vector, with an output produced at every time step (e.g., image captioning).
● Encoder-Decoder: a sequence-to-vector encoder followed by a vector-to-sequence decoder (e.g., translation).

(A small sketch of the first two patterns follows.)
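As a rough illustration (not code from the slides), a Keras recurrent layer can be switched between sequence-to-sequence and sequence-to-vector behavior with the return_sequences flag; the layer and batch sizes below are arbitrary.

import tensorflow as tf

x = tf.random.normal([2, 4, 3])   # batch of 2 sequences, 4 time steps, 3 features each

# Sequence to sequence: one output per time step.
seq2seq = tf.keras.layers.SimpleRNN(5, return_sequences=True)
print(seq2seq(x).shape)           # (2, 4, 5)

# Sequence to vector: only the output at the final time step is returned.
seq2vec = tf.keras.layers.SimpleRNN(5, return_sequences=False)
print(seq2vec(x).shape)           # (2, 5)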
Training
[Diagram: an unrolled RNN over inputs x_0 … x_4 with outputs y_0 … y_4, the same parameters (W, b) reused at every time step, and a cost C(y_2, y_3, y_4) computed on the later outputs.]

- Forward pass
- Compute the loss via the cost function C
- Minimize the loss by backpropagation through time (BPTT)

The total gradient sums the per-step gradients:

∂C/∂W = Σ_t ∂C_t/∂W

For one single time step t (e.g., t = 4), we count the contributions of W in previous time steps to the error at time step t, using the chain rule:

∂C_4/∂W = Σ_{k=0}^{4} (∂C_4/∂y_4) (∂y_4/∂s_4) (∂s_4/∂s_k) (∂s_k/∂W)
Drawbacks
Main problem: vanishing gradients (the gradients get too small).

Intuition: as sequences get longer, the gradients tend to shrink during the backpropagation process.

∂C_t/∂W = Σ_{k=0}^{t} (∂C_t/∂y_t) (∂y_t/∂s_t) (∂s_t/∂s_k) (∂s_k/∂W)

The factor ∂s_t/∂s_k expands into a long product over intermediate time steps:

∂s_t/∂s_k = (∂s_t/∂s_{t-1}) (∂s_{t-1}/∂s_{t-2}) ⋯ (∂s_2/∂s_1) (∂s_1/∂s_0)

We are just multiplying a lot of small numbers together (see the sketch below).
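A back-of-the-envelope sketch of that intuition; the per-step derivative of 0.25 is an arbitrary assumption standing in for a small local derivative (e.g., a saturated tanh unit).

grad = 1.0
for step in range(1, 51):      # propagate the error back through 50 time steps
    grad *= 0.25               # assumed local derivative ∂s_{k+1}/∂s_k at each step
    if step in (5, 10, 25, 50):
        print(f"after {step} steps: {grad:.3e}")   # the gradient shrinks toward zero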
Solutions
Long Short-Term Memory (LSTM) networks deal with the vanishing gradient problem and are therefore more reliable for modeling long-term dependencies, especially for very long sequences.
RNN Extensions
Extended reading (a short sketch of the first two follows below):
○ Bidirectional RNNs – passing states in both directions
○ Deep (Bidirectional) RNNs – stacking RNN layers
○ LSTM networks – an adaptation of RNNs
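A possible Keras sketch of a deep bidirectional RNN, offered only as an illustration of the two ideas above; the layer sizes and input shape are arbitrary assumptions.

import tensorflow as tf

# Two stacked recurrent layers, each reading the sequence in both directions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 8)),                      # variable-length sequences of 8-dim vectors
    tf.keras.layers.Bidirectional(
        tf.keras.layers.SimpleRNN(16, return_sequences=True)),   # states passed in both directions
    tf.keras.layers.Bidirectional(tf.keras.layers.SimpleRNN(16)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()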
In general
● RNNs are great for analyzing sequences of arbitrary length.
● RNNs are considered "anticipatory" models.
● RNNs are also considered creative models: they can, for example, predict the set of musical notes that could come next in a melody and select an appropriate one to play.
Part 4
RNNs in TensorFlow
Demo
● Building RNNs in TensorFlow (a small sketch follows below)
● Training RNNs in TensorFlow
● Image classification
● Text classification
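The demo notebooks themselves are not part of this deck; as a stand-in, here is a minimal sketch of what building and training a simple RNN text classifier in TensorFlow/Keras could look like. The vocabulary size, sequence length, model sizes, and random data are all placeholders.

import numpy as np
import tensorflow as tf

vocab_size, seq_len = 1000, 20                       # placeholder sizes

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 32),       # token ids -> dense word vectors
    tf.keras.layers.SimpleRNN(32),                   # sequence-to-vector RNN
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary classification head
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Random stand-in data; replace with real tokenized text and labels.
X = np.random.randint(0, vocab_size, size=(256, seq_len))
y = np.random.randint(0, 2, size=(256,))
model.fit(X, y, epochs=2, batch_size=32)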
References
● Introduction to RNNs – http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
● NTHU Machine Learning Course – https://goo.gl/B4EqMi
● Hands-On Machine Learning with Scikit-Learn and TensorFlow (book)
