RNNs for Speech
Faster and smaller RNNs with new regularization techniques.
Old Good RNNs
Cannot train RNN!!
Gradients get crazy!!
Fishes are better at remembering!!!
I watched Schmidhuber and liked him!!
I don't care about baselines, I use what the cool boys use!!
Why so big? Occam will cry!!
My GPU has 4GB!!
I can't wait months to train!!
X et al. said GRUs are better!!
What else?
I need an RNN-sized model with LSTM performance!!
I need a smaller model or a better smartphone!!
FastGRNN
http://manikvarma.org/pubs/kusupati18.pdf
This forget gate makes no sense!!
May the ReLU be with you!!
I do speech recognition!!
I watched Bengio and liked him!!
LightGRU
https://arxiv.org/abs/1803.10225
I need Regularization!!!
Dropout is not good!!!
AWD-LSTM
https://arxiv.org/abs/1708.02182
Fast GRNN
 2 trainable matrices vs 6 trainable matrices in a GRU layer.
 Low rank approximation of matrices: w = w1(w2).T
 Integer quantization for parameters.
 Piecewise linear approximation of non-linearities.
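A minimal pure-Python sketch of the first two points, assuming the FastGRNN update from the linked paper: the gate and the candidate state share the same W and U, and each matrix is stored as a low-rank product. The dictionary keys (`w1`, `w2`, `u1`, `u2`, `b_z`, `b_h`, `zeta`, `nu`) and helper names are illustrative choices, not the authors' code. Quantization and the piecewise-linear non-linearities are left out.

```python
import math


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def low_rank_matvec(w1, w2, v):
    """Compute (w1 @ w2.T) @ v without forming the full matrix.

    w1 has shape (out, r) and w2 has shape (in, r), so only
    r * (out + in) parameters are stored instead of out * in.
    """
    r = len(w2[0])
    # First project the input down to r dimensions: w2.T @ v
    projected = [sum(w2[i][k] * v[i] for i in range(len(v))) for k in range(r)]
    # Then expand back to the output size: w1 @ projected
    return matvec(w1, projected)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fastgrnn_step(x, h, p):
    """One FastGRNN step. W and U are shared by the gate and the candidate."""
    wx = low_rank_matvec(p["w1"], p["w2"], x)
    uh = low_rank_matvec(p["u1"], p["u2"], h)
    pre = [a + b for a, b in zip(wx, uh)]
    z = [sigmoid(s + b) for s, b in zip(pre, p["b_z"])]
    cand = [math.tanh(s + b) for s, b in zip(pre, p["b_h"])]
    # zeta and nu are trainable scalars in the paper; plain floats here
    return [(p["zeta"] * (1.0 - zi) + p["nu"]) * ci + zi * hi
            for zi, ci, hi in zip(z, cand, h)]
```

The only full-size products are `W x` and `U h`, computed once and reused for both the gate and the candidate, which is where the "2 matrices vs 6" saving comes from.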
FastGRNN vs GRU
RNNs for Speech
Light Gated Recurrent Units
 Remove the reset gate.
 Replace tanh with ReLU.
 Batch normalization to reduce ReLU instability.
 Specifically targeting speech recognition.
 Orthogonal weight initialization, Variational dropout
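As a rough illustration of the first three points, here is a pure-Python sketch of one Li-GRU step over a small batch, assuming the update equations from the linked paper: batch normalization is applied only to the feed-forward terms, there is no reset gate, and the candidate uses ReLU. The parameter names (`w_z`, `w_h`, `u_z`, `u_h`) are illustrative, and the batch norm here has no learned scale or shift.

```python
import math


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def batch_norm(batch, eps=1e-5):
    """Normalize each feature across the batch (no learned scale/shift)."""
    n, dim = len(batch), len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dim)]
    variances = [sum((row[j] - means[j]) ** 2 for row in batch) / n
                 for j in range(dim)]
    return [[(row[j] - means[j]) / math.sqrt(variances[j] + eps)
             for j in range(dim)] for row in batch]


def ligru_step(x_batch, h_batch, p):
    """One Li-GRU step: update gate only, ReLU candidate, BN on W x terms."""
    wz_x = batch_norm([matvec(p["w_z"], x) for x in x_batch])
    wh_x = batch_norm([matvec(p["w_h"], x) for x in x_batch])
    new_h = []
    for az, ah, h in zip(wz_x, wh_x, h_batch):
        uz_h = matvec(p["u_z"], h)
        uh_h = matvec(p["u_h"], h)
        z = [sigmoid(a + b) for a, b in zip(az, uz_h)]
        # ReLU replaces tanh; no reset gate multiplies h before U_h
        cand = [max(0.0, a + b) for a, b in zip(ah, uh_h)]
        new_h.append([zi * hi + (1.0 - zi) * ci
                      for zi, hi, ci in zip(z, h, cand)])
    return new_h
```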
Redundancy of Reset Gate
Results
40 log-mel filter banks
Maximum likelihood linear regression
All together
GRU FastGRNN LightGRU
ASGD Weight-Dropped LSTM (AWD-LSTM)
 DropConnect
 Averaged SGD
 Embedding Dropout
 Activation Regularization
Weight Dropping
 Apply DropConnect to the hidden-to-hidden connections (all U matrices).
 Prevents overfitting in the recurrent connections.
 No need to modify the optimized RNN implementations in DL frameworks.
 Apply the same dropout mask across the whole sequence.
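The idea can be sketched in a few lines of plain Python, here on a simple tanh RNN rather than an LSTM to keep it short. The function names and the inverted-dropout rescaling by 1/(1-p) are assumptions made for this illustration. The mask is sampled once, applied to U before the loop, and the recurrence itself is untouched.

```python
import math
import random


def drop_connect_mask(rows, cols, p, rng):
    """Sample one keep-mask; kept entries are rescaled by 1 / (1 - p)."""
    keep = 1.0 - p
    return [[(1.0 / keep) if rng.random() < keep else 0.0
             for _ in range(cols)] for _ in range(rows)]


def apply_mask(u, mask):
    """Element-wise product of a weight matrix and a mask."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(u, mask)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def run_sequence(xs, h0, w, u, p_drop, rng, training=True):
    """Tanh RNN with DropConnect on U; one mask for the whole sequence."""
    if training:
        mask = drop_connect_mask(len(u), len(u[0]), p_drop, rng)
        u_used = apply_mask(u, mask)
    else:
        u_used = u  # no dropping at evaluation time
    h = h0
    for x in xs:
        wx = matvec(w, x)
        uh = matvec(u_used, h)  # same dropped U at every time step
        h = [math.tanh(a + b) for a, b in zip(wx, uh)]
    return h
```

Because the weights are masked before the recurrence runs, the per-step computation is the standard one, which is why a fused framework kernel can be reused unchanged.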
Averaged SGD and NT-ASGD
Code annotations: number of steps before averaging starts; weights optimized per iteration; weights used as the final model.
PyTorch implementation:
https://github.com/pytorch/pytorch/blob/cd9b27231b51633e76e28b6a34002ab83b0660fc/torch/optim/asgd.py
NT-ASGD: start averaging only once the validation metric has failed to improve for several checks (non-monotonic trigger).
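A small sketch of the two pieces involved, written in the spirit of the AWD-LSTM paper rather than copied from it: a trigger that looks at the history of validation losses, and a running average of the weights once the trigger has fired. The function names and the default patience `n` are hypothetical.

```python
def nt_asgd_trigger(val_losses, n=5):
    """Return the index of the validation check where averaging starts.

    Averaging is triggered at check t when the loss at t is worse than
    the best loss seen more than n checks earlier. Returns None if the
    trigger never fires.
    """
    for t, v in enumerate(val_losses):
        if t > n and v > min(val_losses[: t - n]):
            return t
    return None


def running_average(avg, weights, count):
    """Fold one more weight vector into an average of `count` vectors."""
    return [a + (w - a) / (count + 1) for a, w in zip(avg, weights)]


# Usage: the loss stalls after the third check, so averaging starts later.
losses = [5.0, 4.0, 3.0, 3.5, 3.6, 3.7]
start = nt_asgd_trigger(losses, n=2)

avg = [0.0, 0.0]
for count, w in enumerate([[2.0, 4.0], [4.0, 8.0]]):
    avg = running_average(avg, w, count)
```

The averaged weights, not the last iterate, are what would be kept as the final model.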
Embedding Dropout
 Apply dropout at the word level: zero out the vectors of randomly selected words.
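A minimal sketch, assuming the embedding matrix is a list of rows with one row per vocabulary word. Whole rows are dropped, so a dropped word is zeroed everywhere it occurs in the batch. Surviving rows are rescaled by 1/(1-p), as in the AWD-LSTM paper. The function name is illustrative.

```python
import random


def embedding_dropout(embedding, p, rng):
    """Zero out entire word vectors with probability p; rescale the rest."""
    keep = 1.0 - p
    dropped = []
    for row in embedding:
        if rng.random() < keep:
            dropped.append([v / keep for v in row])
        else:
            dropped.append([0.0] * len(row))
    return dropped


# Usage: a 4-word vocabulary with 2-dimensional vectors.
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masked = embedding_dropout(table, 0.5, random.Random(0))
```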
Activation Regularization
 Penalize the network for producing large outputs (AR) and large changes in hidden states between time steps (TAR), both of which can lead to overfitting.
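The AWD-LSTM paper splits this into two penalties: activation regularization (AR) on the size of the outputs, and temporal activation regularization (TAR) on the change between consecutive hidden states. The sketch below uses a mean of squares for both, which is one common way to implement them. The function names and the coefficients `alpha` and `beta` are placeholders, and the dropout mask on the AR term is omitted.

```python
def activation_regularization(hidden_states, alpha):
    """AR: alpha times the mean squared activation over all time steps."""
    values = [v for h in hidden_states for v in h]
    return alpha * sum(v * v for v in values) / len(values)


def temporal_activation_regularization(hidden_states, beta):
    """TAR: beta times the mean squared change between adjacent steps."""
    diffs = [b - a
             for h_prev, h_next in zip(hidden_states, hidden_states[1:])
             for a, b in zip(h_prev, h_next)]
    return beta * sum(d * d for d in diffs) / len(diffs)


# Usage: two time steps of a 2-unit hidden state.
states = [[1.0, 1.0], [3.0, 1.0]]
penalty = (activation_regularization(states, alpha=2.0)
           + temporal_activation_regularization(states, beta=1.0))
```

Both terms would be added to the task loss before backpropagation.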
Results