Perspective of Squeezing Model
Dense-Sparse-Dense Training for DNN paper review & squeeze model methods
davinnovation@gmail.com
Dong Heon Cho
Why Dense-Sparse-Dense Training?
• Deep Neural Networks show strong performance across fields such as Computer Vision, Natural Language Processing, and Speech Recognition, and performance can be pushed further with larger, more complex models.
• A complex model can fit the relationship between features and outputs well, but it also tends to fit the noise in the training data, which causes overfitting.
• Simply shrinking the model runs into underfitting instead, so changing the model size alone does not solve the problem.
• The Dense-Sparse-Dense training flow (DSD) is a training method that improves learning while keeping the existing model size.
Dense-Sparse-Dense Training: keep the model size while avoiding overfitting
What is Dense-Sparse-Dense Training?
• Dense-Sparse-Dense Training trains the model in three steps (a toy code sketch of the full flow appears after the step-by-step slides below).
• Step 1 (Dense) is ordinary training: the network is trained with backpropagation and gradient descent, exactly as usual.
• Step 2 (Sparse) prunes the model under a sparsity constraint and retrains the surviving weights.
• Step 3 restores the weights pruned in Step 2 and retrains the full dense network.
Dense → Sparse → Dense
What is Dense-Sparse-Dense Training?
Dense → Sparse → Dense
Step 1: Dense
In the first Dense step, both the weight values and the connections are trained.
* The number of training iterations is chosen heuristically.
What is Dense-Sparse-Dense Training?
Dense → Sparse → Dense
Step 2: Sparse
In the Sparse step, unimportant weights are removed and the remaining values are retrained.
* The number of iterations and the top-k pruning level are chosen heuristically.
What is Dense-Sparse-Dense Training?
Dense → Sparse → Dense
Step 3: Dense
In the final Dense step, all weights are restored and the whole network is retrained.
* The number of iterations is chosen heuristically.
* Compared with the weights from the first Dense step, the final weight values are smaller; smaller weight magnitudes correspond to a less complex, more regularized model.
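Below is a minimal NumPy sketch of the three-step flow on a single weight matrix. The toy quadratic loss, the 30% sparsity level, and the learning rates and step counts are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy quadratic "loss": pull W toward a fixed target matrix.
rng = np.random.default_rng(0)
W_target = rng.normal(size=(8, 8))
W = np.zeros((8, 8))

def grad(W):
    return 2.0 * (W - W_target)  # gradient of ||W - W_target||^2

def train(W, steps, lr, mask=None):
    for _ in range(steps):
        W = W - lr * grad(W)
        if mask is not None:
            W = W * mask  # pruned connections are held at zero
    return W

# Step 1 (Dense): ordinary training of weight values and connections.
W = train(W, steps=50, lr=0.1)

# Step 2 (Sparse): keep the top (1 - sparsity) fraction of weights by
# magnitude, then retrain under the sparsity constraint.
sparsity = 0.3
threshold = np.quantile(np.abs(W), sparsity)
mask = (np.abs(W) >= threshold).astype(float)
W = train(W * mask, steps=50, lr=0.1, mask=mask)

# Step 3 (re-Dense): restore the pruned connections (starting from zero)
# and retrain the full dense matrix, typically with a smaller learning rate.
W = train(W, steps=50, lr=0.01)
```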
Dense-Sparse-Dense Training Result
Error rate (%), Baseline → DSD:
• GoogLeNet: 31.1 → 30
• VGG-16: 31.5 → 27.2
• ResNet-18: 30.4 → 29.2
• ResNet-50: 24 → 22.9
• DeepSpeech2: 14.5 → 13.4
Applying DSD training to each baseline model lowers its error rate.
DSD Training
DSD results on models (* sparsity 30%)
Error rate (%), Baseline → Sparse → DSD:
• GoogLeNet: Top-1 31.14 → 30.58 → 30.02; Top-5 10.96 → 10.58 → 10.34
• VGGNet: Top-1 31.5 → 28.19 → 27.19; Top-5 11.32 → 9.23 → 8.67
• ResNet-50: Top-1 24.01 → 23.55 → 22.89; Top-5 7.02 → 6.88 → 6.47
Effect of DSD Training
1. The Sparse and re-Dense steps are observed to help training escape saddle points.
2. As a consequence of 1, training can reach a better minimum.
3. Training with the Sparse step acts like noise injection, which makes the model robust.
• In the Sparse step the model is trained in a lower-dimensional space, which yields a robust model, and the re-Dense step works like a re-initialization that raises this robustness further.
DSD training delivers a better-performing model at little extra cost.
Other Learning Techniques
Dropout
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
• Model combination (ensembling) yields better performance, but training many large, complex models is expensive.
=>
Randomly drop nodes during the training phase, so that
1. the network gets the effect of training many different sub-models, and
2. the final model behaves like an ensemble of those sub-models.
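A minimal NumPy sketch of dropout in its common "inverted" form: units are zeroed at random during training and the survivors are rescaled by 1/(1 - p), so the test-phase pass is the identity. (The original paper instead scales the weights by p at test time.) The shapes and drop probability here are illustrative.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: drop each unit with probability p during training."""
    if not training or p == 0.0:
        return x  # test phase: identity, thanks to the rescaling below
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)       # rescale to preserve the expectation

activations = np.ones((2, 6))
print(dropout(activations, p=0.5))    # roughly half the units are zeroed
```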
Other Learning Techniques
Dropout
Comparing visualizations of features learned with and without a dropout layer, one can see that dropout increases sparsity.
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
Other Learning Techniques
Batch Normalization
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International Conference on Machine Learning. 2015.
• Deep learning models use non-linear activation functions, so as models get deeper, backpropagation runs into vanishing/exploding gradient problems.
=>
Normalize the activation outputs to a fixed distribution to relieve the vanishing problem:
1. Normalize each layer's output (standardize per dimension).
2. Normalize per mini-batch:
   - During the training phase, use the batch mean and variance.
   - During the test phase, use a running average of the training-phase mean and variance.
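A minimal NumPy sketch of the train/test behavior described above; `momentum` and `eps` are typical defaults, not values from the paper.

```python
import numpy as np

class BatchNorm:
    """Normalize a (batch, features) tensor; gamma/beta are the learnable
    scale and shift."""

    def __init__(self, dim, momentum=0.9, eps=1e-5):
        self.gamma, self.beta = np.ones(dim), np.zeros(dim)
        self.run_mean, self.run_var = np.zeros(dim), np.ones(dim)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, training=True):
        if training:  # batch statistics + update of the running estimates
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            self.run_mean = m * self.run_mean + (1 - m) * mean
            self.run_var = m * self.run_var + (1 - m) * var
        else:         # test phase: use the running (training-time) estimates
            mean, var = self.run_mean, self.run_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

bn = BatchNorm(dim=4)
x = np.random.randn(32, 4) * 5.0 + 3.0
y_train = bn(x, training=True)    # standardized with batch statistics
y_test = bn(x, training=False)    # standardized with running statistics
```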
Other Learning Techniques
Batch Normalization
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International Conference on Machine Learning. 2015.
Batch Normalization speeds up training and can be expected to reach a lower final loss.
Other Learning Techniques
Residual Network
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
• As models get deeper, vanishing gradients make them increasingly hard to train.
=>
Instead of connecting each layer only to the next layer, add a shortcut that skips over two weight layers, so gradients and information can flow through directly.
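A minimal PyTorch sketch of a two-convolution residual block with an identity shortcut, assuming matching input and output channels so no projection is needed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # the shortcut carries gradients directly

block = ResidualBlock(16)
y = block(torch.randn(1, 16, 8, 8))   # same shape in and out
```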
Other Learning Techniques
Residual Network
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
ResNet makes it possible to train models with far deeper layers.
Other Learning Techniques
Dense Network
Iandola, Forrest, et al. "Densenet: Implementing efficient convnet descriptor pyramids." arXiv preprint arXiv:1404.1869 (2014).
• As models get deeper, vanishing gradients make them increasingly hard to train.
=>
Instead of connecting each layer only to the next layer, connect it to the N preceding layers, so gradients and information can flow through directly.
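A minimal PyTorch sketch of concatenation-style dense connectivity, where each layer consumes all earlier feature maps; channel counts and depth are illustrative.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth),
                nn.ReLU(),
                nn.Conv2d(in_ch + i * growth, growth, 3, padding=1, bias=False),
            ))

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # each layer sees the concatenation of all earlier feature maps
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)

block = DenseBlock(in_ch=8, growth=4, n_layers=3)
y = block(torch.randn(1, 8, 8, 8))   # output has 8 + 3*4 = 20 channels
```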
Other Learning Techniques
Dense Network
Iandola, Forrest, et al. "Densenet: Implementing efficient convnet descriptor pyramids." arXiv preprint arXiv:1404.1869 (2014).
Dense connectivity shows better accuracy with fewer parameters.
Other Learning Techniques & Compression
Distilling
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).
• Model combination (ensembling) yields better performance, but training many large, complex models is expensive.
=>
After building a model ensemble, train a new single model on the ensemble's output information; the resulting single model performs better than one trained on its own.
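A minimal PyTorch sketch of the distillation loss: the student matches the teacher's (or ensemble's) temperature-softened outputs, mixed with the ordinary hard-label loss. `T` and `alpha` are hyperparameters to tune, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target loss against the teacher plus hard-label cross-entropy."""
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    soft_loss = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                         soft_targets, reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

student = torch.randn(5, 10)            # student logits
teacher = torch.randn(5, 10)            # teacher / ensemble logits
labels = torch.randint(0, 10, (5,))
loss = distillation_loss(student, teacher, labels)
```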
Other Learning Techniques & Compression
Distilling
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).
Accuracy (%): Baseline 58.9, 10x Ensemble 61.1, Distilling 60.8
The distilled model falls slightly short of the 10x ensemble, but clearly outperforms the single baseline model.
Other Learning Techniques & Compression
Singular Value Decomposition
Denton, Emily L., et al. "Exploiting linear structure within convolutional networks for efficient evaluation." Advances in Neural Information Processing Systems. 2014.
• Models have grown to hold very many weights, and the computation cost has grown with them (the FC layers account for most of the weights).
=>
Compress the fully connected layer's weight matrix through a singular value decomposition, keeping only the leading singular values.
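A minimal NumPy sketch of rank-k truncation of an FC weight matrix; the matrix size and `k` are illustrative, and `k` sets the accuracy/size trade-off.

```python
import numpy as np

W = np.random.randn(1024, 4096)               # illustrative FC weight matrix
U, S, Vt = np.linalg.svd(W, full_matrices=False)

k = 128                                        # number of singular values kept
A = U[:, :k] * S[:k]                           # shape (1024, k)
B = Vt[:k, :]                                  # shape (k, 4096)

# W is replaced by two factors: 1024*4096 ~ 4.2M parameters become
# (1024 + 4096) * 128 ~ 0.66M, and the layer runs as two small matmuls.
x = np.random.randn(4096)
y_approx = A @ (B @ x)                         # approximates W @ x
```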
Other Learning Techniques & Compression
Singular Value Decomposition
Denton, Emily L., et al. "Exploiting linear structure within convolutional networks for efficient evaluation." Advances in Neural Information Processing Systems. 2014.
The model's weights can be reduced substantially while accuracy is maintained.
Other Learning Techniques & Compression
Pruning
Han, Song, et al. "Learning both weights and connections for efficient neural network." NIPS. 2015.
• Models have grown to hold very many weights, and the computation cost has grown with them.
=>
As in DSD, after the Dense training step, prune the small weights away (the same magnitude-based masking shown in the DSD sketch above, but without the final re-Dense step).
Other Learning Techniques & Compression
Pruning
Han, Song, et al. "Learning both weights and connections for efficient neural network." NIPS. 2015.
Even after cutting the final weight count down, the model's performance stays nearly the same.
Other Learning Techniques & Compression
Pruning and splicing
Guo, Yiwen, et al. "Dynamic Network Surgery for Efficient DNNs." NIPS. 2016.
• Models have grown to hold very many weights, and the computation cost has grown with them.
=>
As in DSD, prune after the Dense step, but keep a splicing step during training.
* Splicing step: a weight pruned during training can be restored later (its updates continue even while it is masked out).
Training loop: Train Network → Update T_k (the mask) → Update W_k (the weights).
Unlike DSD, the final step leaves the network sparsely connected instead of fully dense.
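A minimal NumPy sketch of the prune-and-splice loop, with a random stand-in for the real gradient; the thresholds and step count are illustrative. The key point is that the dense weights keep receiving updates, so a masked weight can cross back over the splicing threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))   # dense weights, always updated
T = np.ones_like(W)             # binary mask: 1 = active connection

a, b, lr = 0.5, 0.7, 0.1        # prune below a, splice back above b

for step in range(100):
    g = rng.normal(size=W.shape)          # stand-in for the true gradient
    W -= lr * g                            # update ALL weights, even pruned ones
    T = np.where(np.abs(W) < a, 0.0, T)    # prune weak connections
    T = np.where(np.abs(W) > b, 1.0, T)    # splice strong connections back in
    # a real forward pass would use the masked weights: W * T
```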
Other Learning Techniques & Compression
Pruning and splicing
Guo, Yiwen, et al. "Dynamic Network Surgery for Efficient DNNs." NIPS. 2016.
An even larger share of the final weights can be removed while the model's performance stays nearly the same.
Other Compression
SqueezeNet
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."
• Models have grown to hold very many weights, and the computation cost has grown with them.
=>
Change the network architecture itself so that the whole model carries fewer weights.
* Fire module: the building block of SqueezeNet.
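A minimal PyTorch sketch of a Fire module: a 1x1 "squeeze" layer cuts the channel count, then parallel 1x1 and 3x3 "expand" layers are concatenated. The channel sizes follow the shape of the paper's early Fire modules but are illustrative here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, 1)
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, 1)
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, 3, padding=1)

    def forward(self, x):
        s = F.relu(self.squeeze(x))   # few channels here means fewer weights
        return torch.cat([F.relu(self.expand1(s)),
                          F.relu(self.expand3(s))], dim=1)

fire = Fire(in_ch=96, squeeze_ch=16, expand_ch=64)
y = fire(torch.randn(1, 96, 32, 32))   # output: 64 + 64 = 128 channels
```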
Other Compression
SqueezeNet
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."
With far fewer final weights, the model's performance stays the same or even improves slightly.
References
• Han, Song, et al. "DSD: Regularizing deep neural networks with dense-sparse-dense training flow." arXiv preprint arXiv:1607.04381 (2016).
• Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
• Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European Conference on Computer Vision. Springer, Cham, 2014.
• Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).
• Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International Conference on Machine Learning. 2015.
• He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
• Denton, Emily L., et al. "Exploiting linear structure within convolutional networks for efficient evaluation." Advances in Neural Information Processing Systems. 2014.
• Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size." arXiv preprint arXiv:1602.07360 (2016).
Q & A
and Discussion