Perspective of Squeezing Model
Dense-Sparse-Dense Training for
DNN paper review &
squeeze model methods
一螻牛 譟磯
Why Dense-Sparse-Dense Training?
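Deep Neural Networks deliver very high performance in many fields such as Computer Vision, Natural Language Processing, and Speech Recognition, and increasingly complex models are used to obtain better results.
Complex models can learn the relationship between Features and Outputs well, but they are also more likely to learn the Noise in the data (Overfitting).
Reducing the model size leads to under-fitting instead, so it is not a good solution.
The Dense-Sparse-Dense training flow (DSD) keeps the existing model and changes the way it is trained.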
Dense-Sparse-Dense Training: avoiding Overfitting while keeping the model as it is
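Dense-Sparse-Dense Training trains the model in three steps.
Step 1 is the Dense step: as in conventional training, the model is trained by Backpropagation using Gradient Descent.
Step 2 is the Sparse step: the model is trained under a Sparsity constraint, which regularizes it.
Step 3 restores the weights removed in Step 2 and trains the dense model again.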
What is Dense-Sparse-Dense Training?
Dense Sparse Dense
Step 1: Dense
The Step 1 Dense phase learns the weight values and the connection strengths.
* The number of training iterations is chosen heuristically.
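Step 2: Sparse
The Step 2 Sparse phase removes the unimportant weights and then trains the remaining weights again.
* The number of training iterations and the Top-k cutoff are chosen heuristically.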
Step 3: Dense
The Step 3 Dense phase restores all weights and trains the full model again.
* The number of training iterations is chosen heuristically.
* 1螻 Dense 蟆郁骸覲企 Value螳  螳
weight 螳螳 企 == 螳螻
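The three steps can be sketched on a toy problem. This is a minimal NumPy sketch, not the authors' implementation: the linear model, the mean-squared-error loss, the helper names `topk_mask`, `train`, and `dsd`, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

def topk_mask(w, sparsity):
    """Return a 0/1 mask that keeps the largest-magnitude weights.

    `sparsity` is the fraction of weights to remove (0.3 removes 30%).
    """
    k = int(round(sparsity * w.size))            # number of weights to prune
    mask = np.ones(w.size)
    if k > 0:
        prune_idx = np.argsort(np.abs(w), axis=None)[:k]   # k smallest |w|
        mask[prune_idx] = 0.0
    return mask.reshape(w.shape)

def train(w, x, y, steps, lr, mask=None):
    """Gradient descent on mean squared error; a mask keeps pruned weights at 0."""
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(x)
        w = w - lr * grad
        if mask is not None:
            w = w * mask
    return w

def dsd(x, y, sparsity=0.3, steps=200, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=x.shape[1])
    w_dense = train(w, x, y, steps, lr)                       # Step 1: Dense
    mask = topk_mask(w_dense, sparsity)
    w_sparse = train(w_dense * mask, x, y, steps, lr, mask)   # Step 2: Sparse
    # Step 3: Dense again. No mask, so pruned weights (currently 0) can regrow.
    w_final = train(w_sparse, x, y, steps, lr)
    return w_dense, mask, w_sparse, w_final

# Toy data (assumed): 64 samples, 10 features, noiseless linear target.
data_rng = np.random.default_rng(1)
x = data_rng.normal(size=(64, 10))
y = x @ data_rng.normal(size=10)
w_dense, mask, w_sparse, w_final = dsd(x, y)
```

With 10 weights and a sparsity of 0.3, the mask keeps 7 weights during the Sparse step; the final Dense step then updates all 10 again.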
Dense-Sparse-Dense Training Result
[Figure: error rate of DSD vs. Baseline]
When the Baseline models are trained with DSD, the error rate decreases.
DSD Training
DSD results on Models
* Sparsity 30%
Effect of DSD Training
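1. The Sparse and Re-Dense steps can be observed to help the model escape Saddle Points.
2. As a result of 1, the model reaches a better minimum.
3. Training in the Sparse step makes the model more Robust to Noise.
The Sparse step trains in a lower dimension, which yields a more Robust model, and the Re-Dense step acts as a re-initialization that further improves the model's Robustness.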
Training with DSD improves on these points and produces a better-performing model.
Other Learning Techniques
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
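Model Combination (Ensemble) yields better-performing models, but it is difficult to train many complex deep models.
In the Training Phase, Nodes are dropped at random, which has the effect of:
1. training many different models
2. Ensembling the results of those models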
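The random dropping of nodes can be sketched as follows. This is a minimal NumPy sketch of the common "inverted dropout" variant, which rescales the kept activations during training so that nothing needs to change at test time; the function name `dropout` and its parameters are assumptions for illustration.

```python
import numpy as np

def dropout(activations, drop_prob, training, rng):
    """Inverted dropout: zero random units while training, identity at test time."""
    if not training or drop_prob == 0.0:
        return activations
    keep_prob = 1.0 - drop_prob
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Rescale kept units so the expected activation is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((4, 1000))
h_train = dropout(h, drop_prob=0.5, training=True, rng=rng)   # about half the units are zeroed
h_test = dropout(h, drop_prob=0.5, training=False, rng=rng)   # unchanged
```

Every training step samples a different mask, so each step effectively trains a different thinned network that shares weights with all the others.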
Comparing the layer visualizations shows that Dropout increases Sparsity.
Other Learning Techniques
Batch Normalization
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network
training by reducing internal covariate shift." International Conference on Machine
Learning. 2015.
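Deep Learning models use non-linear functions as activations, so the deeper the model, the more the vanishing/exploding gradient problem arises during back-propagation.
The vanishing problem is addressed by bringing the Activation Outputs close to a normal distribution.
1. Normalize the Output of each Layer (per dimension)
2. Normalize per Batch
- On Training Phase
- Use the Batch Mean and Variance
- On Test Phase
- Use the moving average of the Training Phase Mean and Variance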
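The difference between the two phases can be sketched as follows. This is a minimal NumPy sketch for 2-D inputs of shape (batch, features); the class name `BatchNorm1D`, the `momentum` and `eps` values, and the learnable scale and shift (`gamma`, `beta`) kept at their initial values are assumptions for illustration.

```python
import numpy as np

class BatchNorm1D:
    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)       # scale (kept fixed in this sketch)
        self.beta = np.zeros(num_features)       # shift (kept fixed in this sketch)
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            # Training phase: normalize with the statistics of the current batch
            mean = x.mean(axis=0)
            var = x.var(axis=0)
            # and update the moving averages used later at test time.
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # Test phase: use the moving averages collected during training.
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 4))
bn = BatchNorm1D(num_features=4)
out_train = bn(x, training=True)    # per-feature mean about 0, std about 1
out_test = bn(x, training=False)    # normalized with the running statistics
```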
Batch Normalization enables faster training and ultimately reaches a lower loss.
Other Learning Techniques
Residual Network
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the
IEEE conference on computer vision and pattern recognition. 2016.
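As models get deeper, the Vanishing problem makes them harder to train.
Instead of connecting each Layer only to the previous layer,
the output of the layer two steps back is also connected,
so that more information is passed forward.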
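A shortcut that skips two layers can be sketched as follows. This is a minimal NumPy sketch using fully connected layers instead of convolutions; the function names and the weight shapes are assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Two layers plus a shortcut: the input x skips both layers and is added back."""
    f = relu(x @ w1) @ w2        # F(x): the two stacked layers
    return relu(x + f)           # shortcut connection adds the original input

rng = np.random.default_rng(0)
x = np.abs(rng.normal(size=(4, 8)))              # non-negative input for the check below
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
out = residual_block(x, w1, w2)

# With all-zero weights the block reduces to the identity for non-negative x,
# so the layers only have to learn a correction on top of the input.
identity_out = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```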
With ResNet, models with much deeper Layers can be trained.
Other Learning Techniques
Dense Network
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
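As models get deeper, the Vanishing problem makes them harder to train.
Instead of connecting each Layer only to the previous layer,
the outputs of the N previous layers are connected,
so that more information is passed forward.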
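Connecting each layer to all N previous layers can be sketched as follows. This is a minimal NumPy sketch using fully connected layers and feature concatenation; the function names, the growth size `g`, and the weight shapes are assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dense_block(x, weights):
    """Each layer receives the concatenation of the input and all earlier layer outputs."""
    features = [x]
    for w in weights:
        layer_input = np.concatenate(features, axis=1)   # all previous outputs
        features.append(relu(layer_input @ w))
    return np.concatenate(features, axis=1)

rng = np.random.default_rng(0)
d, g, num_layers = 3, 2, 3            # input width, new features per layer, layer count
x = rng.normal(size=(2, d))
# Layer i sees d + i * g input features and produces g new ones.
weights = [rng.normal(size=(d + i * g, g)) for i in range(num_layers)]
out = dense_block(x, weights)         # shape (2, d + num_layers * g)
```

Because earlier features are reused rather than recomputed, each layer only needs to add a small number of new features.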
Achieves better Accuracy with fewer Parameters.
Other Learning Techniques & Compression
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural
network." arXiv preprint arXiv:1503.02531 (2015).
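Model Combination (Ensemble) yields better-performing models, but it is difficult to train many complex deep models.
After building a Model Ensemble, the Output information of the Ensemble is used to train a Single Model, which yields a Model that performs better than a Single Model trained on its own.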
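Training a single model on the ensemble's outputs can be sketched as follows. This is a minimal NumPy sketch of soft targets with a softmax temperature; the function names, the temperature value, and the example logits are assumptions for illustration, and the optimizer that would minimize the loss is omitted.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)        # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def soft_targets(ensemble_logits, temperature):
    """Average the temperature-softened predictions of all ensemble members."""
    return np.mean([softmax(l, temperature) for l in ensemble_logits], axis=0)

def distillation_loss(student_logits, targets, temperature):
    """Cross-entropy between the ensemble's soft targets and the student's softened output."""
    log_p = np.log(softmax(student_logits, temperature))
    return -np.mean(np.sum(targets * log_p, axis=-1))

# Two hypothetical ensemble members scoring one example over three classes.
ensemble = [np.array([[4.0, 1.0, 0.5]]), np.array([[3.0, 2.0, 0.0]])]
T = 4.0
targets = soft_targets(ensemble, temperature=T)
sharp = soft_targets(ensemble, temperature=1.0)

uninformed_student = np.zeros((1, 3))            # predicts a uniform distribution
matching_student = T * np.log(targets)           # reproduces the soft targets exactly
loss_uninformed = distillation_loss(uninformed_student, targets, T)
loss_matching = distillation_loss(matching_student, targets, T)
```

A higher temperature flattens the targets, so the single model also sees how the ensemble ranks the non-winning classes.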
[Figure: Single Model vs. Distilled model vs. 10x Ensemble]
The results show that the Distilled model falls short of the 10x Ensemble but improves on the Single Model.
Other Learning Techniques & Compression
Singular Value Decomposition
Denton, Emily L., et al. "Exploiting linear structure within convolutional networks for
efficient evaluation." Advances in Neural Information Processing Systems. 2014.
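The weight matrix of the Fully Connected Layer is factorized using singular value decomposition.
Models generally end up with a large number of weights, so they are compressed (the FC Layers account for most of the weights).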
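Factorizing a fully connected weight matrix can be sketched as follows. This is a minimal NumPy sketch of a truncated SVD; the function name `low_rank_factors`, the matrix sizes, and the chosen rank are assumptions for illustration.

```python
import numpy as np

def low_rank_factors(w, rank):
    """Approximate w (m x n) by a (m x rank) @ b (rank x n) using a truncated SVD."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]       # fold the singular values into the left factor
    b = vt[:rank, :]
    return a, b

rng = np.random.default_rng(0)
# A 64 x 32 weight matrix that has exactly rank 2, so rank-2 factors lose nothing.
w = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 32))
a, b = low_rank_factors(w, rank=2)

params_before = w.size               # 64 * 32 = 2048
params_after = a.size + b.size       # 2 * (64 + 32) = 192
```

One layer `x @ w` becomes two thinner layers `(x @ a) @ b`. Real weight matrices are rarely exactly low rank, so the rank trades accuracy against size.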
Reduces the number of model weights while maintaining Accuracy.
Other Learning Techniques & Compression
Han, Song, et al. "Learning both weights and connections for efficient neural network." NIPS.
As in DSD, Pruning is applied after the Dense step.
Models generally end up with a large number of weights, so they are compressed.
Reduces the final number of Weights while keeping the model's performance at a similar level.
Other Learning Techniques & Compression
Pruning and splicing
Guo, Yiwen, et al. "Dynamic Network Surgery for Efficient DNNs." NIPS. 2016.
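As in DSD, Pruning is applied after the Dense step, and a Splicing step takes the place of Re-Dense.
Models generally end up with a large number of weights, so they are compressed.
* Splicing step: decides during Training whether a pruned Weight is needed again (the Weight Update values are retained).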
The difference from DSD lies in the final Step: DSD ends with a Fully Connected (dense) network, while this method does not.
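The interplay of pruning and splicing can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the two magnitude thresholds `low` and `high`, the function names, and the constant gradient used in the example are assumptions for illustration.

```python
import numpy as np

def update_mask(w, mask, low, high):
    """Prune weights with small magnitude and splice back weights that grew large."""
    new_mask = mask.copy()
    new_mask[np.abs(w) < low] = 0.0      # pruning: connection is switched off
    new_mask[np.abs(w) > high] = 1.0     # splicing: connection is switched back on
    return new_mask                      # values in between keep their current state

def surgery_step(w, mask, grad_fn, lr, low, high):
    mask = update_mask(w, mask, low, high)
    grad = grad_fn(w * mask)             # the network computes with the masked weights
    w = w - lr * grad                    # but every weight keeps being updated,
    return w, mask                       # so a pruned one can become important again

w = np.array([0.05, 0.5, 0.2, 0.9])
mask = np.array([1.0, 1.0, 0.0, 0.0])
new_mask = update_mask(w, mask, low=0.1, high=0.8)
# First weight is pruned, last weight is spliced back, the others keep their state.

w_next, mask_next = surgery_step(w, mask, lambda wm: np.ones_like(wm),
                                 lr=0.1, low=0.1, high=0.8)
```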
Reduces the final number of Weights even further while keeping the model's performance at a similar level.
Other Compression
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."
The model architecture itself is designed to have few Weights.
Models generally end up with a large number of weights, so they are compressed.
* FireModule : SqueezeNet
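The weight saving of a Fire module can be checked with simple arithmetic. A Fire module first squeezes the input channels with 1x1 filters and then expands them with a mix of 1x1 and 3x3 filters. This is a minimal sketch that counts weights only (biases ignored); the function names and the channel sizes are assumptions for illustration.

```python
def fire_params(c_in, squeeze, expand_1x1, expand_3x3):
    """Weight count of a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expand."""
    squeeze_w = c_in * squeeze * 1 * 1
    expand_1x1_w = squeeze * expand_1x1 * 1 * 1
    expand_3x3_w = squeeze * expand_3x3 * 3 * 3
    return squeeze_w + expand_1x1_w + expand_3x3_w

def conv3x3_params(c_in, c_out):
    """Weight count of a plain 3x3 convolution with the same input and output width."""
    return c_in * c_out * 3 * 3

# Illustrative sizes: 96 input channels squeezed to 16, expanded to 64 + 64 outputs.
fire = fire_params(c_in=96, squeeze=16, expand_1x1=64, expand_3x3=64)
plain = conv3x3_params(c_in=96, c_out=128)
ratio = plain / fire
```

With these sizes the Fire module needs 11,776 weights where a plain 3x3 convolution needs 110,592, roughly a ninefold reduction.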
Reduces the final number of Weights while largely preserving the model's performance.
Han, Song, et al. "DSD: Regularizing deep neural networks with dense-sparse-dense training flow." arXiv preprint
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European conference on computer vision. Springer, Cham, 2014.
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International Conference on Machine Learning. 2015.
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
Denton, Emily L., et al. "Exploiting linear structure within convolutional networks for efficient evaluation." Advances in Neural Information Processing Systems. 2014.
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size."
Q & A
and Discussion

