21. References
• Sebastian Ruder. “An overview of gradient descent optimization algorithms”. http://ruder.io/optimizing-gradient-descent/.
• Sebastian Ruder. “Optimization for Deep Learning Highlights in 2017”. http://ruder.io/deep-learning-optimization-2017/index.html.
• Ian Goodfellow, Yoshua Bengio, and Aaron Courville. “Deep Learning”. http://www.deeplearningbook.org.
• Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., … Polosukhin, I. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems.
• Loshchilov, I., & Hutter, F. (2017). SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of ICLR 2017.
• Loshchilov, I., & Hutter, F. (2017). Fixing Weight Decay Regularization in Adam. arXiv preprint arXiv:1711.05101. Retrieved from http://arxiv.org/abs/1711.05101.
• Zeiler, M. D. (2012). ADADELTA: An Adaptive Learning Rate Method. Retrieved from http://arxiv.org/abs/1212.5701.
• Kingma, D. P., & Ba, J. L. (2015). Adam: A Method for Stochastic Optimization. International Conference on Learning Representations, 1–13.
• Masaaki Imaizumi. “深層学習による非滑らかな関数の推定” (Estimating Non-Smooth Functions with Deep Learning). SlideShare. /masaakiimaizumi1/ss-87969960.
• nishio. “勾配降下法の最適化アルゴリズム” (Optimization Algorithms for Gradient Descent). SlideShare. /nishio/ss-66840545.