This document discusses methods for automated machine learning (AutoML) and optimization of hyperparameters. It focuses on accelerating the Nelder-Mead method for hyperparameter optimization using predictive parallel evaluation. Specifically, it proposes using a Gaussian process to model the objective function and perform predictive evaluations in parallel to reduce the number of actual function evaluations needed by the Nelder-Mead method. The results show this approach reduces evaluations by 49-63% compared to baseline methods.
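As a rough illustration of the general idea (not the paper's exact algorithm, and omitting the parallel speculative evaluation), the sketch below wraps an expensive objective with a scikit-learn Gaussian-process surrogate so that Nelder-Mead skips a real evaluation whenever the surrogate is sufficiently confident. The toy objective, the 5-point warm-up, and the std_tol threshold are all made up for illustration.

```python
# Illustrative sketch only: a GP surrogate screens Nelder-Mead's candidate
# points and returns a predicted value instead of running the real objective
# when the predictive standard deviation is small.
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_objective(x):
    # stand-in for a real hyperparameter evaluation (e.g. a cross-validation run)
    return float(np.sum((x - 0.3) ** 2))

class SurrogateWrapper:
    def __init__(self, std_tol=0.05):
        self.X, self.y = [], []
        self.gp = GaussianProcessRegressor(normalize_y=True)
        self.std_tol = std_tol          # made-up confidence threshold
        self.true_calls = 0

    def __call__(self, x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        if len(self.y) >= 5:            # enough history to trust the GP at all
            mu, std = self.gp.predict(x.reshape(1, -1), return_std=True)
            if std[0] < self.std_tol:   # confident prediction -> skip real run
                return float(mu[0])
        val = expensive_objective(x)    # otherwise pay for a real evaluation
        self.true_calls += 1
        self.X.append(x)
        self.y.append(val)
        self.gp.fit(np.vstack(self.X), np.asarray(self.y))
        return val

f = SurrogateWrapper()
res = minimize(f, x0=np.zeros(3), method="Nelder-Mead")
print(res.x, "true evaluations:", f.true_calls)
```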
1. The document discusses energy-based models (EBMs) and how they can be connected to classifiers. It introduces noise contrastive estimation and flow contrastive estimation as methods for training EBMs.
2. One of the papers presented trains energy-based models with flow contrastive estimation, using a flow-based generator as the contrastive noise distribution, so that the unnormalized EBM can be trained by discriminating real data from flow samples.
3. Another paper argues that classifiers can be viewed as joint energy-based models over inputs and outputs, and should be treated as such; it introduces a method for training classifiers as EBMs with a contrastive-divergence-style procedure (the reinterpretation is written out below).
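The reinterpretation in item 3 can be written compactly in the standard joint-EBM form: with classifier logits $f_\theta(x) \in \mathbb{R}^K$, define

\[
p_\theta(x, y) = \frac{\exp\bigl(f_\theta(x)[y]\bigr)}{Z(\theta)}, \qquad
E_\theta(x, y) = -f_\theta(x)[y], \qquad
E_\theta(x) = -\log \sum_{y'} \exp\bigl(f_\theta(x)[y']\bigr).
\]

Conditioning recovers the usual softmax classifier $p_\theta(y \mid x)$, while the marginal $p_\theta(x)$ is an unnormalized EBM that can be trained with the sampling-based (contrastive-divergence-style) procedure mentioned above.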
This document summarizes recent research on applying the self-attention mechanism of Transformers to domains other than language, such as computer vision. It discusses models that use self-attention for images, including ViT, DeiT, and T2T, which apply Transformers to images divided into patches. It also covers more general attention modules, such as the Perceiver, which aims to be domain-agnostic. Finally, it discusses work on transferring pretrained language Transformers to other modalities by keeping most of their weights frozen, showing that they can function as universal computation engines.
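As a minimal sketch of the patch-based input these models share (illustrative only; ViT, DeiT, and T2T differ in the details, and the hyperparameters below are simply common ViT-Base defaults), a strided convolution splits the image into non-overlapping patches and projects each patch to a token, which can then be fed to an off-the-shelf Transformer encoder:

```python
# Illustrative patch embedding: kernel size == stride == patch size, so each
# output position corresponds to one non-overlapping image patch.
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        self.proj = nn.Conv2d(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                       # x: (B, C, H, W)
        x = self.proj(x)                        # (B, D, H/P, W/P)
        return x.flatten(2).transpose(1, 2)     # (B, num_patches, D)

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))
# The patch tokens can now go through a standard Transformer encoder.
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2)
out = encoder(tokens)                           # (2, 196, 768)
print(out.shape)
```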
These slides were used by Umemoto of our company at an internal technical study session.
They explain the Transformer, an architecture that has attracted a great deal of attention in recent years.
"Arithmer Seminar" is weekly held, where professionals from within and outside our company give lectures on their respective expertise.
The slides are made by the lecturer from outside our company, and shared here with his/her permission.
Arithmer Inc. is a mathematics company that originated in the Graduate School of Mathematical Sciences at the University of Tokyo. We apply modern mathematics to deliver advanced AI systems as solutions across a wide range of fields. We believe it is our job to use AI effectively to improve work efficiency and to produce results that are useful to people and society.
* Satoshi Hara and Kohei Hayashi. Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach. AISTATS'18 (to appear).
arXiv ver.: https://arxiv.org/abs/1606.09066
* GitHub
https://github.com/sato9hara/defragTrees
40. References (1/4)
◆Proximal gradient method (forward-backward splitting); the basic update is recapped after this list.
1. G. B. Passty, “Ergodic convergence to a zero of the sum of monotone operators in Hilbert space,” J. Math. Anal. Appl., 1979. (original paper)
2. G. Chen & R. T. Rockafellar, “Convergence rates in forward-backward splitting,” SIAM J. Optim., 1997. (convergence-rate analysis)
3. P. L. Combettes & V. R. Wajs, “Signal recovery by proximal forward-backward splitting,” SIAM Multiscale Model. Simul., 2005. (further theoretical development)
4. A. Beck & M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM J. Imag. Sci., 2009. (commonly known as FISTA; combines the method with Nesterov's optimal gradient scheme)
5. M. Yamagishi & I. Yamada, “Over-relaxation of the fast iterative shrinkage-thresholding algorithm with variable stepsize,” Inverse Probl., 2011. (step-size extension of FISTA)
6. I. Daubechies et al., “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint,” Comm. Pure Appl. Math., 2004. (application to sparse recovery)
7. J. Duchi & Y. Singer, “Efficient online and batch learning using forward-backward splitting,” J. Mach. Learn. Res., 2009. (application to machine learning)
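For orientation, the update underlying the references above can be summarized as follows (a standard textbook recap, not taken from any single entry): to minimize $f(x) + g(x)$ with $\nabla f$ $L$-Lipschitz and $g$ proximable, the proximal gradient method iterates

\[
x_{k+1} = \mathrm{prox}_{\gamma g}\bigl(x_k - \gamma \nabla f(x_k)\bigr), \qquad \gamma \in (0, 2/L),
\]

and FISTA (reference 4) applies the same step at an extrapolated point: with $t_1 = 1$ and $y_1 = x_0$,

\[
x_k = \mathrm{prox}_{\frac{1}{L} g}\bigl(y_k - \tfrac{1}{L}\nabla f(y_k)\bigr), \qquad
t_{k+1} = \frac{1 + \sqrt{1 + 4t_k^2}}{2}, \qquad
y_{k+1} = x_k + \frac{t_k - 1}{t_{k+1}}\,(x_k - x_{k-1}).
\]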
41. References (2/4)
◆ADMM (Alternating Direction Method of Multipliers); the standard iteration is recapped after this list.
1. D. Gabay & B. Mercier, “A dual algorithm for the solution of nonlinear variational problems via finite element approximation,” Comput. Math. Appl., 1976. (original paper)
2. J. Eckstein & D. P. Bertsekas, “On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators,” Math. Program., 1992. (derivation based on Douglas-Rachford splitting and theoretical development)
3. B. He & X. Yuan, “On the O(1/n) convergence rate of the Douglas-Rachford alternating direction method,” SIAM J. Numer. Anal., 2012. (convergence rate)
4. S. Boyd et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., 2011. (review, including applications to distributed optimization and machine learning)
5. J. Eckstein & W. Yao, “Understanding the convergence of the alternating direction method of multipliers: Theoretical and computational perspectives,” Pac. J. Optim., (to appear; available online). (fixed-point interpretation in terms of nonexpansive mappings)
6. M. Afonso et al., “An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems,” IEEE Trans. Image Process., 2011. (application to image restoration)
7. S. Ono et al., “Cartoon-texture image decomposition using blockwise low-rank texture characterization,” IEEE Trans. Image Process., 2014. (application to image decomposition)
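For orientation, the scaled-form ADMM iteration (following the notation of reference 4) for minimizing $f(x) + g(z)$ subject to $Ax + Bz = c$ is

\[
\begin{aligned}
x_{k+1} &= \operatorname*{argmin}_{x}\; f(x) + \tfrac{\rho}{2}\|Ax + Bz_k - c + u_k\|_2^2,\\
z_{k+1} &= \operatorname*{argmin}_{z}\; g(z) + \tfrac{\rho}{2}\|Ax_{k+1} + Bz - c + u_k\|_2^2,\\
u_{k+1} &= u_k + Ax_{k+1} + Bz_{k+1} - c,
\end{aligned}
\]

where $u$ is the scaled dual variable and $\rho > 0$ is the penalty parameter.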
42. References (3/4)
◆Primal-dual proximal splitting (primal-dual splitting); the iteration in the form of reference 2 is recapped after this list.
1. A. Chambolle & T. Pock, “A first-order primal-dual algorithm for convex problems with applications to imaging,” J. Math. Imag. Vis., 2011. (original paper, in a somewhat more restricted form)
2. L. Condat, “A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms,” J. Optim. Theory Appl., 2013. (original paper, in the form presented in these slides)
3. R. Boț & E. Csetnek, “On the convergence rate of a forward-backward type primal-dual splitting algorithm for convex optimization problems,” Optimization, 2015. (convergence rate)
4. S. Ono and I. Yamada, “Decorrelated vectorial total variation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2014. (application to color image restoration)
5. S. Ono, M. Yamagishi, and I. Yamada, “A sparse system identification by using adaptively-weighted total variation via a primal-dual splitting approach,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2013. (application to adaptive signal processing)
6. S. Ono and I. Yamada, “Hierarchical convex optimization with primal-dual splitting,” IEEE Trans. Signal Process., 2015. (proposes hierarchical convex optimization that selects a desired solution from the primal-dual solution set)
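For orientation, the iteration in the form of reference 2 (Condat), for minimizing $f(x) + g(x) + h(Lx)$ with $\nabla f$ $\beta$-Lipschitz and $g$, $h$ proximable, is

\[
\begin{aligned}
x_{k+1} &= \mathrm{prox}_{\tau g}\bigl(x_k - \tau(\nabla f(x_k) + L^{\top} y_k)\bigr),\\
y_{k+1} &= \mathrm{prox}_{\sigma h^{*}}\bigl(y_k + \sigma L(2x_{k+1} - x_k)\bigr),
\end{aligned}
\]

where $h^{*}$ is the convex conjugate of $h$, whose prox follows from the Moreau decomposition $\mathrm{prox}_{\sigma h^{*}}(y) = y - \sigma\,\mathrm{prox}_{h/\sigma}(y/\sigma)$, and the step sizes $\tau, \sigma > 0$ are chosen to satisfy a condition of the form $1/\tau - \sigma\|L\|^2 \ge \beta/2$. Reference 1 corresponds to the special case $f = 0$.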
43. References (4/4)
◆Other related and/or potentially useful references and books
1. J. J. Moreau, “Fonctions convexes duales et points proximaux dans un espace Hilbertien,” (in French) C. R. Acad. Sci. Paris Ser. A Math., 1962. (first appearance of the proximal mapping; its definition is recalled after this list)
2. H. H. Bauschke & P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2011. (an excellent book that systematically covers convex analysis and monotone operator theory; the discussion is developed essentially in infinite-dimensional Hilbert spaces)
3. P. L. Combettes & J.-C. Pesquet, “Proximal splitting methods in signal processing,” in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer-Verlag, 2011. (a review of proximal splitting methods, including a list of convex functions whose proximal mappings can be computed efficiently)
4. I. Yamada et al., “Minimizing the Moreau envelope of nonsmooth convex functions over the fixed point set of certain quasi-nonexpansive mappings,” in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer-Verlag, 2011. (explains several proximal splitting methods from the fixed-point viewpoint of nonexpansive mappings)
5. S. Ono & I. Yamada, “Signal recovery with certain involved convex data-fidelity constraints,” IEEE Trans. Signal Process., 2015. (proposes an optimization algorithm for solving signal recovery problems with data-fidelity constraints that cannot be handled by existing proximal splitting methods)
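For completeness, the two objects referred to in entries 1 and 4 above are, for a proper lower semicontinuous convex function $g$ and $\gamma > 0$,

\[
\mathrm{prox}_{\gamma g}(x) = \operatorname*{argmin}_{y}\; g(y) + \frac{1}{2\gamma}\|x - y\|^2
\qquad\text{and}\qquad
{}^{\gamma}g(x) = \min_{y}\; g(y) + \frac{1}{2\gamma}\|x - y\|^2,
\]

the proximal mapping and the Moreau envelope, respectively. The envelope is a smooth approximation of $g$ with gradient $\nabla({}^{\gamma}g)(x) = \bigl(x - \mathrm{prox}_{\gamma g}(x)\bigr)/\gamma$.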