This document discusses automated machine learning (AutoML) and hyperparameter optimization. It focuses on accelerating the Nelder-Mead method for hyperparameter optimization using predictive parallel evaluation: a Gaussian process models the objective function, and predictive evaluations performed in parallel reduce the number of actual function evaluations the Nelder-Mead method needs. The reported results show this approach reduces evaluations by 49-63% compared to baseline methods.
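A minimal sketch of the general idea, assuming a Gaussian-process surrogate (here scikit-learn's GaussianProcessRegressor) that screens candidate points proposed by Nelder-Mead before spending a true evaluation; the class name, screening rule, and kappa parameter are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

class SurrogateScreen:
    """Hypothetical GP-based screen for an expensive objective (minimization)."""

    def __init__(self, objective, kappa=1.0):
        self.objective = objective                # expensive true objective
        self.kappa = kappa                        # optimism factor for screening
        self.X, self.y = [], []                   # history of true evaluations
        self.gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

    def true_eval(self, x):
        y = self.objective(x)
        self.X.append(np.asarray(x, dtype=float))
        self.y.append(y)
        self.gp.fit(np.vstack(self.X), np.array(self.y))   # refit the surrogate
        return y

    def evaluate(self, x, threshold):
        """Return a cheap GP prediction for unpromising points, else a true evaluation."""
        if len(self.X) < 5:                       # too little data to trust the GP
            return self.true_eval(x)
        mu, sigma = self.gp.predict(np.atleast_2d(x), return_std=True)
        if mu[0] - self.kappa * sigma[0] > threshold:
            return float(mu[0])                   # predictive (surrogate) evaluation
        return self.true_eval(x)                  # promising point: pay the real cost
```

In a hand-rolled Nelder-Mead loop, `threshold` would typically be the value of the current worst simplex vertex, so only candidates the surrogate predicts might improve the simplex trigger an actual evaluation; the reflection, expansion, and contraction candidates can all be screened in parallel before committing to one.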
The document discusses control as inference in Markov decision processes (MDPs) and partially observable MDPs (POMDPs). It introduces optimality variables that represent whether a state-action pair is optimal or not. It formulates the optimal action-value function Q* and optimal value function V* in terms of these optimality variables and the reward and transition distributions. Q* is defined as the log probability of a state-action pair being optimal, and V* is defined as the log probability of a state being optimal. Bellman equations are derived relating Q* and V* to the reward and next state value.
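A sketch of these relations in the standard control-as-inference notation (the slides' exact symbols may differ), writing $\mathcal{O}_t = 1$ for the optimality variable at time $t$ with likelihood $p(\mathcal{O}_t = 1 \mid s_t, a_t) \propto \exp(r(s_t, a_t))$; up to constants from a uniform action prior, the definitions and the soft Bellman equations read:

```latex
% Optimal value functions as log-probabilities of optimality from time t to T:
\[
  Q^*(s_t, a_t) = \log p(\mathcal{O}_{t:T} = 1 \mid s_t, a_t), \qquad
  V^*(s_t)      = \log p(\mathcal{O}_{t:T} = 1 \mid s_t)
\]
% Soft Bellman equations relating Q^*, V^*, the reward, and the next-state value:
\[
  V^*(s_t) = \log \sum_{a_t} \exp\!\bigl(Q^*(s_t, a_t)\bigr), \qquad
  Q^*(s_t, a_t) = r(s_t, a_t)
    + \log \mathbb{E}_{s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)}
      \bigl[\exp\!\bigl(V^*(s_{t+1})\bigr)\bigr]
\]
```

The log-sum-exp acts as a soft maximum over actions, and the log-expectation over next states gives an optimistic ("soft") backup; both reduce to the usual Bellman optimality equations when rewards are scaled up sharply.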
This document summarizes a presentation on offline reinforcement learning. It discusses how offline RL learns from a fixed dataset without further interaction with the environment, which allows fully off-policy learning. However, offline RL faces challenges from distribution shift between the behavior policy that generated the data and the learned target policy. The document reviews methods for offline policy evaluation, offline policy gradients, and deep deterministic policy gradients, and discusses how uncertainty estimation and policy constraints can address distribution shift in offline deep reinforcement learning.
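A minimal sketch of one constraint-based idea in that spirit, a behavior-regularized actor update in the style of TD3+BC; the function names, the critic interface, and the alpha value are illustrative assumptions rather than any specific method covered in the slides.

```python
import torch

def actor_loss(actor, critic, states, dataset_actions, alpha=2.5):
    """Behavior-regularized actor objective: maximize the critic's value of the
    policy's actions while penalizing deviation from actions in the offline
    dataset, which limits distribution shift from the behavior policy."""
    pi_actions = actor(states)                    # actions proposed by the learned policy
    q_values = critic(states, pi_actions)         # critic's estimate of their value
    # Behavior-cloning penalty: stay close to the dataset (behavior-policy) actions.
    bc_penalty = ((pi_actions - dataset_actions) ** 2).mean()
    # Normalize the Q term so the two objectives have comparable scale.
    lam = alpha / q_values.abs().mean().detach()
    return -(lam * q_values).mean() + bc_penalty
```

Uncertainty-based alternatives instead penalize the critic where an ensemble of Q-functions disagrees, which has a similar effect of keeping the learned policy close to the data.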
Presented at the Neurocomputing research meeting in March 2018 (slides in Japanese). The goal was to run numerical experiments on a theoretical inequality for the Bayesian generalization error of stochastic matrix factorization, but Bayesian estimation itself is difficult for this model: because the parameters lie on a simplex, sampling from the posterior distribution is hard. This work therefore performs Bayesian estimation with Hamiltonian Monte Carlo, an efficient MCMC method, compares the results with the theoretical values, and examines the effectiveness of Hamiltonian Monte Carlo for stochastic matrix factorization.
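A self-contained sketch of the simplex difficulty and one common workaround, assuming a softmax reparameterization so that HMC runs in unconstrained space; the toy categorical target, the Gaussian prior on the unconstrained variables, and the step sizes are illustrative assumptions, not the experiment from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy target: categorical counts with probabilities theta = softmax(z), plus a
# standard normal prior on the unconstrained z (an assumption that sidesteps
# sampling directly on the simplex).
counts = np.array([30.0, 10.0, 5.0])

def log_post(z):
    return counts @ np.log(softmax(z)) - 0.5 * z @ z

def grad_log_post(z):
    theta = softmax(z)
    return counts - counts.sum() * theta - z        # gradient of the log posterior

def hmc_step(z, eps=0.05, n_leapfrog=20):
    p = rng.standard_normal(z.size)                 # resample momentum
    z_new, p_new = z.copy(), p.copy()
    p_new += 0.5 * eps * grad_log_post(z_new)       # leapfrog integration
    for _ in range(n_leapfrog - 1):
        z_new += eps * p_new
        p_new += eps * grad_log_post(z_new)
    z_new += eps * p_new
    p_new += 0.5 * eps * grad_log_post(z_new)
    # Metropolis accept/reject on the Hamiltonian (negative log posterior + kinetic energy).
    h_old = -log_post(z) + 0.5 * p @ p
    h_new = -log_post(z_new) + 0.5 * p_new @ p_new
    return z_new if np.log(rng.uniform()) < h_old - h_new else z

z = np.zeros(3)
samples = []
for _ in range(2000):
    z = hmc_step(z)
    samples.append(softmax(z))                      # points on the simplex
print(np.mean(samples, axis=0))                     # posterior mean of theta
```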