34. Future Work
・Can the choice of sequential density model be used to define a distance over the state space?
・Analysis of universal density models such as Solomonoff induction (Hutter, 2005)
・The learning speed of the sequential density model does not match that of Q-learning in DQN, so either introduce forgetting into the density model or make the density model and the Q-function consistent with each other
・Verify whether the pseudo-count still behaves like a visit count in continuous state spaces (a sketch of the pseudo-count computation follows this list)
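For reference, the following is a minimal sketch of how a pseudo-count and the resulting exploration bonus are derived from a density model. The class name PseudoCounter and its Laplace-smoothed empirical density are illustrative assumptions, standing in for the CTS sequential density model actually used by Bellemare et al. (2016); only the pseudo-count formula and the bonus form follow the paper.

    import math
    from collections import defaultdict

    class PseudoCounter:
        """Toy pseudo-count in the spirit of Bellemare et al. (2016).

        A Laplace-smoothed empirical density over discrete states stands in
        for the CTS sequential density model; with this toy model the
        pseudo-count equals the true visit count plus one (the prior's
        pseudo-observation).
        """

        def __init__(self, bonus_scale=0.05):
            self.counts = defaultdict(int)   # observed visits per state
            self.total = 0                   # total observations
            self.bonus_scale = bonus_scale   # beta in the exploration bonus

        def _density(self, x):
            # Laplace-smoothed empirical density rho(x)
            return (self.counts[x] + 1.0) / (self.total + 2.0)

        def update_and_bonus(self, x):
            rho = self._density(x)           # rho_n(x): before observing x
            self.counts[x] += 1
            self.total += 1
            rho_prime = self._density(x)     # rho'_n(x): recoding probability
            # Pseudo-count: N_hat(x) = rho * (1 - rho') / (rho' - rho)
            n_hat = rho * (1.0 - rho_prime) / (rho_prime - rho)
            # Exploration bonus as in the paper: beta / sqrt(N_hat + 0.01)
            return self.bonus_scale / math.sqrt(n_hat + 0.01)

In a continuous state space the empirical density above would be replaced by a generalizing density model, which is exactly where the question in the last bullet arises: whether N_hat derived from such a model still behaves like a visit count.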
35. References
Bellemare et al. (2016). Unifying Count-Based Exploration and Intrinsic Motivation. NIPS 2016
Mnih et al. (2013). Playing Atari with Deep Reinforcement Learning.
Mnih et al. (2015). Human-level control through deep reinforcement learning.
Nair et al. (2015). Massively Parallel Methods for Deep Reinforcement Learning.
van Hasselt et al. (2015). Deep Reinforcement Learning with Double Q-learning.
Kulkarni et al. (2016). Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. NIPS 2016
Bellemare et al. (2014). Skip Context Tree Switching. ICML 2014
36. References
van Hasselt (2010). Double Q-learning. NIPS 2010
Machado et al. (2014). Domain-independent optimistic initialization for reinforcement learning. arXiv:1410.4604
Mnih et al. (2016). Asynchronous methods for deep reinforcement learning. arXiv:1602.01783
Strehl and Littman (2008). An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74(8):1309-1331
Kolter and Ng (2009). Near-Bayesian exploration in polynomial time. ICML 2009
Schmidhuber (2008). Driven by compression progress.
37. References
Bellemare et al. (2013). The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253-279
Hutter (2005). Universal artificial intelligence: Sequential decisions based on algorithmic probability. Springer