7. Main Related Work
1. Human-level control through deep reinforcement learning, Mnih et al., 2015 (Nature)
a. The DQN paper
2. Temporal abstraction (Sutton et al., 1999)
a. Options (e.g., executing a sequence of primitive actions as a single unit) provide temporal abstraction of the action space (see the option-execution sketch after this list)
3. Universal value function (Schaul et al., 2015)
a. Generalizes the value function V(s) to V(s; g), so a single approximator covers many goals g (see the UVFA sketch after this list)
4. Intrinsically Motivated RL (Singh et al., 2004)
a. A model that can account for intrinsic motivation in the psychological sense
5. Unifying Count-Based Exploration and Intrinsic Motivation, Bellemare et al., 2016 (NIPS)
a. A NIPS 2016 paper from DeepMind that also achieves a high score on Montezuma's Revenge. Its arXiv version appeared after h-DQN's, and, incidentally, the h-DQN paper does not cite it (see the count-bonus sketch after this list).
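For item 2, a minimal Python sketch of option execution in the options framework (Sutton et al., 1999): the agent that selects an option observes one multi-step transition, abstracting over time. The Option container, env.step interface, and termination predicate are illustrative assumptions, not code from the paper.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    policy: Callable      # pi(s) -> primitive action (assumed interface)
    terminates: Callable  # beta(s) -> bool, option termination condition

def run_option(env, state, option):
    """Run one option to completion; the caller sees a single
    (state, cumulative reward, duration) transition."""
    total_reward, steps, done = 0.0, 0, False
    while not done and not option.terminates(state):
        state, reward, done = env.step(option.policy(state))
        total_reward += reward
        steps += 1
    return state, total_reward, steps, done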
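For item 3, a minimal sketch of a universal value function approximator (Schaul et al., 2015): one network takes the goal g as an extra input, generalizing V(s) to V(s; g) instead of training a separate V(s) per goal. The PyTorch usage and layer sizes are assumptions for illustration.

import torch
import torch.nn as nn

class UVFA(nn.Module):
    def __init__(self, state_dim, goal_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, goal):
        # Conditioning on g turns V(s) into V(s; g).
        return self.net(torch.cat([state, goal], dim=-1))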
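For item 5, a minimal sketch of a count-based exploration bonus in the spirit of Bellemare et al., 2016: rarely visited states receive a larger intrinsic reward, proportional to 1/sqrt(count). An exact visit-count table stands in here for the paper's density-model pseudo-counts, and the coefficient beta is a hypothetical choice.

from collections import defaultdict
import math

counts = defaultdict(int)

def exploration_bonus(state, beta=0.05):
    """Intrinsic reward that decays as a state is visited more often."""
    counts[state] += 1
    return beta / math.sqrt(counts[state])

# Used as a shaped reward: r_total = r_env + exploration_bonus(s)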
18. References
1. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic
Motivation, Kulkarni et al., 2016 (NIPS) (https://arxiv.org/abs/1604.06057)
2. Human-level control through deep reinforcement learning, Mnih et al., 2015 (Nature)
3. Universal Value Function Approximators, Schaul et al., 2015 (ICML)
4. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Sutton et al., 1999 (Artificial Intelligence)
5. Intrinsically Motivated Reinforcement Learning, Singh et al., 2004 (NIPS)
6. github.com/EthanMacdonald/h-DQN (https://github.com/EthanMacdonald/h-DQN)
7. Sample trajectory (gif) (https://goo.gl/3Z64Ji)