34. Future Work
- Whether the choice of sequential density model can be used to define a distance on the state space
- Analysis with a universal probability density model such as Solomonoff induction (Hutter, 2005)
- The learning speed of the sequential density model does not match that of Q-learning in DQN, so either introduce forgetting into the density model or make the density model and the Q-function consistent with each other
- Verify whether the pseudo-count still matches the notion of a visit count in continuous state spaces (see the sketch after this list)
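For context on these points, the paper being summarized (Bellemare et al., 2016) derives a pseudo-count N̂(x) from the sequential density model's probability of a state before and after it is observed, N̂(x) = ρ(x)(1 − ρ'(x)) / (ρ'(x) − ρ(x)), and adds an exploration bonus proportional to (N̂(x) + 0.01)^(−1/2) to the reward used by DQN. The following minimal Python sketch illustrates that computation; the `density_model` object with `prob(x)` and `update(x)` methods is an assumed placeholder for the CTS pixel model used in the paper, not its actual implementation.

```python
import math

def pseudo_count_bonus(density_model, x, beta=0.05):
    """Pseudo-count exploration bonus in the style of Bellemare et al. (2016).

    `density_model` is assumed to expose two methods (placeholder names):
      prob(x)   -- probability the sequential density model assigns to x
      update(x) -- online update of the model on the new observation x
    """
    rho = density_model.prob(x)          # density of x before the update
    density_model.update(x)              # the model's normal online update
    rho_prime = density_model.prob(x)    # "recoding probability" after the update

    # Pseudo-count: N_hat = rho * (1 - rho') / (rho' - rho),
    # well defined when the model is learning-positive (rho' > rho).
    n_hat = rho * (1.0 - rho_prime) / max(rho_prime - rho, 1e-12)

    # Bonus added to the environment reward; beta is an illustrative scale.
    return beta / math.sqrt(n_hat + 0.01)
```

In DQN this bonus is simply added to the per-step reward, which is where the mismatch noted above comes from: the density model (and hence the bonus) and the Q-function are updated at different rates, motivating either forgetting in the density model or coupling the two more tightly.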
35. References
Bellemare et al. (2016). Unifying Count-Based Exploration and Intrinsic Motivation. NIPS 2016.
Mnih et al. (2013). Playing Atari with Deep Reinforcement Learning.
Mnih et al. (2015). Human-level control through deep reinforcement learning.
Nair et al. (2015). Massively Parallel Methods for Deep Reinforcement Learning.
van Hasselt et al. (2015). Deep Reinforcement Learning with Double Q-learning.
Kulkarni et al. (2016). Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation. NIPS 2016.
Bellemare et al. (2014). Skip Context Tree Switching. ICML 2014.
36. References
van Hasselt (2010). Double Q-learning. NIPS 2010.
Machado et al. (2014). Domain-Independent Optimistic Initialization for Reinforcement Learning. arXiv:1410.4604.
Mnih et al. (2016). Asynchronous Methods for Deep Reinforcement Learning. arXiv:1602.01783.
Strehl and Littman (2008). An analysis of model-based Interval Estimation for Markov Decision Processes. Journal of Computer and System Sciences, 74(8): 1309-1331.
Kolter and Ng (2009). Near-Bayesian exploration in polynomial time. ICML 2009.
Schmidhuber (2008). Driven by Compression Progress.
37. References
Bellemare et al. (2013). The Arcade Learning Environment: An Evaluation Platform for General Agents. Journal of Artificial Intelligence Research, 47: 253-279.
Hutter (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer.