This document discusses self-supervised representation learning (SRL) for reinforcement learning tasks. SRL learns state representations by using prediction tasks as an auxiliary objective. The key ideas are: (1) SRL learns an encoder that maps observations to states using a prediction task like modeling future states or actions; (2) The learned state representations improve generalization and exploration in reinforcement learning algorithms; (3) Several SRL methods are discussed, including world models, inverse models, and causal infoGANs.
* Satoshi Hara and Kohei Hayashi. Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach. AISTATS'18 (to appear).
arXiv ver.: https://arxiv.org/abs/1606.09066#
* GitHub
https://github.com/sato9hara/defragTrees
The document discusses control as inference in Markov decision processes (MDPs) and partially observable MDPs (POMDPs). It introduces optimality variables that represent whether a state-action pair is optimal or not. It formulates the optimal action-value function Q* and optimal value function V* in terms of these optimality variables and the reward and transition distributions. Q* is defined as the log probability of a state-action pair being optimal, and V* is defined as the log probability of a state being optimal. Bellman equations are derived relating Q* and V* to the reward and next state value.
This document discusses generative adversarial networks (GANs) and their relationship to reinforcement learning. It begins with an introduction to GANs, explaining how they can generate images without explicitly defining a probability distribution by using an adversarial training process. The second half discusses how GANs are related to actor-critic models and inverse reinforcement learning in reinforcement learning. It explains how GANs can be viewed as training a generator to fool a discriminator, similar to how policies are trained in reinforcement learning.
This document discusses Mahout, an Apache project for machine learning algorithms like classification, clustering, and pattern mining. It describes using Mahout with Hadoop to build a Naive Bayes classifier on Wikipedia data to classify articles into categories like "game" and "sports". The process includes splitting Wikipedia XML, training the classifier on Hadoop, and testing it to generate a confusion matrix. Mahout can also integrate with other systems like HBase for real-time classification.
* Satoshi Hara and Kohei Hayashi. Making Tree Ensembles Interpretable: A Bayesian Model Selection Approach. AISTATS'18 (to appear).
arXiv ver.: https://arxiv.org/abs/1606.09066#
* GitHub
https://github.com/sato9hara/defragTrees
The document discusses control as inference in Markov decision processes (MDPs) and partially observable MDPs (POMDPs). It introduces optimality variables that represent whether a state-action pair is optimal or not. It formulates the optimal action-value function Q* and optimal value function V* in terms of these optimality variables and the reward and transition distributions. Q* is defined as the log probability of a state-action pair being optimal, and V* is defined as the log probability of a state being optimal. Bellman equations are derived relating Q* and V* to the reward and next state value.
This document discusses generative adversarial networks (GANs) and their relationship to reinforcement learning. It begins with an introduction to GANs, explaining how they can generate images without explicitly defining a probability distribution by using an adversarial training process. The second half discusses how GANs are related to actor-critic models and inverse reinforcement learning in reinforcement learning. It explains how GANs can be viewed as training a generator to fool a discriminator, similar to how policies are trained in reinforcement learning.
This document discusses Mahout, an Apache project for machine learning algorithms like classification, clustering, and pattern mining. It describes using Mahout with Hadoop to build a Naive Bayes classifier on Wikipedia data to classify articles into categories like "game" and "sports". The process includes splitting Wikipedia XML, training the classifier on Hadoop, and testing it to generate a confusion matrix. Mahout can also integrate with other systems like HBase for real-time classification.
This document discusses Python and machine learning libraries like scikit-learn. It provides code examples for loading data, fitting models, and making predictions using scikit-learn algorithms. It also covers working with NumPy arrays and loading data from files like CSVs.