This document summarizes a presentation on offline reinforcement learning. It discusses how offline RL learns from a fixed dataset without further interaction with the environment, making it a fully off-policy setting. However, offline RL faces challenges from distribution shift between the behavior policy that generated the data and the learned target policy. The document reviews methods for offline policy evaluation, offline policy gradients, and deep deterministic policy gradients, and also discusses using uncertainty estimates and policy constraints to address distribution shift in offline deep reinforcement learning.
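To make the constraint idea concrete, here is a minimal, hypothetical sketch in the spirit of behavior-regularized actor updates such as TD3+BC (one common constraint-based recipe, not necessarily the one covered in the presentation). The `actor`, `critic`, and `batch` objects are illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def actor_loss(actor, critic, batch, bc_weight: float = 2.5):
    """One actor update for offline RL with a behavior-cloning constraint."""
    states, dataset_actions = batch["states"], batch["actions"]
    policy_actions = actor(states)
    # Maximize the critic's value of the policy's actions ...
    q = critic(states, policy_actions)
    # ... while penalizing deviation from the actions actually in the dataset,
    # which keeps the learned policy close to the behavior policy and so
    # limits distribution shift.
    bc_penalty = F.mse_loss(policy_actions, dataset_actions)
    # Scale-invariant trade-off between the two terms, as in TD3+BC.
    lam = bc_weight / q.abs().mean().detach()
    return -(lam * q).mean() + bc_penalty
```

The behavior-cloning term is the simplest instance of the policy constraints mentioned above; uncertainty-based methods instead penalize, roughly speaking, actions on which an ensemble of Q-functions disagrees.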
The document summarizes a research paper that compares the performance of MLP-based models to Transformer-based models on various natural language processing and computer vision tasks. The key points are:
1. Gated MLP (gMLP) architectures can achieve performance comparable to Transformers on most tasks, demonstrating that attention mechanisms may not be strictly necessary (a minimal sketch of a gMLP block follows this list).
2. However, attention still provides benefits for some NLP tasks, as models combining gMLP and attention outperformed pure gMLP models on certain benchmarks.
3. For computer vision, gMLP achieved results close to Vision Transformers and CNNs on image classification, indicating gMLP can match their data efficiency.
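To illustrate the mechanism these results rest on, below is a minimal PyTorch sketch of a gMLP block with its Spatial Gating Unit. The structure (channel projections around a cross-token gating unit, with the gate initialized near identity) follows the published gMLP design; the dimension names and sizes are illustrative, not the paper's configurations.

```python
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    """Gates half the channels with a linear projection across tokens."""
    def __init__(self, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_ffn // 2)
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        # Near-identity initialization so the gate starts as a pass-through.
        nn.init.zeros_(self.spatial_proj.weight)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u, v = x.chunk(2, dim=-1)  # split channels (d_ffn must be even)
        v = self.norm(v)
        # Mix information across the token (spatial) dimension.
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v  # element-wise gating replaces self-attention

class GMLPBlock(nn.Module):
    def __init__(self, d_model: int, d_ffn: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.proj_in = nn.Linear(d_model, d_ffn)
        self.sgu = SpatialGatingUnit(d_ffn, seq_len)
        self.proj_out = nn.Linear(d_ffn // 2, d_model)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.act(self.proj_in(self.norm(x)))
        x = self.sgu(x)
        return shortcut + self.proj_out(x)

# Example: a batch of 2 sequences, 64 tokens, model width 128.
block = GMLPBlock(d_model=128, d_ffn=512, seq_len=64)
out = block(torch.randn(2, 64, 128))  # -> shape (2, 64, 128)
```

The only cross-token interaction happens in `spatial_proj`, which is what makes the comparison with attention meaningful: everything else in the block is an ordinary per-token MLP.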
This post explains "Leakage in Data Mining: Formulation, Detection, and Avoidance" (Kaufman, Shachar, et al., ACM Transactions on Knowledge Discovery from Data (TKDD) 6.4 (2012): 1-21), a paper on how to prevent "leakage," a problem that frequently arises in data mining and machine learning.
The main contents are as follows.
- Examples of past leakage incidents
- Two guiding principles for preventing leakage (a small illustration follows this list)
- Detecting leakage
- Fixing leakage
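As a concrete illustration of the prevention idea (my example, not one from the paper), the sketch below shows a common leakage pattern in scikit-learn and its fix: fitting preprocessing on the full dataset lets test-fold statistics leak into training, while fitting it inside a Pipeline confines it to each training fold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.integers(0, 2, size=200)

# Leaky: the scaler sees the whole dataset before cross-validation splits it,
# so each training fold is normalized using test-fold statistics.
X_leaky = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(), X_leaky, y, cv=5)

# Safe: the pipeline refits the scaler on each training fold only.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
safe_scores = cross_val_score(pipe, X, y, cv=5)

print(leaky_scores.mean(), safe_scores.mean())
```

With stronger preprocessing steps (feature selection, target encoding) the gap between the two estimates grows much larger, which is exactly the kind of silently optimistic evaluation the paper warns about.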