This document introduces the deep reinforcement learning model 'A3C' (in Japanese).
The original paper is "Asynchronous Methods for Deep Reinforcement Learning" by V. Mnih et al.
Maximum Entropy Reinforcement Learning (Stochastic Control) - Dongmin Lee
I reviewed the following papers.
- T. Haarnoja et al., "Reinforcement Learning with Deep Energy-Based Policies", ICML 2017
- T. Haarnoja et al., "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor", ICML 2018
- T. Haarnoja et al., "Soft Actor-Critic Algorithms and Applications", arXiv preprint 2018
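All three papers optimize the maximum entropy RL objective, which augments the expected return with a policy entropy bonus; a standard way to write it (with temperature \alpha, as in the SAC papers) is:

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```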
Thank you.
I updated the previous slides.
Previous slides: /DongMinLee32/causal-confusion-in-imitation-learning-238882277
I reviewed the "Causal Confusion in Imitation Learning" paper.
Paper link: https://papers.nips.cc/paper/9343-causal-confusion-in-imitation-learning.pdf
- Abstract
Behavioral cloning reduces policy learning to supervised learning by training a discriminative model to predict expert actions given observations. Such discriminative models are non-causal: the training procedure is unaware of the causal structure of the interaction between the expert and the environment. We point out that ignoring causality is particularly damaging because of the distributional shift in imitation learning. In particular, it leads to a counter-intuitive "causal misidentification" phenomenon: access to more information can yield worse performance. We investigate how this problem arises, and propose a solution to combat it through targeted interventions (either environment interaction or expert queries) to determine the correct causal model. We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations.
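To make the supervised-learning reduction concrete, here is a minimal behavioral-cloning sketch (my own illustration, not the paper's code; the dimensions and the random stand-in data are placeholders):

```python
import torch
import torch.nn as nn

# Minimal behavioral cloning: fit a policy to predict expert actions from
# observations, exactly as a supervised (discriminative) learning problem.
obs_dim, act_dim = 10, 4
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

expert_obs = torch.randn(1000, obs_dim)           # stand-in for expert observations
expert_act = torch.randint(0, act_dim, (1000,))   # stand-in for expert (discrete) actions

for epoch in range(10):
    logits = policy(expert_obs)
    loss = nn.functional.cross_entropy(logits, expert_act)  # non-causal, purely discriminative fit
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```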
- Outline
1. Introduction
2. Causality and Causal Inference
3. Causality in Imitation Learning
4. Experiments Setting
5. Resolving Causal Misidentification
- Causal Graph-Parameterized Policy Learning (see the sketch after this outline)
- Targeted Intervention
6. Experiments
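As a rough sketch of the graph-parameterized policy idea in item 5 (my paraphrase, assuming a binary mask over observation dimensions plays the role of a candidate causal graph; not the authors' code):

```python
import torch
import torch.nn as nn

# A candidate causal graph is represented as a binary mask over observation
# dimensions; the policy sees the masked observation together with the mask,
# so a single network can be trained across many sampled graphs.
obs_dim, act_dim = 10, 4
policy = nn.Sequential(nn.Linear(2 * obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

obs = torch.randn(32, obs_dim)
graph = torch.bernoulli(torch.full((32, obs_dim), 0.5))     # sampled candidate graphs
logits = policy(torch.cat([obs * graph, graph], dim=-1))    # action logits under each graph
```

Targeted interventions (environment rollouts or expert queries) are then used to score the candidate graphs and pick the one that actually explains the expert's behavior.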
Thank you!
Character Controllers using Motion VAEs - Dongmin Lee
Title: Character Controllers using Motion VAEs
Proceeding: ACM Transactions on Graphics (TOG) (Proc. SIGGRAPH 2020)
Paper: https://dl.acm.org/doi/abs/10.1145/3386569.3392422
Video: https://www.youtube.com/watch?v=Zm3G9oqmQ4Y
Given example motions, how can we generalize these to produce new purposeful motions?
We take a two-step approach to this problem:
1. A kinematic generative model based on an autoregressive conditional variational autoencoder, or motion VAE (MVAE); a rough sketch follows below
2. A controller that generates desired motions, learned with deep reinforcement learning (deep RL)
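A rough sketch of step 1, assuming the MVAE conditions on the previous pose and reconstructs the current one (the pose and latent dimensions and the small MLPs are illustrative choices, not the paper's architecture):

```python
import torch
import torch.nn as nn

# Conditional VAE over poses: encode (prev_pose, cur_pose) into a latent z,
# then decode (prev_pose, z) back to cur_pose. At run time the decoder is
# rolled out autoregressively, with z chosen by a learned controller.
pose_dim, latent_dim = 63, 32
encoder = nn.Sequential(nn.Linear(2 * pose_dim, 256), nn.ELU(), nn.Linear(256, 2 * latent_dim))
decoder = nn.Sequential(nn.Linear(pose_dim + latent_dim, 256), nn.ELU(), nn.Linear(256, pose_dim))

prev_pose = torch.randn(16, pose_dim)
cur_pose = torch.randn(16, pose_dim)
mu, logvar = encoder(torch.cat([prev_pose, cur_pose], dim=-1)).chunk(2, dim=-1)
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparameterization trick
recon = decoder(torch.cat([prev_pose, z], dim=-1))
loss = ((recon - cur_pose) ** 2).mean() \
       - 0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()  # reconstruction + KL
```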
I reviewed the "Causal Confusion in Imitation Learning" paper.
- Abstract
Behavioral cloning reduces policy learning to supervised learning by training a discriminative model to predict expert actions given observations. Such discriminative models are non-causal: the training procedure is unaware of the causal structure of the interaction between the expert and the environment. We point out that ignoring causality is particularly damaging because of the distributional shift in imitation learning. In particular, it leads to a counter-intuitive ^causal misidentification ̄ phenomenon: access to more information can yield worse performance. We investigate how this problem arises, and propose a solution to combat it through targeted interventions!either environment interaction or expert queries!to determine the correct causal model. We show that causal misidentification occurs in several benchmark control domains as well as realistic driving settings, and validate our solution against DAgger and other baselines and ablations.
- Outline
1. Introduction
2. Causality and Causal Inference
3. Causality in Imitation Learning
4. Experiments Setting
5. Resolving Causal Misidentification
- Causal Graph-Parameterized Policy Learning
- Targeted Intervention
6. Experiments
Link: https://papers.nips.cc/paper/9343-causal-confusion-in-imitation-learning.pdf
Thank you!
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables - Dongmin Lee
I reviewed the PEARL paper.
PEARL (Probabilistic Embeddings for Actor-critic RL) is an off-policy meta-RL algorithm that achieves both meta-training and adaptation efficiency. It performs probabilistic filtering of latent task variables with a learned encoder, which enables posterior sampling for structured and efficient exploration.
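A loose sketch of the probabilistic context encoder, under the assumption (as described in the paper) that per-transition Gaussian factors are multiplied into a task posterior; the network and dimensions here are placeholders:

```python
import torch
import torch.nn as nn

# Each transition (s, a, r, s') is mapped to a Gaussian factor; the product of
# the factors gives the posterior over the latent task variable z, which is
# sampled and fed to the policy for posterior-sampling exploration.
trans_dim, latent_dim = 20, 5
encoder = nn.Sequential(nn.Linear(trans_dim, 128), nn.ReLU(), nn.Linear(128, 2 * latent_dim))

context = torch.randn(64, trans_dim)                  # a batch of transitions from one task
mu, logvar = encoder(context).chunk(2, dim=-1)
var = logvar.exp()
post_var = 1.0 / (1.0 / var).sum(dim=0)               # product of Gaussians: precisions add
post_mu = post_var * (mu / var).sum(dim=0)            # precision-weighted mean
z = post_mu + post_var.sqrt() * torch.randn(latent_dim)  # posterior sample of the task variable
```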
Outline
- Abstract
- Introduction
- Probabilistic Latent Context
- Off-Policy Meta-Reinforcement Learning
- Experiments
Link: https://arxiv.org/abs/1903.08254
Thank you!
PRM-RL: Long-range Robotics Navigation Tasks by Combining Reinforcement Learning and Sampling-Based Planning - Dongmin Lee
I reviewed the PRM-RL paper.
PRM-RL (Probabilistic Roadmap-Reinforcement Learning) is a hierarchical method that combines sampling-based path planning with RL. It uses feature-based and deep neural network policies (DDPG) in continuous state and action spaces. In experiments, the authors evaluate PRM-RL, both in simulation and on robots, on two navigation tasks: end-to-end differential-drive indoor navigation in office environments, and aerial cargo delivery in urban environments.
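A very high-level sketch of how the two pieces might fit together; the helper functions `sample_free_point` and `rl_policy_can_reach` are hypothetical placeholders for configuration sampling and RL-policy reachability checks, not the authors' API:

```python
import itertools
import networkx as nx

def build_prm_rl_roadmap(num_samples, sample_free_point, rl_policy_can_reach):
    """Connect sampled configurations only if the RL agent can actually
    navigate between them, in place of a straight-line collision check."""
    graph = nx.Graph()
    nodes = [sample_free_point() for _ in range(num_samples)]
    graph.add_nodes_from(range(num_samples))
    for i, j in itertools.combinations(range(num_samples), 2):
        if rl_policy_can_reach(nodes[i], nodes[j]):   # e.g., Monte Carlo rollouts of the local policy
            graph.add_edge(i, j)
    return graph, nodes
```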
Outline
- Abstract
- Introduction
- Reinforcement Learning
- Methods
- Results
Thank you.
Exploration Strategies in Reinforcement Learning - Dongmin Lee
I presented "Exploration Strategies in Reinforcement Learning" at AI Robotics KR.
- Exploration strategies in RL (a minimal epsilon-greedy sketch follows the list)
1. Epsilon-greedy
2. Optimism in the face of uncertainty
3. Thompson (posterior) sampling
4. Information theoretic exploration (e.g., Entropy Regularization in RL)
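As a concrete example of strategy 1 above, a minimal epsilon-greedy action selection over a table of action values (the function and its default epsilon are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))          # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```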
Thank you.
Planning and Learning with Tabular Methods - Dongmin Lee
1) The document discusses planning methods in reinforcement learning that use models of the environment to generate simulated experiences for training.
2) It introduces Dyna-Q, an algorithm that integrates planning, acting, model learning, and direct reinforcement learning by using a learned model to generate additional simulated experiences for training (a minimal sketch follows below).
3) When the model is incorrect, planning may lead to suboptimal policies. Interaction with the real environment can sometimes discover and correct modeling errors, but when a change makes the environment better, planning may fail to find the improved policy unless exploration is explicitly encouraged.
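A compact tabular Dyna-Q sketch of the integration described in point 2 (the `env` interface with `num_actions`, `reset()`, and `step()` returning `(next_state, reward, done)` is an assumption, as are the hyperparameters):

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=50, n_planning=10, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q: each real step does a direct Q-learning update,
    updates a deterministic model, then replays n simulated planning steps."""
    actions = list(range(env.num_actions))      # assumed discrete action set
    Q = defaultdict(float)                      # Q[(state, action)]
    model = {}                                  # model[(state, action)] = (reward, next_state)

    def update(s, a, r, s2):
        best_next = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s2, r, done = env.step(a)           # assumed (next_state, reward, done) interface
            update(s, a, r, s2)                 # direct reinforcement learning
            model[(s, a)] = (r, s2)             # model learning
            for _ in range(n_planning):         # planning from simulated experience
                (ps, pa), (pr, ps2) = random.choice(list(model.items()))
                update(ps, pa, pr, ps2)
            s = s2
    return Q
```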
Hello~! :)
While studying the Sutton and Barto book, the classic textbook for reinforcement learning, I created slides about Multi-armed Bandits, Chapter 2.
If there are any mistakes, I would appreciate your feedback.
Thank you.