13. What is Reinforcement Learning?
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Reinforcement learning is learning what to do - how to map situations to actions -
so as to maximize a numerical reward signal.
38. FrozenLake-v0
[State]
S : starting point, safe
F : frozen surface, safe
H : hole, fall to your doom
G : goal
[Action]
LEFT = 0
DOWN = 1
RIGHT = 2
UP = 3
S F F F
F H F H
F F F H
H F F G
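The state and action layout above can be sketched as a small environment. Note this is a simplified, hypothetical toy version: the real FrozenLake-v0 in OpenAI Gym is stochastic ("slippery") by default, while this sketch assumes deterministic moves for clarity.

```python
# Minimal deterministic sketch of the 4x4 FrozenLake grid above.
# (The real Gym FrozenLake-v0 is "slippery" by default; determinism
#  is assumed here to keep the example short.)
MAP = ["SFFF",
       "FHFH",
       "FFFH",
       "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3

class FrozenLake:
    def reset(self):
        self.pos = 0                        # state index 0..15, start at S
        return self.pos

    def step(self, action):
        row, col = divmod(self.pos, 4)
        if action == LEFT:  col = max(col - 1, 0)
        if action == DOWN:  row = min(row + 1, 3)
        if action == RIGHT: col = min(col + 1, 3)
        if action == UP:    row = max(row - 1, 0)
        self.pos = row * 4 + col
        tile = MAP[row][col]
        reward = 1.0 if tile == "G" else 0.0   # reward 1 only at the goal
        done = tile in "GH"                    # episode ends in a hole or at G
        return self.pos, reward, done
```

Usage mirrors the Gym convention: `env = FrozenLake(); s = env.reset(); s, r, done = env.step(RIGHT)`.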
39. Q-Learning
[Figure: 4x4 FrozenLake grid; every state-action Q-value is initialized to 0. A reward of 1 is given on reaching the goal G.]
40. Q-Learning
[Figure: after the first successful episode, the Q-value of the action entering G is updated to 1; all other values remain 0.]
41. Q-Learning
[Figure: the value 1 propagates one step back; the state adjacent to G now also has a Q-value of 1.]
42. Q-Learning
[Figure: after repeated episodes, Q-values of 1 have propagated backward along the path from S to G.]
43. The problem with a greedy policy
- Acting purely greedily, the agent can end up learning only a near-optimal
policy, because paths it never explores cannot be discovered.
S F F F
F F F H
F F F H
H F F G
54. Function Approximation
- Function approximation example
S F F F
F H F H
F F F H
H F F G
[Figure: the current state s is fed into a function approximator with weights w, which outputs Q(s, Left), Q(s, Right), Q(s, Up), Q(s, Down)]
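The diagram above can be sketched as code. As an illustrative assumption (not from the slides), the approximator here is linear over a one-hot state encoding; in DQN a neural network would play this role, but the interface - state features in, four action values out, weights updated by a semi-gradient step - is the same.

```python
import numpy as np

# A weight matrix w maps a state feature vector to Q(s, a) for the four
# actions, replacing the lookup table with a parameterized function.
n_states, n_actions = 16, 4
w = np.zeros((n_actions, n_states))

def features(s):
    phi = np.zeros(n_states)
    phi[s] = 1.0                      # one-hot encoding of the current state
    return phi

def q_values(s):
    return w @ features(s)            # -> [Q(s,L), Q(s,D), Q(s,R), Q(s,U)]

def update(s, a, target, lr=0.1):
    # Semi-gradient step: move Q_w(s,a) toward the TD target.
    td_error = target - q_values(s)[a]
    w[a] += lr * td_error * features(s)

update(0, 2, target=1.0)              # one update toward a target of 1
```

With one-hot features this is mathematically equivalent to a table, which makes it a convenient bridge: swapping `features` for richer state descriptors is what lets the same update generalize across states.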
109. Distributional RL
- Train the current value distribution to match the target value distribution
- Loss function: the KL divergence
  L(θ) = D_KL( m || Z_θ(s, a) ), where m is the (projected) target value distribution
110. Distributional RL
1. Compute the value distribution for the next state: Z_θ′(s′, a*)
2. Compute the target atoms 𝒯z_j = r + γ·z_j, clip them to [V_min, V_max], and project them onto the fixed support to obtain the target value distribution
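The two steps above can be sketched as follows. This is a C51-style construction; the atom count, support range [V_MIN, V_MAX], reward, and discount factor used here are illustrative hyperparameters, not values from the slides.

```python
import numpy as np

# C51-style target construction: shift the fixed atoms by the Bellman
# backup, then project the shifted mass back onto the fixed support.
N_ATOMS, V_MIN, V_MAX = 51, 0.0, 10.0
z = np.linspace(V_MIN, V_MAX, N_ATOMS)       # fixed support atoms z_j
dz = (V_MAX - V_MIN) / (N_ATOMS - 1)

def project_target(next_probs, reward, gamma=0.99):
    """Compute target atoms Tz_j = r + γ·z_j, clip to [V_MIN, V_MAX],
    and project their probability mass onto the fixed support."""
    Tz = np.clip(reward + gamma * z, V_MIN, V_MAX)
    b = (Tz - V_MIN) / dz                    # fractional index of each atom
    lower = np.floor(b).astype(int)
    upper = np.ceil(b).astype(int)
    m = np.zeros(N_ATOMS)                    # projected target distribution
    for j in range(N_ATOMS):
        if lower[j] == upper[j]:             # Tz_j lands exactly on an atom
            m[lower[j]] += next_probs[j]
        else:                                # split mass between neighbours
            m[lower[j]] += next_probs[j] * (upper[j] - b[j])
            m[upper[j]] += next_probs[j] * (b[j] - lower[j])
    return m

def cross_entropy_loss(target_m, pred_probs):
    # Minimizing -Σ m_j·log p_j is equivalent to minimizing KL(m || p),
    # since the target distribution m is held fixed.
    return float(-np.sum(target_m * np.log(np.clip(pred_probs, 1e-8, 1.0))))
```

The projection is needed because r + γ·z_j generally falls between atoms of the fixed support, so its mass must be distributed to the two nearest atoms before the KL/cross-entropy loss can be computed.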
127. Reference
- Sutton, R. and Barto, A., Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
- V. Mnih et al., "Human-level control through deep reinforcement learning." Nature, 518(7540):529-533, 2015.
- H. van Hasselt et al., "Deep Reinforcement Learning with Double Q-learning." arXiv preprint arXiv:1509.06461, 2015.
- T. Schaul et al., "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952, 2015.
- Z. Wang et al., "Dueling Network Architectures for Deep Reinforcement Learning." arXiv preprint arXiv:1511.06581, 2015.
- M. Fortunato et al., "Noisy Networks for Exploration." arXiv preprint arXiv:1706.10295, 2017.
- M. G. Bellemare et al., "A Distributional Perspective on Reinforcement Learning." arXiv preprint arXiv:1707.06887, 2017.
- M. Hessel et al., "Rainbow: Combining Improvements in Deep Reinforcement Learning." arXiv preprint arXiv:1710.02298, 2017.
- Taeyoung Kim (tykimos), DeepRL lecture slides, https://tykimos.github.io/warehouse/2018-2-7-ISS_Near_and_Far_DeepRL_4.pdf
- David Silver, "Lecture 6 in UCL Course on RL", http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf
- RLKorea blog, "A Distributional Perspective on Reinforcement Learning", https://reinforcement-learning-kr.github.io/2018/10/02/C51/