???? ??? ?? :
Rainbow
???? ????
??? Curt Park
curt.park@medipixel.io
AI Research / Developer in Medipixel
?? ??
??? Kyunghwan Kim
kh.kim@medipixel.io
AI Research / Developer in Medipixel
Guidewire control by RL
https://youtu.be/uAZtUNwA4i0
??
1. ???? ?
2. ???? ??? Deep ????
- Q-Learning
- Function approximation
- DQN
- Rainbow DQN
???? ?
?????
DeepMind AlphaGo
https://www.youtube.com/watch?v=TmPfTpjtdgg
?????
DeepMind DQN with Atari game
https://youtu.be/UZHTNBMAfAA
?????
OpenAI Dota2
https://www.youtube.com/watch?v=cUTMhmVh1qs
?????
Deepmind AlphaStar
?????
https://youtu.be/Dr0RvX1F-YQ
"Sim-to-Real Reinforcement Learning for Deformable Object Manipulation"
J. Matas, S. James and A. J Davison
CoRL, 2018
?????
https://www.youtube.com/watch?v=FmMPHL3TcrE
"Learning to Walk via Deep Reinforcement Learning"
T. Haarnoja et al.
arXiv:1812.11103v1
???? ?
¢ ????? ?? ??
+ 10 - 100
???? ?
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Reinforcement learning is what to do - how to map situations to actions -
so as to maximize a numerical reward signal
???? ?
• Reinforcement Learning
- ??(Reward)? ?? ??.
- ??? ?? ??(State)?? ???(Agent)? ??? ??
(Action)? ???? ? ??(Environment)? ??
- ??? ????? ??? ??? ??.
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Markov Decision Process
Agent Environment
State, Reward
Action
< MDP framework >
????? ??
? Trial-and-error search
$ ??? ??(Agent)? ??? ?? ??? ??? ???? ??? ??
??? ??.
? Delayed reward
$ ??? ????? ?? ??? ??? ??? ??? ??? ???
???? ??? ?.
¢ ????? ??
????? ??
• Trial-and-error search
? Exploitation (??)
? Exploration (??)
- ??? policy? ??? action? ??
- ??? ?? ???? action ??
??? ??
< ?? > < ??? ??? >
Exploitation Exploration
< Exploitation & Exploration >
????? ??
• Delayed reward
- ??? ??? ??? ??? ??? ?? ??? ???? ??(reward)
??? ?? ??? ??? ?? ???.
- ??? ?? ???? ??? ????? !
Return !
Return
S0 S1 S2 S3 S4 ST
R1
< ???? ?? >
Return
S0 S1 S2 S3 S4 ST
R1 R2 R3 R4 R5
< ???? ?? >
Return
• Return
? ?? t? state ???? ??? time step T ?? ??? ??? ???
??
Return
• Discounted Return
? ???? ??? ??? ??
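The return and discounted return can be sketched in a few lines. This is a minimal illustration, assuming the standard definition G_t = R_{t+1} + γ R_{t+2} + γ² R_{t+3} + ... from Sutton and Barto; the function name and the reward values are hypothetical.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where the k-th future reward is weighted by gamma**k."""
    g = 0.0
    # Work backwards so each step applies G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 1.0, 1.0]  # hypothetical R1, R2, R3
undiscounted = discounted_return(rewards, gamma=1.0)  # plain return
discounted = discounted_return(rewards, gamma=0.5)    # later rewards count less
```

With gamma = 1 this reduces to the plain sum of rewards; a smaller gamma shrinks the contribution of rewards that arrive later.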
Policy
MDP ???? ??? ??? ? ??
Policy
• Policy
? ?? State?? ?? Action? ?? ??.
$ π(a|s) ? ??.
$ ????? ??? ??? ?? ?.
$ MDP ??? ??? ! ★ ??? ??? ??? !
?? ?? ??
- ?? π ? ??? ? ?? ?? :
- ??? ??
?? ?? ??
?? ??? goodness ??? ? ? ????
→ Value function!
Value function
• Value function
? Policy π ? ?? ?? ??? ??? ?? ?? ??
- ?? ??(Policy)?? ?? ??(??)? ?? ??? ??.
- ?? ??? ?? : ?? s??? Return? ???.
- ????? ??? ??? ?? ? ? ?? !
Value function
• State value function
? ?? π ? ?? ? ?? state? ?? ??? ??
$ ?? : ??? ? ?? ?? ???
Value function
• State-action value function (Q function)
? ?? π ? ?? ? ?? state? action? ?? ??? ??
????? Value Function
??(Reward)? ??? ?? ??? ???
??? ??? ??? ??(Return)? ??? ?? ??? ???
????? ??? ?? ??? ???
Expected Return
=Value Function
Q Learning
Value-based RL
?? State?? ? ? ?? Action? ??? ???
?? ??? ?? Action? ??? !
Value function
Greedy policy
Value-based RL
- ?? : Q - value
action 1
Q : 10
action 2
Q : -5
Q Learning
- Q learning
¢ Value function ????
FrozenLake-v0
[State]
S : starting point, safe
F : frozen surface, safe
H : hole, fall to your doom
G : goal
[Action]
LEFT = 0
DOWN = 1
RIGHT = 2
UP = 3
S F F F
F H F H
F F F H
H F F G
Q-Learning
S F F F
F H F H
F F F H
H F F G
0
0 0
0
0
0 0
0
0
0 0
0
0
0 0
0
0
0 0
0
0
0 0
0
0
0 0
0
Reward : 1
( ?? 1?? ?? )
Q-Learning
S F F F
F H F H
F F F H
H F F G
0
0 0
0
0
0 0
0
0
0 0
0
0
0 0
0
0
0 0
0
0
0 1
0
0
0 0
0
Reward : 1
( ?? 1?? ?? )
Q-Learning
S F F F
F H F H
F F F H
H F F G
0
0 0
0
0
0 0
0
0
0 0
0
0
0 0
0
0
0 0
1
0
0 1
0
0
0 0
0
Reward : 1
( ?? 1?? ?? )
Q-Learning
S F F F
F H F H
F F F H
H F F G
0
0 1
0
0
0 1
0
0
0 0
1
0
0 0
1
0
0 0
1
0
0 1
0
0
0 0
0
Reward : 1
( ?? 1?? ?? )
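The way the value 1 spreads backwards from the goal can be reproduced with tabular Q-learning. The sketch below is only an illustration under stated assumptions: it re-implements the 4x4 map as a deterministic (non-slippery) grid instead of calling the FrozenLake-v0 environment, starts each episode from a random safe tile, acts purely at random while learning, and uses a discount of 0.9. All function names are hypothetical.

```python
import random

GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def step(state, action):
    """Deterministic move; returns (next_state, reward, done)."""
    row, col = divmod(state, 4)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), 3)  # walking into a wall keeps the agent in place
    col = min(max(col + d_col, 0), 3)
    tile = GRID[row][col]
    return row * 4 + col, (1.0 if tile == "G" else 0.0), tile in "HG"


def train(episodes=3000, alpha=1.0, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(16)]  # Q-table: 16 states x 4 actions
    starts = [s for s in range(16) if GRID[s // 4][s % 4] in "SF"]
    for _ in range(episodes):
        state, done = rng.choice(starts), False
        while not done:
            action = rng.randrange(4)  # random behaviour policy (assumption)
            nxt, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


def greedy_path(q, state=0, max_steps=16):
    """Follow the greedy policy from the start tile."""
    path = [state]
    for _ in range(max_steps):
        state, _, done = step(state, q[state].index(max(q[state])))
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
```

After training, following the action with the largest Q value from the start tile leads to the goal while avoiding the holes.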
Greedy policy? ??
- ??? ?? ?? near optimal policy? ???? ?? ??? ????
??.
S F F F
F F F H
F F F H
H F F G
ε-greedy Policy
- ? ?? ?? exploration ? exploitation ? ??? ??
< ?? > < ??? ??? >
Exploitation Exploration
60 % 40 %
ε-greedy Policy
S F F F
F F F H
F F F H
H F F G
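An ε-greedy action choice can be sketched as follows. This is a minimal example, assuming ε = 0.4 to match the 60 % / 40 % split shown above and reusing the two-action Q values (10 and -5) from the earlier slide; the function name is hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action), otherwise exploit (argmax)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return q_values.index(max(q_values))


rng = random.Random(0)
q_values = [10.0, -5.0]  # action 1: Q = 10, action 2: Q = -5
picks = [epsilon_greedy(q_values, epsilon=0.4, rng=rng) for _ in range(10_000)]
share_of_best = picks.count(0) / len(picks)
```

Because a random pick can also land on the best action, the best action is chosen in roughly 0.6 + 0.4 / 2 = 80 % of the draws here.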
What we have learned?
ref : ???? ? DeepRL(???)
https://tykimos.github.io/warehouse/2018-2-7-ISS_Near_and_Far_DeepRL_4.pdf
- ? state, action? ???? Q ?? ?? ?? ? ????
• Tabular method
Tabular method ? ??
Tabular method? ??
?? : ? ??? : 84 x 84 x 3 ?? : Continuous
State space? ??? ?
Tabular method? ??
- ?? ??? ?? ★ Large state space
- ??? state? ??? output? ?? ?? ???? (Generalization !)
- parameter? ??? ??? ?????. ★ Function Approximation
- How? Neural Net + Deep Learning !!
Function Approximation
Q Learning
Tabular vs Function Approximation
- ?? state, action? ?? Q ?? table? ???? ????.
- state, action? ??? ???? space complexity ??? ??? ???
- parameter? ??? Q ?? ??.
- ??? ?? ?????? ??.
• Tabular method
• Function Approximation method
Function Approximation
- parameter : w
ref : David Silver
http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf
Function Approximation
- Function Approximation example
S F F F
F H F H
F F F H
H F F G
Function
Approximator
(w)
Q(s, Left)
Q(s, Right)
Q(s, Up)
Q(s, Down)
< Current State s >
Tabular vs Function Approximation
- Tabular : state, action? ?? Q ?? ??????.
¢ ' ????. ' ? ??
- F.A. : parameter? gradient descent? ??????.
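The difference from a table lookup can be illustrated with a small gradient-descent update. This is a sketch only, assuming a linear approximator Q_w(s, a) = w · x(s, a) and a squared-error loss; the feature vector, learning rate, and target value are hypothetical.

```python
def q_value(weights, features):
    """Linear approximation: dot product of parameters w and features x(s, a)."""
    return sum(w * x for w, x in zip(weights, features))


def sgd_update(weights, features, target, lr=0.1):
    """One gradient-descent step on 0.5 * (target - Q_w(s, a))**2."""
    error = target - q_value(weights, features)
    # For a linear approximator the gradient of Q with respect to w is the feature vector.
    return [w + lr * error * x for w, x in zip(weights, features)]


weights = [0.0, 0.0, 0.0]     # parameter w
features = [1.0, 0.5, -1.0]   # hypothetical features for one (state, action) pair
for _ in range(200):
    weights = sgd_update(weights, features, target=1.0)
```

Unlike a table entry, every update to w also changes the estimates for other states that share features, which is where generalization comes from.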
Deep Q Network
DQN
https://www.youtube.com/watch?v=TmPfTpjtdgg
¢ DQN ?? ?? ??
DQN
https://www.nature.com/articles/nature14236
¢ ???? DQN Agent? Human-level? ???? ??? ???
DQN
- Convolutional Neural Network
- Experience Replay
- Fixed Target Network
• Deep Q Network (2015)
DQN - CNN
- ???(pixel) ???? ?? ???? ??.
- ??? ?? state? ???? ?? ??? ?? ?? ???? ??.
- ex. ??? ?? : ?? ??, ???, ´ ★ ??? ?? ???
???? ?? : ?? ??, ?? ??, ´ ★ ???? ?? ???
• Convolutional Neural Network
DQN - CNN
https://www.nature.com/articles/nature14236
Output
? action?
?? Q value
Input
?? ??
pixel ?
DQN - Experience Replay
ref: ????????DQN?? (???)
/CurtPark1/dqn-reinforcement-learning-from-basics-to-dqn
• Challenge 1 - Correlation between samples
- ??????? sample? ??? ?? ????? ???? ??? correlation?
??
- Sample?? correlation? ??? ??? ??????.
DQN - Experience Replay
- transition(S, A, R, S¨)? memory(buffer)? ???? batch ??? ????.
- data(transition)?? correlation? ??.
- batch ??? ?? ??.
• Experience replay
DQN - Experience Replay
DQN Agent Environment
State, Reward
Action
Replay Buffer
[S, A, R, S']
[S, A, R, S']
batch?? sampling
N?? Transition ??
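A replay buffer of this kind can be sketched as below. This is a minimal version, assuming a fixed capacity with oldest-first eviction and uniform sampling; the class and method names are hypothetical, and a `done` flag is stored alongside (S, A, R, S') for convenience.

```python
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the temporal ordering.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=100, seed=0)
for t in range(150):
    buffer.store(t, 0, 0.0, t + 1, False)  # dummy transitions
batch = buffer.sample(32)
```

Only the most recent 100 transitions survive in this example, and each batch mixes transitions from different time steps.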
DQN - Target Network
• Challenge 2 - Non-stationary targets
- Loss function?? target ? current value ? ?? ???? w? ??
???.
- w? ???? ?? target? ??? ??.
target = R + \gamma \max_{a'} Q(S', a'; w)
DQN - Target Network
- ?? step?? ???? ?? network? ???? update?? target?? ??
k step ??
copy
Main Network Target Network
• Target network
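The periodic copy from the main network to the target network can be sketched as follows. This is an illustration only: `TinyQ` is a hypothetical stand-in for a real network, the "gradient step" is a placeholder increment, and the copy interval of 4 is arbitrary.

```python
import copy


class TinyQ:
    """Stand-in for a network: Q(s, a) = w[a] * s (hypothetical, illustration only)."""

    def __init__(self, weights):
        self.weights = list(weights)

    def values(self, state):
        return [w * state for w in self.weights]


def td_target(target_net, reward, next_state, done, gamma=0.99):
    """Targets are computed from the frozen copy, not from the network being trained."""
    return reward if done else reward + gamma * max(target_net.values(next_state))


main_net = TinyQ([0.0, 0.0])
target_net = copy.deepcopy(main_net)
sync_every = 4  # the "k step" interval; 4 is an arbitrary choice
history = []
for step in range(1, 9):
    main_net.weights[0] += 1.0  # placeholder for a gradient step on the main network
    if step % sync_every == 0:
        target_net = copy.deepcopy(main_net)  # copy main -> target
    history.append(target_net.weights[0])
```

Between copies the target stays fixed even though the main network changes at every step, which keeps the regression target stable for k steps at a time.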
DQN - ?? ??
• Gradient Clipping
- Loss function? ???? ???? 1 ??? ?? 1? ??? Clipping
ref: wiki
https://en.wikipedia.org/wiki/Huber_loss
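The link between clipping the gradient at 1 and the Huber loss can be checked numerically. This sketch assumes the standard Huber loss with threshold delta = 1; the function names are hypothetical.

```python
def huber_loss(error, delta=1.0):
    """Quadratic for small errors, linear for large ones."""
    if abs(error) <= delta:
        return 0.5 * error ** 2
    return delta * (abs(error) - 0.5 * delta)


def huber_grad(error, delta=1.0):
    """Derivative of the Huber loss: the error itself, clipped to [-delta, delta]."""
    return max(-delta, min(delta, error))


small = huber_loss(0.5)   # inside the quadratic region
large = huber_loss(3.0)   # inside the linear region
```

The derivative never exceeds 1 in magnitude, so a single transition with a very large TD error cannot dominate an update.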
Advanced DQN
Rainbow DQN?
Rainbow DQN?
1. Deep Q Network
2. Double Q-learning
3. Prioritized Replay
4. Dueling Networks
5. Multi-step Learning
6. Distributional RL
7. Noisy Network
Rainbow DQN
¢ Rainbow? ?? ?????? ??? ???? ??? ???
Double Q-Learning
Q-learning? ???
- Q-learning? maximization ???? Q? ????.
- maximization ??? overestimation ??? ??. (????)
- ?, Q-value? ???? ??? ???.
Double Q-learning
-0.1
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Double Q-learning
Left
?? reward : -0.1
Right
reward : 0
<
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Double Q-learning
A
Right 0
Left 0
B action1 0
< Q-table >
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Double Q-learning
A
Right 0
Left 0
B action1 +0.2
< Q-table >
-0.1 0.2
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Double Q-learning
A
Right 0
Left +0.2
B action1 +0.2
< Q-table >
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Double Q-learning
Q-learning
Q-learning (??)
Double Q-learning
or
Q → Q1, Q2
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Double Q-learning
10000? ??? ? ??
???
ref : Reinforcement Learning: An Introduction, 2nd ed (Sutton and Barto)
Double DQN
- Two Q estimators Q1, Q2 → DQN's main Q and target Q
- main Q : selects the action with the maximum Q value.
- target Q : evaluates the value of the selected action.
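The split between selecting and evaluating can be written out directly. This is a minimal sketch with made-up Q values for the next state; the function names are hypothetical, and the plain DQN target is included only for comparison.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Plain DQN: the target network both selects and evaluates the action."""
    return reward if done else reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_main, next_q_target, gamma=0.99, done=False):
    """Double DQN: main Q selects the action, target Q evaluates it."""
    if done:
        return reward
    best_action = next_q_main.index(max(next_q_main))
    return reward + gamma * next_q_target[best_action]


next_q_main = [1.0, 2.0, 0.5]    # made-up main-network values for S'
next_q_target = [1.5, 1.2, 3.0]  # made-up target-network values for S'
plain = dqn_target(0.0, next_q_target, gamma=1.0)
double = double_dqn_target(0.0, next_q_main, next_q_target, gamma=1.0)
```

In this example the plain target takes the (possibly overestimated) maximum 3.0, while the double target uses 1.2, the target network's value for the action the main network prefers.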
Prioritized Replay
Prioritized Replay
" Replay Buffer? ??? ??? ! "
Prioritized Replay
- ?? ??? ??? ????
- ??? ?? ? ?? ??? ??? ??????
Prioritized Replay
?? ??? ????? ?? ???
→ TD error!
Prioritized Replay
- TD error :
-
" TD error? ??? ????? ??? ! "
alpha ∈ [0, 1]
alpha = 0: uniform sampling
Prioritized Replay
- ????? ??? ???? sampling ?? ???
- ?? ????? transition? ??? sampling ? transition? ?? ???
??? ??. ★ ??? bias? ??
- ??? update ? Importance-sampling weight? ?? !
???? ???? PER ??? ???
Prioritized Replay
beta ∈ [0, 1]
beta = 0: no weight correction
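The roles of alpha and beta can be sketched as follows. This example assumes the proportional variant from Schaul et al., with priority (|TD error| + eps)^alpha and importance-sampling weights (N · P(i))^(-beta) normalised by their maximum; the TD errors and the values alpha = 0.6, beta = 0.4, eps = 1e-6 are illustrative choices.

```python
import random


def sampling_probs(td_errors, alpha=0.6, eps=1e-6):
    """P(i) is proportional to (|TD error| + eps) ** alpha."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """w_i = (N * P(i)) ** (-beta), scaled so the largest weight is 1."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    top = max(weights)
    return [w / top for w in weights]


td_errors = [0.1, 1.0, 2.0, 0.5]  # made-up TD errors of four stored transitions
probs = sampling_probs(td_errors)
weights = importance_weights(probs)
uniform = sampling_probs(td_errors, alpha=0.0)  # alpha = 0: uniform sampling
flat = importance_weights(probs, beta=0.0)      # beta = 0: no correction
batch = random.Random(0).choices(range(len(td_errors)), weights=probs, k=2)
```

Transitions with a large TD error are sampled more often but receive a smaller weight in the update, which offsets the bias introduced by non-uniform sampling.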
Dueling Networks
Dueling Networks
?? State?? ?? Action? ?? ?? ??
Dueling Networks
??? ??? ??? ??? ?? ? ?? !
10? 20?
-10? -20?
?? : 0
+5? 3?
-2? -3?
Dueling Networks
Dueling Networks
??? (state value)
Value Advantage
Q?? ???? ??
Dueling Networks
Q(s, a1)
Q(s, a2)
Q(s, a3)
S
Dueling Networks
A(s, a1)
A(s, a2)
A(s, a3)
S
V(s)
Q
Dueling Networks
Sum : Q(s, a) = V(s) + A(s, a)
- But with a plain sum, V and A are not uniquely determined for a given Q.
- e.g. for Q = 4, (V, A) can be (1, 3), (2, 2), (3, 1), and so on.
¢ Dueling Network??? Q ?? ??
Dueling Networks
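Max : Q(s, a) = V(s) + (A(s, a) - \max_{a'} A(s, a'))
Average : Q(s, a) = V(s) + (A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'))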
¢ Dueling Network??? Q ?? ??
max? ?? ??? V? A? ?????? ???
max? ??? ??? ??? ???? ???? ???
??? V? A? ???
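The three ways of combining V and A can be compared on a small example. This sketch assumes the aggregation rules from Wang et al. (plain sum, max-subtracted, mean-subtracted); the function name, the `mode` argument, and the numbers are hypothetical.

```python
def dueling_q(value, advantages, mode="mean"):
    """Combine a state value V(s) and advantages A(s, a) into Q(s, a)."""
    if mode == "sum":
        baseline = 0.0
    elif mode == "max":
        baseline = max(advantages)
    elif mode == "mean":
        baseline = sum(advantages) / len(advantages)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [value + a - baseline for a in advantages]


# With a plain sum, two different (V, A) pairs give the same Q values.
q_sum_a = dueling_q(1.0, [3.0, 1.0, 2.0], mode="sum")
q_sum_b = dueling_q(2.0, [2.0, 0.0, 1.0], mode="sum")

# Subtracting a baseline pins V down: the max (or mean) of Q equals V.
q_max = dueling_q(1.0, [3.0, 1.0, 2.0], mode="max")
q_mean = dueling_q(1.0, [3.0, 1.0, 2.0], mode="mean")
```

With the max variant the best action's Q equals V(s); with the mean variant the average Q equals V(s), and the ranking of actions is unchanged in both cases.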
Multi-Step Learning
Multi-Step Learning
S0 S1 S2 S3 S4 ST
R1
1 step : R1 + \gamma \max_{a} Q(S1, a)
S0 S1 S2 S3 S4 ST
R1 R2 R3
3 step : R1 + \gamma R2 + \gamma^2 R3 + \gamma^3 \max_{a} Q(S3, a)
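The 1-step and 3-step targets differ only in how many real rewards are summed before bootstrapping. A minimal sketch, assuming the usual n-step target (discounted rewards plus γ^n times the bootstrapped value); the function name and numbers are hypothetical.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """rewards = [R1, ..., Rn]; bootstrap_value = max_a Q(S_n, a)."""
    target = bootstrap_value
    # Fold from the last reward backwards: target = R + gamma * target.
    for r in reversed(rewards):
        target = r + gamma * target
    return target


one_step = n_step_target([1.0], bootstrap_value=10.0, gamma=0.5)
three_step = n_step_target([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

With more steps, the bootstrapped estimate is discounted more heavily, so the target relies more on observed rewards and less on the current Q estimate.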
Distributional RL
ref : RLKorea ???, A Distributional Perspective on Reinforcement Learning
https://reinforcement-learning-kr.github.io/2018/10/02/C51/
network
Distributional RL
S
Q
S
Q
network
" Return? ??? ????. "
Distributional RL
Distributional :
General :
Distributional RL
N
: Value distribution
- x? : atom (or support)
- y? : ? atom? ?? ?? ??
• Value distribution
Distributional RL
- Value distribution? ???? ??
• Q-value
Distributional RL
- Target value distribution? ?? value distribution?? ??? ???
???? ??
- KL-Divergence
• Loss function
: Target value distribution
Distributional RL
1. ?? state? ?? value ??? ?? :
2. Target atom? ??
• Target value distribution
Distributional RL
• Target value distribution
- x? : atom (or support)
- y? : ? atom? ?? ?? ??
??? ...
Distributional RL
• Target value distribution - Projection
- Target value distribution? value distribution? atom? ???
- Target atom? reward? ??? ?? ??? ???
- Projection? ?? ?? ??.
1 2 3 4 5 6 7 1 2.3 3.2 4.1 5 5.9 6.8
R = 0.5, γ = 0.9
Distributional RL
• Target value distribution - Projection
1 2.3 3.2 4.1 5 5.9 6.8
3 3.2 4
0.5
0.5 * (4 - 3.2) = 0.4
0.5 * (3.2 - 3) = 0.1
ref : RLKorea ???, A Distributional Perspective on Reinforcement Learning
https://reinforcement-learning-kr.github.io/2018/10/02/C51/
Distributional RL
• Target value distribution - Projection
1 2.3 3.2 4.1 5 5.9 6.8
1 2 3 4 5 6 7
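The projection step can be sketched in plain Python. This is a minimal version, assuming evenly spaced atoms and the categorical projection from Bellemare et al.; the function name is hypothetical, and the check at the end reuses the numbers from the slide (probability 0.5 at target atom 3.2).

```python
import math


def project(target_atoms, probs, support):
    """Spread each target atom's probability over its two neighbouring support atoms."""
    v_min, v_max = support[0], support[-1]
    delta = (v_max - v_min) / (len(support) - 1)  # spacing between atoms
    projected = [0.0] * len(support)
    for atom, p in zip(target_atoms, probs):
        # Clip to the support range, then convert to a fractional index.
        pos = (min(max(atom, v_min), v_max) - v_min) / delta
        lower, upper = math.floor(pos), math.ceil(pos)
        if lower == upper:  # the target atom sits exactly on a support atom
            projected[lower] += p
        else:
            projected[lower] += p * (upper - pos)  # closer neighbour gets more mass
            projected[upper] += p * (pos - lower)
    return projected


support = [1, 2, 3, 4, 5, 6, 7]
reward, gamma = 0.5, 0.9
target_atoms = [reward + gamma * z for z in support]  # shifted and shrunk atoms
single = project([3.2], [0.5], support)               # the worked example above
full = project(target_atoms, [1 / 7] * 7, support)    # a uniform distribution
```

The projected distribution lives on the original atoms again, so it can be compared with the network's output using the KL-divergence loss.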
network
Distributional RL
S
Q(s, a1)
Q(s, a2)
Q(s, a3)
Q(s, a4)
action size
• General
network
Distributional RL
S
action size x atom size
Expectation
Q(s, a1)
Q(s, a2)
Q(s, a3)
Q(s, a4)
action size
• Distributional
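Going from the "action size x atom size" output to one Q value per action can be sketched as follows. This assumes the network emits one row of logits per action that is turned into probabilities with a softmax; the support and logits are made up, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Turn one row of logits into a probability distribution over atoms."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def q_from_logits(logits_per_action, support):
    """Q(s, a) is the expectation of the value distribution: sum_i z_i * p_i(s, a)."""
    dists = [softmax(row) for row in logits_per_action]
    return [sum(z * p for z, p in zip(support, dist)) for dist in dists]


support = [1.0, 2.0, 3.0]                     # atoms z_i (made up)
logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 10.0]]  # 2 actions x 3 atoms (made up)
q_values = q_from_logits(logits, support)
greedy_action = q_values.index(max(q_values))
```

Action selection still uses the expected value, so the distributional agent acts like a regular DQN while learning a richer target.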
Noisy Network
Noisy Network
< ?? > < ??? ??? >
Exploitation Exploration
< Exploitation & Exploration >
• Exploitation and Exploration
Noisy Network
ε-greedy ?? ???? exploration
??? ??? ?
Noisy Network
ε-greedy Policy
Random perturbations of the policy
??, ??
Large-scale behavioral pattern???
????
Noisy Network
" Network? noise? ???? exploration? ??. "
perturbations
State-dependent exploration becomes possible.
Noisy Network
Q(s, a1)
Q(s, a2)
Q(s, a3)
S
element-wise
multiplication
:
Noisy Network
1. Independent Gaussian noise
- noise? weight, bias size ?? ?? ? weight? ??
- noise? ??? ?? ??: (p x q) + q
¢ Gaussian noise? ??? ??
p x q q
Noisy Network
2. Factorised Gaussian noise
- input size(p)? noise? output size(q)? noise? ??
- ? noise? ??? (p x q) size? noise? ??
- noise? ??? ?? ??: p + q
p
q
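Factorised Gaussian noise and the resulting noisy linear layer can be sketched without any framework. This follows the construction in Fortunato et al. (f(x) = sign(x)·sqrt(|x|), outer product of input and output noise, bias noise taken from the output noise); the function names, layer sizes, and parameter values are hypothetical.

```python
import math
import random


def f(x):
    """Scaling used for factorised noise: sign(x) * sqrt(|x|)."""
    return math.copysign(math.sqrt(abs(x)), x)


def factorised_noise(p, q, rng):
    """Draw only p + q Gaussian samples and expand them to a q x p noise matrix."""
    eps_in = [f(rng.gauss(0.0, 1.0)) for _ in range(p)]
    eps_out = [f(rng.gauss(0.0, 1.0)) for _ in range(q)]
    weight_noise = [[eo * ei for ei in eps_in] for eo in eps_out]  # outer product
    return weight_noise, eps_out  # the bias noise reuses the output noise


def noisy_linear(x, mu_w, sigma_w, mu_b, sigma_b, weight_noise, bias_noise):
    """y = (mu_w + sigma_w * eps_w) x + (mu_b + sigma_b * eps_b), element-wise products."""
    out = []
    for j in range(len(mu_b)):
        row = [mu_w[j][i] + sigma_w[j][i] * weight_noise[j][i] for i in range(len(x))]
        bias = mu_b[j] + sigma_b[j] * bias_noise[j]
        out.append(sum(w * xi for w, xi in zip(row, x)) + bias)
    return out


rng = random.Random(0)
p, q = 3, 2  # input size and output size
weight_noise, bias_noise = factorised_noise(p, q, rng)
mu_w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
mu_b = [0.5, -0.5]
zeros_w = [[0.0] * p for _ in range(q)]
ones_w = [[1.0] * p for _ in range(q)]
x = [1.0, 2.0, 3.0]
quiet = noisy_linear(x, mu_w, zeros_w, mu_b, [0.0] * q, weight_noise, bias_noise)
noisy = noisy_linear(x, mu_w, ones_w, mu_b, [1.0] * q, weight_noise, bias_noise)
```

With sigma set to zero the layer behaves like an ordinary linear layer; because sigma is learned, the network can adjust how much it explores.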
?????.
Reference
• R. Sutton and A. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press, 2018.
• V. Mnih et al., "Human-level control through deep reinforcement learning." Nature, 518(7540):529–533, 2015.
• H. van Hasselt et al., "Deep Reinforcement Learning with Double Q-learning." arXiv preprint arXiv:1509.06461, 2015.
• T. Schaul et al., "Prioritized Experience Replay." arXiv preprint arXiv:1511.05952, 2015.
• Z. Wang et al., "Dueling Network Architectures for Deep Reinforcement Learning." arXiv preprint arXiv:1511.06581, 2015.
• M. Fortunato et al., "Noisy Networks for Exploration." arXiv preprint arXiv:1706.10295, 2017.
• M. G. Bellemare et al., "A Distributional Perspective on Reinforcement Learning." arXiv preprint arXiv:1707.06887, 2017.
• M. Hessel et al., "Rainbow: Combining Improvements in Deep Reinforcement Learning." arXiv preprint arXiv:1710.02298, 2017.
• ???, "???? ? DeepRL", https://tykimos.github.io/warehouse/2018-2-7-ISS_Near_and_Far_DeepRL_4.pdf
• David Silver, "Lecture 6 in UCL Course on RL", http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/FA.pdf
• RLKorea ???, "A Distributional Perspective on Reinforcement Learning", https://reinforcement-learning-kr.github.io/2018/10/02/C51/


???? ??? ??: Rainbow ???? ???? (2nd dlcat in Daejeon)