???? ??? 1??
2018.08.21
Jicheol Woo
??? ????? ?? ?? ??? ? ?? ??? ?? ?? ????
????(Interaction). 

? ??? ?? ???? ??? ?? ??? ???, ??? ???
computational ?? ???? ?? machine learning.
?? (Learning)
Process 

??? ???? ????? ??? ??? ???? ?? ??? ???. 

??? ? ????? ?????? ??(reward)? ???. 

??? ??/?? ?? ??.

????? ???? ????? ? ????? ??? ??? ??? ???
?? ???? ???? ??? ?? ?????.
???? (Reinforcement Learning)
Trial & Error
???? ????

??? ?? ???? ???? ?? ??? ???? ??? ??? ??? ?

??? ?? Reward? ? ?? ?? ???? ??
????? ?? ??

??? ??? ?? ??? ?? ??? ?? ? ???? ?? ??? ??? ???
? ??? (?? ??? ???? ???? ? ?? ??? ???), ?? ? ?? ?
?? ?? ?????? ???? ??? ? ??.
RL Essential things
Delayed Reward
??? ?? ????????? ???? ??? ?? (??? ????)

Trial & Error ?? =>???? ?? ??? ??? ?? ??

??, ??? ??? ??? ??? ?? ????? ??.

??? ??????? ??? ?????? ????? ??? ??? ? ? ??.
Trial & Error
1950???? ?????, ?? ????? ??? ?????? ????? ??
? ?? ?? ??.

Bellman? Bellman equation? ??? optimal control ??? ???, ???
??? DP?? ??. 

??, Bellman? MDP?? ??? ??? ???? ????? ??? ??.
Optimal Control
???? ???? + ????? ???? ??.
Reinforcement Learning is defined
not by characterizing learning methods,
but by characterizing a learning problem.
???? ???? ???? ?? ??? ???? ????? ??????.
???? (Reinforcement Learning) ??
Sutton
1. Fly stunt manoeuvres in a helicopter

2. Defeat the world champion at Backgammon

3. Manage an investment portfolio

4. Control a power station

5. Make a humanoid robot walk

6. Play many different Atari games better than humans
???? (Reinforcement Learning) ??
??? ?? ?? ??
????? ??? agent
???? (Reinforcement Learning) ??
Agent? ??? ???? ?????, ????? ???? ?? agent? ? ?
?? ?? ??? ???? ???? ?? ??.

??? ???? ???? ???? ?? ??? ??? ???? ? episode ??
?? ??? ??????? ?.

? ??? Agent? ??? ??? ??? ???? Policy? ??. 

?? ?? ???? ?? ??? ? ??? ???? ??? ? ??? ?? ???
???? ?? ?? => Policy

Trial & Error
1. CNN? ??

2. ?? agent? ?? ?? ??? ???? ??? ? ??? ?

3. Deep Neural Net? function approximator? ??

4. Experience Replay

5. Target networks
Deep Reinforcement Learning
= Deep Learning + Reinforcement Learning
??? ???? ?? ??? ???? gridworld? ?? ?? ???? ????
? ??? ??? ?????, ?? ??? ??? ??? ???? ??? ?? ?
? ??? ??? ???? ??.

???? ??? ? ??? ???? ??? ???? ???? ?? ??? ???
??? ???? ???? ???? ????? ??? ? ? ??? ? ?
=> approximation.

? approximation ? ??? Deep NN? ??? ??? ??? ??? ??? ?.
? ??? ?? ??? ????? ???? ?? ???? ? ??? ?? ? ???
? ?? ???? ? ??? ?? ?, ? ?? ??? ??? ? ? ??, ? ??
state(??)? Markov??? ???.
=
??????? value?? ?? ??? ??? state? ?? ??? ???? ?
state? Markov ??? ????.
Markov Decision Process (MDP)
??? ??? ???? ???? ??? MDP.

MDP (Markov Decision Process) ?
: state, action, state transition probability matrix, reward, discount factor

??? ?? ?? : state

????? ???? ? : action

?? : reward

MDP? ?? ??? ??
Markov Decision Process (MDP)
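
A minimal sketch of the five MDP components listed above (state, action, state transition probability, reward, discount factor) as a plain data structure. The two-state example, the field names, and the `is_valid` helper are assumptions invented for illustration; they are not taken from the slides.

```python
from typing import Dict, List, NamedTuple, Tuple


class MDP(NamedTuple):
    """Container for the five MDP components."""
    states: List[str]
    actions: List[str]
    # transitions[(s, a)] maps each next state s' to P(s' | s, a)
    transitions: Dict[Tuple[str, str], Dict[str, float]]
    # rewards[(s, a)] is the expected immediate reward for taking a in s
    rewards: Dict[Tuple[str, str], float]
    gamma: float  # discount factor in [0, 1]


# Hypothetical two-state example, invented for illustration only.
toy_mdp = MDP(
    states=["A", "B"],
    actions=["stay", "move"],
    transitions={
        ("A", "stay"): {"A": 1.0},
        ("A", "move"): {"A": 0.2, "B": 0.8},
        ("B", "stay"): {"B": 1.0},
        ("B", "move"): {"A": 0.8, "B": 0.2},
    },
    rewards={
        ("A", "stay"): 0.0,
        ("A", "move"): 0.0,
        ("B", "stay"): 1.0,
        ("B", "move"): 0.0,
    },
    gamma=0.9,
)


def is_valid(mdp: MDP) -> bool:
    """Check that each (state, action) pair has probabilities summing to 1."""
    for s in mdp.states:
        for a in mdp.actions:
            probs = mdp.transitions[(s, a)]
            if abs(sum(probs.values()) - 1.0) > 1e-9:
                return False
    return 0.0 <= mdp.gamma <= 1.0
```

Keeping the transition probabilities keyed by (state, action) reflects that, in an MDP, the next-state distribution depends on the chosen action and not only on the current state.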
Agent? ???? ??? ??. ??? ???? ?? ???? ?? ??? ??

???? ?? ?? ??? ??? ??? ????? ??? ? ? ??

??/??? ?? ??? ?? ???.

e.g. ???? Agent? ??? ???? ??, ??, ??? ?? ??.
??? ???? ??. 

??? ?? ??? ?????? ?? ?????? ?? ?? ??? ? ????? ? ?
?? ????? ?? ?? ??? ??.
State
Agent? ???? ?? ??? ?? ? action? ???? ??? ?. 

??? ???? ?. ????? controller.

Agent? ? ???? ?? ? ?? ??? ???
State -> state? ?? ??

?? action? ??? ?? state? deterministic?? ?? ????? ???? ??
??? noise.
Action
State Transition Probability
MDP?? action? reward? ??? ??
???? State-state ??? transition
matrix? ???? ? ??.

Markov chain? state-state ??? ?
state ??? transition probability ? ??
?? ?.

? Markov chain?? ??? ???? ??
? ?? sleep?? ??? ??? ? ?? ?
?? ?? ??? stationary distribution.

?? ??? MDP??? action? ? ???
action? ?? ?? state? ? ??? ??
?.
Markov Chain
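
As a rough illustration of the stationary distribution mentioned on this slide, the sketch below repeatedly applies a transition matrix to a starting distribution (power iteration). The two-state matrix `P` is a made-up example, and the approach assumes the chain is irreducible and aperiodic so that the iteration converges.

```python
def stationary_distribution(P, iterations=1000):
    """Approximate the stationary distribution of a Markov chain.

    P[i][j] is the probability of moving from state i to state j.
    """
    n = len(P)
    dist = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iterations):
        # One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical two-state chain, invented for illustration.
P = [
    [0.9, 0.1],
    [0.5, 0.5],
]
pi = stationary_distribution(P)
# pi is approximately [5/6, 1/6]
```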
s?? state? ?? ?, a?? action? ??? ? ?? ? ?? reward. 

Agent ? ??? ? ?? ??.

Agent? Reward? ??? ?? ???? ??? ??.
? reward? immediate reward?? agent? ??? ????? ?
?? Reward? ?? ?? ??? ??? ?? Reward?? ??.
Reward
??? agent? ? time-step?? 0.1? ???? ?? ?? agent? 1? ??? ?
?, ???? ??? ???? ?? ???

???? 1? ??? ?? ?? ? 1? ?? ??? ?? ???
0 ?? ??? ???? / 1?? ??? ?????
Discount Factor
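
A small sketch of how the discount factor weights a sequence of rewards. The function name `discounted_sum` and the reward sequence are assumptions for illustration. With a factor of 0 only the first reward counts (myopic), and with a factor of 1 every reward counts equally (far-sighted).

```python
def discounted_sum(rewards, gamma):
    """Return sum_k gamma**k * rewards[k] for a finite reward sequence."""
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [1.0, 1.0, 1.0]            # made-up reward sequence
print(discounted_sum(rewards, 0.0))  # 1.0  (only the immediate reward)
print(discounted_sum(rewards, 0.5))  # 1.75 (1 + 0.5 + 0.25)
print(discounted_sum(rewards, 1.0))  # 3.0  (plain sum)
```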
agent? ?? state? ???? action? ????? ?? state?? ?? action? ?
?? ???? ?? ????? ?.

?? ????? ??? ? optimal policy? ?? ?.
Policy
MDP graph? ?? ? Markov chain?? state ??? transition? ??
? ??? ?? action? ?? state? transition? reward? ??
MDP graph
State-Value Function
Agent? state1? ??, ???? action? ???? ????, ?? ?
? Reward?? ???? ??? ??. ??? ????? ?? ?? ??
Reward?? ?? ??? ?? return ? ??.
Value Function
Return? expectation? state-value function.

Value function? ?? state s? ???? ??? ??.
State-Value Function
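
For reference, the return and the state-value function described above can be written as follows. The notation (G_t, R_t, \gamma, \pi) follows the standard textbook convention and is an assumption here, since the slide's own formula did not survive extraction.

```latex
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots
    = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

v_\pi(s) = \mathbb{E}_\pi \left[ G_t \mid S_t = s \right]
```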
Value function? ??? agent? ?? state?? ??? ???? ??
?? ???? ??? state ?? value function? ????, ??? ?
???? ??? value function? ????? ??.
Value function? return(?? ??? ??? ?? reward? discounted
amount)? ????? ??? ?? ?? ???? ???? ??? ????? ?
??? ?? ? ??.

?? ? state??? ????? ? ??? ???? ?????? ?????? ?
?? ???? ?? ????? ??? ? value-function? ????? ????
???? ??? ?????? ? ???? ??? ???? true value function
? ?? ? ??.
State-Value Function
Value Function ??? ?
???? ?? ???? true??? ???? ?? ??? ??
State-Value Function
Policy? ?????.
?? agent? ?? ????(?? ???? ?? ?? ??)? ??? ???? ?, ????
?????? agent? ???? state?? ?? ?? ?? ??? ?? ??? ?. (?? ??
?? ??? ??? ???? ??? ?? ????? ?? ?? ??)

??? ??? value function? ???? ?? ??? (??? ??? ??? ??? ???
??) agent? ??? ?? ??? ????? ?.

?? ?? ??? ??? ? ?? ?? Policy? ?????? ?? ??, ? policy??
value-function? ??? ? ???? ? value-function? ??? ?? policy? ?? ?
?? ?? ?.
State-Value Function
???? ???? ??????? value-function? ??? ? ??????
??? ??? ?? ??. 

? ??? ?? bias ?? ??, variance? ??? true?? ???? ???
?? ?? ?? ?? ???? ?.
State-Value Function
action?? ?? ???? ? ? ?? ???. ?? ?? ???? ??? ??? ??.

?? state-value function?? ? state? ???? ?? ? state?? ?? ???
???? ?? ???? ???? ?? ??? ??.

agent? ???? ?? ??? ???? ??? state?? value-function?? ?
???? ???? ?? state?? ?? ??? ?? ?????, ? ??? ??? ?
?? ?????? ??? ??.

??? ?? ??? state? ?? value function?? action? ?? value
function? ?? ? ??? ?? action-value function.
Action-Value Function
Action value function? ???? value function?? ?? ?? ?? ???
?? action value function? ?? ???? ?? ??? ?? ???? ???
?? ?? ?? ??? ?? ? ??? ?? ? ??? ? ??? ??.
Action-Value Function
?? state s?? action a? ?? ?? ?? return? ?? ?????? ?? ?
?? ?? ?, ??? ?? ???.

??? ???? ????? state-value function? ??? action-value
function? ??? ?. ?
=> action-value function? ?? ?? Q-value?? q-learning?? deep
q-network ?? ?? ???? q??.
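
In the standard notation (assumed here, since the slide's formula image was lost), the action-value function, or Q-value, conditions the expected return on both the state and the action:

```latex
q_\pi(s, a) = \mathbb{E}_\pi \left[ G_t \mid S_t = s,\; A_t = a \right]
```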
?? ?? value function? ???

??? ???? ??? ??? ??? ???? ??
Bellman equation.
Bellman Equation
Agent? value function? ??? ??? ??? ????? ? ??? ??? ??.
Bellman Expectation Equation
State-value function? ??? ?? ? ???? ??? ? ? ??.
Bellman (Expectation) Equation for value function
Policy ??? ??? action-value function? ??? ?? ??? ? ??.
???, ?? ?? ????? ???? ?? ?? ????? ????? ?? ? ?
?. ??? ?? ?? ?? ???? ???? ?? ??. (state-action pair)
Bellman (Expectation) Equation for value function
? ?? state, ?? ?? state?? action? ??? ??. 

state?? ????? ??? action? ????.

?? v? q? ??? ??? ??? ?? ??.

? ??? ? ??? ? ??? ?? ?? expected return? ?? ??
? sum? ?? state? value function? ?.
Bellman (Expectation) Equation for value function
? ????? reward? ??. ??? ??? ??? ??? ?? ???? ????
??? ?? ? ? ?? ??? ??.

???? ????? ??? deterministic??? ??? ?????, ??? ???
??? transition probability? ????? ??? ???.

Action-value function? immediate reward??? ??? ?? ? state? ? ?
? ??? ? ????? state-value function(? ??? ???? discounted ?)
? ???? ??? ? ??.
Bellman (Expectation) Equation for value function
? ??? ?? ??? ??? ?? ?.
Bellman (Expectation) Equation for value function
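
The relations described on the preceding slides can be written out as below. The symbols \mathcal{R}_s^a (expected immediate reward) and \mathcal{P}_{ss'}^a (state transition probability) are assumed notation; the original formula images are not available. The first line expresses v through q as a policy-weighted sum, the second expresses q as immediate reward plus the discounted, transition-weighted value of the next state, and the third combines the two.

```latex
v_\pi(s) = \sum_{a} \pi(a \mid s)\, q_\pi(s, a)

q_\pi(s, a) = \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_\pi(s')

v_\pi(s) = \sum_{a} \pi(a \mid s) \left( \mathcal{R}_s^a
           + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_\pi(s') \right)
```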
?? ?????? ???? ???? ?, reward? state transition
probability? ?? ? ? ??, trial & error? ?? ???? ?.

?? ?? ???? ?? ??? ?? MDP? ?? ??? ???? ? ?
??? MDP? ??? ?.

????? ??? ? MDP ??? ??? ??? ? ??? ?.

??? reward function? state transition probability? ??? ??
?? ??????? Bellman equation?? ????? ?? ? ??.
Bellman (Expectation) Equation for value function
?? ??? ?? action-value function? ??? ???? ?? ??.
Bellman (Expectation) Equation for Q-function
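
Under the same assumed notation (\mathcal{R}_s^a for expected immediate reward, \mathcal{P}_{ss'}^a for transition probability), the corresponding recursion for the action-value function reads:

```latex
q_\pi(s, a) = \mathcal{R}_s^a
            + \gamma \sum_{s'} \mathcal{P}_{ss'}^a
              \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a')
```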
?? Bellman equation? DP? ?? discrete? ????? ??? ??? ??
?? ?? ??.

?? ??? ???? ????? Hamilton-Jacobi-Bellman equation? ??.
Bellman (Expectation) Equation for Q-function
?? ? ? ?? ??? ??? ???? ?? ??? Bellman
expectation equation??? ??. 

??? ??? Bellman equation??? ??. ???? ?? ????
??? ???? ?? ?? ?? ??? ???.
Bellman Optimality Equation
??? ?? ??? ????? ??. 

??? ??(next state-value function)?? ??? value function? ???? ??
backup.

One step backup / multi-step backup.

Full-width backup(??? ?? ?? state? value function? ???? ??) => DP /
sample backup(?? ??? ??? backup) => RL
Backup
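
A hedged sketch contrasting the two kinds of backup named on this slide. The full-width backup takes an expectation over every possible next state using a known model (the DP setting); the sample backup nudges the current estimate toward a target built from one sampled transition (the RL setting). The outcome list, the value estimates, the step size `alpha`, and all function names are assumptions for illustration.

```python
import random

GAMMA = 0.9  # assumed discount factor

# Hypothetical model for one state "s" under a fixed policy:
# each entry is (probability, reward, next_state).
outcomes = [(0.5, 1.0, "x"), (0.5, 0.0, "y")]
values = {"s": 0.0, "x": 2.0, "y": 4.0}  # made-up current value estimates


def full_width_backup(outcomes, values, gamma=GAMMA):
    """DP-style backup: expectation over every possible next state."""
    return sum(p * (r + gamma * values[s2]) for p, r, s2 in outcomes)


def sample_backup(v_s, reward, next_state, values, alpha=0.1, gamma=GAMMA):
    """RL-style backup: move v(s) toward the target from one sampled transition."""
    target = reward + gamma * values[next_state]
    return v_s + alpha * (target - v_s)


def sample_transition(outcomes, rng):
    """Draw one (reward, next_state) pair according to the outcome probabilities."""
    weights = [p for p, _, _ in outcomes]
    _, r, s2 = rng.choices(outcomes, weights=weights, k=1)[0]
    return r, s2


print(full_width_backup(outcomes, values))  # 0.5*(1 + 1.8) + 0.5*(0 + 3.6), about 3.2

rng = random.Random(0)
r, s2 = sample_transition(outcomes, rng)
print(sample_backup(values["s"], r, s2, values))  # depends on the sampled transition
```

The sample backup needs no access to the transition probabilities themselves, only to transitions experienced by the agent, which is why it suits the trial-and-error setting.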
Bellman optimality equation ?? optimal value function? ?? ????.

????? ?? : accumulative future reward? ??? ?? ?? ??.

Optimal state-value function?? ?? state?? ??? ??? ??? ?? ???
? ????? ? ??? ??? ?? ?? ??? ?? ??? ??? ?? value
function.

Optimal action-value function? ????? ?? (s,a)?? ?? ? ?? ???
value function.
Optimal Value Function
?, ?? ???? ?? ? ?? ?? ?? ?? ??? ??.

??? ?? ??? optimal action-value function? ?? ??? ???
q?? ?? ??? ???????? ? ??? ??? ?? ?.
Optimal Value Function
Optimal policy? (s,a)?? action-value function? ?? ?? action
?? ??? ??? deterministic.
Optimal Policy
Optimal Policy
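
A minimal sketch of reading a deterministic policy off an action-value table by picking, in each state, the action with the largest value. The table `q_star` holds made-up numbers and the function name is an assumption.

```python
def greedy_policy(q):
    """Map each state to the action with the largest action value.

    q[state][action] holds an action-value estimate.
    """
    return {
        state: max(action_values, key=action_values.get)
        for state, action_values in q.items()
    }


# Hypothetical action values, invented for illustration.
q_star = {
    "s0": {"left": 0.1, "right": 0.7},
    "s1": {"left": 0.4, "right": 0.2},
}
policy = greedy_policy(q_star)
# policy == {"s0": "right", "s1": "left"}
```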
Optimal value function ??? ??? ??? ?? ?.

?? backup diagram? ?? ?? ???? ? ???? ??? max.
Bellman Optimality Equation
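
In the standard notation (assumed, since the slide's formulas were images), the Bellman optimality equation replaces the policy-weighted sum over actions with a max:

```latex
v_*(s) = \max_{a} q_*(s, a)

q_*(s, a) = \mathcal{R}_s^a + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_*(s')

v_*(s) = \max_{a} \left( \mathcal{R}_s^a
         + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \, v_*(s') \right)

q_*(s, a) = \mathcal{R}_s^a
          + \gamma \sum_{s'} \mathcal{P}_{ss'}^a \max_{a'} q_*(s', a')
```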
??? Bellman equation? ??? iterative?? MDP ??? ?? ?? DP?? ?.
Bellman Optimality Equation
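
One common way to solve the Bellman optimality equation iteratively when the model is known is value iteration; the sketch below is an illustration of that technique, not necessarily the procedure the slides had in mind. The three-state chain, its rewards, and the tolerance are assumptions, and transitions are taken to be deterministic to keep the example short.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical deterministic chain: model[s][a] = (next_state, reward).
# State 2 is terminal and has no actions.
model = {
    0: {"left": (0, 0.0), "right": (1, 0.0)},
    1: {"left": (0, 0.0), "right": (2, 1.0)},
    2: {},
}


def value_iteration(model, gamma=GAMMA, tol=1e-10):
    """Apply the Bellman optimality backup to every state until values settle."""
    v = {s: 0.0 for s in model}
    while True:
        delta = 0.0
        for s, actions in model.items():
            if not actions:  # terminal state keeps value 0
                continue
            # Max over actions of immediate reward plus discounted next value
            best = max(r + gamma * v[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


v_star = value_iteration(model)
# v_star is approximately {0: 0.9, 1: 1.0, 2: 0.0}
```

State 1 is worth 1.0 because moving right earns the reward immediately, and state 0 is worth 0.9 because that reward arrives one discounted step later.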
The
End
