All About A.I. SuperMario
Reinforcement Learning
Wonseok Jung

Wonseok Jung
City University of New York - Baruch College
(Data Science Major)
ConnexionAI A.I Researcher
DeepLearningCollege Reinforcement Learning Researcher
Modu Labs CTRL (Contest in RL) Leader
Projects: Reinforcement Learning, Object Detection, Chatbot
Github:
https://github.com/wonseokjung
Facebook:
https://www.facebook.com/ws.jung.798
Blog:
https://wonseokjung.github.io/
Contents
1. How Animals Learn
2. How Humans Learn
3. Reinforcement Learning
4. SuperMario with Reinforcement Learning
REINFORCEMENT LEARNING
PREVIEW
[Figure: Animal, Human, SuperMario; the reinforcement learning loop between the agent and the environment: in state S_t the agent takes action A_t, and the environment returns reward R_{t+1} and next state S_{t+1}]
HOW ANIMALS LEARN
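HABITUATION
- All animals are capable of learning.
- Even a nematode, with only about 300 neurons, can learn.
- Head withdrawal reflex: when something touches the nematode, it pulls its head away.
- If you keep touching the nematode's head, the distance it withdraws decreases.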
[Figure: first try, second try, third try]
LAW OF EFFECT
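- Edward Thorndike (1898)
- Law of effect: if the result of a behavior is satisfying, the behavior is repeated the next time. Conversely, if the result is not satisfying, the behavior is not repeated.
- Reinforcement: a consequence that makes a specific behavior repeat
- Punishment: a consequence that makes the learner avoid a specific behavior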
EXAMPLE OF LAW OF EFFECT
HOW HUMANS LEARN
INTERACTION WITH ENVIRONMENT
[Figure: Environment, Interaction, Experience, Learn]
HOW HUMANS LEARN?
- Reinforcement: a consequence that makes a specific behavior repeat
- Punishment: a consequence that makes the learner avoid a specific behavior
Experiment Using a Tap Ball
https://www.youtube.com/watch?v=2sicukP34fk
HOW HUMANS LEARN - TAP BALL
Day 1: best 3, last 2
Day 2: best 23, last 0
Day 3: best 30, last 0
Day 4: best 38, last 1
Day 5: best 79, last 0
SIMILARITY IN LEARNING METHODS BETWEEN ANIMALS AND HUMANS
[Figure: Punishment, Punishment, Punishment]
REINFORCEMENT LEARNING
[Figure: Environment, Interaction, Experience, Learn]
 
Time step: t
Action: a
Reward: r
State: s
Policy: π
Transition function: P(s', r | s, a)
Set of states: S
Set of actions: A
Set of rewards: R
Start state: S_0
Discount factor: γ
TERMINATION
LEARNING
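- Reinforcement learning learns the actions that maximize reward.
- The learner (agent) tries actions and discovers which ones yield the most reward.
- An action affects not only the immediate reward but also the next situation, and through it, all subsequent rewards.
[Figure: Action → immediate reward; Action → change in the situation → future reward, future reward]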
MARKOV DECISION PROCESS
[Figure: the agent-environment loop. In state S_t the agent takes action A_t; the environment responds with reward R_{t+1} and next state S_{t+1}.]
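The loop in the figure can be sketched in a few lines of Python. This is a minimal illustration under assumptions: `LineWorld` is a hypothetical toy environment (not part of any library) in which the agent walks along a line toward a goal, and `run_episode` is a hypothetical helper that records each step as an (S_t, A_t, R_{t+1}, S_{t+1}) transition plus a `done` flag. Note that this toy `step` returns three values, unlike the four-value `env.step` used later with gym-super-mario-bros.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


class LineWorld:
    """Hypothetical toy environment: the agent walks along positions 0..goal."""

    def __init__(self, goal=5):
        self.goal = goal
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, action 0 moves left (position cannot go below 0)
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == self.goal
        reward = 1.0 if done else -0.1  # small step cost, bonus at the goal
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode: observe S_t, choose A_t, receive R_{t+1} and S_{t+1}."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        trajectory.append(Transition(state, action, reward, next_state, done))
        state = next_state
        if done:
            break
    return trajectory


# A policy that always moves right reaches the goal in 5 steps
trajectory = run_episode(LineWorld(goal=5), policy=lambda s: 1)
```

The environment only reacts to actions and the policy only reacts to states; that separation is what lets the same loop drive a very different environment such as Super Mario.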
TOTAL REWARD
Return of Episode
Return of Episode with discount factor
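The slide's two formulas did not survive extraction; in standard textbook notation (assumed here) they are $G_t = R_{t+1} + R_{t+2} + \cdots + R_T$ and, with discount factor $\gamma$, $G_t = \sum_{k=0}^{T-t-1} \gamma^k R_{t+k+1}$. A small sketch computes both for a finished episode; the function names are hypothetical.

```python
def episode_return(rewards):
    """Undiscounted return: R_{t+1} + R_{t+2} + ... + R_T."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """Discounted return, computed backwards with G = r + gamma * G."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Working backwards avoids computing powers of gamma explicitly: each reward is discounted once per step that separates it from time t. With `gamma=1` the two functions agree.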
STATE-VALUE FUNCTION
State-value
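The slide's formula was lost in extraction. In standard notation (assumed here), the state-value function is the expected return when starting in state s and following policy π afterwards:

```latex
v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]
         = \mathbb{E}_\pi\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s \right]
```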
STATE-ACTION VALUE FUNCTION
State-Action value
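Likewise, in standard notation (assumed, since the slide's formula was lost), the state-action value function conditions on the first action as well; averaging it over the policy's action choices recovers the state-value function:

```latex
q_\pi(s, a) = \mathbb{E}_\pi\left[ G_t \mid S_t = s,\, A_t = a \right],
\qquad
v_\pi(s) = \sum_{a} \pi(a \mid s)\, q_\pi(s, a)
```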
OPTIMAL POLICY
Optimal State-Value function
Optimal State-Action value function
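In standard notation (assumed here), $v_*(s) = \max_\pi v_\pi(s)$ and $q_*(s, a) = \max_\pi q_\pi(s, a)$, and an optimal policy acts greedily with respect to $q_*$: $\pi_*(s) = \arg\max_a q_*(s, a)$, so $v_*(s) = \max_a q_*(s, a)$. The sketch below reads both off a small hypothetical Q-table stored as nested dictionaries.

```python
def greedy_policy(q):
    """pi*(s) = argmax_a q*(s, a) for a tabular Q given as {state: {action: value}}."""
    return {s: max(action_values, key=action_values.get) for s, action_values in q.items()}


def optimal_state_values(q):
    """v*(s) = max_a q*(s, a)."""
    return {s: max(action_values.values()) for s, action_values in q.items()}


# Hypothetical Q-table with two states and two actions
q_table = {
    's0': {'left': 1.0, 'right': 3.0},
    's1': {'left': 2.0, 'right': 0.5},
}
```

This is why learning q is enough for control: once the action values are known, the policy is a lookup.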
EXPLOITATION AND EXPLORATION
[Figure: the agent faces a choice: exploitation or exploration?]
IMPORTANCE OF EXPLORATION

[Figure: RussianBlue (2, Curiosity) and Munchkin (1, Food)]
[Figure: zero exploration vs. exploration]
IMPORTANCE OF EXPLORATION-2
[Figure: Fail]
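The failure of zero exploration can be reproduced on a toy two-armed bandit. The setup is hypothetical, not from the slides: arm 0 always pays 1, arm 1 always pays 2, and the agent keeps sample-average value estimates. A purely greedy agent (epsilon = 0) locks onto the first arm it tries and never learns that arm 1 is better; an epsilon-greedy agent eventually samples arm 1 and then exploits it.

```python
import random


def run_bandit(epsilon, steps=1000, seed=0):
    """Two-armed bandit with deterministic payouts: arm 0 pays 1, arm 1 pays 2."""
    payouts = [1.0, 2.0]
    rng = random.Random(seed)
    q = [0.0, 0.0]   # sample-average value estimate per arm
    counts = [0, 0]  # how often each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)                      # explore: random arm
        else:
            arm = max(range(2), key=lambda i: q[i])     # exploit: ties go to arm 0
        reward = payouts[arm]
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]       # incremental mean
    return q, counts
```

With `epsilon=0.0` the agent pulls arm 0 every time and its estimate for arm 1 stays at 0; with `epsilon=0.1` it discovers arm 1 and ends up choosing it most of the time.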
SUPERMARIO WITH REINFORCEMENT LEARNING
MARKOV DECISION PROCESS
[Figure: the agent-environment loop applied to Super Mario: action A_t, state S_t, reward R_{t+1}, next state S_{t+1}]
SUPERMARIO WITH R.L
Reward: +1
Penalty: -1
https://github.com/wonseokjung/gym-super-mario-bros
pip install gym-super-mario-bros

import gym_super_mario_bros
env = gym_super_mario_bros.make('SuperMarioBros-v0')
env.reset()
env.render()
INSTALL AND IMPORT ENVIRONMENT
WORLDS & LEVELS
[Figure: World 1, World 2, World 3, World 4]
env = gym_super_mario_bros.make('SuperMarioBros-<world>-<level>-v<version>')
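The ID pattern above can be filled in programmatically. `mario_env_id` is a hypothetical helper, and its bounds (worlds 1-8, levels 1-4, versions 0-3) are assumptions based on the original game's 8 worlds of 4 levels each; check the environment's documentation for the IDs it actually registers.

```python
def mario_env_id(world, level, version=0):
    """Build an ID of the form SuperMarioBros-<world>-<level>-v<version>."""
    # Assumed valid ranges; adjust if the installed environment differs
    if not 1 <= world <= 8:
        raise ValueError(f"world must be 1-8, got {world}")
    if not 1 <= level <= 4:
        raise ValueError(f"level must be 1-4, got {level}")
    if not 0 <= version <= 3:
        raise ValueError(f"version must be 0-3, got {version}")
    return f"SuperMarioBros-{world}-{level}-v{version}"
```

For example, `gym_super_mario_bros.make(mario_env_id(1, 1))` would request World 1, level 1; validating early turns a typo into a clear error instead of an unregistered-ID failure.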
GOAL
REWARD AND PENALTY
Reward
- Moving closer to the flag: +
- Reaching the goal: +
Penalty
- Failing to reach the goal: -
- As time passes: -
- Moving away from the flag: -
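As we understand the gym-super-mario-bros README, the environment's reward combines horizontal progress, a clock penalty, and a death penalty of -15, clipped to [-15, 15]; treat those constants as assumptions. The hypothetical `mario_reward` below mirrors the slide's list under that assumption. It is a sketch, not the environment's actual code, and it has no separate goal bonus: reaching the flag is rewarded through the progress term.

```python
def mario_reward(x_prev, x_now, clock_prev, clock_now, died):
    """Hypothetical per-step reward shaped like the slide's reward/penalty list."""
    v = x_now - x_prev                   # right (toward the flag) > 0, left < 0
    c = min(0, clock_now - clock_prev)   # the clock counts down, so time passing is < 0
    d = -15 if died else 0               # assumed death penalty
    return max(-15, min(15, v + c + d))  # assumed clipping range [-15, 15]
```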
STATE, ACTION
env.observation_space.shape
(240, 256, 3)  # [height, width, channel]
env.action_space.n
256
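from nes_py.wrappers import BinarySpaceToDiscreteSpaceEnv  # named JoypadSpace in newer nes_py
import gym_super_mario_bros

# Reduced action set: 7 button combinations instead of all 256
# (the same list ships as gym_super_mario_bros.actions.SIMPLE_MOVEMENT)
SIMPLE_MOVEMENT = [
    ['NOOP'],
    ['right'],
    ['right', 'A'],
    ['right', 'B'],
    ['right', 'A', 'B'],
    ['A'],
    ['left'],
]

env = gym_super_mario_bros.make('SuperMarioBros-v0')
env = BinarySpaceToDiscreteSpaceEnv(env, SIMPLE_MOVEMENT)  # action_space becomes Discrete(7)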
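Why 256 raw actions? The NES controller has 8 buttons, each pressed or not, giving 2^8 = 256 combinations. A discrete-action wrapper conceptually maps each index of a short list such as SIMPLE_MOVEMENT to one controller byte. The bit assignment below is an assumption for illustration, not taken from nes_py's source, and `to_bitmask` and `ACTION_TO_MASK` are hypothetical names.

```python
# Assumed NES controller bit layout (one bit per button); illustration only
BUTTON_BITS = {
    'right': 0b10000000, 'left': 0b01000000, 'down': 0b00100000, 'up': 0b00010000,
    'start': 0b00001000, 'select': 0b00000100, 'B': 0b00000010, 'A': 0b00000001,
    'NOOP': 0b00000000,
}

SIMPLE_MOVEMENT = [
    ['NOOP'], ['right'], ['right', 'A'], ['right', 'B'],
    ['right', 'A', 'B'], ['A'], ['left'],
]


def to_bitmask(buttons):
    """OR together the bits of every pressed button."""
    mask = 0
    for button in buttons:
        mask |= BUTTON_BITS[button]
    return mask


# Discrete action index -> controller byte
ACTION_TO_MASK = [to_bitmask(combo) for combo in SIMPLE_MOVEMENT]
N_RAW_ACTIONS = 2 ** (len(BUTTON_BITS) - 1)  # 8 real buttons -> 256 combinations
```

Shrinking 256 outputs to 7 meaningful ones gives the Q-network far fewer action values to learn.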
EXPLOITATION AND EXPLORATION
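import numpy as np

n_outputs = env.action_space.n  # number of discrete actions (7 with SIMPLE_MOVEMENT)

def epsilon_greedy(q_values, step):
    # Anneal epsilon linearly from eps_max to eps_min over eps_decay_steps
    epsilon = max(eps_min, eps_max - (eps_max - eps_min) * step / eps_decay_steps)
    if np.random.rand() < epsilon:
        return np.random.randint(n_outputs)  # Exploration: random action
    else:
        return np.argmax(q_values)  # Exploitation: action with the highest Q-value

next_state, reward, done, info = env.step(action)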
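To check the annealing schedule in isolation, here is a self-contained variant in which epsilon is passed in explicitly. It is a sketch: `linear_epsilon` is a hypothetical helper whose defaults copy the `eps_min = 0.1`, `eps_max = 1`, and `eps_decay_steps = 200000` values from the replay-memory slide, and action selection uses plain Python instead of NumPy.

```python
import random


def linear_epsilon(step, eps_min=0.1, eps_max=1.0, eps_decay_steps=200000):
    """Epsilon falls linearly from eps_max to eps_min, then stays at eps_min."""
    return max(eps_min, eps_max - (eps_max - eps_min) * step / eps_decay_steps)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore; otherwise pick the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                       # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploitation
```

Early in training the agent acts almost at random (epsilon near 1); halfway through the decay it explores about 55% of the time; after 200,000 steps it keeps exploring 10% of the time.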
REPLAY MEMORY BUFFER
from collections import deque

memory = deque([], maxlen=1000000)  # oldest transitions are discarded when full
next_state, reward, done, info = env.step(action)
memory.append((state, action, reward, next_state))  # one transition (S_t, A_t, R_{t+1}, S_{t+1})

eps_min = 0.1
eps_max = 1.0
eps_decay_steps = 200000
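Training draws random minibatches from the buffer so consecutive, highly correlated frames are not learned from in order. A minimal sketch, assuming uniform sampling as in the original DQN; `ReplayMemory` is a hypothetical class, and it stores a `done` flag in addition to the slide's (S_t, A_t, R_{t+1}, S_{t+1}) tuple so that terminal states can be handled when computing targets.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size FIFO buffer of transitions with uniform random sampling."""

    def __init__(self, maxlen):
        self.buffer = deque(maxlen=maxlen)  # oldest transitions fall off the left

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Draw without replacement, then regroup into per-field tuples
        batch = rng.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```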
MINIMIZE LOSS
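import tensorflow as tf  # TensorFlow 1.x API (tf.compat.v1 in TensorFlow 2.x)

# y: TD target R_{t+1} + gamma * max_a' Q(S_{t+1}, a'); Q_action: Q-value of the action taken
loss = tf.reduce_mean(tf.square(y - Q_action))
optimizer = tf.train.AdamOptimizer(learning_rate)
training_op = optimizer.minimize(loss)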
$$\left(R_{t+1} + \gamma_{t+1} \max_{a'} q_{\theta}(S_{t+1}, a') - q_{\theta}(S_t, A_t)\right)^2$$
computed on a sampled transition (S_t, A_t, R_{t+1}, S_{t+1})
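The target `y` that the loss compares against can be computed for a whole minibatch with NumPy. A sketch under assumptions: a constant `gamma` stands in for $\gamma_{t+1}$, which is set to zero at terminal transitions (a common reading of that notation), and `td_targets`/`td_loss` are hypothetical names.

```python
import numpy as np


def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """y = R_{t+1} + gamma * max_a' q(S_{t+1}, a'), with no bootstrap at terminal states."""
    max_next_q = next_q_values.max(axis=1)
    return rewards + gamma * max_next_q * (1.0 - dones)


def td_loss(q_values, actions, targets):
    """Mean squared TD error, the NumPy analogue of tf.reduce_mean(tf.square(y - Q_action))."""
    q_taken = q_values[np.arange(len(actions)), actions]  # q(S_t, A_t) for each row
    return float(np.mean((targets - q_taken) ** 2))
```

The targets are treated as constants; only `q_values` would receive gradients in the TensorFlow version.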
APPROXIMATE ACTION-VALUE
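The slides approximate q with a neural network. To show the same idea in the smallest form, the sketch below swaps in a linear approximator, $q_\theta(s, a) = \theta_a \cdot \phi(s)$, trained with a semi-gradient Q-learning step on the squared TD error. The feature vector `phi` and the class `LinearQ` are hypothetical; this is not the network used in the presentation.

```python
import numpy as np


class LinearQ:
    """q_theta(s, a) = theta[a] . phi(s): a linear stand-in for the Q-network."""

    def __init__(self, n_features, n_actions, lr=0.1, gamma=0.99):
        self.theta = np.zeros((n_actions, n_features))
        self.lr = lr
        self.gamma = gamma

    def q_values(self, phi):
        # One approximate action-value per action for feature vector phi(s)
        return self.theta @ phi

    def update(self, phi, action, reward, phi_next, done):
        # Semi-gradient Q-learning step on the squared TD error
        target = reward if done else reward + self.gamma * np.max(self.q_values(phi_next))
        td_error = target - self.q_values(phi)[action]
        self.theta[action] += self.lr * td_error * phi
        return td_error
```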
TRAINING: 1000 EPISODES, 3000 EPISODES
[Figure: agent after 1000 episodes; agent after 3000 episodes]
TRAINING: 5000 EPISODES
[Figure: agent after 5000 episodes]
SUMMARY
1. How Animals Learn
2. How Humans Learn
3. Reinforcement Learning
4. SuperMario with Reinforcement Learning
REFERENCES
1. Habituation: The Birth of Intelligence.
2. Law of effect: The Birth of Intelligence, p. 171.
3. Thorndike, E. L. (1905). The elements of psychology. New York: A. G. Seiler.
4. Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Psychological Monographs: General and Applied, 2(4), i-109.
5. Super Mario environment: https://github.com/Kautenja/gym-super-mario-bros
6. http://faculty.coe.uh.edu/smcneil/cuin6373/idhistory/thorndike_extra.html
Question?
Thank you
