
Introduction to
Reinforcement Learning
with Q-learning
Thomas Beucher
Data Scientist @Recast.AI
thomas.beucher@sap.com

Input: CAT → Artificial Neural Network
Prediction: It’s a Unicorn!
Learning: No, it’s a Cat!

Vocabulary:
● State
● Action
● Reward
● Policy = which action to choose in a given state to maximize reward

● State = which story we are on
● Action = empty 1, 3 or 4 stories
● Reward = whether we stop NIM or not
● Policy = $\arg\max_a Q(s, a)$
Here Q is a function of state s and action a that tells how good it is to take action a when we are in state s.

Q-table
            state 25   state 24   state 23   state 22   ...
action 1    0.0        0.0        0.0        0.0        0.0
action 3    0.0        0.0        0.0        0.0        0.0
action 4    0.0        0.0        0.0        0.0        0.0
Q(state 25, action 4) = how good it is to empty 4 stories when we are on story 25
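
As a minimal sketch of the Q-table and the greedy policy above, assuming Python and a plain dictionary: the names `ACTIONS`, `make_q_table` and `greedy_action` are illustrative and do not come from the slides. The rule that a move may not empty more stories than remain is also an assumption.

```python
ACTIONS = (1, 3, 4)      # number of stories emptied by one move
N_STATES = 25            # starting story, as in the table above


def make_q_table(n_states=N_STATES, actions=ACTIONS):
    """Return {(state, action): 0.0} for every state 1..n_states."""
    return {(s, a): 0.0 for s in range(1, n_states + 1) for a in actions}


def greedy_action(q, state, actions=ACTIONS):
    """Policy = argmax_a Q(state, a), restricted to moves that fit.

    Assumption: an action is only allowed when it does not empty more
    stories than remain. Ties go to the first action in `actions`.
    """
    legal = [a for a in actions if a <= state]
    return max(legal, key=lambda a: q[(state, a)])


q = make_q_table()
q[(25, 4)] = 1.0                 # pretend action 4 has been learned as good
print(greedy_action(q, 25))      # picks the action with the highest Q in state 25
print(greedy_action(q, 2))       # only action 1 fits in state 2
```

With every entry at 0.0 the greedy policy cannot tell actions apart, which is why the table has to be updated from experience before the policy becomes useful.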

Bellman’s equation:
● $Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha \left[ r(s_t, a_t) + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right]$
α = learning rate
γ = discount factor
New Q = Old Q + α [ Reward + γ · Max Q of next state - Old Q ]
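
The update rule above can be sketched as a single function. This is only an illustration under stated assumptions: the Q-table is a dictionary keyed by `(state, action)`, missing entries count as 0.0 (for example a terminal state), and the name `q_update` and its parameters are hypothetical.

```python
ACTIONS = (1, 3, 4)


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q-value.

    alpha is the learning rate, gamma the discount factor.
    """
    old_q = q.get((state, action), 0.0)
    # Max Q of the next state; entries that do not exist count as 0.0.
    max_next = max((q.get((next_state, a), 0.0) for a in actions), default=0.0)
    new_q = old_q + alpha * (reward + gamma * max_next - old_q)
    q[(state, action)] = new_q
    return new_q


# Illustrative numbers only, not taken from the slides.
q = {(2, 1): 10.0}
value = q_update(q, state=5, action=3, reward=1.0, next_state=2, actions=ACTIONS)
print(value)
```

The bracketed term is the gap between the current estimate and the target "reward plus discounted best next value"; the learning rate controls how much of that gap is closed in one step.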

learning rate = 0.5
discount factor = 0.9
Q(s1, a1) = 0 + 100

Q(s6, a1) = 0 + 0.5 * ( 0 + 0.9 * 0 - 0 ) = 0
Q(s6, a4) = 0 + 0.5 * ( 0 + 0.9 * 100 - 0 ) = 45
In each line, the value multiplied by 0.9 is the max Q of the next state.
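
The two updates for state s6 can be checked numerically. The sketch below assumes the same learning rate and discount factor as the slide. The helper name `new_q` is hypothetical, and the max Q of the next state (0 and 100) is read directly from the worked example rather than computed from a game.

```python
ALPHA = 0.5   # learning rate from the example
GAMMA = 0.9   # discount factor from the example


def new_q(old_q, reward, max_q_next):
    """New Q = Old Q + alpha * (reward + gamma * max Q of next state - Old Q)."""
    return old_q + ALPHA * (reward + GAMMA * max_q_next - old_q)


q_s6_a1 = new_q(old_q=0.0, reward=0.0, max_q_next=0.0)
q_s6_a4 = new_q(old_q=0.0, reward=0.0, max_q_next=100.0)
print(q_s6_a1, q_s6_a4)
```

Neither move earns an immediate reward, yet action 4 gets a positive value because it leads to a state whose best Q-value is already 100. This is how reward propagates backwards through the table.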

Applications of RL
● Wind Turbine Control
● Autonomous Vehicles
● Factory Automation
● Smart Grid
● Fault Detection and Isolation
● Process Planning
● Fleet logistics
● Camera Tuning
● Agriculture
● DDoS Attack Prevention

References
1. TED-Ed video: https://www.youtube.com/watch?v=qMFpOcLroOg
