Introduction to
Reinforcement Learning
with Q-learning
Thomas Beucher
Data Scientist @Recast.AI
thomas.beucher@sap.com
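Usual (Supervised) Machine Learning
Input: CAT (a picture of a cat), fed to an Artificial Neural Network
Prediction: "It's a Unicorn!"
Learning: "No, it's a Cat!" (the network is corrected with the true label)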
Reinforcement Learning
Vocabulary:
- State
- Action
- Reward
- Policy = which action to choose in a given state to maximize the reward
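The four terms fit together in one interaction loop: the policy picks an action for the current state, and the environment answers with a reward and a next state. The sketch below uses a hypothetical toy environment (a walk along positions 0 to 4, with a reward of 1 at the goal) only to illustrate the vocabulary. `step`, `policy`, and `run_episode` are illustrative names, not part of any library.

```python
# Hypothetical toy environment: positions 0..4 on a line, goal at position 4.
GOAL = 4


def step(state: int, action: int) -> tuple[int, float, bool]:
    """Environment: apply an action (-1 or +1), return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)  # stay inside 0..GOAL
    reward = 1.0 if next_state == GOAL else 0.0     # reward only when the goal is reached
    return next_state, reward, next_state == GOAL


def policy(state: int) -> int:
    """A fixed policy: always move right (+1), whatever the state."""
    return +1


def run_episode(start: int = 0, max_steps: int = 100) -> tuple[float, int]:
    """Run the policy from `start`; return (total reward, number of steps taken)."""
    state, total_reward = start, 0.0
    for t in range(1, max_steps + 1):
        action = policy(state)                      # policy: state -> action
        state, reward, done = step(state, action)   # environment: next state + reward
        total_reward += reward
        if done:
            return total_reward, t
    return total_reward, max_steps


print(run_episode())  # (1.0, 4)
```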
Practical case: the Rogue AI Riddle
- State = which story we are in
- Action = empty 1, 3 or 4 stories
- Reward = whether we stop NIM or not
- Policy = $\arg\max_a Q(s, a)$
Here Q is a function of the state s and the action a that tells how good it is to take action a when we are in state s.
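A minimal sketch of this setup treats the state as the story number and the actions as the number of stories emptied. The policy is then a greedy lookup in a Q table. The sketch assumes, although the slides do not say so, that a move may not empty more stories than remain. `legal_actions` and `greedy_policy` are hypothetical helpers.

```python
ACTIONS = (1, 3, 4)  # number of stories that can be emptied in one move


def legal_actions(state: int) -> list[int]:
    """Assumed rule: we cannot empty more stories than remain."""
    return [a for a in ACTIONS if a <= state]


def greedy_policy(q: dict[int, dict[int, float]], state: int) -> int:
    """Policy = argmax_a Q(s, a), restricted to legal actions."""
    return max(legal_actions(state), key=lambda a: q[state][a])


# Example: in story 6, emptying 4 stories has the highest Q value.
print(greedy_policy({6: {1: 0.0, 3: 0.0, 4: 45.0}}, 6))  # 4
```

Ties are broken in favor of the first action in `ACTIONS`, because Python's `max` returns the first maximal element it encounters.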
How do we create the Q function?
Q-table (all values initialized to 0.0):
           state 25   state 24   state 23   state 22   ...
action 1   0.0        0.0        0.0        0.0        0.0
action 3   0.0        0.0        0.0        0.0        0.0
action 4   0.0        0.0        0.0        0.0        0.0
Q(state 25, action 4) = how good it is to empty 4 stories when we are in story 25
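A Q-table like this one can be stored as a nested dictionary, indexed first by state and then by action. The sketch assumes the states run from story 25 down to story 1. The name `q_table` is illustrative.

```python
STATES = range(25, 0, -1)  # stories 25, 24, ..., 1 (assumed range)
ACTIONS = (1, 3, 4)        # number of stories emptied

# Q-table: one entry per (state, action) pair, all initialized to 0.0
q_table = {s: {a: 0.0 for a in ACTIONS} for s in STATES}

# Q(state 25, action 4): how good it is to empty 4 stories when we are in story 25
print(q_table[25][4])  # 0.0
```

A 2D array would work equally well. The dictionary keeps story numbers and action sizes as readable keys instead of index offsets.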
How do we update our Q-table?
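Bellman's equation (Q-learning update rule):
$$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha \left[ r(s_t, a_t) + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t) \right]$$
where $Q_{t+1}(s_t, a_t)$ is the new Q value, $Q_t(s_t, a_t)$ is the old Q value, $\alpha$ is the learning rate, $r(s_t, a_t)$ is the reward, $\gamma$ is the discount factor, and $\max_a Q_t(s_{t+1}, a)$ is the maximum Q value of the next state.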
Launch demo training
Learning rate α = 0.5, discount factor γ = 0.9, and Q(s1, a1) = 0 + 100 = 100:
Q(s6, a1) = 0 + 0.5 * (0 + 0.9 * 0 - 0) = 0 (max Q of the next state is 0)
Q(s6, a4) = 0 + 0.5 * (0 + 0.9 * 100 - 0) = 45 (max Q of the next state is 100)
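The two updates above can be checked with a direct transcription of the update rule. To avoid guessing the riddle's exact transitions, this sketch takes the next state's Q values as an argument instead of looking them up, which is a simplifying assumption. `q_update` is a hypothetical helper. An empty list stands for a terminal next state with no future value.

```python
def q_update(old_q: float, reward: float, next_q_values: list[float],
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """New Q = old Q + alpha * (reward + gamma * max Q(next state) - old Q)."""
    best_next = max(next_q_values, default=0.0)  # max_a Q_t(s_{t+1}, a); 0.0 if terminal
    return old_q + alpha * (reward + gamma * best_next - old_q)


print(q_update(0.0, 0.0, [0.0, 0.0, 0.0]))    # Q(s6, a1): next state's best Q is 0   -> 0.0
print(q_update(0.0, 0.0, [100.0, 0.0, 0.0]))  # Q(s6, a4): next state's best Q is 100 -> 45.0
```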
Show Demo Results
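The slides do not include the demo's code. The following is a minimal, self-contained sketch of Q-learning training on the riddle. It reuses the slides' learning rate (0.5) and discount factor (0.9). Everything else is an assumption made for illustration:

- Whoever empties the last story wins. Emptying it ourselves stops NIM (reward +100); NIM emptying it gives -100.
- NIM answers each move uniformly at random.
- The next state is the number of stories left after NIM's reply.
- Exploration is epsilon-greedy, and each episode starts from a random story so that every state gets visited.

```python
import random

ACTIONS = (1, 3, 4)
N_STORIES = 25
WIN, LOSS = 100.0, -100.0  # assumed rewards: we stop NIM / NIM empties the last story


def legal(state):
    """Assumed rule: a move cannot empty more stories than remain."""
    return [a for a in ACTIONS if a <= state]


def train(episodes=20000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in legal(s)} for s in range(1, N_STORIES + 1)}
    for _ in range(episodes):
        state = rng.randint(1, N_STORIES)  # random start so every state is visited
        while state > 0:
            # epsilon-greedy: explore sometimes, otherwise take argmax_a Q(s, a)
            if rng.random() < epsilon:
                action = rng.choice(legal(state))
            else:
                action = max(q[state], key=q[state].get)
            after_us = state - action
            if after_us == 0:  # we emptied the last story
                reward, next_state = WIN, 0
            else:  # assumed: NIM replies with a uniformly random legal move
                after_nim = after_us - rng.choice(legal(after_us))
                reward, next_state = (LOSS, 0) if after_nim == 0 else (0.0, after_nim)
            # Q-learning update; a terminal next state (0 stories) has no future value
            best_next = max(q[next_state].values()) if next_state > 0 else 0.0
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state = next_state
    return q


q = train()
for s in range(N_STORIES, 0, -1):
    print(s, max(q[s], key=q[s].get))  # story -> greedy number of stories to empty
```

NIM is modeled here as a random player, so the learned greedy policy targets that opponent. It is not necessarily the winning strategy against a perfect one.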
Applications of RL
- Wind Turbine Control
- Autonomous Vehicles
- Factory Automation
- Smart Grid
- Fault Detection and Isolation
- Process Planning
- Fleet Logistics
- Camera Tuning
- Agriculture
- DDoS Attack Prevention
Thank you :)
References
1. TED-Ed video: https://www.youtube.com/watch?v=qMFpOcLroOg
