This document provides an introduction to reinforcement learning with Q-learning. It defines key reinforcement learning concepts: state, action, reward, and policy. It uses the rogue AI riddle to illustrate these concepts, with states representing the story we are on, actions representing how many stories to empty, a reward for stopping the AI, and a Q-table used to learn the policy. The document explains how the Q-table is updated using Bellman's equation and shows a demo training run. It concludes by listing applications of reinforcement learning such as autonomous vehicles and smart grids.
8. State = which story we are on
Action = empty 1, 3, or 4 stories
Reward = whether we stop NIM or not
Policy = argmax_a Q(s, a)
Here Q is a function of state s and action a that tells how good it is to take action a when we are in state s.
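As a minimal sketch of the policy rule argmax_a Q(s, a): given the Q-values of one state, pick the action with the largest value. The function name greedy_action and the example Q-values are hypothetical, chosen only for illustration; they do not come from the slides.

```python
# The three actions from the slide: empty 1, 3, or 4 stories.
ACTIONS = (1, 3, 4)

def greedy_action(q_values):
    """Return the action a that maximizes Q(s, a) for a single state s.

    q_values maps each action to its Q-value for that state.
    Ties are broken in favor of the action listed first in ACTIONS.
    """
    return max(ACTIONS, key=lambda a: q_values[a])

# Made-up Q-values for one state (an assumption, not learned values).
q_for_state = {1: 0.2, 3: -0.5, 4: 0.7}
best = greedy_action(q_for_state)
print(best)
```

With these made-up values the policy chooses to empty 4 stories, because that action has the highest Q-value in the state.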
10. Q-table
            state 25   state 24   state 23   state 22   ...
action 1    0.0        0.0        0.0        0.0        0.0
action 3    0.0        0.0        0.0        0.0        0.0
action 4    0.0        0.0        0.0        0.0        0.0
Q(state 25, action 4) = how good it is to empty 4 stories when we are on story 25
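One way such a Q-table could be represented in code is a nested dictionary with every entry starting at 0.0. This is only a sketch: the helper make_q_table is hypothetical, and the assumption that states run from story 25 down to story 1 (with story 0 treated as terminal and given no row) is ours, since the slide elides the remaining columns.

```python
ACTIONS = (1, 3, 4)   # empty 1, 3, or 4 stories
N_STORIES = 25        # assumption: the top state is story 25

def make_q_table(n_stories=N_STORIES, actions=ACTIONS):
    """Build a Q-table with every Q(state, action) initialized to 0.0.

    States are listed from the top story down to story 1.
    """
    return {
        state: {action: 0.0 for action in actions}
        for state in range(n_stories, 0, -1)
    }

q_table = make_q_table()

# Q(state 25, action 4): how good it is to empty 4 stories on story 25.
value = q_table[25][4]
print(value)
```

Before any training every lookup returns 0.0, matching the all-zero table; training would then overwrite individual entries such as q_table[25][4].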