The King's Journey in Reinforcement Learning (强化学习的王者之旅)
Dr. 余方國 (Fang-Kuo Yu)
04/11/2023
https://www.imdb.com/title/tt0108065/?ref_=ttmi_tt
MuZero
AlphaZero
Gym
Atari Games
Pong, Breakout, Phoenix
https://www.gymlibrary.dev/
https://gymnasium.farama.org/
Reinforcement Learning Framework
[Figure: agent-environment loop. The AGENT sends an Action to the ENVIRONMENT; the ENVIRONMENT returns the next State and a Reward.]
(s1 → a1 → r1)→ (s2 → a2 → r2)→ (s3 → a3 → r3)→ …
Making Sequential Decisions to Maximize Long-Term Rewards
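"Long-term reward" is usually made precise as the discounted return G = r1 + γ·r2 + γ²·r3 + …, which weights rewards further in the future less. A minimal sketch, assuming a finite list of rewards and a discount factor `gamma` chosen for illustration (the slide does not specify one):

```python
def discounted_return(rewards, gamma=0.99):
    """Return r1 + gamma*r2 + gamma**2*r3 + ... for a finite episode."""
    g = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

A `gamma` close to 1 makes the agent far-sighted; a small `gamma` makes it favour immediate rewards.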
Atari Breakout in OpenAI Gym
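import gym  # gym >= 0.26 (or: import gymnasium as gym); Atari needs ale-py and the ROMs

env = gym.make("ALE/Breakout-v5", render_mode="human")
state, info = env.reset()  # state: the first (210, 160, 3) RGB frame
for index in range(1000):
    action = env.action_space.sample()  # random action; replace with a learned policy
    state, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:  # game over, or episode cut off by a time limit
        state, info = env.reset()
env.close()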
https://www.gymlibrary.dev/
https://gymnasium.farama.org/
State/Action/Reward in Atari Breakout
State:
● (210, 160, 3) RGB image
Action:
● 0 - NOOP
● 1 - FIRE
● 2 - RIGHT
● 3 - LEFT
Reward:
● Red - 7 points
● Orange - 7 points
● Yellow - 4 points
● Green - 4 points
● Aqua - 1 point
● Blue - 1 point
https://www.gymlibrary.dev/
https://gymnasium.farama.org/
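Because the reward is simply the point value of each destroyed brick, an episode's score is a plain sum. A small sketch; the colour names used as dictionary keys and the 18-bricks-per-row wall layout are assumptions for illustration, not taken from the slide:

```python
# Points per brick colour, as listed on the slide.
BRICK_POINTS = {"red": 7, "orange": 7, "yellow": 4, "green": 4, "aqua": 1, "blue": 1}

# Assumption (not from the slide): each of the six rows holds 18 bricks.
BRICKS_PER_ROW = 18


def score(bricks_hit):
    """Total reward for a sequence of brick colours destroyed by the ball."""
    return sum(BRICK_POINTS[colour] for colour in bricks_hit)


def full_wall_score(bricks_per_row=BRICKS_PER_ROW):
    """Reward for clearing every row of one wall."""
    return bricks_per_row * sum(BRICK_POINTS.values())


print(score(["blue", "blue", "green", "red"]))  # 1 + 1 + 4 + 7 = 13
print(full_wall_score())                        # 18 * 24 = 432
```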
From One Game to All The Games in Atari
https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark
A Journey to Artificial General Intelligence
https://www.assemblyai.com/blog/reinforcement-learning-with-deep-q-learning-explained/
https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark
DQN/2015
R2D2/2019
NGU/2019
Agent57/2020
OpenAI Gym Taxi-v3: State/Action/Reward
State:
● Number of variables: 1 (a single discrete state index)
● Range of the variable: [0, 499] (500 states)
● 25 taxi positions x 5 passenger locations (4 pickup points + in the taxi) x 4 destinations = 500
Action:
● 0: move south
● 1: move north
● 2: move east
● 3: move west
● 4: pick up passenger
● 5: drop off passenger
Reward:
● +20: delivering the passenger
● -10: executing pickup/drop-off illegally
● -1: per step unless another reward is triggered
https://www.gymlibrary.dev/environments/toy_text/taxi/
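The 500 states are one flattened index over the three factors. The sketch below uses a mixed-radix encoding that, to our understanding, matches the ordering in Gym's Taxi source (taxi row and column in 0..4, passenger location in 0..4 with 4 meaning "in the taxi", destination in 0..3); treat the exact ordering as an assumption and check the installed environment before relying on it:

```python
def encode(taxi_row, taxi_col, pass_loc, dest_idx):
    """Map (row, col, passenger location, destination) to one index in [0, 499]."""
    return ((taxi_row * 5 + taxi_col) * 5 + pass_loc) * 4 + dest_idx


def decode(state):
    """Invert encode() by peeling off the factors in reverse order."""
    state, dest_idx = divmod(state, 4)
    state, pass_loc = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, pass_loc, dest_idx


print(encode(3, 1, 2, 0))  # ((3*5 + 1)*5 + 2)*4 + 0 = 328
print(decode(328))         # (3, 1, 2, 0)
```

Because the encoding is a bijection onto 0..499, a tabular agent can use the index directly as a row number in its Q table.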
OpenAI Gym Taxi-v3: Q Table (500 states x 6 actions)
https://www.gocoder.one/blog/rl-tutorial-with-openai-gym
Q Learning (with epsilon greedy policy)
1. initialize Q table
2. state
3. exploitation
4. exploration
5. action
6. next state
7. reward
8. update Q table
https://www.cs.toronto.edu/~rgrosse/courses/csc311_f21/
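The eight steps can be sketched as tabular Q-learning with an epsilon-greedy policy. To keep the sketch self-contained it uses a hypothetical six-cell corridor environment (not from the slides) instead of Taxi-v3; the hyperparameters are illustrative. The update in step 8 is Q(s,a) ← Q(s,a) + α·[r + γ·max_a' Q(s',a') − Q(s,a)]:

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells.
# Action 0 moves left, action 1 moves right; reaching the right end gives +1 and ends the episode.
N_STATES, N_ACTIONS = 6, 2


def step(state, action):
    nxt = max(0, state - 1) if action == 0 else state + 1
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # 1. initialize Q table
    for _ in range(episodes):
        state, done = 0, False  # 2. state
        while not done:
            if rng.random() < epsilon:  # 4. exploration: random action
                action = rng.randrange(N_ACTIONS)
            else:  # 3. exploitation: greedy action, ties broken at random
                best = max(q[state])
                action = rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])
            nxt, reward, done = step(state, action)  # 5. action -> 6. next state, 7. reward
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])  # 8. update Q table
            state = nxt
    return q


q = q_learning()
print([row.index(max(row)) for row in q[:-1]])  # greedy action in each non-terminal cell
```

The same loop applies to Taxi-v3 with a 500 x 6 table and the environment's own step function.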
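Limitation of Q Table
● Representation: each state is a separate table row, so nothing learned about one state carries over to similar states
● Scalability: the table needs one entry per (state, action) pair, which is infeasible for large inputs such as raw game frames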
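Rough arithmetic makes the scalability point concrete. The sketch counts every possible 8-bit pixel array for a Breakout frame, which overstates the reachable states but shows why such a table cannot be enumerated, compared with Taxi-v3's small table:

```python
import math

# Tabular Q-learning stores one entry per (state, action) pair.
taxi_entries = 500 * 6  # Taxi-v3: 500 states x 6 actions

# A raw Breakout observation is a (210, 160, 3) array of 8-bit values,
# so there are at most 256 ** (210 * 160 * 3) distinct frames.
pixels = 210 * 160 * 3
digits = math.floor(pixels * math.log10(256)) + 1  # decimal digits of 256 ** pixels

print(taxi_entries)  # 3000
print(pixels)        # 100800
print(digits)        # 242751
```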
Deep Q Network (DQN) Architecture (1/2)
Ref : Human-level control through deep reinforcement learning
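A quick shape calculation for the network in the referenced Nature paper, which (as reported there) takes four stacked 84x84 grayscale frames, applies three unpadded convolutions (32 filters 8x8 stride 4, 64 filters 4x4 stride 2, 64 filters 3x3 stride 1), then a 512-unit fully connected layer and one linear output per action. This sketch only reproduces the arithmetic, not the network:

```python
def conv_out(size, kernel, stride):
    """Spatial output size of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]  # (filters, kernel, stride)

size = 84  # input: 84 x 84 x 4 stacked frames
for filters, kernel, stride in layers:
    size = conv_out(size, kernel, stride)
    print(f"conv {filters} x {kernel}x{kernel}, stride {stride} -> {size}x{size}x{filters}")

flat = size * size * layers[-1][0]  # features fed into the 512-unit fully connected layer
print(flat)  # 7 * 7 * 64 = 3136
```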
Deep Q Network (DQN) Architecture (2/2)
Ref : Massively Parallel Methods for Deep Reinforcement Learning
Deep Q Learning (with experience replay and main/target networks)
1. initialize replay memory
2. initialize main network
3. initialize target network
4. choose actions with an epsilon-greedy policy from the main network
5. store the transition in replay memory
6. sample a batch from replay memory
7. calculate the error between main-network Q values and targets computed with the target network
8. periodically synchronize the target network with the main network
Ref : Human-level control through deep reinforcement learning
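The replay memory and the target computation can be sketched in plain Python. `ReplayMemory`, `td_targets`, and `toy_target_q` are hypothetical names for illustration; `toy_target_q` stands in for the target network. In a full implementation the loss is the squared difference between the main network's Q(s, a) and these targets, and the main network's weights are copied into the target network every fixed number of updates:

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall out first

    def store(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):  # uniform random minibatch
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(batch, target_q, gamma=0.99):
    """y = r if done, else r + gamma * max_a' Q_target(s', a')."""
    return [r if done else r + gamma * max(target_q(s2)) for _, _, r, s2, done in batch]


def toy_target_q(state):
    """Hypothetical stand-in for the target network: fixed Q values for 2 actions."""
    return [0.5, 2.0]


memory = ReplayMemory(capacity=3)
for t in range(5):
    memory.store(t, 0, 1.0, t + 1, t == 4)
print(len(memory))  # 3: only the newest three transitions are kept
batch = memory.sample(2, rng=random.Random(0))
print(td_targets(batch, toy_target_q, gamma=0.5))
```

Sampling uniformly from a large buffer breaks the correlation between consecutive frames, and the slowly updated target network keeps the regression targets from shifting after every step.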
Deep Q Network (DQN) on Breakout
Artificial Intelligence and the Future - Demis Hassabis/DeepMind
https://youtu.be/zYII3AOSgo8?t=2236
Deep Q Network (DQN) Benchmark
Ref : Human-level control through deep reinforcement learning
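Benchmarks like this one compare agents across games with very different score scales by normalizing against a random policy and a human tester: 100 × (agent − random) / (human − random). A minimal sketch; the scores in the example are made up:

```python
def human_normalized_score(agent, random_play, human):
    """Percent of the random-to-human gap closed by the agent (100 = human level)."""
    return 100.0 * (agent - random_play) / (human - random_play)


# Made-up scores for illustration only.
print(human_normalized_score(agent=50, random_play=10, human=30))  # 200.0
```

A value above 100 means the agent beats the human reference on that game; a value near 0 means it plays no better than random.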
Four Tough Games in Atari
Pitfall, Solaris, Skiing, Montezuma’s Revenge
Problems: long-term credit assignment and the exploration/exploitation trade-off
Solutions: intrinsic motivation, a meta-controller, short-term/episodic memory, distributed agents, etc.
https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark
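Intrinsic motivation adds a bonus for visiting novel states, so the agent keeps exploring even when the game's own reward is sparse. The sketch below uses a simple count-based bonus β/√N(s); this is a swapped-in textbook technique, not the mechanism of NGU or Agent57 (which rely on episodic memory and learned embeddings), and `CountBonus` is a hypothetical name:

```python
import math
from collections import Counter


class CountBonus:
    """Intrinsic reward beta / sqrt(N(s)): large for rarely visited states."""

    def __init__(self, beta=0.1):
        self.beta = beta
        self.counts = Counter()

    def __call__(self, state):
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])


bonus = CountBonus(beta=1.0)
extrinsic = 0.0  # sparse-reward games often give no game reward for long stretches
print([extrinsic + bonus("room_1") for _ in range(4)])  # bonus shrinks on repeat visits
```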
Distributed Reinforcement Learning
Agent57
Gorila
https://arxiv.org/abs/2003.13350
https://arxiv.org/abs/1507.04296
How Well Can Agent57 Do?
https://www.deepmind.com/blog/agent57-outperforming-the-human-atari-benchmark
Deep Q Network and Brain Activity
Policy Gradient on Atari Pong
https://www.youtube.com/watch?v=tqrcjHuNdmQ
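Policy gradient methods adjust the policy directly, raising the probability of actions that led to high return. A minimal REINFORCE sketch on a hypothetical one-step, two-action task (a stand-in for Pong's UP/DOWN choice), using a tabular softmax policy instead of a neural network; rewards, learning rate, and episode count are illustrative:

```python
import math
import random


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards=(0.0, 1.0), episodes=2000, lr=0.1, seed=0):
    """REINFORCE on a one-step task: theta += lr * G * grad log pi(a)."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # action preferences of a softmax policy
    for _ in range(episodes):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        ret = rewards[action]  # return G of this one-step episode
        # For a softmax policy, d log pi(a) / d theta_k = 1[k == a] - pi(k)
        for k in range(len(theta)):
            theta[k] += lr * ret * ((1.0 if k == action else 0.0) - probs[k])
    return softmax(theta)


print(reinforce_bandit())  # probability mass shifts toward the rewarding action 1
```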
Reinforcement Learning at DeepMind
https://analyticsindiamag.com/all-hail-the-king-of-reinforcement-learning-deepmind/
Mastering Go at DeepMind
https://analyticsindiamag.com/all-hail-the-king-of-reinforcement-learning-deepmind/
A Journey to Artificial General Intelligence
https://www.deepmind.com/blog/muzero-mastering-go-chess-shogi-and-atari-without-rules
https://www.youtube.com/watch?v=lVMgxtm5L-U
AlphaGo, AlphaGo Zero, AlphaZero, MuZero
AlphaGo, Nature, 2016
AlphaGo Zero, Nature, 2017
AlphaZero, Science, 2018
MuZero, Nature, 2020
AlphaGo Fan/Lee/Master
● European Go Champion Fan Hui (5:0)
● South Korean professional Go player Lee Sedol (4:1)
● Online games with players from China/Korea/Japan (60:0)
● Chinese professional Go player Ke Jie (3:0)
https://www.youtube.com/watch?v=lVMgxtm5L-U
https://www.youtube.com/watch?v=LX8Knl0g0LE
https://www.youtube.com/watch?v=WXuK6gekU1Y
AlphaGo Inputs and Policy/Value Networks
/ckmarkohchang/alphago-in-depth
AlphaGo Monte Carlo Tree Search
/ckmarkohchang/alphago-in-depth
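During tree search each simulation descends by picking the child that balances its mean value Q against a prior-weighted exploration bonus U. A sketch of the PUCT selection rule in the form published for AlphaGo Zero and AlphaZero, U(s,a) = c_puct · P(s,a) · √(Σ_b N(s,b)) / (1 + N(s,a)); the dictionary layout of `children` and the `c_puct` value are assumptions for illustration:

```python
import math


def puct_select(children, c_puct=1.5):
    """Pick the action maximizing Q(s,a) + U(s,a).

    children: dict action -> {"N": visit count, "W": total value, "P": prior prob}
    """
    total_visits = sum(child["N"] for child in children.values())

    def score(child):
        q = child["W"] / child["N"] if child["N"] > 0 else 0.0  # mean value so far
        u = c_puct * child["P"] * math.sqrt(total_visits) / (1 + child["N"])
        return q + u

    return max(children, key=lambda a: score(children[a]))


children = {
    "a": {"N": 10, "W": 6.0, "P": 0.2},
    "b": {"N": 0, "W": 0.0, "P": 0.6},
    "c": {"N": 2, "W": 0.2, "P": 0.2},
}
print(puct_select(children))  # unvisited "b" wins thanks to its high prior
```

As visits accumulate, the bonus shrinks and selection is increasingly driven by the measured value Q.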
AlphaZero Training Process
Self-Play
Train Value Network
Train Policy Network
https://www.youtube.com/watch?v=lVMgxtm5L-U
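In AlphaGo Zero and AlphaZero both training targets come from self-play: the game outcome z trains the value head, and the MCTS visit distribution π trains the policy head, combined in one loss (z − v)² − πᵀ log p plus an L2 weight penalty. A minimal per-position sketch that omits the weight penalty; the numbers in the example are made up:

```python
import math


def alphazero_loss(z, v, pi, p):
    """Per-position loss (z - v)^2 - sum(pi * log p), without the L2 weight penalty."""
    value_loss = (z - v) ** 2
    # Skip zero-probability targets so 0 * log(0) never occurs
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    return value_loss + policy_loss


print(alphazero_loss(z=1.0, v=0.5, pi=[0.7, 0.3], p=[0.5, 0.5]))  # 0.25 + ln 2
```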
AlphaZero Network for Chess
Ref: Acquisition of Chess Knowledge in AlphaZero
AlphaGo
● Two networks: a policy network and a value network
● Conv/ReLU-based layer structure
AlphaZero
● One network with two heads: policy and value
● ResNet-based layer structure
AlphaGo Zero Performance Benchmark
https://thirdeyedata.ai/how-to-build-your-own-alphazero-ai-using-python-and-keras/
MuZero Training Process
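h: representation (past observations → initial hidden state)
f: prediction (hidden state → policy and value)
g: dynamics (hidden state and action → reward and next hidden state)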
Ref: Mastering Atari, Go, chess and shogi by planning with a learned model
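MuZero plans entirely inside its learned model: h encodes the observations once, then g and f are applied repeatedly along a sequence of actions. A sketch of that unroll with hypothetical toy stand-ins for h, g, and f (real MuZero uses neural networks for all three):

```python
def unroll(h, g, f, observations, actions):
    """Unroll a learned model for len(actions) steps; returns (reward, policy, value) per step."""
    state = h(observations)                # representation: s0 = h(o_1, ..., o_t)
    outputs = [(None,) + tuple(f(state))]  # step 0 has no predicted reward
    for action in actions:
        reward, state = g(state, action)   # dynamics: r_k, s_k = g(s_{k-1}, a_k)
        policy, value = f(state)           # prediction: p_k, v_k = f(s_k)
        outputs.append((reward, policy, value))
    return outputs


# Hypothetical toy functions, for illustration only.
def toy_h(observations):
    return sum(observations)


def toy_g(state, action):
    return (1.0 if action == 1 else 0.0), state + action


def toy_f(state):
    return [0.5, 0.5], float(state)


for step in unroll(toy_h, toy_g, toy_f, observations=[1, 2], actions=[1, 0, 1]):
    print(step)
```

Because g predicts rewards and next states itself, the search never needs the game's real rules, which is what lets MuZero use the same method for board games and Atari.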
MuZero Performance Benchmark
Ref: Mastering Atari, Go, chess and shogi by planning with a learned model
MuZero for Self-Driving Car at Tesla
https://www.youtube.com/watch?v=j0z4FweCy4M
AlphaGo to AlphaStar by David Silver
Deep Reinforcement Learning from AlphaGo to AlphaStar - London Machine Learning Meetup
MuZero
AlphaZero
Gym
Deep Reinforcement Learning
Artificial General Intelligence
Presentation slides
fangkuoyu@gmail.com
Blog
