Rise of the Reinforcement Learning
Wonseok Jung
螳

伎襴暑 - Baruch college (Data Science Major)
ConnexionAI Researcher
CTRL (Contest in RL) 襴
DeepLearningCollege 螳 郁規
Project : Object Detection, Chatbot, Reinforcement Learning
Github:
https://github.com/wonseokjung
Facebook:
https://www.facebook.com/ws.jung.798
Blog:
https://wonseokjung.github.io/
Contents
1. Create Environments
2. Multi-Agent Environment
3. Adversarial self-play
4. Imitation Learning
5. Curriculum Learning
CREATING ENVIRONMENTS
- 螳讌 蟆曙 蠏 蟆曙 襷 螳 螻襴讀  覲伎.
-  螻殊 螳讌 伎螳 覦.
Environment: Algorithm
OpenAI-gym: DQN
Supermario: DDQN (tuned)
Sonic: Rainbow DQN (tuned)
OpenSim: DDPG
QUESTIONS
讌覓 :
譯殊伎 蟆曙  願 蟠蠍 覓語襯 蠍 
蟆曙 襷 蟆 螳ロ蟾?
旧螳 譴殊  覦覯 蟾?
Issues : 
1. 旧螳 覓企覓 る蟇碁Π.
I 8, 1080蠍一 :
OpenAI GYM : 豕 5覿 ~ 殊殊 伎
SuperMario Level 1 : 6 
Sonic : OpenAI 螻 覯  : 7螳
Prosthetics : 1 伎 
2. 螳旧 旧 .
螻牛 蟆暑 蟆 螳ロ.
UNITY ML-AGENTS
- Unity襯  螳語 螳 蟆曙 襷れ . 
-  Machine Learning Agents 蠍磯レ朱 旧 覲企 螻殊願 蟆 螳ロ.
MULTI-AGENT ENVIRONMENT
MULTI-AGENTS?
-Intelligent human agents()  るジ agents 覲企ゼ 螻旧.
-覲企ゼ 螻旧覃 cooperation() 蟇磯 Independent(襴曙朱) 蟆 覈襯 
燕.
MULTI-AGENTS
- 豐  Agents螳 .
-  覦襯  penalty襯 覦朱 , 碁 覦襯  Reward襯 覦
.
- 螳 Agent 襴暑 Brain 螳讌螻 朱 襴曙朱 action .
TRAINING USING IMITATION LEARNING
-  Agents Independent蟆 碁Banana襯 谿剰鍵 螳讌 action 
覃 覦一企.
ADVERSARIAL SELF-PLAY
ADVERSARIAL LEARNING?
- 螻牛給 覈襯 燕蠍  覲企ゼ 螻旧覃 Cooperation  讌襷, 覲旧, 豢蟲, 
蟲,  炎骸 螳 麹螳 ろ 蟆曙磯 .
ADVERSARIAL LEARNING
-  螻 k striker 螻 襷 Goalkeeper襦 蟲焔 .
[Figure labels: Striker, Goalkeeper (two of each), Object]
ADVERSARIAL LEARNING
- Striker Goalkeeper ク 螻旧 j 襷蠍  Cooperation覃 .
[Figure labels: Striker vs. GoalKeeper (VS); Striker and Goalkeeper (Coop)]
IMITATION LEARNING
IMITATION LEARNING?
- 螻 覓殊 企 覈覓殊 覲願 蠏碁れ  behavior 覲願 覦一企.
- 蟆   覈覦覃 覦一磯 覦覯 Imitation Learning企手 .
TRAINING USING IMITATION LEARNING
[Figure labels: Agent1, Agent2; Ball, Ball; Gravity]
Initialization
- 螻旧 譴レ  伎覃, 螳 Agent 螻旧 覦 覦 Agent 朱 蟆狩.
- 覦 豺 Agent 螻旧 覲願 れ 蠍企.
TRAINING USING IMITATION LEARNING
[Figure labels: Agent1, Agent2; Start Training; Action1, Action2, Action3]
- Agent 螳讌 action 覃  襷 Reward襯 覦  action 
- 企 覦 給 螳 襷 .
TRAINING WITHOUT IMITATION LEARNING
- 襦覺 蟇穴 郁 襷蟆螻 螳 企れ 蟆曙 旧螳 襷れ 蠍碁 一る曙
 覓語螳 覦.
TRAINING USING IMITATION LEARNING
Imitation Learning
Teacher Student
-  觜襯願 螻殊朱 覦一語 蟆 覓瑚  覲願 覦一磯 Imitation Learning
.
- Student Teacher 覲願 覦一
- Teacher(Player)  action 覃 student螳 觜襴 覦一語 蟆 譴.
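The Teacher and Student setup named on this slide can be illustrated with behavioral cloning, one simple form of imitation learning. The block below is a minimal sketch and not the method used in the talk: the state names, action names, and the helpers `clone_teacher` and `student_act` are hypothetical, and a table lookup stands in for the neural network a real agent would train.

```python
from collections import Counter, defaultdict


def clone_teacher(demonstrations):
    """Build a state -> action table from (state, action) pairs recorded
    while the teacher plays. For each state, the student copies the action
    the teacher chose most often."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: actions.most_common(1)[0][0]
            for state, actions in counts.items()}


def student_act(policy, state, default_action):
    """Pick the cloned action, falling back to a default for unseen states."""
    return policy.get(state, default_action)


# Hypothetical demonstrations: the teacher (a human player) tracks a ball.
teacher_demos = [
    ("ball_left", "move_left"),
    ("ball_left", "move_left"),
    ("ball_left", "stay"),
    ("ball_right", "move_right"),
]

student_policy = clone_teacher(teacher_demos)
print(student_act(student_policy, "ball_left", "stay"))
```

The student never sees a reward here; it only matches the teacher's choices, which is why demonstrations can speed up early training.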
CURRICULUM LEARNING
CURRICULUM LEARNING
-  企 覦襦 襴 蟆 螳ロ讌 .
- れ螻, 蠍郁, 螻, 蟇穴, 襴 蟆豌 螻覲襦 牛.
- 企  覦覯 螳旧 Curriculum Learning企手 .
CURRICULUM LEARNING
- Agent 螳 task覿 覦一郁鍵  螻覲襦 旧 . 
- 覯 企れ task襯 牛蠍 り鍵 覓語  螻襯  旧 .
ENVIRONMENTS
[Figure labels: Agent, Goal, Wall]
- Agent Goal 谿蠍  action .
- Wall れ螳讌 企 覦.
[Figure labels: Action1, Action2]
TRAINING USING IMITATION LEARNING
-  agent small wall螻 large wall 狩蟇磯 一企 覈襦 螳蠍
 牛.
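The small-wall and large-wall progression mentioned above can be sketched as a lesson schedule that moves to a harder lesson once the agent does well enough on the current one. This is a minimal sketch under stated assumptions: the `Curriculum` class, the wall heights, the reward thresholds, and the averaging window are all hypothetical and are not taken from the slides or from any library.

```python
from collections import deque


class Curriculum:
    """Advance to the next lesson once the mean reward over a full window
    of recent episodes reaches that lesson's threshold."""

    def __init__(self, wall_heights, thresholds, window=100):
        # One threshold per transition between consecutive lessons.
        assert len(thresholds) == len(wall_heights) - 1
        self.wall_heights = wall_heights
        self.thresholds = thresholds
        self.lesson = 0
        self.recent = deque(maxlen=window)

    @property
    def wall_height(self):
        """Wall height the environment should use for the current lesson."""
        return self.wall_heights[self.lesson]

    def record(self, episode_reward):
        """Store one episode's reward and return the (possibly new) lesson."""
        self.recent.append(episode_reward)
        is_last = self.lesson == len(self.wall_heights) - 1
        window_full = len(self.recent) == self.recent.maxlen
        if not is_last and window_full:
            mean_reward = sum(self.recent) / len(self.recent)
            if mean_reward >= self.thresholds[self.lesson]:
                self.lesson += 1
                self.recent.clear()  # measure the new lesson from scratch
        return self.lesson


# Hypothetical schedule: no wall, small wall, large wall.
curriculum = Curriculum(wall_heights=[0.0, 4.0, 8.0],
                        thresholds=[0.5, 0.8], window=3)
for reward in [1.0, 1.0, 1.0]:
    curriculum.record(reward)
print(curriculum.lesson, curriculum.wall_height)
```

In a training loop, the environment would be reset with `curriculum.wall_height` at the start of each episode, and `record` would be called with the episode's total reward.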
SUMMARY
1. Create Environments
2. Multi-Agent Environment
3. Adversarial self-play
4. Imitation Learning
5. Curriculum Learning
SUMMARY
Github:
https://github.com/wonseokjung
Facebook:
https://www.facebook.com/ws.jung.798
Blog:
https://wonseokjung.github.io/
螳.
