
Global AI Bootcamp Seoul
3D Environment HomeNavi
Language, Vision & Action
Yechan (Paul) Kim

HomeNavi Introduction
RL approach
- Value-based
- Policy Search
- Evolution Strategy
- …
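Here is a minimal tabular Q-learning update that makes the value-based branch concrete. It is an illustrative sketch and not part of HomeNavi. The corridor states, the action names, and the `alpha`/`gamma` values are hypothetical.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, done=False):
    """Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    # No bootstrap from the next state if the episode ended.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical 1-D corridor: states 0..3, reaching state 3 gives reward 1.
ACTIONS = ("left", "right")
q = defaultdict(float)
print(q_learning_update(q, 2, "right", 1.0, 3, ACTIONS, done=True))  # 0.1
```

Repeated updates carry the goal reward backwards through the corridor. Deep value-based methods replace the table with a neural network.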
RL approach? ??
????? ??? ???? ??….
??? ????

Motivating paper
Target-driven Visual Navigation in Indoor Scenes using
Deep Reinforcement Learning (Y. Zhu et al., 2016)
??? ??!!

Mobile Robot
A mobile robot is a robot that is capable of locomotion. (Wikipedia)

?, ???
Model-base? ??
RL????
??? ???
?????!!!

Domain skills
- Actions
  - Camera motion
  - Robotics / Manipulation
  - APIs
- Vision
  - Image / video understanding
  - 3D environment perception
- Language
  - Instruction following
  - Question answering
  - Dialog

LV&A
- Language
  - Embedding
  - RNN
  - Attention
  - …
- Vision
  - CNN
  - YOLO extensions
  - …
- Action
  - Actor-Critic
  - Value-based approach
  - Policy optimization
  - HRL (Hierarchical RL)
  - …

Environments
- DeepMind Lab
- AI2-THOR
- MINOS
- Matterport3D

Navigation with Vision & RL
UNREAL (UNsupervised REinforcement and Auxiliary Learning) agent
Environment
- DeepMind Lab
Sensory Inputs
- Image
Auxiliary Task
- Pixel Control
- Reward Prediction
- Value Function Replay
Control
- A3C

Learning to Navigate in Complex Environments
Environment
- DeepMind Lab
Sensory Inputs
- Image
Auxiliary Task
- Depth prediction
- Loop Closure prediction
Control
- A3C
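The loop-closure auxiliary task needs a binary label at each step that says whether the agent has returned to a place it visited earlier. The sketch below shows one way to derive such labels from the agent's 2D positions. The `near`/`far` thresholds and the exact rule are assumptions for illustration, not the paper's settings.

```python
import math


def loop_closure_labels(positions, near=1.0, far=3.0):
    """Label step t as 1 if the agent is back within `near` of an earlier
    position p_j after having moved more than `far` away from p_j in between."""
    labels = []
    for t, p_t in enumerate(positions):
        label = 0
        for j in range(t):
            if math.dist(p_t, positions[j]) > near:
                continue
            # Only count it as a loop if the agent actually left p_j's neighbourhood.
            if any(math.dist(positions[k], positions[j]) > far for k in range(j + 1, t)):
                label = 1
                break
        labels.append(label)
    return labels


path = [(0, 0), (2, 0), (4, 0), (2, 0), (0, 0)]
print(loop_closure_labels(path))  # [0, 0, 0, 0, 1]
```

The quadratic scan over past positions is fine for short episodes. A spatial hash would be the natural next step for long ones.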

Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning
Environment
- AI2-THOR
Sensory Inputs
- Image
- Target image
Control
- Siamese Network
- Actor-critic
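The Siamese part of this model embeds the current observation and the target image with shared weights, which places both in the same feature space before the actor-critic layers. The toy sketch below uses a single shared linear+ReLU layer over plain Python lists. The paper works on pretrained CNN features, and `make_encoder` and its sizes are hypothetical.

```python
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Return one linear + ReLU encoder; its weights are shared by both branches."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(x):
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return encode


def siamese_fuse(encode, observation, target):
    """Embed observation and target with the SAME encoder, then concatenate."""
    return encode(observation) + encode(target)


encode = make_encoder(in_dim=4, out_dim=3)
state = siamese_fuse(encode, [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0])
print(len(state))  # 6: three features for the observation, three for the target
```

Because the weights are shared, identical inputs produce identical halves. That property lets the policy compare "where I am" with "where I want to be".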

????
??? ?? ??? ?? ???
?? ???
? ?…

Example
Reinforcement Learning with Unsupervised
Auxiliary Tasks (M. Jaderberg et al., 2016)
??? ??!!

Example
????
?? ???
???
?? ???…

Example: Vision-based

What is Language Grounding?
?? Vision??? ?????
??? ???? Agent?
???? Language?
??? ? ?? ??? ???…

What is Language Grounding?
Pick up a cup
Go to the bedroom
Empty the trash can
Go to the kitchen
Wash dishes
…
…

Multi-Modality Representation
Language??? Vision???
??? ?? ???? ???? ?? ??? ? ??? ??!

Navigation with Vision, Language (Instructions) & RL
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
Environment
- MALMO
Sensory Inputs
- Image : RGB image
- Instruction : Analogy Making
Hierarchical Structure
- Parameterized Skills
- Meta Controller
- Pointer
Control
- Actor-Critic(GAE)
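GAE (Generalized Advantage Estimation) blends n-step advantage estimates using an exponential weight λ. The sketch below handles a single rollout. The `dones` convention (episode ended after step t) and the default γ, λ values are assumptions.

```python
def gae(rewards, values, last_value, gamma=0.99, lam=0.95, dones=None):
    """Generalized Advantage Estimation over one rollout.

    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    A_t     = delta_t + gamma * lam * A_{t+1}   (reset at episode ends)
    """
    if dones is None:
        dones = [False] * len(rewards)
    adv = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        running = delta + gamma * lam * not_done * running
        adv[t] = running
        next_value = values[t]
    return adv


print(gae([0.0, 0.0, 1.0], [0.0, 0.0, 0.0], last_value=0.0, gamma=0.5, lam=1.0))
# [0.25, 0.5, 1.0]
```

With λ = 1 the result equals discounted returns minus values, and with λ = 0 it reduces to the one-step TD error. Values in between trade variance for bias.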

Grounded Language Learning in a Simulated 3D World
Environment
- DeepMind Lab
Sensory Inputs
- Image : RGB image
- Instruction
“green object next to the red object”
Auxiliary Tasks
- UNREAL
- Temporal AutoEncoder(tAE)
- Language Prediction(LP)
State Representation
- Concat
Control
- A3C

Gated-Attention Architectures for Task-Oriented Language Grounding
Environment
- ViZDoom
Sensory Inputs
- Image : RGB image
- Instruction (template)
“Go to the tallest red pillar”
- Gated-Attention module (attention-based)
Control
- A3C
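Gated-attention does not simply concatenate the image and instruction vectors. Instead, it multiplies each channel of the convolutional image features by a sigmoid gate computed from the instruction embedding. The sketch below uses plain Python lists. The shapes and the single projection matrix `weights` are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_attention(image_features, instruction_embedding, weights):
    """Gate each feature-map channel with the instruction.

    image_features: C x H x W nested lists (CNN output)
    instruction_embedding: length-D list (e.g. a recurrent encoder's final state)
    weights: C x D matrix mapping the instruction to one gate per channel
    """
    gates = [sigmoid(sum(w * e for w, e in zip(row, instruction_embedding)))
             for row in weights]
    # The same gate value scales every spatial position of its channel.
    return [[[g * v for v in row] for row in channel]
            for g, channel in zip(gates, image_features)]


features = [[[1.0, 2.0]], [[3.0, 4.0]]]  # C=2, H=1, W=2
out = gated_attention(features, [1.0], [[0.0], [100.0]])
print(out)  # channel 0 scaled by 0.5, channel 1 passed through (gate ~1.0)
```

A gate near 0 switches a feature channel off for this instruction, and a gate near 1 keeps it. This is how words like "red" or "tallest" can select visual features.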

Building Generalizable Agents with a Realistic and Rich 3D Environment
Environment
- House3D (SUNCG-based)
Sensory Inputs
- Image
- RGB only
- RGB + Depth
- Mask + Depth
- Instruction
“Go to Kitchen”
Gated-Attention Module (attention-based)
Control
- A3C, DDPG
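DDPG, listed here alongside A3C, stabilizes its critic with target networks that track the online networks slowly. The sketch below shows that soft (Polyak) target update on a flat parameter list. Representing parameters as a plain list and the default `tau` are assumptions.

```python
def soft_update(target_params, online_params, tau=0.001):
    """theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]


target = [0.0, 1.0]
online = [1.0, 1.0]
print(soft_update(target, online, tau=0.5))  # [0.5, 1.0]
```

A small `tau` makes the bootstrapped critic targets change slowly, which is the point of the update.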

?????
Language? ???
Agent? ???? ??
??? ? ??…

Language(Instructions) & RL Example
Zero-Shot Task Generalization with Multi-Task
Deep Reinforcement Learning (Oh et al., 2017)
??? ??!!

?????
?? ????
???
????…

Vision- and Language-based

Question Answering
???? Agent?
Language? ???? ?? ??
??? ?? ???? ???
??? ??? ???…

Language(QA) & RL
IQA: Visual Question Answering in Interactive Environments
Environment
- AI2-THOR
Sensory Inputs
- Image
- IQUAD dataset
“Is there a cup in the microwave?”
Hierarchical Structure
- Hierarchical Interactive Memory Network
- Planner
- Semantic Memory
- Submodules
- ????? ??(ex. YOLO)
Control
- A3C, HIMN

Language(QA) & RL
Embodied Question Answering
Environment
- House3D (SUNCG-based)
Sensory Inputs
- Image
- RGB image
- Segmentation mask
- Depth
- Instruction
“What color is the car?”
- Navigation
- Pretraining, then fine-tuning with REINFORCE
- Question, Answering
- EQA dataset
Control
- A3C, PACMAN
- Imitation Learning
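EQA first pretrains the navigator and then fine-tunes it with REINFORCE. The sketch below computes the Monte-Carlo policy-gradient loss for one episode. The constant `baseline` is a simplification, since a learned value baseline is common. In practice `log_probs` would be autograd tensors rather than floats.

```python
def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def reinforce_loss(log_probs, rewards, gamma=0.99, baseline=0.0):
    """loss = -sum_t log pi(a_t | s_t) * (G_t - b); its gradient is the REINFORCE estimate."""
    returns = discounted_returns(rewards, gamma)
    return -sum(lp * (g - baseline) for lp, g in zip(log_probs, returns))


print(reinforce_loss([-1.0, -1.0], [0.0, 1.0], gamma=0.5))  # 1.5
```

Subtracting a baseline leaves the expected gradient unchanged but reduces its variance. This matters when the navigation reward is sparse.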

Language(QA) & RL
?? ?????
Question Answering
? ???? agent?
???? ?? ??? ? ?…

Language(QA) & RL Example
IQA: Visual Question Answering in
Interactive Environments (D. Gordon et al., 2017)
??? ??!!

Question Answering?
??? ????
?? ???
?? ???…

Vision- and Language (QA)-based

RL Korea HomeNavi
??? RL_Korea
HomeNavi????
? ?????
?????? ?? ??,
??? ??? ?? ??…..
?? ????!!!
????? ??? ??
??? ?????
(?? ?? ??…???? ??)
Reinforcement Learning Korea

RL Korea & Modulabs
Reinforcement Learning Korea
??? ???
LV&A Lab

Lab ??…
??? ???
LV&A Lab
??? ?????..
????

2018 Global AI Bootcamp Seoul: HomeNavi (Reinforcement Learning, AI)
