AI-based Robotic Manipulation
Akihiko Yamaguchi(*1)
*1 Grad Schl of Info Sci, Tohoku University
The latest version of these
slides is available at:
http://akihikoy.net/p/rsj18.pdf
Goal of This Talk
Introducing my work on AI-based Robotic
Manipulation
Discussing AI applications in robot industry
Target: Researchers and engineers who understand
basic theory of robotics and machine learning
Can we build a robot that cooks like this?
If not, what is missing?
 AI/Software?
 Hardware?
 Sensors?
Everyday Manipulations are Difficult for Robots
Folding clothes
Cleaning
Cooking
Bathing
Dressing

Japanese way of folding T-shirts: https://youtu.be/b5AWQ5aBjgE
Chinese cooking skills: https://youtu.be/PFGGTPPNdRQ
What are the Difficulties?
Handling variations of:
Dynamics
 Non-rigid objects (deformable, fragile, irregular shape, ...)
 E.g. vegetables, meats, liquids, cloths, ...
 No good dynamical models
Situations
 Initial state, object properties, context, ...
 Each vegetable has different shape
Tasks
 Humans do many tasks every day
Hardware capability: Robot << Human body
Feasible tasks of robots << Feasible tasks of humans
Humans have much better hands (& sensors) than robots
What is Artificial Intelligence?
Many different methods are called AI
Optimization
Machine Learning
 Supervised learning
 Unsupervised learning
 Reinforcement learning
Reasoning
 Search
 Motion planning

None of the above is AI
OR all programs are AI ("if x>0 then y else z" is the simplest AI)
AI can do many things
Many (AI) methods are
developed for many tasks
Why is AI Useful in Robotic Manipulation?
Handling variations:
Learning dynamics and tasks
Adapting to new situations
Generalizing the behavior to new situations and tasks
Machine Learning: Tools for adaptation
Reasoning: Tools for generalization
Optimization: Most fundamental tools
AI is a solution to handle variations
Hardware vs. AI (Software)
General robot arms: Available
6+ DoF arms
General robot hands: Not available
Existing dexterous robot hands do not cover the tasks that humans
can do
Good vision: Available
Good cameras
Good tactile sensors: Not available
No de-facto standard tactile sensors
General AI (for manipulation) research needs
General Hardware (arms, hands, sensors)
*But we don't know what general hardware is
AI for Robotic Manipulation
Goal: Finding a policy to perform a given task
Dynamic Programming & Reinforcement Learning
Moving forward, Tasty,
I am satisfied, ...
Dynamic Programming when {Fk} are given
Reinforcement Learning when {Fk} are unknown
Robotic manipulation is generally formulated
as a reinforcement learning problem
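To make the "when {Fk} are given" case concrete, here is a minimal sketch of finite-horizon dynamic programming in Python. The toy line-world, the reward function, and all names (`backward_dp`, `step`, `reward`) are assumptions for illustration and do not come from the talk. When the dynamics are unknown, the call to `F` is what a reinforcement learning method has to replace with experience.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = int
Action = int


def backward_dp(
    states: Sequence[State],
    actions: Sequence[Action],
    F: Callable[[int, State, Action], State],  # known stage dynamics F_k(x, u)
    R: Callable[[int, State, Action], float],  # assumed stage reward
    horizon: int,
) -> Tuple[Dict[State, float], List[Dict[State, Action]]]:
    """Backward recursion: V_k(x) = max_u [ R_k(x, u) + V_{k+1}(F_k(x, u)) ]."""
    value = {x: 0.0 for x in states}  # value after the final stage
    policy: List[Dict[State, Action]] = []
    for k in reversed(range(horizon)):
        new_value, stage_policy = {}, {}
        for x in states:
            # Pick the action with the best one-step lookahead value.
            best_u = max(actions, key=lambda u: R(k, x, u) + value[F(k, x, u)])
            new_value[x] = R(k, x, best_u) + value[F(k, x, best_u)]
            stage_policy[x] = best_u
        value = new_value
        policy.insert(0, stage_policy)
    return value, policy


# Hypothetical toy problem: five cells on a line, reward for being in cell 4.
STATES = range(5)
ACTIONS = (-1, 0, 1)


def step(k: int, x: State, u: Action) -> State:
    return min(max(x + u, 0), 4)  # move and clamp to the line


def reward(k: int, x: State, u: Action) -> float:
    return 1.0 if step(k, x, u) == 4 else 0.0


value, policy = backward_dp(STATES, ACTIONS, step, reward, horizon=6)
```

Here `value[x]` is the best achievable sum of rewards from cell `x` over six stages, and `policy[k][x]` is the action chosen at stage `k`.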
Hypothesis?
If we have a general reinforcement learning method,
robots can learn any (robotic manipulation) tasks
Yamaguchi et al. "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
Deep Reinforcement Learning
Deep learning: With big data, a neural network (NN) can learn
any I/O mapping with any precision. We don't have to care
about how large the state space is. It can directly
handle images as input without designing
features.
Deep RL: By using deep NNs to represent policies,
dynamical models, or value functions, Deep RL can
handle large state spaces with big data.
Deep Reinforcement Learning
(T-L) Learning to play Atari games by Google DeepMind, Mnih et al. 2015: https://youtu.be/cjpEIotvwFY
(T-R) DeepMPC Robotic Experiments - PR2 cuts food, Lenz et al. 2015: https://youtu.be/BwA90MmkvPU
(B-L) Learning to grasp from 50K Tries, Pinto et al. 2016: https://youtu.be/oSqHc0nLkm8
(B-R) Learning hand-eye coordination for robotic grasping, Levine et al. 2017: https://youtu.be/l8zKZLqkfII
Rajeswaran, Kumar, Gupta, Schulman, Todorov, Levine: Learning Complex Dexterous Manipulation with Deep
Reinforcement Learning and Demonstrations
https://sites.google.com/view/deeprl-dexterous-manipulation
What is Good AI for Robotic Manipulation?
Task achievement (sum of rewards)
Learning speed (Number of samples)
Key axes to measure AI:
(from Leslie Kaelbling's talk at ICRA 2016)
Adaptability
Generalization ability
Scalability
[Figure: learning curve, Performance vs. Experience]
A learning curve is used to measure
the learning performance
What is a Promising Approach?
Deep (Reinforcement) Learning?
End-to-End Learning?
Imitation Learning?

No promising approach has been proposed yet
Baxter peels banana https://youtu.be/rEeixPBd3hc
Hypothesis: AI-based Robot Manipulation
Library of skills is essential
Having many alternative strategies
Reasoning and learning are core tools
Structured knowledge should be introduced
Skills, dynamics models, policies, ...
Unified approach is the way to go
Hybrid of model-based and model-free
Multiple representations: continuous, primitive, symbolic
(and, unexplored stuff: e.g. Perception skills)
Library of skills is essential
http://reflectionsintheword.files.wordpress.com/2012/08/pouring-water-into-glass.jpg
http://schools.graniteschools.org/edtech-canderson/files/2013/01/heinz-ketchup-old-bottle.jpg
http://old.post-gazette.com/images2/20021213hosqueeze_230.jpg
http://img.diytrade.com/cdimg/1352823/17809917/0/1292834033/shampoo_bottle_bodywash_bottle.jpg
http://www.nescafe.com/upload/golden_roast_f_711.png
Pouring Behavior with Skill Library
Skill library
flow ctrl (tip, shake, ...), grasp, move arm, ...
 -> State machines (structure, feedback control)
Planning methods
grasp, re-grasp, pouring locations, feasible trajectories, ...
 -> Optimization-based approach
Learning methods
Skill selection -> Table, Softmax
Parameter adjustment (e.g. shake axis) -> Optimization (CMA-ES)
Improve plan quality -> Improve value functions
[Yamaguchi et al. IJHR 2015]
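The "skill selection with a table and softmax" item can be sketched as follows. This is a minimal illustration under assumptions, not the implementation from the cited paper: the value table, the temperature, the learning rate, and the function names are hypothetical, and the skill names `tip` and `shake` are borrowed from the slide only as labels. Parameter adjustment with CMA-ES is not shown.

```python
import math
import random
from typing import Dict


def softmax_probs(values: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Turn a table of skill values into selection probabilities."""
    top = max(values.values())  # subtract the max for numerical stability
    exps = {k: math.exp((v - top) / temperature) for k, v in values.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def select_skill(values: Dict[str, float], temperature: float = 1.0, rng=random) -> str:
    """Sample one skill according to the softmax probabilities."""
    threshold = rng.random()
    cumulative = 0.0
    chosen = ""
    for chosen, p in softmax_probs(values, temperature).items():
        cumulative += p
        if threshold < cumulative:
            break
    return chosen  # falls back to the last skill on rounding error


def update_value(values: Dict[str, float], skill: str, reward: float, lr: float = 0.3) -> None:
    """Move the table entry of the executed skill toward the observed reward."""
    values[skill] += lr * (reward - values[skill])


# Hypothetical value table for two flow-control skills.
table = {"tip": 1.0, "shake": 0.0}
probs = softmax_probs(table)
picked = select_skill(table, rng=random.Random(0))
update_value(table, "shake", reward=1.0, lr=0.5)
```

A lower temperature makes the selection greedier; a higher one keeps exploring skills whose table value is still low.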
Sharing Knowledge Among Robots
The same implementation
worked on PR2 and Baxter
PR2 and Baxter:
Diff: Kinematics, grippers
Same: Arm DoF, sensors
Sharable knowledge:
Skills
Behavior structure
Not sharable:
Policy parameters
How Good is This AI?
Scalability
Framework: Applicable to many tasks (to be verified)
Skills: Reusable in many contexts (to be verified)
Adaptability
Adapted skill parameters and skill selections in a few episodes
Simple machine learning and optimization tools worked
Generalization ability
Generalized behaviors over traditional robotic manipulations
(e.g. grasping & moving containers)
Could not generalize over non-rigid-objects (liquids)
Reinforcement Learning with Skill Library
for Generalization
Reinforcement Learning with Skill Library
Components:
Library of skills
 Skill = Parameterized Policy
Behavior graph
 Graph consisting of skills
 Execution: Need to decide skill parameters and skill selections
Dynamics models
 Partially known, partially unknown
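One way to picture these components as data structures is sketched below. The class names, fields, and the example graph are assumptions made for illustration; the talk does not specify an API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A policy maps (state, skill parameters) to a command; all three are plain dicts here.
Policy = Callable[[Dict[str, float], Dict[str, float]], Dict[str, float]]


@dataclass
class Skill:
    """Skill = parameterized policy."""
    name: str
    policy: Policy
    default_params: Dict[str, float] = field(default_factory=dict)

    def act(self, state: Dict[str, float], **overrides: float) -> Dict[str, float]:
        params = {**self.default_params, **overrides}
        return self.policy(state, params)


@dataclass
class BehaviorGraph:
    """Graph consisting of skills; edges list the skills selectable next."""
    skills: Dict[str, Skill]
    edges: Dict[str, List[str]]

    def candidates(self, current: str) -> List[str]:
        return self.edges.get(current, [])


# Hypothetical tipping skill: tilt the container at a given angle.
def tip_policy(state: Dict[str, float], params: Dict[str, float]) -> Dict[str, float]:
    return {"tilt": params["angle"]}


tip = Skill("tip", tip_policy, {"angle": 0.5})
graph = BehaviorGraph(
    skills={"tip": tip},
    edges={"grasp": ["move_arm"], "move_arm": ["tip", "shake"]},
)
command = tip.act({"amount": 0.0}, angle=0.8)
```

Executing the behavior then amounts to deciding, at each node, which candidate skill to run and with which parameters.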
Model-based RL vs. Model-free RL
[Figure: spectrum of reinforcement learning methods]
Method: Direct Policy Search | Value Function-based | Model-based
What is learned: Policy | Value Functions | Forward Models
Learning complexity: RL | RL | SL
Planning depth: 0 | 1 | N
Other labels: [Reinforcement Learning], [Model-free], [Dynamic Programming], [Optimization]
Model-free tends to obtain better performance
[Kober,Peters,2011] [Kormushev,2010]
Model-free is robust in POMDP
Yamaguchi et al. "DCOB: Action space for reinforcement learning of high DoF robots", Autonomous Robots, 2013
https://www.youtube.com/playlist?list=PL41MvLpqzOg8FF0xekWT9NXCdjzN_8PUS
POMDP: Partially Observable Markov Decision Process
Model-based suffers from simulation biases
Simulation bias: When forward models are inaccurate (which is usual
when the models are learned), integrating the forward models causes a rapid
increase of future state estimation errors
cf. [Atkeson,Schaal,1997b][Kober,Peters,2013]
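The error growth described above can be reproduced with a toy example. The scalar linear system and its coefficients below are assumptions chosen only to show the effect: a model that is slightly wrong over one step becomes badly wrong when it is integrated over many steps.

```python
from typing import List


def rollout(gain: float, x0: float, steps: int) -> List[float]:
    """Integrate the scalar system x_{t+1} = gain * x_t for a number of steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(gain * xs[-1])
    return xs


TRUE_GAIN = 1.10   # hypothetical real dynamics
MODEL_GAIN = 1.12  # hypothetical learned forward model, slightly off

true_traj = rollout(TRUE_GAIN, 1.0, 20)
model_traj = rollout(MODEL_GAIN, 1.0, 20)

# Prediction error of the model at each step of the rollout.
errors = [abs(m - t) for m, t in zip(model_traj, true_traj)]
one_step_error = errors[1]
final_error = errors[-1]
```

With these numbers the 20-step prediction error is more than a hundred times the one-step error, even though the model looks accurate when judged one step at a time.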
Model-based is good at generalization
[Figure: FK ANN with input, hidden, and output layers; update of u]
Learning inverse kinematics of android face
[Magtanong, Yamaguchi, et al. 2012]
Model-based is good at sharing / reusing
learned components
Forward models are sharable / reusable
Analytical models can be combined
Model-based is flexible to reward changes
Model-based RL for Graph-Structured Dynamics
Model-based reinforcement learning
How to deal with simulation biases?
Do not try to learn dx/dt = F(x,u) (dt: small like xx ms)
Learn (sub)task-level dynamics
 Parameters -> F_grasp -> Grasp result
 Parameters -> F_flow_ctrl -> Flow ctrl result
Use stochastic models
 Gaussian -> F -> Gaussian
Use stochastic dynamic programming
 Stochastic Differential DP (DDP)
How to work with a skill library?
Dynamic Programming for graph-structured
dynamical systems
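The "Gaussian -> F -> Gaussian" idea can be sketched with first-order moment propagation through a chain of (sub)task-level models. This is an assumption-laden illustration, not the stochastic neural network of the cited work: the scalar models `f_grasp` and `f_flow_ctrl`, their coefficients, and the model-noise variances are all hypothetical.

```python
from typing import Callable, Tuple


def propagate(
    mean: float,
    var: float,
    f: Callable[[float], float],
    model_var: float = 0.0,
    eps: float = 1e-5,
) -> Tuple[float, float]:
    """Push a Gaussian N(mean, var) through f using a local linearization.

    Output mean is f(mean); output variance is f'(mean)^2 * var plus the
    model's own prediction variance (model_var).
    """
    slope = (f(mean + eps) - f(mean - eps)) / (2.0 * eps)  # central difference
    return f(mean), slope * slope * var + model_var


# Hypothetical (sub)task-level models: parameters -> grasp result -> flow result.
def f_grasp(p: float) -> float:
    return 0.8 * p + 0.1


def f_flow_ctrl(g: float) -> float:
    return g * g


grasp_mean, grasp_var = propagate(0.5, 0.01, f_grasp, model_var=0.002)
flow_mean, flow_var = propagate(grasp_mean, grasp_var, f_flow_ctrl, model_var=0.001)
```

Because each model covers a whole subtask, only a few propagation steps are chained, and the variance carried along tells a planner how much to trust the predicted outcome.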
Model-based RL for Graph-Structured Dynamics
Learning Unknown Dynamical Systems with Stochastic Neural Networks
Planning Actions with Graph-DDP
Stochastic Neural Networks
[Yamaguchi and Atkeson, ICRA 2016]
Graph-DDP
[Yamaguchi and Atkeson, Humanoids 2015, 2016]
Works on real robots
Pouring Simulation with OpenDE
Achieved GENERALIZATION
over material variation and
container shapes
AI Approach for Robot Industry
When do Human-level Robots Show Up?
Breakthroughs so far
Image recognition with deep learning
Machine translation

Breakthroughs needed for robotic manipulation
Perception for manipulation
 Liquid recognition (e.g. D. Fox), Component recognition, Quantity estimation (e.g. Burgard),
Deformation recognition, ...
Integration of structured knowledge (skill library, ...)
Hybrid of model-based and model-free RL
Multiple representations: continuous, primitive, symbolic
Reasoning about failure recovery
Hardware for general manipulation (robot hands, tactile sensors, tools for robots)

Many breakthroughs are still needed in AI for robotic
manipulation. Human-level ability is still far off.
Method-driven vs. Idea-driven vs. Task-driven
Method-driven
Starting point is a method (AI)
Idea-driven
Starting point is an idea (AI-based technology)
Many current deep learning applications are of this type
Task-driven
Starting point is a task
AI might not be the best way
On-site vs. Off-site, On-line vs. Off-line
On-site: Using AI in the field
Off-site: Using AI outside the field
On-line: Sampling and learning simultaneously
Off-line: Sampling and learning separately
7 Things to Know Before Using AI
Why does AI work? Because an AI engineer carefully designed the
task through trial and error. If an AI engineer doesn't know the
task well, he/she cannot apply AI to the task.
No AI covers all tasks.
AI is too wide an area. It's difficult to find a superman who covers all
methods (machine learning, reasoning, optimization, ...) and all
domains (robotics, computer vision, natural language processing, ...).
Guaranteeing the completion of a task is hard.
Guaranteeing the generalization of learned models is hard.
In many robotic applications, improving hardware >> AI solution
(e.g. adding sensors, improving mechanisms).
Humans are underrated (Elon Musk). AI and robots are overrated.
Emphasizing the Importance of Tactile Sensing
[Figure labels: Proximity vision, Force, Slip, Tactile]
Optical Skin Sensor was useful to automate cutting behavior
Future AI x Robotics for Robotic Manipulation
It will take many years for AI and robots to acquire
human-level manipulation ability
Need a lot of fundamental research
Unifying many theories: AI, Robotics, Control, Computer Vision
Need general robot hands with tactile sensors
Education is important for sustainable development
Should increase the number of robotics x AI researchers
Case studies and competitions will boost the research
DRC, ARC, RoboCup, WRS, XPRIZE, ...
Household activities (e.g. robot cooking), assembly, ...