3D SLAM introduction &
current status
Jan. 2017
Jacky Liu
Contents
1. What is SLAM
2. Sensor
3. Open-source SLAM projects
4. Applications
2016/12/13 2
What is SLAM
SLAM framework
Visual odometry
Optimization
Loop closure
Map reconstruction
SLAM pre-processing
What is SLAM
• SLAM (Simultaneous Localization and Mapping)
SLAM pipeline (PTAM)
Sensor and 3D imaging tech
Sensor
Laser
• Accurate
• Fast
• Long research history
• Heavy
• Expensive
• Ex. SICK, Velodyne, RPLIDAR
Camera
• Lightweight
• Cheap
• Rich information
• High computational complexity
• Strong assumptions about the environment
• Easily affected by noise
• Ex. Mono, Stereo, RGB-D
3D imaging tech.
• Monocular vision
• Stereo vision
• Structured light
• Time-of-Flight (TOF)
3D imaging tech.
                Mono               Stereo                    Structured-light  TOF
Software        complex            complex                   normal            simple
Hardware cost   low                low                       high              normal
Response time   normal             normal                    slow              fast
Depth accuracy  low                low                       high              normal
Low lighting    bad                bad                       good              good
Bright light    bad                good                      bad               good
Distance        Not limited        Not limited               Medium            Medium
Adv.            Low cost           x                         High accuracy     Robust to env. light
Dis.            Scale uncertainty  Computational complexity  Slow response     Limited detection distance
Visual Odometry
Visual Odometry
• Estimate camera ego-motion from two sequential images
  • Input: points in space projected into the two camera views
  • Output: ego-motion between the two camera poses
  • Monocular: only pixel positions, no depth info.
  • Stereo, RGB-D: depth info is available directly
• Dimensions
  • 2D-2D: estimate motion from two sets of pixels (epipolar geometry)
  • 3D-2D: estimate motion from known 3D positions and their projected positions (PnP)
  • 3D-3D: estimate motion from two sets of known 3D points (ICP)
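The 2D-2D case can be illustrated with a small sketch of the epipolar constraint x2^T E x1 = 0, where E = [t]x R. The rotation, translation and points below are hypothetical values chosen only for illustration, and the helper names are not taken from any SLAM library. A correct correspondence gives a residual near zero; a wrong one does not.

```python
import math

def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) * v == t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def rot_y(angle):
    """Rotation about the camera y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def transform(R, t, p):
    """Express a point given in camera-1 coordinates in camera-2 coordinates."""
    rp = matvec(R, p)
    return [rp[i] + t[i] for i in range(3)]

def normalized(p):
    """Pinhole projection onto the z = 1 plane (focal length 1)."""
    return [p[0] / p[2], p[1] / p[2], 1.0]

def epipolar_residual(E, x1, x2):
    """Value of x2^T E x1; zero for a geometrically consistent pair."""
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))

# Hypothetical relative motion between the two frames (assumed values).
R = rot_y(math.radians(5.0))
t = [0.2, 0.0, 0.05]
E = matmul(skew(t), R)  # essential matrix

points_cam1 = [[0.5, -0.2, 4.0], [-1.0, 0.3, 6.0], [0.1, 0.8, 3.0]]
residuals = [
    epipolar_residual(E, normalized(p), normalized(transform(R, t, p)))
    for p in points_cam1
]
# Pair point 0 in image 1 with point 1 in image 2: a wrong match.
mismatch = epipolar_residual(
    E, normalized(points_cam1[0]), normalized(transform(R, t, points_cam1[1]))
)
print(residuals, mismatch)
```

In practice the problem runs the other way: E is estimated from many matched pixels and then decomposed into R and t.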
Visual Odometry
• Problems of monocular camera
  • Scale is ambiguous; initialization parameters must be supplied
  • Cannot estimate motion under pure rotation
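The scale problem can be seen in a minimal sketch: scaling the whole scene and the camera translation by the same factor leaves every projected pixel unchanged, so images alone cannot reveal the true scale. The camera model (identity rotation, focal length 1) and all numbers are assumptions made for illustration.

```python
def project(point, camera_position):
    """Pinhole projection (focal length 1) for a camera with identity
    rotation located at `camera_position`."""
    x, y, z = (point[i] - camera_position[i] for i in range(3))
    return (x / z, y / z)

# Hypothetical scene points and camera translation between two frames.
scene = [(0.5, -0.2, 4.0), (-1.0, 0.3, 6.0), (0.1, 0.8, 3.0)]
motion = (0.2, 0.0, 0.1)

scale = 3.0  # any positive factor gives the same result
scaled_scene = [tuple(scale * c for c in p) for p in scene]
scaled_motion = tuple(scale * c for c in motion)

original_pixels = [project(p, motion) for p in scene]
scaled_pixels = [project(p, scaled_motion) for p in scaled_scene]

# The two pixel lists agree up to floating-point rounding.
max_difference = max(
    abs(a - b)
    for pa, pb in zip(original_pixels, scaled_pixels)
    for a, b in zip(pa, pb)
)
print(max_difference)
```

This is why a monocular system needs an external cue, such as an initialization parameter or a known baseline, to fix the scale.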
Visual Odometry - Feature/Direct
Visual Odometry - Feature (Mainstream)
• Feature
  • SIFT, ORB
  • Key-point, Descriptor
• Algorithm procedure
  1. Extract feature points and descriptors from the image
  2. Match the current image against the previous image
  3. Minimize the reprojection error to estimate camera ego-motion (PnP, ICP)
• Disadvantages
  • Feature extraction is time-consuming
  • Feature extraction can fail
  • Mismatches
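Step 2 of the procedure (matching) can be sketched for binary descriptors such as ORB, which are compared by Hamming distance. The 8-bit descriptors, the `max_distance` threshold and the `ratio` test value below are hypothetical; real ORB descriptors are 256 bits and thresholds are tuned per system.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def match_descriptors(prev_desc, curr_desc, max_distance=2, ratio=0.8):
    """Brute-force matching of current descriptors against previous ones.

    A match is kept only if the best distance is small enough and clearly
    better than the second-best candidate (ratio test).
    Returns (current_index, previous_index, distance) tuples.
    """
    matches = []
    for i, d in enumerate(curr_desc):
        ranked = sorted((hamming(d, p), j) for j, p in enumerate(prev_desc))
        best_dist, best_j = ranked[0]
        if best_dist > max_distance:
            continue  # nothing similar enough in the previous image
        if len(ranked) > 1 and best_dist > ratio * ranked[1][0]:
            continue  # ambiguous: second-best is almost as good
        matches.append((i, best_j, best_dist))
    return matches

# Toy 8-bit descriptors (assumed values, for illustration only).
prev_desc = [0b10110010, 0b01001101, 0b11110000]
curr_desc = [0b01001111, 0b10110011, 0b00111100]

matches = match_descriptors(prev_desc, curr_desc)
print(matches)  # [(0, 1, 1), (1, 0, 1)]
```

The third current descriptor is 4 bits away from every previous descriptor, so it is rejected rather than forced into a wrong match.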
Visual Odometry - Direct
• Photometric invariance assumption
  • Sensitive to lighting changes
  • No feature extraction
• Current systems
  • Sparse direct: SVO
  • Semi-dense direct: LSD-SLAM
  • Dense direct: DVO-SLAM
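A one-dimensional sketch of the direct idea: instead of matching features, search for the motion that minimizes the photometric (intensity) error between two images. The intensity rows and the integer-shift search are simplifying assumptions; real systems optimize a 6-DOF pose over many pixels. The last lines show why the method is sensitive to lighting: a uniform brightness change breaks the photometric invariance assumption and the residual no longer reaches zero.

```python
def photometric_error(reference, current, shift):
    """Mean squared intensity difference when reference pixel x is
    assumed to appear at x + shift in the current image."""
    total, count = 0.0, 0
    for x, intensity in enumerate(reference):
        x_cur = x + shift
        if 0 <= x_cur < len(current):
            total += (current[x_cur] - intensity) ** 2
            count += 1
    return total / count if count else float("inf")

def estimate_shift(reference, current, max_shift=3):
    """Exhaustive search for the shift with the lowest photometric error."""
    return min(range(-max_shift, max_shift + 1),
               key=lambda s: photometric_error(reference, current, s))

# Toy image rows (assumed values): the pattern moves 2 pixels to the right.
reference = [10, 10, 50, 90, 50, 10, 10, 10, 10, 10]
current = [10, 10, 10, 10, 50, 90, 50, 10, 10, 10]

best_shift = estimate_shift(reference, current)
print(best_shift, photometric_error(reference, current, best_shift))

# Same motion, but the whole image is 40 intensity levels brighter.
brighter = [v + 40 for v in current]
error_after_lighting_change = photometric_error(reference, brighter, 2)
print(error_after_lighting_change)
```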
Visual Odometry
                               Feature                                             Direct
Tracking                       Feature descriptors (100-1000 corners or surfaces)  Pixels
Reconstruction                 Corners                                             Whole image
Computational complexity       Low                                                 Sparse: low; dense: high
Robust to model inconsistency  Yes                                                 No
History                        20+ years                                           Since 2012
Outliers                       Robust                                              Hard to remove
Visual Odometry - Conclusion
• Visual Odometry
  • Match the pixels of feature points
  • Compute camera ego-motion from the matching result
  • Estimate the positions of feature points in the global map
• Imperfect
  • Result is noisy => Global optimization
  • Error accumulates => Loop closure
  • Tracking loss (camera moving too fast or being blocked) => Re-localization
Optimization
Optimization
• Two major methods
  • Filter: Kalman Filter, EKF, PF, RBPF, UKF
  • Non-linear optimization: graph optimization, factor graph
• Graph optimization software packages
  • g2o, iSAM
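Graph optimization can be sketched on a toy 1D pose graph: nodes are poses, edges are relative measurements, and the optimizer finds the poses that minimize the sum of squared edge errors. The measurements are hypothetical, and plain gradient descent stands in here for the Gauss-Newton or Levenberg-Marquardt solvers that packages such as g2o provide; this is not the g2o API.

```python
def optimize_pose_graph(num_poses, constraints, iterations=2000, step=0.1):
    """Minimize the sum of squared errors ((x_j - x_i) - measured)^2.

    `constraints` is a list of (i, j, measured) edges. Pose 0 is held
    fixed at 0.0 to anchor the graph.
    """
    poses = [0.0] * num_poses
    for _ in range(iterations):
        grad = [0.0] * num_poses
        for i, j, measured in constraints:
            residual = (poses[j] - poses[i]) - measured
            grad[i] -= 2.0 * residual
            grad[j] += 2.0 * residual
        for k in range(1, num_poses):  # skip the anchored pose 0
            poses[k] -= step * grad[k]
    return poses

# Odometry reports three steps of 1.1 (drifting), while a loop-closure
# style measurement says pose 3 is 3.0 away from pose 0 (assumed values).
constraints = [
    (0, 1, 1.1),
    (1, 2, 1.1),
    (2, 3, 1.1),
    (0, 3, 3.0),
]

poses = optimize_pose_graph(4, constraints)
print([round(p, 3) for p in poses])  # [0.0, 1.025, 2.05, 3.075]
```

The 0.3 disagreement between odometry (3.3) and the extra measurement (3.0) is spread evenly over the four equally weighted edges instead of piling up at the last pose.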
Loop closure
Loop closure
• Recognize previously visited locations
  • Visual odometry error accumulates gradually
  • Use revisit cues to correct the error
Loop closure
• Bag-of-Words
• Process
  • Separate nose, eyes, mouth
  • Build dictionary
  • Face = eye*2 + nose*1 + mouth*1
• Feature => Words
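The face analogy can be written as a small sketch: each image becomes a histogram of dictionary words, and two images are compared by the similarity of their histograms, ignoring where the words appear. The extra word "wheel" and the cosine score are assumptions added for illustration; libraries such as DBoW2 use a vocabulary tree and their own scoring.

```python
import math
from collections import Counter

def bow_vector(words, vocabulary):
    """Histogram of word counts, ordered like `vocabulary`."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between two histograms (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Dictionary built beforehand; "wheel" is a hypothetical extra word.
vocabulary = ["eye", "nose", "mouth", "wheel"]

# Face = eye*2 + nose*1 + mouth*1, seen twice with words in different order.
face_a = bow_vector(["eye", "eye", "nose", "mouth"], vocabulary)
face_b = bow_vector(["mouth", "eye", "nose", "eye"], vocabulary)
car = bow_vector(["wheel", "wheel", "wheel", "wheel"], vocabulary)

same_place_score = cosine_similarity(face_a, face_b)
different_place_score = cosine_similarity(face_a, car)
print(face_a, same_place_score, different_place_score)
```

A loop-closure candidate is an earlier frame whose score against the current frame is high; candidates are normally verified geometrically before the loop is accepted.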
Loop closure
• Advantage of visual SLAM
  • Compared to laser, a visual SLAM system has richer information, which could increase accuracy.
False Positive / False Negative
Map reconstruction
Map reconstruction
• Different purposes, different maps
  • Metric map
  • Topological map
  • Others
• Purpose
  • Navigation, localization, interaction, recognition, viewing
• Good map
  • Accurate, fast
Open-source SLAM projects
Open-source SLAM projects
Project                     3D imaging tech   Pre-processing
Kintinuous (2015)           RGB-D             Direct
ElasticFusion (2015)        RGB-D             Direct
RGBD-SLAM-V2                RGB-D             Feature
ORB-SLAM (2016)             Mono/Stereo/RGB-D Feature
LSD-SLAM (2015)             Mono/Stereo       Direct
SVO (2014)                  Mono              Feature
RTAB-Map (2013)             RGB-D             Feature (SIFT)
DVO-SLAM (2013)             RGB-D             Direct
KinectFusion (2011)         RGB-D             Direct
Kinfu Large Scale           RGB-D             Direct
DTAM (2011)                 Mono              Direct
Google Cartographer (2016)  Laser/LIDAR
Google Tango                RGB-D             Feature
Hector-SLAM                 2D Laser
Kintinuous
• Source
  https://github.com/mp3guy/Kintinuous
• Hardware requirement
  • CPU: Intel i5+
  • GPU: NVIDIA 1 TFLOPS+
• OS
  Ubuntu 14.04, 15.04, 16.04
• Dependency (main)
  OpenGL, CUDA 7.0+, OpenNI2, Eigen, Boost, OpenCV, DBoW2, iSAM
Kintinuous
• Kintinuous addresses 3 main problems of KinectFusion
  1. Restriction to a fixed small area in space
  2. Reliance on geometric information alone for camera pose estimation
  3. No means of explicitly incorporating loop closures
• Disadvantages of Kintinuous
  1. Needs a GPU
  2. Strip loop closure; cannot handle a large number of loop closures
  3. Currently supports only the ASUS Xtion Pro Live camera; Microsoft Kinect can't be used
ElasticFusion
• Source
  https://github.com/mp3guy/ElasticFusion
• Hardware requirement
  • CPU: Intel i7+
  • GPU: NVIDIA 3.5 TFLOPS+
• OS
  • Ubuntu 14.04, 15.04, 16.04
  • Windows 7/10 with Visual Studio 2013 Update 5
• Dependency (main)
  OpenGL, CUDA 7.0+, OpenNI2, Eigen
ElasticFusion
• ElasticFusion improves on two extremes of former SLAM systems: extremely loopy motion (MonoSLAM [1], KinectFusion [2]) or a small number of loops (McDonald et al. [3] or Whelan et al. [4])
• ElasticFusion can therefore handle room-sized spaces in which a hand-held camera films the same objects repeatedly, forming multiple loops.
• Disadvantages of ElasticFusion
  1. The system is not yet optimized and works only in a restricted space
  2. Cannot reconstruct the map correctly when there is not enough connected surface information.
[1] A. J. Davison, N. D. Molton, I. Reid, and O. Stasse, "MonoSLAM: Real-Time Single Camera SLAM," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pp. 1052-1067, 2007.
[2] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon, "KinectFusion: Real-Time Dense Surface Mapping and Tracking," in Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR), 2011.
[3] J. B. McDonald, M. Kaess, C. Cadena, J. Neira, and J. J., "Real-time 6-DOF multi-session visual SLAM," Robotics and Autonomous Systems, pp. 1144-1158, 2013.
[4] T. Whelan, M. Kaess, H. Johannsson, M. F. Fallon, J. J. Leonard, and J. B. McDonald, "Real-time large scale dense RGB-D SLAM with volumetric fusion," International Journal of Robotics Research (IJRR), 34(4-5):598-626, 2015.
ORB-SLAM2
• Source
  https://github.com/raulmur/ORB_SLAM2
• Hardware requirement
  • CPU: Intel i7+
• OS
  Ubuntu 14.04, 15.04, 16.04
• Dependency (main)
  OpenCV, Eigen3, DBoW2, g2o
ORB-SLAM2
• Based on sparse feature points
• Input: mono, stereo, or RGB-D camera
• No GPU needed
• Still under maintenance, good for future development
• Disadvantages of ORB-SLAM2
  1. Stereo and RGB-D support is not good enough (no point cloud)
  2. Loading the dictionary takes time
  3. Frame rate <= 10 Hz (Microsoft Kinect2 at qHD (960x540), ThinkPad T450)
  4. Map saving and loading are not yet supported
LSD-SLAM
• Source
  http://github.com/tum-vision/lsd_slam
• Hardware requirement
  • CPU: could run on a mobile phone
• OS
  Ubuntu 14.04, 15.04, 16.04
LSD-SLAM
• Monocular camera
• Semi-dense depth map
• High computational efficiency (could even run on a mobile phone)
• Capable of dealing with multiple loops and large scenes
• Disadvantages of LSD-SLAM
  1. Sensitive to lighting (direct method)
  2. Localization error is 5-10 times that of ORB-SLAM
  3. Assumes smooth camera movement => matching fails when the camera moves too fast.
  4. Assumes no moving objects
Google projects
Tango Cartographer
Applications
Applications
• Robot
  • Navigation
  • Object picking
  • Moving in complex environments
• Consumer
  • Devices can sense the environment
  • VR/AR applications
Google Tango
