This talk, presented at IROS 2012 in Portugal, discusses a method to generate an overhead semantic map, akin to Google Maps but with associated object class labels. We run experiments on tens of kilometres of data.
Automatic Dense Semantic Mapping From Visual Street-level Imagery
1. Automatic Dense Semantic Mapping From
Visual Street-level Imagery
Sunando Sengupta[1], Paul Sturgess[1], Lubor Ladicky[2], Phillip H.S. Torr[1]
[1] Oxford Brookes University
[2] Visual Geometry Group, Oxford University
2. Dense Semantic Map
Generate an overhead view of an urban region.
Every pixel in the map view is associated with an object class label.
Object classes: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post
3. Dense Semantic Map
Street images are captured inexpensively from a vehicle with multiple mounted cameras[1].
[1] Yotta. DCL, Yotta dcl case studies, Available:
6. Semantic Mapping Framework
The semantic mapping framework comprises two stages:
Semantic image segmentation at street level.
Ground plane labelling at a global level.
One of the first attempts to do overhead mapping from street-level images.
Figure labels: Street-level image acquisition; Ground plane
7. Semantic Image Segmentation
Label every pixel in the image with an object class
Input: Raw Image
Output: Labelled Image
Object Class Labels: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post
8. Semantic Image Segmentation
We use a Conditional Random Field (CRF) framework.
Figure panels: Input Image; Final Segmentation
Each pixel is a node in a grid graph G = (V,E).
Each node is a random variable x taking a label from the label set.
9. Semantic Image Segmentation - CRF
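Total energy: $E(\mathbf{x}) = E_{pix}(\mathbf{x}) + E_{pair}(\mathbf{x}) + E_{region}(\mathbf{x})$
Optimal labelling given as: $\mathbf{x}^{*} = \arg\min_{\mathbf{x}} E(\mathbf{x})$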
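To make "optimal labelling" concrete, here is a minimal sketch that finds the lowest-energy labelling of a three-pixel strip by trying every labelling. The two labels, the cost values, the edge list and `PAIR_WEIGHT` are all hypothetical, and the region term is left out for brevity. Exhaustive search only works for toy problems; it is not the solver used in the talk.

```python
from itertools import product

LABELS = ["Road", "Car"]  # hypothetical two-label set for illustration

# Hypothetical per-pixel costs for a 3-pixel strip: unary[i][label]
unary = [
    {"Road": 0.3, "Car": 0.9},
    {"Road": 0.6, "Car": 0.5},
    {"Road": 0.2, "Car": 1.0},
]
edges = [(0, 1), (1, 2)]  # neighbouring pixel pairs
PAIR_WEIGHT = 0.4         # assumed penalty when two neighbours disagree


def energy(labelling):
    """Per-pixel costs plus a fixed penalty for each disagreeing edge."""
    e_pix = sum(unary[i][lab] for i, lab in enumerate(labelling))
    e_pair = sum(PAIR_WEIGHT for i, j in edges if labelling[i] != labelling[j])
    return e_pix + e_pair


def brute_force_argmin():
    """Evaluate every possible labelling and keep the cheapest one."""
    return min(product(LABELS, repeat=len(unary)), key=energy)


best = brute_force_argmin()
print(best, energy(best))
```

In this toy case the middle pixel slightly prefers Car on its own, but the pairwise penalty makes an all-Road labelling cheaper overall.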
10. Semantic Image Segmentation - CRF
Total energy E = Epix + Epair + Eregion
Epix models each individual pixel's cost of taking a label.
Computed via the dense boosting approach.
Multi-feature variant of TextonBoost[1].
Figure example: Car 0.2, Road 0.3
[1] L. Ladicky, C. Russell, P. Kohli, and P. H. Torr, Associative hierarchical CRFs for object class image segmentation, in ICCV, 2009.
11. Semantic Image Segmentation - CRF
Total energy E = Epix + Epair + Eregion
Epair models the interactions between neighbouring pixels.
Encourages label consistency in adjacent pixels.
Sensitive to edges in images.
Contrast-sensitive Potts model.
Figure: neighbouring pixels xi and xj
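A contrast-sensitive Potts cost can be sketched as follows. This uses a commonly used form of the model (zero cost for equal labels, otherwise a constant plus a term that decays with colour difference); the exact form and parameter values used in the talk are not given in the slides, so `theta_p`, `theta_v` and `theta_beta` are illustrative assumptions.

```python
import numpy as np


def contrast_sensitive_potts(label_i, label_j, colour_i, colour_j,
                             theta_p=1.0, theta_v=4.0, theta_beta=0.05):
    """Pairwise cost between two neighbouring pixels.

    Equal labels cost nothing. Different labels cost more when the two
    pixels look alike and less across a strong image edge.
    The theta_* parameters are assumed values, not taken from the talk.
    """
    if label_i == label_j:
        return 0.0
    diff = np.asarray(colour_i, dtype=float) - np.asarray(colour_j, dtype=float)
    squared_distance = float(diff @ diff)
    return float(theta_p + theta_v * np.exp(-theta_beta * squared_distance))


# Similar colours: a label change is expensive.
flat = contrast_sensitive_potts(0, 1, [100, 100, 100], [101, 100, 100])
# Strong edge: a label change is cheap.
edge = contrast_sensitive_potts(0, 1, [100, 100, 100], [200, 30, 10])
```

Because the penalty drops across strong colour edges, label boundaries tend to align with image edges.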
12. Semantic Image Segmentation - CRF
Total energy E = Epix + Epair + Eregion
Eregion models the behaviour of a group of pixels.
Classifies a region.
Encourages all the pixels in a region to take the same label.
Groups of pixels are given by multiple mean-shift segmentations.
Figure example: Car 0.3, Road 0.1
13. Semantic Image Segmentation
Solved using alpha-expansion algorithm[1]
Input Image Road Expansion
[1] Y. Boykov et al., Fast approximate energy minimization via graph cuts, in ICCV, 1999.
14. Semantic Image Segmentation
Solved using alpha-expansion algorithm[1]
Input Image Building Expansion
15. Semantic Image Segmentation
Solved using alpha-expansion algorithm[1]
Input Image Sky Expansion
16. Semantic Image Segmentation
Solved using alpha-expansion algorithm[1]
Input Image Pavement Expansion
17. Semantic Image Segmentation
Solved using alpha-expansion algorithm[1]
Input Image Final solution
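The sequence of expansions (Road, Building, Sky, Pavement, final solution) can be illustrated with a toy version of the expansion-move loop. In each move, every pixel either keeps its current label or switches to the label alpha. In this sketch the best move is found by exhaustive search rather than by a graph cut, so it only runs on tiny problems; the four-pixel chain, the cost values and `pair_weight` are hypothetical.

```python
from itertools import product


def energy(labels, unary, edges, pair_weight):
    """Per-pixel costs plus a fixed penalty per disagreeing edge."""
    e_pix = sum(unary[i][lab] for i, lab in enumerate(labels))
    e_pair = sum(pair_weight for i, j in edges if labels[i] != labels[j])
    return e_pix + e_pair


def best_expansion_move(labels, alpha, unary, edges, pair_weight):
    """Each pixel keeps its label or switches to alpha.

    Exhaustive search stands in for the graph cut used in practice.
    """
    best = tuple(labels)
    best_e = energy(best, unary, edges, pair_weight)
    for switch in product([False, True], repeat=len(labels)):
        cand = tuple(alpha if s else lab for s, lab in zip(switch, labels))
        e = energy(cand, unary, edges, pair_weight)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e


def alpha_expansion(init, label_set, unary, edges, pair_weight):
    """Cycle over labels until no expansion move lowers the energy."""
    labels = tuple(init)
    current = energy(labels, unary, edges, pair_weight)
    improved = True
    while improved:
        improved = False
        for alpha in label_set:
            cand, e = best_expansion_move(labels, alpha, unary, edges, pair_weight)
            if e < current - 1e-12:
                labels, current, improved = cand, e, True
    return labels, current


# Hypothetical 4-pixel chain with three labels.
label_set = ["Road", "Building", "Sky"]
unary = [
    {"Road": 0.1, "Building": 0.8, "Sky": 0.9},
    {"Road": 0.5, "Building": 0.4, "Sky": 0.9},
    {"Road": 0.9, "Building": 0.2, "Sky": 0.7},
    {"Road": 0.9, "Building": 0.6, "Sky": 0.1},
]
edges = [(0, 1), (1, 2), (2, 3)]
result, result_energy = alpha_expansion(["Road"] * 4, label_set, unary, edges, 0.3)
```

Starting from an all-Road labelling, the Building expansion relabels the last three pixels and the Sky expansion then relabels the final pixel, after which no further move lowers the energy.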
18. Ground Plane Labelling
Combine many labellings from street level imagery.
Figure labels: Street Level; Labelled Ground Plane
19. Ground Plane CRF
A CRF is defined over the ground plane.
Each ground plane pixel (zi) is a random variable taking a label from the label set.
Energy for the ground plane CRF is $E(Z) = E^{g}(Z) + E^{g}_{pair}(Z)$
21. Ground Plane Pixel Cost
Figure labels: Homography; Road, Pavement, Post/Pole
A ground plane region is estimated.
22. Ground Plane Pixel Cost
Figure labels: Homography; Road, Pavement, Post/Pole
Each point in the image projects to a unique point on the ground plane, which defines a homography.
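Applying a homography to image points can be sketched as below. The 3x3 matrix `H` here is a made-up example; in practice it would come from the estimated ground plane and the camera calibration, which the slides do not detail.

```python
import numpy as np


def project_to_ground(H, pixels):
    """Map (x, y) image points to ground plane coordinates with a 3x3 homography."""
    pts = np.asarray(pixels, dtype=float)
    homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])  # (x, y) -> (x, y, 1)
    mapped = homogeneous @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]  # divide by the third coordinate


# Hypothetical homography: scale by 2, then shift by (1, -1).
H = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, -1.0],
              [0.0, 0.0, 1.0]])
ground_points = project_to_ground(H, [(3, 4), (0, 0)])
```

The division by the third coordinate is what distinguishes a general homography from an affine map; for the example matrix above that coordinate is always 1.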
23. Ground Plane Pixel Cost
Figure labels: Homography; Ground plane pixel histograms; Road, Pavement, Post/Pole
The image labelling is mapped to the ground plane via the homography.
24. Ground Plane Pixel Cost
Figure labels: Homography; Ground plane (Z) pixel histograms; Road, Pavement, Post/Pole
Labels projected from many views are combined in a histogram.
The normalised histogram gives the naïve probability of the ground plane pixel taking a label.
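The histogram step can be sketched as follows, assuming each projected label arrives as a hypothetical `(row, col, label_index)` vote on a discretised ground plane grid. The grid size and label indices are illustrative only.

```python
import numpy as np


def accumulate_votes(votes, grid_shape, num_labels):
    """Count, for every ground plane cell, how often each label was projected onto it."""
    hist = np.zeros(tuple(grid_shape) + (num_labels,), dtype=float)
    for row, col, label in votes:
        hist[row, col, label] += 1.0
    return hist


def normalise(hist):
    """Turn counts into per-cell label frequencies; cells with no votes stay all zero."""
    totals = hist.sum(axis=-1, keepdims=True)
    return np.divide(hist, totals, out=np.zeros_like(hist), where=totals > 0)


# Hypothetical votes from several views: cell (0, 0) saw label 0 twice and label 1 once.
votes = [(0, 0, 0), (0, 0, 0), (0, 0, 1), (1, 1, 2)]
probabilities = normalise(accumulate_votes(votes, (2, 2), 3))
```

Cells that no view projects onto receive no evidence, which is one reason a smoothing term over the ground plane is useful.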
26. Ground Plane labelling
A histogram is built for every ground plane pixel, giving the pixel cost $E^{g}$.
A pairwise cost ($E^{g}_{pair}$) is added to induce smoothness.
Contrast-sensitive Potts model.
33. Dataset
Subset of the images captured by the van
14.8 km of track, 8000 images from each camera.
Pixel-level labelled ground truth images.
13 object categories
Training - 44 images, testing - 42 images.
34. SIS Results
Figure panels: input images, output of our image-level CRF, and ground truth.
Used the Automatic Labelling Environment[1].
[1] The Automatic Labelling Environment, L Ladicky, PHS Torr. Code available
36. Ground plane Map Evaluation
Figure panels: Street Images; Map Results; Ground Truth
We back-project the ground plane map into the image domain and evaluate the results there.
Global pixel accuracy: 86%.
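Global pixel accuracy is the fraction of labelled pixels whose predicted class matches the ground truth. A minimal sketch follows; the `void_label` value used to skip unlabelled pixels is an assumption, since the slides do not say how unlabelled pixels are handled.

```python
import numpy as np


def global_pixel_accuracy(predicted, ground_truth, void_label=255):
    """Fraction of non-void pixels where the prediction equals the ground truth."""
    predicted = np.asarray(predicted)
    ground_truth = np.asarray(ground_truth)
    valid = ground_truth != void_label  # ignore pixels without a ground truth label
    if not valid.any():
        raise ValueError("ground truth contains no labelled pixels")
    return float((predicted[valid] == ground_truth[valid]).mean())


# Toy 2x2 example: three labelled pixels, two of them predicted correctly.
accuracy = global_pixel_accuracy([[0, 1], [2, 2]], [[0, 1], [1, 255]])
```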
38. Conclusions
Presented a method to generate an overhead-view semantic map.
Experiments on large tracks (~15 km), which can be scaled up to country-wide mapping.
Dataset available[1].
39. Future Work
Perform 3D street-level semantic mapping.
Add detailed street-level information such as signs and information boards.
Oxford Brookes Vision Group, Oxford Brookes University
Thank you!!!
41. Ground Plane Pixel Cost
Using a single view creates a shadow effect for objects that violate the flat-world assumption, leading to wrong label estimates.
Figure labels: Single view; Homography; Road, Pavement, Post/Pole