
Automatic Dense Semantic Mapping From Visual Street-level Imagery
Sunando Sengupta[1], Paul Sturgess[1], Lubor Ladicky[2], Philip H.S. Torr[1]
[1] Oxford Brookes University
[2] Visual Geometry Group, Oxford University
http://cms.brookes.ac.uk/research/visiongroup/index.php

Dense Semantic Map
• Generate an overhead view of an urban region.
• Every pixel in the map view is associated with an object class label.
Object classes: Building, Road, Tree, Vegetation, Fence, Signage, Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post

Dense Semantic Map
• Street images are captured inexpensively from a vehicle with multiple mounted cameras[1].
[1] Yotta DCL, "Yotta DCL case studies." Available: http://www.yottadcl.com/surveys/case-studies/

Semantic Mapping Framework
• The semantic mapping framework comprises two stages:
– Semantic image segmentation at street level.
– Ground plane labelling at a global level.
• One of the first attempts to do overhead mapping from street-level images.
Figure: street-level image acquisition → image segmentation → ground plane labelling

Semantic Image Segmentation
• Label every pixel in the image with an object class.
Figure: input (raw image) → automatic labeller → output (labelled image); object class labels shown include Sky, Pavement, Car, Pedestrian, Bollard, Shop Sign, Post.

CRF construction
• We use a Conditional Random Field (CRF) framework.
• Each pixel is a node in a grid graph G = (V, E).
• Each node i is a random variable x_i taking a label from the label set; together the nodes form the random field X.
Figure: input image → final segmentation
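The grid graph above can be built explicitly. A minimal sketch, assuming a 4-connected neighbourhood (the slides do not state which neighbourhood system is used) and row-major pixel indexing; `grid_edges` is a hypothetical helper name.

```python
def grid_edges(height, width):
    """Return the undirected edges (i, j) of a 4-connected height x width pixel grid.

    Assumption: 4-connectivity and row-major indexing (index = row * width + col).
    """
    edges = []
    for r in range(height):
        for c in range(width):
            i = r * width + c
            if c + 1 < width:   # edge to the right-hand neighbour
                edges.append((i, i + 1))
            if r + 1 < height:  # edge to the neighbour below
                edges.append((i, i + width))
    return edges


print(grid_edges(2, 2))       # [(0, 1), (0, 2), (1, 3), (2, 3)]
print(len(grid_edges(2, 3)))  # 7: 2 rows x 2 horizontal edges + 1 x 3 vertical edges
```

Each edge (i, j) is one pairwise term in the energy, so a 4-connected grid keeps the number of pairwise terms linear in the number of pixels.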

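Semantic Image Segmentation - CRF
• Total energy:
$$E(\mathbf{x}) = \underbrace{\sum_{i \in V} \psi_i(x_i)}_{E_{pix}} + \underbrace{\sum_{i \in V,\, j \in N_i} \psi_{ij}(x_i, x_j)}_{E_{pair}} + \underbrace{\sum_{c \in C} \psi_c(\mathbf{x}_c)}_{E_{region}}$$
• Optimal labelling given as:
$$\mathbf{x}^* = \arg\min_{\mathbf{x}} E(\mathbf{x})$$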
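To make the three-term energy concrete, the sketch below evaluates E(x) for a given labelling and, for a toy three-pixel problem, finds x* by exhaustive search. The potentials (the `unary` table, a constant Potts penalty, a strict all-same region cost) are illustrative assumptions, not the learned potentials of the paper, and `crf_energy` is a hypothetical name. Exhaustive search is only feasible for tiny problems, which is why alpha-expansion is used in practice.

```python
import itertools

def crf_energy(labels, unary, edges, pairwise, cliques, region):
    """E(x) = E_pix + E_pair + E_region for one labelling x."""
    e_pix = sum(unary[i][labels[i]] for i in range(len(labels)))
    e_pair = sum(pairwise(i, j, labels[i], labels[j]) for i, j in edges)
    e_region = sum(region(c, [labels[i] for i in c]) for c in cliques)
    return e_pix + e_pair + e_region

# Toy problem (assumed values): 3 pixels in a row, labels 0 = Road, 1 = Car.
unary = [[0.3, 0.2], [0.1, 0.9], [0.2, 0.4]]
edges = [(0, 1), (1, 2)]
cliques = [(0, 1, 2)]  # one region covering all three pixels

def potts(i, j, li, lj):
    return 0.0 if li == lj else 0.5  # constant penalty for a label change

def all_same(c, labels_in_c):
    return 0.0 if len(set(labels_in_c)) == 1 else 1.0  # penalise a split region

def energy(x):
    return crf_energy(x, unary, edges, potts, cliques, all_same)

# x* = argmin_x E(x) by brute force over all 2^3 labellings.
best = min(itertools.product([0, 1], repeat=3), key=energy)
print(best, round(energy(best), 3))  # (0, 0, 0) 0.6
```

Pixel 0 alone would prefer Car, but the pairwise and region terms make the all-Road labelling cheaper overall.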

• Total energy E = E_pix + E_pair + E_region.
• E_pix models the cost of an individual pixel taking a label.
– Computed via a dense boosting approach.
– A multi-feature variant of TextonBoost[1].
Figure: example per-label values for a pixel x_i: Car 0.2, Road 0.3.
[1] L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr, "Associative hierarchical CRFs for object class image segmentation," in ICCV, 2009.
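The slides do not say how boosted classifier scores become the cost E_pix. One common choice in TextonBoost-style pipelines, assumed here, is to pass the per-label scores through a softmax and take the negative log, so a confident label gets a low cost. The helper name and the (H, W, L) array layout are hypothetical.

```python
import numpy as np

def unary_from_boosting(responses):
    """Unary costs psi_i(l) = -log softmax(responses[i])_l for every pixel.

    responses: array of shape (H, W, L), one boosted score per pixel and label
    (assumed layout). Uses the log-sum-exp trick for numerical stability.
    """
    m = responses.max(axis=-1, keepdims=True)
    log_z = m + np.log(np.exp(responses - m).sum(axis=-1, keepdims=True))
    return log_z - responses


scores = np.array([[[2.0, 0.5, -1.0]]])  # one pixel, three labels
costs = unary_from_boosting(scores)
print(costs.argmin(axis=-1))  # [[0]]: the highest-scoring label is the cheapest
```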

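• E_pair models the interactions within each pixel's neighbourhood.
– Encourages label consistency between adjacent pixels.
– Sensitive to edges in the image.
– Contrast-sensitive Potts model.
Figure: for neighbouring pixels x_i and x_j, the cost is 0 when both take the same label (e.g. Car, Car) and g(i, j) when they differ (e.g. Car, Road).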
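A sketch of the contrast-sensitive Potts model, in the form commonly used in this line of work: no cost when neighbours agree, and a cost g(i, j) = θ_p + θ_v exp(−θ_β ‖I_i − I_j‖²) when they differ, so label changes are cheap across strong image edges. The parameter values and the RGB colours in [0, 1] are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def contrast_weight(color_i, color_j, theta_p=1.0, theta_v=5.0, theta_beta=0.5):
    """g(i, j) = theta_p + theta_v * exp(-theta_beta * ||I_i - I_j||^2) (assumed parameters)."""
    diff = np.asarray(color_i, dtype=float) - np.asarray(color_j, dtype=float)
    return theta_p + theta_v * np.exp(-theta_beta * float(diff @ diff))

def potts_pairwise(label_i, label_j, color_i, color_j, **params):
    """psi_ij = 0 if the labels agree, g(i, j) otherwise."""
    if label_i == label_j:
        return 0.0
    return float(contrast_weight(color_i, color_j, **params))


flat = potts_pairwise("Car", "Road", [0.2, 0.2, 0.2], [0.2, 0.2, 0.2])
edge = potts_pairwise("Car", "Road", [0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
print(flat, round(edge, 3))  # 6.0 2.116
```

A label boundary inside a uniform patch costs 6.0, while the same boundary along a strong colour edge costs about 2.1, which is what makes the model edge-sensitive.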

• E_region models the behaviour of a group of pixels.
– Classifies a region.
– Encourages all the pixels in a region to take the same label.
– Groups of pixels are given by multiple mean-shift segmentations.
Figure: example per-label values for a region c: Car 0.3, Road 0.1.
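The slides only say that E_region encourages every pixel in a region to take the same label. A sketch assuming the robust P^n form (Kohli, Ladicky and Torr), where each label l pays its region-classifier cost γ_l plus a per-pixel penalty for every pixel that disagrees with it, truncated at γ_max. The example costs mirror the Car 0.3 / Road 0.1 values in the figure; `slope`, `gamma_max` and the function name are hypothetical.

```python
from collections import Counter

def robust_pn_potential(labels_in_region, region_cost, gamma_max, slope):
    """psi_c(x_c) = min(gamma_max, min_l [region_cost[l] + slope * #pixels not labelled l])."""
    n = len(labels_in_region)
    counts = Counter(labels_in_region)
    best = min(cost + slope * (n - counts[label]) for label, cost in region_cost.items())
    return min(best, gamma_max)


costs = {"Car": 0.3, "Road": 0.1}  # region-classifier costs, as in the figure example
# Consistent region: only the Road region cost is paid.
print(robust_pn_potential(["Road"] * 5, costs, gamma_max=1.0, slope=0.2))
# One outlier pixel: a small extra cost.
print(robust_pn_potential(["Road"] * 4 + ["Car"], costs, gamma_max=1.0, slope=0.2))
# Badly split region: the cost is truncated at gamma_max.
print(robust_pn_potential(["Road"] * 5 + ["Car"] * 5, costs, gamma_max=1.0, slope=0.2))
```

The truncation keeps the term robust: a mean-shift region that genuinely straddles two objects is penalised by at most γ_max instead of forcing a single label.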

• Solved using the alpha-expansion algorithm[1].
Figures: input image followed by successive road, building, sky and pavement expansion moves, and the final solution.
[1] Y. Boykov et al., "Fast approximate energy minimization via graph cuts," in ICCV, 1999.
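A sketch of the alpha-expansion loop of Boykov et al.: for each label α in turn, every pixel may either keep its label or switch to α, and the best such move is accepted if it lowers the energy; the loop stops when no expansion helps. Boykov et al. compute each move exactly with a single graph cut; to stay short, this sketch swaps in exhaustive search over the move, which is only feasible for a handful of pixels. The energy here is unary plus a constant Potts penalty `w` (no region term), and all names and values are hypothetical.

```python
import itertools
import numpy as np

def energy(labels, unary, edges, w):
    """Unary costs plus a constant Potts penalty w for each edge whose labels differ."""
    e = sum(unary[i, l] for i, l in enumerate(labels))
    return e + sum(w for i, j in edges if labels[i] != labels[j])

def best_expansion_move(labels, alpha, unary, edges, w):
    """Best alpha-expansion move, found by exhaustive search over which pixels switch to alpha.

    Boykov et al. solve this binary move exactly with one graph cut; brute force is a
    stand-in that only works for a handful of pixels.
    """
    best, best_e = list(labels), energy(labels, unary, edges, w)
    for switch in itertools.product([False, True], repeat=len(labels)):
        cand = [alpha if s else l for s, l in zip(switch, labels)]
        e = energy(cand, unary, edges, w)
        if e < best_e:
            best, best_e = cand, e
    return best, best_e

def alpha_expansion(unary, edges, w, init=None):
    """Cycle over labels, accepting improving expansion moves until none lowers E."""
    n, num_labels = unary.shape
    labels = list(init) if init is not None else [0] * n
    current = energy(labels, unary, edges, w)
    improved = True
    while improved:
        improved = False
        for alpha in range(num_labels):
            cand, e = best_expansion_move(labels, alpha, unary, edges, w)
            if e < current - 1e-12:
                labels, current, improved = cand, e, True
    return labels, current


unary = np.array([[0.0, 1.0, 1.0],
                  [0.9, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [1.0, 1.0, 0.0]])
edges = [(0, 1), (1, 2), (2, 3)]  # a 1 x 4 chain of pixels
labels, e = alpha_expansion(unary, edges, w=0.5)
print(labels, round(float(e), 3))  # [0, 0, 0, 2] 1.4
```

Starting from an all-label-0 solution, the label-2 expansion switches the last pixel; pixel 1 stays at label 0 because switching it to label 1 would add two label changes.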

Ground Plane Labelling
• Combine many labellings from street-level imagery.
Figure: input (street-level labellings) → automatic labeller → output (labelled ground plane)

Ground Plane CRF
• A CRF defined over the ground plane.
• Each ground plane pixel z_i is a random variable taking a label from the label set; together they form the field Z.
• The energy for the ground plane CRF is
$$E(\mathbf{Z}) = E^{g}_{pix} + E^{g}_{pair}$$

Ground Plane Pixel Cost
• We assume a flat world.
Figure: image labelling X projected onto the ground plane Z (camera K).

• A ground plane region is estimated.

• Each point in the image projects to a unique point on the ground plane.
– This mapping is a homography.

• The image labelling is mapped to the ground plane via the homography.
Figure: ground plane pixel histograms
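Under the flat-world assumption the image-to-ground mapping can be written down from a calibrated camera. A minimal sketch, assuming the ground is the world plane Z = 0 and the camera has intrinsics K, rotation R and translation t (the slides do not give these details): a ground point (X, Y, 0) projects as x ~ K[r1 r2 t](X, Y, 1)ᵀ, so H = K[r1 r2 t] and pixels are mapped back with H⁻¹. Only pixels inside the estimated ground region should be mapped, since points above the horizon have no valid ground intersection. Function names and the example camera are hypothetical.

```python
import numpy as np

def ground_to_image_homography(K, R, t):
    """H = K [r1 r2 t]: maps ground-plane points (X, Y, 1) on Z = 0 to homogeneous pixels."""
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def image_to_ground(H, pixels):
    """Map an N x 2 array of pixel coordinates to N x 2 ground-plane coordinates via H^-1."""
    pts = np.column_stack((pixels, np.ones(len(pixels))))
    ground = (np.linalg.inv(H) @ pts.T).T
    return ground[:, :2] / ground[:, 2:3]  # de-homogenise


# Example camera (assumed): 2 m above the ground, looking horizontally along +Y.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0],    # camera x = world X (right)
              [0.0, 0.0, -1.0],   # camera y = -world Z (image y points down)
              [0.0, 1.0, 0.0]])   # camera z = world Y (viewing direction)
t = -R @ np.array([0.0, 0.0, 2.0])  # t = -R C for camera centre C
H = ground_to_image_homography(K, R, t)
print(image_to_ground(H, np.array([[320.0, 340.0], [420.0, 440.0]])))
# approximately [[0, 10], [1, 5]]: 10 m and 5 m ahead of the camera
```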

• Labels projected from many views are combined in a histogram.
• The normalised histogram gives the naïve probability of the ground plane pixel taking a label.
Figure: ground plane pixel histograms built via the homography; legend: Road, Pavement, Post/Pole.
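A sketch of the histogram step, assuming the ground plane is discretised into a grid of cells and that each labelled image pixel has already been mapped to a cell (for example with the homography). Votes from all views are accumulated per cell, normalised into the naïve probability, and turned into a cost −log P for E^g_pix. The −log form, the smoothing constant `eps` and the helper names are assumptions, not details from the slides.

```python
import numpy as np

def accumulate_votes(hist, cells, labels):
    """hist[row, col, label] += 1 for every projected pixel (repeated cells are handled)."""
    np.add.at(hist, (cells[:, 0], cells[:, 1], labels), 1)

def ground_unary(hist, eps=1e-6):
    """Normalise each cell's histogram into P(label) and return (P, -log P)."""
    totals = hist.sum(axis=-1, keepdims=True)
    probs = (hist + eps) / (totals + eps * hist.shape[-1])
    return probs, -np.log(probs)


hist = np.zeros((1, 2, 3))  # a 1 x 2 ground grid; labels 0 = Road, 1 = Pavement, 2 = Car
# View 1 sees Road, Road in cell (0, 0) and Pavement in cell (0, 1).
accumulate_votes(hist, np.array([[0, 0], [0, 0], [0, 1]]), np.array([0, 0, 1]))
# View 2 sees Car in cell (0, 0) and Pavement in cell (0, 1).
accumulate_votes(hist, np.array([[0, 0], [0, 1]]), np.array([2, 1]))
probs, costs = ground_unary(hist)
print(hist[0, 0], costs.argmin(axis=-1))  # cell (0, 0) favours Road, cell (0, 1) Pavement
```

`np.add.at` is used instead of `hist[idx] += 1` because plain fancy-index assignment counts a repeated cell only once.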


Ground Plane Labelling
• A histogram is built for every ground plane pixel, giving E^g_pix.
• A pairwise cost (E^g_pair) is added to induce smoothness.
– Contrast-sensitive Potts model.
• The final CRF solution is obtained using alpha-expansion.
Figures: starting from the void labelling, successive road, building, pavement and car expansion moves, and the final ground plane labelling.

Dataset
• A subset of the images captured by the van.
– 14.8 km of track, 8000 images from each camera.
• Pixel-level labelled ground truth images. Dataset available[1].
• 13 object categories.
• Training: 44 images; testing: 42 images.
[1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php

SIS Results
• Input images, output of our image-level CRF, and ground truth.
• Uses the Automatic Labelling Environment[1].
Figure columns: input, semantic segmentation, ground truth.
[1] L. Ladicky and P. H. S. Torr, The Automatic Labelling Environment. Code available: http://cms.brookes.ac.uk/staff/PhilipTorr/ale.htm

Semantic Map Results
Figure: semantic map of Pembroke city.

Ground Plane Map Evaluation
Figure columns: street images, back-projected map results, ground truth.
• We back-project the ground plane map into the image domain and evaluate the results against the ground truth.
• Global pixel accuracy of 86%.
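Global pixel accuracy is the fraction of evaluated pixels whose back-projected label matches the ground truth. A minimal sketch; whether unlabelled (void) ground-truth pixels are excluded is not stated on the slides, so `void_label` is an optional assumption, and the function name is hypothetical.

```python
import numpy as np

def global_pixel_accuracy(pred, gt, void_label=None):
    """Fraction of pixels with pred == gt, optionally ignoring ground-truth pixels marked void."""
    valid = np.ones(gt.shape, dtype=bool) if void_label is None else gt != void_label
    return float((pred[valid] == gt[valid]).mean())


gt = np.array([[0, 1], [2, 255]])  # 255 marks an unlabelled (void) pixel
pred = np.array([[0, 1], [1, 0]])
print(global_pixel_accuracy(pred, gt, void_label=255))  # 2 of 3 labelled pixels correct
```

Being a global ratio, this measure is dominated by large classes such as road and building; per-class averages would weight small classes like bollards more heavily.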

Conclusions
• Presented a method to generate an overhead-view semantic map.
• Experiments on large tracks (~15 km); the method can be scaled up to country-wide mapping.
• Dataset available[1].
[1] http://cms.brookes.ac.uk/research/visiongroup/projects/SemanticMap/index.php

Future Work
• Perform 3D street-level semantic mapping and reconstruction.
• Add detailed street-level information such as signs, information boards, etc.
Thank you!


• Using a single view creates a shadow effect for objects that violate the flat-world assumption, leading to wrong label estimates.
Figure: single-view vs. multi-view projection onto the ground plane.
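The shadow effect follows from similar triangles. A sketch, assuming a camera at height h_c and a point at height h < h_c seen at horizontal distance d: under the flat-world assumption its label lands on the ground at d·h_c/(h_c − h), so the upper parts of a pole are smeared far behind its base. A second view from a different position smears in a different direction while the base projects to the same place, which is presumably why combining views in the histogram reduces the effect. Names and numbers are illustrative.

```python
def flat_world_ground_hit(camera_height, distance, point_height):
    """Horizontal distance at which the ray through a 3D point meets a flat ground plane.

    camera_height: camera height above the ground (h_c)
    distance: horizontal distance from the camera to the point (d)
    point_height: height of the point above the ground (h)
    """
    if point_height >= camera_height:
        return float("inf")  # the ray never comes back down to the ground
    return distance * camera_height / (camera_height - point_height)


# A pole 5 m from a camera mounted 2 m high (assumed numbers):
# the base stays at 5 m, while the point 1.5 m up lands 20 m away.
for h in [0.0, 0.5, 1.0, 1.5]:
    print(h, round(flat_world_ground_hit(2.0, 5.0, h), 2))
```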
