Estimating the Aspect Layout of
Object Categories
Yu Xiang and Silvio Savarese
University of Michigan at Ann Arbor
{yuxiang, silvio}@eecs.umich.edu

VISION LAB


Traditional object recognition
• Uses 2D bounding boxes

Examples: rigid object, body part, face, human
(figures from Felzenszwalb et al., 10; Ramanan & Sminchisescu, 06; Viola & Jones, 01; Barinova et al., 12)

Beyond 2D bounding boxes
• Model the 3D properties of objects
• 3D pose
• 3D part location

• More suitable for robotics, autonomous navigation and manipulation

(figure from Saxena et al., 08)

Our goals
Viewpoint: Azimuth 315°, Elevation 30°, Distance 2

Related work: joint object detection
and pose estimation

(figures from Savarese & Fei-Fei, ICCV'07; Liebelt et al. 08, 10; Glasner et al., ICCV'11; viewpoint axes shown: azimuth, elevation, distance)

• Savarese et al. 07, 08
• Ozuysal et al. 08
• Liebelt et al. 08, 10
• Xiao et al. 08
• Thomas et al. 08
• Sun et al. 09
• Su et al. 09
• Arie-Nachimson & Basri 09
• Stark et al. 10
• Gu & Ren 10
• Glasner et al. 11
• Payet & Todorovic 11
• Zia et al., 3DRR'11
• Pepik et al., CVPR'12
• Schels et al., CVPR'12
• Xiang and Savarese, CVPR'12

Related work: 2D part-based model
• Constellation Model (figure from Fergus et al., CVPR'03)
• Implicit Shape Model (figure from Leibe et al., ECCV'04 workshop)
• Deformable Part Model (DPM) (figure from Felzenszwalb et al., CVPR'08)

Related work: 3D part-based model

(figures from Kushal et al., CVPR'07; Hoiem et al., CVPR'07; Chiu et al., CVPR'07; Sun et al., ICCV'09; figure labels: Key-view 1, Key-view 2, Key-view 3)

Our contributions
• Propose a 3D part-based representation for object categories
• Introduce the concept of aspect parts
• Jointly solve object detection, pose estimation and aspect part localization
• Significantly improve pose estimation accuracy and evaluate rigid part localization

Yu Xiang and Silvio Savarese. Estimating the aspect layout of object categories. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

Aspect Part
• Parts are arbitrarily defined in previous work
  (figures from Fergus et al., CVPR'03; Felzenszwalb et al., 2010)

• Introduce parts with geometrical and topological properties, called aspect parts

Aspect Part
• Our definition: a portion of the object whose 3D surface is approximately either entirely visible from the observer or entirely non-visible (i.e., self-occluded)
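
The definition can be illustrated with a small sketch. It assumes, purely for illustration, that each aspect part is a planar patch with an outward normal, so that the whole patch is either front-facing (visible) or back-facing (self-occluded) for a given camera position. The function name `is_visible` and the box-shaped toy object are hypothetical and are not taken from the talk.

```python
import numpy as np

def is_visible(part_center, part_normal, camera_position):
    """Front-facing test: a planar part is visible if its normal points toward the camera."""
    view_dir = np.asarray(camera_position, float) - np.asarray(part_center, float)
    return float(np.dot(np.asarray(part_normal, float), view_dir)) > 0.0

# Hypothetical box-shaped object: each face is treated as one aspect part,
# given as (center, outward normal).
faces = {
    "front": ((0, -1, 0), (0, -1, 0)),
    "back": ((0, 1, 0), (0, 1, 0)),
    "top": ((0, 0, 1), (0, 0, 1)),
}
camera = (0.0, -5.0, 3.0)
visibility = {name: is_visible(c, n, camera) for name, (c, n) in faces.items()}
```

Because the test is applied to a whole planar patch at once, every part comes out as entirely visible or entirely hidden, which is the property the definition asks for. Occlusion by other objects is not modeled here.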

[Figure sequence showing aspect part A, aspect part B, and then A and B together]

Aspect Part
• Examples: Bed, Car, Sofa

Aspect Part

• Related to aspect graph [1]
• Related to discriminative aspect, Farhadi et al., 07

(figure from Barb Cutler, MIT)

[1] J. J. Koenderink and A. J. van Doorn. The internal representation of solid shape with respect to vision. Biological Cybernetics, 1979.

Aspect Part

• Related to object affordance or functional part

Seat

Aspect Part

• Related to geometrical attributes of object

Horizontal surface

Vertical surface

Aspect Part

• Related to scene layout estimation

(figure from Hedau, Hoiem & Forsyth, ECCV'10)

Aspect Part

• Enables the modeling of object-human interactions

(figure from Gupta et al., CVPR'11)

Outline

• Aspect layout model
• Maximal margin parameter estimation
• Model inference
• Experiments
• Conclusion

Input & output

• Input
  - 2D image $I$
• Output
  - Object label $Y \in \{+1, -1\}$
  - Part configuration in 2D: $C = (c_1, \ldots, c_n)$, $c_i = (x_i, y_i, s_i)$, where $(x_i, y_i)$ are the 2D part center coordinates and $s_i$ is the 2D part shape

Aspect Layout Model

• 3D object $O = (o_1, \ldots, o_n)$

Aspect Layout Model

• Viewpoint representation $V = (a, e, d)$: azimuth, elevation and distance
• 2D part shape $s_i$ from 3D
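
A minimal sketch of how a viewpoint $V = (a, e, d)$ could determine the 2D shape of a part is given below. It assumes a camera placed on a sphere around the object center, a world frame with the z axis up, azimuth measured in the x-y plane, and a pinhole projection with a hypothetical focal length `focal`. None of these conventions are stated in the talk, and the function names are illustrative only.

```python
import numpy as np

def camera_position(a_deg, e_deg, d):
    """Camera center on a sphere of radius d (assumed convention: z up, azimuth in the x-y plane)."""
    a, e = np.radians(a_deg), np.radians(e_deg)
    return d * np.array([np.cos(e) * np.cos(a), np.cos(e) * np.sin(a), np.sin(e)])

def look_at_rotation(cam_pos):
    """World-to-camera rotation for a camera looking at the origin.
    Degenerate when the camera is directly above or below the object."""
    forward = -cam_pos / np.linalg.norm(cam_pos)
    right = np.cross(forward, np.array([0.0, 0.0, 1.0]))
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    return np.stack([right, up, forward])

def project_part(corners_3d, a_deg, e_deg, d, focal=1.0):
    """Project the 3D corners of a part; the projected polygon plays the role of the 2D part shape."""
    cam = camera_position(a_deg, e_deg, d)
    R = look_at_rotation(cam)
    pts_cam = (np.asarray(corners_3d, float) - cam) @ R.T
    return focal * pts_cam[:, :2] / pts_cam[:, 2:3]

# A hypothetical square part on the top face of an object, seen from the
# example viewpoint on the "Our goals" slide (azimuth 315, elevation 30, distance 2).
top_face = [[-0.3, -0.3, 0.2], [0.3, -0.3, 0.2], [0.3, 0.3, 0.2], [-0.3, 0.3, 0.2]]
shape_2d = project_part(top_face, 315, 30, 2)
```

Changing any of the three viewpoint parameters changes the projected polygon, so for a fixed 3D object the 2D part shapes follow from the viewpoint.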

Aspect Layout Model

• Model the posterior distribution

  $P(Y, C \mid I) = P(Y, L, O, V \mid I)$

  where $C = (c_1, \ldots, c_n)$, $c_i = (x_i, y_i, s_i)$ and $L = (l_1, \ldots, l_n)$, $l_i = (x_i, y_i)$
  - $Y$: object label
  - $L$: 2D part center coordinates
  - $O$: 3D object
  - $V$: viewpoint

Aspect Layout Model

• Conditional Random Field (CRF) [1]

  $P(Y, L, O, V \mid I) \propto \exp(E(Y, L, O, V, I))$

• Graph structure of the CRF

[1] J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.

Aspect Layout Model

[Figure sequence: graph structure with root nodes r1, r2, r3, r4 and part nodes p1, p2, p3]

Aspect Layout Model

• Energy function

  $$E(Y, L, O, V, I) = \begin{cases} \sum_i V_1(l_i, O, V, I) + \sum_{(i,j)} V_2(l_i, l_j, O, V), & \text{if } Y = +1 \\ 0, & \text{if } Y = -1 \end{cases}$$

  where $V_1$ is the unary potential and $V_2$ is the pairwise potential.

Aspect Layout Model

• Viewpoint invariant unary potential
• Models part appearances

  $$V_1(l_i, O, V, I) = \begin{cases} w_i^T \phi(l_i, O, V, I), & \text{if unoccluded} \\ \alpha_i, & \text{if occluded} \end{cases}$$

  where $w_i$ is the part template, $\phi(l_i, O, V, I)$ is the feature vector and $\alpha_i$ is the self-occlusion weight.
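
The two cases of the unary potential can be sketched as follows. The function name and argument layout are hypothetical; the feature vector stands in for whatever appearance descriptor is extracted at the part location, and the occlusion weight is a scalar that would be learned together with the template.

```python
import numpy as np

def unary_potential(template, features, occluded, occlusion_weight):
    """Template response if the part is visible, a learned constant if it is self-occluded."""
    if occluded:
        return float(occlusion_weight)
    return float(np.dot(template, features))

# Toy numbers only: a 2-dimensional "template" and "feature vector".
visible_score = unary_potential(np.array([1.0, 2.0]), np.array([3.0, 4.0]), False, -0.5)
hidden_score = unary_potential(np.array([1.0, 2.0]), np.array([3.0, 4.0]), True, -0.5)
```

The constant for the occluded case lets a hidden part contribute to the energy without any image evidence being evaluated for it.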

Aspect Layout Model
• Rectified HOG features

[Figure: H maps the original image to the rectified image; HOG features are computed on the rectified image]

ALM only needs one template for each part across all the viewpoints.
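
One way to picture the rectification step is as a planar homography that maps the projected outline of a part to a fixed, fronto-parallel rectangle, so that the same template can be applied whatever the viewpoint. The sketch below assumes that H on the slide denotes such a homography and estimates it from four point correspondences with the direct linear transform. The function names are hypothetical, and the HOG computation itself is not shown.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: find H such that dst ~ H @ src for four or more point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes H[2, 2] is nonzero, which holds for this example

def apply_homography(H, pts):
    """Map 2D points through H using homogeneous coordinates."""
    pts_h = np.hstack([np.asarray(pts, float), np.ones((len(pts), 1))])
    mapped = pts_h @ H.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical projected outline of a part (image pixels) and its rectified unit square.
outline = [(10, 20), (110, 30), (120, 140), (5, 120)]
rectified = [(0, 0), (1, 0), (1, 1), (0, 1)]
H = estimate_homography(outline, rectified)
```

Warping the image patch with such a mapping before computing features is what would allow a single template per part, as the slide states.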

Aspect Layout Model
• Pairwise potential
• Constrains 2D relative locations of parts

[Figure labels: 3D world, 2D projection, 2D observation]

Aspect Layout Model

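• Pairwise potential

  $$V_2(l_i, l_j, O, V) = -w_x \left(x_i - x_j + d_{ij,O,V} \cos(\theta_{ij,O,V})\right)^2 - w_y \left(y_i - y_j + d_{ij,O,V} \sin(\theta_{ij,O,V})\right)^2$$

  where $(x_i, y_i)$ and $(x_j, y_j)$ are the 2D centers of parts $i$ and $j$, and $d_{ij,O,V}$ and $\theta_{ij,O,V}$ are the distance and relative orientation between the two parts in the 2D projection of the 3D object $O$ under viewpoint $V$.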
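
The pairwise term can be sketched as a quadratic penalty on the deviation between the observed offset of two part centers and the offset expected from the 2D projection. The sign convention below is an assumption: the expected offset from part i to part j is taken to be (d cos θ, d sin θ), so the penalty vanishes when x_j = x_i + d cos θ and y_j = y_i + d sin θ. The function name is hypothetical.

```python
import math

def pairwise_potential(li, lj, d_ij, theta_ij, w_x, w_y):
    """Negative weighted squared deviation from the expected relative location of two parts."""
    (xi, yi), (xj, yj) = li, lj
    dx = xi - xj + d_ij * math.cos(theta_ij)
    dy = yi - yj + d_ij * math.sin(theta_ij)
    return -w_x * dx ** 2 - w_y * dy ** 2

# Expected offset of length 5 along the direction (3, 4).
theta = math.atan2(4, 3)
consistent = pairwise_potential((0, 0), (3, 4), 5.0, theta, w_x=2.0, w_y=1.0)
shifted = pairwise_potential((0, 0), (4, 4), 5.0, theta, w_x=2.0, w_y=1.0)
```

With non-negative weights the potential is at most zero, so a configuration that matches the projected geometry is never penalized and deviations lower the energy quadratically.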

Aspect Layout Model

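• Energy function

  $E(Y, L, O, V, I \mid \theta) = \theta^T \Psi(Y, L, O, V, I)$

  - Parameters $\theta$: the part templates $w_i$, three further per-part parameters (including the self-occlusion weight) and the pairwise weights $w_x$, $w_y$
  - Linear energy function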
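
Linearity in the parameters means the energy can be written as one dot product between a stacked parameter vector and a stacked feature vector. The toy sketch below covers only part templates and the two pairwise weights; occlusion terms and any other parameters are left out, and the function names and the layout of the stacked vectors are assumptions made for illustration.

```python
import numpy as np

def joint_feature(part_features, squared_offsets):
    """Stack per-part features and negated squared pairwise deviations into one vector."""
    return np.concatenate([np.ravel(part_features), -np.asarray(squared_offsets, float)])

def energy(theta, psi):
    """Linear energy: a single dot product."""
    return float(np.dot(theta, psi))

# Two visible parts with 2-dimensional toy features, one pairwise term.
w1, w2 = np.array([1.0, 0.0]), np.array([0.0, 2.0])
w_x, w_y = 0.5, 0.25
theta = np.concatenate([w1, w2, [w_x, w_y]])
psi = joint_feature([[3.0, 1.0], [1.0, 4.0]], [4.0, 16.0])
total = energy(theta, psi)
```

The result equals the sum of the two template responses minus the weighted squared deviations, which is the form that makes max-margin learning over a single parameter vector applicable.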

Aspect Layout Model

• Maximal margin parameter estimation
  - Energy based learning [1]: find an energy function which outputs the maximal energy value for the correct label configuration of an object
  - Training set $T = \{(I^t, Y^t, L^t, O^t, V^t),\ t = 1, \ldots, N\}$
  - Structural SVM optimization [2]

[1] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato and F. J. Huang. A tutorial on energy-based learning. In Predicting Structured Data, MIT Press, 2006.
[2] I. Tsochantaridis, T. Hofmann, T. Joachims and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.
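
The idea of scoring the correct configuration above all competing ones can be sketched with a margin-rescaled structured hinge loss, a standard form used with structural SVMs. This is not necessarily the exact objective used in the talk. The candidate set, the task-loss values and the function name are hypothetical, and in practice the maximization would run over all label configurations through inference rather than an explicit list.

```python
import numpy as np

def structured_hinge(theta, psi_true, candidates):
    """Margin-rescaled hinge: the largest (task loss + candidate score) minus the true score, floored at 0.

    candidates: list of (psi, task_loss) pairs for competing label configurations.
    """
    true_score = float(np.dot(theta, psi_true))
    worst = max(loss + float(np.dot(theta, psi)) for psi, loss in candidates)
    return max(0.0, worst - true_score)

theta = np.array([1.0, 0.0])
candidates = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 5.0]), 0.5)]
loss_satisfied = structured_hinge(theta, np.array([2.0, 0.0]), candidates)
loss_violated = structured_hinge(theta, np.array([1.0, 0.0]), candidates)
```

The loss is zero only when the correct configuration beats every candidate by at least that candidate's task loss, so minimizing it pushes the energy of the correct configuration to be maximal.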

Aspect Layout Model

• Model inference

  $(Y^*, L^*, O^*, V^*) = \arg\max_{Y, L, O, V} E(Y, L, O, V, I \mid \theta)$

  - Run Belief Propagation (BP) [1] for each combination of $O$ and $V$ to obtain $E(Y = +1, L^*, O^*, V^*)$
  - Recall the graph structure
  - $Y^* = +1$ if $E(Y = +1, L^*, O^*, V^*)$ exceeds the detection threshold

[1] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In Exploring artificial intelligence in the new millennium, 2003.
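
The inference loop can be sketched as max-sum message passing run once per (O, V) hypothesis, followed by thresholding the best energy. For brevity the sketch assumes a star-shaped graph with one hub node connected to every other node, and a small discrete set of candidate locations per node. The graph used in the talk has root and part nodes and may be structured differently. All names are hypothetical.

```python
import numpy as np

def max_sum_star(unary, pairwise):
    """Exact max-sum inference on a star graph.

    unary[k]: 1D array of scores for node k's candidate locations (node 0 is the hub).
    pairwise[k]: 2D array for k >= 1; entry [a, b] scores hub state a with node k state b.
    Returns the best total energy and the chosen state index of every node.
    """
    hub = np.array(unary[0], float)
    best_leaf = {}
    for k in range(1, len(unary)):
        scores = np.asarray(pairwise[k], float) + np.asarray(unary[k], float)[None, :]
        hub = hub + scores.max(axis=1)        # message from node k to the hub
        best_leaf[k] = scores.argmax(axis=1)  # best state of node k for each hub state
    a = int(hub.argmax())
    states = [a] + [int(best_leaf[k][a]) for k in range(1, len(unary))]
    return float(hub[a]), states

def detect(hypotheses, threshold):
    """Run inference for each (O, V) hypothesis, keep the best, then threshold its energy."""
    best_energy, best_states, best_key = -np.inf, None, None
    for key, (unary, pairwise) in hypotheses.items():
        energy, states = max_sum_star(unary, pairwise)
        if energy > best_energy:
            best_energy, best_states, best_key = energy, states, key
    return best_energy > threshold, best_key, best_states, best_energy

# Toy example: three nodes with two candidate locations each, two viewpoint hypotheses.
unary = [np.array([0.0, 1.5]), np.array([2.0, 0.0]), np.array([0.0, 3.0])]
pairwise = {1: np.array([[0.0, -5.0], [-5.0, 0.0]]), 2: np.array([[0.0, -1.0], [-1.0, 0.0]])}
weak_unary = [u - 2.0 for u in unary]
hypotheses = {("O1", "V1"): (weak_unary, pairwise), ("O1", "V2"): (unary, pairwise)}
found, key, states, energy = detect(hypotheses, threshold=1.0)
```

On a tree-structured graph this kind of message passing returns the exact maximizer, which is why the loop over the discrete set of (O, V) hypotheses is the only outer search needed.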

Experiments

• Datasets
  - 3DObject dataset [1]: 10 categories, 10 instances per category
  - VOC 2006 Car dataset [2]: 921 car images
  - EPFL Car dataset [3]: 2299 images, 20 instances
  - Our new ImageNet dataset [4]: bed (400), chair (770), sofa (800), table (670)

[1] S. Savarese and L. Fei-Fei. 3d generic object categorization, localization and pose estimation. In ICCV, 2007.
[2] M. Everingham, A. Zisserman, I. Williams, and L. Van Gool. The PASCAL Visual Object Classes Challenge 2006 Results.
[3] M. Ozuysal, V. Lepetit, and P. Fua. Pose estimation for category specific multiview object localization. In CVPR, 2009.
[4] http://www.image-net.org.

Experiments
• Evaluation measures
  - Detection: Average Precision (AP)
  - Viewpoint: average viewpoint accuracy (the average of the elements on the main diagonal of the viewpoint confusion matrix)
  - Part localization: Percentage of Correct Parts (PCP)-recall curve
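
The viewpoint measure can be sketched as follows. The sketch assumes that the confusion matrix is row-normalized, so that each row (true viewpoint) sums to one before the diagonal is averaged. The slide does not state the normalization, and the function name is hypothetical. Viewpoints with no test samples contribute zero here, which is a further simplifying assumption.

```python
import numpy as np

def average_viewpoint_accuracy(true_views, predicted_views, n_views):
    """Mean of the diagonal of the row-normalized viewpoint confusion matrix."""
    confusion = np.zeros((n_views, n_views))
    for t, p in zip(true_views, predicted_views):
        confusion[t, p] += 1
    row_sums = confusion.sum(axis=1, keepdims=True)
    normalized = confusion / np.maximum(row_sums, 1)  # empty rows stay all zero
    return float(np.mean(np.diag(normalized)))

# Toy example with three discrete viewpoints and two test samples per viewpoint.
accuracy = average_viewpoint_accuracy([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], n_views=3)
```

Averaging per-viewpoint accuracies rather than counting correct samples overall keeps frequent viewpoints from dominating the score.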

Experiments
• 3D models

Experiments
• Average results for eight categories on the 3DObject dataset (8 views)

Method                   ALM    [1]    [2]
Viewpoint accuracy (%)   80.7   74.2   57.2
Detection AP (%)         81.8   n/a    n/a

[1] C. Gu and X. Ren. Discriminative mixture-of-templates for viewpoint classification. In ECCV, 2010.
[2] S. Savarese and L. Fei-Fei. 3d generic object categorization, localization and pose estimation. In ICCV, 2007.

Experiments
• Results on the Bicycle category in the 3DObject dataset

Method                   ALM    [1]    [2]
Viewpoint accuracy (%)   91.4   80.8   75.0
Detection AP (%)         93.0   n/a    69.8

[1] N. Payet and S. Todorovic. From contours to 3d object detection and pose estimation. In ICCV, 2011.
[2] J. Liebelt and C. Schmid. Multi-view object class detection with a 3D geometric model. In CVPR, 2010.

Experiments
• Results on the Car category in the 3DObject dataset

Method                   ALM    [1]    [2]    [3]    [4]    [5]    [6]
Viewpoint accuracy (%)   93.4   85.4   85.3   81     70     67     48.5
Detection AP (%)         98.4   n/a    99.2   89.9   76.7   55.3   n/a

[1] N. Payet and S. Todorovic. From contours to 3d object detection and pose estimation. In ICCV, 2011.
[2] D. Glasner, M. Galun, S. Alpert, R. Basri, and G. Shakhnarovich. Viewpoint-aware object detection and pose estimation. In ICCV, 2011.
[3] M. Stark, M. Goesele, and B. Schiele. Back to the future: Learning shape models from 3d cad data. In BMVC, 2010.
[4] J. Liebelt and C. Schmid. Multi-view object class detection with a 3D geometric model. In CVPR, 2010.
[5] H. Su, M. Sun, L. Fei-Fei, and S. Savarese. Learning a dense multiview representation for detection, viewpoint classification and synthesis of object categories. In ICCV, 2009.
[6] M. Arie-Nachimson and R. Basri. Constructing implicit 3d shape models for pose estimation. In ICCV, 2009.

Experiments

• Detailed average viewpoint accuracy on the 3DObject dataset

Category    Bicycle   Car    Cellphone   Iron   Mouse   Shoe   Stapler   Toaster
DPM [1]     88.4      85.0   62.1        82.7   40.0    71.7   58.5      55.0
ALM Root    92.5      89.2   83.4        86.0   58.7    82.7   69.2      59.6
ALM Full    91.4      93.4   85.0        84.6   66.5    87.0   72.8      65.2

[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.

Experiments

• Effect of training set size on viewpoint estimation

Experiments

• Part localization on the 3DObject dataset

Experiments
• Average results on the ImageNet dataset

Method     ALM Full   ALM Root   DPM [1]
3 views    86.5       79.0       84.6
7 views    63.4       34.0       49.5

Conclusion
• A new Aspect Layout Model (ALM) for object detection, pose estimation and aspect part localization.
• ALM is capable of handling a large number of views, locating aspect parts and reasoning about self-occlusion.
• ALM can be useful for estimating functional parts or object affordances.
• Our code and datasets are available online.

Acknowledgments

Thank you!


Xiang midwest workshop_09212012
