This document proposes a 3D part-based representation, the aspect layout model (ALM), to jointly solve object detection, pose estimation, and aspect part localization. The model represents an object by its 3D parts and introduces aspect parts: portions of an object that are either entirely visible or entirely self-occluded from a given viewpoint. It takes an image as input and outputs the object label, the 2D part configuration (part locations and shapes), and the viewpoint in terms of azimuth, elevation and distance. To solve these tasks simultaneously, it models the posterior distribution over object labels, part configurations and viewpoints.
Xiang midwest workshop_09212012
1. Estimating the Aspect Layout of Object Categories
Yu Xiang and Silvio Savarese
University of Michigan at Ann Arbor
{yuxiang, silvio}@eecs.umich.edu
VISION LAB
2. Traditional object recognition
• Uses 2D bounding boxes
Examples: rigid object (from Felzenszwalb et al., 10), body part (from Ramanan & Sminchisescu, 06), face (from Viola & Jones, 01), human (from Barinova et al., 12)
3. Beyond 2D bounding boxes
• Model the 3D properties of objects
• 3D pose
• 3D part location
• More suitable for robotics, autonomous navigation and manipulation
From Saxena et al., 08
5. Related work: joint object detection and pose estimation
[Figures from Savarese & Fei-Fei ICCV’07, Glasner et al. ICCV’11 and Liebelt et al. 08, 10; viewpoint labels: azimuth, elevation, distance]
• Savarese et al. 07, 08
• Ozuysal et al. 08
• Liebelt et al. 08, 10
• Xiao et al. 08
• Thomas et al. 08
• Sun et al. 09
• Su et al. 09
• Arie-Nachimson & Basri 09
• Stark et al. 10
• Gu & Ren 10
• Glasner et al. 11
• Payet & Todorovic 11
• Zia et al., 3DRR’11
• Pepik et al., CVPR’12
• Schels et al., CVPR’12
• Xiang and Savarese, CVPR’12
6. Related work: 2D part-based model
• Constellation Model (from Fergus et al. CVPR’03)
• Implicit Shape Model (from Leibe et al. ECCV’04 workshop)
• Deformable Part Model (DPM) (from Felzenszwalb et al. CVPR’08)
7. Related work: 3D part-based model
[Figures from Kushal et al., CVPR’07; Hoiem et al., CVPR’07; Chiu et al., CVPR’07; Sun et al. ICCV’09; figure labels: Key-view 1, Key-view 2, Key-view 3]
8. Our contributions
• Propose a 3D part-based representation for object categories
• Introduce the concept of aspect parts
• Jointly solve object detection, pose estimation and aspect part localization
• Significantly improve pose estimation accuracy, evaluate rigid part localization
Yu Xiang and Silvio Savarese. Estimating the aspect layout of object categories. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
9. Aspect Part
• Parts are arbitrarily defined in previous work (figures from Fergus et al. CVPR’03 and Felzenszwalb et al., 2010)
• Introduce parts with geometrical and topological properties, called aspect parts
10. Aspect Part
• Our definition: a portion of the object whose 3D surface is approximately either entirely visible from the observer or entirely non-visible (i.e., self-occluded)
18. Aspect Part
• Related to aspect graph [1]
• Related to discriminative aspect, Farhadi et al., 07
Figure from Barb Cutler, MIT
[1] J. J. Koenderink and A. J. Doorn. The internal representation of solid shape with respect to vision. Biological Cybernetics, 1979.
24. Input & output
• Input
  – 2D image $I$
• Output
  – Object label $Y \in \{+1, -1\}$
  – Part configuration in 2D: $C = (c_1, \ldots, c_n)$, $c_i = (x_i, y_i, s_i)$, where $(x_i, y_i)$ are the 2D part center coordinates and $s_i$ is the 2D part shape
26. Aspect Layout Model
• Viewpoint representation $V = (a, e, d)$: azimuth $a$, elevation $e$ and distance $d$
• 2D part shape $s_i$ from 3D
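A minimal sketch of how a viewpoint V = (a, e, d) could turn a 3D part into a 2D part shape. The camera convention (object centred at the origin, z axis up, camera looking at the origin), the pinhole projection and all function names are assumptions made for illustration; the slides do not specify them.

```python
import math

def camera_position(azimuth_deg, elevation_deg, distance):
    """Camera centre on a sphere around the object (assumed convention: z is up)."""
    a = math.radians(azimuth_deg)
    e = math.radians(elevation_deg)
    return (distance * math.cos(e) * math.cos(a),
            distance * math.cos(e) * math.sin(a),
            distance * math.sin(e))

def _dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def _unit(p):
    n = math.sqrt(_dot(p, p))
    return tuple(pi / n for pi in p)

def project_part(corners_3d, azimuth_deg, elevation_deg, distance, focal=1.0):
    """Project the 3D corners of a part into a pinhole camera looking at the origin."""
    c = camera_position(azimuth_deg, elevation_deg, distance)
    forward = _unit(tuple(-ci for ci in c))           # viewing direction
    right = _unit(_cross(forward, (0.0, 0.0, 1.0)))   # undefined at elevation +/-90
    up = _cross(right, forward)
    shape_2d = []
    for p in corners_3d:
        q = tuple(pi - ci for pi, ci in zip(p, c))    # point relative to the camera
        depth = _dot(q, forward)
        shape_2d.append((focal * _dot(q, right) / depth,
                         focal * _dot(q, up) / depth))
    return shape_2d

# A square part lying in the plane x = 0, facing the +x direction.
square = [(0, -1, -1), (0, 1, -1), (0, 1, 1), (0, -1, 1)]
frontal = project_part(square, azimuth_deg=0, elevation_deg=0, distance=4)
side = project_part(square, azimuth_deg=90, elevation_deg=0, distance=4)
```

In this toy setup the frontal view keeps the square shape, while the view from azimuth 90 collapses it to a line segment, which is the kind of viewpoint-dependent shape change the model has to account for.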
27. Aspect Layout Model
• Model the posterior distribution
$$P(Y, C \mid I) = P(Y, L, O, V \mid I)$$
where $C = (c_1, \ldots, c_n)$, $c_i = (x_i, y_i, s_i)$ is the 2D part configuration, $Y$ is the object label, $L = (l_1, \ldots, l_n)$, $l_i = (x_i, y_i)$ are the 2D part center coordinates, $O$ is the 3D object and $V$ is the viewpoint.
28. Aspect Layout Model
• Conditional Random Field (CRF) [1]
$$P(Y, L, O, V \mid I) \propto \exp\big(E(Y, L, O, V, I)\big)$$
• Graph structure of the CRF
[1] J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.
29. Aspect Layout Model
[Figure: CRF graph with root nodes r1, r2, r3, r4 and part nodes p1, p2, p3]
32. Aspect Layout Model
• Energy function
$$E(Y, L, O, V, I) = \begin{cases} \sum_i V_1(l_i, O, V, I) + \sum_{(i,j)} V_2(l_i, l_j, O, V), & \text{if } Y = +1 \\ 0, & \text{if } Y = -1 \end{cases}$$
where $V_1$ is the unary potential and $V_2$ is the pairwise potential.
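The energy on this slide can be sketched as a sum of unary terms over parts and pairwise terms over graph edges, with zero energy for the background label. The function below is an illustrative sketch only: the 3D object O and viewpoint V are assumed to be fixed and folded into the two callables, and the toy potentials are hypothetical.

```python
def energy(label, locations, edges, unary, pairwise):
    """Energy of one configuration for a fixed 3D object O and viewpoint V.

    locations: list of 2D part centres l_i
    edges:     list of (i, j) index pairs connected in the graph
    unary:     callable (i, l_i) -> V1
    pairwise:  callable (i, j, l_i, l_j) -> V2
    """
    if label == -1:
        return 0.0  # background: the energy is defined to be zero
    total = sum(unary(i, l) for i, l in enumerate(locations))
    total += sum(pairwise(i, j, locations[i], locations[j]) for i, j in edges)
    return total

# Toy potentials (hypothetical): constant unary scores, squared-distance penalty.
def toy_unary(i, l):
    return [1.0, 2.0, 0.5][i]

def toy_pairwise(i, j, li, lj):
    return -((li[0] - lj[0]) ** 2 + (li[1] - lj[1]) ** 2)

locs = [(0, 0), (1, 0), (0, 2)]
edges = [(0, 1), (0, 2)]
e_pos = energy(+1, locs, edges, toy_unary, toy_pairwise)
e_neg = energy(-1, locs, edges, toy_unary, toy_pairwise)
```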
33. Aspect Layout Model
• Viewpoint invariant unary potential
• Models part appearances
$$V_1(l_i, O, V, I) = \begin{cases} w_i^T \phi(l_i, O, V, I), & \text{if unoccluded} \\ \alpha_i, & \text{if occluded} \end{cases}$$
where $w_i$ is the part template, $\phi(l_i, O, V, I)$ is the feature vector and $\alpha_i$ is the self-occlusion weight.
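As a small illustration of the unary potential, the sketch below returns the template response for a visible part and a constant self-occlusion weight otherwise. Feature extraction is left out; the argument names and the example numbers are assumptions, not values from the talk.

```python
def unary_potential(template, features, occluded, occlusion_weight):
    """V1 for one part: template response if visible, constant if self-occluded."""
    if occluded:
        return occlusion_weight
    if len(template) != len(features):
        raise ValueError("template and feature vector must have the same length")
    return sum(w * f for w, f in zip(template, features))

# Hypothetical 3-dimensional template and feature vector.
visible_score = unary_potential([0.5, -1.0, 2.0], [2.0, 1.0, 0.5], False, -0.3)
occluded_score = unary_potential([0.5, -1.0, 2.0], [2.0, 1.0, 0.5], True, -0.3)
```

Because an occluded part contributes a constant rather than an appearance score, the image content at its predicted location does not affect the energy.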
34. Aspect Layout Model
• Rectified HOG features
[Figure: original image, rectified image (via H), HOG features]
ALM only needs one template for each part across all the viewpoints.
35. Aspect Layout Model
• Pairwise potential
• Constrains 2D relative locations of parts
[Figure: 3D world, 2D projection, 2D observation]
36. Aspect Layout Model
• Pairwise potential
$$V_2(l_i, l_j, O, V) = -w_x \big(x_i - x_j + d_{ij,O,V} \cos(\theta_{ij,O,V})\big)^2 - w_y \big(y_i - y_j + d_{ij,O,V} \sin(\theta_{ij,O,V})\big)^2$$
[Figure: part centers $(x_i, y_i)$ and $(x_j, y_j)$, separated by distance $d_{ij,O,V}$ at angle $\theta_{ij,O,V}$]
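The pairwise potential can be read as a quadratic penalty on how far two part centres deviate from the relative offset expected under a given object and viewpoint. The sketch below assumes the penalty enters with a negative sign (so that a perfect layout scores zero and the energy can be maximized) and that the expected offset is given by a distance and an angle in radians; the parameter names are illustrative.

```python
import math

def pairwise_potential(li, lj, d_ij, theta_ij, wx, wy):
    """V2 for two parts: zero when lj - li equals the expected offset, negative otherwise."""
    xi, yi = li
    xj, yj = lj
    dx = xi - xj + d_ij * math.cos(theta_ij)
    dy = yi - yj + d_ij * math.sin(theta_ij)
    return -wx * dx ** 2 - wy * dy ** 2

# Expected offset: part j lies 2 pixels to the right of part i (angle 0).
perfect = pairwise_potential((0, 0), (2, 0), d_ij=2.0, theta_ij=0.0, wx=1.5, wy=1.0)
shifted = pairwise_potential((0, 0), (3, 0), d_ij=2.0, theta_ij=0.0, wx=1.5, wy=1.0)
```

Separate weights for the horizontal and vertical deviations let the model tolerate different amounts of displacement along each image axis.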
37. Aspect Layout Model
• Energy function
$$E(Y, L, O, V, I \mid \theta) = \theta^T \Psi(Y, L, O, V, I)$$
  – Parameters: $\theta = (w_i\ \forall i,\ \alpha_i\ \forall i,\ w_x,\ w_y)$
  – Linear energy function
38. Aspect Layout Model
• Maximal margin parameter estimation
  – Energy based learning [1]: find an energy function which outputs the maximal energy value for the correct label configuration of an object
  – Training set: $T = \{(I^t, Y^t, L^t, O^t, V^t),\ t = 1, \ldots, N\}$
  – Structural SVM optimization [2]
[1] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato and F. J. Huang. A tutorial on energy-based learning. In Predicting Structured Data, MIT Press, 2006.
[2] I. Tsochantaridis, T. Hofmann, T. Joachims and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.
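To make the max-margin idea concrete, the sketch below computes a margin-rescaled structured hinge loss for one training example, the quantity a structural SVM typically penalizes. It is a generic illustration, not the training code behind the slides: the task loss, the candidate set and the solver used by the authors are not given here, and all names are hypothetical.

```python
def structured_hinge_loss(theta, psi_true, candidates):
    """Margin-rescaled structured hinge loss for one training example.

    theta:      parameter vector
    psi_true:   joint feature vector of the ground-truth configuration
    candidates: list of (psi, loss) pairs for competing configurations,
                where loss is the task loss against the ground truth
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    true_score = dot(theta, psi_true)
    # Most violating competitor: high energy and high task loss.
    worst = max(dot(theta, psi) + loss for psi, loss in candidates)
    return max(0.0, worst - true_score)

competitors = [([1.0, 0.0], 0.5), ([0.0, 1.0], 1.0)]
loss_good = structured_hinge_loss([1.0, 0.0], [2.0, 0.0], competitors)
loss_bad = structured_hinge_loss([0.0, 1.0], [2.0, 0.0], competitors)
```

The loss is zero only when the correct configuration has a higher energy than every competitor by at least that competitor's task loss, which matches the goal of giving the correct label configuration the maximal energy.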
39. Aspect Layout Model
• Model inference
$$(Y^*, L^*, O^*, V^*) = \arg\max_{Y, L, O, V} E(Y, L, O, V, I \mid \theta)$$
  – Run Belief Propagation (BP) [1] for each combination of $O$ and $V$ to obtain $E(Y = +1, L^*, O^*, V^*)$
  – Recall the graph structure
  – $Y^* = +1$ if $E(Y = +1, L^*, O^*, V^*)$ is above the detection threshold
[1] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In Exploring artificial intelligence in the new millennium, 2003.
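The inference procedure can be sketched as an outer loop over every combination of 3D object and viewpoint, an inner maximization over part locations, and a final threshold test. In the sketch below the inner maximization is a brute-force search over a few candidate locations, used as a simple stand-in for belief propagation; the scoring function and all names are hypothetical.

```python
import itertools

def infer(hypotheses, candidates, score, threshold):
    """Return (label, L, O, V, energy) maximizing the Y = +1 energy.

    hypotheses: iterable of (O, V) combinations to enumerate
    candidates: one list of candidate 2D locations per part
    score:      callable (L, O, V) -> energy for Y = +1
    """
    best = None
    for O, V in hypotheses:
        # Brute-force stand-in for BP: try every joint part placement.
        for L in itertools.product(*candidates):
            e = score(L, O, V)
            if best is None or e > best[0]:
                best = (e, L, O, V)
    e, L, O, V = best
    label = +1 if e > threshold else -1
    return label, L, O, V, e

def toy_score(L, O, V):
    """Hypothetical energy: viewpoint bias minus layout and position penalties."""
    expected_dx = {0: 2, 90: 1}[V]
    bias = {0: 1.0, 90: 0.5}[V]
    dx = L[1][0] - L[0][0]
    return bias - (dx - expected_dx) ** 2 - 0.1 * L[0][0]

hyps = [("car", 0), ("car", 90)]
cands = [[(0, 0), (1, 0)], [(2, 0), (3, 0)]]
detected = infer(hyps, cands, toy_score, threshold=0.5)
rejected = infer(hyps, cands, toy_score, threshold=2.0)
```

Exhaustive search grows exponentially with the number of parts, which is why a message-passing method such as BP is the practical choice for the inner step.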
40. Experiments
• Datasets
  – 3DObject dataset [1]: 10 categories, 10 instances per category
  – VOC 2006 Car dataset [2]: 921 car images
  – EPFL Car dataset [3]: 2299 images, 20 instances
  – Our new ImageNet dataset [4]: Bed (400), Chair (770), Sofa (800), Table (670)
[1] S. Savarese and L. Fei-Fei. 3d generic object categorization, localization and pose estimation. In ICCV, 2007.
[2] M. Everingham, A. Zisserman, I. Williams, and L. Van Gool. The PASCAL Visual Object Classes Challenge 2006 Results.
[3] M. Ozuysal, V. Lepetit, and P. Fua. Pose estimation for category specific multiview object localization. In CVPR, 2009.
[4] http://www.image-net.org.
42. Experiments
• Evaluation measures
  – Detection: Average Precision (AP)
  – Viewpoint: average viewpoint accuracy (the average of the elements on the main diagonal of the viewpoint confusion matrix)
  – Part localization: Percentage of Correct Parts (PCP)-recall curve
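The viewpoint measure can be sketched as follows: normalize each row of the confusion matrix and average the diagonal. The sketch assumes that rows index the ground-truth view and columns the predicted view, and that the matrix holds raw counts; the example counts are made up.

```python
def average_viewpoint_accuracy(confusion):
    """Mean of the diagonal of the row-normalized viewpoint confusion matrix.

    confusion[i][j] counts samples with true view i predicted as view j
    (rows as ground truth is an assumption of this sketch).
    """
    per_view = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total > 0:  # skip views without ground-truth samples
            per_view.append(row[i] / total)
    return sum(per_view) / len(per_view)

# Hypothetical counts for three views.
counts = [[8, 2, 0],
          [1, 9, 0],
          [0, 5, 5]]
accuracy = average_viewpoint_accuracy(counts)
```

Averaging per-view accuracies rather than pooling all samples keeps views with many examples from dominating the score.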
44. Experiments
• Average results for eight categories on the 3DObject dataset (8 views)
Method      ALM    [1]    [2]
Viewpoint   80.7   74.2   57.2
Detection   81.8   n/a    n/a
[1] C. Gu and X. Ren. Discriminative mixture-of-templates for viewpoint classification. In ECCV, 2010.
[2] S. Savarese and L. Fei-Fei. 3d generic object categorization, localization and pose estimation. In ICCV, 2007.
45. Experiments
• Results on the Bicycle category in the 3DObject dataset
Method      ALM    [1]    [2]
Viewpoint   91.4   80.8   75.0
Detection   93.0   n/a    69.8
[1] N. Payet and S. Todorovic. From contours to 3d object detection and pose estimation. In ICCV, 2011.
[2] J. Liebelt and C. Schmid. Multi-view object class detection with a 3D geometric model. In CVPR, 2010.
46. Experiments
• Results on the Car category in the 3DObject dataset
Method      ALM    [1]    [2]    [3]    [4]    [5]    [6]
Viewpoint   93.4   85.4   85.3   81     70     67     48.5
Detection   98.4   n/a    99.2   89.9   76.7   55.3   n/a
[1] N. Payet and S. Todorovic. From contours to 3d object detection and pose estimation. In ICCV, 2011.
[2] D. Glasner, M. Galun, S. Alpert, R. Basri, and G. Shakhnarovich. Viewpoint-aware object detection and pose estimation. In ICCV, 2011.
[3] M. Stark, M. Goesele, and B. Schiele. Back to the future: Learning shape models from 3d cad data. In BMVC, 2010.
[4] J. Liebelt and C. Schmid. Multi-view object class detection with a 3D geometric model. In CVPR, 2010.
[5] H. Su, M. Sun, L. Fei-Fei, and S. Savarese. Learning a dense multiview representation for detection, viewpoint classification and synthesis of object categories. In ICCV, 2009.
[6] M. Arie-Nachimson and R. Basri. Constructing implicit 3d shape models for pose estimation. In ICCV, 2009.
47. Experiments
• Detailed average viewpoint accuracy on the 3DObject dataset
Category    Bicycle   Car    Cellphone   Iron   Mouse   Shoe   Stapler   Toaster
DPM [1]     88.4      85.0   62.1        82.7   40.0    71.7   58.5      55.0
ALM Root    92.5      89.2   83.4        86.0   58.7    82.7   69.2      59.6
ALM Full    91.4      93.4   85.0        84.6   66.5    87.0   72.8      65.2
[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
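As a quick consistency check, the per-category numbers in this table can be averaged per method. The short script below copies the table values and computes the mean over the eight categories; the dictionary layout is just one convenient way to hold them.

```python
# Per-category average viewpoint accuracy copied from the table, in the order
# Bicycle, Car, Cellphone, Iron, Mouse, Shoe, Stapler, Toaster.
table = {
    "DPM":      [88.4, 85.0, 62.1, 82.7, 40.0, 71.7, 58.5, 55.0],
    "ALM Root": [92.5, 89.2, 83.4, 86.0, 58.7, 82.7, 69.2, 59.6],
    "ALM Full": [91.4, 93.4, 85.0, 84.6, 66.5, 87.0, 72.8, 65.2],
}

# Mean over the eight categories for each method.
means = {method: sum(values) / len(values) for method, values in table.items()}

for method, mean in means.items():
    print(f"{method}: {mean:.1f}")
```

The ALM Full mean comes out at about 80.7, which agrees with the average viewpoint accuracy reported for ALM over eight categories on the 3DObject dataset.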
48. Experiments
• Effect of training set sizes for viewpoint
[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
52. Experiments
• Average results on the ImageNet dataset
Method    ALM Full   ALM Root   DPM [1]
3 views   86.5       79.0       84.6
7 views   63.4       34.0       49.5
[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.
55. Conclusion
• A new Aspect Layout Model (ALM) for object detection, pose estimation and aspect part localization.
• ALM is capable of handling a large number of views, locating aspect parts and reasoning about self-occlusion.
• ALM can be useful for estimating functional parts or object affordances.
• Our code and datasets are available online.