Estimating the Aspect Layout of
      Object Categories
      Yu Xiang and Silvio Savarese
  University of Michigan at Ann Arbor
   {yuxiang, silvio}@eecs.umich.edu

                                        VISION LAB
Traditional object recognition
• Uses 2D bounding boxes
• Rigid object (from Felzenszwalb et al., 10)
• Body part (from Ramanan & Sminchisescu, 06)
• Face (from Viola & Jones, 01)
• Human (from Barinova et al., 12)

Beyond 2D bounding boxes
• Model the 3D properties of objects
  • 3D pose
  • 3D part location
• More suitable for robotics, autonomous navigation and manipulation
[Figure from Saxena et al., 08]

Our goals
      Viewpoint: Azimuth 315°, Elevation 30°, Distance 2

Related work: joint object detection and pose estimation
• Savarese et al. 07, 08
• Ozuysal et al. 08
• Liebelt et al. 08, 10
• Xiao et al. 08
• Thomas et al. 08
• Sun et al. 09
• Su et al. 09
• Arie-Nachimson & Basri 09
• Stark et al. 10
• Gu & Ren 10
• Glasner et al. 11
• Payet & Todorovic 11
• Zia et al., 3DRR'11
• Pepik et al., CVPR'12
• Schels et al., CVPR'12
• Xiang and Savarese, CVPR'12
[Figures from Savarese & Fei-Fei ICCV'07, Glasner et al. ICCV'11, and Liebelt et al. 08, 10; figure labels: Azimuth, Elevation, Distance]

Related work: 2D part-based model
• Constellation Model (from Fergus et al. CVPR'03)
• Implicit Shape Model (from Leibe et al. ECCV'04 workshop)
• Deformable Part Model (DPM) (from Felzenszwalb et al. CVPR'08)

Related work: 3D part-based model
[Figures from Hoiem et al., CVPR'07; Kushal et al., CVPR'07; Chiu et al., CVPR'07; and Sun et al. ICCV'09]
[Figure labels: Key-view 1, Key-view 2, Key-view 3]

Our contributions
• Propose a 3D part-based representation for object categories
• Introduce the concept of aspect parts
• Jointly solve object detection, pose estimation and aspect part localization
• Significantly improve pose estimation accuracy, evaluate rigid part localization

Yu Xiang and Silvio Savarese. Estimating the aspect layout of object categories. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2012.

Aspect Part
• Parts are arbitrarily defined in previous work
  [Figures from Fergus et al. CVPR'03 and Felzenszwalb et al., 2010]
• Introduce parts with geometrical and topological properties, called aspect parts

Aspect Part
• Our definition: a portion of the object whose 3D surface is approximately either entirely visible from the observer or entirely non-visible (i.e., self-occluded)
[Figure sequence over seven slides, with parts labeled A, B and AB]

Aspect Part
• Examples: Bed, Car, Sofa

Aspect Part
• Related to aspect graph [1]
• Related to discriminative aspect, Farhadi et al., 07
[Figure from Barb Cutler, MIT]

[1] J. J. Koenderink and A. J. van Doorn. The internal representation of solid shape with respect to vision. Biological Cybernetics, 1979.

Aspect Part
• Related to object affordance or functional part
  [Figure label: Seat]

Aspect Part
• Related to geometrical attributes of object
  [Figure labels: Horizontal surface, Vertical surface]

Aspect Part
• Related to scene layout estimation
  [Figure from Hedau, Hoiem & Forsyth, ECCV'10]

Aspect Part
• Enables the modeling of object-human interactions
  [Figure from Gupta et al., CVPR'11]

Outline
• Aspect layout model
• Maximal margin parameter estimation
• Model inference
• Experiments
• Conclusion

Input & output
• Input
  – 2D image $I$
• Output
  – Object label $Y \in \{+1, -1\}$
  – Part configuration in 2D $C = (c_1, \ldots, c_n)$, $c_i = (x_i, y_i, s_i)$, where $(x_i, y_i)$ are the 2D part center coordinates and $s_i$ is the 2D part shape

Aspect Layout Model
• 3D Object $O = (o_1, \ldots, o_n)$

Aspect Layout Model
• Viewpoint representation $V = (a, e, d)$: azimuth, elevation and distance
• 2D part shape $s_i$ from 3D

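One way to read the viewpoint triple (azimuth, elevation, distance) is as a camera position on a sphere around the object. The sketch below is only an illustration: the axis convention (z up, azimuth measured in the ground plane from the x axis) and the function name `camera_position` are assumptions, not taken from the slides.

```python
import math


def camera_position(azimuth_deg, elevation_deg, distance):
    """Camera center on a viewing sphere centered at the object.

    Assumed convention (not from the slides): z points up, azimuth is
    measured in the x-y plane from the x axis, elevation from that plane.
    """
    a = math.radians(azimuth_deg)
    e = math.radians(elevation_deg)
    x = distance * math.cos(e) * math.cos(a)
    y = distance * math.cos(e) * math.sin(a)
    z = distance * math.sin(e)
    return (x, y, z)


# The viewpoint quoted on the "Our goals" slide.
print(camera_position(315, 30, 2))
```

Whatever convention is chosen, the returned point always lies at the given distance from the object center, so the three numbers fully determine where the camera sits.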
Aspect Layout Model
• Model the posterior distribution

  $$P(Y, C \mid I) = P(Y, L, O, V \mid I)$$

  where $C = (c_1, \ldots, c_n)$ with $c_i = (x_i, y_i, s_i)$, $Y$ is the object label, $L = (l_1, \ldots, l_n)$ with $l_i = (x_i, y_i)$ are the 2D part center coordinates, $O$ is the 3D object and $V$ is the viewpoint

Aspect Layout Model
• Conditional Random Field (CRF) [1]

  $$P(Y, L, O, V \mid I) \propto \exp(E(Y, L, O, V, I))$$

• Graph structure of the CRF

[1] J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.

Aspect Layout Model

 r1        r2        r3        r4   root nodes

      p1        p2        p3        part nodes

Aspect Layout Model
• Energy function

  $$E(Y, L, O, V, I) = \begin{cases} \sum_i V_1(l_i, O, V, I) + \sum_{(i,j)} V_2(l_i, l_j, O, V), & \text{if } Y = +1 \\ 0, & \text{if } Y = -1 \end{cases}$$

  where $V_1$ is the unary potential and $V_2$ is the pairwise potential

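A minimal sketch of how this energy could be evaluated once the potentials are known. It assumes the unary and pairwise terms are simply added for a positive label and that the energy is zero for a negative label; the function name and the data layout (a list of unary values, a dict of pairwise values keyed by part pair) are illustrative choices, not part of the slides.

```python
def energy(label, unary_values, pairwise_values):
    """Energy of one labeling.

    label: +1 (object present) or -1 (background)
    unary_values: list of unary potential values, one per part
    pairwise_values: dict mapping a part pair (i, j) to its pairwise potential
    """
    if label != 1:
        # Background hypothesis: the energy is defined to be zero.
        return 0.0
    return sum(unary_values) + sum(pairwise_values.values())


print(energy(+1, [1.0, 2.0], {(0, 1): -0.5}))
print(energy(-1, [1.0, 2.0], {(0, 1): -0.5}))
```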
Aspect Layout Model
• Viewpoint invariant unary potential
  • Models part appearances

  $$V_1(l_i, O, V, I) = \begin{cases} w_i^T \phi(l_i, O, V, I), & \text{if unoccluded} \\ \alpha_i, & \text{if occluded} \end{cases}$$

  where $w_i$ is the part template, $\phi(l_i, O, V, I)$ is the feature vector and $\alpha_i$ is the self-occlusion weight

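The two cases of the unary potential can be sketched as follows, assuming the unoccluded case is the inner product of the part template with the feature vector and the occluded case returns the self-occlusion weight. The function and argument names are hypothetical, and plain Python lists stand in for HOG feature vectors.

```python
def unary_potential(template, features, occluded, occlusion_weight):
    """Unary potential of one part.

    template: part template weights (list of floats)
    features: feature vector extracted at the part location (same length)
    occluded: True if the part is self-occluded under the current viewpoint
    occlusion_weight: learned constant used in place of an appearance score
    """
    if occluded:
        # No appearance evidence is available for a self-occluded part.
        return occlusion_weight
    if len(template) != len(features):
        raise ValueError("template and features must have the same length")
    return sum(w * f for w, f in zip(template, features))


print(unary_potential([1.0, 2.0], [3.0, 4.0], False, -0.5))
print(unary_potential([1.0, 2.0], [3.0, 4.0], True, -0.5))
```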
Aspect Layout Model
• Rectified HOG features
  [Figure: the original image is mapped by H to a rectified image, from which HOG features are computed]

ALM only needs one template for each part across all the viewpoints.

Aspect Layout Model
• Pairwise potential
  • Constrains 2D relative locations of parts
  [Figure: the 2D projection of the 3D world is compared with the 2D observation]

Aspect Layout Model
• Pairwise potential

  $$V_2(l_i, l_j, O, V) = -w_x \left(x_i - x_j + d_{ij,O,V} \cos(\theta_{ij,O,V})\right)^2 - w_y \left(y_i - y_j + d_{ij,O,V} \sin(\theta_{ij,O,V})\right)^2$$

  [Figure: parts at $(x_i, y_i)$ and $(x_j, y_j)$, with distance $d_{ij,O,V}$ and angle $\theta_{ij,O,V}$]

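A small sketch of the pairwise term, assuming it is a negative weighted squared deviation between the observed offset of two parts and the offset predicted from the 3D object and viewpoint (given here directly as a distance and an angle). The sign convention, default weights and function name are assumptions made for illustration.

```python
import math


def pairwise_potential(li, lj, d_ij, theta_ij, w_x=1.0, w_y=1.0):
    """Pairwise potential between parts i and j.

    li, lj: observed 2D part centers (x, y)
    d_ij, theta_ij: distance and angle between the two parts as predicted
        by projecting the 3D object under the current viewpoint
    w_x, w_y: non-negative weights (illustrative defaults)
    """
    xi, yi = li
    xj, yj = lj
    dx = xi - xj + d_ij * math.cos(theta_ij)
    dy = yi - yj + d_ij * math.sin(theta_ij)
    # Zero when the observation matches the prediction, negative otherwise.
    return -w_x * dx ** 2 - w_y * dy ** 2


# Part j sits exactly where the projection predicts, so the penalty vanishes
# up to floating-point rounding.
print(pairwise_potential((0.0, 0.0), (3.0, 4.0), 5.0, math.atan2(4.0, 3.0)))
```

Because the potential is quadratic in the part coordinates, deviations are penalized smoothly and the weights control how strongly each axis is constrained.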
Aspect Layout Model
• Energy function

  $$E(Y, L, O, V, I \mid \theta) = \theta^T \Psi(Y, L, O, V, I)$$

  – Parameters: θ = (wi, ?i, ?i, ?i, wx, wy)
  – Linear energy function

Aspect Layout Model
• Maximal margin parameter estimation
  – Energy based learning [1]: find an energy function which outputs the maximal energy value for the correct label configuration of an object
  – Training set $T = \{(I^t, Y^t, L^t, O^t, V^t),\ t = 1, \ldots, N\}$
  – Structural SVM optimization [2]

[1] Y. LeCun, S. Chopra, R. Hadsell, M. Ranzato and F. J. Huang. A tutorial on energy-based learning. In Predicting Structured Data, MIT Press, 2006.
[2] I. Tsochantaridis, T. Hofmann, T. Joachims and Y. Altun. Support vector machine learning for interdependent and structured output spaces. In ICML, 2004.

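For orientation, the generic margin-rescaling structural SVM of [2] can be written as below. The slide does not give the exact objective used for ALM, so everything here is an assumption borrowed from the generic formulation: $\theta$ denotes the parameter vector and $\Psi$ the joint feature map of a linear energy, $y^t$ abbreviates the ground-truth output $(Y^t, L^t, O^t, V^t)$, $\Delta$ is a task loss, $C$ a trade-off constant and $\xi_t$ slack variables.

```latex
\min_{\theta,\ \xi \ge 0} \quad \frac{1}{2}\lVert \theta \rVert^2 + \frac{C}{N} \sum_{t=1}^{N} \xi_t
\qquad \text{s.t.} \qquad
\theta^T \Psi(I^t, y^t) - \theta^T \Psi(I^t, y) \;\ge\; \Delta(y^t, y) - \xi_t
\quad \forall t,\ \forall y \ne y^t
```

Each constraint asks the correct configuration to score higher than any competing one by a margin that grows with the loss, which matches the stated goal of giving the correct label configuration the maximal energy.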
Aspect Layout Model
• Model inference

  $$(Y^*, L^*, O^*, V^*) = \arg\max_{Y, L, O, V} E(Y, L, O, V, I \mid \theta)$$

  – Run Belief Propagation (BP) [1] for each combination of $O$ and $V$ to obtain $E(Y = +1, L^*, O^*, V^*)$
  – Recall the graph structure
  – $Y^* = +1$ if $E(Y = +1, L^*, O^*, V^*)$ is above the detection threshold

[1] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Understanding belief propagation and its generalizations. In Exploring artificial intelligence in the new millennium, 2003.

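The search over object and viewpoint hypotheses can be sketched as follows. To keep the example short, exhaustive enumeration of part locations stands in for Belief Propagation, which is only practical for tiny candidate sets. The callbacks `unary` and `pairwise`, the argument layout and the strict `>` comparison with the threshold are all assumptions made for illustration.

```python
from itertools import product


def infer(candidates, hypotheses, unary, pairwise, edges, threshold):
    """Brute-force stand-in for BP-based inference.

    candidates: list with one list of candidate 2D locations per part
    hypotheses: iterable of (object, viewpoint) combinations, opaque here
    unary(i, loc, hyp): hypothetical callback returning the unary potential
    pairwise(i, j, loc_i, loc_j, hyp): hypothetical pairwise callback
    edges: list of (i, j) part pairs connected in the graph
    threshold: detection threshold on the best energy
    Returns (label, best_locations, best_hypothesis, best_energy).
    """
    best_energy, best_locs, best_hyp = float("-inf"), None, None
    for hyp in hypotheses:
        # Enumerate every joint assignment of part locations.
        for locs in product(*candidates):
            e = sum(unary(i, loc, hyp) for i, loc in enumerate(locs))
            e += sum(pairwise(i, j, locs[i], locs[j], hyp) for i, j in edges)
            if e > best_energy:
                best_energy, best_locs, best_hyp = e, locs, hyp
    label = 1 if best_energy > threshold else -1
    return label, best_locs, best_hyp, best_energy
```

A real implementation would replace the inner loop with max-product message passing over the part graph, which avoids enumerating all joint assignments.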
Experiments
• Datasets
  – 3DObject dataset [1]: 10 categories, 10 instances per category
  – VOC 2006 Car dataset [2]: 921 car images
  – EPFL Car dataset [3]: 2299 images, 20 instances
  – Our new ImageNet dataset [4]: Bed (400), Chair (770), Sofa (800), Table (670)

[1] S. Savarese and L. Fei-Fei. 3d generic object categorization, localization and pose estimation. In ICCV, 2007.
[2] M. Everingham, A. Zisserman, I. Williams, and L. Van Gool. The PASCAL Visual Object Classes Challenge 2006 Results.
[3] M. Ozuysal, V. Lepetit, and P. Fua. Pose estimation for category specific multiview object localization. In CVPR, 2009.
[4] http://www.image-net.org.

Experiments
• Evaluation measures
  – Detection: Average Precision (AP)
  – Viewpoint: average viewpoint accuracy (the average of the elements on the main diagonal of the viewpoint confusion matrix)
  – Part localization: Percentage of Correct Parts (PCP)-recall curve

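The viewpoint measure can be computed directly from a confusion matrix of counts. The sketch below assumes rows index the true view and columns the predicted view, and that each row is normalized by its own total before the diagonal is averaged; the function name and the handling of empty rows are illustrative choices.

```python
def average_viewpoint_accuracy(confusion):
    """Mean of the diagonal of the row-normalized viewpoint confusion matrix.

    confusion[i][j]: number of test samples with true view i predicted as view j.
    Views without any test sample are skipped (an assumption of this sketch).
    """
    per_view = []
    for i, row in enumerate(confusion):
        total = sum(row)
        if total > 0:
            per_view.append(row[i] / total)
    if not per_view:
        raise ValueError("confusion matrix contains no samples")
    return sum(per_view) / len(per_view)


# Two views: 8 of 10 and 9 of 10 samples are assigned the correct view.
print(average_viewpoint_accuracy([[8, 2], [1, 9]]))
```

Normalizing each row first means that every viewpoint contributes equally, even when some views have many more test images than others.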
Experiments
• 3D models

Experiments
• Average results for eight categories on the 3DObject dataset (8 views)

  Method       ALM     [1]     [2]
  Viewpoint    80.7    74.2    57.2
  Detection    81.8    n/a     n/a

[1] C. Gu and X. Ren. Discriminative mixture-of-templates for viewpoint classification. In ECCV, 2010.
[2] S. Savarese and L. Fei-Fei. 3d generic object categorization, localization and pose estimation. In ICCV, 2007.

Experiments
• Results on the Bicycle category in the 3DObject dataset

  Method       ALM     [1]     [2]
  Viewpoint    91.4    80.8    75.0
  Detection    93.0    n/a     69.8

[1] N. Payet and S. Todorovic. From contours to 3d object detection and pose estimation. In ICCV, 2011.
[2] J. Liebelt and C. Schmid. Multi-view object class detection with a 3D geometric model. In CVPR, 2010.

Experiments
• Results on the Car category in the 3DObject dataset

  Method       ALM     [1]     [2]     [3]     [4]     [5]     [6]
  Viewpoint    93.4    85.4    85.3    81      70      67      48.5
  Detection    98.4    n/a     99.2    89.9    76.7    55.3    n/a

[1] N. Payet and S. Todorovic. From contours to 3d object detection and pose estimation. In ICCV, 2011.
[2] D. Glasner, M. Galun, S. Alpert, R. Basri, and G. Shakhnarovich. Viewpoint-aware object detection and pose estimation. In ICCV, 2011.
[3] M. Stark, M. Goesele, and B. Schiele. Back to the future: Learning shape models from 3d cad data. In BMVC, 2010.
[4] J. Liebelt and C. Schmid. Multi-view object class detection with a 3D geometric model. In CVPR, 2010.
[5] H. Su, M. Sun, L. Fei-Fei, and S. Savarese. Learning a dense multiview representation for detection, viewpoint classification and synthesis of object categories. In ICCV, 2009.
[6] M. Arie-Nachimson and R. Basri. Constructing implicit 3d shape models for pose estimation. In ICCV, 2009.

Experiments
• Detailed average viewpoint accuracy on the 3DObject dataset

  Category    Bicycle   Car    Cellphone   Iron   Mouse   Shoe   Stapler   Toaster
  DPM [1]     88.4      85.0   62.1        82.7   40.0    71.7   58.5      55.0
  ALM Root    92.5      89.2   83.4        86.0   58.7    82.7   69.2      59.6
  ALM Full    91.4      93.4   85.0        84.6   66.5    87.0   72.8      65.2

[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.

Experiments
• Effect of training set size on viewpoint estimation

[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.

Experiments
• Part localization on the 3DObject dataset

Experiments
• Average results on the ImageNet dataset

  Method     ALM Full    ALM Root    DPM [1]
  3 views    86.5        79.0        84.6
  7 views    63.4        34.0        49.5

[1] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 2010.

Conclusion
• A new Aspect Layout Model (ALM) for object detection, pose estimation and aspect part localization.
• ALM is capable of handling a large number of views, locating aspect parts and reasoning about self-occlusion.
• ALM can be useful for estimating functional parts or object affordances.
• Our code and datasets are available online.

Acknowledgments

Thank you!
