              chair    bookcase sofa     table    bed      bookcase desk     bookcase bed      bookcase stool
              poang    billy1   ektorp   lack     malm1    billy2   expedit  billy3   malm2    billy4   poang    mean
DPM           0.02     0.27     0.08     0.22     0.24     0.35     0.09     0.18     0.66     0.11     0.76     0.27
ELDA          0.29     0.24     0.35     0.14     0.06     0.77     0.03     0.20     0.60     0.41     0.13     0.29
D+L           0.045    0.014    0.071    0.011    0.007    0.069    0.008    0.587    0.038    0.264    0.003    0.10
D+L+HOG       4.48     2.91     0.17     5.56     0.64     9.70     0.12     5.05     15.39    7.72     0.79     4.78
D+L+H+Region  17.16    11.35    2.43     7.24     2.37     17.18    1.23     7.70     14.24    9.08     3.42     8.49
Full          18.76    15.77    4.43     11.27    6.12     20.62    6.87     7.71     14.56    15.09    7.20     11.67
Table 1. AP performance on pose estimation: we evaluate pose estimation at a fine scale on the IKEAobject database.
Performance improves steadily as more features are added, from a mean AP of 0.10 for D+L to 11.67 for the full model. Note that DPM and ELDA are trained on rendered images.
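Assuming the "mean" column of Table 1 is the unweighted average of the 11 per-object APs (the caption does not state this, so it is an assumption), every reported mean can be reproduced directly from the rows of the table. The sketch below does that and also checks the caption's claim that mean AP rises at each feature stage from D+L to Full. The dictionary and helper names are hypothetical; the numbers are copied from Table 1.

```python
# Per-object AP values copied from Table 1. Column order: poang (chair), billy1,
# ektorp, lack, malm1, billy2, expedit, billy3, malm2, billy4, poang (stool).
TABLE1 = {
    "DPM":          [0.02, 0.27, 0.08, 0.22, 0.24, 0.35, 0.09, 0.18, 0.66, 0.11, 0.76],
    "ELDA":         [0.29, 0.24, 0.35, 0.14, 0.06, 0.77, 0.03, 0.20, 0.60, 0.41, 0.13],
    "D+L":          [0.045, 0.014, 0.071, 0.011, 0.007, 0.069, 0.008, 0.587, 0.038, 0.264, 0.003],
    "D+L+HOG":      [4.48, 2.91, 0.17, 5.56, 0.64, 9.70, 0.12, 5.05, 15.39, 7.72, 0.79],
    "D+L+H+Region": [17.16, 11.35, 2.43, 7.24, 2.37, 17.18, 1.23, 7.70, 14.24, 9.08, 3.42],
    "Full":         [18.76, 15.77, 4.43, 11.27, 6.12, 20.62, 6.87, 7.71, 14.56, 15.09, 7.20],
}


def mean_ap(aps: list[float]) -> float:
    """Unweighted mean over object models (assumed meaning of the 'mean' column)."""
    return sum(aps) / len(aps)


# Round to two decimals, as in the table.
means = {name: round(mean_ap(aps), 2) for name, aps in TABLE1.items()}

# Feature stages in the order they are added; True if mean AP rises at every stage.
stages = ["D+L", "D+L+HOG", "D+L+H+Region", "Full"]
improves = all(means[a] < means[b] for a, b in zip(stages, stages[1:]))

for name in TABLE1:
    print(f"{name:<14}{means[name]:.2f}")
print("monotone improvement:", improves)
```

All six recomputed means match the reported column, which is consistent with (though does not prove) an unweighted average across object models.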
              chair    bookcase sofa     table    bed      bookcase desk     bookcase bed      bookcase stool
              poang    billy1   ektorp   lack     malm1    billy2   expedit  billy3   malm2    billy4   poang    mean
LDA @ 0.5     15.02    5.22     8.91     1.59     15.46    3.08     37.62    34.52    1.85     46.92    0.37     15.51
DPM @ 0.5     27.46    24.28    12.14    10.75    3.41     13.54    46.13    34.22    0.95     0.12     0.53     15.78
Ours @ 0.5    23.17    24.21    6.27     13.93    27.12    26.33    18.42    23.84    22.34    32.19    8.16     20.54
LDA @ 0.8     4.71     0.62     7.49     1.24     3.52     0.11     18.76    21.73    1.27     7.09     0.17     6.06
DPM @ 0.8     7.78     0.56     10.13    0.01     1.25     0.00     35.03    12.73    0.00     0.00     0.44     6.18
Ours @ 0.8    21.74    18.55    5.24     11.93    11.42    25.87    8.99     10.24    18.14    17.92    7.38     14.31
Table 2. AP performance on detection: we compare our method against DPM [4] and LDA [8] at two bounding-box
intersection-over-union (IoU) thresholds (0.5 and 0.8) on the IKEAobject database. The mean AP gap between our method and [4] grows
from 4.76 points at 0.5 to 8.13 points at 0.8, which suggests that our method is better at fine detection.
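Table 2 reports AP at two IoU thresholds. To illustrate why a stricter threshold penalizes loosely localized boxes, here is a minimal sketch assuming a PASCAL-style protocol (cf. [2]): boxes are hypothetical (x1, y1, x2, y2) tuples, detections are matched greedily in score order, and AP is the non-interpolated average of precision at each true positive. The paper's actual evaluation may differ in details such as VOC 2007's 11-point interpolation or a strict ">" comparison; the function names and toy boxes are illustrative only.

```python
from typing import Sequence, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in continuous pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(dets: Sequence[Tuple[float, Box]],
                      gts: Sequence[Box], thresh: float) -> float:
    """Non-interpolated AP at a given IoU threshold.

    dets: (score, box) pairs; gts: ground-truth boxes.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = 0
    precision_sum = 0.0
    ranked = sorted(dets, key=lambda d: d[0], reverse=True)
    for rank, (_, box) in enumerate(ranked, start=1):
        # Find the ground-truth box with the largest overlap.
        overlaps = [iou(box, g) for g in gts]
        j = max(range(len(gts)), key=lambda k: overlaps[k])
        # True positive only if the overlap clears the threshold and that
        # ground truth was not already claimed by a higher-scoring detection.
        if overlaps[j] >= thresh and not matched[j]:
            matched[j] = True
            tp += 1
            precision_sum += tp / rank
    # Missed ground truths contribute zero precision.
    return precision_sum / len(gts)


gts = [(0, 0, 100, 100), (200, 200, 300, 300)]
dets = [(0.9, (0, 0, 100, 70)),       # loosely localized: IoU = 0.7
        (0.8, (200, 200, 300, 300))]  # exact match: IoU = 1.0

for t in (0.5, 0.8):
    print(t, average_precision(dets, gts, t))
```

With these toy boxes the first detection overlaps its ground truth with IoU 0.7, so it counts as a true positive at 0.5 but a false positive at 0.8, and AP drops from 1.0 to 0.25.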
novel views with a simple texture mapping.
5. Conclusion
We have introduced a novel problem, estimating the fine
pose of objects in images using exact 3D models, together
with a model for it that combines traditional and recently
developed techniques. We also provide a new dataset of
images finely aligned with exactly matched 3D models, as
well as a set of 3D models of widely used objects. We
believe our approach can be extended to more generic
object classes and can enable the community to pursue
more ambitious goals, such as accurate 3D contextual
modeling and full 3D room parsing.
Acknowledgements: This work is funded by ONR
MURI N000141010933. We also thank Phillip Isola and
Aditya Khosla for important suggestions and discussion.
References
[1] N. Dalal and B. Triggs. Histograms of oriented gradients for human
detection. In CVPR, 2005. 2
[2] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and
A. Zisserman. The PASCAL Visual Object Classes Challenge 2007
(VOC2007) Results. 6
[3] P. Felzenszwalb and D. Huttenlocher. Efficient graph-based image
segmentation. IJCV, 59(2):167–181, 2004. 3
[4] P. F. Felzenszwalb, R. B. Girshick, and D. McAllester. Discriminatively
trained deformable part models, release 4. 6, 8
[5] S. Fidler, S. J. Dickinson, and R. Urtasun. 3D object detection and
viewpoint estimation with a deformable 3D cuboid model. In NIPS,
2012. 2
[6] M. Fisher and P. Hanrahan. Context-based search for 3D models.
ACM Trans. Graph., 29(6), Dec. 2010. 2
[7] S. Gupta, P. Arbelaez, and J. Malik. Perceptual organization and
recognition of indoor scenes from RGB-D images. In CVPR, 2013.
2
[8] B. Hariharan, J. Malik, and D. Ramanan. Discriminative decorrelation
for clustering and classification. In ECCV, 2012. 3, 6, 8
[9] M. Hejrati and D. Ramanan. Analyzing 3D objects in cluttered images.
In NIPS, 2012. 2
[10] P. Henry, M. Krainin, E. Herbst, X. Ren, and D. Fox. RGB-D mapping:
Using depth cameras for dense 3D modeling of indoor environments.
In RGB-D: Advanced Reasoning with Depth Cameras Workshop in
conjunction with RSS, 2010. 2
[11] J. J. Lim, C. L. Zitnick, and P. Dollár. Sketch tokens: A learned
mid-level representation for contour and object detection. In CVPR,
2013. 4
[12] D. Lowe. Fitting parameterized three-dimensional models to images.
PAMI, 1991. 1, 3
[13] D. G. Lowe. Distinctive image features from scale-invariant keypoints.
IJCV, 2004. 2
[14] J. L. Mundy. Object recognition in the geometric era: A retrospective.
In Toward Category-Level Object Recognition, volume 4170 of
Lecture Notes in Computer Science, pages 3–29. Springer, 2006. 1
[15] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale
and rotation invariant texture classification with local binary patterns.
PAMI, 2002. 4
[16] B. Pepik, P. Gehler, M. Stark, and B. Schiele. 3D2PM - 3D deformable
part models. In ECCV, 2012. 2
[17] F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce. 3D object modeling
and recognition using local affine-invariant image descriptors
and multi-view spatial constraints. IJCV, 66, 2006. 1
[18] S. Satkin, J. Lin, and M. Hebert. Data-driven scene understanding
from 3D models. In BMVC, 2012. 4
[19] Y. Xiang and S. Savarese. Estimating the aspect layout of object
categories. In CVPR, 2012. 2
[20] J. Xiao, B. Russell, and A. Torralba. Localizing 3D cuboids in single-view
images. In NIPS, 2012. 2, 4
[21] Y. Zhao and S.-C. Zhu. Image parsing via stochastic scene grammar.
In NIPS, 2011. 2