Related Publication:
[PDF] [BibTex] [Oral Presentation: 30 min - Best Paper Finalist] [Conference Website]
Project Page: http://vision.lems.brown.edu/project_desc/Object-Recognition-in-Probabilistic-3D-Scenes
People Involved:
Maria Isabel Restrepo, Brandon Asher Mayer, Joseph L. Mundy
Project Details:
This section is still under construction...
The Probabilistic Volume Model
Pollard and Mundy (2007) proposed a probabilistic volume model that can represent the ambiguity and uncertainty in 3-d models derived from multiple image views. In Pollard's model, a region of three-dimensional space is decomposed into a regular 3-d grid of cells, called voxels. A voxel stores two kinds of state information: (i) the probability that the voxel contains a surface element and (ii) a mixture of Gaussians that models the surface appearance of the voxel as learned from a sequence of images. The surface probability is updated by incremental Bayesian learning, where the probability of a voxel containing a surface element after N+1 images increases if the Gaussian mixture at that voxel explains the intensity observed in image N+1 better than any other voxel along the projection ray.

In a fixed-grid voxel representation, most of the voxels may correspond to empty areas of a scene, making storage of large, high-resolution scenes prohibitively expensive. Crispell (2010) proposed a continuously varying probabilistic scene model that generalizes the discrete model proposed by Pollard and Mundy. Crispell's model allows non-uniform sampling of the volume, leading to an octree representation that is more space-efficient and can handle the finer resolution required near 3-d surfaces. More recently, Miller et al. (2010) developed a GPU implementation of Crispell's model. Training times decrease by several orders of magnitude, making it feasible to train the large number of objects required for multi-class object recognition tasks. The following figure summarizes the probabilistic volume model.
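To make the incremental Bayesian update described above more concrete, here is a minimal sketch for a single projection ray. It is an illustration under stated assumptions, not the authors' implementation: the function names (mixture_density, update_ray) are hypothetical, appearance is reduced to a 1-D intensity, and details such as a background term and the update of the appearance mixtures themselves are left out.

```python
import math


def mixture_density(x, components):
    """Density of intensity x under a 1-D Gaussian mixture.

    components: list of (weight, mean, std) tuples whose weights sum to 1.
    """
    total = 0.0
    for weight, mean, std in components:
        z = (x - mean) / std
        total += weight * math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return total


def update_ray(surface_probs, likelihoods):
    """One Bayesian update of the surface probabilities along one ray.

    surface_probs: P(voxel contains a surface), ordered from the camera outward.
    likelihoods:   density of the observed intensity under each voxel's mixture.
    Returns the list of updated surface probabilities.
    """
    vis = []  # probability that voxel i is not occluded by voxels in front
    pre = []  # evidence contributed by the voxels in front of voxel i
    visible, explained = 1.0, 0.0
    for p_surf, like in zip(surface_probs, likelihoods):
        vis.append(visible)
        pre.append(explained)
        explained += p_surf * visible * like
        visible *= 1.0 - p_surf

    evidence = explained  # marginal density of the observation along the ray
    if evidence <= 0.0:
        return list(surface_probs)  # nothing explains the pixel; keep the prior

    # Posterior = prior * p(observation | voxel is a surface) / p(observation)
    return [
        p_surf * (pre_i + vis_i * like) / evidence
        for p_surf, like, pre_i, vis_i in zip(surface_probs, likelihoods, pre, vis)
    ]


# Three voxels with equal priors; the middle one explains the pixel best.
priors = [0.2, 0.2, 0.2]
updated = update_ray(priors, [0.1, 2.0, 0.1])
```

In this toy ray the middle voxel's surface probability rises well above its prior, while the voxel in front of it, which explains the observed intensity poorly, drops below its prior.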
Object Categorization:
The local information in the probabilistic scenes is used to build representations of objects as bags of volumetric words. Local neighborhoods are described using principal component analysis or a Taylor series approximation of the surface and appearance attributes. K-means type clustering is used to form a common vocabulary across categories. Finally, feature descriptors are assigned to the most similar vocabulary entry and quantized to learn distributions for the different object classes. A Bayesian classifier is used during the testing phase to assign to each object the most probable class label. The workflow just described and the classification results are presented below.
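The quantization and classification steps can be sketched as follows. This is a minimal illustration, not the pipeline used in the paper: the vocabulary is assumed to have been computed already (for example by k-means), the exact form of the Bayesian classifier is not given here, so a multinomial model with Laplace smoothing is assumed, and all function names and class labels are hypothetical.

```python
import math
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean)."""
    return min(
        range(len(vocabulary)),
        key=lambda k: sum((d - v) ** 2 for d, v in zip(descriptor, vocabulary[k])),
    )


def quantize(descriptors, vocabulary):
    """Histogram of vocabulary-word counts for one object."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(k, 0) for k in range(len(vocabulary))]


def learn_class_models(training_objects, vocabulary, alpha=1.0):
    """Learn per-class word distributions from (label, descriptors) pairs.

    alpha is an assumed Laplace smoothing constant.
    """
    word_counts = {}
    objects_per_class = Counter()
    for label, descriptors in training_objects:
        hist = quantize(descriptors, vocabulary)
        acc = word_counts.setdefault(label, [0] * len(vocabulary))
        for k, count in enumerate(hist):
            acc[k] += count
        objects_per_class[label] += 1

    n_objects = sum(objects_per_class.values())
    models = {}
    for label, acc in word_counts.items():
        denom = sum(acc) + alpha * len(vocabulary)
        models[label] = {
            "log_prior": math.log(objects_per_class[label] / n_objects),
            "log_word": [math.log((count + alpha) / denom) for count in acc],
        }
    return models


def classify(descriptors, vocabulary, models):
    """Return the class label with the highest posterior for one object."""
    hist = quantize(descriptors, vocabulary)

    def log_posterior(label):
        model = models[label]
        return model["log_prior"] + sum(
            count * lw for count, lw in zip(hist, model["log_word"])
        )

    return max(models, key=log_posterior)


# Toy 2-D descriptors and a two-word vocabulary (all values are made up).
vocabulary = [[0.0, 0.0], [1.0, 1.0]]
training = [
    ("building", [[0.1, 0.0], [0.0, 0.1], [0.9, 1.0]]),
    ("car", [[1.0, 1.0], [0.9, 0.9], [0.1, 0.1]]),
]
models = learn_class_models(training, vocabulary)
label = classify([[0.0, 0.0], [0.1, 0.1]], vocabulary, models)
```

Working in log space avoids numerical underflow when an object contributes many descriptors, and the smoothing constant keeps a word that was never seen for a class from forcing that class's posterior to zero.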


The Data Collection:
The data used in these experiments were collected from a helicopter flying over the city of Providence, RI, USA, and its surroundings, and are made publicly available below. If the data are used in any subsequent publications, please cite: Restrepo, M., Mayer, B., and Mundy, J. Object Recognition in Probabilistic 3D Volumetric Scenes. To appear in International Conference on Pattern Recognition Applications and Methods (ICPRAM) 2012.
An approximate resolution of 30 cm/pixel was obtained in the imagery and translated to 30 cm/voxel in the models. The probabilistic volume models were learned using the GPU implementation by Miller et al. (2010). The data used to train the probabilistic scenes in this work are made available below. The camera matrices were estimated using Bundler and are given in local coordinate systems.
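As a hint for working with the cameras, the sketch below projects a 3-d point in a site's local coordinate system into an image. It assumes each camera is available as a 3x4 projection matrix; the file layout of the distributed matrices is not described here, so loading is left out, and the function name and the numbers in the example are made up.

```python
def project(camera, point):
    """Project a 3-d point with a 3x4 camera matrix; returns (u, v) in pixels."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    # Multiply the 3x4 matrix by the homogeneous point, one row at a time.
    u, v, w = (sum(row[k] * homogeneous[k] for k in range(4)) for row in camera)
    if w == 0.0:
        raise ValueError("point projects to infinity")
    return u / w, v / w


# Hypothetical camera: focal length 1000 px, principal point (640, 360),
# no rotation or translation.
camera = [
    [1000.0, 0.0, 640.0, 0.0],
    [0.0, 1000.0, 360.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
pixel = project(camera, (1.0, 2.0, 10.0))
```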
Sites used for object categorization tasks
Links to video frames and corresponding camera matrices:

Object bounding boxes:
Manually labeled bounding boxes were used during the experiments to learn and categorize objects in the probabilistic volume model (PVM). Here we make available the bounding boxes for all objects used in our experiments. Each box is given as an independent .ply file containing its min and max corners.
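A bounding-box file could be read along the following lines. This is only a sketch: it assumes an ASCII PLY file whose vertex elements start with x, y, z coordinates and hold the two corners, which may not match the exact layout of the distributed files, and the function names are hypothetical.

```python
def read_box_corners(ply_text):
    """Parse an ASCII PLY string and return (min_corner, max_corner)."""
    lines = ply_text.strip().splitlines()
    if not lines or lines[0].strip() != "ply":
        raise ValueError("not a PLY file")

    n_vertices = 0
    body_start = None
    for i, line in enumerate(lines):
        tokens = line.split()
        if tokens[:1] == ["format"] and tokens[1:2] != ["ascii"]:
            raise ValueError("only ASCII PLY is handled in this sketch")
        if tokens[:2] == ["element", "vertex"]:
            n_vertices = int(tokens[2])
        elif tokens == ["end_header"]:
            body_start = i + 1
            break
    if body_start is None or n_vertices < 2:
        raise ValueError("expected a header and at least two vertices")

    # Assumes the vertex element is the first element after the header.
    points = [
        tuple(float(value) for value in lines[j].split()[:3])
        for j in range(body_start, body_start + n_vertices)
    ]
    # Take per-axis extremes so the corner order in the file does not matter.
    min_corner = tuple(min(p[axis] for p in points) for axis in range(3))
    max_corner = tuple(max(p[axis] for p in points) for axis in range(3))
    return min_corner, max_corner


def contains(box, point):
    """True if the point lies inside the axis-aligned box (bounds inclusive)."""
    min_corner, max_corner = box
    return all(lo <= c <= hi for lo, c, hi in zip(min_corner, point, max_corner))


# Made-up example file with two corner vertices.
example = """ply
format ascii 1.0
element vertex 2
property float x
property float y
property float z
end_header
10.0 20.0 0.0
25.0 32.0 9.0
"""
box = read_box_corners(example)
inside = contains(box, (12.0, 30.0, 4.5))
```

Once a box is loaded, a test like contains can be used to select the voxels or feature locations that belong to a labeled object.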




