Object Recognition in Probabilistic 3D Scenes




Description

A semantic description of 3-d scenes is essential to many urban and surveillance applications. In computer vision, object localization and class recognition are traditionally performed on 2-d images. In contrast, this project aims to reason about the state of the 3-d world. More specifically, it uses probabilistic volumetric models of scene geometry and appearance to perform object categorization directly in 3-d. The methods and results presented here were accepted as a full paper (30 min. oral presentation) at the International Conference on Pattern Recognition Applications and Methods (ICPRAM) 2012.


Related Publication:

Restrepo, M.I., Mayer, B.A., and Mundy, J.L. Object Recognition in Probabilistic 3D Volumetric Scenes. International Conference on Pattern Recognition Applications and Methods (ICPRAM) 2012.

 

[PDF] [BibTex] [Oral Presentation: 30 min - Best Paper Finalist] [Conference Website]

Project Page: http://vision.lems.brown.edu/project_desc/Object-Recognition-in-Probabilistic-3D-Scenes

 

People Involved:

Maria Isabel Restrepo, Brandon Asher Mayer, Joseph L. Mundy

Project Details:

This section is still under construction...

The Probabilistic Volume Model

Pollard and Mundy (2007) proposed a probabilistic volume model that can represent the ambiguity and uncertainty in 3-d models derived from multiple image views. In Pollard's model, a region of three-dimensional space is decomposed into a regular 3-d grid of cells, called voxels. A voxel stores two kinds of state information: (i) the probability that the voxel contains a surface element and (ii) a mixture of Gaussians that models the surface appearance of the voxel as learned from a sequence of images. The surface probability is updated by incremental Bayesian learning: the probability that a voxel contains a surface element after N+1 images increases if the Gaussian mixture at that voxel explains the intensity observed in image N+1 better than any other voxel along the projection ray.

In a fixed-grid voxel representation, most of the voxels may correspond to empty areas of a scene, making storage of large, high-resolution scenes prohibitively expensive. Crispell (2010) proposed a continuously varying probabilistic scene model that generalizes the discrete model proposed by Pollard and Mundy. Crispell's model allows non-uniform sampling of the volume, leading to an octree representation that is more space-efficient and can handle the finer resolution required near 3-d surfaces. More recently, Miller et al. (2010) developed a GPU implementation of Crispell's model. Training times decrease by several orders of magnitude, making it feasible to train the large number of objects required for multi-class object recognition tasks. The following figure summarizes the probabilistic volume model.

 

voxel world
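
The incremental Bayesian update of the surface probabilities can be sketched for a single projection ray as follows. This is a simplified illustration, not the authors' implementation: the function name `update_ray`, the per-voxel likelihood inputs (standing in for the mixture-of-Gaussians appearance models), and the `lik_background` term for rays that leave the grid without meeting a surface are all assumptions made for this example.

```python
import numpy as np

def update_ray(p_surf, lik, lik_background=1.0):
    """Return updated surface probabilities for the voxels along one ray.

    p_surf: prior surface probability of each voxel, ordered from the
            camera outward.
    lik:    likelihood of the observed pixel intensity under each voxel's
            appearance model (hypothetical input, e.g. a mixture density).
    lik_background: assumed likelihood of the intensity when no voxel on
            the ray is a surface (1.0 = uniform density on [0, 1]).
    """
    p_surf = np.asarray(p_surf, dtype=float)
    lik = np.asarray(lik, dtype=float)

    # vis[i]: probability that no voxel in front of voxel i is a surface.
    cum_empty = np.cumprod(1.0 - p_surf)
    vis = np.concatenate(([1.0], cum_empty[:-1]))
    vis_inf = cum_empty[-1]  # probability that the ray hits nothing

    # omega[i]: probability that voxel i is the first surface on the ray.
    omega = p_surf * vis
    contrib = omega * lik

    # pre[i]: intensity likelihood explained by voxels in front of voxel i.
    pre = np.concatenate(([0.0], np.cumsum(contrib)[:-1]))

    # p(I): marginal likelihood of the observation over the whole ray.
    marginal = contrib.sum() + vis_inf * lik_background
    if marginal <= 0.0:
        return p_surf.copy()

    # p(I | voxel i is a surface) = pre[i] + vis[i] * lik[i]
    conditional = pre + vis * lik
    return p_surf * conditional / marginal
```

With equal priors, a voxel whose appearance model explains the observed intensity much better than its neighbours on the ray gains probability, while the others lose it; when every voxel explains the intensity equally well as the background, the probabilities stay unchanged.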

Object Categorization:

The local information in the probabilistic scenes is used to build representations of objects as bags of volumetric words. Local neighborhoods are described using principal component analysis or a Taylor series approximation of the surface and appearance attributes. K-means type clustering is used to form a common vocabulary across categories. Finally, feature descriptors are assigned to the most similar vocabulary entry and quantized to learn distributions for the different object classes. A Bayesian classifier is used during the testing phase to assign to each object the most probable class label. The workflow just described and the classification results are presented below.
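
The workflow above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the method used in the paper: the descriptors are taken as given arrays (the PCA or Taylor features are not computed here), the class models are word histograms with Laplace smoothing, the classifier treats words as independent draws (a multinomial naive Bayes), and all function names are hypothetical.

```python
import numpy as np

def quantize(descriptors, centers):
    """Index of the nearest vocabulary entry for each descriptor."""
    diff = descriptors[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def kmeans(descriptors, k, iters=20, seed=0):
    """Plain Lloyd iterations; returns k cluster centers (the vocabulary)."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[picks].astype(float)
    for _ in range(iters):
        labels = quantize(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def learn_class_models(objects, labels, centers, n_classes, alpha=1.0):
    """Per-class word distributions with Laplace smoothing (alpha assumed)."""
    k = len(centers)
    counts = np.full((n_classes, k), alpha)
    for descriptors, c in zip(objects, labels):
        counts[c] += np.bincount(quantize(descriptors, centers), minlength=k)
    return counts / counts.sum(axis=1, keepdims=True)

def classify(descriptors, centers, word_probs, priors=None):
    """Most probable class label for one object (bag of descriptors)."""
    hist = np.bincount(quantize(descriptors, centers), minlength=len(centers))
    log_post = hist @ np.log(word_probs).T
    if priors is not None:
        log_post = log_post + np.log(priors)
    return int(log_post.argmax())
```

In use, `kmeans` would be run once on descriptors pooled from all training objects, `learn_class_models` on the labeled training objects, and `classify` on each test object.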

The Data Collection:

The data used in these experiments were collected from a helicopter flying over the city of Providence, RI, USA, and its surroundings, and are made publicly available below. If the data are used in any subsequent publications, please cite: Restrepo, M.I., Mayer, B.A., and Mundy, J.L. Object Recognition in Probabilistic 3D Volumetric Scenes. To appear in International Conference on Pattern Recognition Applications and Methods (ICPRAM) 2012.

 

 

An approximate resolution of 30 cm/pixel was obtained in the imagery and translated to 30 cm/voxel in the models. The probabilistic volume models were learned using the GPU implementation by Miller et al. (2010). The data used to train the probabilistic scenes in this work are made available below. The camera matrices were estimated using Bundler and are given in local coordinate systems.

Sites used for object categorization tasks

 

 

Links to video frames and corresponding camera matrices:


Additional Sites not used in the paper. More coming soon.

Site13:

 

Object bounding boxes:

Manually labeled bounding boxes were used during the experiments to learn and categorize objects in the PVM. Here we make available the bounding boxes for all objects used in our experiments. The boxes are given by independent .ply files containing min and max corners.
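
As a rough sketch of how such a box might be read, the snippet below extracts the min and max corners from PLY text and tests whether a point lies inside. The exact layout of the distributed files is not described here, so this assumes an ASCII PLY whose first element is the vertex list with x, y, z as the first three properties; the function names are hypothetical.

```python
def parse_bbox_ply(text):
    """Return (min_corner, max_corner) from the vertices of an ASCII PLY.

    Assumes the vertex element comes first in the body and that x, y, z
    are its first three properties (an assumption about the file layout).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "ply":
        raise ValueError("not a PLY file")

    n_vertices = 0
    header_end = None
    for i, line in enumerate(lines):
        tokens = line.split()
        if tokens[0] == "format" and tokens[1:2] != ["ascii"]:
            raise ValueError("only ASCII PLY is handled in this sketch")
        if tokens[:2] == ["element", "vertex"]:
            n_vertices = int(tokens[2])
        if tokens[0] == "end_header":
            header_end = i
            break
    if header_end is None:
        raise ValueError("missing end_header")

    body = lines[header_end + 1 : header_end + 1 + n_vertices]
    points = [tuple(float(v) for v in ln.split()[:3]) for ln in body]
    if not points:
        raise ValueError("no vertices found")

    # Taking per-axis extremes works whether the file lists only the two
    # corners or all eight box vertices.
    min_corner = tuple(min(p[a] for p in points) for a in range(3))
    max_corner = tuple(max(p[a] for p in points) for a in range(3))
    return min_corner, max_corner

def contains(min_corner, max_corner, point):
    """True if the point lies inside the axis-aligned box (inclusive)."""
    return all(lo <= c <= hi
               for lo, c, hi in zip(min_corner, point, max_corner))
```

Voxels whose centers satisfy `contains` for a given box would then be the ones that contribute descriptors for that object.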