Object Recognition in Probabilistic 3D Scenes


A semantic description of 3-d scenes is essential to many urban and surveillance applications. The general problems of object localization and class recognition in Computer Vision are traditionally performed in 2D images. In contrast, this project aims to reason about the state of the 3-d world. More specifically, this project uses probabilistic volumetric models of a scene geometry and appearance to perform object categorization tasks directly in 3-d. The methods and results presented here were fisrt accepted as a full paper (30 min. oral presentation) at the International Conference of Pattern Recognition Application and Methods, ICPRAM 20112. An more recent and comprenhensive evaluation has been accepted for publication at the IEEE Journal of Selected Topics in Signal Processing

Go to (within this page):

Related Publications: [back to menu]



People Involved: [back to menu]

Project Details: [back to menu]

The Probabilistic Volume

odel Pollard and Mundy (2007) proposed a probabilistic volume model that can represent the ambiguity and uncertainty in 3-d models derived from multiple image views. In Pollard's model, a region of three-dimensional space is decomposed into a regular 3-d grid of cells, called voxels. A voxel stores two kinds of state information: (i) the probability that the voxel contains a surface element and (ii) a mixture of Gaussians that models the surface appearance of the voxel as learned from a sequence of images. The surface probability is updated by incremental Bayesian learning , where the probability of a voxel containing a surface element after N+1 images increases if the Gaussian mixture at that voxel explains the intensity observed in the N+1 image better than any other voxelalong the projection ray. In a fixed-grid voxel representation, most of the voxels may correspond to empty areas of a scene, making storage of large, high-resolution scenes prohibitively expensive. Crispell (2010) proposed a continuously varying probabilistic scene model that generalizes the discrete model proposed by Pollard and Mundy. Crispell's model allows non-uniform sampling of the volume leading to an octree representation that is more space-efficient and can handle finer resolution required near 3-d surfaces. More recently a GPU implementation of Crispell's model has been implemented by Miller et al. (2010). Training times decrease by several orders of magnitudes making it feasible to train large number of objects requiered for multi-class object recognition tasks. The following figure sumarizas the probabilisti volume model.

voxel world

Object Categorization

The local information in the probabilistic scenes is used to build representations of objects as bags of volumetric words. Local neighborhoods are described using principal component analysis or Taylor series approximation of the surface and appearance attributes. K-means type clustering is used to form a common vocabulary accross categories. Finally, features descriptors are assigned to the most similar vocabulary entry and quantized to learn distributions for different object classes. A Bayesian classifier is used during the testing phase to assign to each object the most probable class label. The workflow just described and the classification results are presented below.