Model-based object recognition or, more generally, scene interpretation may be conceptualized as a two-part process: one that generates a sequence of hypotheses on object identities and poses, the other that evaluates them based on the object models. Viewed as an optimization problem, the former is concerned with the search sequence, the latter with the objective function. Often, the evaluation of the objective function is computationally expensive. A reasonable search algorithm must then arrive at an acceptable hypothesis within a small number of such evaluations.
In this work, we analyze the search for a scene interpretation from a probabilistic perspective. Object models are formulated as generative models for range data as obtained from a stereo-vision system, a laser scanner, or a similar sensor. For visual analysis of natural scenes, that is, scenes cluttered with multiple, partially visible objects in an uncontrolled context, optimizing the match of a generative model to the data is a highly non-trivial task. Local optimization techniques will usually get stuck in meaningless local optima, whereas techniques akin to exhaustive search (6 DoF per rigid object model) are precluded by time and/or memory constraints. The critical aspect of many object recognition problems hence concerns the generation of a clever search sequence.
We use a three-camera system (Triclops, Point Grey Research Inc.) to perform stereo processing with a horizontal and a vertical stereo pair (baseline about 10 cm). Each image has a resolution of 160x120 pixels. The stereo algorithm employed is a straightforward local correspondence search by minimizing the sum of absolute differences over square patches of the Laplacian-of-Gaussian filtered images.
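The local correspondence search described above can be illustrated with a minimal sketch. It assumes a rectified horizontal image pair that has already been Laplacian-of-Gaussian filtered and is stored as nested lists of numbers. The function names, the patch half-width `half`, and the disparity bound `max_disp` are hypothetical choices for illustration, not details of the system described here.

```python
def sad(left, right, row, col, disp, half):
    """Sum of absolute differences between the square patch of `left`
    centred at (row, col) and the patch of `right` shifted left by `disp`."""
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(left[row + dr][col + dc]
                         - right[row + dr][col + dc - disp])
    return total


def best_disparity(left, right, row, col, max_disp, half=1):
    """Return the disparity in [0, max_disp] with minimal SAD cost.

    Assumes (row, col) lies at least `half` pixels inside the left image.
    Shifts whose patch would leave the right image are skipped.
    """
    best, best_cost = None, float("inf")
    for disp in range(max_disp + 1):
        if col - half - disp < 0:
            break  # shifted patch would fall outside the right image
        cost = sad(left, right, row, col, disp, half)
        if cost < best_cost:
            best, best_cost = disp, cost
    return best


# Synthetic example: the left image is the right image shifted by 2 columns.
right = [[(c * c + 3 * r) % 17 for c in range(10)] for r in range(5)]
left = [[right[r][c - 2] if c >= 2 else 0 for c in range(10)] for r in range(5)]
disparity = best_disparity(left, right, row=2, col=5, max_disp=3)
```

A purely local search of this kind is cheap but sensitive to noise and repetitive texture, which is consistent with the poor data quality noted for the example scene.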
A novel statistical criterion, the truncated probability of object parameters, is introduced to infer an optimal sequence of object hypotheses to be evaluated for their match to the stereo data. The truncated probability is partly determined from prior knowledge of the objects and partly learned from data. The probabilistic perspective revisits classic concepts from object recognition (grouping, indexing, geometric hashing, alignment) and adds insight into the optimal ordering of object hypotheses for evaluation.
As a key component of the truncated probability, we model local surface shape by point-relation densities. That is, for characteristic shapes, the system learns the statistics of coordinate-independent measures of geometric relations between multiple stereo-data points. The shapes we have chosen are convex and concave corners. They function as features for selecting object hypotheses.
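To make the notion of a coordinate-independent measure concrete, the following sketch computes the sorted pairwise distances of three 3D points and shows that they do not change under a rigid motion. Pairwise distances are only one possible choice, used here as an assumption for illustration. The measures actually learned by the system are not specified in this summary, and `triple_invariants` and `rigid_move` are hypothetical names.

```python
import math


def triple_invariants(p, q, r):
    """Sorted pairwise Euclidean distances of three 3D points.

    The result depends only on the shape of the triangle (p, q, r),
    not on the coordinate frame in which the points are expressed.
    """
    return tuple(sorted((math.dist(p, q), math.dist(q, r), math.dist(r, p))))


def rigid_move(point, angle, shift):
    """Rotate `point` about the z-axis by `angle` (radians), then translate."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


triangle = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
moved = [rigid_move(p, 0.7, (3.0, -1.0, 2.0)) for p in triangle]

before = triple_invariants(*triangle)
after = triple_invariants(*moved)
```

Statistics of such measures, collected over many point triples sampled near corners, could then be summarized as a density, for instance by a histogram.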
Each hypothesis on an object identity and pose is defined by establishing a correspondence between triples of model and data points. Essentially, object hypotheses are ranked for evaluation according to the probability of their underlying correspondence. This probability, in turn, depends on the posterior probability of the data points being corner points of matching type and on the similarity of their geometric relation.
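A correspondence between a model triple and a data triple fixes all six pose parameters of a rigid object. The sketch below shows one standard way to recover the pose: attach an orthonormal frame to each triple and map one frame onto the other. It assumes noise-free, non-collinear, congruent triples. The helper names are hypothetical, and the construction is not necessarily the one used in the system described here.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def unit(u):
    norm = math.sqrt(sum(a * a for a in u))
    return tuple(a / norm for a in u)


def frame(a, b, c):
    """Right-handed orthonormal frame attached to a non-collinear triple."""
    e1 = unit(sub(b, a))
    e3 = unit(cross(e1, sub(c, a)))
    e2 = cross(e3, e1)
    return (e1, e2, e3)


def pose_from_triples(model, data):
    """Rotation matrix and translation mapping the model triple onto the data triple."""
    fm, fd = frame(*model), frame(*data)
    # rot maps each model axis fm[k] onto the matching data axis fd[k].
    rot = [[sum(fd[k][i] * fm[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
    rotated_origin = tuple(sum(rot[i][j] * model[0][j] for j in range(3))
                           for i in range(3))
    trans = sub(data[0], rotated_origin)
    return rot, trans


def transform(rot, trans, p):
    """Apply the rigid motion (rot, trans) to point p."""
    return tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i]
                 for i in range(3))


# Data triple = model triple rotated 90 degrees about z and shifted by (1, 2, 3).
model = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0))
data = ((1.0, 2.0, 3.0), (1.0, 3.0, 3.0), (-1.0, 2.0, 3.0))
rot, trans = pose_from_triples(model, data)
mapped = transform(rot, trans, (1.0, 1.0, 1.0))
```

With noisy stereo data the two triples are never exactly congruent, so a pose obtained this way is only a starting hypothesis that still has to be evaluated against the data.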
Evaluation starts with the most probable correspondence, proceeds with less probable correspondences, and stops when the probability drops below a certain threshold or a hypothesis is evaluated as being good enough. Hypotheses are evaluated with a stored generative model of the stereo-data points that arise from the hypothesized object.
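The evaluation order and the two stopping rules can be summarized in a short sketch. It assumes that each hypothesis already carries the probability of its underlying correspondence and that `evaluate` returns a match score where larger is better. The names `ranked_search`, `min_probability`, and `good_enough` are hypothetical.

```python
def ranked_search(hypotheses, evaluate, min_probability, good_enough):
    """Evaluate hypotheses in order of decreasing correspondence probability.

    hypotheses: iterable of (probability, hypothesis) pairs.
    evaluate:   callable returning a match score (assumed: higher is better).
    Stops when the probability drops below `min_probability` or when a
    score reaches `good_enough`. Returns (best hypothesis, best score,
    number of evaluations performed).
    """
    ranked = sorted(hypotheses, key=lambda item: item[0], reverse=True)
    best, best_score, n_evaluations = None, float("-inf"), 0
    for probability, hypothesis in ranked:
        if probability < min_probability:
            break  # all remaining correspondences are too improbable
        score = evaluate(hypothesis)
        n_evaluations += 1
        if score > best_score:
            best, best_score = hypothesis, score
        if score >= good_enough:
            break  # accept this hypothesis and stop searching
    return best, best_score, n_evaluations


# Toy example with made-up probabilities and scores.
candidates = [(0.2, "a"), (0.9, "b"), (0.5, "c"), (0.01, "d")]
scores = {"a": 0.95, "b": 0.3, "c": 0.8, "d": 1.0}
result = ranked_search(candidates, scores.get, min_probability=0.1, good_enough=0.75)
```

In the toy example only two of the four hypotheses are evaluated, which reflects the aim of reaching an acceptable hypothesis with few expensive evaluations.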
When all data points are ranked by their posterior probability of being a corner, almost 70% of the true corner points fall within the top 5% of the ranking.
The figure below shows an example scene with two shapes we used for experiments. The smaller shape is stacked on the larger as sketched at the top right of the figure. Displayed are one of the camera images from the stereo-vision system and two orthogonal views of the associated 3D-point data; note the poor quality of the data (low resolution, noise, outliers, artifacts). The shape recognized first in the sequence of hypotheses is outlined in the stereo data.
Ulrich Hillenbrand and Gerd Hirzinger. Probabilistic search for object segmentation and recognition. In Proceedings of the European Conference on Computer Vision (ECCV 2002), Lecture Notes in Computer Science, Vol. 2352, Springer, pp. 791-806.