Ordinary cameras capture images by projecting light from a three dimensional scene onto a two dimensional plane (i.e. the image plane). In this process, one dimension is lost: the depth, i.e. the distance between the camera and the objects in the scene. The depth can be recovered from two (or more) images that are taken from different (known) viewpoints. This is called Stereo Vision. Once depth has been recovered, a three dimensional model of the scene can be created directly (i.e. reconstruction). This model can be textured with the camera images.
A program for automatic reconstruction has to:
- Find points that correspond in both images (i.e. matching of points).
- Perform triangulation for all correspondences using the known internal and external geometry of the cameras (calibration) for calculating the corresponding three dimensional points.
The former step is generally a time consuming and error prone search operation within the images, which is an important research topic. The latter one is an exact geometric calculation.
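The triangulation step can be illustrated for the simplest camera geometry. The following is a minimal sketch, not the implementation used in this research. It assumes a rectified stereo pair (parallel optical axes, identical cameras), so that the calibration reduces to a focal length in pixels, a baseline and a principal point. The function name `triangulate_rectified` and all parameter names are hypothetical, and the disparity is assumed to be defined as the column in the left image minus the column in the right image.

```python
from typing import Tuple


def triangulate_rectified(x_left: float, y: float, disparity: float,
                          focal_px: float, baseline: float,
                          cx: float, cy: float) -> Tuple[float, float, float]:
    """Return the 3D point (X, Y, Z) in the left camera frame.

    Assumes a rectified stereo pair: disparity = x_left - x_right,
    focal_px is the focal length in pixels, baseline is the distance
    between the two camera centres, (cx, cy) is the principal point.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    z = focal_px * baseline / disparity   # depth from similar triangles
    x = (x_left - cx) * z / focal_px      # back-project the image column
    y3 = (y - cy) * z / focal_px          # back-project the image row
    return (x, y3, z)


# Illustrative values only: 800 px focal length, 0.1 m baseline.
point = triangulate_rectified(x_left=400.0, y=300.0, disparity=20.0,
                              focal_px=800.0, baseline=0.1,
                              cx=320.0, cy=240.0)
print(point)
```

Because depth is inversely proportional to disparity, a matching error of one pixel causes a larger depth error for distant points than for near ones, which is one reason why matching accuracy matters.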
Accurate and Efficient Stereo Matching
For real-time robotics applications, fast stereo vision systems are required that can process images at least several times a second, or possibly even at video frame rate. Additionally, there is a demand for accurate matching results. An apparently different application is the offline processing of huge images (e.g. 1000 MPixel) from an airborne linear pushbroom camera for reconstruction purposes. However, due to the enormous amount of image data, this application requires efficient stereo matching just as real-time robotics applications do. Furthermore, accuracy of stereo matching is crucial. Thus, both applications require very similar solutions.
This research is based on previous investigations and improvements of correlation based real-time stereo methods (Hirschmüller et al., 2002; Hirschmüller, 2003) for supporting tele-operated mobile robot applications. The main objective of this research is to increase stereo matching accuracy, especially at sharp object boundaries, which are generally a problem for all correlation based stereo methods.
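To make the term concrete, window-based correlation can be sketched on a single image row. This is only an illustration under assumptions, not the method from the cited papers: it uses the sum of absolute differences as the correlation measure, assumes rectified images so that pixel x in the left row corresponds to pixel x - d in the right row, and the names `window_cost` and `sad_disparity` are hypothetical.

```python
def window_cost(left, right, x, d, half_window):
    """Sum of absolute differences between a window around left[x]
    and the window shifted by disparity d in the right row."""
    total = 0
    for k in range(-half_window, half_window + 1):
        xl = x + k
        xr = xl - d
        # Windows that leave the image cannot be compared.
        if not (0 <= xl < len(left) and 0 <= xr < len(right)):
            return float("inf")
        total += abs(left[xl] - right[xr])
    return total


def sad_disparity(left, right, x, half_window, max_disp):
    """Return the disparity in 0..max_disp with the lowest window cost."""
    costs = [window_cost(left, right, x, d, half_window)
             for d in range(max_disp + 1)]
    return min(range(len(costs)), key=costs.__getitem__)


# The left row is the right row shifted by two pixels (from index 2 on).
left = [1, 2, 10, 20, 30, 40, 50, 60]
right = [10, 20, 30, 40, 50, 60, 70, 80]
disparity = sad_disparity(left, right, x=4, half_window=1, max_disp=3)
print(disparity)
```

All pixels inside a window are implicitly assumed to share one disparity. Near an object boundary the window covers both foreground and background, so this assumption fails, which is the source of the boundary errors described above.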
A new method has been devised that does not use correlation windows, but performs pixel-wise matching to avoid wrong matches near object boundaries. Local, pixel-wise matching is controlled by a global energy function, which models the smoothness of surfaces as well as sharp object boundaries. Stereo methods that are based on global energy functions are well known. However, implementations are commonly very slow, due to the combinatorial complexity that arises from minimizing the global energy function. The new method combines techniques of local methods (e.g. stereo correlation) with those of global methods (e.g. global energy minimization). The result is a fast minimization of a semi-global energy function of pixel-wise matches. The complexity of the new method is the same as for stereo correlation, i.e. O(w*h*d), where w and h are the image width and height and d is the size of the disparity range. It is therefore expected that speed optimized implementations of this method can be used in future real-time or near real-time applications.
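The idea of minimizing a smoothness-constrained energy over pixel-wise costs can be sketched along a single image row. The text above does not state the recurrence, so the one below follows the path-wise cost aggregation as it is commonly described for the cited CVPR 2005 paper, and should be read as an assumption rather than as the actual implementation. The penalties `p1` (disparity change of one) and `p2` (larger jumps), the function names, and the example costs are illustrative. The pixel-wise costs are taken as given here, and only one left-to-right path is shown instead of a sum over several path directions.

```python
from typing import List


def aggregate_path(cost: List[List[int]], p1: int, p2: int) -> List[List[int]]:
    """Aggregate pixel-wise costs along one row, left to right.

    cost[x][d] is the matching cost of pixel x at disparity d.
    p1 penalises disparity changes of one, p2 penalises larger jumps.
    """
    n_disp = len(cost[0])
    agg = [list(cost[0])]                 # first pixel has no predecessor
    for x in range(1, len(cost)):
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = prev[d]                            # same disparity
            if d > 0:
                best = min(best, prev[d - 1] + p1)    # change by one
            if d + 1 < n_disp:
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)           # arbitrary jump
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(cost[x][d] + best - prev_min)
        agg.append(row)
    return agg


def winner_takes_all(costs: List[List[int]]) -> List[int]:
    """Pick the disparity with the lowest cost for every pixel."""
    return [min(range(len(row)), key=row.__getitem__) for row in costs]


# Four pixels, three disparities; the third pixel has a noisy cost.
cost = [[5, 0, 5], [5, 0, 5], [3, 2, 1], [5, 0, 5]]
raw = winner_takes_all(cost)
smooth = winner_takes_all(aggregate_path(cost, p1=2, p2=6))
print(raw, smooth)
```

Each pixel and disparity is visited once per path with a constant amount of work, so for a fixed number of paths the total effort grows with w*h*d, in line with the complexity stated above.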
First results are given below. The left image is a small part of larger imagery from an airborne linear pushbroom camera (HRSC). Depth images are seen in the middle and on the right. The lighter the gray tone, the higher the pixel is over the ground. The image in the middle shows results of the currently used hierarchical, correlation based stereo algorithm. It can easily be seen that the sharp borders of houses are blurred. The right image shows results of the new method, which are much sharper at object boundaries.
Current and future work includes optimizing the implementation for speed and testing the method on a wide variety of applications.
Target applications are stereo vision problems in which efficiency and accuracy of stereo matching are both important. These include real-time robotics and modeling problems as well as the reconstruction of cities or interesting buildings from huge high resolution images of airborne cameras. Concrete applications include:
- Fast stereo matching for the three dimensional scanning of models, etc., using the Multisensory 3D Modeller (3DMo).
- 3D reconstruction of urban terrain from huge high resolution images of an airborne linear pushbroom camera, like HRSC.
This research is based on previous investigations and improvements of correlation based real-time stereo matching methods (Hirschmüller et al., 2002). Fast and accurate stereo matching was required for the concurrent real-time creation of maps and immediate virtual walkthrough to support tele-operated mobile robot applications (Hirschmüller, 2003). This project was done at the Centre for Computational Intelligence at De Montfort University, Leicester, UK.
The image below shows an overview map (middle) and an immediate virtual walkthrough (right), which were created in real-time purely from images of an arbitrarily moving stereo camera (left) without the use of other sensors. All operations are performed at around 6-8 frames per second on a single 2 GHz PC.
Heiko Hirschmüller (2005), "Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information", IEEE CVPR, San Diego, USA, June 2005.
Heiko Hirschmüller (2003), "Stereo Vision Based Mapping and Immediate Virtual Walkthroughs", Ph.D. Thesis, De Montfort University, Leicester, UK, June 2003.
Heiko Hirschmüller, Peter R. Innocent and Jon Garibaldi (2002), "Real-Time Correlation-Based Stereo Vision with Reduced Border Errors", International Journal of Computer Vision, Volume 47 (1/2/3), April-June 2002, pp. 229-246.