Stereo Vision

Ordinary cameras capture images by projecting light from a three-dimensional scene onto a two-dimensional plane (i.e. the image plane). One dimension is lost in this process: the depth, i.e. the distance between the camera and objects in the scene. The depth can be recovered from two (or more) images that are taken from different, known viewpoints. This is called Stereo Vision. Once depth has been recovered, a three-dimensional model of the scene can be created directly (i.e. reconstruction). This model can be textured with the camera images.

A program for automatic reconstruction has to:

  • Find points that correspond in both images (i.e. matching of points).
  • Perform triangulation for all correspondences using the known internal and external geometry of the cameras (calibration) for calculating the corresponding three dimensional points.

The former step is generally a time-consuming and error-prone search operation within the images and remains an important research topic. The latter step is an exact geometric calculation.
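The triangulation step can be illustrated for the simplest case. The following is a minimal sketch, assuming an ideal rectified pair of identical pinhole cameras with parallel optical axes, so that depth follows from Z = f·B/d. The function name and all parameter names (focal_px, baseline_m, cx, cy) are hypothetical and chosen for illustration; a general calibrated setup would use the full internal and external camera geometry instead.

```python
def triangulate_rectified(x_left, y, disparity, focal_px, baseline_m, cx, cy):
    """Return (X, Y, Z) in the left camera frame for one correspondence.

    Assumes an ideal rectified pinhole stereo pair (an assumption of this
    sketch): disparity = x_left - x_right in pixels, focal length in pixels,
    baseline in metres, principal point (cx, cy) in pixels.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    z = focal_px * baseline_m / disparity   # depth along the optical axis
    x = (x_left - cx) * z / focal_px        # back-project the pixel column
    y3 = (y - cy) * z / focal_px            # back-project the pixel row
    return (x, y3, z)


# Example values are made up for illustration.
point = triangulate_rectified(x_left=400.0, y=300.0, disparity=20.0,
                              focal_px=800.0, baseline_m=0.1,
                              cx=320.0, cy=240.0)
```

Because depth is inversely proportional to disparity, a fixed matching error of one pixel causes a larger depth error for distant points than for near ones.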

Accurate and Efficient Stereo Matching

Figure: original image (top), disparity from cross-correlation (middle), disparity using SGM (bottom).

For real-time robotics applications, fast stereo vision systems are required that can process images at least several times per second, or ideally at video frame rate. Matching results must also be accurate. A seemingly different application is the offline processing of huge images (e.g. 1000 MPixel) from an airborne linear pushbroom camera for reconstruction purposes. However, due to the enormous amount of image data, this application requires efficient stereo matching just as real-time robotics does, and matching accuracy is equally crucial. Thus, both applications require very similar solutions.

On the one hand, an improved correlation-based real-time stereo method has been implemented (Hirschmüller et al., 2002; Hirschmüller, 2003). The main objective of this research is to increase stereo matching accuracy, especially at sharp object boundaries, which are generally a problem for all correlation-based stereo methods.
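For orientation, window-based matching in its most basic form can be sketched as follows. This is a generic illustration, not the improved method of the cited publications: it uses the sum of absolute differences (SAD) as a stand-in correlation measure and assumes rectified grayscale images stored as lists of rows. All names are hypothetical.

```python
def sad_disparity(left, right, max_disp, radius):
    """Winner-takes-all disparity using SAD over a square window.

    left, right: rectified grayscale images as lists of rows of ints.
    Pixels without a full window inside the image keep disparity 0.
    """
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best_cost, best_d = None, 0
            # Limit d so that the shifted window stays inside the right image.
            for d in range(0, min(max_disp, x - radius) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        cost += abs(left[y + dy][x + dx]
                                    - right[y + dy][x + dx - d])
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y][x] = best_d
    return disp


# Demo with synthetic data: the left image is the right image shifted by 2.
width, height, shift = 8, 5, 2
right_img = [[10 * x + y for x in range(width)] for y in range(height)]
left_img = [[right_img[y][x - shift] if x >= shift else 0
             for x in range(width)] for y in range(height)]
disparity = sad_disparity(left_img, right_img, max_disp=3, radius=1)
```

A window implicitly assumes that all pixels inside it share one disparity. At an object boundary the window covers both foreground and background, which is where the errors at sharp boundaries come from.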

On the other hand, a new method has been devised that does not use correlation windows but performs pixel-wise matching to avoid wrong matches near object boundaries. Local, pixel-wise matching is controlled by a global energy function, which models the smoothness of surfaces as well as sharp object boundaries. Stereo methods that are based on global energy functions are well known. However, implementations are commonly very slow due to the combinatorial complexity that arises from minimizing the global energy function. The new method combines techniques of local methods (e.g. stereo correlation) with those of global methods (e.g. global energy minimization). The result is a fast minimization of a semi-global energy function of pixel-wise matches. The complexity of the new method is the same as for stereo correlation, i.e. O(w·h·d), where w and h are the image width and height and d is the disparity range. It is therefore expected that speed-optimized implementations of this method can be used in future real-time or near real-time applications.
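To make the idea of such an energy function concrete, the sketch below evaluates one common form: a pixel-wise data term plus a small penalty for disparity changes of one level (slanted or curved surfaces) and a larger penalty for bigger jumps (object boundaries). The exact form, the penalty names p1 and p2, and the choice to count each pair of 4-connected neighbours once are assumptions of this sketch, not details stated on this page.

```python
def disparity_energy(cost, disp, p1, p2):
    """Energy of a disparity image: data term plus smoothness penalties.

    cost[y][x][d] is the pixel-wise matching cost and disp[y][x] the chosen
    disparity. p1 penalises disparity changes of exactly 1, p2 larger jumps.
    Each pair of 4-connected neighbours is counted once (sketch assumption).
    """
    h, w = len(disp), len(disp[0])
    energy = 0
    for y in range(h):
        for x in range(w):
            energy += cost[y][x][disp[y][x]]            # data term
            for ny, nx in ((y, x + 1), (y + 1, x)):     # right and lower neighbour
                if ny < h and nx < w:
                    jump = abs(disp[y][x] - disp[ny][nx])
                    if jump == 1:
                        energy += p1
                    elif jump > 1:
                        energy += p2
    return energy


# One image row, four disparity levels, uniform matching cost of 1.
demo_cost = [[[1] * 4 for _ in range(3)]]
demo_disp = [[0, 1, 3]]
demo_energy = disparity_energy(demo_cost, demo_disp, p1=5, p2=20)
```

Finding the disparity image that minimizes such an energy over the whole image is the expensive part. The semi-global approach approximates it by minimizing along one-dimensional paths.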

The figure above shows example results. The top image is a small part of larger imagery from an airborne linear pushbroom camera (HRSC). The depth images are shown in the middle and at the bottom: the lighter the gray tone, the higher the point is above the ground. The middle image shows results of the currently used hierarchical, correlation-based stereo algorithm, in which the sharp borders of houses are clearly blurred. The bottom image shows results of the new method, which are much sharper at object boundaries.

Semi-Global Matching (SGM)

Semi-Global Matching (SGM) is a dense stereo matching method that can be used for accurate 3D reconstruction from a pair of calibrated images. SGM tries to find correspondences for every pixel. This is supported by a global cost function, which is optimized along 8 path directions across the image. The method has a regular algorithmic structure and uses simple operations: in the core loop, integer values are compared and added. This enables parallel implementations on graphics cards and FPGAs for real-time applications. Although SGM is no longer the winning method in international benchmarks that evaluate only quality (see the Middlebury and KITTI benchmarks), it offers a very good mixture of speed, quality and robustness, which is the reason for its success in practice.
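The core loop described above can be sketched for a single image row. This is a simplified illustration under stated assumptions: it aggregates along only the two horizontal path directions instead of eight, and it uses made-up integer costs. The penalty names p1 and p2 and the function names are hypothetical choices for this sketch.

```python
def aggregate_path(costs, p1, p2):
    """Aggregate pixel-wise costs along one 1D path.

    costs[i][d] is the integer matching cost of the i-th pixel on the path
    at disparity d. Only comparisons and additions are needed.
    """
    n_disp = len(costs[0])
    aggregated = [list(costs[0])]
    for i in range(1, len(costs)):
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(n_disp):
            best = min(prev[d], prev_min + p2)       # same disparity or big jump
            if d > 0:
                best = min(best, prev[d - 1] + p1)   # change by one level
            if d + 1 < n_disp:
                best = min(best, prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(costs[i][d] + best - prev_min)
        aggregated.append(row)
    return aggregated


def scanline_disparities(costs, p1, p2):
    """Sum both horizontal path directions, then pick the minimum per pixel."""
    forward = aggregate_path(costs, p1, p2)
    backward = aggregate_path(costs[::-1], p1, p2)[::-1]
    result = []
    for f_row, b_row in zip(forward, backward):
        total = [f + b for f, b in zip(f_row, b_row)]
        result.append(total.index(min(total)))
    return result


# Five pixels, three disparity levels; the middle pixel has a misleading cost.
row_costs = [[9, 0, 9], [9, 0, 9], [4, 5, 9], [9, 0, 9], [9, 0, 9]]
pixelwise = [c.index(min(c)) for c in row_costs]
smoothed = scanline_disparities(row_costs, p1=3, p2=10)
```

In this example the purely pixel-wise choice picks disparity 0 for the middle pixel, while the path aggregation keeps it consistent with its neighbours at disparity 1. Each path is independent of the others, which is what makes the structure suitable for parallel hardware.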

SGM is used in photogrammetry. The development of many commercial photogrammetric software packages has been strongly influenced by its introduction. This was honored with the Carl-Pulfrich Award 2011.

SGM is also used in driver assistance systems. The 6D vision system by Daimler researchers uses a real-time implementation of SGM. Since summer 2013, the 6D vision system has been a foundation of several commercially available driver assistance systems in production cars.

At the RMC, SGM is used for environment modeling from satellite, aerial and multicopter images, for the workspace analysis of robots like ROMO, and for 3D modeling. A real-time FPGA implementation is used for autonomous navigation of flying, walking and crawling robots as well as for planetary rovers. Reconstruction from airborne pushbroom camera images is carried out by a specialized FPGA implementation. Currently, we are working on a more flexible and modern implementation for our mobile robots.

Selected References

Korbinian Schmid, Teodor Tomic, Felix Ruess, Heiko Hirschmüller and Michael Suppa (2013), Stereo Vision based indoor/outdoor Navigation for Flying Robots, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2013, Tokyo, Japan, Robocup Best Paper Award.

Annett Stelzer, Heiko Hirschmüller and Martin Görner (2012), Stereo-Vision-Based Navigation of a Six-Legged Walking Robot in Unknown Rough Terrain, International Journal of Robotics Research, Special Issue on Robot Vision, Volume 31, Issue 4, pp. 381-402.

Heiko Hirschmüller and Daniel Scharstein (2009), Evaluation of Stereo Matching Costs on Images with Radiometric Differences, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 31(9), September 2009, pp. 1582-1599.

Heiko Hirschmüller (2008), Stereo Processing by Semi-Global Matching and Mutual Information, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 30(2), February 2008, pp. 328-341.

Heiko Hirschmüller (2005), Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information, IEEE CVPR, San Diego, USA, June 2005.

Heiko Hirschmüller (2003), Stereo Vision Based Mapping and Immediate Virtual Walkthroughs, Ph.D. Thesis, De Montfort University, Leicester, UK, June 2003.

Heiko Hirschmüller, Peter R. Innocent and Jon Garibaldi (2002), Real-Time Correlation-Based Stereo Vision with Reduced Border Errors, International Journal of Computer Vision, Volume 47 (1/2/3), April-June 2002, pp. 229-246.
