Ordinary cameras capture images by projecting light from a three-dimensional scene onto a two-dimensional plane (i.e. the image plane). One dimension is lost in this process: the depth, i.e. the distance between the camera and the objects in the scene. The depth can be recovered from two (or more) images that are taken from different, known viewpoints. This is called stereo vision. Recovering the depth directly permits the creation of a three-dimensional model of the scene (i.e. reconstruction). This model can be textured with the camera images.
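The loss of depth can be illustrated with a minimal pinhole projection sketch. The function name `project` and the focal length of 500 pixels are assumptions chosen for illustration only; lens distortion and the principal point offset are ignored.

```python
def project(point, focal_length=500.0):
    """Project a 3D point (X, Y, Z), given in camera coordinates, onto the image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division: the depth Z is divided out and cannot be read back.
    return (focal_length * x / z, focal_length * y / z)


near = (1.0, 0.5, 2.0)
far = (2.0, 1.0, 4.0)  # on the same viewing ray, twice as far away

print(project(near))
print(project(far))
```

Both points land on the same image position, so a single image cannot tell them apart. A second image from a different viewpoint separates them again.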
A program for automatic reconstruction has to:
1. find corresponding points in the images (stereo matching), and
2. calculate the depth from the corresponding points.
The former step is generally a time-consuming and error-prone search operation within the images and is an important research topic. The latter is an exact geometric calculation.
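For a rectified stereo pair, the geometric calculation reduces to the standard relation Z = f·B/d between depth Z, focal length f (in pixels), baseline B and disparity d. The sketch below assumes such a rectified setup; the function names, the principal point (cx, cy) and the example values are hypothetical and not taken from any particular system.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth along the optical axis for a rectified stereo pair: Z = f * B / d."""
    if disparity_px < 0:
        raise ValueError("disparity must not be negative")
    if disparity_px == 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_length_px * baseline_m / disparity_px


def reconstruct_point(u, v, disparity_px, focal_length_px, baseline_m, cx, cy):
    """Back-project pixel (u, v) of the left image into 3D camera coordinates."""
    z = depth_from_disparity(disparity_px, focal_length_px, baseline_m)
    x = (u - cx) * z / focal_length_px
    y = (v - cy) * z / focal_length_px
    return (x, y, z)


# Assumed example: f = 500 px, baseline 10 cm, principal point (320, 240).
print(reconstruct_point(420, 240, 20, 500.0, 0.1, cx=320, cy=240))
```

Because depth is inversely proportional to disparity, a fixed matching error of one pixel causes a larger depth error for distant objects than for near ones.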
Disparity from cross-correlation
Disparity using SGM
Real-time robotics applications require fast stereo vision systems that can process images at least several times a second, or possibly even at video frame rate. Additionally, there is a demand for accurate matching results. An apparently different application is the offline processing of huge images (e.g. 1000 MPixel) from an airborne linear pushbroom camera for reconstruction purposes. However, due to the enormous amount of image data, this application requires efficient stereo matching just like real-time robotics applications. Furthermore, the accuracy of stereo matching is crucial. Thus, both applications require very similar solutions.
On the one hand, an improved correlation-based real-time stereo method has been implemented (Hirschmüller et al., 2002; Hirschmüller, 2003). The main objective of this research is to increase the accuracy of stereo matching, especially at sharp object boundaries. These boundaries are generally a problem for all correlation-based stereo methods.
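To make the window-based approach concrete, here is a minimal block matching sketch. It uses the sum of absolute differences (SAD) as a simple stand-in for a correlation measure and is not the improved method cited above. The function name, the window radius and the convention that a left pixel at column x matches the right pixel at column x - d are assumptions for illustration.

```python
def sad_block_matching(left, right, max_disp, radius=1):
    """Winner-takes-all block matching on two rectified grayscale images.

    left, right: lists of rows with integer intensities, same size.
    Returns a disparity image; border pixels without a full window stay 0.
    """
    h, w = len(left), len(left[0])
    disparity = [[0] * w for _ in range(h)]
    for y in range(radius, h - radius):
        for x in range(radius, w - radius):
            best_cost, best_d = None, 0
            # Only test disparities whose window stays inside the right image.
            for d in range(0, min(max_disp, x - radius) + 1):
                cost = 0
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        cost += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y][x] = best_d
    return disparity
```

Every pixel in a window is implicitly assumed to share one disparity. At an object boundary the window covers both foreground and background, which is why window-based methods tend to blur depth discontinuities.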
On the other hand, a new method has been devised that does not use correlation windows but performs pixel-wise matching to avoid wrong matches near object boundaries. Local, pixel-wise matching is controlled by a global energy function, which models the smoothness of surfaces as well as sharp object boundaries. Stereo methods that are based on global energy functions are well known. However, implementations are commonly very slow, due to the combinatorial complexity that arises from minimizing the global energy function. The new method combines techniques of local methods (e.g. stereo correlation) with those of global methods (e.g. global energy minimization). The result is a fast minimization of a semi-global energy function of pixel-wise matches. The complexity of the new method is the same as for stereo correlation, i.e. O(w·h·d) for an image of width w and height h with a disparity range d. Thus, it is assumed that speed-optimized implementations of this method can be used in future real-time or near real-time applications.
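The idea of minimizing along one-dimensional paths can be sketched with the path-wise recurrence described in Hirschmüller (2008): each pixel's matching cost is combined with the cheapest predecessor on the path, where a disparity change of one level is penalized by a small constant P1 and larger jumps by a larger constant P2. The function name and the penalty values below are illustrative assumptions, not the actual implementation.

```python
def aggregate_path(costs, p1=1, p2=4):
    """Aggregate pixel-wise matching costs along a single path.

    costs[i][d] is the matching cost of the i-th pixel on the path at
    disparity d. Returns the aggregated costs in the same layout.
    """
    num_disp = len(costs[0])
    aggregated = [list(costs[0])]  # the first pixel has no predecessor
    for cost in costs[1:]:
        prev = aggregated[-1]
        prev_min = min(prev)
        row = []
        for d in range(num_disp):
            # Same disparity is free, +-1 costs p1, any larger jump costs p2.
            best = min(prev[d], prev_min + p2)
            if d > 0:
                best = min(best, prev[d - 1] + p1)
            if d + 1 < num_disp:
                best = min(best, prev[d + 1] + p1)
            # Subtracting prev_min keeps the values bounded along the path.
            row.append(cost[d] + best - prev_min)
        aggregated.append(row)
    return aggregated


# The third pixel is ambiguous on its own (all costs equal).
print(aggregate_path([[0, 5, 5], [5, 0, 5], [4, 4, 4]]))
# prints [[0, 5, 5], [5, 1, 9], [5, 4, 5]]
```

After aggregation the ambiguous third pixel prefers disparity 1, the disparity of its predecessor. Each pixel is processed once per disparity with a constant amount of work, which matches the O(w·h·d) complexity for a fixed number of paths.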
You can see results on the right-hand side. The image at the top is a small part of larger imagery from an airborne linear pushbroom camera (HRSC). Depth images are shown in the middle and at the bottom. The lighter the gray tone, the higher the pixel is above the ground. The image in the middle shows results of the currently used hierarchical, correlation-based stereo algorithm. It can easily be seen that the sharp borders of houses are blurred. The image at the bottom shows results of the new method, which are much sharper at object boundaries.
Semi-Global Matching (SGM) is a dense stereo matching method that can be used for accurate 3D reconstruction from a pair of calibrated images. SGM tries to find correspondences for every pixel. This is supported by a global cost function, which is optimized along 8 path directions across the image. The method has a regular algorithmic structure and uses simple operations. In the core loop, integer values are compared and added. This enables parallel implementations on graphics cards and FPGAs for real-time applications. Although SGM is no longer the top-ranked method in international benchmarks that evaluate only quality (see the Middlebury and KITTI benchmarks), it offers a very good mixture of speed, quality and robustness, which is the reason for its success in practice.
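A minimal, unoptimized sketch of the 8-direction optimization is given below. It assumes a precomputed integer cost volume `cost[y][x][d]`, applies the path-wise recurrence from Hirschmüller (2008) in each direction, sums the results and picks the disparity with the lowest sum per pixel. The names, the penalties `p1` and `p2` and the tiny example are assumptions for illustration; this is not the real-time implementation mentioned in the text.

```python
# The 8 path directions as (dy, dx) steps from predecessor to current pixel.
DIRECTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def aggregate_direction(cost, dy, dx, p1, p2):
    """Aggregate the cost volume along one path direction."""
    h, w, nd = len(cost), len(cost[0]), len(cost[0][0])
    agg = [[None] * w for _ in range(h)]
    # Traverse so that the predecessor (y - dy, x - dx) is always done first.
    ys = range(h) if dy >= 0 else range(h - 1, -1, -1)
    xs = range(w) if dx >= 0 else range(w - 1, -1, -1)
    for y in ys:
        for x in xs:
            py, px = y - dy, x - dx
            if 0 <= py < h and 0 <= px < w:
                prev = agg[py][px]
                prev_min = min(prev)
                row = []
                for d in range(nd):
                    # Core loop: only integer comparisons and additions.
                    best = min(prev[d], prev_min + p2)
                    if d > 0:
                        best = min(best, prev[d - 1] + p1)
                    if d + 1 < nd:
                        best = min(best, prev[d + 1] + p1)
                    row.append(cost[y][x][d] + best - prev_min)
            else:
                row = list(cost[y][x])  # path starts at the image border
            agg[y][x] = row
    return agg


def sgm_disparity(cost, p1=1, p2=4):
    """Sum the aggregated costs of all 8 directions and select the minimum."""
    h, w, nd = len(cost), len(cost[0]), len(cost[0][0])
    total = [[[0] * nd for _ in range(w)] for _ in range(h)]
    for dy, dx in DIRECTIONS:
        agg = aggregate_direction(cost, dy, dx, p1, p2)
        for y in range(h):
            for x in range(w):
                for d in range(nd):
                    total[y][x][d] += agg[y][x][d]
    return [[min(range(nd), key=total[y][x].__getitem__) for x in range(w)]
            for y in range(h)]


# 3x3 example with two disparity levels: all pixels prefer d = 1,
# except the centre pixel, whose own costs are ambiguous.
volume = [[[3, 0] for _ in range(3)] for _ in range(3)]
volume[1][1] = [2, 2]
print(sgm_disparity(volume))
```

The directions are independent of each other, and within one direction all paths can be processed independently, which is what makes the structure suitable for parallel hardware.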
SGM is used in photogrammetry. The development of many commercial photogrammetric software packages has been strongly influenced by its introduction. This was honored with the Carl-Pulfrich Award 2011.
SGM is also used in driver assistance systems. The 6D vision system by Daimler researchers uses a real-time implementation of SGM. Since summer 2013, the 6D vision system has been a foundation of several commercially available driver assistance systems in production cars.
At the RMC, SGM is used for environment modeling from satellite, aerial and multicopter images, for the workspace analysis of robots like ROMO, and for 3D modeling. A real-time FPGA implementation is used for the autonomous navigation of flying, walking and crawling robots as well as planetary rovers. The reconstruction from airborne pushbroom camera images is carried out by a specialized FPGA implementation. Currently we are working on a more flexible and modern implementation for our mobile robots.
Korbinian Schmid, Teodor Tomic, Felix Ruess, Heiko Hirschmüller and Michael Suppa (2013), Stereo Vision based indoor/outdoor Navigation for Flying Robots, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2013, Tokyo, Japan, Robocup Best Paper Award.
Annett Stelzer, Heiko Hirschmüller and Martin Görner (2012), Stereo-Vision-Based Navigation of a Six-Legged Walking Robot in Unknown Rough Terrain, in the International Journal of Robotics Research, Special Issue on Robot Vision, Volume 31, Issue 4, pp. 381-402.
Heiko Hirschmüller and Daniel Scharstein (2009), Evaluation of Stereo Matching Costs on Images with Radiometric Differences, in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 31(9), September 2009, pp. 1582-1599.
Heiko Hirschmüller (2008), Stereo Processing by Semi-Global Matching and Mutual Information, in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 30(2), February 2008, pp. 328-341.
Heiko Hirschmüller (2005), Accurate and Efficient Stereo Processing by Semi-Global Matching and Mutual Information, IEEE CVPR, San Diego, USA, June 2005.
Heiko Hirschmüller (2003), Stereo Vision Based Mapping and Immediate Virtual Walkthroughs, Ph.D. Thesis, De Montfort University, Leicester, UK, June 2003.
Heiko Hirschmüller, Peter R. Innocent and Jon Garibaldi (2002), Real-Time Correlation-Based Stereo Vision with Reduced Border Errors, International Journal of Computer Vision, Volume 47 (1/2/3), April-June 2002, pp. 229-246.
The cvkit software package supports the easy exchange and analysis of computer vision data, including images in floating point format and 3D data like point clouds or triangle meshes. The toolkit targets researchers as well as interested groups and individuals. It is especially useful for working with the Middlebury stereo and multi-view data.
sv is a simple / scientific image viewer that can display monochrome and color images with 8 and 16 bit integer as well as 32 bit float values per color channel. Functions include showing monochrome images with color encoding, defining radiometric ranges, zooming and automatically reloading images (Linux only). For image comparison, settings like zoom, radiometric range, etc. can be kept while switching between images. Depth images (full or parts) with associated camera parameter files can be visualized on-the-fly in 3D. sv natively supports the pgm, ppm and pfm image formats as well as tiff with 8 and 16 bit integer and 32 bit float values. Tiff, jpg, png, gif and many other raster data formats are supported through optional libraries like GDAL.
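As an illustration of the pfm format mentioned above, the following sketch writes and reads a single-channel float image, e.g. a disparity or depth image. It follows the common PFM layout: a text header with the magic string `Pf`, the width and height, and a scale factor whose sign encodes the byte order, followed by raw 32 bit floats with the bottom row first. The function names are hypothetical, and any additional conventions that cvkit applies (such as how invalid pixels are marked) are not covered here.

```python
import struct


def write_pfm(path, rows):
    """Write a single-channel float image (list of rows, top row first) as PFM."""
    height, width = len(rows), len(rows[0])
    with open(path, "wb") as f:
        # A negative scale factor marks little-endian sample data.
        f.write(f"Pf\n{width} {height}\n-1.0\n".encode("ascii"))
        for row in reversed(rows):  # PFM stores the bottom row first
            f.write(struct.pack(f"<{width}f", *row))


def read_pfm(path):
    """Read a single-channel PFM file and return its rows, top row first."""
    with open(path, "rb") as f:
        if f.readline().strip() != b"Pf":
            raise ValueError("only single-channel PFM ('Pf') is handled here")
        width, height = map(int, f.readline().split())
        scale = float(f.readline())
        endian = "<" if scale < 0 else ">"
        rows = [list(struct.unpack(f"{endian}{width}f", f.read(4 * width)))
                for _ in range(height)]
    rows.reverse()
    return rows
```

A file written this way, for example with `write_pfm("disp.pfm", [[0.5, 1.0], [2.0, 4.0]])`, should be readable by tools that handle standard PFM files.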
plyv is a simple but pretty fast viewer for colored point clouds and meshes with per-vertex coloring, shading and texture images. It also supports on-the-fly conversion and visualization of depth images and cameras. plyv is based on OpenGL and can cope with big data sets that consist of many millions of vertices and triangles. Mainly the ply format is supported, which was invented at Stanford University as an extendable format for storing vertices and polygons together with additional information. It is especially useful for scanned real-world data.
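For readers unfamiliar with ply, the sketch below writes a small colored point cloud as an ASCII ply file. The header lists the elements and their properties before the data starts. The function name and the choice of properties (float coordinates, 8 bit colors) are assumptions for illustration; real data sets often use the binary variant of the format and additional properties.

```python
def write_ply(path, points):
    """Write colored points, given as (x, y, z, red, green, blue), as ASCII ply."""
    points = list(points)
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    with open(path, "w", newline="\n") as f:
        f.write("\n".join(header) + "\n")
        for x, y, z, r, g, b in points:
            f.write(f"{x} {y} {z} {r} {g} {b}\n")


write_ply("cloud.ply", [(0.0, 0.0, 2.5, 255, 0, 0), (0.5, 0.0, 2.5, 0, 255, 0)])
```

The resulting file can be inspected in a text editor, and it should open in viewers that support the ply format.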
NOTE: The toolkit does not contain SGM or any other stereo method and we do not offer SGM as source code or binary.
Up-to-date versions are available on the Middlebury Stereo Vision Page.