Terrain and city modeling from aerial images
Problem
Aerial images can be acquired relatively quickly and with ground resolutions down to a few centimeters per pixel. Usually, the cameras are pre-calibrated, and the position and orientation at each imaging position are measured during the flight by differential GPS and IMU sensors.
For several applications, it is useful or required to reconstruct the scene and integrate all individual images into one consistent, large image.
Method
All images are typically bundle adjusted such that the geometry of all cameras is known with high precision (i.e. less than one pixel error). Aerial cameras include full frame cameras as well as pushbroom cameras. While the former can be modeled using a pinhole model, the latter need a more complex model (Hirschmueller et al., 2005). The most critical step is matching every pixel with its corresponding pixels in neighboring images, which is done with the Semi-Global Matching stereo algorithm.
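The pinhole model mentioned for full frame cameras can be sketched as follows. This is a minimal illustration, assuming a world-to-camera rotation `R`, a translation `t`, a focal length `f` in pixels and a principal point `(cx, cy)`. The function name and parameters are hypothetical and not taken from the original implementation; lens distortion and the pushbroom model are not covered.

```python
def project_pinhole(point, R, t, f, cx, cy):
    """Project a 3D world point to pixel coordinates with an ideal pinhole camera.

    R is a 3x3 world-to-camera rotation (list of rows), t a 3-vector,
    f the focal length in pixels and (cx, cy) the principal point.
    Returns None if the point is not in front of the camera.
    """
    # Transform the world point into the camera frame: X_cam = R * X + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None  # point is behind the camera or in the camera plane
    # Perspective division, then shift by the principal point
    u = f * xc[0] / xc[2] + cx
    v = f * xc[1] / xc[2] + cy
    return (u, v)


# Example: camera at the origin looking along +Z (assumed convention)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_pinhole((10.0, 20.0, 1000.0), IDENTITY, (0.0, 0.0, 0.0),
                        f=10000.0, cx=0.0, cy=0.0)
```

Bundle adjustment refines `R`, `t` and the intrinsic parameters of every image so that projections of the same 3D point agree to within a fraction of a pixel.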
The geometrical and correspondence information is used for reconstructing all points and mapping them into an equidistant grid. For aerial images, it is typically sufficient to store one height (i.e. the highest) in each grid cell, which results in a 2.5D model. Thus, the individual perspective projections are transformed and fused into an orthographic projection using a 2.5D model (Hirschmueller, 2008).
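Fusing reconstructed points into a 2.5D model can be sketched as below. This is a simplified illustration that assumes points are given as `(x, y, z)` tuples in a metric coordinate system; the function and parameter names are hypothetical. It keeps only the highest point per grid cell, as described above, and leaves cells without any point as `None`.

```python
import math


def rasterize_heights(points, x_min, y_min, cell_size, n_cols, n_rows):
    """Map 3D points into an equidistant grid, keeping the highest z per cell."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = math.floor((x - x_min) / cell_size)
        row = math.floor((y - y_min) / cell_size)
        # Ignore points that fall outside the grid extent
        if 0 <= col < n_cols and 0 <= row < n_rows:
            current = grid[row][col]
            if current is None or z > current:
                grid[row][col] = z
    return grid


# Example with a 2 x 2 grid of 0.2 m cells
points = [(0.05, 0.05, 10.0), (0.10, 0.10, 12.5), (0.25, 0.05, 3.0)]
heights = rasterize_heights(points, 0.0, 0.0, 0.2, n_cols=2, n_rows=2)
```

A real pipeline would additionally filter outliers and interpolate empty cells; those steps are omitted here.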
A true orthographic (single) image is created from all individual images by projecting the height of each grid cell back into all images and searching for the image that best "sees" this point. Geometric and radiometric heuristics are used for finding unoccluded projections and for combining their values. Similarly, tilted (i.e. side) textures for houses, etc. can be generated by using tilted ortho projections (Hirschmueller, 2008).
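The idea of choosing the image that best "sees" a grid cell can be illustrated with a deliberately simple geometric heuristic. The sketch below is not the heuristic of the cited work: it assumes a completely filled height grid, treats each camera as a single 3D position, tests occlusion by sampling the line of sight against the height grid, and prefers the visible camera with the smallest off-nadir angle. All names are hypothetical.

```python
import math


def cell_center(grid, cell_size, cell):
    """Return the 3D point (x, y, z) at the center of a grid cell (row, col)."""
    row, col = cell
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size, grid[row][col])


def is_visible(grid, cell_size, cell, camera, steps=200):
    """Return True if no grid height blocks the ray from the cell to the camera."""
    x0, y0, z0 = cell_center(grid, cell_size, cell)
    cx, cy, cz = camera
    for i in range(1, steps):
        s = i / steps
        x = x0 + s * (cx - x0)
        y = y0 + s * (cy - y0)
        z = z0 + s * (cz - z0)
        row = math.floor(y / cell_size)
        col = math.floor(x / cell_size)
        if (row, col) == tuple(cell):
            continue  # the cell cannot occlude itself
        inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
        if inside and grid[row][col] > z:
            return False  # terrain or a building rises above the ray
    return True


def best_view(grid, cell_size, cell, cameras):
    """Return the index of the unoccluded camera closest to nadir, or None."""
    x0, y0, z0 = cell_center(grid, cell_size, cell)
    best_index, best_angle = None, None
    for index, cam in enumerate(cameras):
        if not is_visible(grid, cell_size, cell, cam):
            continue
        # Angle between the viewing ray and the vertical direction
        angle = math.atan2(math.hypot(cam[0] - x0, cam[1] - y0), cam[2] - z0)
        if best_angle is None or angle < best_angle:
            best_index, best_angle = index, angle
    return best_index
```

In practice the color would be sampled by projecting the cell into the selected image, and values from several unoccluded images could be blended using radiometric criteria.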
For visualization, it is often necessary to transform the textured 2.5D model, which is represented as an equidistant grid of height and color values, into a textured mesh that can be rendered quickly on modern graphics hardware. The automatic process performs six steps: tiling with overlap, mesh generation, mesh simplification, mesh cutting, mesh merging and texture mapping. The result is saved as a group of simplified meshes with corresponding textures in VRML 2.0 format (Liu et al., 2007).
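The mesh generation step alone can be sketched as a regular triangulation of the height grid, with two triangles per grid square. This is only an illustration under the assumption of a completely filled grid; the function name is hypothetical, and the remaining steps of the pipeline (overlapping tiles, simplification, cutting, merging, texture mapping) are not shown.

```python
def grid_to_mesh(heights, cell_size):
    """Triangulate a height grid into (vertices, triangles).

    Vertices are (x, y, z) tuples; triangles are index triples that are
    counter-clockwise when viewed from above (+z).
    """
    n_rows, n_cols = len(heights), len(heights[0])
    vertices = [
        (col * cell_size, row * cell_size, heights[row][col])
        for row in range(n_rows)
        for col in range(n_cols)
    ]
    triangles = []
    for row in range(n_rows - 1):
        for col in range(n_cols - 1):
            i = row * n_cols + col          # lower-left corner of the square
            triangles.append((i, i + 1, i + n_cols))
            triangles.append((i + 1, i + n_cols + 1, i + n_cols))
    return vertices, triangles


# Example: a 3 x 4 grid of heights with 0.2 m spacing
heights = [[0.0, 0.1, 0.2, 0.3],
           [0.0, 0.5, 0.6, 0.3],
           [0.0, 0.1, 0.2, 0.3]]
vertices, triangles = grid_to_mesh(heights, 0.2)
```

Such a regular mesh has two triangles per grid square, which is why a subsequent simplification step is needed before large areas can be rendered interactively.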
Results
The images below show a textured reconstruction from aerial images with 7 cm/pixel ground resolution. All steps after bundle adjustment are fully automatic (i.e. do not require any manual interaction).


Mesh generation of a terrain of size 24480 x 11570 pixels resulted in 3,623,330 triangles and was done in about 26.5 minutes on a 2.4 GHz PC with 2 GB RAM. An example is shown in the image below.

Automatic terrain modeling is implemented on a cluster for automatically processing huge amounts of image data.
We processed the data on a computer with an Intel Core2 Duo E6600 processor (2.4 GHz) and 2048 MB RAM. The runtimes for processing various models are shown in the following table:
| Region | Area size n x m [pixels] | Resolution [m/pixel] | Output [triangles] | Runtime [min] |
|---|---|---|---|---|
| Kelheim | 24480 x 11570 | 0.20 | 3,623,330 | 26.5 | 
| Andechs | 6000 x 6000 | 0.15 | 463,333 | 3.6 | 
| Wieskirche | 6000 x 6000 | 0.20 | 465,557 | 3.7 | 
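As a rough plausibility check of the table, the output triangle counts can be compared with the size of an unsimplified regular mesh. The sketch below assumes that the area size in pixels equals the number of height grid samples and that a full triangulation uses two triangles per grid square; both are assumptions for illustration, not statements from the cited work.

```python
def full_grid_triangles(n, m):
    """Number of triangles in a regular mesh over an n x m grid of samples."""
    return 2 * (n - 1) * (m - 1)


def reduction_factor(n, m, output_triangles):
    """Ratio of unsimplified to simplified triangle count."""
    return full_grid_triangles(n, m) / output_triangles


# Values taken from the table: (n, m, output triangles)
regions = {
    "Kelheim": (24480, 11570, 3623330),
    "Andechs": (6000, 6000, 463333),
    "Wieskirche": (6000, 6000, 465557),
}

for name, (n, m, triangles) in regions.items():
    factor = reduction_factor(n, m, triangles)
    print(f"{name}: roughly {factor:.0f} times fewer triangles than a full grid mesh")
```

Under these assumptions, the simplified meshes contain over a hundred times fewer triangles than a full grid mesh would for all three regions.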
Another application is the reconstruction of Martian terrain from orbital images (Hirschmueller et al., 2006).
Publications
Heiko Hirschmüller (2008), Stereo Processing by Semi-Global Matching and Mutual Information, in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 30(2), February 2008, pp. 328-341.
R. Liu, D. Burschka, G. Hirzinger and B. Strackenbrock (2007), Real Time Fully Automatic 3D-Modeling of HRSC Landscape Data, in Proc. of Urban Remote Sensing Joint Event (6th URS/4th Urban), Paris, 2007.
Heiko Hirschmüller, Helmut Mayer, G. Neukum and the HRSC-CoI Team (2006), Stereo Processing of HRSC Mars Express Images by Semi-Global Matching, International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXVI, Part 4, 27-30 September 2006, Goa, India.
Heiko Hirschmüller, Frank Scholten, Gerd Hirzinger (2005), Stereo Vision Based Reconstruction of Huge Urban Areas from an Airborne Pushbroom Camera (HRSC), in Lecture Notes in Computer Science: Pattern Recognition, Proceedings of the 27th DAGM Symposium, 30 August - 2 September 2005, Vienna, Austria, Volume 3663, pp. 58-66.