How can ever-larger volumes of scientific data be processed and evaluated? And how can Earth observation data be meaningfully combined with ground measurements, thereby opening up new sources of information? In the cross-sectoral Big Data Platform project, researchers from the German Aerospace Center (Deutsches Zentrum für Luft- und Raumfahrt; DLR) are devising new methods for the future-oriented field of Big Data Science. The interdisciplinary research project involves 21 DLR institutes from the research fields of spaceflight, aeronautics, transport, energy, digitalisation and security – all working together. The project is set to run for four years and has received more than 21 million euros in funding.
Big Data Science as the key to digital transformation
The volume of scientifically usable data is growing enormously year upon year. Such data can only be systematically evaluated and utilised with the help of effective data analysis or intelligent networking. At the same time, dealing with large quantities of data also presents an enormous challenge. "Big Data Science enables a variety of new approaches to research and is the key to the digital transformation that is taking place across society, including the economy. The Big Data Platform cross-sectoral project, which is firmly anchored in our digitalisation strategy, sees DLR developing methods to derive socially relevant knowledge from raw data," says Pascale Ehrenfreund, Chair of the DLR Executive Board. "We already have a great deal of experience with Big Data Science in a number of areas, such as remote sensing, transport research and detailed computer simulations for use in aeronautics. The broad-based Big Data Platform project will allow us to optimally harness the synergies created through our different areas of research."
Merging different datasets
One important goal of the project is to develop methods for analysing large datasets. In addition, the researchers are working on data management techniques that make it possible to merge heterogeneous datasets. "DLR has long been working in this field of research and has considerable experience in interpreting large datasets," stresses Rolf Hempel, the Project Coordinator and Head of the Simulation and Software Technology Facility at DLR. "By linking different sets of data, such as satellite images and images of buildings posted on social media, we can derive new, previously unrecognised information. This allows us to distinguish between different types of urban areas much more effectively than was previously the case, for example, and this information can then be used for urban planning."
Data mining and machine learning
Another area of focus for the Big Data Platform cross-sectoral project is the exploration of analytical techniques that make use of data mining and machine learning. Data mining involves analysing data with the aim of ‘tracing’ information and discerning patterns. Machine learning, by contrast, aims not only to recognise patterns and repeated occurrences, but also to have the system further develop its own capabilities by processing training datasets. Self-training systems used in Earth observation research, for example, are able to interpret datasets more quickly and effectively. This means that buildings, roads and even types of vegetation can be detected with far greater accuracy on the basis of aerial and satellite images. In addition, machine-learning processes form an important building block in the creation of ‘autonomous driving’ and ‘smart mobility’ systems. One specific example of this type of application is the high-precision identification of roads and road markings, from which free parking spaces within a city can be identified through real-time analysis.
Another specific application being addressed by the Big Data Platform project is the real-time analysis of image data for rapid disaster management. Location data and other detailed information provide emergency services with vital support for their operations. Smart data analysis using machine-learning methods has also proven useful for climate computing, where large datasets are evaluated to obtain a better understanding of climate mechanisms.