High Performance Data Analytics
What and Why?
terrabyte is an innovative High Performance Data Analytics (HPDA) platform operated by the German Aerospace Center (DLR) and the Leibniz Supercomputing Centre (LRZ), an institution of the Bavarian Academy of Sciences and Humanities. The platform gives researchers efficient access to Earth Observation data, a powerful processing environment, and practical tools for data analysis. Earth scientists increasingly combine global, decadal observations from multiple international satellite missions with cutting-edge Artificial Intelligence (AI) methods to answer pressing questions about climate change. terrabyte provides these scientists with the tools they need so they can concentrate on finding the answers. Its focus is on generating global satellite data products such as the World Settlement Footprint (WSF) Suite, the Global SnowPack, the Global WaterPack, and the Tree Canopy Cover Loss.
The platform provides a comprehensive collection of decadal Earth Observation data with global coverage, directly accessible from the computing resources. All data are accessible via a SpatioTemporal Asset Catalog (STAC) Application Programming Interface (API). In the future, the continually updated collection will comprise Sentinel, Landsat, MODIS, VIIRS, Meteosat, ENVISAT, and ERS products. Additionally, Analysis Ready Data (ARD) for Sentinel-1 and Sentinel-2 are produced and provided for further analysis.
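To make STAC-based data discovery more concrete, the following Python sketch builds a STAC API Item Search request (a POST to the /search endpoint) using only the standard library. The endpoint URL and the collection ID are hypothetical placeholders, because the actual terrabyte endpoint and collection names are not given here. The body fields collections, bbox, datetime, and limit follow the STAC API Item Search specification; the query field belongs to the optional Query extension, which a given server may or may not support.

```python
import json
import urllib.request
from typing import Optional

# HYPOTHETICAL placeholder: the real terrabyte STAC API URL is not stated in this text.
STAC_API_URL = "https://stac.example.org"


def build_item_search(
    collections: list[str],
    bbox: tuple[float, float, float, float],
    start: str,
    end: str,
    limit: int = 100,
    query: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a STAC API Item Search POST request."""
    body = {
        "collections": collections,
        # STAC bbox order: min lon, min lat, max lon, max lat (WGS 84)
        "bbox": list(bbox),
        # RFC 3339 closed interval written as "start/end"
        "datetime": f"{start}/{end}",
        "limit": limit,
    }
    if query:
        # Query extension filter; only honoured if the server implements it
        body["query"] = query
    return urllib.request.Request(
        url=f"{STAC_API_URL}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_item_search(
        collections=["sentinel-2-l2a"],  # HYPOTHETICAL collection ID
        bbox=(11.0, 47.5, 12.0, 48.5),   # area roughly around Munich
        start="2023-06-01T00:00:00Z",
        end="2023-06-30T23:59:59Z",
        query={"eo:cloud_cover": {"lt": 20}},  # keep scenes under 20 % cloud cover
    )
    print(req.get_method(), req.full_url)
    print(json.loads(req.data))
```

Sending the request, for example with urllib.request.urlopen(req), would return a GeoJSON FeatureCollection whose features are the matching items. Paging through results via "next" links is left out for brevity.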
terrabyte connects D-SDA, the German Satellite Data Archive of DLR's German Remote Sensing Data Center (DFD), with a large high-performance online storage system based on LRZ's Data Science Storage (DSS) concept. It holds more than 30 petabytes of relevant Earth Observation data online for use in a range of applications. Earth Observation data from D-SDA can also be ordered on demand. terrabyte complements commercial data and processing cloud environments and meets current European security and data protection requirements.
terrabyte combines large-scale CPU and NVIDIA GPU computing resources located close to the Earth Observation data storage. In addition, higher-level services simplify use of the infrastructure, e.g., JupyterLab (“bring your own environment”) and QGIS in the browser via the terrabyte Portal, Charliecloud as a Docker alternative, Slurm as the workload manager, a STAC metadata API to discover and filter data curated by terrabyte or by other users (“bring your own data”), and data cube analyses based on xarray and Dask. The suite of services and tools will be continuously improved and extended according to researchers' demand.
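Because Slurm is the workload manager, compute work is submitted as batch jobs. The following minimal Python sketch composes a Slurm batch script from standard sbatch directives (--job-name, --partition, --time, --ntasks, --cpus-per-task, --mem, --gres). The partition names, resource amounts, and the processing scripts shown are hypothetical, since the actual terrabyte partitions and limits are not described here.

```python
import shlex


def make_sbatch_script(
    job_name: str,
    command: list[str],
    partition: str,            # HYPOTHETICAL: real terrabyte partition names are not given here
    time_limit: str = "01:00:00",
    cpus_per_task: int = 4,
    mem: str = "16G",
    gpus: int = 0,
) -> str:
    """Return the text of a single-task Slurm batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={time_limit}",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus_per_task}",
        f"#SBATCH --mem={mem}",
    ]
    if gpus > 0:
        # Request GPUs as generic resources (e.g. on an NVIDIA GPU partition)
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    # Quote the command safely for the shell
    lines.append(shlex.join(command))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = make_sbatch_script(
        job_name="ndvi",
        command=["python", "ndvi.py", "--tile", "32UPU"],  # HYPOTHETICAL processing script
        partition="cpu",                                   # HYPOTHETICAL partition
    )
    print(script)
```

Written to a file such as job.sh, the script would be submitted with sbatch job.sh, and Slurm would schedule it once the requested resources are free.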
Beyond the data collections on terrabyte's DSS components, terrabyte also provides a high-bandwidth data transfer infrastructure between the DSS and DLR's satellite data archive, as well as between the DSS and DLR-internal storage maintained by the respective institutes.
All scientists from DLR and LRZ with scientific, non-commercial applications can access the terrabyte platform. If you are interested in access, please email us your project outline and a short explanation of your application. A variety of support and training resources are available for terrabyte users.