The increasing performance of today’s supercomputers is mainly due to growing parallelism in the numerical computations. This parallelism is achieved by connecting many processing units, but nowadays the number of computational cores per CPU and the capability of each core to process multiple data elements simultaneously (SIMD) are even more important. This trend results in increasingly heterogeneous bandwidths and latencies between the processing units and the memory (cache and RAM) attached to them. Therefore, high-performance software must be developed with great care to account for all bottlenecks arising in the utilized hardware.
In collaboration with MTU Aero Engines AG, the flow solver TRACE developed at DLR will be improved with respect to its applicability on modern supercomputers. Here, the reorganization of the necessary data structures in memory is a key issue in exploiting the capabilities of all hardware layers. Existing optimized libraries will be used where it makes sense, while the remaining parts of the code will be adapted to modern standards (MPI, OpenMP, GASPI). Furthermore, unavoidable idle times during the communication between the parallel processes shall be used for computational work that does not depend on the communicated data.
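
To illustrate what a reorganization of data structures in memory can mean in practice, the following C++ fragment contrasts an array-of-structures layout with a structure-of-arrays layout for a few flow variables. It is a minimal sketch under stated assumptions, not the data layout used in TRACE: the type names `CellAoS` and `FieldSoA`, the variables `rho`, `rhoU`, `rhoE`, and the update functions are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array-of-structures layout: all variables of one cell
// are adjacent in memory.
struct CellAoS {
    double rho;   // density
    double rhoU;  // momentum (one component only, for brevity)
    double rhoE;  // total energy
};

// Hypothetical structure-of-arrays layout: each variable is stored
// contiguously across all cells.
struct FieldSoA {
    std::vector<double> rho, rhoU, rhoE;
    explicit FieldSoA(std::size_t n) : rho(n), rhoU(n), rhoE(n) {}
    std::size_t size() const { return rho.size(); }
};

// Explicit update rho_i += dt * res_i on the AoS layout.
// Consecutive rho values are sizeof(CellAoS) bytes apart, so each cache
// line loaded also carries rhoU and rhoE, which this loop never uses.
void updateDensityAoS(std::vector<CellAoS>& cells,
                      const std::vector<double>& res, double dt) {
    assert(cells.size() == res.size());
    for (std::size_t i = 0; i < cells.size(); ++i) {
        cells[i].rho += dt * res[i];
    }
}

// The same update on the SoA layout.
// The loop reads and writes unit-stride arrays, which is the access
// pattern compilers can usually map onto SIMD loads and stores.
void updateDensitySoA(FieldSoA& f, const std::vector<double>& res, double dt) {
    assert(f.size() == res.size());
    double* rho = f.rho.data();
    const double* r = res.data();
    const std::size_t n = f.size();
    for (std::size_t i = 0; i < n; ++i) {
        rho[i] += dt * r[i];
    }
}
```

Both functions compute the same result; they differ only in how the data are laid out and traversed. Which layout performs better for a given kernel depends on the hardware and on how many variables the kernel touches per cell, so such a choice would have to be confirmed by measurement.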