The ever-increasing demands for computational power tighten the reins on available budgets. Constraints on expenses for acquisition and electrical power must be met, yielding continuously increasing hardware and software complexity that application developers have to deal with. Thus, informed decision making on how to invest available budgets is more important than ever. Especially for HPC procurements, a quantitative metric helps to predict the cost effectiveness of an HPC center. However, prevailing metrics such as Linpack Flop/s or Flop/s per watt capture only part of the picture that HPC centers are concerned with, i.e., expenses for hardware, software, maintenance, infrastructure, energy, and programming effort, as well as the value that researchers get from the HPC system in terms of scientific output.
In this work, I set up methodologies to support the HPC procurement process of German HPC centers. I model the cost effectiveness of HPC centers as a productivity figure of merit, defined as the ratio of the scientific outcome generated over the lifetime of the HPC system to its total cost of ownership (TCO). Scientific outcome is further defined as the number of scientific application runs, to embrace the multi-job nature of an HPC system in a meaningful way. I investigate the predictability of the model's parameters and show their robustness towards errors in various real-world HPC setups. The TCO component of my productivity model covers one-time and annual expenses and distinguishes between node-based and node-type-based costs. Costs for the development effort needed to parallelize, port, and tune simulation codes to exploit HPC systems efficiently must also be part of a sound productivity model. Since software cost models from mainstream software engineering do not focus on the laborious task of squeezing out the last percentage points of runtime performance, I introduce a methodology to estimate the corresponding HPC efforts. It is based on so-called performance life-cycles that model the relationship between performance and the effort required to achieve that performance. My methodology further covers the identification and quantification of various impact factors on HPC development effort and provides methods to collect the required data sets from human subjects for statistically reliable results. Finally, I demonstrate the applicability of my methodologies and models in a case study that covers a real-world application from aeroacoustics simulation.
Productivity and Software Development Effort Estimation in High-Performance Computing
Basing vast HPC investments on informed decision making is important when serving the ever-increasing demands for computational power. To compare and evaluate HPC systems in procurement processes, I define a productivity model that focuses on the number of simulation application runs and the total cost of ownership (TCO). As part of TCO, I set up a methodology to estimate software development costs with respect to HPC-related efforts. I provide proofs of concept based on real-world HPC setups.
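
The productivity figure of merit described above (application runs over the system lifetime divided by TCO) can be illustrated with a minimal sketch. The field names, the linear cost structure, and the constant run rate per node are assumptions made here for illustration only; the actual model in this work may be parameterized differently, and development effort is not modeled separately in this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeType:
    """Hypothetical description of one node type in an HPC system."""
    nodes: int                      # number of nodes of this type
    one_time_per_node: float        # e.g. acquisition cost per node
    annual_per_node: float          # e.g. energy and maintenance per node and year
    one_time_per_type: float        # e.g. one-time cost incurred once per node type
    annual_per_type: float          # e.g. yearly cost incurred once per node type
    runs_per_node_per_year: float   # assumed application runs per node and year


def tco(node_types: List[NodeType], lifetime_years: float) -> float:
    """Total cost of ownership: one-time plus annual costs over the lifetime."""
    total = 0.0
    for nt in node_types:
        one_time = nt.one_time_per_type + nt.nodes * nt.one_time_per_node
        annual = nt.annual_per_type + nt.nodes * nt.annual_per_node
        total += one_time + lifetime_years * annual
    return total


def productivity(node_types: List[NodeType], lifetime_years: float) -> float:
    """Application runs over the system lifetime divided by TCO."""
    runs = lifetime_years * sum(
        nt.nodes * nt.runs_per_node_per_year for nt in node_types
    )
    return runs / tco(node_types, lifetime_years)


if __name__ == "__main__":
    # Purely illustrative numbers, not taken from any real procurement.
    example = [NodeType(10, 1000.0, 100.0, 5000.0, 500.0, 200.0)]
    print(tco(example, 5))
    print(productivity(example, 5))
```

Comparing two candidate systems then amounts to evaluating `productivity` for each configuration under the same lifetime and preferring the higher value, subject to the uncertainty in the estimated parameters.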