Shared memory parallel programming, for instance with OpenMP, might look simple at first sight. However, the achievable performance of real-world applications depends on certain machine properties - for instance the achievable memory bandwidth within the memory hierarchy - and in how far the application programmer has taken these into account. This thesis presents solutions for designing shared memory parallel applications targeting current and future system architectures by following a methodical approach. Therefore, it builds on a successful strategy from the software engineering discipline: the introduction of abstractions.
With large shared memory machines typically providing a non-uniform memory access, and taking energy efficiency into account, expressing and managing data locality is important today and will be even more so on the next generation of machines. The de-facto standard parallelization paradigms MPI and OpenMP were not well-equipped to allow for that, nor is this a task a domain scientist is interested in solving. Suitable abstractions for handling parallelism and data locality have to be powerful enough to fulfill their purpose while being simple enough to be applicable to existing application codes. The means of abstraction in this work are twofold: first, it relates to the methodical selection of those architecture-specific properties that are important to achieve performance, and second, it relates to the design of parallel programming model concepts and of software components to foster the parallelization of simulation codes.
For the abstractions to be acceptable by end users, existing data structures and code designs should be left unmodified as much as possible, in particular in object-oriented codes. Hence, the abstractions have to be formulated in a broadly accepted form, for instance they have to integrate well with common design patterns. To achieve this goal, this work first identifies several memory management idioms for NUMA machines. Second, a powerful yet simple-to-use and flexible thread affinity model for OpenMP is developed. Third, the memory management idioms and the thread affinity model are shown to support the parallelization with object-oriented abstractions. To support all this, a set of benchmarks and experiments to analyze OpenMP implementation behavior is presented. This work as a whole proposes a methodic approach to develop parallel scientific software for multi- and many-core architectures.
Abstractions for Performance Programming on Multi-Core Architectures with Hierarchical Memory
Expressing and managing data locality is important today and will be even more so on the next generation of machines. This thesis presents abstractions for achieving high performance with shared memory parallel applications. It identifies several memory management idioms for NUMA machines and the development of the thread affinity model for OpenMP. The memory management idioms and the thread affinity model are shown to support the parallelization with object-oriented abstractions.