Parallel Programming with Global Asynchronous Memory: Models, C++ APIs and Implementations

Maurizio Drocco

doi:10.5281/zenodo.1037585

Published October 26, 2017 | Version v1

Thesis Open

Maurizio Drocco¹

1. University of T

Contributors

Supervisor:

Aldinucci, Marco¹

1. University of Torino, Italy

In the realm of High Performance Computing (HPC), message passing has been the programming paradigm of choice for over twenty years. The durable MPI (Message Passing Interface) standard, with send/receive communication, broadcast, gather/scatter, and reduction collectives, is still used to construct parallel programs where each communication is orchestrated by the developer, based on precise knowledge of data distribution and overheads; collective communications simplify the orchestration but might induce excessive synchronization.

Early attempts to bring the shared-memory programming model, with its programming advantages, to distributed computing, referred to as the Distributed Shared Memory (DSM) model, faded away; one of the main issues was combining performance and programmability with the memory consistency model. The recently proposed Partitioned Global Address Space (PGAS) model is a modern revamp of DSM that exposes data placement to enable optimizations based on locality, but it still addresses (simple) data parallelism only and it relies on expensive sharing protocols.

We advocate an alternative programming model for distributed computing based on a Global Asynchronous Memory (GAM), aiming to avoid coherency and consistency problems rather than solving them. We materialize GAM by designing and implementing a distributed smart pointers library, inspired by C++ smart pointers. In this model, public and private pointers (resembling C++ shared and unique pointers, respectively) are moved around instead of messages (i.e., data), thus relieving the user of the burden of minimizing transfers. On top of smart pointers, we propose a high-level C++ template library for writing applications in terms of dataflow-like networks, namely GAM nets, consisting of stateful processors exchanging pointers in a fully asynchronous fashion.
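The ownership analogy drawn above can be illustrated with standard C++ smart pointers alone. The following is a minimal single-process sketch, not the GAM library API: `Frame`, `PrivateFrame`, `PublicFrame`, `Channel`, `produce`, and `consume_and_publish` are hypothetical names introduced only for this illustration, and an in-memory queue stands in for communication between processors. It assumes that a private pointer behaves like `std::unique_ptr` (one owner at a time) and a public pointer like `std::shared_ptr` (many read-only holders).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical payload type standing in for a large data item.
struct Frame {
    std::vector<int> pixels;
};

// Assumption: a "private" pointer is modeled by std::unique_ptr (single owner).
using PrivateFrame = std::unique_ptr<Frame>;
// Assumption: a "public" pointer is modeled by std::shared_ptr to read-only data.
using PublicFrame = std::shared_ptr<const Frame>;

// Hypothetical channel between two processors: it carries pointers, not payloads.
using Channel = std::queue<PrivateFrame>;

// Producer: builds a frame and hands its ownership over through the channel.
void produce(Channel& out, int value, std::size_t size) {
    auto frame = std::make_unique<Frame>();
    frame->pixels.assign(size, value);
    out.push(std::move(frame));  // only the pointer moves; pixels are not copied
}

// Consumer: takes exclusive ownership, updates the frame in place,
// then publishes it as a shared, read-only pointer.
PublicFrame consume_and_publish(Channel& in) {
    assert(!in.empty());
    PrivateFrame frame = std::move(in.front());
    in.pop();
    for (int& p : frame->pixels) {
        p += 1;  // safe to mutate: no other owner exists at this point
    }
    return PublicFrame(std::move(frame));  // unique ownership becomes shared
}
```

In this sketch the payload is allocated once and never copied: the producer gives up access when it moves the pointer, and after publication any number of holders may read the frame but none may modify it. A real distributed implementation would additionally have to decide where the payload lives and when it is transferred, which this local example does not attempt to model.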

We demonstrate the validity of the proposed approach, from the expressiveness perspective, by showing how GAM nets can be exploited to implement both standalone applications and higher-level parallel programming models, such as data and task parallelism. From the performance perspective, preliminary experiments show both close-to-ideal scalability and negligible overhead with respect to state-of-the-art benchmark implementations. For instance, the GAM implementation of a high-quality video restoration filter sustains a 100 fps throughput over 70%-noisy high-quality video streams on a 4-node cluster of Graphics Processing Units (GPUs), with minimal programming effort.

Files

Drocco_phd_thesis.pdf

Files (2.1 MB)

Name	Size
Drocco_phd_thesis.pdf (md5:345af3c9e67190a48e519c10cef5b06d)	2.1 MB

Additional details

TOREADOR – TrustwOrthy model-awaRE Analytics Data platfORm 688797: European Commission
RePhrase – REfactoring Parallel Heterogeneous Resource-Aware Applications - a Software Engineering Approach 644235: European Commission
REPARA – Reengineering and Enabling Performance And poweR of Applications 609666: European Commission
PARAPHRASE – Parallel Patterns for Adaptive Heterogeneous Multicore Systems 288570: European Commission

