There is a newer version of the record available.

Published May 10, 2022 | Version v1
Thesis Open

Workflow models for heterogeneous distributed systems

  • 1. Università degli Studi di Torino, Computer Science Dept.

Contributors

  • 1. Università degli Studi di Torino, Computer Science Dept.

Description

Data plays an increasingly crucial role in modern scientific workflows. The unprecedented amount of data available in the digital era, combined with recent advances in Machine Learning and High-Performance Computing (HPC), has let computers surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. However, a solid data management strategy is essential for key aspects like performance optimisation, privacy preservation and security.

Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which transferring data between different steps of a complex workflow is worthwhile, or even unavoidable.

The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications, allowing topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. In addition, it introduces computational notebooks as a high-level and user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of such methodology.
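To make the separation of concerns described above concrete, the following is a minimal sketch and not the implementation that accompanies the dissertation. It assumes a hypothetical `Step` record that declares a step's data dependencies and the execution environments it may run on, and a hypothetical greedy `plan` function that places each step on the site already holding most of its inputs, recording a transfer only when data has to move. The site names (`cloud`, `hpc`) and step names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """A workflow step with declared data dependencies (hypothetical model)."""
    name: str
    inputs: list[str]
    outputs: list[str]
    # Hypothetical binding: the execution environments allowed to run this step.
    allowed_sites: list[str]


def plan(steps, data_location):
    """Greedily place each step and list the data transfers it requires.

    `steps` must already be in dependency order. `data_location` maps each
    data item to the site currently holding it and is updated in place.
    """
    placement, transfers = {}, []
    for step in steps:
        # Data locality: prefer the allowed site that already holds most inputs.
        site = max(
            step.allowed_sites,
            key=lambda s: sum(data_location[d] == s for d in step.inputs),
        )
        placement[step.name] = site
        for d in step.inputs:
            if data_location[d] != site:
                # The transfer is unavoidable: the data lives elsewhere.
                transfers.append((d, data_location[d], site))
                data_location[d] = site
        for d in step.outputs:
            data_location[d] = site
    return placement, transfers


steps = [
    Step("preprocess", ["raw"], ["clean"], ["cloud", "hpc"]),
    Step("train", ["clean"], ["model"], ["hpc"]),
    Step("report", ["model", "clean"], ["summary"], ["cloud", "hpc"]),
]
placement, transfers = plan(steps, {"raw": "cloud"})
print(placement)
print(transfers)
```

In this toy run, `preprocess` stays next to its input on the cloud site, while `train` can only run on the HPC site, so its input must be transferred there. The step definitions never mention where data lives, which is the point of keeping dependencies and execution environments separate.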

Each of these contributions is accompanied by a full-fledged, Open Source implementation, which has been used for evaluation purposes and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on a total of five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics Simulation, executing them on large-scale mixed cloud-HPC infrastructures.

Files

main.pdf (4.3 MB)
md5:26fbd36254ef27bf2050e99a1344d9b1

Additional details

Related works

Describes
Journal article: 10.1109/TETC.2020.3019202 (DOI)
Journal article: 10.1016/j.future.2021.10.007 (DOI)
Conference paper: 10.5281/zenodo.5151511 (DOI)
Conference paper: 10.4230/OASIcs.PARMA-DITAM.2021.5 (DOI)

Funding

DeepHealth – Deep-Learning and HPC to Boost Biomedical Applications for Health (grant 825111)
European Commission
ACROSS – HPC Big Data Artificial Intelligence Cross Stack Platform Towards Exascale (grant 955648)
European Commission