Published August 6, 2024 | Version v1
Presentation Open

DTIO: Unifying I/O for HPC and AI

  • 1. Illinois Institute of Technology

Description

HPC, Big Data Analytics, and Machine Learning have become increasingly intertwined, as popular models such as LLMs and Diffusion Models have driven discovery in fields such as molecular simulation and cosmology. Applications like GenSLMs and OpenFold have proven the value of ML in accelerating scientific applications. However, the convergence of these fields is incomplete, as each has its own storage infrastructure with its own I/O interfaces and storage systems. For HPC, the typical storage infrastructure involves a Parallel File System accessed through HDF5, MPI-IO, or POSIX, while ML workloads such as RAG may use a distributed vector database. The two domains also have different I/O needs: HPC typically performs write-intensive bulk operations, while ML performs read-intensive small operations. There is a need for a system that unifies the existing I/O stacks for the convergence of HPC and ML. For this purpose, we propose DTIO, a DataTask I/O Library. DTIO will preserve the semantics of ML and HPC storage stacks while providing transparent data placement for a given data object. It will provide delayed consistency, which achieves better performance and decision-making by performing I/O asynchronously during compute phases. It will replicate tasks across multiple storage systems to serve converged workflows. Finally, it will intercept I/O calls and translate them to the DataTask abstraction in order to accomplish these objectives.
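
The delayed-consistency idea described above can be illustrated with a minimal Python sketch. It is not the DTIO API: the names `DataTask`, `DelayedWriter`, `write`, and `sync` are hypothetical, and an in-memory dictionary stands in for a storage backend. The sketch only shows the general pattern of translating an intercepted write into a task, applying it on a background thread while compute continues, and blocking at an explicit consistency point.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class DataTask:
    """Hypothetical task record describing one intercepted I/O call."""
    op: str          # "write" is the only operation in this sketch
    key: str         # name of the data object
    payload: bytes


class DelayedWriter:
    """Queues write tasks and applies them on a background thread."""

    def __init__(self):
        self.store = {}                  # stands in for a storage backend
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def write(self, key, payload):
        # Intercepted call: translate to a DataTask and return at once.
        self._tasks.put(DataTask("write", key, payload))

    def _drain(self):
        # Background loop: apply queued tasks in arrival order.
        while True:
            task = self._tasks.get()
            self.store[task.key] = task.payload
            self._tasks.task_done()

    def sync(self):
        # Consistency point: block until every queued task is applied.
        self._tasks.join()


writer = DelayedWriter()
writer.write("checkpoint/0", b"step-0 state")
total = sum(range(1000))     # compute phase overlaps with the queued write
writer.sync()                # after this, the write is visible in the store
```

Until `sync()` returns, a reader of `store` may not yet see the queued write. That is the trade-off a delayed-consistency design accepts in exchange for overlapping I/O with computation.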

Files

DTIO Poster HUG Intro Presentation.pdf

Files (588.8 kB)

Name Size
md5:dd4c480e8043fa887499290d7ac1c0fa 140.0 kB
md5:8fe6dc36c007bab7484118b7138e35d9 448.8 kB

Additional details

Related works

Is supplemented by
Video/Audio: https://youtu.be/yfwU4cosGJc (URL)