Published April 3, 2024 | Version 0.0.1
Dataset Open

Fives Input dataset (Cobalt & Darshan traces, combined and preprocessed)

  • 1. Inria Rennes - Bretagne Atlantique Research Centre

Description

Dataset made of aggregated and curated Cobalt and Darshan logs from the Theta HPC platform at ALCF.

Cobalt and Darshan logs were obtained from the ALCF Public Data repository (https://reports.alcf.anl.gov/data/index.html) and cover the year 2022. This data was generated from resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. To use the scripts contained within this archive, these datasets must be downloaded and placed in the directory '2022' at the root of the extracted archive.
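As an illustration of the layout requirement above, the following minimal sketch checks that a '2022' directory exists at the root of the extracted archive and is not empty before the scripts are run. The function name `raw_logs_dir` and its parameters are hypothetical and are not part of the archive; the names of the raw log files are not assumed.

```python
from pathlib import Path


def raw_logs_dir(archive_root: str, year: str = "2022") -> Path:
    """Return the directory expected to hold the raw ALCF logs.

    Hypothetical helper (not shipped with the archive). It only checks
    that '<archive_root>/<year>' exists and contains at least one entry.
    """
    logs_dir = Path(archive_root) / year
    if not logs_dir.is_dir():
        raise FileNotFoundError(
            f"Expected raw logs in {logs_dir}: create this directory and "
            "place the downloaded Cobalt and Darshan files there."
        )
    if not any(logs_dir.iterdir()):
        raise FileNotFoundError(f"{logs_dir} exists but is empty.")
    return logs_dir
```

Calling `raw_logs_dir(".")` from the root of the extracted archive would return the path to '2022' once the downloaded files are in place, and raise an error otherwise.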

The Darshan logs used in this dataset are originally available in an aggregated form. The levels of detail are usually the following:

  • job (reservation made to a resource manager for some platform resources)
  • application run (application running inside the job, on the reserved resources; there may be several, run sequentially or in parallel, during a job's execution)
  • I/O operation (read or write registered to a file from a process of an application)

Darshan CSV files for Theta contain job and application run information, but the individual I/O operations of each application run are aggregated into a single entry.
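The hierarchy described above can be pictured with a small data model. This is only a sketch: the class names and the fields `bytes_read` and `bytes_written` are hypothetical and do not reflect the actual columns of the Darshan CSV files or the keys of the YAML files in this archive. It shows a job holding several application runs, each carrying one aggregated I/O entry instead of a list of individual operations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AggregatedIO:
    """One summary entry standing in for all I/O operations of a run."""
    bytes_read: int = 0      # hypothetical field name
    bytes_written: int = 0   # hypothetical field name


@dataclass
class ApplicationRun:
    """An application executed inside a job, on the reserved resources."""
    run_id: str
    io: AggregatedIO = field(default_factory=AggregatedIO)


@dataclass
class Job:
    """A reservation made to the resource manager."""
    job_id: str
    runs: List[ApplicationRun] = field(default_factory=list)

    def total_bytes(self) -> int:
        # Sum the aggregated I/O volume over every run of the job.
        return sum(r.io.bytes_read + r.io.bytes_written for r in self.runs)


# Illustrative values only, not taken from the dataset.
example_job = Job(
    job_id="job-0",
    runs=[
        ApplicationRun("run-0", AggregatedIO(bytes_read=1024, bytes_written=512)),
        ApplicationRun("run-1", AggregatedIO(bytes_written=256)),
    ],
)
```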

This resource is organised as a single archive containing:

  • YAML files with our datasets, at various levels of granularity (in the 'preprocessed_datastets' directory):
    • 48 files, each containing one month's worth of job traces (4 files per month: one for each of the 3 job classes, and one with all job classes)
    • 4 files, each containing the entire year's worth of job traces (1 file per job class, and 1 file with all job classes)
  • A Jupyter Lab notebook, which contains the routines needed to create the aforementioned datasets from the raw log files from ALCF, for the Theta system
  • A requirements.txt file, describing required Python packages and their versions.
  • Various empty directories meant to receive outputs from the Jupyter notebook.
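The file counts listed above can be checked with a short sketch. The arithmetic follows the description (12 months, 3 job classes plus one combined file). The helper `count_yaml_files`, the '.yaml' and '.yml' extensions, and the assumption that the directory holds nothing but these dataset files are all hypothetical; the directory name should be passed exactly as it appears in the extracted archive.

```python
from pathlib import Path

MONTHS = 12
JOB_CLASSES = 3  # per the description; the class names are not assumed here


def expected_file_counts() -> dict:
    """Expected number of dataset files, derived from the description."""
    per_period = JOB_CLASSES + 1      # one file per class + one combined
    monthly = MONTHS * per_period     # 12 * 4
    yearly = per_period               # 4
    return {"monthly": monthly, "yearly": yearly, "total": monthly + yearly}


def count_yaml_files(directory: str) -> int:
    """Count YAML files directly inside `directory` (extensions assumed)."""
    path = Path(directory)
    return sum(1 for p in path.iterdir() if p.suffix in {".yaml", ".yml"})
```

Under these assumptions, comparing `count_yaml_files(...)` on the extracted dataset directory with `expected_file_counts()["total"]` gives a quick sanity check that the archive was extracted completely.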

Files

Files (17.6 MB)

md5:d442b8281c663d9c1ef8e7f3ac18ce16 (17.6 MB)

Additional details

Software

Programming language
Python
Development Status
Active