Published October 18, 2025 | Version v0.1
Resource type: Software | Access: Open

Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective

Description

This artifact contains the source code for the paper "Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective," published at MICRO 2025. It includes the distributed training frameworks evaluated in the paper and a modified version of Zeus used to collect hardware telemetry. It also provides SLURM and bash scripts for preparing and launching experiments, along with Python scripts that reproduce and visualize the paper's key results.

Files

One file, 68.6 MB (md5:d2f8e4b0e0ec3ade88fc52c85dcbf586)
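The record lists an MD5 checksum for the 68.6 MB archive. Below is a minimal sketch for checking a download against it, using only Python's standard-library `hashlib`. The helper names `md5_of_file` and `verify_archive` are hypothetical, and because the archive's filename is not given on this page, the path is left to the caller.

```python
import hashlib
from pathlib import Path

# MD5 checksum listed on this record for the archive.
EXPECTED_MD5 = "d2f8e4b0e0ec3ade88fc52c85dcbf586"


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # Read until f.read() returns b"" (end of file) so memory use stays bounded.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file at `path` matches the expected MD5 checksum."""
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive(Path("<path-to-downloaded-archive>"))` returns `True` when the digest matches the listed checksum. Reading the file in chunks keeps memory use flat regardless of archive size.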