Published September 27, 2025 | Version v9
Dataset | Open access

Speculative Automated Refactoring of Imperative Deep Learning Programs to Graph Execution

Description

Efficiency is essential to support ever-growing datasets, especially for Deep Learning (DL) systems. DL frameworks have traditionally embraced deferred execution-style DL code, which supports symbolic, graph-based Deep Neural Network (DNN) computation. While scalable, such development is error-prone, non-intuitive, and difficult to debug. Consequently, more natural, imperative DL frameworks encouraging eager execution have emerged, but at the expense of run-time performance. Though hybrid approaches aim for the "best of both worlds," using them effectively requires subtle considerations. Our key insight is that, while DL programs typically execute sequentially, hybridizing imperative DL code resembles parallelizing sequential code in traditional systems. Inspired by this, we present an automated refactoring approach that assists developers in determining which otherwise eagerly executed imperative DL functions could be effectively and efficiently executed as graphs. The approach features novel static imperative tensor and side-effect analyses for Python. Because of Python's inherent dynamism, analyzing it may be unsound; however, the conservative approach leverages a speculative (keyword-based) analysis for resolving difficult cases and informs developers of any assumptions made. The approach is (i) implemented as a plug-in to the PyDev Eclipse IDE that integrates the WALA Ariadne analysis framework and (ii) evaluated on nineteen DL projects comprising 132 KLOC. The results show that 326 of 766 candidate functions (42.56%) were refactorable, and an average relative speedup of 2.16x was observed on performance tests with negligible differences in model accuracy. These results indicate that the approach is useful for optimizing imperative DL code to its full potential.
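To make the notion of a side effect concrete, the sketch below is a deliberately crude, purely syntactic Python check. It is not the static imperative tensor or side-effect analysis described above (which is built on WALA Ariadne); the function name `syntactic_side_effects` and the list of I/O built-ins are hypothetical, chosen only for illustration. It flags top-level functions that declare `global`/`nonlocal` names, call a few I/O built-ins, or assign to attributes or subscripts, which are the kinds of Python-level effects that may not behave the same once a function is executed as a graph rather than eagerly.

```python
import ast

# Hypothetical, illustrative list of built-ins treated as side-effecting.
SIDE_EFFECT_CALLS = {"print", "open", "input"}


def syntactic_side_effects(source: str) -> dict[str, list[str]]:
    """Map each top-level function in `source` to reasons it looks side-effecting.

    Purely syntactic and intraprocedural: it ignores callees, aliasing, and
    dynamic features, and it flags even mutation of local containers, so it
    over-approximates in some places and misses effects in others.
    """
    tree = ast.parse(source)
    report: dict[str, list[str]] = {}
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        reasons: list[str] = []
        for sub in ast.walk(node):
            if isinstance(sub, (ast.Global, ast.Nonlocal)):
                # Rebinding names outside the function's own scope.
                reasons.append("declares non-local: " + ", ".join(sub.names))
            elif (isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name)
                  and sub.func.id in SIDE_EFFECT_CALLS):
                # Calls to I/O built-ins from the illustrative list above.
                reasons.append(f"calls {sub.func.id}()")
            elif isinstance(sub, (ast.Assign, ast.AugAssign)):
                targets = sub.targets if isinstance(sub, ast.Assign) else [sub.target]
                if any(isinstance(t, (ast.Attribute, ast.Subscript)) for t in targets):
                    # Writes such as `self.x = ...` or `buf[i] += ...`.
                    reasons.append("assigns to an attribute or subscript")
        report[node.name] = reasons
    return report


SAMPLE = '''
def pure(x, w):
    return x * w

counter = 0

def impure(x):
    global counter
    counter += 1
    print(x)
    return x
'''

if __name__ == "__main__":
    for name, reasons in syntactic_side_effects(SAMPLE).items():
        print(name, "->", reasons or "no syntactic side effects found")
```

On the sample, `pure` is reported with no reasons, while `impure` is flagged for its `global counter` declaration and its `print` call. Python's dynamism is exactly what makes checks like this unsound in general, as the description notes.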

Files

candidate_functions.csv

Files (1.9 GB)

Size        MD5 checksum
64.0 kB     md5:d796b4d64f94897248af69e53aa04746
218.1 kB    md5:21fc32f517ff050c0d066029b5595d29
191.1 kB    md5:94e78804c1ef35704f6ea89d2a8fc2f9
3.0 MB      md5:9f14727001525e480324934133595ff9
36.1 kB     md5:508965b7313d469ce04881c1d082d27b
28.0 kB     md5:ed3d0dee5cc4813f367fc39c96655a1e
37.1 MB     md5:9ca44698ad6787678aa435a862dfe32b
5.7 kB      md5:ff3cd2f2da6d04ea7811daa8c6009e84
1.9 GB      md5:03a36bb192ddd8bd7b64acd078c98d37
3.2 MB      md5:7a622d2c2368e267e6ded74172e57100
2.3 kB      md5:6c57ff77204abcc2ff970856e1429704
33.8 kB     md5:31676dd4f9ab4bdfe7c0ee4f1bfc508b
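Each file above is listed with an MD5 checksum. The following minimal sketch checks a downloaded file against its listed value using only the standard library `hashlib`; the helper names `md5_of` and `verify` are hypothetical. Reading in 1 MiB chunks keeps memory use flat even for the 1.9 GB file.

```python
from __future__ import annotations

import hashlib
import os


def md5_of(path: str | os.PathLike[str], chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | os.PathLike[str], listed: str) -> bool:
    """Compare a file's MD5 with a checksum as listed, e.g. "md5:<hex>"."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

For example, `verify("downloaded_file", "md5:03a36bb192ddd8bd7b64acd078c98d37")` should return `True` for an intact copy of the 1.9 GB file. Here `"downloaded_file"` is a placeholder for wherever the file was saved locally, since the listing does not show file names.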

Additional details

Related works

Is compiled by
Software: 10.5281/zenodo.15045769 (DOI)
Is supplement to
Conference paper: arXiv:2504.05424 (arXiv)

Funding

U.S. National Science Foundation, award 2200343: SHF: Small: Practical Analyses and Safe Transformations for Imperative Deep Learning Programs
U.S. National Science Foundation, award 2213763: Collaborative Research: CCRI: New: A Software Refactoring Community Infrastructure
U.S. National Science Foundation, award 2343750: SHF: Small: Knowledge, Methodologies, and Tool-support for Combating Technical Debt in Machine Learning Systems

Software