
Published December 13, 2020 | Version v1
Software | Open access

Evolution of software code at the level of fine-grained elements: source code

  • 1. University College London
  • 2. Athens University of Economics and Business

Description

A model of the lifetime of individual source code lines or tokens can estimate maintenance effort, guide preventive maintenance, and, more broadly, identify factors that can improve the efficiency of software development. We present methods and tools that track the birth and death of each line or token. Using them, we analyze 3.3 billion source code element lifetime events in 89 revision control repositories. Statistical analysis shows that code lines are durable, with a median lifespan of about 2.3 years, and that young lines are more likely to be modified or deleted, following a Weibull distribution whose hazard rate decreases over time. This behavior appears to be independent of specific characteristics of lines or tokens, as we could not identify factors that significantly influence their longevity across projects. Neither the programming language nor developer tenure and experience were found to be significantly correlated with line or token longevity, while project size and project age showed only a slight correlation.
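The Weibull claim can be made concrete with the distribution's standard formulas: survival S(t) = exp(-(t/λ)^k), hazard h(t) = (k/λ)(t/λ)^(k-1), and median λ(ln 2)^(1/k). A shape parameter k < 1 yields a hazard that falls with age, which is what "young lines are more likely to be modified or deleted" means. The following is a minimal sketch, not the analysis code distributed with this record: the shape value `SHAPE_K = 0.5` is a hypothetical illustration rather than the study's fitted value, and the scale is merely chosen so that the median reproduces the reported 2.3 years. All names are illustrative.

```python
import math

# HYPOTHETICAL shape parameter, for illustration only (not the study's fit).
SHAPE_K = 0.5
# Reported median line lifespan, in years.
MEDIAN_YEARS = 2.3
# Scale chosen so that the median of the Weibull equals MEDIAN_YEARS.
SCALE_LAMBDA = MEDIAN_YEARS / math.log(2) ** (1 / SHAPE_K)


def survival(t: float, k: float = SHAPE_K, lam: float = SCALE_LAMBDA) -> float:
    """Probability that a line is still alive at age t years: exp(-(t/lam)^k)."""
    return math.exp(-((t / lam) ** k))


def hazard(t: float, k: float = SHAPE_K, lam: float = SCALE_LAMBDA) -> float:
    """Instantaneous death rate at age t > 0: (k/lam) * (t/lam)^(k-1).

    For k < 1 the exponent is negative, so the rate decreases with age.
    """
    return (k / lam) * (t / lam) ** (k - 1)


def median_lifetime(k: float = SHAPE_K, lam: float = SCALE_LAMBDA) -> float:
    """Age at which survival drops to 0.5: lam * (ln 2)^(1/k)."""
    return lam * math.log(2) ** (1 / k)


if __name__ == "__main__":
    print(f"median lifetime: {median_lifetime():.2f} years")
    for age in (0.1, 1.0, 2.3, 5.0):
        print(f"age {age:>4} y: S={survival(age):.3f}  h={hazard(age):.3f}/year")
```

Setting k = 1 reduces the Weibull to the exponential distribution, whose hazard is constant in age; it is the k < 1 regime that concentrates the risk of modification or deletion on young lines.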

The source code files available here were used to perform the study. The corresponding data are available as a separate upload.

Files

md5:b7b51ac9a4e1735da783e65d62c1563a (1.5 MB)

Additional details

Related works

  • Compiles: Dataset, 10.5281/zenodo.4319986 (DOI)
  • Is supplement to: Software, https://github.com/dspinellis/code-lifetime/ (URL)