Technical Debt Classification in Issue Trackers using Natural Language Processing based on Transformers

Daniel Skryseth; Karthik Shivashankar; Ildikó Pilán; Antonio Martini

doi:10.5281/zenodo.7225077

Published January 23, 2023 | Version 1.001

Dataset Open

1. University of Oslo, Norway
2. Norwegian Computing Center

To ensure transparency and reproducibility, we have made everything publicly available here, including the code, models, datasets, and more. The README.md file explains each file used in this paper and what it does.

Background: Technical Debt (TD) needs to be controlled and tracked during software development. Support for automatically tracking TD in issue trackers is limited.

Aim: We explore the use of a large dataset of developer-labeled TD issues, combined with state-of-the-art Natural Language Processing (NLP) approaches, to automatically classify TD in issue trackers.

Method: We mine and analyze more than 160 GB of textual data from GitHub projects, collecting over 55,600 TD issues and consolidating them into a large dataset (the GTD dataset). We use this dataset to train and test Transformer ML models. We then assess the models' generalization ability by evaluating them on six unseen projects. Finally, we re-train the models with part of the TD issues from the target project included, to test their adaptability.
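The last step of the method (re-training with part of the target project's issues) can be pictured with a minimal sketch of the data split. This is an illustration only, not the authors' code: the function names `split_for_adaptation` and `build_training_sets`, the `adapt_fraction` parameter, its default of 0.5, and the representation of issues as `(text, label)` tuples are all assumptions made here. The actual procedure is described in the README.md of this record.

```python
import random


def split_for_adaptation(target_issues, adapt_fraction=0.5, seed=0):
    """Split one unseen project's labeled issues into two disjoint parts.

    Hypothetical helper: the first part is added to the training data,
    the second part is held out for testing the adapted model.
    """
    if not 0.0 <= adapt_fraction <= 1.0:
        raise ValueError("adapt_fraction must be between 0 and 1")
    shuffled = list(target_issues)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    cut = int(len(shuffled) * adapt_fraction)
    return shuffled[:cut], shuffled[cut:]


def build_training_sets(gtd_issues, target_issues, adapt_fraction=0.5, seed=0):
    """Assemble training and test data for the two evaluation settings."""
    adapt_part, held_out = split_for_adaptation(target_issues, adapt_fraction, seed)
    return {
        # Generalization: train on GTD only, test on the whole unseen project.
        "generalization_train": list(gtd_issues),
        "generalization_test": list(target_issues),
        # Adaptation: train on GTD plus part of the target project,
        # test on the remaining target issues only.
        "adaptation_train": list(gtd_issues) + adapt_part,
        "adaptation_test": held_out,
    }
```

Keeping the adaptation test set disjoint from the issues added to training is the point of the split: otherwise the adapted model would be scored on issues it has already seen.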

Results and Conclusion: (i) We create and release the GTD dataset, a comprehensive dataset of TD issues from 6,401 public repositories with various contexts; (ii) by training Transformers on the GTD dataset, we achieve promising performance metrics; (iii) our results are a significant step towards supporting the automatic classification of TD in issue trackers, especially when the models are adapted to the context of unseen projects through fine-tuning.

Files (41.9 GB)

Name	MD5	Size
README.md	04b567b8dd2a195a263b410cbb7e46ad	25.1 kB
Technical Debt Classification.zip	c05a7fb502770246d20e03c37743cc93	41.9 GB
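
Because the archive is 41.9 GB, it may be worth checking a download against the MD5 checksums listed above before unpacking it. The following is a minimal sketch using only the Python standard library. The file names and checksums are copied from the table; the helper names `md5_of_file` and `verify_download` and the 1 MiB chunk size are choices made for this example.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "README.md": "04b567b8dd2a195a263b410cbb7e46ad",
    "Technical Debt Classification.zip": "c05a7fb502770246d20e03c37743cc93",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    so that a multi-gigabyte archive is never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the checksum listed for its name."""
    path = Path(path)
    return md5_of_file(path) == expected[path.name]
```

For example, `verify_download("Technical Debt Classification.zip")` returns `True` only if the local file matches the listed checksum. A name that is not in the table raises `KeyError`.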

	All versions	This version
Views	3,214	2,718
Downloads	787	642
Data volume	25.2 TB	17.7 TB
