There is a newer version of the record available.

Published September 21, 2021 | Version v1
Resource type: Software | Access: Open

artifact_detection - A tool for NLP tasks on textual bug reports.

Affiliation: Graz University of Technology

Description

artifact_detection
A tool for NLP tasks on textual bug reports.
Automated, line-by-line classification of text into natural language (e.g., English in the included datasets) and non-natural language portions (e.g., stack traces, code snippets, log output, file listings, and URLs).
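To illustrate what line-by-line classification means in practice, here is a minimal sketch using a multinomial naive Bayes classifier over simple word and symbol tokens, trained on a handful of example lines. It is not the model shipped in this repository (the paper describes the actual approach); the class name, the labels `text` and `artifact`, and the training lines are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical toy training lines; the real tool learns from a much larger
# dataset mined from GitHub issues.
TRAINING_LINES = [
    ("The application crashes when I open the settings page.", "text"),
    ("I expected the file to be saved, but nothing happened.", "text"),
    ("Steps to reproduce are listed below.", "text"),
    ("This started after upgrading to the latest version.", "text"),
    ("java.lang.NullPointerException: null", "artifact"),
    ("at com.example.App.main(App.java:12)", "artifact"),
    ("public static void main(String[] args) {", "artifact"),
    ("https://example.com/issues/42", "artifact"),
]


def tokenize(line):
    """Split a lowercased line into words, numbers and single symbol characters.

    Symbols such as '(', '.', ':' and '{' are kept as tokens because they are
    strong cues for stack traces, code and URLs.
    """
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", line.lower())


class NaiveBayesLineClassifier:
    """Multinomial naive Bayes over line tokens with add-one (Laplace) smoothing."""

    def fit(self, labeled_lines):
        self.class_counts = Counter()  # number of training lines per label
        self.token_counts = {}         # label -> Counter of token frequencies
        self.vocab = set()
        for line, label in labeled_lines:
            self.class_counts[label] += 1
            counts = self.token_counts.setdefault(label, Counter())
            for tok in tokenize(line):
                counts[tok] += 1
                self.vocab.add(tok)
        return self

    def predict_line(self, line):
        """Return the label with the highest log-posterior for a single line."""
        total_lines = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_lines in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_lines / total_lines)  # log prior
            for tok in tokenize(line):
                score += math.log((counts[tok] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, text):
        """Classify every non-blank line of a bug report independently."""
        return [(line, self.predict_line(line))
                for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    clf = NaiveBayesLineClassifier().fit(TRAINING_LINES)
    report = ("After the update the app no longer starts.\n"
              "java.lang.IllegalStateException: boom\n"
              "at com.example.Service.start(Service.java:88)")
    for line, label in clf.predict(report):
        print(f"{label:>8}  {line}")
```

In this toy setup, symbol tokens such as `(`, `.` and `:` carry much of the signal; a practical model needs far more training data and richer features.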

This repository contains the Python implementation of a machine learning classifier model, basic scripts for automatically creating a training set from GitHub issue tickets, a sample dataset sourced from 101 Java projects hosted on GitHub, and a scikit-learn transformer that wraps the pretrained model for use as a preprocessing step in a scikit-learn pipeline.
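The transformer described above can be pictured as a preprocessing step that drops lines flagged as artifacts before the text reaches later pipeline stages. The following is a minimal, dependency-free sketch that only follows scikit-learn's `fit`/`transform` convention. The class `ArtifactRemover` and the `looks_like_stack_frame` heuristic are hypothetical stand-ins, not this repository's API or its pretrained model. For full scikit-learn compatibility (e.g., `clone` and grid search), such a class would typically subclass `sklearn.base.BaseEstimator` and `sklearn.base.TransformerMixin`.

```python
from typing import Callable, Iterable, List


class ArtifactRemover:
    """Removes lines flagged as non-natural language from each document.

    `is_artifact` is a hypothetical stand-in for a pretrained line classifier:
    any callable that maps a single line to True (artifact) or False (prose).
    """

    def __init__(self, is_artifact: Callable[[str], bool]):
        self.is_artifact = is_artifact

    def fit(self, X, y=None):
        # The wrapped model is assumed to be pretrained, so fitting is a no-op.
        return self

    def transform(self, X: Iterable[str]) -> List[str]:
        # Keep only the lines the classifier considers natural language.
        cleaned = []
        for doc in X:
            kept = [line for line in doc.splitlines() if not self.is_artifact(line)]
            cleaned.append("\n".join(kept))
        return cleaned

    def fit_transform(self, X, y=None) -> List[str]:
        return self.fit(X, y).transform(X)


def looks_like_stack_frame(line: str) -> bool:
    # Hypothetical heuristic: Java stack frames start with "at " and end with ")".
    stripped = line.strip()
    return stripped.startswith("at ") and stripped.endswith(")")


if __name__ == "__main__":
    docs = ["App crashes on start.\n    at com.example.App.main(App.java:12)\nHappens every time."]
    print(ArtifactRemover(looks_like_stack_frame).fit_transform(docs))
```

Placed first in a `sklearn.pipeline.Pipeline`, for example ahead of a `TfidfVectorizer`, a step like this would let downstream features be computed only from the natural-language lines of each report.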

A detailed discussion of this model can be found in "Identifying non-natural language artifacts in bug reports" by Hirsch T. and Hofer B.

If you use this work in your research, please cite:
"Identifying non-natural language artifacts in bug reports" - Hirsch T. and Hofer B., in 2nd International Workshop on Software Engineering Automation: A Natural Language Perspective, part of the 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW ’21), November 15–19, 2021, Virtual.

This project is also available on GitHub:
https://github.com/AmadeusBugProject/artifact_detection

Files

Name: artifact_detection.zip
Size: 35.2 MB
Checksum: md5:7420df4c92bcc1870ac2650633e9ed32
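The MD5 checksum listed for artifact_detection.zip can be used to check that a download is complete and unmodified. Below is a minimal sketch using Python's standard-library `hashlib`; the local file name and location are assumptions.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "7420df4c92bcc1870ac2650633e9ed32"  # checksum from the file listing


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("artifact_detection.zip")  # assumed download location
    if archive.exists():
        ok = md5_of_file(archive) == EXPECTED_MD5
        print("checksum OK" if ok else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```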

Additional details

Funding

FWF Austrian Science Fund: Automated Debugging in Use (P 32653)