Published September 21, 2021 | Version v1
Software
Open
artifact_detection - A tool for NLP tasks on textual bug reports.
Description
Automated line-by-line classification of text into natural language (e.g. English in the contained datasets) and non-natural language portions (e.g. stack traces, code snippets, log outputs, file listings, URLs).

This repo contains the Python implementation of a machine learning classifier model, basic scripts for automated training set creation from GitHub issue tickets, a sample dataset sourced from 101 Java projects hosted on GitHub, and a scikit-learn transformer that wraps the pretrained model for use as a preprocessing step in a scikit-learn pipeline.

A detailed discussion of this model can be found in "Identifying non-natural language artifacts in bug reports" - Hirsch T. and Hofer B.

If you use this work in research please cite: "Identifying non-natural language artifacts in bug reports" - Hirsch T. and Hofer B., in 2nd International Workshop on Software Engineering Automation: A Natural Language Perspective, part of the 36th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW ’21), November 15–19, 2021, Virtual.

This project is also available on GitHub: https://github.com/AmadeusBugProject/artifact_detection
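To illustrate the idea of a line-level classifier wrapped as a scikit-learn preprocessing step, here is a minimal sketch. It is not the repository's actual API: the class name `ArtifactRemover`, the helper `train_line_classifier`, the toy training lines, and the choice of character n-grams with a linear SVM are all assumptions made for illustration. The pretrained model shipped in the repo should be used for real work.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training set: label 1 = artifact line, 0 = natural language.
TRAIN_LINES = [
    "The application crashes when I open the settings dialog.",
    "I expected the file to be saved without errors.",
    "Steps to reproduce are listed below.",
    "This happens on every start since the last update.",
    "\tat com.example.Main.run(Main.java:42)",
    "java.lang.NullPointerException: null",
    "public static void main(String[] args) {",
    "2021-09-21 10:15:32 ERROR [main] Connection refused",
    "https://example.com/issues/123",
    "-rw-r--r-- 1 user user 1024 Sep 21 10:15 config.xml",
]
TRAIN_LABELS = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]


def train_line_classifier(lines, labels):
    """Fit a per-line classifier (assumed design: char n-grams + linear SVM)."""
    model = Pipeline([
        ("vec", CountVectorizer(analyzer="char", ngram_range=(1, 3),
                                lowercase=False)),
        ("svm", LinearSVC()),
    ])
    model.fit(lines, labels)
    return model


class ArtifactRemover(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: drops lines predicted to be artifacts."""

    def __init__(self, line_classifier):
        self.line_classifier = line_classifier

    def fit(self, X, y=None):
        # The wrapped classifier is already trained, so there is nothing to fit.
        return self

    def transform(self, X):
        cleaned = []
        for doc in X:
            # Classify each non-blank line of the document separately.
            lines = [ln for ln in doc.splitlines() if ln.strip()]
            if not lines:
                cleaned.append("")
                continue
            preds = self.line_classifier.predict(lines)
            kept = [ln for ln, p in zip(lines, preds) if p == 0]
            cleaned.append("\n".join(kept))
        return cleaned


classifier = train_line_classifier(TRAIN_LINES, TRAIN_LABELS)
remover = ArtifactRemover(classifier)

reports = [
    "The app crashes on startup.\n"
    "java.lang.IllegalStateException: not initialized\n"
    "\tat com.example.App.start(App.java:17)\n"
    "Please let me know if you need more details.",
]
cleaned = remover.fit_transform(reports)
print(cleaned[0])
```

Because the transformer returns one cleaned string per input document, it can sit in front of any text vectorizer in a scikit-learn `Pipeline`. With a training set this small the predictions are not reliable; the sketch only shows the shape of the approach.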
Files
| Name | Size | Checksum |
|---|---|---|
| artifact_detection.zip | 35.2 MB | md5:7420df4c92bcc1870ac2650633e9ed32 |
Additional details
Funding
- FWF Austrian Science Fund: Automated Debugging in Use (P 32653)