1,237 Annotated Developer Apologies from GitHub
Description
Software Developer Apologies
This dataset contains 1,237 GitHub comments with apology annotations (apology vs not apology), released as part of the following publication:
- Benjamin S. Meyers. Human Error Assessment in Software Engineering. Rochester Institute of Technology. 2023.
Included Files
The "github_apologies.csv" file contains the full dataset of 1,237 GitHub comments with apology annotations. Of these, 365 comments contain an apology and the remaining 872 do not. The comments are a subset of those included in 88.6 Million Developer Comments from GitHub.
Annotation Details
Full details are provided in the above publication. We implemented a naive classifier (Precision: 77.0%, Recall: 99.7%, F1: 86.9%, Accuracy: 91.1%) using counts of apology lemmas. For 91% of developer comments containing at least one apology lemma, the classifier label matched our manual annotations. Agreement between raters was almost perfect (Cohen's Kappa = 0.94).
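For readers who want to recompute these figures from the label columns, the following is a minimal sketch of the standard definitions of precision, recall, F1, accuracy, and Cohen's kappa. The function names `binary_metrics` and `cohens_kappa` are illustrative and not taken from the publication, and the exact evaluation procedure used there may differ.

```python
from collections import Counter

POSITIVE = "Apology"  # label string as listed under "CSV Fields"


def binary_metrics(predicted, actual, positive=POSITIVE):
    """Precision, recall, F1, and accuracy for two equal-length label lists."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_1)
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(counts_1[label] * counts_2[label]
                   for label in counts_1) / (n * n)
    if expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)
```

Under these definitions, `binary_metrics` would take the CLASSIFIER_LABEL values as `predicted` and the AGREED_LABEL values as `actual`, while `cohens_kappa` would take the RATER_1_LABEL and RATER_2_LABEL values. Whether this reproduces the reported numbers exactly depends on evaluation details given in the publication.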
CSV Fields
- ID: Unique identifier for the comment.
- SOURCE: Whether this comment originates from a commit, issue, or pull request.
- COMMENT_URL: The URL linking to the comment.
- COMMENT_TEXT: The raw comment text.
- NUM_APOLOGY_LEMMAS: The count of apology lemmas present in the comment.
- CLASSIFIER_LABEL: The automatically assigned label ("Apology" or "Not Apology").
- RATER_1_LABEL: The manually assigned label ("Apology" or "Not Apology") from Rater 1.
- RATER_2_LABEL: The manually assigned label ("Apology" or "Not Apology") from Rater 2.
- AGREED_LABEL: The agreed-upon label ("Apology" or "Not Apology") after Rater 1 and Rater 2 resolved disagreements.
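The fields above can be read with Python's standard `csv` module. The sketch below assumes the file is comma-delimited with a header row using exactly these field names; the helper name `count_labels` and the two sample rows (including their URLs and SOURCE values) are hypothetical and only stand in for the real file.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, column="AGREED_LABEL"):
    """Count how often each label occurs in one column of an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)


# Hypothetical two-row sample with the documented header (not real data).
sample = io.StringIO(
    "ID,SOURCE,COMMENT_URL,COMMENT_TEXT,NUM_APOLOGY_LEMMAS,"
    "CLASSIFIER_LABEL,RATER_1_LABEL,RATER_2_LABEL,AGREED_LABEL\n"
    '1,issue,https://example.invalid/1,"Sorry, my mistake.",1,'
    "Apology,Apology,Apology,Apology\n"
    '2,commit,https://example.invalid/2,"Fixed in the latest build.",0,'
    "Not Apology,Not Apology,Not Apology,Not Apology\n"
)
label_counts = count_labels(sample)
print(label_counts)
```

For the real dataset, the same function could be called on `open("github_apologies.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption. If AGREED_LABEL is the column behind the totals given under "Included Files", the counts should come out as 365 "Apology" and 872 "Not Apology".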
Contact
Please contact Benjamin S. Meyers (email) with questions about this data and its collection.
Acknowledgments
Collection of this data has been sponsored in part by the National Science Foundation (grant 1922169), by the NSA Science of Security Lablet program (grant H98230-17-D-0080/2018-0438-02), and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).
Files
Name | Size | MD5 |
---|---|---|
github_apologies.csv | 697.2 kB | 437696005951cf93e9ad18a32f0919e3 |
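The MD5 checksum listed for github_apologies.csv can be used to confirm that a download is intact. The following is a minimal sketch using Python's `hashlib`; the helper names `md5_of_file` and `matches_published_checksum` are illustrative, and the local file path is whatever location the file was saved to.

```python
import hashlib

# Checksum as published in the file listing for github_apologies.csv.
EXPECTED_MD5 = "437696005951cf93e9ad18a32f0919e3"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, expected=EXPECTED_MD5):
    """True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_published_checksum("github_apologies.csv")` on a downloaded copy should return `True` if the file is unmodified.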
Additional details
Related works
- Is derived from
- Dataset: 10.5281/zenodo.5603093 (DOI)