Published November 7, 2023 | Version 1.0
Dataset Open

1,237 Annotated Developer Apologies from GitHub

  • 1. Rochester Institute of Technology

Description

Software Developer Apologies

This dataset contains 1,237 GitHub comments with apology annotations (apology vs not apology), released as part of the following publication:

Included Files

The "github_apologies.csv" file contains the full dataset of 1,237 GitHub comments with apology annotations. In total, 365 comments contain an apology and 872 do not. The comments themselves are a subset of those included in 88.6 Million Developer Comments from GitHub.

Annotation Details

Full details are provided in the above publication. We implemented a naive classifier (Precision: 77.0%, Recall: 99.7%, F1: 86.9%, Accuracy: 91.1%) using counts of apology lemmas. 91% of developer comments containing at least one apology lemma matched our manual annotations. Agreement between raters was almost perfect (Cohen's Kappa = 0.94).
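The metrics named above follow the standard definitions for binary classification and inter-rater agreement. The following is a minimal sketch of how they could be recomputed from lists of labels. It assumes the classifier is scored against the agreed label and that kappa is taken between the two raters; the function names `binary_metrics` and `cohens_kappa` are hypothetical and are not taken from the publication.

```python
from collections import Counter


def binary_metrics(predicted, actual, positive="Apology"):
    """Precision, recall, F1, and accuracy for one positive class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    observed = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: sum over labels of the product of each rater's label rate.
    expected = sum(counts_a[label] * counts_b[label]
                   for label in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

With the CSV loaded, `binary_metrics` would take the CLASSIFIER_LABEL and AGREED_LABEL columns, and `cohens_kappa` would take RATER_1_LABEL and RATER_2_LABEL. Small differences from the published figures are possible if the publication computed them differently.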

CSV Fields

  • ID: Unique identifier for the comment.
  • SOURCE: Whether this comment originates from a commit, issue, or pull request.
  • COMMENT_URL: The URL linking to the comment.
  • COMMENT_TEXT: The raw comment text.
  • NUM_APOLOGY_LEMMAS: The count of apology lemmas present in the comment.
  • CLASSIFIER_LABEL: The automatically assigned label ("Apology" or "Not Apology").
  • RATER_1_LABEL: The manually assigned label ("Apology" or "Not Apology") from Rater 1.
  • RATER_2_LABEL: The manually assigned label ("Apology" or "Not Apology") from Rater 2.
  • AGREED_LABEL: The agreed-upon label ("Apology" or "Not Apology") after Rater 1 and Rater 2 resolved disagreements.
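The fields listed above can be read with Python's standard `csv` module. The sketch below is illustrative only: it assumes the file is UTF-8 encoded with a header row that uses exactly these field names, and the helper names (`read_comments`, `count_labels`, `rater_disagreements`, `load_dataset`) are hypothetical.

```python
import csv
from collections import Counter


def read_comments(lines):
    """Parse CSV lines (an open file or any iterable of strings) into dict rows."""
    return list(csv.DictReader(lines))


def count_labels(rows, field="AGREED_LABEL"):
    """Tally how many rows carry each label in the given field."""
    return Counter(row[field] for row in rows)


def rater_disagreements(rows):
    """IDs of comments where the two raters initially assigned different labels."""
    return [row["ID"] for row in rows
            if row["RATER_1_LABEL"] != row["RATER_2_LABEL"]]


def load_dataset(path="github_apologies.csv"):
    """Read the dataset from disk; the UTF-8 encoding is an assumption."""
    with open(path, newline="", encoding="utf-8") as handle:
        return read_comments(handle)
```

Calling `count_labels(load_dataset())` on the downloaded file should reproduce the apology and non-apology totals stated under Included Files, provided the assumptions about encoding and header names hold.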

Contact

Please contact Benjamin S. Meyers (email) with questions about this data and its collection.

Acknowledgments

Collection of this data has been sponsored in part by the National Science Foundation (grant 1922169), by the NSA Science of Security Lablet program (grant H98230-17-D-0080/2018-0438-02), and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).

Files

Files (697.2 kB)

  • github_apologies.csv (697.2 kB)
    md5:437696005951cf93e9ad18a32f0919e3
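The MD5 checksum listed for the file can be used to confirm that a download is intact. The following is a minimal sketch using Python's `hashlib`; the helper names and the local path `github_apologies.csv` are assumptions for illustration.

```python
import hashlib

# Checksum listed for github_apologies.csv on this page.
EXPECTED_MD5 = "437696005951cf93e9ad18a32f0919e3"


def md5_of_chunks(chunks):
    """Hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path, chunk_size=65536):
    """Yield a file's contents in fixed-size binary chunks."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def matches_listed_checksum(path="github_apologies.csv"):
    """True if the local file's MD5 equals the checksum listed above."""
    return md5_of_chunks(file_chunks(path)) == EXPECTED_MD5
```

MD5 is adequate here for detecting accidental corruption during download; it is not a defense against deliberate tampering.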

Additional details

Related works

Is derived from
Dataset: 10.5281/zenodo.5603093 (DOI)