Published November 7, 2023 | Version 1.0
Dataset Open

200 Annotated Developer Human Errors from GitHub

  • 1. Rochester Institute of Technology

Description

Software Engineers' Human Errors

This dataset contains 200 GitHub comments with manual human error annotations, released as part of the following publication:

Included Files

The "developer_human_errors.csv" file contains the full dataset of 200 software defect descriptions annotated with human error types (slips, lapses, mistakes) and T.H.E.S.E. categories.

CSV Fields

  • ID: Unique identifier for the comment.
  • SOURCE: Whether this comment originates from a commit, issue, or pull request.
  • COMMENT_URL: The URL linking to the comment.
  • COMMENT_TEXT: The raw comment text.
  • HUMAN_ERROR_TYPE: Whether the software defect described is a slip, lapse, or mistake.
  • THESE_V4_ID: ID of the manually assigned T.H.E.S.E. category, using the labels from Version 4 of T.H.E.S.E.
  • THESE_NAME: Name of the manually assigned T.H.E.S.E. category.
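
The field list above can be checked when loading the file. The following is a minimal sketch, assuming the CSV has a header row that uses exactly the field names listed above. The function name read_rows and the sample row are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Column names as given in the field list above.
EXPECTED_FIELDS = [
    "ID", "SOURCE", "COMMENT_URL", "COMMENT_TEXT",
    "HUMAN_ERROR_TYPE", "THESE_V4_ID", "THESE_NAME",
]


def read_rows(handle):
    """Read CSV rows from an open text handle into a list of dicts.

    Raises ValueError if any expected column is absent from the header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Invented sample row for illustration only; the value spellings
# ("commit", "slip") are assumptions about how the file encodes them.
SAMPLE = (
    "ID,SOURCE,COMMENT_URL,COMMENT_TEXT,HUMAN_ERROR_TYPE,THESE_V4_ID,THESE_NAME\n"
    '1,commit,https://example.com/c/1,"Oops, fixed a typo",slip,S01,Typos & Misspellings\n'
)

rows = read_rows(io.StringIO(SAMPLE))
```

To read the downloaded file instead of the sample, open it with open("developer_human_errors.csv", newline="", encoding="utf-8") and pass the handle to read_rows. The UTF-8 encoding is also an assumption.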

Annotation Details

Human error types span slips, lapses, and mistakes from James Reason's Generic Error Modelling System (GEMS):

  • Slips: Failures of attention.
  • Lapses: Failures of memory.
  • Mistakes: Failures of planning.

T.H.E.S.E. categories are summarized below:

  • S01: Typos & Misspellings
  • S02: Syntax Errors
  • S03: Overlooking Documented Information
  • S04: Multitasking Errors
  • S05: Hardware Interaction Errors
  • S06: Overlooking Proposed Code Changes
  • S07: Overlooking Existing Functionality
  • S08: General Attentional Failure
  • L01: Forgetting to Finish a Development Task
  • L02: Forgetting to Fix a Defect
  • L03: Forgetting to Remove Development Artifacts
  • L04: Working with Outdated Source Code
  • L05: Forgetting an Import Statement
  • L06: Forgetting to Save Work
  • L07: Forgetting Previous Development Discussion
  • L08: General Memory Failure
  • M01: Code Logic Errors
  • M02: Incomplete Domain Knowledge
  • M03: Wrong Assumption Errors
  • M04: Internal Communication Errors
  • M05: External Communication Errors
  • M06: Solution Choice Errors
  • M07: Time Management Errors
  • M08: Inadequate Testing
  • M09: Incorrect/Insufficient Configuration
  • M10: Code Complexity Errors
  • M11: Internationalization/String Encoding Errors
  • M12: Inadequate Experience Errors
  • M13: Insufficient Tooling Access Errors
  • M14: Workflow Order Errors
  • M15: General Planning Failure
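
The category IDs above appear to encode the human error type in their first letter: S for slips, L for lapses, M for mistakes. The sketch below relies on that reading, which is an inference from the list rather than something the record states. The function name error_type_for is hypothetical. It can serve as a consistency check between the THESE_V4_ID and HUMAN_ERROR_TYPE columns.

```python
import re

# Assumed mapping from ID prefix to human error type (inferred from the list).
PREFIX_TO_TYPE = {"S": "slip", "L": "lapse", "M": "mistake"}

# Number of categories per prefix in the list above: S01-S08, L01-L08, M01-M15.
CATEGORY_COUNTS = {"S": 8, "L": 8, "M": 15}


def error_type_for(these_id):
    """Return the error type implied by a T.H.E.S.E. ID such as 'S01'.

    Raises ValueError for IDs that do not appear in the list above.
    """
    match = re.fullmatch(r"([SLM])(\d{2})", these_id.strip())
    if match is None:
        raise ValueError(f"unrecognized T.H.E.S.E. ID: {these_id!r}")
    prefix, number = match.group(1), int(match.group(2))
    if not 1 <= number <= CATEGORY_COUNTS[prefix]:
        raise ValueError(f"T.H.E.S.E. ID out of range: {these_id!r}")
    return PREFIX_TO_TYPE[prefix]
```

For example, error_type_for("M09") yields "mistake", while an ID such as "L09" is rejected because the list ends at L08.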

Contact

Please contact Benjamin S. Meyers (email) with questions about this data and its collection.

Acknowledgments

Collection of this data has been sponsored in part by the National Science Foundation (grant 1922169), by the NSA Science of Security Lablet program (grant H98230-17-D-0080/2018-0438-02), and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).

Files

  • developer_human_errors.csv (101.6 kB)
    md5:463437fdc1c0ea472908f21f433313f3