200 Annotated Developer Human Errors from GitHub
Description
Software Engineers' Human Errors
This dataset contains 200 GitHub comments with manual human error annotations, released as part of the following publication:
- Benjamin S. Meyers. Human Error Assessment in Software Engineering. Rochester Institute of Technology. 2023.
Included Files
The "developer_human_errors.csv" file contains the full dataset of 200 software defect descriptions annotated with human error types (slips, lapses, mistakes) and T.H.E.S.E. categories.
CSV Fields
- ID: Unique identifier for the comment.
- SOURCE: Whether this comment originates from a commit, issue, or pull request.
- COMMENT_URL: The URL linking to the comment.
- COMMENT_TEXT: The raw comment text.
- HUMAN_ERROR_TYPE: Whether the software defect described is a slip, lapse, or mistake.
- THESE_V4_ID: Manually assigned T.H.E.S.E. category with labels corresponding to Version 4 of T.H.E.S.E.
- THESE_NAME: Name corresponding to manually assigned T.H.E.S.E. category.
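As a minimal sketch of how these fields might be read, the following uses only Python's standard library to parse the CSV and tally rows by HUMAN_ERROR_TYPE. The helper names (`read_rows`, `count_error_types`) are hypothetical, and the exact spelling and capitalization of the values stored in HUMAN_ERROR_TYPE is an assumption, so the sketch normalizes case and whitespace before counting.

```python
import csv
from collections import Counter


def read_rows(handle):
    """Parse CSV text from an open file-like object into a list of dicts.

    Column names are taken from the header row, so each dict is keyed by
    the fields listed above (ID, SOURCE, COMMENT_URL, ...).
    """
    return list(csv.DictReader(handle))


def count_error_types(rows):
    """Tally rows by HUMAN_ERROR_TYPE.

    Assumption: values may differ in case or carry stray whitespace,
    so they are stripped and lowercased before counting.
    """
    return Counter(row["HUMAN_ERROR_TYPE"].strip().lower() for row in rows)
```

To apply this to the dataset, open the file with `open("developer_human_errors.csv", newline="", encoding="utf-8")` and pass the handle to `read_rows`; the UTF-8 encoding is assumed here and may need adjusting.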
Annotation Details
Human error types span slips, lapses, and mistakes from James Reason's Generic Error Modelling System (GEMS):
- Slips: Failures of attention.
- Lapses: Failures of memory.
- Mistakes: Failures of planning.
T.H.E.S.E. categories are summarized below:
- S01: Typos & Misspellings
- S02: Syntax Errors
- S03: Overlooking Documented Information
- S04: Multitasking Errors
- S05: Hardware Interaction Errors
- S06: Overlooking Proposed Code Changes
- S07: Overlooking Existing Functionality
- S08: General Attentional Failure
- L01: Forgetting to Finish a Development Task
- L02: Forgetting to Fix a Defect
- L03: Forgetting to Remove Development Artifacts
- L04: Working with Outdated Source Code
- L05: Forgetting an Import Statement
- L06: Forgetting to Save Work
- L07: Forgetting Previous Development Discussion
- L08: General Memory Failure
- M01: Code Logic Errors
- M02: Incomplete Domain Knowledge
- M03: Wrong Assumption Errors
- M04: Internal Communication Errors
- M05: External Communication Errors
- M06: Solution Choice Errors
- M07: Time Management Errors
- M08: Inadequate Testing
- M09: Incorrect/Insufficient Configuration
- M10: Code Complexity Errors
- M11: Internationalization/String Encoding Errors
- M12: Inadequate Experience Errors
- M13: Insufficient Tooling Access Errors
- M14: Workflow Order Errors
- M15: General Planning Failure
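The category identifiers above appear to encode the human error type in their first letter (S for slips, L for lapses, M for mistakes). Assuming that reading is correct, a small sketch like the following could cross-check THESE_V4_ID against HUMAN_ERROR_TYPE in each row. The function names and the lowercase labels "slip", "lapse", and "mistake" are illustrative assumptions, not part of the dataset.

```python
# Assumed mapping, inferred from the S/L/M prefixes in the category list.
PREFIX_TO_TYPE = {"S": "slip", "L": "lapse", "M": "mistake"}


def error_type_from_these_id(these_id):
    """Return the error type implied by a T.H.E.S.E. ID such as 'S03'."""
    cleaned = these_id.strip().upper()
    # Expect one letter followed by digits, e.g. 'M11'.
    if len(cleaned) < 2 or not cleaned[1:].isdigit():
        raise ValueError(f"unexpected T.H.E.S.E. ID format: {these_id!r}")
    if cleaned[0] not in PREFIX_TO_TYPE:
        raise ValueError(f"unknown T.H.E.S.E. prefix in: {these_id!r}")
    return PREFIX_TO_TYPE[cleaned[0]]


def is_consistent(row):
    """Check that a row's HUMAN_ERROR_TYPE agrees with its THESE_V4_ID prefix.

    Assumption: HUMAN_ERROR_TYPE holds 'slip', 'lapse', or 'mistake'
    in some capitalization.
    """
    declared = row["HUMAN_ERROR_TYPE"].strip().lower()
    return declared == error_type_from_these_id(row["THESE_V4_ID"])
```

A row that fails `is_consistent` is not necessarily an annotation error; it may only mean the stored labels differ from the spellings assumed here.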
Contact
Please contact Benjamin S. Meyers (email) with questions about this data and its collection.
Acknowledgments
Collection of this data has been sponsored in part by the National Science Foundation (grant 1922169), by the NSA Science of Security Lablet program (grant H98230-17-D-0080/2018-0438-02), and by a Department of Defense DARPA SBIR program (grant 140D63-19-C-0018).
Files
- developer_human_errors.csv (101.6 kB), md5:463437fdc1c0ea472908f21f433313f3
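The published MD5 checksum can be used to confirm that a downloaded copy is intact. The sketch below is one way to do that with Python's `hashlib`; the function names and the chunk size are arbitrary choices, and the local file path is whatever name the download was saved under.

```python
import hashlib

# Checksum as listed for developer_human_errors.csv.
EXPECTED_MD5 = "463437fdc1c0ea472908f21f433313f3"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path):
    """Return True if the file at `path` has the published checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, `matches_published_md5("developer_human_errors.csv")` should return `True` for an unmodified download saved under that name in the current directory.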