Published July 10, 2020 | Version 2020-01-24
Dataset Open

An Annotated Dataset of Stack Overflow Post Edits

  • 1. The University of Adelaide

Description

To improve software engineering, software repositories have been mined for code snippets and bug fixes. Typically, this mining takes place at the level of files or commits. To dig deeper and extract insights at a finer granularity, we present an annotated dataset that contains over 7 million edits of code and text on Stack Overflow. Our preliminary study indicates that these edits might be a treasure trove for mining information about fine-grained patches, e.g., for the optimisation of non-functional properties.

EDIT: In the more recent version I fixed GetEditContent.sql, which had an ambiguous column name in one of the SELECT statements.
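
For readers unfamiliar with this class of error, the following is a minimal sketch of how an ambiguous column name arises and how qualifying the column resolves it. It does not reproduce GetEditContent.sql: the tables `posts` and `edits`, their columns, and the use of SQLite through Python are all assumptions made for illustration only, and the dataset's actual schema and SQL dialect may differ.

```python
import sqlite3

# Hypothetical two-table schema; NOT the schema used by the dataset.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE edits (Id INTEGER PRIMARY KEY, PostId INTEGER, Body TEXT);
    INSERT INTO posts VALUES (1, 'Example post');
    INSERT INTO edits VALUES (10, 1, 'Example edit');
""")

def run(query):
    """Run a query and return its rows, or the error message as a string."""
    try:
        return conn.execute(query).fetchall()
    except sqlite3.OperationalError as error:
        return f"error: {error}"

# Both tables have an Id column, so an unqualified Id is ambiguous.
ambiguous = run("SELECT Id FROM posts JOIN edits ON posts.Id = edits.PostId")

# Prefixing the column with its table name removes the ambiguity.
qualified = run("SELECT edits.Id FROM posts JOIN edits ON posts.Id = edits.PostId")

print(ambiguous)
print(qualified)
```

The first query fails because the engine cannot tell which table's `Id` is meant; the second names the table explicitly and returns the edit's identifier.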

Files (175.9 MB)

Checksum                                Size
md5:bd04f989671e5b0fef8484872412cd86    9.5 kB
md5:5b785479b09e53615b386c3e77119be9    11.0 kB
md5:0560200b3cd97931183ab45ab234de91    957 Bytes
md5:03969793b7bae7a1433ff4b4b6b1d04b    175.9 MB
md5:5e7db3bec3da91e24da550c3314c125e    2.9 kB