1401828
doi
10.5281/zenodo.1401828
oai:zenodo.org:1401828
user-empirical-software-engineering
Dumani, Lorik
University of Trier
SOTorrent Dataset
Baltes, Sebastian
University of Trier
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
stack overflow
code snippets
software evolution
github
<p>Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large collection of code snippets and free-form text on a wide variety of topics. Like other software artifacts, questions and answers on SO evolve over time, for example when bugs in code snippets are fixed, code is updated to work with a more recent library version, or text surrounding a code snippet is edited for clarity. To be able to analyze how content on SO evolves, we built <em>SOTorrent</em>, an open dataset based on the official SO data dump. <em>SOTorrent</em> provides access to the version history of SO content at the level of whole posts and individual text or code blocks. It connects SO posts to other platforms by aggregating URLs from text blocks and comments, and by collecting references from GitHub files to SO posts. Our vision is that researchers will use <em>SOTorrent</em> to investigate and understand the evolution of SO posts and their relation to other platforms such as GitHub.</p>
<p><strong>If you use this dataset in your work, please cite our <a href="http://empirical-software.engineering/publications#msr18-sotorrent">MSR 2018 paper</a> (<a href="https://dblp.uni-trier.de/rec/bibtex/conf/msr/BaltesDT008">BibTex</a>).</strong></p>
The dataset is based on the official Stack Overflow data dump released 2018-06-05 (https://archive.org/details/stackexchange) and the Google BigQuery GitHub dataset queried 2018-08-01 (https://cloud.google.com/bigquery/public-data/github). Please read the license file (LICENSE.md) before using the dataset.
Zenodo
2018-08-06
info:eu-repo/semantics/other
1135262
user-empirical-software-engineering
2018-07-31
1610026359.376048
371639463
md5:0598808a0872d9672d84fca495d20fbc
https://zenodo.org/records/1401828/files/CommentUrl.csv.gz
978927581
md5:ba445e1138143f98e5662eaf5bd4fd24
https://zenodo.org/records/1401828/files/PostVersion.csv.gz
9197312108
md5:eee064247994b351de9f45dea8db6e72
https://zenodo.org/records/1401828/files/PostBlockDiff.csv.gz
39124
md5:f02ef134b20bfe264755fb4338f47ccf
https://zenodo.org/records/1401828/files/LICENSE.md
5626391469
md5:686f3c9ec9d4dee3d19893a5ef96303d
https://zenodo.org/records/1401828/files/Comments.xml.gz
1266832088
md5:c821ce150972429ef2d247d2e2df9d11
https://zenodo.org/records/1401828/files/Votes.xml.gz
289080710
md5:fdf30e1e366f1d150337c848f55b8492
https://zenodo.org/records/1401828/files/Badges.xml.gz
567545449
md5:c9a8969fbf4a7ea42e6c2135860f53de
https://zenodo.org/records/1401828/files/Users.xml.gz
726669370
md5:dbce201689a439d9bb95e72207553377
https://zenodo.org/records/1401828/files/TitleVersion.csv.gz
20249982931
md5:ddaa03372c119e6b70a6a032e70fdf38
https://zenodo.org/records/1401828/files/PostBlockVersion.csv.gz
1499
md5:cfb179350388e38133c83bbf802c1ce2
https://zenodo.org/records/1401828/files/8_create_sotorrent_indices.sql
6686
md5:2c884318d9344c6c4d5bb0815a0d7bd6
https://zenodo.org/records/1401828/files/1_create_database.sql
1201
md5:799f3fd4fde2522a8acfbf10874b2332
https://zenodo.org/records/1401828/files/2_load_so_from_xml.sql
1031199
md5:0f456720abf859ca4d9cdc5965d78685
https://zenodo.org/records/1401828/files/Tags.xml.gz
489
md5:73693f9ef65547c799aed4d0d98ba474
https://zenodo.org/records/1401828/files/7_load_postreferencegh.sql
4430
md5:f384f9d260aec3535183f71ea74a4b24
https://zenodo.org/records/1401828/files/6_load_sotorrent.sql
506
md5:3f1228feae388d0c8c35570f39ee6300
https://zenodo.org/records/1401828/files/5_create_sotorrent_user.sql
6563
md5:477bf186705ee787514dd84d224a6251
https://zenodo.org/records/1401828/files/4_create_sotorrent_tables.sql
501
md5:0ed7536b5ab713c5f77ae5699604c608
https://zenodo.org/records/1401828/files/3_create_indices.sql
94338197
md5:b5918b15a5f400217300d9d60b7425bd
https://zenodo.org/records/1401828/files/PostLinks.xml.gz
1436
md5:b158c19ca687d1b81e51e608328eabd5
https://zenodo.org/records/1401828/files/README.md
30412754554
md5:b541fff7b494ef8c56bad459b2bf9d5e
https://zenodo.org/records/1401828/files/PostHistory.xml.gz
354849372
md5:a0ca7e300495ca899aa3d17b083b4401
https://zenodo.org/records/1401828/files/PostReferenceGH.csv.gz
17138140172
md5:b8620f67212361096743fa6d65fb40ac
https://zenodo.org/records/1401828/files/Posts.xml.gz
1307806500
md5:2b6999e596386c9b58daa3239f1ee362
https://zenodo.org/records/1401828/files/PostVersionUrl.csv.gz
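The record lists, for every file, its size in bytes, its MD5 checksum, and its download URL. Because several archives are very large (PostHistory.xml.gz alone is about 30 GB), it can be worth checking local copies against these values before loading them. The following is a minimal Python sketch, assuming all files were downloaded into a single directory under their original names; the names `EXPECTED`, `md5_of`, and `verify` are illustrative and not part of the dataset.

```python
import hashlib
import os
import sys

# (size in bytes, MD5 hex digest) per file, copied from the record's file listing.
EXPECTED = {
    "CommentUrl.csv.gz": (371639463, "0598808a0872d9672d84fca495d20fbc"),
    "PostVersion.csv.gz": (978927581, "ba445e1138143f98e5662eaf5bd4fd24"),
    "PostBlockDiff.csv.gz": (9197312108, "eee064247994b351de9f45dea8db6e72"),
    "LICENSE.md": (39124, "f02ef134b20bfe264755fb4338f47ccf"),
    "Comments.xml.gz": (5626391469, "686f3c9ec9d4dee3d19893a5ef96303d"),
    "Votes.xml.gz": (1266832088, "c821ce150972429ef2d247d2e2df9d11"),
    "Badges.xml.gz": (289080710, "fdf30e1e366f1d150337c848f55b8492"),
    "Users.xml.gz": (567545449, "c9a8969fbf4a7ea42e6c2135860f53de"),
    "TitleVersion.csv.gz": (726669370, "dbce201689a439d9bb95e72207553377"),
    "PostBlockVersion.csv.gz": (20249982931, "ddaa03372c119e6b70a6a032e70fdf38"),
    "8_create_sotorrent_indices.sql": (1499, "cfb179350388e38133c83bbf802c1ce2"),
    "1_create_database.sql": (6686, "2c884318d9344c6c4d5bb0815a0d7bd6"),
    "2_load_so_from_xml.sql": (1201, "799f3fd4fde2522a8acfbf10874b2332"),
    "Tags.xml.gz": (1031199, "0f456720abf859ca4d9cdc5965d78685"),
    "7_load_postreferencegh.sql": (489, "73693f9ef65547c799aed4d0d98ba474"),
    "6_load_sotorrent.sql": (4430, "f384f9d260aec3535183f71ea74a4b24"),
    "5_create_sotorrent_user.sql": (506, "3f1228feae388d0c8c35570f39ee6300"),
    "4_create_sotorrent_tables.sql": (6563, "477bf186705ee787514dd84d224a6251"),
    "3_create_indices.sql": (501, "0ed7536b5ab713c5f77ae5699604c608"),
    "PostLinks.xml.gz": (94338197, "b5918b15a5f400217300d9d60b7425bd"),
    "README.md": (1436, "b158c19ca687d1b81e51e608328eabd5"),
    "PostHistory.xml.gz": (30412754554, "b541fff7b494ef8c56bad459b2bf9d5e"),
    "PostReferenceGH.csv.gz": (354849372, "a0ca7e300495ca899aa3d17b083b4401"),
    "Posts.xml.gz": (17138140172, "b8620f67212361096743fa6d65fb40ac"),
    "PostVersionUrl.csv.gz": (1307806500, "2b6999e596386c9b58daa3239f1ee362"),
}


def md5_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so multi-GB archives are never fully loaded."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory):
    """Map each expected file name to 'ok', 'missing', 'size mismatch' or 'md5 mismatch'."""
    results = {}
    for name, (size, md5) in EXPECTED.items():
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            results[name] = "missing"
        elif os.path.getsize(path) != size:
            # Cheap size check first, so truncated downloads are not hashed.
            results[name] = "size mismatch"
        elif md5_of(path) != md5:
            results[name] = "md5 mismatch"
        else:
            results[name] = "ok"
    return results


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, status in sorted(verify(target).items()):
        print(f"{status:14} {name}")
```

Run it as `python verify_sotorrent.py /path/to/downloads` (the script name is arbitrary). Comparing the file size before hashing is a design choice: an incomplete download is reported immediately instead of after reading tens of gigabytes.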
public
10.5281/zenodo.1135262
isVersionOf
doi