1135263
doi
10.5281/zenodo.1135263
oai:zenodo.org:1135263
user-empirical-software-engineering
Dumani, Lorik
University of Trier
SOTorrent Data Set
Baltes, Sebastian
University of Trier
info:eu-repo/semantics/openAccess
Creative Commons Attribution Share Alike 4.0 International
https://creativecommons.org/licenses/by-sa/4.0/legalcode
stack overflow
code snippets
software evolution
github
<p>Stack Overflow (SO) is the largest Q&A website for software developers, providing a huge number of copyable code snippets. Recent studies have shown that developers regularly copy those snippets into their software projects, often without the required attribution. Besides possible licensing issues, maintenance issues may arise, because the snippets evolve on SO while the developers who copied the code remain unaware of these changes. To help researchers investigate the evolution of code snippets on SO and their relation to other platforms like GitHub, we build <em>SOTorrent</em>, an open data set based on data from the official SO data dump and the Google BigQuery GitHub data set. <em>SOTorrent</em> provides access to the version history of SO content on the level of whole posts and individual text or code blocks. Moreover, it links SO content to external resources in two ways: (1) by extracting linked URLs from text blocks of SO posts and (2) by providing a table with links to SO posts found in the source code of all projects in the BigQuery GitHub data set.</p>
The data set is based on the official Stack Overflow data dump released on 2017-12-01 (https://archive.org/details/stackexchange) and the Google BigQuery GitHub data set queried on 2017-11-20 (https://cloud.google.com/bigquery/public-data/github). Please read all three license files (LICENSE_1, LICENSE_2, LICENSE_3) before using the data set.
Zenodo
2018-01-05
info:eu-repo/semantics/other
1135262
user-empirical-software-engineering
2017-12-24
1610026356.072737
5276738677
md5:5468330f602b84da1a452f94dcccc362
https://zenodo.org/records/1135263/files/Comments.xml.gz
79
md5:52c862330e4850d08848fe141ffe2dc3
https://zenodo.org/records/1135263/files/PostBlockDiffOperation.csv.gz
60
md5:2627b668db395aa447be78057b327065
https://zenodo.org/records/1135263/files/PostBlockType.csv.gz
20561
md5:30560b322dbbddacfc6292942a42732d
https://zenodo.org/records/1135263/files/LICENSE_3
248
md5:8338f2e6a3dc5c724de1cb6ad8a6f17b
https://zenodo.org/records/1135263/files/LICENSE_2
16114967576
md5:1a9f208abd7c25a8cf31c5773a845af8
https://zenodo.org/records/1135263/files/PostBlockVersion.csv.gz
28418369398
md5:ef60bc0774724df24079dfa2488aeed2
https://zenodo.org/records/1135263/files/PostHistory.xml.gz
86373053
md5:a513ff2491b9e4b72456d26df4a4695a
https://zenodo.org/records/1135263/files/PostLinks.xml.gz
269845170
md5:53318cf851181d2e0c08cba3f7ce3147
https://zenodo.org/records/1135263/files/PostReferenceGH.csv.gz
16057603848
md5:d2d99c634e4c0cf112daace2f1e62cd4
https://zenodo.org/records/1135263/files/Posts.xml.gz
135
md5:2e73dcd4f791e547254bca618857c746
https://zenodo.org/records/1135263/files/PostType.csv.gz
22792
md5:7fe0a3c070cf6da7b9b11bb02adad522
https://zenodo.org/records/1135263/files/LICENSE_1
570995873
md5:966a3601a19e5016becdd1e555df66bc
https://zenodo.org/records/1135263/files/PostVersion.csv.gz
7217130842
md5:d96919b855795684aa4d7f779e31cf29
https://zenodo.org/records/1135263/files/PostBlockDiff.csv.gz
265934518
md5:08f14a0cdccd3f01f98d2dc0f72af702
https://zenodo.org/records/1135263/files/Badges.xml.gz
655289902
md5:a98a845fce686c0d196ad6d3f3ade395
https://zenodo.org/records/1135263/files/PostVersionUrl.csv.gz
868
md5:dd279bd65996102fb6bee854ccba8137
https://zenodo.org/records/1135263/files/README.md
993826
md5:21856cd5f720cf7f065fa82dde75405e
https://zenodo.org/records/1135263/files/Tags.xml.gz
752
md5:3e95f2763006b8a4b86188c14e0e1940
https://zenodo.org/records/1135263/files/7_create_sotorrent_indices.sql
1168761462
md5:b9dcb2e851b6faa274da2e36a954db12
https://zenodo.org/records/1135263/files/Votes.xml.gz
2062
md5:c4923ba6b505cda40e81f71562b38503
https://zenodo.org/records/1135263/files/6_import_sotorrent.sql
3848
md5:792d89fef5bbdb7cab67e52db2054bf1
https://zenodo.org/records/1135263/files/5_create_sotorrent_tables.sql
255
md5:dfc610b177d240d7e1ed825787e54ada
https://zenodo.org/records/1135263/files/4_create_indices.sql
1201
md5:df89b09b4f261ff17a5e331bf1364a50
https://zenodo.org/records/1135263/files/3_load_so_from_xml.sql
506
md5:4d25b7bb873aa15b8b0ff7f8853a8147
https://zenodo.org/records/1135263/files/2_create_sotorrent_user.sql
510443736
md5:16ff25feb3807f480d36d56795f29f4b
https://zenodo.org/records/1135263/files/Users.xml.gz
4340
md5:c6e59e6cf26244266b6140604d84fb28
https://zenodo.org/records/1135263/files/1_create_database.sql
public
10.5281/zenodo.1135262
isVersionOf
doi
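Each file in this record is listed with a size in bytes, a checksum of the form `md5:<hex digest>`, and a download URL. Since several archives are many gigabytes in size, it may be worth checking a download against its recorded checksum before importing it. The following is a minimal sketch of such a check using only the Python standard library; the function names `md5_of_file` and `matches_record` are hypothetical helpers introduced for illustration and are not part of the data set or its import scripts.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, recorded):
    """Compare a local file against a checksum string such as 'md5:<hex digest>'.

    Assumption: the recorded value uses the 'md5:' prefix shown in this record.
    """
    algorithm, _, expected = recorded.partition(":")
    if algorithm != "md5" or not expected:
        raise ValueError("expected a checksum of the form 'md5:<hex digest>'")
    return md5_of_file(path) == expected.strip().lower()
```

For example, after downloading `README.md` from this record, `matches_record("README.md", "md5:dd279bd65996102fb6bee854ccba8137")` should return `True` if the file arrived intact, assuming the local path points at the unmodified download.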