Published December 14, 2018 | Version v8
Dataset Open

Dataset with manually validated version histories of Stack Overflow posts

  • 1. University of Trier

Description

We used this dataset to evaluate different string similarity metrics for SOTorrent (http://sotorrent.org/). For the versions published 2018-11-01 and 2018-12-14, we double-checked and updated the ground truth files.
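
As a rough illustration of what a string similarity metric computes when two versions of a post block are compared, here is a minimal sketch of a token-based Jaccard similarity. The function name `token_jaccard` and the sample strings are hypothetical, and this metric is only an example of the general idea; it is not necessarily one of the metrics evaluated with this dataset.

```python
def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over whitespace-separated tokens (illustrative only)."""
    tokens_a = set(a.split())
    tokens_b = set(b.split())
    # Two empty strings are treated as identical to avoid dividing by zero.
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical text of one post block in two consecutive versions.
old_version = "Use a for loop to iterate over the list"
new_version = "Use a for loop to iterate over the array"

similarity = token_jaccard(old_version, new_version)
print(similarity)
```

A score close to 1 suggests that the two blocks are versions of the same content, while a score close to 0 suggests unrelated blocks. A ground truth such as the one in this dataset makes it possible to check how often a given metric and threshold lead to the correct match.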

The dataset was created with this tool: https://github.com/sotorrent/posthistory-gt

The dataset was validated with this tool: https://github.com/sotorrent/posthistory-comparator-gt-cs

The dataset was used in this project: https://github.com/sotorrent/metric-evaluation

The most recent version of the files can always be found here: https://github.com/sotorrent/metric-evaluation/tree/master/testdata/samples_comparison

Files

LICENSE.txt

Files (1.5 MB)

Checksum                                Size
md5:2acd14bf8b67dea11530bba3b87d1ee8    220 Bytes
md5:9ecf8887e16bef685e9357400a02b6d9    338.9 kB
md5:b6a59660c96be2882ae6a2c997b12e8e    164.7 kB
md5:51d34dd6f2e3e3b46ef0e4ada629cc95    288.6 kB
md5:3484651af661bda0d218a53e86213b44    144.6 kB
md5:88719b76dc8e056219c0faa80dd1e3c6    201.9 kB
md5:d869d62006b5b2f7ff8e8b9f60d136e3    62.8 kB
md5:e15faf11993394c38235fa981d30f58c    167.2 kB
md5:a7357ed12fcb0a12b4bdf941652e4ee2    168.9 kB
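
The MD5 checksums listed for the files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_checksum` and the example file path are hypothetical and not part of the dataset or its tooling.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Hypothetical usage: the local path is an assumption, the checksum is
# the first one from the file list.
# ok = matches_listed_checksum(
#     "downloads/some_file",
#     "md5:2acd14bf8b67dea11530bba3b87d1ee8",
# )
```

MD5 is adequate here for detecting accidental corruption during download; it should not be relied on as protection against deliberate tampering.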