Dataset Open Access

The codrep machine learning on source code competition, the raw diff

Chen, Zimin; Monperrus, Martin

CodRep is a machine learning competition on source code data. It is designed so that anybody can enter, whether professional researchers, students, or independent scholars, without specific knowledge of machine learning or program analysis. In particular, it aims to be a common playground on which the machine learning and software engineering research communities can interact.

This dataset provides the raw diffs that we collected and that are used to generate the prediction tasks. For more information, see https://github.com/KTH/CodRep-competition.

Files (10.2 GB)
Name                                      Size      Checksum
14914-diffs.zip                           513.4 MB  md5:14987553c5f1a93727680346bea4a9e7
ant_diffs.zip                             224.2 MB  md5:8b5b2c79a96f1d7a6e6d4c173463b28b
cassandra_diffs.zip                       722.9 MB  md5:5335661f788d7bca1f36ed2e146e3af0
commons-codec_diffs.zip                   9.4 MB    md5:30f08bbbec84143477c7d14964599a45
commons-collections_diffs.zip             27.4 MB   md5:4f20fead6129b1decb910fdbbbcc2bf4
commons-compress_diffs.zip                13.0 MB   md5:ffabf73f01c2f5c9a269b763fc868d8b
commons-csv_diffs.zip                     7.3 MB    md5:edd92c6bbda01119c0ac8edd7b31691b
commons-io_diffs.zip                      12.5 MB   md5:678aa0fa9fe2cd5897b2a09709b158fc
commons-lang_diffs.zip                    48.0 MB   md5:57a784945f5961854abf5e26e6ada602
commons-math_diffs.zip                    91.4 MB   md5:2597104d664095bdf9a51950ff840fb7
dataset-diff-02.tar.xz                    371.3 MB  md5:0b862f9ef1c330ac934cca82a82f0d76
ecf_diffs.zip                             146.4 MB  md5:30fef695985ded54b628a92449cc738c
eclipse.platform.swt_diffs.zip            416.3 MB  md5:3b760b8a730ecd476377f1c9fc33e79c
eclipse_diffs.zip                         21.1 MB   md5:d81b6d7ee090b2a25fd01c85cbaff6af
elasticsearch_diffs.zip                   2.4 GB    md5:50626d05a30cc9ad00066af563023923
jmeter_diffs.zip                          187.1 MB  md5:2d0c632ddf9d08bc6d17856994d03dc1
libgdx_diffs.zip                          613.1 MB  md5:b356f999665366d5b49f00f332343e94
log4j_diffs.zip                           177.9 MB  md5:e7408603335330c92ac62331f9c05e62
lucene-solr_diffs.zip                     253.2 MB  md5:4d289f88cc97c06eef7aded1fa58bb88
openjpa_diffs.zip                         82.8 MB   md5:f8763d01e5ab64113521483359f01ca0
org.aspectj_diffs.zip                     70.1 MB   md5:9d173f7e88dcf04e4864f9968faef3cf
org.eclipse.webtools.incubator_diffs.zip  13.6 MB   md5:e74459bfa1fc3d999711faf72d96db4f
org.eclipse.xpand_diffs.zip               16.4 MB   md5:ffa9c7413fb642cb7fca5ccd2d22eca8
PocketHub_diffs.zip                       9.7 MB    md5:9a97abfa35b890ef3355afb75e3e64d9
spring-framework_diffs.zip                342.9 MB  md5:fd0edf1a0fd18c25f50cd28de60ff64e
storm_diffs.zip                           301.1 MB  md5:761a994464b00b548fc3da745a51ce07
tomcat_diffs.zip                          224.8 MB  md5:81c725999edc868b3adca228137aed28
wicket_diffs.zip                          2.5 GB    md5:b99433c16823e584d2467548c149d346
wildfly_diffs.zip                         327.5 MB  md5:9b0958cc4ec5a59114b80647cdaaee6b
xerces2-j_diffs.zip                       97.7 MB   md5:24b1d2aefa12d22d58f32021fd37b1da
zxing_diffs.zip                           6.3 MB    md5:8e36b7ad3a691db1509b1b5d247e9843
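
Each file above is listed with an MD5 checksum, so a downloaded archive can be checked for corruption before it is unpacked. The following is a minimal sketch of such a check, not an official tool of the competition. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the example assumes that `zxing_diffs.zip` has already been saved to the current working directory.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat even for the multi-gigabyte
    archives in the listing.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Assumption: the archive was downloaded to the current directory.
    archive = Path("zxing_diffs.zip")
    listed = "md5:8e36b7ad3a691db1509b1b5d247e9843"  # value from the file listing
    if archive.exists():
        status = "OK" if matches_listing(archive, listed) else "MISMATCH"
        print(f"{archive.name}: {status}")
    else:
        print(f"{archive.name} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.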
  • Chen, Zimin, and Martin Monperrus. "The codrep machine learning on source code competition." arXiv preprint arXiv:1807.03200 (2018).
