Dataset Open Access

Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques

Majd Amirabbas; Vahidi-Asl Mojtaba; Khalilian Alireza; Baraani-Dastjerdi Ahmad; Zamani Bahman


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Reproducibility</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Benchmark</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Software Testing</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Fault Localization</subfield>
  </datafield>
  <datafield tag="653" ind1=" " ind2=" ">
    <subfield code="a">Program Repair</subfield>
  </datafield>
  <datafield tag="502" ind1=" " ind2=" ">
    <subfield code="c">Faculty of Computer Science and Engineering, Shahid Beheshti University G. C., Tehran, Iran</subfield>
  </datafield>
  <controlfield tag="005">20191101071205.0</controlfield>
  <controlfield tag="001">2582968</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Faculty of Computer Science and Engineering, Shahid Beheshti University G. C., Tehran, Iran</subfield>
    <subfield code="a">Vahidi-Asl Mojtaba</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Software Engineering, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran</subfield>
    <subfield code="a">Khalilian Alireza</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Software Engineering, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran</subfield>
    <subfield code="a">Baraani-Dastjerdi Ahmad</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Department of Software Engineering, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran</subfield>
    <subfield code="a">Zamani Bahman</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Faculty of Computer Science and Engineering, Shahid Beheshti University G. C., Tehran, Iran</subfield>
    <subfield code="4">ths</subfield>
    <subfield code="a">Vahidi-Asl Mojtaba</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Faculty of Computer Science and Engineering, Shahid Beheshti University G. C., Tehran, Iran</subfield>
    <subfield code="4">ths</subfield>
    <subfield code="a">Haghighi Hasan</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">635883922</subfield>
    <subfield code="z">md5:3ae77dfabec6e7a97ca7608c1aa41c04</subfield>
    <subfield code="u">https://zenodo.org/record/2582968/files/code4bench.rar</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2019-03-04</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="o">oai:zenodo.org:2582968</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="4">
    <subfield code="p">Computer Languages</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">Faculty of Computer Science and Engineering, Shahid Beheshti University G. C., Tehran, Iran</subfield>
    <subfield code="a">Majd Amirabbas</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">http://creativecommons.org/licenses/by/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Reproducible research relies on well-designed benchmarks. However, evaluation on a single benchmark increases the risk of overfitting; that is, of tuning a technique to reach a certain performance on that benchmark alone. In recent years, several well-designed benchmarks have been constructed for different subfields of program analysis. However, they often involve real-world industrial projects in only a few languages, such as C or Java. We provide Code4Bench, a benchmark comprising 3,421,357 programs totaling 306,053,105 lines of code in 41 versions of 28 programming languages such as C/C++, Java, Python, and Kotlin. We have constructed this benchmark from Codeforces, a well-known programming competition website that is widely used by programmers internationally. Code4Bench advances the state of the art in conducting reproducible and comparative experiments. It helps mitigate bias and increase the generality and conclusiveness of the results. We present our methodology for constructing Code4Bench and give various descriptive statistics. We have also conducted an online survey of the Codeforces users whose code is included in the benchmark. The survey covers the users&amp;rsquo; demographic information and programming habits, and its results are also provided in the benchmark. Finally, we used an automatic process to localize faults within the faulty versions and categorized them according to a coarse-grained classification. In addition to its usage in empirical studies, Code4Bench can be used to teach programming and evolve algorithmic problems. We release Code4Bench in database format to allow researchers to extract other data from the benchmark by arbitrary queries.&lt;/p&gt;

&lt;p&gt;Code4Bench version 1.0.0 is publicly available at &lt;a href="https://zenodo.org/record/2582968"&gt;https://zenodo.org/record/2582968&lt;/a&gt;, with DOI 10.5281/zenodo.2582968, thereby providing long-term storage and versioning. It is released under the terms of the Creative Commons Attribution 4.0 International license. Code4Bench is also publicly available at &lt;a href="https://github.com/code4bench/Code4Bench"&gt;https://github.com/code4bench/Code4Bench&lt;/a&gt;, where we provide some additional information and script examples.&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.2582967</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.2582968</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">dataset</subfield>
  </datafield>
</record>
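
The record above can be read programmatically. The following is a minimal sketch, not part of the original export, that uses Python's standard library to pull a few fields out of a MARC21 slim record and to check a downloaded file against the listed checksum. The helper names (`subfields`, `file_info`, `verify_file`) are hypothetical, the embedded sample is a trimmed copy of the record, and it assumes that in field 856 subfield `s` holds the file size in bytes and subfield `z` holds the checksum as `algorithm:digest`, as the values above suggest.

```python
import hashlib
import xml.etree.ElementTree as ET

# Namespace declared on the <record> element of the export.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Trimmed copy of the record above (title, DOI, and file fields only).
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">635883922</subfield>
    <subfield code="z">md5:3ae77dfabec6e7a97ca7608c1aa41c04</subfield>
    <subfield code="u">https://zenodo.org/record/2582968/files/code4bench.rar</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Code4Bench: A Multidimensional Benchmark of Codeforces Data for Different Program Analysis Techniques</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.2582968</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` inside datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.findall(path, MARC_NS)]


def file_info(record):
    """Collect URL, checksum, and size from the first 856 field.

    Assumption: subfield z is 'algorithm:digest' and subfield s is a byte count.
    """
    algorithm, _, digest = subfields(record, "856", "z")[0].partition(":")
    return {
        "url": subfields(record, "856", "u")[0],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(subfields(record, "856", "s")[0]),
    }


def verify_file(path, digest, algorithm="md5", chunk_size=1 << 20):
    """Hash a local file in chunks and compare it with the expected hex digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest.lower()


record = ET.fromstring(SAMPLE_RECORD)
info = file_info(record)
print(subfields(record, "245", "a")[0])
print(subfields(record, "024", "a")[0])
print(info["url"], info["algorithm"], info["size"])
```

After downloading the archive, a call such as `verify_file("code4bench.rar", info["digest"], info["algorithm"])` would confirm that the local copy matches the checksum in the record. Reading the file in chunks keeps memory use flat for an archive of this size.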