There is a newer version of the record available.

Published January 20, 2025 | Version v2
Dataset Open

Comprehensive Multi-Domain Experiment Reproducibility Dataset (E1-E20)

  • 1. Universidade do Porto Faculdade de Engenharia
  • 2. INESC TEC

Description

Overview

This dataset, "Multi-Domain Experiment Dataset for Evaluating Reproducibility Tools (E1–E20)," is curated to facilitate the evaluation and benchmarking of reproducibility frameworks. It provides a structured and diverse collection of scientific experiments, enabling researchers and developers to test and compare different tools designed for computational reproducibility.

Dataset Composition

The dataset consists of 20 experiments (E1–E20) covering multiple scientific domains, including computer science, human-computer interaction (HCI), medicine, artificial intelligence, climate change, and economics. These experiments range from simple computational scripts to complex setups requiring integrated databases, multiple programming languages, and domain-specific computational environments.

Experiment Sources

To ensure a well-balanced dataset, experiments were sourced from peer-reviewed scientific conferences and open-access repositories:

  • Computer Science:

    • Software Engineering: Experiments from the IEEE/ACM International Conference on Software Engineering (ICSE 2022), a premier venue for software engineering research.
    • Databases: Selected experiments from the International Conference on Very Large Data Bases (VLDB 2021), a leading database conference.
    • Human-Computer Interaction (HCI) and User Studies:
      • Experiments sourced from the European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023).
      • These experiments focus on user studies related to software engineering and usability research.
  • Interdisciplinary Fields (Collected from Zenodo):

    • Artificial Intelligence (AI): Experiments covering machine learning and AI-driven methodologies.
    • Climate Change: Computational experiments and simulations addressing environmental research.
    • Medicine: Experiments focusing on medical and healthcare applications.
    • Economics: Computational economic models and data analysis experiments.
  • Related Work on Reproducibility Tools:

    • We included experiments previously used to evaluate reproducibility tools and methodologies.
    • This ensures alignment with prior research and enhances comparability across tools.
    • Among these, two key studies, SciInc and SciUnit, provided three fully documented experiments:
      • Chicago Food Inspections Evaluation;
      • Variable Infiltration Capacity;
      • Incremental Query Execution.

Methodology for Dataset Curation

To construct this dataset, we employed a structured selection process:

  1. Scientific Conference Selection:

    • We identified key research areas within computer science and selected experiments from ICSE 2022, VLDB 2021, and ESEC/FSE 2023; the ESEC/FSE selection was restricted to HCI user studies.
  2. Zenodo Repository Search:

    • Targeted searches were conducted using the keywords "Medical," "Artificial Intelligence," "Climate Change," and "Economics."
    • We filtered results to include only software repositories.
    • From the top 100 search results in each category, five experiments were randomly selected per domain.
  3. Reproducibility Tools & Related Work:

    • We incorporated experiments previously used to evaluate existing reproducibility tools.
    • This selection ensures comparability and continuity with past reproducibility studies.
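
The random selection in step 2 can be sketched in Python as shown below. This is only an illustration under stated assumptions, not the procedure the authors ran: the function name `sample_experiments`, the input mapping from keyword to an ordered list of record identifiers, and the fixed seed are all hypothetical. The record does not say how the random draw was performed or whether it was seeded.

```python
import random


def sample_experiments(results_by_domain, per_domain=5, top_n=100, seed=0):
    """Pick `per_domain` records at random from the first `top_n` results per domain.

    `results_by_domain` is a hypothetical mapping from a search keyword to the
    ordered list of record identifiers returned for that keyword.
    """
    # A dedicated, seeded generator keeps the draw repeatable (an assumption;
    # the dataset description does not mention a seed).
    rng = random.Random(seed)
    selection = {}
    for domain, results in results_by_domain.items():
        candidates = results[:top_n]              # keep only the top results
        k = min(per_domain, len(candidates))      # guard against short result lists
        selection[domain] = rng.sample(candidates, k)  # draw without replacement
    return selection
```

Calling `sample_experiments` twice with the same inputs and seed returns the same selection, which makes the sampling step itself repeatable.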

 

Files

E1-Adam.zip

Files (6.2 GB)

Checksum                              Size
md5:47dd803d85f56868e8a377a11c84ad11  1.9 GB
md5:79e4df34e1242d0ef235fbc2eca9aea2  83.1 MB
md5:e3be01b4869bcbd177a7cca153eea67b  8.1 MB
md5:faa213cc3282d7c0871cc12572a8880d  121.9 kB
md5:767fb93b16eba3ece6c87d21aede3ea5  322.4 kB
md5:d00f6b647074afaefaf54067cbbaa0a8  144.8 MB
md5:93e7e184b4caa0cae1274449634ef0ef  24.3 kB
md5:d7fec98973763497ceaf281d4432f5e9  255.6 MB
md5:61bb598bb995a3a0f39b549af93c40c9  3.1 MB
md5:c41fcc13c45a69ef60cf16dea82ba7f3  15.2 kB
md5:3457ae14e808f728c7315dc59287a10f  90.7 MB
md5:ad5d0c5ab1eb9da0383edf5965120cf2  3.2 GB
md5:ff997bbf6c17583262338c9ec31f5c4c  88.8 MB
md5:61b18b96f5889d22c99049579ed91679  58.4 MB
md5:206032e5c84361025c5f948d16fa4d93  99.9 MB
md5:9a15852fcb373381786615be6c389722  28.4 MB
md5:859210c9034cf1692362d8c951f18400  6.8 MB
md5:ab09c46699283d73145c31eab3afa683  236.1 MB
md5:a1b9aa855b202152fde0d58e16a78898  1.6 MB
md5:3fc773fcbd3144537f49a4b8fc6df7e9  2.3 MB
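
The md5 values listed above can be used to check that a downloaded archive arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and because the listing does not show which checksum belongs to which archive, the pairing of file and checksum has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte archives never need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:47dd...'."""
    # Accept the value with or without the 'md5:' prefix used in the listing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_checksum(local_path, listed_value)` returns `True` only when the local file's digest equals the listed one. MD5 is suitable here for detecting corrupted or truncated downloads, not for guarding against deliberate tampering.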

Additional details

Related works

Has part
Conference proceeding: 10.1145/3641525.3663623 (DOI)

Dates

Submitted
2025-01-30

Software

Programming language
Python, Java, JavaScript, R, C++, Jupyter Notebook, TypeScript