Published May 22, 2025 | Version v4
Dataset Open

Curated dataset of 18 computational experiments (E1–E18)

  • 1. Universidade do Porto Faculdade de Engenharia
  • 2. INESC TEC

Description

Overview

This dataset, "Multi-Domain Experiment Dataset for Evaluating Reproducibility Tools (E1–E18)," is curated to facilitate the evaluation and benchmarking of reproducibility frameworks. It provides a structured and diverse collection of scientific experiments, enabling researchers and developers to test and compare different tools designed for computational reproducibility.

Dataset Composition

The dataset consists of 18 experiments (E1–E18) covering multiple scientific domains, including computer science, human-computer interaction (HCI), medicine, artificial intelligence, climate change, and economics. These experiments range from simple computational scripts to complex setups requiring integrated databases, multiple programming languages, and domain-specific computational environments.

Experiment Sources

To ensure a well-balanced dataset, experiments were sourced from peer-reviewed scientific conferences and open-access repositories:

  • Computer Science:

    • Software Engineering: Experiments from the IEEE/ACM International Conference on Software Engineering (ICSE 2022), a premier venue for software engineering research.
    • Databases: Selected experiments from the International Conference on Very Large Databases (VLDB 2021), a leading database conference.
    • Human-Computer Interaction (HCI) and User Studies:
      • Experiments sourced from the European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023).
      • These experiments focus on user studies related to software engineering and usability research.
  • Interdisciplinary Fields (Collected from Zenodo):

    • Artificial Intelligence (AI): Experiments covering machine learning and AI-driven methodologies.
    • Climate Change: Computational experiments and simulations addressing environmental research.
    • Medicine: Experiments focusing on medical and healthcare applications.
    • Economics: Computational economic models and data analysis experiments.
  • Related Work on Reproducibility Tools:

    • We included experiments previously used to evaluate reproducibility tools and methodologies.
    • This ensures alignment with prior research and enhances comparability across tools.
    • Among these, two key studies, SciInc and SciUnit, provided three fully documented experiments:
      • Chicago Food Inspections Evaluation;
      • Variable Infiltration Capacity;
      • Incremental Query Execution.

Methodology for Dataset Curation

To construct this dataset, we employed a structured selection process:

  1. Scientific Conference Selection:

    • We identified key research areas within computer science and selected experiments from ICSE 2022, VLDB 2021, and ESEC/FSE 2023 (focusing on user studies in HCI).
  2. Zenodo Repository Search:

    • Targeted searches were conducted using the keywords "Medical," "Artificial Intelligence," "Climate Change," and "Economics."
    • We filtered results to include only software repositories.
    • From the top 100 search results for each keyword, five experiments were selected at random.
  3. Reproducibility Tools & Related Work:

    • We incorporated experiments previously used to evaluate existing reproducibility tools.
    • This selection ensures comparability and continuity with past reproducibility studies.
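The random selection in step 2 can be sketched as follows. This is a minimal illustration, not the procedure actually used for the dataset: the description does not say how the draw was performed, so the fixed seed, the function name `sample_experiments`, and the placeholder result identifiers are all assumptions.

```python
import random


def sample_experiments(results_by_domain, k=5, top_n=100, seed=0):
    """Pick k records at random from the first top_n results of each domain.

    The seed is an assumption added here so the draw can be repeated;
    the dataset description does not mention one.
    """
    rng = random.Random(seed)
    selection = {}
    for domain, results in results_by_domain.items():
        candidates = results[:top_n]            # keep only the top-ranked results
        k_eff = min(k, len(candidates))         # guard against short result lists
        selection[domain] = rng.sample(candidates, k_eff)
    return selection


# Hypothetical placeholder identifiers standing in for real search results.
keywords = ["Medical", "Artificial Intelligence", "Climate Change", "Economics"]
results_by_domain = {
    kw: [f"{kw}-result-{i}" for i in range(1, 151)] for kw in keywords
}

selection = sample_experiments(results_by_domain)
for domain, picked in selection.items():
    print(domain, picked)
```

Recording the seed alongside the ranked result list is one way to make such a selection repeatable later.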

 

Files

E1.zip

Files (3.0 GB)

Checksum (MD5)                        Size
md5:70e58ae25b7d7ec24e843cc81f63c1ff  1.9 GB
md5:ab8acd36bec3d1bf9bbb0889846abbe0  82.7 MB
md5:636f7396f875c8a68a801ab9f93234e0  2.3 kB
md5:47622c196026190b50e683b5663266cd  144.8 MB
md5:2be0e782fb98c66ff44fcd6db2444744  24.5 kB
md5:90f83f2e3226d48b4c930ea1118f3f85  250.5 MB
md5:155b902df19d1664fa7e7cbc748cd897  5.5 MB
md5:29371b86761ee9e24123c745ab4c867c  2.4 MB
md5:d83d6b1a91d54cee6dddf3f4ab3f3e2e  24.3 MB
md5:c0f4a5c4c7e3a84a64d757b60761f087  88.8 MB
md5:038bc16a2de2e37291b24bc61adf74a0  37.1 MB
md5:785c5968d79d39faff3f28615c121848  58.4 MB
md5:90de76534cf2cf8f9047b37b63048484  105.8 MB
md5:6b1f46120a7bc0dd2e68eb4dc91ede28  28.4 MB
md5:fb57c17ab7f6bd8731974db8307b0f9a  6.8 MB
md5:56c59118fa12f5b5a5ecf9b90526fbe1  244.2 MB
md5:1d412c8341a7965372ae83a5e1f2afb9  1.7 MB
md5:d5468781a6c01dfde363623b1f43c63a  2.3 MB
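
The MD5 checksums listed above can be used to check that a downloaded archive is intact. The sketch below shows one way to do this with Python's standard library. The helper names `md5_of_file` and `matches_checksum` are illustrative, and because the listing does not show which checksum belongs to which file, that pairing has to be taken from the record page itself.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # Read in 1 MiB chunks so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("some_download.zip", "md5:<hex from the listing>")` returns `True` only when the file's digest equals the listed value; the file name here is a placeholder.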

Additional details

Related works

Is continued by
Conference proceeding: 10.1145/3641525.3663623 (DOI)

Dates

Submitted
2025-04-07

Software

Programming language
Python, Java, JavaScript, R, C++, Jupyter Notebook, TypeScript