Comprehensive Multi-Domain Experiment Reproducibility Dataset (E1-E20)
Description
Overview
This dataset, "Multi-Domain Experiment Dataset for Evaluating Reproducibility Tools (E1–E20)," is curated to facilitate the evaluation and benchmarking of reproducibility frameworks. It provides a structured and diverse collection of scientific experiments, enabling researchers and developers to test and compare different tools designed for computational reproducibility.
Dataset Composition
The dataset consists of 20 experiments (E1–E20) covering multiple scientific domains, including computer science, human-computer interaction (HCI), medicine, artificial intelligence, climate change, and economics. These experiments range from simple computational scripts to complex setups requiring integrated databases, multiple programming languages, and domain-specific computational environments.
Experiment Sources
To ensure a well-balanced dataset, experiments were sourced from peer-reviewed scientific conferences and open-access repositories:
- Computer Science:
  - Software Engineering: Experiments from the IEEE/ACM International Conference on Software Engineering (ICSE 2022), a premier venue for software engineering research.
  - Databases: Selected experiments from the International Conference on Very Large Databases (VLDB 2021), a leading database conference.
  - Human-Computer Interaction (HCI) and User Studies:
    - Experiments sourced from the European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023).
    - These experiments focus on user studies related to software engineering and usability research.
- Interdisciplinary Fields (Collected from Zenodo):
  - Artificial Intelligence (AI): Experiments covering machine learning and AI-driven methodologies.
  - Climate Change: Computational experiments and simulations addressing environmental research.
  - Medicine: Experiments focusing on medical and healthcare applications.
  - Economics: Computational economic models and data analysis experiments.
- Related Work on Reproducibility Tools:
  - We included experiments previously used to evaluate reproducibility tools and methodologies.
  - This ensures alignment with prior research and enhances comparability across tools.
  - Among these, two key studies, SciInc and SciUnit, provided three fully documented experiments:
    - Chicago Food Inspections Evaluation;
    - Variable Infiltration Capacity;
    - Incremental Query Execution.
Methodology for Dataset Curation
To construct this dataset, we employed a structured selection process:
- Scientific Conference Selection:
  - We identified key research areas within computer science and selected experiments from ICSE 2022, VLDB 2021, and ESEC/FSE 2023 (focusing on user studies in HCI).
- Zenodo Repository Search:
  - Targeted searches were conducted using the keywords "Medical," "Artificial Intelligence," "Climate Change," and "Economics."
  - We filtered results to include only software repositories.
  - From the top 100 search results in each category, five experiments were randomly selected per domain.
- Reproducibility Tools & Related Work:
  - We incorporated experiments previously used to evaluate existing reproducibility tools.
  - This selection ensures comparability and continuity with past reproducibility studies.
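The Zenodo selection step described above (five experiments drawn at random from the top 100 results per keyword) can be sketched as follows. This is a minimal illustration only: the description does not say how the random draw was implemented, so the function name `sample_experiments`, the seed, and the placeholder result identifiers are all assumptions, and the ranked result lists would in practice come from the actual Zenodo searches.

```python
import random

# Search keywords taken from the curation description.
KEYWORDS = ["Medical", "Artificial Intelligence", "Climate Change", "Economics"]


def sample_experiments(ranked_results, k=5, top_n=100, seed=None):
    """Pick k items uniformly at random from the first top_n ranked results.

    Hypothetical helper: the dataset description does not specify the
    sampling procedure beyond "randomly selected".
    """
    rng = random.Random(seed)          # a fixed seed makes the draw repeatable
    pool = list(ranked_results[:top_n])
    return rng.sample(pool, min(k, len(pool)))


# Placeholder identifiers standing in for real, ranked search results.
ranked = {kw: [f"{kw}-result-{i}" for i in range(1, 151)] for kw in KEYWORDS}

# One draw of five candidates per keyword (the seed value is arbitrary).
selection = {kw: sample_experiments(results, seed=0) for kw, results in ranked.items()}

for keyword, chosen in selection.items():
    print(keyword, chosen)
```

Recording the seed alongside the ranked result list is one way to let others repeat the same draw, which fits the dataset's reproducibility focus.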
Files
E1-Adam.zip
Files (6.2 GB)
Checksum | Size |
---|---|
md5:47dd803d85f56868e8a377a11c84ad11 | 1.9 GB |
md5:79e4df34e1242d0ef235fbc2eca9aea2 | 83.1 MB |
md5:e3be01b4869bcbd177a7cca153eea67b | 8.1 MB |
md5:faa213cc3282d7c0871cc12572a8880d | 121.9 kB |
md5:767fb93b16eba3ece6c87d21aede3ea5 | 322.4 kB |
md5:d00f6b647074afaefaf54067cbbaa0a8 | 144.8 MB |
md5:93e7e184b4caa0cae1274449634ef0ef | 24.3 kB |
md5:d7fec98973763497ceaf281d4432f5e9 | 255.6 MB |
md5:61bb598bb995a3a0f39b549af93c40c9 | 3.1 MB |
md5:c41fcc13c45a69ef60cf16dea82ba7f3 | 15.2 kB |
md5:3457ae14e808f728c7315dc59287a10f | 90.7 MB |
md5:ad5d0c5ab1eb9da0383edf5965120cf2 | 3.2 GB |
md5:ff997bbf6c17583262338c9ec31f5c4c | 88.8 MB |
md5:61b18b96f5889d22c99049579ed91679 | 58.4 MB |
md5:206032e5c84361025c5f948d16fa4d93 | 99.9 MB |
md5:9a15852fcb373381786615be6c389722 | 28.4 MB |
md5:859210c9034cf1692362d8c951f18400 | 6.8 MB |
md5:ab09c46699283d73145c31eab3afa683 | 236.1 MB |
md5:a1b9aa855b202152fde0d58e16a78898 | 1.6 MB |
md5:3fc773fcbd3144537f49a4b8fc6df7e9 | 2.3 MB |
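The MD5 checksums listed above can be used to confirm that a downloaded archive arrived intact. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_of_file` and `verify_download` and the local path are hypothetical, and because the listing does not show which checksum belongs to which archive, the pairing in the usage section is a placeholder to be replaced with the values from the record page.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte archives.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected):
    """Compare a file's MD5 digest with a listed value such as 'md5:47dd...'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


if __name__ == "__main__":
    # Placeholder path and pairing: substitute the archive you downloaded
    # and the checksum shown next to it on the record page.
    archive = Path("downloaded_archive.zip")
    listed = "md5:47dd803d85f56868e8a377a11c84ad11"
    if archive.exists():
        print("checksum OK" if verify_download(archive, listed) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

MD5 is suitable here as an integrity check against transfer corruption; it is not a safeguard against deliberate tampering.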
Additional details
Related works
- Has part: Conference proceeding, DOI 10.1145/3641525.3663623
Dates
- Submitted: 2025-01-30