Multi-Domain Experiment Dataset for Evaluating Reproducibility Tools (E1-E20)
Description
Overview
This repository provides a curated dataset designed to address the critical need for standardized benchmarks in evaluating reproducibility tools. By encompassing a diverse collection of computational experiments across multiple scientific disciplines, this dataset facilitates the assessment of existing reproducibility tools while identifying areas for improvement.
Dataset Composition
The dataset comprises 20 curated experiments (E1–E20), each of which was successfully reproduced using at least one reproducibility tool. These experiments span various research domains, reflecting the complexity and diversity of modern computational research:
- Computer Science: Experiments sourced from IEEE/ACM ICSE 2022 and VLDB 2021, covering topics such as software engineering, database management, and artificial intelligence.
- Life Sciences: Climate change studies.
- Health Sciences: Medical research.
- Social Sciences & Humanities: Economics and interdisciplinary computational research.
Each experiment is carefully documented, providing:
- The original source code and data (where available).
- Metadata including programming languages, project size, and dependencies.
- Execution instructions to facilitate reproducibility.
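To illustrate how such per-experiment documentation might be consumed programmatically, the sketch below models one experiment record as a Python dictionary. The field names (`id`, `domain`, `languages`, and so on) are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical metadata record for one experiment (E1).
# Field names are illustrative assumptions, not the dataset's real schema.
experiment = {
    "id": "E1",
    "domain": "Computer Science",
    "source_venue": "ICSE 2022",
    "languages": ["Python", "C++"],
    "dependencies": ["numpy", "pandas"],
    "execution": "see bundled instructions",
}

def summarize(exp):
    """Return a one-line summary of an experiment record."""
    return f"{exp['id']} ({exp['domain']}): {', '.join(exp['languages'])}"

print(summarize(experiment))
```

A record in this shape makes it straightforward to filter experiments by domain or language when selecting benchmarks for a particular tool evaluation.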
Purpose & Applications
This dataset serves as a benchmarking resource for researchers, developers, and tool creators working on reproducibility in computational science. It enables:
- Comparative evaluation of reproducibility tools by testing real-world computational experiments.
- Assessment of limitations in current platforms regarding dependency management, execution consistency, and software environment variability.
- Advancement of reproducibility standards by promoting transparency, reliability, and cross-disciplinary collaboration.
Reproducibility Tools Evaluated
The dataset has been used to assess the capabilities of eight major reproducibility tools, including:
- Whole Tale
- Code Ocean
- RenkuLab
- ReproZip
- Binder (no package available)
- Sciunit
- FLINC
The results highlight key challenges in modern computational reproducibility, including insufficient documentation, evolving software dependencies, and the need for more adaptable solutions.
Files (22.5 GB)

Includes Code Ocean.zip.

MD5 checksum | Size
---|---
md5:4f6cf35a5bfb682dc9071f8717fb6a79 | 1.5 GB
md5:443d9e24e23547d5b9338edd2097d018 | 42.4 MB
md5:951dde3cf95d439c3a07d64c50b0203f | 9.0 GB
md5:e0e72e54f7c9d4e1bdc8deffeff065bc | 5.3 kB
md5:8a3ec66f3a659de04d01d0f4186d81c2 | 6.9 GB
md5:342ee9d0491c7e90e9de372b0dd31a72 | 5.1 GB
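After downloading, file integrity can be checked against the MD5 checksums listed above. The sketch below shows one way to do this with Python's standard library; the pairing of a file name to a checksum in `expected` is an illustrative assumption (the listing does not preserve name-to-checksum pairings).

```python
import hashlib
import os

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading in 1 MB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Name-to-checksum pairing below is a placeholder assumption.
expected = {
    "Code Ocean.zip": "4f6cf35a5bfb682dc9071f8717fb6a79",
}

for name, digest in expected.items():
    if not os.path.exists(name):
        print(f"{name}: not downloaded")
        continue
    status = "OK" if md5_of(name) == digest else "MISMATCH"
    print(f"{name}: {status}")
```

Chunked reading keeps memory use constant, which matters for the multi-gigabyte archives in this dataset.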
Additional details

Dates
- Submitted: 2024-11-13