Published August 4, 2025 | Version v1
Conference paper Open

MaRDI's Zenodo Community for Graphical Modeling and Causal Inference

  • 1. Technical University of Munich

Contributors

  • 1. Nationale Forschungsdateninfrastruktur (NFDI) e.V.
  • 2. University of Amsterdam

Description

The Graphical Modeling and Causal Inference (GMCI) community, hosted on Zenodo [1][A], supports researchers developing statistical methodology via a curated collection of datasets and notebooks. Adopting the FAIR principles [2], it seeks to host and moderate topical contributions of datasets, metadata, data analyses, and methodological implementations in an enriched repository akin to related efforts such as CauseMe [3], OpenML [4], and PhysioNet [5]. As a Zenodo community, GMCI provides a free long-term storage solution with unique digital object identifiers (DOIs) that help build a stable network of digital research data. Following an open data access strategy with transparently communicated design decisions, the submissions are findable and readily reusable. The community was initiated as part of MaRDI, the Mathematical Research Data Initiative, a consortium of the National Research Data Infrastructure (NFDI), and its entries are embedded into the online MaRDI knowledge graph [B], which has over 5 million nodes covering publications, software, models, authors, and further information.
The GMCI community primarily addresses statisticians working on methods for structure learning and causal effect estimation in the context of probabilistic and causal graphical models [C], [6]–[8]. These problems are notable because the quality of estimates cannot be validated using standard tabular datasets alone, as the methods target underlying stochastic dependence structures or unobserved interventional regimes. Hence, empirical comparisons of estimation methods rely on enriched datasets, which must include some information on a ground truth. Drawing on a well-curated initial collection of enriched datasets, the GMCI community is designed to establish best-practice standards, including for further moderated submissions from the broader academic community.
Contributions to the GMCI community's Zenodo repository [A] can currently take two forms: datasets (or dataset collections) and software.
The complete submission procedure is detailed online [D]. Any submission requires literature references to the origin of the datasets or methodologies and valid licensing information. Community moderators [E] process each submission, create the necessary embedding links, request revisions where needed, and assist with questions. As of April 2025, there were 14 curated datasets online with more than 2,000 recorded downloads.
Researchers in various fields face common challenges when modeling data and making causal inferences, as they must apply the available algorithms correctly and draw appropriate conclusions from the obtained results. To connect developed methodologies and raw datasets, we are currently curating educational material as references for data analysis workflows. Understanding workflows and arguments will increase the quality of statistical analyses by both statisticians and non-statisticians and will enable researchers to share their insights with the community. Software contributions that demonstrate and effectively communicate relevant research findings are connected to the community through Zenodo's git release integration. As these contributions are currently presented only in a folder structure, we are working on an online book to present the statistical notebooks jointly.
In summary, the GMCI community offers a stable, structured platform that accumulates enriched datasets and methodological implementations. Contributions gain visibility within a relevant audience, including non-statistical researchers, while also meeting citation requirements through assigned DOIs. By curating high-quality submissions and integrating them into a broader research network, we support transparent, reusable, and collaborative data sharing.
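The description notes that structure learning methods can only be compared empirically when a dataset carries some ground-truth information. As a minimal illustration of that point, the sketch below scores an estimated directed graph against a known true graph using the structural Hamming distance. It assumes the ground truth is available as a binary adjacency matrix; the function name and the toy graphs are hypothetical and are not taken from the GMCI community's datasets or tooling.

```python
import numpy as np


def structural_hamming_distance(true_adj, est_adj):
    """Count node pairs whose edge status differs between two directed graphs.

    Entry [i, j] == 1 means an edge i -> j. A missing edge, an extra edge,
    and a reversed edge each contribute 1 to the distance.
    """
    true_adj = np.asarray(true_adj, dtype=bool)
    est_adj = np.asarray(est_adj, dtype=bool)
    if true_adj.ndim != 2 or true_adj.shape[0] != true_adj.shape[1]:
        raise ValueError("adjacency matrices must be square")
    if true_adj.shape != est_adj.shape:
        raise ValueError("adjacency matrices must have the same shape")

    # Entry-wise disagreement between the two graphs.
    diff = true_adj != est_adj
    # A reversed edge produces two mismatched entries ([i, j] and [j, i]);
    # symmetrizing and reading only the upper triangle counts each pair once.
    mismatched_pairs = diff | diff.T
    return int(np.triu(mismatched_pairs, k=1).sum())


# Hypothetical ground truth: 0 -> 1 -> 2.
true_graph = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
# Hypothetical estimate: 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra).
estimated_graph = [
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 0],
]

shd = structural_hamming_distance(true_graph, estimated_graph)
print(shd)  # 2: one reversed edge plus one extra edge
```

Without the true graph, no such score can be computed from the tabular data alone, which is why enriched datasets are needed for method comparisons.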

Files

CoRDI_2025_paper_260.pdf
