Declarative, YAML-Based Workflows for Reproducible and Scalable Microbiome Analysis in the mia Ecosystem
Description
Background: TreeSummarizedExperiment objects are widely used in microbiome studies, and the mia ecosystem provides functions for tasks such as transformation, diversity profiling, ordination, and association testing. In practice, these analyses are often implemented as scripts or notebooks that evolve over time. As workflows are revised and extended, researchers frequently regenerate selected outputs and explore alternative choices, while intermediate results are scattered across files and folders. This can make it difficult to reproduce prior outputs, compare analysis variants, and share complete analysis settings with collaborators.
Methods: We propose a configuration-first workflow template in which users specify an analysis plan in a single YAML file using predefined step types and explicit dependencies. The YAML specification is compiled into a targets-compatible pipeline that records step parameters, manages intermediate outputs, and supports structured provenance. To reduce unnecessary recomputation, the workflow can use cached outputs for downstream analyses. We additionally explore a minimal form-based interface to assist with workflow creation without direct code editing.
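As an illustration of the configuration-first idea, the following minimal Python sketch shows how a plan with predefined step types and explicit dependencies could be validated and resolved into an execution order. It is not the authors' implementation, which compiles YAML into a targets pipeline in R. The step names, step types, parameter values, and the `needs` key are all hypothetical, and the plan is written as the dictionary a YAML parser would return so that the sketch needs only the standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical catalog of predefined step types (names are assumptions).
KNOWN_STEP_TYPES = {"transform", "alpha_diversity", "ordination", "report"}

# Hypothetical analysis plan, shaped like a parsed YAML file.
plan = {
    "steps": {
        "transform": {"type": "transform", "params": {"method": "relabundance"}, "needs": []},
        "alpha": {"type": "alpha_diversity", "params": {"index": "shannon"}, "needs": ["transform"]},
        "ordination": {"type": "ordination", "params": {"method": "PCoA"}, "needs": ["transform"]},
        "report": {"type": "report", "params": {}, "needs": ["alpha", "ordination"]},
    }
}


def validate(plan):
    """Reject unknown step types and dependencies on undefined steps."""
    steps = plan["steps"]
    for name, step in steps.items():
        if step["type"] not in KNOWN_STEP_TYPES:
            raise ValueError(f"step {name!r}: unknown type {step['type']!r}")
        for dep in step["needs"]:
            if dep not in steps:
                raise ValueError(f"step {name!r}: undefined dependency {dep!r}")


def execution_order(plan):
    """Return step names so that every step follows its dependencies."""
    validate(plan)
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(step["needs"]) for name, step in plan["steps"].items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(plan)
```

Keeping validation separate from ordering means a malformed plan fails before any step runs. A cyclic plan is rejected by `TopologicalSorter`, which raises `graphlib.CycleError`.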
Results: The template expresses common microbiome analyses as modular steps while enforcing a consistent output structure and provenance record. Because cached outputs are reused and independent steps can run in parallel, the workflow reduces redundant computation and can improve scalability and performance in high-performance computing environments. Centralizing step definitions in YAML improves traceability of analysis decisions, enables systematic comparison of analysis variants, and simplifies sharing of complete workflow specifications alongside results.
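The two efficiency properties described here, skipping steps whose inputs have not changed and running independent steps together, can be sketched conceptually in Python. This is a simplified illustration under stated assumptions and does not reproduce how targets tracks dependencies internally. The step names, parameters, the `needs` key, and the hashing scheme are hypothetical.

```python
import hashlib
import json

# Hypothetical steps, shaped like a parsed YAML plan.
steps = {
    "transform": {"type": "transform", "params": {"method": "relabundance"}, "needs": []},
    "alpha": {"type": "alpha_diversity", "params": {"index": "shannon"}, "needs": ["transform"]},
    "ordination": {"type": "ordination", "params": {"method": "PCoA"}, "needs": ["transform"]},
    "report": {"type": "report", "params": {}, "needs": ["alpha", "ordination"]},
}


def step_key(name, steps, memo=None):
    """Hash a step's type, parameters, and the keys of its upstream steps."""
    memo = {} if memo is None else memo
    if name not in memo:
        step = steps[name]
        payload = {
            "type": step["type"],
            "params": step["params"],
            "upstream": {dep: step_key(dep, steps, memo) for dep in step["needs"]},
        }
        encoded = json.dumps(payload, sort_keys=True).encode()
        memo[name] = hashlib.sha256(encoded).hexdigest()
    return memo[name]


def outdated(steps, cache):
    """Return steps whose stored key no longer matches the current plan."""
    return {name for name in steps if cache.get(name) != step_key(name, steps)}


def parallel_batches(steps):
    """Group steps into batches; steps within a batch do not depend on each other."""
    done, batches = set(), []
    while len(done) < len(steps):
        ready = sorted(
            name for name, step in steps.items()
            if name not in done and set(step["needs"]) <= done
        )
        if not ready:
            raise ValueError("dependency cycle detected")
        batches.append(ready)
        done.update(ready)
    return batches


# Record keys after a full run, then change one parameter.
cache = {name: step_key(name, steps) for name in steps}
steps["alpha"]["params"]["index"] = "simpson"
stale = outdated(steps, cache)
batches = parallel_batches(steps)
```

Because each key includes the keys of its upstream steps, changing the parameter of `alpha` invalidates `alpha` and `report` but leaves `transform` and `ordination` reusable from cache.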
Conclusion: A YAML-based workflow description that integrates mia functions with targets can improve consistency, traceability, and computational efficiency in routine microbiome analyses. Future work will expand the step catalog, strengthen configuration validation, and further develop interactive workflow authoring and shared asset management.
Files
| Name | Size | Checksum |
|---|---|---|
| 4th_June_Mongad.pdf | 2.1 MB | md5:3c285c8a531fe2b37a089312b7e7642f |
Additional details
Dates
- Submitted: 2025-06-12