Published June 19, 2026 | Version v1
Poster
Open
JupyDo: an automated server manager and JupyterHub infrastructure for reproducible bioinformatics
Authors/Creators
Description
Motivation
Reproducibility in bioinformatics is often compromised by conflicting software installations, incompatible dependencies, and the lack of standardized computational environments [1].
Methods
JupyDo addresses this challenge with an accessible, multi-user infrastructure based on JupyterHub [2] and a guided, fully automated installation process that simplifies deployment. Researchers work in independent computational spaces without complex manual configuration. A core feature of JupyDo is its flexibility: users can start from any custom Docker [3] image. When a custom image is selected, JupyDo automatically builds and adapts it to run within the JupyterLab environment. It does so by deploying a separate, isolated Python installation dedicated exclusively to running the Jupyter infrastructure, so the original Python environment and dependencies of the base image remain untouched. During this automated setup, JupyDo also scans the image for existing Python or R environments (e.g., Conda) and registers them as ready-to-use kernels in JupyterLab. For broad hardware compatibility, JupyDo supports deployment on both x86-64 (AMD64) and ARM nodes. JupyDo also integrates a dedicated service for creating and sharing genomic indices among all users, which avoids data duplication and saves both storage and computation.
Results
JupyDo allows customized environments to be preserved, reviewed, and shared safely. To support dynamic workflows, we implemented a supervised docker commit feature: users can install new tools within their container and submit a commit request, which an administrator can review and approve, balancing flexibility and security. In addition, an integrated "Export Environment" tool streamlines publication. With a single action, researchers can export their entire workspace as a tar.gz archive, retrieve the exact Dockerfile, and automatically generate a preliminary draft of the "Materials and Methods" section detailing the software environment. With these adaptation and export mechanisms, JupyDo offers a scalable, platform-agnostic way to keep computational results verifiable, transparent, and reusable in life science research.
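To make the "Materials and Methods" draft more concrete, here is a minimal sketch of how such a paragraph could be assembled from an environment inventory. It is an illustration only: the `Package` class, the `draft_methods` function, and the sentence template are hypothetical and are not taken from JupyDo, whose export format is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """One installed tool and its version (hypothetical record type)."""
    name: str
    version: str


def draft_methods(base_image: str, packages: list[Package], archive_name: str) -> str:
    """Return a preliminary methods paragraph describing the software environment."""
    # Sort case-insensitively so the listing is stable across exports.
    ordered = sorted(packages, key=lambda p: p.name.lower())
    if ordered:
        listing = ", ".join(f"{p.name} (v{p.version})" for p in ordered)
        software = f"The environment included the following software: {listing}."
    else:
        software = "No additional software was recorded for this environment."
    return " ".join(
        [
            f"All analyses were run in a Docker container built from the image {base_image}.",
            software,
            f"The complete workspace is archived in {archive_name}.",
        ]
    )


if __name__ == "__main__":
    example = [Package("samtools", "1.19"), Package("bwa", "0.7.17")]
    print(draft_methods("ubuntu:22.04", example, "workspace.tar.gz"))
```

The package list in the usage example is invented. A real export would presumably read it from the container itself, for instance from the package manager or the Conda environment metadata.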
Files
| Name | Size |
|---|---|
| Poster_BITS.pdf (md5:abb5735b8ed74666830053f44b1af31e) | 660.9 kB |
Additional details
Software
- Repository URL
- https://github.com/Vehx35/JupyDo
- Programming language
- Python, Dockerfile, JavaScript
- Development Status
- Active
References
- Errington TM, Denis A, Perfito N, Iorns E, Nosek BA. Challenges for assessing replicability in preclinical cancer biology. eLife. 2021;10.
- D'Onofrio A, et al. FairFlow: A Transparency-First Framework for Verifiable and Reproducible Bioinformatics. SSRN Electronic Journal. 2026. doi:10.2139/ssrn.6339959
- Merkel D. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal. 2014;2014(239):2.
- Di Tommaso P, et al. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35:316–319.
- Mölder F, et al. Sustainable data analysis with Snakemake. F1000Res. 2021;10:33.
- Alessandrì L, et al. rCASC: reproducible classification analysis of single-cell sequencing data. GigaScience. 2019;8.
- Beccuti M, et al. SeqBox: RNAseq/ChIPseq reproducible analysis on a consumer game computer. Bioinformatics. 2018;34:871–872.
- Jupyter Development Team. JupyterHub: A multi-user server for Jupyter notebooks. https://jupyterhub.readthedocs.io.
- Kulkarni N, et al. Reproducible bioinformatics project: a community for reproducible bioinformatics analysis pipelines. BMC Bioinformatics. 2018;19:349.