Published June 19, 2026 | Version v1

JupyDo: an automated server manager and JupyterHub infrastructure for reproducible bioinformatics

Description

Motivation
Reproducibility in bioinformatics is often compromised by conflicts between software installations, incompatible dependencies, and the lack of standardized computational environments [1].
Methods
JupyDo addresses this challenge by providing an accessible, multi-user infrastructure based on JupyterHub [2], with a guided, fully automated installation process that simplifies deployment. Researchers can work in independent computational spaces without complex manual configuration. A core feature of JupyDo is its flexibility: users can start from any custom Docker [3] image. When a custom image is selected, JupyDo automatically builds and adapts it to run within the JupyterLab environment. It does so by deploying a separate, isolated Python installation dedicated exclusively to running the Jupyter infrastructure, so the original Python environment and dependencies of the base image remain untouched. During this automated setup, JupyDo also scans the image for existing Python or R environments (e.g., Conda) and registers them as ready-to-use kernels in JupyterLab. For broad hardware compatibility, JupyDo supports two architectures, with deployments on both x86-64 (AMD64) and ARM nodes. JupyDo also integrates a dedicated service for creating and sharing genomic indices among all users, which prevents data duplication and reduces storage and compute usage.
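The environment scan described above can be illustrated with a minimal sketch. This is not JupyDo's actual implementation: the function names (find_conda_envs, kernel_spec_for, register_kernels) and the directory layout are hypothetical, and the sketch assumes each discovered environment is a Conda-style Python environment (a conda-meta directory next to bin/python) that already has ipykernel installed. Only the kernel.json fields follow the standard Jupyter kernelspec format.

```python
import json
from pathlib import Path


def find_conda_envs(root: Path) -> list[Path]:
    """Return directories under root that look like Conda environments.

    Assumption: an environment is any directory that contains both a
    conda-meta directory and a bin/python interpreter.
    """
    envs = []
    for meta in sorted(root.rglob("conda-meta")):
        env = meta.parent
        if meta.is_dir() and (env / "bin" / "python").exists():
            envs.append(env)
    return envs


def kernel_spec_for(env: Path) -> dict:
    """Build a Jupyter kernelspec that launches the environment's own Python."""
    return {
        "argv": [
            str(env / "bin" / "python"),
            "-m",
            "ipykernel_launcher",
            "-f",
            "{connection_file}",  # placeholder filled in by Jupyter at launch
        ],
        "display_name": f"Python ({env.name})",
        "language": "python",
    }


def register_kernels(root: Path, kernels_dir: Path) -> list[str]:
    """Write one kernel.json per discovered environment; return kernel names.

    Environments that share a directory name would overwrite each other here;
    a real tool would need a collision-free naming scheme.
    """
    names = []
    for env in find_conda_envs(root):
        spec_dir = kernels_dir / env.name
        spec_dir.mkdir(parents=True, exist_ok=True)
        spec = kernel_spec_for(env)
        (spec_dir / "kernel.json").write_text(json.dumps(spec, indent=2))
        names.append(env.name)
    return names
```

Calling register_kernels(Path("/opt"), Path("/usr/local/share/jupyter/kernels")) would, under these assumptions, expose each environment found under /opt as a separate kernel while the Jupyter server itself keeps running from its own Python installation. Both paths are illustrative only.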
Results
JupyDo allows customized environments to be safely preserved, reviewed, and shared. To support dynamic workflows, we implemented a supervised docker commit feature: users can install new tools within their container and submit a commit request, which an administrator reviews and approves before the image is saved, balancing flexibility with security. Additionally, an integrated "Export Environment" tool streamlines the publication process. With a single action, researchers can export their entire workspace as a tar.gz archive, retrieve the exact Dockerfile, and automatically generate a preliminary draft of the "Materials and Methods" section detailing the software environment. With these adaptation and export mechanisms, JupyDo offers a scalable, platform-agnostic way to keep computational results verifiable, transparent, and reusable in modern life science research.
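As a rough illustration of the export step, the sketch below packs a workspace into a tar.gz archive and drafts a one-paragraph software description. It is not the "Export Environment" tool itself: export_workspace and draft_methods are hypothetical names, the wording of the draft is invented for the example, and the package versions are assumed to be supplied by the caller rather than detected automatically.

```python
import tarfile
from pathlib import Path


def export_workspace(workspace: Path, archive: Path) -> Path:
    """Pack the workspace directory into a gzip-compressed tar archive.

    The archive should live outside the workspace so that it does not
    end up containing a partial copy of itself.
    """
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the workspace folder
        tar.add(workspace, arcname=workspace.name)
    return archive


def draft_methods(base_image: str, packages: dict[str, str]) -> str:
    """Draft a short software-environment paragraph from name/version pairs."""
    tools = ", ".join(
        f"{name} (v{version})" for name, version in sorted(packages.items())
    )
    return (
        f"All analyses were run in a Docker container built from {base_image}. "
        f"The environment included the following software: {tools}."
    )
```

For example, draft_methods("ubuntu:22.04", {"samtools": "1.19", "bwa": "0.7.17"}) lists bwa before samtools because the packages are sorted by name; the image tag and versions here are placeholders, not values taken from JupyDo.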

Notes

Financial support for event participation was provided by the Open Bioinformatics Foundation (OBF) under the OBF Event Fellowships program (Round 1 2026). Official program details: https://www.open-bio.org/event-awards/

Files

Poster_BITS.pdf (660.9 kB)
md5:abb5735b8ed74666830053f44b1af31e

Additional details

Software

Repository URL
https://github.com/Vehx35/JupyDo
Programming language
Python, Dockerfile, JavaScript
Development Status
Active

References

  • Errington TM, Denis A, Perfito N, Iorns E, Nosek BA. Challenges for assessing replicability in preclinical cancer biology. eLife. 2021;10.
  • D'Onofrio A, et al. FairFlow: A Transparency-First Framework for Verifiable and Reproducible Bioinformatics. SSRN Electronic Journal. 2026. DOI: 10.2139/ssrn.6339959
  • Merkel D. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal. 2014;2014(239):2.
  • Di Tommaso P, et al. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35:316–319.
  • Mölder F, et al. Sustainable data analysis with Snakemake. F1000Res. 2021;10:33.
  • Alessandrì L, et al. rCASC: reproducible classification analysis of single-cell sequencing data. GigaScience. 2019;8.
  • Beccuti M, et al. SeqBox: RNAseq/ChIPseq reproducible analysis on a consumer game computer. Bioinformatics. 2018;34:871–872.
  • Jupyter Development Team. JupyterHub: A multi-user server for Jupyter notebooks. https://jupyterhub.readthedocs.io.
  • Kulkarni N, et al. Reproducible bioinformatics project: a community for reproducible bioinformatics analysis pipelines. BMC Bioinformatics. 2018;19:349.