Geniac: Automatic Configuration GENerator and Installer for nextflow pipelines

Allain, Fabrice; Roméjon, Julien; La Rosa, Philippe; Jarlier, Frédéric; Servant, Nicolas; Hupé, Philippe

doi:10.12688/openreseurope.13861.2

Published February 21, 2022 | Version 2

Journal article, open access


1. Mines Paris Tech, Fontainebleau, F-77305, France
2. UMR144, CNRS, Paris, F-75005, France

With the advent of high-throughput biotechnological platforms and their ever-growing capacity, life science has turned into a digitized, computational and data-intensive discipline. As a consequence, running standard analyses with a bioinformatics pipeline in routine production has become a challenge: the data must be processed in real time and delivered to the end-users as fast as possible. The use of workflow management systems together with packaging systems and containerization technologies offers an opportunity to tackle this challenge. While very powerful, these tools can be used and combined in many ways, which may differ from one developer to another. Therefore, promoting homogeneous workflow implementations requires guidelines and protocols that detail how the source code of a bioinformatics pipeline should be written and organized to ensure its usability, maintainability, interoperability, sustainability, portability, reproducibility, scalability and efficiency. Capitalizing on Nextflow, Conda, Docker, Singularity and the nf-core initiative, we propose a set of best practices covering the development life cycle of a bioinformatics pipeline and its deployment for production operations. These practices target different expert communities, including i) bioinformaticians and statisticians, ii) software engineers and iii) data managers and core facility engineers. We implemented Geniac (Automatic Configuration GENerator and Installer for nextflow pipelines), a toolbox with three components: i) technical documentation, available at https://geniac.readthedocs.io, which details coding guidelines for bioinformatics pipelines written with Nextflow, ii) a command line interface with a linter to check that the code respects the guidelines, and iii) an add-on to generate configuration files, build the containers and deploy the pipeline. The Geniac toolbox aims to harmonize development practices across developers and to automate the generation of configuration files and containers by parsing the source code of the Nextflow pipeline.
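
To make the idea of generating configuration by parsing pipeline source more concrete, here is a minimal sketch in Python. It is not Geniac's implementation: the process names, labels, container paths and helper functions (`extract_labels`, `render_config`) are all hypothetical. It only illustrates the general technique of reading `label` directives from a Nextflow script and emitting matching `withLabel` selectors.

```python
import re

# Hypothetical Nextflow source; process and label names are made up for illustration.
PIPELINE_SOURCE = '''
process fastqc {
  label 'fastqc'
  script:
  """
  fastqc reads.fastq.gz
  """
}

process alignment {
  label 'bwa'
  label 'highMem'
  script:
  """
  bwa mem ref.fa reads.fastq.gz
  """
}
'''

# Assumed mapping from tool label to container image (not taken from Geniac).
CONTAINERS = {"fastqc": "containers/fastqc.sif", "bwa": "containers/bwa.sif"}

# Matches lines such as:  label 'fastqc'
LABEL_RE = re.compile(r"^\s*label\s+'([^']+)'", re.MULTILINE)


def extract_labels(source):
    """Return the distinct process labels in order of first appearance."""
    seen = []
    for name in LABEL_RE.findall(source):
        if name not in seen:
            seen.append(name)
    return seen


def render_config(labels, containers):
    """Emit a Nextflow 'process' scope with one withLabel selector per known tool."""
    lines = ["process {"]
    for name in labels:
        # Labels without a known container (e.g. resource labels) are skipped.
        if name in containers:
            lines.append(f"  withLabel: {name} {{ container = '{containers[name]}' }}")
    lines.append("}")
    return "\n".join(lines)


labels = extract_labels(PIPELINE_SOURCE)
config_text = render_config(labels, CONTAINERS)
print(config_text)
```

The regular expression is deliberately naive (single-quoted labels only, no handling of comments), which is adequate for a sketch but would likely be too fragile for real pipelines.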

Files

openreseurope-1-15693.pdf (2.4 MB), md5:4b18aca499d57eed93c5760c92814662

Additional details

Cites: 10.3389/fgene.2014.00199 (DOI); 10.1038/nbt.3820 (DOI); 10.1093/bioinformatics/btw354 (DOI); 10.1038/s41587-020-0439-x (DOI); 10.1093/gigascience/giz109 (DOI); 10.1162/dint_a_00033 (DOI); 10.1016/j.gpb.2020.01.002 (DOI); 10.12688/f1000research.15140.2 (DOI); 10.1038/s41592-018-0046-7 (DOI); 10.1371/journal.pcbi.1008622 (DOI); 10.12688/f1000research.22954.3 (DOI); 10.12688/f1000research.24714.3 (DOI); 10.1371/journal.pone.0177459 (DOI); 10.1080/21655979.2015.1050162 (DOI); 10.1093/bib/bbw020 (DOI); 10.1093/gigascience/giaa140 (DOI); 10.1007/978-1-4939-9074-0_24 (DOI); 10.1038/s10038-020-00862-1 (DOI); 10.1038/sdata.2016.18 (DOI)
Is new version of: 10.12688/openreseurope.13861.1 (DOI)


