An Open Software Development-based Ecosystem of R Packages for Metabolomics Data Analysis
Creators
- Rainer, Johannes (1)
- Louail, Philippine (1)
- Vicini, Andrea (2)
- Gine, Roger (3)
- Badia, Josep M (3, 4)
- Stravs, Michele (5, 6)
- Garcia-Aloy, Mar (7)
- Huber, Carolin (8)
- Salzer, Liesa (9)
- Stanstrup, Jan (10)
- Shahaf, Nir (11)
- Panse, Christian (12, 13)
- Naake, Thomas (14)
- Kumler, William (15)
- Vangeenderhuysen, Pablo (16)
- Brunius, Carl (17)
- Hecht, Helge (18)
- Neumann, Steffen (19)
- Witting, Michael (20)
- Gibb, Sebastian (21)
- Gatto, Laurent (2)
- 1. Institute for Biomedicine, Eurac Research, Italy
- 2. Computational Biology and Bioinformatics, de Duve Institute, UCLouvain, Belgium
- 3. Department of Electronic Engineering & IISPV, Universitat Rovira i Virgili, Spain
- 4. CIBER de Diabetes y Enfermedades Metabólicas Asociadas (CIBERDEM), Instituto de Salud Carlos III, Spain
- 5. Department of Environmental Chemistry, Switzerland
- 6. Institute of Molecular Systems Biology, ETH Zurich, Switzerland
- 7. Metabolomics Unit, Research and Innovation Centre, Fondazione Edmund Mach, Italy
- 8. Department of Effect Directed Analysis, Helmholtz Center for Environmental Research, Germany
- 9. Research Unit Analytical BioGeoChemistry, Helmholtz Munich, Germany
- 10. Department of Nutrition, Exercise and Sports, University of Copenhagen, Denmark
- 11. Department of Plant and Environmental Sciences, Weizmann Institute of Science, Israel
- 12. Functional Genomics Center Zurich (FGCZ)-University of Zurich/ETH Zurich, Switzerland
- 13. Swiss Institute of Bioinformatics (SIB), Switzerland
- 14. Genome Biology Unit, EMBL, Germany
- 15. Ingalls Lab at the School of Oceanography, University of Washington, USA
- 16. Laboratory of Integrative Metabolomics (LIMET), Department of Translational Physiology, Infectiology and Public Health (DI04), Faculty of Veterinary Medicine, Ghent University
- 17. Department of Life Sciences, Chalmers University of Technology, Sweden
- 18. RECETOX, Faculty of Science, Masaryk University, Czech Republic
- 19. Computational Plant Biochemistry, MetaCom, Leibniz Institute of Plant Biochemistry, Germany
- 20. Metabolomics and Proteomics Core, Helmholtz Munich, Germany
- 21. Anesthesiology and Intensive Care Medicine, University Hospital Greifswald, Germany
Description
A frequent problem with scientific research software is the lack of support, maintenance and further development. In particular, development by a single researcher can easily result in orphaned software packages, especially if combined with poor documentation or lack of adherence to open software development standards.
The RforMassSpectrometry initiative aims to develop an efficient and stable infrastructure for mass spectrometry (MS) data analysis. As part of this initiative, a growing ecosystem of R software packages is being developed that covers different aspects of metabolomics and proteomics data analysis. To avoid the problems described above, the initiative fosters community contributions and emphasizes open development, documentation and long-term support.
At the heart of the package ecosystem is the Spectra package, which provides the core infrastructure to handle and analyze MS data. Its design allows easy extension to additional file or data formats, including data representations with a minimal memory footprint or with remote data access. The xcms package for LC-MS data preprocessing was updated to reuse this infrastructure, which now also enables the analysis of very large or remote data sets. This integration also simplifies complete analysis workflows, which can include the MsFeatures package for compounding (grouping of features that originate from the same compound) and the MetaboAnnotation package for the annotation of untargeted metabolomics experiments. Public annotation resources can be accessed through packages such as MsBackendMassbank, MsBackendMgf, MsBackendMsp or CompoundDb; the latter also allows creating and managing lab-specific compound databases. Finally, the MsCoreUtils and MetaboCoreUtils packages provide efficient implementations of commonly used algorithms, designed to be reused in other R packages. Ultimately, and in contrast to a monolithic software design, the package ecosystem enables building customized, modular and reproducible analysis workflows.
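The separation between analysis code and data storage described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the Spectra package's API: the names `Backend`, `InMemoryBackend`, `OnDiskBackend`, `base_peak`, `write_demo_file` and `demo`, as well as the line-delimited JSON file layout, are hypothetical. The sketch assumes that an analysis function only ever talks to a small storage interface, so an in-memory representation can be swapped for one that keeps just file offsets in memory and reads peaks on demand.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical storage interface (not the Spectra package's API)."""

    @abstractmethod
    def __len__(self):
        """Number of spectra available."""

    @abstractmethod
    def peaks(self, i):
        """Return the (m/z, intensity) pairs of spectrum i."""


class InMemoryBackend(Backend):
    """Keeps all peak data in memory."""

    def __init__(self, spectra):
        self._spectra = [[tuple(p) for p in s] for s in spectra]

    def __len__(self):
        return len(self._spectra)

    def peaks(self, i):
        return self._spectra[i]


class OnDiskBackend(Backend):
    """Keeps only byte offsets in memory; peaks are read on demand.

    Assumes a file with one JSON-encoded spectrum per line
    (an illustrative layout, not a real MS file format).
    """

    def __init__(self, path):
        self._path = path
        self._offsets = []
        with open(path, "rb") as fh:
            while True:
                pos = fh.tell()
                if not fh.readline():
                    break
                self._offsets.append(pos)

    def __len__(self):
        return len(self._offsets)

    def peaks(self, i):
        with open(self._path, "rb") as fh:
            fh.seek(self._offsets[i])
            return [tuple(p) for p in json.loads(fh.readline())]


def base_peak(backend, i):
    """Most intense peak of spectrum i; works with any Backend."""
    return max(backend.peaks(i), key=lambda p: p[1])


def write_demo_file(spectra):
    """Write spectra as line-delimited JSON to a temporary file."""
    fd, path = tempfile.mkstemp(suffix=".jsonl")
    with os.fdopen(fd, "w") as fh:
        for s in spectra:
            fh.write(json.dumps(s) + "\n")
    return path


def demo():
    """Run the same analysis on both backends and return the paired results."""
    spectra = [[(100.0, 10.0), (150.5, 200.0)], [(80.1, 5.0)]]
    path = write_demo_file(spectra)
    try:
        in_memory = InMemoryBackend(spectra)
        on_disk = OnDiskBackend(path)
        return [(base_peak(in_memory, i), base_peak(on_disk, i))
                for i in range(len(in_memory))]
    finally:
        os.remove(path)
```

Calling `demo()` runs `base_peak` unchanged against both backends. In this toy design, supporting a new data source only requires a new `Backend` subclass and no changes to the analysis code.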
Future development will focus on improved data structures and analysis methods for chromatographic data, and on better interoperability with other open-source software, including direct integration with Python MS libraries.
Files
Name | Size | Checksum |
---|---|---|
RforMassSpectrometry_metabolomics.pdf | 1.4 MB | md5:48d7a86b33fddccfda85917bfea1b55d |
Additional details
References
- Naake, Thomas et al. (2023). MsQuality – an interoperable open-source package for the calculation of standardized quality metrics of mass spectrometry data. Bioinformatics. https://doi.org/10.1093/bioinformatics/btad618
- Rainer, Johannes et al. (2022). A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R. Metabolites. https://doi.org/10.3390/metabo12020173
- Kockmann, Thomas et al. (2021). The rawrr R Package: Direct Access to Orbitrap Data and Beyond. Journal of Proteome Research. https://doi.org/10.1021/acs.jproteome.0c00866
- Huber, Florian et al. (2020). matchms – processing and similarity evaluation of mass spectrometry data. JOSS. https://doi.org/10.21105/joss.02411
- Jarmusch, Alan K et al. (2022). A Universal Language for Finding Mass Spectrometry Data Patterns. bioRxiv. https://doi.org/10.1101/2022.08.06.503000