Published July 12, 2021 | Version v1
Journal article Open

On the feasibility of deep learning applications using raw mass spectrometry data

  • 1. Cognitive Computing & Industry Solutions, IBM Research Europe - Zurich, Rueschlikon 8803, Switzerland
  • 2. Institute of Basic Medical Sciences, School of Life Science, Westlake University, Hangzhou 310024, China
  • 3. Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich 8093, Switzerland

Description

In recent years, SWATH-MS has become the proteomic method of choice for data-independent acquisition, as it enables high proteome coverage, accuracy and reproducibility. However, data analysis is convoluted and requires prior information and expert curation. Furthermore, as quantification is limited to a small set of peptides, potentially important biological information may be discarded. Here we demonstrate that deep learning can be used to learn discriminative features directly from raw mass spectrometry (MS) data, thereby eliminating the need for elaborate data processing pipelines. Using transfer learning to overcome sample sparsity, we exploit a collection of publicly available deep learning models already trained for the task of natural image classification. These models are used to produce feature vectors from each raw MS image, which are then used as input for a classifier trained to distinguish tumor from normal prostate biopsies. Although the deep learning models were originally trained for a completely different classification task and no additional fine-tuning is performed on them, we achieve a classification performance of 0.876 AUC. We investigate different types of image preprocessing and encoding, as well as whether including the secondary MS2 spectra improves classification performance. Across all tested models, we use standard protein expression vectors as gold standards. Even with our naïve implementation, our results suggest that the application of deep learning and transfer learning techniques might pave the way to the broader usage of raw mass spectrometry data in real-time diagnosis.
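For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen tumor sample receives a higher classifier score than a randomly chosen normal sample. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code. The function name `roc_auc`, the label encoding (1 = tumor, 0 = normal) and the four example scores are assumptions made purely for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    labels: 1 for tumor, 0 for normal (assumed encoding, not from the paper).
    scores: classifier outputs, where a higher value means "more tumor-like".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # tumor sample ranked above normal sample
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four biopsies, for illustration only.
labels = [0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80]
example_auc = roc_auc(labels, scores)
```

In this toy example three of the four tumor/normal pairs are ranked correctly, so the AUC is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.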

Files

btab311.pdf

Size: 1.4 MB
md5:513ef0da34cbbbced03673df4992c102

Additional details

Funding

iPC – individualizedPaediatricCure: Cloud-based virtual-patient models for precision paediatric oncology 826121
European Commission