There is a newer version of the record available.

Published March 5, 2021 | Version 1.0.0
Journal article Open

Retention Time Prediction Using Neural Networks Increases Identifications in Crosslinking Mass Spectrometry

  • 1. Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany; Data Analytics and Computational Statistics, Hasso Plattner Institute for Digital Engineering; Digital Engineering Faculty, University of Potsdam
  • 2. Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
  • 3. Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany; Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom

Description

Abstract:

Crosslinking mass spectrometry (Crosslinking MS) has developed into a robust technique that is increasingly used to investigate the interactomes of organelles and cells. However, the incomplete and noisy information contained in mass spectra limits the number of protein-protein interactions (PPIs) that can be confidently identified. Here, we leveraged chromatographic retention time (RT) information to aid the identification of crosslinked peptides from mass spectra. Our Siamese machine learning model xiRT achieved highly accurate RT predictions of crosslinked peptides in a multi-dimensional separation of crosslinked E. coli lysate. We combined strong cation exchange (SCX), hydrophilic strong anion exchange (hSAX) and reversed-phase (RP) chromatography and reached an R² of 0.94 in RP, with predictions falling within a margin of error of 1 fraction in 94% of cases for hSAX and 85% of cases for SCX. Importantly, supplementing the search engine score with retention time features led to a 1.4-fold increase in PPIs at a 1% false discovery rate (FDR). We also demonstrate the value of this approach for the more routine analysis of crosslinked multiprotein complexes. A 1.7-fold increase in heteromeric crosslinked residue-pairs was achieved at 1% residue-pair FDR for the Fanconi anaemia monoubiquitin ligase complex, solely using reversed-phase RT. Retention times are a powerful complement to mass spectrometric information to increase the sensitivity of Crosslinking MS analyses.
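The two accuracy figures quoted in the abstract can be illustrated with a minimal Python sketch. It assumes that the RP figure is the coefficient of determination and that the "margin of error of 1 fraction" means the predicted fraction index differs from the observed one by at most 1; the record itself does not spell out either definition. The function names and toy values below are hypothetical and are not taken from the deposited data.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination (assumed definition of the RP metric)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def within_fraction_tolerance(
    observed: Sequence[int], predicted: Sequence[int], tolerance: int = 1
) -> float:
    """Share of predictions at most `tolerance` fractions from the observation."""
    hits = sum(abs(o - p) <= tolerance for o, p in zip(observed, predicted))
    return hits / len(observed)


# Toy values for illustration only (not from the deposited data).
rp_observed = [10.0, 20.0, 30.0, 40.0]    # RP retention times, arbitrary units
rp_predicted = [12.0, 18.0, 31.0, 39.0]
scx_observed = [1, 2, 3, 4, 5]            # SCX fraction indices
scx_predicted = [1, 3, 5, 4, 4]

rp_r2 = r_squared(rp_observed, rp_predicted)
scx_accuracy = within_fraction_tolerance(scx_observed, scx_predicted)
```

With a continuous dimension such as RP, a regression metric is the natural choice, whereas fractionated dimensions such as SCX and hSAX yield ordinal labels, for which a tolerance-based accuracy is easier to interpret.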

Conclusion:

Using a Siamese network architecture, we succeeded in bringing RT prediction into the Crosslinking MS field, independent of separation setup and search software. Our open source application xiRT introduces the concept of multi-task learning to achieve multi-dimensional chromatographic retention time prediction, and may use any peptide sequence-dependent measure, including, for example, collision cross section or isoelectric point. The black-box character of the neural network was reduced by means of interpretable machine learning that revealed individual amino acid contributions towards the separation behavior. The RT predictions, even when using only the RP dimension, complement mass spectrometric information to enhance the identification of heteromeric crosslinks in multiprotein complex and proteome-wide studies. Overfitting does not account for this gain, as known false target matches from an entrapment database did not increase. Leveraging additional information sources may help to address the mass-spectrometric identification challenge of heteromeric crosslinks.
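To make the terms "Siamese" and "multi-task" concrete, the following NumPy sketch shows the general idea only: one shared encoder is applied to both peptides of a crosslink, the two encodings are combined, and separate output heads produce one prediction per chromatographic dimension. This is not xiRT's actual architecture or API. The layer sizes, the mean pooling, the additive combination and all names are assumptions chosen for brevity, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
EMBED_DIM, HIDDEN_DIM = 8, 16            # hypothetical layer sizes

# Shared (Siamese) parameters: the same weights encode both peptides.
embedding = rng.normal(size=(len(AMINO_ACIDS), EMBED_DIM))
w_shared = rng.normal(size=(EMBED_DIM, HIDDEN_DIM))

# Multi-task heads: one linear output per separation dimension.
heads = {task: rng.normal(size=HIDDEN_DIM) for task in ("scx", "hsax", "rp")}


def encode(peptide: str) -> np.ndarray:
    """Embed each residue, mean-pool over the sequence, apply the shared layer."""
    indices = [AMINO_ACIDS.index(residue) for residue in peptide]
    pooled = embedding[indices].mean(axis=0)
    return np.tanh(pooled @ w_shared)


def predict(peptide_a: str, peptide_b: str) -> dict:
    """Combine both encodings by addition and evaluate every task head."""
    combined = encode(peptide_a) + encode(peptide_b)
    return {task: float(combined @ weights) for task, weights in heads.items()}


# Untrained toy prediction for a hypothetical crosslinked peptide pair.
example = predict("PEPTIDEK", "LKAR")
```

Because both peptides pass through the same encoder and the encodings are added, swapping the two peptides leaves the output unchanged in this sketch, which is a convenient property when the order of the peptides in a crosslink is arbitrary.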

Notes

Software is available on GitHub (https://github.com/Rappsilber-Laboratory/xiRT). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the jPOST partner repository with the dataset identifier PXD020407 and DOI 10.6019/PXD020407.

Files

ecoli.zip

Files (3.3 GB)

Checksum Size
md5:9e9a1a2db3200f19e87e3c12f7d89e81 2.3 GB
md5:27af81f8037a57bb870f012962cbf969 55.6 MB
md5:449d974f951276c23aeee3f09b80e3f5 722.4 MB
md5:e7bc356799f63a62f7a5bc864c434d43 31.3 MB
md5:462ab18eef7213280ee793270c831ecf 13.6 MB
md5:f19d8f51841ef5d6f7ebf92693d03452 1.7 kB
md5:4369149833add9b9a3a09dcf2e91d957 169.7 MB
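
The md5 checksums listed above can be used to check that a download completed without corruption. A minimal sketch using only the Python standard library follows; the function names and the file path in the usage comment are hypothetical, and the listing does not state which checksum belongs to which file name, so the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed_checksum: str) -> bool:
    """Compare a local file against a listed value such as 'md5:9e9a...'."""
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; take the checksum from the listing):
# matches_listing("downloads/some_file.zip", "md5:<checksum from the listing>")
```

Reading in chunks matters here because the largest file in the record is 2.3 GB and would not fit comfortably in memory on every machine.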

Additional details

Funding

Wellcome Trust: Protein structures in the context of time and space by mass spectrometry (grant 103139)