Presentation
Bittremieux, Wout; Kelchtermans, Pieter; Valkenborg, Dirk; Martens, Lennart; Laukens, Kris
Collecting and mining mass spectrometry quality control data
Over the past few years, there has been growing awareness that systematic quality control is essential for proteomics to grow into a mature analytical discipline. To this end, the qcML format has recently been proposed to store and disseminate quality control metrics for mass spectrometry-based proteomics experiments. With the jqcML Java library, these quality control data can easily be used as input for data mining and machine learning tasks, for example subspace clustering to detect outliers.
Mass spectrometry is widely used to identify proteins based on the measured masses of their peptides. Unfortunately, because of its inherent complexity, the results of a mass spectrometry experiment can vary considerably. As a means of quality control, several quality metrics have recently been defined.1 These metrics can be used individually to assess the quality of a single experiment. In addition, data mining and machine learning approaches make it possible to perform more advanced analyses across different mass spectrometry runs.
To provide a pervasive, standardized way of reporting quality control information for mass spectrometry experiments, the qcML standard2 has been developed. It aims to support automated quality control pipelines by providing a set of useful metrics that can be calculated from the acquired data, and to provide a standard format for exchanging these metrics.
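As a rough illustration of what metrics "calculated from the acquired data" can look like, the Java sketch below derives three simple run-level values from a list of spectra: the number of MS1 spectra, the number of MS2 spectra and the retention time range. The `Spectrum` and `Metrics` records and this particular choice of metrics are hypothetical; they are not taken from the qcML specification or from OpenMS.

```java
import java.util.List;

/** Hypothetical sketch: computes a few run-level QC values from acquired spectra. */
class SimpleQcMetrics {

    /** Assumed minimal view of a spectrum: MS level and retention time (seconds). */
    record Spectrum(int msLevel, double retentionTime) {}

    /** Assumed container for the computed values; not the qcML data model. */
    record Metrics(int ms1Count, int ms2Count, double rtStart, double rtEnd) {}

    static Metrics compute(List<Spectrum> spectra) {
        if (spectra.isEmpty()) {
            throw new IllegalArgumentException("a run must contain at least one spectrum");
        }
        int ms1 = 0;
        int ms2 = 0;
        double rtMin = Double.POSITIVE_INFINITY;
        double rtMax = Double.NEGATIVE_INFINITY;
        for (Spectrum s : spectra) {
            // Count spectra per MS level; other levels (e.g. MS3) are ignored here.
            if (s.msLevel() == 1) {
                ms1++;
            } else if (s.msLevel() == 2) {
                ms2++;
            }
            // Track the acquisition window over all spectra.
            rtMin = Math.min(rtMin, s.retentionTime());
            rtMax = Math.max(rtMax, s.retentionTime());
        }
        return new Metrics(ms1, ms2, rtMin, rtMax);
    }

    public static void main(String[] args) {
        List<Spectrum> run = List.of(
                new Spectrum(1, 10.0),
                new Spectrum(2, 10.5),
                new Spectrum(2, 11.0),
                new Spectrum(1, 12.0));
        System.out.println(compute(run));
    }
}
```

Real QC pipelines report many more metrics; the point here is only that each metric is a deterministic summary of one run's acquired data.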
To exchange qcML data, an XML-based file format has been developed. This universal format captures metrics and metadata about all kinds of mass spectrometry experiments. As such, a qcML file can be used as a container that keeps quality control information separate from the actual data analysis. Furthermore, the qcDB relational database format has been developed to complement the XML-based file format and to store large amounts of qcML data accumulated over time.
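To make the file-based exchange more concrete, the sketch below reads per-run parameter values from a qcML-like XML document using only the JDK's DOM parser. It assumes that each run is a `runQuality` element with an `ID` attribute and that scalar metrics are `qualityParameter` elements carrying `name` and `value` attributes; these names reflect our understanding of the qcML layout and should be checked against the published schema. The class `QcmlReaderSketch` and the parameter names in the example are hypothetical, and this is not how jqcML reads files.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader: maps each run ID to its scalar quality parameters. */
class QcmlReaderSketch {

    static Map<String, Map<String, String>> readRuns(String xml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations to avoid external entity expansion.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse qcML input", e);
        }

        Map<String, Map<String, String>> runs = new LinkedHashMap<>();
        // Assumed element names: one runQuality per run, containing qualityParameter elements.
        NodeList runNodes = doc.getElementsByTagName("runQuality");
        for (int i = 0; i < runNodes.getLength(); i++) {
            Element run = (Element) runNodes.item(i);
            Map<String, String> params = new LinkedHashMap<>();
            NodeList paramNodes = run.getElementsByTagName("qualityParameter");
            for (int j = 0; j < paramNodes.getLength(); j++) {
                Element p = (Element) paramNodes.item(j);
                // Parameters without a scalar value attribute (e.g. table-valued ones) are skipped.
                if (p.hasAttribute("value")) {
                    params.put(p.getAttribute("name"), p.getAttribute("value"));
                }
            }
            runs.put(run.getAttribute("ID"), params);
        }
        return runs;
    }

    public static void main(String[] args) {
        String xml = "<qcML>"
                + "<runQuality ID=\"run1\">"
                + "<qualityParameter name=\"MS1 count\" value=\"1200\"/>"
                + "<qualityParameter name=\"MS2 count\" value=\"5400\"/>"
                + "</runQuality>"
                + "</qcML>";
        System.out.println(readRuns(xml));
    }
}
```

The parser is used without namespace awareness, so the sketch matches unprefixed element names only; a production reader would validate against the schema instead.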
To work with qcML data, the open-source Java library jqcML3 has been developed. First, jqcML provides a complete object model to represent qcML data. Second, it can read, write and process qcML data in a uniform manner regardless of its source, including the XML-based qcML file format and the qcDB relational database.
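The benefit of uniform access can be illustrated with a small, purely hypothetical Java sketch; it does not reproduce jqcML's actual classes or method names. Analysis code is written once against an assumed interface, here called `QcSource`, and runs unchanged on two storage layouts: nested maps, as one might obtain from parsing a file, and flat (run, parameter, value) rows, as a relational table might hold them. No database is actually accessed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of uniform read access over two storage layouts (not jqcML's API). */
class UniformAccessSketch {

    /** Hypothetical read-only view of QC data, independent of where it is stored. */
    interface QcSource {
        List<String> runIds();

        Map<String, Double> parameters(String runId);
    }

    /** Backend over nested maps, e.g. values obtained by parsing a file. */
    static final class NestedMapSource implements QcSource {
        private final Map<String, Map<String, Double>> runs;

        NestedMapSource(Map<String, Map<String, Double>> runs) {
            this.runs = runs;
        }

        @Override
        public List<String> runIds() {
            return new ArrayList<>(runs.keySet());
        }

        @Override
        public Map<String, Double> parameters(String runId) {
            return runs.getOrDefault(runId, Map.of());
        }
    }

    /** One (run, parameter, value) row, as a relational table might store it. */
    record Row(String runId, String name, double value) {}

    /** Backend over flat rows, standing in for a relational database. */
    static final class RowSource implements QcSource {
        private final List<Row> rows;

        RowSource(List<Row> rows) {
            this.rows = rows;
        }

        @Override
        public List<String> runIds() {
            LinkedHashSet<String> ids = new LinkedHashSet<>();
            for (Row r : rows) {
                ids.add(r.runId());
            }
            return new ArrayList<>(ids);
        }

        @Override
        public Map<String, Double> parameters(String runId) {
            Map<String, Double> out = new LinkedHashMap<>();
            for (Row r : rows) {
                if (r.runId().equals(runId)) {
                    out.put(r.name(), r.value());
                }
            }
            return out;
        }
    }

    /** Analysis written once against the interface: mean of one parameter over all runs. */
    static double mean(QcSource source, String parameter) {
        double sum = 0.0;
        int n = 0;
        for (String id : source.runIds()) {
            Double v = source.parameters(id).get(parameter);
            if (v != null) {
                sum += v;
                n++;
            }
        }
        if (n == 0) {
            throw new IllegalArgumentException("parameter not found: " + parameter);
        }
        return sum / n;
    }

    public static void main(String[] args) {
        QcSource fromFile = new NestedMapSource(Map.of(
                "r1", Map.of("MS2 count", 100.0),
                "r2", Map.of("MS2 count", 300.0)));
        QcSource fromTable = new RowSource(List.of(
                new Row("r1", "MS2 count", 100.0),
                new Row("r2", "MS2 count", 300.0)));
        System.out.println(mean(fromFile, "MS2 count") + " " + mean(fromTable, "MS2 count"));
    }
}
```

Both backends yield the same mean, which is the design goal: downstream analyses need not know whether the data came from a file or a database.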
Results & Discussion
OpenMS4, an open-source library for LC/MS data management and analysis, provides a modular tool to calculate qcML data. With this tool, raw files from mass spectrometry experiments can easily be processed into a qcML file.
To perform advanced analyses on quality control data, results for several mass spectrometry experiments were generated in the qcML format using a KNIME5 workflow provided by OpenMS. Subsequently, jqcML was used to store the data in a qcDB relational database and to extract the relevant quality control parameters.
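Before data mining, the extracted parameters typically have to be arranged as a table with one row per run and one column per parameter. The sketch below assumes the per-run values are already available as maps from parameter name to value; the class `QcMatrixSketch` and its methods are hypothetical. Missing values are encoded as `NaN`, and a simple complete-case filter drops runs with any missing parameter, which is only one of several possible strategies.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: arranges per-run QC parameters into a runs-by-parameters matrix. */
class QcMatrixSketch {

    static double[][] toMatrix(List<Map<String, Double>> runs, List<String> parameters) {
        double[][] m = new double[runs.size()][parameters.size()];
        for (int i = 0; i < runs.size(); i++) {
            Map<String, Double> run = runs.get(i);
            for (int j = 0; j < parameters.size(); j++) {
                Double v = run.get(parameters.get(j));
                // Absent parameters become NaN instead of silently becoming 0.
                m[i][j] = (v == null) ? Double.NaN : v;
            }
        }
        return m;
    }

    /** Complete-case filter: keeps only runs without any missing value. */
    static double[][] completeRows(double[][] m) {
        return Arrays.stream(m)
                .filter(row -> Arrays.stream(row).noneMatch(x -> Double.isNaN(x)))
                .toArray(double[][]::new);
    }

    public static void main(String[] args) {
        List<Map<String, Double>> runs = List.of(
                Map.of("MS1 count", 1200.0, "MS2 count", 5400.0),
                Map.of("MS1 count", 1150.0)); // second run lacks "MS2 count"
        double[][] m = toMatrix(runs, List.of("MS1 count", "MS2 count"));
        System.out.println(Arrays.deepToString(m));
        System.out.println(Arrays.deepToString(completeRows(m)));
    }
}
```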
With these data, several analyses across different mass spectrometry runs become possible. First, the behaviour of each parameter can be evaluated individually. More advanced analyses are possible as well: for example, outliers can be detected based on all parameters simultaneously using dedicated subspace clustering techniques.
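The two kinds of analysis can be sketched in Java under simplified assumptions. Per parameter, runs are scored with the modified z-score of Iglewicz and Hoaglin, 0.6745 * (x - median) / MAD, where an absolute score above 3.5 is a commonly used outlier threshold. For the multi-parameter case, the sketch does not implement a subspace clustering algorithm; as a much simpler stand-in, it scores each run by its distance to its k-th nearest neighbour in every two-parameter subspace (after robust scaling) and keeps the maximum. The class `QcOutlierSketch` and its methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of per-parameter and multi-parameter outlier scoring for QC runs. */
class QcOutlierSketch {

    static double median(double[] xs) {
        double[] s = xs.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    /** Modified z-scores 0.6745 * (x - median) / MAD for one parameter across runs. */
    static double[] robustZ(double[] xs) {
        double med = median(xs);
        double[] dev = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            dev[i] = Math.abs(xs[i] - med);
        }
        double mad = median(dev);
        if (mad == 0.0) {
            // At least half of the runs share the median value; this sketch does not handle that case.
            throw new IllegalArgumentException("MAD is zero; robust scale undefined");
        }
        double[] z = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            z[i] = 0.6745 * (xs[i] - med) / mad;
        }
        return z;
    }

    /**
     * Stand-in for subspace methods: for every pair of parameters, the distance from each
     * run to its k-th nearest other run in that 2-D subspace; the score is the maximum over pairs.
     */
    static double[] subspaceScores(double[][] data, int k) {
        int n = data.length;
        int d = data[0].length;
        if (d < 2 || k < 1 || k >= n) {
            throw new IllegalArgumentException("need >= 2 parameters and 1 <= k < number of runs");
        }
        // Robustly scale each parameter so that distances are comparable across parameters.
        double[][] z = new double[n][d];
        for (int j = 0; j < d; j++) {
            double[] col = new double[n];
            for (int i = 0; i < n; i++) {
                col[i] = data[i][j];
            }
            double[] zc = robustZ(col);
            for (int i = 0; i < n; i++) {
                z[i][j] = zc[i];
            }
        }
        double[] score = new double[n];
        for (int a = 0; a < d; a++) {
            for (int b = a + 1; b < d; b++) {
                for (int i = 0; i < n; i++) {
                    double[] dist = new double[n - 1];
                    int m = 0;
                    for (int o = 0; o < n; o++) {
                        if (o != i) {
                            dist[m++] = Math.hypot(z[i][a] - z[o][a], z[i][b] - z[o][b]);
                        }
                    }
                    Arrays.sort(dist);
                    score[i] = Math.max(score[i], dist[k - 1]);
                }
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // Rows are runs, columns are two correlated QC parameters; the last run breaks the trend.
        double[][] data = {{1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5}, {1, 5}};
        double[] col0 = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            col0[i] = data[i][0];
        }
        System.out.println("z, parameter 1: " + Arrays.toString(robustZ(col0)));
        System.out.println("subspace scores: " + Arrays.toString(subspaceScores(data, 1)));
    }
}
```

In the example data, the sixth run breaks the correlation between the two parameters: neither of its values is unusual on its own (modified z-scores of about -0.67 and 0.67), yet it has the largest nearest-neighbour distance in the joint two-parameter space. Cases like this are what motivate looking at several parameters at once.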
An important requirement for the maturation of mass spectrometry-based proteomics is the dissemination of quality control information alongside published datasets. By adapting and developing dedicated data mining and machine learning techniques, it will be possible to exploit the full potential of this new information.
1. Rudnick, P.A. et al. Performance Metrics for Liquid Chromatography-Tandem Mass Spectrometry Systems in Proteomics Analyses. Molecular & Cellular Proteomics 9(2), 225–241 (2010).
2. Walzer, M. et al. qcML: an exchange format for quality control metrics from mass spectrometry instruments. [Manuscript submitted].
3. Bittremieux, W. et al. jqcML: an open-source Java API for mass spectrometry quality control data in the qcML format. [Manuscript submitted].
4. Kohlbacher, O. et al. TOPP – The OpenMS Proteomics Pipeline. Bioinformatics 23, e191–e197 (2007).
5. Berthold, M.R. et al. KNIME: The Konstanz Information Miner. Springer Berlin Heidelberg, 319–326 (2008).