Poster Open Access
Bittremieux, Wout; Kelchtermans, Pieter; Valkenborg, Dirk; Martens, Lennart; Goethals, Bart; Laukens, Kris
Mining mass spectrometry quality control data
Because of the inherent complexity of mass spectrometry, the results of an experiment can be subject to large variability. As a means of quality control, several quality metrics have recently been defined. These metrics can be used individually to assess the quality of a single experiment. In addition, data mining and machine learning approaches make it possible to perform more advanced analyses across different mass spectrometry runs.
In order to provide a pervasive and standardized means to report quality control information for mass spectrometry experiments, the qcML standard has been developed. The qcML standard supports an automated quality control pipeline by providing a set of useful metrics that can be calculated on the acquired data. In addition, it provides a standard format for the exchange of these metrics; both an XML-based qcML file format and a qcDB relational database structure have been defined.
To work with qcML data, the open-source Java library jqcML has been developed. Firstly, jqcML provides a complete object model to represent qcML data. Secondly, jqcML provides the ability to read, write, and work in a uniform manner with qcML data from different sources.
Preliminary Data or Plenary Speakers Abstract
A variety of quality control metrics were calculated using OpenMS and QuaMeter. Subsequently, jqcML was used to merge the distinct metrics generated by the different tools. In addition, jqcML made it straightforward to convert the multiple qcML files to a qcDB relational database for efficient storage and subsequent analysis of large amounts of data.
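As a rough illustration of the merge-then-store step described above, the following Python sketch combines per-run metrics reported by two tools and writes them to a relational table. It does not use jqcML, and the table layout, metric names, and values are hypothetical placeholders rather than the actual qcDB schema or real OpenMS/QuaMeter output.

```python
import sqlite3

# Hypothetical per-run metrics, keyed by run name, as two tools might report them.
# Metric names and values are invented for illustration only.
tool_a_metrics = {"run_01": {"num_ms1_spectra": 5120, "num_ms2_spectra": 20480}}
tool_b_metrics = {"run_01": {"median_precursor_mz": 652.3}}


def merge_metrics(*sources):
    """Merge per-run metric dictionaries; later sources win on name clashes."""
    merged = {}
    for source in sources:
        for run, metrics in source.items():
            merged.setdefault(run, {}).update(metrics)
    return merged


def store_metrics(connection, merged):
    """Store merged metrics in a simple (run, name, value) table.

    This layout is an assumption made for the sketch, not the qcDB schema.
    """
    connection.execute(
        "CREATE TABLE IF NOT EXISTS quality_parameter ("
        "run TEXT NOT NULL, name TEXT NOT NULL, value REAL NOT NULL, "
        "PRIMARY KEY (run, name))"
    )
    rows = [
        (run, name, float(value))
        for run, metrics in merged.items()
        for name, value in metrics.items()
    ]
    connection.executemany(
        "INSERT OR REPLACE INTO quality_parameter VALUES (?, ?, ?)", rows
    )
    connection.commit()


merged = merge_metrics(tool_a_metrics, tool_b_metrics)
connection = sqlite3.connect(":memory:")
store_metrics(connection, merged)
```

Storing one row per (run, metric) pair keeps the table independent of which tool produced which metric, so metrics from additional tools can be added without changing the layout.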
Using these data, several analyses can be performed across different mass spectrometry runs. Firstly, the behavior of each quality parameter can be evaluated individually. More advanced analyses are possible as well. For example, pattern mining techniques can find frequently co-occurring patterns, from which relationships between various parameters can be deduced. Furthermore, outliers can be detected based on all parameters simultaneously using specific subspace clustering techniques.
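The pattern mining idea can be sketched in a few lines of Python. The example below assumes each run has already been reduced to a set of discretized metric states (for instance "low", "normal", "high"), and it counts itemsets by brute-force enumeration rather than with a dedicated algorithm such as Apriori, so it is only suitable for a small number of metrics. The metric names, states, and support threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical runs: each run is the set of discretized metric states observed.
runs = [
    {"ms2_count=low", "peak_width=high", "ids=low"},
    {"ms2_count=low", "peak_width=high", "ids=low"},
    {"ms2_count=high", "peak_width=normal", "ids=high"},
    {"ms2_count=low", "peak_width=high", "ids=high"},
]


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return itemsets whose support (fraction of runs) is at least min_support.

    Brute-force enumeration of all itemsets up to max_size items per run.
    """
    counts = Counter()
    for transaction in transactions:
        items = sorted(transaction)  # sort so each itemset has one canonical tuple
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    total = len(transactions)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


patterns = frequent_itemsets(runs, min_support=0.75)
```

In this toy data set, a low MS2 count and a high peak width occur together in three of the four runs, which is the kind of co-occurrence from which a relationship between two parameters might be hypothesized.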
An important requirement for the maturation of mass spectrometry-based proteomics is the dissemination of quality control information alongside published datasets. By adapting and developing specific data mining and machine learning techniques, it will be possible to harness the full power of this new information.
The qcML standard will be introduced, along with its Java library jqcML. Additionally, possible data mining techniques will be illustrated.