Poster Open Access

Pattern mining of mass spectrometry quality control data

Bittremieux, Wout; Valkenborg, Dirk; Mrzic, Aida; Willems, Hanny; Goethals, Bart; Laukens, Kris


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam##2200000uu#4500</leader>
  <controlfield tag="005">20200120142611.0</controlfield>
  <controlfield tag="001">55989</controlfield>
  <datafield tag="711" ind1=" " ind2=" ">
    <subfield code="d">7-10 September 2014</subfield>
    <subfield code="g">ECCB</subfield>
    <subfield code="a">European Conference on Computational Biology</subfield>
    <subfield code="c">Strasbourg, France</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">VITO, Mol, Belgium</subfield>
    <subfield code="a">Valkenborg, Dirk</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Antwerp, Antwerp, Belgium</subfield>
    <subfield code="a">Mrzic, Aida</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">VITO, Mol, Belgium</subfield>
    <subfield code="a">Willems, Hanny</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Antwerp, Antwerp, Belgium</subfield>
    <subfield code="a">Goethals, Bart</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Antwerp, Antwerp, Belgium</subfield>
    <subfield code="a">Laukens, Kris</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2172417</subfield>
    <subfield code="z">md5:fad61c32e39e198a2aafc89ed91a7918</subfield>
    <subfield code="u">https://zenodo.org/record/55989/files/ECCB_2014_Pattern_mining_of_mass_spectrometry_quality_control_data.pdf</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="y">Conference website</subfield>
    <subfield code="u">http://www.eccb14.org/</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2014-09-07</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">openaire</subfield>
    <subfield code="o">oai:zenodo.org:55989</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="u">University of Antwerp, Antwerp, Belgium</subfield>
    <subfield code="a">Bittremieux, Wout</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Pattern mining of mass spectrometry quality control data</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="u">https://creativecommons.org/licenses/by-sa/4.0/legalcode</subfield>
    <subfield code="a">Creative Commons Attribution Share Alike 4.0 International</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;&lt;strong&gt;Pattern mining of mass spectrometry quality control data&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mass spectrometry is widely used to identify proteins based on the mass distribution of their peptides. Unfortunately, because of its inherent complexity, the results of a mass spectrometry experiment can be subject to large variability. As a means of quality control, several metrics have recently been defined. [1] Initially, these quality control metrics were evaluated independently in order to assess particular stages of a mass spectrometry experiment separately. However, this approach is insufficient because the different stages of an experiment do not function in isolation; instead, they influence each other. As a result, subsequent work employed multivariate statistics to assess the correlation structure of the different quality control metrics. [2] More advanced data mining techniques can extract additional useful information from these quality control metrics.&lt;/p&gt;

&lt;p&gt;Various pattern mining techniques can be employed to discover hidden patterns in this quality control data. Subspace clustering tries to detect clusters of items based on a restricted set of dimensions. [3] This can be leveraged, for example, to detect aberrant experiments in which only a few of the quality control metrics are outliers while the experiment otherwise behaved correctly.&lt;br /&gt;
In addition, specialized frequent itemset mining and association rule learning techniques can be used to discover relationships between the various stages of a mass spectrometry experiment, as they are exhibited by the different quality control metrics.&lt;br /&gt;
Finally, a major source of untapped information lies in the temporal aspect. Most often, problems in a mass spectrometry setup appear gradually but are only observed after a critical juncture. As previous analyses have not used this temporal information directly, there remains a large potential to detect these problems as soon as they start to manifest by taking this additional dimension of information into account. The previously discovered patterns can then be evaluated over time using sequential pattern mining techniques.&lt;/p&gt;

&lt;p&gt;Awareness has grown that suitable quality control information is mandatory to assess the validity of a mass spectrometry experiment. Current efforts aim to standardize this quality control information [4], which will facilitate the dissemination of the data. This results in a large amount of as yet untapped information, which specific data mining techniques can leverage.&lt;/p&gt;

&lt;p&gt;[1] Rudnick, P. A. et al. Performance metrics for liquid chromatography-tandem mass spectrometry systems in proteomics analyses. Molecular &amp;amp; Cellular Proteomics 9, 225&amp;ndash;241 (2010).&lt;br /&gt;
[2] Wang, X. et al. QC metrics from CPTAC raw LC-MS/MS data interpreted through multivariate statistics. Analytical Chemistry 86, 2497&amp;ndash;2509 (2014).&lt;br /&gt;
[3] Aksehirli, E., Goethals, B., M&amp;uuml;ller, E. &amp;amp; Vreeken, J. Cartification: A neighborhood preserving transformation for mining high dimensional data. in Thirteenth IEEE International Conference on Data Mining - ICDM &amp;rsquo;13 937&amp;ndash;942 (IEEE, 2013). doi:10.1109/ICDM.2013.146&lt;br /&gt;
[4] Walzer, M. et al. qcML: An exchange format for quality control metrics from mass spectrometry experiments. Molecular &amp;amp; Cellular Proteomics (2014). doi:10.1074/mcp.M113.035907&lt;/p&gt;</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.55989</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">poster</subfield>
  </datafield>
</record>