A Data Set of 255,000 Randomly Selected and Manually Classified Extracted Ion Chromatograms for Evaluation of Peak Detection Methods

Müller, Erik; Huber, Carolin-Elisabeth; Beckers, Liza-Marie; Brack, Werner; Krauss, Martin; Schulze, Tobias

doi:10.5281/zenodo.3756211

Published February 27, 2020 | Version 3

Dataset Open

1. Helmholtz Centre for Environmental Research

Non-targeted mass spectrometry (MS) has become an important method in metabolomics and environmental research in recent years. While a growing number of algorithms and workflows are available for non-targeted processing of large numbers of data sets, few manually evaluated, universal test data sets exist for refining and evaluating these methods. Peak detection and its refinement, the first step of non-targeted screening, is arguably also its most important step. However, the absence of a model data set makes it harder for researchers to evaluate peak detection methods. In this Data Descriptor, we provide a manually checked data set consisting of 255,000 EICs (5,000 randomly sampled peaks across 51 samples, i.e. 5,000 × 51 EICs) for the evaluation of peak detection and gap-filling algorithms. The data set was created from a previous real-world study, a subset of which was used to extract ion chromatograms that were then manually classified by three mass spectrometry experts. The data set consists of:

51 converted mass spectral files in mzML format
An .RData file containing the extracted ion chromatograms (EICs)
The randomly selected subset and the original MZmine output table in .csv format
Example .xlsx files for the classification
Two central classification tables
Several tables with additional information about the sampling, chemical analysis and expert judgement on EICs

For a full description of the experiment and the data set, please read the related Data Descriptor with the title "A data set of 255000 randomly selected and manually classified extracted ion chromatograms for evaluation of peak detection methods" in Metabolites (https://www.mdpi.com/journal/metabolites; DOI: https://doi.org/10.3390/metabo10040162).
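As an illustration of how manually classified EICs can serve as a reference for a peak detection method, the following minimal sketch compares detector calls against expert labels and reports precision, recall and F1. It is not the authors' evaluation procedure. The boolean label format, the function name `score_detector` and the class `DetectionScores` are assumptions made for this example; the actual layout and categories of the classification tables are described in the Data Descriptor.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DetectionScores:
    """Confusion-matrix counts for one detector (hypothetical helper)."""
    true_positives: int
    false_positives: int
    false_negatives: int
    true_negatives: int

    @property
    def precision(self) -> float:
        called = self.true_positives + self.false_positives
        return self.true_positives / called if called else 0.0

    @property
    def recall(self) -> float:
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0

    @property
    def f1(self) -> float:
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


def score_detector(manual_is_peak: Sequence[bool],
                   detected: Sequence[bool]) -> DetectionScores:
    """Compare detector calls with manual labels, one entry per EIC.

    Assumption: each EIC has been reduced to a single boolean
    (True = peak present); the real tables may use other categories.
    """
    if len(manual_is_peak) != len(detected):
        raise ValueError("label and detection sequences differ in length")
    tp = fp = fn = tn = 0
    for truth, call in zip(manual_is_peak, detected):
        if truth and call:
            tp += 1
        elif call:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return DetectionScores(tp, fp, fn, tn)
```

Calling `score_detector(manual, detected)` with two equally long boolean sequences returns the counts, from which `.precision`, `.recall` and `.f1` can be read.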

Files


Files (12.1 GB)

Name	MD5 checksum	Size
A_dataset_for_evaluation_of_peak_detection_methods.zip	9d004b0500c968b9e8ad74b600ba077f	4.0 GB
A_dataset_for_evaluation_of_peak_detection_methods_v2.zip	c601caa3f671b24ae580fff2205054be	4.0 GB
A_dataset_for_evaluation_of_peak_detection_methods_v3.zip	d86faf581d85b640384ebb27d11c2085	4.0 GB
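Because each archive is about 4 GB, it may be worth checking a download against the MD5 checksum listed above before unpacking it. The sketch below uses only Python's standard library. The function names `md5_of_file` and `verify_download` are made up for this example, and it assumes the archive was saved under its original file name.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "A_dataset_for_evaluation_of_peak_detection_methods.zip":
        "9d004b0500c968b9e8ad74b600ba077f",
    "A_dataset_for_evaluation_of_peak_detection_methods_v2.zip":
        "c601caa3f671b24ae580fff2205054be",
    "A_dataset_for_evaluation_of_peak_detection_methods_v3.zip":
        "d86faf581d85b640384ebb27d11c2085",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so that a multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """True if the file's MD5 matches the listed value for its name.

    Raises KeyError if the file name is not one of the three archives.
    """
    path = Path(path)
    return md5_of_file(path) == EXPECTED_MD5[path.name]
```

For example, `verify_download("A_dataset_for_evaluation_of_peak_detection_methods_v3.zip")` returns `True` only if the local copy matches the listed checksum.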

Additional details

Is supplement to: Journal article: 10.3390/metabo10040162 (DOI); Journal article: 10.1016/j.scitotenv.2020.138388 (DOI); Journal article: 10.1021/acs.analchem.0c00899 (DOI)

Funding: European Commission
SOLUTIONS - Solutions for present and future emerging pollutants in land and water resources management (grant 603437)

	All versions	This version
Views	1,768	1,081
Downloads	859	744
Data volume	10.6 TB	10.1 TB
