Published January 17, 2020 | Version v1
Dataset Open

Mathematical chromatography deciphers the molecular fingerprints of dissolved organic matter

  • 1. Chalmers University of Technology
  • 2. Uppsala University

Description

High-resolution mass spectrometry (HRMS) elucidates the molecular composition of dissolved organic matter (DOM) through the unequivocal assignment of molecular formulas. When HRMS is used as a detector coupled to high performance liquid chromatography (HPLC), the molecular fingerprints of DOM are further augmented. However, the identification of eluting compounds remains impossible when DOM chromatograms consist of unresolved humps. Here, we utilized the concept of mathematical chromatography to achieve information reduction and feature extraction. Parallel Factor Analysis (PARAFAC) was applied to a dataset describing the reverse-phase separation of DOM in headwater streams located in southeast Sweden. A dataset consisting of 1355 molecular formulas and 7178 mass spectra was reduced to five components that described 96.89% of the data. Each component summarized the distinct chromatographic elution of molecular formulas with different polarity. Component scores represented the abundance of the identified HPLC features in each sample. Using this chemometric approach allowed the identification of common patterns in HPLC–HRMS datasets by reducing thousands of mass spectra to only a few statistical components. Unlike in principal component analysis (PCA), components closely followed the analytical principles of HPLC–HRMS and therefore represented more realistic pools of DOM. This approach provides a wealth of new opportunities for unravelling the composition of complex mixtures in natural and engineered systems.

Notes

Mathematical chromatography offers information reduction and feature extraction in complex liquid chromatography–mass spectrometry datasets.

Each of the six datasets contains a ReadMe file named "Readme - datasetX.txt" that provides detailed information on that dataset. Missing values are always indicated by "NaN", and the contents of rows and columns are always explained in the ReadMe files.

The data publication does not contain the MATLAB scripts or functions that were used to create the datasets. However, with the information provided in the section "Methods", all datasets can be recreated. All data are provided as comma-separated files that can be read on any platform and in any programming environment capable of reading *.csv files.
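As a rough illustration of reading such a file, the Python sketch below parses comma-separated numeric rows with the standard library and treats "NaN" as a missing value. The function name `read_numeric_csv` and the in-memory sample are hypothetical; the sketch assumes purely numeric cells, so any header rows or text columns described in the ReadMe files would need to be handled separately.

```python
import csv
import io
import math


def read_numeric_csv(handle):
    """Parse comma-separated rows into floats; the text 'NaN' becomes float('nan')."""
    rows = []
    for record in csv.reader(handle):
        if not record:  # skip blank lines
            continue
        rows.append([float(cell) for cell in record])
    return rows


# Hypothetical in-memory stand-in for one of the dataset files (assumption:
# numeric cells only; the real layout is described in the ReadMe files).
sample = io.StringIO("1.5,NaN,3.0\n4.0,5.0,NaN\n")
table = read_numeric_csv(sample)

# Count the missing values so they can be handled explicitly later on.
missing = sum(math.isnan(value) for row in table for value in row)
```

For a file on disk, the same function could be called on a handle opened with `open(path, newline="")`.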

PARAFAC can be carried out using software provided free of charge in R (multiway package, https://cran.r-project.org/web/packages/multiway/multiway.pdf), in Python (TensorLy package, http://tensorly.org/stable/modules/generated/tensorly.decomposition.parafac.html), and in MATLAB (N-way package, http://models.life.ku.dk/nwaytoolbox).
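To give a feel for what a PARAFAC fit involves, the following is a minimal alternating-least-squares sketch in Python with NumPy. It is not the authors' MATLAB code and does not replace the packages listed above. The function names, the synthetic tensor, and the assumed three-way layout (samples × molecular formulas × elution time) are all illustrative assumptions, and the sketch assumes a complete tensor with no "NaN" entries.

```python
import numpy as np


def khatri_rao(a, b):
    """Column-wise Kronecker product of a (I x R) and b (J x R), giving (I*J x R)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])


def parafac_als(x, rank, n_iter=500, seed=0):
    """Fit a rank-`rank` PARAFAC model to a 3-way array by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]
            # Unfold the tensor along `mode`; column order matches khatri_rao.
            unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            # Least-squares update of one factor with the other two held fixed.
            factors[mode] = unfolded @ kr @ np.linalg.pinv(gram)
    return factors


def explained_variance(x, factors):
    """Percentage of the sum of squares in x reproduced by the model."""
    model = np.einsum("ir,jr,kr->ijk", *factors)
    return 100.0 * (1.0 - np.sum((x - model) ** 2) / np.sum(x ** 2))


# Synthetic two-component tensor (assumed layout: samples x formulas x elution time).
rng = np.random.default_rng(1)
true_factors = [rng.standard_normal((dim, 2)) for dim in (6, 5, 4)]
tensor = np.einsum("ir,jr,kr->ijk", *true_factors)

fitted = parafac_als(tensor, rank=2)
explained = explained_variance(tensor, fitted)
```

In this sketch, the first factor matrix plays the role of the per-sample component scores, and `explained` corresponds to the kind of percentage reported in the description. Real data with missing values, scaling, or constraints such as non-negativity call for the dedicated packages.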

Funding provided by: Svenska Forskningsrådet Formas
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100001862
Award Number: 2017-00743

Funding provided by: Vetenskapsrådet
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100004359
Award Number: 2018-04618

Funding provided by: Stiftelsen Åforsk
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100009789
Award Number: 19-499

Files

dataset1.zip

Files (183.0 MB)

MD5 checksum Size
md5:3e1cd48902ff0d37ec5ce7c5ba3eec3d 59.6 MB
md5:928ffed3e55d810540409ebd17e27d3a 43.2 MB
md5:285d3761c99d916ec1dce3138f7cb24c 2.0 MB
md5:90a8b70859aa91efaf0156349900ffc2 2.4 MB
md5:27c5cdeb58c59150bfc4c674059e3ab0 65.9 kB
md5:88abfb64a9c37e15c8b86416c2583966 75.7 MB

Additional details

Related works

Is cited by
10.1039/c9an02176k (DOI)