Published February 17, 2022 | Version 1
Dataset Open

Dataset: A multi-label classifier for predicting the most appropriate instrumental method for the analysis of contaminants of emerging concern

  • 1. National and Kapodistrian University of Athens
  • 2. National Centre for Scientific Research "Demokritos"
  • 3. Cognity S.A
  • 4. Environmental Institute s.r.o.

Description

The dataset was generated from the NORMAN Suspect List Exchange. Only lists with a clear label (LC or GC) were used. More specifically, we used S3 NORMANCT15, which contains compounds detected in surface water from the Danube River in a pan-European collaborative trial employing both GC-HRMS and LC-HRMS. We also used the GC and LC target lists of two institutes, the National and Kapodistrian University of Athens (NKUA) and the Helmholtz Centre for Environmental Research (UFZ): S21 UATHTARGETS is the LC target list of NKUA, S65 UATHTARGETSGC is the GC target list of NKUA, and S53 UFZWANATARG contains the LC and GC target lists of UFZ. Finally, two GC substance lists (S51 WRIGCHRMS and S70 EISUSGCEIMS) were used; they were provided by two Slovak institutes, the Water Research Institute (WRI) and the Environmental Institute. These compound lists were merged to form a labelled dataset. The SMILES were used to calculate 1446 molecular descriptors with PaDEL-descriptor; logP was produced by JRgui and boiling point by USEPA ECOSAR.
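The merging step described above can be illustrated with a minimal sketch. It is not the authors' code: the list names, the SMILES strings and the helper `merge_labelled_lists` are hypothetical, and it assumes that the same compound is written with an identical SMILES string in every list (real lists would likely need structure standardisation first). A compound that appears in both an LC and a GC list receives both labels, which is what makes the dataset multi-label.

```python
from collections import defaultdict

# Hypothetical miniature inputs: (list name, technique label, SMILES strings).
# The SMILES and their assignment to lists are invented for illustration;
# they are not taken from the NORMAN lists.
EXAMPLE_LISTS = [
    ("lc_list_a", "LC", ["CCO", "c1ccccc1O"]),
    ("gc_list_a", "GC", ["c1ccccc1", "c1ccccc1O"]),
    ("gc_list_b", "GC", ["CCCCCC", "c1ccccc1"]),
]


def merge_labelled_lists(lists):
    """Merge per-technique compound lists into one multi-label table.

    Returns a dict mapping each SMILES to {"LC": 0 or 1, "GC": 0 or 1}.
    """
    labels = defaultdict(lambda: {"LC": 0, "GC": 0})
    for _name, technique, smiles_list in lists:
        for smiles in smiles_list:
            # Seeing a compound in a list switches on that technique's label.
            labels[smiles][technique] = 1
    return dict(labels)


merged = merge_labelled_lists(EXAMPLE_LISTS)
for smiles, row in sorted(merged.items()):
    print(smiles, "LC =", row["LC"], "GC =", row["GC"])
```

In this toy input, `c1ccccc1O` occurs in one LC list and one GC list, so it ends up with both labels set, while the other compounds carry a single label.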

The dataset is used in the publication:

"A multi-label classifier for predicting the most appropriate instrumental method for the analysis of contaminants of emerging concern" authored by

Nikiforos Alygizakis, Vasileios Konstantakos, Grigoris Bouziotopoulos, Evangelos Kormentzas, Jaroslav Slobodnik and Nikolaos S. Thomaidis

GitHub repository: https://github.com/nalygizakis/LCvsGC

Files

Files (89.1 MB)

md5:5f233fddf52ff536e4907376cd4c706d
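The md5 value listed for the file can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_expected` are illustrative, and the local path passed to them is up to the user, since the file name is not shown on this page.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "5f233fddf52ff536e4907376cd4c706d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

Calling `matches_expected` with the path of the downloaded file returns `True` only when the local copy has the listed checksum. Reading in chunks keeps memory use low for a file of this size.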