Dataset: A multi-label classifier for predicting the most appropriate instrumental method for the analysis of contaminants of emerging concern
Creators
- 1. National and Kapodistrian University of Athens
- 2. National Centre for Scientific Research "Demokritos"
- 3. Cognity S.A
- 4. Environmental Institute s.r.o.
Description
The NORMAN Suspect List Exchange was used to generate the dataset. Only lists with a clear label (LC or GC) were used. More specifically, we used S3 NORMANCT15, which contains compounds detected in surface water from the Danube River in a pan-European collaborative trial employing both GC-HRMS and LC-HRMS. In addition, we used the LC and GC target lists of two institutes: the National and Kapodistrian University of Athens (NKUA) and the Helmholtz Centre for Environmental Research (UFZ). S21 UATHTARGETS is the LC target list of NKUA, S65 UATHTARGETSGC is the GC target list of NKUA, and S53 UFZWANATARG contains the LC and GC target lists of UFZ. Finally, two GC substance lists (S51 WRIGCHRMS and S70 EISUSGCEIMS) were used; these were provided by two Slovak institutes, the Water Research Institute (WRI) and the Environmental Institute. These compound lists were merged to form a labelled dataset. The SMILES of the compounds were then used to calculate molecular descriptors: 1446 descriptors were produced by PaDEL-Descriptor, logP by JRgui, and boiling point by USEPA ECOSAR.
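The merging step can be illustrated with a minimal Python sketch. It assumes each source list has already been reduced to a single technique label plus a list of SMILES strings, and it keys compounds on the raw SMILES string. The list names and SMILES below are hypothetical placeholders, not entries from the NORMAN lists. Mixed lists such as S53 UFZWANATARG would first need to be split by technique, and in practice structures would likely be standardized (for example to canonical SMILES or InChIKeys) before matching. A compound found in both an LC and a GC list receives both labels, which is presumably what makes the prediction task multi-label.

```python
from collections import defaultdict

# Hypothetical inputs: (list name, technique label, SMILES strings).
# Names and structures are illustrative placeholders, not NORMAN list contents.
SOURCE_LISTS = [
    ("hypothetical_lc_list", "LC", ["CCO", "CC(=O)Oc1ccccc1C(=O)O"]),
    ("hypothetical_gc_list", "GC", ["CCO", "c1ccccc1"]),
]

# Fixed label order used for the binary target vectors.
LABELS = ("LC", "GC")


def merge_lists(source_lists):
    """Merge single-technique compound lists into a multi-label table keyed by SMILES.

    A compound appearing in both an LC list and a GC list gets both labels.
    Matching is on the exact SMILES string (an assumption of this sketch).
    """
    techniques = defaultdict(set)
    for _name, technique, smiles_list in source_lists:
        if technique not in LABELS:
            raise ValueError(f"unexpected label: {technique}")
        for smiles in smiles_list:
            techniques[smiles].add(technique)
    # Encode each compound as a 0/1 vector in LABELS order, e.g. (1, 1) = LC and GC.
    return {
        smiles: tuple(int(label in found) for label in LABELS)
        for smiles, found in techniques.items()
    }


if __name__ == "__main__":
    for smiles, vector in merge_lists(SOURCE_LISTS).items():
        print(smiles, vector)
```

Encoding the labels as a fixed-order binary vector, rather than a single class, keeps "both techniques" as a natural combination of the two labels instead of a third, separate class.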
The dataset is used in the publication "A multi-label classifier for predicting the most appropriate instrumental method for the analysis of contaminants of emerging concern" by Nikiforos Alygizakis, Vasileios Konstantakos, Grigoris Bouziotopoulos, Evangelos Kormentzas, Jaroslav Slobodnik and Nikolaos S. Thomaidis.
GitHub repository: https://github.com/nalygizakis/LCvsGC
Files
Total size: 89.1 MB

Checksum | Size |
---|---|
md5:5f233fddf52ff536e4907376cd4c706d | 89.1 MB |
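The record publishes an MD5 checksum for the 89.1 MB download. A minimal Python sketch for checking a local copy against it, using only the standard-library `hashlib` module; the file path passed in is a placeholder, since the file name is not shown here.

```python
import hashlib

# Checksum listed in the record's Files section.
EXPECTED_MD5 = "5f233fddf52ff536e4907376cd4c706d"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_download("path/to/downloaded_file")` returns `True` only if the local copy matches the published checksum; reading in 1 MiB chunks avoids loading the whole archive into memory.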