There is a newer version of the record available.

Published August 3, 2022 | Version 1
Dataset Open

Datasets used for Homologous Series Classification: NORMAN-SLE, PubChemLite for Exposomics

Authors/Creators

Contributors

Project member:

  • 1. Friedrich-Schiller-University
  • 2. Luxembourg Centre for Systems Biomedicine

Description

Archive of datasets used for the Homologous Series Classification project:

Lai, A., Schaub, J., Steinbeck, C., Schymanski, E. L. An Algorithm to Classify Homologous Series. In preparation.

 

COCONUT
COCONUT_DB_2021-11.smi was downloaded from https://coconut.naturalproducts.net/download on 2022-03-16 and converted to COCONUT_DB_2021-11.txt by running the following in a UNIX command line:


> tr -s '[:blank:]' ',' <COCONUT_DB_2021-11.smi >COCONUT_DB_2021-11.txt
 

Then, the following header was added to COCONUT_DB_2021-11.txt manually in a text editor: "SMILES, Name"
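
The conversion and the manual header step described above could also be combined into one scripted step. The following is a minimal sketch, assuming a POSIX shell; the function name smi_to_csv is hypothetical and was not part of the original workflow. The header is written separately from the tr output so that the space in "SMILES, Name" is kept exactly as stated above.

```shell
# Hypothetical helper (not from the original record): convert a
# whitespace-delimited .smi file to comma-delimited text and prepend
# the header line given in the description.
smi_to_csv() {
  in_file=$1
  out_file=$2
  {
    # Header exactly as stated in the record, including the space.
    printf '%s\n' 'SMILES, Name'
    # Same tr call as in the record: each blank becomes a comma,
    # and runs of commas are squeezed into one.
    tr -s '[:blank:]' ',' <"$in_file"
  } >"$out_file"
}

# Usage (file names as given in the record):
#   smi_to_csv COCONUT_DB_2021-11.smi COCONUT_DB_2021-11.txt
```

Because tr -s also squeezes commas already present in the input, the result should match the manual procedure only if the file is processed in the same way as described above.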

 

NORMAN-SLE
pubchem_norman_sle_tree_parentcid_98116_2022-03-21_from115115.csv was downloaded on 2022-03-21 from the 'NORMAN Suspect List Exchange Classification' in the PubChem Classification Browser (https://pubchem.ncbi.nlm.nih.gov/classification/#hid=101). Before download, the entries were converted into CIDs using Operator Type: ‘Parent CID’ in the PubChem Identifier Exchange Service via Entrez, to remove salts, charged ions, and mixtures. Further details are given in the publication.

 

PubChemLite for Exposomics
PubChemLite_exposomics_20220225.csv was downloaded from https://zenodo.org/record/6383860. The copy provided here is unchanged from that Zenodo record and is included only for completeness.
 

Notes

AL and ELS are funded by the Luxembourg National Research Fund (FNR), A18/BM/12341006. JS and CS acknowledge funding from the Carl-Zeiss-Foundation.

Files

COCONUT_DB_2021-11.txt

Files (466.9 MB)

Checksum Size
md5:6b4adcbc5a05a6923fc37c3522ec2494 30.9 MB
md5:c4d7e4c5e8f1b0e36258fdc0dede98fc 30.9 MB
md5:ffbe2966465032ddd160e8e8f91206be 214.7 MB
md5:74c14f956802d15d304c883c1ab0049d 190.5 MB
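
The md5 values listed above can be used to check that a downloaded copy is intact. The following is a minimal sketch, assuming a POSIX shell with the GNU coreutils md5sum command available (on some systems the equivalent tool has a different name); the function name verify_md5 is hypothetical. The listing above does not show which checksum belongs to which file, so the pairing has to be taken from the Zenodo record itself.

```shell
# Hypothetical helper (not from the original record): succeed only if
# the MD5 digest of a file equals the expected hex string.
verify_md5() {
  file=$1
  expected=$2
  # md5sum prints "<digest>  <file name>"; keep only the digest.
  actual=$(md5sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}

# Usage: pass the file path and the hex digest without the "md5:" prefix.
#   verify_md5 path/to/downloaded_file expected_md5_hex && echo "checksum OK"
```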