Datasets used for Homologous Series Classification: NORMAN-SLE, PubChemLite for Exposomics
Authors/Creators
- 1. Friedrich-Schiller-University
- 2. Luxembourg Centre for Systems Biomedicine
Description
Archive of datasets used for the Homologous Series Classification project:
Lai, A., Schaub, J., Steinbeck, C., Schymanski, E. L. An Algorithm to Classify Homologous Series. in prep
COCONUT
COCONUT_DB_2021-11.smi was downloaded from https://coconut.naturalproducts.net/download on 2022-03-16 and converted to COCONUT_DB_2021-11.txt by running the following in a UNIX command line:
> tr -s '[:blank:]' ',' <COCONUT_DB_2021-11.smi >COCONUT_DB_2021-11.txt
Then, the following header was added to COCONUT_DB_2021-11.txt manually in a text editor: "SMILES, Name"
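For readers without a UNIX shell, the two steps above (the tr conversion followed by the manual header) can be approximated in Python. This is a minimal sketch, not the procedure actually used for the archived file. The function names and the example identifiers are hypothetical, and UTF-8 encoding is assumed.

```python
import re
from pathlib import Path

HEADER = "SMILES, Name"  # header text as quoted in this description


def smi_to_csv_text(smi_text: str, header: str = HEADER) -> str:
    """Approximate `tr -s '[:blank:]' ','` and prepend the header line.

    With -s, tr squeezes repeated commas after translation, so any run of
    spaces, tabs and commas collapses to a single comma. Newlines are kept.
    """
    converted = re.sub(r"[ \t,]+", ",", smi_text)
    return header + "\n" + converted


def convert_file(src: str, dst: str) -> None:
    """Read a .smi file and write the comma-separated .txt file (hypothetical helper)."""
    text = Path(src).read_text(encoding="utf-8")  # encoding is an assumption
    Path(dst).write_text(smi_to_csv_text(text), encoding="utf-8")


# Example with made-up identifiers:
sample = "CCO\tCNP0000001\nC1CCCCC1  CNP0000002\n"
result = smi_to_csv_text(sample)
```

Calling convert_file("COCONUT_DB_2021-11.smi", "COCONUT_DB_2021-11.txt") would then write the converted file in one step, assuming the input file is in the working directory.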
NORMAN-SLE
pubchem_norman_sle_tree_parentcid_98116_2022-03-21_from115115.csv was downloaded on 2022-03-21 from the 'NORMAN Suspect List Exchange Classification' tree in the PubChem Classification Browser (https://pubchem.ncbi.nlm.nih.gov/classification/#hid=101), after converting the entries into CIDs using Operator Type: 'Parent CID' in the PubChem Identifier Exchange Service via Entrez to remove salts, charged ions, and mixtures. Further details are given in the publication.
PubChemLite for Exposomics
PubChemLite_exposomics_20220225.csv was downloaded from https://zenodo.org/record/6383860. The copy provided here is unchanged from that Zenodo record and is included only for completeness.
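One way to confirm that a local copy matches the file from the original Zenodo record is to compare checksums. The sketch below is only an illustration under that assumption. The helper names and any file paths passed to them are hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a: str, path_b: str) -> bool:
    """True if both files hash to the same digest."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, same_content("PubChemLite_exposomics_20220225.csv", "from_original_record.csv") would return True if the two files hash to the same digest, where the second path is a placeholder for a fresh download.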