There is a newer version of this record available.

Dataset Open Access

PubChemLite tier0 and tier1

Bolton, Evan; Schymanski, Emma

PubChemLite is a subset of PubChem (https://pubchem.ncbi.nlm.nih.gov/) selected from major categories of the Table of Contents page at the PubChem Classification Browser (https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72). So far we are providing two "flavours":

tier0 is 315,843 compounds compiled from 7 categories: AgroChemInfo, DrugMedicInfo, FoodRelated, PharmacoInfo, SafetyInfo, ToxicityInfo, KnownUse

tier1 is 361,976 compounds compiled from 8 categories (tier0 + BioPathway): AgroChemInfo, BioPathway, DrugMedicInfo, FoodRelated, PharmacoInfo, SafetyInfo, ToxicityInfo, KnownUse

PubChem CIDs have been collapsed by InChIKey first block; each entry reports the structure of the most annotated CID, plus the related CIDs. The fingerprint (FP) indicates whether annotation exists for each category for that chemical (categories in the order given in the latter columns of the file). The Patent and PubMed ID counts are extracted from files on the PubChem FTP site. An additional "FPSum" term counts how many categories are annotated in the fingerprint. Entries that would be ignored by MetFrag (salts, disconnected substances) or cause errors (e.g. transition metals) have been removed.
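As an illustration of the collapsing described above, here is a minimal Python sketch. It is not the procedure used to build the files. It assumes that the fingerprint is a string of "0"/"1" characters, that "most annotated" means the largest number of set bits, and that records carry the hypothetical keys `cid`, `inchikey` and `fp`; the actual column names and selection rule in the CSV files may differ.

```python
from collections import defaultdict


def first_block(inchikey):
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.split("-")[0]


def fp_sum(fingerprint):
    """Count the set bits in a '0'/'1' fingerprint string (assumed format)."""
    return fingerprint.count("1")


def collapse(records):
    """Group records by InChIKey first block.

    For each group, keep the record with the highest fp_sum (an assumed
    stand-in for "most annotated") and list the other CIDs as related.
    The keys 'cid', 'inchikey' and 'fp' are hypothetical names.
    """
    groups = defaultdict(list)
    for record in records:
        groups[first_block(record["inchikey"])].append(record)

    collapsed = {}
    for block, members in groups.items():
        best = max(members, key=lambda r: fp_sum(r["fp"]))
        collapsed[block] = {
            "cid": best["cid"],
            "fp": best["fp"],
            "fp_sum": fp_sum(best["fp"]),
            "related_cids": [r["cid"] for r in members if r["cid"] != best["cid"]],
        }
    return collapsed
```

Calling `collapse` on a list of such records returns one entry per first block, so stereoisomers and other forms that share a first block end up under a single representative CID.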

These files can be used "as is" as localCSV for MetFrag Command Line (https://msbi.ipb-halle.de/MetFrag/). Please do NOT upload these files directly to the web interface: they are too large, and will instead be made available there in a drop-down menu.

Further details will be given in a manuscript that is currently in preparation.

Files (328.6 MB)

PubChemLite_18Nov2019_tier0.csv (150.1 MB)
md5:1891d37da8ce62f161b682a0bdd8343f

PubChemLite_18Nov2019_tier1.csv (178.5 MB)
md5:e95036aae6568aa4e6937c764f0ffadd
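
The md5 values listed with the files can be used to check that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of_file` and `verify` are illustrative, and the files are assumed to have been saved locally under their original names.

```python
import hashlib

# Checksums as listed on this record.
EXPECTED_MD5 = {
    "PubChemLite_18Nov2019_tier0.csv": "1891d37da8ce62f161b682a0bdd8343f",
    "PubChemLite_18Nov2019_tier1.csv": "e95036aae6568aa4e6937c764f0ffadd",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify("PubChemLite_18Nov2019_tier0.csv", EXPECTED_MD5["PubChemLite_18Nov2019_tier0.csv"])` should return `True` for an uncorrupted download. Reading in chunks keeps memory use low for files of this size.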
Statistics (all versions / this version)
Views: 533 / 257
Downloads: 524 / 170
Data volume: 87.5 GB / 26.7 GB
Unique views: 465 / 234
Unique downloads: 386 / 128
