There is a newer version of this record available.

Dataset Open Access

PubChemLite tier1 + predicted CCS from CCSbase

LCSB-ECI; Schymanski, Emma; Kondic, Todor; PubChem Team; Bolton, Evan; Thiessen, Paul; Zhang, Jeff; CCSbase Team; Krinsky, Ally; Ross, David H.; Xu, Libin

PubChemLite is a subset of PubChem (https://pubchem.ncbi.nlm.nih.gov/) selected from major categories of the Table of Contents page at the PubChem Classification Browser (https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72). This version of tier1 (see original dataset here: DOI 10.5281/zenodo.3548653) has predicted collision cross section (CCS) values for 6 adducts provided by Libin Xu and team at CCSbase (https://ccsbase.net/).

tier1 contains 363,911 compounds (as of 14 Jan 2020) compiled from 8 categories: AgroChemInfo, BioPathway, DrugMedicInfo, FoodRelated, PharmacoInfo, SafetyInfo, ToxicityInfo, KnownUse.

Predicted CCS values are provided for the following adducts: [M+H]+, [M+K]+, [M+NH4]+, [M+Na-2H]-, [M+Na]+, [M-H]-

Details on the CCS prediction are given here: Ross, D. H., Cho, J. H. & Xu, L. Anal. Chem. (2020). doi:10.1021/acs.analchem.9b05772

PubChem CIDs have been collapsed by InChIKey first block, reporting the structure from the most annotated CID plus the related CIDs. Entries that would be ignored by MetFrag (salts, disconnected substances) or cause errors (e.g. transition metals) have been removed. The Patent and PubMed ID counts are extracted from files on the PubChem FTP site. The "AnnoTypeCount" column counts how many of the categories are represented; each subsequent column (named per category) counts the annotation categories available in the next sub-category of that TOC entry.
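The collapsing step described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the file: the record fields (`cid`, `inchikey`, `anno_count`) are hypothetical names, the InChIKeys are made-up placeholders, and "most annotated" is assumed to mean the highest annotation count.

```python
from collections import defaultdict


def collapse_by_first_block(records):
    """Group records by InChIKey first block and keep the most annotated one.

    Each record is a dict with the hypothetical keys 'cid', 'inchikey'
    and 'anno_count' (these are not the column names of the CSV file).
    """
    groups = defaultdict(list)
    for rec in records:
        # The first block is the part of the InChIKey before the first hyphen.
        first_block = rec["inchikey"].split("-")[0]
        groups[first_block].append(rec)

    collapsed = []
    for first_block, members in groups.items():
        # Assumption: "most annotated" means the largest annotation count.
        best = max(members, key=lambda r: r["anno_count"])
        related = [r["cid"] for r in members if r["cid"] != best["cid"]]
        collapsed.append(
            {
                "first_block": first_block,
                "cid": best["cid"],
                "inchikey": best["inchikey"],
                "related_cids": related,
            }
        )
    return collapsed


# Placeholder records: two share a first block, one stands alone.
example = [
    {"cid": 1, "inchikey": "AAAAAAAAAAAAAA-UHFFFAOYSA-N", "anno_count": 2},
    {"cid": 2, "inchikey": "AAAAAAAAAAAAAA-BBBBBBBBSA-N", "anno_count": 5},
    {"cid": 3, "inchikey": "CCCCCCCCCCCCCC-UHFFFAOYSA-N", "anno_count": 1},
]
result = collapse_by_first_block(example)
```

In this toy input, CIDs 1 and 2 share a first block, so only the more annotated CID 2 is reported and CID 1 is kept as a related CID.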

These files can be used "as is" as a localCSV for MetFrag Command Line (https://ipb-halle.github.io/MetFrag/). Please do NOT upload these files directly to the web interface: they are too large, and they will be available there in a drop-down menu.
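As a rough sketch of the localCSV use, the snippet below builds the text of a MetFrag Command Line parameter file that points at the downloaded CSV. The parameter names follow the MetFrag CL documentation as best recalled and should be checked against it; every value (peak list path, neutral mass, mass tolerances, sample name, results directory) is a placeholder assumption, and the helper `build_metfrag_params` is a hypothetical name.

```python
def build_metfrag_params(local_csv, peak_list, neutral_mass, results_dir, sample_name):
    """Return the text of a MetFrag CL parameter file (illustrative values only)."""
    params = {
        # Use a local CSV file as the candidate database.
        "MetFragDatabaseType": "LocalCSV",
        "LocalDatabasePath": str(local_csv),
        # MS/MS peak list and neutral monoisotopic mass of the precursor.
        "PeakListPath": str(peak_list),
        "NeutralPrecursorMass": f"{neutral_mass:.6f}",
        # Placeholder tolerances (ppm / Da); adjust to the instrument.
        "DatabaseSearchRelativeMassDeviation": "5",
        "FragmentPeakMatchAbsoluteMassDeviation": "0.001",
        "FragmentPeakMatchRelativeMassDeviation": "5",
        # Assumed here: positive mode, [M+H]+ precursor.
        "PrecursorIonMode": "1",
        "IsPositiveIonMode": "True",
        "MetFragScoreTypes": "FragmenterScore",
        "MetFragScoreWeights": "1.0",
        "MetFragCandidateWriter": "CSV",
        "SampleName": sample_name,
        "ResultsPath": str(results_dir),
    }
    return "\n".join(f"{key} = {value}" for key, value in params.items()) + "\n"


# Hypothetical paths and mass, for illustration only.
param_text = build_metfrag_params(
    local_csv="PubChemLite_14Jan2020_tier1_CCSbase.csv",
    peak_list="example_peaks.txt",
    neutral_mass=194.080376,
    results_dir="results",
    sample_name="example_query",
)
```

The returned text can be written to a file and passed to MetFrag Command Line; which score types and filters are appropriate depends on the experiment.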

Further details on PubChemLite will be given in a manuscript that is currently in preparation.

Please cite this data source and the CCS article DOI: 10.1021/acs.analchem.9b05772 when using this dataset.

Files (209.5 MB)
Name: PubChemLite_14Jan2020_tier1_CCSbase.csv
Size: 209.5 MB
md5: 66be0f164c8d01b8369392c7d7575709
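
The md5 checksum listed for the file can be used to confirm that a download completed intact. A minimal sketch using Python's standard `hashlib` module follows; the helper names and the assumption that the file sits in the current directory are illustrative.

```python
import hashlib

# Checksum as listed for PubChemLite_14Jan2020_tier1_CCSbase.csv.
EXPECTED_MD5 = "66be0f164c8d01b8369392c7d7575709"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole file never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `matches_expected("PubChemLite_14Jan2020_tier1_CCSbase.csv")` should return `True` for an intact copy of this version of the file.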