Published March 3, 2025 | Version PubChemLite-CCSbase-20250228
Dataset | Open access

PubChemLite for Exposomics + predicted CCS from CCSbase - February 2025

  • 1. University of Luxembourg
  • 2. National Center for Biotechnology Information
  • 3. University of Washington
  • 4. Pacific Northwest National Laboratory

Description

PubChemLite is a subset of PubChem (https://pubchem.ncbi.nlm.nih.gov/) selected from major categories of the Table of Contents (TOC) page in the PubChem Classification Browser (https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72). This version of PubChemLite for Exposomics includes predicted collision cross section (CCS) values for eight adducts, provided by Libin Xu and team at CCSbase (https://ccsbase.net/).

PubChemLite for Exposomics is compiled from 10 categories: AgroChemInfo, BioPathway, DrugMedicInfo, FoodRelated, PharmacoInfo, SafetyInfo, ToxicityInfo, KnownUse, DisorderDisease, and Identification.

CCS adducts provided are: [M+H]+, [M+K]+, [M+NH4]+, [M+Na-2H]-, [M+Na]+, [M-H]-, [M]+, [M]-
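One way the per-adduct predictions might be used is to compare a measured CCS with each candidate's predicted value for the observed adduct and keep only candidates within a chosen tolerance. The sketch below is illustrative only: it assumes each candidate has already been read into a dict with a hypothetical `ccs` mapping from adduct label to predicted CCS (the actual CSV column names are not given here), and the percent tolerance is a user choice, not a value recommended by this record. Requires Python 3.9+.

```python
# The eight adducts for which this dataset lists predicted CCS values.
ADDUCTS = ("[M+H]+", "[M+K]+", "[M+NH4]+", "[M+Na-2H]-",
           "[M+Na]+", "[M-H]-", "[M]+", "[M]-")


def ccs_deviation_percent(measured: float, predicted: float) -> float:
    """Signed deviation of a measured CCS from a predicted CCS, in percent of the prediction."""
    return 100.0 * (measured - predicted) / predicted


def filter_by_ccs(candidates: list[dict], adduct: str, measured: float,
                  tolerance_pct: float) -> list[tuple[dict, float]]:
    """Keep candidates whose predicted CCS for `adduct` is within tolerance_pct of `measured`.

    Each candidate is assumed to carry a hypothetical "ccs" dict mapping adduct labels
    to predicted CCS values (same units as `measured`); candidates without a value
    for the requested adduct are skipped.
    """
    if adduct not in ADDUCTS:
        raise ValueError(f"no predicted CCS for adduct {adduct!r} in this dataset")
    kept = []
    for cand in candidates:
        predicted = cand.get("ccs", {}).get(adduct)
        if predicted is None:
            continue
        deviation = ccs_deviation_percent(measured, predicted)
        if abs(deviation) <= tolerance_pct:
            kept.append((cand, deviation))
    return kept
```

Returning the signed deviation alongside each kept candidate lets a downstream step rank or score candidates rather than only filter them.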

Details of the CCS prediction are given in Ross et al. (2020) Analytical Chemistry, DOI: 10.1021/acs.analchem.9b05772

PubChemLite is described in Schymanski et al. (2021) J. Cheminformatics, DOI: 10.1186/s13321-021-00489-0

These joint efforts are described in Elapavalore et al. (2025) ES&T Letters, DOI: 10.1021/acs.estlett.4c01003

PubChem CIDs have been collapsed by the first block of the InChIKey: for each group, the structure of the most-annotated CID is reported, together with the related CIDs. Entries that MetFrag would ignore (salts, disconnected substances) or that cause errors (e.g., transition metals) have been removed. The Patent and PubMed ID counts are extracted from files on the PubChem FTP site. The "AnnoTypeCount" column counts how many of the categories are represented; each subsequent column (named per category) counts the number of annotation categories available in the next sub-category of that TOC entry.
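The collapsing step can be sketched as below. This is an illustration under stated assumptions, not the PubChemLite build code: "most annotated" is taken to mean the highest total annotation count across categories (ties going to the lowest CID), and AnnoTypeCount is computed as the number of the ten categories with a nonzero count. The `Record` type, its `category_counts` field, and the CIDs and InChIKeys in `EXAMPLE` are hypothetical placeholders. Requires Python 3.9+.

```python
from dataclasses import dataclass, field

# The ten PubChemLite for Exposomics categories named in this record.
CATEGORIES = (
    "AgroChemInfo", "BioPathway", "DrugMedicInfo", "FoodRelated", "PharmacoInfo",
    "SafetyInfo", "ToxicityInfo", "KnownUse", "DisorderDisease", "Identification",
)


@dataclass
class Record:
    cid: int
    inchikey: str
    # Hypothetical per-category annotation counts, e.g. {"SafetyInfo": 3}.
    category_counts: dict[str, int] = field(default_factory=dict)

    def total_annotations(self) -> int:
        return sum(self.category_counts.values())


def first_block(inchikey: str) -> str:
    """Return the 14-character skeleton (connectivity) block of a standard InChIKey."""
    return inchikey.split("-")[0]


def anno_type_count(category_counts: dict[str, int]) -> int:
    """Assumed meaning of AnnoTypeCount: number of categories with a nonzero count."""
    return sum(1 for cat in CATEGORIES if category_counts.get(cat, 0) > 0)


def collapse(records: list[Record]) -> list[dict]:
    """Group records by InChIKey first block and keep the most-annotated one per group."""
    groups: dict[str, list[Record]] = {}
    for rec in records:
        groups.setdefault(first_block(rec.inchikey), []).append(rec)

    rows = []
    for block, members in groups.items():
        # Assumption: highest total count wins; ties go to the lowest CID.
        best = max(members, key=lambda r: (r.total_annotations(), -r.cid))
        rows.append({
            "FirstBlock": block,
            "CID": best.cid,
            "InChIKey": best.inchikey,
            "RelatedCIDs": sorted(r.cid for r in members if r.cid != best.cid),
            "AnnoTypeCount": anno_type_count(best.category_counts),
        })
    return rows


# Made-up placeholder records (not real CIDs or InChIKeys).
EXAMPLE = [
    Record(101, "AAAAAAAAAAAAAA-BBBBBBBBSA-N", {"SafetyInfo": 3, "KnownUse": 1}),
    Record(102, "AAAAAAAAAAAAAA-CCCCCCCCSA-N", {"Identification": 1}),
    Record(200, "DDDDDDDDDDDDDD-UHFFFAOYSA-N"),
]
```

With these placeholders, `collapse(EXAMPLE)` yields two rows: CID 101 with CID 102 listed as related (the two share a first block but differ in stereochemistry), and CID 200 on its own.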

These files can be used directly ("as is") as a localCSV database for MetFrag Command Line (https://ipb-halle.github.io/MetFrag/). Please do NOT upload these files directly to the MetFrag web interface: they are too large, and they will be made available there in a drop-down menu.
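A minimal sketch of pointing MetFrag Command Line at this CSV is shown below. It only builds the text of a `key = value` parameter file; the parameter names follow MetFrag CL example parameter files as far as I know them, and the `PrecursorIonMode` codes (1 for [M+H]+, -1 for [M-H]-) and the mass-deviation values are assumptions for illustration, not recommendations. Check every key against the MetFrag documentation before use. The function name `build_metfrag_params` is hypothetical.

```python
def build_metfrag_params(local_csv: str, peaklist: str, neutral_mass: float,
                         sample_name: str, results_path: str = ".",
                         positive: bool = True) -> str:
    """Return MetFrag CL parameter-file text that uses a local CSV as the candidate database."""
    params = {
        "PeakListPath": peaklist,                    # fragment peak list (m/z, intensity)
        "MetFragDatabaseType": "LocalCSV",           # read candidates from a local CSV file
        "LocalDatabasePath": local_csv,              # e.g. the PubChemLite CCSbase CSV
        "NeutralPrecursorMass": f"{neutral_mass:.6f}",
        "DatabaseSearchRelativeMassDeviation": "5",  # ppm; illustrative value
        "FragmentPeakMatchAbsoluteMassDeviation": "0.001",
        "FragmentPeakMatchRelativeMassDeviation": "5",
        # Assumed codes: 1 = [M+H]+, -1 = [M-H]-.
        "PrecursorIonMode": "1" if positive else "-1",
        "IsPositiveIonMode": "True" if positive else "False",
        "MetFragCandidateWriter": "CSV",
        "SampleName": sample_name,
        "ResultsPath": results_path,
    }
    return "\n".join(f"{key} = {value}" for key, value in params.items()) + "\n"
```

The returned text can be written to a file (for example with `pathlib.Path("params.txt").write_text(...)`) and passed to MetFrag CL as described in its documentation; keeping the builder free of file I/O makes it easy to inspect or template per sample.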

Further details are given in Schymanski et al. (2021) DOI: 10.1186/s13321-021-00489-0 and Elapavalore et al. (2025) DOI: 10.1021/acs.estlett.4c01003

NOTE: The latest version of PubChemLite for Exposomics can be downloaded at DOI: 10.5281/zenodo.5995885 (currently updated monthly). This CCS-annotated file will be updated shortly after each new release.

Please cite this data source and Elapavalore et al. (2025) DOI: 10.1021/acs.estlett.4c01003 when using this dataset.

Notes

Please cite this data source, as well as the CCSbase and PubChemLite papers, when using these data. More details are available at DOI: 10.1021/acs.estlett.4c01003

Files

Name | Size | Checksum
PubChemLite_CCSbase_20250228.csv | 194.2 MB | md5:fbbfdf415aa78458443b8aab52c2c519
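To confirm that a download of the ~194 MB CSV is complete and uncorrupted, its MD5 checksum can be compared with the value listed for the file. The sketch below uses only Python's standard `hashlib` and reads the file in chunks so it is never held in memory all at once; the function names are illustrative. Requires Python 3.9+ (for `str.removeprefix`).

```python
import hashlib
from collections.abc import Iterable, Iterator

# Checksum listed on this record for PubChemLite_CCSbase_20250228.csv.
EXPECTED = "md5:fbbfdf415aa78458443b8aab52c2c519"


def iter_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file's bytes in 1 MiB chunks."""
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            yield chunk


def md5_hexdigest(chunks: Iterable[bytes]) -> str:
    """Compute the MD5 hex digest of a sequence of byte chunks."""
    h = hashlib.md5()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected: str = EXPECTED) -> bool:
    """Compare a file's MD5 with `expected`, ignoring case and an optional 'md5:' prefix."""
    want = expected.lower().removeprefix("md5:")
    return md5_hexdigest(iter_chunks(path)) == want
```

For example, `verify("PubChemLite_CCSbase_20250228.csv")` returns `True` only if the local file matches the listed checksum.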

Additional details