Published January 7, 2026 | Version 0.1
Dataset Open

Chemical identifiers and molecular structures for chemicals with known CASRN from PubChem

  • 1. Technical University of Denmark

Description

`pubchem_id.db` is a SQLite database containing about 1.6 million PubChem compounds with their identifiers and chemical properties. It is built from the `PubChem_CAS_202601.csv` file and supports fast local lookup for identifier conversion.
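Since the table and column names are not listed in this record, a reasonable first step after downloading is to inspect the schema. The sketch below uses only the Python standard library; the helper name `list_tables` is hypothetical and is not part of PROVESID.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    # Open read-only via a file: URI so a mistyped path raises an error
    # instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for name in names:
            # Quote the table name; PRAGMA statements do not accept bound parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            # Each row of table_info describes one column; index 1 is its name.
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[name] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()
```

Calling `list_tables("pubchem_id.db")` on the downloaded file should show which tables and columns are available before any lookup queries are written.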

This version of the database is based on the CSV file downloaded from https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72, which covers compounds that have CAS entries in PubChem. To download the file yourself, navigate to "Names and Identifiers -> Other Identifiers -> CAS".
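For readers who download the CSV themselves, the following is a minimal sketch of loading a headered CSV file into a SQLite table. It is not the build script used for `pubchem_id.db`: the function name `csv_to_sqlite` and the default table name `compounds` are assumptions for illustration, the column names are taken from whatever header the CSV has, and every column is stored as TEXT.

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table="compounds"):
    """Load a CSV with a header row into a SQLite table; return rows inserted."""

    def quote(identifier):
        # Double any embedded quotes so header names are safe as SQL identifiers.
        return '"' + identifier.replace('"', '""') + '"'

    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh)
        header = next(reader)  # first row supplies the column names
        columns = ", ".join(f"{quote(name)} TEXT" for name in header)
        placeholders = ", ".join("?" for _ in header)
        rows = (row for row in reader if row)  # skip blank lines

        conn = sqlite3.connect(db_path)
        try:
            with conn:  # commit on success, roll back on error
                conn.execute(
                    f"CREATE TABLE IF NOT EXISTS {quote(table)} ({columns})"
                )
                cursor = conn.executemany(
                    f"INSERT INTO {quote(table)} VALUES ({placeholders})", rows
                )
                return cursor.rowcount
        finally:
            conn.close()
```

This sketch assumes every data row has the same number of fields as the header; a ragged row raises an error and rolls back the insert. A real build would likely also assign column types and add indexes on the identifier columns used for lookup.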

Chemical structure data and identifiers were retrieved from PubChem on 2026-01-07. These data points are considered factual and/or public domain.

Files

Files (2.3 GB)

md5:f43da632f5be6c3582db00e65b474da8
Size: 2.3 GB
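Because the file is large, it may be worth checking the download against the MD5 checksum listed above. A minimal sketch using the Python standard library follows; the function name `md5_of_file` is hypothetical, and the file is read in chunks so the full 2.3 GB never needs to fit in memory.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "f43da632f5be6c3582db00e65b474da8"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `md5_of_file(path_to_download)` with `EXPECTED_MD5` indicates whether the file arrived intact; a mismatch usually means an incomplete or corrupted download.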

Additional details

Dates

Created
2026-01-07

Software

Repository URL
https://github.com/USEtox/PROVESID
Programming language
Python
Development Status
Active