Published January 7, 2026 | Version 0.1
Dataset
Open
Chemical identifiers and molecular structures for chemicals with known CASRN from PubChem
Description
The `pubchem_id.db` file is a SQLite database containing ~1.6M PubChem compounds with their identifiers and chemical properties. It is built from the `PubChem_CAS_202601.csv` file and provides fast local lookup for identifier conversion.
This version of the database is based on the CSV file downloaded from https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72 for compounds that have CAS entries in the PubChem datasets. You can download it yourself by navigating to "Names and Identifiers -> Other Identifiers -> CAS".
Chemical structure data and identifiers were retrieved from PubChem on 2026-01-07. These data points are considered factual and/or public domain.
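Because this record does not list the table or column names inside `pubchem_id.db`, a reasonable first step after downloading is to inspect the schema before writing any lookup queries. The sketch below uses only Python's standard `sqlite3` module; the helper name `describe_sqlite` is hypothetical and is not part of the PROVESID package.

```python
import sqlite3
from contextlib import closing


def describe_sqlite(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file.

    Hypothetical helper for illustration; it assumes nothing about the
    actual schema of pubchem_id.db and simply reports what it finds.
    """
    # Open read-only so the inspection cannot modify the downloaded file.
    uri = f"file:{db_path}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is its name.
            quoted = table.replace('"', '""')
            columns = conn.execute(f'PRAGMA table_info("{quoted}")').fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
```

Calling `describe_sqlite("pubchem_id.db")` in the directory that holds the downloaded file should return the real table and column names, which can then be used to write the actual identifier-conversion queries.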
Files
(2.3 GB)
| Name | Size |
|---|---|
| md5:f43da632f5be6c3582db00e65b474da8 | 2.3 GB |
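For a 2.3 GB download it can be worth confirming that the file arrived intact by comparing its MD5 digest with the checksum listed for the file. The following is a minimal sketch using Python's standard `hashlib`; the function names and the chunk size are illustrative choices, not part of the dataset or its tooling.

```python
import hashlib

# Checksum published in the Files listing of this record.
EXPECTED_MD5 = "f43da632f5be6c3582db00e65b474da8"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole file never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

If `matches_published_md5` returns `False` for the downloaded file, the download is likely incomplete or corrupted and should be repeated.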
Additional details
Dates
- Created
- 2026-01-07
Software
- Repository URL
- https://github.com/USEtox/PROVESID
- Programming language
- Python
- Development Status
- Active