PubChem CID-SMILES topology classification snapshot
Authors/Creators
Description
Topology annotations for the current PubChem CID-SMILES snapshot.
The Parquet artifact stores one row per PubChem CID. Each row records the connected-component count, the exact diameter (for connected molecules only), triangle and square motif counts, mean local and square clustering coefficients, and the following topology predicates computed with smiles-parser and geometric-traits: tree, forest, cactus, chordal, planar, outerplanar, k23_homeomorph, k33_homeomorph, k4_homeomorph, bipartite.
The JSON sidecar stores aggregate counts, parse and topology error totals, and run metadata. The SVG infographic provides an accessible visual summary of the run.
Source snapshot URL: https://ftp.ncbi.nlm.nih.gov/pubchem/Compound/Extras/CID-SMILES.gz
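To make a few of the annotations named in the description concrete, here is a minimal pure-Python sketch that derives the component count, the forest, tree and bipartite predicates, the exact diameter, and the triangle count from an atom-bond graph. It is not the smiles-parser or geometric-traits implementation used for the dataset. The function names and dictionary keys are illustrative assumptions and do not necessarily match the Parquet column names. The input is assumed to be a simple graph (no self-loops) given as an atom count and a list of bonds.

```python
from collections import deque


def bfs_distances(source, adj):
    """Shortest-path distances (in bonds) from source to every reachable atom."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def annotate(n_atoms, bonds):
    """Compute a handful of topology annotations for one molecular graph.

    Keys of the returned dict are hypothetical names chosen for this sketch.
    """
    adj = [set() for _ in range(n_atoms)]
    for u, v in bonds:
        adj[u].add(v)
        adj[v].add(u)
    n_edges = sum(len(neigh) for neigh in adj) // 2

    # Connected components, and a parity check for bipartiteness:
    # a graph is bipartite iff no edge joins two atoms whose BFS
    # distances from the component root have the same parity.
    seen = set()
    n_components = 0
    bipartite = True
    for start in range(n_atoms):
        if start in seen:
            continue
        n_components += 1
        dist = bfs_distances(start, adj)
        seen.update(dist)
        for u in dist:
            for v in adj[u]:
                if dist[u] % 2 == dist[v] % 2:
                    bipartite = False

    # A graph is a forest iff edges == nodes - components.
    forest = n_edges == n_atoms - n_components
    tree = forest and n_components == 1

    # Exact diameter via BFS from every atom; only defined when connected.
    diameter = None
    if n_components == 1:
        diameter = max(max(bfs_distances(u, adj).values()) for u in range(n_atoms))

    # Each triangle is found once per edge, hence the division by 3.
    triangles = sum(
        len(adj[u] & adj[v]) for u in range(n_atoms) for v in adj[u] if u < v
    ) // 3

    return {
        "components": n_components,
        "forest": forest,
        "tree": tree,
        "bipartite": bipartite,
        "diameter": diameter,
        "triangles": triangles,
    }


if __name__ == "__main__":
    # Six-membered ring (the heavy-atom skeleton of benzene).
    print(annotate(6, [(i, (i + 1) % 6) for i in range(6)]))
    # Two disconnected atoms, as in a simple salt: diameter is left undefined.
    print(annotate(2, []))
```

The all-pairs BFS is quadratic in the number of atoms, which is acceptable for a single small molecule but is only meant to show what "exact diameter for connected molecules" means, not how the dataset was produced at scale.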
Notes
Files
pubchem-topology-summary.json
Total size: 569.1 MB
| Checksum | Size |
|---|---|
| md5:b29c7c0ec3ad575cdb7f725729e4d91d | 44.3 kB |
| md5:3e6657d138c99b902d3570e64320d3cd | 63.2 kB |
| md5:036e60b44623e31adb7e101c6e0b1f32 | 569.0 MB |
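The md5: values listed for each file can be used to check a download for corruption. The following is a minimal sketch using Python's standard hashlib module. The helper names are hypothetical, and the file path and checksum are passed on the command line because this listing does not show which file name belongs to which checksum.

```python
import hashlib
import sys


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or bare hex."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python verify_md5.py <downloaded file> md5:<hex digest>
    local_path, listed_checksum = sys.argv[1], sys.argv[2]
    print("OK" if matches_listing(local_path, listed_checksum) else "MISMATCH")
```

Reading in chunks keeps memory use flat, which matters for the 569.0 MB file.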