Published July 25, 2025 | Version v1
Dataset | Open access

Generated tautomeric forms for NCI structure dataset

  • 1. University of Plovdiv "Paisii Hilendarski"
  • 2. Plovdiv University
  • 3. Ideaconsult Ltd

Description

Structures containing more than 60 heavy atoms and more than 4 rings were removed, leaving a base dataset of 70,878 structures. These were pre-processed with ChemAxon Standardizer version 5.12.2: SMILES strings were extracted from the SDF files, aromatic structures were kekulized, explicit hydrogen atoms were converted to implicit ones, and stereo information was removed. All tautomeric forms of each structure were then generated with the Ambit-Tautomer software [https://doi.org/10.1002/minf.201200133] using the IA-DFS algorithm (an incremental approach based on depth-first search), with tautomeric rules for 1,3 and 1,5 hydrogen shifts and with removal of topologically equivalent atoms and allene atoms. In total, 1,379,518 tautomeric forms were generated for the dataset.

Files

dataset_NCI.csv

Files (85.3 MB)

  • 3.2 MB, md5:eb97ba19df97ebdeffa9f47dce508315
  • 82.0 MB, md5:ac570ead211c9eae009267e475c1f4df
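A downloaded file can be checked against the published MD5 checksums. The following is a minimal sketch using only Python's standard `hashlib`; because the record lists the checksums by size rather than by file name, the sketch simply reports which listed entry (if any) a local file matches. The helper names `md5_of` and `match_published` are hypothetical.

```python
import hashlib
from typing import Optional

# Checksums as listed on the record page, keyed to the file sizes shown there.
PUBLISHED_MD5 = {
    "eb97ba19df97ebdeffa9f47dce508315": "3.2 MB file",
    "ac570ead211c9eae009267e475c1f4df": "82.0 MB file",
}


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def match_published(path: str) -> Optional[str]:
    """Return the listed entry whose checksum matches the file, or None."""
    return PUBLISHED_MD5.get(md5_of(path))
```

For example, `match_published("dataset_NCI.csv")` returns the label of the matching entry, or `None` if the download is incomplete or corrupted.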