Published May 27, 2023 | Version v1
Dataset
Restricted
SMILES-18 dataset
Description
A dataset of 18,322,500 records collected from the PubChem database. Each record is an organic molecule encoded as a SMILES string.
SMILES symbols that occur in the dataset (`Cl`, `Br` and `Si` are two-character tokens):

| Category | SMILES symbols |
|---|---|
| Atoms | `C`, `O`, `N`, `P`, `S`, `F`, `Cl`, `Br`, `I`, `Si`, `B` |
| Branches | `(`, `)` |
| Ring closures | `1`, `2`, `3`, `4`, `5`, `6`, `7`, `8`, `9` |
| Bonds | `=`, `#` |
| Charges | `+`, `-` |
| Stereochemistry | `/`, `\` |
| Bracket atoms | `[`, `]` |
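Because some symbols in the table span two characters, a string from the dataset is best split into tokens rather than single characters. The following is a minimal sketch, assuming a greedy longest-match tokenizer over the listed symbols is adequate. It is not necessarily how the accompanying paper tokenizes the data, and the names `VOCAB`, `tokenize` and `in_vocabulary` are hypothetical.

```python
import re

# Hypothetical vocabulary built from the symbol table of this record.
ATOMS = ["C", "O", "N", "P", "S", "F", "Cl", "Br", "I", "Si", "B"]
BRANCHES = ["(", ")"]
RING_CLOSURES = list("123456789")
BONDS = ["=", "#"]
CHARGES = ["+", "-"]
STEREO = ["/", "\\"]
BRACKETS = ["[", "]"]
VOCAB = ATOMS + BRANCHES + RING_CLOSURES + BONDS + CHARGES + STEREO + BRACKETS

# Longest tokens first, so "Cl" wins over "C", "Br" over "B", "Si" over "S".
_TOKEN_RE = re.compile(
    "|".join(re.escape(t) for t in sorted(VOCAB, key=len, reverse=True))
)


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into vocabulary tokens (greedy longest match).

    Raises ValueError on the first character that no token can start with.
    """
    tokens = []
    pos = 0
    while pos < len(smiles):
        m = _TOKEN_RE.match(smiles, pos)  # anchored match at position `pos`
        if m is None:
            raise ValueError(
                f"symbol {smiles[pos]!r} at position {pos} is not in the vocabulary"
            )
        tokens.append(m.group())
        pos = m.end()
    return tokens


def in_vocabulary(smiles: str) -> bool:
    """Return True if the whole string can be tokenized with VOCAB."""
    try:
        tokenize(smiles)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(tokenize("CC(=O)Cl"))
    print(in_vocabulary("c1ccccc1"))  # lowercase aromatic atoms are not listed
```

Sorting the alternatives by length is what makes the two-character atoms come out as single tokens. Python's `re` alternation tries branches left to right, so an unsorted pattern could split `Cl` into `C` followed by an unmatched `l`.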
For further information, please refer to the paper published in the Journal of Chemical Information and Modeling (https://doi.org/10.1021/acs.jcim.3c01548).