Published May 27, 2023 | Version v1
Dataset Restricted

SMILES-18 dataset

Description

A dataset of 18,322,500 organic molecules encoded as SMILES strings, collected from the PubChem database.

List of characters included in the dataset:

Description        SMILES Characters
Atoms              "C", "O", "N", "P", "S", "F", "Cl", "Br", "I", "Si", "B"
Branches           "(", ")"
Rings              "1", "2", "3", "4", "5", "6", "7", "8", "9"
Bonds              "=", "#"
Ions               "+", "-"
Stereochemistry    "/", "\"
Miscellaneous      "[", "]"
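
As an illustration of how the character list above can serve as a token vocabulary, here is a minimal Python sketch of a tokenizer. It is not taken from the dataset or the associated paper: the names TOKENS, TOKEN_RE, and tokenize are hypothetical, and the only assumption is that a SMILES string in the dataset is made up entirely of the 30 symbols listed. Two-letter atoms are matched before single-letter ones so that "Cl" is not read as "C" followed by an unknown character.

```python
import re

# Vocabulary copied from the character list above (30 symbols).
# Two-letter atoms come first so the regex prefers "Cl", "Br", "Si"
# over the single-letter atoms "C", "B", "S".
TOKENS = [
    "Cl", "Br", "Si",                               # two-letter atoms
    "C", "O", "N", "P", "S", "F", "I", "B",         # one-letter atoms
    "(", ")",                                       # branches
    "1", "2", "3", "4", "5", "6", "7", "8", "9",    # ring closures
    "=", "#",                                       # bonds
    "+", "-",                                       # ions
    "/", "\\",                                      # stereochemistry
    "[", "]",                                       # miscellaneous
]

# Alternation is tried left to right, so the order of TOKENS matters.
TOKEN_RE = re.compile("|".join(re.escape(t) for t in TOKENS))


def tokenize(smiles):
    """Split a SMILES string into symbols from TOKENS.

    Raises ValueError on any character outside the listed vocabulary
    (for example lowercase aromatic atoms, which are not in the list).
    """
    tokens = []
    pos = 0
    while pos < len(smiles):
        match = TOKEN_RE.match(smiles, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {smiles[pos]!r} at position {pos}"
            )
        tokens.append(match.group())
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("CC(=O)Cl"))
    # ['C', 'C', '(', '=', 'O', ')', 'Cl']
```

Rejecting unknown characters, rather than skipping them, makes it easy to notice strings that fall outside the listed vocabulary.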

 

For further information, please refer to the paper published in the Journal of Chemical Information and Modeling (https://doi.org/10.1021/acs.jcim.3c01548).

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Request access

Requests for access to these files are accepted only under the following condition:

This dataset is associated with a manuscript that is under consideration for publication, and we would like to keep it confidential until that process is complete. We are happy to provide access during peer review on the understanding that the data will remain confidential and will be used only for peer review purposes.
