Dataset Open Access

GDB Databases

Tobias Fink; Lorenz C. Blum; Lars Ruddigkeit; Ruud van Deursen; Jean-Louis Reymond

About

GDB-11 enumerates small organic molecules with up to 11 atoms of C, N, O and F, following simple chemical stability and synthetic feasibility rules.
GDB-13 enumerates small organic molecules with up to 13 atoms of C, N, O, S and Cl, following simple chemical stability and synthetic feasibility rules. With 977 468 314 structures, GDB-13 is the largest publicly available small organic molecule database to date.

How to cite

To cite GDB-11, please reference:

Virtual exploration of the chemical universe up to 11 atoms of C, N, O, F: assembly of 26.4 million structures (110.9 million stereoisomers) and analysis for new ring systems, stereochemistry, physico-chemical properties, compound classes and drug discovery. Fink, T.; Reymond, J.-L. J. Chem. Inf. Model. 2007, 47, 342-353.

Virtual Exploration of the Small Molecule Chemical Universe below 160 Daltons. Fink, T.; Bruggesser, H.; Reymond, J.-L. Angew. Chem. Int. Ed. 2005, 44, 1504-1508.

To cite GDB-13, please reference:

970 Million Druglike Small Molecules for Virtual Screening in the Chemical Universe Database GDB-13. Blum, L. C.; Reymond, J.-L. J. Am. Chem. Soc. 2009, 131, 8732-8733.

To cite GDB-17, please reference:

Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Ruddigkeit, L.; van Deursen, R.; Blum, L. C.; Reymond, J.-L. J. Chem. Inf. Model. 2012, 52, 2864-2875.

Download

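You can download the databases and their subsets using the links provided. Unless otherwise noted, the molecules are stored as dearomatized, canonical SMILES. Files ending in .smi.gz are gzip-compressed, files ending in .tgz or .tar.gz are gzip-compressed tar archives, and files ending in .smi are uncompressed (Windows users can open the archives with 7-Zip).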


GDB-17
GDB-17-Set (50 million)     GDB17.50000000.smi.gz     314 MB
Lead-like Set (MW 100-350 Da & clogP 1-3) (11 million)     GDB17.50000000LL.smi.gz     75 MB
Lead-like Set (MW 100-350 Da & clogP 1-3) without small rings (3-4 ring atoms) (0.8 million)     GDB17.50000000LLnoSR.smi.gz     55 MB
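The two lead-like criteria above can be expressed as simple predicates. This Python sketch assumes the molecular weight, clogP and ring sizes have already been computed with some cheminformatics toolkit; the `MolProps` container and the function names are hypothetical, and treating the bounds as inclusive is an assumption, not something stated in the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MolProps:
    """Precomputed properties of one molecule (hypothetical container)."""

    mol_weight: float            # molecular weight in daltons
    clogp: float                 # calculated logP from some toolkit
    ring_sizes: Tuple[int, ...]  # number of ring atoms in each ring


def is_lead_like(p: MolProps) -> bool:
    """MW 100-350 Da and clogP 1-3 (inclusive bounds are an assumption)."""
    return 100.0 <= p.mol_weight <= 350.0 and 1.0 <= p.clogp <= 3.0


def has_small_ring(p: MolProps) -> bool:
    """True if any ring has 3 or 4 ring atoms."""
    return any(size in (3, 4) for size in p.ring_sizes)


def is_lead_like_without_small_rings(p: MolProps) -> bool:
    """Lead-like and free of 3- and 4-membered rings."""
    return is_lead_like(p) and not has_small_ring(p)
```

clogP estimates differ between prediction methods, so applying these predicates with a different toolkit may not reproduce the published subsets exactly.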

GDB-13
Entire GDB-13 (including all C/N/O/Cl/S molecules)     gdb13.tgz     2.6 GB
GDB-13 Subsets (together, the subsets below make up the entire GDB-13 above)
Graph subset (saturated hydrocarbons)     gdb13.g.tgz     1.1 MB
Skeleton subset (unsaturated hydrocarbons)     gdb13.sk.tgz     14 MB
Molecules containing only carbon & nitrogen     gdb13.cn.tgz     443 MB
Molecules containing only carbon & oxygen     gdb13.co.tgz     299 MB
Molecules containing only carbon, nitrogen & oxygen     gdb13.cno.tgz     1.8 GB
Molecules containing chlorine & sulphur     gdb13.cls.tgz     189 MB
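Since the six subset archives together make up the entire GDB-13, one rough sanity check after downloading is to compare molecule counts. This Python sketch assumes every archive member is a plain text file with one molecule per non-empty line; the function names are hypothetical. Equal counts are necessary but not sufficient evidence that the downloads are complete.

```python
import tarfile
from typing import Iterable


def count_lines_in_tgz(path: str) -> int:
    """Count non-empty lines across all regular files in a gzipped tar archive."""
    total = 0
    with tarfile.open(path, mode="r|gz") as archive:  # forward-only stream
        for member in archive:
            if not member.isfile():
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            total += sum(1 for line in raw if line.strip())  # lines are bytes here
    return total


def subsets_match_total(subset_paths: Iterable[str], full_path: str) -> bool:
    """Check that the subset archives hold as many entries as the full archive."""
    subset_total = sum(count_lines_in_tgz(p) for p in subset_paths)
    return subset_total == count_lines_in_tgz(full_path)
```

A call would look like `subsets_match_total(["gdb13.g.tgz", "gdb13.sk.tgz", "gdb13.cn.tgz", "gdb13.co.tgz", "gdb13.cno.tgz", "gdb13.cls.tgz"], "gdb13.tgz")`; expect it to run for a long time, since it decompresses several gigabytes twice.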

GDB-13 Subsets (for details, see Table 2 in J. Comput. Aided Mol. Des. 2011, 25, 637-647)
GDB-13 Subset AB (~635 million)     GDB13_Subset-AB.smi.gz     2.4 GB
GDB-13 Subset ABC (~441 million)     GDB13_Subset-ABC.smi.gz     1.7 GB
GDB-13 Subset ABCD (~277 million)     GDB13_Subset-ABCD.smi.gz     1.1 GB
GDB-13 Subset ABCDE (~140 million)     GDB13_Subset-ABCDE.smi.gz     565 MB
GDB-13 Subset ABCDEF (~43 million)     GDB13_Subset-ABCDEF.smi.gz     171 MB
GDB-13 Subset ABCDEFG (~13 million)     GDB13_Subset-ABCDEFG.smi.gz     50 MB
GDB-13 Subset ABCDEFGH (~1.4 million)     GDB13_Subset-ABCDEFGH.smi.gz     6.2 MB
GDB-13 Random Sample, annotated with frequency and log-likelihood (for details, see "Exploring the GDB-13 chemical space using deep generative models")
GDB-13 Random Sample (1 Million)     gdb13.1M.freq.ll.smi.gz     14.8 MB
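A Python sketch for reading the annotated random sample and ranking molecules by log-likelihood. The column layout is an assumption: it takes whitespace-separated fields with the SMILES first, and exposes the frequency and log-likelihood column indices as parameters so they can be adjusted after inspecting the file. `SampleRecord`, `read_annotated` and `most_likely` are hypothetical names.

```python
import gzip
import heapq
from typing import Iterator, List, NamedTuple


class SampleRecord(NamedTuple):
    smiles: str
    frequency: float
    log_likelihood: float


def read_annotated(path: str, freq_col: int = 1, ll_col: int = 2) -> Iterator[SampleRecord]:
    """Yield records; ASSUMES SMILES in column 0 and the given column indices."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if len(fields) <= max(freq_col, ll_col):
                continue  # blank or malformed line
            try:
                yield SampleRecord(fields[0], float(fields[freq_col]), float(fields[ll_col]))
            except ValueError:
                continue  # non-numeric fields, e.g. a header row


def most_likely(path: str, k: int = 10, **cols: int) -> List[SampleRecord]:
    """Return the k records with the highest log-likelihood."""
    return heapq.nlargest(k, read_annotated(path, **cols), key=lambda r: r.log_likelihood)
```

`heapq.nlargest` keeps only k records in memory, so ranking the full million-line sample does not require loading it all.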

FDB-17
FDB-17     FDB-17-fragmentset.smi.gz     62.2 MB


GDB4c
GDB4c (SMILES)     GDB4c.smi.gz     6.2 MB
GDB4c3D (SMILES)     GDB4c3D.smi.gz     161 MB
GDB4c3D (SDF)     GDB4c3D.sdf.tar.gz     2 GB
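In the SDF format each molecule record ends with a line containing only `$$$$`, so the number of structures in the 3D archive can be counted without a chemistry toolkit. This Python sketch assumes the archive contains one or more uncompressed members whose names end in `.sdf`; the function name is hypothetical.

```python
import tarfile


def count_sdf_records(path: str) -> int:
    """Count SDF records (terminated by a "$$$$" line) in all .sdf members of a .tar.gz."""
    count = 0
    with tarfile.open(path, mode="r|gz") as archive:  # forward-only stream
        for member in archive:
            if not (member.isfile() and member.name.lower().endswith(".sdf")):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # strip() also removes "\r", so Windows line endings are handled.
            count += sum(1 for line in raw if line.strip() == b"$$$$")
    return count
```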


Other
GDBMedChem (SMILES)     GDBMedChem.smi     353.6 MB
GDBChEMBL (SMILES)     GDBChEMBL.smi     276 MB
GDB-13 random selection (1 million)     gdb13.rand1M.smi.gz     7.2 MB
Fragment-like subset (Rule of Three)     gdb13.frl.tgz     1.2 GB
Dark matter universe up to 9 heavy atoms     dmu9.tgz     87 MB

GDB-11
Entire GDB-11 (including all C/N/O/F molecules)     gdb11.tgz     122 MB
Fragrance-like subsets (for details, see Ruddigkeit et al., J. Cheminform. 2014, 6, 27)
FragranceDB (SuperScent + Flavornet)     FragranceDB.smi     56 KB
TasteDB (SuperSweet + BitterDB)     TasteDB.smi     44 KB
FragranceDB.FL (Fragrance-like subset of FragranceDB)     FragranceDB.FL.smi     32 KB
ChEMBL.FL (Fragrance-like subset of ChEMBL)     ChEMBL.FL.smi     452 KB
PubChem.FL (Fragrance-like subset of PubChem)     PubChem.FL.smi     20 MB
ZINC.FL (Fragrance-like subset of ZINC)     ZINC.FL.smi     1.3 MB
GDB-13.FL (Fragrance-like subset of GDB-13)     GDB-13.FL.smi.gz     165 MB

Terms and conditions: The GDB databases may be downloaded free of charge. In published research involving GDB, cite the appropriate references listed above. GDB must not be used in patents or as part of them. Neither GDB nor large portions of it may be redistributed without the express written permission of Jean-Louis Reymond.

Files (17.2 GB)
Name     Size     MD5 checksum
ChEMBL.FL.smi     457.2 kB     md5:6a60044a4c744ff699cda2d0b05a2283
dmu9.tgz     90.4 MB     md5:0eda0e2983a8957153b52d81dc82e289
FDB-17-fragmentset.smi.gz     65.2 MB     md5:258b948a683fd35196f559aa5b7fa957
FragranceDB.FL.smi     29.8 kB     md5:f38fa3339acc28ba3d520856f60a512f
FragranceDB.smi     50.8 kB     md5:73d518abccea673932a59410baf8b6ae
GDB-13.FL.smi.gz     172.0 MB     md5:c78ba315f6eccae6b06c9af4f271bf5e
gdb11.tgz     122.0 MB     md5:23be109329ca081c3fce5cff02a0a5c9
gdb13.1M.freq.ll.smi.gz     15.6 MB     md5:766f3c9ce8b1499bb223ad88ee785586
gdb13.cls.tgz     198.1 MB     md5:3a84ee6ff6fb6b0ac829fb187faf5986
gdb13.cn.tgz     464.2 MB     md5:49e806acd6eace5a60351af4b30aa14b
gdb13.cno.tgz     1.8 GB     md5:759cc2345f94f94aafd8157a78ad863c
gdb13.co.tgz     313.0 MB     md5:caf8e9b4e04f2e87b8745934a6064dfd
gdb13.frl.tgz     1.3 GB     md5:1b75ed723364b1a5ab767640a48adee7
gdb13.g.tgz     1.1 MB     md5:32e62e549a735da5fca3ec26d3ec768f
gdb13.rand1M.smi.gz     7.5 MB     md5:9535d66da275194ecd7edde77e39db33
gdb13.sk.tgz     14.5 MB     md5:dd96c04e422e76174b13133850f4c3e4
gdb13.tgz     2.8 GB     md5:756b3a359bb653dec1aad6cb0b8150aa
GDB13_Subset-AB.smi.gz     2.6 GB     md5:9b6031322f6d4e709be48df18cc2daf1
GDB13_Subset-ABC.smi.gz     1.8 GB     md5:d39e9770901f6dcd91b7bfe1cfed92eb
GDB13_Subset-ABCD.smi.gz     1.1 GB     md5:e83d7e450b71bb4d44bd24c4b9b31eb7
GDB13_Subset-ABCDE.smi.gz     591.0 MB     md5:930e81695e4d7bbedc7743d1bbaf03bf
GDB13_Subset-ABCDEF.smi.gz     178.8 MB     md5:74a887dae58659b6d1f2216aa9f702bc
GDB13_Subset-ABCDEFG.smi.gz     52.1 MB     md5:8b6d19bdbf68c6a6c4a072e0ff1711d9
GDB13_Subset-ABCDEFGH.smi.gz     6.5 MB     md5:13c2cda3d40d8808ced5026aeb3a3a01
GDB17.50000000.smi.gz     328.8 MB     md5:0e307987b8c970184c34ce51a8beb1ac
GDB17.50000000LL.smi.gz     77.8 MB     md5:6e2222dd95391d2e1bddf9f2414fe75e
GDB17.50000000LLnoSR.smi.gz     56.7 MB     md5:f8147fe50ec98ae8f24d45bad0c31b7e
GDB4c.smi.gz     6.4 MB     md5:5951395313159763236cf07e20549389
GDB4c3D.sdf.tar.gz     2.2 GB     md5:c5bfa22f813a4a7d2a7dc86fed2679ce
GDB4c3D.smi.gz     165.2 MB     md5:87ffcee18e7221dd80c01ba613a7f586
GDBChEMBL.smi     289.8 MB     md5:9534659e97ac71835ea440794a360c8e
GDBMedChem.smi     353.6 MB     md5:afcb6f6092b844413525f4a8abdff17d
PubChem.FL.smi     20.9 MB     md5:91f6658d545495e3aa5ee85e26d50c5e
TasteDB.smi     41.3 kB     md5:6ab60aaca88e20b4aeee17b6c87a9f36
ZINC.FL.smi     2.2 MB     md5:dd7f296ca02c91ea4a61970b082d81dc
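The MD5 checksums above can be used to confirm that a large download arrived intact. A minimal Python sketch using the standard `hashlib` module; the helper names `md5sum` and `verify` are hypothetical. Reading in chunks keeps memory use constant even for the multi-gigabyte archives.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file against a listed checksum; an optional "md5:" prefix is accepted."""
    expected = expected.strip().lower().removeprefix("md5:")
    return md5sum(path) == expected
```

For example, `verify("gdb11.tgz", "md5:23be109329ca081c3fce5cff02a0a5c9")` should return `True` for an intact copy of the GDB-11 archive. On Linux, `md5sum gdb11.tgz` prints the same digest.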