There is a newer version of the record available.

Published March 27, 2025 | Version v1
Dataset Open

BigSolDB 2.0: A Comprehensive Dataset of Solubility Values for Organic Compounds in Organic Solvents and Water at Various Temperatures

  • 1. N.S. Kurnakov Institute of General and Inorganic Chemistry, Moscow, 119991, Russia
  • 2. Department of Chemistry, Lomonosov Moscow State University, 119991 Moscow, 1 Leninskiye Gory, Russia
  • 3. Department of Pharmaceutical Sciences, University of Vienna, Josef-Holaubek-Platz 2, 1090 Vienna, Austria

Description

If you use this dataset, please cite our paper: https://doi.org/10.1038/s41597-025-05559-8

BigSolDB v.2.0 contains 103944 experimentally measured solubility values of 1448 organic compounds in 213 solvents, reported in 1595 peer-reviewed articles.

The 12 columns of this dataset are explained as follows:

  1. SMILES_Solute — SMILES representation of the solute molecule
  2. Temperature_K — temperature for the reported solubility value, K
  3. Solvent — solvent name
  4. SMILES_Solvent — SMILES representation of the solvent molecule
  5. Solubility(mole_fraction) — the reported solubility value expressed in mole fraction of solute
  6. Solubility(mol/L) — the recalculated solubility value expressed in molar concentration of solute (mol/L)
  7. LogS(mol/L) — decimal logarithm of the recalculated solubility value expressed in molar concentration of solute (mol/L)
  8. Compound_Name — solute name
  9. CAS — solute CAS number
  10. PubChem_CID — solute PubChem Compound ID (CID)
  11. FDA_Approved — whether the solute is an FDA-approved drug: ‘Yes’ for FDA-approved drugs, ‘No’ for all other compounds
  12. Source — DOI of the source publication for the given values
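The columns above are enough to pull a temperature-dependent solubility curve for one solute/solvent pair. The sketch below uses only Python's standard csv module and assumes that BigSolDBv2.0.csv is comma-separated with a header row whose names match the column list exactly. The helper names (load_rows, solubility_curve), the example SMILES and the solvent spelling "water" are hypothetical and for illustration only.

```python
import csv
from typing import Iterable, Iterator, List, Tuple


def load_rows(path: str) -> Iterator[dict]:
    """Yield each CSV row as a dict keyed by the header names (assumed to match the column list)."""
    # utf-8-sig also accepts a leading byte-order mark, if the file happens to have one
    with open(path, newline="", encoding="utf-8-sig") as f:
        yield from csv.DictReader(f)


def solubility_curve(
    rows: Iterable[dict], solute_smiles: str, solvent: str
) -> List[Tuple[float, float]]:
    """Collect (Temperature_K, LogS(mol/L)) pairs for one solute/solvent pair, sorted by temperature.

    Matching is exact string comparison: the same molecule written as a
    different SMILES string will not match.
    """
    points = [
        (float(row["Temperature_K"]), float(row["LogS(mol/L)"]))
        for row in rows
        if row["SMILES_Solute"] == solute_smiles
        and row["Solvent"] == solvent
        and row["LogS(mol/L)"]  # skip rows with an empty LogS cell, if any exist
    ]
    return sorted(points)


# Hypothetical usage (file location and solvent spelling are assumptions):
# curve = solubility_curve(load_rows("BigSolDBv2.0.csv"), "OC(=O)c1ccccc1O", "water")
```

Because SMILES are compared as plain strings, canonicalizing both the query and the SMILES_Solute column with a cheminformatics toolkit before matching would make lookups more robust.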

An additional dataset of solvent density values is also provided.

The 4 columns of this dataset are explained as follows:

  1. Solvent — solvent name
  2. Temperature_K — temperature for the reported density value, K
  3. Density_g/cm^3 – the reported solvent density, g/cm^3
  4. Source — data source for the given values
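Densities are tabulated at discrete temperatures, so using them at an arbitrary solubility temperature requires a lookup rule. The record does not say which rule was applied; the sketch below assumes linear interpolation between the two nearest tabulated temperatures and refuses to extrapolate. It expects rows keyed by the column names above (for example from csv.DictReader over BigSolDBv2.0_densities.csv); the helper names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

DensityTable = Dict[str, List[Tuple[float, float]]]


def build_density_table(rows: Iterable[dict]) -> DensityTable:
    """Group (Temperature_K, Density_g/cm^3) pairs by solvent name, sorted by temperature."""
    table: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for row in rows:
        table[row["Solvent"]].append(
            (float(row["Temperature_K"]), float(row["Density_g/cm^3"]))
        )
    return {solvent: sorted(points) for solvent, points in table.items()}


def density_at(table: DensityTable, solvent: str, temperature_k: float) -> float:
    """Return the solvent density (g/cm^3) at temperature_k by linear interpolation.

    Raises KeyError for an unknown solvent and ValueError outside the tabulated
    temperature range (no extrapolation). If several rows share a temperature,
    the first one after sorting is used.
    """
    points = table[solvent]
    temps = [t for t, _ in points]
    if not temps[0] <= temperature_k <= temps[-1]:
        raise ValueError(f"{temperature_k} K is outside the tabulated range for {solvent}")
    i = bisect.bisect_left(temps, temperature_k)  # first index with temps[i] >= temperature_k
    if temps[i] == temperature_k:
        return points[i][1]
    (t0, d0), (t1, d1) = points[i - 1], points[i]
    return d0 + (d1 - d0) * (temperature_k - t0) / (t1 - t0)
```

Refusing to extrapolate is a deliberate choice: density data outside the measured range can be badly off, so a caller must decide explicitly how to handle such temperatures.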

Note: the values in the 'Solubility(mol/L)' and 'LogS(mol/L)' columns were recalculated from the mole fraction values reported in the source articles ('Solubility(mole_fraction)' column), using the solvent densities listed in BigSolDBv2.0_densities.csv.
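One way to carry out such a recalculation is sketched below. It is an assumption, not necessarily the authors' exact procedure (see the cited paper): take 1 mol of solution, containing x mol of solute and (1 − x) mol of solvent, and approximate the solution density by the pure-solvent density ρ. This gives c = 1000 · ρ · x / (x · M_solute + (1 − x) · M_solvent), with ρ in g/cm^3, molar masses in g/mol and c in mol/L, and then LogS = log10(c). Molar masses are not columns of the dataset and would have to be computed separately, for example from the SMILES. The function names are hypothetical.

```python
import math


def mole_fraction_to_molarity(
    x_solute: float,
    solvent_density_g_cm3: float,
    molar_mass_solute: float,
    molar_mass_solvent: float,
) -> float:
    """Approximate molar concentration (mol/L) of the solute from its mole fraction.

    Assumption: the solution density equals the pure-solvent density.
    Basis: 1 mol of solution = x mol solute + (1 - x) mol solvent.
    """
    if not 0.0 < x_solute < 1.0:
        raise ValueError("mole fraction must lie strictly between 0 and 1")
    mass_g = x_solute * molar_mass_solute + (1.0 - x_solute) * molar_mass_solvent
    volume_l = mass_g / solvent_density_g_cm3 / 1000.0  # g -> cm^3 -> L
    return x_solute / volume_l


def log_s(molarity_mol_l: float) -> float:
    """Decimal logarithm of the molar solubility, as in the LogS(mol/L) column."""
    return math.log10(molarity_mol_l)
```

Because the solvent density stands in for the solution density, the approximation is best for dilute solutions and degrades as the mole fraction of solute grows.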

Online visualization and search across the dataset are available here: https://bigsoldb.streamlit.app/

Files

Total size: 18.4 MB

  • BigSolDBv2.0.csv (18.3 MB), md5:1e6d8a510387746e9faab5f39dc46317
  • BigSolDBv2.0_densities.csv (120.0 kB), md5:059ae2deadb2ad1523fd4e24b2c88133
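The MD5 checksums listed for the two files can be used to confirm that a download is complete and unmodified. A minimal sketch using Python's standard hashlib module follows. The helper names are hypothetical, and pairing each checksum with a file name is inferred from the listed sizes (18.3 MB for the main table, 120.0 kB for the densities).

```python
import hashlib
from typing import Iterable

# Pairing inferred from file sizes in the listing; verify against the record if in doubt.
EXPECTED_MD5 = {
    "BigSolDBv2.0.csv": "1e6d8a510387746e9faab5f39dc46317",
    "BigSolDBv2.0_densities.csv": "059ae2deadb2ad1523fd4e24b2c88133",
}


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the MD5 hex digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    with open(path, "rb") as f:
        return md5_of_chunks(iter(lambda: f.read(chunk_size), b""))


def verify(path: str, name: str) -> bool:
    """True if the file at path matches the expected checksum for name."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

For example, verify("BigSolDBv2.0.csv", "BigSolDBv2.0.csv") should return True for an intact download saved in the working directory.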

Additional details

Related works

Is described by
Journal article: 10.1038/s41597-025-05559-8 (DOI)

Funding

N.S. Kurnakov Institute of General and Inorganic Chemistry
Program for Fundamental Research of the N.S. Kurnakov Institute of General and Inorganic Chemistry of the Russian Academy of Sciences 1021071612866-5-1.4.7