Dataset Open Access

Chemical outlier dataset

Mario Lovric

The objects (molecules) are numbered. The Y variable is the boiling point. The other features are structural features of the molecules. In the outlier column, outliers are marked with a value of 1.

The data are derived from a published chemical dataset of boiling point measurements [1] and from public data [2]. Features were generated with the RDKit Python library [3]. The dataset was seeded with known outliers (~5%) chosen for their significant structural differences, i.e. polar versus non-polar molecules.

  1. Cherqaoui D., Villemin D. Use of a Neural Network to Determine the Boiling Point of Alkanes. J Chem Soc Faraday Trans. 1994;90(1):97–102.
  2. PubChem; https://pubchem.ncbi.nlm.nih.gov/
  3. RDKit: Open-source cheminformatics; http://www.rdkit.org
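
The layout described above (a Y column with boiling points and an outlier column flagged with 1) could be used roughly as follows. This is a minimal sketch, not the author's workflow: the column names "Y" and "outlier" are assumptions and should be checked against the actual spreadsheet header, and the toy values are made up purely to show the call.

```python
import pandas as pd


def split_outliers(df: pd.DataFrame, outlier_col: str = "outlier"):
    """Split a frame into (inliers, outliers) using a 0/1 flag column.

    The default column name is an assumption; pass the real one if it differs.
    """
    is_outlier = df[outlier_col] == 1
    return df[~is_outlier], df[is_outlier]


# Loading the real file (needs an Excel engine such as openpyxl):
# df = pd.read_excel("for_pub_chem_outlier_dataset.xlsx")

# Toy frame with made-up values, standing in for the real data.
toy = pd.DataFrame({"Y": [10.0, 20.0, 30.0], "outlier": [0, 0, 1]})

inliers, outliers = split_outliers(toy)
outlier_fraction = len(outliers) / len(toy)
```

On the real file, `outlier_fraction` would be expected to come out near the ~5% stated in the description.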

 

Files (46.7 kB)
Name: for_pub_chem_outlier_dataset.xlsx
Size: 46.7 kB
md5: 9561cb5d5ec9c677eee2a159200db3e3
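
A downloaded copy can be checked against the md5 value listed above. The snippet below is a sketch under the assumption that the file was saved under its listed name in the working directory. The helper name `md5_of_stream` is hypothetical.

```python
import hashlib
import io

# Checksum as listed in the file record.
EXPECTED_MD5 = "9561cb5d5ec9c677eee2a159200db3e3"


def md5_of_stream(stream, chunk_size: int = 65536) -> str:
    """Return the hex md5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Real use (local path is an assumption):
# with open("for_pub_chem_outlier_dataset.xlsx", "rb") as fh:
#     print(md5_of_stream(fh) == EXPECTED_MD5)

# Self-check on an empty in-memory stream, so the sketch runs without the file.
empty_digest = md5_of_stream(io.BytesIO(b""))
```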