Published February 6, 2018 | Version v1
Dataset | Open access

Chemical outlier dataset

Creators

Description

The objects are numbered. The Y variable is the boiling point, and the other features are structural descriptors of the molecules. Outliers are marked with a value of 1 in the outlier column.
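The sketch below shows one way to split a table with this layout into object IDs, features, boiling points and outlier flags. It is not the author's code. It assumes a CSV export whose columns are named `ID`, `Y` and `outlier`, and treats every remaining column as a structural feature. These names are hypothetical because the actual file name and header are not given here. The sample rows are made-up toy values and are not taken from the dataset.

```python
import csv
import io

def split_columns(text, id_col="ID", y_col="Y", flag_col="outlier"):
    """Split CSV text into object IDs, feature names, features, boiling points and outlier flags.

    The default column names are hypothetical; adjust them to the real header.
    """
    reader = csv.DictReader(io.StringIO(text))
    # Every column that is not the ID, the Y variable or the flag is a feature.
    feature_cols = [c for c in reader.fieldnames if c not in (id_col, y_col, flag_col)]
    ids, X, y, flags = [], [], [], []
    for row in reader:
        ids.append(int(row[id_col]))
        X.append([float(row[c]) for c in feature_cols])
        y.append(float(row[y_col]))
        # Outliers carry the value 1 in the outlier column ("1" or "1.0").
        flags.append(int(float(row[flag_col])) == 1)
    return ids, feature_cols, X, y, flags

# Made-up toy rows with hypothetical feature columns nC and nO.
sample = """ID,Y,nC,nO,outlier
1,10.0,4,0,0
2,20.0,5,0,0
3,30.0,2,1,1
"""
ids, feature_cols, X, y, flags = split_columns(sample)
print(feature_cols, flags)
```

Parsing the flag with `int(float(...))` tolerates the column being stored either as an integer or as a float.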

The data are derived from a published dataset of boiling-point measurements [1] and from public data [2]. Features were generated with the RDKit Python library [3]. Known outliers (about 5% of the objects) were added to the dataset. They were selected for significant structural differences, i.e. polar versus non-polar molecules.

  1. Cherqaoui D., Villemin D. Use of a neural network to determine the boiling point of alkanes. J. Chem. Soc., Faraday Trans. 1994;90(1):97–102.
  2. PubChem. https://pubchem.ncbi.nlm.nih.gov/
  3. RDKit: Open-source cheminformatics. http://www.rdkit.org
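Because the outliers are labelled, one plausible use is to score an outlier-detection method against them. This is an assumption: the description does not say how the labels are meant to be used. The sketch below is not the author's procedure. It computes the share of flagged objects, which the description puts at about 5%, and the precision and recall of any detector's boolean predictions. The function names are hypothetical helpers.

```python
def outlier_fraction(flags):
    """Share of objects flagged as outliers (about 5% according to the description)."""
    return sum(flags) / len(flags)

def precision_recall(true_flags, predicted_flags):
    """Compare a detector's boolean predictions with the known outlier labels."""
    pairs = list(zip(true_flags, predicted_flags))
    tp = sum(t and p for t, p in pairs)          # true outliers that were detected
    fp = sum((not t) and p for t, p in pairs)    # inliers wrongly flagged
    fn = sum(t and (not p) for t, p in pairs)    # true outliers that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy example: 20 objects, the last one is a labelled outlier.
true_flags = [False] * 19 + [True]
# A hypothetical detector that flags the first and the last object.
predicted = [True] + [False] * 18 + [True]
print(outlier_fraction(true_flags), precision_recall(true_flags, predicted))
```

With roughly 5% positives, reporting precision and recall for the outlier class is more informative than overall accuracy. A detector that flags nothing would still be about 95% accurate.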


Files (46.7 kB)

md5:9561cb5d5ec9c677eee2a159200db3e3 (46.7 kB)