Benchmark Datasets Incorporating Diverse Tasks, Sample Sizes, Material Systems, and Data Heterogeneity for Materials Informatics

Henderson, N. Ashley; Kauwe, K. Steven; Sparks, D. Taylor

doi:10.5281/zenodo.4903958

Published June 5, 2021 | Version v1.0

Dataset Open

This benchmark comprises 50 datasets of materials properties drawn from 16 previous publications. It includes both experimental and computational data, covers regression as well as classification tasks, ranges in size from 12 to 6354 samples, and spans a wide variety of materials systems. The data were cleaned where necessary, and each dataset was divided into train, validation, and test splits.

For datasets with more than 100 values, train-val-test splits were created using either 5-fold or 10-fold cross-validation, matching the method used in the paper each dataset was taken from. For datasets with fewer than 100 values, train-test splits were created using Leave-One-Out cross-validation.
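
The splitting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build the benchmark: the function name `make_splits`, the shuffling seed, the choice to take the validation set from the fold after the test fold, and the handling of a dataset with exactly 100 values (treated here as k-fold) are all hypothetical. The splits shipped in the repository should be preferred for any benchmark comparison.

```python
import random


def make_splits(n_samples, k=5, loo_threshold=100, seed=0):
    """Return a list of dicts with 'train', 'val', and 'test' index lists.

    Assumption: datasets with fewer than `loo_threshold` samples use
    Leave-One-Out (no validation set); all others use k-fold.
    """
    indices = list(range(n_samples))

    if n_samples < loo_threshold:
        # Leave-One-Out: every sample is the single test point exactly once.
        return [
            {"train": indices[:i] + indices[i + 1:], "val": [], "test": [i]}
            for i in indices
        ]

    # k-fold: shuffle once, then deal indices into k nearly equal folds.
    rng = random.Random(seed)
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]

    splits = []
    for i in range(k):
        test_fold = folds[i]
        # Assumption: the next fold (cyclically) serves as the validation set.
        val_fold = folds[(i + 1) % k]
        train = [
            idx
            for j, fold in enumerate(folds)
            if j not in (i, (i + 1) % k)
            for idx in fold
        ]
        splits.append({"train": train, "val": val_fold, "test": test_fold})
    return splits
```

For example, `make_splits(12)` yields 12 Leave-One-Out splits for the smallest dataset size mentioned above, while `make_splits(6354, k=10)` yields 10 folds for the largest.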

For further information, as well as directions on how to access the data, please go to the corresponding GitHub repository: https://github.com/anhender/mse_ML_datasets/tree/v1.0

Files

Name	Size
anhender/mse_ML_datasets-v1.0.zip md5:be19c6a8cce6a2112f40e01a7dfab990	25.3 MB
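
The md5 checksum listed with the archive can be used to confirm that a download completed intact. Below is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` are hypothetical, and the local file path is whatever name the archive was saved under.

```python
import hashlib

# Checksum as published on this record for mse_ML_datasets-v1.0.zip.
EXPECTED_MD5 = "be19c6a8cce6a2112f40e01a7dfab990"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected md5."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved as `mse_ML_datasets-v1.0.zip` in the working directory, `verify_download("mse_ML_datasets-v1.0.zip")` should return `True` for an uncorrupted copy.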

Additional details

Is supplement to: https://github.com/anhender/mse_ML_datasets/tree/v1.0 (URL)

	All versions	This version
Views	230	228
Downloads	38	38
Data volume	961.2 MB	961.2 MB
