Published September 20, 2024 | Version v1.0
Dataset Open

Data for: "Comprehensive sampling of coverage effects in catalysis by leveraging generalization in neural network models"

Description

This repository contains the raw data needed to reproduce the paper "Comprehensive sampling of coverage effects in catalysis by leveraging generalization in neural network models". The .tar.gz file contains the directory structure described below.
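As a minimal sketch of unpacking the archive with the Python standard library, the helper below extracts a `.tar.gz` file and lists its top-level entries. The archive file name in the usage comment is a placeholder, since the record does not state it, and `extract_archive` is a hypothetical helper rather than part of the repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return its top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Only extract archives from sources you trust.
    with tarfile.open(archive_path, mode="r:gz") as tar:
        tar.extractall(dest)
    return sorted(p.name for p in dest.iterdir())


# Usage (placeholder file name, replace with the downloaded archive):
# print(extract_archive("downloaded-archive.tar.gz", "ml-coverage-data"))
```

If the archive matches the description, the listing should include the directories documented under Directory Structure, such as `data`, `eval`, and `models`.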

Directory Structure

`data`

Contains the data to reproduce all figures in the manuscript. Used primarily by the Jupyter Notebooks that plot the data from the paper.

`eval`

Contains the predicted energies according to a MACE model for the following systems and facets:
- covsplit (100, 111, 211, 331, 410, 711): The NN model is trained on low-coverage structures and tested on high-coverage structures for a single facet.
- evencov (100, 111, 211, 331, 410, 711): The NN model is trained on even coverages and tested on odd coverages for a single facet.
- facet (100, 111, 211, 331, 410, 711): The NN model is trained on the facet indicated by the folder name (e.g., facet-100 means that the model was trained on Cu(100)) and tested on all of the other facets.
- full: The NN model is trained on all facets and all coverages.
- slopes (various versions and configurations): The NN models are trained with different body-order correlations (v) on the Cu(711) facet and tested only on the Cu(711) facet.
- Rh111: Energies for the Rh(111) + CHOH + CO systems.
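The naming scheme above can be sketched as a short enumeration. Only `facet-100` is confirmed by the description, so the `<scheme>-<facet>` pattern for `covsplit` and `evencov` is an assumption. The `slopes` folders are left out because their version and configuration names are not listed here.

```python
# Facets listed in the description for the per-facet schemes.
FACETS = ["100", "111", "211", "331", "410", "711"]
PER_FACET_SCHEMES = ["covsplit", "evencov", "facet"]


def expected_eval_folders():
    """Return folder names assumed to follow the '<scheme>-<facet>' pattern."""
    names = [f"{scheme}-{facet}" for scheme in PER_FACET_SCHEMES for facet in FACETS]
    # 'full' and 'Rh111' are single entries with no facet suffix.
    names += ["full", "Rh111"]
    return names


for name in expected_eval_folders():
    print(name)
```

Comparing this list against the actual contents of `eval` is a quick way to check whether the assumed pattern holds for the extracted archive.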

`mcmc`

Contains the data for the MCMC (Markov chain Monte Carlo) evaluations of two systems, Cu and Rh:
- copper-mcmc-public.tar.gz
- rhodium-mcmc-public.tar.gz

`models`

Contains the weights and parameters of the best-performing MACE models trained in this work, as selected by the validation loss.

File formats: `.model` files correspond to the first stage of training and `_swa.model` files to the second stage of training.
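To make the suffix convention concrete, the following sketch maps a model file name to its training stage. It assumes the two suffixes correspond to the two stages in the order given in this description. `training_stage` is a hypothetical helper and the file names in the example are made up for illustration.

```python
from pathlib import Path


def training_stage(model_path):
    """Return 1 for '<name>.model' and 2 for '<name>_swa.model' files."""
    name = Path(model_path).name
    # Check the longer suffix first, since '_swa.model' also ends with '.model'.
    if name.endswith("_swa.model"):
        return 2
    if name.endswith(".model"):
        return 1
    raise ValueError(f"not a MACE model file: {name}")


# Hypothetical file names, for illustration only.
print(training_stage("models/example.model"))
print(training_stage("models/example_swa.model"))
```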

`pyscripts`

Python scripts to perform the MCMC sampling given the custom configuration file `sample_cfg.json`.
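For readers unfamiliar with MCMC sampling, the sketch below shows the generic Metropolis acceptance rule on which such samplers are commonly built. It is not taken from the repository's scripts and makes no claim about the contents of `sample_cfg.json`. The function name, the `kT` parameter, and the example values are illustrative assumptions.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Generic Metropolis rule: accept downhill moves, uphill with prob exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


# Illustrative values only: energy change and kT in the same (arbitrary) units.
rng = random.Random(0)
accepted = sum(metropolis_accept(0.05, 0.025, rng) for _ in range(10_000))
print(f"acceptance rate for dE = 2 kT: {accepted / 10_000:.3f}")
```

With an energy change of twice `kT`, the acceptance rate should fluctuate around exp(-2), roughly 0.135.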

`scripts`

Shell scripts for evaluating and training the MACE models, along with the hyperparameters used in doing so.

- Evaluation scripts (eval-*.sh)
- Training scripts (train-*.sh)

`train`

Training, validation, and testing data for all Cu and Rh facets in this work, according to the naming scheme described above.

- Rh111
- covsplit
- evencov
- facet
- full
- slopes

Files

Files (545.7 MB)

md5:9417d715054a8eb60f1e02e8dcc7dafd
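The MD5 checksum listed for the archive can be used to verify a download. The following is a minimal sketch using `hashlib` from the standard library. The archive file name in the usage comment is a placeholder, since the record does not state it, and `md5_of_file` is a hypothetical helper.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "9417d715054a8eb60f1e02e8dcc7dafd"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (placeholder file name, replace with the downloaded archive):
# assert md5_of_file("downloaded-archive.tar.gz") == EXPECTED_MD5
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again before extraction.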

Additional details

Related works

Is supplement to
Preprint: 10.26434/chemrxiv-2023-f6l23 (DOI)
Journal: 10.1039/D4DD00328D (DOI)

Dates

Available
2024-10-14

Software

Repository URL
https://github.com/dskoda/ML-Coverage
Programming language
Python