A database of vacancy formation enthalpies for materials discovery
- 1. Sandia National Laboratories
- 2. National Renewable Energy Laboratory
- 3. Lawrence Livermore National Laboratory
Description
Matthew Witman (a), Anuj Goyal (b), Tadashi Ogitsu (c), Anthony McDaniel (a), Stephan Lany (b)
(a) Sandia National Laboratories, (b) National Renewable Energy Laboratory, (c) Lawrence Livermore National Laboratory
Abstract
This dataset provides DFT calculations of cation and oxygen vacancy defects in oxides, which can be used to derive efficient data-driven models for the vacancy formation enthalpy. The DFT calculations were performed as described in <DOI: 10.26434/chemrxiv-2022-frcns>, where a graph neural network surrogate model was trained and then used to screen the Materials Project for promising solar thermochemical water splitting materials. The data, models, scripts, and code needed to reproduce the results in <DOI: 10.26434/chemrxiv-2022-frcns> are described below.
Data & Models
1) data_01_03_22/* corresponds to the oxide compounds used in model training
2) known_cmpds/* corresponds to known STCH (solar thermochemical hydrogen) compounds
3) screeningMP/* corresponds to the screening-related data
   - screening_inelements/* stores only the Materials Project oxides whose constituent elements are all among the training elements, together with all vacancy defect predictions for them
   - MP_O_PDs/* stores offline phase diagrams (PDs) from the Materials Project so that oxide stability metrics can be adjusted relatively quickly
   - MP_O_Compounds/* stores the candidate MP oxide compounds to screen
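The "subset of the training elements" criterion used for screening_inelements/* can be illustrated with a minimal sketch. The element set, the candidate formulas, and the helper names below are hypothetical placeholders, not values or code taken from the dataset; the sketch only shows the set-inclusion test.

```python
import re

# Hypothetical training elements (assumption); the real set is defined by the training data.
TRAINING_ELEMENTS = {"O", "Fe", "Mn", "La", "Sr", "Ca", "Ti"}

# Matches element symbols such as "O", "Fe", "La" in a simple formula string.
ELEMENT_RE = re.compile(r"[A-Z][a-z]?")


def elements_in(formula):
    """Return the set of element symbols appearing in a simple formula string."""
    return set(ELEMENT_RE.findall(formula))


def within_training_elements(formula, training_elements=TRAINING_ELEMENTS):
    """True if every element of the compound is among the training elements."""
    return elements_in(formula) <= set(training_elements)


# Illustrative candidates only (assumption), not entries from MP_O_Compounds/*.
candidates = ["LaMnO3", "SrTiO3", "BaCeO3", "CaFe2O4"]
kept = [f for f in candidates if within_training_elements(f)]
print(kept)  # ['LaMnO3', 'SrTiO3', 'CaFe2O4']
```

BaCeO3 is dropped here because Ba and Ce are outside the assumed element set, so a model trained on the listed elements would have no basis for predicting its defects.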
In general, the above folders contain:
- DFT data/structures are included in sub-directories: poscars, magnetic moments, oxidation states, and csvs (containing the vacancy formation enthalpy for each unique site)
- cgcnn/* contains the processed DFT data for use in the CGCNN code (see Scripts for how to prepare it)
  - id_prop.csv.* contains [cif name, defect formation enthalpy] pairs
    - different id_prop.csv.* files correspond to different K-fold stratifications
    - in the screening directory, the defect formation enthalpy is omitted since it has not been computed with DFT
  - model-(X1)k(X2)_(X3)_(X4) corresponds to the different cross-validation (CV) models, where
    - X1 is the training set size (i.e., training with only 10%, 40%, or 100% of the data)
    - X2 is the k-fold
    - X3 = "struct" or "" for "structure-wise validation" or "defect-wise validation", respectively
    - X4 is the encoding strategy
  - structure X-Yz.cif indicates structure X, defect element Y, and symmetry site z, where one instance of that site has been re-ordered to be the first atom in the cif file
  - *.locals contains a one-hot encoding of the oxidation states of all sites in that crystal
  - *.locals_continuous contains a continuous encoding of the oxidation states in that crystal
  - *.globals contains global properties of the host structure
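The difference between the one-hot (*.locals) and continuous (*.locals_continuous) oxidation-state encodings can be sketched as follows. This is only an illustration of the two ideas: the range of oxidation states, the scaling to [0, 1], and the function names are assumptions, and the actual on-disk layout of these files is not reproduced here.

```python
# Assumed range of oxidation states, -2 ... +6 (illustrative only).
OX_STATES = list(range(-2, 7))


def one_hot(ox_state, states=OX_STATES):
    """Encode an oxidation state as a 0/1 vector with a single 1.

    Raises ValueError if the state is outside the assumed range.
    """
    vec = [0] * len(states)
    vec[states.index(ox_state)] = 1
    return vec


def continuous(ox_state, states=OX_STATES):
    """Encode an oxidation state as one scalar, scaled to [0, 1] (assumed scaling)."""
    lo, hi = min(states), max(states)
    return (ox_state - lo) / (hi - lo)


# Example: formal oxidation states of the five sites in a LaMnO3 formula unit.
site_ox_states = [3, 3, -2, -2, -2]

locals_one_hot = [one_hot(s) for s in site_ox_states]
locals_continuous = [continuous(s) for s in site_ox_states]

print(locals_one_hot[0])     # [0, 0, 0, 0, 0, 1, 0, 0, 0]
print(locals_continuous[0])  # 0.625
```

The one-hot form treats each oxidation state as an unrelated category, while the continuous form preserves their ordering in a single feature per site.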
Scripts
- scripts/*.sh: shell scripts to rerun the screenings for different k-folds, encodings, etc.
- scripts/*.ipynb: notebooks to analyze the results
- scripts/prepare_cgcnn.py: translates the data in (poscars/*, csvs/*, oxstate/*, mags/*) into the ML input needed in cgcnn/*
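As a rough illustration of the ML input that ends up in cgcnn/*, the sketch below reads [cif name, defect formation enthalpy] pairs and splits each name into structure, defect element, and site. It assumes a header-less CSV, names of the form X-Yz with a numeric site label z, and an optional ".cif" suffix; the sample rows and enthalpy values are invented for the example and do not come from the dataset.

```python
import csv
import io
import re

# Assumed name pattern: "<structure>-<element><site>", optionally ending in ".cif".
NAME_RE = re.compile(
    r"^(?P<structure>.+)-(?P<element>[A-Z][a-z]?)(?P<site>\d+)(?:\.cif)?$"
)


def parse_name(name):
    """Split a name like '17-Mn1' into (structure, defect element, site index)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected name: {name!r}")
    return m.group("structure"), m.group("element"), int(m.group("site"))


def read_id_prop(handle):
    """Read (name, enthalpy) pairs; enthalpy is None when the column is absent,
    as described for the screening directory."""
    records = []
    for row in csv.reader(handle):
        if not row:
            continue
        name = row[0].strip()
        enthalpy = float(row[1]) if len(row) > 1 and row[1].strip() else None
        records.append((name, enthalpy))
    return records


# Invented sample content (assumption); real files are the id_prop.csv.* files.
sample = io.StringIO("17-O1,3.42\n17-Mn1,5.10\n42-O2\n")
for name, enthalpy in read_id_prop(sample):
    print(parse_name(name), enthalpy)
```

To use it on a real file, pass an open file handle for one of the id_prop.csv.* files to read_id_prop, and adjust NAME_RE if the actual names differ from the assumed pattern.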
Code
- Install CGCNN and its defect modifications from https://github.com/mwitman1/cgcnndefect
Questions/Collaborations
- Please contact mwitman@sandia.gov
Acknowledgements
- This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Energy Efficiency and Renewable Energy (EERE), specifically the Hydrogen and Fuel Cell Technologies Office. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. Part of the work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344. The National Renewable Energy Laboratory (NREL) is operated by the Alliance for Sustainable Energy, LLC, for the DOE under Contract No. DE-AC36-08GO28308. This work used High-Performance Computing resources at NREL, sponsored by DOE-EERE. The views expressed in this article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
Files
Total size: 2.9 GB, md5:975a346dcf71c511ecfdfccd882024c6
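The md5 checksum listed for the 2.9 GB download can be used to check that the file arrived intact. The sketch below is one way to do that with the Python standard library; the archive path in the usage note is a hypothetical placeholder, since the file name is not given here.

```python
import hashlib

# Checksum as listed in the Files section of this record.
EXPECTED_MD5 = "975a346dcf71c511ecfdfccd882024c6"


def md5_of(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in 1 MiB chunks
    so that a multi-GB archive does not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest matches the expected value."""
    return md5_of(path) == expected.lower()
```

For example, verify("path/to/downloaded_archive") (placeholder path) should return True for an uncorrupted download.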
Additional details
Related works
- Is supplement to: Preprint 10.26434/chemrxiv-2022-frcns (DOI)