Prediction of the Cu oxidation state from EELS and XAS spectra using supervised machine learning
Authors/Creators
Description
Dataset accompanying the manuscript "Prediction of the Cu oxidation state from EELS and XAS spectra using supervised machine learning"
Gleason, S.P., Lu, D. & Ciston, J. Prediction of the Cu oxidation state from EELS and XAS spectra using supervised machine learning. npj Comput Mater 10, 221 (2024). https://doi.org/10.1038/s41524-024-01408-1
This dataset is stored as a single zip file containing the files and subdirectories outlined below:
- The main data file for this paper, produced by the authors: "Cu_reproducable_alignment_df_extracted_110222.joblib", a serialized pandas dataframe containing all the simulated site-averaged XAS spectra, pymatgen structure objects, oxidation state labels, and other chemical and physical identifiers used to simulate the spectra and to train and evaluate the ML model discussed in the associated paper. It contains ~3500 site-averaged simulated XAS spectra of Cu-containing materials, to which the authors applied several post-processing steps to correct systematic errors in the simulation procedure, remove flawed spectra, and prepare the spectra for ML model training. These post-processing steps are detailed in the "Methods - Training set generation" section of the associated manuscript.
- Three subdirectories named "xas paper", "Cu_deconvolved_spectra" and "Additional_Literature_Spectra", which contain experimental EELS/XAS data either:
  - extracted by the authors from the literature (in the case of "xas paper" and "Additional_Literature_Spectra", which are stored as csv files), or
  - taken by the authors using the TEAM I microscope at the National Center for Electron Microscopy at Lawrence Berkeley National Laboratory (in the case of "Cu_deconvolved_spectra", stored as dm4 files).
- The subdirectory "Dataset_generation", which contains:
  - a pandas dataframe containing ~3700 site-averaged XAS spectra simulated using the FEFF9 code base. This database combines ~1500 site-averaged spectra extracted from the Materials Project, which the authors labeled with each material's oxidation state and other chemical/physical descriptors, with ~2200 site-averaged spectra simulated by the authors in this work.
  - a subdirectory called "FEFF Simulations", which contains all additional site-specific simulated spectra used in this work. The zip file "Z=29.zip" and the .joblib file contain spectra extracted from the Materials Project, and the other .zip files contain the ~3500 site-specific spectra simulated by the authors in this work. Site-specific spectra are distinct from site-averaged spectra, in which multiple symmetrically inequivalent absorbing sites from one material are averaged into a single representation of the entire material. The raw FEFF output and input files are stored as text files in wrapper directories labeled by the Materials Project ID of the structure. Each feff.in file contains the structure representation in enough detail to build a pymatgen structure object. A processed pandas dataframe, in which the spectra are site averaged, is also included.
- Some other small helper files used to make example figures in the published manuscript.
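The .joblib files described above can in principle be read back with `joblib.load`, and the idea of site averaging can be illustrated in a few lines. The following is a minimal sketch, not the authors' code: the helper names `load_dataframe` and `site_average` are hypothetical, the example spectra are made up, and weighting each site by its multiplicity on a shared energy grid is an assumption about how the averaging is done. Loading the real dataframe likely also requires pandas and pymatgen to be installed, since the frame stores pymatgen structure objects.

```python
from typing import List, Sequence


def load_dataframe(path: str):
    """Load a joblib-serialized object, e.g. the main .joblib dataframe.

    Only unpickle files from sources you trust.
    """
    import joblib  # imported lazily so site_average() works without joblib

    return joblib.load(path)


def site_average(spectra: Sequence[Sequence[float]],
                 multiplicities: Sequence[int]) -> List[float]:
    """Multiplicity-weighted mean of site-specific spectra.

    Assumes (hypothetically) that all spectra share one energy grid and that
    each inequivalent site is weighted by its multiplicity in the unit cell.
    """
    if not spectra or len(spectra) != len(multiplicities):
        raise ValueError("need one multiplicity per site spectrum")
    n_points = len(spectra[0])
    if any(len(s) != n_points for s in spectra):
        raise ValueError("all spectra must share the same energy grid")
    total = sum(multiplicities)
    return [
        sum(m * s[i] for s, m in zip(spectra, multiplicities)) / total
        for i in range(n_points)
    ]


# Two made-up absorbing sites with multiplicities 1 and 3.
averaged = site_average([[0.0, 1.0, 2.0], [4.0, 1.0, 0.0]], [1, 3])
print(averaged)
```

After extracting the archive, `df = load_dataframe("Cu_reproducable_alignment_df_extracted_110222.joblib")` followed by `print(df.columns)` is one way to discover the column names, which are not listed in this description.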
Abstract (English)
Electron energy loss spectroscopy (EELS) and X-ray absorption spectroscopy (XAS) provide detailed information about bonding, distributions and locations of atoms, and their coordination numbers and oxidation states. However, analysis of XAS/EELS data often relies on matching an unknown experimental sample to a series of simulated or experimental standard samples. This limits analysis throughput and the ability to extract quantitative information from a sample. In this work, we have trained a random forest model capable of predicting the oxidation state of copper based on its L-edge spectrum. Our model attains an R² score of 0.85 and a root mean square error of 0.24 on simulated data. It has also successfully predicted experimental L-edge EELS spectra taken in this work and XAS spectra extracted from the literature. We further demonstrate the utility of this model by predicting simulated and experimental spectra of mixed valence samples generated by this work. This model can be integrated into a real-time EELS/XAS analysis pipeline on mixtures of copper-containing materials of unknown composition and oxidation state. By expanding the training data, this methodology can be extended to data-driven spectral analysis of a broad range of materials.
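For readers unfamiliar with the two metrics quoted in the abstract, the sketch below shows how a coefficient of determination and a root mean square error are conventionally computed for a regression on oxidation state. It uses the standard textbook definitions and made-up labels and predictions; it is not the authors' evaluation code, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between true and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up example: true Cu oxidation states and hypothetical model predictions.
true_states = [0.0, 1.0, 2.0, 2.0]
predicted_states = [0.0, 1.0, 1.0, 2.0]

print("RMSE:", rmse(true_states, predicted_states))
print("R^2:", r2_score(true_states, predicted_states))
```

Because oxidation state is treated as a continuous target here, both metrics are expressed relative to the oxidation-state scale, so an RMSE of 0.24 corresponds to roughly a quarter of one oxidation-state unit.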
Files
| Name | Size | MD5 |
|---|---|---|
| Data For Supervised Machine Learning Prediction of the Cu Oxidation State from EELS Spectra -20251224T090703Z-3-001.zip | 710.3 MB | 1246825b838f77b926d0e58f6ca68e45 |
Additional details
Related works
- Is described by
- Publication: https://www.nature.com/articles/s41524-024-01408-1 (URL)
Funding
- United States Department of Energy
- 4D Camera Distillery: From Massive Electron Microscopy Scattering Data to Useful Information with AI/ML
- Office of Basic Energy Sciences
- DE-AC02-05CH11231
- Office of Basic Energy Sciences
- DE-SC0012704
Dates
- Accepted
- 2024-09-17
Software
- Repository URL
- https://github.com/smglsn12/ML_XAS_EELS
- Programming language
- Python
- Development Status
- Active
References
- Jain, A. et al. Commentary: The Materials Project: A Materials Genome Approach To Accelerating Materials Innovation (American Institute of Physics Inc., 2013).
- Chen, Y. et al. Database of ab initio L-edge X-ray absorption near edge structure. Scientific Data https://doi.org/10.1038/s41597-021-00936-5 (2021).
- Carbone, M. R. et al. Lightshow: a Python package for generating computational x-ray absorption spectroscopy input files. J. Open Source Softw. 8, 5182 (2023).
- Rehr, J. J., Kas, J. J., Vila, F. D., Prange, M. P. & Jorissen, K. Parameter-free calculations of X-ray spectra with FEFF9. Phys. Chem. Chem. Phys. 12, 5503–5513 (2010).