Published April 25, 2023 | Version v1
Journal article
Open
Code and data for "Exploiting redundancy in large materials datasets for efficient machine learning with less data"
Creators
- 1. Department of Materials Science and Engineering, University of Toronto, 27 King's College Cir, Toronto, ON, Canada.
- 2. Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD, USA.
- 3. Canmet MATERIALS, Natural Resources Canada, 183 Longwood Road South, Hamilton, ON, Canada.
Description
Code and data for "Exploiting redundancy in large materials datasets for efficient machine learning with less data", published in Nat Commun 14, 7283 (2023).
The code is provided in the codes_2023Jul31.zip file.
Six datasets are considered, two snapshots from each of three databases: JARVIS (2018 and 2022), Materials Project (2018 and 2021), and OQMD (2014 and 2021). For each database, the older snapshot is a subset of the newer snapshot.
For each dataset, there are:
- a pickle file (dataset_featurized_matminer.pkl) holding a pandas.DataFrame indexed by material_id, whose columns contain the retrieved properties (e_form, bandgap, bulk_modulus), meta-information (formula, chemical system, etc.), and 273 Matminer features (the last 273 columns). Note that the atomic structures are not stored in this file.
- a pickle file (dataset_pmg_structure.pkl) holding a pandas.Series of the corresponding pymatgen structure objects.
- a zip file (dataset_cif.zip) containing the CIF files converted from those pymatgen structure objects.
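The layout described above can be read roughly as in the sketch below. It is a minimal illustration, not the authors' code: the helper names `split_features` and `load_dataset` are hypothetical, the file paths are placeholders for whichever snapshot is downloaded, and it assumes the 273 feature columns really are the trailing columns of the table, as stated. Unpickling the structure file also assumes pymatgen is installed and that the installed pandas version can read the pickle.

```python
import pandas as pd

N_FEATURES = 273  # number of Matminer feature columns, per the description


def split_features(df: pd.DataFrame, n_features: int = N_FEATURES):
    """Split a featurized table into (property/meta columns, feature columns).

    Assumes the features occupy the last `n_features` columns.
    """
    if df.shape[1] < n_features:
        raise ValueError(
            f"expected at least {n_features} columns, got {df.shape[1]}"
        )
    meta = df.iloc[:, :-n_features]
    features = df.iloc[:, -n_features:]
    return meta, features


def load_dataset(featurized_path: str, structure_path: str):
    """Load the featurized table and the structure Series (hypothetical helper).

    Only unpickle files obtained from a source you trust.
    """
    df = pd.read_pickle(featurized_path)
    structures = pd.read_pickle(structure_path)  # requires pymatgen to unpickle
    # Put the structures in the same material_id order as the table.
    structures = structures.reindex(df.index)
    return df, structures
```

A call would then look like `df, structures = load_dataset("dataset_featurized_matminer.pkl", "dataset_pmg_structure.pkl")` followed by `meta, X = split_features(df)`, with the generic file names replaced by those of the chosen snapshot.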
Please note that in the paper, we dropped materials whose formation energy is above 5 eV/atom for all the properties/tasks.
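The filtering step mentioned above could be reproduced along these lines. This is a sketch under assumptions: the function name `drop_high_e_form` is hypothetical, the column label `e_form` is taken from the property names listed in the description, and rows with a missing formation energy are kept, since the text only says that values above 5 eV/atom were dropped.

```python
import pandas as pd

E_FORM_MAX = 5.0  # eV/atom, threshold stated in the description


def drop_high_e_form(df: pd.DataFrame,
                     column: str = "e_form",
                     threshold: float = E_FORM_MAX) -> pd.DataFrame:
    """Drop rows whose formation energy is strictly above `threshold`.

    Rows where the value is NaN are kept (an assumption, see lead-in),
    because NaN > threshold evaluates to False.
    """
    too_high = df[column] > threshold
    return df.loc[~too_high]
```

Because the filter is applied to the whole table before a target property is selected, the same reduced set of materials is used for every property or task.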
Files
(6.5 GB)
MD5 checksum | Size |
---|---|
md5:98394c71c38d8900ef3f1955c1a26bc4 | 127.8 MB |
md5:b1233a211275bddffafacde791f7902c | 44.2 MB |
md5:0d16060701e26ea18ef22bdd8bee69bb | 221.6 MB |
md5:7ca1b3d2285744a8e44afb16395ab854 | 173.6 MB |
md5:448f3164530c6471a15a5d3de6898dcb | 182.1 MB |
md5:11a8570de88e6ed2066e4ffae36e13d8 | 119.1 MB |
md5:4254ceb6707575f7811e7e52053c941c | 384.4 MB |
md5:56461c7e47ccc3869a8f6a60919b2cf1 | 713.5 MB |
md5:04172bc5b4a2949cd909778d4bd7f5f6 | 640.1 MB |
md5:3982b475aa5b4689e33a962fffd030d1 | 497.4 MB |
md5:733d271a4f6f3b8275581656af2badb0 | 2.5 GB |
md5:197c6ca367b6375b7799f4eff9061d16 | 911.3 MB |
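A downloaded file can be checked against its listed md5 value with a short script such as the following. This is a minimal sketch using only the Python standard library; the helper name `md5_of_file` and the chunk size are illustrative choices, and the files are read in chunks because several of them are hundreds of megabytes or larger.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file's digest with a listed value such as 'md5:9839...'."""
    expected = listed.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

For example, `matches_listing(local_path, "md5:98394c71c38d8900ef3f1955c1a26bc4")` should return `True` for an intact copy of the 127.8 MB file, where `local_path` is wherever that file was saved. `str.removeprefix` requires Python 3.9 or later.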