Published March 4, 2024 | Version v2
Dataset Open

MatSciML: A Broad, Multi-Task Benchmark for Solid-State Materials Modeling

  • 1. Intel Labs
  • 2. Intel AXG
  • 3. Vector Institute

Description

We propose MatSci ML, a novel benchmark for modeling MATerials SCIence using Machine Learning methods, focused on solid-state materials with periodic crystal structures. Applying machine learning to solid-state materials is a nascent field with substantial fragmentation, driven largely by the wide variety of datasets used to develop models. This fragmentation makes it difficult to compare the performance and generalizability of different methods, which hinders overall research progress in the field. Building on open-source datasets, including large-scale ones such as the Open Catalyst Project, OQMD, NOMAD, the Carolina Materials Database, and the Materials Project, the MatSci ML benchmark provides a diverse set of materials systems and property data for model training and evaluation: simulated energies, atomic forces, and material bandgaps, as well as classification data for crystal symmetries via space groups. The diversity of properties in MatSci ML makes it possible to implement and evaluate multi-task learning algorithms for solid-state materials, while the diversity of datasets facilitates the development of new, more general algorithms and methods that work across multiple datasets. In the multi-dataset learning setting, MatSci ML enables researchers to combine observations from multiple datasets to jointly predict common properties, such as energy and forces. Using MatSci ML, we evaluate the performance of different graph neural networks and equivariant point cloud networks on several benchmark tasks spanning single-task, multi-task, and multi-dataset learning scenarios. Our open-source code is available at https://github.com/IntelLabs/matsciml.

Files

carolina_matdb.zip

Files (6.3 GB)

Checksum                                Size
md5:26c91b708aaaf196b6ab7a05ebd47dca    119.5 MB
md5:8818d02ab3027e8f8741caff3b5104ec    61.4 MB
md5:325b37ac615bf93a8b7cdf4b6fe917c3    4.8 GB
md5:f31b9920f68d21462d03406931d00907    162.1 MB
md5:ea750fb04edb7cc3a128bb6d4747ba79    353.2 MB
md5:db208df2c9a805deebddf917d7b2230e    300.8 MB
md5:13ee568ae539a8d206e932e617727330    465.9 MB
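
Each archive in the listing above comes with an MD5 checksum, so a download can be checked for corruption before use. The following is a minimal sketch of that check using only Python's standard `hashlib` module. The function names and the local path `downloads/archive.zip` are hypothetical, and because this listing does not show which file name belongs to which checksum, pairing a file with its checksum is left to the reader.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with an expected hex string, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; replace with the location of a downloaded archive.
    archive = Path("downloads/archive.zip")
    # First checksum in the listing, used here purely as an illustration.
    expected = "26c91b708aaaf196b6ab7a05ebd47dca"
    if archive.exists():
        print("OK" if matches_checksum(archive, expected) else "MISMATCH")
```

Reading in fixed-size chunks matters here because the largest archive is several gigabytes and should not be loaded into memory at once. MD5 is adequate for detecting accidental corruption during transfer, but it is not a safeguard against deliberate tampering.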

Additional details

Related works

Is published in
Preprint: https://arxiv.org/abs/2309.05934