Published January 28, 2025 | Version v1
Dataset Open

Datasets and Distillation Labels for the Paper "Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians"

  • 1. University of California, Berkeley

Description

We provide six data folders, which were used in our paper: Amin, I., Raja, S., Krishnapriyan, A.S. (2024). Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians. Accepted to ICLR 2025. arXiv:2501.09009.

md22_JMP_labels.tar.gz - MD22 JMP (large and small, fine-tuned) Hessian labels for the buckyball catcher and double-walled nanotube splits

SPICE_MaceOFF_labels.tar.gz - SPICE MACE-OFF Hessian labels

MPtrj_labels.tar.gz - MPtrj MACE-MP Hessian labels

spice_separated.tar.gz - SPICE subdatasets (lmdb) (Solvated Amino Acids, Molecules with Iodine, DES370K Monomers)

md22.tar.gz - MD22 datasets (lmdb) for the buckyball catcher and double-walled nanotube, taken from the JMP repository (see paper).

MPtrj_separated_all_splits.zip - MPtrj subdatasets (lmdb) filtered by property (Pm3m Spacegroup, Systems with Yttrium, Bandgap >= 5 meV).
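
Since the archives listed above come in two container formats (.tar.gz and .zip), a small helper can unpack either one. The following is a minimal sketch using only the Python standard library; the function name extract_archive and the destination paths are illustrative assumptions, not part of the paper's repository.

```python
import tarfile
import zipfile
from pathlib import Path


def extract_archive(archive, dest):
    """Unpack a .tar.gz/.tgz or .zip archive into dest and return dest as a Path.

    Hypothetical helper for illustration; it dispatches on the file suffix only.
    """
    archive = Path(archive)
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)

    name = archive.name.lower()
    if name.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    elif name.endswith((".tar.gz", ".tgz")):
        # "r:gz" opens a gzip-compressed tar file for reading.
        with tarfile.open(archive, "r:gz") as tf:
            tf.extractall(dest)
    else:
        raise ValueError(f"Unsupported archive format: {archive}")
    return dest
```

For example, extract_archive("md22.tar.gz", "data/md22") would unpack the MD22 archive into a local data/md22 directory (the directory name here is an arbitrary choice). The internal layout of each archive is not described on this page, so inspect the extracted contents before pointing the training code at them.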

The original data was taken from the SPICE dataset, the MPtrj dataset, and the MD22 dataset.

The repository for the paper, where these datasets can be used, is available at https://github.com/ASK-Berkeley/MLFF-distill.

If you find any of this useful, please consider citing the paper:

@article{amin2025distilling,
      title={Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians},
      author={Amin, Ishan and Raja, Sanjeev and Krishnapriyan, A. S.},
      journal={International Conference on Learning Representations 2025},
      year={2025},
      archivePrefix={arXiv},
      eprint={2501.09009},
}

Files

Files (21.1 GB)

MD5 checksum                      Size
2375b5f637d109edcae227d8dab7b78d  70.0 MB
47e80768fb02f4b94ad0bd473bc5d7a4  8.8 GB
a64cfbb211177075f92a37a78a75f086  10.9 GB
3babe0333aa9c5988496b60e7284668c  92.8 MB
9c3a371886cbc167bbe7ae69469e706a  1.3 GB
21820323edbad65c5f069b4826c25b1b  27.2 MB
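
The MD5 checksums listed above can be used to confirm that a multi-gigabyte download completed intact. The sketch below streams a file through hashlib in fixed-size chunks so that it does not need to fit in memory; the helper names md5sum and verify_download are hypothetical, and which checksum belongs to which archive must be read off the record page itself.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

A call such as verify_download("md22.tar.gz", "md5:...") returns True only when the local file matches the published digest; a False result usually indicates a truncated or corrupted download.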

Additional details

Software

Repository URL
https://github.com/ASK-Berkeley/MLFF-distill
Programming language
Python