Datasets and Distillation Labels for the Paper "Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians"
Description
We provide six data archives, which were used in our paper: Amin, I., Raja, S., and Krishnapriyan, A.S. (2024). Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians. Accepted to ICLR 2025. arXiv:2501.09009.
- md22_JMP_labels.tar.gz: MD22 Hessian labels from fine-tuned JMP models (large and small) for the Buckyball Catcher and Double-Walled Nanotube splits.
- SPICE_MaceOFF_labels.tar.gz: SPICE Hessian labels from MACE-OFF.
- MPtrj_labels.tar.gz: MPtrj Hessian labels from MACE-MP.
- spice_separated.tar.gz: SPICE subdatasets in LMDB format (Solvated Amino Acids, Molecules with Iodine, DES370K Monomers).
- md22.tar.gz: MD22 datasets in LMDB format for the buckyball catcher and the double-walled nanotube, taken from the JMP repository (see paper).
- MPtrj_separated_all_splits.zip: MPtrj subdatasets in LMDB format, filtered by property (Pm3m space group, systems containing yttrium, band gap >= 5 meV).
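As background on the term "Hessian labels", the standard definition is given below. This is a general summary, not a specification of how the archives lay out their arrays. For a configuration of $N$ atoms with Cartesian coordinates $x_1, \dots, x_{3N}$, teacher-predicted energy $E$, and forces $F_j = -\partial E / \partial x_j$:

```latex
\[
  H_{ij} \;=\; \frac{\partial^2 E}{\partial x_i \, \partial x_j}
         \;=\; -\,\frac{\partial F_j}{\partial x_i},
  \qquad i, j = 1, \dots, 3N .
\]
```

$H$ is a symmetric $3N \times 3N$ matrix. For example, a full Hessian for one buckyball-catcher configuration ($N = 148$) has $444 \times 444 = 197{,}136$ entries, and one for the double-walled nanotube ($N = 370$) has $1110 \times 1110 = 1{,}232{,}100$, which is why label archives can be large relative to the underlying datasets.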
The original data were taken from the SPICE, MPtrj, and MD22 datasets.
The repository for the paper, where these datasets can be used, is available at https://github.com/ASK-Berkeley/MLFF-distill.
If you find these data useful, please consider citing the paper:
@inproceedings{amin2025distilling,
title={Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians},
author={Amin, Ishan and Raja, Sanjeev and Krishnapriyan, A. S.},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025},
archivePrefix={arXiv},
eprint={2501.09009},
}
Files
Total size: 21.1 GB

| MD5 checksum | Size |
|---|---|
| 2375b5f637d109edcae227d8dab7b78d | 70.0 MB |
| 47e80768fb02f4b94ad0bd473bc5d7a4 | 8.8 GB |
| a64cfbb211177075f92a37a78a75f086 | 10.9 GB |
| 3babe0333aa9c5988496b60e7284668c | 92.8 MB |
| 9c3a371886cbc167bbe7ae69469e706a | 1.3 GB |
| 21820323edbad65c5f069b4826c25b1b | 27.2 MB |
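Because the archives are large, it is worth checking each download against its listed MD5 checksum before unpacking. The following is a minimal sketch, assuming Python 3 and only the standard library. The helper names `md5sum` and `verify_and_extract` are hypothetical and are not part of the MLFF-distill repository.

```python
import hashlib
import tarfile
import zipfile
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading 1 MiB chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, dest):
    """Hypothetical helper: check an archive's MD5, then unpack it into dest.

    Handles the two archive types used on this page (.tar.gz and .zip).
    """
    archive = Path(archive)
    actual = md5sum(archive)
    if actual != expected_md5.lower():
        raise ValueError(f"{archive.name}: MD5 mismatch ({actual} != {expected_md5})")
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    if archive.name.endswith(".tar.gz"):
        with tarfile.open(archive, "r:gz") as tf:
            # Newer Python releases provide a "data" filter that rejects unsafe members.
            if hasattr(tarfile, "data_filter"):
                tf.extractall(dest, filter="data")
            else:
                tf.extractall(dest)
    elif archive.suffix == ".zip":
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
    else:
        raise ValueError(f"unsupported archive type: {archive.name}")
    return dest
```

For example, `verify_and_extract("md22.tar.gz", expected, "data/")`, where `expected` is the checksum shown for that file on the download page. Hashing in fixed-size chunks keeps memory use flat even for the 10.9 GB archive.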
Additional details
Software
- Repository URL: https://github.com/ASK-Berkeley/MLFF-distill
- Programming language: Python