Published January 20, 2026 | Version v1
Dataset Open

Supplementary data for MAD-SURF: a machine learning interatomic potential for molecular adsorption on coinage metal surfaces

  • 1. Departamento de Física Teórica de la Materia Condensada, Universidad Autónoma de Madrid ES
  • 2. Department of Applied Physics, Aalto University
  • 3. Department of Chemistry and Materials Science, Aalto University

Description

This repository contains the data used to train MAD-SURF, a machine learning interatomic potential based on the MACE architecture and tailored for the adsorption of molecules on coinage metals. All reference data were computed with a consistent DFT method: PBE with the Tkatchenko-Scheffler van der Waals correction, using C6 dipole-dipole dispersion coefficients specifically parametrized for surface screening. The dataset was assembled with a multi-source strategy to ensure structural diversity: active learning for adsorption structures, molecular dynamics for distorted geometries, automated interaction site screening for intermolecular interactions, and normal mode sampling for intramolecular interactions. The repository also contains the structures and trajectories needed to reproduce the figures in the corresponding MAD-SURF article, as well as the trained model potentials.

 

Brief description of file contents:

  • dataset.zip: contains the large and small training and test datasets for MAD-SURF.
  • dataset_with_config_types.zip: contains the full dataset with entries annotated by source type, i.e. molecular dynamics, Bayesian optimization structure search, normal mode sampling, etc. It can be used to assign custom weights per data type during (re)training.
  • models.zip: all of the potentials trained during this work. The MAD-SURF model used in the article is also uploaded here as MAD-SURF.model; it is the same model as in models/finetuned/foundational_model/on_training_subset/
  • fig_3_aggregated_aromatic_hydrocarbons_cu111.zip: the relaxed structures of the hydrocarbon aggregates on Cu111.
  • fig_4_organic_monolayers.zip: the relaxed structures of the herringbone (HB) and brick wall (BW) monolayer phases of PTCDA and pentacene on Cu111, Ag111 and Ag110.
  • fig_5_beta_cyclodextrin_au111.zip: the relaxed structures of the primary and secondary faces in both clockwise and counter-clockwise hydrogen bonding networks.
  • fig_6_Au_herringbone_reconstruction.zip: the LAMMPS input and output files for the relaxation of the Au111 herringbone reconstruction model, one with the bottom layer fixed and one where it is fully relaxed, as well 
  • fig_7_pentacene_large_scale_dynamics.zip: the LAMMPS input and output files for the large-scale molecular dynamics simulations of the random tiling pentacene monolayer containing 118 pentacene molecules on a 13.27×13.71 nm^2 Cu111 substrate. The archive also contains the MAD-SURF model converted to a LAMMPS-compatible potential (.pt). However, it is recommended to recreate the LAMMPS potential in the computing environment where it will actually be used.
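
The per-source annotations in dataset_with_config_types.zip make it possible to derive custom weights before (re)training. The following is a minimal sketch, not the authors' procedure. It assumes the structures are stored as extended XYZ frames whose comment line carries a `config_type=...` key, which this record does not state. The inverse-frequency weighting and the function names `count_config_types` and `inverse_frequency_weights` are illustrative choices, not part of the published workflow.

```python
import json
import re
from collections import Counter

# Assumption: each frame's comment line holds config_type=value or config_type="value".
CONFIG_TYPE_RE = re.compile(r'config_type="?([^"\s]+)"?')


def count_config_types(lines):
    """Count frames per config_type in an iterable of extended XYZ lines."""
    counts = Counter()
    lines = iter(lines)
    for line in lines:
        line = line.strip()
        if not line:
            continue                    # tolerate blank lines between frames
        n_atoms = int(line)             # first line of a frame: number of atoms
        comment = next(lines)           # second line: key=value metadata
        match = CONFIG_TYPE_RE.search(comment)
        counts[match.group(1) if match else "Default"] += 1
        for _ in range(n_atoms):        # skip the atom lines of this frame
            next(lines)
    return counts


def inverse_frequency_weights(counts):
    """Weight each type by mean_count / count, so rarer types weigh more."""
    mean = sum(counts.values()) / len(counts)
    return {name: round(mean / n, 3) for name, n in counts.items()}


# Tiny in-memory example; the labels "md" and "nms" are hypothetical.
SAMPLE = """1
config_type=md energy=-1.0
H 0.0 0.0 0.0
1
config_type=nms energy=-1.1
H 0.0 0.0 0.1
1
config_type=md energy=-0.9
H 0.0 0.0 0.2
"""

if __name__ == "__main__":
    counts = count_config_types(SAMPLE.splitlines())
    print(json.dumps(inverse_frequency_weights(counts)))
```

For a real file, pass an open file handle instead of `SAMPLE.splitlines()`. The resulting dictionary can then be adapted to whatever weighting mechanism the training code offers. MACE's training script has a `config_type_weights` option for this purpose, but check its exact format against the documentation of the installed version.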

Files

Files (1.4 GB)

Checksum Size
md5:365c6c75168c25f7f9af1a408d2835a7 185.1 MB
md5:b5081f3e7fb864f168340045494eb11f 167.9 MB
md5:3e813042b89da2176f6d2917dec9a3cc 40.7 kB
md5:f82983855162e6b31228580e9fcbdaae 22.7 kB
md5:1e26f0dbe47713be11819b2bf5a6087a 20.3 kB
md5:d9a84aec7627003945a63a52d9471acb 20.0 MB
md5:2f6e607ef13ccad5c975ed805d7fb389 285.9 MB
md5:8d8efb92a2f91194edef5794c7fabf75 18.4 MB
md5:196d0e8bc574358b48c6c7dcc9141c84 18.4 MB
md5:f4c58020469634ace841d187be0541a3 726.4 MB
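
The MD5 checksums above can be used to confirm that a download completed without corruption. The sketch below uses only the Python standard library. The helper names `md5sum` and `verify` are illustrative. This listing does not show which file name belongs to which checksum, so the pairing has to be taken from the record page itself.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's MD5 with an expected value, with or without an 'md5:' prefix."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5sum(path) == expected_hex
```

A call such as `verify("some_archive.zip", "md5:<checksum from the record>")` returns `True` when the file matches. Both the path and the checksum here are placeholders.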

Additional details

Software

Repository URL
https://github.com/SPMTH/Adsorb-GMLIP
Programming language
Python, Shell