Published June 26, 2024 | Version v3
Presentation Open

MDverse: Shedding Light on the Dark Matter of Molecular Dynamics Simulations

  • 1. Université Paris Cité

Description

The rise of open science and the absence of a global dedicated data repository for molecular dynamics (MD) simulations have led to the accumulation of MD files in generalist data repositories, constituting the dark matter of MD: data that is technically accessible, but neither indexed, curated, nor easily searchable. Leveraging an original search strategy, we found and indexed about 250,000 files and 2,000 datasets from Zenodo, Figshare and Open Science Framework. Focusing on files produced by the Gromacs MD software, we illustrate the potential offered by mining publicly available MD data. We identified systems with specific molecular compositions, characterized essential parameters of MD simulations, such as temperature and simulation length, and identified model resolution, such as all-atom and coarse-grained. Based on this analysis, we inferred metadata and propose a search engine prototype to explore the collected MD data. To continue in this direction, we call on the community to keep sharing MD data and to populate and standardize metadata more thoroughly, so that this valuable matter can be reused.
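As an illustration of how parameters such as temperature and simulation length might be recovered from shared Gromacs files, here is a minimal Python sketch that reads the text of a Gromacs .mdp parameter file. It is not the method used in the presentation. The function names `parse_mdp` and `summarize` are hypothetical, and the sketch assumes the standard .mdp keys `dt` (in ps), `nsteps` and `ref_t` (in K) are the ones of interest.

```python
def parse_mdp(text):
    """Parse the text of a Gromacs .mdp file into {parameter: raw string value}."""
    params = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # ';' starts a comment in .mdp files
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Gromacs treats '-' and '_' as interchangeable in parameter names
        params[key.strip().lower().replace("-", "_")] = value.strip()
    return params


def summarize(params):
    """Derive simulation length (ns) and reference temperatures (K).

    Assumption: missing keys fall back to the Gromacs defaults
    (dt = 0.001 ps, nsteps = 0). A negative nsteps means "no fixed
    length", reported here as None.
    """
    dt_ps = float(params.get("dt", 0.001))
    nsteps = int(params.get("nsteps", 0))
    length_ns = None if nsteps < 0 else nsteps * dt_ps / 1000.0
    temperatures = [float(t) for t in params.get("ref_t", "").split()]
    return {"length_ns": length_ns, "temperatures_K": temperatures}


example = """
; hypothetical production run
integrator = md
dt         = 0.002
nsteps     = 50000000
ref-t      = 300 300   ; one value per temperature-coupling group
"""
summary = summarize(parse_mdp(example))
```

In this hypothetical example, 50,000,000 steps of 0.002 ps correspond to a simulation of about 100 ns at 300 K. A real pipeline would also need to handle malformed files and parameters missing from the file.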

Files

2024-06-26_MDverse_Poulain.pdf

Files (5.4 MB)

Name: 2024-06-26_MDverse_Poulain.pdf
Size: 5.4 MB
md5:af502a3307a7df69b4391f39add1e86d