Published May 24, 2023 | Version 1.0.0
Dataset Open

MISATO - Machine learning dataset for structure-based drug discovery

  • 1. Helmholtz Munich, Molecular Targets and Therapeutics Center, Institute of Structural Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany.
  • 2. Forschungszentrum Jülich, Jülich Supercomputing Centre, Jülich, Germany.
  • 3. Helmholtz AI, Helmholtz Zentrum München, Neuherberg, Germany.
  • 4. Helmholtz Munich, Computational Health Center, Institute of Computational Biology, Ingolstädter Landstr. 1, 85764 Neuherberg, Germany.

Description

Developments in Artificial Intelligence (AI) have had an enormous impact on scientific research in recent years. Yet, relatively few robust methods have been reported in the field of structure-based drug discovery. To train AI models to abstract from structural data, highly curated and precise biomolecule-ligand interaction datasets are urgently needed. We present MISATO, a curated dataset of almost 20,000 experimental structures of protein-ligand complexes, associated molecular dynamics traces, and electronic properties. Semi-empirical quantum mechanics was used to systematically refine the protonation states of proteins and small-molecule ligands. Molecular dynamics traces for protein-ligand complexes were obtained in explicit water. The dataset is made readily available to the scientific community via simple Python data loaders. AI baseline models are provided for dynamical and electronic properties. This highly curated dataset is expected to enable the next generation of AI models for structure-based drug discovery. Our vision is to make MISATO the first step of a vibrant community project for the development of powerful AI-based drug discovery tools.

Notes

Funding: BMWi ZIM, KK 5197901TS0; BMBF, SUPREME, 031L0268.

Files

test_MD.txt

Files (193.2 GB)

MD5 checksum                          Size
md5:52f1fd282d5da90f54641eddacbb70a0  5.7 GB
md5:9bc6446922cd80e0f2f3f69349bf88ed  132.8 GB
md5:cdedc66f99f2514b84aefc103e4868a8  54.3 GB
md5:1199f0af1eac684da3a6c2ddd0f321df  343.1 MB
md5:a3f8ada0c6562ff3d8e53653816a36ba  8.1 kB
md5:0eff7e1729307f9d773353382fbb1f1e  68.8 kB
md5:a37f0dfead91accfe089b3555e0d9e33  8.0 kB
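
Because the largest files run to tens of gigabytes, it can be worth checking a finished download against the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library; it is not part of the MISATO data loaders. The helper names md5_of_file and matches_listing are hypothetical, and since the listing does not show which file name belongs to which checksum, the path is left for the reader to supply.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks
    so that multi-gigabyte files never have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a local file with a checksum copied from the file listing.

    The listing writes checksums as 'md5:<hex digest>'; the optional
    'md5:' prefix is stripped before comparing.
    """
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, matches_listing(local_path, "md5:1199f0af1eac684da3a6c2ddd0f321df") returns True only if the file at local_path (a placeholder for wherever the 343.1 MB download was saved) is byte-identical to the published file. A False result most likely means the download was incomplete or corrupted and should be repeated.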