Published January 19, 2026 | Version v2
Dataset Open

MTD2025: A high-quality dataset curated specifically for multi-target molecule generation

Description

Multi-Target Dataset 2025 (MTD2025) is a high-quality dataset curated specifically for multi-target molecule generation. It is built on the Papyrus bioactivity database and integrates experimental activity data from multiple authoritative sources. We systematically filtered, paired, and reconstructed these data into a multi-target resource. The final dataset contains 4,011 unique proteins and 123,024 unique small molecules, with more than 600,000 quantum-precision 3D conformations, and comprises approximately 446,000 dual-target and 283,000 triple-target associations. For each molecule, we first performed a conformational search using CREST and then optimized the resulting structures with the quantum-accurate LiTEN-FF to obtain the lowest-energy locally optimized 3D conformation, ensuring high structural quality and physical plausibility.
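To make the idea of dual-target and triple-target associations concrete, the following is a minimal sketch of one way such associations could be enumerated from molecule-target activity records. It is not the curation pipeline used for MTD2025: the function name `multi_target_associations`, the record layout, and the toy identifiers (`mol_A`, `P1`, and so on) are all assumptions made for illustration, and any activity filtering is assumed to have happened beforehand.

```python
from collections import defaultdict
from itertools import combinations


def multi_target_associations(records, k):
    """Enumerate (molecule, target-tuple) associations of size k.

    `records` is an iterable of (molecule_id, target_id) pairs that are
    assumed to have already passed an activity filter (hypothetical layout).
    """
    targets_by_molecule = defaultdict(set)
    for molecule, target in records:
        targets_by_molecule[molecule].add(target)

    associations = []
    for molecule in sorted(targets_by_molecule):
        targets = sorted(targets_by_molecule[molecule])
        # Every k-subset of a molecule's targets is one association.
        for combo in combinations(targets, k):
            associations.append((molecule, combo))
    return associations


# Toy, made-up records; these are not taken from MTD2025.
records = [
    ("mol_A", "P1"), ("mol_A", "P2"), ("mol_A", "P3"),
    ("mol_B", "P1"), ("mol_B", "P2"),
    ("mol_C", "P4"),
]

dual = multi_target_associations(records, 2)
triple = multi_target_associations(records, 3)
print(len(dual), "dual-target and", len(triple), "triple-target associations")
```

In this toy example a molecule with three targets contributes three dual-target associations and one triple-target association, while a molecule with a single target contributes none.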

Files

Dual_targets.csv

Files (208.8 MB)

Checksum  Size
md5:45ed6a924f5526a4a43b30aab52e9bda  59.7 MB
md5:98a8a771e9bca8f3d61e7288acfa7eb9  110.2 MB
md5:0534fc0929e3ef187de2e9c7f4f14880  1.8 kB
md5:b3197a1a9aff2dc58a9d2c49d40c77cf  39.0 MB
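The listed MD5 checksums can be used to confirm that a downloaded file is intact. The snippet below is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_checksum` are illustrative, and because the listing does not say which checksum belongs to which file, the path and the expected value must be supplied by the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listed_checksum(local_path, "md5:...")` returns `True` when the file on disk matches the checksum copied from the listing, where `local_path` is wherever the file was saved.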

Additional details

References

  • Béquignon OJM, Bongers BJ, Jespers W, IJzerman AP, van de Water B, van Westen GJP. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. Journal of Cheminformatics 15, 3 (2023).