MTD2025: A high-quality dataset curated specifically for multi-target molecule generation
Description
Multi-Target Dataset 2025 (MTD2025) is a high-quality dataset curated specifically for multi-target molecule generation. It is built on the Papyrus bioactivity database and integrates experimental activity data from multiple authoritative sources. We systematically filtered, paired, and reconstructed the data to generate a multi-target resource. The final dataset contains 4,011 unique proteins and 123,024 unique small molecules, together with more than 600,000 quantum-accurate 3D conformations, and yields approximately 446,000 dual-target and 283,000 triple-target associations. For each molecule, we first performed a conformational search using CREST and then optimized the resulting structures with the quantum-accurate LiTEN-FF to obtain the locally lowest-energy 3D conformation, ensuring high structural quality and physical plausibility.
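The pairing step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the record layout (molecule ID, target ID, activity value), the 6.5 activity cutoff, and the function name `build_multi_target_sets` are all assumptions made for illustration. The idea is to keep only active molecule-target records, group the active targets per molecule, and enumerate the target pairs and triples that share a molecule.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: (molecule_id, target_id, activity_value).
# The cutoff below is an illustrative assumption, not the MTD2025 criterion.
ACTIVITY_CUTOFF = 6.5


def build_multi_target_sets(records, cutoff=ACTIVITY_CUTOFF):
    """Group active targets per molecule, then enumerate target pairs and triples."""
    active = defaultdict(set)
    for mol, target, value in records:
        if value >= cutoff:
            active[mol].add(target)

    dual, triple = [], []
    for mol, targets in active.items():
        ordered = sorted(targets)  # fixed order so each combination appears once
        dual.extend((mol, *pair) for pair in combinations(ordered, 2))
        triple.extend((mol, *trio) for trio in combinations(ordered, 3))
    return dual, triple


records = [
    ("mol_A", "P1", 7.2),
    ("mol_A", "P2", 8.0),
    ("mol_A", "P3", 6.9),
    ("mol_B", "P1", 5.0),  # below the cutoff, so dropped
    ("mol_B", "P2", 7.5),
]
dual, triple = build_multi_target_sets(records)
print(len(dual), len(triple))  # 3 1
```

In this toy input, `mol_A` is active against three targets and therefore contributes three dual-target entries and one triple-target entry, while `mol_B` retains a single active target and contributes none. How MTD2025 itself counts associations may differ from this enumeration.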
Files
Dual_targets.csv
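Because this record does not document the column layout of Dual_targets.csv, a reasonable first step is to inspect the header before writing any analysis code. The sketch below uses only the Python standard library and assumes nothing about column names; the helper name `summarize_csv` and the UTF-8 encoding are assumptions.

```python
import csv
import os


def summarize_csv(path, preview_rows=3):
    """Return the header, the first few data rows, and the data-row count of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        preview, total = [], 0
        for row in reader:
            if total < preview_rows:
                preview.append(row)
            total += 1
    return header, preview, total


# Runs only if the file has been downloaded into the working directory.
if os.path.exists("Dual_targets.csv"):
    header, preview, total = summarize_csv("Dual_targets.csv")
    print("columns:", header)
    print("data rows:", total)
    for row in preview:
        print(row)
```

Reading the file row by row keeps memory use low, which helps if the table holds several hundred thousand associations.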
Additional details
References
- Béquignon OJM, Bongers BJ, Jespers W, IJzerman AP, van der Water B, van Westen GJP. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. Journal of Cheminformatics 15, 3 (2023).