There is a newer version of the record available.

Published January 29, 2025 | Version 1.0.0
Dataset Open

Datasets, alphabets and models from the paper 'Reverse Engineering Molecules from Fingerprints through Deterministic Enumeration and Generative Models'

  • 1. National Research Institute For Agriculture, Food And Environment
  • 2. University of Manchester

Description

Files used and produced by the molecule-signature project:

  • alphabets.zip: Alphabets of molecule signatures.
  • datasets.zip: Datasets from MetaNetX, eMolecules, and DrugBank used to build alphabets and train the generative models.
  • models.zip: PyTorch/Lightning models and SentencePiece tokenization models for decoding SMILES from ECFP.

See the README.md files embedded in each archive and the publication for in-depth details.
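As a minimal sketch of how the embedded README.md files could be located without unpacking the multi-gigabyte archives, the snippet below uses Python's standard `zipfile` module. The helper names `find_readmes` and `read_member` are hypothetical, and the local paths assume the three archives were downloaded into the current directory; the internal layout of each archive is not described on this page, so the code simply searches for any member named README.md.

```python
import zipfile
from pathlib import Path


def find_readmes(archive_path):
    """Return the names of all README.md members inside a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return [
            name
            for name in archive.namelist()
            if Path(name).name.lower() == "readme.md"
        ]


def read_member(archive_path, member):
    """Return the UTF-8 text of one member without extracting the archive."""
    with zipfile.ZipFile(archive_path) as archive:
        return archive.read(member).decode("utf-8")


if __name__ == "__main__":
    # Assumed download location: the current working directory.
    for archive_name in ("alphabets.zip", "datasets.zip", "models.zip"):
        path = Path(archive_name)
        if not path.exists():
            print(f"{archive_name}: not found, skipping")
            continue
        for member in find_readmes(path):
            print(f"--- {archive_name}:{member} ---")
            # Print only the beginning of each README as a preview.
            print(read_member(path, member)[:500])
```

Reading members in place avoids extracting several gigabytes just to consult the documentation.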

Files

Files (3.3 GB)

  • md5:b67511d6a24f1de82a4b383338e0a828 (52.9 MB)
  • md5:294aa49f17cb07059602f15e42b4a91d (2.8 GB)
  • md5:62b95e08eaee98968956ca506bff6ea4 (470.7 MB)
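The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch, assuming the archives were saved locally; the helper names `md5_of` and `matches_record` are hypothetical. Because this page does not show which checksum belongs to which file name, the sketch only checks whether a file's digest matches any of the three listed values.

```python
import hashlib
from pathlib import Path

# Checksums as listed on the record page (without the "md5:" prefix).
RECORD_MD5 = {
    "b67511d6a24f1de82a4b383338e0a828",
    "294aa49f17cb07059602f15e42b4a91d",
    "62b95e08eaee98968956ca506bff6ea4",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """Return True if the file's MD5 equals one of the listed checksums."""
    return md5_of(path) in RECORD_MD5


if __name__ == "__main__":
    # Assumed download location: the current working directory.
    for archive_name in ("alphabets.zip", "datasets.zip", "models.zip"):
        path = Path(archive_name)
        if not path.exists():
            print(f"{archive_name}: not found, skipping")
            continue
        status = "OK" if matches_record(path) else "MISMATCH"
        print(f"{archive_name}: {status}")
```

Reading in chunks keeps memory use constant, which matters for the 2.8 GB archive.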

Additional details

Funding

Agence Nationale de la Recherche

  • Galaxy-BioProd: An operating portal for the production of biosourced products (ANR-22-PEBB-0008)
  • GENCI (ANR-17-EQPX-0001)
  • IFB (ex Renabi-IFB): Institut français de bioinformatique (ANR-11-INBS-0013)

Software

  • Repository URL: https://github.com/brsynth/molecule-signature-paper
  • Programming language: Python
  • Development status: Active