Published April 18, 2023 | Version 2
Dataset Open

EnzymeMap

  • 1. TU Wien
  • 2. IBM
  • 3. Massachusetts Institute of Technology

Description

EnzymeMap (enzymemap_v2_brenda2023.csv) is a large dataset of atom-mapped, balanced enzymatic reactions sorted by EC (Enzyme Commission) number. It is intended for machine learning models that predict enzymatic reactions or perform bioretrosynthesis. For details on the extraction, correction and curation of the data, please refer to the publication "EnzymeMap: Curation, validation and data-driven prediction of enzymatic reactions" by E. Heid, D. Probst, W. H. Green and G. K. H. Madsen. Please cite this publication if you use EnzymeMap. A preprint is available at https://doi.org/10.26434/chemrxiv-2023-jzw9w.

The file raw_unmapped_v2_brenda2023.csv additionally holds the raw, unmapped, uncurated data used in the publication to retrain the transformer models behind the IBM RXN-for-Chemistry platform. Note: The publication uses the newest version of this data (v2_brenda2023), whereas the online IBM RXN-for-Chemistry server was trained on version 1 (brenda2022) prior to the release of the publication.

The origin of EnzymeMap is curated data taken from BRENDA version 2023-1, which was then atom mapped and modestly extended. For some reactions or enzyme classes BRENDA includes additional (uncurated) information not included in EnzymeMap. If one is searching for more information on a particular reaction or enzyme class, we suggest the reader check the corresponding BRENDA entry and the original literature sources.

VERSION 2: Correction of erroneous mappings for isomerase reactions and of missing protons in some reactions. Addition of protein information where available. Since different proteins can catalyze the same reaction, the number of reactions in EnzymeMap has increased greatly compared to Version 1. Please remove duplicates where necessary (e.g. if your project does not require protein information, drop the respective columns and then remove duplicates).
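The deduplication step suggested above can be sketched as follows. This is a minimal illustration using only the Python standard library and toy data. The column names (mapped_reaction, ec_number, protein_id) are hypothetical placeholders and are not taken from the dataset, so check the header of enzymemap_v2_brenda2023.csv and substitute the actual protein-related columns.

```python
import csv
import io


def drop_columns_and_deduplicate(rows, columns_to_drop):
    """Remove the given columns from each row, then keep only the
    first occurrence of every distinct remaining row."""
    seen = set()
    unique_rows = []
    for row in rows:
        reduced = {k: v for k, v in row.items() if k not in columns_to_drop}
        key = tuple(sorted(reduced.items()))  # hashable fingerprint of the row
        if key not in seen:
            seen.add(key)
            unique_rows.append(reduced)
    return unique_rows


# Toy data with HYPOTHETICAL column names; the real file may use others.
toy_csv = io.StringIO(
    "mapped_reaction,ec_number,protein_id\n"
    "A>>B,1.1.1.1,P1\n"
    "A>>B,1.1.1.1,P2\n"
    "C>>D,2.3.1.5,P3\n"
)
rows = list(csv.DictReader(toy_csv))
deduplicated = drop_columns_and_deduplicate(rows, {"protein_id"})
print(len(rows), "->", len(deduplicated))
```

In the toy data the first two rows differ only in the protein column, so they collapse into one entry once that column is dropped. For the real file, replace the io.StringIO object with an opened file handle.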

Files

compound_to_smiles.json

Files (31.2 MB)

md5:cd7203a9d692c218bb1754483cf76b19 (9.2 MB)
md5:334ae7294018b0913c2c39c824892b37 (13.1 MB)
md5:f7653e392df16c0adfbe768d36675eab (8.9 MB)
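
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using Python's hashlib; the helper names md5_of_file and matches_checksum are illustrative. The listing does not state which checksum belongs to which file, so the pairing in the usage comment is an assumption to be checked against the record.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value, which may
    carry the 'md5:' prefix used in the file listing."""
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected.lower()


# Usage (ASSUMED file/checksum pairing, verify against the record):
# matches_checksum("compound_to_smiles.json",
#                  "md5:cd7203a9d692c218bb1754483cf76b19")
```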

Additional details

Funding

FWF Austrian Science Fund
Computer-aided design of multi-enzyme networks J 4415