Reactzyme: A Benchmark for Enzyme-Reaction Prediction
Description
Official dataset of Reactzyme, from "Reactzyme: A Benchmark for Enzyme-Reaction Prediction".
Our study utilizes a comprehensive dataset compiled from the SwissProt and Rhea databases. SwissProt, a curated subset of the UniProt database, has been selected for its high-quality, human-derived functional annotations of protein sequences. This section of UniProt is particularly valuable for its expert-reviewed entries, which ensure reliable and accurate functional data, making it ideal for our analysis. Rhea is employed for its precise mapping from enzymes to specific catalyzed functions, offering detailed descriptions of biochemical reactions.
The SwissProt and Rhea data were downloaded on January 8, 2024, and include all entries available up to that date. We exclude water molecules and unspecific functional groups that could mask the true molecular structures. Conversely, we retain metal ions, gas molecules, and other small molecules because their potential to bind to proteins is a valuable learning feature for our model. In total, the dataset comprises 178,463 positive enzyme-reaction pairs, covering 178,327 unique enzymes and 7,726 unique reactions.
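The filtering and counting described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes each reaction side is a dot-separated SMILES string, that water appears as one of a few fixed spellings, and that unspecific groups are marked with a wildcard atom (`*`). The identifiers in `toy_pairs` and all function names are hypothetical.

```python
# Assumption: water is written as one of these SMILES strings; real data may
# use other spellings and would need canonicalization first.
WATER_SMILES = {"O", "[OH2]", "[H]O[H]"}


def keep_molecule(smiles):
    """Return False for water and for molecules with a wildcard atom ('*')."""
    if smiles in WATER_SMILES:
        return False
    if "*" in smiles:  # assumed marker for an unspecific functional group
        return False
    return True  # metal ions, gases, and other small molecules are kept


def clean_side(side):
    """Filter one side of a reaction given as dot-separated SMILES."""
    return ".".join(m for m in side.split(".") if keep_molecule(m))


def summarize(pairs):
    """Count distinct pairs, enzymes, and reactions in (enzyme, reaction) tuples."""
    unique_pairs = set(pairs)
    return {
        "pairs": len(unique_pairs),
        "enzymes": len({enzyme for enzyme, _ in unique_pairs}),
        "reactions": len({reaction for _, reaction in unique_pairs}),
    }


# Hypothetical toy data, not taken from the dataset.
toy_pairs = [("ENZ_A", "RXN_1"), ("ENZ_B", "RXN_1"), ("ENZ_A", "RXN_2")]

print(clean_side("CCO.O.[Zn+2].*C(=O)O"))  # CCO.[Zn+2]
print(summarize(toy_pairs))
```

Applied to the full dataset, a summary of this kind should reproduce the reported totals of pairs, unique enzymes, and unique reactions, provided the identifiers are read from the released files in the same way.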
Files
deepchem_vocab.txt
Total size: 395.8 MB
MD5 checksum | Size |
---|---|
669bdd627c946114e87f06bffb4f33d9 | 87.4 MB |
95ca5f1d57a4a7a82bb3cca0ad742e9c | 3.6 kB |
e351fdb85830968fc9abe933c39f9eda | 47.5 MB |
5a64bef090335f884a767006867d64cf | 1.4 kB |
2d9f4e6c78d8daf5752cc2a5ae2bef0d | 46.7 MB |
cb5a575a08954f6d28311b9a4bef52fe | 3.5 MB |
c437435a239326c157e1d20f00d8e00e | 47.6 MB |
a669647f418bf54dc7c5d0059c2b2a09 | 62.3 MB |
5b9d384c96a597680b140c3a333f1600 | 100.7 MB |
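The MD5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `md5_of_file` and `verify` are illustrative, and the small self-check at the end runs on a temporary file rather than on a dataset file.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's MD5 with an expected value (optional 'md5:' prefix)."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Self-check on a temporary file containing the bytes b"hello".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    tmp_path = tmp.name
print(verify(tmp_path, "md5:5d41402abc4b2a76b9719d911017c592"))  # True
os.remove(tmp_path)
```

To check a downloaded file, call `verify` with the local path and the checksum listed for that file. A result of `False` suggests the file should be downloaded again.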