Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks
Creators
- 1. New England Biolabs, 240 County Road, Ipswich, MA 01938, United States
- 2. Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden
- 3. Invitae, 1400 16th St, San Francisco, CA 94103, United States
- 4. Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden; Institute of Biotechnology, Life Sciences Centre, Vilnius University, Sauletekio al. 7, LT10257 Vilnius, Lithuania; Randall Centre for Cell & Molecular Biophysics, King's College London, New Hunt's House, Guy's Campus, SE1 1UL London, UK
- 5. Microsoft Research New England, 1 Memorial Drive, Cambridge, MA, 02142, United States
Description
Large supplementary data files to accompany the manuscript: Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks
- AlphaFold2 predicted structures
- Full sequence lists
- Tables of metrics
- Tables of experimental results
- Phylogenetic trees
- Raw experimental data
If you use these data, please cite the associated paper:
Johnson, Sean R., Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, and Kevin K. Yang. “Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks.” bioRxiv, March 4, 2023. https://doi.org/10.1101/2023.03.04.531015.
The chorismate mutase (CM) and lysozyme (PF00959, PF01832, PF05838, PF06737, and PF16754) sequences originate from previously published works. If you use them, please cite the appropriate works:
Chorismate mutase:
Russ, William P., Matteo Figliuzzi, Christian Stocker, Pierre Barrat-Charlaix, Michael Socolich, Peter Kast, Donald Hilvert, et al. “An Evolution-Based Model for Designing Chorismate Mutase Enzymes.” Science 369, no. 6502 (July 24, 2020): 440–45. https://doi.org/10.1126/science.aba3304.
Lysozyme:
Madani, Ali, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos, et al. “Large Language Models Generate Functional Protein Sequences across Diverse Families.” Nature Biotechnology, January 26, 2023, 1–8. https://doi.org/10.1038/s41587-022-01618-2.
Also note that we did not calculate alignment-based metrics for the natural sequences in these external datasets, because the datasets do not clearly distinguish between natural "training" and natural "test" sequences.
Version 2.0.0 (Jan 2024):
Updates to the following files:
- experimental_raw_data.zip
- Experimental_results_tabulation.xlsx
- experimentally_tested_metrics.csv
We reran the assays in a more quantitative manner. As a result, some enzymes have changed category relative to previous versions, from "inactive" to "active" or vice versa.
Files
(610.7 MB)
MD5 checksum | Size |
---|---|
7c7b2609b4ff6dfde40abb32995a1b32 | 170.9 MB |
9d6b1ad95bdffe9feba07b07dec72798 | 1.0 MB |
827a20051d2d9beba344080e2f674113 | 717 Bytes |
5b7a1cad757995a612f762517abdfde6 | 7.2 kB |
7421f6b028757cda6de84aeae14989f2 | 190.6 kB |
599a0e62febdc46a768059c4db7160c1 | 207.1 kB |
6ae5c366180ba65ad0a46c97e99192b7 | 360.8 kB |
39ac29a927dcd3de9d4fcdd34a11b074 | 79 Bytes |
de5a510fdcb71640d7d9037548c1718d | 272.0 MB |
8f01c77c1f525eec3f780da282bfbec2 | 80.1 kB |
1137b83980715980af2bac8ef85867b2 | 2.0 MB |
6bb147a96117344c07ed71b02495c4dd | 7.8 kB |
04b81bb04efa1c97231d321cea8c2a1e | 8.1 kB |
73844ae110c733f9328c2cfee7e27fa7 | 95.0 MB |
007f1f3808582f21abefb1bf7f4361d5 | 68.9 MB |
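
The MD5 checksums listed in the file table can be used to confirm that a download completed without corruption. The following is a minimal sketch using only the Python standard library. The function names `md5_of_file` and `verify_download` are illustrative and are not part of the dataset or the associated paper; the local file path and the checksum to compare against must be supplied by the reader from the row that matches the downloaded file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest matches the listed checksum.

    The expected value may be given with or without a leading "md5:"
    prefix; case and surrounding whitespace are ignored.
    """
    expected = expected_md5.strip().lower().split(":", 1)[-1]
    return md5_of_file(path) == expected
```

For example, calling `verify_download` with the path of a downloaded archive and the checksum from its row returns `True` when the file is intact. A `False` result suggests the download should be repeated.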
Additional details
Related works
- Is supplement to
- Preprint: 10.1101/2023.03.04.531015 (DOI)