There is a newer version of the record available.

Published January 30, 2024 | Version 2.0.0
Journal article Open

Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks

  • 1. New England Biolabs, 240 County Road, Ipswich, MA 01938, United States
  • 2. Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden
  • 3. Invitae, 1400 16th St, San Francisco, CA 94103, United States
  • 4. Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden; Institute of Biotechnology, Life Sciences Centre, Vilnius University, Sauletekio al. 7, LT10257 Vilnius, Lithuania; Randall Centre for Cell & Molecular Biophysics, King's College London, New Hunt's House, Guy's Campus, SE1 1UL London, UK
  • 5. Microsoft Research New England, 1 Memorial Drive, Cambridge, MA, 02142, United States

Description

Large supplementary data files to accompany the manuscript: Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks

  • AlphaFold2 predicted structures
  • Full sequence lists
  • Tables of metrics
  • Tables of experimental results
  • Phylogenetic trees
  • Raw experimental data


If you use these data, please cite the associated paper:

Johnson, Sean R., Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, and Kevin K. Yang. “Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks.” bioRxiv, March 4, 2023. https://doi.org/10.1101/2023.03.04.531015.


The chorismate mutase (CM) and lysozyme (PF00959, PF01832, PF05838, PF06737, and PF16754) sequences originate from previously published works. If you use them, please cite the appropriate works:
Chorismate mutase:
Russ, William P., Matteo Figliuzzi, Christian Stocker, Pierre Barrat-Charlaix, Michael Socolich, Peter Kast, Donald Hilvert, et al. “An Evolution-Based Model for Designing Chorismate Mutase Enzymes.” Science 369, no. 6502 (July 24, 2020): 440–45. https://doi.org/10.1126/science.aba3304.
Lysozyme:
Madani, Ali, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos, et al. “Large Language Models Generate Functional Protein Sequences across Diverse Families.” Nature Biotechnology, January 26, 2023, 1–8. https://doi.org/10.1038/s41587-022-01618-2.

Also note that we did not calculate alignment-based metrics for the natural sequences in these external datasets, because the datasets do not clearly distinguish natural "training" sequences from natural "test" sequences.


Version 2.0.0 (Jan 2024):

Updates to the following files:

experimental_raw_data.zip
Experimental_results_tabulation.xlsx
experimentally_tested_metrics.csv

We reran the assays in a more quantitative manner. As a result, some enzymes have changed category relative to previous versions, from "inactive" to "active" or vice versa.

Files

AlphaFold2_structures.zip

Files (610.7 MB)

Checksum  Size
md5:7c7b2609b4ff6dfde40abb32995a1b32  170.9 MB
md5:9d6b1ad95bdffe9feba07b07dec72798  1.0 MB
md5:827a20051d2d9beba344080e2f674113  717 Bytes
md5:5b7a1cad757995a612f762517abdfde6  7.2 kB
md5:7421f6b028757cda6de84aeae14989f2  190.6 kB
md5:599a0e62febdc46a768059c4db7160c1  207.1 kB
md5:6ae5c366180ba65ad0a46c97e99192b7  360.8 kB
md5:39ac29a927dcd3de9d4fcdd34a11b074  79 Bytes
md5:de5a510fdcb71640d7d9037548c1718d  272.0 MB
md5:8f01c77c1f525eec3f780da282bfbec2  80.1 kB
md5:1137b83980715980af2bac8ef85867b2  2.0 MB
md5:6bb147a96117344c07ed71b02495c4dd  7.8 kB
md5:04b81bb04efa1c97231d321cea8c2a1e  8.1 kB
md5:73844ae110c733f9328c2cfee7e27fa7  95.0 MB
md5:007f1f3808582f21abefb1bf7f4361d5  68.9 MB
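
Because several of the files are large, it can be worth checking a download against the MD5 checksum listed for it. The following is a minimal sketch of one way to do that in Python using only the standard library. The function names `md5_of_file` and `matches_listed_checksum` are illustrative and not part of the record, and the listing above does not say which checksum belongs to which file, so pair each checksum with its file from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that large archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a local file against a checksum written as 'md5:<hex>' or '<hex>'.

    `listed` is assumed to be copied from the file listing for that file.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum(local_path, "md5:...")` returns `True` only if the local copy hashes to the listed value. A `False` result usually points to an incomplete or corrupted download, or to a checksum paired with the wrong file.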

Additional details

Related works

Is supplement to
Preprint: 10.1101/2023.03.04.531015 (DOI)