Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks

doi:10.5281/zenodo.10594384

Published January 30, 2024 | Version 2.0.0

Journal article | Open access

1. New England Biolabs, 240 County Road, Ipswich, MA 01938, United States
2. Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden
3. Invitae, 1400 16th St, San Francisco, CA 94103, United States
4. Department of Biology and Biological Engineering, Chalmers University of Technology, Kemivägen 10, SE41296, Gothenburg, Sweden; Institute of Biotechnology, Life Sciences Centre, Vilnius University, Sauletekio al. 7, LT10257 Vilnius, Lithuania; Randall Centre for Cell & Molecular Biophysics, King's College London, New Hunt's House, Guy's Campus, SE1 1UL London, UK
5. Microsoft Research New England, 1 Memorial Drive, Cambridge, MA, 02142, United States

Large supplementary data files to accompany the manuscript: Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks

AlphaFold2 predicted structures
Full sequence lists
Tables of metrics
Tables of experimental results
Phylogenetic Trees
Raw experimental data

If you use these data, please cite the associated paper:

Johnson, Sean R., Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, and Kevin K. Yang. “Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks.” bioRxiv, March 4, 2023. https://doi.org/10.1101/2023.03.04.531015.

The chorismate mutase (CM) and lysozyme (PF00959, PF01832, PF05838, PF06737, and PF16754) sequences originate from previously published works. If you use them, please cite the appropriate works:
Chorismate mutase:
Russ, William P., Matteo Figliuzzi, Christian Stocker, Pierre Barrat-Charlaix, Michael Socolich, Peter Kast, Donald Hilvert, et al. “An Evolution-Based Model for Designing Chorismate Mutase Enzymes.” Science 369, no. 6502 (July 24, 2020): 440–45. https://doi.org/10.1126/science.aba3304.
Lysozyme:
Madani, Ali, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos, et al. “Large Language Models Generate Functional Protein Sequences across Diverse Families.” Nature Biotechnology, January 26, 2023, 1–8. https://doi.org/10.1038/s41587-022-01618-2.

Note that we did not calculate alignment-based metrics for the natural sequences in these external datasets, because the datasets do not clearly distinguish natural "training" sequences from natural "test" sequences.

Version 2.0.0 (Jan 2024):

Updates to the following files:

experimental_raw_data.zip
Experimental_results_tabulation.xlsx
experimentally_tested_metrics.csv

We reran the assays in a more quantitative manner. As a result, some enzymes changed category relative to previous versions, from "inactive" to "active" or vice versa.

Files (610.7 MB)

Name	MD5	Size
AlphaFold2_structures.zip	7c7b2609b4ff6dfde40abb32995a1b32	170.9 MB
CM_table.csv	9d6b1ad95bdffe9feba07b07dec72798	1.0 MB
CuSOD_round2_pre-test.newick	827a20051d2d9beba344080e2f674113	717 Bytes
CuSOD_round2_round3.newick	5b7a1cad757995a612f762517abdfde6	7.2 kB
experimental_raw_data.zip	7421f6b028757cda6de84aeae14989f2	190.6 kB
Experimental_results_tabulation.xlsx	599a0e62febdc46a768059c4db7160c1	207.1 kB
experimentally_tested_metrics.csv	6ae5c366180ba65ad0a46c97e99192b7	360.8 kB
FeSOD_round2_pre-test.newick	39ac29a927dcd3de9d4fcdd34a11b074	79 Bytes
generated_metrics_table.csv	de5a510fdcb71640d7d9037548c1718d	272.0 MB
lysozyme_table.csv	8f01c77c1f525eec3f780da282bfbec2	80.1 kB
lysozyme_training.csv	1137b83980715980af2bac8ef85867b2	2.0 MB
MDH_round2_round3.newick	6bb147a96117344c07ed71b02495c4dd	7.8 kB
Round2_pre-test_SOD_results.tsv	04b81bb04efa1c97231d321cea8c2a1e	8.1 kB
round3_selection_table.csv	73844ae110c733f9328c2cfee7e27fa7	95.0 MB
sequences.zip	007f1f3808582f21abefb1bf7f4361d5	68.9 MB
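
Because several of these files are large, it can be worth checking a download against the MD5 checksums listed above before using it. The following is a minimal sketch, not part of the original record. It assumes the files were saved under their listed names in a single local directory. The helper names `md5_of_file` and `verify_downloads` are hypothetical, and only three of the listed checksums are included for illustration.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing (a subset, for illustration).
EXPECTED_MD5 = {
    "CM_table.csv": "9d6b1ad95bdffe9feba07b07dec72798",
    "lysozyme_table.csv": "8f01c77c1f525eec3f780da282bfbec2",
    "experimentally_tested_metrics.csv": "6ae5c366180ba65ad0a46c97e99192b7",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to 'ok', 'mismatch', or 'missing'."""
    results = {}
    for name, checksum in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == checksum:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the downloaded files are in the current working directory.
    for name, status in verify_downloads(".").items():
        print(f"{name}: {status}")
```

A "mismatch" most likely indicates an incomplete or corrupted download, or a file from a different version of the record, since version 2.0.0 replaced some files.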

Additional details

Is supplement to: Preprint: 10.1101/2023.03.04.531015 (DOI)

	All versions	This version
Views	1,009	162
Downloads	1,076	282
Data volume	54.9 GB	11.8 GB
