Published June 28, 2024 | Version v1
Dataset Open

Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning

  • 1. University of Basel
  • 2. ETH Zurich

Description

This entry contains data, pretrained models and supplementary files for our enzyme engineering study:

Title: Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning
Journal: ACS Central Science

If you use any of the data or code in the repository, please cite the paper.

The contents of this entry are:

  1. Sequence embeddings needed to run the code, in data.zip. Unzip the contents of the archive to /data in the code structure.
  2. NGS analysis and raw data, in NGS analysis.zip.
  3. A 10% subset of the structures generated with the Rosetta software, in structures.zip.
  4. Pretrained models for plotting, clustering and further prediction, in models.zip.
  5. Raw assay data, including the libraries we designed by active learning, in assay_and_ML_data.zip.
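The unzip step in item 1 can be sketched in Python roughly as follows. This is a minimal sketch, not part of the published code: the helper name extract_archive, the download location of data.zip and the local clone directory ml-protein-design-sav-gold are assumptions, and "/data" is taken to mean a data directory at the root of the code repository. The internal layout of the archive is not described here, so check the extracted tree before running the code.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target directory if missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    # Assumed locations: adjust to where data.zip was downloaded and the code was cloned.
    archive_path = Path("data.zip")
    repo_data_dir = Path("ml-protein-design-sav-gold") / "data"
    if archive_path.exists():
        names = extract_archive(archive_path, repo_data_dir)
        print(f"Extracted {len(names)} entries into {repo_data_dir}")
    else:
        print(f"{archive_path} not found; download it from this entry first")
```

If the archive already contains a top-level data folder, extracting into the repository root instead may give the intended layout.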

Files

Files (7.1 GB)

Checksum                                 Size
md5:aef5cba9b0abf3f5fc5353bd5c9ca565     365.5 kB
md5:d1d13f569f1e1e652b7b3abc10adf60c     1.7 GB
md5:0ce73b5b2a2e0fac27224ece2679850d     598.8 kB
md5:9fe5c705f194144fdcf6fcbe88816bef     352.6 MB
md5:396ebd69dc5f8e352a0d00354a178a20     5.0 GB
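
The MD5 checksums listed above can be used to check that a download completed without corruption. The following is a minimal sketch using only the Python standard library. The names md5sum, matches_listing and LISTED_MD5 are hypothetical. Because the listing does not show which checksum belongs to which archive, the sketch only tests whether a file's digest appears among the listed values.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this entry. The mapping from
# digest to archive name is not shown there, so they are kept as a plain set.
LISTED_MD5 = {
    "aef5cba9b0abf3f5fc5353bd5c9ca565",
    "d1d13f569f1e1e652b7b3abc10adf60c",
    "0ce73b5b2a2e0fac27224ece2679850d",
    "9fe5c705f194144fdcf6fcbe88816bef",
    "396ebd69dc5f8e352a0d00354a178a20",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, known=LISTED_MD5):
    """Return True if the file's MD5 digest is one of the known digests."""
    return md5sum(path) in known


if __name__ == "__main__":
    # Assumes the archives were downloaded into the current directory.
    for archive in sorted(Path(".").glob("*.zip")):
        status = "ok" if matches_listing(archive) else "no matching checksum"
        print(f"{archive.name}: {status}")
```

Reading in chunks matters here because the largest archive is 5.0 GB and should not be loaded into memory at once.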

Additional details

Related works

Is supplement to
Publication: 10.1021/acscentsci.4c00258 (DOI)

Funding

Swiss National Science Foundation
NCCR Catalysis (phase I) 180544
Swiss National Science Foundation
NCCR Molecular Systems Engineering 200021_178760

Software

Repository URL
https://github.com/lasgroup/ml-protein-design-sav-gold
Programming language
Python
Development Status
Concept