
There is a newer version of the record available.

Published February 26, 2024 | Version v1
Publication | Open access

Design of a generalized platform for gathering protein sequence → function datasets at scale

  • 1. Align to Innovate
  • 2. Harvard Medical School
  • 3. The Francis Crick Institute
  • 4. National Institute of Standards and Technology
  • 5. Universität Greifswald
  • 6. Technische Universität Berlin
  • 7. 056ncc580

  • 1. Medium Biosciences
  • 2. Harvard Medical School
  • 3. University of Illinois Urbana-Champaign
  • 4. University of California, Berkeley
  • 5. Wellcome Sanger Institute
  • 6. Microsoft (United States)
  • 7. University of Saskatchewan

Description

This article proposes a high-throughput experimental platform for collecting large-scale protein function datasets. The platform uses a pooled, growth-based assay to measure protein function quantitatively, analyzing up to 500,000 protein variants per experiment at approximately $0.05 per sequence. It is designed to be adaptable to a wide variety of protein functions; adapting it to a new function relies on validating the relevant gene circuits and establishing calibration variants. The process involves creating barcoded libraries of protein variants, transforming them into bacteria, growing the cells under selective conditions, and sequencing the barcodes to quantify differential growth rates. After an embargo period, the collected data will be released as an open dataset to support the development of machine learning models that predict protein function from DNA sequence. By standardizing data collection across labs and protein families, the platform aims to contribute to a generalizable predictive model of protein function, which could significantly advance the field of biology.
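The description does not spell out how barcode counts become growth rates or function scores. The following is a minimal sketch, not the platform's published analysis. It assumes exponential growth between two sequencing time points and a designated reference barcode. Under those assumptions the growth rate of variant *i* relative to the reference is (1/Δt) · ln[(n_i(t1)/n_ref(t1)) / (n_i(t0)/n_ref(t0))]. The function names `relative_growth_rates` and `scale_to_calibrants`, the pseudocount, and the two-point linear calibration against a hypothetical "null" and "active" calibration variant are all illustrative assumptions.

```python
import math
from typing import Dict, Mapping


def relative_growth_rates(
    counts_t0: Mapping[str, int],
    counts_t1: Mapping[str, int],
    reference: str,
    elapsed: float,
    pseudocount: float = 0.5,
) -> Dict[str, float]:
    """Growth rate of each barcode relative to a reference barcode (hypothetical helper).

    Assumes exponential growth, so ln(n_i / n_ref) changes linearly in time
    with slope (r_i - r_ref).
    """
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    # Pseudocounts keep barcodes that drop to zero reads at t1 finite.
    ref0 = counts_t0[reference] + pseudocount
    ref1 = counts_t1[reference] + pseudocount
    rates: Dict[str, float] = {}
    for barcode, n0 in counts_t0.items():
        n1 = counts_t1.get(barcode, 0)  # barcode absent at t1 -> zero reads
        log_ratio_t0 = math.log((n0 + pseudocount) / ref0)
        log_ratio_t1 = math.log((n1 + pseudocount) / ref1)
        rates[barcode] = (log_ratio_t1 - log_ratio_t0) / elapsed
    return rates


def scale_to_calibrants(
    rates: Mapping[str, float], null_barcode: str, active_barcode: str
) -> Dict[str, float]:
    """Linearly rescale rates so the null calibrant scores 0 and the active one 1 (assumed scheme)."""
    low = rates[null_barcode]
    high = rates[active_barcode]
    if high == low:
        raise ValueError("calibration variants must have different growth rates")
    return {bc: (r - low) / (high - low) for bc, r in rates.items()}


if __name__ == "__main__":
    # Toy counts: "wt" grows like the reference, "dead" does not grow at all.
    t0 = {"ref": 1000, "wt": 1000, "dead": 1000, "v1": 1000}
    t1 = {"ref": 4000, "wt": 4000, "dead": 1000, "v1": 2000}
    rates = relative_growth_rates(t0, t1, reference="ref", elapsed=2.0, pseudocount=0.0)
    scaled = scale_to_calibrants(rates, null_barcode="dead", active_barcode="wt")
    print(round(scaled["v1"], 3))  # prints 0.5
```

Using a reference ratio rather than raw frequencies cancels out differences in total read depth between the two sequencing runs. The pseudocount is a common but arbitrary choice for handling dropouts. For scale, a full 500,000-variant run at about $0.05 per sequence comes to roughly $25,000.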

Files (6.2 MB)

20240806_SequenceToFunction_Main.pdf (6.2 MB, md5:34aba387718ea820be43a016142d2cf0)

Additional details

Related works

Is continued by
Publication: 10.5281/zenodo.12819116 (DOI)
Publication: 10.5281/zenodo.12819109 (DOI)