Phenotyping uncharacterized microbial genes with a large-scale bacterial functional genomics dataset
Description
We propose generating a large-scale phenomics atlas of genes from diverse microbes. In partnership with Pioneer Labs, we envision the “Tesseract”, a dataset of 5×10^9 fitness measurements of 5 million phylogenetically diverse gene sequences characterized across 50 host strains in 100 conditions each, yielding a “phenotypic fingerprint” for each gene. In this proposal, we describe a three-phase strategy to de-risk key technologies and scale up data-collection capacity. With the proposed advances in cost-effective data collection and closed-loop experimental design, we project that collecting this dataset would cost approximately $700k. This corpus would be an asset to the field, providing the large-scale data needed to train AI models that predict protein function, predict gene-transfer success, and functionally annotate a broad diversity of microbial genomes.
Files
Phenotyping uncharacterized microbial genes with a large-scale bacterial functional genomics dataset.pdf (3.2 MB)
md5:fc0e4a55c971ca7b26c25e60c7e7db20