Published March 20, 2024 | Version v1
Dataset Open

SSProteinFitnessPrediction

Description

This repository compiles 19 datasets of protein-fitness pairs: 17 datasets of single-substituted variants, 1 dataset of double-substituted variants, and 1 dataset of multiple-substituted variants. See README.pdf for more information.

Files

README.pdf

Files (6.9 GB)

Checksum Size
md5:97278c4ddc63e54b033e19cf31141dcf 824.7 MB
md5:52dc4d64fd755b254b1f139771c1bf0f 590.9 MB
md5:2f2a70e190b398d1e7dece630eb5cfa4 197.2 MB
md5:278385132624823495744a66f112790b 200.4 MB
md5:815a2d5f6edef00c5b78d4ce276ec2b4 200.5 MB
md5:2d87a6d740e8fb1960e1c2af0f22fc87 99.4 MB
md5:ebc3ea80decd8e979247c8044d003c88 77.9 MB
md5:3186ebb2296109796e105f513f4686a1 37.2 MB
md5:6d1ed6b090886df891b12bfe93420cff 87.1 MB
md5:1fc983b2a7da8f442547d560f07beb8b 652.4 MB
md5:f8bd7db1ee63c82f75fa603eb7c6d953 220.1 MB
md5:d4c87596ba445f210f7cbc70242e881e 260.5 MB
md5:f19c5c976f368c71c428449ec76d32ef 608.5 MB
md5:80e7a830c2110082fafd87601bdc726b 1.4 GB
md5:1f888895204214fc7c9be69c60dad5a3 48.1 MB
md5:cafd6fbe83c55fc3e5091257d6df120b 20.5 MB
md5:6e30b0d9f527503264db26aacd42e7be 317.1 MB
md5:e2504269af6b8f1770dcf7e9544eae87 85.4 MB
md5:2db01fbf1da620dd06429801eacb26d3 48.8 MB
md5:f3e5a48cee91762091d1e82bfba96180 30.1 MB
md5:52986507e0af94deacf59d831931a78e 194.2 MB
md5:585422acc013ea5a6dd909ab5b52449e 62.9 MB
md5:67a48d9b1aa7f19e56bfceb5e0c5808c 186.3 MB
md5:c18f6ecdedea05f46db3ea89dc384c3d 155.4 MB
md5:93b075fa37484be4934e07301b8831c4 4.3 MB
md5:6c0752215ae5bf5ec8733132b0212c7d 18.8 MB
md5:306e72ddb88c555cf9d077b6c5453187 28.6 MB
md5:17e885c383f04afacb5144648fc155e6 14.4 MB
md5:83fe18fb31e6a62af911da17be3438ed 145.4 kB
md5:5eef2c4484d3272d024b4cd57f79bd17 78.0 MB
md5:b0db2dd48bf55e240f1c63156b9806cf 81.0 MB
md5:6ee9a1ec123da9758a24a3aa738f6a97 74.5 MB
md5:ad76737b7a80f25c74a38b7bd176b81f 56.9 MB
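
Each file in the listing carries an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch of such a check in Python, using only the standard library. The helper names `md5_of_file` and `matches_listing` and the file name in the usage note are hypothetical and do not come from this record; only the `md5:` prefix format is taken from the listing.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Read 1 MiB at a time so multi-hundred-MB files are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a local file against a checksum written as 'md5:<hex>' or '<hex>'."""
    # Drop an optional 'md5:' prefix, as used in the file listing (assumed format).
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listing("downloaded_file.bin", "md5:97278c4ddc63e54b033e19cf31141dcf")` would return `True` only if the local file (the name here is a placeholder) has that digest. MD5 is suitable for detecting accidental corruption, not for guarding against deliberate tampering.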

Additional details

References

  • Ethan C Alley, Grigory Khimulya, Surojit Biswas, Mohammed AlQuraishi, and George M Church. Unified rational protein engineering with sequence-based deep representation learning. Nature methods, 16(12):1315–1322, 2019.
  • Carlos L Araya, Douglas M Fowler, Wentao Chen, Ike Muniez, Jeffery W Kelly, and Stanley Fields. A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proceedings of the National Academy of Sciences, 109(42):16858–16863, 2012.
  • Surojit Biswas, Grigory Khimulya, Ethan C Alley, Kevin M Esvelt, and George M Church. Low-N protein engineering with data-efficient deep learning. Nature methods, 18(4):389–396, 2021.
  • M Dayhoff, R Schwartz, and B Orcutt. A model of evolutionary change in proteins. In Atlas of protein sequence and structure, volume 5, pages 345–352. National Biomedical Research Foundation, Silver Spring, MD, USA, 1978.
  • Zhifeng Deng, Wanzhi Huang, Erol Bakkalbasi, Nicholas G Brown, Carolyn J Adamski, Kacie Rice, Donna Muzny, Richard A Gibbs, and Timothy Palzkill. Deep sequencing of systematic combinatorial libraries reveals β-lactamase sequence constraints at high resolution. Journal of molecular biology, 424(3-4):150–167, 2012.
  • Michael B Doud and Jesse D Bloom. Accurate measurement of the effects of all amino-acid mutations on influenza hemagglutinin. Viruses, 8(6):155, 2016.
  • Elad Firnberg, Jason W Labonte, Jeffrey J Gray, and Marc Ostermeier. A comprehensive, high-resolution map of a gene's fitness landscape. Molecular biology and evolution, 31(6):1581–1592, 2014.
  • Alexander-Maurice Illig, Niklas E Siedhoff, Ulrich Schwaneberg, and Mehdi D Davari. A hybrid model combining evolutionary probability and machine learning leverages data-driven protein engineering, 2022. Preprint at https://www.biorxiv.org/content/early/2022/06/07/2022.06.07.495081.
  • Hervé Jacquier, André Birgy, Hervé Le Nagard, Yves Mechulam, Emmanuelle Schmitt, Jérémy Glodt, Beatrice Bercot, Emmanuelle Petit, Julie Poulain, Guilène Barnaud, et al. Capturing the mutational landscape of the beta-lactamase TEM-1. Proceedings of the National Academy of Sciences, 110(32):13067–13072, 2013.
  • Jacob O Kitzman, Lea M Starita, Russell S Lo, Stanley Fields, and Jay Shendure. Massively parallel single-amino-acid mutagenesis. Nature methods, 12(3):203–206, 2015.
  • Daniel Melamed, David L Young, Caitlin E Gamble, Christina R Miller, and Stanley Fields. Deep mutational scanning of an RRM domain of the Saccharomyces cerevisiae poly(A)-binding protein. RNA, 19(11):1537–1551, 2013.
  • Parul Mishra, Julia M Flynn, Tyler N Starr, and Daniel NA Bolon. Systematic mutant analyses elucidate general and client-specific aspects of Hsp90 function. Cell reports, 15(3):588–598, 2016.
  • Hangfei Qi, C Anders Olson, Nicholas C Wu, Ruian Ke, Claude Loverdo, Virginia Chu, Shawna Truong, Roland Remenyi, Zugen Chen, Yushen Du, Sheng-Yao Su, Laith Q Al-Mawsawi, Ting-Ting Wu, Shu-Hua Chen, Chung-Yen Lin, Weidong Zhong, James O Lloyd-Smith, and Ren Sun. A quantitative high-resolution genetic profile rapidly identifies sequence determinants of hepatitis C viral fitness and drug sensitivity. PLoS pathogens, 10(4):e1004064, 2014.
  • Liat Rockah-Shmuel, Ágnes Tóth-Petróczy, and Dan S Tawfik. Systematic mapping of protein mutational space by prolonged drift reveals the deleterious effects of seemingly neutral mutations. PLoS computational biology, 11(8):e1004421, 2015.
  • Philip A Romero, Tuan M Tran, and Adam R Abate. Dissecting enzyme function with microfluidic-based deep mutational scanning. Proceedings of the National Academy of Sciences, 112(23):7159–7164, 2015.
  • Benjamin P Roscoe and Daniel NA Bolon. Systematic exploration of ubiquitin sequence, E1 activation efficiency, and experimental fitness in yeast. Journal of molecular biology, 426(15):2854–2870, 2014.
  • Benjamin P Roscoe, Kelly M Thayer, Konstantin B Zeldovich, David Fushman, and Daniel NA Bolon. Analyses of the effects of all ubiquitin point mutants on yeast growth rate. Journal of molecular biology, 425(8):1363–1377, 2013.
  • Karen S Sarkisyan, Dmitry A Bolotin, Margarita V Meer, Dinara R Usmanova, Alexander S Mishin, George V Sharonov, Dmitry N Ivankov, Nina G Bozhanova, Mikhail S Baranov, Onuralp Soylemez, et al. Local fitness landscape of the green fluorescent protein. Nature, 533(7603):397–401, 2016.
  • Lea M Starita, Jonathan N Pruneda, Russell S Lo, Douglas M Fowler, Helen J Kim, Joseph B Hiatt, Jay Shendure, Peter S Brzovic, Stanley Fields, and Rachel E Klevit. Activity-enhancing mutations in an E3 ubiquitin ligase identified by high-throughput mutagenesis. Proceedings of the National Academy of Sciences, 110(14):E1263–E1272, 2013.
  • Lea M Starita, David L Young, Muhtadi Islam, Jacob O Kitzman, Justin Gullingsrud, Ronald J Hause, Douglas M Fowler, Jeffrey D Parvin, Jay Shendure, and Stanley Fields. Massively parallel functional analysis of BRCA1 RING domain variants. Genetics, 200(2):413–422, 2015.
  • Michael A Stiffler, Doeke R Hekstra, and Rama Ranganathan. Evolvability as a function of purifying selection in TEM-1 β-lactamase. Cell, 160(5):882–892, 2015.