Published September 14, 2020 | Version v1
Dataset
Open
Data for "Learning the language of viral evolution and escape"
Description
Training data from:
- Influenza A HA protein sequences from the NIAID Influenza Research Database (IRD) (http://www.fludb.org)
- HIV-1 Env protein sequences from the Los Alamos National Laboratory (LANL) HIV database (https://www.hiv.lanl.gov)
- Coronaviridae spike protein sequences from the Virus Pathogen Resource (ViPR) database (https://www.viprbrc.org/brc/home.spg?decorator=corona)
- SARS-CoV-2 Spike protein sequences from NCBI Virus (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/)
- SARS-CoV-2 Spike and other Betacoronavirus spike protein sequences from GISAID (https://www.gisaid.org/)
Datasets for fitness and escape validation:
- Fitness single-residue DMS of HA H1 WSN33 from Doud and Bloom (2016)
- Fitness combinatorial DMS of antigenic site B in six HA H3 strains from Wu et al. (2020)
- Fitness single-residue DMS of Env BF520 and BG505 from Haddox et al. (2018)
- ACE2 binding affinity combinatorial DMS of Spike from Starr et al. (2020)
- Escape single-residue DMS of HA H1 WSN33 from Doud et al. (2018)
- Escape single-residue DMS of HA H3 Perth09 from Lee et al. (2019)
- Escape single-residue DMS of Env BG505 from Dingens et al. (2019)
- Escape mutations of Spike from Baum et al. (2020)
- Escape single-residue DMS of Spike from Greaney et al. (2020)
Files (93.3 MB)
Name | Size |
---|---|
md5:c11f2718094e36b06f1e400e2dfff946 | 93.3 MB |
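The MD5 checksum listed for the file can be used to confirm that a download completed without corruption. The following is a minimal sketch in Python, not part of the original record: the function names `md5_of_file` and `verify_download` are illustrative, and the local path passed in is an assumption, since the record listing above does not show the archive's file name.

```python
import hashlib

# Checksum copied from the file listing in this record.
EXPECTED_MD5 = "c11f2718094e36b06f1e400e2dfff946"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so the ~93 MB archive is never held in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` matches the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()
```

A call such as `verify_download("path/to/downloaded_file")` (a hypothetical path) returns `True` only when the local copy matches the published checksum; a `False` result suggests the file should be downloaded again.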