SARS-CoV-2 Surface Glycoprotein Sequences, NCBI Data Hub, October 2021 (ViralEntropR archive)
Authors/Creators
Description
Archive of SARS-CoV-2 surface glycoprotein (Spike protein) amino acid sequences downloaded from the NCBI SARS-CoV-2 Data Hub on October 12, 2021.
Downloaded with the following filters:
- Organism: Severe acute respiratory syndrome coronavirus 2 (taxid: 2697049)
- Nucleotide completeness: complete
- Protein: surface glycoprotein
The download yielded 137,132 sequences (173 MB of uncompressed FASTA).
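A downloaded copy can be sanity-checked against the stated record count. The following is a minimal Python sketch, not part of ViralEntropR: the function name `fasta_summary` and the file name in the closing comment are hypothetical, and the sample sequences are made up for illustration rather than taken from the archive.

```python
import io
from typing import Iterable, Tuple

# Record count stated in this description.
EXPECTED_RECORDS = 137_132


def fasta_summary(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of records, total residues) for an iterable of FASTA lines."""
    records = 0
    residues = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records += 1  # each header line starts one record
        else:
            residues += len(line)  # sequence lines may wrap across several lines
    return records, residues


# Tiny in-memory example (made-up sequences).
sample = io.StringIO(">seq1 example\nMFVFLVLLPL\nVSSQ\n>seq2 example\nMFVFL\n")
n_records, n_residues = fasta_summary(sample)
print(n_records, n_residues)

# For the real archive (hypothetical file name), one might run:
# with open("spike_sequences.fasta", encoding="utf-8") as handle:
#     n_records, _ = fasta_summary(handle)
#     print(n_records == EXPECTED_RECORDS)
```

Reading line by line keeps memory use flat, which matters for a FASTA file of this size.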
Original data source: NCBI Virus (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/). Original data is a US Government work and is in the public domain within the United States. Data from international contributors is subject to the INSDC open-access policy (https://www.insdc.org/about-insdc/).
Archived as a static snapshot for reproducibility of analyses in the ViralEntropR R package.
Cited as:
- Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2016). GenBank. Nucleic Acids Research. 44(D1):D67-D72. doi:10.1093/nar/gkv1276
- Sayers EW, Bolton EE, Brister JR, et al. (2022). Database resources of the National Center for Biotechnology Information. Nucleic Acids Research. 50(D1):D20-D26. doi:10.1093/nar/gkab1112
- NCBI Virus [Internet]. Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information; [2020] - [cited 2021 Oct 12]. Available from: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/
Data source and licensing:
Sequence data downloaded from NCBI Virus (National Center for Biotechnology Information, U.S. National Library of Medicine) on October 12, 2021.
Per NCBI Website and Data Usage Policies (https://www.ncbi.nlm.nih.gov/home/about/policies/):
"NCBI itself places no restrictions on the use or distribution of the data contained therein."
Data use confirmed with NCBI Help Desk, Case #CAS-1470196-D4S2Z8, May 2025:
"You may use the sequence data for scientific and educational purposes."
Note: Some submitted sequences may be subject to patent, copyright, or other intellectual property rights claimed by original submitters or their country of origin.
The compilation, curation, and packaging of this archive by the ViralEntropR authors are released under CC0 1.0 Universal.
Files
| Checksum | Size |
|---|---|
| md5:4e9f5ca1b8a0f99c15a7ad55e9ccb25b | 181.5 MB |
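The MD5 checksum listed for the file can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib`; the helper names `md5_of_stream` and `matches_expected` and the file name in the closing comment are hypothetical and introduced only for illustration.

```python
import hashlib
import io
from typing import BinaryIO

# Checksum listed for the archived file on this record.
EXPECTED_MD5 = "4e9f5ca1b8a0f99c15a7ad55e9ccb25b"


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a binary stream, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_expected(stream: BinaryIO, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the stream's MD5 digest equals the expected checksum."""
    return md5_of_stream(stream) == expected.lower()


# In-memory demonstration: arbitrary bytes do not match the archive checksum.
print(matches_expected(io.BytesIO(b"not the archive")))

# For the real download (hypothetical file name), one might run:
# with open("spike_sequences.fasta", "rb") as handle:
#     print(matches_expected(handle))
```

Chunked reading avoids loading the whole 181.5 MB file into memory at once. MD5 here serves only as an integrity check against accidental corruption, not as a security guarantee.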
Additional details
Related works
- Is derived from
- Dataset: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/ (URL)
- Is supplement to
- Software: https://github.com/vadimtyuryaev/ViralEntropR (URL)
Software
- Repository URL
- https://github.com/vadimtyuryaev/ViralEntropR
- Programming language
- R
- Development Status
- Active