Published December 14, 2024 | Version v3
Dataset Open

Dataset for: Pre-pandemic artificial MERS analog of polyfunctional SARS-CoV-2 S1/S2 furin cleavage site domain is unique among spike proteins of genus Betacoronavirus

  • 1. Constructor University

Contributors

  • 1. Constructor University

Description

 

Data File Descriptions and Methods

  1. Data File 1 [betacov_matching_IPR042578.fasta]: Representative set of 2,465 betacoronavirus S protein overlapping homologous superfamily sequences, retrieved in FASTA format on 4 December 2022 from the InterPro repository at https://www.ebi.ac.uk/interpro/entry/InterPro/IPR042578/.

  2. Data File 2 [betacov_matching_IPR042578_motif.fasta]: 98,122 furin cleavage site (FCS) motifs of 20 amino acids in length, including overlapping and redundant sequences, extracted from Data File 1 with the FindFur algorithm using preset parameters as described by Gu (2020). The FindFur version used was deposited on 15 December 2020 in the GitHub software repository at https://github.com/chwisteeng/FindFur.

  3. Data File 3 [table_s1s2_hits_betacov_polyf.pdf]: Compiled summary table of sequence hits (PDF) of spike S1/S2 domains across genus Betacoronavirus. To compile the table, sequences corresponding to spike protein fragments (incomplete-length spike proteins as deposited at GenBank) and duplicates (redundant parts identically overlapping within the 20-amino-acid motif windows) were removed from Data File 2, and one representative was then selected for each group of identical sequences. Collection dates and geographical locations were retrieved from the NCBI GenBank protein database at https://www.ncbi.nlm.nih.gov/protein/. For SARS-CoV-2 spike variants, these data were also cross-validated with the SARS-CoV-2 lineage mutation tracker (Gangavarapu et al., 2023) available at https://outbreak.info, which is based on extensive sequencing data from the global GISAID initiative (https://gisaid.org/). In the table, solid lines (-) mark the pat7 NLS, asterisks (*) mark O-glycosites, and circumflex symbols (^) mark the FCS.

  4. Data File 4 [table_s1s2_hits_betacov_polyf.xlsx]: Compiled summary table of sequence hits (MS Excel) of spike S1/S2 domains across genus Betacoronavirus. To compile the table, sequences corresponding to spike protein fragments (incomplete-length spike proteins as deposited at GenBank) and duplicates (redundant parts identically overlapping within the 20-amino-acid motif windows) were removed from Data File 2, and one representative was then selected for each group of identical sequences. Collection dates and geographical locations were retrieved from the NCBI GenBank protein database at https://www.ncbi.nlm.nih.gov/protein/. For SARS-CoV-2 spike variants, these data were also cross-validated with the SARS-CoV-2 lineage mutation tracker (Gangavarapu et al., 2023) available at https://outbreak.info, which is based on extensive sequencing data from the global GISAID initiative (https://gisaid.org/). In the table, solid lines (-) mark the pat7 NLS, asterisks (*) mark O-glycosites, and circumflex symbols (^) mark the FCS.

  5. Data File 5 [betacov_s1s2_nls_pat7_furin_psort.txt]: Nuclear localization signal (NLS) detection output for 5 representative betacoronavirus spike sequence domains, including the positive hits for pat7 in SARS-CoV-2 and in MERS-MA30 CoV. NLS predictions used the PSORT algorithm, available as a webservice at https://wolfpsort.hgc.jp/ and based on the work of Nakai and Horton (1999). Numbering refers to Data File 3 and Data File 4.

  6. Data File 6 [betacov_s1s2_oglyc_netogly.txt]: Detection output for 5 representative betacoronavirus spike sequence domains tested for Thr/Ser O-glycosite residue pairs with the standard prediction software NetOGlyc4.0 (Steentoft et al., 2013), available at https://services.healthtech.dtu.dk/services/NetOGlyc-4.0/. Positive hits have scores above 0.5. Numbering refers to Data File 3 and Data File 4.

  7. Data File 7 [betacov_s1s2_nls_pat7_furin_blastp.txt]: Comprehensive sequence database searches were performed using the NCBI protein BLAST (blastp) algorithm, with the webservice available at https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins. The following blastp search parameters and settings were used: Word size=2; Expect value=200000; Hitlist size=500; Gapcosts=9,1; Matrix=PAM30; Filter string=F; Genetic Code=1; Window Size=40; Threshold=11; Composition-based stats=0; Database Posted date=Jan 19, 2023 2:59 AM; Number of letters=17,117,563; Number of sequences=10,766; Entrez query: Includes: Betacoronavirus (taxid:694002); Excludes: SARS-CoV-2 (taxid:2697049). The six polyfunctional input query consensus motif sequences were TXXPR(K/H/R)XRSX and TXXPRX(K/H/R)RSX.
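The consensus notation quoted for Data File 7 can be read as a simple pattern, with X standing for any residue and (K/H/R) for one of three basic residues. The following is a minimal sketch, not the authors' pipeline and not a substitute for FindFur or blastp: it assumes plain FASTA input such as Data File 1 or Data File 2, and the function names (`read_fasta`, `compile_patterns`, `find_motifs`) and the toy sequence are hypothetical, introduced here only for illustration.

```python
import re
from typing import Dict, Iterable, Iterator, List, Tuple

# Consensus motifs as quoted in the Data File 7 description.
CONSENSUS = ["TXXPR(K/H/R)XRSX", "TXXPRX(K/H/R)RSX"]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def compile_patterns(consensus_list: List[str]) -> Dict[str, "re.Pattern[str]"]:
    """Translate each consensus string into a compiled regular expression.

    (K/H/R) becomes the character class [KHR]; X becomes [A-Z].
    A lookahead is used so that overlapping matches are all reported.
    """
    compiled = {}
    for consensus in consensus_list:
        body = re.sub(
            r"\(([A-Z/]+)\)",
            lambda m: "[" + m.group(1).replace("/", "") + "]",
            consensus,
        )
        body = body.replace("X", "[A-Z]")
        compiled[consensus] = re.compile("(?=(" + body + "))")
    return compiled


def find_motifs(
    sequence: str, patterns: Dict[str, "re.Pattern[str]"]
) -> List[Tuple[str, int, str]]:
    """Return (consensus, 0-based start, matched text) for every hit."""
    hits = []
    for consensus, regex in patterns.items():
        for match in regex.finditer(sequence):
            hits.append((consensus, match.start(), match.group(1)))
    return hits


# Toy input for illustration only; real use would pass an open file handle,
# e.g. open("betacov_matching_IPR042578_motif.fasta").
toy_fasta = [">toy_record", "GGTNSPRRARSVGG"]
patterns = compile_patterns(CONSENSUS)
seen = set()
for header, seq in read_fasta(toy_fasta):
    if seq in seen:  # keep one representative per identical sequence
        continue
    seen.add(seq)
    for consensus, start, text in find_motifs(seq, patterns):
        print(header, consensus, start, text)
```

Exact pattern matching of this kind only flags candidate windows; it does not reproduce the scoring of FindFur or the similarity search settings listed for blastp.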

References

Gu, C., 2020. FindFur: A Tool for Predicting Furin Cleavage Sites of Viral Envelope Substrates. Master’s Thesis, San Jose State University, CA, USA. doi: 10.31979/etd.4ahv-9jya 

Gangavarapu, K., Latif, A.A., Mullen, J.L., Alkuzweny, M., Hufbauer, E., Tsueng, G., Haag, E., Zeller, M., Aceves, C.M., Zaiets, K., Cano, M., Zhou, X., Qian, Z., Sattler, R., Matteson, N.L., Levy, J.I., Lee, R.T.C., Freitas, L., Maurer-Stroh, S., GISAID Core and Curation Team, Suchard, M.A., Wu, C., Su, A.I., Andersen, K.G., Hughes, L.D., 2023. Outbreak.info genomic reports: scalable and dynamic surveillance of SARS-CoV-2 variants and mutations. Nat Methods 20, 512–522. doi: 10.1038/s41592-023-01769-3

Nakai, K., Horton, P., 1999. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci 24, 34–36. doi: 10.1016/s0968-0004(98)01336-x

Steentoft, C., Vakhrushev, S.Y., Joshi, H.J., Kong, Y., Vester-Christensen, M.B., Schjoldager, K.T.-B.G., Lavrsen, K., Dabelsteen, S., Pedersen, N.B., Marcos-Silva, L., Gupta, R., Bennett, E.P., Mandel, U., Brunak, S., Wandall, H.H., Levery, S.B., Clausen, H., 2013. Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J 32, 1478–1488. doi: 10.1038/emboj.2013.79

Files

table_s1s2_hits_betacov_polyf.pdf

Files (10.1 MB)

Name Size
md5:ead6b2c9ad292ffd15744d659a4dc0ee 3.4 MB
md5:b8c43fb4913b20daa315cae23460e37a 5.1 MB
md5:f8f8d5e42c0e5e3d583c5fd37ffd1f01 1.6 MB
md5:5298ec6b3f9db2300ac99b2ccdb3385e 4.2 kB
md5:b7b8e754d5426a35d48a2d56c4b756de 5.7 kB
md5:f8cd6c796ea1bb52e9e814464011f792 73.7 kB
md5:73b7a7ccdd71aebff8b5975428e7a207 13.5 kB
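The MD5 checksums in the file listing can be used to check that a download is intact. The sketch below assumes the file has already been saved locally; the function names `md5_of_file` and `verify_md5` are hypothetical, and because the listing above does not show which checksum belongs to which file name, any pairing of a path with a checksum is left to the reader.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path, expected):
    """Compare a file's MD5 digest with an expected value.

    The expected value may carry the 'md5:' prefix used in the listing.
    """
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex


# Example call (path and pairing are placeholders, not taken from the record):
# verify_md5("path/to/downloaded_file", "md5:<checksum from the listing>")
```

Reading in chunks keeps memory use low for the larger files in the record.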

Additional details

Related works

Is supplement to
Publication: 10.1186/s12863-024-01290-2 (DOI)

Dates

Available
2024-12-14