Fundamental evolution of all <em>Orthocoronavirinae</em> including three deadly lineages descendent from Chiroptera-hosted coronaviruses: SARS-CoV, MERS-CoV, and SARS-CoV-2
Authors/Creators
- 1. University of North Carolina at Charlotte
Description
The article "Fundamental evolution of all Orthocoronavirinae including three deadly lineages descendent from Chiroptera-hosted coronaviruses: SARS-CoV, MERS-CoV, and SARS-CoV-2" was written by Denis Jacob Machado, Rachel Scott, Sayal Guirales, and Daniel A. Janies. The article was published online on April 26, 2021 (Cladistics, DOI: 10.1111/cla.12454).
I. SUPPLEMENTARY DIGITAL MATERIAL
The supplementary digital material is available at Zenodo, DOI: 10.5281/zenodo.3740770. In all files containing molecular information from GISAID's EpiCoV database, we masked GISAID's data, viz. each nucleotide was replaced by missing data ("?" or "N"), in compliance with that database's policies.
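The masking step described above can be illustrated with a minimal sketch. The function name `mask_sequence` and the choice to leave alignment gaps ("-") untouched are assumptions made for illustration; the deposited files may have been masked differently.

```python
def mask_sequence(seq, missing="N", keep_gaps=True):
    """Replace every nucleotide in `seq` with a missing-data symbol.

    Assumption: gap characters ("-") are preserved when keep_gaps is True,
    so the masked sequence keeps its alignment length and gap positions.
    """
    if missing not in ("N", "?"):
        raise ValueError("missing-data symbol must be 'N' or '?'")
    return "".join(
        ch if (keep_gaps and ch == "-") else missing
        for ch in seq
    )


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from any database.
    print(mask_sequence("ACGT-ACGT"))
    print(mask_sequence("ACGT-ACGT", missing="?", keep_gaps=False))
```

Masking this way keeps sequence lengths and terminal names intact, so the masked files still document which terminals were used without redistributing the underlying data.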
Selected terminals
The accession numbers of the 2,006 terminals (76 COVID-2019 from GISAID and 1,930 sequences from NCBI's RefSeq and GenBank databases) used in this study are listed in "terminals.csv." Sequence metadata (with different tabs that contain notes on terminal names and host information) is in "metadata.xlsx."
All sequences here are unique, and no sequence is a substring of another complete genome in the database. In addition, the selected sequences are longer than 26 kbp, and less than 0.1% of their character states differ from A, C, T, or G (e.g., missing data and gaps). Finally, we were able to predict the partitions ORF1ab, M, S, and N for all sequences herein.
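The selection criteria above can be sketched as two small filters. This is an illustrative sketch only: the function names are hypothetical, 26 kbp is taken as 26,000 characters, and the substring test is a naive quadratic comparison rather than whatever procedure was actually used to build the dataset.

```python
def passes_quality_filters(seq, min_length=26_000, max_non_acgt=0.001):
    """True if seq is longer than min_length and has < 0.1% non-ACGT states."""
    seq = seq.upper()
    if len(seq) <= min_length:
        return False
    non_acgt = sum(1 for ch in seq if ch not in "ACGT")
    return non_acgt / len(seq) < max_non_acgt


def drop_duplicates_and_substrings(seqs):
    """Keep only sequences that are not contained in another kept sequence.

    `seqs` maps a terminal name to its sequence. Sequences are visited from
    longest to shortest, so a sequence can only be contained in one that was
    already examined; exact duplicates are dropped after the first copy.
    """
    kept = {}
    for name, seq in sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True):
        if not any(seq in other for other in kept.values()):
            kept[name] = seq
    return kept


if __name__ == "__main__":
    # Hypothetical toy data, far shorter than real genomes.
    toy = {"a": "ACGTACGT", "b": "GTAC", "c": "ACGTACGT", "d": "TTTT"}
    print(sorted(drop_duplicates_and_substrings(toy)))
```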
Data matrix
The final DNA matrix in "matrix.ss" comprises 38,274 characters divided into four partitions, representing the genes ORF1ab (translated by ribosomal frameshifting), S (spike glycoprotein trimer), M (membrane protein), and N (nucleoprotein).
The same matrix is also available in NEXUS format ("matrix.nex"), and the partitions and selected models are described in the NEXUS file "partitions.nex."
Tree search
The template for the script used to perform different tree search replicates on TNT is named "treeSearch.RUN." This script was executed ten times, changing the replicate number accordingly. A total of 100 rounds of tree fusing were executed using all trees found this way (see "fuse.RUN"). Consensus trees were produced with "consensus.RUN." Trees with branch lengths were produced with "branchLength.RUN." Bootstrap calculations were performed with "bootstrap.RUN." The calculation of Goodman-Bremer support values was based on the macro "Bremer.RUN".
Recombination analyses
The parameters used for whole-genome alignment and recombination detection, as well as the main results, are provided in a single PDF file ("recombination.pdf"). The analysis compared the complete genomes of the SARS-CoV-2 reference sequence (RefSeq accession number NC_045512.2), the bat-hosted CoV RaTG13 (GISAID accession number EPI_ISL_41402131), a representative of the Pan_SL-CoV_GD clade (GISAID accession number EPI_ISL_410721), and two other bat-hosted SARS-like viruses (GenBank accession numbers MG772933.1 and MG772934.1).
Graphical abstract
The graphical abstract summarizes our main results. See the full image in the file "graphicalAbstract.pdf".
Phylogenetic trees from parsimony analyses
The NEXUS file "parsimony.nex" contains the best heuristic results from the parsimony analyses (six trees), the tree with branch lengths, the tree with bootstrap values, the tree with Goodman-Bremer support values, the tree with REP values, and the strict consensus tree. The file also contains a tree with merged data (e.g., node numbers, clade frequencies, branch lengths).
Bootstrap values and clade sizes from parsimony analysis
Bootstrap values among all nodes varied from 0 to 100% (mean = 65.74%, median = 80%, and mode = 100%). Bootstrap values on the consensus tree varied from 1 to 100% (mean = 75.17%, median = 90%, and mode = 100%). For scatter plots and histograms showing the variation of bootstrap values in relation to clade size, see the file "bootstrap.png."
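Summary statistics of this kind can be recomputed from a list of per-node support values. The sketch below assumes the values have already been extracted from the tree into a plain Python list; the function name and the toy values are hypothetical and are not the study's data.

```python
from statistics import mean, median, multimode


def summarize_support(values):
    """Return range, mean, median, and mode(s) of per-node support values."""
    if not values:
        raise ValueError("no support values given")
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),  # list, since ties are possible
    }


if __name__ == "__main__":
    # Hypothetical bootstrap percentages for five nodes.
    toy_support = [100, 100, 80, 60, 0]
    for key, value in summarize_support(toy_support).items():
        print(key, value)
```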
Complete consensus tree from parsimony analyses
A high-resolution version of the consensus tree of the six best heuristic results from the tree search performed under the parsimony criterion is in the file "parsimony.pdf." Branch lengths are proportional to the number of transformations, and branch colors correspond to bootstrap values (see the legend in the figure).
Host shifts
The spreadsheet "hosts.csv" contains the minimum and maximum number of each type of host transformation. The complete consensus tree with YBYRÁ's categorization of host transformations is available in "hosts.pdf."
TreeTime analyses
Analyses were performed with TreeTime v0.7.5 (available at github.com/neherlab/treetime), following the instructions in its documentation (revision f1c83c30, available at treetime.readthedocs.io). We included the results of the following analyses:
The spreadsheet "treetime.csv" contains the main results from the TreeTime analyses, including estimated mutation rates and the minimum and maximum estimated dates for the selected virus clades. It also gives the earliest publication for each virus and its DOI. Finally, the spreadsheet details the earliest genetic sequences submitted to NCBI's databases for each of the viruses it lists.
- Host shift calculation using the "mugration" model: the compressed folder "mugration.zip" contains the GTR model calculations ("GTR.txt"), confidence values per node and state ("confidence.csv"), and the annotated tree data showing all host shifts ("annotated_tree.nex" and "annotated_tree.pdf").
- Mutation rates: the compressed folder "mutation_rates.zip" contains details about selected clades, including branch lengths ("clade_data.csv"). It also contains host and collection dates for terminals ("terminal_data.csv") and root-to-tip regression analyses ("root-to-tip-regressions.csv" and "root-to-tip-regressions.pdf").
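For readers unfamiliar with root-to-tip regression, the idea is to regress each terminal's distance from the root on its sampling date: the slope estimates the substitution rate and the x-intercept estimates the date of the root. The sketch below uses ordinary least squares on hypothetical values. It is not TreeTime's implementation, which also accounts for the shared ancestry of the tips, and the function name is an assumption.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, intercept, root_date): slope is the estimated rate
    (substitutions per site per time unit) and root_date is the x-intercept,
    or None when the slope is zero.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need two or more paired observations")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    root_date = -intercept / slope if slope != 0 else None
    return slope, intercept, root_date


if __name__ == "__main__":
    # Hypothetical sampling years and root-to-tip distances.
    rate, _, root = root_to_tip_regression(
        [2000.0, 2010.0, 2020.0], [0.010, 0.020, 0.030]
    )
    print(rate, root)
```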
Recombination detection analysis
The spreadsheet "summaryFromRdp5_505terminals.xlsx" contains the results of the recombination detection analysis of a dataset of 505 terminals. These results were used to test the sensitivity of the phylogenetic analysis to the removal of putative recombinant sequences.
Maximum likelihood trees
The maximum likelihood tree (log-likelihood: -2,240,329.5917) is available in "likelihood.nex." Node labels show the support values formatted as SH-aLRT support and bootstrap values. The branch lengths are proportional to the average number of nucleotide substitutions per nucleotide site.
Unconstrained maximum likelihood trees for each partition are in "ml_gene_trees.nex."
Subsets for sensitivity analysis
The matrices, partition schemes, best heuristic solutions, and strict consensus trees from the datasets of 505 and 315 terminals used to test the sensitivity to putative recombinant genomes are in the NEXUS files "dataset505terminals.nex" and "dataset315terminals.nex", respectively.
Phylogenetic analyses of the SARS-CoV-2 related clade
The NEXUS file "sarscov2.nex" contains the alignment matrix and partition scheme used in the independent phylogenetic analyses of the SARS-CoV-2 clade.
The best heuristic solutions (8,900 steps each) and strict consensus tree from parsimony analyses are available in "sarscov2_parsimony.nex."
The maximum likelihood tree (likelihood score equal to -67,779.744) is in "sarscov2_ml.nex." Node labels show the support values formatted as SH-aLRT support and bootstrap values. The branch lengths are proportional to the average number of nucleotide substitutions per nucleotide site.
Alignment comparisons in the SARS-CoV-2-related clade
The file "sarscov2_aligns.xlsx" contains summary statistics of the alignment comparisons between the SARS-CoV-2 reference sequence (NCBI's RefSeq accession number NC_045512.2) and related viruses found in human, bat, and pangolin hosts.
Alignment comparisons of the receptor-binding motif of the spike glycoprotein
The file "rbm.xlsx" contains details on the comparisons between the receptor-binding motif (RBM) of the spike glycoprotein of SARS-CoV-2 (NCBI's RefSeq accession number NC_045512) and other viruses infecting humans, bats, and pangolins in the SARS-CoV-2-related clade. This Excel spreadsheet has two tabs summarizing data from the amino acid and nucleotide alignments, respectively.
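The statistics reported in the two spreadsheets are not itemized here. As an illustration of one common pairwise comparison, the sketch below computes percent identity between two aligned sequences (nucleotide or amino acid). The function name and the gap convention (columns that are gaps in both sequences are skipped, and a gap against a residue counts as a mismatch) are assumptions and may differ from the conventions used in the spreadsheets.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, not real RBM sequences.
    print(percent_identity("ACGT", "ACGA"))
```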
II. SUPPLEMENTARY ACKNOWLEDGEMENT TABLE
The complete GISAID acknowledgement table is provided in file "acknowledgement.xlsx" (Zenodo, DOI: 10.5281/zenodo.3740770).
III. GLOSSARY
We composed a glossary, provided in the file "glossary.pdf" (Zenodo, DOI: 10.5281/zenodo.3740770), of selected terms and concepts that appear in our manuscript or that are crucial to understanding the references we cited.
Files (228.3 MB)
| Checksum | Size |
|---|---|
| md5:a59558af98cc0f0bae32a73fd5938762 | 17.6 kB |
| md5:20d52eacc5e76efeed37db8ef9e37f9a | 517.7 kB |
| md5:fea54bce04990bc66861d08badb451e0 | 209 Bytes |
| md5:9738d239af41a29ef9c67a726fa6b749 | 169 Bytes |
| md5:83c919bc70f621d57a192a49568ea492 | 17.3 kB |
| md5:c31ca507fbc2eaa2ee515859a2effc9c | 453 Bytes |
| md5:fa072f595b023aaa0c864fc64209b7f3 | 11.6 MB |
| md5:becd70da400647e20fc9ec2806e70ed2 | 19.7 MB |
| md5:add52ef95928262f42744e153ba971e0 | 457 Bytes |
| md5:a323ebb0672c109ac9e0c8a95abba644 | 1.5 MB |
| md5:1531da3d2f300a99953005e8424d8231 | 669.9 kB |
| md5:53183c08a9692dc2b2021253c6f20bb1 | 34.0 kB |
| md5:d3a1b04311deff898c402c9fe95cd6e4 | 1.6 MB |
| md5:226e42b0091606b855dcab4033549c86 | 281.9 kB |
| md5:545f2e468fc6cde1ab15401c4afd692f | 76.8 MB |
| md5:4ca2a2bca4f6049546172de6553ad404 | 76.8 MB |
| md5:3a40c63ed9e1ee1025e32f427198a37d | 283.5 kB |
| md5:7ed330389e877e2b3e8fbd6a2077de5c | 326.4 kB |
| md5:8728d90fa17f70616a40c2e65f3d4b34 | 361.0 kB |
| md5:46924b7383ca6f8116e87d3816beca0e | 753.6 kB |
| md5:1674f3fe15f720bde3f10e7cad49ec46 | 1.3 MB |
| md5:f48e180b8cd0c36a6c300d392cc5dd64 | 264.5 kB |
| md5:f4a1deb17071cd9059f9f5171e4117ee | 157 Bytes |
| md5:dc0293e28a98a85d9a6a44689e30a7f8 | 8.0 MB |
| md5:965768f82a9b3bb997243a97cc660827 | 532.1 kB |
| md5:94c29d04f880a4f75e51d06e8556d16d | 7.3 MB |
| md5:3935c49e5dce74c116cd7869054a8df3 | 1.2 MB |
| md5:8920deca3e10ca80a34a15a18d584f40 | 23.9 kB |
| md5:6330936dccf643bd882f60f1f9124e2c | 17.8 MB |
| md5:cf78133b771c0d265e823747b81607c7 | 321.6 kB |
| md5:b9793419b88159a6a2f5da2d79f92cd8 | 111.3 kB |
| md5:affc46423e0d28115ba3d5f0b16b34dd | 296 Bytes |
| md5:6b1e69929dbe316e7821568941c618f8 | 3.6 kB |
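The MD5 checksums in the file listing can be used to confirm that a downloaded file is intact. The sketch below uses Python's standard `hashlib` module; the helper names are hypothetical, and the caller must know which checksum belongs to which file, because the file names are not shown in the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `matches_checksum("bootstrap.png", "md5:<value from the listing>")` returns `True` only when the local copy is byte-for-byte identical to the deposited file.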