Published April 26, 2021 | Version 2021.04.26
Journal article | Open access

Fundamental evolution of all Orthocoronavirinae including three deadly lineages descendent from Chiroptera-hosted coronaviruses: SARS-CoV, MERS-CoV, and SARS-CoV-2

  • 1. University of North Carolina at Charlotte

Description

The article "Fundamental evolution of all Orthocoronavirinae including three deadly lineages descendent from Chiroptera-hosted coronaviruses: SARS-CoV, MERS-CoV, and SARS-CoV-2" was written by Denis Jacob Machado, Rachel Scott, Sayal Guirales, and Daniel A. Janies. The article was published online on April 26, 2021 (Cladistics, DOI: 10.1111/cla.12454).

I. SUPPLEMENTARY DIGITAL MATERIAL

The supplementary digital material is available at Zenodo, DOI: 10.5281/zenodo.3740770. In all files containing molecular information from GISAID's EpiCoV database, we masked the GISAID-derived sequence data: each nucleotide was replaced with a missing-data symbol ("?" or "N"), in compliance with that database's policies.
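
As an illustration only, the masking rule can be sketched as a function that overwrites every character with a missing-data symbol while keeping the sequence length, so alignment coordinates are unchanged. The name `mask_sequence` is hypothetical, this is not the authors' script, and masking gap characters as well is an assumption of the sketch.

```python
MISSING_SYMBOLS = ("?", "N")


def mask_sequence(seq: str, symbol: str = "N") -> str:
    """Replace every character of seq with a missing-data symbol.

    Hypothetical helper: the output keeps the input length, so alignment
    coordinates stay valid, but no nucleotide information remains. Gap
    characters are masked too (an assumption of this sketch).
    """
    if symbol not in MISSING_SYMBOLS:
        raise ValueError(f"symbol must be one of {MISSING_SYMBOLS}")
    return symbol * len(seq)


if __name__ == "__main__":
    print(mask_sequence("ACGT-TTA", "?"))  # ????????
```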

Selected terminals

The accession numbers of the 2,006 terminals used in this study (76 COVID-19-related sequences from GISAID and 1,930 sequences from NCBI's RefSeq and GenBank databases) are listed in "terminals.csv". Sequence metadata, with separate tabs containing notes on terminal names and host information, is in "metadata.xlsx".

All sequences are unique, and no sequence is a substring of another complete genome in the database. In addition, every selected sequence is longer than 26 kbp and has fewer than 0.1% of character states other than A, C, G, or T (e.g., missing data and gaps). Finally, we were able to predict the ORF1ab, M, S, and N partitions for every sequence included.
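
The selection criteria above can be expressed as a small filter. This is a minimal sketch under stated assumptions, not the authors' pipeline: `passes_filters` and `ambiguous_fraction` are hypothetical names, "longer than 26 kbp" is read as more than 26,000 bp, and partition prediction is not covered.

```python
VALID_STATES = set("ACGT")


def ambiguous_fraction(seq: str) -> float:
    """Fraction of character states other than A, C, G, or T."""
    if not seq:
        return 1.0  # treat an empty sequence as entirely ambiguous
    return sum(1 for c in seq.upper() if c not in VALID_STATES) / len(seq)


def passes_filters(seq: str, others: list[str],
                   min_length: int = 26_000,
                   max_ambiguous: float = 0.001) -> bool:
    """Hypothetical filter mirroring the stated selection criteria."""
    if len(seq) <= min_length:  # must be longer than 26 kbp
        return False
    if ambiguous_fraction(seq) >= max_ambiguous:  # fewer than 0.1% non-ACGT
        return False
    # Unique and not a substring of another complete genome
    # (an identical string also counts as a substring).
    return all(seq not in other for other in others)


if __name__ == "__main__":
    genome = "ACGT" * 7_000  # toy 28,000 bp sequence with no ambiguous states
    print(passes_filters(genome, []))               # True
    print(passes_filters(genome[:20_000], []))      # False (too short)
    print(passes_filters(genome, [genome + "A"]))   # False (substring of another genome)
```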

Data matrix

The final DNA matrix in "matrix.ss" comprises 38,274 characters divided into four partitions, representing the genes ORF1ab (translated by ribosomal frameshifting), S (spike glycoprotein trimer), M (membrane protein), and N (nucleoprotein).

The same matrix is also available in NEXUS format ("matrix.nex"), and the partitions and selected models are described in the NEXUS file "partitions.nex".
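
In NEXUS files, partitions are commonly declared as `charset` statements inside a `sets` block. The sketch below, assuming simple contiguous `charset NAME = START-END;` lines, reads such declarations and sums their lengths (NEXUS coordinates are 1-based and inclusive). The partition names and coordinates in the example are toy values, not those in "partitions.nex"; for contiguous, non-overlapping charsets covering the whole matrix, the lengths would be expected to add up to its total number of characters.

```python
import re

# Matches simple contiguous declarations such as "charset geneA = 1-120;".
CHARSET = re.compile(r"charset\s+([^\s=]+)\s*=\s*(\d+)\s*-\s*(\d+)\s*;", re.IGNORECASE)


def parse_charsets(nexus_text: str) -> dict[str, tuple[int, int]]:
    """Return {partition name: (start, end)} for simple contiguous charsets."""
    return {name: (int(a), int(b)) for name, a, b in CHARSET.findall(nexus_text)}


def total_length(charsets: dict[str, tuple[int, int]]) -> int:
    """Sum partition lengths; coordinates are 1-based and inclusive."""
    return sum(end - start + 1 for start, end in charsets.values())


if __name__ == "__main__":
    example = """
    begin sets;
        charset geneA = 1-120;
        charset geneB = 121-180;
    end;
    """
    parts = parse_charsets(example)
    print(parts)                # {'geneA': (1, 120), 'geneB': (121, 180)}
    print(total_length(parts))  # 180
```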

Tree search

The template for the script used to run the tree-search replicates in TNT is "treeSearch.RUN". This script was executed ten times, with the replicate number changed for each run. All trees found this way were then subjected to 100 rounds of tree fusing (see "fuse.RUN"). Consensus trees were produced with "consensus.RUN", trees with branch lengths with "branchLength.RUN", and bootstrap values with "bootstrap.RUN". The calculation of Goodman-Bremer support values was based on the macro "Bremer.RUN".
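
Producing ten per-replicate copies of a template is a simple text-substitution task. The sketch below assumes a hypothetical `$replicate` placeholder and hypothetical output names (`treeSearch_1.run`, and so on); it does not reproduce the actual template or any TNT commands.

```python
from pathlib import Path
from string import Template


def render_replicates(template_text: str, n: int) -> dict[str, str]:
    """Return {filename: script text}, one entry per replicate.

    Assumes a hypothetical $replicate placeholder in the template; any
    literal '$' in the script would have to be written as '$$'.
    """
    template = Template(template_text)
    return {
        f"treeSearch_{rep}.run": template.substitute(replicate=rep)
        for rep in range(1, n + 1)
    }


def write_replicates(scripts: dict[str, str], outdir: Path) -> None:
    """Write each rendered script into outdir."""
    outdir.mkdir(parents=True, exist_ok=True)
    for name, text in scripts.items():
        (outdir / name).write_text(text)


if __name__ == "__main__":
    toy_template = "replicate number: $replicate\n"  # toy text, not the real template
    scripts = render_replicates(toy_template, 10)
    print(len(scripts))                              # 10
    print(scripts["treeSearch_3.run"].strip())       # replicate number: 3
```

Keeping rendering separate from writing makes the substitution easy to check before any files are created.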

Recombination analyses

The file "recombination.pdf" provides the parameters used for whole-genome alignment and recombination detection, together with the main results. These analyses included the complete genomes of the SARS-CoV-2 reference sequence (RefSeq accession number NC_045512.2), the bat-hosted CoV RaTG13 (GISAID accession number EPI_ISL_402131), a representative of the Pan_SL-CoV_GD clade (GISAID accession number EPI_ISL_410721), and two other bat-hosted SARS-like viruses (GenBank accession numbers MG772933.1 and MG772934.1).

Graphical abstract

The graphical abstract summarizes our main results; the full image is in file "graphicalAbstract.pdf".

Phylogenetic trees from parsimony analyses

The NEXUS file "parsimony.nex" contains the best heuristic results from the parsimony analyses (six trees), the tree with branch lengths, the tree with bootstrap values, the tree with Goodman-Bremer support values, the tree with REP values, and the strict consensus tree. The file also contains a tree with merged data (e.g., node numbers, clade frequencies, branch lengths).
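
A strict consensus keeps only the clades present in every input tree. As a minimal sketch, assuming each tree is represented as a set of clades and each clade as a frozenset of terminal names (trivial clades omitted), the strict consensus is the intersection of those sets. This illustrates the definition only; it is not TNT's implementation.

```python
from functools import reduce


def strict_consensus(trees: list[set[frozenset[str]]]) -> set[frozenset[str]]:
    """Return the clades shared by every input tree (the strict consensus)."""
    return reduce(lambda a, b: a & b, trees)


if __name__ == "__main__":
    # Toy rooted trees over terminals A-E: (((A,B),C),(D,E)) and ((A,B),(C,(D,E))).
    t1 = {frozenset("AB"), frozenset("ABC"), frozenset("DE")}
    t2 = {frozenset("AB"), frozenset("CDE"), frozenset("DE")}
    consensus = strict_consensus([t1, t2])
    print(sorted("".join(sorted(c)) for c in consensus))  # ['AB', 'DE']
```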

Bootstrap values and clade sizes from parsimony analysis

Bootstrap values among all nodes varied from 0 to 100% (mean = 65.74%, median = 80%, and mode = 100%). Bootstrap values on the consensus tree varied from 1 to 100% (mean = 75.17%, median = 90%, and mode = 100%). For scatter plots and histograms showing the variation of bootstrap values in relation to clade size, see file "bootstrap.png".
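
Summary statistics of this kind can be computed with Python's standard `statistics` module. The values in the example are toy numbers, not the study's bootstrap values; `summarize` is a hypothetical helper.

```python
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Range, mean (rounded to two decimals), median, and mode of support values."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


if __name__ == "__main__":
    toy = [100, 100, 100, 99, 64, 51, 23]  # toy values, not the study's data
    print(summarize(toy))
    # {'min': 23, 'max': 100, 'mean': 76.71, 'median': 99, 'mode': 100}
```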

Complete consensus tree from parsimony analyses

A high-resolution version of the consensus of the six best heuristic solutions from the tree search performed under the parsimony criterion is in file "parsimony.pdf". Branch lengths are proportional to the number of transformations, and branch colors correspond to bootstrap values (see the legend in the figure).

Host shifts

The spreadsheet "hosts.csv" contains the minimum and maximum number of transformations of each host-shift type. The complete consensus tree with YBYRÁ's categorization of host transformations is available in "hosts.pdf".
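
For intuition about how host transformations are counted on a tree, the sketch below applies Fitch parsimony to host labels on a fully resolved tree and returns the minimum total number of host changes. This is a swapped-in textbook technique used for illustration: it is not YBYRÁ's procedure, it does not classify changes by type or report per-type minimum and maximum counts, and the terminal names are hypothetical.

```python
from typing import Union

Tree = Union[str, tuple]  # a terminal name or a (left, right) pair


def fitch(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch parsimony on a fully resolved (binary) tree.

    Returns the candidate state set at this node and the minimum number
    of state changes in the subtree below it.
    """
    if isinstance(tree, str):  # terminal: its observed host
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:  # children can agree without an extra change
        return shared, left_cost + right_cost
    # No shared state: one extra change is needed on one of the child branches.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical four-terminal example (not taken from the study's tree).
    toy_tree = (("bat_1", "bat_2"), ("civet_1", "human_1"))
    toy_hosts = {"bat_1": "Chiroptera", "bat_2": "Chiroptera",
                 "civet_1": "Carnivora", "human_1": "Primates"}
    print(fitch(toy_tree, toy_hosts)[1])  # 2
```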

TreeTime analyses

Analyses were performed with TreeTime v0.7.5 (available at github.com/neherlab/treetime), following the instructions in its documentation (revision f1c83c30, available at treetime.readthedocs.io).

The spreadsheet "treetime.csv" contains the main results of the TreeTime analyses, including estimated mutation rates and the minimum and maximum estimated dates for the selected virus clades. It also lists the earliest publication for each virus, with its DOI. Finally, it details the earliest genetic sequences submitted to NCBI's databases for each virus listed.

We also included the results of the following analyses:

  • Host shift calculation using the "mugration" model: the compressed folder "mugration.zip" contains the GTR model calculations ("GTR.txt"), confidence values per node and state ("confidence.csv"), and the annotated tree data showing all host shifts ("annotated_tree.nex" and "annotated_tree.pdf").
  • Mutation rates: the compressed folder "mutation_rates.zip" contains details about selected clades, including branch lengths ("clade_data.csv"). It also contains host information and collection dates for the terminals ("terminal_data.csv") and the root-to-tip regression analyses ("root-to-tip-regressions.csv" and "root-to-tip-regressions.pdf").
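
A root-to-tip regression fits each terminal's distance from the root against its sampling date: the slope estimates the substitution rate, and the x-intercept estimates the date of the root. The sketch below is a plain ordinary-least-squares version of that idea with toy inputs (decimal years and substitutions per site); TreeTime performs its own, more elaborate analysis, and this is not its API.

```python
def root_to_tip_regression(dates: list[float], distances: list[float]) -> tuple[float, float]:
    """Ordinary least squares of root-to-tip distance on sampling date.

    Returns (rate, root_date): the slope in substitutions per site per year
    and the x-intercept, i.e. the date at which the expected distance is zero.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate
    return rate, root_date


if __name__ == "__main__":
    # Toy data lying exactly on a line; not taken from the study.
    rate, root = root_to_tip_regression([2000.0, 2010.0, 2020.0], [0.01, 0.02, 0.03])
    print(f"{rate:.4f} subs/site/year, root at {root:.1f}")
    # 0.0010 subs/site/year, root at 1990.0
```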

Recombination detection analysis

The spreadsheet "summaryFromRdp5_505terminals.xlsx" contains the results of the recombination detection analysis of the 505-terminal dataset. These results were used to test the sensitivity of the phylogenetic analysis to the removal of putative recombinant sequences.

Maximum likelihood trees

The maximum likelihood tree (log-likelihood: -2,240,329.5917) is available in "likelihood.nex". Node labels show two support values: SH-aLRT support and bootstrap. Branch lengths are proportional to the average number of nucleotide substitutions per site.

Unconstrained maximum likelihood trees for each partition are in "ml_gene_trees.nex."
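
Paired support labels of this kind are often written with a slash (for example, "97.8/100"). The sketch below assumes that separator, which should be checked against the actual tree files, and splits a label into its SH-aLRT and bootstrap values. The thresholds in `passes_thresholds` are user-chosen and purely illustrative.

```python
def parse_support(label: str, sep: str = "/") -> tuple[float, float]:
    """Split a node label such as '97.8/100' into (SH-aLRT, bootstrap).

    The '/' separator is an assumption; check the labels in the tree file.
    """
    sh_alrt, bootstrap = label.split(sep)
    return float(sh_alrt), float(bootstrap)


def passes_thresholds(label: str, sh_min: float, bs_min: float) -> bool:
    """True if both support values reach user-chosen (hypothetical) thresholds."""
    sh_alrt, bootstrap = parse_support(label)
    return sh_alrt >= sh_min and bootstrap >= bs_min


if __name__ == "__main__":
    print(parse_support("97.8/100"))                            # (97.8, 100.0)
    print(passes_thresholds("97.8/64", sh_min=80, bs_min=70))   # False
```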

Subsets for sensitivity analysis

The matrices, partition schemes, best heuristic solutions, and strict consensus trees of the 505- and 315-terminal datasets, used to test the sensitivity of the results to putative recombinant genomes, are in the NEXUS files "dataset505terminals.nex" and "dataset315terminals.nex", respectively.

Phylogenetic analyses of the SARS-CoV-2 related clade

The NEXUS file "sarscov2.nex" contains the alignment matrix and partition scheme used in the independent phylogenetic analyses of the SARS-CoV-2 clade.

The best heuristic solutions (8,900 steps each) and strict consensus tree from parsimony analyses are available in "sarscov2_parsimony.nex."

The maximum likelihood tree (log-likelihood: -67,779.744) is in "sarscov2_ml.nex". Node labels show two support values: SH-aLRT support and bootstrap. Branch lengths are proportional to the average number of nucleotide substitutions per site.

Alignment comparisons in the SARS-CoV-2-related clade

The file "sarscov2_aligns.xlsx" contains summary statistics of the alignment comparisons between the SARS-CoV-2 reference sequence (NCBI's RefSeq accession number NC_045512.2) and related viruses found in human, bat, and pangolin hosts.
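
Percent identity is a typical summary statistic for pairwise alignment comparisons. The sketch below uses one common definition (matches divided by aligned columns in which neither sequence has a gap); the spreadsheet may use a different convention, for example counting gaps as differences, so this is an illustration rather than a description of how the file was produced.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


if __name__ == "__main__":
    # Toy alignment: 9 gap-free columns, 8 of them identical.
    print(round(percent_identity("ACGT-ACGTA", "ACGTTACCTA"), 2))  # 88.89
```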

Alignment comparisons of the receptor-binding motif of the spike glycoprotein

The file "rbm.xlsx" contains details on the comparisons between the receptor-binding motif (RBM) of the spike glycoprotein of SARS-CoV-2 (NCBI's RefSeq accession number NC_045512) and other viruses infecting humans, bats, and pangolins in the SARS-CoV-2-related clade. This Excel spreadsheet has two tabs summarizing data from the amino acid and nucleotide alignments, respectively.
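
Nucleotide and amino acid comparisons of the same region can differ because synonymous substitutions change nucleotides without changing the encoded amino acid. The sketch below illustrates this with the standard genetic code, assuming gap-free, in-frame coding sequences of equal length; `count_differences` is a hypothetical helper, and the sequences are toy examples, not RBM data.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame CDS; codons with ambiguous characters become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def count_differences(a: str, b: str) -> tuple[int, int]:
    """Return (nucleotide differences, amino acid differences) for aligned CDSs."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    nt = sum(x != y for x, y in zip(a.upper(), b.upper()))
    aa = sum(x != y for x, y in zip(translate(a), translate(b)))
    return nt, aa


if __name__ == "__main__":
    # TTA/CTG (Leu) and GGT/GGC (Gly) are synonymous; AAA/AGA is Lys -> Arg.
    print(count_differences("TTAGGTAAA", "CTGGGCAGA"))  # (4, 1)
```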

II. SUPPLEMENTARY ACKNOWLEDGEMENT TABLE

The complete GISAID acknowledgement table is provided in file "acknowledgement.xlsx" (Zenodo, DOI: 10.5281/zenodo.3740770).

III. GLOSSARY

We compiled a glossary, provided in file "glossary.pdf" (Zenodo, DOI: 10.5281/zenodo.3740770), of selected terms and concepts that appear in our manuscript or that are crucial to understanding the references we cite.

Notes

ABSTRACT: The severe acute respiratory syndrome coronavirus (SARS-CoV) emerged in humans in 2002. Despite reports showing Chiroptera as the original animal reservoir of SARS-CoV, many argue that Carnivora-hosted viruses are the most likely origin. The emergence of the Middle East respiratory syndrome coronavirus (MERS-CoV) in 2012 also involves Chiroptera-hosted lineages. However, factors such as the lack of comprehensive phylogenies hamper our understanding of host shifts once MERS-CoV emerged in humans and Artiodactyla. Since 2019, the origin of SARS-CoV-2, the causative agent of coronavirus disease 2019 (COVID-19), has added to this episodic history of zoonotic transmission events. Here we introduce a phylogenetic analysis of 2,006 unique and complete genomes of different lineages of Orthocoronavirinae. We used gene annotations to align orthologous sequences for total evidence analysis under the parsimony optimality criterion. Deltacoronavirus and Gammacoronavirus were set as outgroups to understand spillovers of Alphacoronavirus and Betacoronavirus among ten orders of animals. We corroborated that Chiroptera-hosted viruses are the sister group of SARS-CoV, SARS-CoV-2, and MERS-related viruses. Other zoonotic events were qualified and quantified to provide a comprehensive picture of the risk of coronaviruses' emergence among humans. Finally, we applied a dataset of 250 SARS-CoV-2 genomes to elucidate the phylogenetic relationship between SARS-CoV-2 and Chiroptera-hosted coronaviruses.

Files (228.3 MB)

Checksum                                Size
md5:a59558af98cc0f0bae32a73fd5938762    17.6 kB
md5:20d52eacc5e76efeed37db8ef9e37f9a    517.7 kB
md5:fea54bce04990bc66861d08badb451e0    209 bytes
md5:9738d239af41a29ef9c67a726fa6b749    169 bytes
md5:83c919bc70f621d57a192a49568ea492    17.3 kB
md5:c31ca507fbc2eaa2ee515859a2effc9c    453 bytes
md5:fa072f595b023aaa0c864fc64209b7f3    11.6 MB
md5:becd70da400647e20fc9ec2806e70ed2    19.7 MB
md5:add52ef95928262f42744e153ba971e0    457 bytes
md5:a323ebb0672c109ac9e0c8a95abba644    1.5 MB
md5:1531da3d2f300a99953005e8424d8231    669.9 kB
md5:53183c08a9692dc2b2021253c6f20bb1    34.0 kB
md5:d3a1b04311deff898c402c9fe95cd6e4    1.6 MB
md5:226e42b0091606b855dcab4033549c86    281.9 kB
md5:545f2e468fc6cde1ab15401c4afd692f    76.8 MB
md5:4ca2a2bca4f6049546172de6553ad404    76.8 MB
md5:3a40c63ed9e1ee1025e32f427198a37d    283.5 kB
md5:7ed330389e877e2b3e8fbd6a2077de5c    326.4 kB
md5:8728d90fa17f70616a40c2e65f3d4b34    361.0 kB
md5:46924b7383ca6f8116e87d3816beca0e    753.6 kB
md5:1674f3fe15f720bde3f10e7cad49ec46    1.3 MB
md5:f48e180b8cd0c36a6c300d392cc5dd64    264.5 kB
md5:f4a1deb17071cd9059f9f5171e4117ee    157 bytes
md5:dc0293e28a98a85d9a6a44689e30a7f8    8.0 MB
md5:965768f82a9b3bb997243a97cc660827    532.1 kB
md5:94c29d04f880a4f75e51d06e8556d16d    7.3 MB
md5:3935c49e5dce74c116cd7869054a8df3    1.2 MB
md5:8920deca3e10ca80a34a15a18d584f40    23.9 kB
md5:6330936dccf643bd882f60f1f9124e2c    17.8 MB
md5:cf78133b771c0d265e823747b81607c7    321.6 kB
md5:b9793419b88159a6a2f5da2d79f92cd8    111.3 kB
md5:affc46423e0d28115ba3d5f0b16b34dd    296 bytes
md5:6b1e69929dbe316e7821568941c618f8    3.6 kB
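
Each file in the record is listed with an MD5 checksum, so a downloaded copy can be checked for integrity by recomputing its digest and comparing it with the value shown for that file on the record page. A minimal sketch using Python's standard `hashlib`; the helper names are hypothetical, and the file path in any real call is up to the reader.

```python
import hashlib
import io
from pathlib import Path
from typing import BinaryIO


def md5_stream(handle: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a binary stream, read in 1 MiB chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, listed: str) -> bool:
    """Compare a downloaded file with a checksum listed as 'md5:<hex>'."""
    expected = listed.strip().lower().removeprefix("md5:")
    with path.open("rb") as handle:
        return md5_stream(handle) == expected


if __name__ == "__main__":
    # In-memory demo; for a real download call verify_file(Path(...), "md5:...").
    print(md5_stream(io.BytesIO(b"hello")))  # 5d41402abc4b2a76b9719d911017c592
```

Reading in chunks keeps memory use flat even for the largest files in the listing (about 77 MB).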