There is a newer version of the record available.

Published June 16, 2022 | Version V1.1
Dataset Open

The emergence of high-fitness variants accelerates the slowdown of genome heterogeneity in the coronavirus

  • 1. Department of Genetics, Faculty of Sciences, University of Granada, 18071, Granada, Spain
  • 2. Department of Applied Physics II and Institute Carlos I for Theoretical and Computational Physics, University of Málaga, 29071, Málaga, Spain
  • 3. Dipartimento di Scienze della Terra, dell'Ambiente e delle Risorse, Università di Napoli Federico II, 80126, Napoli, Italy
  • 4. Centro de Investigaciones sobre Desertificación, Consejo Superior de Investigaciones Científicas (CSIC), University of València and Generalitat Valenciana, 46113, Valencia, Spain
  • 5. Institute of Integrative Systems Biology (I2Sysbio), University of València and Consejo Superior de Investigaciones Científicas (CSIC), 46980, Valencia, Spain

Description

Supplement to the paper

“The emergence of high-fitness variants accelerates the slowdown of genome heterogeneity in the coronavirus”

Since the outbreak of the COVID-19 pandemic, the SARS-CoV-2 coronavirus has accumulated a substantial amount of genome variability through mutation and recombination. To test for evolutionary trends that could inform us about the adaptive process of the virus to its human host, we compute a genome-wide measure of Sequence Compositional Complexity (SCC) in high-quality coronavirus genomes from across the globe, covering the full span of the pandemic. In early samples, we find no statistical support for any trend in SCC values over time, although the virus genome appears to evolve faster than expected under Brownian motion. However, in samples taken after the emergence of Variants of Concern with higher transmissibility, and after controlling for phylogenetic and sampling effects, we detect a declining trend in SCC and an increasing trend in its absolute evolutionary rate. This means that the decline in SCC itself accelerated over time, and that the increasing fitness of variant genomes led to a reduction in their genome sequence heterogeneity.
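As a rough illustration of what a trend in SCC over time means, the sketch below fits an ordinary least-squares slope of SCC against collection date. This is not the method used in the paper, which controls for phylogenetic and sampling effects; it is only a naive first look. The function name `scc_slope_per_day` and the input layout (pairs of ISO collection date and SCC value) are assumptions made for the example, and the values in the usage note are toy numbers, not data from the supplementary tables.

```python
from datetime import date


def scc_slope_per_day(records):
    """Naive least-squares slope of SCC versus collection date (SCC units per day).

    `records` is an iterable of (iso_date_string, scc_value) pairs.
    No phylogenetic or sampling correction is applied.
    """
    records = list(records)
    dates = [date.fromisoformat(d) for d, _ in records]
    first = min(dates)
    xs = [(d - first).days for d in dates]   # days since the earliest sample
    ys = [float(v) for _, v in records]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct collection dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A negative return value indicates that SCC declines with collection date in the supplied sample, for example `scc_slope_per_day([("2021-01-01", 5.0), ("2021-01-11", 4.0), ("2021-01-21", 3.0)])` on toy values.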

Supplementary files

| File | Description |
| --- | --- |
| SupplementaryTables S1-S18.xlsx | The strain name, the collection date, and the SCC values for each analyzed genome. |
| SupplementaryTableS19.pdf | A complete list acknowledging all originating and submitting laboratories for the sequence data in GISAID EpiCoV on which these analyses are based. |
| SupplementaryTable S20.pdf | A complete list acknowledging the authors and the originating and submitting laboratories of the genetic sequences used for the analysis of the Nextstrain sample. |
| PhylogeneticTimetrees_NexusFormat.zip | Phylogenetic timetrees (Nexus format). |
| PhylogeneticTimetrees_NewickFormat.zip | Phylogenetic timetrees (Newick format). |
| SCCdata.zip | SCC data. |
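To get a first look at the Newick timetrees without extra dependencies, a minimal sketch along these lines may help. It assumes that each member of the archive is a plain-text file holding one Newick tree with unquoted leaf labels; the internal file names and the exact label format are not documented here, so the code lists whatever members it finds. The helper names `leaf_names` and `trees_in_archive` are hypothetical, and a dedicated phylogenetics library is the better choice for real analyses.

```python
import re
import zipfile


def leaf_names(newick):
    """Return the leaf labels of a Newick string.

    A leaf label is a token that directly follows '(' or ','.
    Quoted labels and bracketed comments are not handled.
    """
    return re.findall(r"[(,]\s*([^():,;\s]+)", newick)


def trees_in_archive(zip_path):
    """Map each file in the archive to the leaf labels found in it."""
    result = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries
            text = archive.read(info).decode("utf-8", errors="replace")
            result[info.filename] = leaf_names(text)
    return result
```

Calling `trees_in_archive("PhylogeneticTimetrees_NewickFormat.zip")` on a downloaded copy would return one list of leaf labels per archive member, which can then be matched against the strain names in the supplementary tables.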

 

Notes

This project was funded by grants from the Spanish Ministry of Science, Innovation and Universities (formerly the Spanish Ministry of Economy and Competitiveness) to J.L.O. (Project AGL2017-88702-C2-2-R) and A.M. (Project PID2019-105969GB-I00) and by a grant from Generalitat Valenciana to A.M. (Project Prometeo/2018/A/133), and was co-financed by the European Regional Development Fund (ERDF). The most time-demanding computations were run on the servers of the Laboratory of Bioinformatics, Dept. of Genetics & Institute of Biotechnology, Center of Biomedical Research, 18100, Granada, Spain.

Files

Files (4.9 MB)

| MD5 checksum | Size |
| --- | --- |
| md5:543413cc46daf993e3a9b9e3c712dc86 | 612.8 kB |
| md5:1c2b12e84b5a15b846f7d017bead88a1 | 551.0 kB |
| md5:d768112eb5be79e2cf4caf4319bf78b7 | 16.6 kB |
| md5:ece8cf7ce54b320de3e821ce49dd0f01 | 526.8 kB |
| md5:5cc08059138ed7dd3868f601bd5c6e89 | 760.2 kB |
| md5:a9d588d7b8c39c28201c5fb3ff271641 | 679.4 kB |
| md5:da50514205e76a01bf3c42290df9b332 | 1.7 MB |
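The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_listed_checksum` are illustrative, and which checksum belongs to which file has to be taken from the record page itself, since the listing here does not pair names with checksums.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().lower().removeprefix("md5:")
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("SCCdata.zip", "md5:...")` with the checksum copied from the record returns `True` only if the local file is byte-for-byte identical to the published one. MD5 is adequate for detecting corrupted downloads but is not a safeguard against deliberate tampering.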