Published January 20, 2025 | Version v1
Dataset Open

Multiple sequence alignment of 1,870,492 SARS-CoV-2 genomes assembled by the Viridian project

  • 1. Centre de Recherche en Informatique, Signal et Automatique de Lille

Description

The assembled genomes were obtained from the publication by Hunt et al. (10.1101/2024.04.29.591666). The genomes were filtered to remove sequences containing at least 100 non-ACGT nucleotides or at least two consecutive Ns (except at the sequence ends). The unaligned filtered sequences are available at https://zenodo.org/records/14698684.
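The filtering rule described above can be sketched as follows. This is only an illustration, not the script used to build the dataset. The names `keep_sequence`, `read_fasta`, and `MAX_NON_ACGT` are hypothetical. The sketch also makes three assumptions: the comparison is case-insensitive, the non-ACGT count covers the whole sequence, and "except at the ends" means that leading and trailing runs of N are ignored only for the consecutive-N test.

```python
# Hypothetical sketch of the filter described above; not the authors' code.

MAX_NON_ACGT = 100  # assumption: sequences with 100 or more non-ACGT characters are dropped


def keep_sequence(seq: str) -> bool:
    """Return True if the sequence passes both filtering criteria."""
    s = seq.upper()  # assumption: lower-case bases are treated like upper-case ones
    # Criterion 1: fewer than 100 characters outside A, C, G, T (whole sequence).
    non_acgt = sum(1 for base in s if base not in "ACGT")
    if non_acgt >= MAX_NON_ACGT:
        return False
    # Criterion 2: no run of two or more Ns, ignoring runs at either end.
    core = s.strip("N")
    return "NN" not in core


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def filter_fasta(lines):
    """Yield only the records whose sequence passes keep_sequence."""
    for name, seq in read_fasta(lines):
        if keep_sequence(seq):
            yield name, seq
```

For example, `filter_fasta(open("genomes.fasta"))` would iterate over the retained records, where `genomes.fasta` is a placeholder file name.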

The 1,870,492 remaining sequences were then aligned using Halign3 (10.1093/molbev/msac166), which required 1.2 TB of RAM.

The computation was performed on the IFB Core cluster managed by the Institut Français de Bioinformatique.

Files

Files (63.2 MB)

Name Size
md5:75db04a474804bb3b2cd583a3b9b24c8
63.2 MB
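
After downloading, the file can be checked against the MD5 checksum listed above. The following is a minimal sketch; the helper `md5_of_file` and the local path passed to it are hypothetical, since the file name is not shown in this record.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "75db04a474804bb3b2cd583a3b9b24c8"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size blocks until read() returns an empty bytes object.
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_record(path: str) -> bool:
    """Return True if the file at `path` has the checksum listed in this record."""
    return md5_of_file(path) == EXPECTED_MD5
```

Calling `matches_record("alignment_download")` (a placeholder path) returns `True` only if the local copy is byte-identical to the deposited file.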

Additional details

Funding

Agence Nationale de la Recherche
INSSANE - Integrated Sequencing and Structural Analysis of RNA Probing Experiments ANR-21-CE45-0034
Agence Nationale de la Recherche
IFB (ex Renabi-IFB) - Institut français de bioinformatique ANR-11-INBS-0013

Dates

Submitted
2025-01-20