Published April 28, 2026 | Version 0.1.0
Dataset Open

Phylogeny-aided detection of contamination in nearly 5 million SARS-CoV-2 genomes

  • 1. European Molecular Biology Laboratory - European Bioinformatics Institute
  • 2. Institut Pasteur
  • 3. Institut de Biologie de l'École Normale Supérieure
  • 4. Australian National University
  • 5. EMBL-European Bioinformatics Institute
  • 6. European Bioinformatics Institute

Description

Zenodo repository of "Phylogeny-aided detection of contamination in nearly 5 million SARS-CoV-2 genomes".

This repository contains:

  • a reference tree, built using MAPLE, of the 785,011 samples that passed the filtering steps;
  • a MAPLE alignment of 4,952,451 samples, with masked positions;
  • a list of 10,942 samples flagged as putatively contaminated.

These files were generated using PhyCD with default parameters.
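
A common use of the flagged list is to exclude those samples from a downstream analysis. The sketch below shows one way to do that. It assumes the list is a plain-text file with one sample identifier per line, which this record does not state, so check the file layout before relying on it. The function names load_flagged_ids and drop_flagged are hypothetical and are not part of PhyCD.

```python
def load_flagged_ids(path):
    """Return the set of identifiers found in a flagged-samples file.

    Assumption (not stated in the record): one identifier per line.
    Blank lines and surrounding whitespace are ignored.
    """
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def drop_flagged(sample_ids, flagged):
    """Keep only the identifiers that are not in the flagged set, in order."""
    return [sample for sample in sample_ids if sample not in flagged]
```

For example, drop_flagged(my_ids, load_flagged_ids("flagged_samples_5_0_10_3.txt")) would return my_ids without the flagged entries, provided the identifiers in both sources use the same naming scheme.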

Files

flagged_samples_5_0_10_3.txt

Files (716.9 MB)

Checksum Size
md5:66d90af3a722fe22ad8b5e22f25f8366 127.7 kB
md5:25e20c445f5a45b5a8dfd3a9c6eac890 415.9 MB
md5:4b25e819720a935e2fa6cc6bdfad4b76 300.8 MB
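
Because two of the files are several hundred megabytes, it can be worth checking each download against its MD5 checksum. The following is a minimal sketch using only the Python standard library. The function names md5_of_file and matches_checksum are hypothetical. The listing above does not show which checksum belongs to which file name, so that pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks
    so that large files are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as matches_checksum(downloaded_path, "md5:25e20c445f5a45b5a8dfd3a9c6eac890") returns True only if the file at downloaded_path (a placeholder for the local path) has that digest.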

Additional details

Software

Repository URL
https://github.com/oanoufa/PhyCD
Programming language
Python